AI Integration
July 2025
Filipino Speech Coach
Role
Backend and AI integration developer
Tech stack
Next.js, FastAPI, Azure VM, WhisperX, Supabase, Cloudflare R2


Overview
A cloud-native speech training application that sends user audio to a FastAPI backend hosted on an Azure VM. The backend uses the WhisperX model for character-level alignment and phoneme transcription. Metadata and transaction logs are stored in Supabase, while audio recordings and public assets are stored in and served from a Cloudflare R2 bucket.
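As a minimal sketch of what character-level alignment makes possible, the snippet below groups per-character timings into syllable durations. It assumes each aligned character carries a start and end time in seconds, similar in shape to the character alignments WhisperX can return. The `CharTiming` class, the `syllable_durations` and `longest_syllable` helpers, the pre-computed syllabification, and the timing values are all hypothetical and not taken from the project.

```python
from dataclasses import dataclass


@dataclass
class CharTiming:
    char: str
    start: float  # seconds
    end: float    # seconds


def syllable_durations(chars, syllables):
    """Sum character timings into one duration per syllable.

    `syllables` is a hypothetical, pre-computed syllabification whose
    concatenation must equal the aligned characters.
    """
    if "".join(syllables) != "".join(c.char for c in chars):
        raise ValueError("syllables do not match the aligned characters")
    durations = []
    index = 0
    for syllable in syllables:
        group = chars[index:index + len(syllable)]
        # Duration runs from the first character's start to the last one's end
        durations.append((syllable, round(group[-1].end - group[0].start, 3)))
        index += len(syllable)
    return durations


def longest_syllable(durations):
    """Return the syllable with the longest duration, a rough stress cue."""
    return max(durations, key=lambda item: item[1])[0]


# Made-up timings for the word "basa", for illustration only
chars = [
    CharTiming("b", 0.10, 0.16),
    CharTiming("a", 0.16, 0.38),
    CharTiming("s", 0.38, 0.45),
    CharTiming("a", 0.45, 0.55),
]
durations = syllable_durations(chars, ["ba", "sa"])
print(durations, longest_syllable(durations))
```

Duration is only one possible cue. A fuller analysis would likely combine it with pitch and intensity measured over the same time spans.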
Key Contributions & Findings
- 01 Integrated WhisperX speech-to-text with phoneme alignment, pairing character-level timing with acoustic analysis to assess lexical stress.
- 02 Deployed a FastAPI server on an Azure VM.
- 03 Used Supabase for real-time user database sync, coupled with Cloudflare R2 bucket storage for fast, responsive asset loading.
- 04 Implemented spaced repetition scheduling logic to improve user retention and pronunciation over time.
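The spaced repetition scheduling mentioned above could look roughly like the sketch below. The project does not state which algorithm it uses, so this assumes a simplified SM-2 style scheduler. The `ReviewState` class and `schedule` function are hypothetical names, and the mapping from a pronunciation score to a 0 to 5 grade is assumed to happen elsewhere.

```python
from dataclasses import dataclass


@dataclass
class ReviewState:
    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # days until the next review
    ease: float = 2.5       # SM-2 ease factor, never below 1.3


def schedule(state, quality):
    """Return the next ReviewState for a review graded 0 (worst) to 5 (best)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality < 3:
        # Failed review: restart the sequence but keep the ease factor
        return ReviewState(0, 1, state.ease)
    if state.repetitions == 0:
        interval = 1
    elif state.repetitions == 1:
        interval = 6
    else:
        # Later intervals grow by the ease factor held before this review
        interval = round(state.interval_days * state.ease)
    miss = 5 - quality
    ease = state.ease + (0.1 - miss * (0.08 + miss * 0.02))
    return ReviewState(state.repetitions + 1, interval, max(1.3, ease))


# Example: two perfect reviews, one good review, then a failed one
state = ReviewState()
for grade in (5, 5, 4, 2):
    state = schedule(state, grade)
    print(grade, state)
```

With this scheme, a phrase the user keeps pronouncing well is shown less and less often, while a failed attempt brings it back the next day.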
Infrastructure
System Architecture
The system is a cloud-native machine learning pipeline. Audio payloads are sent from the React frontend (built with Next.js) via REST API endpoints to a FastAPI server hosted on an Azure VM, which runs phoneme-level WhisperX inference. User scores and metadata are stored in Supabase, while raw audio files are persisted securely in Cloudflare R2 bucket storage.
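The request flow described above can be sketched with plain Python interfaces. This is an illustration under stated assumptions, not the project's actual code: `Aligner`, `ObjectStore`, `ScoreTable`, `handle_attempt`, the object key layout, and the row fields are all hypothetical, and the in-memory classes stand in for WhisperX inference, a Cloudflare R2 upload, and a Supabase insert.

```python
import uuid
from dataclasses import dataclass, field
from typing import Protocol


class Aligner(Protocol):
    def score(self, audio: bytes, target_text: str) -> float: ...


class ObjectStore(Protocol):
    def put(self, key: str, data: bytes) -> None: ...


class ScoreTable(Protocol):
    def insert(self, row: dict) -> None: ...


@dataclass
class AttemptResult:
    attempt_id: str
    audio_key: str
    score: float


def handle_attempt(user_id, target_text, audio, aligner, store, table):
    """Process one recording: score it, persist the audio, record metadata."""
    if not audio:
        raise ValueError("empty audio payload")
    attempt_id = uuid.uuid4().hex
    audio_key = f"recordings/{user_id}/{attempt_id}.wav"  # hypothetical layout
    score = aligner.score(audio, target_text)  # model inference would run here
    store.put(audio_key, audio)                # stands in for an R2 upload
    table.insert({                             # stands in for a Supabase insert
        "attempt_id": attempt_id,
        "user_id": user_id,
        "target_text": target_text,
        "audio_key": audio_key,
        "score": score,
    })
    return AttemptResult(attempt_id, audio_key, score)


# In-memory stand-ins so the flow can run without any cloud service
@dataclass
class FixedAligner:
    value: float

    def score(self, audio, target_text):
        return self.value


@dataclass
class InMemoryStore:
    objects: dict = field(default_factory=dict)

    def put(self, key, data):
        self.objects[key] = data


@dataclass
class InMemoryTable:
    rows: list = field(default_factory=list)

    def insert(self, row):
        self.rows.append(row)


store, table = InMemoryStore(), InMemoryTable()
result = handle_attempt("user-1", "basa", b"\x00\x01", FixedAligner(0.8), store, table)
print(result.audio_key, result.score)
```

Keeping the three services behind small interfaces is one way to let the endpoint logic be tested locally, with the real clients swapped in at deployment.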