AI Integration | July 2025

Filipino Speech Coach

Role

Backend and AI integration developer

Tech stack

Next.js, FastAPI, Azure VM, WhisperX, Supabase, Cloudflare R2

Overview

A cloud-native speech training application that sends user audio to a FastAPI backend hosted on an Azure VM, where the WhisperX model produces phoneme transcriptions and character-level alignments. Metadata and transaction logs are stored in Supabase, while audio recordings and public assets are cached and served from Cloudflare R2 bucket storage.
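To illustrate how character-level timings can feed a stress check, here is a minimal Python sketch. It assumes WhisperX-style character entries (a character plus optional start and end times in seconds, where unaligned characters lack timestamps), treats each vowel as one syllable nucleus, and picks the longest vowel as the likely stressed syllable. The CharAlignment type and both functions are hypothetical, and duration alone is a simplification: the project pairs timings with acoustic analysis, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Optional

VOWELS = set("aeiou")


@dataclass
class CharAlignment:
    """Hypothetical shape of one aligned character (times in seconds)."""
    char: str
    start: Optional[float] = None  # None when the aligner could not place it
    end: Optional[float] = None


def vowel_durations(chars: list[CharAlignment]) -> list[tuple[int, float]]:
    """Return (syllable_index, duration) for each timed vowel in one word.

    Assumes one vowel per syllable, a rough approximation of Filipino spelling.
    Untimed vowels still advance the syllable index but yield no duration.
    """
    result = []
    syllable = 0
    for c in chars:
        if c.char.lower() in VOWELS:
            if c.start is not None and c.end is not None:
                result.append((syllable, c.end - c.start))
            syllable += 1
    return result


def likely_stressed_syllable(chars: list[CharAlignment]) -> Optional[int]:
    """Heuristic: the syllable whose vowel lasts longest is taken as stressed."""
    durations = vowel_durations(chars)
    if not durations:
        return None
    return max(durations, key=lambda pair: pair[1])[0]


# Invented timings for one utterance of "bukas".
word = [
    CharAlignment("b", 0.00, 0.05),
    CharAlignment("u", 0.05, 0.12),
    CharAlignment("k", 0.12, 0.18),
    CharAlignment("a", 0.18, 0.34),
    CharAlignment("s", 0.34, 0.40),
]
print(likely_stressed_syllable(word))  # 1 (second syllable, as in "bukás")
```

The two readings of "bukas" ("búkas", tomorrow; "bukás", open) differ only in stress, which is the kind of contrast a duration cue is meant to help separate.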

Key Contributions & Findings

01 Integrated WhisperX speech-to-text with phoneme alignment, pairing character-level timings with acoustic analysis to assess lexical stress.
02 Deployed a FastAPI server on an Azure VM to host WhisperX inference.
03 Used Supabase for real-time user database sync, alongside Cloudflare R2 bucket storage for fast, responsive asset loading.
04 Implemented spaced repetition scheduling to optimize user retention and pronunciation improvement over time.
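The project does not say which spaced repetition algorithm the scheduler uses. As a stand-in, the sketch below implements the classic SM-2 update in Python; ReviewState, score_to_quality, and the 0-100 pronunciation score scale are hypothetical, chosen only to show how a score could drive review intervals.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ReviewState:
    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # days until the next review
    easiness: float = 2.5   # SM-2 easiness factor, floored at 1.3


def score_to_quality(score: float) -> int:
    """Hypothetical mapping of a 0-100 pronunciation score to an SM-2 grade (0-5)."""
    return max(0, min(5, int(score / 20)))


def sm2_review(state: ReviewState, quality: int) -> ReviewState:
    """Apply one SM-2 update for a review graded 0 (worst) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality >= 3:
        if state.repetitions == 0:
            interval = 1
        elif state.repetitions == 1:
            interval = 6
        else:
            interval = round(state.interval_days * state.easiness)
        repetitions = state.repetitions + 1
    else:
        # Failed recall: restart the repetition sequence from day one.
        repetitions, interval = 0, 1
    q = 5 - quality
    easiness = state.easiness + (0.1 - q * (0.08 + q * 0.02))
    return ReviewState(repetitions, interval, max(1.3, easiness))


def next_due(today: date, state: ReviewState) -> date:
    """Date of the next scheduled review for this item."""
    return today + timedelta(days=state.interval_days)


# Three practice attempts on the same word.
state = ReviewState()
for score in (92, 85, 100):
    state = sm2_review(state, score_to_quality(score))
print(state.repetitions, state.interval_days)  # 3 15
```

SM-2 restarts the interval sequence on any grade below 3, so a poorly pronounced word returns the next day, while consistently strong attempts push reviews further apart as the easiness factor grows.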

Infrastructure

System Architecture

The system is a cloud-native machine learning pipeline. The Next.js (React) frontend sends audio payloads through REST API endpoints to a FastAPI server hosted on an Azure VM, which runs phoneme-level WhisperX inference. User scores and metadata are stored in Supabase, while raw audio files are persisted securely in Cloudflare R2 bucket storage.
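Before audio reaches WhisperX, the endpoint has to decide whether a payload is usable. The following stdlib-only sketch assumes the frontend uploads 16-bit PCM WAV (browser recorders often produce WebM/Opus instead, which would need decoding with a tool such as ffmpeg first) and applies a hypothetical 30-second limit; inspect_wav, AudioInfo, and make_silence are illustrative names, not part of the project. In a FastAPI route, the bytes could come from `await file.read()` on an `UploadFile`.

```python
import io
import wave
from dataclasses import dataclass

TARGET_RATE = 16_000  # Whisper-family models operate on 16 kHz mono audio
MAX_SECONDS = 30.0    # hypothetical per-utterance limit for a drill


@dataclass
class AudioInfo:
    sample_rate: int
    channels: int
    duration_s: float

    @property
    def needs_resample(self) -> bool:
        """True if the audio must be converted to 16 kHz mono before inference."""
        return self.sample_rate != TARGET_RATE or self.channels != 1


def inspect_wav(payload: bytes, max_seconds: float = MAX_SECONDS) -> AudioInfo:
    """Validate an uploaded WAV payload before queueing it for inference."""
    try:
        with wave.open(io.BytesIO(payload), "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
            channels = wav.getnchannels()
    except (wave.Error, EOFError) as exc:
        raise ValueError(f"not a readable WAV file: {exc}") from exc
    if frames == 0:
        raise ValueError("audio payload is empty")
    duration = frames / rate
    if duration > max_seconds:
        raise ValueError(f"recording is {duration:.1f}s, limit is {max_seconds:.0f}s")
    return AudioInfo(rate, channels, duration)


def make_silence(seconds: float, rate: int = TARGET_RATE, channels: int = 1) -> bytes:
    """Build an in-memory 16-bit PCM WAV of silence for demonstration."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(seconds * rate))
    return buf.getvalue()


info = inspect_wav(make_silence(1.0))
print(info.sample_rate, info.channels, info.needs_resample)  # 16000 1 False
```

Rejecting unreadable, empty, or overlong uploads early keeps malformed requests from occupying the VM's inference capacity, and the needs_resample flag marks recordings that must be converted before alignment.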