Adds speech-to-text (STT) functionality by allowing users to upload audio clips, start transcription jobs, and download transcriptions.
Introduces new API endpoints for STT upload, start, and download.
Also converts an AudioClip to a WAV byte array for upload.
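
For reviewers unfamiliar with the WAV layout, the conversion can be sketched roughly as follows. This is a minimal, language-neutral illustration in Python, not the code in this change: the function name `float_samples_to_wav` is hypothetical, and it assumes the clip's samples are already available as interleaved floats in [-1.0, 1.0] (as `AudioClip.GetData` returns them) and that the target is 16-bit PCM.

```python
import struct


def float_samples_to_wav(samples, sample_rate, channels=1):
    """Pack interleaved float samples into a 16-bit PCM WAV byte string.

    Hypothetical helper for illustration; assumes samples lie in [-1.0, 1.0].
    """
    # Clamp each sample, then scale it to a signed 16-bit little-endian integer.
    pcm = bytearray()
    for s in samples:
        clamped = max(-1.0, min(1.0, s))
        pcm += struct.pack("<h", int(clamped * 32767))

    bits_per_sample = 16
    block_align = channels * bits_per_sample // 8  # bytes per sample frame
    byte_rate = sample_rate * block_align          # bytes per second

    # RIFF header: chunk size is everything after the first 8 bytes.
    riff = b"RIFF" + struct.pack("<I", 36 + len(pcm)) + b"WAVE"
    # "fmt " chunk: 16-byte body, audio format 1 means uncompressed PCM.
    fmt = b"fmt " + struct.pack(
        "<IHHIIHH",
        16, 1, channels, sample_rate, byte_rate, block_align, bits_per_sample,
    )
    # "data" chunk: length prefix followed by the raw PCM bytes.
    data = b"data" + struct.pack("<I", len(pcm))
    return bytes(riff + fmt + data + pcm)
```

The result is a 44-byte header followed by two bytes per sample, which is the form most upload endpoints accept as a `.wav` payload.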
Improves UI element handling by encapsulating UI logic.
Exposes session activity and input enablement as properties to improve state management.
Introduces event handlers for button clicks to decouple UI interactions.