Real-time speech-to-text transcription

Build an iOS app that lets you transcribe and play back an audio recording. There are bonus features as well.

Example UX Flow

Simple and accessible UI with 2 buttons (record & play) and 1 text box.
User's flow:
- Click "Record", the button turns into "Stop recording"
- Speak
- As the user speaks, transcribe the speech into the text box
- Once done, click "Stop recording"
- Click "Play" to hear your recording back
You can use Google's Speech-to-Text API to transcribe the audio recording. It provides timestamps for the transcribed words, which you can use to highlight each word as it is played back.
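
As a rough illustration of how word timestamps could drive highlighting, the sketch below maps the current playback time to the index of the word being spoken. The `TimedWord` type, its field names, and `highlightedWordIndex` are hypothetical names chosen for this example, not part of the Speech-to-Text response format; it also assumes the words are sorted by start time and do not overlap.

```swift
import Foundation

/// One transcribed word with the time span in which it was spoken.
/// Hypothetical type: the field names are illustrative, not the API's schema.
struct TimedWord {
    let text: String
    let start: TimeInterval  // seconds from the start of the recording
    let end: TimeInterval
}

/// Returns the index of the word being spoken at `time`, or nil when
/// playback is before the first word, between words, or after the last one.
/// Assumes `words` is sorted by `start` and that words do not overlap.
func highlightedWordIndex(in words: [TimedWord], at time: TimeInterval) -> Int? {
    var low = 0
    var high = words.count - 1
    // Binary search over the sorted, non-overlapping time spans.
    while low <= high {
        let mid = (low + high) / 2
        let word = words[mid]
        if time < word.start {
            high = mid - 1
        } else if time >= word.end {
            low = mid + 1
        } else {
            return mid
        }
    }
    return nil
}

let words = [
    TimedWord(text: "hello", start: 0.0, end: 0.4),
    TimedWord(text: "world", start: 0.5, end: 0.9),
]
```

During playback, the player's current time could be polled (or observed) and passed to this function, with the returned index used to style the matching range in the text box.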

Modularity in your code, a clear and concise separation of concerns.
Declarative code. Since this is the paradigm Apple is shifting to with Combine, we want our engineers to be equipped to adopt the latest technologies. We leverage RxSwift at Speechify.
Minimum boilerplate. We understand that there are many elegant ways (involving lots of protocols and abstraction) to describe middleware, UI and the connections in between - but our codebase is built for speedy development and rapid iteration.

Use a modern reactive programming framework, such as RxSwift or Combine (i.e. demonstrate a practical understanding of the reactive programming paradigm). We use RxSwift + MVVM at Speechify.
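
To make the reactive MVVM idea concrete, here is a minimal sketch of a view model whose outputs are derived declaratively from its state. It uses Combine only because it ships with the SDK; the same shape could be written with RxSwift. `RecorderViewModel`, `hasRecording`, and the other property names are assumptions made for this example, and the real recorder and transcription services are left out.

```swift
import Combine
import Foundation

/// Hypothetical view model for the two-button screen.
final class RecorderViewModel: ObservableObject {
    // State changed by user actions (and, in a full app, by the recorder).
    @Published private(set) var isRecording = false
    @Published private(set) var hasRecording = false

    // Outputs the view binds to.
    @Published private(set) var recordButtonTitle = "Record"
    @Published private(set) var isPlayEnabled = false

    init() {
        // The button title is a pure function of the recording state.
        $isRecording
            .map { $0 ? "Stop recording" : "Record" }
            .assign(to: &$recordButtonTitle)

        // "Play" is enabled only when idle and a recording exists.
        $isRecording
            .combineLatest($hasRecording)
            .map { isRecording, hasRecording in !isRecording && hasRecording }
            .assign(to: &$isPlayEnabled)
    }

    /// Called when the user taps the record button.
    func toggleRecording() {
        if isRecording { hasRecording = true }  // stopping yields a recording
        isRecording.toggle()
    }
}
```

Because the outputs are plain published properties, a unit test can drive `toggleRecording()` and assert on `recordButtonTitle` and `isPlayEnabled` without touching any UI.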
Accessibility support for the UI.
Write unit tests. We love tests at Speechify!
Highlight the transcribed text word by word as you play the audio recording.
Duck the audio in case the user starts playing music (from Spotify, Apple Music) while the recording is going.
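
One possible reading of the ducking requirement is that music from other apps should be lowered while this app's audio session is active. Under that assumption, a minimal sketch with `AVAudioSession` might look like the following; the function names are hypothetical, and a different interpretation (lowering this app's own playback instead) would need a different approach.

```swift
#if os(iOS)
import AVFoundation

/// Activates a session that can record and play, and asks the system to
/// lower (duck) audio from other apps while this session is active.
/// Assumption: "ducking" here means lowering other apps' audio.
func configureDuckingAudioSession() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playAndRecord,
                            mode: .default,
                            options: [.duckOthers, .defaultToSpeaker])
    try session.setActive(true)
}

/// Deactivates the session so other apps can restore their volume.
func releaseAudioSession() throws {
    try AVAudioSession.sharedInstance()
        .setActive(false, options: .notifyOthersOnDeactivation)
}
#endif
```

Calling the first function before recording or playback starts, and the second once it ends, keeps the ducking limited to the time the app is actually using audio.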