How to Transcribe an Interview (Fast, Free, Accurate)
A practical guide to transcribing interviews in minutes — upload audio or video to AI, convert Zoom or Teams recordings, handle phone interviews, and export clean transcripts.
Manual transcription of a one-hour interview takes between four and six hours. AI transcription takes four to six minutes. That gap is why most working journalists, researchers, and content creators have switched; the remaining question is how to set it up correctly the first time.
This guide covers three common interview scenarios, what makes transcription accurate or inaccurate, and how to get the most useful output from your transcript.
Why transcription accuracy varies (and what you control)
Before the method, it helps to understand the one factor that matters most: audio quality. AI transcription models, including the Whisper-class models that power most tools today, convert speech to text based on the acoustic signal. A clear recording with one speaker close to the microphone transcribes at 97–99% accuracy. A phone call played on speaker and picked up by a laptop microphone in a reverberant room transcribes at 82–88%.
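Accuracy figures like these are usually quoted as 1 minus the word error rate (WER), the standard speech-recognition metric: substitutions, deletions, and insertions divided by the number of words in a correct reference transcript. Assuming that convention (tools do not always say how they score), you can measure accuracy on your own audio by hand-correcting a short clip and comparing it with the AI output. A minimal Python sketch using word-level edit distance:

```python
import re


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    # Compare lowercase words only; punctuation is ignored.
    ref = re.findall(r"[\w']+", reference.lower())
    hyp = re.findall(r"[\w']+", hypothesis.lower())
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from the output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("The grant was approved in March.",
                          "the grant was approved in marsh")
    print(f"WER {wer:.3f}, accuracy {1 - wer:.1%}")
```

One wrong word in a six-word sentence gives a WER of about 0.167, or roughly 83% accuracy. Ignoring punctuation and capitalization, as this sketch does, is a common but not universal scoring choice.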
You can't fix bad audio after the fact, but you can make better recording decisions before the interview. More on that in the tips section.
Method 1: Upload an audio or video file to TranscribTxt
This is the baseline workflow and works for any file you already have on your computer.
Step 1: Go to TranscribTxt and upload your file.
Drag the file into the upload zone or click to browse. No account is required on the free plan. Supported formats include MP3, MP4, WAV, M4A, MOV, and WEBM. Files up to 2 GB are accepted, which covers 20+ hours of audio at standard quality; video files use far more space per minute.
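The 20-plus-hour figure depends on bitrate, since file size is roughly bitrate multiplied by duration. The sketch below assumes "2 GB" means 2,000,000,000 bytes and uses illustrative constant bitrates (not measurements of any particular app) to show why the limit is generous for audio but much tighter for video:

```python
# Assumption: "2 GB" means 2 * 10**9 bytes and files are constant-bitrate.
UPLOAD_LIMIT_BYTES = 2 * 10**9


def hours_that_fit(size_bytes: float, bitrate_bps: float) -> float:
    """Hours of recording that fit in size_bytes at a constant bitrate (bits per second)."""
    return size_bytes * 8 / bitrate_bps / 3600


# Illustrative bitrates only.
examples = {
    "MP3 audio at 128 kbps": 128_000,
    "AAC/M4A audio at 256 kbps": 256_000,
    "HD video at about 5 Mbps": 5_000_000,
}

for label, bps in examples.items():
    print(f"{label}: {hours_that_fit(UPLOAD_LIMIT_BYTES, bps):.1f} hours")
```

At 128 kbps the limit works out to about 34.7 hours and at 256 kbps about 17.4 hours, while a 5 Mbps video file reaches it in under an hour.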
The file is transferred over HTTPS directly to the processing server.
Step 2: Select the interview language.
Use the language dropdown. TranscribTxt supports 99 languages including English, Spanish, French, German, Portuguese, Russian, Arabic, Japanese, Chinese (Mandarin), Korean, Hindi, and many more. If you are unsure of the dominant language, use Auto-detect — the model identifies it from the first 30 seconds.
Step 3: Wait for processing.
A 45-minute interview file typically processes in 2–4 minutes on GPU servers. You will see a progress indicator during upload and a separate indicator during transcription.
Step 4: Review and export.
Once processing is complete, the transcript appears on screen. Use the Copy button to paste directly into a document, or click Download .txt to save the file.
Your uploaded file is deleted from the server as soon as the transcript is ready. TranscribTxt does not retain audio or video files.
Pro plan users ($12/month) also get Download .srt — a subtitle file with timestamps, useful for adding captions to a video version of the interview. Business plan users ($29/month) get JSON export with word-level timestamps and speaker diarization (automated labeling of who said what).
Method 2: Zoom or Teams recorded call
Remote interviews recorded through Zoom, Microsoft Teams, or Google Meet produce a video file you upload exactly like any other.
Getting the file from Zoom:
Local recordings save automatically to Documents > Zoom > [Meeting name] on your computer. The file is named zoom_0.mp4. If you recorded to the Zoom cloud instead, log in at zoom.us, go to Recordings, and download the MP4.
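If you record often, a short script can find the newest local recording for you. This is a convenience sketch, not part of Zoom or TranscribTxt: it assumes the default Documents > Zoom folder described above and simply picks the most recently modified .mp4 in any meeting subfolder, so it works whatever the file happens to be named.

```python
from pathlib import Path
from typing import Optional

# Assumed default local-recording folder; change it if you moved it in Zoom's settings.
DEFAULT_ZOOM_DIR = Path.home() / "Documents" / "Zoom"


def newest_zoom_recording(zoom_dir: Path = DEFAULT_ZOOM_DIR) -> Optional[Path]:
    """Return the most recently modified .mp4 under zoom_dir, or None if there is none."""
    if not zoom_dir.is_dir():
        return None
    # Each meeting gets its own subfolder, so search recursively.
    recordings = [p for p in zoom_dir.rglob("*.mp4") if p.is_file()]
    if not recordings:
        return None
    return max(recordings, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    latest = newest_zoom_recording()
    print(latest if latest else "No Zoom recordings found.")
```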
Getting the file from Microsoft Teams:
Teams saves cloud recordings to SharePoint or OneDrive (check with your IT team for the exact location). From the meeting chat, click the recording thumbnail and choose Download.
Getting the file from Google Meet:
Recordings save to the organizer's Google Drive under a folder called "Meet Recordings." Download the MP4 from there.
Once you have the file, upload it to TranscribTxt the same way as any other video. Processing time for a typical 60-minute meeting recording is 3–6 minutes.
One note: recorded video calls often have audio problems that reduce accuracy, such as participants on weak internet connections, room echo from laptop speakers, and people talking while still muted (those words never reach the recording at all). Expect 90–95% accuracy rather than the 97–99% you would get from a dedicated microphone recording.
Method 3: In-person interview recorded on a phone
Phone recordings are the most common interview scenario for journalists and researchers working in the field.
iPhone: The Voice Memos app saves recordings as M4A files. After the interview, open Voice Memos, tap the recording, tap the three-dot menu, and select Share. AirDrop the file to your Mac or email it to yourself.
Android: The Voice Recorder or Recorder app (varies by manufacturer) saves as MP3 or M4A. Use Files or Google Drive to transfer to your computer.
Once the file is on your computer, upload it to TranscribTxt. M4A and MP3 both work without any conversion.
Accuracy tip for phone recordings: The placement of the phone during recording makes the biggest difference. A phone lying flat on a table between two speakers picks up both voices reasonably well, but the person sitting farther from the phone will be quieter in the recording. If the imbalance is significant, the AI model may struggle with the quieter voice. A better setup: hold the phone close to each speaker as they talk, or use a small portable recorder with dual microphone inputs.
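The imbalance follows from the inverse-square law: in an idealized free field, sound level drops by about 6 dB every time the distance from the source doubles. The sketch below assumes that idealization (room reflections usually make real differences somewhat smaller) and uses illustrative distances:

```python
import math


def level_difference_db(near_m: float, far_m: float) -> float:
    """How many dB quieter a voice at far_m is than one at near_m (free-field inverse-square law)."""
    if near_m <= 0 or far_m <= 0:
        raise ValueError("distances must be positive")
    return 20 * math.log10(far_m / near_m)


if __name__ == "__main__":
    # Illustrative: interviewer 0.5 m from the phone, interviewee 1.5 m away.
    print(f"{level_difference_db(0.5, 1.5):.1f} dB quieter")
    # Doubling the distance costs about 6 dB.
    print(f"{level_difference_db(1.0, 2.0):.1f} dB quieter")
```

Tripling the distance leaves the farther voice roughly 9.5 dB quieter, which is why moving the phone toward whoever is talking helps so much. The same arithmetic applies to the microphone tip below: a mic at 35 cm versus a laptop mic assumed to be about 1 m away is a difference of roughly 9 dB.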
Tips for better transcription accuracy
Have one person speak at a time. Overlapping speech is where every AI model makes the most errors. Brief interruptions are fine; sustained crosstalk produces inaccurate output. If you are conducting the interview, train yourself to wait a beat before responding.
Use a microphone, not a laptop's built-in mic. Even a $30 USB microphone placed 30–40 cm from the person speaking produces significantly cleaner audio than a laptop's built-in mic from across a desk.
Avoid background noise. Coffee shops, open offices, and outdoor locations introduce noise that reduces accuracy. A quiet room with soft furnishings — carpet, curtains, upholstered chairs — absorbs echo and produces noticeably cleaner recordings.
State names and technical terms clearly. AI models make the most errors on proper nouns: names, places, organization names, technical jargon. If your interview involves terms the model is unlikely to have encountered, enunciate them clearly and plan to review them manually.
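Review at speed. Replaying the whole recording is slow: at 0.75x speed in VLC media player, a 45-minute interview takes 60 minutes to play. Instead, read the AI output at your own pace, correct as you go, and replay the audio (slowed to 0.75x for hard-to-hear passages) only where something looks wrong. Done this way, the review pass for a 45-minute interview takes about 15 minutes. Focus your attention on proper nouns and anything that looks phonetically plausible but wrong.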
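To speed up that review, a quick pass over the exported .txt can list the words most likely to need checking by ear. This is a rough heuristic, not a TranscribTxt feature: it flags capitalized words that do not start a sentence (a stand-in for proper nouns) and anything beginning with a digit. It will miss lowercase jargon, and abbreviations such as "Dr." can confuse its crude sentence splitting.

```python
import re

# Crude sentence boundary: whitespace that follows ., ! or ?
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")
TOKEN = re.compile(r"[\w'-]+")


def review_candidates(text: str) -> list[tuple[int, str]]:
    """Return (line_number, token) pairs worth double-checking against the audio."""
    hits: list[tuple[int, str]] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for sentence in SENTENCE_BREAK.split(line):
            tokens = TOKEN.findall(sentence)
            for i, tok in enumerate(tokens):
                if tok[0].isdigit():
                    # Numbers, dates, and figures are easy to mishear.
                    hits.append((line_no, tok))
                elif i > 0 and tok[0].isupper() and tok.split("'")[0] != "I":
                    # Mid-sentence capital: likely a name, place, or organization.
                    hits.append((line_no, tok))
    return hits


if __name__ == "__main__":
    sample = "I spoke with Ana Ruiz at Acme last week. We signed in 2019."
    for line_no, tok in review_candidates(sample):
        print(f"line {line_no}: {tok}")
```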
Exporting: which format to use
Plain text (.txt) is the right choice for most interview workflows: notes, quotes for articles, qualitative coding, meeting summaries. It is clean, editable in any text editor or word processor, and easy to paste into Claude or ChatGPT for summarization. Available on the free plan.
SRT subtitles (.srt) add timestamps to every line of text in a format video editors understand. If your interview will be published as a video — on YouTube, as part of a documentary, or as a social media clip — SRT captions make the video accessible and improve retention. SRT export is available on the Pro plan ($12/month). You upload the .srt file directly to YouTube Studio or drag it into your timeline in Premiere Pro or DaVinci Resolve.
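Each SRT cue is a sequence number, a start and end time in HH:MM:SS,mmm form, and the caption text, with a blank line between cues. The Pro plan export produces this file for you; the sketch below only illustrates the structure, assuming you already have (start, end, text) segments in seconds, for example after editing caption wording by hand.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Build SRT text from (start_seconds, end_seconds, caption) tuples."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Sequence number, time range, then the caption text.
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    # Joining with "\n" leaves a blank line between consecutive cues.
    return "\n".join(cues)


if __name__ == "__main__":
    print(to_srt([(0.0, 2.5, "Thanks for joining me."),
                  (2.5, 6.04, "Happy to be here.")]))
```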
JSON with speaker labels is the format to use if you are building anything programmatic — a searchable archive of interviews, a transcript review tool, or an integration with a research database. The JSON output includes word-level timestamps and speaker identifiers. This is a Business plan feature ($29/month), as is the speaker diarization that makes it useful.
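As an example of the programmatic use described above, the sketch below merges word-level entries into speaker turns and looks up when a term was spoken. TranscribTxt's actual JSON schema is not described in this guide, so the field names (`words`, `word`, `start`, `end`, `speaker`) and the sample data are hypothetical; adjust them to match a real export.

```python
import json

# Hypothetical export shape; real field names may differ.
SAMPLE_EXPORT = """
{"words": [
  {"word": "So",     "start": 0.00, "end": 0.18, "speaker": "SPEAKER_1"},
  {"word": "where",  "start": 0.18, "end": 0.40, "speaker": "SPEAKER_1"},
  {"word": "did",    "start": 0.40, "end": 0.55, "speaker": "SPEAKER_1"},
  {"word": "it",     "start": 0.55, "end": 0.62, "speaker": "SPEAKER_1"},
  {"word": "start?", "start": 0.62, "end": 1.00, "speaker": "SPEAKER_1"},
  {"word": "In",     "start": 1.40, "end": 1.52, "speaker": "SPEAKER_2"},
  {"word": "Lagos.", "start": 1.52, "end": 2.10, "speaker": "SPEAKER_2"}
]}
"""


def speaker_turns(words: list[dict]) -> list[dict]:
    """Merge consecutive words from the same speaker into one turn."""
    turns: list[dict] = []
    for w in words:
        if turns and turns[-1]["speaker"] == w["speaker"]:
            turns[-1]["text"] += " " + w["word"]
            turns[-1]["end"] = w["end"]
        else:
            turns.append({"speaker": w["speaker"], "start": w["start"],
                          "end": w["end"], "text": w["word"]})
    return turns


def find_word(words: list[dict], term: str) -> list[float]:
    """Start times (seconds) where a term is spoken, for jumping to the audio."""
    term = term.lower()
    return [w["start"] for w in words if w["word"].lower().strip(".,?!") == term]


if __name__ == "__main__":
    words = json.loads(SAMPLE_EXPORT)["words"]
    for t in speaker_turns(words):
        print(f'[{t["start"]:.2f}-{t["end"]:.2f}] {t["speaker"]}: {t["text"]}')
    print("'Lagos' spoken at:", find_word(words, "Lagos"))
```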
Common mistakes to avoid
Trusting the output without review. AI transcription at 97% accuracy still means roughly three errors per 100 words. In a 5,000-word transcript, that is 150 errors. Most are minor (a word substitution or punctuation issue), but a misheard name or number can create a meaningful factual error if you quote it directly.
Skipping the language selection. Auto-detect works well but occasionally misidentifies languages when the first 30 seconds of audio is quiet or has significant ambient noise. Set the language explicitly when you know it.
Recording a phone call on speaker. Audio played through a phone or laptop speaker and re-recorded by a microphone degrades significantly: the signal has already been compressed once by the phone network, then picks up room noise and echo on its way back into a microphone. Whenever possible, record phone interviews with a call recording app instead of putting the call on speaker.
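To put numbers on the first mistake above, here is the same error arithmetic applied to a 5,000-word transcript at points taken from the accuracy ranges quoted earlier in this guide. It treats accuracy as 1 minus word error rate, which is an assumption about how those figures are measured:

```python
def expected_errors(word_count: int, accuracy: float) -> int:
    """Rough number of word errors to expect, treating accuracy as 1 - WER."""
    return round(word_count * (1 - accuracy))


# Points taken from the accuracy ranges quoted in this guide.
scenarios = {
    "Dedicated microphone (97%)": 0.97,
    "Recorded video call (90%)": 0.90,
    "Speakerphone in an echoey room (85%)": 0.85,
}

for label, accuracy in scenarios.items():
    print(f"{label}: ~{expected_errors(5000, accuracy)} errors in 5,000 words")
```

The counts (about 150, 500, and 750 errors) show why the review pass matters even more for call and speakerphone recordings.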
The transcript itself is only as good as the recording. Get the recording right, and the rest of the workflow is quick: under ten minutes from upload to a raw text file, plus a short review pass.
Frequently Asked Questions
How long does it take to transcribe a 60-minute interview?
AI transcription tools process a 60-minute interview in roughly 3–5 minutes. After that, budget 15–20 minutes for a review pass to fix proper nouns and technical terms. The total time is under 30 minutes — compared to 4–6 hours of manual typing.
Is TranscribTxt free for interview transcription?
Yes. The free plan includes 60 minutes of transcription per month with no account required. That covers one standard 60-minute interview at no cost. Export is plain TXT on the free plan.
Can I transcribe an interview recorded on my phone?
Yes. Record using your phone's built-in Voice Memos app (iPhone) or Voice Recorder (Android), then transfer the M4A or MP3 file to your computer and upload it to TranscribTxt. Most phone recordings transcribe at 93–97% accuracy in a quiet room.
Does TranscribTxt support speaker labels (who said what)?
Speaker diarization — the feature that labels which speaker said each line — is available on the Business plan ($29/month). The free and Pro plans produce a continuous transcript without speaker attribution.
What file formats can I upload for interview transcription?
TranscribTxt accepts MP4, MOV, AVI, MKV, WEBM, MP3, WAV, and M4A. Both audio-only and video files work. The maximum file size is 2 GB, which covers 20+ hours of audio at standard quality; video files reach the limit much sooner.