Use case | 9 min read | 2026-06-07

How to transcribe interviews for journalism (accurate quotes, fast turnaround)

A practical guide for journalists and reporters on transcribing recorded interviews. Covers accuracy for direct quotes, source confidentiality, phone audio, and getting usable text before deadline.

Journalists spend roughly four hours transcribing for every one hour of recorded interview. For a reporter on deadline, that ratio is not sustainable.

AI transcription cuts that to roughly 17 to 23 minutes for a 30-minute interview: 2 to 3 minutes of processing plus 15 to 20 minutes to review and correct quotes. That's a real time saving, not a rounding error.

This guide covers how to use AI transcription in a journalism workflow, what to watch for with accuracy, and how to handle source confidentiality.

The journalism transcription problem

Manual transcription from a recorded interview runs at roughly a 4:1 ratio. Thirty minutes of audio takes two hours to transcribe accurately, including pausing, rewinding, and verifying quotes. For longer interviews, the math gets worse.
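The arithmetic behind these figures can be sketched in a few lines of Python. The function names are illustrative, and the rates are the ones quoted in this guide (a 4:1 manual ratio, and 2 to 3 minutes of processing plus 15 to 20 minutes of review per 30 minutes of audio). Scaling them linearly to other interview lengths is an assumption, not a measured result.

```python
from typing import Tuple

# Rough time estimates using the rates quoted in this guide.
# Linear scaling to other interview lengths is an assumption.

MANUAL_RATIO = 4.0  # minutes of work per minute of audio (4:1)

# Per 30 minutes of audio: (low, high) in minutes
AI_PROCESSING_PER_30 = (2.0, 3.0)
AI_REVIEW_PER_30 = (15.0, 20.0)


def manual_minutes(audio_minutes: float) -> float:
    """Estimated manual transcription time in minutes."""
    return audio_minutes * MANUAL_RATIO


def ai_minutes(audio_minutes: float) -> Tuple[float, float]:
    """Estimated (low, high) minutes for AI processing plus human review."""
    scale = audio_minutes / 30.0
    low = (AI_PROCESSING_PER_30[0] + AI_REVIEW_PER_30[0]) * scale
    high = (AI_PROCESSING_PER_30[1] + AI_REVIEW_PER_30[1]) * scale
    return low, high


if __name__ == "__main__":
    for length in (30, 60):
        low, high = ai_minutes(length)
        print(f"{length} min audio: manual ~{manual_minutes(length):.0f} min, "
              f"AI + review ~{low:.0f} to {high:.0f} min")
```

For a 30-minute recording this gives 120 minutes of manual work against 17 to 23 minutes with AI plus review. Real review time depends on how many quotes you intend to use, so treat the output as a planning estimate.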

The accuracy requirement for journalism is also higher than most other use cases. A garbled word in a meeting transcript is an inconvenience. A garbled word in a direct quote that goes to print is a correction or a legal problem.

AI transcription doesn't eliminate review. It eliminates the first 80% of the work so you can focus on verifying the quotes that matter.

What AI transcription gets right and what it gets wrong

Gets right:

Continuous speech from clear speakers
Common proper nouns (major cities, well-known names, major institutions)
Conversational language and standard phrasing
Timestamps for every segment, useful for locating specific quotes in the recording

Frequently gets wrong:

Unusual names (sources' last names, local officials, niche organizations)
Technical vocabulary specific to the story's beat
Heavily accented speech
Crosstalk, interruptions, or overlapping voices
Quiet passages, phone audio, recordings made in noisy environments

For journalism, the practical workflow is: run the AI transcript, then manually verify every direct quote you plan to use. The AI saves time on everything around the quotes; the quotes themselves still need a human ear.

How to transcribe a recorded interview

Step 1: Export your recording. Voice memo apps save as M4A. Field recorders typically produce WAV or MP3. Zoom and Teams produce MP4. All of these work.

Step 2: Upload to TranscribTxt. Go to transcribtxt.com, upload the file, select the language. For English interviews, auto-detect works reliably. For interviews conducted in another language, select it explicitly.

Step 3: Wait for processing. A 30-minute interview takes 2 to 3 minutes. A 90-minute interview takes roughly 7 to 10 minutes.

Step 4: Download the transcript. The free plan exports plain text. The Pro plan ($12/month) adds SRT format with timestamps, which lets you jump directly to a specific moment in the recording when you need to verify a quote.

Step 5: Review quotes before use. Go through each direct quote you plan to use. Play that segment in the recording. Confirm the words match exactly. Correct any AI errors.

The audio file is deleted from TranscribTxt's servers as soon as the transcript is ready.
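If you export SRT, the quote check in Step 5 can be partly scripted: search the transcript for the quote and read off the timestamp to jump to in the recording. The sketch below assumes the export follows the standard SRT layout (index line, timing line, text lines, blank line). The function names and the sample text are invented for illustration.

```python
import re
from typing import List, NamedTuple, Optional


class Cue(NamedTuple):
    start: str  # "HH:MM:SS,mmm" exactly as written in the SRT file
    end: str
    text: str


TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(srt_text: str) -> List[Cue]:
    """Parse SRT text into cues. Blocks without a timing line are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                text = " ".join(part.strip() for part in lines[i + 1:])
                cues.append(Cue(match.group(1), match.group(2), text))
                break
    return cues


def _normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace before comparing."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def find_quote(cues: List[Cue], quote: str) -> Optional[Cue]:
    """Return the first cue whose text contains the quote, or None."""
    needle = _normalize(quote)
    if not needle:
        return None
    for cue in cues:
        if needle in _normalize(cue.text):
            return cue
    return None


# Invented sample data, not output from any real interview.
SAMPLE = """1
00:00:01,000 --> 00:00:04,500
Thanks for making the time today.

2
00:12:41,200 --> 00:12:47,900
We knew about the budget gap in March,
but nobody wanted to say it out loud.
"""

if __name__ == "__main__":
    hit = find_quote(parse_srt(SAMPLE), "nobody wanted to say it out loud")
    if hit:
        print(f"Check the recording at {hit.start}: {hit.text}")
```

This only finds quotes that sit inside a single subtitle cue. A quote that spans two cues will not match, and a match tells you where to listen, not that the words are right. You still need to play the segment.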

Source confidentiality

Interviews with sources who spoke on background, off the record, or under confidentiality agreements introduce an additional consideration: the recording contains information you've agreed to protect.

Before uploading any sensitive recording to a cloud transcription service, check what happens to the file after processing.

TranscribTxt deletes audio files immediately after the transcript is generated. The transcript text stays in your account; the audio does not persist.

OpenAI Whisper running locally is the maximum-security option. The recording never leaves your machine. Setup requires Python and roughly 10 minutes. It's slower than cloud services without a GPU but produces accurate results, and the only external transfer is the one-time download of the model itself. No audio is sent anywhere.
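A minimal sketch of a local run, assuming the open-source openai-whisper package (installed with pip install -U openai-whisper) and ffmpeg are available on your machine. The "base" model is an arbitrary starting choice, larger models are slower but generally more accurate, and the helper names here are illustrative.

```python
import sys


def format_timestamp(seconds: float) -> str:
    """Convert seconds to HH:MM:SS for easy lookup in an audio player."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments) -> str:
    """Render Whisper-style segments (dicts with 'start' and 'text') as lines."""
    return "\n".join(
        f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}"
        for seg in segments
    )


def transcribe_locally(audio_path: str, model_name: str = "base", language=None) -> str:
    """Transcribe a file on this machine. language=None lets Whisper auto-detect."""
    import whisper  # imported here so the helpers above work without the package

    model = whisper.load_model(model_name)  # downloads the model on first use
    result = model.transcribe(audio_path, language=language)
    return format_segments(result["segments"])


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python transcribe_local.py interview.m4a  (script name is hypothetical)
    print(transcribe_locally(sys.argv[1]))
```

The first call to load_model fetches the model weights, so run it once on a non-sensitive file while you are online. After that, transcription itself runs on your own hardware.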

Avoid services whose terms of service include language about retaining audio for "service improvement" or "model training." That means your source's voice and words are stored on an external server.

Working with phone interviews

Phone audio introduces specific accuracy challenges. Recordings made through a carrier line have compressed audio, which increases transcription errors. Recordings made through speakerphone or by holding a phone near a recorder are worse still.

On typical phone audio, expect accuracy in the 88 to 94% range rather than the 95 to 98% you'd get from a direct recording. The number of corrections goes up, but the time saved over manual transcription still holds.
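To see what those percentages mean in practice, it helps to convert them into a rough count of wrong words. The sketch below assumes the accuracy figures are word-level and that speech runs at about 150 words per minute. Both are assumptions made for illustration, not figures from any vendor.

```python
WORDS_PER_MINUTE = 150  # assumed conversational speaking rate


def expected_errors(audio_minutes, accuracy_low, accuracy_high, wpm=WORDS_PER_MINUTE):
    """Return (fewest, most) expected wrong words for an accuracy range in percent."""
    words = audio_minutes * wpm
    fewest = round(words * (100 - accuracy_high) / 100)
    most = round(words * (100 - accuracy_low) / 100)
    return fewest, most


if __name__ == "__main__":
    print("Direct recording:", expected_errors(30, 95, 98))
    print("Phone audio:     ", expected_errors(30, 88, 94))
```

Under these assumptions, a 30-minute interview contains about 4,500 words, so a direct recording leaves roughly 90 to 225 words to fix and phone audio roughly 270 to 540. Most of those fall outside the quotes you will actually use, which is why review time stays manageable.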

If you regularly interview sources by phone, a direct recording method makes a material difference. Many smartphones support call recording through built-in features or third-party apps that capture both sides of the call cleanly, though availability varies by platform and region.

Multilingual interviews

For interviews conducted in languages other than English, most AI transcription tools struggle. Otter.ai is primarily English. Descript is English-first.

TranscribTxt supports 99 languages. If you're working with a source in Spanish, French, Arabic, Mandarin, or any other major language, upload the file with the language selected explicitly rather than relying on auto-detect.

For interviews where a source switches between two languages mid-sentence, results are mixed. Code-switching is a known challenge for current speech models.

Speaker labels for two-person interviews

Standard interview transcripts run as a continuous block of text, which requires manual work to attribute speech to interviewer and source.

Speaker diarization on the Pro plan ($12/month) and Business plan ($29/month) labels each block of speech as Speaker 1 or Speaker 2, which you can rename in the downloaded file. For an hour-long interview, this saves meaningful editing time. (For a fuller explanation of how this works, see what speaker diarization is.)

Diarization accuracy is highest with two clearly distinct voices recorded on clean audio. If the interviewer and source have similar voices or the recording has significant background noise, some blocks will be misattributed and need manual correction.
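Renaming the generic labels is a simple find-and-replace, but it is worth limiting it to the start of each line so that the words "Speaker 1" inside someone's speech are left alone. This sketch assumes each block in the downloaded file begins with a label followed by a colon, such as "Speaker 1:". The actual export layout may differ, so adjust the pattern to match your file.

```python
import re
from typing import Dict

# Matches a generic label at the start of a line, e.g. "Speaker 2:"
LABEL = re.compile(r"^(Speaker \d+)(\s*:)", flags=re.MULTILINE)


def rename_speakers(transcript: str, names: Dict[str, str]) -> str:
    """Replace line-initial generic labels with real names; unknown labels stay."""
    def swap(match):
        label = match.group(1)
        return names.get(label, label) + match.group(2)

    return LABEL.sub(swap, transcript)


# Invented sample text for illustration.
SAMPLE = (
    "Speaker 1: When did you first see the report?\n"
    "Speaker 2: In March, about a week before the vote.\n"
)

if __name__ == "__main__":
    print(rename_speakers(SAMPLE, {"Speaker 1": "Reporter", "Speaker 2": "Source"}))
```

Renaming does not fix misattributed blocks. If the tool assigned a passage to the wrong speaker, that still needs a manual correction against the recording.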

Practical comparison for journalists

	TranscribTxt	Manual transcription	Human service (Rev)
Time for 30-min interview	Under 25 min total	2 hours	24-48 hours
Cost	Free (5 files/mo) or $12/mo	Your time	$45 for 30 min ($1.50/min)
Accuracy	88-98% (depends on audio quality)	100%	99%+
Quote verification needed	Yes	No	Recommended
Source audio retained	No (deleted after)	N/A	Depends on service
Languages	99	Any	30+ (AI), limited (human)

For most daily journalism work, AI transcription at the free or Pro tier covers the workflow. Human transcription from a service like Rev makes sense for high-stakes material where every word needs to be right the first time, though verifying key quotes against the recording is still recommended.

Getting started

The free plan at transcribtxt.com covers 5 files per month with no card required. For reporters who transcribe multiple interviews a week, Pro at $12/month gives 1,200 minutes, roughly 20 hours of recorded interviews, with SRT timestamps and speaker labels included.
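The price gap between the options is easier to judge per minute of audio. This sketch uses only the figures quoted in this guide ($12/month for 1,200 minutes on Pro, about $1.50/minute for human transcription). The function names are illustrative, and the per-minute Pro figure assumes you use the full monthly allowance.

```python
PRO_MONTHLY_USD = 12.00
PRO_MINUTES = 1200
HUMAN_USD_PER_MINUTE = 1.50


def pro_cost_per_minute() -> float:
    """Effective Pro cost per audio minute if the whole allowance is used."""
    return PRO_MONTHLY_USD / PRO_MINUTES


def human_cost(audio_minutes: float) -> float:
    """Cost of human transcription for a recording of the given length."""
    return audio_minutes * HUMAN_USD_PER_MINUTE


def break_even_minutes() -> float:
    """Minutes of human transcription that cost the same as one month of Pro."""
    return PRO_MONTHLY_USD / HUMAN_USD_PER_MINUTE


if __name__ == "__main__":
    print(f"Pro allowance: {PRO_MINUTES / 60:.0f} hours per month")
    print(f"Pro, fully used: ${pro_cost_per_minute():.2f} per minute")
    print(f"Human, 30-minute interview: ${human_cost(30):.2f}")
    print(f"Break-even: {break_even_minutes():.0f} minutes of human transcription")
```

On these numbers, one month of Pro costs the same as 8 minutes of human transcription. The comparison covers price only and leaves out the review time that AI output still needs.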

Related reading: how to transcribe an interview (step by step), the best transcription software for researchers, and how to transcribe a focus group for multi-person recordings.

Frequently Asked Questions

How do journalists transcribe interviews?

Most journalists now record the interview, then run the audio through an AI transcription tool and review the output for quotes. A 30-minute interview transcribes in 2 to 3 minutes and needs 15 to 20 minutes of review. Before AI, reporters either transcribed manually at a 4:1 time ratio or paid a human service with a 1 to 2 day turnaround.

What is the best interview transcription software for journalists?

For accuracy-to-price, TranscribTxt (ElevenLabs Scribe) handles uploaded interview files in 99 languages with speaker labels on Pro and Business, deletes audio after transcription, and starts free. For human-level accuracy on hard audio, use Rev's human transcription at about $1.50/minute. For maximum source confidentiality, run OpenAI Whisper locally so the recording never leaves your machine.

What is the fastest way to transcribe a recorded interview?

Upload the audio or video file to an AI transcription tool. A 30-minute interview takes roughly 2 to 3 minutes to process. Reviewing and correcting the output for a 30-minute interview takes another 15 to 20 minutes. Total time: under 25 minutes from recording to clean transcript, compared to 2 hours of manual transcription.

How accurate is AI transcription for journalistic interviews?

On clean audio with a native English speaker, accuracy is typically 95 to 98%. The most common errors are proper nouns, unusual names, and technical terms specific to the story. These need manual review before using any quote in print. On phone audio or recordings with background noise, accuracy drops to 88 to 94%.

Is AI transcription safe to use with confidential sources?

It depends on the tool. Some services retain uploaded audio for model training. TranscribTxt deletes audio files from its servers immediately after the transcript is generated. If source confidentiality is a concern, check a tool's data retention policy before uploading anything sensitive. For maximum security, run OpenAI Whisper locally, which never sends your audio to any server.

Can I transcribe a phone interview recording?

Yes, but phone audio produces more errors than studio-quality recordings. Typical accuracy on phone audio is 88 to 94%. The AI handles it, but expect more corrections, particularly for names and quiet passages. A recording made through the phone speaker rather than a direct line produces worse results.

Does TranscribTxt support speaker labels for two-person interviews?

Speaker diarization (labeling who said what) is available on the Pro plan ($12/month) and Business plan ($29/month). It identifies speakers as Speaker 1 or Speaker 2, which you can rename in the downloaded file. For one-on-one interviews, this significantly speeds up the editing process. The feature works best with two clearly distinct voices recorded cleanly.
