Monthly Archives: November 2021

Automatic transcription – some real-world case studies 2: automatic closed-captioning

There has been a lot of talk lately about automatic transcription, or AI (artificial intelligence) transcription. This includes speech-to-text software and means that a transcription of your voice or recording is made automatically rather than by a human. I’ve recently worked with clients who use voice-to-text software and received an automatic closed-captioning file from a video meeting platform, so I’m taking the opportunity to share my experiences with both. Last time, I looked at the features to watch out for with speech-to-text software; this time, I’m talking about automatic closed-captioning.

Using closed-captioning to create live subtitles and texts

A client for whom I’m transcribing focus groups (that is, discussion groups of several people with one facilitator) had one group that included a participant living with a hearing impairment. The client turned on the closed-captioning feature in the video meeting platform they were using, so that the participant could read what the other participants were saying. As the feature recorded everyone’s speech in real time and then generated a text afterwards, my client sent the file to me to see if I’d find it useful.

As I’ve been thinking about offering an automatic transcription editing service alongside my full transcription service, I was really interested to see how this worked.

What does real-time closed-captioning or automatic transcription look like?

In my opinion, automatic real-time closed-captioning is not there yet in terms of generating a good, usable transcription. Here are the failings I noticed in the transcript (you’ll notice some of these if you turn on the subtitles on the news, etc. – these are very rarely produced by humans these days).

  • Time stamps were added every few seconds, which is great for some clients, but my focus group transcription clients usually only want them every ten minutes (one way to thin these out automatically is sketched after this list).
  • There was no differentiation of speakers, although new utterances were usually started on a new line (this could be a new utterance by a new speaker or a new utterance by the same speaker).
  • If two people spoke at once the speech was jumbled.
  • Even captioning of the slowest, clearest and most “accentless” (Received Pronunciation) speaker was full of errors including homophones, missed words and repeated words.
  • If someone had an accent (regional or English as an additional language), it pretty well failed to cope at all.
  • If someone spoke quickly, it pretty well failed to cope at all.
  • Ums and ers were not recorded, which is understandable in terms of a participant needing to know what the others were saying, but is not useful when your client has requested a full verbatim transcription (see my article on the types of transcription here).
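
As a technical aside: because a caption export is just time-stamped text, some of this tidying can be scripted before the human editing starts. Below is a minimal Python sketch (not any platform’s own tooling) that reduces a WebVTT caption file to plain text with a time stamp marker every ten minutes instead of every few seconds. The file name is hypothetical, and it assumes hh:mm:ss.mmm cue timings with numeric (or no) cue identifiers.

    import re
    from datetime import timedelta

    # Matches the start time of a WebVTT cue timing line such as
    # "00:03:17.240 --> 00:03:19.980" (assumes the hours field is present).
    CUE_TIMING = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\.\d{3} --> ")
    MARKER_INTERVAL = timedelta(minutes=10)

    def strip_captions(vtt_path):
        """Turn a WebVTT caption export into plain text, keeping a
        time stamp marker only once every ten minutes or so."""
        out = []
        next_marker = timedelta(0)
        with open(vtt_path, encoding="utf-8") as f:
            for raw in f:
                line = raw.strip()
                match = CUE_TIMING.match(line)
                if match:
                    start = timedelta(hours=int(match.group(1)),
                                      minutes=int(match.group(2)),
                                      seconds=int(match.group(3)))
                    if start >= next_marker:
                        # Marker shows the actual cue start, not a round mark.
                        out.append(f"[{start}]")
                        next_marker += MARKER_INTERVAL
                elif line and line != "WEBVTT" and not line.isdigit():
                    out.append(line)  # caption text; header and cue numbers dropped
        return "\n".join(out)

    print(strip_captions("focus_group.vtt"))  # hypothetical export file

Speaker differentiation and untangling overlapping speech, of course, still need a human ear.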

In summary, the transcription produced for this session by the closed-captioning software would not have been of any use to the researcher without extensive editing.

I have also had a look at the automatic transcription on various video-playing platforms such as YouTube, and the same issues appear there, too.

Is it quicker to edit an automatically generated transcription than to transcribe it from scratch?

With this particular client, while the participants varied over the groups, I had transcribed a fair few groups and had an idea of how many audio minutes I was transcribing per hour. It’s also worth noting that I’m experienced in editing other people’s transcriptions, as I used to be the go-to transcriber for tricky sessions at a big worldwide conference.

Bearing those points in mind, taking the closed-caption transcript and editing it to the same standard as one I had done from scratch took exactly the same time as transcribing it from scratch would have taken! There was less actual straight typing, but more mouse work and clicking, so I don’t think it did much to reduce my risk of RSI, either.

I will keep looking at this issue over the next few years, as automatic closed-captioning and the transcripts it produces are bound to improve with improved technology and voice recognition.


In this article, I have discussed the use of automatic closed-captioning and whether it can be used to generate transcripts that replace or can be used as a basis for human transcription.

If you have experience of using automatic closed-captioning, particularly in languages other than English, please comment with anything else you’ve noticed that it would be useful for people to know.

Other relevant articles on this website

Automatic transcription – some real-world case studies 1: voice-to-text software

Why you need to be human to produce a good transcription

Using a transcription app rather than a human transcriber – pros and cons

What are the types of transcription?

What information does my transcriber need?

How to be a good transcription customer

How long does transcription take?

Recording and sending audio files for researchers and journalists

How to get into transcribing as a job

The technology transcribers use

 

Posted on November 22, 2021 in Skillset, Transcription

 


Automatic transcription – some real-world case studies 1: voice-to-text software

There’s a lot of talk about automatic transcription, or AI (artificial intelligence) transcription, also known as voice-to-text software. This means that a transcription of your voice or recording is made automatically rather than with human input. What’s the state of the art at the moment? I’ve recently worked with a client who uses voice-to-text software to generate text that I edit, and I’ve received an automatic closed-captioning file from a video meeting platform, so it seemed a good opportunity to share my experiences with both. This article looks at speech-to-text software and the next one will examine closed-captioning.

Using voice-to-text software to create text documents

A couple of my regular editing clients use voice-to-text software to create documents which they send to me for me to edit. I have also worked with a number of students who, because they live with a visual impairment or a physical issue (for example, RSI that makes typing painful and difficult), have used this method to generate sometimes very long documents.

What common features of voice-to-text-generated documents can an editor look out for?

Here’s what I’ve noticed about the documents clients produce with voice-to-text software:

  • The outcome is a lot more accurate when using more sophisticated voice-to-text software that can “learn” the speaker’s voice, rather than out-of-the-box, one-size-fits-all software.
  • The outcome is also a lot more accurate with “standard” (Received Pronunciation) English that is spoken slowly and clearly (English in this case; I’m guessing it’s the same with other languages but would love to know for sure). The software can struggle with accents and fast speakers.
  • The most common issues with voice-to-text software are:
    • Homophones – the software doesn’t know which spelling the speaker wants to use out of two alternatives that sound the same – bear/bare, which/witch, etc. This is really common and can lead to some very odd sentences and potentially embarrassing mistakes. Note that these can’t be spotted by having the software read the text back to the speaker, as the words sound the same (a simple flagging script is sketched after this list).
    • Added words – the software registers two separate words when there’s only one: “repeated distractions” becomes “repeat and distractions”.
    • Missed words and parts of words – if the speaker speaks quickly and skips over short words or swallows the middle of words, they might not register in the software: “paddle boards” becomes “pad boards”; “fruit and nut” becomes “fruited nut” or “fruit nut”.
    • Missed punctuation – this usually has to be spoken by the speaker in a set formula (saying “comma” or “new paragraph”, for example). If they don’t do that, the punctuation won’t be there.
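
As a technical aside: because a spellchecker passes over correctly spelled homophones, one low-tech safety net is a script that flags every word with a known same-sounding twin for the editor to check by eye. Here is a minimal Python sketch of the idea; the word list is a tiny illustrative sample, not a complete inventory.

    # Each entry maps a word to a same-sounding alternative the editor
    # should consider; a real list would be much longer.
    HOMOPHONES = {
        "bear": "bare", "bare": "bear",
        "which": "witch", "witch": "which",
        "their": "there", "there": "their",
    }

    def flag_homophones(text):
        """Return (position, word, alternative) for each flagged word."""
        hits = []
        for i, raw in enumerate(text.split()):
            word = raw.strip(".,;:!?'\"").lower()  # drop surrounding punctuation
            if word in HOMOPHONES:
                hits.append((i, word, HOMOPHONES[word]))
        return hits

    for pos, word, alt in flag_homophones("The bear facts about the witch hunt."):
        print(f"word {pos}: '{word}' - could this be '{alt}'?")

A flagger like this can’t decide which spelling is right – that judgement stays with the editor – but it makes sure no homophone slips past unexamined.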

These issues are quite different from the usual ones met when editing people’s texts, whether English is their first or an additional language. Just as a writer’s first language will bleed through into their other languages (as an L1 English speaker, I’m likely to impose English word order on my French and Spanish sentences, for example), dictated English has its own little oddities and patterns that you need to look out for.

How can the speaker and editor combat issues with speech-to-text documents?

There are a few things the speaker and then the editor can do to mitigate these issues.

  • The speaker could speak slowly and clearly, enunciating all the words and their endings and putting the punctuation in as required.
  • If there is an option to “teach” the software the speaker’s voice, I recommend doing that for optimum results.
  • Always have someone check a speech-to-text-generated text.
  • The speaker/client could let the editor know that they’ve used such software, so the editor can be on high alert for the features listed above (remember that a spellchecker won’t necessarily notice correctly spelled homophones).
  • The editor could watch for oddly worded sentences as well as the grammar / spelling / punctuation issues they usually look out for.

In this article, I have discussed voice-to-text software that is sometimes used to generate documents, what the client/speaker can do to make sure the text they generate is as accurate as possible and what the editor of such documents can look out for.

If you have experience of using speech-to-text software, particularly in languages other than English, please comment with anything else you’ve noticed that it would be useful for people to know.

A friend talks about this issue with regard to an interview she conducted – read about her experience in this guest post.

Next time, I’ll talk about my experience of automatic closed-captioning on a video meeting platform.

Other relevant articles on this website

Automatic transcription – some real-world case studies 2: automatic closed-captioning (coming soon!)

Why you need to be human to produce a good transcription

Using a transcription app rather than a human transcriber – pros and cons

What are the types of transcription?

What information does my transcriber need?

How to be a good transcription customer

How long does transcription take?

Recording and sending audio files for researchers and journalists

How to get into transcribing as a job

The technology transcribers use

 

Posted on November 8, 2021 in Skillset, Transcription

 
