Case

Automatic live subtitles with Web Captioner

In our quest to develop cost-efficient methods that increase the accessibility of live video productions, we wanted to experiment with autogenerated live subtitles. We found Web Captioner, an innovative web application that listens to spoken words and automatically displays them as text in a web browser in real time. Luckily, the Swedish Standards Institute had earlier asked us for a quotation to live stream a breakfast seminar concerning a new European standard for Design for All. They accepted our suggestion to autocaption their seminar. It turned out to be a valuable opportunity to demonstrate new technology, as well as a basis for creative discussions about potential uses of, and obstacles to, automated web subtitling.

The video here is a clip from the actual seminar (in Swedish). It illustrates what the live stream looked like when Richard Gatarski gave a short presentation about Web Captioner.

Before we explain what we did and how, let us illustrate how important needs can be met from the bottom up or, if you like, “by a grassroots movement”. This is thanks to new technology, voluntary efforts, innovative approaches, and global cooperation.

Web Captioner, and how we found it

Web Captioner is developed by Curt Grimes, an application developer at Northern Illinois University. In May 2016 a small church, where he works as a volunteer, welcomed a new member who was deaf. The church started to search for a way to make its services accessible to the newcomer. As a small community church it has limited resources and must base its efforts on voluntary work. Hiring professional stenographers was out of the question. Instead they first tried Dragon NaturallySpeaking, a popular speech recognition package, but that did not work out very well. A richer description of the church’s efforts can be found in Curt’s blog post Why I started Web Captioner. What follows here is just a brief summary from our perspective.

In early 2017 Curt learned about W3C’s Web Speech API Specification, which was completed in 2012. Since he works as a professional web developer, Curt thought of giving Web Speech a try. He began by exploring Google’s Web Speech API Demonstration. His idea was to build a web page that listens to spoken words and types them out as text on that page in real time.

In March 2017 Curt had a prototype running and posted about it in the Facebook group Visual Church Media (26,000+ members). The response from churches was very positive, but rather moderate in scope. In June Curt gave his application its official name (Web Captioner) and “slapped some branding on it”. On June 24 he started the Web Captioner Users Group on Facebook.

Now Web Captioner took off, very much helped by people like John Taylor who on June 26 wrote “I’m pretty pumped about it! Webcaptioner.com live captioning playback with incredible accuracy! Totally trying this live this Sunday!”.

The same day Dave Edwards demonstrated Web Captioner in the live show The World of Live Streaming (WOLS). That inspired George Price, a member of the WOLS show team, to start the discussion Automatic Closed Captions in the vMix user’s forum. And that’s how we learned about Web Captioner.

Even though we were extremely busy with other productions at that time, we started to participate in the discussions. July 3 Curt blogged “Web Captioner’s First Week: 50 Countries and Over 92,000 Words”.

Feedback from users all over the world quickly resulted in many new features, including the option to save text recordings (July 8) and support for 40 languages (July 20). Other requests were made, including automatic translation between languages, the ability to publish the text on public web pages, and caption formatting (e.g. in SubRip format).
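For readers who have not seen SubRip (.srt) before, here is a minimal sketch in Python of what such caption formatting involves: numbered cues, each with a start and end time written as HH:MM:SS,mmm, followed by the text and a blank line. The Cue class and the function names are our own illustrative assumptions. This is not how Web Captioner does it, and Web Captioner's saved text did not necessarily carry timestamps.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One caption cue. Hypothetical structure, for illustration only."""
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as a SubRip timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues: list[Cue]) -> str:
    """Render cues as SubRip text: index, time range, caption, blank line."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        time_range = f"{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}"
        blocks.append(f"{index}\n{time_range}\n{cue.text}\n")
    # Each block ends with a newline, so joining with "\n" leaves
    # one blank line between consecutive cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([Cue(0.0, 2.5, "Welcome"), Cue(2.5, 5.0, "to the seminar")]))
```

The hard part in a live setting is not the formatting but knowing when each phrase was spoken, which is why timing has to be captured along with the recognized text.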

We did a few tests, using live recordings from some of our earlier productions, and reported our findings. On July 28, in connection with a major political news story in Sweden, we published a short video discussion that featured Web Captioner. It was about IT, live news captioning, politics, and source criticism (in Swedish: IT, Textning, Politik och Källkritik). The idea behind the video was driven by the notion that Web Captioner did a (sort of) better job than the broadcasting channel’s human stenographers.

On August 15, after our vacation (which is long in Sweden ;), we asked Curt Grimes about the possibility of feeding Web Captioner’s output directly into our production software (vMix). That is, supplementing the text page with API interaction with a live video mixing application. Curt found that very interesting and started to work towards a solution. A few days later he invited interested users to try a closed beta version with vMix support. As of today (October 3) it works, but a few issues need to be sorted out. Remember, Curt is doing all this in his spare time. We all appreciate each and every step forward that he takes to help this growing community of users and fans.

WeStreamU’s previous accessibility experience

The story of our accessibility development efforts is rather long, so here we’ll just mention a few key points related to the production at hand.

It started back in 2009 with a request to supplement a traditional live stream from a seminar with live captions and sign interpretation. The request came from Hans von Axelson, who at that time worked for Handisam, now The Swedish Agency for Participation.

Making that production turned out to be more difficult than we initially thought. Even so, we managed to solve it by simultaneously sending three live streams. One with a regular production, one with the sign interpreter, and one with live text created by stenographers. Afterwards we mixed these three streams into one video, see the YouTube playlist from that seminar.

Even though we as video producers considered the result to be inferior, a lot of people and organizations thought it was a smashing hit. One person who attended the seminar in 2009 was Mia Ahlgren, at that time with The Swedish Disability Rights Federation. Mia’s and Hans von Axelson’s enthusiasm and (demanding ;) needs resulted in development efforts and a few more accessible live stream productions.

By the summer of 2010 all that culminated in much better productions at the Swedish “political week in Almedalen”. This particular project, which also included remotely generated sign interpretation and captioning, was partly funded by The Swedish Post and Telecom Agency (PTS) and is further documented in English.

The documentation of our efforts towards more accessible live stream productions is so far mainly in Swedish; see our Services section and relevant blog posts. Some cases, written in English, focus on live captioning. These include: Swedish Prime Minister Stefan Löfven with live captions, About live captioning at Intersteno 2015, and Public Service and accessible live streaming.

What the SIS seminar was about

WeStreamU produces live streamed video from many different kinds of events. We are not focused on any particular customers, event types or subjects. Consequently we typically do not understand, or engage ourselves in, the actual content.

This seminar was something of an exception, since its subject touched on things we do. It was organized and managed by others. The Swedish Standards Institute (SIS) manages technical committees (TK) in various areas. One of these, SIS/TK 536, is the committee that develops standards for accessibility. The 15 committee members include “Funktionsrätt Sverige” (in English The Swedish Disability Rights Federation), which is the united voice of the Swedish disability movement, and “Myndigheten för Delaktighet” (MFD, in English The Swedish Agency for Participation).

Within the EU, the draft “EN 17161 Design for All – Accessibility following a Design for All approach in products, goods and services – Extending the range of users” is out for referral until November 9. That draft, together with EU mandate 473, was the focus of the September 22 breakfast seminar organized by SIS. (Again, this subject is not our field, so please excuse any errors in the description.)

MFD, via Hans von Axelson, suggested to SIS that the seminar should be live streamed. Upon request we submitted a quotation to SIS, and suggested it would be a wonderful opportunity to test Web Captioner under real conditions. That is, an event with multiple speakers, a complicated subject, and a typical audio setup (different microphones and public audio).
WeStreamU’s quotation and suggestion were, after some refinements, accepted by SIS.

The technical setup for our experiment

Our assignment was to secure the AV setup, live stream, record, and project Web Captioner’s text on a stage monitor. WeStreamU’s founder Richard Gatarski was asked to give a five-minute presentation about Web Captioner. He was also Technical Director for the production, while the producer Karl Danielsson took care of the cameras as well as audio and video mixing.

The description that follows is aimed at readers who have at least a basic understanding of live video production technology. The purpose is to give a general idea to those who might want to do a similar production, not to explain the setup in every detail.

We decided to use two laptops running the live video streaming and production software vMix. Both laptops were connected to the venue’s local network, which in turn provided internet access. One laptop was used to create the original production. With the other laptop we added subtitles to the original production and provided the live text for stage projection.

Furthermore, each laptop streamed live and recorded its own output. The Original production was a standard setup with two cameras, projector signal integration, and audio management. Its output (video and audio) was shared via NDI over the local network. On the Text production laptop, vMix picked up the NDI signal without any noticeable latency.

Web Captioner is a standalone web application that needs to run in Google Chrome. As described above, the API integration with other applications, in our case vMix, was unfortunately not yet ready for use. Instead we used another method to interact with Web Captioner.

Chrome was set to full screen in a window positioned on a second monitor attached via HDMI. Chrome’s audio source was set to a virtual audio cable originating from vMix’s audio bus A. The content of that window was “screen captured” into vMix.

The full capture with real-time text was sent to the stage projector via HDMI. The bottom three lines of the text capture area were combined with the Original production’s video, thus creating the final Text production output. We also had a third laptop connected to the internet in order to monitor the live streams on the web.
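For those who want to try something similar, the arithmetic behind "the bottom three lines" is simple. The sketch below, in Python, computes the pixel rectangle to crop from a screen capture so that only the last few text lines remain. The function name, the CropRect type, and the example numbers (a 1920x1080 capture with 90-pixel text lines) are our own assumptions for illustration; in practice the crop is set by hand in the video mixer and depends on the font size chosen in the browser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CropRect:
    """A crop rectangle in pixels, measured from the top-left corner."""
    left: int
    top: int
    width: int
    height: int


def bottom_lines_crop(
    frame_width: int,
    frame_height: int,
    line_height: int,
    lines: int = 3,
    bottom_margin: int = 0,
) -> CropRect:
    """Return the rectangle covering the last `lines` text lines of a capture.

    `line_height` is the height of one rendered text line in pixels and
    `bottom_margin` is any empty space below the last line. Both values
    are assumptions the operator has to measure for their own setup.
    """
    if line_height <= 0 or lines <= 0 or bottom_margin < 0:
        raise ValueError("line_height and lines must be positive, margin non-negative")
    height = line_height * lines
    top = frame_height - bottom_margin - height
    if top < 0:
        raise ValueError("requested lines do not fit inside the frame")
    return CropRect(left=0, top=top, width=frame_width, height=height)


if __name__ == "__main__":
    # Hypothetical example: full HD capture, 90-pixel lines, three lines kept.
    print(bottom_lines_crop(1920, 1080, 90))
```

Because the newest text always appears at the bottom of the page, a fixed crop like this keeps the latest words on screen without any text processing.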

Results

So, how did the experiment turn out? Well, from an engineering perspective it went very well. Everything worked as planned without any interruptions or technical mishaps.

We did not expect the quality of the text produced to be perfect, or close to what skilled human speech-to-text reporters typically provide. Nevertheless, what came out of Web Captioner was surprisingly good. And given the text production cost (zero), one must say very good. That is more or less reflected in the testimonial video from two of the organizers.

We were also happy to find that, thanks to the seminar’s live stream, many more Swedes learned about automatic live captioning. One of these was Micke Kring, ICT educator at the Swedish elementary school Årstaskolan. He watched SIS’ livecast, and 99 minutes after it was over he was producing an autocaptioned live stream of his own. An hour later Micke’s production was shared in the Facebook group Digital Sammhällskunskap (eng. Digital society knowledge, 5,500+ members). And the sharing went on…

What do YOU think?

We had no intention of judging or assessing how useful or correct the subtitles generated by Web Captioner were. Those are questions for further research and tests. Our ambition was to experiment, share what we learned, and initiate and fuel discussions about automated subtitling.

Anyone interested is free to evaluate the text recorded by Web Captioner (by downloading the text file), and to watch the live autocaptioned recordings in SIS’ playlist on YouTube to compare what was said with what was written. Or compare with the recordings that have subtitles produced afterwards by human translators.

Page updates

2017-10-03 Original posting