specification v0.01 (early closed beta)
October 21, 2019


GetLiveCaptions is an open, simple, and cost-effective protocol for delivering real-time captions from a reporting service to a live streaming production.

A “reporting service” is here understood as a cloud service that takes text written by a captioner (a.k.a. Speech To Text Reporter, STTR) and feeds it to its server. A “streaming production” uses applications like vMix, Tricaster, etc. to publish its output live on the web.

Basically, live streaming applications use the HTTP GET request method to pull XML-formatted caption blocks from the reporting service’s server.
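As a rough illustration of this pull model, the sketch below polls a URL and passes each changed response body to a callback. It is not part of the specification: the names poll_captions and http_get, the polling interval, and the injectable fetch parameter are all assumptions made for the example.

```python
import time
from urllib.request import urlopen


def http_get(url):
    """Fetch the raw response body with a plain HTTP GET."""
    with urlopen(url, timeout=5) as response:
        return response.read()


def poll_captions(url, on_block, interval=0.2, max_polls=None, fetch=http_get):
    """Poll `url` and call `on_block` whenever the body changes.

    `interval` (seconds) and `max_polls` are illustrative knobs, not
    values defined by GetLiveCaptions. `fetch` can be swapped out for testing.
    """
    last_body = None
    polls = 0
    while max_polls is None or polls < max_polls:
        body = fetch(url)
        if body != last_body:
            # Only forward blocks that differ from the previous one.
            on_block(body)
            last_body = body
        polls += 1
        time.sleep(interval)
```

A live streaming app would pass a callback that parses the block and updates its caption overlay.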

GetLiveCaptions’ simplicity implies advantages as well as restrictions, including:

+ Firewall friendly (no special port requirements)

+ No or minimal development efforts required by the live streaming app

– No error checking (words may be lost due to a bad internet connection, although unlikely)

– Reporter cannot format the text or position the captions


Future specifications may add the option to support:

  • character-by-character, word-by-word, or line-by-line (currently word-by-word)
  • scrolling within a block or block by block (currently block-by-block)
  • recording methods and delivery

Known issues:

Name/value pairs specified in the query apply per session. We need to figure out how to end a session.
For example, if lines=2, the reporting service delivers 2 lines during the whole session.

Should we support plain text?

Requesting a caption block

A server that fully implements the GetLiveCaptions protocol will accept and support all the listed name/value pairs. Supporting fewer, or none, is possible as long as that is agreed between the reporting service and the live streaming producer.

All defaults may be specified by the reporting service, including user-dependent defaults (i.e. the reporting service may support user-specific defaults).

Name    Value
user    any string the server uses to identify a customer
event   any string the server uses to identify the specific event managed by the customer
type    specifies the response format: [rss] or [xml], suggested default [xml]
lines   number of rows in the caption block: [1], [2], [3], [4], suggested default [2]
length  maximum number of characters per line, suggested default [40]
hold    number of milliseconds before a captions block is replaced, suggested default [200]
align   [left], [right], [center], suggested default [left]
record  how and if the blocks should be recorded: [no], [transcript], [srt]
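A request carrying these name/value pairs could be assembled as in the sketch below. The base URL is hypothetical, since the specification does not define a URL layout, and the function name build_request_url is likewise an assumption.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the specification does not define a URL layout.
BASE_URL = "https://captions.example.com/getlivecaptions"

# Optional names from the table; user and event are passed separately.
OPTIONAL_NAMES = {"type", "lines", "length", "hold", "align", "record"}


def build_request_url(user, event, **options):
    """Return a GET URL with the given name/value pairs in the query string."""
    unknown = set(options) - OPTIONAL_NAMES
    if unknown:
        raise ValueError(f"unsupported names: {sorted(unknown)}")
    params = {"user": user, "event": event}
    # Names left out here fall back to the reporting service's defaults.
    params.update({name: str(value) for name, value in options.items()})
    return f"{BASE_URL}?{urlencode(params)}"


print(build_request_url("acme", "keynote", type="xml", lines=2, length=40))
```

urlencode also escapes characters that are not safe in a query string, which matters if user or event identifiers contain spaces or non-ASCII characters.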



Response data types

A few live streaming applications are already capable of using externally sourced data of different types, including plain text and marked up as xml.

Tricaster: RSS
Wirecast: none (can use RSS but no refresh option)
OBS: none (but a plug-in could be developed)
VidBlaster: unknown
Livestream: unknown

For GetLiveCaptions, XML is preferred because of its simplicity, although RSS is also supported because some live streaming apps use it.


It is advised that the encoding format be UTF-8 to support international characters.
It is also advised, though not required, to name the tags as in the example.

Example with four lines in the captions block:

<?xml version="1.0" encoding="UTF-8"?>
    <line>text line 1</line>
    <line>text line 2</line>
    <line>text line 3</line>
    <line>text line 4</line>
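On the receiving side, extracting the lines from such a response might look like the sketch below. It assumes the line elements are wrapped in a single root element, which an XML parser requires; the root name captions and the function name parse_caption_block are hypothetical and not defined by the specification.

```python
import xml.etree.ElementTree as ET


def parse_caption_block(xml_bytes):
    """Return the text of every <line> element, in document order."""
    root = ET.fromstring(xml_bytes)
    # iter() matches <line> at any depth, so the root element's name
    # does not matter to this function.
    return [line.text or "" for line in root.iter("line")]


# Assumption: a single root element, here named <captions>, wraps the lines.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<captions>
    <line>text line 1</line>
    <line>text line 2</line>
</captions>"""

print(parse_caption_block(sample))
```

Empty line elements come back as empty strings, so a cleared block still yields the expected number of rows.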


As with XML, it is advised that the encoding format be UTF-8 to support international characters.
RSS is a dialect of XML developed for another use, so most of its metadata is ignored here. Even so, it is advised to include the RSS-required channel elements (title, description, link), as the live streaming app may expect them. The content of those three elements may be arbitrary, although the link is best formatted as a URL (in case the live streaming app parses it).

A couple of the live streaming apps that currently handle RSS utilize at least the item elements title, link, pubDate, and description. Therefore GetLiveCaptions uses these four elements to support up to four lines of text in a captions block.

Example with four lines in the captions block:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>An example of RSS formatted as a captions block</title>
    <description>This is a demo feed.</description>
    <link>http://example.com/</link>
    <item>
      <title>Line one</title>
      <link>Line two</link>
      <pubDate>Line three</pubDate>
      <description>Line four</description>
    </item>
  </channel>
</rss>
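The mapping of caption lines onto the four item elements can be sketched as follows. This is one possible server-side approach, not a prescribed one: the function name build_rss_block and the placeholder channel values are assumptions, and xml_declaration requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Item elements that carry caption lines one to four, in order.
ITEM_TAGS = ("title", "link", "pubDate", "description")


def build_rss_block(lines):
    """Serialize up to four caption lines as an RSS 2.0 document (bytes)."""
    if len(lines) > len(ITEM_TAGS):
        raise ValueError("an RSS captions block holds at most four lines")
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # Channel metadata is arbitrary placeholder content (assumed values).
    ET.SubElement(channel, "title").text = "Live captions"
    ET.SubElement(channel, "description").text = "GetLiveCaptions feed"
    ET.SubElement(channel, "link").text = "http://example.com/"
    item = ET.SubElement(channel, "item")
    for tag, text in zip(ITEM_TAGS, lines):
        ET.SubElement(item, tag).text = text
    return ET.tostring(rss, encoding="UTF-8", xml_declaration=True)


print(build_rss_block(["Line one", "Line two"]).decode("utf-8"))
```

ElementTree escapes characters such as & and < in the caption text, so the reporter's input cannot break the markup.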

Filling the captions block

The current specification covers word-by-word delivery, block by block (no scrolling within a block).

For each request, the reporting service should always deliver the number of lines requested in the query string, or the default number if the query does not specify one.

The captions block should be updated immediately with every new word. A word is complete when a separator character follows it, such as a space, comma, or period.

Word wrap a line if its number of characters exceeds [length].
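A minimal sketch of this wrapping rule, using Python's standard textwrap module, is shown below. The function name wrap_caption is an assumption, and words longer than [length] are left unsplit here.

```python
import textwrap


def wrap_caption(text, length=40):
    """Wrap text so that no line exceeds `length` characters.

    break_long_words=False keeps a word longer than `length` intact on
    its own line instead of cutting it; this sketch does not split words.
    """
    return textwrap.wrap(text, width=length, break_long_words=False)


for row in wrap_caption("the quick brown fox jumps over the lazy dog", length=10):
    print(row)
```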

Long words (e.g. beyond half [length]) may be separated into two words by adding a hyphen and putting the remaining part on the next line(s). The separation can be made hard (length limit) or intelligent (lexical). Examples:

hard (length =40):
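One way to read the hard rule is sketched below: a word longer than half of [length] that does not fit in the remaining room is cut so that the first part plus a hyphen exactly fills the line. This is an interpretation for illustration only; the names hard_split and wrap_hard and the exact split position are assumptions.

```python
def hard_split(word, room):
    """Cut `word` so that the head plus a hyphen fills exactly `room` characters."""
    if room < 2 or len(word) <= room:
        return word, ""
    return word[:room - 1] + "-", word[room - 1:]


def wrap_hard(words, length=40):
    """Wrap words into lines of at most `length` characters with hard splits."""
    if length < 2:
        raise ValueError("length must be at least 2 to leave room for a hyphen")
    lines, current = [], ""
    for word in words:
        while word:
            sep = " " if current else ""
            room = length - len(current) - len(sep)
            if len(word) <= room:
                current += sep + word
                word = ""
            elif len(word) > length // 2 and room >= 2:
                # Long word: fill the rest of the line, carry the tail over.
                head, word = hard_split(word, room)
                lines.append(current + sep + head)
                current = ""
            else:
                # Short word that does not fit: start a new line.
                lines.append(current)
                current = ""
    if current:
        lines.append(current)
    return lines


for row in wrap_hard(["Captions", "for", "internationalization", "events"], length=16):
    print(row)
```

A lexical split would replace hard_split with a hyphenation routine that respects syllable boundaries.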


When all the lines of a captions block are filled, the reporting service should hold it for the specified time before starting on a new empty captions block.
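The word-by-word, block-by-block behavior described in this section can be sketched as a generator that yields the block state after each new word. The name fill_blocks is hypothetical, each word is assumed to fit within [length], and the hold time is only marked by a comment because it depends on the service's timing.

```python
def fill_blocks(words, lines=2, length=40):
    """Yield the captions block (a list of `lines` strings) after each word."""
    rows = [""]
    for word in words:
        candidate = f"{rows[-1]} {word}".strip()
        if len(candidate) <= length:
            # The word fits on the current line.
            rows[-1] = candidate
        elif len(rows) < lines:
            # Wrap to the next line of the same block.
            rows.append(word)
        else:
            # Block is full: a service would hold it for [hold] ms here,
            # then start a new, empty block with this word.
            rows = [word]
        # Always deliver the requested number of lines, padding with blanks.
        yield rows + [""] * (lines - len(rows))


for block in fill_blocks(["one", "two", "three", "four", "five"], lines=2, length=7):
    print(block)
```

Each yielded list is what a request arriving at that moment would receive, so the padding keeps the row count constant while a block is still filling.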

The reporter may “clear” the captions block by entering empty lines.

Using the captions blocks

It is basically up to the live stream producer to decide how the captions blocks should be presented in the live stream.

GetLiveCaptions is designed so that the live streaming app does not need to edit the captions, although editing is possible if needed.

Some live streaming apps may wrap one long line into multiple shorter ones. Other apps may require that the line separation is determined in the captions block as provided by the reporting service.