feat(captions): parse in-band captions from fmp4 segments (#197)
* initial parsing out of NAL units

* Fixing syntax error

* Move caption parsing out into another file

* Add parsing/probing of mp4 containers to get caption timing data

* Added parsing out of sample table. Pushed captions onto CaptionStream

* Cleaning up:
- Move discardEmulationPreventionBytes to captionsParser
- Cleanup functions in mp4Probe.
  - Now only parseEmbeddedCaptions is exported for captions
  - Now handles multiple traks and ignores non-video traks
  - Renamed functions: captionTracksFromInit -> parseInitForCaptionMetadata, captionTracksFromSegment -> parseSegmentForSamples, captionNals -> parseCaptionNals

* Add troubleshooting guide

* Parse through segment only once:
- in probe.js, combined functions parseSegmentForSamples and parseCaptionNals.
- Renamed parseSegmentForSamples -> parseCaptionNals.
- linting fixes

* Cleaning up in probe.js and created mp4/captions-parser.js:
- Use existing timescale() method.
- Only use video track timescales, parse the segment only once rather than for each video track.
- Moving all caption parsing methods to new file captions-parser.js

* In mp4/captions-parser.js:
- Using parseTfdt instead of parseDecodeTime
- Using parseHdlr instead of parseHandlerType
- Using parseTfhd from mp4Inspector instead

In mp4-inspector.js:
- Added durationIsEmpty and defaultBaseIsMoof flag handling to parseTfhd

* In mp4/captions-parser.js:
- Parse samples from all truns
- Use parseTrun instead of oldParseSamples

* Renamed m2ts/captions-parser.js -> tools/cea708-parser.js

* Sort samples and fail fast if no video traks

* mp4/captions-parser.js: remove unused code and add comments
tools/mp4-inspector.js: add comments to parse.trun

* mp4-inspector: revert changes to nalParse
m2ts/caption-stream.js, mp4/captions-parser.js, tools/cea708-parser.js, tools/mp4-inspector.js: clean up imports/exports

* - Added a captions debugging page
- Added a test content creation doc
- Updated doc/captions.md with more information

* Update docs

* Adding a few starter tests: one with real content, another with generated content

* Using only needed parts of real test segment. Using correct init test segment.

* Fixing linting errors for test, expanding test a bit

* Use a different segment that can be reduced in size more easily

* Add a bit more to captions doc

* Adding function contracts to: mp4/captions-parser.js

* Use moved test util methods in caption-stream.test.js

* Update test content doc to use a specific format

* update test-content doc

* Update test-content.md

* Fail fast if video track doesn't have captions. Added sensible unit tests

* captions-parser.js: Handle SEIs that match the last sample
test/captions-parser.test.js: Made a more complex test out of generated test data

* Update to captions-parser:
- Return both the active streams and the captions from the caption-parser.
- Update tests

* Use the first video track found only

* captions-parser: fixing bugs - making sure all sample properties have default values

* Update test-content.md

* CR comments:
- corrections to test-content guide and troubleshooting guide
- renamed cea708-parser -> caption-packet-parser

* Captions-parser:
- restructure to persist CaptionStream across segments
- rework mapping of pts/dts for SEI packets
- fix tests

* Update test-content.md

* Allow setting an init segment instead of parsing both init and segment each time

* In captions-parser.js: modify parse's signature to take videoTrackIds and timescales

* Use shift instead of splice

* CaptionsParser -> CaptionParser
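Taken together, the bullets above describe the reworked `CaptionParser` API: it persists a `CaptionStream` across segments, takes `videoTrackIds` and `timescales` instead of re-parsing the init segment on every call, and returns both the parsed captions and the active caption streams. A hedged usage sketch based only on those messages (the probe helper names and the exact return shape are assumptions, not guaranteed by this commit):

```js
// Sketch only: pull in-band CEA-608/708 captions out of fmp4 segments.
// `initSegment` and `mediaSegment` are placeholder Uint8Arrays containing an
// ISOBMFF init segment and a media segment.
var CaptionParser = require('mux.js/lib/mp4/caption-parser');
var probe = require('mux.js/lib/mp4/probe'); // assumed probe helpers

var captionParser = new CaptionParser();
captionParser.init();

// Per the commits above, parse() now takes the video track ids and
// timescales rather than re-parsing the init segment each time.
var videoTrackIds = probe.videoTrackIds(initSegment); // assumed helper
var timescales = probe.timescale(initSegment);        // assumed helper

// The CaptionStream persists across segments, and the parser returns
// both the captions and the active caption streams.
var parsed = captionParser.parse(mediaSegment, videoTrackIds, timescales);
if (parsed) {
  console.log(parsed.captions);       // e.g. [{ startPts, endPts, text, stream }]
  console.log(parsed.captionStreams); // e.g. { CC1: true }
}

// Drop captions that have already been consumed before the next segment.
captionParser.clearParsedCaptions();
```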
ldayananda authored and forbesjo committed Jul 16, 2018
1 parent 6d7173e commit 7ad13aa
Showing 16 changed files with 13,611 additions and 230 deletions.
407 changes: 407 additions & 0 deletions debug/captions.html


41 changes: 26 additions & 15 deletions docs/captions.md
@@ -2,24 +2,35 @@
 Captions come in two varieties, based on their relationship to the
 video. Typically on the web, captions are delivered as a separate file
 and associated with a video through the `<track>` element. This type
-of captions are sometimes referred to as *out-of-band*. The
-alternative method involves embedding the caption data directly into
+of captions are sometimes referred to as *out-of-band*.
+
+The alternative method involves embedding the caption data directly into
 the video content and is sometimes called *in-band captions*. In-band
 captions exist in many videos today that were originally encoded for
 broadcast and they are also a standard method used to provide captions
-for live events.
+for live events. In-band HLS captions follow the CEA-708 standard.
 
-In-band HLS captions follow the CEA-708 standard.
+In this project, in-band captions are parsed using a [CaptionStream][caption-stream]. For MPEG2-TS sources, the CaptionStream is used as part of the [Transmuxer TS Pipeline][transmuxer]. For ISOBMFF sources, the CaptionStream is used as part of the [MP4 CaptionParser][mp4-caption-parser].
+
+## Is my stream CEA-608/CEA-708 compatible?
+
+If you are having difficulties getting caption data as you expect out of Mux.js, take a look at our [Troubleshooting Guide](/docs/troubleshooting.md#608/708-caption-parsing) to ensure your content is compatible.
+
+# Useful Tools
+
+- [CCExtractor][cc-extractor]
+- [Thumbcoil][thumbcoil]
 
 # References
-- [Rec. ITU-T H.264](https://www.itu.int/rec/T-REC-H.264): H.264 video data specification. CEA-708 captions
-are encapsulated in supplemental enhancement information (SEI)
-network abstraction layer (NAL) units within the video stream.
-- [ANSI/SCTE
-128-1](https://www.scte.org/documents/pdf/Standards/ANSI_SCTE%20128-1%202013.pdf):
-the binary encapsulation of caption data within an SEI
-user_data_registered_itu_t_t35 payload.
-- CEA-708-E: describes the framing and interpretation of caption data
-reassembled out of the picture user data blobs.
-- CEA-608-E: specifies the hex to character mapping for extended language
-characters.
+- [Rec. ITU-T H.264][h264-spec]: H.264 video data specification. CEA-708 captions are encapsulated in supplemental enhancement information (SEI) network abstraction layer (NAL) units within the video stream.
+- [ANSI/SCTE 128-1][ansi-scte-spec]: the binary encapsulation of caption data within an SEI user_data_registered_itu_t_t35 payload.
+- CEA-708-E: describes the framing and interpretation of caption data reassembled out of the picture user data blobs.
+- CEA-608-E: specifies the hex to character mapping for extended language characters.
+
+[h264-spec]: https://www.itu.int/rec/T-REC-H.264
+[ansi-scte-spec]: https://www.scte.org/documents/pdf/Standards/ANSI_SCTE%20128-1%202013.pdf
+[caption-stream]: /lib/m2ts/caption-stream.js
+[transmuxer]: /lib/mp4/transmuxer.js
+[mp4-caption-parser]: /lib/mp4/caption-parser.js
+[thumbcoil]: http://thumb.co.il/
+[cc-extractor]: https://github.com/CCExtractor/ccextractor
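The doc above notes that for MPEG2-TS sources the `CaptionStream` runs inside the Transmuxer pipeline. A hedged sketch of consuming that output (the `captions` array on the transmuxed segment and its fields are drawn from the docs above and should be treated as assumptions):

```js
var muxjs = require('mux.js');

var transmuxer = new muxjs.mp4.Transmuxer();

transmuxer.on('data', function(segment) {
  // in-band captions surfaced by the CaptionStream, if any were found
  (segment.captions || []).forEach(function(caption) {
    console.log(caption.stream, caption.text); // e.g. 'CC1', 'Hello world'
  });
});

// tsBytes is a placeholder Uint8Array containing an MPEG2-TS segment
transmuxer.push(tsBytes);
transmuxer.flush();
```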
61 changes: 61 additions & 0 deletions docs/test-content.md
@@ -0,0 +1,61 @@
# Creating Test Content

## Table of Contents

- [CEA-608 Content](#creating-cea-608-content)

## Creating CEA-608 Content

- Use ffmpeg to create an MP4 file to start with:

`ffmpeg -f lavfi -i testsrc=duration=300:size=1280x720:rate=30 -profile:v baseline -pix_fmt yuv420p output.mp4` (no audio)

`ffmpeg -f lavfi -i testsrc=duration=300:size=1280x720:rate=30 -profile:v baseline -pix_fmt yuv420p -filter_complex "anoisesrc=d=300" output.mp4` (audio + video)

This uses ffmpeg's built-in `testsrc` source which generates a test video pattern with a color and timestamp. For this example, we are using a duration of `300` seconds, a size of `1280x720` and a framerate of `30fps`. We also specify extra settings `profile` and `pix_fmt` to force the output to be encoded using `avc1.42C01F`.

- Create an [srt file][srt] with the captions you would like to see and their timestamps.

- Use ffmpeg to convert `output.mp4` to an flv file:

`ffmpeg -i output.mp4 -acodec copy -vcodec copy output.flv`

- Use [libcaption][libcaption] to embed the captions into the flv:

`flv+srt output.flv captions.srt with-captions.flv`

- Use ffmpeg to convert `with-captions.flv` to mp4:

`ffmpeg -i with-captions.flv -acodec copy -vcodec copy with-captions.mp4`

- Use [Bento4][bento4] to convert the file into an fmp4 file:

`bento4 mp4fragment with-captions.mp4 \
--verbosity 3 \
--fragment-duration 4000 \
--timescale 90000 \
with-captions-fragment.mf4`

Then do *either* of the following:

- Use [Bento4][bento4] to split the file into an init segment and fmp4 media segments:

`bento4 mp4split --verbose \
--init-segment with-captions-init.mp4 \
--media-segment segs/with-captions-segment-%llu.m4s \
with-captions-fragment.mf4`

- Use [Bento4][bento4] to create a DASH manifest:

`bento4 mp4dash -v \
--mpd-name=with-captions.mpd \
--init-segment=with-captions-init.mp4 \
--subtitles \
with-captions-fragment.mf4`

This will create a DASH MPD and media segments in a new directory called `output`.


[srt]: https://en.wikipedia.org/wiki/SubRip#SubRip_text_file_format
[libcaption]: https://github.com/szatmary/libcaption
[bento4]: https://www.bento4.com/documentation/
16 changes: 16 additions & 0 deletions docs/troubleshooting.md
@@ -0,0 +1,16 @@
# Troubleshooting Guide

## Table of Contents
- [608/708 Caption Parsing][caption-parsing]

## 608/708 Caption Parsing

**I have a stream with caption data in more than one field, but only captions from one field are being returned**

You may want to confirm that the SEI NAL units are constructed according to the CEA-608 or CEA-708 specification; a sketch of the field mapping follows the list below. Specifically:

- that control codes/commands are doubled
- control codes starting from 0x14, 0x20 and ending with 0x14, 0x2f in field 1 are replaced with 0x15, 0x20 to 0x15, 0x2f when used in field 2
- control codes starting from 0x1c, 0x20 and ending with 0x1c, 0x2f in field 1 are replaced with 0x1d, 0x20 to 0x1d, 0x2f when used in field 2

[caption-parsing]: /docs/troubleshooting.md#608/708-caption-parsing
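To make the field 1 to field 2 mapping above concrete, here is a minimal sketch (not part of mux.js) of how a control code byte pair could be translated when a caption service is carried in field 2:

```js
// Map a field 1 control code pair to its field 2 equivalent:
// 0x14 -> 0x15 and 0x1c -> 0x1d when the second byte is in 0x20-0x2f.
function toField2ControlCode(byte1, byte2) {
  var isControlRange = byte2 >= 0x20 && byte2 <= 0x2f;

  if (isControlRange && byte1 === 0x14) {
    return [0x15, byte2];
  }
  if (isControlRange && byte1 === 0x1c) {
    return [0x1d, byte2];
  }

  // not a field 1 control code pair covered by this mapping
  return [byte1, byte2];
}
```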
135 changes: 8 additions & 127 deletions lib/m2ts/caption-stream.js
@@ -17,128 +17,8 @@
 // Link To Transport
 // -----------------
 
-// Supplemental enhancement information (SEI) NAL units have a
-// payload type field to indicate how they are to be
-// interpreted. CEA-708 caption content is always transmitted with
-// payload type 0x04.
-var USER_DATA_REGISTERED_ITU_T_T35 = 4,
-    RBSP_TRAILING_BITS = 128,
-    Stream = require('../utils/stream');
-
-/**
- * Parse a supplemental enhancement information (SEI) NAL unit.
- * Stops parsing once a message of type ITU T T35 has been found.
- *
- * @param bytes {Uint8Array} the bytes of a SEI NAL unit
- * @return {object} the parsed SEI payload
- * @see Rec. ITU-T H.264, 7.3.2.3.1
- */
-var parseSei = function(bytes) {
-  var
-    i = 0,
-    result = {
-      payloadType: -1,
-      payloadSize: 0
-    },
-    payloadType = 0,
-    payloadSize = 0;
-
-  // go through the sei_rbsp parsing each individual sei_message
-  while (i < bytes.byteLength) {
-    // stop once we have hit the end of the sei_rbsp
-    if (bytes[i] === RBSP_TRAILING_BITS) {
-      break;
-    }
-
-    // Parse payload type
-    while (bytes[i] === 0xFF) {
-      payloadType += 255;
-      i++;
-    }
-    payloadType += bytes[i++];
-
-    // Parse payload size
-    while (bytes[i] === 0xFF) {
-      payloadSize += 255;
-      i++;
-    }
-    payloadSize += bytes[i++];
-
-    // this sei_message is a 608/708 caption so save it and break
-    // there can only ever be one caption message in a frame's sei
-    if (!result.payload && payloadType === USER_DATA_REGISTERED_ITU_T_T35) {
-      result.payloadType = payloadType;
-      result.payloadSize = payloadSize;
-      result.payload = bytes.subarray(i, i + payloadSize);
-      break;
-    }
-
-    // skip the payload and parse the next message
-    i += payloadSize;
-    payloadType = 0;
-    payloadSize = 0;
-  }
-
-  return result;
-};
-
-// see ANSI/SCTE 128-1 (2013), section 8.1
-var parseUserData = function(sei) {
-  // itu_t_t35_country_code must be 181 (United States) for
-  // captions
-  if (sei.payload[0] !== 181) {
-    return null;
-  }
-
-  // itu_t_t35_provider_code should be 49 (ATSC) for captions
-  if (((sei.payload[1] << 8) | sei.payload[2]) !== 49) {
-    return null;
-  }
-
-  // the user_identifier should be "GA94" to indicate ATSC1 data
-  if (String.fromCharCode(sei.payload[3],
-                          sei.payload[4],
-                          sei.payload[5],
-                          sei.payload[6]) !== 'GA94') {
-    return null;
-  }
-
-  // finally, user_data_type_code should be 0x03 for caption data
-  if (sei.payload[7] !== 0x03) {
-    return null;
-  }
-
-  // return the user_data_type_structure and strip the trailing
-  // marker bits
-  return sei.payload.subarray(8, sei.payload.length - 1);
-};
-
-// see CEA-708-D, section 4.4
-var parseCaptionPackets = function(pts, userData) {
-  var results = [], i, count, offset, data;
-
-  // if this is just filler, return immediately
-  if (!(userData[0] & 0x40)) {
-    return results;
-  }
-
-  // parse out the cc_data_1 and cc_data_2 fields
-  count = userData[0] & 0x1f;
-  for (i = 0; i < count; i++) {
-    offset = i * 3;
-    data = {
-      type: userData[offset + 2] & 0x03,
-      pts: pts
-    };
-
-    // capture cc data when cc_valid is 1
-    if (userData[offset + 2] & 0x04) {
-      data.ccData = (userData[offset + 3] << 8) | userData[offset + 4];
-      results.push(data);
-    }
-  }
-  return results;
-};
+var Stream = require('../utils/stream');
+var cea708Parser = require('../tools/caption-packet-parser');
 
 var CaptionStream = function() {
 
@@ -165,23 +45,23 @@ var CaptionStream = function() {
 
 CaptionStream.prototype = new Stream();
 CaptionStream.prototype.push = function(event) {
-  var sei, userData;
+  var sei, userData, newCaptionPackets;
 
   // only examine SEI NALs
   if (event.nalUnitType !== 'sei_rbsp') {
     return;
   }
 
   // parse the sei
-  sei = parseSei(event.escapedRBSP);
+  sei = cea708Parser.parseSei(event.escapedRBSP);
 
   // ignore everything but user_data_registered_itu_t_t35
-  if (sei.payloadType !== USER_DATA_REGISTERED_ITU_T_T35) {
+  if (sei.payloadType !== cea708Parser.USER_DATA_REGISTERED_ITU_T_T35) {
     return;
   }
 
   // parse out the user data payload
-  userData = parseUserData(sei);
+  userData = cea708Parser.parseUserData(sei);
 
   // ignore unrecognized userData
   if (!userData) {
@@ -210,7 +90,8 @@ CaptionStream.prototype.push = function(event) {
   }
 
   // parse out CC data packets and save them for later
-  this.captionPackets_ = this.captionPackets_.concat(parseCaptionPackets(event.pts, userData));
+  newCaptionPackets = cea708Parser.parseCaptionPackets(event.pts, userData);
+  this.captionPackets_ = this.captionPackets_.concat(newCaptionPackets);
   if (this.latestDts_ !== event.dts) {
     this.numSameDts_ = 0;
   }
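For reference, the three relocated helpers are now consumed through the `cea708Parser` import shown above. A hedged sketch of calling them directly from the new caption-packet-parser module (the `escapedRBSP` input is a placeholder and the pts value is arbitrary):

```js
var cea708Parser = require('mux.js/lib/tools/caption-packet-parser');

// escapedRBSP: Uint8Array of a sei_rbsp with emulation prevention bytes removed
var sei = cea708Parser.parseSei(escapedRBSP);

// only user_data_registered_itu_t_t35 messages carry 608/708 captions
if (sei.payloadType === cea708Parser.USER_DATA_REGISTERED_ITU_T_T35) {
  var userData = cea708Parser.parseUserData(sei);

  if (userData) {
    // yields [{ type, pts, ccData }, ...] ready to push through a CaptionStream
    var packets = cea708Parser.parseCaptionPackets(90000, userData);
  }
}
```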
