feat(captions): parse in-band captions from fmp4 segments (#197)
* initial parsing out of NAL units

* Fixing syntax error

* Move caption parsing out into another file

* Add parsing/probing of mp4 containers to get caption timing data

* Added parsing out of sample table. Pushed captions onto CaptionStream

* Cleaning up:
- Move discardEmulationPreventionBytes to captionsParser
- Cleanup functions in mp4Probe.
  - Now only parseEmbeddedCaptions is exported for captions
  - Now handles multiple traks and ignores non-video traks
  - Renamed functions: captionTracksFromInit -> parseInitForCaptionMetadata, captionTracksFromSegment -> parseSegmentForSamples, captionNals -> parseCaptionNals

* Add troubleshooting guide

* Parse through segment only once:
- in probe.js, combined functions parseSegmentForSamples and parseCaptionNals.
- Renamed parseSegmentForSamples -> parseCaptionNals.
- linting fixes

* Cleaning up in probe.js and created mp4/captions-parser.js:
- Use existing timescale() method.
- Only use video track timescales, parse the segment only once rather than for each video track.
- Moving all caption parsing methods to new file captions-parser.js

* In mp4/captions-parser.js:
- Using parseTfdt instead of parseDecodeTime
- Using parseHdlr instead of parseHandlerType
- Using parseTfhd from mp4Inspector instead

In mp4-inspector.js:
- Added durationIsEmpty and defaultBaseIsMoof flag handling to parseTfhd

* In mp4/captions-parser.js:
- Parse samples from all truns
- Use parseTrun instead of oldParseSamples

* Renamed m2ts/captions-parser.js -> tools/cea708-parser.js

* Sort samples and fail fast if no video traks

* mp4/captions-parser.js: remove unused code and add comments
tools/mp4-inspector.js: add comments to parse.trun

* mp4-inspector: revert changes to nalParse
m2ts/caption-stream.js, mp4/captions-parser.js, tools/cea708-parser.js, tools/mp4-inspector.js: clean up imports/exports

* - Added a captions debugging page
- Added a test content creation doc
- Updated doc/captions.md with more information

* Update docs

* Adding a few starter tests: one with real content, another with generated content

* Using only needed parts of real test segment. Using correct init test segment.

* Fixing linting errors for test, expanding test a bit

* Use a different segment that can be reduced in size more easily

* Add a bit more to captions doc

* Adding function contracts to: mp4/captions-parser.js

* Use moved test util methods in caption-stream.test.js

* Update test content doc to use a specific format

* update test-content doc

* Update test-content.md

* Fail fast if video track doesn't have captions. Added sensible unit tests

* captions-parser.js: Handle SEIs that match the last sample
test/captions-parser.test.js: Made a more complex test out of generated test data

* Update to captions-parser:
- Return both the active streams and the captions from the caption-parser.
- Update tests

* Use the first video track found only

* captions-parser: fixing bugs - making sure all sample properties have default values

* Update test-content.md

* CR comments:
- corrections to test-content guide and troubleshooting guide
- renamed cea708-parser -> caption-packet-parser

* Captions-parser:
- restructure to persist CaptionStream across segments
- rework mapping of pts/dts for SEI packets
- fix tests

* Update test-content.md

* Allow setting an init segment instead of parsing both init and segment each time

* In captions-parser.js: modify parse's signature to take videoTrackIds and timescales

* Use shift instead of splice

* CaptionsParser -> CaptionParser
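Taken together, the bullets above describe the reworked `CaptionParser` API: it persists a `CaptionStream` across segments, takes `videoTrackIds` and `timescales` instead of re-parsing the init segment on every call, and returns both the parsed captions and the active caption streams. A hedged usage sketch based only on those messages (the probe helper names and the exact return shape are assumptions, not guaranteed by this commit):

```js
// Sketch only: pull in-band CEA-608/708 captions out of fmp4 segments.
// `initSegment` and `mediaSegment` are placeholder Uint8Arrays containing an
// ISOBMFF init segment and a media segment.
var CaptionParser = require('mux.js/lib/mp4/caption-parser');
var probe = require('mux.js/lib/mp4/probe'); // assumed probe helpers

var captionParser = new CaptionParser();
captionParser.init();

// Per the commits above, parse() now takes the video track ids and
// timescales rather than re-parsing the init segment each time.
var videoTrackIds = probe.videoTrackIds(initSegment); // assumed helper
var timescales = probe.timescale(initSegment);        // assumed helper

// The CaptionStream persists across segments, and the parser returns
// both the captions and the active caption streams.
var parsed = captionParser.parse(mediaSegment, videoTrackIds, timescales);
if (parsed) {
  console.log(parsed.captions);       // e.g. [{ startPts, endPts, text, stream }]
  console.log(parsed.captionStreams); // e.g. { CC1: true }
}

// Drop captions that have already been consumed before the next segment.
captionParser.clearParsedCaptions();
```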
ldayananda authored and forbesjo committed Jul 16, 2018
1 parent 6d7173e commit 7ad13aa
Showing 16 changed files with 13,611 additions and 230 deletions.
407 changes: 407 additions & 0 deletions debug/captions.html


41 changes: 26 additions & 15 deletions docs/captions.md
@@ -2,24 +2,35 @@
 Captions come in two varieties, based on their relationship to the
 video. Typically on the web, captions are delivered as a separate file
 and associated with a video through the `<track>` element. This type
-of captions are sometimes referred to as *out-of-band*. The
-alternative method involves embedding the caption data directly into
+of captions are sometimes referred to as *out-of-band*.
+
+The alternative method involves embedding the caption data directly into
 the video content and is sometimes called *in-band captions*. In-band
 captions exist in many videos today that were originally encoded for
 broadcast and they are also a standard method used to provide captions
-for live events.
+for live events. In-band HLS captions follow the CEA-708 standard.
 
-In-band HLS captions follow the CEA-708 standard.
+In this project, in-band captions are parsed using a [CaptionStream][caption-stream]. For MPEG2-TS sources, the CaptionStream is used as part of the [Transmuxer TS Pipeline][transmuxer]. For ISOBMFF sources, the CaptionStream is used as part of the [MP4 CaptionParser][mp4-caption-parser].
+
+## Is my stream CEA-608/CEA-708 compatible?
+
+If you are having difficulties getting caption data as you expect out of Mux.js, take a look at our [Troubleshooting Guide](/docs/troubleshooting.md#608/708-caption-parsing) to ensure your content is compatible.
+
+# Useful Tools
+
+- [CCExtractor][cc-extractor]
+- [Thumbcoil][thumbcoil]
 
 # References
-- [Rec. ITU-T H.264](https://www.itu.int/rec/T-REC-H.264): H.264 video data specification. CEA-708 captions
-are encapsulated in supplemental enhancement information (SEI)
-network abstraction layer (NAL) units within the video stream.
-- [ANSI/SCTE
-128-1](https://www.scte.org/documents/pdf/Standards/ANSI_SCTE%20128-1%202013.pdf):
-the binary encapsulation of caption data within an SEI
-user_data_registered_itu_t_t35 payload.
-- CEA-708-E: describes the framing and interpretation of caption data
-reassembled out of the picture user data blobs.
-- CEA-608-E: specifies the hex to character mapping for extended language
-characters.
+- [Rec. ITU-T H.264][h264-spec]: H.264 video data specification. CEA-708 captions are encapsulated in supplemental enhancement information (SEI) network abstraction layer (NAL) units within the video stream.
+- [ANSI/SCTE 128-1][ansi-scte-spec]: the binary encapsulation of caption data within an SEI user_data_registered_itu_t_t35 payload.
+- CEA-708-E: describes the framing and interpretation of caption data reassembled out of the picture user data blobs.
+- CEA-608-E: specifies the hex to character mapping for extended language characters.
+
+[h264-spec]: https://www.itu.int/rec/T-REC-H.264
+[ansi-scte-spec]: https://www.scte.org/documents/pdf/Standards/ANSI_SCTE%20128-1%202013.pdf
+[caption-stream]: /lib/m2ts/caption-stream.js
+[transmuxer]: /lib/mp4/transmuxer.js
+[mp4-caption-parser]: /lib/mp4/caption-parser.js
+[thumbcoil]: http://thumb.co.il/
+[cc-extractor]: https://github.com/CCExtractor/ccextractor
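The doc above notes that for MPEG2-TS sources the `CaptionStream` runs inside the Transmuxer pipeline. A hedged sketch of consuming that output (the `captions` array on the transmuxed segment and its fields are drawn from the docs above and should be treated as assumptions):

```js
var muxjs = require('mux.js');

var transmuxer = new muxjs.mp4.Transmuxer();

transmuxer.on('data', function(segment) {
  // in-band captions surfaced by the CaptionStream, if any were found
  (segment.captions || []).forEach(function(caption) {
    console.log(caption.stream, caption.text); // e.g. 'CC1', 'Hello world'
  });
});

// tsBytes is a placeholder Uint8Array containing an MPEG2-TS segment
transmuxer.push(tsBytes);
transmuxer.flush();
```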
61 changes: 61 additions & 0 deletions docs/test-content.md
@@ -0,0 +1,61 @@
# Creating Test Content

## Table of Contents

- [CEA-608 Content](#creating-cea-608-content)

## Creating CEA-608 Content

- Use ffmpeg to create an MP4 file to start with:

`ffmpeg -f lavfi -i testsrc=duration=300:size=1280x720:rate=30 -profile:v baseline -pix_fmt yuv420p output.mp4` (no audio)

`ffmpeg -f lavfi -i testsrc=duration=300:size=1280x720:rate=30 -profile:v baseline -pix_fmt yuv420p -filter_complex "anoisesrc=d=300" output.mp4` (audio + video)

This uses ffmpeg's built-in `testsrc` source which generates a test video pattern with a color and timestamp. For this example, we are using a duration of `300` seconds, a size of `1280x720` and a framerate of `30fps`. We also specify extra settings `profile` and `pix_fmt` to force the output to be encoded using `avc1.42C01F`.

- Create an [srt file][srt] with the captions you would like to see and their timestamps.

- Use ffmpeg to convert `output.mp4` to an flv file:

`ffmpeg -i output.mp4 -acodec copy -vcodec copy output.flv`

- Use [libcaption][libcaption] to embed the captions into the flv:

`flv+srt output.flv captions.srt with-captions.flv`

- Use ffmpeg to convert `with-captions.flv` to mp4:

`ffmpeg -i with-captions.flv -acodec copy -vcodec copy with-captions.mp4`

- Use [Bento4][bento4] to convert the file into an fmp4 file:

`bento4 mp4fragment with-captions.mp4 \
--verbosity 3 \
--fragment-duration 4000 \
--timescale 90000 \
with-captions-fragment.mf4`

Then do *either* of the following:

- Use [Bento4][bento4] to split the file into an init segment and fmp4 media segments:

`bento4 mp4split --verbose \
--init-segment with-captions-init.mp4 \
--media-segment segs/with-captions-segment-%llu.m4s \
with-captions-fragment.mf4`

- Use [Bento4][bento4] to create a DASH manifest:

`bento4 mp4dash -v \
--mpd-name=with-captions.mpd \
--init-segment=with-captions-init.mp4 \
--subtitles \
with-captions-fragment.mf4`

This will create a DASH MPD and media segments in a new directory called `output`.


[srt]: https://en.wikipedia.org/wiki/SubRip#SubRip_text_file_format
[libcaption]: https://github.com/szatmary/libcaption
[bento4]: https://www.bento4.com/documentation/
16 changes: 16 additions & 0 deletions docs/troubleshooting.md
@@ -0,0 +1,16 @@
# Troubleshooting Guide

## Table of Contents
- [608/708 Caption Parsing][caption-parsing]

## 608/708 Caption Parsing

**I have a stream with caption data in more than one field, but only captions from one field are being returned**

You may want to confirm that the SEI NAL units are constructed according to the CEA-608 or CEA-708 specification; a sketch of the field mapping follows the list below. Specifically:

- that control codes/commands are doubled
- control codes starting from 0x14, 0x20 and ending with 0x14, 0x2f in field 1 are replaced with 0x15, 0x20 to 0x15, 0x2f when used in field 2
- control codes starting from 0x1c, 0x20 and ending with 0x1c, 0x2f in field 1 are replaced with 0x1d, 0x20 to 0x1d, 0x2f when used in field 2

[caption-parsing]: /docs/troubleshooting.md#608/708-caption-parsing
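To make the field 1 to field 2 mapping above concrete, here is a minimal sketch (not part of mux.js) of how a control code byte pair could be translated when a caption service is carried in field 2:

```js
// Map a field 1 control code pair to its field 2 equivalent:
// 0x14 -> 0x15 and 0x1c -> 0x1d when the second byte is in 0x20-0x2f.
function toField2ControlCode(byte1, byte2) {
  var isControlRange = byte2 >= 0x20 && byte2 <= 0x2f;

  if (isControlRange && byte1 === 0x14) {
    return [0x15, byte2];
  }
  if (isControlRange && byte1 === 0x1c) {
    return [0x1d, byte2];
  }

  // not a field 1 control code pair covered by this mapping
  return [byte1, byte2];
}
```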
135 changes: 8 additions & 127 deletions lib/m2ts/caption-stream.js
@@ -17,128 +17,8 @@
 // Link To Transport
 // -----------------
 
-// Supplemental enhancement information (SEI) NAL units have a
-// payload type field to indicate how they are to be
-// interpreted. CEA-708 caption content is always transmitted with
-// payload type 0x04.
-var USER_DATA_REGISTERED_ITU_T_T35 = 4,
-    RBSP_TRAILING_BITS = 128,
-    Stream = require('../utils/stream');
-
-/**
- * Parse a supplemental enhancement information (SEI) NAL unit.
- * Stops parsing once a message of type ITU T T35 has been found.
- *
- * @param bytes {Uint8Array} the bytes of a SEI NAL unit
- * @return {object} the parsed SEI payload
- * @see Rec. ITU-T H.264, 7.3.2.3.1
- */
-var parseSei = function(bytes) {
-  var
-    i = 0,
-    result = {
-      payloadType: -1,
-      payloadSize: 0
-    },
-    payloadType = 0,
-    payloadSize = 0;
-
-  // go through the sei_rbsp parsing each individual sei_message
-  while (i < bytes.byteLength) {
-    // stop once we have hit the end of the sei_rbsp
-    if (bytes[i] === RBSP_TRAILING_BITS) {
-      break;
-    }
-
-    // Parse payload type
-    while (bytes[i] === 0xFF) {
-      payloadType += 255;
-      i++;
-    }
-    payloadType += bytes[i++];
-
-    // Parse payload size
-    while (bytes[i] === 0xFF) {
-      payloadSize += 255;
-      i++;
-    }
-    payloadSize += bytes[i++];
-
-    // this sei_message is a 608/708 caption so save it and break
-    // there can only ever be one caption message in a frame's sei
-    if (!result.payload && payloadType === USER_DATA_REGISTERED_ITU_T_T35) {
-      result.payloadType = payloadType;
-      result.payloadSize = payloadSize;
-      result.payload = bytes.subarray(i, i + payloadSize);
-      break;
-    }
-
-    // skip the payload and parse the next message
-    i += payloadSize;
-    payloadType = 0;
-    payloadSize = 0;
-  }
-
-  return result;
-};
-
-// see ANSI/SCTE 128-1 (2013), section 8.1
-var parseUserData = function(sei) {
-  // itu_t_t35_country_code must be 181 (United States) for
-  // captions
-  if (sei.payload[0] !== 181) {
-    return null;
-  }
-
-  // itu_t_t35_provider_code should be 49 (ATSC) for captions
-  if (((sei.payload[1] << 8) | sei.payload[2]) !== 49) {
-    return null;
-  }
-
-  // the user_identifier should be "GA94" to indicate ATSC1 data
-  if (String.fromCharCode(sei.payload[3],
-                          sei.payload[4],
-                          sei.payload[5],
-                          sei.payload[6]) !== 'GA94') {
-    return null;
-  }
-
-  // finally, user_data_type_code should be 0x03 for caption data
-  if (sei.payload[7] !== 0x03) {
-    return null;
-  }
-
-  // return the user_data_type_structure and strip the trailing
-  // marker bits
-  return sei.payload.subarray(8, sei.payload.length - 1);
-};
-
-// see CEA-708-D, section 4.4
-var parseCaptionPackets = function(pts, userData) {
-  var results = [], i, count, offset, data;
-
-  // if this is just filler, return immediately
-  if (!(userData[0] & 0x40)) {
-    return results;
-  }
-
-  // parse out the cc_data_1 and cc_data_2 fields
-  count = userData[0] & 0x1f;
-  for (i = 0; i < count; i++) {
-    offset = i * 3;
-    data = {
-      type: userData[offset + 2] & 0x03,
-      pts: pts
-    };
-
-    // capture cc data when cc_valid is 1
-    if (userData[offset + 2] & 0x04) {
-      data.ccData = (userData[offset + 3] << 8) | userData[offset + 4];
-      results.push(data);
-    }
-  }
-  return results;
-};
+var Stream = require('../utils/stream');
+var cea708Parser = require('../tools/caption-packet-parser');
 
 var CaptionStream = function() {
 
@@ -165,23 +45,23 @@ var CaptionStream = function() {
 
 CaptionStream.prototype = new Stream();
 CaptionStream.prototype.push = function(event) {
-  var sei, userData;
+  var sei, userData, newCaptionPackets;
 
   // only examine SEI NALs
   if (event.nalUnitType !== 'sei_rbsp') {
     return;
   }
 
   // parse the sei
-  sei = parseSei(event.escapedRBSP);
+  sei = cea708Parser.parseSei(event.escapedRBSP);
 
   // ignore everything but user_data_registered_itu_t_t35
-  if (sei.payloadType !== USER_DATA_REGISTERED_ITU_T_T35) {
+  if (sei.payloadType !== cea708Parser.USER_DATA_REGISTERED_ITU_T_T35) {
     return;
   }
 
   // parse out the user data payload
-  userData = parseUserData(sei);
+  userData = cea708Parser.parseUserData(sei);
 
   // ignore unrecognized userData
   if (!userData) {
@@ -210,7 +90,8 @@ CaptionStream.prototype.push = function(event) {
   }
 
   // parse out CC data packets and save them for later
-  this.captionPackets_ = this.captionPackets_.concat(parseCaptionPackets(event.pts, userData));
+  newCaptionPackets = cea708Parser.parseCaptionPackets(event.pts, userData);
+  this.captionPackets_ = this.captionPackets_.concat(newCaptionPackets);
   if (this.latestDts_ !== event.dts) {
     this.numSameDts_ = 0;
   }
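For reference, the three relocated helpers are now consumed through the `cea708Parser` import shown above. A hedged sketch of calling them directly from the new caption-packet-parser module (the `escapedRBSP` input is a placeholder and the pts value is arbitrary):

```js
var cea708Parser = require('mux.js/lib/tools/caption-packet-parser');

// escapedRBSP: Uint8Array of a sei_rbsp with emulation prevention bytes removed
var sei = cea708Parser.parseSei(escapedRBSP);

// only user_data_registered_itu_t_t35 messages carry 608/708 captions
if (sei.payloadType === cea708Parser.USER_DATA_REGISTERED_ITU_T_T35) {
  var userData = cea708Parser.parseUserData(sei);

  if (userData) {
    // yields [{ type, pts, ccData }, ...] ready to push through a CaptionStream
    var packets = cea708Parser.parseCaptionPackets(90000, userData);
  }
}
```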
