File Formats-ISOBMFF - M2R Multimedia Networking
Transcription
File Formats-ISOBMFF - M2R Multimedia Networking
Multimedia Networking: The ISO Base Media File Format Family of Standards Cyril Concolato 1 Institut Mines-Télécom History Development of the MPEG-4 Standard (1998) • • • Requirement for support for more streams than just 1 audio & 1 video Call for Proposals Selected Base Technology: QuickTime Development of Motion JPEG-2000 Standard • • Call for Proposals Selected Base Technology: QuickTime Consequence • Development of a common standard: ─ The ISO Base Media File Format ─ based on QuickTime File Format Successful standard • 2 MPEG, JPEG, Flash, 3GPP, Microsoft, Sony, … Institut Mines-Télécom Multiple Standard Specifications MPEG-4 Part 12 ISOBMFF MPEG-4 Part 14 (MP4) MPEG-4 Part 15 (AVC, HEVC) MPEG-21 Part 9 (MP21) MPEGDASH = JPEG-2000 Part 12 MPEG-4 Part 30 (WebVTT, TTML) 3GPP / 3GPP2 Core standard extended for • • • • • 3 Specific MPEG-4 Systems storage Specific H264 | AVC, SVC, MVC, HEVC requirements Storage of XML or non-timed data HTTP Streaming features (MPEG-DASH) Storage of subtitles Institut Mines-Télécom Flash (F4V) Identifying ISOBMFF files Extension not sufficient • mp4, m4a, m4s, 3gp, f4v, ism, … Magic number/sniffing: • • mandatory structure in some files: ‘ftyp’ or ‘styp’ ‘ftyp’/’styp’ indicate “brands” ─ Major brand: “best use” of the file ─ Compatible brands • Example: “isom”, “avc1”, “isoX” (X=2…9), “mp41”, “mp71”, … MIME types (RFC 6381) • • • • “video/mp4”: if it contains visual data “audio/mp4”: otherwise, if it contains audio, “application/mp4”: otherwise (in particular metadata, …) Sub-parameters ─ “codecs” • • • Comma-separated list of track information Uses the sample entry 4cc: “avc1”, “mp4a”, “stpp”, Additional codec-specific information (profiles, levels …) ─ “profiles” • 6 brands Institut Mines-Télécom ISOBMFF: Principles Logical Structure • Information organization Physical Structure • Byte organization 7 Institut Mines-Télécom ISOBMFF: Logical Structure (1/4) A file • Contains ─ Streamable timed data in tracks of a movie ─ Other data (untimed or non streamable) in items ─ Or a combination of both • Defines a common timeline for all tracks for synchronization • Declares its type and its compatibility 8 Institut Mines-Télécom ISOBMFF: Logical Structure (2/4) An item • Represents data consumed as a whole and valid for the entire duration of the movie ─ E.g. metadata: copyright information ─ E.g cover image • May have associated properties (type, position, size …) • May be encrypted and/or compressed • If multiple items are used, the primary item is the entry point 9 Institut Mines-Télécom ISOBMFF: Logical Structure (3/4) A track • Corresponds to timed data of a specific type, ─ typically media data for a given codec (elementary stream) ⚠ Some tracks may not contain media data (e.g. hint tracks) • Is decomposed into samples ─ Of variable sizes and durations • Is associated to a single decoder ⚠ except for scalable codecs (N tracks – 1 decoder) • Has associated decoder configuration(s): sample description(s) ─ Can vary over time • • • • 10 May be linked, grouped or alternative to other tracks May have associated untimed data in items May be encrypted May have metadata or user data attached Institut Mines-Télécom ISOBMFF: Logical Structure (4/4) A sample • Represents data used by a decoder at given times (DTS, CTS) in the media timeline • Has properties (size, position, random access, decoder configuration…) • May be described in terms of sub-samples ─ E.g. an HEVC/AVC AU decomposed into NAL units • May be associated to similar samples in sample groups ─ E.g. samples that have the same dependencies on an other sample • May have sample-specific auxiliary information ─ E.g. samples that use the same encryption key 11 Institut Mines-Télécom ISOBMFF: Physical Structure Data is stored in a basic structure called box • No data outside of a box Each box has • • • • length (4 or 8 bytes), type (4 printable chars), possibly version and flags, and type-specific data Extensible format • Unknown boxes can be skipped (syntactically) Header information is a hierarchical set of boxes • 12 typically rooted by a ‘moov’ or ‘meta’ box Institut Mines-Télécom Important Boxes ftyp (File Type) • One per file • Version • Compatibility with other file types mdat (Media Data): • N per file moov (Movie): • High-level Box for signalling information for the movie mvhd (Movie Header): • Generic information about the movie trak (Track): • High-level Box for signaling info for the track meta (Metadata): • High-level Box for signaling info for metadata And many more … 13 Institut Mines-Télécom hdlr (Handler) • Indication of the type of the track or the type of metadata dinf/dref (Data Information/Data Reference) • Location of media data (local vs. remote) stbl (Sample Table) • High level box for sample signalling information stsd (Sample Description) • Low-level box for signalling elementary stream configuration stts (Sample To Time) • Low-level box for signalling DTS for each sample • Run-Length encoded stsz (Sample To Size) • Low-level box for signalling the size of each sample • Run-Length encoded Typical Box Hierarchy (1 track) ISO File ftyp mvhd vmhd trak iods tkhd mdia ... mdhd minf hdlr dinf dref 14 mdat moov Data not box-structured … stbl stsd Institut Mines-Télécom stts stsz … … Typical Box Hierarchy (Untimed data) ISO File ftyp hdlr meta iinf mdat iloc ipro sinf 15 Institut Mines-Télécom … Dual Headed File: Timed and Untimed data File ftyp vmhd mdat mvhd trak tkhd mdia tref mdhd minf hdlr meta hdlr dinf dref page 17 moov iinf iloc sinf stbl stsd Institut Mines-Télécom stts ctts ipro stsc stsz … ISOBMFF: Physical Structure Separate Data vs. Header storage Header information describing the properties of samples, of tracks is stored separately from the sample data (header vs payload) Track Movie header header Track header Movie Media Data 18 sample Institut Mines-Télécom Video track information Audio track information sample frame sample sample frame 18 Media data storage Unstructured, bag of bytes • Can’t extract sample data easily without using header information Data bytes are stored in one or more boxes of specific types • • mainly ‘mdat’ (or ‘idat’ for item data in some cases) Usually in the same file as the ‘moov’ or ‘meta’ header ─ may also be stored in a separate file that doesn’t use boxes Sample data storage • The bytes of a sample are stored contiguously ─ (except if extractors are used, eg SVC) • Bytes of multiple time-contiguous samples can be stored contiguously (called chunks) ─ Chunks of different tracks can be interleaved Item data storage • The bytes of an item can be stored in separate byte ranges (called extents) ─ Item data and media data can be interleaved 19 Institut Mines-Télécom Physical Storage of Samples Meta-data Box type: moov Box type: mvhd Box type: iods Creation date Duration … Box type: trak ES descriptors… Media-data Box type: mdat Box type: tkhd TrackID Width, height … Box type: mdia Box type: minf AUj Box type: stbl Box type: stsd Box type: mp4v Box type: esds Dec config Descr SL Config Descr… 20 Institut Mines-Télécom Box type: stco … Chunk i Chunk i+1 … AUk AUk+1 AUl Chunk (i+1) Progressive Download of ISO Files Interleaving of Chunks • N seconds of audio, N seconds of video, … Constrained box order • ‘ftyp’, ‘moov’, ‘meta’ … first • Optional ‘pdin’ box for informing on initial delay before playback • ‘mdat’ last Limitations • Cannot cover live cases as the whole ‘moov’ box is written first and only 1 ‘moov’ box allowed per file 21 Institut Mines-Télécom Movie Fragments Single ‘moov’ box limitation • • Cannot be written until the entire data is known Size New requirements • Being able to flush regularly the header upon recording Introducing « Movie Fragments » • Initial structure (moov + N trak) ─ Slightly modified ─ Adding indication that additional movie fragments exist • Followed by a sequence of new boxes for signalling ─ moof + N traf ─ Each traf is made of K trun • 22 Additional data for fragments still stored in mdat boxes Institut Mines-Télécom Movie Fragments Box Hierarchy ISO File ftyp moov mdat moof mdat mfhd mvex mehd traf … traf trex tfhd … trun … trex trun 23 Institut Mines-Télécom Structure of a fragmented file moov Video track Audio track trak trak trex trex Movie mdat moof traf traf trun trun trun Movie Fragment mdat … moof traf traf trun trun mdat 24 Institut Mines-Télécom trun Movie Fragment Sample information in fragmented files Header compression • Default sample information ─ Duration, size, sample description ─ sample properties (sync, depended, …) ─ « sample_flags » • Possibly coded in 3 places: ─ moov/mvex/trex: default for all samples in the track ─ moof/traf/tfhd: default for all truns in the moof ─ moof/traf/trun: default for the first sample Practically • Smooth Streaming imposes defaults to be in tfhd ─ No factorization between movie fragments 25 Institut Mines-Télécom Hint Tracks Special Track • • • • Dedicated to a delivery protocol Stores timed instructions to create protocol packets Linked to a media track Not (necessarily) containing media data Examples • Hint track for streaming of MP4 using RTP • Hint track for delivery over FLUTE Variant: Reception Hint Track • Storage of delivery streams in ISO (MPEG-2 TS and RTP) 26 Institut Mines-Télécom Hint track and other tracks Movie (Signalling) Video track trak moov Hint track trak Sample Data sample hint mdat header pointer 27 Institut Mines-Télécom sample sample frame hint header pointer sample frame ISOBMFF: File Types & Usages (1/2) Plain Files • • Simple recording of timed data (data first, header last) ISOBMFF Tools: mdat, moov, … Progressive Files • • Progressive download and playback (Header first, data last and interleaved) ISOBMFF Tools: storage using multiple interleaved chunks Fragmented Files • • 28 Files for long-running recording sessions (multiple blocks of header and data) ISOBMFF Tools: Movie fragments Institut Mines-Télécom ISOBMFF: File Types & Usages (2/2) Segmented Files • • Self-contained fragments stored in single file or in separate files for HTTP streaming ISOBMFF Tools: segment files, indexing Packaging files • • Storage of related timed or untimed data (e.g. JPEG or XML + audio/video) ISOBMFF Tools: ‘meta’ box Files for generating transport-related streams • • Protocol-specific instructions to create streams from files ISOBMFF Tools: hint tracks (RTP, FLUTE, …) Files recording transport-related streams • • 29 Recording of protocol-specific packets into files for replay ISOBMFF Tools: reception hint tracks (RTP, MPEG-2 TS) Institut Mines-Télécom ISOBMFF Timelines Media Timeline Movie Timeline 30 Institut Mines-Télécom ISOBMFF Media Timeline Expressing times of each AU in its track Boxes involved • • • • • • 31 “mdhd”: gives the timescale “stts”: provides the DTS for each sample “ctts”: provides CTS for each sample, when needed “cslg”:additional info for specific CTS/DTS config “tfdt”: time anchor for movie fragments “trun”: timing for movie fragments relative to tfdt Institut Mines-Télécom Basic Media Timeline Information “stts” / “trun” coding • DTS values given as sample_delta values ─ DTS(0) = 0 ─ DTS(i) = SUM(sample_delta, 0, i-1) • Values are run-length encoded “ctts” / “trun” coding • Offset from the DTS ─ Box version 0: offset >=0 ─ Box version 1: offset >=0 or < 0 • CTS = DTS + Composition Offset • Values are run-length encoded 32 Institut Mines-Télécom Times in Movie Fragments Default behavior • TM(0) = 0 • TM(i+1) = TM(i) + Duration(i) Special case • In some scenarios (e.g. DASH), upon receiving of a first fragment, the end time of the previous sample is unknown! • Need to insert the media time of the first sample in a fragment ─ TrackFragmentBaseMediaDecodeTime tfdt ─ If tfdt > TM(i), the duration of sample i-1 is extended • Useful for variable frame rate streams, e.g. for subtitles 33 Institut Mines-Télécom ISOBMFF Movie Timeline Expresses times for the presentation of the movie from its tracks • • Main use for multi-tracks synchronization ─ E.g. CTStrack1(AU0) CTStrack2(AU0) Useful for single track presentation when not all samples are presented ─ E.g. start presenting track 1 from the 3rd second Boxes involved • • • “mvhd”: gives the timescale for movie times “edts”: gives instructions for presenting a track “sidx”: timing for random access points in segments Track media times (i.e. CTS) are mapped onto the Movie Timeline • Default: simple, linear mapping T(movie, track(i), sample(k)) = CTS(track(i), sample(k)) * mvhd.timescale/mdhd.timescale • 34 Complex, non-linear mapping with the concept of Edit List Institut Mines-Télécom ISOBMFF Edit List List of edit entries • type, media time, movie duration Types of edit entries • • • dwell (frame freeze), empty (no presentation), Normal playback (with possible speed changes) Media times do not necessarily align with frame start times ! empty 0 normal T1 dwell T2 normal T(movie, track(i)) T3 CTS (track(i)) 0 35 Institut Mines-Télécom Typical Edit List example Use case: File with 1 audio and 1 video stream • Case 1: video with B frames ─ Presentation timestamp of the first video frame: • • CTSvideo(AU0) >0 Case 2: first audio AU not produced at the same time as the first video AU ─ For each track DTS(AU0) is by definition 0! Option 1: audio shift • • Video track: no edit list Audio track: ─ One « empty » edit list entry from 0 to T(movie, F0) ─ One « normal » entry from CTS=0 to the end of the track • Drawback: need to apply the edit for every new track (subtitles …) Option 2: video shift • • 36 Audio track: no edit list Video track: one « normal » entry from CTS(F0) to the end of the media Institut Mines-Télécom Sample Groups Mechanism to store additional sample signaling • A box providing descriptors (sgpd) ─ Located in stbl: global for the entire track ─ Located in traf: applicable only for the fragment • A box providing which descriptor apply to which sample (sbgp) ─ Located in stbl ─ Located in traf Grouping types • Each descriptor/group is identified by a 4CC ─ Ex: ‘rap ’, ‘roll’, ’prol’ • Contains several entries ─ Ex: one entry per « roll » distance • The mapping sample->group can be parametrized ─ ex: view priority in MVC In practice • 37 Smooth Streaming imposes that descriptions be on fragments only Institut Mines-Télécom Sample Auxiliary Information Possibility to add data associated to a sample • Not used directly by the decoder ─ Ex: data for decrypting the sample • Without having to define a new track • Can be used in fragmented and non-fragmented mode Tools • • • • 38 Data type indicated by a 4CC (ex: « cenc ») Box ‘saiz’ indicates the additional data sizes Box ‘saio’ indicates the additional data offsets Additional sample data shall be in the same file as sample data Institut Mines-Télécom Random Access Points ‘sync’ box • • If absent, all samples are RAP (e.g. audio streams) If present, RAP are signaled (I-frame, IDR) Sample Groups • • • « rap »: non-IDR intra frames « roll »: nb samples to decode until perfect reconstruction is reached « prol »: audio, nb samples before this sample to decode to produce perfect reconstruction for this sample Independent and Disposable Samples • • • • is_leading: signaling for open GOP sample_depends_on: signaling for I frames or not sample_is_depended_on: signaling of reference frames sample_has_redundancy: redundant coding ─ Can be used to signal duplicated samples for text streams 39 Institut Mines-Télécom Dependencies between tracks Track reference: ‘tref’ • • track N uses or refers to track(s) K Examples ─ ‘hint’: hint track for one or more media track ─ ‘chap’: chapter track for the referenced media tracks ─ ‘scal’: scalable track using data from media tracks Track group: ‘trgr’ • Tracks in the same group share a common feature ─ e.g. may be subtitle tracks Track Selection: ‘tsel’ • Provides selection information for alternate tracks ─ E.g. for scalable tracks 40 Institut Mines-Télécom ISOBMFF for NALU-based streams (part 15) Principles • Reuses the ISOBMFF structures • Defines sample description entry format and sample format Sample format • Each NALU preceded by a 1, 2, or 4 bytes ─ Exact size given in the sample description entry • No start codes • Random Access Points ─ « SyncSample »: only IDR ─ SampleGroup ‘rap ’ for all the others 41 Institut Mines-Télécom AVC/HEVC specificities Sample description entry formats • ‘avc1’/’hvc1’ ─ All Parameter Sets in a box ‘avcC’/‘hvcC’ ─ Decoder configurations cannot be changed without a new ‘moov’! • ‘avc3’/‘hev1’ ─ Parameter Sets in IDR or in the sample description entry Specific boxes ‘avcC’ ou ‘hvcC’ • profile tier level, chroma format & bit depth • PPS & SPS lists • HEVC: + VPS & SEI messages 42 Institut Mines-Télécom Storage of Interactive Applications in ISO Static Interactive Application • Storage as metadata • Ex: HTML, XML, … Dynamic, Timed Application • Storage as track • Ex: ─ MPEG-4 Scene Description Streams, ─ 3GPP DIMS Streams Case of Images • Image as Tracks • Image as Metadata item 43 Institut Mines-Télécom