File Formats-ISOBMFF - M2R Multimedia Networking

Transcription

File Formats-ISOBMFF - M2R Multimedia Networking
Multimedia Networking:
The ISO Base Media File Format
Family of Standards
Cyril Concolato
1
Institut Mines-Télécom
History
 Development of the MPEG-4 Standard (1998)
•
•
•
Requirement for support for more streams than just 1
audio & 1 video
Call for Proposals
Selected Base Technology: QuickTime
 Development of Motion JPEG-2000 Standard
•
•
Call for Proposals
Selected Base Technology: QuickTime
 Consequence
•
Development of a common standard:
─ The ISO Base Media File Format
─ based on QuickTime File Format
 Successful standard
•
2
MPEG, JPEG, Flash, 3GPP, Microsoft, Sony, …
Institut Mines-Télécom
Multiple Standard Specifications
MPEG-4
Part 12
ISOBMFF
MPEG-4
Part 14
(MP4)
MPEG-4
Part 15
(AVC,
HEVC)
MPEG-21
Part 9
(MP21)
MPEGDASH
=
JPEG-2000
Part 12
MPEG-4
Part 30
(WebVTT,
TTML)
3GPP /
3GPP2
 Core standard extended for
•
•
•
•
•
3
Specific MPEG-4 Systems storage
Specific H264 | AVC, SVC, MVC, HEVC requirements
Storage of XML or non-timed data
HTTP Streaming features (MPEG-DASH)
Storage of subtitles
Institut Mines-Télécom
Flash
(F4V)
Identifying ISOBMFF files

Extension not sufficient
•

mp4, m4a, m4s, 3gp, f4v, ism, …
Magic number/sniffing:
•
•
mandatory structure in some files: ‘ftyp’ or ‘styp’
‘ftyp’/’styp’ indicate “brands”
─ Major brand: “best use” of the file
─ Compatible brands
•

Example: “isom”, “avc1”, “isoX” (X=2…9), “mp41”, “mp71”, …
MIME types (RFC 6381)
•
•
•
•
“video/mp4”: if it contains visual data
“audio/mp4”: otherwise, if it contains audio,
“application/mp4”: otherwise (in particular metadata, …)
Sub-parameters
─ “codecs”
•
•
•
Comma-separated list of track information
Uses the sample entry 4cc: “avc1”, “mp4a”, “stpp”,
Additional codec-specific information (profiles, levels …)
─ “profiles”
•
6
brands
Institut Mines-Télécom
ISOBMFF: Principles
 Logical Structure
• Information organization
 Physical Structure
• Byte organization
7
Institut Mines-Télécom
ISOBMFF: Logical Structure (1/4)
 A file
• Contains
─ Streamable timed data in tracks of a movie
─ Other data (untimed or non streamable) in items
─ Or a combination of both
• Defines a common timeline for all tracks for
synchronization
• Declares its type and its compatibility
8
Institut Mines-Télécom
ISOBMFF: Logical Structure (2/4)
 An item
• Represents data consumed as a whole and valid for
the entire duration of the movie
─ E.g. metadata: copyright information
─ E.g cover image
• May have associated properties (type, position, size
…)
• May be encrypted and/or compressed
• If multiple items are used, the primary item is the entry
point
9
Institut Mines-Télécom
ISOBMFF: Logical Structure (3/4)
 A track
•
Corresponds to timed data of a specific type,
─ typically media data for a given codec (elementary stream)
⚠ Some tracks may not contain media data (e.g. hint tracks)
•
Is decomposed into samples
─ Of variable sizes and durations
•
Is associated to a single decoder
⚠ except for scalable codecs (N tracks – 1 decoder)
•
Has associated decoder configuration(s): sample
description(s)
─ Can vary over time
•
•
•
•
10
May be linked, grouped or alternative to other tracks
May have associated untimed data in items
May be encrypted
May have metadata or user data attached
Institut Mines-Télécom
ISOBMFF: Logical Structure (4/4)
 A sample
• Represents data used by a decoder at given times
(DTS, CTS) in the media timeline
• Has properties (size, position, random access, decoder
configuration…)
• May be described in terms of sub-samples
─ E.g. an HEVC/AVC AU decomposed into NAL units
• May be associated to similar samples in sample groups
─ E.g. samples that have the same dependencies on an
other sample
• May have sample-specific auxiliary information
─ E.g. samples that use the same encryption key
11
Institut Mines-Télécom
ISOBMFF: Physical Structure
 Data is stored in a basic structure called box
•
No data outside of a box
 Each box has
•
•
•
•
length (4 or 8 bytes),
type (4 printable chars),
possibly version and flags,
and type-specific data
 Extensible format
•
Unknown boxes can be skipped (syntactically)
 Header information is a hierarchical set of boxes
•
12
typically rooted by a ‘moov’ or ‘meta’ box
Institut Mines-Télécom
Important Boxes






ftyp (File Type)
• One per file
• Version
• Compatibility with other file types
mdat (Media Data):
• N per file
moov (Movie):
• High-level Box for signalling
information for the movie
mvhd (Movie Header):
• Generic information about the
movie
trak (Track):
• High-level Box for signaling info for
the track
meta (Metadata):
• High-level Box for signaling info for
metadata
And many more …
13
Institut Mines-Télécom






hdlr (Handler)
• Indication of the type of the track or the type of
metadata
dinf/dref (Data Information/Data Reference)
• Location of media data (local vs. remote)
stbl (Sample Table)
• High level box for sample signalling
information
stsd (Sample Description)
• Low-level box for signalling elementary stream
configuration
stts (Sample To Time)
• Low-level box for signalling DTS for each
sample
• Run-Length encoded
stsz (Sample To Size)
• Low-level box for signalling the size of each
sample
• Run-Length encoded
Typical Box Hierarchy (1 track)
ISO File
ftyp
mvhd
vmhd
trak
iods
tkhd
mdia
...
mdhd
minf
hdlr
dinf
dref
14
mdat
moov
Data not
box-structured
…
stbl
stsd
Institut Mines-Télécom
stts
stsz
…
…
Typical Box Hierarchy (Untimed data)
ISO File
ftyp
hdlr
meta
iinf
mdat
iloc
ipro
sinf
15
Institut Mines-Télécom
…
Dual Headed File: Timed and Untimed data
File
ftyp
vmhd
mdat
mvhd
trak
tkhd
mdia
tref
mdhd
minf
hdlr
meta
hdlr
dinf
dref
page 17
moov
iinf
iloc
sinf
stbl
stsd
Institut Mines-Télécom
stts
ctts
ipro
stsc
stsz
…
ISOBMFF: Physical Structure
Separate Data vs. Header storage
Header information describing the properties of samples, of tracks is stored
separately from the sample data (header vs payload)
Track
Movie header
header Track
header
Movie
Media
Data
18
sample
Institut Mines-Télécom
Video track information
Audio track information
sample
frame
sample
sample
frame
18
Media data storage

Unstructured, bag of bytes
•

Can’t extract sample data easily without using header information
Data bytes are stored in one or more boxes of specific types
•
•
mainly ‘mdat’ (or ‘idat’ for item data in some cases)
Usually in the same file as the ‘moov’ or ‘meta’ header
─ may also be stored in a separate file that doesn’t use boxes

Sample data storage
•
The bytes of a sample are stored contiguously
─ (except if extractors are used, eg SVC)
•
Bytes of multiple time-contiguous samples can be stored contiguously (called
chunks)
─ Chunks of different tracks can be interleaved

Item data storage
•
The bytes of an item can be stored in separate byte ranges
(called extents)
─ Item data and media data can be interleaved
19
Institut Mines-Télécom
Physical Storage of Samples
Meta-data
Box type:
moov
Box type: mvhd Box type: iods
Creation date
Duration …
Box type: trak
ES descriptors…
Media-data
Box type:
mdat
Box type: tkhd
TrackID
Width, height …
Box type:
mdia
Box type: minf
AUj
Box type: stbl
Box type: stsd
Box type:
mp4v
Box type: esds
Dec config Descr
SL Config Descr…
20
Institut Mines-Télécom
Box type: stco
…
Chunk i
Chunk i+1
…
AUk
AUk+1
AUl
Chunk (i+1)
Progressive Download of ISO Files
 Interleaving of Chunks
• N seconds of audio, N seconds of video, …
 Constrained box order
• ‘ftyp’, ‘moov’, ‘meta’ … first
• Optional ‘pdin’ box for informing on initial delay before
playback
• ‘mdat’ last
 Limitations
• Cannot cover live cases as the whole ‘moov’ box is
written first and only 1 ‘moov’ box allowed per file
21
Institut Mines-Télécom
Movie Fragments
 Single ‘moov’ box limitation
•
•
Cannot be written until the entire data is known
Size
 New requirements
•
Being able to flush regularly the header upon recording
 Introducing « Movie Fragments »
•
Initial structure (moov + N trak)
─ Slightly modified
─ Adding indication that additional movie fragments exist
•
Followed by a sequence of new boxes for signalling
─ moof + N traf
─ Each traf is made of K trun
•
22
Additional data for fragments still stored in mdat boxes
Institut Mines-Télécom
Movie Fragments Box Hierarchy
ISO File
ftyp
moov
mdat
moof
mdat
mfhd
mvex
mehd
traf
…
traf
trex
tfhd
…
trun
…
trex
trun
23
Institut Mines-Télécom
Structure of a fragmented file
moov
Video track
Audio track
trak
trak
trex
trex
Movie
mdat
moof
traf
traf
trun
trun
trun
Movie Fragment
mdat
…
moof
traf
traf
trun
trun
mdat
24
Institut Mines-Télécom
trun
Movie Fragment
Sample information in fragmented files
 Header compression
• Default sample information
─ Duration, size, sample description
─ sample properties (sync, depended, …)
─ « sample_flags »
• Possibly coded in 3 places:
─ moov/mvex/trex: default for all samples in the track
─ moof/traf/tfhd: default for all truns in the moof
─ moof/traf/trun: default for the first sample
 Practically
• Smooth Streaming imposes defaults to be in tfhd
─ No factorization between movie fragments
25
Institut Mines-Télécom
Hint Tracks
 Special Track
•
•
•
•
Dedicated to a delivery protocol
Stores timed instructions to create protocol packets
Linked to a media track
Not (necessarily) containing media data
 Examples
• Hint track for streaming of MP4 using RTP
• Hint track for delivery over FLUTE
 Variant: Reception Hint Track
• Storage of delivery streams in ISO (MPEG-2 TS and
RTP)
26
Institut Mines-Télécom
Hint track and other tracks
Movie (Signalling)
Video track
trak
moov
Hint track
trak
Sample Data
sample
hint
mdat
header
pointer
27
Institut Mines-Télécom
sample
sample
frame
hint
header
pointer
sample
frame
ISOBMFF: File Types & Usages (1/2)
 Plain Files
•
•
Simple recording of timed data (data first, header last)
ISOBMFF Tools: mdat, moov, …
 Progressive Files
•
•
Progressive download and playback (Header first, data
last and interleaved)
ISOBMFF Tools: storage using multiple interleaved
chunks
 Fragmented Files
•
•
28
Files for long-running recording sessions (multiple blocks
of header and data)
ISOBMFF Tools: Movie fragments
Institut Mines-Télécom
ISOBMFF:
File Types & Usages (2/2)
 Segmented Files
•
•
Self-contained fragments stored in single file or in separate files
for HTTP streaming
ISOBMFF Tools: segment files, indexing
 Packaging files
•
•
Storage of related timed or untimed data (e.g. JPEG or XML +
audio/video)
ISOBMFF Tools: ‘meta’ box
 Files for generating transport-related streams
•
•
Protocol-specific instructions to create streams from files
ISOBMFF Tools: hint tracks (RTP, FLUTE, …)
 Files recording transport-related streams
•
•
29
Recording of protocol-specific packets into files for replay
ISOBMFF Tools: reception hint tracks (RTP, MPEG-2 TS)
Institut Mines-Télécom
ISOBMFF Timelines
 Media Timeline
 Movie Timeline
30
Institut Mines-Télécom
ISOBMFF Media Timeline
 Expressing times of each AU in its track
 Boxes involved
•
•
•
•
•
•
31
“mdhd”: gives the timescale
“stts”: provides the DTS for each sample
“ctts”: provides CTS for each sample, when needed
“cslg”:additional info for specific CTS/DTS config
“tfdt”: time anchor for movie fragments
“trun”: timing for movie fragments relative to tfdt
Institut Mines-Télécom
Basic Media Timeline Information
 “stts” / “trun” coding
• DTS values given as sample_delta values
─ DTS(0) = 0
─ DTS(i) = SUM(sample_delta, 0, i-1)
• Values are run-length encoded
 “ctts” / “trun” coding
• Offset from the DTS
─ Box version 0: offset >=0
─ Box version 1: offset >=0 or < 0
• CTS = DTS + Composition Offset
• Values are run-length encoded
32
Institut Mines-Télécom
Times in Movie Fragments
 Default behavior
• TM(0) = 0
• TM(i+1) = TM(i) + Duration(i)
 Special case
• In some scenarios (e.g. DASH), upon receiving of a
first fragment, the end time of the previous sample is
unknown!
• Need to insert the media time of the first sample in a
fragment
─ TrackFragmentBaseMediaDecodeTime tfdt
─ If tfdt > TM(i), the duration of sample i-1 is extended
• Useful for variable frame rate streams, e.g. for subtitles
33
Institut Mines-Télécom
ISOBMFF Movie Timeline
 Expresses times for the presentation of the movie from its tracks
•
•
Main use for multi-tracks synchronization
─ E.g. CTStrack1(AU0)  CTStrack2(AU0)
Useful for single track presentation when not all samples are
presented
─ E.g. start presenting track 1 from the 3rd second
 Boxes involved
•
•
•
“mvhd”: gives the timescale for movie times
“edts”: gives instructions for presenting a track
“sidx”: timing for random access points in segments
 Track media times (i.e. CTS) are mapped onto the Movie Timeline
•
Default: simple, linear mapping
T(movie, track(i), sample(k)) =
CTS(track(i), sample(k)) * mvhd.timescale/mdhd.timescale
•
34
Complex, non-linear mapping with the concept of Edit List
Institut Mines-Télécom
ISOBMFF Edit List
 List of edit entries
•
type, media time, movie duration
 Types of edit entries
•
•
•
dwell (frame freeze),
empty (no presentation),
Normal playback (with possible speed changes)
 Media times do not necessarily align with frame start
times !
empty
0
normal
T1
dwell
T2
normal
T(movie, track(i))
T3
CTS (track(i))
0
35
Institut Mines-Télécom
Typical Edit List example
 Use case: File with 1 audio and 1 video stream
•
Case 1: video with B frames
─ Presentation timestamp of the first video frame:
•
•
CTSvideo(AU0) >0
Case 2: first audio AU not produced at the same time as the first
video AU
─ For each track DTS(AU0) is by definition 0!
 Option 1: audio shift
•
•
Video track: no edit list
Audio track:
─ One « empty » edit list entry from 0 to T(movie, F0)
─ One « normal » entry from CTS=0 to the end of the track
•
Drawback: need to apply the edit for every new track (subtitles …)
 Option 2: video shift
•
•
36
Audio track: no edit list
Video track: one « normal » entry from CTS(F0) to the end of the
media
Institut Mines-Télécom
Sample Groups
 Mechanism to store additional sample signaling
•
A box providing descriptors (sgpd)
─ Located in stbl: global for the entire track
─ Located in traf: applicable only for the fragment
•
A box providing which descriptor apply to which sample (sbgp)
─ Located in stbl
─ Located in traf
 Grouping types
•
Each descriptor/group is identified by a 4CC
─ Ex: ‘rap ’, ‘roll’, ’prol’
•
Contains several entries
─ Ex: one entry per « roll » distance
•
The mapping sample->group can be parametrized
─ ex: view priority in MVC
 In practice
•
37
Smooth Streaming imposes that descriptions be on fragments only
Institut Mines-Télécom
Sample Auxiliary Information
 Possibility to add data associated to a sample
• Not used directly by the decoder
─ Ex: data for decrypting the sample
• Without having to define a new track
• Can be used in fragmented and non-fragmented mode
 Tools
•
•
•
•
38
Data type indicated by a 4CC (ex: « cenc »)
Box ‘saiz’ indicates the additional data sizes
Box ‘saio’ indicates the additional data offsets
Additional sample data shall be in the same file as
sample data
Institut Mines-Télécom
Random Access Points
 ‘sync’ box
•
•
If absent, all samples are RAP (e.g. audio streams)
If present, RAP are signaled (I-frame, IDR)
 Sample Groups
•
•
•
« rap »: non-IDR intra frames
« roll »: nb samples to decode until perfect reconstruction is
reached
« prol »: audio, nb samples before this sample to decode to
produce perfect reconstruction for this sample
 Independent and Disposable Samples
•
•
•
•
is_leading: signaling for open GOP
sample_depends_on: signaling for I frames or not
sample_is_depended_on: signaling of reference frames
sample_has_redundancy: redundant coding
─ Can be used to signal duplicated samples for text streams
39
Institut Mines-Télécom
Dependencies between tracks
 Track reference: ‘tref’
•
•
track N uses or refers to track(s) K
Examples
─ ‘hint’: hint track for one or more media track
─ ‘chap’: chapter track for the referenced media tracks
─ ‘scal’: scalable track using data from media tracks
 Track group: ‘trgr’
•
Tracks in the same group share a common feature
─ e.g. may be subtitle tracks
 Track Selection: ‘tsel’
•
Provides selection information for alternate tracks
─ E.g. for scalable tracks
40
Institut Mines-Télécom
ISOBMFF for NALU-based streams (part 15)
 Principles
• Reuses the ISOBMFF structures
• Defines sample description entry format and sample format
 Sample format
• Each NALU preceded by a 1, 2, or 4 bytes
─ Exact size given in the sample description entry
• No start codes
• Random Access Points
─ « SyncSample »: only IDR
─ SampleGroup ‘rap ’ for all the others
41
Institut Mines-Télécom
AVC/HEVC specificities
 Sample description entry formats
• ‘avc1’/’hvc1’
─ All Parameter Sets in a box ‘avcC’/‘hvcC’
─ Decoder configurations cannot be changed without a new
‘moov’!
• ‘avc3’/‘hev1’
─ Parameter Sets in IDR or in the sample description entry
 Specific boxes ‘avcC’ ou ‘hvcC’
• profile tier level, chroma format & bit depth
• PPS & SPS lists
• HEVC: + VPS & SEI messages
42
Institut Mines-Télécom
Storage of Interactive Applications in ISO
 Static Interactive Application
• Storage as metadata
• Ex: HTML, XML, …
 Dynamic, Timed Application
• Storage as track
• Ex:
─ MPEG-4 Scene Description Streams,
─ 3GPP DIMS Streams
 Case of Images
• Image as Tracks
• Image as Metadata item
43
Institut Mines-Télécom