shithub: opus

--- a/doc/draft-terriberry-oggopus.xml

+++ b/doc/draft-terriberry-oggopus.xml

@@ -51,7 +51,7 @@

 </address>

 </author>

-<date day="3" month="July" year="2012"/>

+<date day="16" month="July" year="2012"/>

 <area>RAI</area>

 <workgroup>codec</workgroup>

@@ -141,7 +141,7 @@

  (ID) header, which uniquely identifies a stream as Opus audio.

 The format of this header is defined in <xref target="id_header"/>.

 It MUST be placed alone (without any other packet data) on the first page of

- the logical Ogg bitstream.

+ the logical Ogg bitstream, and MUST complete on that page.

 This page MUST have its 'beginning of stream' flag set.

 </t>

<t>

@@ -164,9 +164,9 @@

  logical Ogg bitstream.

 </t>

<t>

-The first N-1 Opus packets, if any, are packed one after another in sequence

- into the Ogg packet, using the self-delimiting framing from Appendix&nbsp;B

- of <xref target="RFCOpus"/>.

+The first N-1 Opus packets, if any, are packed one after another into the Ogg

+ packet, using the self-delimiting framing from Appendix&nbsp;B of

+ <xref target="RFCOpus"/>.

 The remaining Opus packet is packed at the end of the Ogg packet using the

  regular, undelimited framing from Section&nbsp;3 of <xref target="RFCOpus"/>.

 All of the Opus packets in a single Ogg packet MUST be constrained to have the

@@ -244,6 +244,7 @@

  not transmitted.

 </t>

+<section anchor="preskip" title="Pre-skip">

<t>

 There is some amount of latency introduced during the decoding process, to

  allow for overlap in the MDCT modes, stereo mixing in the LP modes, and

@@ -269,7 +270,9 @@

 This amount need not be a multiple of 2.5&nbsp;ms, may be smaller than a single

  packet, or may span the contents of several packets.

 </t>

+</section>
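Because the pre-skip may be smaller than one packet or span several, a decoder can consume it incrementally from each decoded buffer. The following C helper is a sketch of that bookkeeping. It assumes decoding at 48&nbsp;kHz, so that the pre-skip count and decoded sample counts are in the same units. `preskip_state` and `preskip_apply` are hypothetical names, not part of any library API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state: pre-skip still pending, in samples per channel.
 * ASSUMPTION: the decoder runs at 48 kHz, the unit of the pre-skip field. */
typedef struct {
  uint32_t preskip_remaining;
} preskip_state;

/* Given a buffer of `nsamples` decoded samples per channel, returns how many
 * leading samples must be discarded, and reduces the pending pre-skip by that
 * amount. A return value equal to `nsamples` discards the whole buffer. */
static size_t preskip_apply(preskip_state *st, size_t nsamples) {
  size_t skip = st->preskip_remaining < nsamples
                    ? (size_t)st->preskip_remaining
                    : nsamples;
  st->preskip_remaining -= (uint32_t)skip;
  return skip;
}
```

With a pre-skip of 1000 samples and 960-sample (20&nbsp;ms) packets, the first call discards the entire first packet, the second discards the first 40 samples of the next packet, and later calls discard nothing.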

+<section anchor="pcm_sample_position" title="PCM Sample Position">

<t>

 The PCM sample position is determined from the granule position using the

  formula

@@ -306,7 +309,7 @@

<t>

 Vorbis streams use a granule position smaller than the number of audio samples

  contained in the first audio data page to indicate that some of those samples

- must be trimmed from the output. See <xref target="vorbis-trim"/>.

+ must be trimmed from the output (see <xref target="vorbis-trim"/>).

 However, to do so, Vorbis requires that the first audio data page contains

  exactly two packets, in order to allow the decoder to perform PCM position

  adjustments before needing to return any PCM data.

@@ -315,7 +318,9 @@

  large packets in streams with a very large number of channels might not fit on

  a single page.

 </t>

+</section>

+<section anchor="end_trimming" title="End Trimming">

<t>

 The page with the 'end of stream' flag set MAY have a granule position that

  indicates the page contains less audio data than would normally be returned by

@@ -330,7 +335,10 @@

 The number of discarded samples SHOULD be no larger than the number decoded

  from the last packet.

 </t>

+</section>
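The amount to trim can be derived by comparing the final granule position with the granule position of the preceding page, which marks the end of the audio before the packets that complete on the final page. The sketch below is illustrative only. The function name and inputs are hypothetical, sample counts are assumed to be at 48&nbsp;kHz, and a final granule position smaller than the previous one is treated as an error.

```c
#include <stdint.h>

/* Hypothetical helper (48 kHz sample counts assumed).
 * prev_granpos: granule position of the page before the final page
 * eos_granpos:  granule position of the page with the 'end of stream' flag
 * decoded:      samples decoded from packets completing on the final page
 * Returns the number of samples to discard from the end of that audio,
 * 0 if no trimming is indicated, or -1 if the final granule position is
 * smaller than the previous one (treated as an error in this sketch). */
static int64_t end_trim_count(int64_t prev_granpos, int64_t eos_granpos,
                              int64_t decoded) {
  int64_t keep = eos_granpos - prev_granpos;
  if (keep < 0) return -1;
  if (keep >= decoded) return 0;
  return decoded - keep;
}
```

For example, if the previous page ends at granule position 48000, the final page's packets decode to 960 samples, and the final granule position is 48500, the decoder keeps 500 samples and discards 460.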

+<section anchor="start_granpos_restrictions"

+ title="Restrictions on the Initial Granule Position">

<t>

 The granule position of the first audio data page with a completed packet MAY

  be larger than the number of samples contained in packets that complete on

@@ -367,6 +375,32 @@

 </t>

 </section>

+<section anchor="seeking_and_preroll" title="Seeking and Pre-roll">

+<t>

+Seeking in Ogg files is best performed using a bisection search for a page

+ whose granule position corresponds to a PCM position at or before the seek

+ target.

+With appropriately weighted bisection, accurate seeking can be performed with

+ just three or four bisections even in multi-gigabyte files.

+See <xref target="seeking"/> for general implementation guidance.

+</t>
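The bisection can be sketched as follows. As a simplification, it searches an in-memory array of hypothetical `page_entry` records sorted by granule position instead of reading page headers from the file. It also uses plain midpoint bisection rather than the weighted variant mentioned above. Granule positions are converted to PCM positions by subtracting the pre-skip.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page index entry; a real demuxer bisects over byte offsets
 * in the file and reads page headers as it goes. */
typedef struct {
  int64_t byte_offset;
  int64_t granulepos;
} page_entry;

/* Plain (unweighted) bisection: returns the index of the last page whose
 * PCM position (granule position minus pre-skip) is at or before `target`,
 * or -1 if every page lies after the target. `pages` must be sorted by
 * granule position. */
static ptrdiff_t find_seek_page(const page_entry *pages, size_t npages,
                                int64_t target, int64_t preskip) {
  size_t lo = 0, hi = npages; /* candidate range is [lo, hi) */
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (pages[mid].granulepos - preskip <= target)
      lo = mid + 1; /* mid qualifies; look for a later page */
    else
      hi = mid;
  }
  return (ptrdiff_t)lo - 1;
}
```

A page's granule position marks the end of the last packet completed on that page, so decoding would resume with the packets that follow the returned page.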

+<t>

+When seeking within an Ogg Opus stream, the decoder SHOULD start decoding (and

+ discarding the output) at least 3840&nbsp;samples (80&nbsp;ms) prior to the

+ seek target in order to ensure that the output audio is correct by the time it

+ reaches the seek target.

+This 'pre-roll' is separate from, and unrelated to, the 'pre-skip' used at the

+ beginning of the stream.

+If the point 80&nbsp;ms prior to the seek target comes before the initial PCM

+ sample position, the decoder SHOULD start decoding from the beginning of the

+ stream, applying pre-skip as normal, regardless of whether the pre-skip is

+ larger or smaller than 80&nbsp;ms.

+</t>

+</section>
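A minimal sketch of the pre-roll rule follows, assuming positions measured in 48&nbsp;kHz samples. The helper returns the PCM position at which decoding should begin. It returns -1 to signal a restart from the beginning of the stream with normal pre-skip applied. The function and macro names are hypothetical.

```c
#include <stdint.h>

/* 80 ms at 48 kHz, the minimum pre-roll recommended in the text. */
#define OPUS_PREROLL 3840

/* Hypothetical helper: returns the PCM position at which decoding (with the
 * output discarded) should begin for a seek to `target`. If that point falls
 * before the initial PCM sample position, returns -1 to signal that decoding
 * should restart from the beginning of the stream, applying pre-skip. */
static int64_t preroll_start(int64_t target, int64_t initial_pcm_pos) {
  int64_t start = target - OPUS_PREROLL;
  if (start < initial_pcm_pos) return -1;
  return start;
}
```

For a seek target of 48000 with an initial PCM sample position of 0, decoding would start at 44160. For a target of 1000, the pre-roll point falls before the start of the stream, so the decoder restarts from the beginning.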

+</section>

 <section anchor="headers" title="Header Packets">

<t>

 An Opus stream contains exactly two mandatory header packets.

@@ -473,12 +507,12 @@

 An Ogg Opus player SHOULD select the playback sample rate according to the

  following procedure:

 <list style="numbers">

-<t>If the hardware supports 48&nbsp;kHz playback, decode at 48&nbsp;kHz;</t>

-<t>Else, if the hardware's highest available sample rate is a supported

- rate, decode at this sample rate;</t>

-<t>Else, if the hardware's highest available sample rate is less than

- 48&nbsp;kHz, decode at the highest supported rate above this and resample;</t>

-<t>Else, decode at 48&nbsp;kHz and resample.</t>

+<t>If the hardware supports 48&nbsp;kHz playback, decode at 48&nbsp;kHz.</t>

+<t>Otherwise, if the hardware's highest available sample rate is a supported

+ rate, decode at this sample rate.</t>

+<t>Otherwise, if the hardware's highest available sample rate is less than

+ 48&nbsp;kHz, decode at the next higher supported rate above this and resample.</t>

+<t>Otherwise, decode at 48&nbsp;kHz and resample.</t>

 </list>

 However, the 'Input Sample Rate' field allows the encoder to pass the sample

  rate of the original input stream as metadata.
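The four-step playback rate procedure above might be implemented as in the following sketch. The inputs describing the playback hardware are hypothetical. The rate table lists the output rates the Opus decoder supports directly (8, 12, 16, 24, and 48&nbsp;kHz). Step&nbsp;3 is read as choosing the next supported rate above the hardware's maximum.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Output sample rates supported directly by the Opus decoder. */
static const int32_t kOpusRates[] = {8000, 12000, 16000, 24000, 48000};

/* Hypothetical inputs: whether the hardware supports 48 kHz, and its highest
 * available rate. Returns the decode rate; sets *resample when the decoded
 * audio must be resampled to hw_max_rate for playback. */
static int32_t choose_decode_rate(bool hw_supports_48k, int32_t hw_max_rate,
                                  bool *resample) {
  size_t n = sizeof(kOpusRates) / sizeof(kOpusRates[0]);
  *resample = false;
  if (hw_supports_48k) return 48000;                      /* step 1 */
  for (size_t i = 0; i < n; i++)
    if (kOpusRates[i] == hw_max_rate) return hw_max_rate; /* step 2 */
  *resample = true;
  if (hw_max_rate < 48000) {                              /* step 3 */
    for (size_t i = 0; i < n; i++)
      if (kOpusRates[i] > hw_max_rate) return kOpusRates[i];
  }
  return 48000;                                           /* step 4 */
}
```

For hardware limited to 44.1&nbsp;kHz this selects 48&nbsp;kHz with resampling. For hardware limited to 22.05&nbsp;kHz it selects 24&nbsp;kHz with resampling.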

@@ -542,9 +576,28 @@

  allowed channel count.

 The details are described in <xref target="channel_mapping"/>.

 </t>

+<t><spanx style="strong">Channel Mapping Table</spanx>:

+This table defines the mapping from encoded streams to output channels.

+It is omitted when the channel mapping family is 0, but REQUIRED otherwise.

+Its contents are specified in <xref target="channel_mapping"/>.

+</t>

 </list>

 </t>

+<t>

+All fields in the ID header are REQUIRED, except for the channel mapping

+ table, which is omitted when the channel mapping family is 0.

+Implementations SHOULD reject ID headers which do not contain enough data for

+ these fields, even if they contain a valid Magic Signature.

+Future versions of this specification, even backwards-compatible versions,

+ might include additional fields in the ID header.

+If an ID header has a compatible major version but a larger minor version,

+ an implementation MUST NOT reject it for containing additional data not

+ specified here.

+However, implementations MAY reject streams in which the ID header does not

+ complete on the first page.

+</t>
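The length requirements above can be checked before any field is used. This sketch assumes the ID header field layout given in its comments. As a labeled assumption, it also treats the upper four bits of the version octet as the major version and accepts only major version 0. `id_header_ok` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Minimal length checks for an ID header packet, assuming the field layout
 * 'OpusHead' (8 octets), version (1), channel count (1), pre-skip (2),
 * input sample rate (4), output gain (2), mapping family (1), followed for
 * nonzero families by stream count (1), coupled count (1) and a channel
 * mapping table with one octet per output channel.
 * ASSUMPTION: the major version is the upper four bits of the version octet
 * and only major version 0 is understood here. */
static bool id_header_ok(const unsigned char *p, size_t len) {
  if (len < 19 || memcmp(p, "OpusHead", 8) != 0) return false;
  if ((p[8] >> 4) != 0) return false; /* incompatible major version */
  unsigned channels = p[9];
  unsigned family = p[18];
  if (family != 0) {
    /* Stream count, coupled count and mapping table are REQUIRED. */
    if (len < 21 + (size_t)channels) return false;
  }
  /* Trailing octets beyond the known fields are ignored rather than
   * rejected, so headers with a larger minor version are still accepted. */
  return true;
}
```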

 <section anchor="channel_mapping" title="Channel Mapping">

<t>

 An Ogg Opus stream allows mapping one number of Opus streams (N) to a possibly

@@ -658,9 +711,8 @@

 <vspace blankLines="1"/>

 Allowed numbers of channels: 1...8.<vspace/>

 Channel meanings depend on the number of channels.

-See <xref target="vorbis-mapping">the

- Vorbis mapping</xref> for the assignments from output channel number to

- specific speaker locations.

+See <xref target="vorbis-mapping"/> for the assignments from output channel

+ number to specific speaker locations.

 <vspace blankLines="1"/>

 </t>

 <t>Family&nbsp;255 (no defined channel meaning):

@@ -756,13 +808,13 @@

 <t><spanx style="strong">Vendor String</spanx> (variable length, UTF-8 vector):

 <vspace blankLines="1"/>

 This is a simple human-readable tag for vendor information, encoded as a UTF-8

- string.

+ string&nbsp;<xref target="RFC3629"/>.

 No terminating NUL octet is required.

 <vspace blankLines="1"/>

 This tag is intended to identify the codec encoder and encapsulation

- implementations, for tracing differences in technical behavior. The

- user-facing encoding application can use the 'ENCODER' user commment

- tag name to identify themselves.

+ implementations, for tracing differences in technical behavior.

+The user-facing encoding application can use the 'ENCODER' user comment tag

+ name to identify itself.

 <vspace blankLines="1"/>

 </t>

 <t><spanx style="strong">User Comment List Length</spanx> (32 bits, unsigned,

@@ -795,6 +847,17 @@

 </t>

<t>

+The vendor string length and user comment list length are REQUIRED, and

+ implementations SHOULD reject comment headers that do not contain enough data

+ for these fields, or that do not contain enough data for the corresponding

+ vendor string or user comments they describe.

+Making this check before allocating the associated memory to contain the data

+ may help prevent a possible Denial-of-Service (DoS) attack from small comment

+ headers that claim to contain strings longer than the entire packet or more

+ user comments than could possibly fit in the packet.

+</t>
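The checks described above can be sketched as follows. Every length field is compared against the octets remaining in the packet before it is trusted, so a small packet cannot cause a large allocation. The layout in the comments (32-bit little-endian lengths, as in Vorbis comments) is assumed, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit little-endian unsigned integer. */
static uint32_t read_le32(const unsigned char *p) {
  return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 |
         (uint32_t)p[3] << 24;
}

/* Validates every length field of a comment header against the data that
 * remains in the packet, before anything is allocated. Assumes the layout
 * 'OpusTags' (8 octets), vendor string length (32 bits), vendor string,
 * user comment list length (32 bits), then for each comment a 32-bit
 * length followed by that many octets. */
static bool comment_header_ok(const unsigned char *p, size_t len) {
  size_t pos = 8;
  if (len < 8 || memcmp(p, "OpusTags", 8) != 0) return false;
  if (len - pos < 4) return false;
  uint32_t vendor_len = read_le32(p + pos);
  pos += 4;
  if (vendor_len > len - pos) return false; /* longer than the packet */
  pos += vendor_len;
  if (len - pos < 4) return false;
  uint32_t ncomments = read_le32(p + pos);
  pos += 4;
  /* Each comment needs at least its 4-octet length field. */
  if (ncomments > (len - pos) / 4) return false;
  for (uint32_t i = 0; i < ncomments; i++) {
    if (len - pos < 4) return false;
    uint32_t clen = read_le32(p + pos);
    pos += 4;
    if (clen > len - pos) return false;
    pos += clen;
  }
  return true;
}
```

Because each user comment needs at least a 4-octet length field, a claimed comment count larger than the remaining octets divided by four is rejected without iterating over it.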

+<t>

 The user comment strings follow the NAME=value format described by

  <xref target="vorbis-comment"/> with the same recommended tag names.

 One new comment tag is introduced for Ogg Opus:

@@ -836,20 +899,12 @@

 That information should instead be stored in the ID header's 'output gain'

  field.

 </t>

 </section>

 </section>

-<section anchor="other_implementation_notes"

- title="Other Implementation Notes">

+<section anchor="packet_size_limits" title="Packet Size Limits">

<t>

-When seeking within an Ogg Opus stream, the decoder should start decoding (and

- discarding the output) at least 3840&nbsp;samples (80&nbsp;ms) prior to the

- seek point in order to ensure that the output audio is correct at the seek

- point.

-</t>

-<t>

 Technically valid Opus packets can be arbitrarily large due to the padding

  format, although the amount of non-padding data they can contain is bounded.

 These packets might be spread over a similarly enormous number of Ogg pages.

@@ -978,6 +1033,7 @@

 <references title="Normative References">

 <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml"?>

+<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3629.xml"?>

 <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3533.xml"?>

 <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.5334.xml"?>

 <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.6381.xml"?>

@@ -1033,6 +1089,16 @@

 <!--?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3550.xml"?-->

 <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.4732.xml"?>

+<reference anchor="seeking"

+ target="http://wiki.xiph.org/Seeking">

+<front>

+<title>Granulepos Encoding and How Seeking Really Works</title>

+<author initials="S." surname="Pfeiffer" fullname="Silvia Pfeiffer"/>

+<author initials="C." surname="Parker" fullname="Conrad Parker"/>

+<author initials="G." surname="Maxwell" fullname="Greg Maxwell"/>

+</front>

+</reference>

 <reference anchor="replay-gain"

  target="http://wiki.xiph.org/VorbisComment#Replay_Gain">