shithub: opus

ref: b3744613b7aa0d56913950740bc24db75c807a68
parent: 3527f9d4c4a15e2303a19d0a72564a32383d3a50
author: Timothy B. Terriberry <[email protected]>
date: Mon Jul 16 09:17:27 EDT 2012

Updates from mailing list and other small fixes.

* Bump the document date.
* Mandate that the ID header must complete on the first page (to
   remove any ambiguities about this requirement in RFC 3533).
* Remove redundant wording that rillian forgot to remove in 360a4117.
* Split the "Granule Position" section into subsections.
* Move the first paragraph of the "Other Implementation Notes"
   section into the "Granule Position" section, add general seeking
   implementation guidance, and be specific about the interaction
   between pre-roll and pre-skip.
* Retitle the remaining contents of the "Other Implementation Notes"
   section to "Packet Size Limits"
* Specify that all the header fields are REQUIRED (and add a
   description of the Channel Mapping Table as a whole, so we can
   say when it is REQUIRED).
* Specify that implementations MUST NOT reject headers with extra
   data if they have an unknown minor version number.
* Add a reference to RFC 3629 (UTF-8).
* Minor formatting adjustments to vorbis-trim and vorbis-mapping
   cites.
* Eliminate semicolons and terrible "Else, if" constructs.
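
Illustration only, not part of the patch: the pre-roll rule added to the new
"Seeking and Pre-roll" section could be sketched roughly as below. The names
plan_seek and SeekPlan are hypothetical, and the sketch assumes positions are
PCM sample positions at 48 kHz, already converted from granule positions.

```python
from typing import NamedTuple

PREROLL_SAMPLES = 3840  # 80 ms at 48 kHz, the minimum pre-roll in the draft


class SeekPlan(NamedTuple):
    from_start: bool  # True: decode from the first audio packet, applying pre-skip
    decode_from: int  # PCM sample position at which decoding should begin
    discard: int      # decoded output samples to drop before the seek target


def plan_seek(target_pcm: int, initial_pcm: int = 0) -> SeekPlan:
    """Decide where to start decoding for a seek to target_pcm (hypothetical helper)."""
    preroll_start = target_pcm - PREROLL_SAMPLES
    if preroll_start < initial_pcm:
        # Too close to the start: decode from the beginning and apply
        # pre-skip as normal, whatever its size relative to 80 ms.
        return SeekPlan(True, initial_pcm, max(0, target_pcm - initial_pcm))
    # Otherwise decode (and discard) at least 80 ms before the target.
    return SeekPlan(False, preroll_start, PREROLL_SAMPLES)
```

A real decoder can only start on a packet boundary, so it would begin at the
last packet starting at or before decode_from and discard correspondingly more.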

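Illustration only, not part of the patch: the length checks recommended for the
comment header (validate before allocating) might look like the sketch below.
The function name comment_header_fits is hypothetical. The sketch assumes the
8-octet "OpusTags" magic signature and 32-bit little-endian length fields from
the draft's comment header definition, which lies outside this patch.

```python
import struct


def comment_header_fits(packet: bytes) -> bool:
    """Return True if every declared length fits inside the packet."""
    # Magic signature (8) + vendor length (4) + comment list length (4).
    if len(packet) < 16 or packet[:8] != b"OpusTags":
        return False
    pos = 8
    (vendor_len,) = struct.unpack_from("<I", packet, pos)
    pos += 4
    # The vendor string plus the 4-octet list length must still fit.
    if vendor_len > len(packet) - pos - 4:
        return False
    pos += vendor_len
    (count,) = struct.unpack_from("<I", packet, pos)
    pos += 4
    # Each comment needs at least its own 4-octet length field.
    if count > (len(packet) - pos) // 4:
        return False
    for _ in range(count):
        if len(packet) - pos < 4:
            return False
        (length,) = struct.unpack_from("<I", packet, pos)
        pos += 4
        if length > len(packet) - pos:
            return False
        pos += length
    return True
```

Nothing is allocated in proportion to a declared length here, so a small packet
claiming huge strings or comment counts is rejected at constant memory cost.
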
--- a/doc/draft-terriberry-oggopus.xml
+++ b/doc/draft-terriberry-oggopus.xml
@@ -51,7 +51,7 @@
 </address>
 </author>
 
-<date day="3" month="July" year="2012"/>
+<date day="16" month="July" year="2012"/>
 <area>RAI</area>
 <workgroup>codec</workgroup>
 
@@ -141,7 +141,7 @@
  (ID) header, which uniquely identifies a stream as Opus audio.
 The format of this header is defined in <xref target="id_header"/>.
 It MUST be placed alone (without any other packet data) on the first page of
- the logical Ogg bitstream.
+ the logical Ogg bitstream, and MUST complete on that page.
 This page MUST have its 'beginning of stream' flag set.
 </t>
 <t>
@@ -164,9 +164,9 @@
  logical Ogg bitstream.
 </t>
 <t>
-The first N-1 Opus packets, if any, are packed one after another in sequence
- into the Ogg packet, using the self-delimiting framing from Appendix&nbsp;B
- of <xref target="RFCOpus"/>.
+The first N-1 Opus packets, if any, are packed one after another into the Ogg
+ packet, using the self-delimiting framing from Appendix&nbsp;B of
+ <xref target="RFCOpus"/>.
 The remaining Opus packet is packed at the end of the Ogg packet using the
  regular, undelimited framing from Section&nbsp;3 of <xref target="RFCOpus"/>.
 All of the Opus packets in a single Ogg packet MUST be constrained to have the
@@ -244,6 +244,7 @@
  not transmitted.
 </t>
 
+<section anchor="preskip" title="Pre-skip">
 <t>
 There is some amount of latency introduced during the decoding process, to
  allow for overlap in the MDCT modes, stereo mixing in the LP modes, and
@@ -269,7 +270,9 @@
 This amount need not be a multiple of 2.5&nbsp;ms, may be smaller than a single
  packet, or may span the contents of several packets.
 </t>
+</section>
 
+<section anchor="pcm_sample_position" title="PCM Sample Position">
 <t>
 The PCM sample position is determined from the granule position using the
  formula
@@ -306,7 +309,7 @@
 <t>
 Vorbis streams use a granule position smaller than the number of audio samples
  contained in the first audio data page to indicate that some of those samples
- must be trimmed from the output. See <xref target="vorbis-trim"/>.
+ must be trimmed from the output (see <xref target="vorbis-trim"/>).
 However, to do so, Vorbis requires that the first audio data page contains
  exactly two packets, in order to allow the decoder to perform PCM position
  adjustments before needing to return any PCM data.
@@ -315,7 +318,9 @@
  large packets in streams with a very large number of channels might not fit on
  a single page.
 </t>
+</section>
 
+<section anchor="end_trimming" title="End Trimming">
 <t>
 The page with the 'end of stream' flag set MAY have a granule position that
  indicates the page contains less audio data than would normally be returned by
@@ -330,7 +335,10 @@
 The number of discarded samples SHOULD be no larger than the number decoded
  from the last packet.
 </t>
+</section>
 
+<section anchor="start_granpos_restrictions"
+ title="Restrictions on the Initial Granule Position">
 <t>
 The granule position of the first audio data page with a completed packet MAY
  be larger than the number of samples contained in packets that complete on
@@ -367,6 +375,32 @@
 </t>
 </section>
 
+<section anchor="seeking_and_preroll" title="Seeking and Pre-roll">
+<t>
+Seeking in Ogg files is best performed using a bisection search for a page
+ whose granule position corresponds to a PCM position at or before the seek
+ target.
+With appropriately weighted bisection, accurate seeking can be performed with
+ just three or four bisections even in multi-gigabyte files.
+See <xref target="seeking"/> for general implementation guidance.
+</t>
+
+<t>
+When seeking within an Ogg Opus stream, the decoder SHOULD start decoding (and
+ discarding the output) at least 3840&nbsp;samples (80&nbsp;ms) prior to the
+ seek target in order to ensure that the output audio is correct by the time it
+ reaches the seek target.
+This 'pre-roll' is separate from, and unrelated to, the 'pre-skip' used at the
+ beginning of the stream.
+If the point 80&nbsp;ms prior to the seek target comes before the initial PCM
+ sample position, the decoder SHOULD start decoding from the beginning of the
+ stream, applying pre-skip as normal, regardless of whether the pre-skip is
+ larger or smaller than 80&nbsp;ms.
+</t>
+</section>
+
+</section>
+
 <section anchor="headers" title="Header Packets">
 <t>
 An Opus stream contains exactly two mandatory header packets.
@@ -473,12 +507,12 @@
 An Ogg Opus player SHOULD select the playback sample rate according to the
  following procedure:
 <list style="numbers">
-<t>If the hardware supports 48&nbsp;kHz playback, decode at 48&nbsp;kHz;</t>
-<t>Else, if the hardware's highest available sample rate is a supported
- rate, decode at this sample rate;</t>
-<t>Else, if the hardware's highest available sample rate is less than
- 48&nbsp;kHz, decode at the highest supported rate above this and resample;</t>
-<t>Else, decode at 48&nbsp;kHz and resample.</t>
+<t>If the hardware supports 48&nbsp;kHz playback, decode at 48&nbsp;kHz.</t>
+<t>Otherwise, if the hardware's highest available sample rate is a supported
+ rate, decode at this sample rate.</t>
+<t>Otherwise, if the hardware's highest available sample rate is less than
+ 48&nbsp;kHz, decode at the highest supported rate above this and resample.</t>
+<t>Otherwise, decode at 48&nbsp;kHz and resample.</t>
 </list>
 However, the 'Input Sample Rate' field allows the encoder to pass the sample
  rate of the original input stream as metadata.
@@ -542,9 +576,28 @@
  allowed channel count.
 The details are described in <xref target="channel_mapping"/>.
 </t>
+<t><spanx style="strong">Channel Mapping Table</spanx>:
+This table defines the mapping from encoded streams to output channels.
+It is omitted when the channel mapping family is 0, but REQUIRED otherwise.
+Its contents are specified in <xref target="channel_mapping"/>.
+</t>
 </list>
 </t>
 
+<t>
+All fields in the ID headers are REQUIRED, except for the channel mapping
+ table, which is omitted when the channel mapping family is 0.
+Implementations SHOULD reject ID headers which do not contain enough data for
+ these fields, even if they contain a valid Magic Signature.
+Future versions of this specification, even backwards-compatible versions,
+ might include additional fields in the ID header.
+If an ID header has a compatible major version, but a larger minor version,
+ an implementation MUST NOT reject it for containing additional data not
+ specified here.
+However, implementations MAY reject streams in which the ID header does not
+ complete on the first page.
+</t>
+
 <section anchor="channel_mapping" title="Channel Mapping">
 <t>
 An Ogg Opus stream allows mapping one number of Opus streams (N) to a possibly
@@ -658,9 +711,8 @@
 <vspace blankLines="1"/>
 Allowed numbers of channels: 1...8.<vspace/>
 Channel meanings depend on the number of channels.
-See <xref target="vorbis-mapping">the
- Vorbis mapping</xref> for the assignments from output channel number to
- specific speaker locations.
+See <xref target="vorbis-mapping"/> for the assignments from output channel
+ number to specific speaker locations.
 <vspace blankLines="1"/>
 </t>
 <t>Family&nbsp;255 (no defined channel meaning):
@@ -756,13 +808,13 @@
 <t><spanx style="strong">Vendor String</spanx> (variable length, UTF-8 vector):
 <vspace blankLines="1"/>
 This is a simple human-readable tag for vendor information, encoded as a UTF-8
- string.
+ string&nbsp;<xref target="RFC3629"/>.
 No terminating NUL octet is required.
 <vspace blankLines="1"/>
 This tag is intended to identify the codec encoder and encapsulation
- implementations, for tracing differences in technical behavior. The
- user-facing encoding application can use the 'ENCODER' user commment
- tag name to identify themselves.
+ implementations, for tracing differences in technical behavior.
+The user-facing encoding application can use the 'ENCODER' user comment tag
+ name to identify itself.
 <vspace blankLines="1"/>
 </t>
 <t><spanx style="strong">User Comment List Length</spanx> (32 bits, unsigned,
@@ -795,6 +847,17 @@
 </t>
 
 <t>
+The vendor string length and user comment list length are REQUIRED, and
+ implementations SHOULD reject comment headers that do not contain enough data
+ for these fields, or that do not contain enough data for the corresponding
+ vendor string or user comments they describe.
+Making this check before allocating the associated memory to contain the data
+ may help prevent a possible Denial-of-Service (DoS) attack from small comment
+ headers that claim to contain strings longer than the entire packet or more
+ user comments than could possibly fit in the packet.
+</t>
+
+<t>
 The user comment strings follow the NAME=value format described by
  <xref target="vorbis-comment"/> with the same recommended tag names.
 One new comment tag is introduced for Ogg Opus:
@@ -836,20 +899,12 @@
 That information should instead be stored in the ID header's 'output gain'
  field.
 </t>
-
 </section>
 
 </section>
 
-<section anchor="other_implementation_notes"
- title="Other Implementation Notes">
+<section anchor="packet_size_limits" title="Packet Size Limits">
 <t>
-When seeking within an Ogg Opus stream, the decoder should start decoding (and
- discarding the output) at least 3840&nbsp;samples (80&nbsp;ms) prior to the
- seek point in order to ensure that the output audio is correct at the seek
- point.
-</t>
-<t>
 Technically valid Opus packets can be arbitrarily large due to the padding
  format, although the amount of non-padding data they can contain is bounded.
 These packets might be spread over a similarly enormous number of Ogg pages.
@@ -978,6 +1033,7 @@
 <references title="Normative References">
 
 <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml"?>
+<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3629.xml"?>
 <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3533.xml"?>
 <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.5334.xml"?>
 <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.6381.xml"?>
@@ -1033,6 +1089,16 @@
 
 <!--?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3550.xml"?-->
 <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.4732.xml"?>
+
+<reference anchor="seeking"
+ target="http://wiki.xiph.org/Seeking">
+<front>
+<title>Granulepos Encoding and How Seeking Really Works</title>
+<author initials="S." surname="Pfeiffer" fullname="Silvia Pfeiffer"/>
+<author initials="C." surname="Parker" fullname="Conrad Parker"/>
+<author initials="G." surname="Maxwell" fullname="Greg Maxwell"/>
+</front>
+</reference>
 
 <reference anchor="replay-gain"
  target="http://wiki.xiph.org/VorbisComment#Replay_Gain">