shithub: opus

Download patch

ref: f26c35306410228f3870e136502620b4177b9112
parent: 81bb160d17c9542ad9cfba05067371b618d1719f
author: Timothy B. Terriberry <[email protected]>
date: Tue Feb 16 13:05:10 EST 2016

oggopus: Address Stephen Farrell's IESG comments.

- Clarify that 125,829,120 is just 120 MB.
- Add a figure to Section 3 of an example logical stream.
- Add a reference for Q notation.
- Refer to the downmixing figures in the text.
- Clarify that user comments are UTF-8.
- Clarify that the -573 and 111 gain values are examples.
- Add specific advice to implementors on areas that have security
   implications.

--- a/doc/draft-ietf-codec-oggopus.xml
+++ b/doc/draft-ietf-codec-oggopus.xml
@@ -164,8 +164,32 @@
 
 <section anchor="packet_organization" title="Packet Organization">
 <t>
-An Ogg Opus stream is organized as follows.
+An Ogg Opus stream is organized as follows (see
+ <xref target="packet-org-example"/> for an example).
 </t>
+
+<figure anchor="packet-org-example"
+ title="Example packet organization for a logical Ogg Opus stream"
+ align="center">
+<artwork align="center"><![CDATA[
+    Page 1         Pages 2 ... n        Pages (n+1) ...
+ +------------+ +---+ +---+ ... +---+ +-----------+ +---------+ +--
+ |            | |   | |   |     |   | |           | |         | |
+ |+----------+| |+-----------------+| |+-------------------+ +-----
+ |||ID Header|| ||  Comment Header || ||Audio Data Packet 1| | ...
+ |+----------+| |+-----------------+| |+-------------------+ +-----
+ |            | |   | |   |     |   | |           | |         | |
+ +------------+ +---+ +---+ ... +---+ +-----------+ +---------+ +--
+ ^      ^                           ^
+ |      |                           |
+ |      |                           Mandatory Page Break
+ |      |
+ |      ID header is contained on a single page
+ |
+ 'Beggining Of Stream'
+]]></artwork>
+</figure>
+
 <t>
 There are two mandatory header packets.
 The first packet in the logical Ogg bitstream MUST contain the identification
@@ -739,7 +763,8 @@
 This is a gain to be applied when decoding.
 It is 20*log10 of the factor by which to scale the decoder output to achieve
  the desired playback volume, stored in a 16-bit, signed, two's complement
- fixed-point value with 8 fractional bits (i.e., Q7.8).
+ fixed-point value with 8 fractional bits (i.e.,
+ Q7.8&nbsp;<xref target="q-notation"/>).
 <vspace blankLines="1"/>
 To apply the gain, an implementation could use
 <figure align="center">
@@ -991,9 +1016,12 @@
 </t>
 
 <t>
-Implementations MAY use the following matrices to implement downmixing from
- multichannel files using <xref target="channel_mapping_1">Channel Mapping
- Family 1</xref>, which are known to give acceptable results for stereo.
+Implementations MAY use the matrices in
+ Figures&nbsp;<xref target="downmix-matrix-3" format="counter"/>
+ through&nbsp;<xref target="downmix-matrix-8" format="counter"/> to implement
+ downmixing from multichannel files using
+ <xref target="channel_mapping_1">Channel Mapping Family 1</xref>, which are
+ known to give acceptable results for stereo.
 Matrices for 3 and 4 channels are normalized so each coefficient row sums
  to 1 to avoid clipping.
 For 5 or more channels they are normalized to 2 as a compromise between
@@ -1210,7 +1238,8 @@
 </t>
 <t>User Comment #i String (variable length, UTF-8 vector):
 <vspace blankLines="1"/>
-This field contains a single user comment string.
+This field contains a single user comment encoded as a UTF-8
+ string&nbsp;<xref target="RFC3629"/>.
 There is one for each user comment indicated by the 'user comment list length'
  field.
 </t>
@@ -1246,9 +1275,9 @@
 Implementations MUST avoid attempting to allocate excessive amounts of memory
  when presented with a very large comment header.
 To accomplish this, implementations MAY treat a stream as invalid if it has a
- comment header larger than 125,829,120&nbsp;octets, and MAY ignore individual
- comments that are not fully contained within the first 61,440&nbsp;octets of
- the comment header.
+ comment header larger than 125,829,120&nbsp;octets (120&nbsp;MB), and MAY
+ ignore individual comments that are not fully contained within the first
+ 61,440&nbsp;octets of the comment header.
 </t>
 
 <section anchor="comment_format" title="Tag Definitions">
@@ -1287,6 +1316,7 @@
  played as part of a particular collection of tracks.
 The gain is also a Q7.8 fixed point number in dB, as in the ID header's
  'output gain' field.
+The values '-573' and '111' given here are just examples.
 </t>
 <t>
 An Ogg Opus stream MUST NOT have more than one of each of these tags, and if
@@ -1522,6 +1552,56 @@
 </t>
 
 <t>
+Header parsing code contains the most likely area for potential overruns.
+It is important for implementations to ensure their buffers contain enough
+ data for all of the required fields before attempting to read it (for example,
+ for all of the channel map data in the ID header).
+Implementations would do well to validate the indices of the channel map, also,
+ to ensure they meet all of the restrictions outlined in
+ <xref target="channel_mapping"/>, in order to avoid attempting to read data
+ from channels that do not exist.
+</t>
+
+<t>
+To avoid excessive resource usage, we advise implementations to be especially
+ wary of streams that might cause them to process far more data than was
+ actually transmitted.
+For example, a relatively small comment header may contain values for the
+ string lengths or user comment list length that imply that it is many
+ gigabytes in size.
+Even computing the size of the required buffer could overflow a 32-bit integer,
+ and actually attempting to allocate such a buffer before verifying it would be
+ a reasonable size is a bad idea.
+After reading the user comment list length, implementations might wish to
+ verify that the header contains at least the minimum amount of data for that
+ many comments (4&nbsp;additional octets per comment, to indicate each has a
+ length of zero) before proceeding any further, again taking care to avoid
+ overflow in these calculations.
+If allocating an array of pointers to point at these strings, the size of the
+ pointers may be larger than 4&nbsp;octets, potentially requiring a separate
+ overflow check.
+</t>
+
+<t>
+Another bug in this class we have observed more than once involves the handling
+ of invalid data at the end of a stream.
+Often, implementations will seek to the end of a stream to locate the last
+ timestamp in order to compute its total duration.
+If they do not find a valid capture pattern and Ogg page from the desired
+ logical stream, they will back up and try again.
+If care is not taken to avoid re-scanning data that was already scanned, this
+ search can quickly devolve into something with a complexity that is quadratic
+ in the amount of invalid data.
+</t>
+
+<t>
+In general when seeking, implementations will wish to be cautious about the
+ effects of invalid granule position values, and ensure all algorithms will
+ continue to make progress and eventually terminate, even if these are missing
+ or out-of-order.
+</t>
+
+<t>
 Like most other container formats, Ogg Opus streams SHOULD NOT be used with
  insecure ciphers or cipher modes that are vulnerable to known-plaintext
  attacks.
@@ -1717,6 +1797,14 @@
 </front>
 </reference>
 
+<reference anchor="q-notation"
+ target="https://en.wikipedia.org/w/index.php?title=Q_%28number_format%29&amp;oldid=697252615">
+<front>
+<title>Q (number format)</title>
+<author><organization>Wikipedia</organization></author>
+<date month="December" year="2015"/>
+</front>
+</reference>
 
 <reference anchor="replay-gain"
  target="https://wiki.xiph.org/VorbisComment#Replay_Gain">