ref: fc0276fad4ca6aa4b83230329f9fe5ad8b60a621
parent: 25c2f620b6de0f47a9d7e6e7859c035b31031122
author: Timothy B. Terriberry <[email protected]>
date: Tue Jul 7 07:25:42 EDT 2015
Update the oggopus draft.

This version resolves some issues with the packet size limits raised by Mark Harris.
--- a/doc/draft-ietf-codec-oggopus.xml
+++ b/doc/draft-ietf-codec-oggopus.xml
@@ -11,7 +11,7 @@
]>
<?rfc toc="yes" symrefs="yes" ?>
-<rfc ipr="trust200902" category="std" docName="draft-ietf-codec-oggopus-07">
+<rfc ipr="trust200902" category="std" docName="draft-ietf-codec-oggopus-08">
<front>
<title abbrev="Ogg Opus">Ogg Encapsulation for the Opus Audio Codec</title>
@@ -60,7 +60,7 @@
</address>
</author>
-<date day="28" month="April" year="2015"/>
+<date day="6" month="July" year="2015"/>
<area>RAI</area>
<workgroup>codec</workgroup>
@@ -923,9 +923,9 @@
<section anchor="downmix" title="Downmixing">
<t>
-An Ogg Opus player MUST play any Ogg Opus stream with a channel mapping family
- of 0 or 1, even if the number of channels does not match the physically
- connected audio hardware.
+An Ogg Opus player MUST support any valid channel mapping with a channel
+ mapping family of 0 or 1, even if the number of channels does not match the
+ physically connected audio hardware.
Players SHOULD perform channel mixing to increase or reduce the number of
channels as needed.
</t>
@@ -1181,6 +1181,16 @@
as desired.
</t>
+<t>
+The comment header can be arbitrarily large and might be spread over a large
+ number of Ogg pages.
+Decoders SHOULD avoid attempting to allocate excessive amounts of memory when
+ presented with a very large comment header.
+To accomplish this, decoders MAY reject a comment header larger than
+ 125,829,120 octets, and MAY ignore individual comments that are not fully
+ contained within the first 61,440 octets of the comment header.
+</t>
+
<section anchor="comment_format" title="Tag Definitions">
<t>
The user comment strings follow the NAME=value format described by
@@ -1262,20 +1272,26 @@
Technically, valid Opus packets can be arbitrarily large due to the padding
format, although the amount of non-padding data they can contain is bounded.
These packets might be spread over a similarly enormous number of Ogg pages.
-Encoders SHOULD use no more padding than is necessary to make a variable
- bitrate (VBR) stream constant bitrate (CBR).
+Encoders SHOULD limit the use of padding in audio data packets to no more than
+ is necessary to make a variable bitrate (VBR) stream constant bitrate (CBR).
+Decoders SHOULD reject audio data packets larger than 61,440 octets per Opus
+ stream.
+Such packets necessarily contain more padding than needed for this purpose.
Decoders SHOULD avoid attempting to allocate excessive amounts of memory when
presented with a very large packet.
-Decoders SHOULD reject packets larger than 60 kB per channel, and display
- a warning message, and MAY reject packets larger than 7.5 kB per channel.
+Decoders MAY reject or partially process audio data packets larger than
+ 61,440 octets in an Ogg Opus stream with channel mapping families 0
+ or 1.
+Decoders MAY reject or partially process audio data packets in any Ogg Opus
+ stream if the packet is larger than 61,440 octets and also larger than
+ 7,680 octets per Opus stream.
The presence of an extremely large packet in the stream could indicate a
memory exhaustion attack or stream corruption.
</t>
<t>
In an Ogg Opus stream, the largest possible valid packet that does not use
- padding has a size of (61,298*N - 2) octets, or about 60 kB per
- Opus stream.
-With 255 streams, this is 15,630,988 octets (14.9 MB) and can
+ padding has a size of (61,298*N - 2) octets.
+With 255 streams, this is 15,630,988 octets and can
span up to 61,298 Ogg pages, all but one of which will have a granule
position of -1.
This is of course a very extreme packet, consisting of 255 streams, each
@@ -1284,23 +1300,25 @@
efficient manner allowed (a VBR code 3 Opus packet).
Even in such a packet, most of the data will be zeros as 2.5 ms frames
cannot actually use all 1275 octets.
+</t>
+<t>
The largest packet consisting of entirely useful data is
- (15,326*N - 2) octets, or about 15 kB per stream.
+ (15,326*N - 2) octets.
This corresponds to 120 ms of audio encoded as 10 ms frames in either
SILK or Hybrid mode, but at a data rate of over 1 Mbps, which makes little
sense for the quality achieved.
-A more reasonable limit is (7,664*N - 2) octets, or about 7.5 kB
- per stream.
+</t>
+<t>
+A more reasonable limit is (7,664*N - 2) octets.
This corresponds to 120 ms of audio encoded as 20 ms stereo CELT mode
frames, with a total bitrate just under 511 kbps (not counting the Ogg
encapsulation overhead).
-With N=8, the maximum number of channels currently defined by mapping
- family 1, this gives a maximum packet size of 61,310 octets, or just
- under 60 kB.
-This is still quite conservative, as it assumes each output channel is taken
- from one decoded channel of a stereo packet.
-An implementation could reasonably choose any of these numbers for its internal
- limits.
+For channel mapping family 1, N=8 provides a reasonable upper bound, as it
+ allows for each of the 8 possible output channels to be decoded from a
+ separate stereo Opus stream.
+This gives a size of 61,310 octets, which is rounded up to a multiple of
+ 1,024 octets to yield the audio data packet size of 61,440 octets
+ that any implementation is expected to be able to process successfully.
</t>
</section>
@@ -1489,9 +1507,9 @@
<section anchor="Acknowledgments" title="Acknowledgments">
<t>
-Thanks to Greg Maxwell, Christopher "Monty" Montgomery, and Jean-Marc Valin for
- their valuable contributions to this document.
-Additional thanks to Andrew D'Addesio, Greg Maxwell, and Vincent Penqeurc'h for
+Thanks to Mark Harris, Greg Maxwell, Christopher "Monty" Montgomery, and
+ Jean-Marc Valin for their valuable contributions to this document.
+Additional thanks to Andrew D'Addesio, Greg Maxwell, and Vincent Penquerc'h for
their feedback based on early implementations.
</t>
</section>
@@ -1610,7 +1628,7 @@
</reference>
<reference anchor="vorbis-mapping"
- target="https://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-800004.3.9">
+ target="https://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-810004.3.9">
<front>
<title>The Vorbis I Specification, Section 4.3.9 Output Channel Order</title>
<author initials="C." surname="Montgomery"
@@ -1620,7 +1638,7 @@
</reference>
<reference anchor="vorbis-trim"
- target="https://xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-130000A.2">
+ target="https://xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-132000A.2">
<front>
<title>The Vorbis I Specification, Appendix A: Embedding Vorbis
into an Ogg stream</title>