shithub: opus

ref: 917cd6e6ae4a00c8f63368b5b249cc793b96bb20
parent: 66767ee837b6bd545f982e6f89f563e02b507ea1
author: Timothy B. Terriberry <[email protected]>
date: Mon Oct 31 07:09:34 EDT 2011

Minor draft edits.

--- a/doc/draft-ietf-codec-opus.xml
+++ b/doc/draft-ietf-codec-opus.xml
@@ -174,7 +174,8 @@
 ]]></artwork>
 </figure>
 <t>
-With this definition, if lo&gt;hi, the lower bound is the one that is enforced.
+With this definition, if lo&nbsp;&gt;&nbsp;hi, the lower bound is the one that
+ is enforced.
 </t>
 </section>
 
@@ -280,7 +281,12 @@
  and requires an additional 5&nbsp;ms look-ahead for noise shaping estimation.
  A small additional delay (up to 1.2 ms) may be required for sampling rate conversion.
 Like Vorbis and many other modern codecs, SILK is inherently designed for
- variable-bitrate (VBR) coding, though the encoder can also produce constant-bitrate (CBR).
+ variable-bitrate (VBR) coding, though the encoder can also produce
+ constant-bitrate (CBR) streams.
+The version of SILK used in Opus is substantially modified from, and not
+ compatible with, the stand-alone SILK codec previously deployed by Skype.
+This document does not serve to define that format, but those interested in the
+ original SILK codec should see <xref target="SILK"/> instead.
 </t>
 
 <t>
@@ -487,20 +493,15 @@
 </section>
 
 <section anchor="modes" title="Internal Framing">
+
 <t>
-As described, the two layers can be combined in three possible operating modes:
-<list style="numbers">
-<t>An LP-only mode for use in low bitrate connections with an audio bandwidth
- of WB or less,</t>
-<t>A Hybrid (LP+MDCT) mode for SWB or FB speech at medium bitrates, and</t>
-<t>An MDCT-only mode for very low delay speech transmission as well as music
- transmission (NB to FB).</t>
-</list>
-</t>
-<t>
-A single packet may contain multiple audio frames.
-However, they must share a common set of parameters, including the operating
- mode, audio bandwidth, frame size, and channel count (mono vs. stereo).
+The Opus encoder produces "packets", which are each a contiguous set of bytes
+ meant to be transmitted as a single unit.
+The packets described here do not include such things as IP, UDP, or RTP
+ headers which are normally found in a transport-layer packet.
+A single packet may contain multiple audio frames, so long as they share a
+ common set of parameters, including the operating mode, audio bandwidth, frame
+ size, and channel count (mono vs. stereo).
 This section describes the possible combinations of these parameters and the
  internal framing used to pack multiple frames into a single packet.
 This framing is not self-delimiting.
@@ -536,6 +537,17 @@
 <t>
 The top five bits of the TOC byte, labeled "config", encode one of 32 possible
  configurations of operating mode, audio bandwidth, and frame size.
+As described, the LP layer and MDCT layer can be combined in three possible
+ operating modes:
+<list style="numbers">
+<t>An LP-only mode for use in low bitrate connections with an audio bandwidth
+ of WB or less,</t>
+<t>A Hybrid (LP+MDCT) mode for SWB or FB speech at medium bitrates, and</t>
+<t>An MDCT-only mode for very low delay speech transmission as well as music
+ transmission (NB to FB).</t>
+</list>
+The 32 possible configurations each identify which one of these operating modes
+ the packet uses, as well as the audio bandwidth and the frame size.
 <xref target="config_bits"/> lists the parameters for each configuration.
 </t>
 <texttable anchor="config_bits" title="TOC Byte Configuration Parameters">
@@ -1004,7 +1016,7 @@
 <t>
 Suppose there is a context with n symbols, identified with an index that ranges
  from 0 to n-1.
-The parameters needed to encode or decode a symbol in this context are
+The parameters needed to encode or decode symbol k in this context are
  represented by a three-tuple (fl[k],&nbsp;fh[k],&nbsp;ft), with
  0&nbsp;&lt;=&nbsp;fl[k]&nbsp;&lt;&nbsp;fh[k]&nbsp;&lt;=&nbsp;ft&nbsp;&lt;=&nbsp;65535.
 The values of this tuple are derived from the probability model for the
@@ -1032,7 +1044,7 @@
 Both val and rng are 32-bit unsigned integer values.
 The decoder initializes rng to 128 and initializes val to 127 minus the top 7
  bits of the first input octet.
-The remaining bit is saved for use in the renormalization procedure described
+It saves the remaining bit for use in the renormalization procedure described
  in <xref target="range-decoder-renorm"/>, which the decoder invokes
  immediately after initialization to read additional bits and establish the
  invariant that rng&nbsp;&gt;&nbsp;2**23.
@@ -5405,7 +5417,7 @@
 
 </section>
 
-<section anchor="switching" title="Mode Switching">
+<section anchor="switching" title="Configuration Switching">
 
 <!--TODO: Document mandated decoder resets and fix references to here-->