shithub: opus

ref: e4689464eb5f1cc33f958a119deb3669919c529d
parent: eb8b3c2b07a761fdb3fcb6f7e39679ccfba595a6
author: Timothy B. Terriberry <[email protected]>
date: Mon Apr 23 20:37:04 EDT 2012

Addressing AD issues

Including a description of the PVQ encoder and decoder

--- a/doc/draft-ietf-codec-opus.xml
+++ b/doc/draft-ietf-codec-opus.xml
@@ -943,7 +943,8 @@
 They are reserved for future applications, such as in-band headers (containing
  metadata, etc.).
 Packets which violate these constraints may cause implementations of
- <em>this</em> specification to treat them as malformed, and discard them.
+ <spanx style="emph">this</spanx> specification to treat them as malformed, and
+ discard them.
 </t>
 <t>
 These constraints are summarized here for reference:
@@ -1983,6 +1984,7 @@
 ]]></artwork>
 </figure>
 N.b., w1_Q13 is computed first here, because w0_Q13 depends on it.
+The constant 6554 is approximately 0.1 in Q16.
 </t>
 
 <texttable anchor="silk_stereo_weights_table"
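The Q16 remark added in this hunk can be checked with a few lines of Python. This is only a sanity check. It assumes the usual convention that an integer v in Qn format stands for v/2**n. The helper name q_to_float is hypothetical and does not come from the draft or from libopus.

```python
def q_to_float(value, frac_bits):
    """Hypothetical helper: interpret an integer as a Q<frac_bits> fixed-point number."""
    return value / (1 << frac_bits)

# 0.1 * 65536 = 6553.6, so the nearest Q16 integer to 0.1 is 6554.
nearest_q16 = round(0.1 * (1 << 16))
print(nearest_q16, q_to_float(6554, 16))  # 6554 maps back to about 0.100006
```

In other words, 6554 is 0.1 rounded to the nearest Q16 integer.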
@@ -2105,7 +2107,8 @@
 A separate quantization gain is coded for each 5&nbsp;ms subframe.
 These gains control the step size between quantization levels of the excitation
  signal and, therefore, the quality of the reconstruction.
-They are independent of the pitch gains coded for voiced frames.
+They are independent of and unrelated to the pitch contours coded for voiced
+ frames.
 The quantization gains are themselves uniformly quantized to 6&nbsp;bits on a
  log scale, giving them a resolution of approximately 1.369&nbsp;dB and a range
  of approximately 1.94&nbsp;dB to 88.21&nbsp;dB.
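The quoted resolution is consistent with the quoted range if the 64 levels of a 6-bit index are assumed to be evenly spaced in dB. That gives 63 steps between the end points. Here is a minimal arithmetic check (the variable names are illustrative only):

```python
# A 6-bit index gives 64 levels, so 63 equal steps between the end points
# (uniform spacing on the dB scale is an assumption of this check).
levels = 1 << 6
lo_db, hi_db = 1.94, 88.21
step_db = (hi_db - lo_db) / (levels - 1)
print(f"{step_db:.3f} dB per step")
```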
@@ -2762,6 +2765,7 @@
 w_Q9[k] = y + ((213*f*y)>>16)
 ]]></artwork>
 </figure>
+The constant 46214 here is approximately the square root of 2 in Q15.
 The cb1_Q8[] vector completely determines these weights, and they may be
  tabulated and stored as 13-bit unsigned values (with a range of 1819 to 5227,
  inclusive) to avoid computing them when decoding.
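The two numeric claims in this hunk can also be checked quickly. This again assumes the Qn convention v/2**n, and the names are illustrative. The value 46214 in Q15 is about 1.41034, which is within roughly 0.3% of the square root of 2. The nearest Q15 integer would be 46341, so "approximately" is meant loosely here. The largest tabulated weight, 5227, needs exactly 13 bits as an unsigned value.

```python
import math

value_q15 = 46214
approx = value_q15 / (1 << 15)                      # about 1.41034
rel_err = abs(approx - math.sqrt(2)) / math.sqrt(2)  # relative error vs. sqrt(2)
nearest_q15 = round(math.sqrt(2) * (1 << 15))        # nearest Q15 integer to sqrt(2)

# Tabulated weights lie in 1819..5227 inclusive; bit_length() gives the unsigned width.
weight_bits = max(1819, 5227).bit_length()
print(approx, rel_err, nearest_q15, weight_bits)
```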
@@ -3453,6 +3457,7 @@
 Then for each k from d_LPC-1 down to 0, if
  abs(a32_Q24[k][k])&nbsp;&gt;&nbsp;16773022, the filter is unstable and the
  recurrence stops.
+The constant 16773022 here is approximately 0.99975 in Q24.
 Otherwise, row k-1 of a32_Q24 is computed from row k as
 <figure align="center">
 <artwork align="center"><![CDATA[
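The stability test in this hunk compares a Q24 diagonal coefficient against a fixed limit. The sketch below shows that comparison and checks that the limit is 0.99975 rounded to Q24. The name is_unstable is hypothetical, since the draft defines no such function.

```python
Q24_ONE = 1 << 24
STABILITY_LIMIT_Q24 = 16773022  # constant quoted in the draft

def is_unstable(a_kk_q24):
    """Hypothetical helper: True when |a32_Q24[k][k]| exceeds the limit, so the recurrence stops."""
    return abs(a_kk_q24) > STABILITY_LIMIT_Q24

print(STABILITY_LIMIT_Q24 / Q24_ONE)  # about 0.99975
print(round(0.99975 * Q24_ONE))       # nearest Q24 integer to 0.99975
```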
@@ -4552,7 +4557,7 @@
                       4
           e_Q23[i]   __                                  b_Q7[k]
 res[i] = --------- + \  res[i - pitch_lags[s] + 2 - k] * ------- .
-         8388608.0   /_                                   128.0
+          2.0**23    /_                                   128.0
                      k=0
 ]]></artwork>
 </figure>
@@ -4566,7 +4571,7 @@
 <artwork align="center"><![CDATA[
           e_Q23[i]
 res[i] = ---------
-         8388608.0
+          2.0**23
 ]]></artwork>
 </figure>
 </t>
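The two residual formulas above can be sketched in floating point as follows. The sketch assumes res is a plain list that already holds the past samples res[i - lag - 2] through res[i - lag + 2], meaning lag is large enough that all of these precede i and every index is non-negative. The function name and the voiced flag are hypothetical. The fixed-point details of the actual decoder are not reproduced.

```python
def ltp_residual_sample(e_q23, res, i, lag, b_q7, voiced=True):
    """Hypothetical helper returning res[i] per the formulas above.

    e_q23: excitation in Q23; res: past residual samples (indices must be >= 0);
    lag: pitch_lags[s] for the current subframe; b_q7: five LTP taps in Q7.
    """
    value = e_q23[i] / 2.0**23
    if voiced:
        # Five-tap filter centred on res[i - lag]: indices i-lag+2 down to i-lag-2.
        for k in range(5):
            value += res[i - lag + 2 - k] * b_q7[k] / 128.0
    return value

# Usage: with only the centre tap set to 1.0 (128 in Q7) and zero excitation,
# res[i] repeats the sample one pitch lag earlier.
history = [0.0] * 16
history[5] = 0.25
print(ltp_residual_sample([0] * 16, history, 10, 5, [0, 0, 128, 0, 0]))
```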
@@ -5060,7 +5065,7 @@
 boost contains the boost for this band. If boost is non-zero and dynalloc_logp
 is greater than 2, decrease dynalloc_logp.  Once this process has been
 executed on all bands, the band boosts have been decoded. This procedure
-is implemented around line 2352 of celt.c.</t>
+is implemented around line 2469 of celt.c.</t>
 
 <t>At very low rates it is possible that there won't be enough available
 space to execute the inner loop even once. In these cases band boost
@@ -5067,7 +5072,7 @@
 is not possible but its overhead is completely eliminated. Because of the
 high cost of band boost when activated, a reasonable encoder should not be
 using it at very low rates. The reference implements its dynalloc decision
-logic around line 1269 of celt.c.</t>
+logic around line 1299 of celt.c.</t>
 
 <t>The allocation trim is a integer value from 0-10. The default value of
 5 indicates no trim. The trim parameter is entropy coded in order to
@@ -5079,8 +5084,13 @@
 the trim value to 5, then iff the count of decoded 8th bits so far (ec_tell_frac)
 plus 48 (6 bits) is less than or equal to the total frame size in 8th
 bits minus total_boost (a product of the above band boost procedure),
-decode the trim value using the inverse CDF {127, 126, 124, 119, 109, 87, 41, 19, 9, 4, 2, 0}.</t>
+decode the trim value using the PDF in <xref target="celt_trim_pdf"/>.</t>
 
+<texttable anchor="celt_trim_pdf" title="PDF for the Trim">
+<ttcol>PDF</ttcol>
+<c>{1, 1, 2, 5, 10, 22, 46, 22, 10, 5, 2, 2}/128</c>
+</texttable>
+
 <t>For 10 ms and 20 ms frames using short blocks and that have at least LM+2 bits left prior to
 the allocation process, then one anti-collapse bit is reserved in the allocation process so it can
 be decoded later. Following the the anti-collapse reservation, one bit is reserved for skip if available.</t>
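The new PDF is the difference form of the inverse CDF that this hunk removes. The sketch below assumes a 7-bit (/128) inverse-CDF convention in which each entry is 128 minus the running total of the PDF. It shows that conversion and the quoted gating condition, where 48 eighth-bits equals 6 bits. The names icdf_to_pdf and trim_is_coded are hypothetical. Both lists have twelve entries, whereas the trim is described above as an integer from 0 to 10, so this sketch only confirms the arithmetic of the conversion.

```python
def icdf_to_pdf(icdf, ftb):
    """Convert an inverse CDF (entries are 2**ftb minus the running total) to a PDF."""
    prev = 1 << ftb
    pdf = []
    for v in icdf:
        pdf.append(prev - v)
        prev = v
    return pdf

old_icdf = [127, 126, 124, 119, 109, 87, 41, 19, 9, 4, 2, 0]  # removed by this patch
new_pdf = [1, 1, 2, 5, 10, 22, 46, 22, 10, 5, 2, 2]           # added, over 128

def trim_is_coded(tell_frac, total_bits_8th, total_boost):
    """Hypothetical helper for the gating condition quoted above (all values in 1/8 bits)."""
    return tell_frac + 48 <= total_bits_8th - total_boost

print(icdf_to_pdf(old_icdf, 7) == new_pdf)
```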
@@ -5188,7 +5198,30 @@
 </t>
 
 <t>
-The decoded vector is normalized such that its
+The decoded vector X is recovered as follows.
+Let i be the index decoded with the procedure in <xref target="ec_dec_uint"/>
+ with ft&nbsp;=&nbsp;V(N,K), so that 0&nbsp;&lt;=&nbsp;i&nbsp;&lt;&nbsp;V(N,K).
+Let k&nbsp;=&nbsp;K.
+Then for j&nbsp;=&nbsp;0 to (N&nbsp;-&nbsp;1), inclusive, do:
+<list style="numbers">
+<t>Let p&nbsp;=&nbsp;(V(N-j-1,k)&nbsp;+&nbsp;V(N-j,k))/2.</t>
+<t>
+If i&nbsp;&lt;&nbsp;p, then let sgn&nbsp;=&nbsp;1, else let sgn&nbsp;=&nbsp;-1
+ and set i&nbsp;=&nbsp;i&nbsp;-&nbsp;p.
+</t>
+<t>Let k0&nbsp;=&nbsp;k and set p&nbsp;=&nbsp;p&nbsp;-&nbsp;V(N-j-1,k).</t>
+<t>
+While p&nbsp;&gt;&nbsp;i, set k&nbsp;=&nbsp;k&nbsp;-&nbsp;1 and
+ p&nbsp;=&nbsp;p&nbsp;-&nbsp;V(N-j-1,k).
+</t>
+<t>
+Set X[j]&nbsp;=&nbsp;sgn*(k0&nbsp;-&nbsp;k) and i&nbsp;=&nbsp;i&nbsp;-&nbsp;p.
+</t>
+</list>
+</t>
+
+<t>
+The decoded vector X is then normalized such that its
 L2-norm equals one.
 </t>
 </section>
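The five decoding steps translate almost line for line into Python. This sketch assumes that V(N,K) is the number of length-N integer vectors whose absolute values sum to K. It computes V with the recurrence V(N,K) = V(N-1,K) + V(N,K-1) + V(N-1,K-1), where V(N,0) = 1 and V(0,K) = 0 for K > 0. The entropy decoding of i (ec_dec_uint) is not shown. normalize() works in floating point purely for illustration. V, pvq_decode and normalize are hypothetical names, not the reference decoder's functions.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def V(n, k):
    """Number of length-n integer vectors whose absolute values sum to k (assumed definition)."""
    if k == 0:
        return 1
    if n == 0:
        return 0
    return V(n - 1, k) + V(n, k - 1) + V(n - 1, k - 1)

def pvq_decode(i, n, K):
    """Recover the pulse vector X from its index i, following steps 1-5 above."""
    if not 0 <= i < V(n, K):
        raise ValueError("index out of range")
    x = []
    k = K
    for j in range(n):
        p = (V(n - j - 1, k) + V(n - j, k)) // 2   # step 1
        if i < p:                                    # step 2
            sgn = 1
        else:
            sgn = -1
            i -= p
        k0 = k                                       # step 3
        p -= V(n - j - 1, k)
        while p > i:                                 # step 4
            k -= 1
            p -= V(n - j - 1, k)
        x.append(sgn * (k0 - k))                     # step 5
        i -= p
    return x

def normalize(x):
    """Scale X to unit L2-norm (floating point, for illustration; requires K > 0)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]
```

For example, with N = 2 and K = 2 the eight indices 0 to 7 decode to [2, 0], [1, 1], [1, -1], [0, 2], [0, -2], [-2, 0], [-1, 1] and [-1, -1].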
@@ -7204,6 +7237,32 @@
 </t>
 </section>
 
+<section anchor="cwrs-encoder" title="PVQ Encoding">
+
+<t>
+The vector to encode, X, is converted into an index i such that
+ 0&nbsp;&lt;=&nbsp;i&nbsp;&lt;&nbsp;V(N,K) as follows.
+Let i&nbsp;=&nbsp;0 and k&nbsp;=&nbsp;0.
+Then for j&nbsp;=&nbsp;(N&nbsp;-&nbsp;1) down to 0, inclusive, do:
+<list style="numbers">
+<t>
+If k&nbsp;>&nbsp;0, set
+ i&nbsp;=&nbsp;i&nbsp;+&nbsp;(V(N-j-1,k-1)&nbsp;+&nbsp;V(N-j,k-1))/2.
+</t>
+<t>Set k&nbsp;=&nbsp;k&nbsp;+&nbsp;abs(X[j]).</t>
+<t>
+If X[j]&nbsp;&lt;&nbsp;0, set
+ i&nbsp;=&nbsp;i&nbsp;+&nbsp;(V(N-j-1,k)&nbsp;+&nbsp;V(N-j,k))/2.
+</t>
+</list>
+</t>
+
+<t>
+The index i is then encoded using the procedure in
+ <xref target="encoding-ints"/> with ft&nbsp;=&nbsp;V(N,K).
+</t>
+
+</section>
 
 </section>
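The encoding steps can be sketched the same way. The sketch uses the same assumed definition of V(N,K), repeated here so the block stands alone. The brute-force all_vectors helper is included only to show that, for small N and K, the indices produced cover 0 to V(N,K) - 1 exactly once. The final encoding of i with ft = V(N,K) is not shown, and all names are hypothetical.

```python
import itertools
from functools import lru_cache

@lru_cache(maxsize=None)
def V(n, k):
    """Number of length-n integer vectors whose absolute values sum to k (assumed definition)."""
    if k == 0:
        return 1
    if n == 0:
        return 0
    return V(n - 1, k) + V(n, k - 1) + V(n - 1, k - 1)

def pvq_encode(x):
    """Map a pulse vector X to its index i, following steps 1-3 above."""
    n = len(x)
    i = 0
    k = 0
    for j in range(n - 1, -1, -1):
        if k > 0:                                              # step 1
            i += (V(n - j - 1, k - 1) + V(n - j, k - 1)) // 2
        k += abs(x[j])                                         # step 2
        if x[j] < 0:                                           # step 3
            i += (V(n - j - 1, k) + V(n - j, k)) // 2
    return i

def all_vectors(n, K):
    """Enumerate every length-n integer vector with sum(|x|) == K (small n, K only)."""
    return [v for v in itertools.product(range(-K, K + 1), repeat=n)
            if sum(abs(c) for c in v) == K]
```

For instance, pvq_encode([0, 2]) returns 3 and pvq_encode([-1, -1]) returns 7, which agrees with the decoding procedure described earlier in this patch for N = 2, K = 2.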