ref: e4689464eb5f1cc33f958a119deb3669919c529d
parent: eb8b3c2b07a761fdb3fcb6f7e39679ccfba595a6
author: Timothy B. Terriberry <[email protected]>
date: Mon Apr 23 20:37:04 EDT 2012
Addressing AD issues, including a description of the PVQ encoder and decoder
--- a/doc/draft-ietf-codec-opus.xml
+++ b/doc/draft-ietf-codec-opus.xml
@@ -943,7 +943,8 @@
They are reserved for future applications, such as in-band headers (containing
metadata, etc.).
Packets which violate these constraints may cause implementations of
- <em>this</em> specification to treat them as malformed, and discard them.
+ <spanx style="emph">this</spanx> specification to treat them as malformed, and
+ discard them.
</t>
<t>
These constraints are summarized here for reference:
@@ -1983,6 +1984,7 @@
]]></artwork>
</figure>
N.b., w1_Q13 is computed first here, because w0_Q13 depends on it.
+The constant 6554 is approximately 0.1 in Q16.
</t>
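As a quick check on that constant, the Q16 representation of 0.1 rounds to 6554:

```latex
0.1 \times 2^{16} = 6553.6 \approx 6554
```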
<texttable anchor="silk_stereo_weights_table"
@@ -2105,7 +2107,8 @@
A separate quantization gain is coded for each 5 ms subframe.
These gains control the step size between quantization levels of the excitation
signal and, therefore, the quality of the reconstruction.
-They are independent of the pitch gains coded for voiced frames.
+They are independent of and unrelated to the pitch contours coded for voiced
+ frames.
The quantization gains are themselves uniformly quantized to 6 bits on a
log scale, giving them a resolution of approximately 1.369 dB and a range
of approximately 1.94 dB to 88.21 dB.
@@ -2762,6 +2765,7 @@
w_Q9[k] = y + ((213*f*y)>>16)
]]></artwork>
</figure>
+The constant 46214 here is approximately the square root of 2 in Q15.
The cb1_Q8[] vector completely determines these weights, and they may be
tabulated and stored as 13-bit unsigned values (with a range of 1819 to 5227,
inclusive) to avoid computing them when decoding.
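The approximation is fairly loose. In Q15 the square root of 2 would be about 46341, so 46214 corresponds to roughly 1.4103:

```latex
\sqrt{2} \times 2^{15} \approx 46341, \qquad \frac{46214}{2^{15}} \approx 1.4103 \quad (\sqrt{2} \approx 1.4142)
```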
@@ -3453,6 +3457,7 @@
Then for each k from d_LPC-1 down to 0, if
abs(a32_Q24[k][k]) > 16773022, the filter is unstable and the
recurrence stops.
+The constant 16773022 here is approximately 0.99975 in Q24.
Otherwise, row k-1 of a32_Q24 is computed from row k as
<figure align="center">
<artwork align="center"><![CDATA[
@@ -4552,7 +4557,7 @@
4
e_Q23[i] __ b_Q7[k]
res[i] = --------- + \ res[i - pitch_lags[s] + 2 - k] * ------- .
- 8388608.0 /_ 128.0
+ 2.0**23 /_ 128.0
k=0
]]></artwork>
</figure>
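Read as code, the voiced-frame formula is a five-tap filter over past residual samples centered one pitch lag back, plus the excitation scaled down from Q23. The following is a floating-point sketch of that formula only, not the reference decoder's fixed-point code. It assumes res points into a buffer that already holds at least pitch_lag + 2 earlier samples and that pitch_lag is pitch_lags[s] for the current subframe. The function name ltp_residual() is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: evaluate the voiced-frame formula for res[i].
 * Assumes res[] already holds at least pitch_lag + 2 samples before i.
 * e_Q23_i is the excitation sample in Q23; b_Q7 holds the five taps in Q7. */
static double ltp_residual(const double *res, int i, int pitch_lag,
                           int32_t e_Q23_i, const int b_Q7[5])
{
    double acc = e_Q23_i / 8388608.0; /* divide by 2**23 */
    int k;
    /* Taps k = 0..4 read res[i - pitch_lag + 2] down to res[i - pitch_lag - 2]. */
    for (k = 0; k <= 4; k++)
        acc += res[i - pitch_lag + 2 - k] * (b_Q7[k] / 128.0);
    return acc;
}
```

With taps {0, 0, 128, 0, 0} (1.0 on the center tap) the filter simply adds the sample exactly one pitch lag back to the excitation.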
@@ -4566,7 +4571,7 @@
<artwork align="center"><![CDATA[
e_Q23[i]
res[i] = ---------
- 8388608.0
+ 2.0**23
]]></artwork>
</figure>
</t>
@@ -5060,7 +5065,7 @@
boost contains the boost for this band. If boost is non-zero and dynalloc_logp
is greater than 2, decrease dynalloc_logp. Once this process has been
executed on all bands, the band boosts have been decoded. This procedure
-is implemented around line 2352 of celt.c.</t>
+is implemented around line 2469 of celt.c.</t>
<t>At very low rates it is possible that there won't be enough available
space to execute the inner loop even once. In these cases band boost
@@ -5067,7 +5072,7 @@
is not possible but its overhead is completely eliminated. Because of the
high cost of band boost when activated, a reasonable encoder should not be
using it at very low rates. The reference implements its dynalloc decision
-logic around line 1269 of celt.c.</t>
+logic around line 1299 of celt.c.</t>
<t>The allocation trim is a integer value from 0-10. The default value of
5 indicates no trim. The trim parameter is entropy coded in order to
@@ -5079,8 +5084,13 @@
the trim value to 5, then iff the count of decoded 8th bits so far (ec_tell_frac)
plus 48 (6 bits) is less than or equal to the total frame size in 8th
bits minus total_boost (a product of the above band boost procedure),
-decode the trim value using the inverse CDF {127, 126, 124, 119, 109, 87, 41, 19, 9, 4, 2, 0}.</t>
+decode the trim value using the PDF in <xref target="celt_trim_pdf"/>.</t>
+<texttable anchor="celt_trim_pdf" title="PDF for the Trim">
+<ttcol>PDF</ttcol>
+<c>{2, 2, 5, 10, 22, 46, 22, 10, 5, 2, 2}/128</c>
+</texttable>
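The inverse-CDF form that this patch removes and the PDF form that it introduces carry the same information. Each inverse-CDF entry is the total (128, i.e., 2**7) minus the running sum of the PDF up to and including that symbol, so the last entry is always 0. The following is a small C sketch of that conversion. The name pdf_to_icdf() is hypothetical and is not taken from the reference implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert a PDF whose entries are in units of
 * 1/2**ftb into inverse-CDF form, icdf[s] = 2**ftb - (pdf[0] + ... + pdf[s]).
 * For a valid PDF the entries sum to 2**ftb, so the last entry becomes 0. */
static void pdf_to_icdf(const unsigned *pdf, size_t n, unsigned ftb,
                        unsigned *icdf)
{
    unsigned total = 1u << ftb;
    unsigned sum = 0;
    size_t s;
    for (s = 0; s < n; s++) {
        sum += pdf[s];
        assert(sum <= total); /* running sum may never exceed the total */
        icdf[s] = total - sum;
    }
    assert(n == 0 || icdf[n - 1] == 0); /* full probability mass used */
}
```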
+
<t>For 10 ms and 20 ms frames using short blocks and that have at least LM+2 bits left prior to
the allocation process, then one anti-collapse bit is reserved in the allocation process so it can
be decoded later. Following the the anti-collapse reservation, one bit is reserved for skip if available.</t>
@@ -5188,7 +5198,30 @@
</t>
<t>
-The decoded vector is normalized such that its
+The decoded vector X is recovered as follows.
+Let i be the index decoded with the procedure in <xref target="ec_dec_uint"/>
+ with ft = V(N,K), so that 0 <= i < V(N,K).
+Let k = K.
+Then for j = 0 to (N - 1), inclusive, do:
+<list style="numbers">
+<t>Let p = (V(N-j-1,k) + V(N-j,k))/2.</t>
+<t>
+If i < p, then let sgn = 1, else let sgn = -1
+ and set i = i - p.
+</t>
+<t>Let k0 = k and set p = p - V(N-j-1,k).</t>
+<t>
+While p > i, set k = k - 1 and
+ p = p - V(N-j-1,k).
+</t>
+<t>
+Set X[j] = sgn*(k0 - k) and i = i - p.
+</t>
+</list>
+</t>
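The decoding steps above can be sketched in C as follows. This is an illustrative sketch, not the reference implementation. The names pvq_v(), pvq_decode(), and PVQ_MAX_K are hypothetical. V(N,K) is computed from the recurrence V(N,K) = V(N-1,K) + V(N,K-1) + V(N-1,K-1), with V(N,0) = 1 and V(0,K) = 0 for K > 0. It is recomputed on every call, which is slow but keeps the sketch short. Every V value is assumed to fit in 32 bits. The index i is taken as already decoded; the entropy decoding itself is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PVQ_MAX_K 64 /* hypothetical limit on K for this sketch */

/* V(n,k): number of integer vectors of dimension n with L1 norm k. */
static uint32_t pvq_v(int n, int k)
{
    uint32_t row[PVQ_MAX_K + 1]; /* row[kk] = V(m,kk) for the current m */
    int m, kk;
    assert(n >= 0 && k >= 0 && k <= PVQ_MAX_K);
    row[0] = 1;                  /* dimension 0: only the empty vector */
    for (kk = 1; kk <= k; kk++)
        row[kk] = 0;
    for (m = 1; m <= n; m++) {
        uint32_t diag = row[0];  /* V(m-1,kk-1), starting at kk = 1 */
        for (kk = 1; kk <= k; kk++) {
            uint32_t up = row[kk];              /* V(m-1,kk) */
            row[kk] = up + row[kk - 1] + diag;  /* row[kk-1] is V(m,kk-1) */
            diag = up;
        }
    }
    return row[k];
}

/* Decode index i, 0 <= i < V(n,k_total), into x[0..n-1]. */
static void pvq_decode(uint32_t i, int n, int k_total, int *x)
{
    int k = k_total;
    int j;
    for (j = 0; j < n; j++) {
        /* Step 1: p = number of completions with x[j] >= 0. */
        uint32_t p = (pvq_v(n - j - 1, k) + pvq_v(n - j, k)) / 2;
        int sgn = 1;
        int k0;
        /* Step 2: the upper part of the range holds the negative values. */
        if (i >= p) {
            sgn = -1;
            i -= p;
        }
        /* Step 3: p = number of completions with x[j] > 0. */
        k0 = k;
        p -= pvq_v(n - j - 1, k);
        /* Step 4: give x[j] one more pulse until i falls in its block. */
        while (p > i) {
            k--;
            p -= pvq_v(n - j - 1, k);
        }
        /* Step 5: emit x[j]; i becomes the index of the remaining vector. */
        x[j] = sgn * (k0 - k);
        i -= p;
    }
}
```

For N = 2 and K = 1, indices 0 through 3 decode to {1, 0}, {0, 1}, {0, -1}, and {-1, 0}. Here X[0] = 1 takes the lowest index, X[0] = 0 the middle two, and X[0] = -1 the highest.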
+
+<t>
+The decoded vector X is then normalized such that its
L2-norm equals one.
</t>
</section>
@@ -7204,6 +7237,32 @@
</t>
</section>
+<section anchor="cwrs-encoder" title="PVQ Encoding">
+
+<t>
+The vector to encode, X, is converted into an index i such that
+ 0 <= i < V(N,K) as follows.
+Let i = 0 and k = 0.
+Then for j = (N - 1) down to 0, inclusive, do:
+<list style="numbers">
+<t>
+If k > 0, set
+ i = i + (V(N-j-1,k-1) + V(N-j,k-1))/2.
+</t>
+<t>Set k = k + abs(X[j]).</t>
+<t>
+If X[j] < 0, set
+ i = i + (V(N-j-1,k) + V(N-j,k))/2.
+</t>
+</list>
+</t>
+
+<t>
+The index i is then encoded using the procedure in
+ <xref target="encoding-ints"/> with ft = V(N,K).
+</t>
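The index computation above can be sketched in C as follows. This is an illustrative sketch, not the reference implementation. The names pvq_v(), pvq_encode(), and PVQ_MAX_K are hypothetical. V(N,K) comes from the recurrence V(N,K) = V(N-1,K) + V(N,K-1) + V(N-1,K-1), with V(N,0) = 1 and V(0,K) = 0 for K > 0, and all V values are assumed to fit in 32 bits. The final step of entropy coding i with ft = V(N,K) is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PVQ_MAX_K 64 /* hypothetical limit on K for this sketch */

/* V(n,k): number of integer vectors of dimension n with L1 norm k. */
static uint32_t pvq_v(int n, int k)
{
    uint32_t row[PVQ_MAX_K + 1]; /* row[kk] = V(m,kk) for the current m */
    int m, kk;
    assert(n >= 0 && k >= 0 && k <= PVQ_MAX_K);
    row[0] = 1;                  /* dimension 0: only the empty vector */
    for (kk = 1; kk <= k; kk++)
        row[kk] = 0;
    for (m = 1; m <= n; m++) {
        uint32_t diag = row[0];  /* V(m-1,kk-1), starting at kk = 1 */
        for (kk = 1; kk <= k; kk++) {
            uint32_t up = row[kk];              /* V(m-1,kk) */
            row[kk] = up + row[kk - 1] + diag;  /* row[kk-1] is V(m,kk-1) */
            diag = up;
        }
    }
    return row[k];
}

/* Map x[0..n-1] to an index 0 <= i < V(n,K), where K = sum of |x[j]|. */
static uint32_t pvq_encode(const int *x, int n)
{
    uint32_t i = 0;
    int k = 0; /* pulses in x[j+1..n-1], already folded into i */
    int j;
    for (j = n - 1; j >= 0; j--) {
        /* Step 1: offset that depends only on the pulses already coded. */
        if (k > 0)
            i += (pvq_v(n - j - 1, k - 1) + pvq_v(n - j, k - 1)) / 2;
        /* Step 2: include this coefficient's pulses. */
        k += abs(x[j]);
        /* Step 3: negative values sit above all completions with x[j] >= 0. */
        if (x[j] < 0)
            i += (pvq_v(n - j - 1, k) + pvq_v(n - j, k)) / 2;
    }
    return i;
}
```

With N = 2, the vectors {1, 0}, {0, 1}, {0, -1}, and {-1, 0} map to indices 0, 1, 2, and 3. Because the loop runs from the last coefficient down to the first, k at the start of each iteration counts only the pulses already folded into i, those in X[j+1] through X[N-1].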
+
+</section>
</section>