ref: 84846910c5133b2f53833c2c6a7a56add6de6df4
parent: 4a7027b27e2d962dedc63360a45db5ff74dc1131
author: Jean-Marc Valin <[email protected]>
date: Thu Oct 27 11:34:21 EDT 2011
draft: CELT encoder description for tf_analysis() and spreading_decision()
--- a/doc/draft-ietf-codec-opus.xml
+++ b/doc/draft-ietf-codec-opus.xml
@@ -5778,17 +5778,17 @@
<figure>
<artwork>
<![CDATA[
- +----------+ +-------+
- | sample | | SILK |
- +->| rate |--->|encoder|--+
- +-----------+ | |conversion| | | |
- | Optional | | +----------+ +-------+ | +-------+
--->| high-pass |---+ +--->| Range |
- + filter + | +------------+ +-------+ |encoder|---->
- +-----------+ | | Delay | | CELT | +--->| | bitstream
- +->|compensation|->|encoder|--+ +-------+
- | | | |
- +------------+ +-------+
+ +----------+ +-------+
+ | sample | | SILK |
+ +->| rate |--->|encoder|--+
+ +-----------+ | |conversion| | | |
+ | Optional | | +----------+ +-------+ | +-------+
+->| high-pass |--+ +-->| Range |
+ + filter + | +------------+ +-------+ |encoder|---->
+ +-----------+ | | Delay | | CELT | +-->| | bit-
+ +->|compensation|->|encoder|--+ +-------+ stream
+ | | | |
+ +------------+ +-------+
]]>
</artwork>
</figure>
@@ -6388,7 +6388,7 @@
</t>
<section anchor="pitch-prefilter" title="Pitch Prefilter">
-<t>The pitch prefilter is applied after the pre-emphasis and before the de-emphasis. It's applied
+<t>The pitch prefilter is applied after the pre-emphasis. It is applied
in such a way as to be the inverse of the decoder's post-filter. The main non-obvious aspect of the
prefilter is the selection of the pitch period. The pitch search should be optimised for the
following criteria:
@@ -6425,7 +6425,31 @@
</t>
</section> <!-- Energy quant -->
+<section title="Time-Frequency Decision">
+<t>
+The choice of time-frequency resolution used in <xref target="tf-change"></xref> is based on
+rate-distortion (RD) optimization. The distortion is the L1 norm (sum of absolute values) of each band
+after applying each TF resolution under consideration. The L1 norm is used because it represents the entropy
+for a Laplacian source. The number of bits required to code a change in TF resolution between
+two bands is higher than the cost of having those two bands use the same resolution, which is
+what requires the RD optimization. The optimal decision is computed using the Viterbi algorithm.
+See tf_analysis() in celt/celt.c.
+</t>
+</section>
+<section title="Spreading Values Decision">
+<t>
+The choice of the spreading value in <xref target="spread values"></xref> has an
+impact on the nature of the coding noise introduced by CELT. The larger the f_r value, the
+lower the impact of the rotation, and the more tonal the coding noise. The
+more tonal the signal, the more tonal the noise should be, so the CELT encoder determines
+the optimal value for f_r by estimating how tonal the signal is. The tonality estimate
+is based on a discrete pdf (4-bin histogram) of each band. Bands that have a large number of small
+values are considered more tonal, and a decision is made by combining all bands with more than
+8 samples. See spreading_decision() in celt/bands.c.
+</t>
+</section>
+
<section anchor="pvq" title="Spherical Vector Quantization">
<t>CELT uses a Pyramid Vector Quantization (PVQ) <xref target="PVQ"></xref>
codebook for quantizing the details of the spectrum in each band that have not
@@ -6473,7 +6497,7 @@
<t>
The search described above is considered to be a good trade-off between quality
and computational cost. However, there are other possible ways to search the PVQ
-codebook and the implementers MAY use any other search methods.
+codebook, and implementers MAY use any other search method. See alg_quant() in celt/vq.c.
</t>
</section>