shithub: opus

--- a/doc/ietf/draft-valin-celt-codec.xml

+++ b/doc/ietf/draft-valin-celt-codec.xml

@@ -84,21 +84,9 @@

 speech and music and rates starting at 32 kbit/s. It is primarily designed for transmission

 over packet networks and protocols such as RTP <xref target="rfc3550"/>, but also includes

 a certain amount of robustness to bit errors, where this could be done at no significant

-cost. The codec features are:

+cost.

 </t>

-<t>

-<list style="symbols">

-<t>Ultra-low algorithmic delay (typically 3 to 9 ms)</t>

-<t>Full audio bandwidth (44.1 kHz and 48 kHz)</t>

-<t>Support for both voice and music</t>

-<t>Stereo support</t>

-<t>Packet loss concealment</t>

-<t>Constant bit-rates from 32 kbps to 128 kbps and above</t>

-<t>Free software/open-source/royalty-free</t>

-</list>

-</t>

 <t>The novel aspect of CELT compared to most other codecs is its very low delay,

 below 10 ms. There are two main advantages to having a very low delay audio link.

 The lower delay itself is important for some interactions, such as playing music

@@ -134,12 +122,21 @@

 </t>

 <t>CELT is a transform codec, based on the Modified Discrete Cosine Transform

-<xref target="mdct"/>, which is based on a DCT-IV, with overlap and time-domain

-aliasing calcellation.</t>

+<xref target="mdct"/>, derived from the DCT-IV, with overlap and time-domain

+aliasing cancellation. The main characteristics of CELT are as follows:

+<list style="symbols">

+<t>Ultra-low algorithmic delay (typically 3 to 9 ms)</t>

+<t>Full audio bandwidth (44.1 kHz and 48 kHz)</t>

+<t>Support for both speech and music</t>

+<t>Stereo support</t>

+<t>Robustness to packet loss</t>

+<t>Constant bit-rate from 32 kbps to 128 kbps and above</t>

+<t>Open source, with no known intellectual property issues</t>

+</list>

+</t>

 </section>

 <section anchor="CELT Modes" title="CELT Modes">

@@ -265,7 +262,7 @@

         <ttcol align='center'>P</ttcol>

         <ttcol align='center'>S</ttcol>

         <ttcol align='center'>F</ttcol>

-        <ttcol align='center'>Encoding</ttcol>

+        <ttcol align='right'>Encoding</ttcol>

         <c>0</c><c>0</c><c>0</c><c>1</c><c>00</c>

         <c>0</c><c>1</c><c>0</c><c>1</c><c>01</c>

         <c>1</c><c>0</c><c>0</c><c>1</c><c>110</c>

@@ -435,12 +432,15 @@

 the unit vector that results from the normalisation in

 <xref target="normalization"></xref>. Given a PVQ codevector y, the unit vector X is

 obtained as X = y/||y||, where ||.|| denotes the L2 norm. In the case where a pitch

-prediction or a folding vector P is used, the unit vector X becomes:

+prediction or a folding vector P is used, the quantized unit vector X' becomes:

 </t>

-<t>X = P + g_f * y,</t>

+<t>X' = P + g_f * y,</t>

 <t>where g_f = ( sqrt( (y^T*P)^2 + ||y||^2*(1-||P||^2) ) - y^T*P ) / ||y||^2. </t>

-<t>This is described in mix_pitch_and_residual() (<xref target="vq.c">vq.c</xref>).</t>

+<t>The combination of the pitch vector with the PVQ codeword is described in

+mix_pitch_and_residual() (<xref target="vq.c">vq.c</xref>) and is used in

+both the encoder and the decoder.

+</t>

<t>

@@ -447,10 +447,32 @@

 The search for the best codevector y is performed by alg_quant()

 (<xref target="vq.c">vq.c</xref>). There are several possible approaches to the

 search with a tradeoff between quality and complexity. The method used in the reference

-implementation consists of first projecting the residual signal R = X - P onto the codebook

-pyramid.

+implementation computes an initial codeword y0 by projecting the residual signal

+R = X - P onto the codebook pyramid of K-1 pulses:

 </t>

+<t>

+y0 = round_towards_zero( (K-1) * R / sum(abs(R)))

+</t>

+<t>

+Depending on N, K and the input data, the initial codeword y0 may contain from

+0 to K-1 non-zero values. All the remaining pulses, with the exception of the last one,

+are found iteratively with a greedy search that maximizes the normalised correlation

+between y and R, i.e. that minimizes:

+</t>

+<t>

+J = -R^T*y / ||y||

+</t>

+<t>

+The last pulse is the only one that takes the pitch vector into account; it is chosen to minimize the cost function <xref target="celt-tasl"></xref>:

+</t>

+<t>

+J = -2 * g_f * R^T*y + (g_f)^2 * ||y||^2 = ||R - g_f*y||^2 - ||R||^2

+</t>

 <section anchor="Index Encoding" title="Index Encoding">

<t>

 The best PVQ codeword is encoded by encode_pulses() (<xref target="cwrs.c">cwrs.c</xref>).

@@ -570,6 +592,8 @@

 </section>

+<!--

 <section anchor="Evaluation of CELT Implementations" title="Evaluation of CELT Implementations">

<t>

@@ -578,18 +602,7 @@

 </section>

-<section anchor="Issues that need to be addressed" title="Issues that need to be addressed">

-<t>

-<list>

-<t>Dynamic bit allocation</t>

-<t>Stereo coupling</t>

-</list>

-</t>

-</section>

+-->

 <section anchor="Acknowledgments" title="Acknowledgments">