ref: 9fe754cf8f028f98653c2958910e4ba001ba8892
parent: f7e5a8279dcccd01424356a91986d650a5e0db65
author: Jean-Marc Valin
date: Sat Jul 4 17:21:00 EDT 2009
ietf doc: more corrections
--- a/doc/ietf/draft-valin-celt-codec.xml
+++ b/doc/ietf/draft-valin-celt-codec.xml
@@ -123,15 +123,16 @@
</t>
<t>CELT is a transform codec, based on the Modified Discrete Cosine Transform
-<xref target="mdct"/>, derived from the DCT-IV, with overlap and time-domain
-aliasing cancellation. The main characteristics of CELT are as follows:
+(MDCT). The MDCT is derived from the DCT-IV by adding an overlap with time-domain
+aliasing cancellation <xref target="mdct"/>.
+The main characteristics of CELT are as follows:
<list style="symbols">
-<t>Ultra-low algorithmic delay (scalable, typically 3 to 9 ms)</t>
+<t>Ultra-low algorithmic delay (scalable, typically 4 to 9 ms)</t>
<t>Sampling rates from 32 kHz to 48 kHz and above (full audio bandwidth)</t>
-<t>Applicable to both speech and music</t>
+<t>Applicability to both speech and music</t>
<t>Support for mono and stereo</t>
-<t>Adaptive bit-rate from 32 kbit/s to 128 kbit/s and above</t>
+<t>Adaptive bit-rate from 32 kbit/s to 128 kbit/s and above, per channel</t>
<t>Scalable complexity</t>
<t>Robustness to packet loss (scalable trade-off between quality and loss-robustness)</t>
<t>Open source implementation (floating-point and fixed-point)</t>
@@ -142,7 +143,9 @@
<section anchor="bitstream" title="Bit-stream definition">
<t>
-This document contains a detailed description of both the encoder and the decoder, along with a reference implementation. In most circumstances, and unless otherwise stated, the calculations in other implementations do NOT need to produce results that are bit-identical with the reference implementation, so alternate algorithms can sometimes be used. However, there are a few (clearly identified) cases where bit-exactness is required. An implementation is considered to be compatible if, for any valid bit-stream, the decoder's output is perceptually very close to the output produced by the reference decoder.
+This document contains a detailed description of both the encoder and the decoder, along with a reference implementation. In most circumstances, and unless otherwise stated, the calculations
+do <spanx style="strong">not</spanx> need to produce results that are bit-identical with the reference implementation, so alternate algorithms can sometimes be used. However, there are a few (clearly identified) cases, such as the bit allocation, where bit-exactness with the reference
+implementation is required. An implementation is considered to be compatible if, for any valid bit-stream, the decoder's output is perceptually indistinguishable from the output produced by the reference decoder.
</t>
<t>
@@ -189,10 +192,10 @@
<t>
The CELT bit-stream is "octet-based" in the sense that the encoder always produces an
-integer number of octets when encoding a frame. Also, the bit-rate used by CELT can
-<spanx style="strong">only</spanx> be determined by the number of octets produced by
-the encoder. In many cases, the transport layer already encodes the data length, so
-no extra information is used to signal the bit-rate. In cases where this is not true,
+integer number of octets when encoding a frame. Also, the bit-rate used by the CELT encoder can
+<spanx style="strong">only</spanx> be determined by the number of octets produced.
+In many cases (e.g., UDP/RTP), the transport layer already encodes the data length, so
+no extra information is necessary to signal the bit-rate. In cases where this is not true,
or when there are multiple compressed frames per packet, the size of each compressed
frame MUST be signalled in some way.
</t>
@@ -259,8 +262,8 @@
current frame size and sample rate, using exact integer calculations. The reference
implementation
pre-computes these projections in compute_allocation_table() (<xref
-target="modes.c">modes.c</xref>) but implementations are free to use any
-approach which produces bit-identical allocation results.
+target="modes.c">modes.c</xref>) and any other implementation
+MUST produce bit-identical allocation results.
</t>
<t>
@@ -293,8 +296,9 @@
The basic block diagram of the CELT encoder is illustrated in <xref target="encoder-diagram"></xref>.
The encoder contains most of the building blocks of the decoder and can,
with very little extra computation, compute the signal that would be decoded by the decoder.
-CELT has three main quantizers denoted Q1, Q2 and Q3. These apply to band energies, pitch gains
-and normalized MDCT bins, respectively.
+CELT has three main quantizers denoted Q1, Q2 and Q3. These apply to band energies
+(<xref target="energy-quantization"></xref>), pitch gains (<xref target="pitch-prediction"></xref>)
+and normalized MDCT bins (<xref target="pvq"></xref>), respectively.
</t>
<figure anchor="encoder-diagram">
@@ -329,46 +333,12 @@
<postamble>Block diagram of the CELT encoder</postamble>
</figure>
-<!--
-<texttable anchor="bitstream">
- <ttcol align='center'>Parameter(s)</ttcol>
- <ttcol align='center'>Condition</ttcol>
- <ttcol align='center'>Symbol(s)</ttcol>
- <c>Feature flags</c><c>Always</c><c>2-4 bits</c>
- <c>Pitch period</c><c>P=1</c><c>1 Integer (8-9 bits)</c>
- <c>Transient scalefactor</c><c>S=1</c><c>2 bits</c>
- <c>Coarse energy</c><c>Always</c><c>one symbol per band</c>
- <c>Fine energy</c><c>Always</c><c>one symbol per band</c>
- <c>PVQ indices</c><c>Always</c><c>one symbol per band</c>
- <c>Remaining fine energy</c><c>bits available</c><c>one bit per band</c>
-</texttable>
--->
-
-
-<!--
-<figure>
-<artwork>
-+-----------------+---------------------+------------------------------+
-| Feature flags | (pitch period if P) | (transient scalefactor if S) |
-+-----------------+---------------------+------------------------------+
-| (transient time if scalefactor == 3) | coarse energy |
-+----------------+----------------------+-------+----------------------+
-| fine energy | PVQ indices for all bands | (more fine energy) |
-+----------------+------------------------------+----------------------+
-</artwork>
-<postamble>Fields within parentheses are not included in every packet</postamble>
-</figure>
--->
-
-<section anchor="pre-emphasis" title="Pre-emphasis">
-
-<t>The input audio first goes through a pre-emphasis filter, which attenuates the
+<t>The input audio first goes through a pre-emphasis filter
+(just before the windowing in <xref target="encoder-diagram"></xref>), which attenuates the
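-<spanx style="emph">spectral tilt</spanx>. The filter is has the transfer function A(z)=1-alpha_p*z^-1, with
+<spanx style="emph">spectral tilt</spanx>. The filter has the transfer function A(z)=1-alpha_p*z^-1, with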
-alpha_p=0.8. Although it is not a requirement, no part of the reference encoder operates
-on the non-pre-emphasized signal. The inverse of the pre-emphasis is applied at the decoder.</t>
+alpha_p=0.8. The inverse of the pre-emphasis is applied at the decoder.</t>
-</section> <!-- pre-emphasis -->
<section anchor="range-encoder" title="Range Coder">
<t>
@@ -946,7 +916,9 @@
<t>
Like most audio codecs, the CELT decoder is less complex than the encoder, as can be
-observed in the decoder block diagram in <xref target="decoder-diagram"></xref>.
+observed in the decoder block diagram in <xref target="decoder-diagram"></xref>. In
+fact, most of the operations performed by the decoder are also performed by the
+encoder.
</t>
<figure anchor="decoder-diagram">
@@ -979,9 +951,11 @@
</figure>
<t>
-If during the decoding process a decoded integer value is out of the specified range
-(which can happen due to a minimal amount of redundancy in the encoding of large integers with
-the range coder), then the decoder knows there has been an error in the coding,
+The decoder extracts information from the range-coded bit-stream in the same order
+as it was encoded by the encoder. In some circumstances, it is
+possible for a decoded value to be out of range due to a very small amount of redundancy
+in the encoding of large integers by the range coder.
+In that case, the decoder should assume there has been an error in the coding,
decoding, or transmission and SHOULD take measures to conceal the error and/or report
to the application that a problem has occurred.
</t>