shithub: opus

--- a/doc/ietf/draft-valin-celt-codec.xml

+++ b/doc/ietf/draft-valin-celt-codec.xml

@@ -343,7 +343,7 @@

 which is itself a rediscovery of the FIFO arithmetic code introduced by <xref target="coding-thesis"></xref>.

 It is very similar to arithmetic encoding, except that encoding is done with

 digits in any base instead of with bits,

-so it is faster when using larger bases (e.g.: an octet). All of the

+so it is faster when using larger bases (i.e., an octet). All of the

 calculations in the range coder must use bit-exact integer arithmetic.

 </t>

@@ -519,7 +519,7 @@

 <section anchor="intra" title="Intra-frame energy (I)">

<t>

-CELT uses prediction to encode the energy in each frequency band. In order to make frames independent, however, it is possible to disable the part of the prediction that depends on previous frames. This is called <spanx style="emph">intra-frame energy</spanx> and requires around 12 more bits per frame. It is enabled with the <spanx style="emph">I</spanx> bit (Table. <xref target="flags-encoding">flags-encoding</xref>). The use of intra energy is OPTIONAL and the decision method is left to the implementor. The reference code describes one way of deciding which frames would benefit most from having their energy encoded without prediction. The intra_decision() (<xref target="quant_bands.c">quant_bands.c</xref>) function looks for frames where the log-spectral distance between consecutive frames is more than 9 dB. When such a difference is found between two frames, the next frame (not the one for which the difference is detected) is marked encoded with intra energy. The one-frame delay is to ensure that when a frame containing a transient event is lost, then the next frame will be decoded without accumulating error from the lost frame.

+CELT uses prediction to encode the energy in each frequency band. In order to make frames independent, however, it is possible to disable the part of the prediction that depends on previous frames. This is called <spanx style="emph">intra-frame energy</spanx> and requires around 12 more bits per frame. It is enabled with the <spanx style="emph">I</spanx> bit (Table <xref target="flags-encoding">flags-encoding</xref>). The use of intra energy is OPTIONAL and the decision method is left to the implementor. The reference code implements one way of deciding which frames would benefit most from having their energy encoded without prediction. The intra_decision() (<xref target="quant_bands.c">quant_bands.c</xref>) function looks for frames where the log-spectral distance between consecutive frames is more than 9 dB. When such a difference is found between two frames, the next frame (not the one for which the difference is detected) is marked to be encoded with intra energy. The one-frame delay ensures that when a frame containing a transient is lost, the next frame is decoded without accumulating error from the lost frame.

 </t>

 </section>
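A minimal C sketch of the delayed decision described above follows. It is not the reference intra_decision(): the names intra_state, large_jump() and intra_decide() are hypothetical, band energies are assumed to be given in dB as doubles, and the log-spectral distance is assumed to be the RMS of the per-band differences, so the 9 dB test becomes a mean squared difference above 81.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoder-side state; the reference intra_decision() differs. */
typedef struct {
    int pending; /* set when the previous frame showed a large energy jump */
} intra_state;

/* Assumption: log-spectral distance = RMS of per-band dB differences,
   so "more than 9 dB" means a mean squared difference above 81. */
static int large_jump(const double *cur_db, const double *prev_db, size_t nbands) {
    double acc = 0.0;
    for (size_t i = 0; i < nbands; i++) {
        double d = cur_db[i] - prev_db[i];
        acc += d * d;
    }
    return nbands > 0 && acc > 81.0 * (double)nbands;
}

/* Returns 1 if the current frame should be coded with intra energy.
   A jump detected while coding frame n only takes effect in frame n+1. */
static int intra_decide(intra_state *st, const double *cur_db,
                        const double *prev_db, size_t nbands) {
    assert(st != NULL);
    int use_intra = st->pending;
    st->pending = large_jump(cur_db, prev_db, nbands);
    return use_intra;
}
```

Calling intra_decide() once per frame with the current and previous band energies returns 0 for the frame in which the jump is detected and 1 for the frame after it, which is the one-frame delay motivated in the text.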

@@ -708,7 +708,9 @@

<t>

 In bands where neither pitch nor folding is used, the PVQ is used to encode

 the unit vector that results from the normalization in

-<xref target="normalization"></xref> directly. " In the case where a pitch

+<xref target="normalization"></xref> directly. Given a PVQ codevector y,

+the unit vector X is obtained as X = y/||y||, where ||.|| denotes the

+L2 norm. In the case where a pitch

 prediction or a folding vector p is used, the quantized unit vector X' becomes:

 </t>

 <t>X' = p' + g_f * y,</t>
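As a worked example of X = y/||y||, take hypothetical values N = 4 and K = 3, so that the absolute values of the integer components of the PVQ codevector y sum to 3:

```latex
y = (1, -2, 0, 0), \qquad
\|y\| = \sqrt{1^2 + (-2)^2 + 0^2 + 0^2} = \sqrt{5}, \qquad
X = \frac{y}{\|y\|} = \left(\tfrac{1}{\sqrt{5}}, -\tfrac{2}{\sqrt{5}}, 0, 0\right) \approx (0.447, -0.894, 0, 0)
```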

@@ -790,11 +792,11 @@

 There are many different ways to compute V(N,K), including pre-computed tables and direct

 use of the recursive formulation. The reference implementation applies the recursive

 formulation one line (or column) at a time to save on memory use,

-along with an alternate,

-univariate recurrence to initialise an arbitrary line, and direct

-polynomial solutions for small N. All of these methods are

-equivalent, and have different trade-offs in speed, memory usage, and

-code size. Implementations MAY use any methods they like, as long as

+along with an alternate,

+univariate recurrence to initialize an arbitrary line, and direct

+polynomial solutions for small N. All of these methods are

+equivalent, and have different trade-offs in speed, memory usage, and

+code size. Implementations MAY use any methods they like, as long as

 they are equivalent to the mathematical definition.

 </t>
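The row-at-a-time approach can be sketched in C as follows, assuming the usual PVQ recurrence V(N,K) = V(N-1,K) + V(N,K-1) + V(N-1,K-1) with V(N,0) = 1 and V(0,K) = 0 for K > 0. pvq_v() and PVQ_MAX_K are hypothetical names, and the sketch omits the univariate recurrence and the closed forms for small N that the reference implementation also uses.

```c
#include <assert.h>
#include <stdint.h>

#define PVQ_MAX_K 64 /* hypothetical limit for this sketch */

/* V(N,K): number of integer vectors of length N whose absolute values sum to K.
   Computed one row at a time with V(n,k) = V(n-1,k) + V(n,k-1) + V(n-1,k-1),
   V(n,0) = 1 and V(0,k) = 0 for k > 0. uint64_t may overflow for large N, K. */
static uint64_t pvq_v(int n, int k) {
    uint64_t row[PVQ_MAX_K + 1];
    assert(n >= 0 && k >= 0 && k <= PVQ_MAX_K);
    row[0] = 1;                              /* row n = 0 */
    for (int j = 1; j <= k; j++) row[j] = 0;
    for (int i = 1; i <= n; i++) {
        uint64_t diag = row[0];              /* V(i-1, j-1), starting at j = 1 */
        for (int j = 1; j <= k; j++) {
            uint64_t up = row[j];            /* V(i-1, j) */
            row[j] = up + row[j - 1] + diag; /* row[j-1] already holds V(i, j-1) */
            diag = up;
        }
    }
    return row[k];                           /* row[0] stays 1 = V(i, 0) */
}
```

For example, pvq_v(3, 2) counts the 18 integer vectors of length 3 with L1 norm 2. Only one row of K+1 values is kept at any time, which is the memory saving the text refers to.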

@@ -815,7 +817,7 @@

 <section anchor="stereo" title="Stereo support">

<t>

-When encoding a stereo stream, some parameters are shared across the left and right channels, while others are transmitted separately for each channel, or jointly encoded. Only one copy of the flags for the transients and pitch (pitch period and gains) features are transmitted. The coarse and fine energy parameters are transmitted separately for each channel. Both the coarse energy and fine energy (including the remaining fine bits at the end of the stream) have the left and right bands interleaved in the stream, with the left band encoded first.

+When encoding a stereo stream, some parameters are shared across the left and right channels, while others are transmitted separately for each channel or jointly encoded. Only one copy of the feature flags, transient parameters, and pitch parameters (pitch period and gains) is transmitted. The coarse and fine energy parameters are transmitted separately for each channel. Both the coarse energy and fine energy (including the remaining fine bits at the end of the stream) have the left and right bands interleaved in the stream, with the left band encoded first.

 </t>

<t>

@@ -903,74 +905,74 @@

 <section anchor="range-decoder" title="Range Decoder">

<t>

 The range decoder extracts the symbols and integers encoded using the range encoder in

-<xref target="range-encoder"></xref>. The range decoder maintains an internal

-state vector composed of the two-tuple (dif,rng), representing the

-difference between the high end of the current range and the actual

-coded value, and the size of the current range, respectively. Both

-dif and rng are 32-bit unsigned integer values. rng is initialized to

-2^7. dif is initialized to rng minus the top 7 bits of the first

-input octet. Then the range is immediately normalized, using the

+<xref target="range-encoder"></xref>. The range decoder maintains an internal

+state vector composed of the two-tuple (dif,rng), representing the

+difference between the high end of the current range and the actual

+coded value, and the size of the current range, respectively. Both

+dif and rng are 32-bit unsigned integer values. rng is initialized to

+2^7. dif is initialized to rng minus the top 7 bits of the first

+input octet. Then the range is immediately normalized, using the

 procedure described in the following section.

 </t>

 <section anchor="decoding-symbols" title="Decoding Symbols">

<t>

-   Decoding symbols is a two-step process. The first step determines

-   a value fs that lies within the range of some symbol in the current

-   context. The second step updates the range decoder state with the

-   three-tuple (fl,fh,ft) corresponding to that symbol, as defined in

+   Decoding symbols is a two-step process. The first step determines

+   a value fs that lies within the range of some symbol in the current

+   context. The second step updates the range decoder state with the

+   three-tuple (fl,fh,ft) corresponding to that symbol, as defined in

    <xref target="encoding-symbols"></xref>.

 </t>

<t>

    The first step is implemented by ec_decode()

    (<xref target="rangedec.c">rangedec.c</xref>),

-   and computes fs = ft-min((dif-1)/(rng/ft)+1,ft), where ft is

-   the sum of the frequency counts in the current context, as described

-   in <xref target="encoding-symbols"></xref>. The divisions here are exact integer division.

+   and computes fs = ft-min((dif-1)/(rng/ft)+1,ft), where ft is

+   the sum of the frequency counts in the current context, as described

+   in <xref target="encoding-symbols"></xref>. All divisions here are truncating integer divisions.

 </t>

<t>

-   In the reference implementation, a special version of ec_decode()

-   called ec_decode_bin() (<xref target="rangeenc.c">rangeenc.c</xref>) is defined using

-   the parameter ftb instead of ft. It is mathematically equivalent to

-   calling ec_decode() with ft = (1&lt;&lt;ftb), but avoids one of the

-   divisions.

+   In the reference implementation, a special version of ec_decode()

+   called ec_decode_bin() (<xref target="rangedec.c">rangedec.c</xref>) is defined using

+   the parameter ftb instead of ft. It is mathematically equivalent to

+   calling ec_decode() with ft = (1&lt;&lt;ftb), but avoids one of the

+   divisions.

 </t>
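The two computations can be compared in a short C sketch. decode_fs() and decode_bin_fs() are hypothetical stand-ins for the fs computation inside ec_decode() and ec_decode_bin(); they take dif and rng as plain arguments instead of reading decoder state, and assume dif >= 1 and rng >= ft, as is the case after normalization.

```c
#include <assert.h>
#include <stdint.h>

/* fs = ft - min((dif-1)/(rng/ft) + 1, ft), with truncating unsigned division.
   Assumes dif >= 1 and rng >= ft, as holds for a normalized decoder. */
static uint32_t decode_fs(uint32_t dif, uint32_t rng, uint32_t ft) {
    assert(dif >= 1 && ft >= 1 && rng >= ft);
    uint32_t nrm = rng / ft;               /* width of one frequency count */
    uint32_t s = (dif - 1) / nrm + 1;
    return ft - (s < ft ? s : ft);
}

/* Same value for ft = 1<<ftb: rng/ft becomes the shift rng>>ftb. */
static uint32_t decode_bin_fs(uint32_t dif, uint32_t rng, unsigned ftb) {
    assert(ftb < 32);
    uint32_t ft = UINT32_C(1) << ftb;
    assert(dif >= 1 && rng >= ft);
    uint32_t nrm = rng >> ftb;             /* no division needed here */
    uint32_t s = (dif - 1) / nrm + 1;
    return ft - (s < ft ? s : ft);
}
```

With rng = 2^24 and ft = 256, each frequency count covers 65536 values of dif, so dif = 655360 falls in the tenth slot from the top and gives fs = 246. The binary variant replaces rng/ft with a shift, which is the division it saves.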

<t>

-   The decoder then identifies the symbol in the current context

-   corresponding to fs; i.e., the one whose three-tuple (fl,fh,ft)

-   satisfies fl &lt;= fs &lt; fh. This tuple is used to update the decoder

-   state according to dif = dif - (rng/ft)*(ft-fh), and if fl is greater

-   than zero, rng = (rng/ft)*(fh-fl), or otherwise rng = rng - (rng/ft)*(ft-fh). After this update, the range is normalized.

+   The decoder then identifies the symbol in the current context

+   corresponding to fs; i.e., the one whose three-tuple (fl,fh,ft)

+   satisfies fl &lt;= fs &lt; fh. This tuple is used to update the decoder

+   state according to dif = dif - (rng/ft)*(ft-fh), and if fl is greater

+   than zero, rng = (rng/ft)*(fh-fl), or otherwise rng = rng - (rng/ft)*(ft-fh). After this update, the range is normalized.

 </t>
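The update step can be sketched in C as below. dec_state and decode_update() are hypothetical names, and normalization is deliberately left out so the arithmetic can be checked on small numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoder state holding only the two-tuple (dif, rng). */
typedef struct {
    uint32_t dif, rng;
} dec_state;

/* Narrow (dif, rng) to the symbol with three-tuple (fl, fh, ft).
   Normalization is not performed here. */
static void decode_update(dec_state *st, uint32_t fl, uint32_t fh, uint32_t ft) {
    assert(ft >= 1 && fl < fh && fh <= ft && st->rng >= ft);
    uint32_t nrm = st->rng / ft;          /* truncating integer division */
    uint32_t s = nrm * (ft - fh);         /* part of the range above the symbol */
    st->dif -= s;
    /* The symbol with fl == 0 also receives the rounding remainder. */
    st->rng = fl > 0 ? nrm * (fh - fl) : st->rng - s;
}
```

With rng = 1000 and ft = 3, rng/ft = 333 leaves a remainder of 1. The symbol with fl = 0 gets rng - 333*(ft-fh) = 334, so it absorbs that remainder and no part of the range goes unused.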

<t>

-   To normalize the range, the following process is repeated until

-   rng > 2^23. First, rng is set to (rng&lt;8)&amp;0xFFFFFFFF. Then the next

-   8 bits of input are read into sym, using the remaining bit from the

-   previous input octet as the high bit of sym, and the top 7 bits of the

-   next octet for the remaining bits of sym. If no more input octets

-   remain, zero bits are used instead. Then, dif is set to

-   (dif&lt;&lt;8)-sym&amp;0xFFFFFFFF (i.e., using wrap-around if the subtraction

-   overflows a 32-bit register). Finally, if dif is larger than 2^31,

-   dif is then set to dif - 2^31. This process is carried out by

-   ec_dec_normalize() (<xref target="rangedec.c">rangedec.c</xref>).

+   To normalize the range, the following process is repeated until

+   rng > 2^23. First, rng is set to (rng&lt;&lt;8)&amp;0xFFFFFFFF. Then the next

+   8 bits of input are read into sym, using the remaining bit from the

+   previous input octet as the high bit of sym, and the top 7 bits of the

+   next octet for the remaining bits of sym. If no more input octets

+   remain, zero bits are used instead. Then, dif is set to

+   ((dif&lt;&lt;8)-sym)&amp;0xFFFFFFFF (i.e., using wrap-around if the subtraction

+   overflows a 32-bit register). Finally, if dif is larger than 2^31,

+   it is set to dif - 2^31. This process is carried out by

+   ec_dec_normalize() (<xref target="rangedec.c">rangedec.c</xref>).

 </t>

 </section>
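Initialization and normalization together can be sketched in C as follows. The rdec struct and the rdec_in(), rdec_normalize() and rdec_init() helpers are hypothetical and do not match the reference ec_dec layout; the logic follows the steps described above, including the one leftover bit carried from each octet into the next sym.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state; not the reference ec_dec layout. */
typedef struct {
    const unsigned char *buf;
    size_t len, pos;
    uint32_t dif, rng;
    unsigned rem;  /* last octet read; its low bit is still unused */
} rdec;

/* Next input octet, or zero bits once the input is exhausted. */
static unsigned rdec_in(rdec *d) {
    return d->pos < d->len ? d->buf[d->pos++] : 0u;
}

static void rdec_normalize(rdec *d) {
    while (d->rng <= (UINT32_C(1) << 23)) {       /* repeat until rng > 2^23 */
        d->rng = (d->rng << 8) & 0xFFFFFFFFu;
        unsigned sym = (d->rem << 7) & 0xFFu;     /* leftover bit -> high bit of sym */
        d->rem = rdec_in(d);
        sym |= d->rem >> 1;                       /* top 7 bits of the new octet */
        d->dif = ((d->dif << 8) - sym) & 0xFFFFFFFFu; /* wraps modulo 2^32 */
        if (d->dif > (UINT32_C(1) << 31))
            d->dif -= UINT32_C(1) << 31;
    }
}

static void rdec_init(rdec *d, const unsigned char *buf, size_t len) {
    assert(d != NULL && (buf != NULL || len == 0));
    d->buf = buf;
    d->len = len;
    d->pos = 0;
    d->rem = rdec_in(d);
    d->rng = UINT32_C(1) << 7;                    /* rng = 2^7 */
    d->dif = d->rng - (d->rem >> 1);              /* minus top 7 bits of first octet */
    rdec_normalize(d);
}
```

Initializing on four zero octets reads one octet in rdec_init() and three more during normalization, leaving rng = dif = 2^31. The & 0xFFFFFFFF masks are redundant for uint32_t arithmetic but mirror the description.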

 <section anchor="decoding-ints" title="Decoding Uniformly Distributed Integers">

<t>

-   Functions ec_dec_uint() or ec_dec_bits() are based on ec_decode() and

-   decode one of N equiprobable symbols, each with a frequency of 1,

-   where N may be as large as 2^32-1. Because ec_decode() is limited to

-   a total frequency of 2^16-1, this is done by decoding a series of

-   symbols in smaller contexts.

+   The functions ec_dec_uint() and ec_dec_bits() are based on ec_decode() and

+   decode one of N equiprobable symbols, each with a frequency of 1,

+   where N may be as large as 2^32-1. Because ec_decode() is limited to

+   a total frequency of 2^16-1, this is done by decoding a series of

+   symbols in smaller contexts.

 </t>

<t>

-   ec_dec_bits() (<xref target="entdec.c">entdec.c</xref>) is defined, like

+   ec_dec_bits() (<xref target="entdec.c">entdec.c</xref>) is defined, like

    ec_decode_bin(), to take a single parameter ftb, with ftb &lt; 32,

    and produces an ftb-bit decoded integer value, t,

    initialized to zero. While ftb is greater than 8, it decodes the next

    8 most significant bits of the integer, s = ec_decode_bin(8), updates

-   the decoder state with the 3-tuple (s,s+1,256), adds those bits to

+   the decoder state with the three-tuple (s,s+1,256), adds those bits to

    the current value of t, t = t&lt;&lt;8 | s, and subtracts 8 from ftb. Then

    it decodes the remaining bits of the integer, s = ec_decode_bin(ftb),

    updates the decoder state with the three-tuple (s,s+1,1&lt;&lt;ftb), and adds

@@ -995,15 +997,15 @@

 <section anchor="decoder-tell" title="Current Bit Usage">

<t>

-   The bit allocation routines in CELT need to be able to determine a

-   conservative upper bound on the number of bits that have been used

-   to decode from the current frame thus far. This drives allocation

-   decisions which must match those made in the encoder. This is

-   computed in the reference implementation to fractional bit precision

-   by the function ec_dec_tell() (<xref target="rangedec.c">rangedec.c</xref>). Like all

-   operations in the range decoder, it must be implemented in a

-   bit-exact manner, and must produce exactly the same value returned by

-   ec_enc_tell() after encoding the same symbols.

+   The bit allocation routines in CELT need to be able to determine a

+   conservative upper bound on the number of bits that have been used

+   to decode from the current frame thus far. This bound drives allocation

+   decisions, which must match those made in the encoder. This is

+   computed in the reference implementation to fractional bit precision

+   by the function ec_dec_tell() (<xref target="rangedec.c">rangedec.c</xref>). Like all

+   operations in the range decoder, it must be implemented in a

+   bit-exact manner, and must return exactly the same value as

+   ec_enc_tell() after encoding the same symbols.

 </t>

 </section>