ref: 2b5dc8624b22df9f39a33ebfdc726b645fef0591
parent: 41ec4b2837f9c8e4301974e5c4dd01d2e59a872a
author: Jean-Marc Valin <[email protected]>
date: Sun Nov 14 16:18:58 EST 2010
Adding range coding information
--- a/doc/draft-ietf-codec-opus.xml
+++ b/doc/draft-ietf-codec-opus.xml
@@ -2,7 +2,7 @@
<!DOCTYPE rfc SYSTEM 'rfc2629.dtd'>
<?rfc toc="yes" symrefs="yes" ?>
-<rfc ipr="trust200902" category="std" docName="draft-ietf-codec-opus-01">
+<rfc ipr="trust200902" category="std" docName="draft-ietf-codec-opus-02">
<front>
<title abbrev="Interactive Audio Codec">Definition of the Opus Audio Codec</title>
@@ -277,6 +277,309 @@
</section>
+<section title="Codec Encoder">
+<t>
+Opus encoder block diagram.
+</t>
+
+<section anchor="range-encoder" title="Range Coder">
+<t>
+Opus uses an entropy coder based upon <xref target="range-coding"></xref>,
+which is itself a rediscovery of the FIFO arithmetic code introduced by <xref target="coding-thesis"></xref>.
+It is very similar to arithmetic encoding, except that encoding is done with
+digits in any base instead of with bits,
+so it is faster when using larger bases (i.e.: an octet). All of the
+calculations in the range coder must use bit-exact integer arithmetic.
+</t>
+
+<t>
+The range coder also acts as the bit-packer for Opus. It is
+used in three different ways, to encode:
+<list style="symbols">
+<t>entropy-coded symbols with a fixed probability model using ec_encode(), (rangeenc.c)</t>
+<t>integers from 0 to 2^M-1 using ec_enc_uint() or ec_enc_bits(), (entenc.c)</t>
+<t>integers from 0 to N-1 (where N is not a power of two) using ec_enc_uint(). (entenc.c)</t>
+</list>
+</t>
+
+<t>
+The range encoder maintains an internal state vector composed of the
+four-tuple (low,rng,rem,ext), representing the low end of the current
+range, the size of the current range, a single buffered output octet,
+and a count of additional carry-propagating output octets. Both rng
+and low are 32-bit unsigned integer values, rem is an octet value or
+the special value -1, and ext is an integer with at least 16 bits.
+This state vector is initialized at the start of each each frame to
+the value (0,2^31,-1,0).
+</t>
+
+<t>
+Each symbol is drawn from a finite alphabet and coded in a separate
+context which describes the size of the alphabet and the relative
+frequency of each symbol in that alphabet. Opus only uses static
+contexts; they are not adapted to the statistics of the data that is
+coded.
+</t>
+
+<section anchor="encoding-symbols" title="Encoding Symbols">
+<t>
+ The main encoding function is ec_encode() (rangeenc.c),
+ which takes as an argument a three-tuple (fl,fh,ft)
+ describing the range of the symbol to be encoded in the current
+ context, with 0 <= fl < fh <= ft <= 65535. The values of this tuple
+ are derived from the probability model for the symbol. Let f(i) be
+ the frequency of the ith symbol in the current context. Then the
+ three-tuple corresponding to the kth symbol is given by
+ <![CDATA[
+fl=sum(f(i),i<k), fh=fl+f(i), and ft=sum(f(i)).
+]]>
+</t>
+<t>
+ ec_encode() updates the state of the encoder as follows. If fl is
+ greater than zero, then low = low + rng - (rng/ft)*(ft-fl) and
+ rng = (rng/ft)*(fh-fl). Otherwise, low is unchanged and
+ rng = rng - (rng/ft)*(fh-fl). The divisions here are exact integer
+ division. After this update, the range is normalized.
+</t>
+<t>
+ To normalize the range, the following process is repeated until
+ rng > 2^23. First, the top 9 bits of low, (low>>23), are placed into
+ a carry buffer. Then, low is set to <![CDATA[(low << 8 & 0x7FFFFFFF) and rng
+ is set to (rng<<8)]]>. This process is carried out by
+ ec_enc_normalize() (rangeenc.c).
+</t>
+<t>
+ The 9 bits produced in each iteration of the normalization loop
+ consist of 8 data bits and a carry flag. The final value of the
+ output bits is not determined until carry propagation is accounted
+ for. Therefore the reference implementation buffers a single
+ (non-propagating) output octet and keeps a count of additional
+ propagating (0xFF) output octets. An implementation MAY choose to use
+ any mathematically equivalent scheme to perform carry propagation.
+</t>
+<t>
+ The function ec_enc_carry_out() (rangeenc.c) performs
+ this buffering. It takes a 9-bit input value, c, from the normalization
+ 8-bit output and a carry bit. If c is 0xFF, then ext is incremented
+ and no octets are output. Otherwise, if rem is not the special value
+ -1, then the octet (rem+(c>>8)) is output. Then ext octets are output
+ with the value 0 if the carry bit is set, or 0xFF if it is not, and
+ rem is set to the lower 8 bits of c. After this, ext is set to zero.
+</t>
+<t>
+ In the reference implementation, a special version of ec_encode()
+ called ec_encode_bin() (rangeenc.c) is defined to
+ take a two-tuple (fl,ftb), where <![CDATA[0 <= fl < 2^ftb and ftb < 16. It is
+ mathematically equivalent to calling ec_encode() with the three-tuple
+ (fl,fl+1,1<<ftb)]]>, but avoids using division.
+
+</t>
+</section>
+
+<section anchor="encoding-ints" title="Encoding Uniformly Distributed Integers">
+<t>
+ Functions ec_enc_uint() or ec_enc_bits() are based on ec_encode() and
+ encode one of N equiprobable symbols, each with a frequency of 1,
+ where N may be as large as 2^32-1. Because ec_encode() is limited to
+ a total frequency of 2^16-1, this is done by encoding a series of
+ symbols in smaller contexts.
+</t>
+<t>
+ ec_enc_bits() (entenc.c) is defined, like
+ ec_encode_bin(), to take a two-tuple (fl,ftb), with <![CDATA[0 <= fl < 2^ftb
+ and ftb < 32. While ftb is greater than 8, it encodes bits (ftb-8) to
+ (ftb-1) of fl, e.g., (fl>>ftb-8&0xFF) using ec_encode_bin() and
+ subtracts 8 from ftb. Then, it encodes the remaining bits of fl, e.g.,
+ (fl&(1<<ftb)-1)]]>, again using ec_encode_bin().
+</t>
+<t>
+ ec_enc_uint() (entenc.c) takes a two-tuple (fl,ft),
+ where ft is not necessarily a power of two. Let ftb be the location
+ of the highest 1 bit in the two's-complement representation of
+ (ft-1), or -1 if no bits are set. If ftb>8, then the top 8 bits of fl
+ are encoded using ec_encode() with the three-tuple
+ (fl>>ftb-8,(fl>>ftb-8)+1,(ft-1>>ftb-8)+1), and the remaining bits
+ are encoded with ec_enc_bits using the two-tuple
+ <![CDATA[(fl&(1<<ftb-8)-1,ftb-8). Otherwise, fl is encoded with ec_encode()
+ directly using the three-tuple (fl,fl+1,ft)]]>.
+</t>
+</section>
+
+<section anchor="encoder-finalizing" title="Finalizing the Stream">
+<t>
+ After all symbols are encoded, the stream must be finalized by
+ outputting a value inside the current range. Let end be the integer
+ in the interval [low,low+rng) with the largest number of trailing
+ zero bits. Then while end is not zero, the top 9 bits of end, e.g.,
+ <![CDATA[(end>>23), are sent to the carry buffer, and end is replaced by
+ (end<<8&0x7FFFFFFF). Finally, if the value in carry buffer, rem, is]]>
+ neither zero nor the special value -1, or the carry count, ext, is
+ greater than zero, then 9 zero bits are sent to the carry buffer.
+ After the carry buffer is finished outputting octets, the rest of the
+ output buffer is padded with zero octets. Finally, rem is set to the
+ special value -1. This process is implemented by ec_enc_done()
+ (rangeenc.c).
+</t>
+</section>
+
+<section anchor="encoder-tell" title="Current Bit Usage">
+<t>
+ The bit allocation routines in Opus need to be able to determine a
+ conservative upper bound on the number of bits that have been used
+ to encode the current frame thus far. This drives allocation
+ decisions and ensures that the range code will not overflow the
+ output buffer. This is computed in the reference implementation to
+ fractional bit precision by the function ec_enc_tell()
+ (rangeenc.c).
+ Like all operations in the range encoder, it must
+ be implemented in a bit-exact manner.
+</t>
+</section>
+
+</section>
+
+<section title="SILK Encoder">
+<t>
+Copy from SILK draft.
+</t>
+</section>
+
+<section title="CELT Encoder">
+<t>
+Copy from CELT draft.
+</t>
+</section>
+
+</section>
+
+<section title="Codec Decoder">
+<t>
+Opus decoder block diagram.
+</t>
+
+<section anchor="range-decoder" title="Range Decoder">
+<t>
+The range decoder extracts the symbols and integers encoded using the range encoder in
+<xref target="range-encoder"></xref>. The range decoder maintains an internal
+state vector composed of the two-tuple (dif,rng), representing the
+difference between the high end of the current range and the actual
+coded value, and the size of the current range, respectively. Both
+dif and rng are 32-bit unsigned integer values. rng is initialized to
+2^7. dif is initialized to rng minus the top 7 bits of the first
+input octet. Then the range is immediately normalized, using the
+procedure described in the following section.
+</t>
+
+<section anchor="decoding-symbols" title="Decoding Symbols">
+<t>
+ Decoding symbols is a two-step process. The first step determines
+ a value fs that lies within the range of some symbol in the current
+ context. The second step updates the range decoder state with the
+ three-tuple (fl,fh,ft) corresponding to that symbol, as defined in
+ <xref target="encoding-symbols"></xref>.
+</t>
+<t>
+ The first step is implemented by ec_decode()
+ (rangedec.c),
+ and computes fs = ft-min((dif-1)/(rng/ft)+1,ft), where ft is
+ the sum of the frequency counts in the current context, as described
+ in <xref target="encoding-symbols"></xref>. The divisions here are exact integer division.
+</t>
+<t>
+ In the reference implementation, a special version of ec_decode()
+ called ec_decode_bin() (rangeenc.c) is defined using
+ the parameter ftb instead of ft. It is mathematically equivalent to
+ calling ec_decode() with ft = (1<<ftb), but avoids one of the
+ divisions.
+</t>
+<t>
+ The decoder then identifies the symbol in the current context
+ corresponding to fs; i.e., the one whose three-tuple (fl,fh,ft)
+ satisfies fl <= fs < fh. This tuple is used to update the decoder
+ state according to dif = dif - (rng/ft)*(ft-fh), and if fl is greater
+ than zero, rng = (rng/ft)*(fh-fl), or otherwise rng = rng - (rng/ft)*(ft-fh). After this update, the range is normalized.
+</t>
+<t>
+ To normalize the range, the following process is repeated until
+ rng > 2^23. First, rng is set to (rng<8)&0xFFFFFFFF. Then the next
+ 8 bits of input are read into sym, using the remaining bit from the
+ previous input octet as the high bit of sym, and the top 7 bits of the
+ next octet for the remaining bits of sym. If no more input octets
+ remain, zero bits are used instead. Then, dif is set to
+ (dif<<8)-sym&0xFFFFFFFF (i.e., using wrap-around if the subtraction
+ overflows a 32-bit register). Finally, if dif is larger than 2^31,
+ dif is then set to dif - 2^31. This process is carried out by
+ ec_dec_normalize() (rangedec.c).
+</t>
+</section>
+
+<section anchor="decoding-ints" title="Decoding Uniformly Distributed Integers">
+<t>
+ Functions ec_dec_uint() or ec_dec_bits() are based on ec_decode() and
+ decode one of N equiprobable symbols, each with a frequency of 1,
+ where N may be as large as 2^32-1. Because ec_decode() is limited to
+ a total frequency of 2^16-1, this is done by decoding a series of
+ symbols in smaller contexts.
+</t>
+<t>
+ ec_dec_bits() (entdec.c) is defined, like
+ ec_decode_bin(), to take a single parameter ftb, with ftb < 32.
+ and ftb < 32, and produces an ftb-bit decoded integer value, t,
+ initialized to zero. While ftb is greater than 8, it decodes the next
+ 8 most significant bits of the integer, s = ec_decode_bin(8), updates
+ the decoder state with the 3-tuple (s,s+1,256), adds those bits to
+ the current value of t, t = t<<8 | s, and subtracts 8 from ftb. Then
+ it decodes the remaining bits of the integer, s = ec_decode_bin(ftb),
+ updates the decoder state with the 3 tuple (s,s+1,1<<ftb), and adds
+ those bits to the final values of t, t = t<<ftb | s.
+</t>
+<t>
+ ec_dec_uint() (entdec.c) takes a single parameter,
+ ft, which is not necessarily a power of two, and returns an integer,
+ t, with a value between 0 and ft-1, inclusive, which is initialized to zero. Let
+ ftb be the location of the highest 1 bit in the two's-complement
+ representation of (ft-1), or -1 if no bits are set. If ftb>8, then
+ the top 8 bits of t are decoded using t = ec_decode((ft-1>>ftb-8)+1),
+ the decoder state is updated with the three-tuple
+ (s,s+1,(ft-1>>ftb-8)+1), and the remaining bits are decoded with
+ t = t<<ftb-8|ec_dec_bits(ftb-8). If, at this point, t >= ft, then
+ the current frame is corrupt, and decoding should stop. If the
+ original value of ftb was not greater than 8, then t is decoded with
+ t = ec_decode(ft), and the decoder state is updated with the
+ three-tuple (t,t+1,ft).
+</t>
+</section>
+
+<section anchor="decoder-tell" title="Current Bit Usage">
+<t>
+ The bit allocation routines in CELT need to be able to determine a
+ conservative upper bound on the number of bits that have been used
+ to decode from the current frame thus far. This drives allocation
+ decisions which must match those made in the encoder. This is
+ computed in the reference implementation to fractional bit precision
+ by the function ec_dec_tell() (rangedec.c). Like all
+ operations in the range decoder, it must be implemented in a
+ bit-exact manner, and must produce exactly the same value returned by
+ ec_enc_tell() after encoding the same symbols.
+</t>
+</section>
+
+</section>
+
+<section title="SILK Decoder">
+<t>
+Copy from SILK draft.
+</t>
+</section>
+
+<section title="CELT Decoder">
+<t>
+Copy from CELT draft.
+</t>
+</section>
+
+</section>
+
<section anchor="security" title="Security Considerations">
<t>
@@ -381,6 +684,25 @@
<seriesInfo name='BCP' value='72' />
<seriesInfo name='RFC' value='3552' />
<format type='TXT' octets='110393' target='ftp://ftp.isi.edu/in-notes/rfc3552.txt' />
+</reference>
+
+<reference anchor="range-coding">
+<front>
+<title>Range encoding: An algorithm for removing redundancy from a digitised message</title>
+<author initials="G." surname="Nigel" fullname=""><organization/></author>
+<author initials="N." surname="Martin" fullname=""><organization/></author>
+<date year="1979" />
+</front>
+<seriesInfo name="Proc. Institution of Electronic and Radio Engineers International Conference on Video and Data Recording" value="" />
+</reference>
+
+<reference anchor="coding-thesis">
+<front>
+<title>Source coding algorithms for fast data compression</title>
+<author initials="R." surname="Pasco" fullname=""><organization/></author>
+<date month="May" year="1976" />
+</front>
+<seriesInfo name="Ph.D. thesis" value="Dept. of Electrical Engineering, Stanford University" />
</reference>