ref: 13941f60619dc8ebb345862abad9cbd43d798dde
parent: b55b66119bbce514a62f1a8dbcdce8abc667627d
author: Timothy B. Terriberry <[email protected]>
date: Mon Mar 14 13:29:18 EDT 2011
Draft revisions for the entropy coder. Also includes some other minor grammar revisions.
--- a/doc/draft-ietf-codec-opus.xml
+++ b/doc/draft-ietf-codec-opus.xml
@@ -57,9 +57,9 @@
<section anchor="introduction" title="Introduction">
<t>
-We propose the Opus codec based on a linear prediction layer (LP) and an
+We propose the Opus codec, based on a linear prediction layer (LP) and an
MDCT-based layer. The main idea behind the proposal is that
-the speech low frequencies are usually more efficiently coded using
+in speech, low frequencies are usually more efficiently coded using
linear prediction codecs (such as CELP variants), while music and higher speech frequencies
are more efficiently coded in the transform domain (e.g. MDCT). For low
sampling rates, the MDCT layer is not useful and only the LP-based layer is
@@ -90,15 +90,15 @@
<t>While the symbolic representation is unambiguous and complete it is not
always the easiest way to understand the codec's operation. For this reason
-this document also describes significant parts of the codec in english and
-takes the opportunity to explain the rational behind many of the more
+this document also describes significant parts of the codec in English and
+takes the opportunity to explain the rationale behind many of the more
surprising elements of the design. These descriptions are intended to be
-accurate and informative but the limitations of common english sometimes
+accurate and informative, but the limitations of common english sometimes
result in ambiguity, so it is intended that the reader will always read
them alongside the symbolic representation. Numerous references to the
implementation are provided for this purpose. The descriptions sometimes
-differs in ordering, or through mathematical simplification, from the
-reference wherever such deviation made an explanation easier to understand.
+differ from the reference in ordering or through mathematical simplification
+wherever such deviation made an explanation easier to understand.
For example, the right shift and left shift operations in the reference
implementation are often described using division and multiplication in the text.
In general, the text is focused on the 'what' and 'why' while the symbolic
@@ -113,7 +113,7 @@
In hybrid mode, each frame is coded first by the LP layer and then by the MDCT
layer. In the current prototype, the cutoff frequency is 8 kHz. In the MDCT
layer, all bands below 8 kHz are discarded, such that there is no coding
-redundancy between the two layers. Also both layers use the same instance of
+redundancy between the two layers. Also, both layers use the same instance of
the range coder to encode the signal, which ensures that no "padding bits" are
wasted. The hybrid approach makes it easy to support both constant bit-rate
(CBR) and varaible bit-rate (VBR) coding. Although the SILK layer used is VBR,
@@ -152,10 +152,10 @@
<t>A hybrid (LP+MDCT) mode for full-bandwidth speech at medium bitrates</t>
<t>An MDCT-only mode for very low delay speech transmission as well as music transmission.</t>
</list>
-Each of these modes supports a number of difference frame sizes and sampling
+Each of these modes supports a number of different frame sizes and sampling
rates. In order to distinguish between the various modes and configurations,
we define a single-byte table-of-contents (TOC) header that can used in the transport layer
-(e.g RTP) to signal this information. The following describes the proposed
+(e.g., RTP) to signal this information. The following describes the proposed
TOC byte.
</t>
@@ -190,9 +190,9 @@
</t>
<t>
-There is thus a total of 32 configurations, encoded in 5 bits. On bit is used to signal mono vs stereo, which leaves 2 bits for the number of frames per packets (codes 0 to 3):
+There is thus a total of 32 configurations, encoded in 5 bits. One bit is used to signal mono vs stereo, which leaves 2 bits for the number of frames per packets (codes 0 to 3):
<list style="symbols">
-<t>0: 1 frames in the packet</t>
+<t>0: 1 frame in the packet</t>
<t>1: 2 frames in the packet, each with equal compressed size</t>
<t>2: 2 frames in the packet, with different compressed size</t>
<t>3: arbitrary number of frames in the packet</t>
@@ -200,7 +200,7 @@
For code 2, the TOC byte is followed by the length of the first frame, encoded as described below.
For code 3, the TOC byte is followed by a byte encoding the number of frames in the packet, with the MSB indicating VBR. In the VBR case, the byte indicating the number of frames is followed by N-1 frame
lengths encoded as described below. As an additional limit, the audio duration contained
-within a packet may not exceed 120 ms.
+within a packet MUST NOT exceed 120 ms.
</t>
<t>
@@ -215,7 +215,10 @@
<t>
The maximum size representable is 255*4+255=1275 bytes. For 20 ms frames, that
represents a bit-rate of 510 kb/s, which is really the highest rate anyone would want
-to use in stereo mode (beyond that point, lossless codecs would be more appropriate).
+to use in stereo mode.
+Beyond that point, lossless codecs would be more appropriate.
+It is also roughly the maximum useful rate of the MDCT layer, as shortly
+thereafter additional bits are no longer able to improve quality.
</t>
<section anchor="examples" title="Examples">
@@ -318,14 +321,43 @@
<section anchor="range-decoder" title="Range Decoder">
<t>
+Opus uses an entropy coder based upon <xref target="range-coding"></xref>,
+which is itself a rediscovery of the FIFO arithmetic code introduced by <xref target="coding-thesis"></xref>.
+It is very similar to arithmetic encoding, except that encoding is done with
+digits in any base instead of with bits,
+so it is faster when using larger bases (i.e., an octet). All of the
+calculations in the range coder must use bit-exact integer arithmetic.
+</t>
+<t>
+Each symbol is drawn from a finite alphabet and coded in a separate
+<spanx style="emph">context</spanx> which describes the size of the
+alphabet and the relative frequency of each symbol in that alphabet.
+Opus only uses static contexts; they are not adapted to the
+statistics of the data that is coded.
+</t>
+<t>
+ The parameters needed to encode or decode a symbol in a given context are
+ represented by a three-tuple (fl,fh,ft), with
+ 0 <= fl < fh <= ft <= 65535.
+ The values of this tuple are derived from the probability model for the
+ symbol, represented by traditional
+ <spanx style="emph">frequency counts</spanx> (although, since
+ Opus uses static contexts, these are not updated as symbols are decoded).
+ Let f(i) be the frequency of the ith symbol in the current context.
+ Then the three-tuple corresponding to the kth symbol is given by
+ <![CDATA[
+fl=sum(f(i),i<k), fh=fl+f(i), and ft=sum(f(i)).
+]]>
+</t>
+<t>
The range decoder extracts the symbols and integers encoded using the range encoder in
<xref target="range-encoder"></xref>. The range decoder maintains an internal
-state vector composed of the two-tuple (dif,rng), representing the
+state vector composed of the two-tuple (val,rng), representing the
difference between the high end of the current range and the actual
-coded value, and the size of the current range, respectively. Both
-dif and rng are 32-bit unsigned integer values. rng is initialized to
-2^7. dif is initialized to rng minus the top 7 bits of the first
-input octet. Then the range is immediately normalized, using the
+coded value, minus one, and the size of the current range, respectively. Both
+val and rng are 32-bit unsigned integer values. rng is initialized to
+2^7. val is initialized to rng minus the top 7 bits of the first
+input octet, minus one. Then the range is immediately normalized, using the
procedure described in the following section.
</t>
@@ -332,26 +364,17 @@
<section anchor="decoding-symbols" title="Decoding Symbols">
<t>
Decoding symbols is a two-step process. The first step determines
- a value fs that lies within the range of some symbol in the current
+ a value fs, which lies within the range of some symbol in the current
context. The second step updates the range decoder state with the
- three-tuple (fl,fh,ft) corresponding to that symbol, as defined in
- <xref target="encoding-symbols"></xref>.
+ three-tuple (fl,fh,ft) corresponding to that symbol.
</t>
<t>
- The first step is implemented by ec_decode()
- (rangedec.c),
- and computes fs = ft-min((dif-1)/(rng/ft)+1,ft), where ft is
- the sum of the frequency counts in the current context, as described
- in <xref target="encoding-symbols"></xref>. The divisions here are exact integer division.
+ The first step is implemented by ec_decode()
+ (entdec.c),
+ and computes fs = ft-min(val/(rng/ft)+1,ft).
+ The divisions here are exact integer division.
</t>
<t>
- In the reference implementation, a special version of ec_decode()
- called ec_decode_bin() (rangeenc.c) is defined using
- the parameter ftb instead of ft. It is mathematically equivalent to
- calling ec_decode() with ft = (1<<ftb), but avoids one of the
- divisions.
-</t>
-<t>
The decoder then identifies the symbol in the current context
corresponding to fs; i.e., the one whose three-tuple (fl,fh,ft)
satisfies fl <= fs < fh. This tuple is used to update the decoder
@@ -360,39 +383,97 @@
</t>
<t>
To normalize the range, the following process is repeated until
- rng > 2^23. First, rng is set to (rng<8)&0xFFFFFFFF. Then the next
+ rng > 2^23. First, rng is set to (rng<<8)&0xFFFFFFFF. Then the next
8 bits of input are read into sym, using the remaining bit from the
previous input octet as the high bit of sym, and the top 7 bits of the
next octet for the remaining bits of sym. If no more input octets
- remain, zero bits are used instead. Then, dif is set to
- (dif<<8)-sym&0xFFFFFFFF (i.e., using wrap-around if the subtraction
- overflows a 32-bit register). Finally, if dif is larger than 2^31,
- dif is then set to dif - 2^31. This process is carried out by
- ec_dec_normalize() (rangedec.c).
+ remain, zero bits are used instead. Then, val is set to
+ (val<<8)+256-sym-1&0x7FFFFFFF.
+ If a decoder consumes all of the bytes allocated to the current frame, it
+ MUST continue to use zero where any further input bytes are required.
+ This process is carried out by ec_dec_normalize() (entdec.c).
</t>
</section>
-<section anchor="decoding-ints" title="Decoding Uniformly Distributed Integers">
+<section anchor="decoding-alternate" title="Alternate Decoding Methods">
<t>
- Functions ec_dec_uint() or ec_dec_bits() are based on ec_decode() and
- decode one of N equiprobable symbols, each with a frequency of 1,
- where N may be as large as 2^32-1. Because ec_decode() is limited to
- a total frequency of 2^16-1, this is done by decoding a series of
- symbols in smaller contexts.
+The reference implementation uses three additional decoding methods that are
+exactly equivalent to the above, but make assumptions and simplifications that
+allow for a much more efficient implementation.
</t>
<t>
- ec_dec_bits() (entdec.c) is defined, like
- ec_decode_bin(), to take a single parameter ftb, with ftb < 32.
- and ftb < 32, and produces an ftb-bit decoded integer value, t,
- initialized to zero. While ftb is greater than 8, it decodes the next
- 8 most significant bits of the integer, s = ec_decode_bin(8), updates
- the decoder state with the 3-tuple (s,s+1,256), adds those bits to
- the current value of t, t = t<<8 | s, and subtracts 8 from ftb. Then
- it decodes the remaining bits of the integer, s = ec_decode_bin(ftb),
- updates the decoder state with the 3 tuple (s,s+1,1<<ftb), and adds
- those bits to the final values of t, t = t<<ftb | s.
+ The first is ec_decode_bin (entdec.c), which is defined using the parameter
+ ftb instead of ft.
+ It is mathematically equivalent to calling ec_decode() with
+ ft = (1<<ftb), but avoids one of the divisions.
</t>
<t>
+ The next is ec_dec_bit_logp() (entdec.c), which decodes a single binary
+ symbol, replacing both the ec_decode() and ec_dec_update() steps.
+ The context is described by a single parameter, logp, which is the (negative
+ of the) base-2 logarithm of the probability of a '1'.
+ It is mathematically equivalent to calling ec_decode() with
+ ft = (1<<logp), followed by ec_dec_update() with
+ fl = 0, fh = (1<<logp)-1, ft = (1<<logp) if the returned value
+ of fs is less than (1<<logp)-1 (a '0' was decoded), and with
+ fl = (1<<logp)-1, fh = ft = (1<<logp) otherwise (a '1' was
+ decoded), but it avoids all multiplications and divisions.
+</t>
+<t>
+ The last is ec_dec_icdf() (entdec.c), which decodes a single symbol with a
+ table-based context of up to 8 bits, also replacing both the ec_decode() and
+ ec_dec_update() steps, as well as the search for the decoded symbol
+ inbetween.
+ The context is described by two parameters, an icdf
+ (<spanx style="emph">inverse</spanx> cumulative distribution function)
+ table and ftb.
+ As with ec_decode_bin(), (1<<ftb) is equivalent to ft.
+ idcf[k], on the other hand, stores (1<<ftb)-fh for the kth symbol in
+ the context, which is equal to (1<<ftb)-fl for the (k+1)st symbol.
+ fl for the 0th symbol is assumed to be 0, and the table is terminated by a
+ value of 0 (where fh == ft).
+ It is mathematically equivalent to calling ec_decode() with
+ ft = (1<<ftb), using the returned value fs to search the table for the
+ first entry where (1<<ftb)-fs > icdf[k], and calling
+ ec_dec_update() with fl = (1<<ftb)-icdf[k-1] (or 0 if k == 0),
+ fh = (1<<ftb)-idcf[k], and ft = (1<<ftb).
+ Combining the search with the update allows the division to be replaced by a
+ series of multiplications (which are much cheaper), and using an inverse
+ CDF allows the representation of frequencies up to 256 in an 8-bit table
+ without any special cases.
+</t>
+</section>
+
+<section anchor="decoding-bits" title="Decoding Raw Bits">
+<t>
+ The CELT layer also allows directly encoding a series of
+ <spanx style="emph">raw</spanx> bits, outside
+ of the range coder, implemented in ec_dec_bits() (entdec.c).
+ This is both more efficient, and more robust to bit-errors, which will
+ desynchronize the range coder.
+ The raw bits are packed at the end of the packet, starting by storing the
+ least significant bit of the value to be packed in the least significant bit
+ of the last byte, filling up to the most significant bit in
+ the last byte, and the continuing in the least significant bit of the
+ penultimate byte, and so on.
+ Because the range decoder must read several bytes ahead in the stream, the
+ input consumed by raw bits MAY overlap with the input consumed by the range
+ coder, and a decoder MUST allow this.
+ The format should render it impossible to attempt to read more raw bits than
+ there are actual bits in the frame, though a decoder MAY wish to check for
+ this and report an error.
+</t>
+
+<section anchor="decoding-ints" title="Decoding Uniformly Distributed Integers">
+<t>
+ The ec_dec_uint() function is based on ec_decode() and decodes one of N
+ equiprobable symbols, each with a frequency of 1, where N may be as large as
+ 2^32-1.
+ Because ec_decode() is limited to a total frequency of 2^16-1, this is done
+ by decoding a series of symbols in smaller contexts.
+</t>
+
+<t>
ec_dec_uint() (entdec.c) takes a single parameter,
ft, which is not necessarily a power of two, and returns an integer,
t, with a value between 0 and ft-1, inclusive, which is initialized to zero. Let
@@ -400,9 +481,13 @@
representation of (ft-1), or -1 if no bits are set. If ftb>8, then
the top 8 bits of t are decoded using t = ec_decode((ft-1>>ftb-8)+1),
the decoder state is updated with the three-tuple
- (s,s+1,(ft-1>>ftb-8)+1), and the remaining bits are decoded with
+ (t,t+1,(ft-1>>ftb-8)+1), and the remaining bits are decoded with
t = t<<ftb-8|ec_dec_bits(ftb-8). If, at this point, t >= ft, then
- the current frame is corrupt, and decoding should stop. If the
+ the current frame is corrupt.
+ In that case, the decoder should assume there has been an error in the
+ coding, decoding, or transmission and SHOULD take measures to conceal the
+ error and/or report to the application that a problem has occurred.
+ If the
original value of ftb was not greater than 8, then t is decoded with
t = ec_decode(ft), and the decoder state is updated with the
three-tuple (t,t+1,ft).
@@ -413,13 +498,14 @@
<t>
The bit allocation routines in CELT need to be able to determine a
conservative upper bound on the number of bits that have been used
- to decode from the current frame thus far. This drives allocation
+ from the current frame thus far. This drives allocation
decisions which must match those made in the encoder. This is
- computed in the reference implementation to fractional bit precision
- by the function ec_dec_tell() (rangedec.c). Like all
- operations in the range decoder, it must be implemented in a
+ computed in the reference implementation to whole-bit precision by
+ the function ec_tell() (entcode.h) and to fractional 1/8th bit
+ precision by the function ec_tell_frac() (entcode.c).
+ Like all operations in the range coder, it must be implemented in a
bit-exact manner, and must produce exactly the same value returned by
- ec_enc_tell() after encoding the same symbols.
+ the same functions in the encoder after encoding the same symbols.
</t>
</section>
@@ -467,7 +553,7 @@
<section title='Decode Parameters'>
<t>
- Pulses and gains are decoded from the parameters that was decoded by the range decoder.
+ Pulses and gains are decoded from the parameters that were decoded by the range decoder.
</t>
<t>
@@ -692,13 +778,11 @@
and can be fairly inefficient. As a result three explicitly signaled
mechanisms are provided to alter the implicit allocation:</t>
-<t>
<list style="symbols">
<t>Band boost</t>
<t>Allocation trim</t>
<t>band skipping</t>
</list>
-</t>
<t>The first of these mechanisms, band boost, allows an encoder to boost
the allocation in specific bands. The second, allocation trim, works by
@@ -726,7 +810,7 @@
maximum allocation vector, decoding the boosts, decoding the tilt, determining
the remaining capacity the frame, searching the mode table for the
entry nearest but not exceeding the available space (subject to the tilt, boosts, band
-maxima, and band minima), linear interpolation, reallocation of
+maximums, and band minimums), linear interpolation, reallocation of
unused bits with concurrent skip decoding, determination of the
fine-energy vs shape split, and final reallocation. This process results
in an shape allocation per-band (in 1/8th bit units), a per-band fine-energy
@@ -741,14 +825,14 @@
to entropy coding of splitting parameters). Setting the maximum too low reduces the
maximum achievable quality in a band while setting it too high
may result in waste: bit-stream capacity available at the end
-of the frame which can not be put to any use. The maxima
+of the frame which can not be put to any use. The maximums
specified by the codec reflect the average maximum. In the reference
-the maxima are provided partially computed form, in order to fit in less
+the maximums are provided partially computed form, in order to fit in less
memory, as a static table (XXX cache.caps). Implementations are expected
to simply use the same table data but the procedure for generating
this table is included in rate.c as part of compute_pulse_cache().</t>
-<t>To convert the values in cache.caps into the actual maxima: First
+<t>To convert the values in cache.caps into the actual maximums: First
set nbBands to the maximum number of bands for this mode and stereo to
zero if stereo is not in use and one otherwise. For each band assign N
to the number of MDCT bins covered by the band (for one channel), set LM
@@ -852,7 +936,6 @@
</section>
-
<section anchor="PVQ-decoder" title="Shape Decoder">
<t>
In each band, the normalized <spanx style="emph">shape</spanx> is encoded
@@ -1063,19 +1146,10 @@
<section anchor="range-encoder" title="Range Coder">
<t>
-Opus uses an entropy coder based upon <xref target="range-coding"></xref>,
-which is itself a rediscovery of the FIFO arithmetic code introduced by <xref target="coding-thesis"></xref>.
-It is very similar to arithmetic encoding, except that encoding is done with
-digits in any base instead of with bits,
-so it is faster when using larger bases (i.e.: an octet). All of the
-calculations in the range coder must use bit-exact integer arithmetic.
-</t>
-
-<t>
The range coder also acts as the bit-packer for Opus. It is
used in three different ways, to encode:
<list style="symbols">
-<t>entropy-coded symbols with a fixed probability model using ec_encode(), (rangeenc.c)</t>
+<t>entropy-coded symbols with a fixed probability model using ec_encode(), (entenc.c)</t>
<t>integers from 0 to 2^M-1 using ec_enc_uint() or ec_enc_bits(), (entenc.c)</t>
<t>integers from 0 to N-1 (where N is not a power of two) using ec_enc_uint(). (entenc.c)</t>
</list>
@@ -1089,20 +1163,15 @@
and low are 32-bit unsigned integer values, rem is an octet value or
the special value -1, and ext is an integer with at least 16 bits.
This state vector is initialized at the start of each each frame to
-the value (0,2^31,-1,0).
+the value (0,2^31,-1,0). The reference implementation re-uses the
+'val' field of the entropy coder structure to hold low, in order to
+allow the same structure to be used for encoding and decoding, but
+we maintain the distinction here for clarity.
</t>
-<t>
-Each symbol is drawn from a finite alphabet and coded in a separate
-context which describes the size of the alphabet and the relative
-frequency of each symbol in that alphabet. Opus only uses static
-contexts; they are not adapted to the statistics of the data that is
-coded.
-</t>
-
<section anchor="encoding-symbols" title="Encoding Symbols">
<t>
- The main encoding function is ec_encode() (rangeenc.c),
+ The main encoding function is ec_encode() (entenc.c),
which takes as an argument a three-tuple (fl,fh,ft)
describing the range of the symbol to be encoded in the current
context, with 0 <= fl < fh <= ft <= 65535. The values of this tuple
@@ -1122,10 +1191,10 @@
</t>
<t>
To normalize the range, the following process is repeated until
- rng > 2^23. First, the top 9 bits of low, (low>>23), are placed into
+ rng > 2^23. First, the top 9 bits of low, (low>>23), are placed into
a carry buffer. Then, low is set to <![CDATA[(low << 8 & 0x7FFFFFFF) and rng
is set to (rng<<8)]]>. This process is carried out by
- ec_enc_normalize() (rangeenc.c).
+ ec_enc_normalize() (entenc.c).
</t>
<t>
The 9 bits produced in each iteration of the normalization loop
@@ -1137,9 +1206,9 @@
any mathematically equivalent scheme to perform carry propagation.
</t>
<t>
- The function ec_enc_carry_out() (rangeenc.c) performs
- this buffering. It takes a 9-bit input value, c, from the normalization
- 8-bit output and a carry bit. If c is 0xFF, then ext is incremented
+ The function ec_enc_carry_out() (entenc.c) performs
+ this buffering. It takes a 9-bit input value, c, from the normalization:
+ 8 bits of output and a carry bit. If c is 0xFF, then ext is incremented
and no octets are output. Otherwise, if rem is not the special value
-1, then the octet (rem+(c>>8)) is output. Then ext octets are output
with the value 0 if the carry bit is set, or 0xFF if it is not, and
@@ -1147,7 +1216,7 @@
</t>
<t>
In the reference implementation, a special version of ec_encode()
- called ec_encode_bin() (rangeenc.c) is defined to
+ called ec_encode_bin() (entenc.c) is defined to
take a two-tuple (fl,ftb), where <![CDATA[0 <= fl < 2^ftb and ftb < 16. It is
mathematically equivalent to calling ec_encode() with the three-tuple
(fl,fl+1,1<<ftb)]]>, but avoids using division.
@@ -1155,21 +1224,27 @@
</t>
</section>
-<section anchor="encoding-ints" title="Encoding Uniformly Distributed Integers">
+<section anchor="encoding-bits" title="Encoding Raw Bits">
<t>
- Functions ec_enc_uint() or ec_enc_bits() are based on ec_encode() and
- encode one of N equiprobable symbols, each with a frequency of 1,
- where N may be as large as 2^32-1. Because ec_encode() is limited to
- a total frequency of 2^16-1, this is done by encoding a series of
- symbols in smaller contexts.
+ The CELT layer also allows directly encoding a series of raw bits, outside
+ of the range coder, implemented in ec_enc_bits() (entenc.c).
+ The raw bits are packed at the end of the packet, starting by storing the
+ least significant bit of the value to be packed in the least significant bit
+ of the last byte, filling up to the most significant bit in
+ the last byte, and the continuing in the least significant bit of the
+ penultimate byte, and so on.
+ This packing may continue into the last byte output by the range coder,
+ though the format should render it impossible to overwrite any set bit
+ produced by the range coder when the procedure in
+ <xref target='encoder-finalzing'/> is followed to finalize the stream.
</t>
+
+<section anchor="encoding-ints" title="Encoding Uniformly Distributed Integers">
<t>
- ec_enc_bits() (entenc.c) is defined, like
- ec_encode_bin(), to take a two-tuple (fl,ftb), with <![CDATA[0 <= fl < 2^ftb
- and ftb < 32. While ftb is greater than 8, it encodes bits (ftb-8) to
- (ftb-1) of fl, e.g., (fl>>ftb-8&0xFF) using ec_encode_bin() and
- subtracts 8 from ftb. Then, it encodes the remaining bits of fl, e.g.,
- (fl&(1<<ftb)-1)]]>, again using ec_encode_bin().
+ The function ec_enc_uint() is based on ec_encode() and encodes one of N
+ equiprobable symbols, each with a frequency of 1, where N may be as large as
+ 2^32-1. Because ec_encode() is limited to a total frequency of 2^16-1, this
+ is done by encoding a series of symbols in smaller contexts.
</t>
<t>
ec_enc_uint() (entenc.c) takes a two-tuple (fl,ft),
@@ -1178,9 +1253,8 @@
(ft-1), or -1 if no bits are set. If ftb>8, then the top 8 bits of fl
are encoded using ec_encode() with the three-tuple
(fl>>ftb-8,(fl>>ftb-8)+1,(ft-1>>ftb-8)+1), and the remaining bits
- are encoded with ec_enc_bits using the two-tuple
- <![CDATA[(fl&(1<<ftb-8)-1,ftb-8). Otherwise, fl is encoded with ec_encode()
- directly using the three-tuple (fl,fl+1,ft)]]>.
+ are encoded as raw bits. Otherwise, fl is encoded with ec_encode() directly
+ using the three-tuple (fl,fl+1,ft).
</t>
</section>
@@ -1189,15 +1263,17 @@
After all symbols are encoded, the stream must be finalized by
outputting a value inside the current range. Let end be the integer
in the interval [low,low+rng) with the largest number of trailing
- zero bits. Then while end is not zero, the top 9 bits of end, e.g.,
+ zero bits, b, such that end+(1<<b)-1 is also in the interval
+ [low,low+rng). Then while end is not zero, the top 9 bits of end, e.g.,
<![CDATA[(end>>23), are sent to the carry buffer, and end is replaced by
(end<<8&0x7FFFFFFF). Finally, if the value in carry buffer, rem, is]]>
neither zero nor the special value -1, or the carry count, ext, is
greater than zero, then 9 zero bits are sent to the carry buffer.
After the carry buffer is finished outputting octets, the rest of the
- output buffer is padded with zero octets. Finally, rem is set to the
+ output buffer (if any) is padded with zero bits, until it reaches the raw
+ bits. Finally, rem is set to the
special value -1. This process is implemented by ec_enc_done()
- (rangeenc.c).
+ (entenc.c).
</t>
</section>
@@ -1206,12 +1282,14 @@
The bit allocation routines in Opus need to be able to determine a
conservative upper bound on the number of bits that have been used
to encode the current frame thus far. This drives allocation
- decisions and ensures that the range code will not overflow the
- output buffer. This is computed in the reference implementation to
- fractional bit precision by the function ec_enc_tell()
- (rangeenc.c).
- Like all operations in the range encoder, it must
- be implemented in a bit-exact manner.
+ decisions and ensures that the range coder and raw bits will not
+ overflow the output buffer. This is computed in the
+ reference implementation to whole-bit precision by
+ the function ec_tell() (entcode.h) and to fractional 1/8th bit
+ precision by the function ec_tell_frac() (entcode.c).
+ Like all operations in the range coder, it must be implemented in a
+ bit-exact manner, and must produce exactly the same value returned by
+ the same functions in the decoder after decoding the same symbols.
</t>
</section>