shithub: opus

--- a/doc/build_draft.sh

+++ b/doc/build_draft.sh

@@ -35,7 +35,7 @@

 tar czf opus_source.tar.gz "${destdir}"

 echo building base64 version

-cat opus_source.tar.gz| base64 | fold -w 66 | sed 's/^/###/' > opus_source.base64

+cat opus_source.tar.gz| base64 | tr -d '\n' | fold -w 64 | sed 's/^/###/' > opus_source.base64

 #echo '<figure>' > opus_compare_escaped.c

 #echo '<artwork>' >> opus_compare_escaped.c

--- a/doc/draft-ietf-codec-opus.xml

+++ b/doc/draft-ietf-codec-opus.xml

@@ -71,7 +71,9 @@

 <section anchor="introduction" title="Introduction">

<t>

-The Opus codec is a real-time interactive audio codec composed of a linear

+The Opus codec is a real-time interactive audio codec designed to meet the requirements

+described in <xref target="requirements"></xref>.

+It is composed of a linear

  prediction (LP)-based layer and a Modified Discrete Cosine Transform

  (MDCT)-based layer.

 The main idea behind using two layers is that in speech, linear prediction

@@ -4237,16 +4239,19 @@

 </t>

 </section>

-<section anchor="cwrs-decoder" title="Index Decoding">

+<section anchor="cwrs-decoder" title="PVQ Decoding">

<t>

-The codeword is decoded as a uniformly-distributed integer value

-by decode_pulses() (cwrs.c).

-The codeword is converted from a unique index in the same way specified in

+Decoding of PVQ vectors is implemented in decode_pulses() (cwrs.c).

+The unique codeword index is decoded as a uniformly-distributed integer value between 0 and

+V(N,K)-1, where V(N,K) is the number of possible combinations of K pulses in

+N samples. The index is then converted to a vector in the same way specified in

 <xref target="PVQ"></xref>. The indexing is based on the calculation of V(N,K)

-(denoted N(L,K) in <xref target="PVQ"></xref>), which is the number of possible

-combinations of K pulses

-in N samples. The number of combinations can be computed recursively as

+(denoted N(L,K) in <xref target="PVQ"></xref>).

+</t>

+<t>

+ The number of combinations can be computed recursively as

 V(N,K) = V(N-1,K) + V(N,K-1) + V(N-1,K-1), with V(N,0) = 1 and V(0,K) = 0, K != 0.

 There are many different ways to compute V(N,K), including precomputed tables and direct

 use of the recursive formulation. The reference implementation applies the recursive

@@ -4260,9 +4265,7 @@

 </t>

<t>

-The decoding of the codeword from the index is performed as specified in

-<xref target="PVQ"></xref>, as implemented in function

-decode_pulses() (cwrs.c). The decoded codeword is then normalised such that it's

+The decoded vector is normalised such that its

 L2-norm equals one.

 </t>

 </section>

@@ -4316,6 +4319,9 @@

<t>

 If the decoded vector represents more

 than one time block, then the following process is applied separately on each time block.

+Also, if each block represents 8 samples or more, then another N-D rotation, by

+(pi/2-theta), is applied <spanx style="emph">before</spanx> the rotation described above. This

+extra rotation is applied in an interleaved manner with a stride equal to round(sqrt(N/nb_blocks))

 </t>

 </section>

@@ -5388,6 +5394,25 @@

 <references title="Informative References">

+<reference anchor='requirements'>

+<front>

+<title>Requirements for an Internet Audio Codec</title>

+<author initials='J.-M.' surname='Valin' fullname='J.-M. Valin'>

+<organization /></author>

+<author initials='K.' surname='Vos' fullname='K. Vos'>

+<organization /></author>

+<author>

+<organization>IETF</organization></author>

+<date year='2011' month='August' />

+<abstract>

+<t>This document provides specific requirements for an Internet audio

+   codec.  These requirements address quality, sampling rate, bit-rate,

+   and packet-loss robustness, as well as other desirable properties.

+</t></abstract></front>

+<seriesInfo name='RFC' value='6366' />

+<format type='TXT' target='http://tools.ietf.org/rfc/rfc6366.txt' />

+</reference>

 <reference anchor='SILK'>

 <front>

 <title>SILK Speech Codec</title>

@@ -5423,25 +5448,6 @@

         <seriesInfo name="ICASSP-1991, Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 641-644, October" value="1991"/>

       </reference>

-      <reference anchor="sinervo-norsig">

-        <front>

-          <title abbrev="SVQ versus MSVQ">Evaluation of Split and Multistage Techniques in LSF Quantization</title>

-          <author initials="U.S." surname="Sinervo" fullname="Ulpu Sinervo">

-            <organization/>

-          </author>

-          <author initials="J.N." surname="Nurminen" fullname="Jani Nurminen">

-            <organization/>

-          </author>

-          <author initials="A.H." surname="Heikkinen" fullname="Ari Heikkinen">

-            <organization/>

-          </author>

-          <author initials="J.S." surname="Saarinen" fullname="Jukka Saarinen">

-            <organization/>

-          </author>

-        </front>

-        <seriesInfo name="NORSIG-2001, Norsk symposium i signalbehandling, Trondheim, Norge, October" value="2001"/>

-      </reference>

       <reference anchor="leblanc-tsap">

         <front>

           <title>Efficient Search and Design Procedures for Robust Multi-Stage VQ of LPC Parameters for 4&nbsp;kb/s Speech Coding</title>

@@ -5592,14 +5598,6 @@

 </section>

 </section>

-<!--

-<section anchor="opus-compare" title="opus_compare.c">

-<t>

-<?rfc include="opus_compare_escaped.c"?>

-</t>

-</section>

- -->

 <section anchor="self-delimiting-framing" title="Self-Delimiting Framing">

<t>