ref: 1e0b6fd9f7b4dfeec36ea57c1d682f9211250f64
parent: 998e9e00fd01e0eff8cca0e752c48f2286804c13
author: Ralph Giles <[email protected]>
date: Tue Jan 14 12:23:00 EST 2014
Rewrite gap filling section. Incorporate list feedback from Mark Harris, Tim and Jean-Marc and try to improve clarity.
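
Reviewer's sketch, not part of the patch: the 95 ms example added below can be checked with a few lines of Python. This is only an illustration under stated assumptions. The helper names (toc_byte, plc_packet, packet_duration_us, ogg_lacing_bytes) are hypothetical and come from neither the draft nor libopus, and the example assumes mono, wideband audio, which the draft text does not specify. The TOC layout and configuration numbers follow Section 3.1 of RFC 6716.

```python
# Hypothetical helpers for building zero-length "PLC request" Opus packets.
# Assumptions: mono, wideband; only code 0 and CBR code 3 packets are built.

def frame_duration_us(config):
    """Frame duration for a TOC config number (RFC 6716, Section 3.1)."""
    if config < 12:                      # SILK-only: 10, 20, 40, 60 ms
        return (10000, 20000, 40000, 60000)[config & 3]
    if config < 16:                      # Hybrid: 10, 20 ms
        return (10000, 20000)[config & 1]
    return (2500, 5000, 10000, 20000)[config & 3]   # CELT-only

def toc_byte(config, stereo, code):
    """Pack config (5 bits), stereo flag (1 bit), and frame count code (2 bits)."""
    return (config << 3) | (int(stereo) << 2) | code

def plc_packet(config, stereo, frame_count):
    """Build a packet holding frame_count zero-byte frames."""
    if frame_count < 1 or frame_count * frame_duration_us(config) > 120000:
        raise ValueError("a packet carries between one frame and 120 ms of audio")
    if frame_count == 1:
        return bytes([toc_byte(config, stereo, 0)])          # code 0: TOC only
    # Code 3, CBR (v=0), no padding (p=0): second byte is the frame count M.
    return bytes([toc_byte(config, stereo, 3), frame_count])

def packet_duration_us(packet):
    """Duration of a packet built by plc_packet (codes 0 and 3 only)."""
    code = packet[0] & 3
    if code == 0:
        frames = 1
    elif code == 3:
        frames = packet[1] & 0x3F
    else:
        raise ValueError("codes 1 and 2 are not produced by this sketch")
    return frames * frame_duration_us(packet[0] >> 3)

def ogg_lacing_bytes(packet):
    """Lacing values needed to end a packet of this length in an Ogg page."""
    return len(packet) // 255 + 1

SILK_WB_10MS, SILK_WB_20MS, CELT_WB_5MS = 8, 9, 21   # config numbers

# Four 20 ms SILK frames, one 10 ms SILK frame, one 5 ms CELT frame.
gap = [
    plc_packet(SILK_WB_20MS, False, 4),
    plc_packet(SILK_WB_10MS, False, 1),
    plc_packet(CELT_WB_5MS, False, 1),
]
total_us = sum(packet_duration_us(p) for p in gap)
total_bytes = sum(len(p) + ogg_lacing_bytes(p) for p in gap)
bitrate_bps = total_bytes * 8 * 1_000_000 // total_us
```

Under these assumptions the three packets cover 95 ms with four payload bytes and three lacing bytes, which is consistent with the roughly 0.6 kbps figure quoted in the new text. The single-packet alternative, plc_packet(CELT_WB_5MS, False, 19), needs two payload bytes and one lacing byte.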
--- a/doc/draft-ietf-codec-oggopus.xml
+++ b/doc/draft-ietf-codec-oggopus.xml
@@ -249,16 +249,17 @@
<section anchor="gap-repair" title="Repairing Gaps in Real-time Streams">
<t>
-In order to support capturing a real-time stream that has lost packets, or that
- uses discontinuous transmission (DTX), a muxer SHOULD emit packets that
- explicitly request the use of Packet Loss Concealment (PLC) in place of the
- packets that were not transmitted.
+In order to support capturing a real-time stream that has lost or not
+ transmitted packets, a muxer SHOULD emit packets that explicitly request the
+ use of Packet Loss Concealment (PLC) in place of the missing packets.
Only gaps that are a multiple of 2.5 ms are repairable, as these are the
- only durations that can be created by packet loss or DTX.
+ only durations that can be created by packet loss or discontinuous
+ transmission.
Muxers need not handle other gap sizes.
Creating the necessary packets involves synthesizing a TOC byte (defined in
- Section 3.1 of <xref target="RFC6716"/>)---and whatever additional
- internal framing is needed---to indicate the packet duration for each stream.
+Section 3.1 of <xref target="RFC6716"/>)—and whatever
+ additional internal framing is needed—to indicate the packet duration
+ for each stream.
The actual length of each missing Opus frame inside the packet is zero bytes,
as defined in Section 3.2.1 of <xref target="RFC6716"/>.
</t>
@@ -267,17 +268,11 @@
<xref target="RFC6716"/> does not impose any requirements on the PLC, but this
section outlines choices that are expected to have a positive influence on
most PLC implementations, including the reference implementation.
-When possible, creating the TOC byte using the same mode, audio bandwidth,
- channel count, and frame size as the previous packet (if any) covers all
- losses that do not include a configuration switch, as defined in
- Section 4.5 of <xref target="RFC6716"/>.
+Where possible, synthesized TOC bytes MAY use the same mode, audio bandwidth,
+ channel count, and frame size as the previous packet (if any).
This is the simplest and usually the most well-tested case for the PLC to
- handle.
-If there is no previous packet, reasonable decoders will not emit anything
- other than silence regardless of the mode.
-Using the CELT-only mode for this case (with any audio bandwidth) allows
- maximum flexibility, since a single packet can represent any duration up to
- 120 ms that is a multiple of 2.5 ms using at most two bytes.
+ handle, and it covers all losses that do not include a configuration switch,
+ as defined in Section 4.5 of <xref target="RFC6716"/>.
</t>
<t>
@@ -286,11 +281,14 @@
data it generates.
However, if the size of the gap is not a multiple of the most recent frame
size, then the frame size will have to change for at least some frames.
-Delaying such changes as long as possible to simplifies things for PLC
+Delaying such changes as long as possible simplifies things for PLC
implementations.
-A 95 ms gap could be encoded as 19 5 ms frames in two bytes
- with a single CBR code 3 packet.
-If the previous frame size was 20 ms, using four 80 ms frames,
+</t>
+
+<t>
+As an example, a 95 ms gap could be encoded as nineteen 5 ms frames
+ in two bytes with a single CBR code 3 packet.
+If the previous frame size was 20 ms, using four 20 ms frames
followed by three 5 ms frames requires 4 bytes (plus an extra byte
of Ogg lacing overhead), but allows the PLC to use its well-tested steady
state behavior for as long as possible.
@@ -305,6 +303,19 @@
10 ms.
If switching to CELT mode is needed to match the gap size, doing so at the end
of the gap allows the PLC to function for as long as possible.
+Thus in the above example, if the previous frame was a 20 ms SILK mode
+ frame, a better solution would be to synthesize a packet describing four
+ 20 ms SILK frames, followed by a packet with a single 10 ms SILK
+ frame, and finally a packet with a 5 ms CELT frame, to fill the 95 ms
+ gap.
+This also requires four bytes to describe the synthesized packet data (two
+ bytes for a CBR code 3 and one byte each for two code 0 packets), but needs
+ three bytes of Ogg lacing overhead, one per packet, to mark the boundaries.
+At about 0.6 kbps, this is still a minimal bitrate impact over a naive,
+ low-quality solution.
+</t>
+
+<t>
Since CELT does not support medium-band audio, using wideband when switching
from medium-band SILK ensures that any PLC implementation that does try to
migrate state between the modes will not be forced to artificially reduce the