ref: 8161f9e9690fdf59a32a0aa2a5eda7338099169d
parent: db5f38ef65560319ed530fc88b2618511c2c2d67
author: Jean-Marc Valin <[email protected]>
date: Mon May 11 12:34:58 EDT 2009
Version -01 of the RTP draft
--- a/doc/ietf/draft-valin-celt-rtp-profile-00.txt
+++ /dev/null
@@ -1,1176 +1,0 @@
-
-
-
-AVT Working Group J-M. Valin
-Internet-Draft Octasic Semiconductor
-Expires: November 9, 2009 G. Maxwell
- Juniper Networks
- May 8, 2009
-
-
- draft-valin-celt-rtp-profile-00
- RTP Payload Format for the CELT Codec
-
-Status of this Memo
-
- This Internet-Draft is submitted to IETF in full conformance with the
- provisions of BCP 78 and BCP 79.
-
- Internet-Drafts are working documents of the Internet Engineering
- Task Force (IETF), its areas, and its working groups. Note that
- other groups may also distribute working documents as Internet-
- Drafts.
-
- Internet-Drafts are draft documents valid for a maximum of six months
- and may be updated, replaced, or obsoleted by other documents at any
- time. It is inappropriate to use Internet-Drafts as reference
- material or to cite them other than as "work in progress."
-
- The list of current Internet-Drafts can be accessed at
- http://www.ietf.org/ietf/1id-abstracts.txt.
-
- The list of Internet-Draft Shadow Directories can be accessed at
- http://www.ietf.org/shadow.html.
-
- This Internet-Draft will expire on November 9, 2009.
-
-Copyright Notice
-
- Copyright (c) 2009 IETF Trust and the persons identified as the
- document authors. All rights reserved.
-
- This document is subject to BCP 78 and the IETF Trust's Legal
- Provisions Relating to IETF Documents in effect on the date of
- publication of this document (http://trustee.ietf.org/license-info).
- Please review these documents carefully, as they describe your rights
- and restrictions with respect to this document.
-
-
-
-
-
-
-
-
-Valin & Maxwell Expires November 9, 2009 [Page 1]
-
-Internet-Draft draft-valin-celt-rtp-profile-00 May 2009
-
-
-Abstract
-
- CELT is an open-source voice codec suitable for use in very low delay
- audio communication applications, including Voice over IP (VoIP).
- This document describes the payload format for CELT generated bit
- streams within an RTP packet. Also included here are the necessary
- details for the use of CELT with the Session Description Protocol
- (SDP). At the time of this writing, the CELT bit-stream has NOT been
- finalized yet, and compatibility is usually broken with every new
- release of the codec.
-
-
-Table of Contents
-
- 1. Conventions used in this document . . . . . . . . . . . . . . 3
- 2. Overview of the CELT Codec . . . . . . . . . . . . . . . . . . 4
- 3. RTP payload format for CELT . . . . . . . . . . . . . . . . . 5
- 3.1. RTP Header . . . . . . . . . . . . . . . . . . . . . . . . 5
- 3.2. CELT payload . . . . . . . . . . . . . . . . . . . . . . . 6
- 3.3. Multiple CELT frames in a RTP packet . . . . . . . . . . . 7
- 3.4. Multiple channels . . . . . . . . . . . . . . . . . . . . 8
- 4. MIME registration of CELT . . . . . . . . . . . . . . . . . . 10
- 5. SDP usage of CELT . . . . . . . . . . . . . . . . . . . . . . 12
- 5.1. Multichannel Mapping . . . . . . . . . . . . . . . . . . . 14
- 5.2. Low-Overhead Mode . . . . . . . . . . . . . . . . . . . . 15
- 6. Congestion Control . . . . . . . . . . . . . . . . . . . . . . 17
- 7. Security Considerations . . . . . . . . . . . . . . . . . . . 18
- 8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 19
- 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 20
- 9.1. Normative References . . . . . . . . . . . . . . . . . . . 20
- 9.2. Informative References . . . . . . . . . . . . . . . . . . 20
- Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 21
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Valin & Maxwell Expires November 9, 2009 [Page 2]
-
-Internet-Draft draft-valin-celt-rtp-profile-00 May 2009
-
-
-1. Conventions used in this document
-
- The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
- "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
- document are to be interpreted as described in RFC 2119 [rfc2119].
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Valin & Maxwell Expires November 9, 2009 [Page 3]
-
-Internet-Draft draft-valin-celt-rtp-profile-00 May 2009
-
-
-2. Overview of the CELT Codec
-
- CELT stands for "Constrained Energy Lapped Transform". It applies
- some of the CELP principles, but does everything in the frequency
- domain, which removes some of the limitations of CELP. CELT is
- suitable for both speech and music and currently features:
-
- o Ultra-low algorithmic delay (as low as 2 ms)
-
- o Full audio bandwidth (up to 20 kHz audio bandwidth)
-
- o Support for both voice and music
-
- o Stereo support
-
- o Packet loss concealment
-
- o Constant bitrates from under 32 kbps to 128 kbps and above
-
- o Free software/open-source
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Valin & Maxwell Expires November 9, 2009 [Page 4]
-
-Internet-Draft draft-valin-celt-rtp-profile-00 May 2009
-
-
-3. RTP payload format for CELT
-
- For RTP based transportation of CELT encoded audio the standard RTP
- header [rfc3550] is followed by one or more payload data blocks. An
- optional padding terminator may also be used.
-
-
- 0 1 2 3
- 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | RTP Header |
- +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
- | one or more frames of CELT .... |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | .... |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-
-3.1. RTP Header
-
-
- 0 1 2 3
- 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- |V=2|P|X| CC |M| PT | sequence number |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | timestamp |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | synchronization source (SSRC) identifier |
- +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
- | contributing source (CSRC) identifiers |
- | ... |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-
- The RTP header is defined in the RTP specification [rfc3550]. This
- section defines how fields in the RTP header are used.
-
- Padding (P): 1 bit
-
- If the padding bit is set, the packet contains one or more additional
- padding octets at the end which are not part of the payload. The
- last octet of the padding contains a count of how many padding octets
- should be ignored, including itself. Padding may be needed by some
- encryption algorithms with fixed block sizes or for carrying several
- RTP packets in a lower-layer protocol data unit.
-
- Extension (X): 1 bit
-
- If the extension, X, bit is set, the fixed header MUST be followed by
-
-
-
-Valin & Maxwell Expires November 9, 2009 [Page 5]
-
-Internet-Draft draft-valin-celt-rtp-profile-00 May 2009
-
-
- exactly one header extension, with a format defined in Section 5.3.1.
- of [rfc3550].
-
- Marker (M): 1 bit
-
- The M bit MUST be set to zero in all packets. The receiver MUST
- ignore the M bit.
-
- Payload Type (PT): 7 bits
-
- Payload Type (PT): The assignment of an RTP payload type for this
- packet format is outside the scope of this document; it is specified
- by the RTP profile under which this payload format is used, or
- signaled dynamically out-of-band (e.g., using SDP).
-
- Timestamp: 32 bits
-
- A timestamp representing the sampling time of the first sample of the
- first CELT frame in the RTP payload. The clock frequency MUST be set
- to the sample rate of the encoded audio data and is conveyed out-of-
- band (e.g., as an SDP parameter).
-
-3.2. CELT payload
-
- For the purposes of packetizing the bit stream in RTP, it is only
- necessary to consider the sequence of bits as output by the CELT
- encoder [celt-website], and present the same sequence to the decoder.
- The payload format described here maintains this sequence.
-
- A typical CELT frame, encoded at a high bitrate, is approx. 128
- octets and the total size of the CELT frames SHOULD be kept below the
- path MTU to prevent fragmentation. CELT frames MUST NOT be split
- across multiple RTP packets,
-
- An RTP packet MAY contain CELT frames of the same bit rate or of
- varying bit rates, since the bitrate for the frames is explicitly
- conveyed in band with the signal. The encoding and decoding
- algorithm can change the bit rate at any frame boundary, with the bit
- rate change notification provided in-band. No out-of-band
- notification is required for the decoder to process changes in the
- bit rate sent by the encoder.
-
- It is RECOMMENDED that sampling rates 32000, 44100, or 48000 Hz be
- used for most applications, unless a specific reason exists -- such
- as requirements for a very specific packetization time. For example,
- 51200 Hz sampling may be useful to obtain a 5 ms packetization time
- with 256-sample frames. For compatibility reasons, the sender and
- receiver MUST support 48000 Hz sampling rate.
-
-
-
-Valin & Maxwell Expires November 9, 2009 [Page 6]
-
-Internet-Draft draft-valin-celt-rtp-profile-00 May 2009
-
-
- The CELT codec always produces an integer number of bytes and can
- produce any integer number of bytes, so no padding is ever required.
- Bitrate adjustment SHOULD be used instead of padding.
-
-3.3. Multiple CELT frames in a RTP packet
-
- The bitrate used by CELT is implicitly determined by the size of the
- compressed data. When more than one frame is encoded in the same
- packet, it is not possible to determine the size of each encoded
- frame, so the information MUST be explicitly encoded. If N frames
- are present in a packet, N compressed frame sizes need to be encoded
- at the beginning of the packet. Each size that is less than 255
- bytes is encoded in one byte (unsigned 8-bit integer). For sizes
- greater or equal to 255, a 0xff byte is encoded, followed by the
- size-255. Multiple 0xff bytes are allowed if there are more than 510
- bytes transmitted. The length is always the size of the CELT frame
- excluding the length byte itself. The payload MUST NOT be padded,
- except in accordance with the padding bit definition in the RTP
- header.
-
- Below is an example of two CELT frames contained within one RTP
- packet.
-
-
- 0 1 2 3
- 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- |V=2|P|X| CC |M| PT | sequence number |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | timestamp |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | synchronization source (SSRC) identifier |
- +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
- | contributing source (CSRC) identifiers |
- | ... |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | length frame 1| length frame 2| CELT frame 1... |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | (frame 1) | CELT frame 2... |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | (frame 2) |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-
- The following is an example of C code that interprets the length
- bytes:
-
-
-
-
-
-Valin & Maxwell Expires November 9, 2009 [Page 7]
-
-Internet-Draft draft-valin-celt-rtp-profile-00 May 2009
-
-
- int i, N, pos;
- int sizes[MAX_FRAMES][channels];
- unsigned int total_size;
- total_size=0;
- N = 0;
- pos = 0;
- while (total_size < payload_size) {
- for (i=0;i<channels;i++) {
- int s;
- int sum;
- sum = 0;
- do {
- s = payload[pos++];
- sum += s;
- total_size += s+1;
- } while (s == 255);
- sizes[N][i] = sum;
- }
- N++;
- }
-
-3.4. Multiple channels
-
- CELT supports both mono streams and stereo streams. If more than two
- channels are desired, it is possible to use transmit multiple streams
- in the same packet. In this case, the number of streams S and the
- pairing must be agreed with out-of-band negotiation such as SDP.
- Each stream can be either mono or stereo, depending on whether the
- channels are assumed to be correlated. For example, a 5.1 surround
- could have the front-left and front-right channels in a stereo
- stream, the rear-left and rear-right channels in a separate stereo
- stream, while the center and low-frequency channels would be in
- separate mono streams. In that example, the RTP packet would be:
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Valin & Maxwell Expires November 9, 2009 [Page 8]
-
-Internet-Draft draft-valin-celt-rtp-profile-00 May 2009
-
-
- 0 1 2 3
- 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- |V=2|P|X| CC |M| PT | sequence number |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | timestamp |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | synchronization source (SSRC) identifier |
- +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
- | contributing source (CSRC) identifiers |
- | ... |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | Front length | rear length | center length | LFE length |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | Front stereo |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | ... |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | ... | Rear stereo data... |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | ... |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | Center mono data... |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | ... |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | | LFE mono data... |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | ... |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-
- In the case where streams for multiple channels are used with
- multiple frames of the same streams per packet, then all streams for
- a certain timestamp are encoded before all streams for the following
- timestamp. In the case of the 5.1 example above with two frames per
- packet, the number of compressed length fields would be S*N = 8.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Valin & Maxwell Expires November 9, 2009 [Page 9]
-
-Internet-Draft draft-valin-celt-rtp-profile-00 May 2009
-
-
-4. MIME registration of CELT
-
- Full definition of the MIME [rfc2045] type for CELT will be part of
- the Ogg Vorbis MIME type definition application [rfc3534].
-
- MIME media type name: audio
-
- MIME subtype: celt
-
- Optional parameters:
-
- Required parameters: to be included in the Ogg MIME specification.
-
- Encoding considerations:
-
- Security Considerations:
-
- See Section 6 of RFC 3047.
-
- Interoperability considerations: none
-
- Published specification:
-
- Applications which use this media type:
-
- Additional information: none
-
- Person & email address to contact for further information:
-
-
- Jean-Marc Valin <[email protected]>
-
- Intended usage: COMMON
-
- Author/Change controller:
-
- Author: Jean-Marc Valin <[email protected]>
-
- Change controller: Jean-Marc Valin <[email protected]>
-
- Change controller: IETF AVT Working Group
-
- This transport type signifies that the content is to be interpreted
- according to this document if the contents are transmitted over RTP.
- Should this transport type appear over a lossless streaming protocol
- such as TCP, the content encapsulation should be interpreted as an
- Ogg Stream in accordance with [rfc3534], with the exception that the
- content of the Ogg Stream may be assumed to be CELT audio and CELT
-
-
-
-Valin & Maxwell Expires November 9, 2009 [Page 10]
-
-Internet-Draft draft-valin-celt-rtp-profile-00 May 2009
-
-
- audio only.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Valin & Maxwell Expires November 9, 2009 [Page 11]
-
-Internet-Draft draft-valin-celt-rtp-profile-00 May 2009
-
-
-5. SDP usage of CELT
-
- When conveying information by SDP [rfc2327], the encoding name MUST
- be set to "CELT". The sampling frequency is typically between 32000
- and 48000 Hz. Implementations SHOULD support both 44100 Hz and 48000
- Hz. The maximum bandwidth permitted for the CELT audio is encoded
- using the "b=AS:" header, as explained in SDP [rfc2327].
-
- The SDP parameters have the following interpretation with respect to
- CELT:
-
- b=AS: The maximum bandwidth (in kbit/s) allowed for CELT,
- excluding the header overhead. The default is 64 kbit/s.
-
- ptime: The desired packetization time. The sender SHOULD choose a
- number of frames per packet that corresponds to the smallest
- packetization time greater or equal to the specified ptime for the
- selected frame size. The default is 20 ms as specified in
- [rfc3551]
-
- maxptime: The maximum packetization time desired. If the maximum
- is lower than the smallest packetization time determined from the
- chosen frame size (as described above), then that packtization
- time SHOULD be used despite the maxptime value. The default is
- "no maximum".
-
- CELT-specific parameters can be given via the "a=fmtp:" directive.
- Several parameters can be given in a single a=fmtp line provided that
- they are separated by a semi-colon. The following parameters are
- defined for use in this way:
-
- frame-size: The frame size is the duration of each frame in
- samples. If more than one frame size is supported, a comma-
- separated list can be used. It is possible to use "any" to denote
- that all even frame sizes are supported. The default is 480.
-
- mapping: Optional string describing the multi-channel mapping.
-
- Because the frame-size is not transmitted in-band, an SDP answer MUST
- contain only one frame-size, even if multiple frame sizes were
- offered.
-
- The selected frame-size values MUST be even. They SHOULD be
- divisible by 8 and have a prime factorization which consists only of
- 2, 3, or 5 factors. For example, powers-of-two and values such as
- 160, 320, 240, and 480 are recommended. Implementations MUST support
- receiving and sending the default value of 480, and if the size 480
- is supported it MUST be offered. Implementations SHOULD also support
-
-
-
-Valin & Maxwell Expires November 9, 2009 [Page 12]
-
-Internet-Draft draft-valin-celt-rtp-profile-00 May 2009
-
-
- frame sizes of 256 and 512 since these are the ones that lead to the
- lowest complexity. When frame sizes that are powers of two are
- supported, they SHOULD be listed first in the offer and chosen over
- non powers of two in the answer.
-
- Care must be taken when setting the value of ptime: and b=AS: so that
- the RTP packet size does not exceed the path MTU.
-
- An example of the media representation in SDP for offering a single
- channel of CELT at 48000 samples per second might be:
-
-
- m=audio 8088 RTP/AVP 97
-
- a=rtpmap:97 CELT/48000
-
- Note that the RTP payload type code of 97 is defined in this media
- definition to be 'mapped' to the CELT codec at a 48kHz sampling
- frequency using the 'a=rtpmap' line. Any number from 96 to 127 could
- have been chosen (the allowed range for dynamic types). If there is
- more than one channel being encoded the rtpmap MUST specify the
- channel count.
-
- The following example illustrates the case where the offerer cannot
- receive more than 64 kbit/s.
-
-
- m=audio 8088 RTP/AVP 97
-
- b=AS:64
-
- a=rtmap:97 CELT/48000
-
- In this case, if the remote party agrees, it should configure its
- CELT encoder so that it does not use modes that produce more than 64
- kbit/s. Note that the "b=" constraint also applies on all payload
- types that may be proposed in the media line ("m=").
-
- The following example demonstrates the use of the a=fmtp: parameters:
-
-
- m=audio 8008 RTP/AVP 97
-
- a=ptime: 21
-
- a=rtpmap:97 CELT/44100
-
-
-
-
-
-Valin & Maxwell Expires November 9, 2009 [Page 13]
-
-Internet-Draft draft-valin-celt-rtp-profile-00 May 2009
-
-
- a=fmtp:97 frame-size=512;
-
- This examples illustrate an offerer that wishes to receive a CELT
- stream at 44100 Hz, by packing two 512-sample frames in each packet.
-
-5.1. Multichannel Mapping
-
- When more than two channels are used, a mapping parameter MUST be
- provided. The mapping parameter is defined as comma separated list
- of integers which specify the number of channels contained in each
- CELT stream, OPTIONALLY followed by a '/' and a comma separated list
- of channel identifiers, then OPTIONALLY another '/' and a string
- which provides an application specific elaboration on any speaker-
- feed definitions. The channels per stream entries MUST be either 1
- or 2. The total number of channels is indicated by the sum of the
- channels per stream entries. The sum of the channel counts MUST be
- equal to the total number of channels.
-
- Channel identifiers are short alphanumeric strings. Each identifier
- MUST begin with a letter indicating the type of channel. 'A' MUST be
- used to indicate an ambisonic channel, 'S' to indicate a speaker-feed
- channel, or 'O' indicating other usage.
-
- A channel identifier MAY be repeated, but the meaning of such
- repetition is application specific. Applications SHOULD attempt to
- utilize channel identifiers such that mixing all identical
- identifiers would produce a reasonable result.
-
- Non-surround usage such as individual performer tracks, effect send,
- "order wire", or other administrative channels may be given
- application specific identifiers which MUST not conflict with the
- identifiers defined in this draft. These identifiers SHOULD begin
- with S if it would be sensible to include them in a mono-downmix, or
- O if it would be most sensible to exclude them from a mono-downmix.
- An example usage might be mapping=2,1,2,1,1/
- SLguitar,SRguitar,OheadsetG,SLkeyboard,SRkeyboard,OheadsetK,SMbass,Oh
- eadsetB"
-
- Ambisonic channels MUST follow the Furse-Malham naming and weighing
- conventions for up to third order spherical[Ambisonic]. Higher order
- ambisonic support is application defined but MUST NOT reuse any of
- WXYZRSTUVKLMNOPQ for higher order components. For example, second
- order spherical ambisonics SHOULD use the mapping
- "mapping=1,1,1,1,1,1,1,1,1/AW,AX,AY,AZ,AR,AS,AT,AU,AV". Any set of
- Ambisonic channels MUST contain at least one "AW" channel.
-
- Speaker-feed identifiers are named based on the intended speaker
- locations. "L", "R" for the left and right speakers, respectively,
-
-
-
-Valin & Maxwell Expires November 9, 2009 [Page 14]
-
-Internet-Draft draft-valin-celt-rtp-profile-00 May 2009
-
-
- in conventional stereo or the front left and right in 4, 5, 5.1, or
- 7.1 channel surround. "LR", "RR" for the left and right rear
- speakers in 4,5 or 5.1 channel surround. C" is used for a center
- channel, "MLFE" for a low frequency extension channel. "LS", "RS"
- for the side channels in 7.1 channel surround. Additional speaker-
- feeds are application specific but should not reuse the prior
- identifiers. For 5.1 surround in non-ambisonic form the mapping
- SHOULD be "mapping=2,2,1,1/L,R,LR,RR,C,MLFE/ITU-RBS.775-1". When
- only one or two channels are used, the mapping parameter MAY be
- omitted, in which case the default mapping is used. For one channel,
- the default is "mapping=1/C", while for two channels, the default is
- "mapping=2/L,R".
-
- For example a stereo configuration might signal:
-
-
- m=audio 8008 RTP/AVP 97
-
- a=ptime: 5
-
- a=rtpmap:97 CELT/44100/2
-
- a=fmtp:97 frame-size=256;
-
- Which specifies a single two-channel CELT stream according to the
- default mapping.
-
-5.2. Low-Overhead Mode
-
- A low-overhead mode is defined to make more efficient use of
- bandwidth when transmitting CELT frames. In that mode none of the
- length values need to be transmitted. One the a=fmtp: parameter low-
- overhead: is defined and contains a single frame size, followed by a
- '/', followed by the number of frames (per channel) per packet,
- followed by a '/', followed by a comma-separated list of the number
- of bytes per frame for each stream defined in the channel mapping.
- The frame-size: parameter MUST not be specified and SHOULD be ignored
- if encountered in an SDP offer or answer. The ptime:, maxptime: and
- b=AS: parameters SHOULD also be ignored since the low-overhead:
- parameter makes them redundant. When the low-overhead: parameter is
- specified, the length of each frame MUST NOT be encoded in the
- payload and the bit-rate MUST NOT be changed during the session.
-
- For example a low-overhead surround configuration could be signaled
- as:
-
- m=audio 8008 RTP/AVP 97
-
-
-
-
-Valin & Maxwell Expires November 9, 2009 [Page 15]
-
-Internet-Draft draft-valin-celt-rtp-profile-00 May 2009
-
-
- a=ptime: 5
-
- a=rtpmap:97 CELT/48000/6
-
- a=fmtp:97 low-overhead=256/1/86,86,43,30;mapping=2,2,1,1/
- L,R,LR,RR,C,MLFE/ITU-RBS.775-1
-
- In this example, 4 bytes per packet would be saved. This corresponds
- to a 6 kbit/s reduction in the overhead, although the 60 kbit/s
- overhead of the IP, UDP and RTP headers is still present.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Valin & Maxwell Expires November 9, 2009 [Page 16]
-
-Internet-Draft draft-valin-celt-rtp-profile-00 May 2009
-
-
-6. Congestion Control
-
- CELT allows for bitrate adjustment in one byte per frame increments
- without any signaling requirement or overhead. Applications SHOULD
- utilize congestion control to regulate the transmitted bitrate. In
- some applications it may make sense to increase the packetization
- interval rather than decreasing the codec bitrate. Congestion
- control implementations should consider the users differential
- tolerance for high latency and low quality.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Valin & Maxwell Expires November 9, 2009 [Page 17]
-
-Internet-Draft draft-valin-celt-rtp-profile-00 May 2009
-
-
-7. Security Considerations
-
- RTP packets using the payload format defined in this specification
- are subject to the security considerations discussed in the RTP
- specification [rfc3550], and in any applicable RTP profile. The main
- security considerations for the RTP packet carrying the RTP payload
- format defined within this memo are confidentiality, integrity and
- source authenticity. Confidentiality is achieved by encryption of
- the RTP payload. Integrity of the RTP packets through suitable
- cryptographic integrity protection mechanism. Cryptographic system
- may also allow the authentication of the source of the payload. A
- suitable security mechanism for this RTP payload format should
- provide confidentiality, integrity protection and at least source
- authentication capable of determining if an RTP packet is from a
- member of the RTP session or not.
-
- Note that the appropriate mechanism to provide security to RTP and
- payloads following this memo may vary. It is dependent on the
- application, the transport, and the signalling protocol employed.
- Therefore a single mechanism is not sufficient, although if suitable
- the usage of SRTP [rfc3711] is recommended. Other mechanism that may
- be used are IPsec [rfc4301] and TLS [rfc5246] (RTP over TCP), but
- also other alternatives may exist.
-
- This RTP payload format and its media decoder do not exhibit any
- significant non-uniformity in the receiver-side computational
- complexity for packet processing, and thus are unlikely to pose a
- denial-of-service threat due to the receipt of pathological data.
- Nor does the RTP payload format contain any active content.
-
- Because this format supports VBR operation small amounts of
- information about the transmitted audio may be leaked by a length
- preserving cryptographic transport. Accordingly, when CELT is used
- inside a secure transport the sender SHOULD restrict the use of VBR
- to congestion control purposes.
-
- CELT implementations will typically exhibit tiny content-sensitive
- encoding time variances. Since transmission is usually triggered by
- an accurate hardware clock and the encoded data is typically
- transmitted as soon as encoding is complete this variance may result
- in a small amount of additional frame to frame jitter which could be
- measured by a third-party. Encrypted implementations SHOULD transmit
- packets at fixed intervals to avoid the possible information leak.
-
-
-
-
-
-
-
-
-Valin & Maxwell Expires November 9, 2009 [Page 18]
-
-Internet-Draft draft-valin-celt-rtp-profile-00 May 2009
-
-
-8. Acknowledgments
-
- The authors would also like to thank the following people for their
- input: Timothy B. Terriberry, Ben Schwartz, Alexander Carot, Thorvald
- Natvig, Brian West, Steve Underwood, and Anthony Minessale.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Valin & Maxwell Expires November 9, 2009 [Page 19]
-
-Internet-Draft draft-valin-celt-rtp-profile-00 May 2009
-
-
-9. References
-
-9.1. Normative References
-
- [rfc2119] Bradner, S., "Key words for use in RFCs to Indicate
- Requirement Levels", RFC 2119.
-
- [rfc3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
- Jacobson, "RTP: A Transport Protocol for real-time
- applications", RFC 3550.
-
- [rfc2045] "Multipurpose Internet Mail Extensions (MIME) Part One:
- Format of Internet Message Bodies", RFC 2045,
- November 1998.
-
- [rfc2327] Jacobson, V. and M. Handley, "SDP: Session Description
- Protocol", RFC 2327, April 1998.
-
- [rfc3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
- Video Conferences with Minimal Control.", RFC 3551,
- July 2003.
-
- [rfc3534] Walleij, L., "The application/ogg Media Type", RFC 3534,
- May 2003.
-
- [rfc3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
- Norrman, "The Secure Real-time Transport Protocol (SRTP)",
- RFC 3711, March 2004.
-
- [rfc4301] Kent, S. and K. Seo, "Security Architecture for the
- Internet Protocol", RFC 4301, December 2005.
-
- [rfc5246] Dierks, T. and E. Rescorla, "The Transport Layer Security
- (TLS) Protocol Version 1.2", RFC 5246, August 2008.
-
-9.2. Informative References
-
- [celt-website]
- Xiph.Org Foundation, "The CELT ultra-low delay audio
- codec", CELT website http://www.celt-codec.org/.
-
- [Ambisonic]
- Malham, D., "Higher order Ambisonic systems", Paper http:/
- /www.york.ac.uk/inst/mustech/3d_audio/
- higher_order_ambisonics.pdf, December 2003.
-
-
-
-
-
-
-Valin & Maxwell Expires November 9, 2009 [Page 20]
-
-Internet-Draft draft-valin-celt-rtp-profile-00 May 2009
-
-
-Authors' Addresses
-
- Jean-Marc Valin
- Octasic Semiconductor
- 4101, Molson Street, suite 300
- Montreal, Quebec H1Y 3L1
- Canada
-
- Email: [email protected]
-
-
- Gregory Maxwell
- Juniper Networks
- 2251 Corporate Park Drive, Suite 100
- Herndon, VA 20171-1817
- USA
-
- Email: [email protected]
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Valin & Maxwell Expires November 9, 2009 [Page 21]
-
--- /dev/null
+++ b/doc/ietf/draft-valin-celt-rtp-profile-01.txt
@@ -1,0 +1,1120 @@
+
+
+
+AVT Working Group J-M. Valin
+Internet-Draft Octasic Semiconductor
+Expires: November 12, 2009 G. Maxwell
+ Juniper Networks
+ May 11, 2009
+
+
+ draft-valin-celt-rtp-profile-01
+ RTP Payload Format for the CELT Codec
+
+Status of this Memo
+
+ This Internet-Draft is submitted to IETF in full conformance with the
+ provisions of BCP 78 and BCP 79.
+
+ Internet-Drafts are working documents of the Internet Engineering
+ Task Force (IETF), its areas, and its working groups. Note that
+ other groups may also distribute working documents as Internet-
+ Drafts.
+
+ Internet-Drafts are draft documents valid for a maximum of six months
+ and may be updated, replaced, or obsoleted by other documents at any
+ time. It is inappropriate to use Internet-Drafts as reference
+ material or to cite them other than as "work in progress."
+
+ The list of current Internet-Drafts can be accessed at
+ http://www.ietf.org/ietf/1id-abstracts.txt.
+
+ The list of Internet-Draft Shadow Directories can be accessed at
+ http://www.ietf.org/shadow.html.
+
+ This Internet-Draft will expire on November 12, 2009.
+
+Copyright Notice
+
+ Copyright (c) 2009 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents in effect on the date of
+ publication of this document (http://trustee.ietf.org/license-info).
+ Please review these documents carefully, as they describe your rights
+ and restrictions with respect to this document.
+
+
+
+
+
+
+
+
+Valin & Maxwell Expires November 12, 2009 [Page 1]
+
+Internet-Draft draft-valin-celt-rtp-profile-01 May 2009
+
+
+Abstract
+
+ CELT is an open-source voice codec suitable for use in very low delay
+ audio communication applications, including Voice over IP (VoIP).
+ This document describes the payload format for CELT generated bit
+ streams within an RTP packet. Also included here are the necessary
+ details for the use of CELT with the Session Description Protocol
+ (SDP). At the time of this writing, the CELT bit-stream has NOT been
+ finalized yet, and compatibility is usually broken with every new
+ release of the codec.
+
+
+Table of Contents
+
+ 1. Conventions used in this document . . . . . . . . . . . . . . 3
+ 2. Overview of the CELT Codec . . . . . . . . . . . . . . . . . . 4
+ 3. RTP payload format for CELT . . . . . . . . . . . . . . . . . 5
+ 3.1. RTP Header . . . . . . . . . . . . . . . . . . . . . . . . 5
+ 3.2. CELT payload . . . . . . . . . . . . . . . . . . . . . . . 6
+ 3.3. Multiple CELT frames in a RTP packet . . . . . . . . . . . 7
+ 3.4. Multiple channels . . . . . . . . . . . . . . . . . . . . 8
+ 4. MIME registration of CELT . . . . . . . . . . . . . . . . . . 10
+ 5. SDP usage of CELT . . . . . . . . . . . . . . . . . . . . . . 12
+ 5.1. Multichannel Mapping . . . . . . . . . . . . . . . . . . . 13
+ 5.2. Low-Overhead Mode . . . . . . . . . . . . . . . . . . . . 15
+ 6. Congestion Control . . . . . . . . . . . . . . . . . . . . . . 16
+ 7. Security Considerations . . . . . . . . . . . . . . . . . . . 17
+ 8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 18
+ 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 19
+ 9.1. Normative References . . . . . . . . . . . . . . . . . . . 19
+ 9.2. Informative References . . . . . . . . . . . . . . . . . . 19
+ Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 20
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Valin & Maxwell Expires November 12, 2009 [Page 2]
+
+Internet-Draft draft-valin-celt-rtp-profile-01 May 2009
+
+
+1. Conventions used in this document
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in RFC 2119 [rfc2119].
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Valin & Maxwell Expires November 12, 2009 [Page 3]
+
+Internet-Draft draft-valin-celt-rtp-profile-01 May 2009
+
+
+2. Overview of the CELT Codec
+
+ CELT stands for "Constrained Energy Lapped Transform". It applies
+ some of the CELP principles, but does everything in the frequency
+ domain, which removes some of the limitations of CELP. CELT is
+ suitable for both speech and music and currently features:
+
+ o Ultra-low algorithmic delay (as low as 2 ms)
+
+ o Full audio bandwidth (up to 20 kHz audio bandwidth)
+
+ o Support for both voice and music
+
+ o Stereo support
+
+ o Packet loss concealment
+
+ o Constant bitrates from under 32 kbps to 128 kbps and above
+
+ o Free software/open-source
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Valin & Maxwell Expires November 12, 2009 [Page 4]
+
+Internet-Draft draft-valin-celt-rtp-profile-01 May 2009
+
+
+3. RTP payload format for CELT
+
+ For RTP based transportation of CELT encoded audio the standard RTP
+ header [rfc3550] is followed by one or more payload data blocks. An
+ optional padding terminator may also be used.
+
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | RTP Header |
+ +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
+ | one or more frames of CELT .... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | .... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+3.1. RTP Header
+
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |V=2|P|X| CC |M| PT | sequence number |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | timestamp |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | synchronization source (SSRC) identifier |
+ +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
+ | contributing source (CSRC) identifiers |
+ | ... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ The RTP header is defined in the RTP specification [rfc3550]. This
+ section defines how fields in the RTP header are used.
+
+ Padding (P): 1 bit
+
+ If the padding bit is set, the packet contains one or more additional
+ padding octets at the end which are not part of the payload. The
+ last octet of the padding contains a count of how many padding octets
+ should be ignored, including itself. Padding may be needed by some
+ encryption algorithms with fixed block sizes or for carrying several
+ RTP packets in a lower-layer protocol data unit.
+
+ Extension (X): 1 bit
+
+ If the extension, X, bit is set, the fixed header MUST be followed by
+
+
+
+Valin & Maxwell Expires November 12, 2009 [Page 5]
+
+Internet-Draft draft-valin-celt-rtp-profile-01 May 2009
+
+
+ exactly one header extension, with a format defined in Section 5.3.1.
+ of [rfc3550].
+
+ Marker (M): 1 bit
+
+ The M bit MUST be set to zero in all packets. The receiver MUST
+ ignore the M bit.
+
+ Payload Type (PT): 7 bits
+
+ Payload Type (PT): The assignment of an RTP payload type for this
+ packet format is outside the scope of this document; it is specified
+ by the RTP profile under which this payload format is used, or
+ signaled dynamically out-of-band (e.g., using SDP).
+
+ Timestamp: 32 bits
+
+ A timestamp representing the sampling time of the first sample of the
+ first CELT frame in the RTP payload. The clock frequency MUST be set
+ to the sample rate of the encoded audio data and is conveyed out-of-
+ band (e.g., as an SDP parameter).
+
+3.2. CELT payload
+
+ For the purposes of packetizing the bit stream in RTP, it is only
+ necessary to consider the sequence of bits as output by the CELT
+ encoder [celt-website], and present the same sequence to the decoder.
+ The payload format described here maintains this sequence.
+
+ A typical CELT frame, encoded at a high bitrate, is approx. 128
+ octets and the total size of the CELT frames SHOULD be kept below the
+ path MTU to prevent fragmentation. CELT frames MUST NOT be split
+ across multiple RTP packets,
+
+ An RTP packet MAY contain CELT frames of the same bit rate or of
+ varying bit rates, since the bitrate for the frames is explicitly
+ conveyed in band with the signal. The encoding and decoding
+ algorithm can change the bit rate at any frame boundary, with the bit
+ rate change notification provided in-band. No out-of-band
+ notification is required for the decoder to process changes in the
+ bit rate sent by the encoder.
+
+ It is RECOMMENDED that sampling rates 32000, 44100, or 48000 Hz be
+ used for most applications, unless a specific reason exists -- such
+ as requirements for a very specific packetization time. For example,
+ 51200 Hz sampling may be useful to obtain a 5 ms packetization time
+ with 256-sample frames. For compatibility reasons, the sender and
+ receiver MUST support 48000 Hz sampling rate.
+
+
+
+Valin & Maxwell Expires November 12, 2009 [Page 6]
+
+Internet-Draft draft-valin-celt-rtp-profile-01 May 2009
+
+
+ The CELT codec always produces an integer number of bytes and can
+ produce any integer number of bytes, so no padding is ever required.
+ Bitrate adjustment SHOULD be used instead of padding.
+
+3.3. Multiple CELT frames in a RTP packet
+
+ The bitrate used by CELT is implicitly determined by the size of the
+ compressed data. When more than one frame is encoded in the same
+ packet, it is not possible to determine the size of each encoded
+ frame, so the information MUST be explicitly encoded. If N frames
+ are present in a packet, N compressed frame sizes need to be encoded
+ at the beginning of the packet. Each size that is less than 255
+ bytes is encoded in one byte (unsigned 8-bit integer). For sizes
+ greater or equal to 255, a 0xff byte is encoded, followed by the
+ size-255. Multiple 0xff bytes are allowed if there are more than 510
+ bytes transmitted. The length is always the size of the CELT frame
+ excluding the length byte itself. The payload MUST NOT be padded,
+ except in accordance with the padding bit definition in the RTP
+ header.
+
+ Below is an example of two CELT frames contained within one RTP
+ packet.
+
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |V=2|P|X| CC |M| PT | sequence number |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | timestamp |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | synchronization source (SSRC) identifier |
+ +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
+ | contributing source (CSRC) identifiers |
+ | ... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | length frame 1| length frame 2| CELT frame 1... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | (frame 1) | CELT frame 2... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | (frame 2) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ The following is an example of C code that interprets the length
+ bytes:
+
+
+
+
+
+Valin & Maxwell Expires November 12, 2009 [Page 7]
+
+Internet-Draft draft-valin-celt-rtp-profile-01 May 2009
+
+
+ int i, N, pos;
+ int sizes[MAX_FRAMES][channels];
+ unsigned int total_size;
+ total_size=0;
+ N = 0;
+ pos = 0;
+ while (total_size < payload_size) {
+ for (i=0;i<channels;i++) {
+ int s;
+ int sum;
+ sum = 0;
+ do {
+ s = payload[pos++];
+ sum += s;
+ total_size += s+1;
+ } while (s == 255);
+ sizes[N][i] = sum;
+ }
+ N++;
+ }
+
+3.4. Multiple channels
+
+ CELT supports both mono streams and stereo streams. If more than two
+ channels are desired, it is possible to use transmit multiple streams
+ in the same packet. In this case, the number of streams S and the
+ pairing must be agreed with out-of-band negotiation such as SDP.
+ Each stream can be either mono or stereo, depending on whether the
+ channels are assumed to be correlated. For example, a 5.1 surround
+ could have the front-left and front-right channels in a stereo
+ stream, the rear-left and rear-right channels in a separate stereo
+ stream, while the center and low-frequency channels would be in
+ separate mono streams. In that example, the RTP packet would be:
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Valin & Maxwell Expires November 12, 2009 [Page 8]
+
+Internet-Draft draft-valin-celt-rtp-profile-01 May 2009
+
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |V=2|P|X| CC |M| PT | sequence number |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | timestamp |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | synchronization source (SSRC) identifier |
+ +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
+ | contributing source (CSRC) identifiers |
+ | ... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Front length | rear length | center length | LFE length |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Front stereo |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | ... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | ... | Rear stereo data... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | ... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Center mono data... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | ... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | | LFE mono data... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | ... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ In the case where streams for multiple channels are used with
+ multiple frames of the same streams per packet, then all streams for
+ a certain timestamp are encoded before all streams for the following
+ timestamp. In the case of the 5.1 example above with two frames per
+ packet, the number of compressed length fields would be S*N = 8.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Valin & Maxwell Expires November 12, 2009 [Page 9]
+
+Internet-Draft draft-valin-celt-rtp-profile-01 May 2009
+
+
+4. MIME registration of CELT
+
+ Full definition of the MIME [rfc2045] type for CELT will be part of
+ the Ogg Vorbis MIME type definition application [rfc3534].
+
+ MIME media type name: audio
+
+ MIME subtype: celt
+
+ Optional parameters:
+
+ Required parameters: to be included in the Ogg MIME specification.
+
+ Encoding considerations:
+
+ Security Considerations:
+
+ See Section 6 of RFC 3047.
+
+ Interoperability considerations: none
+
+ Published specification:
+
+ Applications which use this media type:
+
+ Additional information: none
+
+ Person & email address to contact for further information:
+
+
+ Jean-Marc Valin <[email protected]>
+
+ Intended usage: COMMON
+
+ Author/Change controller:
+
+ Author: Jean-Marc Valin <[email protected]>
+
+ Change controller: Jean-Marc Valin <[email protected]>
+
+ Change controller: IETF AVT Working Group
+
+ This transport type signifies that the content is to be interpreted
+ according to this document if the contents are transmitted over RTP.
+ Should this transport type appear over a lossless streaming protocol
+ such as TCP, the content encapsulation should be interpreted as an
+ Ogg Stream in accordance with [rfc3534], with the exception that the
+ content of the Ogg Stream may be assumed to be CELT audio and CELT
+
+
+
+Valin & Maxwell Expires November 12, 2009 [Page 10]
+
+Internet-Draft draft-valin-celt-rtp-profile-01 May 2009
+
+
+ audio only.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Valin & Maxwell Expires November 12, 2009 [Page 11]
+
+Internet-Draft draft-valin-celt-rtp-profile-01 May 2009
+
+
+5. SDP usage of CELT
+
+ When conveying information by SDP [rfc4566], the encoding name MUST
+ be set to "CELT". The sampling frequency is typically between 32000
+ and 48000 Hz. Implementations MUST support 48000 Hz and SHOULD also
+ support 44100 Hz.
+
+ The SDP parameters have the following interpretation with respect to
+ CELT:
+
+ ptime: The desired packetization time. The sender SHOULD choose a
+ number of frames per packet that corresponds to the smallest
+ packetization time greater or equal to the specified ptime for the
+ selected frame size. The default is 20 ms as specified in
+ [rfc3551]
+
+ maxptime: The maximum packetization time desired. As specified in
+ [rfc4566], if the maximum is lower than the smallest packetization
+ time determined from the chosen frame size (as described above),
+ then that packtization time SHOULD be used despite the maxptime
+ value. The default is "no maximum".
+
+ CELT-specific parameters can be given via the "a=fmtp:" directive.
+ Several parameters can be given in a single a=fmtp line provided that
+ they are separated by a semi-colon. The following parameters are
+ defined for use in this way:
+
+ bitrate: The desired bit-rate in kbit/s for the codec only
+ (excluding headers and the length bytes). The value MUST be
+ rounded to an integer number of bytes per frame. The round-to-
+ nearest method is RECOMMENDED. The default bit-rate value is 64
+ kbit/s per channel.
+
+ frame-size: The frame size is the duration of each frame in
+ samples and has to be even. The default is 480.
+
+ mapping: Optional string describing the multi-channel mapping.
+
+ The selected frame-size values MUST be even. For quality and
+ complexity reasons, they SHOULD also be divisible by 8 and have a
+ prime factorization which consists only of 2, 3, or 5 factors. For
+ example, powers-of-two and values such as 160, 320, 240, and 480 are
+ recommended. Implementations MUST support receiving and sending the
+ default value of 480. Implementations SHOULD also support frame
+ sizes of 256 and 512 since these are the ones that lead to the lowest
+ complexity. When frame sizes that are powers-of-two are supported,
+ they SHOULD be listed first in the offer and chosen over non-powers-
+ of-two in the answer.
+
+
+
+Valin & Maxwell Expires November 12, 2009 [Page 12]
+
+Internet-Draft draft-valin-celt-rtp-profile-01 May 2009
+
+
+ Care must be taken when setting the value of ptime: and bitrate: so
+ that the RTP packet size does not exceed the path MTU.
+
+ An example of the media representation in SDP for offering a single
+ channel of CELT at 48000 samples per second might be:
+
+
+ m=audio 8088 RTP/AVP 97
+
+ a=rtpmap:97 CELT/48000/1
+
+ Note that the RTP payload type code of 97 is defined in this media
+ definition to be 'mapped' to the CELT codec at a 48 kHz sampling
+ frequency using the 'a=rtpmap' line. Any number from 96 to 127 could
+ have been chosen (the allowed range for dynamic types). If there is
+ more than one channel being encoded the rtpmap MUST specify the
+ channel count. When no channel count is included, the default is one
+ channel.
+
+ The following example demonstrates the use of the a=fmtp: parameters:
+
+
+ m=audio 8008 RTP/AVP 97
+
+ a=ptime: 25
+
+ a=rtpmap:97 CELT/44100
+
+ a=fmtp:97 frame-size=512;bitrate=48
+
+ This examples illustrate an offerer that wishes to receive a CELT
+ stream at 44100 Hz, by packing two 512-sample frames in each packet
+ (less than 25 ms) at around 48 kbps (70 bytes per frame).
+
+5.1. Multichannel Mapping
+
+ When more than two channels are used, a mapping parameter MUST be
+ provided. The mapping parameter is defined as comma separated list
+ of integers which specify the number of channels contained in each
+ CELT stream, OPTIONALLY followed by a '/' and a comma separated list
+ of channel identifiers, then OPTIONALLY another '/' and a string
+ which provides an application specific elaboration on any speaker-
+ feed definitions. The channels per stream entries MUST be either 1
+ or 2. The total number of channels is indicated by the sum of the
+ channels per stream entries. The sum of the channel counts MUST be
+ equal to the total number of channels.
+
+ Channel identifiers are short alphanumeric strings. Each identifier
+
+
+
+Valin & Maxwell Expires November 12, 2009 [Page 13]
+
+Internet-Draft draft-valin-celt-rtp-profile-01 May 2009
+
+
+ MUST begin with a letter indicating the type of channel. 'A' MUST be
+ used to indicate an ambisonic channel, 'S' to indicate a speaker-feed
+ channel, or 'O' indicating other usage.
+
+ A channel identifier MAY be repeated, but the meaning of such
+ repetition is application specific. Applications SHOULD attempt to
+ utilize channel identifiers such that mixing all identical
+ identifiers would produce a reasonable result.
+
+ Non-surround usage such as individual performer tracks, effect send,
+ "order wire", or other administrative channels may be given
+ application specific identifiers which MUST not conflict with the
+ identifiers defined in this draft. These identifiers SHOULD begin
+ with S if it would be sensible to include them in a mono-downmix, or
+ O if it would be most sensible to exclude them from a mono-downmix.
+ An example usage might be mapping=2,1,2,1,1/
+ SLguitar,SRguitar,OheadsetG,SLkeyboard,SRkeyboard,OheadsetK,SMbass,Oh
+ eadsetB"
+
+ Ambisonic channels MUST follow the Furse-Malham naming and weighing
+ conventions for up to third order spherical[Ambisonic]. Higher order
+ ambisonic support is application defined but MUST NOT reuse any of
+ WXYZRSTUVKLMNOPQ for higher order components. For example, second
+ order spherical ambisonics SHOULD use the mapping
+ "mapping=1,1,1,1,1,1,1,1,1/AW,AX,AY,AZ,AR,AS,AT,AU,AV". Any set of
+ Ambisonic channels MUST contain at least one "AW" channel.
+
+ Speaker-feed identifiers are named based on the intended speaker
+ locations. "L", "R" for the left and right speakers, respectively,
+ in conventional stereo or the front left and right in 4, 5, 5.1, or
+ 7.1 channel surround. "LR", "RR" for the left and right rear
+ speakers in 4,5 or 5.1 channel surround. C" is used for a center
+ channel, "MLFE" for a low frequency extension channel. "LS", "RS"
+ for the side channels in 7.1 channel surround. Additional speaker-
+ feeds are application specific but should not reuse the prior
+ identifiers. For 5.1 surround in non-ambisonic form the mapping
+ SHOULD be "mapping=2,2,1,1/L,R,LR,RR,C,MLFE/ITU-RBS.775-1". When
+ only one or two channels are used, the mapping parameter MAY be
+ omitted, in which case the default mapping is used. For one channel,
+ the default is "mapping=1/C", while for two channels, the default is
+ "mapping=2/L,R".
+
+ For example a stereo configuration might signal:
+
+
+ m=audio 8008 RTP/AVP 97
+
+
+
+
+
+Valin & Maxwell Expires November 12, 2009 [Page 14]
+
+Internet-Draft draft-valin-celt-rtp-profile-01 May 2009
+
+
+ a=ptime: 5
+
+ a=rtpmap:97 CELT/44100/2
+
+ a=fmtp:97 frame-size=256
+
+ Which specifies a single two-channel CELT stream according to the
+ default mapping.
+
+5.2. Low-Overhead Mode
+
+ A low-overhead mode is defined to make more efficient use of
+ bandwidth when transmitting CELT frames. In that mode none of the
+ length values need to be transmitted. One the a=fmtp: parameter low-
+ overhead: is defined and contains a single frame size, followed by a
+ '/', followed by a comma-separated list of the number of bytes per
+ frame for each stream defined in the channel mapping. The number of
+ frames per channel can thus be computed as the payload size divided
+ by the sum of the bytes-per-frame values. The frame-size: parameter
+ MUST not be specified and SHOULD be ignored if encountered in an SDP
+ offer or answer. The bitrate: parameter MUST also be ignored since
+ the low-overhead: parameter makes it redundant. When the low-
+ overhead: parameter is specified, the length of each frame MUST NOT
+ be encoded in the payload and the bit-rate MUST NOT be changed during
+ the session.
+
+ For example a low-overhead surround configuration could be signaled
+ as:
+
+ m=audio 8008 RTP/AVP 97
+
+ a=ptime: 5
+
+ a=rtpmap:97 CELT/48000/6
+
+ a=fmtp:97 low-overhead=256/1/86,86,43,30;mapping=2,2,1,1/
+ L,R,LR,RR,C,MLFE/ITU-RBS.775-1
+
+ In this example, 4 bytes per packet would be saved. This corresponds
+ to a 6 kbit/s reduction in the overhead, although the 60 kbit/s
+ overhead of the IP, UDP and RTP headers is still present.
+
+
+
+
+
+
+
+
+
+
+Valin & Maxwell Expires November 12, 2009 [Page 15]
+
+Internet-Draft draft-valin-celt-rtp-profile-01 May 2009
+
+
+6. Congestion Control
+
+ CELT allows any bitrate, with a one byte per frame resolution,
+ without any signaling requirement or overhead. Applications SHOULD
+ utilize congestion control to regulate the transmitted bitrate. In
+ some applications it may make sense to increase the packetization
+ interval rather than decreasing the codec bitrate. Congestion
+ control implementations should consider the users differential
+ tolerance for high latency and low quality.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Valin & Maxwell Expires November 12, 2009 [Page 16]
+
+Internet-Draft draft-valin-celt-rtp-profile-01 May 2009
+
+
+7. Security Considerations
+
+ RTP packets using the payload format defined in this specification
+ are subject to the security considerations discussed in the RTP
+ specification [rfc3550], and in any applicable RTP profile. The main
+ security considerations for the RTP packet carrying the RTP payload
+ format defined within this memo are confidentiality, integrity and
+ source authenticity. Confidentiality is achieved by encryption of
+ the RTP payload. Integrity of the RTP packets through suitable
+ cryptographic integrity protection mechanism. Cryptographic system
+ may also allow the authentication of the source of the payload. A
+ suitable security mechanism for this RTP payload format should
+ provide confidentiality, integrity protection and at least source
+ authentication capable of determining if an RTP packet is from a
+ member of the RTP session or not.
+
+ Note that the appropriate mechanism to provide security to RTP and
+ payloads following this memo may vary. It is dependent on the
+ application, the transport, and the signalling protocol employed.
+ Therefore a single mechanism is not sufficient, although if suitable
+ the usage of SRTP [rfc3711] is recommended. Other mechanism that may
+ be used are IPsec [rfc4301] and TLS [rfc5246] (RTP over TCP), but
+ also other alternatives may exist.
+
+ This RTP payload format and its media decoder do not exhibit any
+ significant non-uniformity in the receiver-side computational
+ complexity for packet processing, and thus are unlikely to pose a
+ denial-of-service threat due to the receipt of pathological data.
+ Nor does the RTP payload format contain any active content.
+
+ Because this format supports VBR operation small amounts of
+ information about the transmitted audio may be leaked by a length
+ preserving cryptographic transport. Accordingly, when CELT is used
+ inside a secure transport the sender SHOULD restrict the use of VBR
+ to congestion control purposes.
+
+ CELT implementations will typically exhibit tiny content-sensitive
+ encoding time variances. Since transmission is usually triggered by
+ an accurate hardware clock and the encoded data is typically
+ transmitted as soon as encoding is complete this variance may result
+ in a small amount of additional frame to frame jitter which could be
+ measured by a third-party. Encrypted implementations SHOULD transmit
+ packets at fixed intervals to avoid the possible information leak.
+
+
+
+
+
+
+
+
+Valin & Maxwell Expires November 12, 2009 [Page 17]
+
+Internet-Draft draft-valin-celt-rtp-profile-01 May 2009
+
+
+8. Acknowledgments
+
+ The authors would also like to thank the following people for their
+ input: Timothy B. Terriberry, Ben Schwartz, Alexander Carot, Thorvald
+ Natvig, Brian West, Steve Underwood, and Anthony Minessale.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Valin & Maxwell Expires November 12, 2009 [Page 18]
+
+Internet-Draft draft-valin-celt-rtp-profile-01 May 2009
+
+
+9. References
+
+9.1. Normative References
+
+ [rfc2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", RFC 2119.
+
+ [rfc3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
+ Jacobson, "RTP: A Transport Protocol for real-time
+ applications", RFC 3550.
+
+ [rfc2045] "Multipurpose Internet Mail Extensions (MIME) Part One:
+ Format of Internet Message Bodies", RFC 2045,
+ November 1998.
+
+ [rfc4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
+ Description Protocol", RFC 4566, July 2006.
+
+ [rfc3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
+ Video Conferences with Minimal Control.", RFC 3551,
+ July 2003.
+
+ [rfc3534] Walleij, L., "The application/ogg Media Type", RFC 3534,
+ May 2003.
+
+ [rfc3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
+ Norrman, "The Secure Real-time Transport Protocol (SRTP)",
+ RFC 3711, March 2004.
+
+ [rfc4301] Kent, S. and K. Seo, "Security Architecture for the
+ Internet Protocol", RFC 4301, December 2005.
+
+ [rfc5246] Dierks, T. and E. Rescorla, "The Transport Layer Security
+ (TLS) Protocol Version 1.2", RFC 5246, August 2008.
+
+9.2. Informative References
+
+ [celt-website]
+ Xiph.Org Foundation, "The CELT ultra-low delay audio
+ codec", CELT website http://www.celt-codec.org/.
+
+ [Ambisonic]
+ Malham, D., "Higher order Ambisonic systems", Paper http:/
+ /www.york.ac.uk/inst/mustech/3d_audio/
+ higher_order_ambisonics.pdf, December 2003.
+
+
+
+
+
+
+Valin & Maxwell Expires November 12, 2009 [Page 19]
+
+Internet-Draft draft-valin-celt-rtp-profile-01 May 2009
+
+
+Authors' Addresses
+
+ Jean-Marc Valin
+ Octasic Semiconductor
+ 4101, Molson Street, suite 300
+ Montreal, Quebec H1Y 3L1
+ Canada
+
+ Email: [email protected]
+
+
+ Gregory Maxwell
+ Juniper Networks
+ 2251 Corporate Park Drive, Suite 100
+ Herndon, VA 20171-1817
+ USA
+
+ Email: [email protected]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Valin & Maxwell Expires November 12, 2009 [Page 20]
+