draft-ietf-payload-rtp-opus-00.txt   draft-ietf-payload-rtp-opus-01.txt 
Network Working Group J. Spittka Network Working Group J. Spittka
Internet-Draft Internet-Draft
Intended status: Standards Track K. Vos Intended status: Standards Track K. Vos
Expires: July 14, 2013 Skype Technologies S.A. Expires: February 03, 2014 Skype Technologies S.A.
JM. Valin JM. Valin
Mozilla Mozilla
January 10, 2013 August 02, 2013
RTP Payload Format for Opus Speech and Audio Codec RTP Payload Format for Opus Speech and Audio Codec
draft-ietf-payload-rtp-opus-00 draft-ietf-payload-rtp-opus-01
Abstract Abstract
This document defines the Real-time Transport Protocol (RTP) payload This document defines the Real-time Transport Protocol (RTP) payload
format for packetization of Opus encoded speech and audio data that format for packetization of Opus encoded speech and audio data that
is essential to integrate the codec in the most compatible way. is essential to integrate the codec in the most compatible way.
Further, media type registrations are described for the RTP payload Further, media type registrations are described for the RTP payload
format. format.
Status of this Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on July 14, 2013. This Internet-Draft will expire on February 03, 2014.
Copyright Notice Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Conventions, Definitions and Acronyms used in this document . 4 2. Conventions, Definitions and Acronyms used in this document . 3
2.1. Audio Bandwidth . . . . . . . . . . . . . . . . . . . . . 4 2.1. Audio Bandwidth . . . . . . . . . . . . . . . . . . . . . 3
3. Opus Codec . . . . . . . . . . . . . . . . . . . . . . . . . . 5 3. Opus Codec . . . . . . . . . . . . . . . . . . . . . . . . . 3
3.1. Network Bandwidth . . . . . . . . . . . . . . . . . . . . 5 3.1. Network Bandwidth . . . . . . . . . . . . . . . . . . . . 4
3.1.1. Recommended Bitrate . . . . . . . . . . . . . . . . . 5 3.1.1. Recommended Bitrate . . . . . . . . . . . . . . . . . 4
3.1.2. Variable versus Constant Bit Rate . . . . . . . . . . 5 3.1.2. Variable versus Constant Bit Rate . . . . . . . . . . 4
3.1.3. Discontinuous Transmission (DTX) . . . . . . . . . . . 6 3.1.3. Discontinuous Transmission (DTX) . . . . . . . . . . 4
3.2. Complexity . . . . . . . . . . . . . . . . . . . . . . . . 6 3.2. Complexity . . . . . . . . . . . . . . . . . . . . . . . 5
3.3. Forward Error Correction (FEC) . . . . . . . . . . . . . . 6 3.3. Forward Error Correction (FEC) . . . . . . . . . . . . . 5
3.4. Stereo Operation . . . . . . . . . . . . . . . . . . . . . 7 3.4. Stereo Operation . . . . . . . . . . . . . . . . . . . . 6
4. Opus RTP Payload Format . . . . . . . . . . . . . . . . . . . 8 4. Opus RTP Payload Format . . . . . . . . . . . . . . . . . . . 6
4.1. RTP Header Usage . . . . . . . . . . . . . . . . . . . . . 8 4.1. RTP Header Usage . . . . . . . . . . . . . . . . . . . . 6
4.2. Payload Structure . . . . . . . . . . . . . . . . . . . . 9 4.2. Payload Structure . . . . . . . . . . . . . . . . . . . . 7
5. Congestion Control . . . . . . . . . . . . . . . . . . . . . . 11 5. Congestion Control . . . . . . . . . . . . . . . . . . . . . 8
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9
6.1. Opus Media Type Registration . . . . . . . . . . . . . . . 12 6.1. Opus Media Type Registration . . . . . . . . . . . . . . 9
6.2. Mapping to SDP Parameters . . . . . . . . . . . . . . . . 15 6.2. Mapping to SDP Parameters . . . . . . . . . . . . . . . . 13
6.2.1. Offer-Answer Model Considerations for Opus . . . . . . 17 6.2.1. Offer-Answer Model Considerations for Opus . . . . . 14
6.2.2. Declarative SDP Considerations for Opus . . . . . . . 18 6.2.2. Declarative SDP Considerations for Opus . . . . . . . 16
7. Security Considerations . . . . . . . . . . . . . . . . . . . 19 7. Security Considerations . . . . . . . . . . . . . . . . . . . 16
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 20 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 16
9. Normative References . . . . . . . . . . . . . . . . . . . . . 21 9. Normative References . . . . . . . . . . . . . . . . . . . . 17
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 22 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 18
1. Introduction 1. Introduction
The Opus codec is a speech and audio codec developed within the IETF The Opus codec is a speech and audio codec developed within the IETF
Internet Wideband Audio Codec working group (codec). The codec has a Internet Wideband Audio Codec working group (codec). The codec has a
very low algorithmic delay and it is highly scalable in terms of very low algorithmic delay and it is highly scalable in terms of
audio bandwidth, bitrate, and complexity. Further, it provides audio bandwidth, bitrate, and complexity. Further, it provides
different modes to efficiently encode speech signals as well as music different modes to efficiently encode speech signals as well as music
signals, thus, making it the codec of choice for various applications signals, thus, making it the codec of choice for various applications
using the Internet or similar networks. using the Internet or similar networks.
skipping to change at page 4, line 29 skipping to change at page 3, line 29
2.1. Audio Bandwidth 2.1. Audio Bandwidth
Throughout this document, we refer to the following definitions: Throughout this document, we refer to the following definitions:
+--------------+----------------+-----------+----------+ +--------------+----------------+-----------+----------+
| Abbreviation | Name | Bandwidth | Sampling | | Abbreviation | Name | Bandwidth | Sampling |
+--------------+----------------+-----------+----------+ +--------------+----------------+-----------+----------+
| nb | Narrowband | 0 - 4000 | 8000 | | nb | Narrowband | 0 - 4000 | 8000 |
| | | | | | | | | |
| mb | Mediumband | 0 - 6000 | 12000 | | mb | Mediumband | 0 - 6000 | 12000 |
| | | | | | | | | |
| wb | Wideband | 0 - 8000 | 16000 | | wb | Wideband | 0 - 8000 | 16000 |
| | | | | | | | | |
| swb | Super-wideband | 0 - 12000 | 24000 | | swb | Super-wideband | 0 - 12000 | 24000 |
| | | | | | | | | |
| fb | Fullband | 0 - 20000 | 48000 | | fb | Fullband | 0 - 20000 | 48000 |
+--------------+----------------+-----------+----------+ +--------------+----------------+-----------+----------+
Audio bandwidth naming Audio bandwidth naming
Table 1 Table 1
3. Opus Codec 3. Opus Codec
The Opus [RFC6716] speech and audio codec has been developed to The Opus [RFC6716] speech and audio codec has been developed to
encode speech signals as well as audio signals. Two different modes, encode speech signals as well as audio signals. Two different modes,
skipping to change at page 6, line 45 skipping to change at page 5, line 41
Complexity can be scaled to optimize for CPU resources in real-time, Complexity can be scaled to optimize for CPU resources in real-time,
mostly as a trade-off between audio quality and bitrate. Also, mostly as a trade-off between audio quality and bitrate. Also,
different modes of Opus have different complexity. different modes of Opus have different complexity.
3.3. Forward Error Correction (FEC) 3.3. Forward Error Correction (FEC)
The voice mode of Opus allows for "in-band" forward error correction The voice mode of Opus allows for "in-band" forward error correction
(FEC) data to be embedded into the bit stream of Opus. This FEC (FEC) data to be embedded into the bit stream of Opus. This FEC
scheme adds redundant information about the previous packet (n-1) to scheme adds redundant information about the previous packet (n-1) to
the current output packet n. For each frame, the encoder decides the current output packet n. For each frame, the encoder decides
whether to use FEC based on (1) an externally-provided estimate of whether to use FEC based on (1) an externally-provided estimate of
the channel's packet loss rate; (2) an externally-provided estimate the channel's packet loss rate; (2) an externally-provided estimate
of the channel's capacity; (3) the sensitivity of the audio or speech of the channel's capacity; (3) the sensitivity of the audio or speech
signal to packet loss; (4) whether the receiving decoder has signal to packet loss; (4) whether the receiving decoder has
indicated it can take advantage of "in-band" FEC information. The indicated it can take advantage of "in-band" FEC information. The
decision to send "in-band" FEC information is entirely controlled by decision to send "in-band" FEC information is entirely controlled by
the encoder and therefore no special precautions for the payload have the encoder and therefore no special precautions for the payload have
to be taken. to be taken.
On the receiving side, the decoder can take advantage of this On the receiving side, the decoder can take advantage of this
skipping to change at page 9, line 8 skipping to change at page 7, line 22
48000 Hz, for all modes and sampling rates of Opus. The unit for the 48000 Hz, for all modes and sampling rates of Opus. The unit for the
timestamp is samples per single (mono) channel. The RTP timestamp timestamp is samples per single (mono) channel. The RTP timestamp
corresponds to the sample time of the first encoded sample in the corresponds to the sample time of the first encoded sample in the
encoded frame. For sampling rates lower than 48000 Hz the number of encoded frame. For sampling rates lower than 48000 Hz the number of
samples has to be multiplied with a multiplier according to Table 2 samples has to be multiplied with a multiplier according to Table 2
to determine the RTP timestamp. to determine the RTP timestamp.
+---------+------------+ +---------+------------+
| fs (Hz) | Multiplier | | fs (Hz) | Multiplier |
+---------+------------+ +---------+------------+
| 8000 | 6 | | 8000 | 6 |
| | | | | |
| 12000 | 4 | | 12000 | 4 |
| | | | | |
| 16000 | 3 | | 16000 | 3 |
| | | | | |
| 24000 | 2 | | 24000 | 2 |
| | | | | |
| 48000 | 1 | | 48000 | 1 |
+---------+------------+ +---------+------------+
Table 2: Timestamp multiplier Table 2: Timestamp multiplier
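As an illustration of the timestamp rule above, the following is a minimal sketch in Python, not part of either draft revision. The names TIMESTAMP_MULTIPLIER, rtp_timestamp_increment, and next_timestamp are hypothetical and chosen only for this example. The sketch assumes the sample count is given per channel at the coder sampling rate fs, and that the running timestamp wraps at 32 bits as the RTP header field does in [RFC3550].

```python
# Multipliers copied from Table 2: the 48000 Hz RTP clock divided by fs.
TIMESTAMP_MULTIPLIER = {8000: 6, 12000: 4, 16000: 3, 24000: 2, 48000: 1}


def rtp_timestamp_increment(num_samples: int, fs_hz: int) -> int:
    """Convert a per-channel sample count at fs_hz into 48 kHz RTP clock units."""
    if fs_hz not in TIMESTAMP_MULTIPLIER:
        raise ValueError(f"sampling rate not listed in Table 2: {fs_hz}")
    return num_samples * TIMESTAMP_MULTIPLIER[fs_hz]


def next_timestamp(current: int, num_samples: int, fs_hz: int) -> int:
    """Advance a running RTP timestamp, wrapping at 32 bits (RTP header field size)."""
    return (current + rtp_timestamp_increment(num_samples, fs_hz)) & 0xFFFFFFFF


# A 20 ms frame at 16000 Hz holds 320 samples per channel.
print(rtp_timestamp_increment(320, 16000))
```

Under these assumptions a 20 ms frame yields the same increment at every sampling rate in Table 2, which is the point of fixing the RTP clock at 48000 Hz.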
4.2. Payload Structure 4.2. Payload Structure
The Opus encoder can be set to output encoded frames representing The Opus encoder can be set to output encoded frames representing
2.5, 5, 10, 20, 40, or 60 ms of speech or audio data. Further, an 2.5, 5, 10, 20, 40, or 60 ms of speech or audio data. Further, an
arbitrary number of frames can be combined into a packet. The arbitrary number of frames can be combined into a packet. The
maximum packet length is limited to the amount of encoded data maximum packet length is limited to the amount of encoded data
skipping to change at page 10, line 10 skipping to change at page 8, line 23
(fs) of Opus and how the timestamp needs to be incremented for (fs) of Opus and how the timestamp needs to be incremented for
packetization (ts incr). If the Opus encoder outputs multiple packetization (ts incr). If the Opus encoder outputs multiple
encoded frames into a single packet the timestamps have to be added encoded frames into a single packet the timestamps have to be added
up according to the combined frames. up according to the combined frames.
+---------+-----------------+-----+-----+-----+-----+------+------+ +---------+-----------------+-----+-----+-----+-----+------+------+
| Mode | fs | 2.5 | 5 | 10 | 20 | 40 | 60 | | Mode | fs | 2.5 | 5 | 10 | 20 | 40 | 60 |
+---------+-----------------+-----+-----+-----+-----+------+------+ +---------+-----------------+-----+-----+-----+-----+------+------+
| ts incr | all | 120 | 240 | 480 | 960 | 1920 | 2880 | | ts incr | all | 120 | 240 | 480 | 960 | 1920 | 2880 |
| | | | | | | | | | | | | | | | | |
| voice | nb/mb/wb/swb/fb | | | x | x | x | x | | voice | nb/mb/wb/swb/fb | | | x | x | x | x |
| | | | | | | | | | | | | | | | | |
| audio | nb/wb/swb/fb | x | x | x | x | | | | audio | nb/wb/swb/fb | x | x | x | x | | |
+---------+-----------------+-----+-----+-----+-----+------+------+ +---------+-----------------+-----+-----+-----+-----+------+------+
Table 3: Supported Opus frame sizes and timestamp increments Table 3: Supported Opus frame sizes and timestamp increments
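The rule that timestamps are added up over the combined frames can be sketched as follows. This is an illustrative Python example only, not text from the drafts; TS_INCR, SUPPORTED_MS, and packet_timestamp_increment are hypothetical names. The mode restrictions are read from the "x" marks in Table 3, and the sketch assumes every frame in the packet uses the same mode.

```python
# Timestamp increment per frame duration in ms (the "ts incr" row of Table 3).
TS_INCR = {2.5: 120, 5: 240, 10: 480, 20: 960, 40: 1920, 60: 2880}

# Frame durations marked "x" for each mode in Table 3.
SUPPORTED_MS = {
    "voice": {10, 20, 40, 60},
    "audio": {2.5, 5, 10, 20},
}


def packet_timestamp_increment(mode: str, frame_durations_ms: list) -> int:
    """Sum the per-frame increments for all frames combined into one packet."""
    if mode not in SUPPORTED_MS:
        raise ValueError(f"unknown mode: {mode}")
    total = 0
    for duration in frame_durations_ms:
        if duration not in SUPPORTED_MS[mode]:
            raise ValueError(f"{duration} ms frames are not listed for {mode} mode")
        total += TS_INCR[duration]
    return total


# Three 20 ms voice frames in one packet cover 60 ms of audio.
print(packet_timestamp_increment("voice", [20, 20, 20]))
```

A table lookup is used instead of multiplying the duration by 48 so that the 2.5 ms case needs no floating-point arithmetic and unsupported durations are rejected explicitly.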
5. Congestion Control 5. Congestion Control
The adaptive nature of the Opus codec allows for an efficient The adaptive nature of the Opus codec allows for an efficient
congestion control. congestion control.
The target bitrate of Opus can be adjusted at any point in time and The target bitrate of Opus can be adjusted at any point in time and
thus allowing for an efficient congestion control. Furthermore, the thus allowing for an efficient congestion control. Furthermore, the
amount of encoded speech or audio data encoded in a single packet can amount of encoded speech or audio data encoded in a single packet can
be used for congestion control since the transmission rate is be used for congestion control since the transmission rate is
skipping to change at page 12, line 48 skipping to change at page 10, line 14
sprop-maxcapturerate: a hint about the maximum input sampling rate sprop-maxcapturerate: a hint about the maximum input sampling rate
that the sender is likely to produce. This is not a guarantee that the sender is likely to produce. This is not a guarantee
that the sender will never send any higher bandwidth (e.g. it that the sender will never send any higher bandwidth (e.g. it
could send a pre-recorded prompt that uses a higher bandwidth), could send a pre-recorded prompt that uses a higher bandwidth),
but it indicates to the receiver that frequencies above this but it indicates to the receiver that frequencies above this
maximum can safely be discarded. This parameter is useful to maximum can safely be discarded. This parameter is useful to
avoid wasting receiver resources by operating the audio processing avoid wasting receiver resources by operating the audio processing
pipeline (e.g. echo cancellation) at a higher rate than necessary. pipeline (e.g. echo cancellation) at a higher rate than necessary.
This parameter can take any value between 8000 and 48000, although This parameter can take any value between 8000 and 48000, although
commonly the value will match one of the Opus bandwidths commonly the value will match one of the Opus bandwidths (Table
(Table 1). By default, the sender is assumed to have no 1). By default, the sender is assumed to have no limitations,
limitations, i.e. 48000. i.e. 48000.
maxptime: the decoder's maximum length of time in milliseconds maxptime: the decoder's maximum length of time in milliseconds
rounded up to the next full integer value represented by the media rounded up to the next full integer value represented by the media
in a packet that can be encapsulated in a received packet in a packet that can be encapsulated in a received packet
according to Section 6 of [RFC4566]. Possible values are 3, 5, according to Section 6 of [RFC4566]. Possible values are 3, 5,
10, 20, 40, and 60 or an arbitrary multiple of Opus frame sizes 10, 20, 40, and 60 or an arbitrary multiple of Opus frame sizes
rounded up to the next full integer value up to a maximum value of rounded up to the next full integer value up to a maximum value of
120 as defined in Section 4. If no value is specified, 120 is 120 as defined in Section 4. If no value is specified, 120 is
assumed as default. This value is a recommendation by the assumed as default. This value is a recommendation by the
decoding side to ensure the best performance for the decoder. The decoding side to ensure the best performance for the decoder. The
skipping to change at page 19, line 10 skipping to change at page 16, line 37
declarative and a participant MUST use the configurations that are declarative and a participant MUST use the configurations that are
provided for the session. More than one configuration may be provided for the session. More than one configuration may be
provided if necessary by declaring multiple RTP payload types; provided if necessary by declaring multiple RTP payload types;
however, the number of types should be kept small. however, the number of types should be kept small.
7. Security Considerations 7. Security Considerations
All RTP packets using the payload format defined in this All RTP packets using the payload format defined in this
specification are subject to the general security considerations specification are subject to the general security considerations
discussed in the RTP specification [RFC3550] and any profile from discussed in the RTP specification [RFC3550] and any profile from
e.g. [RFC3711] or [RFC3551]. e.g. [RFC3711] or [RFC3551].
This payload format transports Opus encoded speech or audio data, This payload format transports Opus encoded speech or audio data,
hence, security issues include confidentiality, integrity protection, hence, security issues include confidentiality, integrity protection,
and authentication of the speech or audio itself. The Opus payload and authentication of the speech or audio itself. The Opus payload
format does not have any built-in security mechanisms. Any suitable format does not have any built-in security mechanisms. Any suitable
external mechanisms, such as SRTP [RFC3711], MAY be used. external mechanisms, such as SRTP [RFC3711], MAY be used.
This payload format and the Opus encoding do not exhibit any This payload format and the Opus encoding do not exhibit any
significant non-uniformity in the receiver-end computational load and significant non-uniformity in the receiver-end computational load and
thus are unlikely to pose a denial-of-service threat due to the thus are unlikely to pose a denial-of-service threat due to the
skipping to change at page 21, line 17 skipping to change at page 17, line 17
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2326] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time [RFC2326] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time
Streaming Protocol (RTSP)", RFC 2326, April 1998. Streaming Protocol (RTSP)", RFC 2326, April 1998.
[RFC2974] Handley, M., Perkins, C., and E. Whelan, "Session [RFC2974] Handley, M., Perkins, C., and E. Whelan, "Session
Announcement Protocol", RFC 2974, October 2000. Announcement Protocol", RFC 2974, October 2000.
[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
with Session Description Protocol (SDP)", RFC 3264, with Session Description Protocol (SDP)", RFC 3264, June
June 2002. 2002.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
Jacobson, "RTP: A Transport Protocol for Real-Time Jacobson, "RTP: A Transport Protocol for Real-Time
Applications", STD 64, RFC 3550, July 2003. Applications", STD 64, RFC 3550, July 2003.
[RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and [RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
Video Conferences with Minimal Control", STD 65, RFC 3551, Video Conferences with Minimal Control", STD 65, RFC 3551,
July 2003. July 2003.
[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
Norrman, "The Secure Real-time Transport Protocol (SRTP)", Norrman, "The Secure Real-time Transport Protocol (SRTP)",
RFC 3711, March 2004. RFC 3711, March 2004.
[RFC4288] Freed, N. and J. Klensin, "Media Type Specifications and [RFC4288] Freed, N. and J. Klensin, "Media Type Specifications and
Registration Procedures", BCP 13, RFC 4288, December 2005. Registration Procedures", RFC 4288, December 2005.
[RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
Description Protocol", RFC 4566, July 2006. Description Protocol", RFC 4566, July 2006.
[RFC4855] Casner, S., "Media Type Registration of RTP Payload [RFC4855] Casner, S., "Media Type Registration of RTP Payload
Formats", RFC 4855, February 2007. Formats", RFC 4855, February 2007.
[RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific [RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific
Media Attributes in the Session Description Protocol Media Attributes in the Session Description Protocol
(SDP)", RFC 5576, June 2009. (SDP)", RFC 5576, June 2009.
[RFC6562] Perkins, C. and JM. Valin, "Guidelines for the Use of [RFC6562] Perkins, C. and JM. Valin, "Guidelines for the Use of
Variable Bit Rate Audio with Secure RTP", RFC 6562, Variable Bit Rate Audio with Secure RTP", RFC 6562, March
March 2012. 2012.
[RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the [RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the
Opus Audio Codec", RFC 6716, September 2012. Opus Audio Codec", RFC 6716, September 2012.
Authors' Addresses Authors' Addresses
Julian Spittka Julian Spittka
Email: jspittka@gmail.com Email: jspittka@gmail.com
 End of changes. 23 change blocks. 
50 lines changed or deleted 50 lines changed or added
