draft-ietf-payload-rtp-opus-07.txt   draft-ietf-payload-rtp-opus-08.txt 
Network Working Group J. Spittka Network Working Group J. Spittka
Internet-Draft Internet-Draft
Intended status: Standards Track K. Vos Intended status: Standards Track K. Vos
Expires: July 17, 2015 vocTone Expires: August 10, 2015 vocTone
JM. Valin JM. Valin
Mozilla Mozilla
January 13, 2015 February 6, 2015
RTP Payload Format for Opus Speech and Audio Codec RTP Payload Format for the Opus Speech and Audio Codec
draft-ietf-payload-rtp-opus-07 draft-ietf-payload-rtp-opus-08
Abstract Abstract
This document defines the Real-time Transport Protocol (RTP) payload This document defines the Real-time Transport Protocol (RTP) payload
format for packetization of Opus encoded speech and audio data format for packetization of Opus encoded speech and audio data
necessary to integrate the codec in the most compatible way. necessary to integrate the codec in the most compatible way.
Further, it describes media type registrations for the RTP payload Further, it describes media type registrations for the RTP payload
format. format.
Status of This Memo Status of This Memo
skipping to change at page 1, line 37 skipping to change at page 1, line 37
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on July 17, 2015. This Internet-Draft will expire on August 10, 2015.
Copyright Notice Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 26 skipping to change at page 2, line 26
3.1.3. Discontinuous Transmission (DTX) . . . . . . . . . . 4 3.1.3. Discontinuous Transmission (DTX) . . . . . . . . . . 4
3.2. Complexity . . . . . . . . . . . . . . . . . . . . . . . 5 3.2. Complexity . . . . . . . . . . . . . . . . . . . . . . . 5
3.3. Forward Error Correction (FEC) . . . . . . . . . . . . . 5 3.3. Forward Error Correction (FEC) . . . . . . . . . . . . . 5
3.4. Stereo Operation . . . . . . . . . . . . . . . . . . . . 6 3.4. Stereo Operation . . . . . . . . . . . . . . . . . . . . 6
4. Opus RTP Payload Format . . . . . . . . . . . . . . . . . . . 6 4. Opus RTP Payload Format . . . . . . . . . . . . . . . . . . . 6
4.1. RTP Header Usage . . . . . . . . . . . . . . . . . . . . 6 4.1. RTP Header Usage . . . . . . . . . . . . . . . . . . . . 6
4.2. Payload Structure . . . . . . . . . . . . . . . . . . . . 7 4.2. Payload Structure . . . . . . . . . . . . . . . . . . . . 7
5. Congestion Control . . . . . . . . . . . . . . . . . . . . . 8 5. Congestion Control . . . . . . . . . . . . . . . . . . . . . 8
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8
6.1. Opus Media Type Registration . . . . . . . . . . . . . . 8 6.1. Opus Media Type Registration . . . . . . . . . . . . . . 8
6.2. Mapping to SDP Parameters . . . . . . . . . . . . . . . . 12 7. SDP Considerations . . . . . . . . . . . . . . . . . . . . . 12
6.2.1. Offer-Answer Model Considerations for Opus . . . . . 13 7.1. SDP Offer/Answer Considerations . . . . . . . . . . . . . 13
6.2.2. Declarative SDP Considerations for Opus . . . . . . . 15 7.2. Declarative SDP Considerations for Opus . . . . . . . . . 15
7. Security Considerations . . . . . . . . . . . . . . . . . . . 15 8. Security Considerations . . . . . . . . . . . . . . . . . . . 15
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 15 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 15
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 16 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 16
9.1. Normative References . . . . . . . . . . . . . . . . . . 16 10.1. Normative References . . . . . . . . . . . . . . . . . . 16
9.2. Informative References . . . . . . . . . . . . . . . . . 17 10.2. Informative References . . . . . . . . . . . . . . . . . 17
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 17 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 17
1. Introduction 1. Introduction
The Opus codec is a speech and audio codec developed within the IETF Opus [RFC6716] is a speech and audio codec developed within the IETF
Internet Wideband Audio Codec working group. The codec has a very Internet Wideband Audio Codec working group. The codec has a very
low algorithmic delay and it is highly scalable in terms of audio low algorithmic delay and it is highly scalable in terms of audio
bandwidth, bitrate, and complexity. Further, it provides different bandwidth, bitrate, and complexity. Further, it provides different
modes to efficiently encode speech signals as well as music signals, modes to efficiently encode speech signals as well as music signals,
thus making it the codec of choice for various applications using the thus making it the codec of choice for various applications using the
Internet or similar networks. Internet or similar networks.
This document defines the Real-time Transport Protocol (RTP) This document defines the Real-time Transport Protocol (RTP)
[RFC3550] payload format for packetization of Opus encoded speech and [RFC3550] payload format for packetization of Opus encoded speech and
audio data necessary to integrate the Opus codec in the most audio data necessary to integrate Opus in the most compatible way.
compatible way. Further, it describes media type registrations for Further, it describes media type registrations for the RTP payload
the RTP payload format. More information on the Opus codec can be format.
obtained from [RFC6716].
2. Conventions, Definitions and Acronyms used in this document 2. Conventions, Definitions and Acronyms used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119]. document are to be interpreted as described in [RFC2119].
audio bandwidth: The range of audio frequencies being coded audio bandwidth: The range of audio frequencies being coded
CBR: Constant bitrate CBR: Constant bitrate
CPU: Central Processing Unit CPU: Central Processing Unit
skipping to change at page 3, line 44 skipping to change at page 3, line 44
| | | | | | | | | |
| FB | Fullband | 0 - 20000 | 48000 | | FB | Fullband | 0 - 20000 | 48000 |
+--------------+----------------+-----------------+-----------------+ +--------------+----------------+-----------------+-----------------+
Audio bandwidth naming Audio bandwidth naming
Table 1 Table 1
3. Opus Codec 3. Opus Codec
The Opus [RFC6716] codec encodes speech signals as well as general Opus encodes speech signals as well as general audio signals. Two
audio signals. Two different modes can be chosen, a voice mode or an different modes can be chosen, a voice mode or an audio mode, to
audio mode, to allow the most efficient coding depending on the type allow the most efficient coding depending on the type of the input
of the input signal, the sampling frequency of the input signal, and signal, the sampling frequency of the input signal, and the intended
the intended application. application.
The voice mode allows efficient encoding of voice signals at lower The voice mode allows efficient encoding of voice signals at lower
bit rates while the audio mode is optimized for general audio signals bit rates while the audio mode is optimized for general audio signals
at medium and higher bitrates. at medium and higher bitrates.
The Opus speech and audio codec is highly scalable in terms of audio Opus is highly scalable in terms of audio bandwidth, bitrate, and
bandwidth, bitrate, and complexity. Further, Opus allows complexity. Further, Opus allows transmitting stereo signals.
transmitting stereo signals.
3.1. Network Bandwidth 3.1. Network Bandwidth
Opus supports bitrates from 6 kb/s to 510 kb/s. The bitrate can be Opus supports bitrates from 6 kb/s to 510 kb/s. The bitrate can be
changed dynamically within that range. All other parameters being changed dynamically within that range. All other parameters being
equal, higher bitrates result in higher quality. equal, higher bitrates result in higher audio quality.
3.1.1. Recommended Bitrate 3.1.1. Recommended Bitrate
For a frame size of 20 ms, these are the bitrate "sweet spots" for For a frame size of 20 ms, these are the bitrate "sweet spots" for
Opus in various configurations: Opus in various configurations:
o 8-12 kb/s for NB speech, o 8-12 kb/s for NB speech,
o 16-20 kb/s for WB speech, o 16-20 kb/s for WB speech,
o 28-40 kb/s for FB speech, o 28-40 kb/s for FB speech,
o 48-64 kb/s for FB mono music, and o 48-64 kb/s for FB mono music, and
o 64-128 kb/s for FB stereo music. o 64-128 kb/s for FB stereo music.
3.1.2. Variable versus Constant Bitrate 3.1.2. Variable versus Constant Bitrate
For the same average bitrate, variable bitrate (VBR) can achieve For the same average bitrate, variable bitrate (VBR) can achieve
higher quality than constant bitrate (CBR). For the majority of higher audio quality than constant bitrate (CBR). For the majority
voice transmission applications, VBR is the best choice. One reason of voice transmission applications, VBR is the best choice. One
for choosing CBR is the potential information leak that _might_ occur reason for choosing CBR is the potential information leak that
when encrypting the compressed stream. See [RFC6562] for guidelines _might_ occur when encrypting the compressed stream. See [RFC6562]
on when VBR is appropriate for encrypted audio communications. In for guidelines on when VBR is appropriate for encrypted audio
the case where an existing VBR stream needs to be converted to CBR communications. In the case where an existing VBR stream needs to be
for security reasons, then the Opus padding mechanism described in converted to CBR for security reasons, then the Opus padding
[RFC6716] is the RECOMMENDED way to achieve padding because the RTP mechanism described in [RFC6716] is the RECOMMENDED way to achieve
padding bit is unencrypted. padding because the RTP padding bit is unencrypted.
The bitrate can be adjusted at any point in time. To avoid The bitrate can be adjusted at any point in time. To avoid
congestion, the average bitrate SHOULD NOT exceed the available congestion, the average bitrate SHOULD NOT exceed the available
network bandwidth. If no target bitrate is specified, the bitrates network bandwidth. If no target bitrate is specified, the bitrates
specified in Section 3.1.1 are RECOMMENDED. specified in Section 3.1.1 are RECOMMENDED.
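The fallback rule above can be sketched as follows. This is a minimal illustration and not part of either draft: the function name, the configuration labels, and the choice of the upper end of each "sweet spot" as the default are assumptions made for the example. Only the numeric ranges and the 6 kb/s to 510 kb/s limits come from Sections 3.1 and 3.1.1.

```python
# Bitrate limits from Section 3.1, in bits per second.
OPUS_MIN_BPS = 6_000
OPUS_MAX_BPS = 510_000

# "Sweet spots" from Section 3.1.1 for 20 ms frames, in bits per second.
# The string keys are hypothetical labels chosen for this sketch.
SWEET_SPOTS_BPS = {
    "NB speech": (8_000, 12_000),
    "WB speech": (16_000, 20_000),
    "FB speech": (28_000, 40_000),
    "FB mono music": (48_000, 64_000),
    "FB stereo music": (64_000, 128_000),
}


def choose_bitrate(config, target_bps=None, available_bps=None):
    """Pick an average bitrate for the encoder (illustrative policy only)."""
    if target_bps is None:
        # No target given: fall back to the recommended range.
        # Using the upper end is an assumption of this sketch.
        _low, high = SWEET_SPOTS_BPS[config]
        target_bps = high
    # Keep the value inside the range Opus supports.
    target_bps = max(OPUS_MIN_BPS, min(OPUS_MAX_BPS, target_bps))
    if available_bps is not None:
        if available_bps < OPUS_MIN_BPS:
            raise ValueError("available bandwidth is below the Opus minimum")
        # The average bitrate SHOULD NOT exceed the available bandwidth.
        target_bps = min(target_bps, available_bps)
    return target_bps
```

For example, `choose_bitrate("FB stereo music", available_bps=96_000)` caps the default at the available bandwidth instead of the top of the recommended range.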
3.1.3. Discontinuous Transmission (DTX) 3.1.3. Discontinuous Transmission (DTX)
The Opus codec can, as described in Section 3.1.2, be operated with a Opus can, as described in Section 3.1.2, be operated with a variable
variable bitrate. In that case, the encoder will automatically bitrate. In that case, the encoder will automatically reduce the
reduce the bitrate for certain input signals, like periods of bitrate for certain input signals, like periods of silence. When
silence. When using continuous transmission, it will reduce the using continuous transmission, it will reduce the bitrate when the
bitrate when the characteristics of the input signal permit, but will characteristics of the input signal permit, but will never interrupt
never interrupt the transmission to the receiver. Therefore, the the transmission to the receiver. Therefore, the received signal
received signal will maintain the same high level of quality over the will maintain the same high level of audio quality over the full
full duration of a transmission while minimizing the average bit rate duration of a transmission while minimizing the average bit rate over
over time. time.
In cases where the bitrate of Opus needs to be reduced even further In cases where the bitrate of Opus needs to be reduced even further
or in cases where only constant bitrate is available, the Opus or in cases where only constant bitrate is available, the Opus
encoder can use discontinuous transmission (DTX), where parts of the encoder can use discontinuous transmission (DTX), where parts of the
encoded signal that correspond to periods of silence in the input encoded signal that correspond to periods of silence in the input
speech or audio signal are not transmitted to the receiver. A speech or audio signal are not transmitted to the receiver. A
receiver can distinguish between DTX and packet loss by looking for receiver can distinguish between DTX and packet loss by looking for
gaps in the sequence number, as described by Section 4.1 gaps in the sequence number, as described by Section 4.1
of [RFC3551]. of [RFC3551].
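One way a receiver might apply this check is sketched below. It is an illustration under stated assumptions, not text from the draft: the function name and return labels are hypothetical, and the sketch assumes the receiver knows the duration of the previous packet in RTP timestamp units (for example 960 for a 20 ms frame at the 48000 Hz clock). A gap in sequence numbers is treated as loss, while consecutive sequence numbers with a timestamp jump are treated as a DTX pause.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide
TS_MOD = 1 << 32   # RTP timestamps are 32 bits wide


def classify_gap(prev_seq, prev_ts, prev_duration, seq, ts):
    """Classify the step from the previous packet to the current one.

    prev_duration is the audio duration of the previous packet in RTP
    timestamp units (assumed to be known to the caller).
    """
    seq_delta = (seq - prev_seq) % SEQ_MOD  # handles 16-bit wrap-around
    ts_delta = (ts - prev_ts) % TS_MOD      # handles 32-bit wrap-around
    if seq_delta == 0:
        return "duplicate"
    if seq_delta > SEQ_MOD // 2:
        # The packet is older than the previous one.
        return "reordered"
    if seq_delta > 1:
        # Sequence numbers were skipped: at least one packet is missing.
        return "loss"
    if ts_delta > prev_duration:
        # Next sequence number, but the timestamp jumped ahead:
        # the sender paused transmission (DTX).
        return "dtx"
    return "continuous"
```

A real receiver would normally run this after its jitter buffer has reordered packets; the sketch looks at only two packets at a time.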
skipping to change at page 11, line 17 skipping to change at page 11, line 17
usedtx: specifies if the decoder prefers the use of DTX. Possible usedtx: specifies if the decoder prefers the use of DTX. Possible
values are 1 and 0. If no value is specified, the default is 0. values are 1 and 0. If no value is specified, the default is 0.
Encoding considerations: Encoding considerations:
The Opus media type is framed and consists of binary data The Opus media type is framed and consists of binary data
according to Section 4.8 in [RFC6838]. according to Section 4.8 in [RFC6838].
Security considerations: Security considerations:
See Section 7 of this document. See Section 8 of this document.
Interoperability considerations: none Interoperability considerations: none
Published specification: RFC [XXXX] Published specification: RFC [XXXX]
Note to the RFC Editor: Replace [XXXX] with the number of the Note to the RFC Editor: Replace [XXXX] with the number of the
published RFC. published RFC.
Applications that use this media type: Applications that use this media type:
skipping to change at page 12, line 15 skipping to change at page 12, line 15
Author: Author:
Julian Spittka jspittka@gmail.com Julian Spittka jspittka@gmail.com
Koen Vos koenvos74@gmail.com Koen Vos koenvos74@gmail.com
Jean-Marc Valin jmvalin@jmvalin.ca Jean-Marc Valin jmvalin@jmvalin.ca
Change controller: IETF Payload Working Group delegated from the IESG Change controller: IETF Payload Working Group delegated from the IESG
6.2. Mapping to SDP Parameters 7. SDP Considerations
The information described in the media type specification has a The information described in the media type specification has a
specific mapping to fields in the Session Description Protocol (SDP) specific mapping to fields in the Session Description Protocol (SDP)
[RFC4566], which is commonly used to describe RTP sessions. When SDP [RFC4566], which is commonly used to describe RTP sessions. When SDP
is used to specify sessions employing the Opus codec, the mapping is is used to specify sessions employing Opus, the mapping is as
as follows: follows:
o The media type ("audio") goes in SDP "m=" as the media name. o The media type ("audio") goes in SDP "m=" as the media name.
o The media subtype ("opus") goes in SDP "a=rtpmap" as the encoding o The media subtype ("opus") goes in SDP "a=rtpmap" as the encoding
name. The RTP clock rate in "a=rtpmap" MUST be 48000 and the name. The RTP clock rate in "a=rtpmap" MUST be 48000 and the
number of channels MUST be 2. number of channels MUST be 2.
o The OPTIONAL media type parameters "ptime" and "maxptime" are o The OPTIONAL media type parameters "ptime" and "maxptime" are
mapped to "a=ptime" and "a=maxptime" attributes, respectively, in mapped to "a=ptime" and "a=maxptime" attributes, respectively, in
the SDP. the SDP.
o The OPTIONAL media type parameters "maxaveragebitrate", o The OPTIONAL media type parameters "maxaveragebitrate",
"maxplaybackrate", "stereo", "cbr", "useinbandfec", and "usedtx", "maxplaybackrate", "stereo", "cbr", "useinbandfec", and "usedtx",
skipping to change at page 13, line 28 skipping to change at page 13, line 28
maxaveragebitrate=20000; stereo=1; useinbandfec=1; usedtx=0 maxaveragebitrate=20000; stereo=1; useinbandfec=1; usedtx=0
a=ptime:40 a=ptime:40
a=maxptime:40 a=maxptime:40
Example 3: Two-way full-band stereo preferred Example 3: Two-way full-band stereo preferred
m=audio 54312 RTP/AVP 101 m=audio 54312 RTP/AVP 101
a=rtpmap:101 opus/48000/2 a=rtpmap:101 opus/48000/2
a=fmtp:101 stereo=1; sprop-stereo=1 a=fmtp:101 stereo=1; sprop-stereo=1
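As a rough illustration of how an "a=fmtp" line such as the one in Example 3 could be read by an implementation, the sketch below splits it into a payload type and a dictionary of parameters. The function name is hypothetical, values are kept as strings, and no validation of parameter names or values is attempted.

```python
def parse_fmtp(line):
    """Split an SDP 'a=fmtp:' line into (payload_type, parameters)."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    # The payload type comes first, then a space, then the parameter list.
    head, _, rest = line[len(prefix):].partition(" ")
    payload_type = int(head)
    params = {}
    # Parameters are separated by semicolons, each written as name=value.
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        params[name.strip()] = value.strip()
    return payload_type, params
```

Parameters that are absent keep their defaults from the media type registration; for instance, a caller could read `params.get("usedtx", "0")`, since the default for "usedtx" is 0.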
6.2.1. Offer-Answer Model Considerations for Opus 7.1. SDP Offer/Answer Considerations
When using the offer-answer procedure described in [RFC3264] to When using the offer-answer procedure described in [RFC3264] to
negotiate the use of Opus, the following considerations apply: negotiate the use of Opus, the following considerations apply:
o Opus supports several clock rates. For signaling purposes only o Opus supports several clock rates. For signaling purposes only
the highest, i.e. 48000, is used. The actual clock rate of the the highest, i.e. 48000, is used. The actual clock rate of the
corresponding media is signaled inside the payload and is not corresponding media is signaled inside the payload and is not
restricted by this payload format description. The decoder MUST restricted by this payload format description. The decoder MUST
be capable of decoding every received clock rate. An example is be capable of decoding every received clock rate. An example is
shown below: shown below:
skipping to change at page 15, line 5 skipping to change at page 14, line 48
"stereo" is 0, as this would lead to inefficient use of network "stereo" is 0, as this would lead to inefficient use of network
resources. The "stereo" parameter does not affect resources. The "stereo" parameter does not affect
interoperability. interoperability.
o The "cbr" parameter is a unidirectional receive-only parameter. o The "cbr" parameter is a unidirectional receive-only parameter.
o The "useinbandfec" parameter is a unidirectional receive-only o The "useinbandfec" parameter is a unidirectional receive-only
parameter. parameter.
o The "usedtx" parameter is a unidirectional receive-only parameter. o The "usedtx" parameter is a unidirectional receive-only parameter.
o Any unknown parameter in an offer MUST be ignored by the receiver o Any unknown parameter in an offer MUST be ignored by the receiver
and MUST be removed from the answer. and MUST be removed from the answer.
6.2.2. Declarative SDP Considerations for Opus The Opus parameters in an SDP Offer/Answer exchange are completely
orthogonal, and there is no relationship between the SDP Offer and
the Answer.
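The rule on unknown parameters can be sketched as a simple filter. This is an assumption-laden example: the function name is hypothetical, and the set of known names below contains only the parameters visible in this excerpt, so it may be incomplete with respect to the full media type registration.

```python
# Parameter names that appear in this excerpt; the full registration
# in Section 6.1 may define more (this set is an assumption).
KNOWN_PARAMS = frozenset({
    "maxaveragebitrate",
    "maxplaybackrate",
    "stereo",
    "sprop-stereo",
    "cbr",
    "useinbandfec",
    "usedtx",
})


def split_known(offer_params):
    """Separate recognized offer parameters from unknown ones.

    Returns (known, unknown): 'known' is a dict the receiver may act on,
    'unknown' is a sorted list of names that are ignored and must not be
    copied into the answer.
    """
    known = {k: v for k, v in offer_params.items() if k in KNOWN_PARAMS}
    unknown = sorted(set(offer_params) - KNOWN_PARAMS)
    return known, unknown
```

Because the answer's parameters describe the answerer's own receive preferences, the answer is built from local configuration and not by echoing the filtered offer.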
7.2. Declarative SDP Considerations for Opus
For declarative use of SDP such as in Session Announcement Protocol For declarative use of SDP such as in Session Announcement Protocol
(SAP), [RFC2974], and RTSP, [RFC2326], for Opus, the following needs (SAP), [RFC2974], and RTSP, [RFC2326], for Opus, the following needs
to be considered: to be considered:
o The values for "maxptime", "ptime", "maxplaybackrate", and o The values for "maxptime", "ptime", "maxplaybackrate", and
"maxaveragebitrate" ought to be selected carefully to ensure that "maxaveragebitrate" ought to be selected carefully to ensure that
a reasonable performance can be achieved for the participants of a a reasonable performance can be achieved for the participants of a
session. session.
o The values for "maxptime", "ptime", and of the payload format o The values for "maxptime", "ptime", and of the payload format
configuration are recommendations by the decoding side to ensure configuration are recommendations by the decoding side to ensure
the best performance for the decoder. the best performance for the decoder.
o All other parameters of the payload format configuration are o All other parameters of the payload format configuration are
declarative and a participant MUST use the configurations that are declarative and a participant MUST use the configurations that are
provided for the session. More than one configuration can be provided for the session. More than one configuration can be
provided if necessary by declaring multiple RTP payload types; provided if necessary by declaring multiple RTP payload types;
however, the number of types ought to be kept small. however, the number of types ought to be kept small.
7. Security Considerations 8. Security Considerations
All RTP packets using the payload format defined in this All RTP packets using the payload format defined in this
specification are subject to the general security considerations specification are subject to the general security considerations
discussed in the RTP specification [RFC3550] and any profile from, discussed in the RTP specification [RFC3550] and any profile from,
e.g., [RFC3711] or [RFC3551]. e.g., [RFC3711] or [RFC3551].
This payload format transports Opus encoded speech or audio data. This payload format transports Opus encoded speech or audio data.
Hence, security issues include confidentiality, integrity protection, Hence, security issues include confidentiality, integrity protection,
and authentication of the speech or audio itself. Opus does not and authentication of the speech or audio itself. Opus does not
provide any confidentiality or integrity protection. Any suitable provide any confidentiality or integrity protection. Any suitable
external mechanisms, such as SRTP [RFC3711], MAY be used. external mechanisms, such as SRTP [RFC3711], MAY be used.
This payload format and the Opus encoding do not exhibit any This payload format and the Opus encoding do not exhibit any
significant non-uniformity in the receiver-end computational load and significant non-uniformity in the receiver-end computational load and
thus are unlikely to pose a denial-of-service threat due to the thus are unlikely to pose a denial-of-service threat due to the
receipt of pathological datagrams. receipt of pathological datagrams.
8. Acknowledgements 9. Acknowledgements
Many people have made useful comments and suggestions contributing to Many people have made useful comments and suggestions contributing to
this document. In particular, we would like to thank Tina le Grand, this document. In particular, we would like to thank Tina le Grand,
Cullen Jennings, Jonathan Lennox, Gregory Maxwell, Colin Perkins, Jan Cullen Jennings, Jonathan Lennox, Gregory Maxwell, Colin Perkins, Jan
Skoglund, Timothy B. Terriberry, Martin Thompson, Justin Uberti, Skoglund, Timothy B. Terriberry, Martin Thompson, Justin Uberti,
Magnus Westerlund, and Mo Zanaty. Magnus Westerlund, and Mo Zanaty.
9. References 10. References
9.1. Normative References 10.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2326] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time [RFC2326] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time
Streaming Protocol (RTSP)", RFC 2326, April 1998. Streaming Protocol (RTSP)", RFC 2326, April 1998.
[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
with Session Description Protocol (SDP)", RFC 3264, June with Session Description Protocol (SDP)", RFC 3264, June
2002. 2002.
skipping to change at page 17, line 9 skipping to change at page 17, line 9
Variable Bit Rate Audio with Secure RTP", RFC 6562, March Variable Bit Rate Audio with Secure RTP", RFC 6562, March
2012. 2012.
[RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the [RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the
Opus Audio Codec", RFC 6716, September 2012. Opus Audio Codec", RFC 6716, September 2012.
[RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type [RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type
Specifications and Registration Procedures", BCP 13, RFC Specifications and Registration Procedures", BCP 13, RFC
6838, January 2013. 6838, January 2013.
9.2. Informative References 10.2. Informative References
[RFC2974] Handley, M., Perkins, C., and E. Whelan, "Session [RFC2974] Handley, M., Perkins, C., and E. Whelan, "Session
Announcement Protocol", RFC 2974, October 2000. Announcement Protocol", RFC 2974, October 2000.
Authors' Addresses Authors' Addresses
Julian Spittka Julian Spittka
Email: jspittka@gmail.com Email: jspittka@gmail.com
 End of changes. 22 change blocks. 
56 lines changed or deleted 58 lines changed or added
