<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-ietf-avt-rtp-g719-03" ipr="full3978">
  <front>
    <title abbrev="RTP Payload format for G.719">RTP Payload format for
    G.719</title>

    <author fullname="Magnus Westerlund" initials="M." surname="Westerlund">
      <organization>Ericsson AB</organization>

      <address>
        <postal>
          <street>Torshamnsgatan 21-23</street>

          <city>SE-164 83 Stockholm</city>

          <country>SWEDEN</country>
        </postal>

        <phone>+46 8 7190000</phone>

        <email>magnus.westerlund@ericsson.com</email>
      </address>
    </author>

    <author fullname="Ingemar Johansson" initials="I." surname="Johansson">
      <organization>Ericsson AB</organization>

      <address>
        <postal>
          <street>Laboratoriegrand 11</street>

          <city>SE-971 28 Lulea</city>

          <country>SWEDEN</country>
        </postal>

        <phone>+46 73 0783289</phone>

        <email>ingemar.s.johansson@ericsson.com</email>
      </address>
    </author>

    <date day="8" month="Oct" year="2008" />

    <abstract>
      <t>This document specifies the payload format for packetization of the
      G.719 full-band codec encoded audio signals into the Real-time Transport
      Protocol (RTP). The payload format supports transmission of multiple
      channels, multiple frames per payload, and interleaving.</t>
    </abstract>
  </front>

  <middle>
    <section title="Introduction">
      <t>This document specifies the payload format for packetization of the
      G.719 full-band (FB) codec encoded audio signals into the Real-time
      Transport Protocol (RTP) <xref target="RFC3550"></xref>. The payload
      format supports transmission of multiple channels, multiple frames per
      payload, and packet-loss robustness methods using redundancy or
      interleaving.</t>

      <t>This document starts with conventions, a brief description of the
      codec, and the payload format's capabilities. The payload format is
      specified in <xref target="sec-payload"></xref>. Examples can be found
      in <xref target="sec-examples"></xref>. The media type, its mapping to
      SDP, and its usage in SDP offer/answer are then specified. The document
      ends with considerations around congestion control and security.</t>
    </section>

    <section title="Definitions and Conventions">
      <t>The term "frame-block" is used in this document to describe the
      time-synchronized set of audio frames in a multi-channel audio session.
      In particular, in an N-channel session, a frame-block will contain N
      audio frames, one from each of the channels, and all N audio frames
      represent exactly the same time period.</t>

      <t>This document contains depictions of bit fields. The most significant
      bit is always leftmost in the figure on each row and has the lowest
      enumeration. For fields that are depicted over multiple rows the upper
      row is more significant than the next.</t>

      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119">RFC 2119</xref>.</t>
    </section>

    <section title="G.719 Description">
      <t>The ITU-T G.719 full-band codec is a transform coder based on the
      Modulated Lapped Transform (MLT). G.719 is a low-complexity
      full-bandwidth codec for conversational speech and audio coding. The
      encoder input and decoder output are sampled at 48 kHz. The codec
      enables full-bandwidth (20 Hz to 20 kHz) encoding of speech, music,
      and general audio content at rates from 32 kbit/s up to 128 kbit/s.
      The codec operates on 20 ms frames and has an algorithmic delay of
      40 ms.</t>

      <t>The codec provides excellent quality for speech, music and other
      types of audio. Some of the applications for which this coder is
      suitable are:<list style="symbols">
          <t>Real-time communications such as video conferencing and
          telephony.</t>

          <t>Streaming audio</t>

          <t>Archival and messaging</t>
        </list></t>

      <t>The encoding and decoding algorithm can change the bit rate at any
      20 ms frame boundary. The encoder receives the audio sampled at 48 kHz.
      Other sampling rates can be supported by re-sampling the input signal
      to the codec's sampling rate of 48 kHz; however, this functionality is
      not part of the standard.</t>

      <t>The encoding is performed on equally sized frames. For each frame,
      the encoder decides between two encoding modes, a transient mode and a
      stationary mode. The decision is based on statistics derived from the
      input signal. The stationary mode uses a long MLT that leads to a
      spectrum of 960 coefficients while the transient encoding mode uses a
      short MLT (higher time resolution transform) which results in 4 spectra
      (4 x 240 = 960 coefficients). The encoding of the spectrum is done in
      two steps. First, the spectral envelope is computed, quantized and
      Huffman encoded. The envelope is computed on a non-uniform frequency
      subdivision. From the coded spectral envelope, a weighted spectral
      envelope is derived and used for bit allocation. This process is
      repeated at the decoder, so only the spectral envelope needs to be
      transmitted. The output of the bit allocation is used to quantize the
      spectra. In addition, for stationary frames the encoder estimates the
      noise level. The decoder applies the reverse operation upon
      reception of the bit stream. The non-coded coefficients (i.e. no bits
      allocated) are replaced by entries of a noise codebook which is built
      based on the decoded coefficients.</t>
    </section>

    <section title="Payload format Capabilities">
      <t>This payload format has a number of capabilities, which this section
      discusses in some detail.</t>

      <section title="Multi-rate Encoding and Rate Adaptation">
        <t>G.719 supports multi-rate encoding, which allows the encoding rate
        to vary on a per-frame basis. This enables support for
        bit-rate adaptation and congestion control. The possibility to
        aggregate multiple audio frames into a single RTP payload is another
        dimension of adaptation. The RTP and payload format overhead can thus
        be reduced by the aggregation at the cost of increased delay and
        reduced packet-loss robustness.</t>
      </section>

      <section title="Support for Multi-Channel Sessions">
        <t>The RTP payload format defined in this document supports
        multi-channel audio content (e.g. stereophonic or surround audio
        sessions). Although the G.719 codec itself does not support encoding
        of multi-channel audio content into a single bit stream, it can be
        used to separately encode and decode each of the individual channels.
        To transport (or store) the separately encoded multi-channel content,
        the audio frames for all channels that are framed and encoded for the
        same 20 ms period are logically collected in a "frame-block".</t>

        <t>At the session setup, out-of-band signaling must be used to
        indicate the number of channels in the payload type. The order of the
        audio frames within the frame-block depends on the number of the
        channels and follows the definition in Section 4.1 of the RTP/<xref
        target="RFC3551">AVP Profile</xref>. When using SDP for signaling, the
        number of channels is specified in the rtpmap attribute.</t>
      </section>

      <section title="Robustness against Packet Loss">
        <t>The payload format supports several means, including forward error
        correction (FEC) and frame interleaving, to increase robustness
        against packet loss.</t>

        <section anchor="sec-fec"
                 title="Use of Forward Error Correction (FEC)">
          <t>Generic forward error correction within RTP is defined, for
          example, in RFC 5109 <xref target="RFC5109"></xref>. Audio
          redundancy coding is defined in RFC 2198 <xref
          target="RFC2198"></xref>. Either scheme can be used to add redundant
          information to the RTP packet stream and make it more resilient to
          packet losses, at the expense of a higher bit rate. See either RFC
          for a discussion of the implications of the higher bit rate for
          network congestion.</t>

          <t>In addition to these media-unaware mechanisms, this memo
          specifies an optional G.719 specific form of audio redundancy
          coding, which may be beneficial in terms of packetization overhead.
          Conceptually, previously transmitted transport frames are aggregated
          together with new ones. A sliding window can be used to group the
          frames to be sent in each payload. However, irregular or
          non-consecutive patterns are also possible by inserting NO_DATA
          frames between primary and redundant transmissions. <xref
          target="fig-red"></xref> below shows an example.</t>

          <figure anchor="fig-red"
                  title="An example of redundant transmission">
            <artwork><![CDATA[
--+--------+--------+--------+--------+--------+--------+--------+--
  | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
--+--------+--------+--------+--------+--------+--------+--------+--

   <---- p(n-1) ---->
            <----- p(n) ----->
                     <---- p(n+1) ---->
                              <---- p(n+2) ---->
                                       <---- p(n+3) ---->
                                                <---- p(n+4) ---->
]]></artwork>
          </figure>

          <t>Here, each frame is retransmitted once in the following RTP
          payload packet. f(n-2)...f(n+4) denote a sequence of audio frames,
          and p(n-1)...p(n+4) a sequence of payload packets.</t>
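
          <t>As a non-normative illustration, the sliding-window grouping of
          <xref target="fig-red"></xref> can be sketched in Python as shown
          below. The function name, the window parameter, and the use of
          integer frame indices (with n = 0) are assumptions made for this
          sketch only; they are not defined by this payload format.</t>

          <figure>
            <artwork><![CDATA[
```python
def redundant_packets(first, last, window=2):
    """Group frame indices so that each packet repeats earlier frames.

    Packet p(k) carries frames f(k-window+1)..f(k), so with window=2
    every frame is sent twice: once as primary, once as redundant.
    """
    packets = {}
    for k in range(first, last + 1):
        packets[k] = list(range(k - window + 1, k + 1))
    return packets


# With n = 0, p(-1)..p(4) cover f(-2)..f(4) as in the figure.
for k, frames in redundant_packets(-1, 4).items():
    print(f"p({k:+d}) carries frames {frames}")
```
]]></artwork>
          </figure>

          <t>A larger window in this sketch corresponds to more redundant
          copies per frame, at the cost of more bandwidth and receiver
          delay.</t>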

          <t>The mechanism described does not strictly require signaling at
          session setup. However, signaling has been defined to allow the
          sender to voluntarily bound the buffering and delay requirements.
          If nothing is signaled, the use of this mechanism is allowed and
          unbounded. For a certain timestamp, the receiver may receive
          multiple copies of a frame containing encoded audio data, even at
          different encoding rates. The cost of this scheme is bandwidth and
          the receiver delay necessary to allow the redundant copy to
          arrive.</t>

          <t>This redundancy scheme provides a functionality similar to the
          one described in RFC 2198, but it works only if both original frames
          and redundant representations are G.719 frames. When the use of
          other media coding schemes is desirable, one has to resort to RFC
          2198.</t>

          <t>The sender is responsible for selecting an appropriate amount of
          redundancy based on feedback about the channel conditions, e.g., in
          the RTP Control Protocol (RTCP) <xref target="RFC3550"></xref>
          receiver reports. The sender is also responsible for avoiding
          congestion, which may be exacerbated by redundancy (see <xref
          target="sec-congestion"></xref> for more details).</t>
        </section>

        <section anchor="sec-interleaving" title="Use of Frame Interleaving">
          <t>To decrease protocol overhead, the payload design allows several
          audio transport frames to be encapsulated into a single RTP packet.
          One of the drawbacks of such an approach is that in case of packet
          loss several consecutive frames are lost. Consecutive frame loss
          normally renders error concealment less efficient and usually causes
          clearly audible and annoying distortions in the reconstructed audio.
          Interleaving of transport frames can improve the audio quality in
          such cases by distributing the consecutive losses into a number of
          isolated frame losses, which are easier to conceal. However,
          interleaving and bundling several frames per payload also increases
          end-to-end delay and sets higher buffering requirements. Therefore,
          interleaving is not appropriate for all use cases or devices.
          Streaming applications should most likely be able to exploit
          interleaving to improve audio quality in lossy transmission
          conditions.</t>

          <t>Note that this payload design supports the use of frame
          interleaving as an option. The usage of this feature needs to be
          negotiated in the session setup.</t>

          <t>The interleaving supported by this format is rather flexible. For
          example, a continuous pattern can be defined, as depicted in <xref
          target="fig-interleaving"></xref>.</t>

          <figure anchor="fig-interleaving"
                  title="An example of interleaving pattern that has constant delay">
            <artwork><![CDATA[
--+--------+--------+--------+--------+--------+--------+--------+--
  | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
--+--------+--------+--------+--------+--------+--------+--------+--

           [ p(n)   ]
  [ p(n+1) ]                 [ p(n+1) ]
                    [ p(n+2) ]                 [ p(n+2) ]
                                      [ p(n+3) ]                 
                                                        [ p(n+4) ]]]></artwork>
          </figure>

          <t>In <xref target="fig-interleaving"></xref> the consecutive
          frames, denoted f(n-2) to f(n+4), are aggregated into packets p(n)
          to p(n+4), each packet carrying two frames. This approach provides
          an interleaving pattern that allows for constant delay in both the
          interleaving and de-interleaving processes. The de-interleaving
          buffer needs to have room for at least three frames, including the
          one that is ready to be consumed. The storage space for three frames
          is needed, for example, when f(n) is the next frame to be decoded:
          since frame f(n) was received in packet p(n+2), which also carried
          frame f(n+3), both these frames are stored in the buffer.
          Furthermore, frame f(n+1) received in the previous packet, p(n+1),
          is also in the de-interleaving buffer. Note also that in this
          example the buffer occupancy varies: when frame f(n+1) is the next
          one to be decoded, there are only two frames, f(n+1) and f(n+3), in
          the buffer.</t>
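
          <t>The buffer occupancy described above can be reproduced with a
          small, non-normative Python simulation. The sketch assumes, reading
          the pattern off <xref target="fig-interleaving"></xref> with n = 0,
          that packet p(k) carries frames f(2k-4) and f(2k-1) and that two
          frames are decoded after each packet arrival. The function name is
          hypothetical.</t>

          <figure>
            <artwork><![CDATA[
```python
def simulate_deinterleaving(num_packets):
    """Replay the two-frames-per-packet pattern and track the buffer.

    Assumed pattern (read off the figure, with n = 0): packet p(k)
    carries frames f(2k-4) and f(2k-1). After p(k) arrives, the
    receiver decodes f(2k-4) and then f(2k-3).
    """
    buffer = set()
    occupancy = []  # (next frame to decode, frames buffered at that time)
    for k in range(num_packets):
        buffer.update({2 * k - 4, 2 * k - 1})
        for frame in (2 * k - 4, 2 * k - 3):
            if frame in buffer:  # frames before startup were never received
                occupancy.append((frame, sorted(buffer)))
                buffer.remove(frame)
    return occupancy


for frame, buffered in simulate_deinterleaving(5):
    print(f"decode f({frame:+d}): buffer holds {buffered}")
```
]]></artwork>
          </figure>

          <t>In this simulation the buffer alternates between three and two
          frames, which matches the occupancy discussed for f(n) and
          f(n+1).</t>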
        </section>

      </section>

    </section>

    <section anchor="sec-payload" title="Payload format">
      <t>The main purpose of the payload design for G.719 is to exploit the
      full potential of the codec with as little overhead as possible. The
      design includes both basic and interleaved modes, because the codec is
      suitable for conversational and other low-delay applications as well as
      for streaming, where more delay is acceptable.</t>

      <t>The main structural difference between the basic and interleaved
      modes is the extension of the table of contents entries with frame
      displacement fields in the interleaved mode. The basic mode supports
      aggregation of multiple consecutive frames in a payload. The interleaved
      mode supports aggregation of multiple frames that are non-consecutive in
      time. In both modes it is possible to have frames encoded with different
      frame types in the same payload.</t>

      <t>The payload format also supports the usage of G.719 for carrying
      multi-channel content using one discrete encoder per channel, all using
      the same bit-rate. In this case a complete frame-block with data from
      all channels is included in the RTP payload. The data is the
      concatenation of all the encoded audio frames in the order specified for
      that number of included channels. Interleaving is also done on complete
      frame-blocks rather than individual audio frames.</t>

      <section title="RTP Header Usage">
        <t>The RTP timestamp corresponds to the sampling instant of the first
        sample encoded for the first frame-block in the packet. The timestamp
        clock frequency SHALL be 48000 Hz. The timestamp is also used to
        recover the correct decoding order of the frame-blocks.</t>

        <t>The RTP header marker bit (M) SHALL be set to 1 whenever the first
        frame-block carried in the packet is the first frame-block in a
        talkspurt (see definition of the talkspurt in section 4.1 of <xref
        target="RFC3551"></xref>). For all other packets the marker bit SHALL
        be set to zero (M=0).</t>

        <t>The assignment of an RTP payload type for the format defined in
        this memo is outside the scope of this document. The RTP profiles
        currently in use mandate binding the payload type dynamically for this
        payload format. This is necessary because the payload type expresses
        the configuration of the payload itself, i.e., basic or interleaved
        mode and the number of channels carried.</t>

        <t>The remaining RTP header fields are used as specified in RFC 3550
        <xref target="RFC3550"></xref>.</t>
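
        <t>As a non-normative illustration, the following Python sketch
        derives the timestamps of the consecutive frame-blocks in a basic
        mode payload from the RTP timestamp, using the 48000 Hz clock and the
        20 ms frame duration stated in this document. The constant and
        function names are assumptions of this sketch.</t>

        <figure>
          <artwork><![CDATA[
```python
CLOCK_RATE_HZ = 48000          # timestamp clock required by this format
FRAME_DURATION_MS = 20         # G.719 frame length
TICKS_PER_FRAME_BLOCK = CLOCK_RATE_HZ * FRAME_DURATION_MS // 1000  # 960


def basic_mode_timestamps(rtp_timestamp, num_frame_blocks):
    """Timestamps of consecutive frame-blocks in a basic-mode payload.

    The RTP timestamp field is 32 bits wide, so values wrap modulo 2**32.
    """
    return [(rtp_timestamp + i * TICKS_PER_FRAME_BLOCK) % 2**32
            for i in range(num_frame_blocks)]


print(basic_mode_timestamps(0, 3))
```
]]></artwork>
        </figure>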
      </section>

      <section title="Payload Structure">
        <t>The payload consists of one or more table of contents (ToC) entries
        followed by the audio data corresponding to the ToC entries. The
        following sections describe both the basic mode and the interleaved
        mode. Each ToC entry MUST be padded to a byte boundary to ensure octet
        alignment. The rules regarding maximum payload size given in Section
        3.2 of <xref target="I-D.ietf-tsvwg-udp-guidelines"></xref> SHOULD be
        followed.</t>

        <section title="Basic ToC element">
          <t>All the different formats and modes in this draft use a common
          basic ToC which may be extended in the different options described
          below.</t>

          <figure anchor="fig-toc" title="Basic TOC element">
            <artwork><![CDATA[
 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|F|    L    |R|R|
+-+-+-+-+-+-+-+-+
]]></artwork>
          </figure>

          <t><list style="hanging">
              <t hangText="F (1 bit):">If set to 1, indicates that this ToC
              entry is followed by another ToC entry; if set to 0, indicates
              that this ToC entry is the last one in the ToC.</t>

              <t hangText="L (5 bits):">A field that gives the frame length of
              each individual frame within the frame-block. <figure
                  anchor="tab-l-values"
                  title="How to map L values to frame lengths">
                  <artwork><![CDATA[
     L          length(bytes) 
    ============================ 
     0           0 NO_DATA 
     1-7         N/A (reserved)
     8-22        80+10*(L-8) 
    23-27        240+20*(L-23) 
    28-31        N/A (reserved)
]]></artwork>
                </figure>L=0 (NO_DATA) is used to indicate an empty frame.
              This is useful if frames are missing, e.g., at re-packetization,
              or to insert gaps when sending redundant frames together with
              primary frames in the same payload.<vspace />The value ranges
              [1..7] and [28..31] inclusive are reserved for future use in
              this draft version. If these values occur in a ToC, the entire
              packet SHOULD be treated as invalid and discarded.<vspace />A
              few examples are given below, where the frame size and the
              corresponding codec bitrate are computed based on the value
              L.<figure anchor="tab-l-examples"
                  title="Examples of L values and corresponding frame lengths">
                  <artwork><![CDATA[
      L    Bytes    Codec Bitrate(kbps) 
    ===================================
      8      80        32 
      9      90        36 
     10     100        40 
     12     120        48 
     16     160        64 
     22     220        88 
     23     240        96 
     25     280       112 
     27     320       128
]]></artwork>
                </figure>This encoding yields a granularity of 4 kbps between
              32 and 88 kbps and a granularity of 8 kbps between 88 and
              128 kbps, with a defined range of 32-128 kbps for the codec
              data.</t>

              <t hangText="R (2 bits):">Reserved bits. SHALL be set to 0 on
              sending and SHALL be ignored on reception.</t>
            </list></t>
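
          <t>The mapping from L to frame length and codec bit-rate can be
          sketched in Python as shown below. This is a non-normative
          illustration; the function names are hypothetical, and returning
          None for reserved values is a choice made for this sketch (the
          caller would then discard the packet as described above).</t>

          <figure>
            <artwork><![CDATA[
```python
def frame_length_bytes(l_value):
    """Map the 5-bit L field to the frame length in bytes.

    Returns None for the reserved ranges 1-7 and 28-31.
    """
    if l_value == 0:
        return 0                          # NO_DATA
    if 8 <= l_value <= 22:
        return 80 + 10 * (l_value - 8)
    if 23 <= l_value <= 27:
        return 240 + 20 * (l_value - 23)
    return None                           # reserved


def codec_bitrate_kbps(l_value):
    """Codec bit-rate implied by L, given one frame every 20 ms."""
    length = frame_length_bytes(l_value)
    if length is None:
        return None
    return length * 8 // 20               # bits per millisecond = kbit/s


for l_value in (8, 9, 10, 12, 16, 22, 23, 25, 27):
    print(l_value, frame_length_bytes(l_value), codec_bitrate_kbps(l_value))
```
]]></artwork>
          </figure>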
        </section>
      </section>

      <section title="Basic mode">
        <t>The basic ToC element (<xref target="fig-toc"></xref>) is followed by
        a one octet field for the number of frame-blocks (#frames) to form the
        ToC entry. The frame-blocks field tells how many frame-blocks of the
        same length the ToC entry relates to.</t>

        <figure anchor="fig-frames" title="Number of frame-blocks field">
          <artwork><![CDATA[
 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|    #frames    |
+-+-+-+-+-+-+-+-+]]></artwork>
        </figure>
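
        <t>A receiver could walk the basic mode ToC roughly as in the
        following non-normative Python sketch. The function name and the
        returned tuple layout are assumptions of this sketch; the bit
        positions follow <xref target="fig-toc"></xref> and <xref
        target="fig-frames"></xref>.</t>

        <figure>
          <artwork><![CDATA[
```python
def parse_basic_toc(payload):
    """Parse the ToC of a basic-mode payload.

    Returns (entries, toc_size) where each entry is a tuple
    (l_value, num_frame_blocks) and toc_size is the ToC length in bytes.
    Raises ValueError if the ToC runs past the end of the payload.
    """
    entries = []
    offset = 0
    while True:
        if offset + 2 > len(payload):
            raise ValueError("truncated ToC")
        first, num_frame_blocks = payload[offset], payload[offset + 1]
        follows = first >> 7             # F: another ToC entry follows
        l_value = (first >> 2) & 0x1F    # L: 5-bit frame length code
        # The two low-order R bits are reserved and ignored on reception.
        entries.append((l_value, num_frame_blocks))
        offset += 2
        if not follows:
            return entries, offset


# ToC octets of the first example in the Payload Examples section.
print(parse_basic_toc(bytes([0xA0, 0x02, 0x30, 0x01])))
```
]]></artwork>
        </figure>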
      </section>

      <section title="Interleaved mode">
        <t>The basic ToC is followed by a one octet field for the number of
        frame-blocks (#frames) and then the DIS fields to form a ToC entry in
        interleaved mode. The frame-blocks field tells how many frame-blocks
        of the same length the ToC relates to. The DIS fields, one for each
        frame-block indicated by the #frames field, express the interleaving
        distance between audio frames carried in the payload. If necessary to
        achieve octet alignment, a 4-bit padding is added.</t>

        <figure anchor="fig-frames-interleave"
                title="Number of frame-block + interleave fields">
          <artwork><![CDATA[
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    #frames    | DIS1  |  ...  | DISi  |  ...  | DISn  | Padd  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+]]></artwork>
        </figure>

        <t><list style="hanging">
            <t hangText="DIS1...DISn (4 bits):">A list of n (n=#frames)
            displacement fields indicating the displacement of the i:th
            (i=1..n) audio frame-block relative to the preceding frame-block
            in the payload, in units of 20 ms long audio frame-blocks. The
            four-bit unsigned integer displacement values may be between 0 and
            15 indicating the number of audio frame-blocks in decoding order
            between the (i-1):th and the i:th frame in the payload. Note that
            for the first ToC entry of the payload the value of DIS1 is
            meaningless. It SHALL be set to zero by a sender, and SHALL be
            ignored by a receiver. This frame-block's location in the decoding
            order is uniquely defined by the RTP timestamp. Note that for
            subsequent ToC entries DIS1 indicates the number of frames between
            the last frame of the previous group and the first frame of this
            group.</t>

            <t hangText="Padd (4 bits):">To ensure octet alignment, four
            padding bits SHALL be included at the end of the ToC entry in case
            there is an odd number of frame-blocks in the group referenced by
            this ToC entry. These bits SHALL be set to zero and SHALL be
            ignored by the receiver. If a group containing an even number of
            frames is referenced by this ToC entry, these padding bits SHALL
            NOT be included in the payload.</t>
          </list></t>
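
        <t>The layout of an interleaved mode ToC entry, including the
        conditional padding, can be illustrated with the following
        non-normative Python sketch. The function name and return value are
        assumptions of this sketch, and the example octets are made up for
        illustration. The sketch assumes that DIS1 occupies the most
        significant four bits of the first DIS octet, in line with the bit
        ordering conventions of this document.</t>

        <figure>
          <artwork><![CDATA[
```python
def parse_interleaved_toc_entry(data, offset=0):
    """Parse one interleaved-mode ToC entry starting at data[offset].

    Returns (follows, l_value, dis, next_offset), where dis is the list
    of 4-bit displacement values, one per frame-block.
    """
    first, num_frame_blocks = data[offset], data[offset + 1]
    follows = bool(first >> 7)
    l_value = (first >> 2) & 0x1F
    offset += 2
    dis = []
    for i in range(num_frame_blocks):
        octet = data[offset + i // 2]
        # Fields alternate between the high and low nibble, starting high.
        dis.append(octet >> 4 if i % 2 == 0 else octet & 0x0F)
    # An odd count leaves four padding bits in the last octet.
    offset += (num_frame_blocks + 1) // 2
    return follows, l_value, dis, offset


# Hypothetical entry: L=8, three frame-blocks, DIS = 0, 2, 2, then padding.
print(parse_interleaved_toc_entry(bytes([0x20, 0x03, 0x02, 0x20])))
```
]]></artwork>
        </figure>

        <t>With an even number of frame-blocks the DIS fields fill whole
        octets, so the sketch advances by exactly #frames / 2 octets and no
        padding is consumed.</t>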

      </section>

      <section title="Audio Data">
        <t>The audio data part follows the table of contents. All the octets
        comprising an audio frame SHALL be appended to the payload as a unit.
        For each frame-block the audio frames are concatenated in the order
        indicated by the table in Section 4.1 of <xref target="RFC3551"></xref>
        for the number of channels configured for the payload type in use. So
        the first channel indicated (leftmost) comes first, followed by the
        next channel. The audio frame-blocks are packetized in increasing
        timestamp order within each group of frame-blocks (per ToC entry),
        i.e. oldest frame-block first. The groups of frame-blocks are
        packetized in the same order as their corresponding ToC entries.</t>
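
        <t>The following non-normative Python sketch shows one way to split
        the audio data belonging to a single ToC entry into frame-blocks and
        per-channel frames. The function name and argument list are
        assumptions of this sketch; the channel order within a frame-block is
        simply the order in which the frames appear.</t>

        <figure>
          <artwork><![CDATA[
```python
def split_group(audio, frame_length, num_frame_blocks, channels):
    """Split the audio data of one ToC entry into per-channel frames.

    Returns a list of frame-blocks (oldest first); each frame-block is
    a list of `channels` byte strings in channel order.
    """
    needed = frame_length * num_frame_blocks * channels
    if len(audio) < needed:
        raise ValueError("audio data shorter than the ToC entry implies")
    blocks = []
    for b in range(num_frame_blocks):
        start = b * channels * frame_length
        blocks.append([audio[start + c * frame_length:
                             start + (c + 1) * frame_length]
                       for c in range(channels)])
    return blocks


# Two stereo frame-blocks of 80 bytes per frame (dummy content).
data = bytes([1]) * 80 + bytes([2]) * 80 + bytes([3]) * 80 + bytes([4]) * 80
blocks = split_group(data, 80, 2, 2)
print([[frame[0] for frame in block] for block in blocks])
```
]]></artwork>
        </figure>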

        <t>The audio frames are specified in ITU recommendation <xref
        target="ITU-T-G719"></xref>.</t>

        <t>The G.719 bit stream is split into a sequence of octets and
        transmitted in order from the leftmost (most significant, MSB) bit
        to the rightmost (least significant, LSB) bit.</t>
      </section>

      <section title="Implementation Considerations">
        <t>An application implementing this payload format MUST understand all
        the payload parameters specified in this specification. Any mapping of
        the parameters to a signaling protocol MUST support all parameters. So
        an implementation of this payload format in an application using SDP
        is required to understand all the payload parameters in their
        SDP-mapped form. This requirement ensures that an implementation
        can always decide whether it is capable of communicating when the
        communicating entities support this version of the specification.</t>

        <t>Basic mode SHALL be implemented and the interleaved mode SHOULD be
        implemented. The implementation burden of both is rather small, and
        supporting both ensures interoperability. However, interleaving is not
        mandated, as it has limited applicability for conversational
        applications that require tight delay bounds.</t>

        <section title="Receiving Redundant Frames">
          <t>The reception of redundant audio frames, i.e. more than one audio
          frame from the same source for the same time slot, MUST be supported
          by the implementation. In the case that the receiver gets multiple
          audio frames in different bit-rates for the same time slot it is
          RECOMMENDED that the receiver keeps the one with the highest
          bit-rate.</t>
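
          <t>A receiver could apply this recommendation as in the following
          non-normative Python sketch, which handles the frames of a single
          source. The function name and the (timestamp, frame) input format
          are assumptions of this sketch. Because every G.719 frame covers
          20 ms, the longest copy is the one with the highest bit-rate.</t>

          <figure>
            <artwork><![CDATA[
```python
def keep_best_frames(received):
    """Pick one frame per timestamp from (timestamp, frame_bytes) pairs.

    For G.719 the frame length is proportional to the bit-rate, so the
    longest copy is the one encoded at the highest bit-rate.
    """
    best = {}
    for timestamp, frame in received:
        if timestamp not in best or len(frame) > len(best[timestamp]):
            best[timestamp] = frame
    return best


copies = [(960, bytes(80)), (960, bytes(120)), (1920, bytes(80))]
print({ts: len(frame) for ts, frame in keep_best_frames(copies).items()})
```
]]></artwork>
          </figure>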
        </section>

        <section title="Interleaving">
          <t>The use of interleaving requires further considerations. As
          presented in the example in <xref target="sec-interleaving"></xref>,
          a given interleaving pattern requires a certain amount of the
          de-interleaving buffer. This buffer space, expressed in a number of
          transport frame slots, is indicated by the "interleaving" media type
          parameter. The number of frame slots needed can be converted into
          actual memory requirements by considering the 320 bytes per frame
          used by the highest bit-rate of G.719.</t>

          <t>The information about the frame buffer size is not always
          sufficient to determine when it is appropriate to start consuming
          frames from the interleaving buffer. Additional information is
          needed when the interleaving pattern changes. The "int-delay" media
          type parameter is defined to convey this information. It allows a
          sender to indicate the minimal media time that needs to be present
          in the buffer before the decoder can start consuming frames from the
          buffer. Because the sender has full control over the interleaving
          pattern, it can calculate this value. In certain cases (for example,
          if joining a multicast session with interleaving mid-session), a
          receiver may initially receive only part of the packets in the
          interleaving pattern. This initial partial reception (in frame
          sequence order) of frames can yield too few frames for acceptable
          quality from the audio decoding. This problem also arises when using
          encryption for access control, and the receiver does not have the
          previous key. Although G.719 is robust and thus tolerant to a
          high random frame erasure rate, it would have difficulties handling
          consecutive frame losses at startup. Thus, some special
          implementation considerations are described.</t>

          <t>In order to handle this type of startup efficiently, decoding can
          start provided that:<list style="numbers">
              <t>There are at least two consecutive frames available.</t>

              <t>At least half of the frames are available in the time
              period between the point where decoding was planned to start
              and the most forward received frame.</t>
            </list>After receiving a number of packets, in the worst case as
          many packets as the interleaving pattern covers, the previously
          described effects disappear and normal decoding is resumed. Similar
          issues arise when a receiver leaves a session or has lost access to
          the stream. If the receiver leaves the session, this would be a
          minor issue since playout is normally stopped. The sender can avoid
          this type of problem in many sessions by starting and ending
          interleaving patterns correctly when risks of losses occur. One such
          example is a key-change done for access control to encrypted
          streams. If only some keys are provided to clients and there is a
          risk of them receiving content for which they do not have the key,
          it is recommended that interleaving patterns do not overlap key
          changes.</t>
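
          <t>One possible reading of the two startup conditions listed above
          is sketched in the following non-normative Python code. The
          function name, the use of integer frame indices, and the decision
          to count only frames at or after the planned starting point are
          assumptions of this sketch, not requirements of this document.</t>

          <figure>
            <artwork><![CDATA[
```python
def can_start_decoding(available, planned_start):
    """Check the two startup conditions for a set of frame indices.

    `available` holds the indices of received frames; `planned_start`
    is the index of the frame where decoding was planned to begin.
    """
    frames = {f for f in available if f >= planned_start}
    if not frames:
        return False
    # Condition 1: at least two consecutive frames are available.
    has_pair = any(f + 1 in frames for f in frames)
    # Condition 2: at least half of the frames between the planned
    # start and the most forward received frame are available.
    span = max(frames) - planned_start + 1
    return has_pair and 2 * len(frames) >= span


print(can_start_decoding({0, 1, 3}, 0))   # frames 0, 1, 3 out of 0..3
print(can_start_decoding({0, 3, 6}, 0))   # no two consecutive frames
```
]]></artwork>
          </figure>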
        </section>

        <section title="Decoding Validation">
          <t>If the receiver finds a mismatch between the size of a received
          payload and the size indicated by the ToC of the payload, the
          receiver SHOULD discard the packet. This is recommended because
          decoding a frame parsed from a payload based on erroneous ToC data
          could severely degrade the audio quality.</t>
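
          <t>The size check can be sketched as follows in Python. This is a
          non-normative illustration that assumes the ToC has already been
          parsed into (frame length, number of frame-blocks) pairs and that
          any RTP padding has been removed; the function names are
          hypothetical.</t>

          <figure>
            <artwork><![CDATA[
```python
def expected_payload_size(toc_entries, toc_size, channels):
    """Payload size in bytes implied by a parsed ToC.

    `toc_entries` is a list of (frame_length_bytes, num_frame_blocks)
    pairs and `toc_size` is the number of octets the ToC occupies.
    """
    audio = sum(length * blocks * channels for length, blocks in toc_entries)
    return toc_size + audio


def accept_payload(payload, toc_entries, toc_size, channels):
    """Return False when the payload should be discarded."""
    return len(payload) == expected_payload_size(toc_entries, toc_size,
                                                 channels)


# First payload example: 4 ToC octets, 2 x 80 bytes and 1 x 120 bytes, mono.
toc = [(80, 2), (120, 1)]
print(expected_payload_size(toc, 4, 1))
```
]]></artwork>
          </figure>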
        </section>

      </section>
    </section>

    <section anchor="sec-examples" title="Payload Examples">
      <t>A few examples that illustrate the payload format follow.</t>

      <section title="3 mono frames with 2 different bitrates">
        <t>The first example is a payload consisting of 3 mono frames where
        the first 2 frames correspond to a bitrate of 32 kbps (80 bytes/frame)
        and the last to 48 kbps (120 bytes/frame).</t>

        <figure>
          <artwork><![CDATA[
   The first 32 bits are ToC fields.
   Bit 0 is '1' as another ToC entry follows.
   Bits 1..5 are 01000 = 80 bytes/frame
   Bits 8..15 are 00000010 = 2 frame-blocks with 80 bytes/frame
   Bit 16 is '0', no more ToC entries follow
   Bits 17..21 are 01100 = 120 bytes/frame
   Bits 24..31 are 00000001 = 1 frame-block with 120 bytes/frame]]></artwork>
        </figure>

        <figure>
          <artwork><![CDATA[
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|0 1 0 0 0|0 0|0 0 0 0 0 0 1 0|0|0 1 1 0 0|0 0|0 0 0 0 0 0 0 1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |d(0)   frame 1                                                 |
   .                                                               .
   |                                                         d(639)|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |d(0)   frame 2                                                 |
   .                                                               .
   |                                                         d(639)|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |d(0)   frame 3                                                 |
   .                                                               .
   |                                                         d(959)|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+]]></artwork>
        </figure>
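
        <t>As a cross-check of the bit layout in this example, the two ToC
        entries can be packed programmatically. The sketch below assumes
        only what the figure shows (a follow flag in bit 0, a 5-bit frame
        size code in bits 1..5, two zero bits, and an 8-bit frame-block
        count); the helper name toc_entry is hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
def toc_entry(l_field: int, n_frame_blocks: int, more: bool) -> bytes:
    """Pack one 16-bit ToC entry as laid out in the example above."""
    if not 0 <= l_field < 32 or not 0 <= n_frame_blocks < 256:
        raise ValueError("field value out of range")
    # bit 0: follow flag, bits 1..5: frame size code, bits 6..7: zero
    first = (int(more) << 7) | (l_field << 2)
    return bytes([first, n_frame_blocks])


# Two frame-blocks of 80 bytes, then one frame-block of 120 bytes.
toc = toc_entry(0b01000, 2, more=True) + toc_entry(0b01100, 1, more=False)
payload_size = len(toc) + 2 * 80 + 1 * 120
print(toc.hex(), payload_size)   # a0023001 284
```
]]></artwork>
        </figure>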
      </section>

      <section title="2 stereo frame-blocks of the same bitrate">
        <t>This example is a payload consisting of 2 stereo frame-blocks
        corresponding to a bitrate of 32 kbps (80 bytes/frame) per channel.
        The receiver calculates the number of frames in the payload by
        multiplying the value of the channels parameter (2) by the #frames
        field value (2), which gives 4 audio frames in the payload.</t>

        <figure>
          <artwork><![CDATA[
   The first 16 bits are the ToC field.
   Bit 0 is '0' as no ToC field follows.
   Bits 1..5 are 01000 = 80 bytes/frame
   Bits 8..15 are 00000010 = 2 frame-blocks with 80 bytes/frame]]></artwork>
        </figure>

        <figure>
          <artwork><![CDATA[
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|0 1 0 0 0|0 0|0 0 0 0 0 0 1 0| d(0) frame 1 left ch.         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   .                                                               .
   |                         d(639)| d(0) frame 1 right ch.        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   .                                                               .
   |                         d(639)| d(0) frame 2 left ch.         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   .                                                               .
   |                         d(639)| d(0) frame 2 right ch.        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         d(639)|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+]]></artwork>
        </figure>
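
        <t>The frame ordering shown in the figure (all channels of frame-block
        1, then all channels of frame-block 2) can be walked with a few lines
        of code. This is a minimal sketch, assuming a single frame size for
        the whole payload and that the ToC has already been stripped; the
        function name split_frames is hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
def split_frames(audio: bytes, n_frame_blocks: int, channels: int,
                 frame_size: int) -> list[list[bytes]]:
    """Return frames[block][channel] from the audio part of a payload."""
    if len(audio) != n_frame_blocks * channels * frame_size:
        raise ValueError("audio data does not match the ToC")
    frames = []
    for block in range(n_frame_blocks):
        row = []
        for ch in range(channels):
            start = (block * channels + ch) * frame_size
            row.append(audio[start:start + frame_size])
        frames.append(row)
    return frames


# Example: 2 frame-blocks, 2 channels, 80 bytes per frame.
# Each dummy frame is filled with its own index so it can be recognised.
audio = b"".join(bytes([i]) * 80 for i in range(4))
frames = split_frames(audio, n_frame_blocks=2, channels=2, frame_size=80)
```
]]></artwork>
        </figure>

        <t>Here frames[0][1] is the right channel of frame 1 and frames[1][0]
        is the left channel of frame 2.</t>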
      </section>

      <section title="4 mono frames interleaved">
        <t>This example is a payload consisting of 4 interleaved mono frames
        corresponding to a bitrate of 32 kbps (80 bytes/frame). The example
        below uses an interleaving pattern that gives constant delay when
        aggregating 4 frames. The packet actually illustrated is packet n;
        the frame-block content of the preceding and following packets is
        shown to illustrate the pattern.</t>

        <figure>
          <artwork><![CDATA[   
   Packet n-3:  1,  6, 11, 16
   Packet n-2:  5, 10, 15, 20
   Packet n-1:  9, 14, 19, 24
   Packet   n: 13, 18, 23, 28
   Packet n+1: 17, 22, 27, 32
   Packet n+2: 21, 26, 31, 36 

   The first 16 bits are the ToC field.
   Bit 0 is '0' as no ToC field follows.
   Bits 1..5 are 01000 = 80 bytes/frame
   Bits 8..15 are 00000100 = 4 frame-blocks with 80 bytes/frame
   Bits 16..19 are 0000 = DIS1 (0)
   Bits 20..23 are 0100 = DIS2 (4)
   Bits 24..27 are 0100 = DIS3 (4)
   Bits 28..31 are 0100 = DIS4 (4)]]></artwork>
        </figure>

        <figure>
          <artwork><![CDATA[
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|0 1 0 0 0|0 0|0 0 0 0 0 1 0 0|0 0 0 0|0 1 0 0|0 1 0 0|0 1 0 0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | d(0) frame 13                                                 |
   .                                                               .
   |                                                         d(639)| 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | d(0) frame 18                                                 |
   .                                                               .
   |                                                         d(639)| 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | d(0) frame 23                                                 |
   .                                                               .
   |                                                         d(639)| 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | d(0) frame 28                                                 |
   .                                                               .
   |                                                         d(639)| 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+]]></artwork>
        </figure>
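
        <t>The pattern table and the DIS values of this example can be
        reproduced with the sketch below. It assumes, based on the numbers in
        the example (frame-blocks 13 and 18 with DIS2 = 4), that each DIS
        value after the first counts the frame-blocks skipped since the
        previous frame-block in the payload. The function names are
        hypothetical and the closed-form pattern is specific to this
        example.</t>

        <figure>
          <artwork><![CDATA[
```python
def packet_frames(k: int, frames_per_packet: int = 4) -> list[int]:
    """Frame-block numbers in the k-th packet of the pattern (k=0 is n-3)."""
    first = 1 + frames_per_packet * k
    return [first + (frames_per_packet + 1) * i
            for i in range(frames_per_packet)]


def dis_fields(frames: list[int]) -> list[int]:
    """DIS1 is 0; each later DIS counts the frame-blocks skipped."""
    return [0] + [b - a - 1 for a, b in zip(frames, frames[1:])]


def pack_dis(dis: list[int]) -> bytes:
    """Pack 4-bit DIS values two per octet (an even count is assumed)."""
    return bytes((hi << 4) | lo for hi, lo in zip(dis[0::2], dis[1::2]))


packet_n = packet_frames(3)            # [13, 18, 23, 28]
dis = dis_fields(packet_n)             # [0, 4, 4, 4]
print(packet_n, dis, pack_dis(dis).hex())
```
]]></artwork>
        </figure>

        <t>The packed value 0444 (hex) matches bits 16..31 of the figure
        above.</t>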
      </section>
    </section>

    <section title="Payload Format Parameters">
      <t>This RTP payload format is identified using the media type audio/g719
      which is registered in accordance with <xref target="RFC4855"></xref>
      and using the template of <xref target="RFC4288"></xref>.</t>

      <section anchor="sec-media-type" title="Media Type Definition">
        <t>The media type for the G.719 codec is allocated from the IETF tree
        since G.719 has the potential to become a widely used audio codec
        in general VoIP, teleconferencing, and streaming applications. This
        media type registration covers real-time transfer via RTP.</t>

        <t>Note, any unspecified parameter MUST be ignored by the receiver to
        ensure that additional parameters can be added in any future revision
        of this specification.</t>

        <t>Type name: audio</t>

        <t>Subtype name: g719</t>

        <t>Required parameters: none</t>

        <t>Optional parameters: <list style="hanging">
            <t hangText="interleaving:">Indicates that interleaved mode SHALL
            be used for the payload. The parameter specifies the number of
            frame-block slots available in a de-interleaving buffer (including
            the frame that is ready to be consumed). Its value is equal to one
            plus the maximum number of frames that can precede any frame in
            transmission order and follow the frame in RTP timestamp order.
            The value MUST be greater than zero. If this parameter is not
            present, interleaved mode SHALL NOT be used.</t>

            <t hangText="int-delay:">The minimal media time delay in
            milliseconds that is needed to avoid underrun in the
            de-interleaving buffer before starting decoding, i.e., the
            difference in RTP timestamp ticks between the earliest and latest
            audio frame present in the de-interleaving buffer expressed in
            milliseconds. The value is a stream property and is provided per
            source. The allowed values are 0 to the largest value expressible
            by an unsigned 16-bit integer (65535). Note that, in practice,
            the largest value that can be used is equal to the declared
            size of the interleaving buffer of the receiver. If the value is
            for some reason larger than the receiver buffer declared by or
            for the receiver, it defaults to the size of the receiver
            buffer. For sources for which this value has not been provided,
            the value defaults to the size of the receiver buffer. The
            format is a comma-separated list of SSRC ":" delay in ms pairs,
            which in <xref target="RFC5234">ABNF</xref> is expressed as:
            <list style="empty">
                <t>int-delay = "int-delay:" source-delay *(","
                source-delay)</t>

                <t>source-delay = SSRC ":" delay-value</t>

                <t>SSRC = 1*8HEXDIG ; The 32-bit SSRC encoded in hex
                format</t>

                <t>delay-value = 1*5DIGIT ; The delay value in
                milliseconds</t>

                <t>Example: int-delay=ABCD1234:1000,4321DCB:640</t>

                <t>NOTE: No white space is allowed in the parameter before
                the end of all the value pairs.</t>
              </list></t>

            <t hangText="max-red:">The maximum duration in milliseconds that
            elapses between the primary (first) transmission of a frame and
            any redundant transmission that the sender will use. This
            parameter allows a receiver to have a bounded delay when
            redundancy is used. Allowed values are between 0 (no redundancy
            will be used) and 65535. If the parameter is omitted, no
            limitation on the use of redundancy is present.</t>

            <t hangText="channels:">The number of audio channels. The possible
            values (1-6) and their respective channel order are specified in
            Section 4.1 in <xref target="RFC3551"></xref>. If omitted, it has
            the default value of 1.</t>

            <t hangText="CBR:">Constant Bit Rate (CBR) indicates the exact
            codec bitrate in bits per second (not including the overhead from
            packetization, RTP header, or lower layers) that the codec MUST
            use. CBR is to be used when dynamic rate cannot be supported (for
            example, in a gateway to H.320). CBR is mostly used for gateways
            to circuit-switched networks. Therefore the CBR rate is the rate
            not including any FEC as specified in <xref
            target="sec-fec"></xref>. If FEC is to be used, the b= parameter
            MUST be used to allow the extra bit rate needed to send the
            redundant information. It is RECOMMENDED that this parameter is
            only used when necessary to establish a working communication.
            The usage of this parameter has implications on congestion
            control that need to be considered, see <xref
            target="sec-congestion"></xref>.</t>

            <t hangText="ptime:">see <xref target="RFC4566"></xref>.</t>

            <t hangText="maxptime:">see <xref target="RFC4566"></xref>.</t>
          </list></t>
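
        <t>As an illustration of the "int-delay" value syntax and its
        defaulting rules, a receiver could process the value part of the
        parameter roughly as sketched below. The function names are
        hypothetical, and the sketch assumes the caller has already converted
        the receiver's de-interleaving buffer size to milliseconds
        (buffer_ms).</t>

        <figure>
          <artwork><![CDATA[
```python
import re

# source-delay = SSRC ":" delay-value, with no white space allowed
_PAIR = re.compile(r"([0-9A-Fa-f]{1,8}):([0-9]{1,5})")


def parse_int_delay(value: str) -> dict[int, int]:
    """Parse 'SSRC:delay,SSRC:delay' into {ssrc: delay_ms}."""
    delays = {}
    for item in value.split(","):
        match = _PAIR.fullmatch(item)
        if match is None:
            raise ValueError(f"malformed source-delay: {item!r}")
        delay = int(match.group(2))
        if delay > 65535:
            raise ValueError("delay exceeds 65535 ms")
        delays[int(match.group(1), 16)] = delay
    return delays


def effective_delay(delays: dict[int, int], ssrc: int, buffer_ms: int) -> int:
    """Apply the defaulting rules: missing or oversized values fall back
    to the receiver buffer size (buffer_ms, already converted to ms)."""
    return min(delays.get(ssrc, buffer_ms), buffer_ms)


delays = parse_int_delay("ABCD1234:1000,4321DCB:640")
```
]]></artwork>
        </figure>

        <t>With a receiver buffer corresponding to 800 ms, the first source
        in this example would be treated as having a delay of 800 ms, the
        second 640 ms, and any unlisted source 800 ms.</t>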

        <t>Encoding considerations:<list style="empty">
            <t>This media type is framed and binary, see section 4.8 in <xref
            target="RFC4288">RFC4288</xref>.</t>
          </list></t>

        <t>Security considerations: <list style="empty">
            <t>See <xref target="sec-sec"></xref> of RFC XXXX.</t>
          </list></t>

        <t>Interoperability considerations:</t>

        <t><list style="empty">
            <t>Support of the interleaving mode is not mandatory and needs
            to be negotiated. See <xref target="sec-map-sdp"></xref> for how
            to do that for SDP-based protocols.</t>
          </list></t>

        <t>Published specification:<list style="empty">
            <t>RFC XXXX</t>
          </list></t>

        <t>Applications that use this media type:<list style="empty">
            <t>Real-time audio applications like voice over IP and
            teleconference, and multi-media streaming.</t>
          </list></t>

        <t>Additional information: none</t>

        <t>Person &amp; email address to contact for further information:<list
            style="empty">
            <t>Payload format: Ingemar Johansson
            &lt;ingemar.s.johansson@ericsson.com&gt;</t>
          </list></t>

        <t>Intended usage: COMMON</t>

        <t>Restrictions on usage:<list style="empty">
            <t>This media type depends on RTP framing, and hence is only
            defined for transfer via RTP <xref target="RFC3550"></xref>.
            Transport within other framing protocols is not defined at this
            time.</t>
          </list></t>

        <t>Author: <list style="empty">
            <t>Ingemar Johansson &lt;ingemar.s.johansson@ericsson.com&gt;</t>

            <t>Magnus Westerlund &lt;magnus.westerlund@ericsson.com&gt;</t>
          </list></t>

        <t>Change controller:<list style="empty">
            <t>IETF Audio/Video Transport working group delegated from the
            IESG.</t>
          </list>Additional Information:</t>

        <t><list style="empty">
            <t>File storage of G.719 encoded audio in ISO base media file
            format is specified in Annex A of <xref
            target="ITU-T-G719"></xref>. Thus media file formats such as MP4
            (audio/mp4 or video/mp4) <xref target="RFC4337"></xref> and 3GP
            (audio/3GPP and video/3GPP) <xref target="RFC3839"></xref> can
            contain G.719 encoded audio. </t>
          </list></t>
      </section>

      <section anchor="sec-map-sdp" title="Mapping to SDP">
        <t>The information carried in the media type specification has a
        specific mapping to fields in the Session Description Protocol (SDP)
        <xref target="RFC4566"></xref>, which is commonly used to describe RTP
        sessions. When SDP is used to specify sessions employing the G.719
        codec, the mapping is as follows: <list style="symbols">
            <t>The media type ("audio") goes in SDP "m=" as the media
            name.</t>

            <t>The media subtype (payload format name) goes in SDP "a=rtpmap"
            as the encoding name. The RTP clock rate in "a=rtpmap" MUST be
            48000, and the encoding parameter "channels" (<xref
            target="sec-media-type"></xref>) MUST either be explicitly set to
            N or omitted, implying a default value of 1. The values of N that
            are allowed are specified in Section 4.1 in <xref
            target="RFC3551"></xref>.</t>

            <t>The parameters "ptime" and "maxptime" go in the SDP "a=ptime"
            and "a=maxptime" attributes, respectively.</t>

            <t>Any remaining parameters go in the SDP "a=fmtp" attribute by
            copying them directly from the media type parameter string as a
            semicolon-separated list of parameter=value pairs.</t>
          </list></t>
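
        <t>The mapping rules above can be illustrated by generating the
        media-level SDP lines for one payload type. This is a sketch only:
        the port number, the payload type number, and the RTP/AVP profile
        used in the usage note are arbitrary values chosen for illustration,
        the encoding name is taken directly from the subtype name, and the
        function name g719_sdp is hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
from typing import Optional


def g719_sdp(port: int, pt: int, channels: int = 1,
             ptime: Optional[int] = None, maxptime: Optional[int] = None,
             fmtp: Optional[dict[str, str]] = None) -> list[str]:
    """Build the SDP media-level lines for one G.719 payload type."""
    rtpmap = f"a=rtpmap:{pt} g719/48000"   # clock rate is always 48000
    if channels != 1:
        rtpmap += f"/{channels}"           # omitted channels implies 1
    lines = [f"m=audio {port} RTP/AVP {pt}", rtpmap]
    if fmtp:
        # remaining parameters: semicolon-separated parameter=value pairs
        params = "; ".join(f"{k}={v}" for k, v in fmtp.items())
        lines.append(f"a=fmtp:{pt} {params}")
    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    if maxptime is not None:
        lines.append(f"a=maxptime:{maxptime}")
    return lines
```
]]></artwork>
        </figure>

        <t>For example, g719_sdp(49170, 97, channels=2, ptime=20,
        fmtp={"interleaving": "7", "max-red": "0"}) yields an "m=audio" line,
        an "a=rtpmap:97 g719/48000/2" line, an "a=fmtp:97 interleaving=7;
        max-red=0" line, and an "a=ptime:20" line.</t>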

        <section title="Offer/Answer Considerations">
          <t>The following considerations apply when using SDP Offer-Answer
          procedures to negotiate the use of G.719 payload in RTP: <list
              hangIndent="" style="symbols">
              <t>Each combination of the RTP payload transport format
              configuration parameters (interleaving, and channels) is unique
              in its bit-pattern and not compatible with any other
              combination. When creating an offer in an application desiring
              to use the more advanced features (interleaving, or more than
              one channel), the offerer is RECOMMENDED to also offer a payload
              type containing only the configuration with a single channel. If
              multiple configurations are of interest to the application, they
              may all be offered; however, care should be taken not to offer
              too many payload types. An SDP answerer MUST include, in the SDP
              answer for a payload type, the following parameters unmodified
              from the SDP offer (unless it removes the payload type):
              "interleaving"; and "channels". However, the value of the
              Interleaving parameter MAY be changed. The SDP offerer and
              answerer MUST generate G.719 packets as described by these
              parameters.</t>

              <t>The values of the "interleaving" and "int-delay" parameters
              have a specific relationship that needs to be considered. This
              relationship also depends on the directionality of the streams
              and their delivery method. At a high level, as can be
              understood from the definitions, the value of "interleaving"
              declares the size of the receiver buffer, while "int-delay" is
              a stream property provided by the sender to indicate how much
              buffer space it is using in practice for the stream it
              sends.<list
                  style="symbols">
                  <t>For media streams that are sent over multicast, the
                  value of "interleaving" SHALL NOT be changed by the
                  answerer. The value shall either be accepted or the
                  payload type shall be deleted. The value of the
                  "int-delay" parameter is a stream property and is provided
                  by the offer/answer agent that intends to send media with
                  this payload type, for each stream coming from that agent
                  (one or more). The value MUST be between 0 and what
                  corresponds to the buffer size declared by the value of
                  the "interleaving" parameter.</t>

                  <t>For unicast streams that the offerer declares as
                  sendonly, the value of the "interleaving" parameter is the
                  size that the offerer recommends the answerer to use. The
                  answerer MAY change it to any allowed value. The int-delay
                  parameter value will be the one the offerer intends to use
                  unless the answerer reduces the value of the interleaving
                  parameter below what is needed for that int-delay value.
                  If the interleaving value in the answer is smaller than
                  the offer's int-delay, the int-delay value is by default
                  reduced to correspond to the interleaving value. If the
                  offerer is not satisfied with this, it will need to
                  perform another round of offer/answer. As the answerer
                  will not send any media, it does not include any int-delay
                  in the answer.</t>

                  <t>For unicast streams that the offerer declares as
                  recvonly, the value of interleaving in the offer will be
                  the offerer's size of the interleaving buffer. The
                  answerer indicates its preferred size of the interleaving
                  buffer for any future round of offer/answer. The offerer
                  will not provide any int-delay parameter as it is not
                  sending any media. The answerer is recommended to include
                  an int-delay parameter in its answer to declare what the
                  property is for the stream it is going to send. As it
                  already knows the receiver's interleaving buffer size,
                  there should be no issue with providing a value that is
                  between 0 and the value corresponding to a full
                  de-interleaving buffer.</t>

                  <t>For unicast streams that the offerer declares as
                  sendrecv, the value of the interleaving parameter in the
                  offer will be the offerer's size of the interleaving
                  buffer. The answerer will in the answer indicate the size
                  of its actual interleaving buffer. It is recommended that
                  this value is at least as big as the offer's. The offerer
                  is recommended to include an int-delay parameter that is
                  selected on the assumption that the answerer has at least
                  as much interleaving space as the offerer, unless
                  something else is known. As the answerer's interleaving
                  buffer size is not yet known, this may fail, in which case
                  the default rule is to downgrade the value of the
                  int-delay to correspond to the full size of the answerer's
                  interleaving buffer. If the offerer is not satisfied with
                  this, it will need to initiate another round of
                  offer/answer. The answerer is recommended to include an
                  int-delay parameter in its answer to declare what the
                  property is for the stream(s) it is going to send. As it
                  already knows the receiver's interleaving buffer size,
                  there should be no issue with providing a value that is
                  between 0 and the value corresponding to a full
                  de-interleaving buffer.</t>
                </list></t>

              <t>In most cases, the parameters "maxptime" and "ptime" will not
              affect interoperability; however, the setting of the parameters
              can affect the performance of the application. The SDP offer-
              answer handling of the "ptime" parameter is described in <xref
              target="RFC3264"></xref>. The "maxptime" parameter MUST be
              handled in the same way.</t>

              <t>The parameter "max-red" is a stream property parameter. For
              sendonly or sendrecv unicast media streams, the parameter
              declares the limitation on redundancy that the stream sender
              will use. For recvonly streams, it indicates the desired value
              for the stream sent to the receiver. The answerer MAY change the
              value, but is RECOMMENDED to use the same limitation as the
              offer declares. In the case of multicast, the offerer MAY
              declare a limitation; this SHALL be answered using the same
              value. A media sender using this payload format is RECOMMENDED
              to always include the "max-red" parameter. This information is
              likely to simplify the media stream handling in the receiver.
              This is especially true if no redundancy will be used, in which
              case "max-red" is set to 0.</t>

              <t>Any unknown parameter in an offer SHALL be removed in the
              answer.</t>

              <t>The b= SDP parameter SHOULD be used to negotiate the maximum
              bandwidth to be used for the audio stream. The offerer may offer
              a maximum rate and the answer may contain a lower rate. If no b=
              parameter is present in the offer or answer, it implies a rate
              up to 128 kbps.</t>

              <t>The parameter "CBR" is a receiver capability, i.e., only
              receivers that really require a constant bit rate should use
              it. Usage of this parameter has a negative impact on the
              possibility to perform congestion control, see Section 9. For
              recvonly and sendrecv streams, it indicates the desired
              constant bit rate that the receiver wants to accept. A sender
              MUST be able to send a constant bit rate stream since it is a
              subset of the variable bit rate capability. If the offer
              includes this parameter, the answerer MUST send G.719 audio at
              the constant bit rate if it is within the allowed session bit
              rate (b= parameter). If the answerer cannot support the stated
              CBR, this payload type must be refused in the answer. The
              answerer SHOULD only include this parameter if it itself
              requires to receive at a constant bit rate, even if the offer
              did not include the CBR parameter. In this case, the offerer
              SHALL send at the constant bit rate but SHALL be able to accept
              media at variable bit rate. An answerer is RECOMMENDED to use
              the same CBR rate as in the offer, as symmetric usage is more
              likely to work. If both sides require a particular CBR rate,
              there is the possibility of communication failure when one or
              both sides cannot transmit the requested rate. In this case the
              agent detecting this issue will have to perform a second round
              of offer/answer to try to find another working configuration
              or end the established session. In case the offer contained a
              CBR parameter but the answer does not, then the offerer is free
              to transmit at any rate to the answerer, but the answerer is
              restricted to the declared rate.</t>
            </list></t>
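
          <t>A small part of the answerer behaviour described above can be
          sketched in code: removing unknown parameters, keeping
          "interleaving" while allowing its value to change for unicast, and
          not echoing the offerer's stream-specific values. This is a
          simplified illustration rather than a complete offer/answer
          implementation; the function name, the dictionary representation of
          the a=fmtp parameters, and the choice not to model the answerer's
          own "int-delay" and "CBR" values are all assumptions of the
          sketch.</t>

          <figure>
            <artwork><![CDATA[
```python
KNOWN = {"interleaving", "int-delay", "max-red", "CBR"}
# Not echoed: these describe the offerer's own streams or receiver needs,
# so the answerer would set its own values (not modelled in this sketch).
NOT_ECHOED = {"int-delay", "CBR"}


def answer_fmtp(offer: dict[str, str], local_interleaving: int,
                multicast: bool = False) -> dict[str, str]:
    """Derive the answerer's a=fmtp parameters from those in the offer."""
    # Unknown parameters in the offer are removed in the answer.
    answer = {k: v for k, v in offer.items()
              if k in KNOWN and k not in NOT_ECHOED}
    # "interleaving" stays present; its value may change unless multicast.
    if "interleaving" in answer and not multicast:
        answer["interleaving"] = str(local_interleaving)
    return answer
```
]]></artwork>
          </figure>

          <t>For an offer carrying interleaving=8, an int-delay value,
          max-red=0, and an unrecognized parameter, a unicast answerer with 6
          buffer slots would answer with interleaving=6 and max-red=0 under
          this sketch.</t>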
        </section>

        <section title="Declarative SDP Considerations">
          <t>In declarative usage, like SDP in RTSP <xref
          target="RFC2326"></xref> or SAP <xref target="RFC2974"></xref>, the
          parameters SHALL be interpreted as follows: <list style="symbols">
              <t>The payload format configuration parameters (interleaving,
              and channels) are all declarative, and a participant MUST use
              the configuration(s) that is provided for the session. More than
              one configuration may be provided if necessary by declaring
              multiple RTP payload types; however, the number of types should
              be kept small.</t>

              <t>It might not be possible to know the SSRC values that are
              going to be used by the sources at the time of sending the SDP.
              This is not a major issue, as the size of the interleaving
              buffer can be tailored towards the values that are actually
              going to be used. This ensures that the default values for
              int-delay do not result in too much extra buffering.</t>

              <t>Any "maxptime" and "ptime" values should be selected with
              care to ensure that the session's participants can achieve
              reasonable performance.</t>

              <t>The parameter "CBR", if included, applies to all RTP streams
              using that payload type for which a particular CBR rate is
              declared. Usage of this parameter has a negative impact on the
              possibility to perform congestion control, see Section 9.</t>
            </list></t>
        </section>
      </section>
    </section>

    <section title="IANA Considerations">
      <t>One media type (audio/g719) has been defined and needs registration
      in the media types registry; see <xref
      target="sec-media-type"></xref>.</t>
    </section>

    <section anchor="sec-congestion" title="Congestion Control">
      <t>The general congestion control considerations for transporting RTP
      data apply; see RTP <xref target="RFC3550"></xref> and any applicable
      RTP profile like AVP <xref target="RFC3551"></xref>. However, the
      multi-rate capability of G.719 audio coding provides a mechanism that
      may help to control congestion, since the bandwidth demand can be
      adjusted (within the limits of the codec) by selecting a different
      encoding bit-rate.</t>

      <t>The number of frames encapsulated in each RTP payload highly
      influences the overall bandwidth of the RTP stream due to header
      overhead constraints. Packetizing more frames in each RTP payload can
      reduce the number of packets sent and hence the header overhead, at the
      expense of increased delay and reduced error robustness. If forward
      error correction (FEC) is used, the amount of FEC-induced redundancy
      needs to be regulated such that the use of FEC itself does not cause a
      congestion problem.</t>
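
      <t>The effect of aggregation on header overhead can be made concrete
      with a little arithmetic. The sketch below rests on assumptions that
      are not stated in this section: a 20 ms frame duration, 40 bytes of
      IPv4, UDP, and RTP headers per packet with no header extensions, and a
      single 2-byte ToC entry per payload as in the payload examples. The
      function name is hypothetical.</t>

      <figure>
        <artwork><![CDATA[
```python
def stream_bitrate(frames_per_packet: int, frame_bytes: int = 80,
                   frame_ms: int = 20, header_bytes: int = 40,
                   toc_bytes: int = 2) -> float:
    """Bits per second on the wire for a mono, single-rate stream."""
    packet_bytes = header_bytes + toc_bytes + frames_per_packet * frame_bytes
    packet_ms = frames_per_packet * frame_ms
    return packet_bytes * 8 * 1000 / packet_ms


for n in (1, 2, 4):
    print(n, round(stream_bitrate(n)))
```
]]></artwork>
      </figure>

      <t>Under these assumptions, a 32 kbps mono stream occupies about 48.8
      kbps with one frame per packet, 40.4 kbps with two, and 36.2 kbps with
      four, at the cost of 20, 40, and 80 ms of packetization delay,
      respectively.</t>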

      <t>The CBR signalling parameter allows a receiver to lock down an RTP
      payload type to use a single encoding rate. As this prevents the codec
      rate from being lowered when congestion is experienced, the sender is
      constrained to either change the packetization or abort the
      transmission. Since these responses to congestion are severely limited,
      implementations SHOULD NOT use the CBR parameter unless they are
      interacting with a device that cannot support variable bit rate (e.g. a
      gateway to H.320 systems). When using CBR mode, a receiver MUST monitor
      the packet loss rate to ensure congestion is not caused, following the
      guidelines in Section 2 of RFC 3551.</t>
    </section>

    <section anchor="sec-sec" title="Security Considerations">
      <t>RTP packets using the payload format defined in this specification
      are subject to the general security considerations discussed in RTP
      <xref target="RFC3550"></xref> and any applicable profile such as AVP
      <xref target="RFC3551"></xref> or SAVP <xref target="RFC3711"></xref>.
      As this format transports encoded audio, the main security issues
      include confidentiality, integrity protection, and data origin
      authentication of the audio itself. The payload format itself does not
      have any built-in security mechanisms. Any suitable external mechanisms,
      such as SRTP <xref target="RFC3711"></xref>, MAY be used.</t>

      <t>This payload format and the G.719 decoder do not exhibit any
      significant non-uniformity in the receiver-side computational complexity
      for packet processing, and thus are unlikely to pose a denial-of-service
      threat due to the receipt of pathological data. Neither the payload
      format nor the codec data contains any type of active content such as
      scripts.</t>

      <section title="Confidentiality">
        <t>In order to ensure confidentiality of the encoded audio, all audio
        data bits MUST be encrypted. There is less need to encrypt the payload
        header or the table of contents since they only carry information
        about the frame type. This information could also be useful to a third
        party, for example, for quality monitoring.</t>

        <t>The use of interleaving in conjunction with encryption can have a
        negative impact on confidentiality, for a short period of time.
        Consider the following packets (in brackets) containing frame numbers
        as indicated: {10, 14, 18}, {13, 17, 21}, {16, 20, 24} (a popular
        continuous diagonal interleaving pattern). The originator wishes to
        deny some participants the ability to hear material starting at time
        16. Simply changing the key on the packet with the timestamp at or
        after 16, and denying that new key to those participants, does not
        achieve this; frames 17, 18, and 21 have been supplied in prior
        packets under the prior key, and error concealment may make the audio
        intelligible at least as far as frame 18 or 19, and possibly
        further.</t>
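
        <t>The frames exposed in this scenario can be enumerated
        mechanically. The sketch below takes each packet's timestamp to be
        its first frame number, which is an assumption made for illustration,
        and uses a hypothetical function name.</t>

        <figure>
          <artwork><![CDATA[
```python
def frames_under_old_key(packets: list[list[int]], cutoff: int) -> list[int]:
    """Frames at or after `cutoff` that were sent before the key change."""
    # The key changes on the first packet whose timestamp (taken here as
    # its first frame number) is at or after the cutoff. If there is no
    # such packet, every packet was sent under the old key.
    change = next((i for i, p in enumerate(packets) if p[0] >= cutoff),
                  len(packets))
    return sorted(f for p in packets[:change] for f in p if f >= cutoff)


packets = [[10, 14, 18], [13, 17, 21], [16, 20, 24]]
print(frames_under_old_key(packets, 16))    # [17, 18, 21]
```
]]></artwork>
        </figure>

        <t>Choosing interleaving patterns that do not overlap the key change
        makes this list empty.</t>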
      </section>

      <section title="Authentication and Integrity">
        <t>To authenticate the sender of the audio-stream, an external
        mechanism MUST be used. It is RECOMMENDED that such a mechanism
        protects both the complete RTP header and the payload (audio and data
        bits). Data tampering by a man-in-the-middle attacker could replace
        audio content and also result in erroneous depacketization/decoding
        that could lower the audio quality.</t>
      </section>
    </section>

    <section anchor="Acknowledgements" title="Acknowledgements">
      <t>The authors would like to thank Roni Even and Anisse Taleb for their
      help with this draft. We would also like to thank the people who have
      provided feedback: Colin Perkins, Mark Baker, and Stephen Botzko.</t>
    </section>
  </middle>

  <back>
    <references title="Informative References">
      <?rfc include='reference.RFC.2198'?>

      <?rfc include='reference.RFC.2326'?>

      <?rfc include='reference.RFC.2974'?>

      <?rfc include='reference.RFC.3711'?>

      <?rfc include='reference.RFC.3839'?>

      <?rfc include='reference.RFC.4288'?>

      <?rfc include='reference.RFC.4337'?>

      <?rfc include='reference.RFC.4855'?>

      <?rfc include='reference.RFC.5109'?>
    </references>

    <references title="Normative References">
      <reference anchor="ITU-T-G719">
        <front>
          <title>Specification : ITU-T G.719 extension for 20 kHz fullband
          audio</title>

          <author fullname="">
            <organization>ITU-T</organization>
          </author>

          <date day="16" month="April" year="2008" />
        </front>
      </reference>

      <?rfc include="reference.RFC.2119"?>

      <?rfc include='reference.RFC.3264'?>

      <?rfc include='reference.RFC.3550'?>

      <?rfc include='reference.RFC.3551'?>

      <?rfc include='reference.RFC.4566'?>

      <?rfc include='reference.RFC.5234'?>

      <?rfc include='reference.I-D.ietf-tsvwg-udp-guidelines'?>
    </references>
  </back>
</rfc>