<?xml version="1.0"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc ipr="full3978" 
     updates="RFC 2198"
     category='std'
     docName="draft-ietf-avt-forward-shifted-red-02.txt">
    <front>
        <title abbrev="Forward-shifted RTP Redundancy"> 
	Forward-shifted RTP Redundancy Payload Support
	</title>

	<author initials="Q." surname="Xie"
	    fullname="Qiaobing Xie">
	    <organization abbrev="TRG">
	    The Resource Group
	    </organization>
	    <address>
	      <postal>
	      <street>1700 Pennsylvania Ave. NW</street>
	      <city>Washington</city> <region>DC</region>
	      <code>20006</code>
	      <country>US</country>
	      </postal>
	      <phone>+1-224-465-5954</phone>
	      <email>Qiaobing.Xie@gmail.com</email>
	    </address>
	</author>

	<date day="7" month="October" year="2008" />
	<area>General</area>
	<workgroup>Audio Video Transport WG</workgroup>
	<keyword>Internet-Draft</keyword>

<abstract>

<t> This document defines a simple enhancement to RFC 2198 to support
RTP sessions with forward-shifted redundant encodings, i.e., redundant
data sent before the corresponding primary data. Forward-shifted
redundancy can be used to conceal losses of a large number of
consecutive media frames (e.g., consecutive loss of seconds or even tens 
of seconds of media).</t> 

</abstract>

    </front>

    <middle>

<section title="Conventions">

<t> The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, NOT RECOMMENDED, MAY, and OPTIONAL, when they
appear in this document, are to be interpreted as described in RFC
2119 <xref target="RFC2119" />. </t>

</section> <!-- end of "Conventions"-->

<section title="Introduction">

<t> This document defines a simple enhancement to RFC 2198 <xref
target="RFC2198" /> to support RTP sessions with forward-shifted
redundant encodings, i.e., redundant data is sent before the
corresponding primary data. </t>

<t> Forward-shifted redundancy can be used to conceal losses of a
large number of consecutive media frames (e.g., consecutive loss of 
seconds of media). Such capability is highly desirable,
especially in wireless mobile communication environments where the
radio signal to a mobile wireless media receiver can be temporarily
blocked by tall buildings, mountains, tunnels, etc. In other words, the
receiver enters into a shadow of the radio coverage. No new data will
be received when the receiver is in a shadow. </t>

<t> In some extreme cases, the receiver may have to spend seconds or
even tens of seconds in a shadow. The traditional backward-shifted
redundant encoding scheme (i.e., redundant data is sent after the
primary data), as currently supported by RFC 2198 <xref
target="RFC2198" />, is known to be ineffective in dealing with such
consecutive frame losses.  </t>

<t> In contrast, the forward-shifted redundancy, when used in
combination with the anti-shadow loss management at the receiver (as
described in <xref target="ANTI_SHADOW" />), can effectively prevent
service interruptions when a mobile receiver runs into such a
shadow. </t>

<section title="Sending Redundant Data Inband vs. Out-of-band">

<t> Regardless of the direction of time shift (e.g., forward-shifting
or backward-shifting as in RFC 2198) or the encoding scheme (e.g.,
FEC, or non-FEC), there is always the option of sending the redundant
data and the primary data either in the same RTP session (i.e.,
inband) or in separate RTP sessions (i.e., out-of-band). There are
pros and cons for either approach, as outlined below.  </t>

<t> Inband Approach: 
  <list style="hanging">
   <t hangText="o"> Pro: A single RTP session is faster to setup and
   easier to manage. </t>

   <t hangText="o"> Pro: A single RTP session presents a simpler
   problem for NAT/firewall traversal. </t>

   <t hangText="o"> Pro: Less overall overhead - one RTP/UDP/IP
   overhead. </t>

   <t hangText="o"> Con: Lack of flexibility - difficult for middle
   boxes such as gateways to add/remove the redundant data.  </t>

   <t hangText="o"> Con: Need more specification - special payload
   formats need to be defined to carry the redundant data inband.
   </t>

  </list>
</t>

<t> Out-of-band Approach: 
  <list style="hanging">

   <t hangText="o"> Pro: Flexibility - redundant data can be more
   easily added, removed, or replaced by a middle box such as a
   gateway. </t>

   <t hangText="o"> Pro: Little or none specification - no new payload
   format is needed. </t>

   <t hangText="o"> Con: Multiple RTP sessions may take longer to
   setup and more complexity to manage. </t>

   <t hangText="o"> Con: Multiple RTP sessions NAT/firewall traverse
   are harder to address. </t>

   <t hangText="o"> Con: Bigger overall overhead - more than one
   RTP/UDP/IP overhead. </t>

  </list>
</t>

<t> It is noteworthy that the specification of inband payload formats
such as this and RFC 2198 does not preclude a deployment from using
the out-of-band approach. Rather, it gives the deployment the choose
to use whichever approach deemed most beneficiary under a given
circumstance.  </t>

</section> <!-- end of "In-band vs. out-of-band" -->

</section> <!-- end of "Introduction" -->


<section title="Allowing Forward-shifted Redundant Data">

<t> In RFC 2198, the timestamp offset in the additional header
corresponding to a redundant block is defined as a 14 bits unsigned
offset of timestamp relative to timestamp given in the RTP header.  As
stated in RFC 2198:  
   <figure>
      <artwork>
  "The use of an unsigned offset implies that redundant data must be
  sent after the primary data, and is hence a time to be subtracted
  from the current timestamp to determine the timestamp of the data
  for which this block is the redundancy."
     </artwork>
  </figure>

This effectively prevents RFC 2198 from being used to support
forward-shifted redundant blocks. </t>

<t> In order to support the use of forward-shifted redundant blocks,
the media type "fwdred" which allows an optional MIME parameter,
"forwardshift", is introduced for indicating the capability and
willingness of using forward-shifted redundancy and the base value of
timestamp forward-shifting. The base value of "forwardshift" is an
integer equal or greater than '0'. </t>

<t> In an RTP session which uses forward-shifted redundant encodings,
the timestamp of a redundant block in a received RTP packet is
determined as follows:
   <figure>
      <artwork>
  timestamp of redundant block = timestamp in RTP header
                      - timestamp offset in additional header 
                      + forward shift base value
     </artwork>
  </figure>
</t>

</section> <!-- title="Allowing Forward-shifted Redundant Data" -->


<section anchor='fwdred-sect' 
title='Registration of Media Type "fwdred"'>

<t> (The definition is based on media type "red" defined in RFC 2198
<xref target="RFC2198"/>, with the addition of the optional
"forwardshift" parameter.)

<list style="hanging">
   <t hangText="Type name:"> audio </t>

   <t hangText="Subtype names:"> fwdred </t>

   <t hangText="Required parameters:"> none </t>

   <t hangText="Optional parameters:"></t>

   <list style="hanging">
     <t hangText="forwardshift:"> An unsigned integer can be specified
     as value.

     <vspace blankLines="1" />
     If this parameter is present with a value greater than '0', it
     indicates that the sender of this parameter will
     use forward shifting with a base value as specified when sending out
     redundant data. 
     <vspace blankLines="1" />
     If this parameter is absent or present with a value of '0', it
     indicates that the sender of this parameter will not use forward
     shifting when sending its redundant data, i.e., the sender
     will have the same behaviors as defined in RFC 2198.
 </t>
   </list>

   <t hangText="Encoding considerations:"> 
   <vspace blankLines="0" />
   This media type is framed binary data (see RFC 4288, Section 4.8)
   and is only defined for transfer of RTP redundant data frames 
   specified in RFC 2198.  </t>

   <t hangText="Security considerations:"> 
   See Section 6 "Security Considerations" of RFC 2198. </t> 

   <t hangText="Interoperability considerations:"> None. </t>

   <t hangText="Published specification:"> 
   <vspace blankLines="0" />
   RTP redundant data frame format is specified in RFC 2198. </t>

   <t hangText="Applications that use this media type:">
   <vspace blankLines="0" />
   It is expected that real-time audio/video applications that want
   protection against losses of a large number of consecutive frames
   will be interested in using this type. </t>

   <t hangText="Additional information:"> none </t>

   <t hangText="Person & email address to contact for further
		information:">
   <vspace blankLines="0" />
   Qiaobing Xie &lt;Qiaobing.Xie@motorola.com> </t>

   <t hangText="Intended usage:"> COMMON </t>

   <t hangText="Restrictions on usage:">
   <vspace blankLines="0" />
   This media type depends on RTP framing, and hence is only defined
   for transfer via RTP (RFC 3550 <xref target="RFC3550"/>).  Transfer
   within other framing protocols is not defined at this time.  </t>

   <t hangText="Author:">
   <vspace blankLines="0" />
   Qiaobing Xie </t>

   <t hangText="Change controller:">
   <vspace blankLines="0" />
   IETF Audio/Video Transport working group delegated from the IESG. </t>
</list>
</t>

</section> <!-- title="Updated Registration of Media Type red" -->

<section title="Mapping Media Type Parameters into SDP">

<t> The information carried in the MIME media type specification has a
specific mapping to fields in the Session Description Protocol (SDP)
<xref target="RFC4566" />, which is commonly used to describe RTP
sessions. When SDP is used to specify sessions employing the
forward-shifted redundant format, the mapping is as
follows: 

  <list style="symbols">
  <t> The media type ("audio") goes in SDP "m=" as the media name. </t>

  <t> The media subtype ("fwdred") goes in SDP "a=rtpmap" as the encoding
  name. </t> 

  <t> The optional parameter "forwardshift" goes in the SDP "a=fmtp"
  attribute by copying it directly from the MIME media type string as
  "forwardshift=value".  </t>

  </list>
</t>

<t> Example of usage of indicating forward-shifted (by 5.1 sec) redundancy:
   <figure>
      <artwork>
  m=audio 12345 RTP/AVP 121 0 5
  a=rtpmap:121 fwdred/8000/1
  a=fmtp:121 0/5 forwardshift=40800
      </artwork>
   </figure>
</t>

<t> Example of usage of indicating sending redundancy without
forward-shifting (equivalent to RFC 2198): 
   <figure>
      <artwork>
  m=audio 12345 RTP/AVP 121 0 5
  a=rtpmap:121 fwdred/8000/1
  a=fmtp:121 0/5 forwardshift=0
      </artwork>
   </figure>
</t>

</section> <!--title="Mapping MIME Parameters into SDP"-->

<section title="Usage in Offer/Answer">

<t> The optional "forwardshift" SDP parameter specified in this
document is declarative, and all reasonable values are expected to be
supported. </t>

</section> <!--title="Usage in Offer/Answer"-->

<section title="IANA Considerations">

<t> The registration of the new MIME subtype "fwdred", as described in
<xref target="fwdred-sect"/>, is required. </t>

</section> <!--title="IANA Considerations"-->


<section title="Security Considerations">

<t>See Section 6 "Security Considerations" of RFC 2198 
<xref target="RFC2198"/>.</t>

</section> <!--title="Security Considerations"-->


    </middle>



    <back>

<references title="Normative References">

    <reference anchor='RFC2119'>
    <front>
      <title>Key words for use in RFCs to Indicate Requirement
      Levels</title>
      <author initials='S.' surname='Bradner'></author>
      <date month='March' year='1997'></date>
    </front>
    <seriesInfo name='BCP' value='14' />
    <seriesInfo name='RFC' value='2119' />
    </reference>

    <reference anchor='RFC2198'>
    <front>
      <title>RTP Payload for Redundant Audio Data</title>
      <author initials='C.' surname='Perkins'></author>
      <author initials='I.' surname='Kouvelas'></author>
      <author initials='O.' surname='Hodson'></author>
      <author initials='V.' surname='Hardman'></author>
      <author initials='M.' surname='Handley'></author>
      <author initials='J.C.' surname='Bolot'></author>
      <author initials='A.' surname='Vega-Garcia'></author>
      <author initials='S.' surname='Fosse-Parisis'></author>
     <date month='September' year='1997'></date>
    </front>
    <seriesInfo name='RFC' value='2198' />
    </reference>

    <reference anchor='RFC3550'>
    <front>
      <title>RTP: A Transport Protocol for Real-Time
      Applications</title>
      <author initials='H.' surname='Schulzrinne'></author>
      <author initials='S.' surname='Casner'></author>
      <author initials='R.' surname='Frederick'></author>
      <author initials='V.' surname='Jacobson'></author>
      <date month='July' year='2003' />
    </front>
    <seriesInfo name='RFC' value='3550' />
    </reference>

   <reference anchor='RFC4566'>
    <front>
      <title>SDP: Session Description Protocol</title>
      <author initials='M.' surname='Handley'></author>
      <author initials='V.' surname='Jacobson'></author>
      <author initials='C.' surname='Perkins'></author>
      <date month='July' year='2006' />
    </front>
    <seriesInfo name='RFC' value='4566' />
    </reference>


</references>

<section anchor="ANTI_SHADOW" 
title="Anti-shadow Loss Concealment Using Forward-shifted Redundancy">

<t> It is not unusual in a wireless mobile communication environment
that the radio signal to a mobile wireless media receiver can be
temporarily blocked by tall buildings, mountains, tunnels, etc. for a
period of time. In other words, the receiver enters into a shadow of
the radio coverage. When the receiver is in such a shadow no new data
will be received. In some extreme cases, the receiver may have to spend
seconds or even tens of seconds in such a shadow.  </t>

<t> Without special design considerations to compensate the loss of
data due to shadowing, a mobile user may experience an unacceptable
level of service interruptions. And traditional redundant encoding
schemes (including RFC 2198 and most FEC schemes) are known to be
ineffective in dealing with such losses of consecutive frames.  </t>

<t> However, the employment of forward-shifted redundancy, in
combination with the anti-shadow loss concealment at the receiver, as
described here, can effectively prevent service interruptions due to
the effect of shadowing. </t>

<section title="Sender Side Operations">

<t> For anti-shadow loss management, the RTP sender simply adds a
forward-shifted redundant stream (called anti-shadow or AS stream)
while transmitting the primary media stream. The amount of
forward-shifting, which should remain constant for the duration of the
session, will determine the maximal length of shadows that can be
completely concealed at the receiver, as explained below.  </t>

<t> Except for the fast that it is forward-shifted relative to the
primary stream (i.e., the redundant data is sent ahead of the
corresponding primary data), the design decision and trade-offs on the
quality, encoding, bandwidth overhead, etc. of the redundant stream is
not different from the traditional RTP payload redundant scheme. </t>

<t> The following diagram illustrates a segment of the transmission
sequence of a forward-shifted redundant RTP session, in which the AS
stream is forward-shifted by 155 frames. If, for simplicity here, we
assume the value of timestamp is incremented by 1 between two
consecutive frames, this forward-shifted redundancy can then be
indicated with:
   <figure>
      <artwork>
    forwardshift=155
     </artwork>
  </figure>
and the setting of timestamp offset to 0 in all the additional
headers. This can mean a 3.1 second of forward shifting if each frame
represents 20 ms of original media,

   <figure>
      <artwork>
                   Primary stream    AS stream

       ...               |                |
                         v                v
       Pkt k+8        [ 111 ]          [ 266 ]
                         |                |
                         v                v
       Pkt k+7        [ 110 ]          [ 265 ]
                         |                |
                         v                v
   ^   Pkt k+6        [ 109 ]          [ 264 ]
   |                     |                |
   |                     v                v
       Pkt k+5        [ 108 ]          [ 263 ]
   T                     |                |
   I                     v                v
   M   Pkt k+4        [ 107 ]          [ 262 ]
   E                     |                |
                         v                v
       Pkt k+3        [ 106 ]          [ 261 ]
                         |                |
                         v                v
       Pkt k+2        [ 105 ]          [ 260 ]
                         |                |
                         v                v
       Pkt k+1        [ 104 ]          [ 259 ]
                         |                |
                         v                v
       Pkt k          [ 103 ]          [ 258 ]
                         |                |
                         v                v

                           Transmit first

   Figure 1. An example of forward-shifted redundant RTP packet
                           transmission.

     </artwork>
  </figure>
</t>

</section> <!-- title="Send media for anti-shadowing operation" -->

<section title="Receiver Side Operations">

<t> The anti-shadow receiver is illustrated in the following diagram. 

   <figure>
      <artwork>
                                               +---------+
                             normal mode   sw1 | media   |     media
  Primary stream ======================o___o==>| decoder |===> output
  AS stream     ----                           +---------+     device
                   |             AS mode o 
                   |       +---------+   |
                   |       | anti-   |   |
                   ------->| shadow  |----
                           | buffer  |
                           +---------+
                                |
                                V
                           expired frames
                           discarded

              Figure 2. Anti-shadow RTP receiver.
     </artwork>
  </figure>
The anti-shadow receiver operates between two modes - "normal mode"
and "AS mode".  When the receiver is not in a shadow (it can easily
tell that if it is still receiving new data), the receiver operates in
the normal mode. Otherwise, it operates in the AS mode.
</t>

<section title="Normal Mode Operation">

<t> In the normal mode, after receiving a new RTP packet that contains
the primary data and forward-shifted AS data, the receiver passes the
primary data directly to the appropriate media decoder for play-out (a
de-jittering buffer may be used before the play-out, but for
simplicity we assume none is used here), while the received AS data is
stored in an anti-shadow buffer. </t>

<t> Moreover, data stored in the anti-shadow buffer will be
continuously checked to determine whether it has expired. If a
redundant data in the anti-shadow buffer is found to have a timestamp
older (i.e., smaller) than that of the last primary frame passed to
the media decoder, it will be considered expired and be purged from
the anti-shadowing buffer.  </t>

<t> The following example illustrates the operation of the anti-shadow
buffer in normal mode. We use the same assumption as in Figure 1, and
assume that the initial timestamp value is 103 when the session
starts.

   <figure>
      <artwork>

          Timestamp     Timestamp
  Time      being      of media in
 (in ms)  played out    AS buffer         Note
------------------------------------------------------------------
  t < 0                 --             (buffer empty)
   ...
  t=0       103         258            (hold 1 AS frame)
  t=20      104         258-259        (hold 2 AS frames)
  t=40      105         258-260        (hold 3 AS frames)

   ...
  t=3080    257         258-412        (full, hold 154 AS frames)
  t=3100    258         259-413        (full, frame 258 purged)
  t=3120    259         260-414        (full, frame 259 purged)
   ...

  t=6240    415         416-570        (always holds 3.08 sec
                                        worth of redundant data)
   ...                                
                                   

 Figure 3. Example of anti-shadow buffer operation in normal mode.

     </artwork>
  </figure>
</t>


<t> At the beginning of the session (t=0), the anti-shadow buffer will
be empty. When the first primary frame is received, the play-out will
start immediately, and the first received AS frame is stored in the
anti-shadow buffer. And with the arriving of more forward-shifted
redundant frames, the anti-shadow buffer will gradually be filled up.
</t>

<t> For the example shown in Figure 1, after 3.08 seconds (the amount
of the forward-shifting minus one frame) from the start of the
session, the anti-shadow buffer will be full, holding exactly 3.08
seconds worth of redundant data, with the oldest frame corresponding
to t=3.1 sec and youngest frame corresponding to t=6.18 sec.  </t>

<t> And it is not difficult to see that in normal mode because of the
continuous purge of expired frames and the addition of new frames, the
anti-shadowing buffer will always be full holding the next
forward-shift amount of redundant frames.  </t>

</section> <!-- title="normal mode operation" -->


<section title="Anti-shadow Mode Operation">

<t> When the receiver enters a shadow (or any other conditions that
prevent the receiver from getting new media data), the receiver
switches to the anti-shadow mode, in which it simply continues the
play-out from the forward-shifted redundant data stored in the
anti-shadow buffer. </t>

<t> For the example in Figure 3, if the receiver enters a shadow at
t=3120, it can continue the play-out by using the forward-shifted
redundant frames (ts=260-414) from the anti-shadow buffer. As far as
the receiver can move out of the shadow by t=6240, there will be no
service interruption.  </t>

<t> When the shadow condition ends (meaning new data starts to arrive
again), the receiver immediately switches back to normal mode of
operation, playing out from newly arrived primary frames. And at the
same time, the arrival of new AS frames will start to re-fill the
anti-shadow buffer.  </t>

<t> However, if the duration of the shadow is longer than the
amount of forward-shifting, the receiver will run out of media
frames from its anti-shadow buffer. At that point, service interruption will
occur. </t>

<t> Anti-shadow loss concealment described above can be readily
applied to the streaming of pre-recorded media. Because of the need of
generating the forward-shifted anti-shadow redundant stream, to apply
anti-shadow loss concealment to the streaming of live media will
require the insertion of a delay equal to or greater than the amount
of forward-shifting at the source of media.  </t>

</section> <!-- title="AS operation" -->

</section> <!-- title="Receive media for anti-shadowing operation"-->

</section> <!--title="Anti-shadowing algorithm for "-->

    </back>

</rfc>
