<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">

<?rfc toc="yes"?>
<?rfc compact="yes" ?>
<?rfc subcompact="no" ?>
<?rfc strict="no" ?>
<?rfc symrefs="yes" ?>

<rfc category="std" ipr="full3978" docName="draft-dreibholz-rserpool-applic-distcomp-05.txt">

<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>


<front>

<title abbrev="RSerPool for Distributed Computing">
Applicability of Reliable Server Pooling for Real-Time Distributed Computing
</title>

<author initials="T." surname="Dreibholz" fullname="Thomas Dreibholz">
<organization abbrev="University of Duisburg-Essen">University of Duisburg-Essen, Institute for Experimental Mathematics</organization>
<address>
<postal>
   <street>Ellernstrasse 29</street>
   <city>45326 Essen</city>
   <region>Nordrhein-Westfalen</region>
   <country>Germany</country>
</postal>
<phone>+49-201-1837637</phone>
<facsimile>+49-201-1837673</facsimile>
<email>dreibh@iem.uni-due.de</email>
<uri>http://www.iem.uni-due.de/~dreibh/</uri>
</address>
</author>

<date day="11" month="July" year="2008" />
<keyword>Internet-Draft</keyword>

<abstract>
<t>This document describes the applicability of the Reliable Server
Pooling architecture to manage real-time distributed computing pools
and access the resources of such pools.</t>
</abstract>


</front>

<middle>


<section title="Introduction">
<t>Reliable Server Pooling defines protocols for providing highly
available services. The services are located in a pool of redundant
servers and if a server fails, another server will take over. The
only requirement put on these servers belonging to the pool is that if
state is maintained by the server, this state must be transferred
to the other server taking over.</t>

<t>The goal is to provide server-based redundancy. Transport and
network level redundancy are handled by the transport and network layer
protocols.</t>

<t>The application may choose to distribute its traffic over the servers
of the pool conforming to a certain policy.</t>


<section title="Scope">
<t>This document explains how to use
Reliable Server Pooling mechanisms to manage and access pools
of Distributed Computing resources.</t>
</section>


<section title="Terminology">
<t>The terms used in this document are common in related work; their
definitions can be found in the Aggregate Server Access Protocol and
Endpoint Handlespace Redundancy Protocol Common Parameters document
<xref target="I-D.ietf-rserpool-common-param">ietf-rserpool-common-param</xref>.
</t>
</section>
</section>


<section title="Distributed Computing using RSerPool">

<section title="Requirements">

<t>The application scenario for Distributed Computing is defined as follows:</t>

<t><list style="symbols">
<t>Clients generate large computation jobs. Jobs have to be processed by
servers as soon as possible (real-time). That is, unlike concepts such as
SETI@home <xref target="SETIatHome" />, it is
not possible to let clients fetch a job, process it later and perhaps
upload the result some day.</t>

<t>Jobs may be partitionable, i.e. they can be split up into smaller pieces
which can be processed independently by servers. The processing results of
the pieces can then be concatenated to form the processing result of the
complete job.</t>

<t>Servers may be unreliable, i.e. user computers may be temporarily added
to the pool of computing resources and may be withdrawn when they are used
again by their owners. Furthermore, they may simply disappear because of
broken network connections (modems, etc.) or because they are powered off.</t>

<t>The processing power of servers in a pool of computing resources may be
very heterogeneous, e.g. a few supercomputers and many low-end user PCs.</t>
</list></t>
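<t>The notion of a partitionable job can be illustrated by a minimal
sketch. It is not taken from any RSerPool implementation; the function names
split_job, process_job and square_all are hypothetical, and the workers are
called locally instead of on remote servers. The sketch only shows that
pieces are processed independently and that their results are concatenated
in order.</t>

<figure><artwork><![CDATA[
```python
from typing import Callable, List, Sequence


def split_job(job: Sequence[int], pieces: int) -> List[Sequence[int]]:
    """Split a job into at most `pieces` contiguous, near-equal parts."""
    if pieces < 1:
        raise ValueError("pieces must be at least 1")
    size, extra = divmod(len(job), pieces)
    parts = []
    start = 0
    for i in range(pieces):
        # The first `extra` parts get one additional item.
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty parts for very small jobs
            parts.append(job[start:end])
        start = end
    return parts


def process_job(job: Sequence[int], pieces: int,
                worker: Callable[[Sequence[int]], List[int]]) -> List[int]:
    """Process each part independently and concatenate the results."""
    results: List[int] = []
    for part in split_job(job, pieces):
        # In a real pool, each part would be sent to a different server.
        results.extend(worker(part))
    return results


def square_all(part: Sequence[int]) -> List[int]:
    """Hypothetical computation applied to one part of the job."""
    return [x * x for x in part]
```
]]></artwork></figure>

<t>For example, process_job(list(range(10)), 3, square_all) yields the same
result as processing the complete job on a single server, because each item
is computed independently of all others.</t>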

<t>Maintaining a Distributed Computing pool for the scenario described above
gives rise to the following requirements for the pool management:</t>

<t><list style="symbols">
<t>It must be possible to manage large server pools, e.g. up to some hundreds
or even thousands of servers.
</t>

<t>Due to heterogeneous processing resources within a pool, it
must be possible to use appropriate server selection procedures to
meaningfully utilize the available resources.</t>

<t>It must be possible to dynamically add and remove servers.</t>

<t>Servers may be unreliable, especially when the servers are represented by
user PCs. Failover mechanisms are required to continue an interrupted
computation session.</t>
</list></t>
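<t>The interplay of dynamic pool membership and capacity-aware server
selection can be sketched as follows. This is an illustration under stated
assumptions only: the classes PoolElement and Pool, the capacity attribute
and the selection rule (lowest load after accepting the new job) are
hypothetical and do not reproduce any policy or interface defined by the
RSerPool documents.</t>

<figure><artwork><![CDATA[
```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PoolElement:
    name: str
    capacity: float    # hypothetical unit: jobs handled equally well at once
    assigned: int = 0  # jobs currently assigned to this server


class Pool:
    """Toy pool supporting dynamic membership and load-based selection."""

    def __init__(self) -> None:
        self._elements: Dict[str, PoolElement] = {}

    def register(self, name: str, capacity: float) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._elements[name] = PoolElement(name, capacity)

    def deregister(self, name: str) -> None:
        # Removing an unknown server is silently ignored.
        self._elements.pop(name, None)

    def select(self) -> PoolElement:
        """Pick the server whose load would be lowest after this job."""
        if not self._elements:
            raise LookupError("pool is empty")
        chosen = min(
            self._elements.values(),
            # Ties are broken by name to keep the choice deterministic.
            key=lambda pe: ((pe.assigned + 1) / pe.capacity, pe.name),
        )
        chosen.assigned += 1
        return chosen
```
]]></artwork></figure>

<t>With one server of capacity 10 and one of capacity 1, this rule assigns
most jobs to the powerful server while still using the weaker one, which is
the kind of behaviour the heterogeneity requirement asks for.</t>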


<section title="Architecture">

<t>All requirements for pool and session management of the Distributed Computing
scenario defined in the previous section can be fulfilled by the Reliable
Server Pooling architecture:</t>

<t><list style="symbols">
<t>An efficient implementation of the handlespace management structures allows
pools to contain thousands of elements. Handlespace management structures
have been proposed, implemented and analyzed in
<xref target="Contel2005" />, <xref target="Dre2006" />.</t>

<t>RSerPool allows server selection rules to be specified by pool member
selection policies <xref target="I-D.ietf-rserpool-policies" />. A set of
adaptive and non-adaptive policies is already defined.
To fulfill the requirements of new applications, it is also possible to define
new policies. Research has already been conducted on the load
distribution efficiency of pool policies in Distributed Computing
scenarios; see
<xref target="LCN2005" />,
<xref target="Dre2006" />,
<xref target="Tencon2005" />,
<xref target="Euromicro2007" /> and
<xref target="ICN2005" />
for details.</t>

<t>Dynamic addition and removal of Pool Elements (PEs) is a feature of RSerPool
<xref target="I-D.ietf-rserpool-asap" />.</t>

<t>The control/data channel concept
<xref target="I-D.ietf-rserpool-service" /> of RSerPool realizes a session
layer. That is, RSerPool already handles the main task of maintaining and
monitoring connections between Pool Users (PUs) and Pool Elements (PEs); the
only task left to the application layer in order to provide full failover
functionality is to realize an application-dependent failover procedure.
With client-based state
synchronization <xref target="LCN2002" />, <xref target="Euromicro2005" />
in the form of ASAP Cookies, a failover may be fully transparent to the PU,
while only a state restoration is necessary on the PE side. A demo application
<xref target="RSerPoolPage" /> using the RSerPool session layer in a
Distributed Computing application is described in
<xref target="Infocom2005" />.</t>
</list></t>
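<t>The idea of client-based state synchronization can be sketched as
follows. This is a simplified, single-process illustration and not the ASAP
protocol or the rsplib API: the classes ComputeServer and PoolUser, the
JSON cookie format and the fail_after parameter are assumptions made for
this example only. The server hands a state cookie to the client after each
step; when the server disappears, the client passes the most recent cookie
to another server, which restores the state and continues.</t>

<figure><artwork><![CDATA[
```python
import json
from typing import Callable, List, Optional


class ComputeServer:
    """Hypothetical PE that sums a list of numbers one step at a time."""

    def __init__(self, name: str, fail_after: Optional[int] = None) -> None:
        self.name = name
        self.fail_after = fail_after  # simulate a crash after N steps
        self.steps_done = 0           # steps performed by this server

    def run(self, numbers: List[int], cookie: Optional[bytes],
            on_cookie: Callable[[bytes], None]) -> int:
        # Restore the state from the cookie, or start from scratch.
        if cookie:
            state = json.loads(cookie.decode())
        else:
            state = {"next": 0, "sum": 0}
        while state["next"] < len(numbers):
            if (self.fail_after is not None
                    and self.steps_done >= self.fail_after):
                raise ConnectionError(self.name + " disappeared")
            state["sum"] += numbers[state["next"]]
            state["next"] += 1
            self.steps_done += 1
            # Hand the current state to the client as an opaque cookie.
            on_cookie(json.dumps(state).encode())
        return state["sum"]


class PoolUser:
    """Hypothetical PU that stores the latest cookie for failover."""

    def __init__(self) -> None:
        self.cookie: Optional[bytes] = None

    def _store(self, cookie: bytes) -> None:
        self.cookie = cookie

    def compute(self, numbers: List[int],
                servers: List[ComputeServer]) -> int:
        for server in servers:
            try:
                return server.run(numbers, self.cookie, self._store)
            except ConnectionError:
                continue  # fail over to the next server with the cookie
        raise RuntimeError("no pool element left")
```
]]></artwork></figure>

<t>In this sketch, the second server does not repeat the steps already
completed by the first one, because the cookie carries the position and the
partial sum. The caller of compute does not notice the failover.</t>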

</section>


<section title="Limitations">
<t>When RSerPool is applied to distributed computing applications, the duties
of the RSerPool architecture remain limited to the management of pools and
independent sessions only. In particular, it is a non-goal to provide
functionalities like data synchronization among sessions, user
authentication, accounting or the support for more than
one administrative domain. Such functionalities are considered to be
application-specific and are therefore out of the scope of RSerPool.</t>
</section>


<section title="Implementation">

<t>A proof of concept implementation of a Distributed Computing application
based on the RSerPool prototype rsplib can be found at
<xref target="RSerPoolPage" />. This system provides a fractal graphics
computation service; the failover procedure is handled by ASAP cookies.</t>

</section>

</section>
</section>


<section title="Security considerations">

<t>The protocols used in the Reliable Server Pooling architecture only
try to increase the availability of the servers in the
network. RSerPool protocols do not contain any protocol mechanisms
which are directly related to user message authentication, integrity
and confidentiality functions. For such features, RSerPool depends on the
IPsec protocols or on Transport Layer Security (TLS) protocols for
its own security and on the architecture and/or security features
of its user protocols.</t>

<t>The RSerPool architecture allows the use of different transport
protocols for its application and control data exchange. These
transport protocols may have mechanisms for reducing the risk of
blind denial-of-service attacks and/or masquerade attacks. If such
measures are required by the applications, then it is advisable to
consult the SCTP applicability statement
<xref target="RFC3257">RFC3257</xref> for guidance on this issue.</t>

</section>


</middle>


<back>

<references title='Normative References'>
<?rfc include="reference.RFC.2960" ?>
<?rfc include="reference.RFC.3257" ?>
<?rfc include="reference.I-D.ietf-rserpool-arch" ?>
<?rfc include="reference.I-D.ietf-rserpool-asap" ?>
<?rfc include="reference.I-D.ietf-rserpool-enrp" ?>
<?rfc include="reference.I-D.ietf-rserpool-common-param" ?>
<?rfc include="reference.I-D.ietf-rserpool-policies" ?>
<?rfc include="reference.I-D.ietf-rserpool-service" ?>
<?rfc include="reference.RFC.3668" ?>
</references>

<references title='Informative References'>
<?rfc include="reference.RSerPoolPage" ?>
<?rfc include="reference.Dre2006" ?>
<?rfc include="reference.LCN2005" ?>
<?rfc include="reference.Tencon2005" ?>
<?rfc include="reference.LCN2002" ?>
<?rfc include="reference.Euromicro2005" ?>
<?rfc include="reference.Euromicro2007" ?>
<?rfc include="reference.ICN2005" ?>
<?rfc include="reference.Infocom2005" ?>
<?rfc include="reference.Contel2005" ?>
<?rfc include="reference.SETIatHome" ?>
</references>

</back>


</rfc>
