Does Ten-Gigabit Ethernet need fault tolerance?




The purpose of this note is to present a case for inclusion of fault
tolerance in 10GbE, and to offer a suitable proven technology for
consideration.  However, no salesman will call.

Why Fault Tolerance?  Ten-Gigabit Ethernet is going to be a relatively
expensive, high-performance technology intended for major backbones,
perhaps even nibbling at the bottom end of the wide-area network (WAN)
market.  In such applications, high availability is very much desired; loss
of such a backbone or WAN is far too disruptive (and therefore expensive)
to be tolerated, and this kind of market will gladly pay a reasonable
premium to achieve the needed fault tolerance.

Why add Fault Tolerance now?  Because it's easiest (and thus cheapest) if
done from the start, and because FT that is built in, and therefore
ubiquitous, will be a competitive discriminator, neutralizing one of the
remaining claimed advantages of ATM.

Isn't Fault Tolerance difficult?  In hub-and-spoke (logical star, physical
loop) topologies such as GbE and 10GbE, it's not hard to achieve both fault
tolerance (FT) and military-level damage tolerance (DT).  In networks of
unrestricted topology, it's a lot harder.  The presence of bridges does not
affect this conclusion.

How do I know that FT is so easily achieved?  Because it's already been
done, may be bought commercially, and is in use on one military system and
is proposed for others.  The FT/DT technology mentioned here was developed
on a US Navy project, and is publicly available without intellectual
property restrictions.  Why was the technology made public?  To encourage
its adoption and use in COTS products, so that defense contractors can buy
FT/DT LANs from catalogs, rather than having to develop them again and
again, at great risk and expense.

What is the difference between Fault Tolerance and Damage Tolerance?  In
fault tolerance, faults are rare and do not correlate in either time or
place. The classic example is the random failure of hardware components.
(Small acts of damage, such as somebody tripping over a wire or breaking a
connector somewhere, are treated as faults here because they are also rare
and uncorrelated.) In damage tolerance, the individual faults are sharply
correlated in time and place, and are often massive in number. The classic
military example is a weapon strike. In the commercial world, a major power
failure is a good example. Damage tolerance is considered much harder to
accomplish than fault tolerance. If you have damage tolerance, you also
have fault tolerance, but fault tolerance does not by itself confer damage
tolerance.

How is this Damage Tolerance achieved?  All changes in LAN segment topology
(the loss or gain of nodes (NICs), hubs, or fibers) are detected in MAC
hardware by the many link receivers, which report both loss and acquisition
of modulated light. This surveillance occurs all the time on all links, and
is independent of data traffic. Any change in topology provokes the
hardware into "rostering mode", the automatic exploration of the segment
using a flood of special "roster" packets to find the best path, where
"best" is defined as that path which includes the maximum number of nodes
(NICs).
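
The selection rule itself ("best" means the path that includes the most
nodes) can be illustrated in a few lines of Python.  This is only a sketch
of the criterion, not of the protocol: the real exploration is done in MAC
hardware with roster packets, and the function name, the node labels, and
the idea of handing over a ready-made list of candidate paths are all
assumptions made for illustration.

```python
from typing import Iterable, Optional, Sequence


def best_path(candidates: Iterable[Sequence[str]]) -> Optional[Sequence[str]]:
    """Return the candidate path that includes the most distinct nodes.

    Hypothetical helper: the candidate paths stand in for whatever the
    roster packets discover. Ties go to the first candidate seen.
    """
    paths = list(candidates)
    if not paths:
        return None  # nothing survived to form a segment
    return max(paths, key=lambda path: len(set(path)))


# Three hypothetical surviving paths through a small segment.
surviving = [
    ["A", "B", "C"],
    ["A", "B", "C", "D"],
    ["A", "D"],
]
chosen = best_path(surviving)
print(chosen)
```

Here the second path is chosen because it reaches four nodes, while the
others reach three and two.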

Just how fault tolerant and damage tolerant is this scheme?  A segment will
work properly with any number of nodes and hubs, if sufficient fibers
survive to connect them together, and will automatically configure itself
into a working segment within a millisecond of the last fault. If the
number of broken fibers is less than the number of hubs, all surviving
nodes will remain accessible, regardless of the fault pattern. If the
number of fiber breaks is equal to or greater than the number of hubs,
there is a simple equation to predict the probability of loss of access to
a typical node due to loss of hubs and/or fibers, given only the number of
hubs and the probability of any fiber breaking: Pnd[p,r] = (2p(1-p))^r,
where p is the probability of fiber breakage and r is the number of
surviving hubs (which ranges from zero to four in a quad system). This
equation is accurate to within 1% for fiber breakage probabilities of 33%
or less, and applies for any number of hubs.

The simplicity of this equation is a consequence of the simplicity of this
protocol, which is currently implemented in standard-issue FPGAs (not
ASICs), and works without software intervention.  It can also be
implemented in firmware.

To give a numerical example, in a 33-node 4-hub segment, loss of 42 fibers
(16% of the segment's 264 fibers) would lead to only 0.5% of the nodes
becoming inaccessible, on average. Said another way, after 42 fiber breaks,
there are only five chances out of a thousand that a node will become
inaccessible. This is very heavy damage, with one fiber in six broken. To
take a more likely example, with three broken fibers, all nodes will be
accessible, and with four broken fibers, there is less than one chance in a
million that a node will become inaccessible. Recovery takes two ring tour
times plus settling time (electrical plus mechanical), typically less than
one millisecond in ship-size networks, measured from the last fault.
Chattering and/or intermittent faults can be handled by a number of
mechanisms, including delaying node entry by up to one second. Few current
LAN technologies approach this degree of resilience, or speed of recovery.
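
The loss figures quoted above can be checked directly against the equation
Pnd[p,r] = (2p(1-p))^r.  The sketch below assumes that p may be taken as
the fraction of broken fibers, and that the 264 fibers are two per
node-to-hub link (33 x 4 x 2), which is an inference from the stated
total; the function and constant names are made up for this example.

```python
def node_loss_probability(p: float, r: int) -> float:
    """Evaluate Pnd[p, r] = (2p(1-p))^r as given in the note.

    p: probability that any one fiber is broken (0..1)
    r: number of surviving hubs
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if r < 0:
        raise ValueError("r must be non-negative")
    return (2.0 * p * (1.0 - p)) ** r


NODES = 33
HUBS = 4
# Assumption: two fibers per node-to-hub link, which gives the note's 264.
FIBERS = NODES * HUBS * 2

for broken in (4, 42):
    p = broken / FIBERS  # assumption: p is the fraction of fibers broken
    loss = node_loss_probability(p, HUBS)
    print(f"{broken:2d} broken fibers: p = {p:.3f}, Pnd = {loss:.2e}")
```

With 42 broken fibers this gives a Pnd of roughly 0.005 (five chances in a
thousand), and with four broken fibers it gives a value just under one in
a million, in line with the figures quoted.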

In commercial systems and some military systems, a dual-ring solution is
sufficient.  Solutions up to quad-ring are commercially available, and are
needed for some military systems.  However, the ability to support up to
quad redundant systems should be provided in 10GbE, for two reasons.  First,
quad is needed for the military market, which may be economically
significant in the early years of 10GbE.  Second, quad provides a clear
growth path and a way to reassure non-military customers that their most
stringent problems can be solved: One can ask them if their needs really
exceed those of warships duelling with supersonic missiles.

The basic technical document, the RTFC Principles of Operation, is on the
GbE website as
"http://grouper.ieee.org/groups/802/3/10G_study/public/email_attach/gwinn_1_0699.pdf"
and
"http://grouper.ieee.org/groups/802/3/10G_study/public/email_attach/gwinn_2_0699.pdf".
I was a member of the team that developed the technology, and am the
author of these documents.

Although these documents assume RTFC, a form of distributed shared memory,
the basic rostering technology can easily be adapted for Gigabit and
Ten-Gigabit Ethernet as well.  For nontechnical reasons, RTFC originally
favored smart nodes connected via dumb hubs.  However, the overall design
can be somewhat simplified if one goes the other way, to dumb nodes and
smart hubs.  This also allows the same dumb nodes to be used in both non-FT
and FT networks, increasing node production volume, and does not force
users to throw nodes away to upgrade to FT.

I therefore would submit that 10GbE would greatly benefit from fault
tolerance, and also that it's very easily achieved if included in the
original design of 10GbE.

Joe Gwinn