
Re: Does Ten-Gigabit Ethernet need fault tolerance?





Gents, 

Please forgive my ignorance, but are we talking here about redundancy
and where to put it, or about a fault-tolerant scheme requiring some
intelligence, state machines, etc.?  The latter seems too complex for the
discussions that I'm hearing (reading?) lately, and likely well outside the
desired scope of the HSSG, but worthy of discussion elsewhere.  The
former seems to be in the scope of the system designers, not this group. 

From a system design standpoint, two redundant links would use two
PHYs.  These could potentially be in the same package -- given that
multiport versions of PHYs always seem to follow single port as soon as
is feasible, I don't think that a custom part would be required.  An
exceedingly simple (100 gates???) circuit would detect the link down in
the master and mux in the slave, and report this to a CPU.  Any PHY
vendor should be happy to include a few extra gates to make it easy to
use two ports in a redundant fashion like this.  This would be for those
that prefer a standby link with fast switchover.  
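
To make the standby scheme above concrete, here is a minimal sketch in
Python of the detect-and-mux behavior.  It is only an illustration, not a
description of any real PHY: the class and signal names are hypothetical, and
the choice to stay on the slave after the master recovers (non-revertive
switching) is an assumption of mine.  In hardware this would be a handful of
gates rather than software.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RedundantPort:
    """Hypothetical model of two PHYs behind one mux (all names illustrative)."""
    active: str = "master"  # which PHY the mux currently selects
    events: List[str] = field(default_factory=list)  # reports queued for the CPU

    def update(self, master_up: bool, slave_up: bool) -> str:
        """Apply one sample of the two link-status signals; return the selected PHY."""
        link_up = {"master": master_up, "slave": slave_up}
        other = "slave" if self.active == "master" else "master"
        if not link_up[self.active]:
            if link_up[other]:
                # Active link is down and the standby is healthy: flip the mux.
                self.events.append(f"{self.active} link down, switched to {other}")
                self.active = other
            else:
                # Nothing to switch to; just report the condition.
                self.events.append("both links down")
        # Assumption: non-revertive, so a recovered link becomes the new standby.
        return self.active


port = RedundantPort()
port.update(master_up=True, slave_up=True)    # stays on "master"
port.update(master_up=False, slave_up=True)   # switches to "slave"
print(port.active, port.events)
```

The switchover time in this model is a single sample of the link-status
signals, which is the appeal of doing it at layer 1; the time for the
receiver to re-lock after the mux flips is not modeled here.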

For those that prefer to *use* their redundant link while both are
functioning, and are willing to tolerate a slightly longer interruption
in traffic, the link aggregation and rapid reconfiguration standards
should handle their needs nicely.  A system designed for link
aggregation shouldn't require much modification to support using one
link as a layer one standby for another.   

Given the competitive nature of the Ethernet market, I suspect that PHY
and system vendors will jump at the chance to sell two ports per network
link in an effort to move failover from layer 3 down to layer 1,
assuming that purchasers are willing to put their money where... well,
you know the ending.  

The only reason to standardize failover behavior, that I can see, would
be if it involved a scheme more like the military class solution from
the original poster.  My personal opinion would be that if this were to
happen in the IEEE, it should be done using a separate standard which
inserts a layer between the MAC and the physical interface, not in the
interface itself.  This doesn't appreciably affect the performance of
the failover, as link aggregation may, and it cleanly fits it into the
IEEE architecture. 

-Simon L. Sabato
-Level One Communications

P.S. Isn't it ironic that the higher layer protocols are being labelled
as inadequate for fault tolerance -- wasn't IP itself based in
government research to build a network capable of withstanding massive
(i.e. multiple nuclear strike) amounts of damage?  No flames, please, I
was just pointing out the irony, not commenting on the validity of the
opinion... times change and what was acceptable for the military then
may well be inadequate for a telecom provider today. 


Roy Bynum wrote:
> 
> Shawn,
> 
> Actually, it is a lot simpler at the MAC.  All of the fault/TCE (threshold
> crossing event) sensing should be done in PHY, beneath the GMII.
> 
> For implementations where fault tolerance is desired, a PHY chip set can be used
> that will provide the required protection.  This would be used primarily in
> extended LAN and MAN/WAN implementations.
> 
> For implementations that do not need fault tolerance, a non-protected PHY can be
> used.  This would work for server farms and other very localized installations.
> 
> Thank you,
> Roy Bynum
> MCI WorldCom
> 
> "Rogers, Shawn" wrote:
> 
> > Okay, I see now.  I've seen this referred to as redundancy or redundant
> > links.  At the PHY you're adding a Tx differential pair, Rx differential
> > pair, two power/ground pairs and a mux control pin for every port (9 pins
> > for serial/ 36 pins for WWDM).  The MAC or higher layer function controls
> > the MUX pin sense.  Pretty simple but it does add pins and power to the
> > transceiver.  There has been some interest shown in adding this to 1GbE
> > transceivers.
> >
> > I'm not knowledgeable about what complexity it adds to MAC nor am I
> > knowledgeable about the response time from Loss of Signal (LOS) to Link up
> > after MUX toggling, Rx PLL re-lock.  Both of these are very important to
> > providing a viable solution.
> >
> > This does pose some questions for WWDM solutions similar to those raised in
> > Austin:
> > What if one of the four lambdas goes down?  Do you have redundancy for
> > lambdas or for the entire port?  My guess is for the entire port, meaning
> > that if one lambda were lost, then the entire port would be switched to the
> > redundant link.
> >
> > Shawn
> >
> > -----Original Message-----
> > From: Roy Bynum [mailto:rabynum@xxxxxxxxxxx]
> > Sent: Monday, July 19, 1999 6:59 AM
> > To: Rogers, Shawn
> > Cc: stds-802-3-hssg@xxxxxxxx
> > Subject: Re: Does Ten-Gigabit Ethernet need fault tolerance?
> >
> > Shawn,
> >
> > I use the term "L1 restoration link services" to refer to the generic
> > process by which the L1 processes of SONET/SDH do "protection" switching.
> > SONET/SDH do this without requiring instruction from upper layers.  I will
> > attempt to give a simple explanation of the service process of one of the
> > architectural implementations.
> >
> > SONET/SDH monitors the operable status of alternate links, "working" and
> > "protection".  SONET/SDH "sends" traffic over both links, but only receives
> > it on the "working" link.  SONET/SDH monitors for L0/L1 fault and other
> > threshold crossing "events".  When SONET/SDH detects an "event" it moves the
> > reception of traffic from the "working" to the "protection" link.  SONET/SDH
> > then sends an alarm message to upper layer monitoring applications.
> >
> > With the exception of the upper layer monitoring applications, most of this
> > is done in the SONET/SDH chip set.  Because it is done directly in the chip
> > set, at L1, it is very fast.  This is as close to true "fault tolerance" as
> > can be achieved by a communications process.  No upper layer processing is
> > required by SONET/SDH.  The speed at which SONET/SDH does protection
> > switching cannot be duplicated by upper layer applications.  I do not
> > consider the fault restoration processes of upper layer protocols to be
> > "fault tolerant".  By definition and comparison, they cannot be.  At best,
> > upper layer processes are "fault acceptable".
> >
> > When we talk about fault tolerance for 802.3, we need to think of it as
> > operating at the PHY level.  In many ways, fibers/wavelengths are "cheap".
> > Inside buildings and in WAN/MAN areas of the country they will become even
> > "cheaper".  Using the chip set to implement the process makes L1 restoration
> > link services "cheap" and simple to implement.
> >
> > Thank you,
> > Roy Bynum
> > MCI WorldCom
> >
> > "Rogers, Shawn" wrote:
> >
> > > Roy, sorry but I blinked and missed a thread.  What is L1 restoration link
> > > services?
> > > Shawn
> > >
> > > -----Original Message-----
> > > From: Roy Bynum [mailto:rabynum@xxxxxxxxxxx]
> > > Sent: Sunday, July 18, 1999 6:02 PM
> > > To: Joe Gwinn
> > > Cc: stds-802-3-hssg@xxxxxxxx
> > > Subject: Re: Does Ten-Gigabit Ethernet need fault tolerance?
> > >
> > > Joe,
> > >
> > > I have a question.  Your RTFC flooding algorithm sounds a lot like the
> > > restoration algorithms that have been proposed for DXC traffic restoration
> > > in the telephony industry.  These have not been used because of the
> > > complexity of constraints introduced in designing the architecture.  This
> > > might work in very simple systems, but I have reservations about successful
> > > implementations in large, high bandwidth, enterprise networks.
> > >
> > > In doing economic modeling, it was discovered that using simple L1
> > > working/protect restoration link services actually cost less to implement
> > > in large complex architectures.  Nodal failure is more often better handled
> > > at higher layers.
> > >
> > > In addition, simple L1 restoration implementations are easier to maintain
> > > for not-so-technical people.  Part of the reason for the success of 802.3
> > > is the simplicity of maintenance.  The success of 10GbE will depend on
> > > continuing that simplicity.
> > >
> > > Thank you,
> > > Roy Bynum
> > > MCI WorldCom
> > >
> > > Joe Gwinn wrote:
> > >
> > > > Jonathan,
> > > >
> > > > At 2:22 PM 99/7/16, Jonathan Thatcher wrote:
> > > > >
> > > > >A question and a suggestion:
> > > > >
> > > > >1. Are you suggesting that Fault Tolerance is a requirement for 10 Gig
> > > > >Ethernet or for all Ethernet?  Or, if FT is added to 10Gig Ethernet, is
> > > > >it of any particular value if 10, 100, and 1000 BASE-* don't have it?
> > > >
> > > > I am suggesting fault tolerance as an optional enhancement for 10GbE
> > > > only, mainly because it's early enough in 10GbE's standards development
> > > > timeline that FT could be included without pain, if the committee so
> > > > desires.
> > > >
> > > > Another reason is that I would like to be able to buy FT/DT 10GbE
> > > > products a few years from now, for use in military systems.  If you
> > > > recall from the London GbE meeting, I intended to suggest this FT
> > > > technology to GbE, but the technology couldn't be released in time, and
> > > > so missed the GbE standards train.
> > > >
> > > > As for the other ethernet standards, I propose nothing, although there
> > > > is no reason that they could not also take advantage of the offered FT
> > > > technology, should they so desire.
> > > >
> > > > The RTFC technology allows some segments of an overall network to be FT,
> > > > and does not require all to be FT, so there is no reason for an
> > > > all-or-none approach.  In a network containing multiple FT segments, the
> > > > segments react to changes and roster independently of one another.
> > > >
> > > > >A1: If all Ethernet: you should ask for a call for interest in 802.3
> > > > >and bring presentations supporting the requirement (5 criteria, etc).
> > > > >
> > > > >A2: If only 10 Gig Ethernet: you should bring a presentation supporting
> > > > >the requirement to the next 802.3 HSSG meeting.  Expect questions about
> > > > >how this supports the 5 criteria. Expect questions about why only 10 Gig
> > > > >Ethernet.
> > > >
> > > > The famous 5 criteria, lifted from slide 15 of thatcher_1_0399.pdf:
> > > >
> > > > 3.4.1. Broad Market Potential  -- FT is already in ATM/SONET, 802.3ad,
> > > > Rapid Reconfiguration in 802.1, etc, so there seems to be preexisting
> > > > wide agreement that fault tolerance is desirable and has a sufficiently
> > > > broad market potential.
> > > >
> > > > 3.4.2. Compatibility with IEEE Standard 802.3 -- Based on my experience
> > > > with 802.3z, I believe the offered technology is compatible, but the
> > > > committee is the expert here.
> > > >
> > > > Some facts:  The current RTFC implementations use standard TriQuint
> > > > Fibre-Channel parts and Finisar optical transceivers for the gigabit
> > > > links, plus some code in a standard-issue FPGA.  Only Fibre Channel layers
> > > > FC-0 and part of FC-1 are used, just as GbE does (although the details of
> > > > use of FC-1 differ).  The network segments (containing NICs, hubs, and
> > > > fibers) are either in "data mode" (with normal lan traffic), or in
> > > > "rostering mode" (where the new roster of NICs, hubs, and fibers are
> > > > configuring themselves into a working segment), and the protocols used in
> > > > those two modes are wholly independent of one another.  This is detailed
> > > > in RTFC Principles of Operation.
> > > >
> > > > 3.4.3. Distinct Identity -- No problem.  No other fault and damage
> > > > tolerance algorithm works this way, and thus it confers unique advantages.
> > > > For one, the technology is noticeably simpler than all other FT
> > > > technologies I am aware of, and is a whole lot more robust (in that it
> > > > also supports DT).
> > > >
> > > > Perhaps the key difference between this and other message-based
> > > > distributed fault tolerance schemes is that all other schemes attempted to
> > > > be stingy with messages (because they are expensive in most distributed
> > > > systems), while RTFC is a flooding protocol with just enough population
> > > > control to prevent network saturation.  The use of flooding allowed a
> > > > radical simplification of the algorithm, and the implementation of true
> > > > damage tolerance rather than just fault tolerance.
> > > >
> > > > 3.4.4. Technical Feasibility -- It has been implemented, and is in use
> > > > in a military application, with others under consideration.
> > > >
> > > > 3.4.5. Economic Feasibility -- It has been implemented, and the
> > > > algorithm is quite simple, as detailed in RTFC Principles of Operation.
> > > > We are basically talking about making a gate array slightly larger in
> > > > those hubs supporting fault tolerance (for which one can charge extra).
> > > >
> > > > I guess the only requirement, in the sense that all of 10GbE would have
> > > > to follow it, is for the NICs to do their part in rostering, a simple task
> > > > easily buried in the NIC's state machines.  The rest is for an optional
> > > > variety of hub where one does the rest of the rostering algorithm.
> > > >
> > > > If by "requirement" you mean only a one-liner like "10GbE shall support
> > > > Fault Tolerance", it wouldn't be much of a presentation.  I doubt that
> > > > anyone will argue that fault tolerance is undesirable; their question
> > > > will be "At what price?".  I claim the price is small, and the payoff
> > > > large.  In the final analysis, the matter will turn on how hard it is to
> > > > implement the algorithm, a matter of details.
> > > >
> > > > I don't know that I will be able to attend many meetings, so I won't be
> > > > a very active proponent of my own technology.  As I said before, no
> > > > salesman will call.  But email is another matter.
> > > >
> > > > More to the point, a few brave souls will no doubt read the RTFC
> > > > Principles of Operation, and if they think that there is something there
> > > > that 10GbE either wants or needs, and the rest of the committee comes to
> > > > agree, the technology will find its way into GbE.  Otherwise, it won't.
> > > > How else could it be?
> > > >
> > > > Basically, this technology is a gift, yours if you wish it.  I feel it
> > > > is of great value to 10GbE, and will be very interested to know what
> > > > people think after they have had time to absorb the core of the
> > > > technology, and to see the implications.
> > > >
> > > > Joe
> > > >
> > > > PS:  I'll be on travel, to an unrelated standards meeting, the week
> > > > 19-23 July 1999.
> > > >
> > > > >> -----Original Message-----
> > > > >> From: gwinn@xxxxxxxxxx [mailto:gwinn@xxxxxxxxxx]
> > > > >> Sent: Friday, July 16, 1999 2:15 PM
> > > > >> To: stds-802-3-hssg@xxxxxxxx
> > > > >> Subject: Does Ten-Gigabit Ethernet need fault tolerance?
> > > > >>
> > > > >>
> > > > >>
> > > > >> The purpose of this note is to present a case for inclusion of fault
> > > > >> tolerance in 10GbE, and to offer a suitable proven technology for
> > > > >> consideration.  However, no salesman will call.
> > > > >>
> > > > [snip]
> > > > >The basic technical document, the RTFC Principles of Operation, is on
> > > > >the GbE website as
> > > > >"http://grouper.ieee.org/groups/802/3/10G_study/public/email_attach/gwinn_1_0699.pdf"
> > > > >and
> > > > >"http://grouper.ieee.org/groups/802/3/10G_study/public/email_attach/gwinn_2_0699.pdf".