
Re: CRC check indication of bad fiber




I can think of several problems with using CRC errors as an indication
of bad fiber. The intent of this note is to show how the PHY layer may
be much more adept at making "bad media" determinations:

1) The rate of CRC errors is traffic dependent, since a CRC is carried
only within an Ethernet packet. If the traffic rate is very low, the
rate of CRC errors may be lower still;

2) Management entities counting CRC errors may not easily associate CRC
errors with specific links. A link may consist of multiple link
segments. Each segment may contain "good" or "bad" media and even
different media types due to media conversion. In addition, different
BERs are associated with different media. All this makes it very
difficult to detect a bad fiber with CRC errors alone.
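
To put rough numbers on point 1, here is a minimal sketch. It assumes
independent random bit errors at a fixed BER, and the function name and
the sample frame rates are made up for illustration only:

```python
import math

def expected_crc_errors_per_second(ber, frame_bits, frames_per_second):
    """Expected CRC-errored frames per second, assuming independent bit errors."""
    # P(frame has at least one bit error) = 1 - (1 - ber)^frame_bits,
    # computed with log1p/expm1 to stay accurate for very small BER.
    p_frame_bad = -math.expm1(frame_bits * math.log1p(-ber))
    return frames_per_second * p_frame_bad

FRAME_BITS = 1518 * 8   # maximum-size untagged Ethernet frame
BER = 1e-12             # illustrative value only

busy = expected_crc_errors_per_second(BER, FRAME_BITS, 80_000)  # hypothetical busy link
idle = expected_crc_errors_per_second(BER, FRAME_BITS, 80)      # hypothetical quiet link
print(busy, idle)
```

Under these assumptions the same fiber, with the same BER, shows a
thousand times fewer CRC errors on the quiet link than on the busy one.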

GbE employs 8B/10B coding. Please note that I'm not promoting 8B/10B
coding for 10 GbE since MAS uses its own more efficient coding. However,
8B/10B has some great error detection capabilities whose usefulness
cannot be ignored at a time when the HSSG will strongly consider other
coding alternatives.

8B/10B-based links provide a very efficient way of detecting "bad"
fibers. When coupled with an intelligent transceiver, such as a GBIC,
and a management entity at each link end, it is straightforward to
assess the BER of an individual link segment from each link end, and even
sound an alarm when a preset BER threshold is reached for a particular
media type (useful for both low-BER fiber and higher-BER copper-based
systems).
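
A minimal sketch of such a threshold alarm follows. The threshold
values, the media-type names and the function names are illustrative
assumptions on my part, not taken from any standard or product:

```python
# Illustrative alarm thresholds per media type (assumed values).
ALARM_THRESHOLDS = {
    "fiber": 1e-12,
    "copper": 1e-10,
}

def estimate_ber(error_events, bits_observed):
    """Estimate BER as error events divided by bits observed on the segment."""
    if bits_observed <= 0:
        raise ValueError("bits_observed must be positive")
    return error_events / bits_observed

def ber_alarm(media_type, error_events, bits_observed):
    """Return True when the estimated BER exceeds the media type's threshold."""
    return estimate_ber(error_events, bits_observed) > ALARM_THRESHOLDS[media_type]

# 10 error events over 1e12 bits is an estimated BER of 1e-11.
print(ber_alarm("fiber", 10, 1e12), ber_alarm("copper", 10, 1e12))
```

The same error count trips the alarm on the fiber segment but not on the
copper one, which is the point of keeping the threshold per media type.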

One trick is necessitated by the fact that certain types of bit errors
may result in multiple code-group errors. An effective 8B/10B-based BER
monitoring facility should ideally count multiple associated errors as a
single error event. One simple way to do this is to count the first
error and then ignore errors for a specific time interval.
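
The count-then-ignore approach could be sketched roughly as below. The
class name, the holdoff value and the timestamps are hypothetical; a
real PHY would do this in hardware with its own timing:

```python
class ErrorEventCounter:
    """Counts bursts of code violations as single error events."""

    def __init__(self, holdoff_seconds):
        self.holdoff = holdoff_seconds
        self.events = 0
        self._window_end = None  # end of the current ignore interval

    def report_violation(self, timestamp):
        """Record a code violation; return True if it starts a new event."""
        if self._window_end is not None and timestamp < self._window_end:
            return False  # inside the ignore interval: same error event
        self.events += 1
        self._window_end = timestamp + self.holdoff
        return True

counter = ErrorEventCounter(holdoff_seconds=1e-6)  # assumed interval
for t in (0.0, 2e-7, 4e-7, 5e-6, 5.1e-6):          # two bursts of violations
    counter.report_violation(t)
print(counter.events)
```

Here the five violations fall into two bursts, so only two error events
are counted. The holdoff interval has to be chosen long enough to cover
the associated code-group errors but short enough not to hide
independent bit errors.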

The short proof of 8B/10B error detection capabilities is that no single
error and very few types of multiple errors can escape detection by
the 8B/10B transmission code. The few error patterns which can remain
undetected are unlikely candidates for systematic detection failure. In
addition, so long as continuously-enabled code-group alignment (hot
sync) is not employed, a misaligned comma (present in Idles and Config)
will always generate a code violation. 

I have the long proof in my back pocket in a memo on the subject of link
performance monitoring written by Al Widmer of IBM, co-inventor of the
8B/10B code. You don't REALLY want to see this... do you???

Best Regards,
Rich

------------------------------------------------------------- 
Richard Taborek Sr.    Tel: 650 210 8800 x101 or 408 370 9233       
Principal Architect         Fax: 650 940 1898 or 408 374 3645
Transcendata, Inc.           Email: rtaborek@xxxxxxxxxxxxxxxx 
1029 Corporation Way              http://www.transcendata.com 
Palo Alto, CA 94303-4305    Alt email: rtaborek@xxxxxxxxxxxxx

--

Drew Perkins wrote:
> 
> I assume that packets with bad CRCs will be dropped, not forwarded. I
> realize that this assumption may be incorrect since I haven't been involved
> with Ethernet bridging/switching in a while. Certainly, if packets are
> forwarded at layer 3 they'll either be discarded because of a bad CRC, or a
> new, correct CRC will be generated. I also assume that any reasonable
> carrier-class switch has internal failure detection capabilities so that an
> internal switching failure will be detected and not attributed to the link.
> Also that most source failures will be detected as well. Otherwise, it is
> always the case, even with SONET, that a source failure can be attributed to
> the link if it is not easily detectable. Thus, I don't understand the need
> for any additional mechanisms.
> 
> Drew
> ---------------------------------------------------------
> Ciena Corporation                 Email: ddp@xxxxxxxxxxxx
> Core Switching Division                 Tel: 408-865-6202
> 10201 Bubb Road                         Fax: 408-865-6291
> Cupertino, CA 95014              Cell/Pager: 408-829-8298
> 
> -----Original Message-----
> From: owner-stds-802-3-hssg@xxxxxxxxxxxxxxxxxx
> [mailto:owner-stds-802-3-hssg@xxxxxxxxxxxxxxxxxx]On Behalf Of Michael M.
> Salzman
> Sent: Friday, May 21, 1999 12:36 AM
> To: stds-802-3-hssg@xxxxxxxxxxxxxxxxxx
> Subject: RE: CRC check indication of bad fiber
> 
> Drew and Mick,
> Your notion is interesting.  Certainly bad packet indications can be useful
> measurements of quality, but of what?  As Mick says, the architecture and
> the coding scheme convolve these measures such that they become almost
> irrelevant.
> 
> In today's highly switched, non-shared network links, CRC errors are more
> likely to indicate a problem with the source or switching equipment than
> with the link.  The coding scheme may or may not yield a run of errors based
> on the CRC error, depending on where in the process the error took place
> (before or after the coding).  Furthermore code errors are usually detected
> up front in the PHY process and may (if so designed) issue signals to the
> MAC layer about the channel.  Usually these channel indications are not
> gradual and graceful, on the contrary they are usually a foggy statement
> that the channel is not useable.  Why foggy?  Because they need to be
> resolved against other indications in order to settle the proximate cause of
> failure.  For example a sync loss in an 8B10B system can be an indication of
> complete and abrupt link failure, or merely a bit error in the 10B symbol.
> If the latter, then the system will quickly recover sync at the next comma,
> but by that time the Phy state machine will take the link down, and the Mac
> will have to restart it.
> 
> So, we come back to the issue that to verify the link operation, we have two
> options.  If we want to do it dynamically, while the link is running, we
> have to inject some kind of secondary channel or test packets and measure
> them.  Or we can do it only upon link startup, at the full 10Gig rate for
> some few seconds or minutes and assess the quality of the link at that time
> using some clever patterns.  Beyond that level of built in testing, we can
> employ specific test equipment.
> 
> > -----Original Message-----
> > From: owner-stds-802-3-hssg@xxxxxxxxxxxxxxxxxx
> > [mailto:owner-stds-802-3-hssg@xxxxxxxxxxxxxxxxxx]On Behalf Of Mick
> > Seaman
> > Sent: Thursday, May 20, 1999 14:11
> > To: 'stds-802-3-hssg@xxxxxxxxxxxxxxxxxx'
> > Subject: CRC check indication of bad fiber
> >
> >
> >
> > My understanding is that, if the data is scrambled, one physical bit error
> > can be turned into a large number (order of the scrambling polynomial?)
> > of data bit errors.
> >
> > This could much reduce the degree of error checking/protection provided by
> > the CRC, presuming that to be calculated over the data bits rather than the
> > physically transmitted bits and to remain the same 'end to end' where in
> > this case I mean that much interpreted term to mean over a number of
> > .1D bridges or a number of repeaters (are full-duplex repeaters capable
> > of all potential media type conversions with the different transmission
> > arrangements under discussion?).
> >
> > So it may be necessary to add extra information to the frame (or to a set of
> > frames) to guard against bad fibers (?).
> >
> > Mick
> >
> > > -----Original Message-----
> > > From:       Chang, Edward S [SMTP:Edward.Chang@xxxxxxxxxx]
> > > Sent:       Thursday, May 20, 1999 1:58 PM
> > > To: Drew Perkins; 'msalzman@xxxxxxxxxx';
> > > 'stds-802-3-hssg@xxxxxxxxxxxxxxxxxx'
> > > Subject:    RE: IEEE 802.3 Requirements
> > >
> > >
> > > Drew:
> > >
> > > Yes, at MAC level, the CRC of the whole packet is checked; therefore, it can
> > > be used for indication of bad fibers.  However, we have to differentiate the
> > > normal read errors from the persistent errors generated by bad fibers.
> > > Anyway, it can be done.
> > >
> > > Ed Chang
> > > Unisys Corporation
> > >
> > > -----Original Message-----
> > > From: Drew Perkins [mailto:drew.perkins@xxxxxxxxxxxx]
> > > Sent: Thursday, May 20, 1999 2:32 PM
> > > To: 'msalzman@xxxxxxxxxx'; 'stds-802-3-hssg@xxxxxxxxxxxxxxxxxx'
> > > Subject: RE: IEEE 802.3 Requirements
> > >
> > >
> > >
> > > Is there any reason that the Ethernet CRC wouldn't make a pretty darn good
> > > error detection mechanism?
> > >
> > > Drew
> > > ---------------------------------------------------------
> > > Ciena Corporation                 Email: ddp@xxxxxxxxxxxx
> > > Core Switching Division                 Tel: 408-865-6202
> > > 10201 Bubb Road                         Fax: 408-865-6291
> > > Cupertino, CA 95014              Cell/Pager: 408-829-8298
> > >
> > >
> > > -----Original Message-----
> > > From: owner-stds-802-3-hssg@xxxxxxxxxxxxxxxxxx
> > > [mailto:owner-stds-802-3-hssg@xxxxxxxxxxxxxxxxxx]On Behalf Of Michael M.
> > > Salzman
> > > Sent: Wednesday, May 19, 1999 9:28 PM
> > > To: stds-802-3-hssg@xxxxxxxxxxxxxxxxxx
> > > Subject: RE: IEEE 802.3 Requirements
> > >
> > >
> > >
> > > Hi Ed, comments offered below on your ideas.
> > >
> > > > -----Original Message-----
> > > > From: owner-stds-802-3-hssg@xxxxxxxxxxxxxxxxxx
> > > > [mailto:owner-stds-802-3-hssg@xxxxxxxxxxxxxxxxxx]On Behalf Of Chang,
> > > > Edward S
> > > > Sent: Wednesday, May 19, 1999 10:44
> > > > To: Bruce_Tolley@xxxxxxxx; msalzman@xxxxxxxxxx
> > > > Cc: stds-802-3-hssg@xxxxxxxxxxxxxxxxxx
> > > > Subject: RE: IEEE 802.3 Requirements
> > > >
> > > > First of all, all datacom equipment have built-in error-check routines to
> > > > count the number of retries with a given client.  When that number reaches
> > > > the preset "water-level", it will give up retry and report the problem.
> > > > These mechanisms are already in place, and we do not need to reinvent or
> > > > re-invest.  We may modify the error check routines to fit our purpose.
> > >
> > > Ed,  802.3 does not have access to any retry counters of any higher level
> > > protocols.  Furthermore, not all protocols rely on retries.  The MAC layer
> > > has access only to its own activities, which include send packet attempts.
> > > In a full duplex configuration the send attempts are always successful,
> > > unless the entire layer fails.  The only way to measure a live error count
> > > is to run some kind of OAM channel and to pass test frames over it.
> > > Furthermore, some coding schemes can give abrupt indication of sync loss.
> > > In summary, at the MAC layer, it is difficult to assess channel
> > > deterioration.
> > >
> > > A practical approach is to detect link failure, shut it down, and then do
> > > link acquisition which includes link quality testing at full rate, and to
> > > then either declare the link dead or alive.  That's roughly what is done in
> > > 1GE and we can improve upon it for 10GE, or we can add optional improvements
> > > for, say, MAN applications.
> > >
> > > >
> > > > Second, we, TIA FO2.2, have studied many of the MM fibers in industry with
> > > > varieties of launch conditions; therefore, we should be able to come up with
> > > > a realistic, optimized cable plant design to drastically improve the
> > > > performance, and at the same time, reject those DMD, or bad fibers.
> > > > Remember, those are defective fibers, which should not be on the market in
> > > > the first place.  We have a pretty good idea how it may shape up.  We are not
> > > > talking about taking chances out of ignorance, or trial-and-error.  We all
> > > > have product responsibility; as a result, reliability and customer
> > > > satisfaction are always the first priority.  It implies that the rejection
> > > > ratio is kept to the minimum, limited to those DMD and bad fibers only.
> > >
> > > Ed, I am not sure what you are suggesting.  Perhaps you can offer a
> > > presentation on this idea in the meeting.
> > >
> > > Mike.
> >