
Re: Local Fault/Remote Fault



Stephen,

I believe that we are in general agreement. Please see my comments
below.

Stephen.Finch@xxxxxx wrote:
> 
> Rich,
> 
> As to your exception to 1) b), I have to disagree.
> 
> A PMA/PMD device can detect a loss of useful signal, thus a local fault.
> It cannot generate a local fault, as it may not know whether the bit
> sequence is that of a WIS stream or a non-WIS stream.  And even if it
> knew that bit of info, it would need a lot of unnecessary logic to
> generate a properly encoded LF.  Instead, such a device should output
> zeros (or ones).  The PCS device to which it is attached will quickly
> lose sync and start generating LFs.  Both devices would set their link
> status to 0, and station management should be able to figure it out,
> ignore the PCS device's error, and assume that once the PMA/PMD
> problem is fixed they would both set link status as good.

I have to apologize for doing the "ASS_U_ME" thing when I wrote my (1b)
response. I assumed that the "device" you had in mind was an integrated
device such as a XAUI, 64B/66B Framer or 64B/66B + WIS Framer. Upon
re-read, it's clear that you were including lower level devices such as
PMDs and PMAs. These devices clearly do not have the wherewithal to
generate fault messages unless integrated with higher sublayers. PMD and
PMA devices that are capable of detecting fault conditions simply
indicate those conditions via a signal to the higher sublayers, which in
turn may generate fault messages. One example of a PMD signal is
Loss_of_Signal. One example of a PMA signal is Loss_of_Lock. This type
of signal indication is useful. Having a PMA or PMD output all zeros or
all ones is not very informative.
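To make that division of labor concrete, here is a minimal Python sketch of a hypothetical sublayer model: the PMD and PMA only raise status indications (Loss_of_Signal, Loss_of_Lock), and the higher sublayer decides whether to pass a local fault message toward the RS and clear its link status bit. The names `LowerLayerStatus` and `sublayer_receive_output`, and the string placeholders standing in for encoded ordered sets, are illustrative assumptions, not taken from D2.0.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class LowerLayerStatus:
    """Indications a PMD/PMA passes up; it never builds fault messages itself."""
    loss_of_signal: bool = False  # PMD indication, e.g. Loss_of_Signal
    loss_of_lock: bool = False    # PMA indication, e.g. Loss_of_Lock


# Hypothetical string placeholders standing in for encoded ordered sets / data.
LOCAL_FAULT = "LF"
DATA = "DATA"


def sublayer_receive_output(status: LowerLayerStatus, sync_ok: bool) -> Tuple[str, int]:
    """Return (what the sublayer passes toward the RS, link_status bit)."""
    fault = status.loss_of_signal or status.loss_of_lock or not sync_ok
    if fault:
        # The higher sublayer, not the PMD/PMA, generates the fault message
        # and reports link status 0.
        return LOCAL_FAULT, 0
    # No fault detected: forward what is received and report link status 1.
    return DATA, 1
```

For example, `sublayer_receive_output(LowerLayerStatus(loss_of_lock=True), True)` yields `("LF", 0)`, while a healthy link yields `("DATA", 1)`.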

> As to your comment on 2):  I don't think we have a disagreement.  When I
> wrote what I did, I didn't distinguish between an "error" and a "fault",
> I just classified transient problems as "errors" and lumped hard problems
> together as "faults". Given this (sort of) definition, transients are marked as
> bit/lane errors and the system keeps right on working, i.e., no LFs.
> Only when things go really wrong, e.g., a loss of sync or lock, do we
> fall into the LFs.  I assume this is what you were
> driving at and I will agree.

That's it.
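The error/fault distinction can be illustrated with a small monitor. This is a sketch under stated assumptions: the `FaultMonitor` class and its `fault_after` / `clear_after` thresholds are hypothetical and are not values taken from P802.3ae D2.0. An isolated bad block is counted as an error while data keeps flowing; only a sustained run of bad blocks (a stand-in for loss of sync or lock) puts the device into fault mode, and it stays there until the condition has abated for a while.

```python
class FaultMonitor:
    """Separates transient errors from hard faults (illustrative only)."""

    def __init__(self, fault_after: int = 16, clear_after: int = 16):
        # Hypothetical thresholds, chosen only for illustration.
        self.fault_after = fault_after  # consecutive bad blocks -> fault
        self.clear_after = clear_after  # consecutive good blocks -> recover
        self.bad_run = 0
        self.good_run = 0
        self.errors = 0                 # running count of transient errors
        self.in_fault = False

    def observe(self, block_valid: bool) -> str:
        """Classify one received block and return what the device emits."""
        if block_valid:
            self.good_run += 1
            self.bad_run = 0
            # Leave fault mode only after the condition has clearly abated.
            if self.in_fault and self.good_run >= self.clear_after:
                self.in_fault = False
        else:
            self.errors += 1
            self.bad_run += 1
            self.good_run = 0
            if self.bad_run >= self.fault_after:
                self.in_fault = True
        if self.in_fault:
            return "LF"  # fault mode: send local fault instead of data
        return "DATA" if block_valid else "ERROR"  # error marked, link stays up
```

With `fault_after=3, clear_after=2`, the block sequence valid, bad, valid, bad, bad, bad, valid, valid produces `DATA, ERROR, DATA, ERROR, ERROR, LF, LF, DATA`: the first bad block is only an error, while the run of three trips the fault.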

> I'm going to read Shimon's reply next and determine my course of action,
> if any.
> 
> Thanks for the reply and let me know if I misunderstood you on
> anything.
> 
> Steve Finch
> 
> Rich Taborek <rtaborek@xxxxxxxxxxxxx> on 12/29/2000 01:33:55 PM
> 
> Please respond to rtaborek@xxxxxxxxxxxxx
> 
> To:   HSSG <stds-802-3-hssg@xxxxxxxx>
> cc:
> Subject:  Re: Local Fault/Remote Fault
> 
> Stephen,
> 
> I sympathize with you. Please note that most, if not all, of the Link
> Status architecture was developed and presented "on the fly" at the
> November meeting in Tampa. Subsequently, all the clause editors had only
> a couple of weeks to get this new architecture into the form of formal
> Clause documentation and ready for Task Group ballot. As a result, the
> consistency of the Link Status architecture in P802.3ae D2.0 is
> poor-to-mediocre at best.
> 
> That said, I want to thank you for taking the time to put your thoughts
> together because this is your opportunity to improve the overall quality
> of the P802.3ae document. I suggest that all the ideas below be arranged
> into several D2.0 comments which are tied together by a common thread:
> call it Link Status.
> 
> I've included some specific responses to your note below to help with
> comment generation. I would appreciate it very much if you would take
> ownership of this issue and follow through with comments on it.
> 
> Stephen.Finch@xxxxxx wrote:
> >
> > First, let me say that I participated in the definition of
> > Local Fault and Remote Fault as presented at the November
> > meeting so I think I understand what was intended.  The problem
> > I'm having is finding all of the pieces of that definition
> > in the draft so that those who didn't attend or see some
> > of the slides that were presented can understand.
> >
> > With that said, what I did was search through D2.0 looking
> > for Local Fault and LF to find all of the associated text.
> > I may have missed some.  What I found was incomplete and/or
> > confusing.
> >
> > Here are the concepts I think we need to communicate:
> >
> > 1.  Any device, in either its transmit or receive paths, could
> >     detect a fault condition.  The fault may be that the data
> >     being received is invalid or that some internal problem
> >     has occurred.  Some/many faults may go
> >     undetected.  If a device detects a fault condition (i.e.,
> >     a locally detected fault) it should set its link status
> >     to zero and not forward what is received, but should,
> >     at its output, either:
> >
> >   a.  generate a local fault pulse ordered set if it is
> >       capable of doing so,
> > or
> >   b.  generate all zeros or all ones, making it probable that
> >       the next device in the link will detect the problem and
> >       (hopefully) generate a local fault pulse ordered set.
> 
> I disagree with (b). The response to a detected fault condition should
> be consistent. It's just as easy to generate a local fault pulse
> ordered-set as it is to generate all zeros or all ones. Generating multiple
> responses at a transmitter results in multiple interpretations at the
> receiver. (a) should be the only response to a detected fault. Only the
> (a) response is implied in the accepted baseline proposal in
> taborek_2_1100.pdf.
> 
> > 2.  All devices not detecting fault conditions should forward
> >     whatever is received.  Local fault pulse ordered sets and
> >     remote fault pulse ordered sets may be generated by other
> >     devices and, when received, must be forwarded on.  With the
> >     exception of the RS layer, reception of a local fault
> >     ordered set or a remote fault ordered set must have no
> >     effect on the device receiving these pulse ordered sets.
> 
> This is not strictly true. Any fault condition will bring down the
> entire link. The link remains down until the fault condition abates. The
> Link Status protocol should protect against false fault detection
> conditions such as those caused by random bit or signal errors. A fault
> condition recognition process is implemented whereby detected fault
> conditions are validated. A device which recognizes a fault condition
> essentially operates in "fault" mode rather than "normal data" mode. In
> this sense, the reception of a fault ordered-set DOES have an impact on the
> device receiving these pulse ordered sets.
> 
> > 3.  The RS layer is where the Local Fault Pulse Ordered Set is
> >     processed.  The RS layer is the only place that a Remote
> >     Fault Pulse Ordered Set can be generated.  If an RS receives
> >     a Local Fault Pulse Ordered Set it must stop sending packets
> >     and begin sending alternating columns of Idles and Remote
> >     Fault Pulse Ordered Sets.  If an RS receives a Remote Fault
> >     Pulse Ordered Set, it must stop sending packets and send
> >     only Idles.
> 
> Correct.
> 
> > Devices detecting fault conditions set their link status to 0
> > and attempt to generate LFs (local fault ordered sets).  In some
> > cases, multiple devices may be detecting faults and attempt to
> > send LFs.
> 
> Correct.
> 
> > Station management can obtain each device's status and localize
> > the problem.
> 
> Correct.
> 
> > What I found in the standard is in the following clauses:
> >
> > 45.2.1.2.3
> > 45.2.2.1.7
> > 45.2.3.1.7
> > 45.2.4.2.3
> > 45.2.5.2.3
> > 46.2.5.1    (last paragraph)
> > 46.2.6
> > Table 46-4
> > 48.1.3.1
> > 48.2.2
> > 48.2.4.5 and 48.2.4.5.1
> > Figure 48-10
> > 48.2.5.4 and 48.2.5.4.1 and 48.2.5.4.2
> > 49.2.4.5
> > 49.2.11.1.1  (definition of LFRAME_R)
> > Figure 49-14 --> top state
> >
> > I don't think these "pieces" capture what we need.  In fact, the
> > inconsistent usage of terms is confusing.  For example, what
> > does "detected a local fault signal on the inbound path" mean?
> 
> Loosely translated, the inbound path is any device's receiver. A local
> fault signal could be a Loss_of_Signal, loss-of-sync, or local fault message.
> 
> > I think we need some standardized terms used throughout.
> > And I think we need a basic description (better written than what
> > I did above) placed somewhere in the intro and not in one of
> > the "component" pieces where it could be missed by others.
> >
> > Before I start on what I think should be done, I'd like confirmation
> > that my description above is correct.  I'll then start on my proposed
> > fixes.
> 
> Go for it!
> 
> > Steve Finch
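The RS behavior described in item 3 above can be sketched as follows. This is a minimal Python model of that description only; `RxIndication`, `rs_transmit`, the string column placeholders, and starting the alternating pattern with an Idle column are all assumptions for illustration, not D2.0 definitions.

```python
from enum import Enum
from typing import List


class RxIndication(Enum):
    """What the RS is currently receiving (hypothetical model)."""
    NORMAL = "normal"
    LOCAL_FAULT = "LF"
    REMOTE_FAULT = "RF"


def rs_transmit(rx: RxIndication, packet_columns: List[str]) -> List[str]:
    """Return the columns the RS transmits over len(packet_columns) column times."""
    n = len(packet_columns)
    if rx is RxIndication.LOCAL_FAULT:
        # Local fault received: stop sending packets and alternate Idle and
        # Remote Fault columns (the RS is the only source of Remote Fault).
        # Starting with an Idle column is an assumption.
        return ["IDLE" if i % 2 == 0 else "RF" for i in range(n)]
    if rx is RxIndication.REMOTE_FAULT:
        # Remote fault received: stop sending packets and send only Idles.
        return ["IDLE"] * n
    # No fault indication: transmit packets normally.
    return list(packet_columns)
```

Keeping the decision in one function mirrors the point that only the RS acts on fault ordered sets by changing what it transmits; every other device simply forwards them.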

-- 

Happy Holidays,
Rich

------------------------------------------------------- 
Richard Taborek Sr.                 Phone: 408-845-6102       
Chief Technology Officer             Cell: 408-832-3957
nSerial Corporation                   Fax: 408-845-6114
2500-5 Augustine Dr.        mailto:rtaborek@xxxxxxxxxxx
Santa Clara, CA 95054            http://www.nSerial.com