Re: Local Fault/Remote Fault

Rich,

As to your exception to 1) b), I have to disagree.

A PMA/PMD device can detect a loss of useful signal, and thus a local
fault.  It cannot generate a local fault, however, because it may not
know whether the bit sequence is that of a WIS stream or a non-WIS
stream.  And even if it knew that bit of information, it would need a
lot of unnecessary logic to generate a properly encoded LF.  Instead,
such a device should output zeros (or ones).  The PCS device to which it
is attached will quickly lose sync and start generating LFs.  Both
devices would set their link status to 0, and station management should
be able to figure it out: ignore the PCS device's error and assume that
once the PMA/PMD problem is fixed, both devices will set their link
status back to good.
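To make the argument concrete, here is a minimal sketch, assuming a 10GBASE-R style PCS using 64B/66B coding, where each 66-bit block starts with a 2-bit sync header that must be 01 or 10. An all-zeros (or all-ones) stream from the PMA/PMD only ever yields 00 (or 11) headers, so the PCS loses block lock and would start sending LFs. The function names and the 64-header window with a 16-invalid threshold are hypothetical values for illustration, not taken from D2.0.

```python
from typing import Iterable, List

# 64B/66B sync headers: only 01 and 10 are valid; 00 and 11 are not.
VALID_SYNC_HEADERS = {0b01, 0b10}


def pma_pmd_output(signal_detected: bool, headers: List[int]) -> List[int]:
    """Hypothetical PMA/PMD: on loss of useful signal, send all zeros
    instead of trying to build a WIS- or non-WIS-encoded LF."""
    return list(headers) if signal_detected else [0b00] * len(headers)


def pcs_block_lock(headers: Iterable[int], window: int = 64,
                   max_invalid: int = 16) -> bool:
    """Hypothetical PCS lock check: lock is lost (False) once `max_invalid`
    invalid sync headers are seen within one block of `window` headers.
    Window and threshold are assumed values."""
    seen = 0
    invalid = 0
    for sh in headers:
        seen += 1
        if sh not in VALID_SYNC_HEADERS:
            invalid += 1
            if invalid >= max_invalid:
                return False  # lost lock: the PCS would start sending LFs
        if seen == window:
            seen, invalid = 0, 0  # start counting a new window
    return True


def link_status(signal_detected: bool, pcs_locked: bool) -> dict:
    """Both devices report link status 0 when the PMA/PMD loses signal."""
    return {"PMA/PMD": int(signal_detected), "PCS": int(pcs_locked)}


good = [0b01, 0b10] * 64
locked = pcs_block_lock(pma_pmd_output(False, good))
print(link_status(False, locked))  # {'PMA/PMD': 0, 'PCS': 0}
```

Station management then sees two devices at link status 0 and, as described above, can attribute the PCS failure to the PMA/PMD.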

As to your comment on 2):  I don't think we have a disagreement.  When I
wrote what I did, I didn't distinguish between an "error" and a "fault";
I simply treated transient problems as "errors" and lumped hard problems
together as "faults".  Given this (sort of) definition, transients are
marked as bit/lane errors and the system keeps right on working, i.e.,
no LFs.  Only when things go really wrong, e.g., a loss of sync or lock,
do we fall into the LFs.  I assume this is what you were driving at, and
I agree.

I'm going to read Shimon's reply next and determine my course of action,
if any.

Thanks for the reply, and let me know if I misunderstood you on
anything.

Steve Finch





Rich Taborek <rtaborek@xxxxxxxxxxxxx> on 12/29/2000 01:33:55 PM

Please respond to rtaborek@xxxxxxxxxxxxx

To:   HSSG <stds-802-3-hssg@xxxxxxxx>
cc:
Subject:  Re: Local Fault/Remote Fault





Stephen,

I sympathize with you. Please note that most, if not all, of the Link
Status architecture was developed and presented "on the fly" at the
November meeting in Tampa. Subsequently, all the clause editors had only
a couple of weeks to turn this new architecture into formal Clause text
ready for Task Group ballot. As a result, the
consistency of the Link Status architecture in P802.3ae D2.0 is
poor-to-mediocre at best.

That said, I want to thank you for taking the time to put your thoughts
together because this is your opportunity to improve the overall quality
of the P802.3ae document. I suggest that all the ideas below be arranged
into several D2.0 comments which are tied together by a common thread:
call it Link Status.

I've included some specific responses to your note below to help with
comment generation. I would very much appreciate it if you could take
ownership of this issue and follow through with the associated comments.

Stephen.Finch@xxxxxx wrote:
>
> First, let me say that I participated in the definition of
> Local Fault and Remote Fault as presented at the November
> meeting so I think I understand what was intended.  The problem
> I'm having is finding all of the pieces of that definition
> in the draft so that those who didn't attend or see some
> of the slides that were presented can understand.
>
> With that said, what I did was search through D2.0 looking
> for Local Fault and LF to find all of the associated text.
> I may have missed some.  What I found was incomplete and/or
> confusing.
>
> Here are the concepts I think we need to communicate:
>
> 1.  Any device, in either its transmit or receive paths, could
>     detect a fault condition.  The fault may be that the data
>     being received is invalid or that the device has an internal
>     problem.  Some/many faults may go
>     undetected.  If a device detects a fault condition (i.e.,
>     a locally detected fault) it should set its link status
>     to zero and not forward what is received, but should,
>     at its output, either:
>
>   a.  generate a local fault pulse ordered set if it is
>       capable of doing so,
> or
>   b.  generate all zeros or all ones, making it probable that
>       the next device in the link will detect the problem and
>       (hopefully) generate a local fault pulse ordered set.

I disagree with (b). The response to a detected fault condition should
be consistent. It's just as easy to generate a local fault pulse
ordered-set as it is to generate all zeros or all ones. Generating multiple
responses at a transmitter results in multiple interpretations at the
receiver. (a) should be the only response to a detected fault. Only the
(a) response is implied in the accepted baseline proposal in
taborek_2_1100.pdf.
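The output choices in point 1, and the narrower rule Rich proposes, can be sketched as follows. This is a minimal, hypothetical model: `fault_response`, `allow_option_b`, and the `Output` labels are illustrative names, and a device's output is reduced to a label rather than actual ordered sets.

```python
from enum import Enum
from typing import Tuple


class Output(Enum):
    FORWARD = "forward what is received"
    LOCAL_FAULT = "local fault pulse ordered sets"
    ALL_ZEROS = "all zeros (or all ones)"


def fault_response(fault_detected: bool, can_generate_lf: bool,
                   allow_option_b: bool = True) -> Tuple[int, Output]:
    """Return (link_status, output) for one device.

    allow_option_b=True follows point 1 as proposed: option (a) if the
    device can generate LF, otherwise option (b).  allow_option_b=False
    follows Rich's position: every device must respond with (a), so its
    LF capability is not consulted."""
    if not fault_detected:
        return 1, Output.FORWARD          # point 2: pass traffic through
    if can_generate_lf or not allow_option_b:
        return 0, Output.LOCAL_FAULT      # option (a)
    return 0, Output.ALL_ZEROS            # option (b)
```

With `allow_option_b=False` there is a single response to a detected fault, which is the consistency argument made above.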

> 2.  All devices not detecting fault conditions should forward
>     whatever is received.  Local fault pulse ordered sets and
>     remote fault pulse ordered sets may be generated by other
>     devices and, when received, must be forwarded on.  With the
>     exception of the RS layer, reception of a local fault pulse
>     ordered set or a remote fault pulse ordered set must have no
>     effect on the device receiving these pulse ordered sets.

This is not strictly true. Any fault condition will bring down the
entire link. The link remains down until the fault condition abates. The
Link Status protocol should protect against false fault detection
conditions such as those caused by random bit or signal errors. A fault
condition recognition process is implemented whereby detected fault
conditions are validated. A device which recognizes a fault condition
essentially operates in "fault" mode rather than "normal data" mode. In
this sense, the reception of a fault ordered-set DOES have impact on the
device receiving these pulse ordered sets.
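One way to picture such a recognition process is a counter that only enters "fault" mode after several fault ordered sets of the same type arrive close together, and leaves it after a long enough run without any. The sketch below is hypothetical: the function name, the column encoding, and the values `needed=4` and `gap_limit=128` are assumptions for illustration, not the parameters of any draft clause.

```python
from typing import Iterable, List, Optional


def recognize_faults(columns: Iterable[Optional[str]], needed: int = 4,
                     gap_limit: int = 128) -> List[Optional[str]]:
    """Per received column, return the recognized fault ("LF"/"RF") or None.

    Each column is "LF" or "RF" for a fault ordered set, or None for
    anything else.  A fault is recognized after `needed` ordered sets of
    the same type with fewer than `gap_limit` other columns between
    consecutive ones; recognition is cleared after `gap_limit` columns
    with no fault ordered set.  Both parameters are assumed values."""
    recognized: Optional[str] = None  # fault currently acted upon
    candidate: Optional[str] = None   # fault type being counted
    count = 0
    gap = 0
    result: List[Optional[str]] = []
    for col in columns:
        if col is None:
            gap += 1
            if gap >= gap_limit:      # fault condition has abated
                recognized, candidate, count = None, None, 0
        else:
            if col == candidate:
                count += 1
            else:                     # new fault type: restart the count
                candidate, count = col, 1
            gap = 0
            if count >= needed:
                recognized = col      # enter "fault" mode
        result.append(recognized)
    return result


print(recognize_faults(["LF"] + [None] * 3))  # a stray LF is ignored
print(recognize_faults(["LF"] * 4)[-1])       # LF
```

A single corrupted column that happens to look like a fault ordered set never reaches the threshold, which is the protection against random bit errors described above.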

> 3.  The RS layer is where the Local Fault Pulse Ordered Set is
>     processed.  The RS layer is the only place that a Remote
>     Fault Pulse Ordered Set can be generated.  If an RS receives
>     a Local Fault Pulse Ordered Set it must stop sending packets
>     and begin sending alternating columns of Idles and Remote
>     Fault Pulse Ordered Sets.  If an RS receives a Remote Fault
>     Pulse Ordered Set, it must stop sending packets and send
>     only Idles.

Correct.
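The RS behavior in point 3 amounts to a small transmit-side selector. In this hypothetical sketch, columns are string labels, "DATA" is a placeholder for normal packet-plus-Idle traffic, and starting the LF pattern with an Idle column rather than an RF column is an arbitrary choice.

```python
from itertools import cycle, islice, repeat
from typing import Iterator, List, Optional


def rs_transmit(received_fault: Optional[str]) -> Iterator[str]:
    """Columns the RS sends toward the link partner, per point 3.

    "DATA" stands in for normal traffic (a simplification)."""
    if received_fault == "LF":
        # Stop sending packets; alternate Idle and Remote Fault columns.
        return cycle(["IDLE", "RF"])
    if received_fault == "RF":
        # Stop sending packets; send only Idles.
        return repeat("IDLE")
    return repeat("DATA")


def first_columns(received_fault: Optional[str], n: int) -> List[str]:
    """Take the first n transmitted columns for a given received fault."""
    return list(islice(rs_transmit(received_fault), n))


print(first_columns("LF", 4))  # ['IDLE', 'RF', 'IDLE', 'RF']
print(first_columns("RF", 3))  # ['IDLE', 'IDLE', 'IDLE']
```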

> Devices detecting fault conditions set their link status to 0
> and attempt to generate LFs (local fault ordered sets).  In some
> cases, multiple devices may be detecting faults and attempt to
> send LFs.

Correct.

> Station management can obtain each device's status and localize
> the problem.

Correct.
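As a rough illustration of that localization step, station management could walk the receive path in signal order and blame the most upstream device reporting link status 0. This is a hypothetical sketch: it assumes management knows the device order and that downstream failures are consequences (as in Steve's PMA/PMD and PCS example); actual Clause 45 register access is not modeled.

```python
from typing import List, Optional, Tuple


def localize_fault(receive_path: List[Tuple[str, int]]) -> Optional[str]:
    """Return the most upstream device reporting link status 0.

    `receive_path` lists (device, link_status) in signal order, most
    upstream first.  Devices further downstream that also report 0 are
    assumed to have failed as a consequence (e.g. a PCS losing sync
    behind a PMA/PMD that lost signal)."""
    for device, status in receive_path:
        if status == 0:
            return device
    return None  # every device reports a good link
```

For example, `localize_fault([("PMA/PMD", 0), ("PCS", 0), ("XGXS", 1)])` returns `"PMA/PMD"`.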

> What I found in the standard is in the following clauses:
>
> 45.2.1.2.3
> 45.2.2.1.7
> 45.2.3.1.7
> 45.2.4.2.3
> 45.2.5.2.3
> 46.2.5.1    (last paragraph)
> 46.2.6
> Table 46-4
> 48.1.3.1
> 48.2.2
> 48.2.4.5 and 48.2.4.5.1
> Figure 48-10
> 48.2.5.4 and 48.2.5.4.1 and 48.2.5.4.2
> 49.2.4.5
> 49.2.11.1.1  (definition of LFRAME_R)
> Figure 49-14 --> top state
>
> I don't think these "pieces" capture what we need.  In fact, the
> inconsistent usage of terms is confusing.  For example, what
> does "detected a local fault signal on the inbound path" mean?

Loosely translated, the inbound path is any device's receiver. A local
fault signal could be a Loss_of_Signal, loss-of-sync, or local fault
message.
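Read literally, that definition is a simple OR of three receiver indications. The class and field names in this sketch are hypothetical labels for the three conditions listed, not names from the draft.

```python
from dataclasses import dataclass


@dataclass
class ReceiverStatus:
    """Indications available at a device's receiver (the inbound path)."""
    loss_of_signal: bool = False
    loss_of_sync: bool = False
    local_fault_message: bool = False  # LF ordered sets being received


def local_fault_signal(status: ReceiverStatus) -> bool:
    """True if any of the three listed conditions is present."""
    return (status.loss_of_signal or status.loss_of_sync
            or status.local_fault_message)
```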

> I think we need some standardized terms used throughout.
> And I think we need a basic description (better written than what
> I did above) placed somewhere in the intro and not in one of
> the "component" pieces where it could be missed by others.
>
> Before I start on what I think should be done, I'd like confirmation
> that my description above is correct.  I'll then start on my proposed
> fixes.

Go for it!

> Steve Finch

--

Happy Holidays,
Rich

-------------------------------------------------------
Richard Taborek Sr.                 Phone: 408-845-6102
Chief Technology Officer             Cell: 408-832-3957
nSerial Corporation                   Fax: 408-845-6114
2500-5 Augustine Dr.        mailto:rtaborek@xxxxxxxxxxx
Santa Clara, CA 95054            http://www.nSerial.com