
Re: Clause 48: unaligned 8b10b stream problem




Brian, et al.,

Bob's analysis shows that any 8B/10B stream problems, especially given
multiple lanes, will result in relatively prompt link failure and link
reinitialization. Taking things a step further, one can employ XAUI as a
generic serial bus for applications such as chip-to-chip interconnects
and operate in what I call "data" mode. Data mode allows raw 8B/10B
data, such as SONET packets, to flow across any reasonable number of
XAUI lanes running at any reasonable speed once link initialization is
complete. No lane comma checking or lane-to-lane alignment checking
should be required in data mode. In general, data content across lanes
provides sufficient randomness of 10B data in each lane to ensure quick
detection of unaligned 10B content at the receiver. Proprietary 8B/10B
links employing similar transport schemes, with no unaligned stream
problems, provide an existence proof.

In addition, protocols running atop XAUI 8B/10B links typically invoke
link initialization whenever the underlying transport is unreliable
(e.g. reporting bad framing info) but is not reporting transport errors
(e.g. 10B alignment errors).

--

Best Regards,
Rich
          

Bob Noseworthy wrote:
> 
> Brian,
>  First off, this email largely restates Pat's comment, just with more (too
> much) detail on the number of errors in a 1-bit-slipped nominal IPG.  But if
> you have that pathological need to keep reading, then by all means...
> 
>  I assumed you were describing a case where a bit slip occurs on one lane
> (XAUI or LX4).  In which case, a deskew_error would occur and proceed as I
> described.
>  I can't judge the likelihood of all four lane transmitters or receivers
> slipping, but I assume this to be unlikely.
>  But let's assume this actually happened.  In which case we could look at
> lane 0, where at a minimum there would be only 1 column of idle, which would
> look something like this: ...D/K/S/D21.2(0x55)/D...  Normally though, clock
> deletion of ||R|| would not be occurring (not for every IPG), and even with
> DIC operating, worst case, every 4 frames there would be a frame with an IPG
> with 3 columns of idle (again, ignoring clock compensation).
> 
> Thus, with 3 columns of idle, lane 0 could take on only 5 possible forms:
> Case 1: /K/R/R/S/D21.2
> Case 2: /K/R/K/S/D21.2
> Case 3: /K/R/A/S/D21.2
> Case 4: /A/R/R/S/D21.2
> Case 5: /A/R/K/S/D21.2
> Assuming the slipped 10-bit codes outside of this IPG region somehow remain
> valid, let's look at how many errors might occur in these sequences
> assuming 4 types of shifts (left 1 bit, shift in a 0; left 1 bit, shift in a
> 1; right 1 bit, shift in a 0; right 1 bit, shift in a 1), and for each type
> of shift, the sequence might start with RD=negative or RD=positive.  Thus
> there are 8 scenarios to go through for each case. I go through those at the
> end, but let's skip that for now...
> 
> Summary:
>  All 1bit left shifts, of 20 scenarios: 40% have 2 errors, 40% have 3
> errors, 20% have 4 errors
>  All 1bit right shifts, of 20 scenarios: 20% have 2 errors, 40% have 3
> errors, 40% have 4 errors
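
As a cross-check of the percentages above, here is a minimal sketch in Python
that tallies the per-case listing given further down in this message. It
assumes that an invalid code-group and an RD error each count as one "error",
which is the reading that reproduces the summary. The table layout and the
function name distribution are illustrative choices, not anything from the
draft.

```python
from collections import Counter

# Error totals per case (invalid code-groups + RD errors), transcribed from
# the per-case listing in this message. Each tuple is
# (total with RD='-' start, total with RD='+' start).
# The listing gives identical counts for "shift in a 0" and "shift in a 1",
# so each direction is stored once and weighted by 2 below.
LEFT = {
    "/K/R/R/": (2, 3),
    "/K/R/K/": (4, 3),
    "/K/R/A/": (2, 2),
    "/A/R/R/": (2, 3),
    "/A/R/K/": (4, 3),
}
RIGHT = {
    "/K/R/R/": (3, 3),
    "/K/R/K/": (4, 4),
    "/K/R/A/": (2, 2),
    "/A/R/R/": (3, 3),
    "/A/R/K/": (4, 4),
}


def distribution(table, shift_in_variants=2):
    """Return {error_count: percent_of_scenarios} for one shift direction."""
    counts = Counter()
    for totals in table.values():
        for total in totals:
            counts[total] += shift_in_variants
    scenarios = sum(counts.values())  # 5 cases * 2 RD starts * 2 shift-ins
    return {errors: 100 * c // scenarios for errors, c in sorted(counts.items())}


print("left shifts :", distribution(LEFT))
print("right shifts:", distribution(RIGHT))
```

Both directions come out to 20 scenarios each, and the printed percentages
agree with the summary figures.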
> 
> Now let's cast aside our assumption that "no" errors are occurring in the
> 10-bit codes outside of the IPG.
> To lose sync in the worst case (when only 2 errors occur in the IPG), the
> sync state machine on lane 0 would have to see 1 error in 4 codegroups
> before the IPG and 1 error in 4 codegroups after the IPG, or any similar
> permutation resulting in loss of sync.    One error outside of the cases
> I've outlined is likely to occur in the CRC, leaving one error to occur by
> chance.  But in the worst case (a left shift), this need for 2 errors is
> occurring in only 40% of these IPG cases.
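
The hysteresis this argument relies on can be sketched as follows. This is a
simplified model, not a transcription of the Clause 48 synchronization state
diagram: it assumes each invalid code-group moves the lane one step toward
loss of sync, four consecutive valid code-groups move it one step back, and
the fourth step means loss of sync. The function name loses_sync and the
list-of-booleans input are illustrative.

```python
def loses_sync(code_groups_ok):
    """Return True if the simplified sync model drops sync.

    code_groups_ok: iterable of bools, True for a valid code-group and
    False for an invalid one, in the order received on one lane.
    """
    bad_level = 0  # 0 = fully in sync; reaching 4 = loss of sync (assumed)
    good_run = 0   # consecutive valid code-groups since the last invalid one
    for ok in code_groups_ok:
        if not ok:
            bad_level += 1
            good_run = 0
            if bad_level == 4:
                return True
        elif bad_level > 0:
            good_run += 1
            if good_run == 4:  # four good in a row: step back toward sync
                bad_level -= 1
                good_run = 0
    return False


G, B = True, False

# Two errors inside the IPG alone are absorbed by the hysteresis.
ipg_only = [G] * 8 + [B, B, G, G, G] + [G] * 8
print(loses_sync(ipg_only))

# One error within 4 code-groups before the IPG and one within 4 after it,
# combined with two errors in the IPG, is enough to drop sync in this model.
with_neighbors = [B, G, G, G] + [B, B, G, G, G] + [B]
print(loses_sync(with_neighbors))
```

In this model the first sequence stays in sync and the second loses it, which
is the permutation described above.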
> 
> Since 20% of these cases would cause a loss of sync, a *simple* argument
> could be made that: these IPG cases outlined would occur at least once every
> 4 frames (due to DIC), and 1 out of 5 of them would cause loss of sync, thus
> every 20 frames at least one should cause loss of sync.  Not rigorous, but
> hopefully enough to obviate the need to add a timer for this remotest of
> scenarios.
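
To put the "every 20 frames" figure in time units, here is a rough sketch. The
4-frame and 1-in-5 figures come from the argument above. Everything else is an
assumption added for illustration: a 10 Gb/s MAC data rate, 8 bytes of
preamble/SFD, an average IPG of 12 bytes, and the 64-byte and 1518-byte frame
sizes used as examples.

```python
FRAMES_PER_3_COLUMN_IPG = 4   # from the argument above (DIC)
ONE_IN_N_CAUSES_LOSS = 5      # 20% of the IPG cases have 4 errors

# Assumptions for illustration only:
BITS_PER_NS = 10              # 10 Gb/s MAC data rate
PREAMBLE_BYTES = 8            # preamble + SFD
AVERAGE_IPG_BYTES = 12        # average IPG with DIC


def frames_until_loss():
    """Frames sent, on this simple argument, before one IPG drops sync."""
    return FRAMES_PER_3_COLUMN_IPG * ONE_IN_N_CAUSES_LOSS


def time_until_loss_ns(frame_bytes):
    """Line time in nanoseconds for frames_until_loss() back-to-back frames."""
    bits_per_frame = 8 * (frame_bytes + PREAMBLE_BYTES + AVERAGE_IPG_BYTES)
    return frames_until_loss() * bits_per_frame / BITS_PER_NS


print(frames_until_loss(), "frames")
print(time_until_loss_ns(64), "ns with 64-byte frames")
print(time_until_loss_ns(1518), "ns with 1518-byte frames")
```

Under those assumptions the window is on the order of microseconds to tens of
microseconds, before adding any time for resynchronization itself.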
> 
>         Bob Noseworthy
>         (603) 862-4342
>         UNH InterOperability Lab
>         IOL Development Staff active w/ DOCSIS, Ethernet,
>         Fast Ethernet, Fibre Channel, & Gigabit Ethernet.
> 
> The details of the 20 different scenarios per shift direction are outlined below:
> 
> Left shift 1 bit, shift in a 0
>   Case 1: /K/R/R/S/D21.2 ->
>     Start RD='-':  2 Invalids
>     Start RD='+':  1 Invalid & 2 RD errors
>   Case 2: /K/R/K/S/D21.2 ->
>     Start RD='-':  2 Invalids & 2 RD errors
>     Start RD='+':  3 Invalids
>   Case 3: /K/R/A/S/D21.2 ->
>     Start RD='-':  1 Invalid & 1 RD error
>     Start RD='+':  2 Invalids
>   Case 4: /A/R/R/S/D21.2 ->
>     Start RD='-':  2 Invalids
>     Start RD='+':  1 Invalid & 2 RD errors
>   Case 5: /A/R/K/S/D21.2 ->
>     Start RD='-':  2 Invalids & 2 RD errors
>     Start RD='+':  3 Invalids
>
> Left shift 1 bit, shift in a 1
>   Case 1: /K/R/R/S/D21.2 ->
>     Start RD='-':  2 Invalids
>     Start RD='+':  1 Invalid & 2 RD errors
>   Case 2: /K/R/K/S/D21.2 ->
>     Start RD='-':  2 Invalids & 2 RD errors
>     Start RD='+':  3 Invalids
>   Case 3: /K/R/A/S/D21.2 ->
>     Start RD='-':  1 Invalid & 1 RD error
>     Start RD='+':  2 Invalids
>   Case 4: /A/R/R/S/D21.2 ->
>     Start RD='-':  2 Invalids
>     Start RD='+':  1 Invalid & 2 RD errors
>   Case 5: /A/R/K/S/D21.2 ->
>     Start RD='-':  2 Invalids & 2 RD errors
>     Start RD='+':  3 Invalids
>
> Right shift 1 bit, shift in a 0
>   Case 1: /K/R/R/S/D21.2 ->
>     Start RD='-':  1 Invalid & 2 RD errors
>     Start RD='+':  1 Invalid & 2 RD errors
>   Case 2: /K/R/K/S/D21.2 ->
>     Start RD='-':  3 Invalids & 1 RD error
>     Start RD='+':  3 Invalids & 1 RD error
>   Case 3: /K/R/A/S/D21.2 ->
>     Start RD='-':  1 Invalid & 1 RD error
>     Start RD='+':  1 Invalid & 1 RD error
>   Case 4: /A/R/R/S/D21.2 ->
>     Start RD='-':  3 RD errors
>     Start RD='+':  3 RD errors
>   Case 5: /A/R/K/S/D21.2 ->
>     Start RD='-':  2 Invalids & 2 RD errors
>     Start RD='+':  2 Invalids & 2 RD errors
>
> Right shift 1 bit, shift in a 1
>   Case 1: /K/R/R/S/D21.2 ->
>     Start RD='-':  1 Invalid & 2 RD errors
>     Start RD='+':  1 Invalid & 2 RD errors
>   Case 2: /K/R/K/S/D21.2 ->
>     Start RD='-':  3 Invalids & 1 RD error
>     Start RD='+':  3 Invalids & 1 RD error
>   Case 3: /K/R/A/S/D21.2 ->
>     Start RD='-':  1 Invalid & 1 RD error
>     Start RD='+':  1 Invalid & 1 RD error
>   Case 4: /A/R/R/S/D21.2 ->
>     Start RD='-':  3 RD errors
>     Start RD='+':  3 RD errors
>   Case 5: /A/R/K/S/D21.2 ->
>     Start RD='-':  2 Invalids & 2 RD errors
>     Start RD='+':  2 Invalids & 2 RD errors
>
> 
> More info than you care about, this is the same info as above, but listed on
> a case-for-case basis.  If you're actually reading all this (not just this
> line) then you've earned a gold star and maybe even my favorite Irish bevy:
> Case 1: /K/R/R/S/D21.2 ->
>   Left shift 1 bit, shift in a 0
>     Start RD='-':  2 Invalids
>     Start RD='+':  1 Invalid & 2 RD errors
>   Left shift 1 bit, shift in a 1
>     Start RD='-':  2 Invalids
>     Start RD='+':  1 Invalid & 2 RD errors
>   Right shift 1 bit, shift in a 0
>     Start RD='-':  1 Invalid & 2 RD errors
>     Start RD='+':  1 Invalid & 2 RD errors
>   Right shift 1 bit, shift in a 1
>     Start RD='-':  1 Invalid & 2 RD errors
>     Start RD='+':  1 Invalid & 2 RD errors
>
> Case 2: /K/R/K/S/D21.2 ->
>   Left shift 1 bit, shift in a 0
>     Start RD='-':  2 Invalids & 2 RD errors
>     Start RD='+':  3 Invalids
>   Left shift 1 bit, shift in a 1
>     Start RD='-':  2 Invalids & 2 RD errors
>     Start RD='+':  3 Invalids
>   Right shift 1 bit, shift in a 0
>     Start RD='-':  3 Invalids & 1 RD error
>     Start RD='+':  3 Invalids & 1 RD error
>   Right shift 1 bit, shift in a 1
>     Start RD='-':  3 Invalids & 1 RD error
>     Start RD='+':  3 Invalids & 1 RD error
>
> Case 3: /K/R/A/S/D21.2 ->
>   Left shift 1 bit, shift in a 0
>     Start RD='-':  1 Invalid & 1 RD error
>     Start RD='+':  2 Invalids
>   Left shift 1 bit, shift in a 1
>     Start RD='-':  1 Invalid & 1 RD error
>     Start RD='+':  2 Invalids
>   Right shift 1 bit, shift in a 0
>     Start RD='-':  1 Invalid & 1 RD error
>     Start RD='+':  1 Invalid & 1 RD error
>   Right shift 1 bit, shift in a 1
>     Start RD='-':  1 Invalid & 1 RD error
>     Start RD='+':  1 Invalid & 1 RD error
>
> Case 4: /A/R/R/S/D21.2 ->
>   Left shift 1 bit, shift in a 0
>     Start RD='-':  2 Invalids
>     Start RD='+':  1 Invalid & 2 RD errors
>   Left shift 1 bit, shift in a 1
>     Start RD='-':  2 Invalids
>     Start RD='+':  1 Invalid & 2 RD errors
>   Right shift 1 bit, shift in a 0
>     Start RD='-':  3 RD errors
>     Start RD='+':  3 RD errors
>   Right shift 1 bit, shift in a 1
>     Start RD='-':  3 RD errors
>     Start RD='+':  3 RD errors
>
> Case 5: /A/R/K/S/D21.2 ->
>   Left shift 1 bit, shift in a 0
>     Start RD='-':  2 Invalids & 2 RD errors
>     Start RD='+':  3 Invalids
>   Left shift 1 bit, shift in a 1
>     Start RD='-':  2 Invalids & 2 RD errors
>     Start RD='+':  3 Invalids
>   Right shift 1 bit, shift in a 0
>     Start RD='-':  2 Invalids & 2 RD errors
>     Start RD='+':  2 Invalids & 2 RD errors
>   Right shift 1 bit, shift in a 1
>     Start RD='-':  2 Invalids & 2 RD errors
>     Start RD='+':  2 Invalids & 2 RD errors
> 
> -----Original Message-----
> From: brian.cruikshank@xxxxxxxxxxxxx [mailto:brian.cruikshank@xxxxxxxxxxxxx]
> Sent: Wednesday, May 02, 2001 11:32 AM
> To: ren@xxxxxxxxxxx
> Cc: stds-802-3-hssg@xxxxxxxx
> Subject: RE: Clause 48: unaligned 8b10b stream problem
> 
> Bob,
> 
> I agree this is an unlikely case.  Sometimes it seems that those unlikely
> cases actually happen when you do not expect them.  Maybe a transmit side
> causes this through a poor implementation or a PLL problem.
> 
> I do not see how the align_status=fail would occur as you mention below.  I
> do not think that the deskew_error would occur.  There is no timer (any
> more) for looking for no occurrences of ||A||.  So this would not cause
> deskew_error.  Since the bit alignment was wrong, an /A/ would not occur in
> any channels during the IPG.  So this would not cause the deskew_error to
> occur either.
> 
> In all likelihood, it eventually would hit something that would trigger the
> LOSS_OF_SYNC and the system would recover.  The more reasonable concern is
> that it might take longer than desired to recover.  This once again is
> because the 8b10b stream shifted by one can look like a mostly valid stream.
> 
> Often it seems that the group is very concerned about recovering quickly and
> sensing every bad packet.  I thought this might be a spot to give
> consideration.
> 
> /Brian
> 
> "Bob Noseworthy" <ren@xxxxxxxxxxx>
> Sent by: owner-stds-802-3-hssg@xxxxxxxx
> 05/01/01 05:13 PM
> 
>         To:        <stds-802-3-hssg@xxxxxxxx>
>         cc:
>         Subject:        RE: Clause 48: unaligned 8b10b stream problem
> 
> Brian,
>  First, let's assume the case you outlined actually happened - that a lane
> slipped a bit but did not cause Loss of Sync.
> 
> In this case:
> - align_status=fail would occur within 8 frames (since the first ||I||
> after every other frame would be an ||A||)
> - The receiver would then generate local fault
> - The RS would then transmit remote fault to the link partner sourcing the
> line-rate frames
> - Upon reception of remote fault, the link partner's RS would then inhibit
> frame transmission and source ||I||
> 
> So in the worst case, if this unlikely scenario were occurring, then after 8
> frames + the max round trip propagation delay, the Sync state machine would
> lose sync (after receiving 4 bit-slipped /A/, /K/, or /R/, which would cause
> PUDI=/INVALID/).
> 
> Given the unlikely nature of the problem, this "slow" recovery mechanism
> seems to be adequate and requires no change to the standard.
> 
> Regards,
>                 Bob Noseworthy
>                 (603) 862-4342
>                 UNH InterOperability Lab
> 
> -----Original Message-----
> From: owner-stds-802-3-hssg@xxxxxxxx
> [mailto:owner-stds-802-3-hssg@xxxxxxxx]On Behalf Of
> brian.cruikshank@xxxxxxxxxxxxx
> Sent: Tuesday, May 01, 2001 3:39 PM
> To: stds-802-3-hssg@xxxxxxxx; rtaborek@xxxxxxxxxxxxx
> Subject: Clause 48: unaligned 8b10b stream problem
> 
> Rich,
> 
> I have a concern about turning off comma detection and not having a timer to
> indicate no recent commas.
> I believe I am looking at this correctly.  Let me know if I am missing
> something.
> 
> If a valid 8b10b stream becomes out of alignment, I believe there is a
> chance that it would never be realigned.
> It may be a statistically rare case, but I thought I would mention it.
> 
> Here is the case:
> - The interface is using maximum bandwidth for packets and is only
> transmitting minimum IPG.
> - A bit slip or bit insertion in the stream happens so that bit alignment
> is wrong.
> - Many of the valid codewords, if shifted by one, are still valid; this may
> not cause Loss of Sync.  D21.5 can be shifted and be fine.  Loss of Sync takes
> 4 codeblock violations, each codeblock consisting of 4 codewords.  An
> occasional codeword violation may not cause Loss of Sync to happen.
> - The IPG codewords shifted would not be valid, but they do not occur often
> enough (with min IPG) to cause Loss of Sync.  This would only definitely
> cause 6 codeword violations or 2 codeblocks.
> - Since Loss of Sync never happened, the commas would never be detected and
> the stream would never be realigned.
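
The D21.5 remark in the list above can be seen directly from the bit patterns.
A minimal sketch: D21.5 encodes to 1010101010 in transmission order for either
running disparity, and the same serial stream framed one bit late reads as
0101010101, which is the valid code-group D10.2. The helper name reframe is
illustrative, and the run of eight D21.5 code-groups is an arbitrary test
stream, not a realistic payload.

```python
D21_5 = "1010101010"  # abcdei fghj, transmission order, same for RD- and RD+
D10_2 = "0101010101"  # likewise disparity-neutral


def reframe(bits, slip):
    """Cut a serial bit string into 10-bit groups, starting `slip` bits late."""
    return [bits[i:i + 10] for i in range(slip, len(bits) - 9, 10)]


stream = D21_5 * 8           # eight back-to-back D21.5 code-groups

aligned = reframe(stream, 0)
slipped = reframe(stream, 1)  # receiver boundary is off by one bit

print(set(aligned))  # every group is the D21.5 pattern
print(set(slipped))  # every group is the D10.2 pattern, still valid
```

Real payload is not a single repeated code-group, so this only shows that a
slipped stream can stay valid, not how often it does.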
> 
> I have not had a chance to see statistically how many codewords can be
> shifted and still be valid.  That is an important question.  The next
> question is whether that number is low enough to leave the state machine
> unchanged.
> 
> /Brian Cruikshank
> 
> ____________________________________________________
> Brian Cruikshank
> Mindspeed Technologies
> Conexant Systems
> 5555 Central Ave
> Boulder, CO 80301
> Phone: (303) 543-2023
> Cell:      (303) 641-9528
> Fax:      (303) 543-2099
> Email: brian.cruikshank@xxxxxxxxxxxxx
                            
---------------------------------------------------------
Richard Taborek Sr.                     Intel Corporation
XAUI Sherpa                    Intel Communications Group
3101 Jay Street, Suite 110         Optical Products Group
Santa Clara, CA 95054           Santa Clara Design Center
408-496-3423                                     JAY1-101
Cell: 408-832-3957          mailto:rich.taborek@xxxxxxxxx
Fax: 408-486-9783                    http://www.intel.com