Re: 64/66 system benefits and ad-hoc agenda





Dear Kamran,

> Why do you think synchronizing a de-scrambler is complex?  I was
> thinking of a simple way this could be performed during the start-up. 
> Assume that at start-up you are expecting to receive 66b blocks
> filled with idles, and that idle characters are represented by
> the 0 byte (an arbitrary choice).  Then the encoder looks like:
> 
> ....
>
> This means that on the receive side we are automatically receiving the
> seed s(k)!  Simple start-up/synchronization procedure.  No hassle.  A
> 66b block is sufficient for achieving synchronization provided that
> the polynomial is of degree < 64. 

My main objection is that the RX has to somehow know when the idle field
has arrived.  I can think of no simple way to know this since the frames
are all scrambled.  As I understand the Nortel proposal, they do something
like this "in the blind", and then check to see if they were right by 
looking for errors in the CRC field.  If there are errors, then another
random block is loaded in, and the process repeats until satisfactory.
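The quoted seed-recovery idea, and the objection to it, can be illustrated with a small Python model. This is a sketch under stated assumptions, not Kamran's encoder (which is elided in the quote): it assumes an additive (frame-synchronous) scrambler built on x^58+x^39+1, all-zero idles, and 64-bit payloads, and `keystream` and `rx_recover` are hypothetical names. Loading the seed works only if the RX happens to pick a block that really was idle.

```python
import random

# Hypothetical model: additive scrambler on x^58 + x^39 + 1 (assumed choice),
# idles encoded as all-zero bits, 64-bit payloads (sync header ignored).
DEG, TAP = 58, 39
BLOCK = 64

def keystream(seed, n):
    """Extend a 58-bit LFSR seed to n bits using s[k] = s[k-39] ^ s[k-58]."""
    s = list(seed)
    while len(s) < n:
        k = len(s)
        s.append(s[k - TAP] ^ s[k - DEG])
    return s[:n]

def rx_recover(received, n):
    """RX assumes the first block was idle and loads its first 58 bits as the seed."""
    return keystream(received[:DEG], n)

rng = random.Random(1)
seed = [rng.getrandbits(1) for _ in range(DEG)]
seed[0] = 1  # avoid the all-zero (stuck) LFSR state
n = 4 * BLOCK
ks = keystream(seed, n)

# Case 1: the first block really is idle, so the RX sees the raw keystream.
idle_first = [0] * n
rx = [d ^ k for d, k in zip(idle_first, ks)]
ok_when_idle = rx_recover(rx, n) == ks

# Case 2: the first block carries data, so the guessed seed is wrong.
data_first = [1] + [0] * (n - 1)
rx = [d ^ k for d, k in zip(data_first, ks)]
ok_when_data = rx_recover(rx, n) == ks

print(ok_when_idle, ok_when_data)  # True False
```

Since the scrambled frames give the RX no direct way to tell case 1 from case 2, it must guess and then verify, which is where the CRC-checking hunt comes in.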

At heart, I'm very conservative in these matters.  In random order, here
are some of my "unspoken objectives".

    1) A link should be capable of recovering promptly even after
    catastrophic interruptions in connectivity or power supply
    integrity. 

    2) The RX side of the link may be turned on well *after* the
    transmitter has started.

    3) The software engineering principle of "information hiding" can
    profitably be applied to system design.  In this context, the link
    should be capable of achieving bit, scrambler, and frame sync fully
    autonomously, and should require no intervention from higher-level
    functions.  This makes the use of the subsystem as transparent and
    foolproof as possible. 

    4) From #3 above follows the esthetic of viewing the link as a
    "virtual ribbon cable".  All details regarding startup, coding, CDR,
    and synchronization should be handled as autonomously as possible. 
    The ideal would be for the abstract model of the link subsystem
    to be just parallel data and clock, no different from a simple
    ribbon cable.  Every attempt should be made to achieve this ideal
    unless compelling technical reasons make it impractical.

Following from these general ideals, I am tending to:

    a) avoid start-up training sequences unless they are closed-loop
    controlled by full-duplex links using an end-to-end handshake. 
	
    b) require that the RX must be able to acquire bit and frame
    sync on arbitrary data. 

    c) obsess over ensuring that NO deadly-embrace mechanisms are
    designed into the system.  It must be impossible (or provably
    unlikely, within some tolerable limit) for the system to deadlock,
    or to accept "bad data" as "good data". 

> My point is that if there is a very simple synchronization mechanism
> (like the above example), then the above scrambling seems to be a
> better choice for the following reasons: 1) self-synchronizing
> scramblers create error multiplication.  

Perhaps you haven't yet got to my posting on this matter.  The subject
of error multiplication was well covered by numerous mathematicians for
the PPP-over-SONET standard.  In the end, they chose a self-synchronizing
x^43+1 scrambler with 2-bit error multiplication for anti-jamming purposes.
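The two-bit error multiplication of an x^43+1 self-synchronizing scrambler can be checked with a short bit-serial model. This is an illustrative Python sketch, not the PPP-over-SONET reference implementation, and the function names are hypothetical. A single line error reaches the descrambler output once directly, and once more 43 bits later when it falls out of the delay line.

```python
import random

def scramble(data, taps, history):
    """Self-synchronizing scrambler: y[n] = x[n] ^ y[n-d] for each tap delay d."""
    hist = list(history)           # previously transmitted (scrambled) bits
    out = []
    for bit in data:
        y = bit
        for d in taps:
            y ^= hist[-d]
        out.append(y)
        hist.append(y)
    return out

def descramble(received, taps, history):
    """Inverse: x[n] = r[n] ^ r[n-d]; the state is just delayed *received* bits."""
    hist = list(history)
    out = []
    for r in received:
        x = r
        for d in taps:
            x ^= hist[-d]
        out.append(x)
        hist.append(r)
    return out

TAPS = [43]                        # x^43 + 1
rng = random.Random(7)
data = [rng.getrandbits(1) for _ in range(500)]
start = [0] * max(TAPS)

tx = scramble(data, TAPS, start)
rx = list(tx)
rx[100] ^= 1                       # a single bit error on the line

decoded = descramble(rx, TAPS, start)
errors = [i for i, (a, b) in enumerate(zip(decoded, data)) if a != b]
print(errors)  # [100, 143]
```

With `TAPS = [39, 58]` (a trinomial x^58+x^39+1) the same experiment yields three output errors per line error: one for the error itself plus one per tap.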

It is possible to prove that a given self-synchronizing scrambler does
not reduce the CRC power.

> They require the use of very long polynomials in order to reduce the
> interaction with the Ethernet CRC. 

The interaction is not reduced.  It is eliminated.  The sole requirement
is that the scrambler must have no common factors with the CRC generator
(and that the spill-in errors are explicitly checked). 
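The "no common factors" condition is mechanical to verify by computing the polynomial GCD over GF(2). The sketch below assumes x^58+x^39+1 as the scrambler (the trinomial later used in IEEE 802.3 Clause 49; it is an assumption here, not taken from the text above) and the standard CRC-32 generator 0x04C11DB7. A result of 1 means the two share no factor. It does not cover the separate check of spilled-in errors at packet boundaries.

```python
# Polynomials over GF(2) are stored as ints: bit i is the coefficient of x^i.
CRC32 = 0x104C11DB7                       # IEEE 802.3 CRC-32 generator, x^32 term included
SCRAMBLER = (1 << 58) | (1 << 39) | 1     # x^58 + x^39 + 1 (assumed example)

def gf2_mod(a, b):
    """Remainder of a(x) / b(x) over GF(2)."""
    db = b.bit_length()
    while a and a.bit_length() >= db:
        a ^= b << (a.bit_length() - db)   # subtract (XOR) the aligned divisor
    return a

def gf2_gcd(a, b):
    """Euclid's algorithm for polynomials over GF(2)."""
    while b:
        a, b = b, gf2_mod(a, b)
    return a

g = gf2_gcd(SCRAMBLER, CRC32)
print(bin(g))  # 0b1 -> no common factor
```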

> Besides your reference of IBM paper, you could also refer to a Nortel
> presentation on the SONET scrambler and interaction with CRC (Montreal
> meeting).  The problem with long polynomials is an unnecessary increase
> in hardware complexity.  2) increase in latency (on the
> receive/descrambler side the received signal has to go through many
> registers, ex: N=63).  3) they require the choice of long polynomials
> to avoid lock-up conditions

Let me cover these in order.

1) The hardware for the proposed scrambler is only 64 latches and 64
3-input XOR gates.  This is trivial compared to the logic state machine
required to implement a cryptographic-style synchronized scrambler, which
must probabilistically hunt for synchronization frames (a la Nortel's
proposal).  The CRC blocks used to test for proper synchronization are
themselves equivalent to the entire complexity of the self-synchronous
scrambler circuit. 

2) Latency is the strong point of the self-synchronizing scrambler.  If
you look at the example 4-wide parallel scrambler that I showed in
Kauai, you will see that the latency is only two XOR gate delays.  The
extra latches sit to the side of the data path - not within it.  They
serve only to save delayed bits of state. 
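A bit-serial model shows both the self-synchronizing property and why the state latches stay off the data path. This sketch assumes x^58+x^39+1 (not specified in this message) and models the 58 delayed state bits as a Python list; it is not the 4-wide parallel circuit shown in Kauai. Each output bit costs two XORs, and an RX that starts from an arbitrary state produces correct data once 58 bits have passed through its delay line, with no help from the TX.

```python
import random

TAPS = (39, 58)  # assumed scrambler x^58 + x^39 + 1; 58 delayed bits of state

def scramble(data, state):
    """TX: y[n] = x[n] ^ y[n-39] ^ y[n-58]; state holds past output bits."""
    hist, out = list(state), []
    for bit in data:
        y = bit ^ hist[-TAPS[0]] ^ hist[-TAPS[1]]
        out.append(y)
        hist.append(y)
    return out

def descramble(rx, state):
    """RX: x[n] = r[n] ^ r[n-39] ^ r[n-58]; state holds past received bits."""
    hist, out = list(state), []
    for r in rx:
        out.append(r ^ hist[-TAPS[0]] ^ hist[-TAPS[1]])
        hist.append(r)
    return out

rng = random.Random(3)
data = [rng.getrandbits(1) for _ in range(1000)]
tx_state = [rng.getrandbits(1) for _ in range(58)]
rx_state = [1] * 58                       # RX powers up with an unrelated state

decoded = descramble(scramble(data, tx_state), rx_state)
wrong = [i for i in range(len(data)) if decoded[i] != data[i]]
print(max(wrong, default=-1) < 58)        # True: synchronized within 58 bits
```

The same property covers objective 2 above: the RX can be switched on at any time and needs only 58 received bits to be in step with the TX.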

Latency for scrambling is dramatically lower than any other alternative
that I know of.  8B/10B, for example, requires approximately 10 gate
delays rather than the 2 delays for scrambling.  This means that an
8B/10B encode/decode nearly always needs one or more pipeline stages,
whereas a scrambler can generally be fitted between any two pre-existing
register transfer stages. 

> The above scrambling scheme avoids all of the above problems.  You
> will notice that it has no issue of lock-up, it has zero latency, no
> error multiplication at all, and also it allows the choice of lower
> degree polynomial than your current choice thus reducing gate count. 

I chose the polynomial degree for the purpose of jamming tolerance, not
just for CRC independence.  If CRC interaction were the only consideration
I could have chosen any primitive trinomial that was not a sub-factor of
CRC-32.  Since CRC-32 is itself primitive, then any small scrambler such
as x^7+x+1 could be a candidate (subject, of course, to an analysis of
spilled-in errors due to packet boundaries). 
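Whether a small candidate such as x^7+x+1 is primitive can be checked directly for low degrees: a degree-n polynomial over GF(2) is primitive exactly when x has multiplicative order 2^n-1 modulo it. This brute-force Python sketch (`poly_order` is a hypothetical helper) is only practical for small n; for order 56 and above one would instead test x^((2^n-1)/q) for each prime factor q of 2^n-1.

```python
def poly_order(p):
    """Smallest n > 0 with x^n = 1 mod p(x) over GF(2); p must have a constant term."""
    deg = p.bit_length() - 1
    r, n = 0b10, 1            # r holds x^n reduced mod p
    while r != 1:
        r <<= 1               # multiply by x
        if (r >> deg) & 1:
            r ^= p            # reduce modulo p(x)
        n += 1
    return n

x7_x_1 = (1 << 7) | (1 << 1) | 1        # x^7 + x + 1
print(poly_order(x7_x_1), 2**7 - 1)     # 127 127: maximal period, so primitive
```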

A simple analysis shows that the scrambler order (N) must be at least
56 to prevent a malicious attack from jamming the link any faster than
the (exceedingly rare) random failures that would normally occur. 

Here's the basic idea:

If the PLL is assumed to tolerate up to a 64-bit run length (quite
reasonable, since 80 is the design limit for SONET systems), then we can
calculate a mean time to link failure (MTTF) of 29 years from random
data. 

    Tbit = 100 ps   # bit time at ~10 Gb/s
    Npll = 64       # maximum tolerable unbroken run
    PO = 26*8       # Ethernet packet overhead including IPG
    N               # maximum scrambler polynomial delay coefficient

    MTTF(random) = Tbit*(2^Npll)/2 ~= 29 years

For a systematic attack where minimum-sized packets are sent with a
payload containing an inverse scrambling sequence, the probability of
failure is determined by how long the sequence is.  For a 30-year MTTF
and minimum-sized probe packets, I calculate that we need a scrambler
with a sequence length of at least 2^56-1 to ensure that an attacker
gains no advantage from sending any particular pattern relative to
random data. 

    MTTF(deterministic) = (2^N-1)*Tbit*(Npll+PO)/2 ~= 31 years (N=56)

So, relative immunity to attack is ensured if the polynomial is of order
56 or higher.  At this point, the probability of a concerted attack with
an inverse sequence falls below the probability of jamming with randomly
formed packets. 

This analysis is quite plausible given that the PPP-over-SONET
recommendation achieves an effective order of 43+7, or an MTTF of
about half a year under a concerted attack.
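The MTTF arithmetic, and the claim that order 56 is where a concerted attack stops beating random data, can be reproduced from the parameters defined above. Added assumptions: a 365.25-day year, and the same 100 ps bit time applied to the 43+7 case, which only approximates an OC-192 line. On these numbers the attack MTTF at N=56 comes out near 31 years, while N=55 would fall to roughly 15.5 years, below the random-data figure.

```python
TBIT = 100e-12          # bit time in seconds (~10 Gb/s)
NPLL = 64               # longest run the PLL is assumed to tolerate, bits
PO = 26 * 8             # Ethernet packet overhead incl. IPG, bits
YEAR = 365.25 * 24 * 3600

def mttf_random():
    """MTTF in years from random data: Tbit * 2^Npll / 2."""
    return TBIT * 2**NPLL / 2 / YEAR

def mttf_attack(n):
    """MTTF in years for an inverse-sequence attack on a scrambler of order n."""
    return (2**n - 1) * TBIT * (NPLL + PO) / 2 / YEAR

# Smallest order at which a concerted attack is no better than random data.
n_min = next(n for n in range(1, 100) if mttf_attack(n) >= mttf_random())
print(round(mttf_random(), 1))               # 29.2
print(n_min, round(mttf_attack(n_min), 1))   # 56 31.1
print(round(mttf_attack(50), 2))             # 0.49 (effective order 43+7)
```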

For this reason, I only considered trinomials of order 56 and above.

In practice, this is probably an overly conservative analysis.  The
limiting effect will not really be run length, because the two recurring
sync bits guarantee at least one transition every 64 bits, and in nearly
all cases, the code has a density of two transitions in every 64 bits.

The real limiting factor is likely to be baseline wander, which I have
already discussed in a previous posting.  

So, I think that we are safe on both counts as long as the scrambler
ensures that the data stream follows random-walk statistics (each bit
being statistically independent).  In this case, my previously posted
analysis of baseline wander holds true.  This condition is practically
guaranteed only if the scrambler is long enough to preclude a malicious
user from guessing the scrambler state.  Hence, a large polynomial.

> It also gives more flexibility for choosing a scrambling polynomial. 
> This type of scrambling usually has the drawback of either sending
> the seed periodically (which is not the case here, as my example
> shows), or requiring a synchronization period.  Again, in this case,
> that's not a problem since there is already a start-up procedure in the
> 64b/66b anyway.  Do you agree? 

Not completely.  I am specifically trying to avoid reliance on any
start-up procedure.  I want the RX end of the link to immediately
acquire lock without any assistance from the TX side of the link, even
if it is powered up many hours *after* the TX is enabled. 

"Start-up procedures" are not allowed unless they are controlled by
handshake, or are periodically transmitted (such as the sync characters
of 8B/10B).  

An alternative would be something like Nortel's scrambler where the
scrambler seed is periodically sent, and the proper synchronization is
checked by evaluating the (putatively) correct CRC fields for lack of
errors. 

I have rejected this strategy due to its implementation complexity which
vastly exceeds the 64 latches needed to implement the self-healing,
self-synchronizing scrambler. 

Thanks for your comments and ongoing interest, Kamran.  Please let me know
if I haven't adequately addressed your concerns, or have misunderstood
any of your points.

kind regards,
--
Rick Walker