RE: clause 49 comments and questions
David,
Thank you for your comments. My responses are bracketed by <PAT>.
Pat
-----Original Message-----
From: David Gross [mailto:dgross@xxxxxxxxxxxxxxxxxx]
Sent: Friday, December 08, 2000 11:22 AM
To: THALER,PAT (A-Roseville,ex1)
Cc: stds-802-3-hssg@xxxxxxxx
Subject: clause 49 comments and questions
Hi Pat,
I've just been reading the new draft, and I have a few questions
concerning clause 49. I was hoping you could clarify some of them:
1) In figure 49-11, the condition for going to the 32_BAD state has been
updated to include "+ frame_lock = false". This seems to imply the
following: if a slip occurs (causing frame_lock to go false), the new
sync_header position is tested for validity before frame lock is achieved
(64 contiguous valid sync_headers are required). If even 1 error occurs in
the sync_header during this time, bad_sh_eq_thresh will be true again
(frame_lock is still false) and another slip will occur. This seems much
more restrictive, and I was wondering why it was chosen to be implemented
this way.
<PAT> This change was made in response to comments on the prior draft
suggesting a way to speed up the acquisition of lock. The lock machine
acquires lock by testing candidate sync header positions until it finds a
position that has 64 consecutive correct sync headers. It is desirable to
limit the time spent testing incorrect positions because doing so speeds
acquisition of lock. Also, it is a non-goal to come up quickly, or at all,
on a link with high BER.
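A minimal sketch of that search, for illustration only. It assumes the bit stream is a Python list of 0/1 values, treats a sync header as valid when its two bits differ (01 or 10), and applies the new hair-trigger rule: before lock, any bad header shifts the candidate alignment by one bit. The names acquire_lock and make_stream are hypothetical, and the payload is modeled as random bits rather than real scrambled data.

```python
import random

BLOCK_BITS = 66   # 2-bit sync header + 64-bit payload
LOCK_COUNT = 64   # consecutive valid sync headers needed for frame lock


def sh_valid(bits, pos):
    """A sync header is valid when its two bits differ (01 or 10)."""
    return bits[pos] != bits[pos + 1]


def acquire_lock(bits):
    """Return the alignment offset (0..65) at which lock is achieved, or None.

    Before lock, a single bad sync header slips the candidate alignment by
    one bit; a good one advances to the next block at the same alignment.
    """
    pos, good = 0, 0
    while pos + 2 <= len(bits):
        if sh_valid(bits, pos):
            good += 1
            if good == LOCK_COUNT:
                return pos % BLOCK_BITS
            pos += BLOCK_BITS
        else:
            good = 0
            pos += BLOCK_BITS + 1  # slip: next block, shifted by one bit
    return None


def make_stream(offset, n_blocks, rng):
    """Hypothetical test stream: `offset` junk bits, then well-formed blocks."""
    bits = [rng.randint(0, 1) for _ in range(offset)]
    for _ in range(n_blocks):
        bits += rng.choice(([0, 1], [1, 0]))            # valid sync header
        bits += [rng.randint(0, 1) for _ in range(64)]  # payload as random bits
    return bits


stream = make_stream(offset=37, n_blocks=1000, rng=random.Random(1))
print(acquire_lock(stream))
```

With 1000 blocks the search should report offset 37; a wrong alignment would need 64 valid-looking headers in a row to fool it, which for random payload has probability 2^-64.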
The original lock machine discarded a candidate position if it had 32 bad
sync headers out of 64. On average, this test took more than 64 frames to
reject a bad position. If the starting position for testing candidates is
random, it will on average have to test 33 candidates before getting to the
correct position. So it would take more than 2000 frames (33 x 64 = 2112)
to acquire lock.
The new algorithm averages less than two frames to discard a bad position
because one bad sync header causes a move to the next position.
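Where the "less than two frames" comes from, under an assumption: at an incorrect position each tested header is a random 2-bit pattern, valid with probability 1/2 independently, and 64 consecutive valid headers would end the test. The expected time to discard a bad position is then the sum of (1/2)^k for k = 0..63, which is 2 - 2^-63 frames. The helper name below is hypothetical.

```python
from fractions import Fraction


def expected_frames_to_discard(p_valid=Fraction(1, 2), lock_count=64):
    """Exact mean number of frames spent on an incorrect position.

    Testing continues past frame k only if the first k headers were valid,
    so E[frames] = sum of P(first k valid) = sum of p_valid**k, k < lock_count.
    """
    return sum(p_valid ** k for k in range(lock_count))


e = expected_frames_to_discard()
print(e < 2)                        # True
print(2 - e == Fraction(1, 2**63))  # True
```

Exact fractions are used because the shortfall from 2 is far below double-precision resolution.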
The probability of getting a bad sync header when testing the correct
position should be low because the BER should be less than 10^-12. Suppose
one is unlucky and gets a bit error while testing the correct position. In
that case, one tests an average of 33 candidates, tests the correct position
but discards it because of the bit error, and has to test all 66 positions
to get back to the correct position. Testing those 99 incorrect positions
takes less than 200 frames (under 2 frames each). So, even if the hair
trigger on discarding a position causes the correct position to be bypassed
once, the new algorithm achieves lock much faster than the original.
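The arithmetic behind these estimates can be checked in a few lines, together with the chance that a bit error hits one of the 128 sync-header bits examined while confirming the correct position. This is a sketch assuming independent bit errors at exactly the 10^-12 BER figure; the constants simply restate the numbers above.

```python
import math

POSITIONS = 66        # candidate sync-header positions
AVG_CANDIDATES = 33   # average candidates tested before reaching the correct one

# Old rule: more than 64 frames per rejected position.
old_lower_bound = AVG_CANDIDATES * 64                 # 2112 frames
# New rule, unlucky case: 33 + 66 = 99 bad positions at under 2 frames each.
new_upper_bound = (AVG_CANDIDATES + POSITIONS) * 2    # 198 frames

# Chance that at least one of the 64 * 2 sync-header bits checked on the
# correct position is hit by an error, assuming independent errors.
ber = 1e-12
sh_bits = 64 * 2
p_bypass = -math.expm1(sh_bits * math.log1p(-ber))

print(old_lower_bound, new_upper_bound, f"{p_bypass:.3g}")
```

math.log1p and math.expm1 keep the tiny probability (about 1.28e-10) accurate instead of losing it to rounding in 1 - (1 - ber)**128.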
I'm not sure what you mean by "much more restrictive", but I'm convinced
that the new algorithm is much better than the old one. <PAT>
2) In figure 49-13, the TX state machine's initial condition is to
output IFRAME_T. However, in the November slide deck from Rich Taborek
et al. concerning Link Status, upon initialization the TX outputs RF.
Now, eventually LF will be translated into RF and sent out of the TX
path, but I'm wondering why it should be the default condition of the TX
state machine.
<PAT> RF is sent by the RS in response to receiving an LF. I assume you are
referring to slide 18 of Rich's slides. In that case, there is no received
signal, so the RS will be receiving LF and will send RF. Note that the TX
machine is only in this state during power-on or reset. Once it is up, it
passes on whatever it receives from above. <PAT>
Additionally, it seems possible that if Device 1 outputs I_FRAME on
start_up, then Device 2 might not receive a signal (at first), generate
an RF, receive the I_FRAME from Device 1's start_up, and then begin
sending out normally. Meanwhile, Device 1 will detect an LF, send an RF to
Device 2 (when Device 2 receives this, it will stop transmitting data and
will also send RF), and will eventually detect Device 2's idle followed
by its data (it will then stop sending RF and start sending data). The
problem is that Device 2 will send RF whenever Device 1 sends it, and
vice versa, so the exchange could repeat indefinitely. I'm hoping that by
initially having the TX state machine send out RF we can avoid this
possibility...
<PAT> Receiving an RF does not cause a device to send RF. If it did, things would
lock up. It does not matter if a device receives idle for a while followed
by RF (because the other end hasn't achieved lock yet) and then followed by
idle when the other end achieves lock. This would be normal.
Think about two devices that are both up but not connected. They will both
be receiving LF and sending RF. Then when a cable gets plugged in connecting
them, they will both obtain lock and start sending idle. There can be a time
when they have achieved lock but are still receiving the RF that the other
side sent before it got lock. This case has to work.
We designed the fault signalling and initialization in the simplest way with
no handshakes between the two sides.
Lack of lock or lack of signal causes a sublayer to send LF.
Receipt of LF causes an RS to send RF.
Receipt of idle without LF causes the RS to send normal idle and any packets
sent to it by the MAC.
RF lets the receiving side report that there is a problem with the link but
doesn't cause any state changes. <PAT>
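One way to sketch Pat's fault-signalling rules, purely as an illustration: the names Seq, phy_rx, rs_tx and simulate are hypothetical, MAC packets are omitted, and each step models one unit of time in which a device hears what the other side sent on the previous step. The example replays the cable-plug-in case: both sides start out sending RF, one side gets lock a step before the other, and both settle on idle, because receiving RF never changes what the RS sends.

```python
from enum import Enum


class Seq(Enum):
    IDLE = "idle"
    LF = "local fault"
    RF = "remote fault"


def phy_rx(has_lock, line):
    """Receive path: lack of signal or lock is reported upward as LF."""
    return line if has_lock else Seq.LF


def rs_tx(received):
    """What the RS transmits given what it receives (MAC packets omitted)."""
    if received is Seq.LF:
        return Seq.RF   # receipt of LF -> send RF
    return Seq.IDLE     # idle or RF -> normal idle; RF is only reported


def simulate(lock_a, lock_b):
    """Two devices, one step per entry; each hears the other's previous output."""
    tx_a = tx_b = Seq.RF   # both up but unconnected: receiving LF, sending RF
    history = []
    for has_a, has_b in zip(lock_a, lock_b):
        rx_a, rx_b = phy_rx(has_a, tx_b), phy_rx(has_b, tx_a)
        tx_a, tx_b = rs_tx(rx_a), rs_tx(rx_b)
        history.append((rx_a, tx_a, rx_b, tx_b))
    return history


# Cable plugged in: A gets lock at step 1, B at step 2.
for step in simulate([False, True, True, True], [False, False, True, True]):
    print([s.name for s in step])
```

In steps 1 and 2, A has lock but is still receiving the RF that B sent before B got lock; A keeps sending idle, and by step 3 both sides see only idle.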
3) Also in figure 49-13, I noticed the TX state machine is no longer
worried about 3-bit Hamming protection as it was in D1.1. Why is this no
longer a concern?
<PAT> The operation now defined for the TX state machine combined with the
next frame check of the RX state machine preserves the 4-bit Hamming
distance. Since a packet has to go through the RX state machine, there is no
need for the TX machine to perform the next frame check. When the TX machine
transmits a T frame, only an S or a C frame is valid as the next frame. Any
other frame will be transmitted as an E frame. When the RX machine receives
a T frame followed by an E frame, its next frame check will cause the T
frame to be changed to an E.
Therefore, there is no reason to impose the burden of a next frame check on
the TX machine and the check was removed. <PAT>
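A simplified sketch of the TX and RX checks Pat describes, assuming each 66-bit frame is reduced to a one-letter type (C, S, D, T, E); the function names are hypothetical and real block-type decoding is omitted. TX only corrects the frame after a T; RX looks one frame ahead and turns the T itself into an E.

```python
VALID_AFTER_T = {"S", "C"}


def tx_frames(frames):
    """TX rule: after a T, anything other than S or C is sent as E.

    TX never looks ahead, so the T itself is sent unchanged.
    """
    out = []
    for f in frames:
        if out and out[-1] == "T" and f not in VALID_AFTER_T:
            out.append("E")
        else:
            out.append(f)
    return out


def rx_frames(frames):
    """RX next-frame check: a T not followed by S or C is decoded as E."""
    out = list(frames)
    for i in range(len(frames) - 1):
        if frames[i] == "T" and frames[i + 1] not in VALID_AFTER_T:
            out[i] = "E"
    return out


sent = tx_frames(["S", "D", "T", "D"])
print(sent)              # ['S', 'D', 'T', 'E']
print(rx_frames(sent))   # ['S', 'D', 'E', 'E']
```

The T that ended the packet is delivered as an E, so the packet is flagged as errored even though the TX side never checked ahead.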
Thanks.
Dave Gross