Re: XAUI/XGXS
Rich,
My comments are embedded below.
Regards,
Mike
Rich Taborek wrote:
>
> Mike,
>
> In your own picture, it's clear that additional registers AND latency accompany
> a word-striping approach at both the transmitter and receiver.
>
> For the staggered word-striping approach illustrated in your note, WORD0 on the
> RS or XGMII must be held for at least THREE byte times (9.6 ns) before its
> transmission across XAUI. This is required since the RS/XGMII supplies data to
> the XGXS in what can essentially be considered to be COLUMN-STRIPED format.
As much as we both try to sound like we disagree, maybe
we are in agreement here. Let me try restating your words:
In the word striped approach (timing diagram below), the
fourth byte of any word is buffered in the transmitter
for 9.6 ns, but is immediately available when received
at the other end, as it completes the word. The third
byte is buffered at the transmitter for 6.4 ns and at the
receiver for 3.2 ns (waiting for the fourth byte), and
similarly for the second and first byte.
The conclusion is that the TOTAL latency -- transmitter
plus receiver latency -- is 9.6 ns for all of the data.
(Is that what you were saying?) Compared with the latency
needed to deskew 40 bits, I think this is not too important.
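To make the arithmetic above concrete, here is a minimal sketch in Python. The names (BYTE_TIME_NS, byte_latency_ns) are made up for illustration, and it assumes the four bytes of a word go out in order on one lane, one 8B/10B character every 3.2 ns, ignoring encoding and serializer delays common to both schemes.

```python
BYTE_TIME_NS = 3.2      # one 10-bit character per lane at 3.125 Gbaud (assumed)
BYTES_PER_WORD = 4      # a 32-bit word sent byte by byte on a single lane

def byte_latency_ns(index):
    """Return (tx_hold, rx_hold) in ns for byte `index`, 0 = first on the wire."""
    # At the transmitter, a byte waits while the earlier bytes of its word go out.
    tx_hold = index * BYTE_TIME_NS
    # At the receiver, a byte waits until the remaining bytes of its word arrive.
    rx_hold = (BYTES_PER_WORD - 1 - index) * BYTE_TIME_NS
    return tx_hold, rx_hold

for i in range(BYTES_PER_WORD):
    tx, rx = byte_latency_ns(i)
    print(f"byte {i}: tx {tx:.1f} ns + rx {rx:.1f} ns = {tx + rx:.1f} ns")
```

Under these assumptions every byte sees the same sum, three byte times, which is the 9.6 ns total claimed above.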
As for the RS/XGMII supplying "COLUMN-STRIPED format",
I think that is simply marketing hubris. It's simply
four bytes that have to get from here to there.
> 8B/10B doesn't allow a striping granularity less than a single character per
> lane (i.e. a column). Any striping granularity greater than a single column
> clearly requires additional buffering and incurs a latency penalty, both being
> proportional to the chosen striping granularity.
>
> I'd like to take this opportunity to respond to some other comments you made on
> this issue:
>
> > > The MAC will not see this "low pin count" unless XGXS/XAUI
> > > can be integrated with the MAC, which is less likely as a
> > > XGXS/XAUI design requires more high speed logic and/or a
> > > special process.
>
> 1) Low pin count advantages do not require XGXS/XAUI integration with the MAC.
> XAUI over backplanes and attached to PHY instantiations such as transceiver
> modules greatly reduce the number of traces and I/O pins used for these
> plentiful 10 GbE link elements. XGXS/XAUI integration with the MAC simply
> provides the same benefits one interface up the stack.
Agreed. My comment addressed only the MAC pin count.
>
> 2) XGXS/XAUI design requires significantly less high speed logic for
> column-striping than word-striping, primarily due to its low buffering
> requirements. We've been through all this. At 10 Gbps, 32/36-bit data paths must
> run at 312.5 MHz. Most XGXS/XAUI logic must run at this speed. 32/36-bit Words
> must be processed at this rate, not at 78 MHz, which only results in a 2.5 Gbps
> throughput.
Not agreed. I don't get the connection between buffering
and logic speed. As you say, a 32/36-bit-width path processed
at 78 MHz results in 2.5 Gbps. That is what one word striped
lane is! Put four of those lanes together and you have a
10 Gbps, word-striped design. The only need for 312.5 MHz
is at the parallel interface, and if it were integrated there
would be no pin-count-driven need for so narrow an interface.
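A quick check of those numbers, as a rough Python sketch. The function name is hypothetical, and I am assuming the "78 MHz" figure is shorthand for 312.5/4 = 78.125 MHz and that only the 32 data bits of the 32/36-bit path count toward throughput.

```python
WORD_BITS = 32  # data bits per word; the other 4 of 36 are taken to be control bits

def throughput_gbps(clock_mhz, width_bits=WORD_BITS):
    """Data throughput in Gbps of a parallel path clocked at clock_mhz."""
    return clock_mhz * 1e6 * width_bits / 1e9

per_lane = throughput_gbps(78.125)    # one word-striped lane (assumed 312.5 / 4 MHz)
four_lanes = 4 * per_lane             # four such lanes side by side
single_path = throughput_gbps(312.5)  # one 32-bit path at the parallel interface rate

print(f"one lane at 78.125 MHz: {per_lane} Gbps")
print(f"four lanes:             {four_lanes} Gbps")
print(f"one path at 312.5 MHz:  {single_path} Gbps")
```

The aggregate rate is the same either way; the difference is whether the word-wide logic is clocked once at 312.5 MHz or four times over at a quarter of that.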
>
> 3) XGXS/XAUI does not require a special process. 0.25 um CMOS is adequate. This
> is being proven out already by several vendors which are sampling chips which
> implement the quad-SerDes portion of XGXS/XAUI.
The serdes portion is not the concern. We have already
shown 0.25 um serdes running at 3.125G. The issue, I
think, is particularly in the column striping control
logic, running, as you say, at 312.5 MHz. We can debate
this point 'til the cows come home. Hardware will tell.
>
> Mike Jenkins wrote:
> >
> > Larry,
> >
> > Thanks for the feedback. I guess I could have been more clear.
> > The part of Rich's statement I was rebutting was the need for a
> > word striped XAUI to hold four 32-bit words prior to starting
> > transmission. Registers are needed to hold the data, but the
> > latency isn't there. Here's a brief timing diagram for the
> > transmitter. (Use fixed width font, please):
> >
> > XGXS input      | WORD0 | WORD1 | WORD2 | WORD3 | WORD4 | WORD5 |
> > data         ...| valid | valid | valid | valid | valid | valid | ...
> > (XGMII?)        |<3.2ns>|       |       |       |       |       |
> >
> > Lane 0 TX ...-->|<----- WORD0 (12.8ns) -------->|<-- WORD4 -------
> >
> > Lane 1 TX ...---------->|<----- WORD1 ----------------->|<-- WORD5
> >
> > Lane 2 TX ...---(prev. word)--->|<----- WORD2 ---------------->|<-
> >
> > Lane 3 TX ...--------(prev. word)------>|<----- WORD3 ------------
> >
> > There probably is an extra cycle of latency for 8b10b encoding,
> > but that is common to both striping schemes, so I haven't shown it.
> >
> > The receiver timing diagram recovering the words looks pretty much
> > the same, except that WORDn can't be presented on the bus until all
> > of it has been received, of course.
> >
> > Sorry I didn't make this aspect more explicit. Please let me know
> > if you have any more questions or comments and I'll beat the dead
> > horse some more.
> >
> > Regards,
> > Mike
> >
> > Larry Rennie wrote:
> > >
> > > Mike, now you have confused me regarding the buffering requirement for word
> > > striping. You said in response to Rich's statement that four buffers are
> > > > required for word striping:
> > >
> > > "Not true, Rich. Word striping DOES NOT "require that the 4
> > > consecutive 32-bit words from the MAC be buffered prior to
> > > transmitting the first byte across each of lanes 0:3."
> > > You have been present several times when this was explained.
> > > As soon as the word arrives it begins to be transmitted. The
> > > next word arrives and begins to be transmitted on the next lane
> > > 3.2 ns later, etcetera, so the words are staggered on the lanes,
> > > one word arriving at the receiver each 3.2 ns, exactly as needed."
> > >
> > > Your slide number 5 from your presentation in Kauai "seems" to imply the need
> > > > for four 32-bit buffers. Is this correct? I agree that these buffers are
> > > needed. If no buffers are needed, then what is being transmitted on the other
> > > three lanes when the fourth lane is actively carrying data?
> > >
> > > Regards,
> > >
> > > Larry
> > >
> > > Lawrence J. Rennie
> > > National Semiconductor Corporation
> > > Calabasas, CA.
> > > 818 880 2720
> > >
> > > -----Original Message-----
> > > From: Mike Jenkins [SMTP:jenkins@xxxxxxxx]
> > > Sent: Wednesday, May 10, 2000 5:50 PM
> > > To: rtaborek@xxxxxxxxxxx
> > > Cc: HSSG
> > > Subject: Re: XAUI/XGXS
> > >
> > > Rich,
> > >
> > > I feel compelled to respond to a few of your comments regarding
> > > word striping in your reply to Ed Chang. Please see below.
> > >
> > > Regards,
> > > Mike
> > >
> > > Rich Taborek wrote:
> > > >
> > > > Edward Chang wrote:
> > > > >
> > > > > Rich:
> > > > >
> > > > > > Thanks for showing me the XAUI proposal. I may have missed this one, since it
> > > > > > was not included in David's March presentation list. You submitted it
> > > > > > later.
> > > >
> > > > Ed,
> > > >
> > > > The XAUI/XGXS proposal has been available from the March presentations page
> > > of
> > > > the 10 GbE web page since the time of the meeting and referenced in many,
> > > many
> > > > reflector notes since then.
> > > >
> > > > > Well, this is a good opportunity to discuss a few XAUI questions.
> > > > >
> > > > > For CWDM application, it needs four parallel, transmission lines to extend
> > > > > the connections from SERDES (PMA) to transceivers + CWDM (PMD with
> > > > > re-timer). Normally, in a board layout, these line lengths can be made
> > > > > short enough to treat as a usual four asynchronous PC traces without any
> > > big
> > > > > deal.
> > > > >
> > > > > As we all know, for a switch application, these four self-clocking lines
> > > may
> > > > > extend beyond 10 inches. In this case, a re-timer can be added to restore
> > > > > the amplitude and remove DJ.
> > > > >
> > > > > We gave a name "HARI" to these four differential lines. Basically, they
> > > are
> > > > > pure electrical issue with electrical specification only - transparent to
> > > > > any code.
> > > >
> > > > Hari has been changed to XAUI/XGXS as of the March presentation. Electrical
> > > > specifications were only one component of the original Hari proposal. Coding
> > > was
> > > > another component. The proposed code for Hari was 8B/10B. This has not
> > > changed
> > > > with the name change to XAUI/XGXS. Hari electrical and coding specifications
> > > > include some common elements. Among those is skew specifications and deskew
> > > > > functionality. Codes other than 8B/10B may not have the ability to handle skew in
> > > the
> > > > same manner. The XAUI/XGXS proposal is complete in that it describes all
> > > > mechanisms required for reliable operation of a parallel arrangement of
> > > multiple
> > > > serial lanes, 4 lanes to be exact in this case. To say that the
> > > Hari-XAUI/XGXS
> > > > proposal is "transparent to any code" is incorrect.
> > > >
> > > > > These four lines can use "Word striping" as in Mike Jenkins' proposal to
> > > > > move data from XGMII, through PCS (8B/10B), PMA, HARI, PMD in a straight
> > > > > forward manner. At the receiving side, the data from four lines can be
> > > > > individually clocked into each one's FIFO after de-serializing, then let
> > > > > FIFO perform the final deskew for XGMII interface.
> > > > >
> > > > > It seems pretty straight forward, and it does the job.
> > > >
> > > > A complete XAUI/XGXS proposal based on word striping has never been aired.
> > > The
> > >
> > > As far as 10 Gigabit Ethernet goes, that's true.
> > > Regrettably, not enough interest to warrant the effort.
> > >
> > > > March 2000 XAUI/XGXS proposal requires column striping to meet all link
> > > > requirements including the recent desire to reduce 8B/10B EMI through
> > > > randomization of the Idle pattern. Other link requirements include lane and
> > > link
> > > > synchronization, clock tolerance compensation and link deskew.
> > > >
> > > > > My question is the XAUI interface has additional coding requirement on top
> > > > > of 8B/10B code to perform column striping. Why it is needed? I do not see
> > > > > the need in a 4-line CWDM application. It seems, XAUI makes it more
> > > > > complicated to achieve additional objectives, than a pure electrical
> > > > > interface HARI for a CWDM application. Does the serial application require
> > > > > XAUI's additional features, but not a simple four parallel electrical
> > > lines?
> > > >
> > > > The March 2000 XAUI/XGXS proposal supports one, and only one code, 8B/10B. No
> > > > other coding is required to perform column striping. Column striping simply
> > > > refers to the simultaneous transmission of 4 bytes of information from the
> > > MAC
> > > > directly across XAUI lanes 0:3. Alternatively, word-striping requires that
> > > the 4
> > > > consecutive 32-bit words from the MAC be buffered prior to transmitting the
> > > > first byte across each of lanes 0:3 (i.e. in a column-striped fashion). My
> > > > answer to "Why it is needed?" is that column striping results in an
> > > architecture
> > > > which is simple, lowest in latency and requires the least buffering. Please
> > > > allow me to turn the question around: If the MAC supplies 32-bit words to the
> > > > PHY, why should the PHY have to stripe the words one at a time across all
> > > four
> > > > lanes before transmitting anything?
> > >
> > > Not true, Rich. Word striping DOES NOT "require that the 4
> > > consecutive 32-bit words from the MAC be buffered prior to
> > > transmitting the first byte across each of lanes 0:3."
> > > You have been present several times when this was explained.
> > > As soon as the word arrives it begins to be transmitted. The
> > > next word arrives and begins to be transmitted on the next lane
> > > 3.2 ns later, etcetera, so the words are staggered on the lanes,
> > > one word arriving at the receiver each 3.2 ns, exactly as needed.
> > >
> > > The answer to "Why is word striping needed?" is that it avoids
> > > the need to deskew with the attendant need for high-speed
> > > logic. The cost in latency is about 1/2 word (6.4 ns) which
> > > may be less than what is consumed in deskewing column striping.
> > > >
> > > > I'm confused with your statement: "It seems, XAUI makes it more complicated
> > > to
> > > > achieve additional objectives, pure electrical interface HARI for a CWDM
> > > > application". I assume that by CWDM you're referring to the WWDM PHY proposal
> > > > which also uses 8B/10B encoding. Prior to transmission, the data on each WWDM
> > > > lane must be serialized by a PMA, a 10:1 serializer in this case. Prior to
> > > that,
> > > > the data must be encoded by the PCS, 8B/10B in this case. Prior to encoding,
> > > the
> > > > data must be sourced from the MAC. It is directly sourced through the
> > > > reconciliation layer on a byte-by-byte basis in this case. XAUI/XGXS is
> > > always
> > > > optional. It is optional for a WWDM PHY. XAUI/XGXS may optionally be used in
> > > a
> > > > WWDM WAN PHY to go across long (~20") PCB traces as well as attenuate jitter
> > > at,
> > > > or in close proximity to, the transceiver module.
> > > >
> > > > Pushing a "Pure electrical interface Hari" and "Word striping" does not
> > > compute
> > > > to me. How do you get a pure electrical interface to buffer a full word on
> > > each
> > > > lane before transmitting anything in that lane?
> > >
> > > Again, not true. Please see above.
> > > >
> > > > XAUI/XGXS is just as optional for Serial applications as it is for WWDM. The
> > > > intention of XAUI/XGXS for Serial applications is to support low pin count,
> > > long
> > > > PCB traces between the MAC/RS and PCS.
> > >
> > > The MAC will not see this "low pin count" unless XGXS/XAUI
> > > can be integrated with the MAC, which is less likely as a
> > > XGXS/XAUI design requires more high speed logic and/or a
> > > special process.
> > > >
> > > > > There are some comments on the XGXS functions listed in the presentation.
> > > > >
> > > > > (1) Perform clock tolerance compensation:
> > > > >
> > > > > All clocks are generated from one write clock source, which provide XGMII
> > > > > > clocking, SERDES (HARI self-clocking data) clocking; therefore, it seems
> > > > > there is no need for clock tolerance compensation. There are phase
> > > > > differences, which contribute skew, but not frequency deviation.
> > > >
> > > > The group that developed Hari specified a clock tolerance compensation
> > > > capability. The use of this capability is implementation dependent. In the
> > > case
> > > > that the RS or XGMII clock source is not adequate enough to guarantee that
> > > Hari
> > > > operates within spec, an optional clock reference may be used to clock the
> > > Hari
> > > > interface. The optional usage of such a reference dictates the utilization of
> > > > the Hari clock tolerance compensation capability. This Hari capability was
> > > > propagated to the XAUI/XGXS proposal.
> > > >
> > > > > (2) Perform error control to prevent error propagation:
> > > > >
> > > > > Electrically, HARI interface is not any different from other PC runs
> > > design.
> > > > > I believe, the normal PC design rules can assure HARI will not generate any
> > > > > extra errors. Do we have to worry about additional error generation by
> > > > > those four lines (HARI)? I do not think so. Otherwise, we may have to add
> > > > > error correction for other PC runs.
> > > >
> > > > On the contrary, the Hari interface is very, very different from "other PC
> > > runs
> > > > design". I'm not aware of 8B/10B encoding being used within PC's today.
> > > >
> > > > Each lane of a Hari receiver, due to nature of 8B/10B code, must perform
> > > error
> > > > control by definition. If a code violation is detected at the receiver, the
> > > > propagated code must indicate that the received code-group was invalid. What
> > > do
> > > > you suggest that the received invalid code-group be changed to instead?
> > > >
> > > > > For EMI? No, I do not think so. For 8B/10B code, as long as it stays
> > > > > inside a cabinet or going out of a cabinet with a fiber cable, I believe,
> > > > > there is no EMI problem to worry about.
> > > >
> > > > Good coding practices are typically simple and very cost effective. For
> > > 8B/10B
> > > > code, good coding practices clearly go hand in hand with good electrical and
> > > > mechanical design practices to minimize EMI. I'd hate to be telling a
> > > customer
> > > > pointing out an EMI problem to me that "there is no EMI problem to worry
> > > about."
> > > >
> > > > > Regards,
> > > > >
> > > > > Edward S. Chang
> > > > > NetWorth Technologies, Inc.
> > > > > EChang@xxxxxxxxxxxxxxxx
> > > > > Tel: (610)292-2870
> > > > > Fax: (610)292-2872
> > > > >
> > > > > Ed,
> > > > >
> > > > > XAUI has nothing to do with a 10.3125 Gbaud line rate. That rate is
> > > > > associated
> > > > > with the overhead of 64B/66B: 66/64 * 10 = 10.3125. Where are you getting
> > > > > this
> > > > > information?
> > > > >
> > > > > FYI: XAUI is proposed as an 8B/10B 4-lane serial interface with each lane
> > > > > running at 3.125 Gbaud.
> > > > >
> > > > > Please refer to:
> > > > > http://grouper.ieee.org/groups/802/3/ae/public/mar00/taborek_1_0300.pdf
> > > > > for further details.
> > > > >
> > > > > --
> > > > >
> > > > > Best Regards,
> > > > > Rich
> > > > >
> > > > > Edward Chang wrote:
> > > > > >
> > > > > > Comments:
> > > > > >
> > > > > > I agree XAUI should be 100% transparent. XAUI has its unique value in 10
> > > > > > Gbps serial application to maintain symbol rate at 10.3125 which is very
> > > > > > close to 10 Gbps. However, it comes with a lot of complicated coding
> > > > > > manipulations.
> > > > > >
> > > > > > > For the CWDM approach, the data rate is low, which does not need the XAUI approach.
> > > > > The
> > > > > > straight forward, mature, and market-proved block code will do nice job.
> > > > > In
> > > > > > the reference model, it will completely skip the XAUI of 64b/66b, and the
> > > > > > MAC will go directly to 8B/10B coding (PCS) followed by SERDES (PMA).
> > > > > Just
> > > > > > the same as GbE ... simple and cost-effective.
> > > > > >
> > > > > > If the XAUI proposal is trying to make all applications using XAUI of
> > > > > > 64b/66b, it is a wrong approach. Keep it flexible. Not everyone needs
> > > > > the
> > > > > > complex manipulation of the coding scheme.
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Edward S. Chang
> > > > > > NetWorth Technologies, Inc.
> > > > > > EChang@xxxxxxxxxxxxxxxx
> > > > > > Tel: (610)292-2870
> > > > > > Fax: (610)292-2872
> > > >
> > > > --
> > > >
> > > > Best Regards,
> > > > Rich
> > > >
> > > > -------------------------------------------------------
> > > > Richard Taborek Sr. Phone: 408-845-6102
> > > > Chief Technology Officer Cell: 408-832-3957
> > > > nSerial Corporation Fax: 408-845-6114
> > > > 2500-5 Augustine Dr. mailto:rtaborek@xxxxxxxxxxx
> > > > Santa Clara, CA 95054 http://www.nSerial.com
> >
> > --
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > Mike Jenkins Phone: 408.433.7901 _____
> > LSI Logic Corp, ms/G715 Fax: 408.433.7461 LSI|LOGIC| (R)
> > 1525 McCarthy Blvd. mailto:Jenkins@xxxxxxxx | |
> > Milpitas, CA 95035 http://www.lsilogic.com |_____|
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> --
>
> Best Regards,
> Rich
>
> -------------------------------------------------------
> Richard Taborek Sr. Phone: 408-845-6102
> Chief Technology Officer Cell: 408-832-3957
> nSerial Corporation Fax: 408-845-6114
> 2500-5 Augustine Dr. mailto:rtaborek@xxxxxxxxxxx
> Santa Clara, CA 95054 http://www.nSerial.com
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mike Jenkins Phone: 408.433.7901 _____
LSI Logic Corp, ms/G715 Fax: 408.433.7461 LSI|LOGIC| (R)
1525 McCarthy Blvd. mailto:Jenkins@xxxxxxxx | |
Milpitas, CA 95035 http://www.lsilogic.com |_____|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~