Dear All,

Regarding Zhongfeng’s comments on my presentation about the 40G transcoding:

1) The question of MUX’ing in Zhongfeng’s proposed scheme comes down to how to handle the datapath through the PHY. My analysis assumes that you would want to run a 64 + 1 bit datapath clocked on the XLGMII clock. The problem with the scheme proposed by Zhongfeng is that it introduces an extra row to send. To save the 7x 1:8 byte MUX’s (an insignificant number of gates in current semiconductor processes), you either have to implement a 9/8 faster clock on the 64+1 datapath, or you have to transfer this new row in parallel with un-MUX’d data, resulting in a 128+1 bit datapath. Both options are undesirable.
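As a rough check of the two alternatives in point 1, the arithmetic can be sketched as below. This is only an illustration: the block size of 8 rows is inferred from the 9/8 figure, the 625 MHz value is an assumed nominal XLGMII clock (40 Gb/s over 64 data bits), and the function names are hypothetical.

```python
from fractions import Fraction


def clock_scale(rows_per_block: int, extra_rows: int) -> Fraction:
    """Clock multiplier needed to send the extra row(s) in the same time."""
    return Fraction(rows_per_block + extra_rows, rows_per_block)


def parallel_width(data_bits: int, control_bits: int) -> int:
    """Datapath width if the extra row travels alongside the normal row."""
    return 2 * data_bits + control_bits


# Assumption: 8 rows per block plus 1 extra row (inferred from "9/8").
scale = clock_scale(8, 1)

# Assumption: nominal XLGMII clock of 625 MHz (40 Gb/s / 64 bits).
xlgmii_clock_mhz = Fraction(625)
faster_clock_mhz = xlgmii_clock_mhz * scale

print(f"clock multiplier : {scale}")
print(f"faster clock     : {float(faster_clock_mhz)} MHz")
print(f"parallel datapath: {parallel_width(64, 1)} bits")
```

Under these assumptions the first option raises the datapath clock from 625 MHz to about 703 MHz, and the second widens the datapath from 64+1 to 128+1 bits.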
2) With respect to the question of latency and the suggestion that 513 bits will not be simultaneously available in a “proper/reasonable” RS decoder: the assumption here is that we are doing the error location via a Chien search, and until the search is complete and the correction syndrome checked, we won’t know whether the correction passed or failed. That is, for T or fewer errors it will pass, for between T and 2T errors it will fail, and for >2T errors it may fail. Consequently, all of the bits are available at the time the syndrome is checked.

Regards,
Paul Langner

From: Zhongfeng Wang [mailto:zfwang@xxxxxxxxxxxx]

Dear All,

I missed this interim meeting due to my personal vacation in China (12 hr difference from Florida). I got Paul’s presentation from Tom and had a quick look, but I found that his analyses of the alternate transcoding scheme are not correct.

1) As I stated in my May presentation (see attached) on page 8, the new transcoding scheme does NOT need muxing logic for the remaining 7 bytes of each row, but conventional transcoding DOES need muxing logic for all bytes. Here’s a simplified example: input X[0:99]; output Y[0:99]; Y[0:19]=X[80:99]; Y[20:99]=X[0:79]. In this case, output and input have a 1:1 mapping, so there is no need for muxing logic for the data conversion. This explains why those 7 bytes (for each row) do not need muxing logic.

2) Regarding latency, although the input data for RS decoding are all received, the RS decoder outputs the corrected data over multiple cycles. Since our throughput is only 1 Gbps, we only need to output 9 bits (= 1 RS symbol) per 9 ns (roughly). Assuming a clock speed of 375 MHz, we only need to output about 1 (corrected) RS symbol (9 bits) in about 3 cycles. Obviously we will not get 513 corrected bits in one clock cycle in a proper/reasonable VLSI design. For a silly design, we can use high-level parallel processing (which leads to linearly increased HW complexity and peak power) so that we can output 513 or more corrected bits in one clock cycle. In this case, the RS decoder has a rough throughput of 375 MHz x 513 bits > 180 Gbps. This is obviously overdesign and consumes unnecessary HW.

I’m copying this email to a few VLSI implementation experts. Hopefully they can answer any further questions you have regarding this matter in a timely manner (it is deep night in my place).

Note: After my presentation in May, I basically gave up the effort to push the improved scheme to the 40GBaseT standard, since I knew it would take much, much more effort than technical analyses. However, for the truth of science and technology, I want to make this further effort to explain my original scheme. I truly believe that no one in this community wants to have a wrong analysis recorded in the IEEE history forever.

Thanks for all of your attention.

--- Zhongfeng
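
The two numeric points in the message above can be checked with a short sketch. It is an illustration only: the function names are hypothetical, the X[0:99] ranges in the message are read as inclusive (100 elements), and the clock and symbol figures are the ones quoted in the message.

```python
def rotate_fixed(x, shift):
    """Fixed rotation: the last `shift` inputs move to the front.

    With inclusive ranges and shift=20 on 100 elements this is
    Y[0:19] = X[80:99] and Y[20:99] = X[0:79]. Each output position is
    driven by exactly one fixed input position, so no select signal
    (and hence no mux) is involved.
    """
    n = len(x)
    return x[n - shift:] + x[:n - shift]


def throughput_gbps(clock_mhz, bits_per_cycle):
    """Throughput in Gb/s for a given clock and bits emitted per cycle."""
    return clock_mhz * 1e6 * bits_per_cycle / 1e9


x = list(range(100))
y = rotate_fixed(x, 20)

# Python slices are half-open, so x[80:100] is the message's X[80:99].
assert y[:20] == x[80:100]
assert y[20:] == x[0:80]
assert sorted(y) == x  # 1:1 mapping: every input appears exactly once

serial = throughput_gbps(375, 9 / 3)   # one 9-bit symbol every 3 cycles
parallel = throughput_gbps(375, 513)   # all 513 bits in a single cycle

print(f"serial output  : {serial} Gb/s")
print(f"parallel output: {parallel} Gb/s")
```

With the quoted figures, one symbol every three cycles gives roughly 1.1 Gb/s, while emitting all 513 bits per cycle gives about 192 Gb/s, consistent with the "> 180 Gbps" estimate in the message.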