[802.3_B400G_LOGIC] Re: Follow-up questions regarding lu_3df_logic

Hi Adee,

1) For 200G AUIs and 200G PMDs, end-to-end FEC is better. There are many reasons for this, for example:

a) It is not necessary to decode the FEC inside the CDR; decoding inside the CDR would hardly improve performance. For a HOST -> AUI (segment 1) -> CDR -> PMD (segment 2) -> CDR (segment 3) -> HOST link, we have three link segments that are independent of each other. Assume the BERs of these three segments are BER_1, BER_2 and BER_3, respectively. The overall BER of the whole link is then approximately BER_1+BER_2+BER_3. Let’s assume BER_1 ~= BER_3 ~= 1e-4 and BER_2 ~= 1e-3; the overall BER of the whole link is BER_1+BER_2+BER_3 ~= 1e-3, which is of the same order as BER_2. This means that for segmented links the worst segment dominates the performance, and the FEC only needs to cover the worst segment. This was illustrated in our previous contribution https://www.ieee802.org/3/df/public/22_02/lu_3df_01b_220215.pdf, page 11.

b) FEC decoding inside the CDR is very costly: the full Rx PCS and full PMA functionality would have to be implemented inside the CDR, which is excessive. These functions include virtual lane alignment, deskew and reorder as well as the FEC decoding. Suppose we are considering an 800GE CDR design with FEC termination: since the FEC of Ethernet is per port, not per lane, we would need to integrate a combo Ethernet PHY supporting 4*200GE, 2*400GE and 1*800GE. Furthermore, if we use RS(544, 514) for 200G AUIs and a higher-overhead FEC for 200G PMDs, we need an extra PLL inside the CDR. We do not see such a solution as feasible or economical. What's more, the effectiveness of such a scheme is doubtful.

c) If the FEC is decoded inside the CDR, the CDR implementation becomes protocol dependent and we completely lose the flexibility of the hardware design. A bit-transparent CDR is always preferred because it provides the most flexibility, and the CDR can be used for both the front plane and the backplane. A bit-transparent CDR can support link bonding and breakout easily. For example, we can use two independent 4-lane CDR chips (800G each) working in parallel to support 1.6TE. We cannot support FEC decoding in this scenario because the FEC of Ethernet is per port, not per lane.
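
As a rough numerical check of the argument in point a), the sketch below adds up the three example segment BERs and compares the simple sum with the result for a cascade of independent binary symmetric channels. The function name, the dictionary labels and the independent-random-error model are assumptions made for illustration; only the BER values 1e-4, 1e-3 and 1e-4 come from the text above.

```python
def cascaded_ber(segment_bers):
    """BER of a cascade of independent binary symmetric channels.

    Assumption (not from the thread): bit errors are random and independent,
    so a bit is wrong at the far end only if it was flipped an odd number
    of times along the way.
    """
    prod = 1.0
    for p in segment_bers:
        prod *= 1.0 - 2.0 * p
    return (1.0 - prod) / 2.0


# Example values from point a): two electrical segments and one optical segment.
segments = {
    "segment 1 (AUI)": 1e-4,
    "segment 2 (PMD)": 1e-3,
    "segment 3 (AUI)": 1e-4,
}

total_sum = sum(segments.values())               # BER_1 + BER_2 + BER_3
total_exact = cascaded_ber(segments.values())    # slightly below the simple sum
worst_share = max(segments.values()) / total_sum # fraction due to the worst segment

print(f"sum of segment BERs : {total_sum:.3e}")
print(f"cascaded-BSC BER    : {total_exact:.3e}")
print(f"worst-segment share : {worst_share:.1%}")
```

Under these assumptions the simple sum is a very close approximation for BERs of this size, and the worst segment accounts for most of the total, which is the sense in which it dominates the link.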

To answer your questions:

Q: Should we really shoot for an end-to-end FEC which will be the same FEC for all 200G/lane PMDs, as suggested by the top row of slide 3?

A: I suggest that this is where we should be. At the very least, it should be the direction we aim for. Segmented FEC should be considered only when necessary, e.g. “100G AUIs and 200G PMDs”.

Q: - Would a segmented FEC architecture in which all FECs are RS(544, 514), such that FEC decoding/re-encoding can be selectively bypassed in a module, match what you advocate in your presentation? (If this FEC is insufficient for some PMD types, these PMDs can terminate it and use another FEC – without burdening the electrical signaling or other PMDs)

A: For “100G AUIs and 200G PMDs” I think the answer is yes, because RS(544, 514) is sufficient for the AUI as long as the PMA does not introduce more severe error spreading. For “200G AUIs and PMDs” I think the answer is no. We have no conclusion that the electrical link (AUI) is easier than the optical link (PMD), not even a hint of it. Even if the 200G electrical link is easier than the optical link and can be covered by RS(544, 514) (e.g. by using the single-ended signaling proposed in lu_b400g_01_210322, lu_b400g_01b_210729 and lu_b400g_01a_210826), we still believe that processing the FEC and PCS in the host ASICs is much more economical than terminating RS(544, 514) inside the CDR. 100G-per-lane 100GE, 200GE and 400GE are exactly this case: the BER of the 100G AUI is 1e-6 while RS(544, 514) can cover 2e-4, yet we still use end-to-end RS(544, 514) FEC and a bit-transparent CDR in the module. If we can prove that RS(544, 514) can cover both the electrical and optical links with advanced DSP algorithms, then we can reuse RS(544, 514), but we still believe it should be end-to-end RS(544, 514) FEC, not segmented.
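
To give a feel for the two BER figures quoted above (1e-6 for the 100G AUI, 2e-4 as what RS(544, 514) can cover), the following sketch estimates the probability that an RS(544, 514) codeword has more symbol errors than the decoder can correct. It is only an illustration under stated assumptions: bit errors are independent and random (no bursts, no error spreading), and the function names are hypothetical. The code parameters used (544 symbols of 10 bits, t = 15 correctable symbols) are those of RS(544, 514).

```python
import math

N_SYMBOLS = 544      # codeword length in symbols
K_SYMBOLS = 514      # message length in symbols
SYMBOL_BITS = 10     # RS(544, 514) uses 10-bit symbols
T_CORRECTABLE = (N_SYMBOLS - K_SYMBOLS) // 2  # correctable symbols per codeword


def symbol_error_prob(ber):
    """Probability that a 10-bit symbol contains at least one bit error.

    Assumption: independent random bit errors with probability `ber`.
    """
    return -math.expm1(SYMBOL_BITS * math.log1p(-ber))


def uncorrectable_codeword_prob(ber):
    """Probability of more than T_CORRECTABLE symbol errors in one codeword."""
    p = symbol_error_prob(ber)
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for i in range(T_CORRECTABLE + 1, N_SYMBOLS + 1):
        # Binomial term evaluated in the log domain to avoid overflow/underflow.
        log_term = (
            math.lgamma(N_SYMBOLS + 1)
            - math.lgamma(i + 1)
            - math.lgamma(N_SYMBOLS - i + 1)
            + i * math.log(p)
            + (N_SYMBOLS - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return total


for ber in (1e-6, 2e-4, 1e-3):
    print(f"pre-FEC BER {ber:.0e}: P(uncorrectable codeword) ~ "
          f"{uncorrectable_codeword_prob(ber):.2e}")
```

With random errors the uncorrectable-codeword probability falls off very steeply as the input BER drops, so a segment at 1e-6 consumes almost none of the margin of a code that can cover 2e-4. Burst errors or error spreading would make the real picture worse than this model suggests.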

On the MTTFPA concern with concatenated FEC: referring to the figure on page 5 of anslow_3ct_01_0519, we can see that at BERout = 1e-12 the MTTFPA curve is very close to the age of the universe. We think the curve is correct; we may check it more closely in the future, along with the inconsistency you mentioned, i.e. comment #74 against D1.2 of 802.3bs (P802d3bs_D1p2_comments_final_Cl). Here are some points we want to emphasize:

1) The MTTFPA curves are obtained under an AWGN assumption, which is optimistic because burst errors are not considered. Burst errors on electrical or optical links are not considered in the analysis of anslow_3ct_01_0519.

2) With concatenated FEC, the burst errors generated by the inner FEC failure will make it even worse.

3) The reliability of concatenated FEC for error detection is certainly much weaker: because the concatenated FEC drops the inner FEC parity bits, the error detection capability is identical to that of RS(544, 514), but the channel is worse. I don’t think it is acceptable for a higher-speed Ethernet port to be less reliable than a lower-speed Ethernet port. Higher-speed Ethernet ports are usually deployed in the core of a network, which requires higher reliability. We found a relevant contribution that discusses uncorrectable FEC errors, https://www.ieee802.org/3/df/public/22_02/huber_3df_01a_220203.pdf, which addresses the same concern we are discussing.

4) Concatenated FEC does not give satisfactory net coding gain improvement, but has potential reliability issues (MTTFPA concerns), which make it much less attractive.

5) We also need to investigate whether concatenated FEC or soft-decision FEC has an error floor; this is a routine procedure for this kind of FEC scheme. Unfortunately, we have not seen any data about it.

We have to address both the reliability concerns (i.e. the MTTFPA issue) and the cost effectiveness of concatenated FEC in future discussions.

Best Regards,

Yuchun (Louis)

From: Adee Ran (aran) [mailto:aran@xxxxxxxxx]
Sent: April 26, 2022 22:01
To: STDS-802-3-B400G-LOGIC@xxxxxxxxxxxxxxxxx
Cc: Zhuangyan (Yan) <zhuangyan.zhuang@xxxxxxxxxx>; Luyuchun <yuchun.lu@xxxxxxxxxx>
Subject: Follow-up questions regarding lu_3df_logic_220425

Yan and Yuchun,

Thank you for your presentation “FEC architecture and performance investigation for 800GbE and 1.6TbE” in yesterday’s ad hoc call. I had some comments and questions but due to connectivity issues the Q&A had to be cut off. Since it may be relevant for many participants, I’m using the reflector.

My understanding from the presentation is that end-to-end FEC is your preferable scheme. But on slide 3 (as of the original slide deck) it is suggested that the “provisional” application (third row) uses a segmented FEC scheme, where the RS(544, 514) protects just the 100G/lane AUIs, and the “FEC” block (apparently something other than RS(544,514)) protects just the optical link. Only in the “mainstream” applications it is really end-to-end, and it’s supposedly the same FEC for all PMDs.

As noted by Brian Welch in the subsequent presentation End to segmented FEC, with a segmented FEC scheme in which all segments use the same FEC, if the total BER is low enough, FEC termination in modules can be bypassed, making it an end-to-end FEC. I’m not getting into details of when and how this can be done. If the PMD-to-PMD FEC happens to be RS(544, 514) then this option “comes for free”, but if it results in choosing a higher overhead FEC, the additional bandwidth will be a burden on the electrical segment, and it needs to be analyzed with much more detail.

So my questions are:

- Should we really shoot for an end-to-end FEC which will be the same FEC for all 200G/lane PMDs, as suggested by the top row of slide 3?

- Would a segmented FEC architecture in which all FECs are RS(544, 514), such that FEC decoding/re-encoding can be selectively bypassed in a module, match what you advocate in your presentation?

o (If this FEC is insufficient for some PMD types, these PMDs can terminate it and use another FEC – without burdening the electrical signaling or other PMDs)

On another topic:

- Slide 12 states that “the MTTFPA for RS(544, 514) is already marginal with random input error distribution assumption”, pointing to anslow_3ct_01_0519.

- That contribution quotes (on slide 5) a statement from clause 91: “The probability that the decoder fails to indicate a codeword with t+1 errors as uncorrected is not expected to exceed 10^–6”. However, in 119.2.5.3 it is stated that “The probability that the decoder fails to indicate a codeword with t+1 errors as uncorrected is not expected to exceed 10^–16”.

- My understanding is that this much lower probability is due to the fact that clause 119 specifically has t=15, while clause 91 has either t=7 or t=15, and the number 10^-6 is suitable for t=7. See also comment #74 against D1.2 of 802.3bs (P802d3bs_D1p2_comments_final_Cl) and its response, which points to cideciyan_01_0112.

- If indeed the probability of undetected uncorrectable error is that low, then MTTFPA is not marginal with RS(544, 514) as used in 200G/400G Ethernet, and keeping this FEC would not create an issue in 800G/1.6T. Apparently there are contradicting messages here, and it would be good to get to the bottom of this concern.

Best regards,

</Adee>
