Thread Links Date Links
Thread Prev Thread Next Thread Index Date Prev Date Next Date Index

Re: [802.3_NGECDC] Input Requested for Beyond 400 GbE CFI



Hi Xiang, Ali,

 

Thank you for bringing up the topic. As we see, besides high bandwidth connections, latency/power is also very import for current datacenters for AI/ML/high performance storage which should be worth considering in next speed discussion.

 

Regarding the fec, since the termination solution might import more latency, a distributed fec design as Xiang suggested or a stronger one (or something else) for low end-to-end latency should be considered. It might be a balance/compromise between the fec coding gain and the latency based on the channel we can get. We could dig into the topic further J.

 

Thank you.

 

Best Regards,

 

Yan

 

From: Xiang Zhou [mailto:000011dbeaa0229f-dmarc-request@xxxxxxxxxxxxxxxxx]
Sent: Monday, August 3, 2020 6:50 AM
To: STDS-802-3-NGECDC@xxxxxxxxxxxxxxxxx
Subject: Re: [802.3_NGECDC] Input Requested for Beyond 400 GbE CFI

 

Hi Ali,

 

For the FEC, I believe there exists some trade-off  between latency, power and overhead,  so it is not impossible to have a FEC with coding gain higher than KP4 FEC but with similar or even lower latency.

If we implement such a FEC at the switch SerDes side, then there is no '3x' latency increase.

 

If the switch side stays with the same KP4 FEC, such as our initial use case proposal with 8x100G electrical lane  and  4x200G optical lane, we may need some additional FEC gain within the optical module. For this case, some innovations on the FEC design may be necessary. For example. we may not need to terminate the KP4 FEC, instead, we may apply the distributed FEC design concept to introduce a low-latency overlay orthogonal FEC within the optical module to maintain low end-to-end latency.

 

thanks

Xiang

 

 

 

On Sun, Aug 2, 2020 at 12:53 PM Ali Ghiasi <aghiasi@xxxxxxxxx> wrote:

Hello Cedric,

 

You raise a very important point that application Google is 1st considering for 200G/lane is not the traditional DC where switch radix is 256/512 but instead AI nodes connected with torus arcitectrue.  

I assume these type of links would use some form of cache coherence protocol and not Ethernet, for lower latency.

 

Even in the traditional Ethernet DC latency becoming more important now that CPU/Memory/Flash have become much faster.  Given that network latency becoming IOP bottleneck the server-server needs to get to 

few micro-seconds from current 10’s micro-seconds.   This is why I believe even for Ethernet DC due to latency we don’t have much options beside PAM4 due to FEC latency.  PAM4 optically will have definite advantage 

over higher order PAMx,  but obviously the electrical channels to support 200G/lane are not there.  

 

If super-low latency is important for Google AI applications then you don’t have much option beside PAM4 given the FEC latency of PAM6/PAM8!  Unlike 100G/lane eco-system where we 

allocated 0.1 dBo to support 4 electrical sub-link operating at 1E-5, I am afraid at 200G we will need to terminate the FEC in the module and the end-end latency will be 3 time the FEC latency!

 

Thanks,
Ali Ghiasi
Ghiasi Quantum LLC



On Aug 2, 2020, at 12:18 PM, Cedric Lam (林 峯) <000011675c2a7243-dmarc-request@xxxxxxxxxxxxxxxxx> wrote:

 

Chris:

 

I cannot make the prediction you asked.  What I can tell you is that machine learning (at least the one from Google) uses a torus architecture to construct the pod.  This is public information.   So the speed required per link is high (as there are not as high radix as the connections used to form Clos for DC fabric.  WE should probably also ask the HPC guys.  For those applications, super low latency is also very important.  But the physical layer will be the same for short reach interconnects and both applications can cross leverage each other.


--

Cedric F. Lam

 

 

On Sat, Aug 1, 2020 at 3:49 AM John D'Ambrosia <jdambrosia@xxxxxxxxx> wrote:

Chris,

In the past the question you asked below has been used to justify the next speed, not the justification for the speed in question itself.  So I am trying to understand your question.  It would seem the question you want to ask would be related to 100G, not 200G.

 

Just trying to understand what you are getting at to see if additional data is needed.

 

Thanks

 

John

 

From: Chris Cole <chris.cole@xxxxxxxxxxx>
Sent: Saturday, August 1, 2020 1:20 AM
To: STDS-802-3-NGECDC@xxxxxxxxxxxxxxxxx
Subject: Re: [802.3_NGECDC] Input Requested for Beyond 400 GbE CFI

 

Hi Cedric

 

When do you think the 1st million optical transceivers with 200G I/O will ship? It can be any configuration; Nx200G, Nx400G, 800G, etc.

 

Chris

 

From: Cedric Lam ( ) <000011675c2a7243-dmarc-request@xxxxxxxxxxxxxxxxx>
Sent: Friday, July 31, 2020 9:35 AM
To: STDS-802-3-NGECDC@xxxxxxxxxxxxxxxxx
Subject: Re: [802.3_NGECDC] Input Requested for Beyond 400 GbE CFI

 

I can see 1x200G as something useful for server to TOR connections in the future and might be easy to add to the Ethernet family.  I agree with you on the 2x200G.   Also, bear in mind the limited distances that 200G lane can cover and the use cases.  We see it mostly in the intra-DC applications.

--

Cedric F. Lam

 

 

On Fri, Jul 31, 2020 at 8:05 AM John D'Ambrosia <jdambrosia@xxxxxxxxx> wrote:

All,

I received a question after this week’s NEA meeting that I would like to get some feedback on from others.

 

The question was –

If 200 Gb/s per lane signaling were developed could efforts to define 200 GbE based on 1x200 Gb/s and 400 GbE based on 2x200 Gb/s be addressed.

 

I think it is actually a good question and important for me in developing the CFI Consensus deck and defining the SG chartering motion.  As shown by the slide below - 200 Gb/s signaling is applicable to 200 and 400 GbE. 400 Gb/s serial signaling might also be applicable to 400 GbE. 

 

My own personal opinion is that the whole 1x / 2x lanes would then need to be examined on a PHY basis – as we have seen some instances where 2x lanes don’t see market adoption.

 

<image001.jpg>

 

This also raises the question as to whether the study group would define more than one PAR.  Based on the above text – I think there is an opportunity for that or another project that spins out efforts based on consideration of schedule.

 

So I would appreciate some feedback from individuals as it impacts the consensus deck

 

Thanks in advance.

 

John


To unsubscribe from the STDS-802-3-NGECDC list, click the following link: https://listserv.ieee.org/cgi-bin/wa?SUBED1=STDS-802-3-NGECDC&A=1


To unsubscribe from the STDS-802-3-NGECDC list, click the following link: https://listserv.ieee.org/cgi-bin/wa?SUBED1=STDS-802-3-NGECDC&A=1


To unsubscribe from the STDS-802-3-NGECDC list, click the following link: https://listserv.ieee.org/cgi-bin/wa?SUBED1=STDS-802-3-NGECDC&A=1

 


To unsubscribe from the STDS-802-3-NGECDC list, click the following link: https://listserv.ieee.org/cgi-bin/wa?SUBED1=STDS-802-3-NGECDC&A=1


To unsubscribe from the STDS-802-3-NGECDC list, click the following link: https://listserv.ieee.org/cgi-bin/wa?SUBED1=STDS-802-3-NGECDC&A=1


To unsubscribe from the STDS-802-3-NGECDC list, click the following link: https://listserv.ieee.org/cgi-bin/wa?SUBED1=STDS-802-3-NGECDC&A=1