
Re: [8023-CMSG] Purpose



Hugh,
 
Well, I certainly can't get on board with the idea of 40 or 100 Gb/s being cheap or simple. At least not in the next couple of years.
 
I never thought of myself as small-frame-phobic. I always thought of myself as a lover of improved cost-performance.
 
You are correct that geometry matters if you want low latency.
 
Regarding your comment that this is a niche of a niche: to some of us, being part of a $1B-a-year and rapidly growing niche within a $50B-a-year niche is worthy of consideration. It doesn't especially bother me that this might be embarrassingly small and not worthy of consideration for the largest vendors.
 
Of course a switch implementing preemption would interoperate with a switch that didn't. Really Hugh, that kind of FUD is beneath you.
 
jonathan
-----Original Message-----
From: owner-stds-802-3-cm@LISTSERV.IEEE.ORG [mailto:owner-stds-802-3-cm@LISTSERV.IEEE.ORG] On Behalf Of Hugh Barrass
Sent: Friday, April 30, 2004 11:22 AM
To: STDS-802-3-CM@LISTSERV.IEEE.ORG
Subject: Re: [8023-CMSG] Purpose

Jonathan,

I don't know why you're so scared of smaller frames - anyone who wants lower latency should prefer smaller frames. If you reduce the MTU to 500 bytes (not to 48), the equation swings in favor of existing standards:

6 x 500 x 8 / 10k + 0.5 = 2.9 us vs 0.8 us - still more than a 3x improvement using preemption, but getting closer. Bear in mind that this is an extreme worst-case comparison. Averages will be almost identical, because the preempting packet can arrive at any time during the preempted frame; the preempted frame might not be of maximum size; the link may be idle when the preempting packet arrives; and, of course, the packet in progress may be a high-priority packet itself.
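
To make that comparison easy to replay with other numbers, here is a throwaway sketch (Python; the function and parameter names are mine, and the 10 Gb/s rate and 0.5 us total fiber delay are carried over from your example):

    # Worst-case end-to-end latency: one full frame stored at each hop,
    # plus the total propagation delay on the fiber.
    def worst_case_us(hops=6, frame_bytes=1500, rate_gbps=10.0, fiber_us=0.5):
        bits_per_us = rate_gbps * 1000       # 10 Gb/s = 10,000 bits per microsecond
        return hops * frame_bytes * 8 / bits_per_us + fiber_us

    print(worst_case_us(frame_bytes=1500))   # 7.7 us  - your worst case
    print(worst_case_us(frame_bytes=500))    # 2.9 us  - with a 500-byte MTU
    print(worst_case_us(frame_bytes=64))     # ~0.8 us - your preemption case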

Of course, if we start adding in more delays, the difference gets smaller still (both delays increase similarly). For example:

Your example allows only 15 m per link; you will start to run into geometrical problems if you want to aggregate very large numbers of nodes with only 15 m per link. If there are fewer nodes, then you need to re-architect your interconnect matrix, because 6 hops should be able to accommodate many thousands of end stations.

Your example must be assuming a very aggressive cut-through switch architecture (cut-through has lost popularity in recent years, shame). If you want to conform to the requirements of bridging, then you should wait for both the source and destination MAC addresses to be received before you transmit (unless you are a repeater!). Since you are advocating preemption, I would also assume that you must wait for the COS/TOS tag. That extra 10 bytes will be difficult to avoid. Of course, if you decide that the error propagation of cut-through makes the technique unfavorable, then you have a full 64 bytes of latency to wait for the CRC of the incoming frame.
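
To put numbers on those waiting points (a minimal sketch; the decision points listed are my illustrative choices, using the usual Ethernet header offsets):

    # Per-hop latency spent waiting for n bytes to arrive at 10 Gb/s.
    # DA+SA = 12 bytes; through a priority tag = 16 bytes (i.e. 10 bytes
    # beyond the 6-byte DA); a minimum-size frame through its CRC = 64 bytes.
    def wait_ns(n_bytes, rate_gbps=10.0):
        return n_bytes * 8 / rate_gbps       # at 10 Gb/s, 10 bits arrive per ns

    for label, n in [("DA only", 6), ("DA+SA", 12),
                     ("through the tag", 16), ("64-byte frame + CRC", 64)]:
        print(label, wait_ns(n), "ns per hop")   # 4.8, 9.6, 12.8, 51.2 ns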

Regarding jumbo frames and the complexity of end-station devices, I would expect that any device capable of filling a 10 Gb/s pipe will require some hardware acceleration. For hardware implementations there really is no significant difference between encapsulating 1500-byte frames and 500-byte (or even smaller) frames. Hardware which performs this high-speed operation has the advantage that it is seamlessly compatible with any other equipment that might be connected to it. On the other hand, if a switch started using a preemption mechanism when connected to any existing hardware, the result would be anybody's guess. My assertion is that a small reduction in MTU for the local network will yield results close enough to your extreme examples to make the applicable space where a new standard is demanded very small indeed. As I said, it's a niche of a niche.

Better to spend our effort on cheap and simple 40Gig (or even 100Gig) and make this whole argument moot (yes, at 100G the max length frame can be stored in 25m of wire).
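
A quick sanity check on that parenthetical, as a sketch (2e8 m/s is the same roughly-2/3-of-c figure for light in glass that Jonathan uses below):

    frame_bits = 1500 * 8                 # maximum-length frame = 12,000 bits
    serialize_s = frame_bits / 100e9      # 120 ns on the wire at 100 Gb/s
    print(serialize_s * 2e8)              # ~24 m of fiber holds the entire frame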

Hugh.

Jonathan Thatcher wrote:
Hugh,
 
Okay, let's use near-worst-case numbers (worst case means that the implementer is choosing to suboptimize performance).
 
All in microseconds
Number of hops: 6
Delay on 100 m fiber (total, not per hop; speed of light in glass): 100 m / (300,000,000 m/s * 2/3) = 0.5
Wait per hop, i.e. frame size (bytes): 1500
Rate: 10Gb/s
Preemption slot (delay due to preemption): 64 Bytes (4 to 8 is more realistic)
 
Latency without preemption: 6 hops * 1500 bytes/hop * 8 bits/byte / 10 Gb/s + 0.5 = 7.2 + 0.5 = 7.7
Latency with preemption: 6 hops * 64 bytes/hop * 8 bits/byte / 10 Gb/s + 0.5 = 0.3 + 0.5 = 0.8
 
That looks like an order of magnitude to me.
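
For anyone who wants to vary the parameters, the same arithmetic as a small script (Python; the names are mine, the numbers are from above, and it also reproduces the backplane case in the p.s. below):

    # Latency = hops * bytes-waited-per-hop * 8 bits/byte / rate + propagation.
    def latency_us(hops, bytes_per_hop, prop_us, rate_gbps=10.0):
        return hops * bytes_per_hop * 8 / (rate_gbps * 1000) + prop_us

    print(latency_us(6, 1500, 0.5))    # 7.7 us   - without preemption
    print(latency_us(6, 64, 0.5))      # ~0.8 us  - with 64-byte preemption slots
    print(latency_us(1, 1500, 0.005))  # ~1.2 us  - backplane, without
    print(latency_us(1, 6, 0.005))     # ~0.01 us - backplane, 6-byte slots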
 
This doesn't include the serialization/deserialization, but the 64 bytes in preemption slot more than covers that.
 
On the other hand, you could reduce the frame size to 48-byte cells with 5-byte headers and do better than this, with only the added expense of a nearly free SAR (all silicon is free, right? :-). Now, there's a solution one could really fall in love with.
 
jonathan
 
p.s.
Doing this with 4 to 8 byte preemption slots on a backplane (1m) is even more interesting:
Without: 1 hop * 1500 bytes/hop * 8 bits/byte / 10 Gb/s + 0.005 = 1.2
With: 1 hop * 6 bytes/hop * 8 bits/byte / 10 Gb/s + 0.005 = 0.01
 
Could that be closing in on 2 orders of magnitude?
 
Jumbo frames make it even more fun. But we don't talk about those... No, smaller frames are much better than bigger ones :-)