
Re: [8023-CMSG] Purpose



Hugh,
 
Okay, let's use near-worst-case numbers (worst case means that the implementer is choosing to suboptimize performance).
 
All latencies in microseconds
Number of hops: 6
Delay on 100 m fiber (total, not per hop; speed of light in glass): 100 m / (300,000,000 m/s * 2/3) = 0.5
Wait per hop without preemption (bytes): 1500
Rate: 10 Gb/s
Preemption slot (wait per hop with preemption): 64 bytes (4 to 8 is more realistic)
 
Latency without preemption: 6 hops * 1500 bytes/hop * 8 bits/byte / 10 Gb/s + 0.5 = 7.2 + 0.5 = 7.7
Latency with preemption: 6 hops * 64 bytes/hop * 8 bits/byte / 10 Gb/s + 0.5 = 0.3 + 0.5 = 0.8
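A minimal sketch of the arithmetic above, assuming (as the numbers do) that at each hop the urgent frame waits behind one blocking unit (a full 1500-byte frame without preemption, one 64-byte slot with it) and that light in glass travels at 2/3 c. The function name `latency_us` is hypothetical, introduced only for illustration.

```python
# Reproduces the hop-by-hop latency estimate above.
# ASSUMPTIONS (from the message): per hop, the urgent frame waits behind one
# blocking unit; 10 Gb/s links; 100 m of fiber in total at 2/3 c.
C = 3.0e8            # speed of light in vacuum, m/s
V_FIBER = C * 2 / 3  # assumed propagation speed in glass, m/s


def latency_us(hops: int, blocking_bytes: int, rate_bps: float, distance_m: float) -> float:
    """Latency in microseconds: per-hop wait behind `blocking_bytes`
    plus the total propagation delay over `distance_m`."""
    wait_s = hops * blocking_bytes * 8 / rate_bps
    prop_s = distance_m / V_FIBER
    return (wait_s + prop_s) * 1e6


without = latency_us(hops=6, blocking_bytes=1500, rate_bps=10e9, distance_m=100)
with_pre = latency_us(hops=6, blocking_bytes=64, rate_bps=10e9, distance_m=100)
print(f"without preemption: {without:.2f} us")
print(f"with preemption:    {with_pre:.2f} us")
print(f"ratio:              {without / with_pre:.1f}x")
```

The unrounded "with preemption" figure is about 0.81 us (0.307 + 0.5), so the ratio comes out a little under 10x.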
 
That looks like an order of magnitude to me.
 
This doesn't include the serialization/deserialization, but the 64-byte preemption slot more than covers that.
 
On the other hand, you could reduce the frame size to 48-byte cells with 5-byte headers and do better than this with only the added expense of a nearly free SAR (all silicon is free, right? :-). Now, there's a solution one could really fall in love with.
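To check the "do better than this" claim, here is a rough sketch of the cell variant, assuming the same 6 hops, 10 Gb/s links and 0.5 us of fiber, with each hop waiting behind at most one 53-byte cell. SAR latency is ignored entirely, which is exactly the part the joke glosses over.

```python
# Cell alternative: 48-byte payload + 5-byte header (53-byte cells).
# ASSUMPTIONS: 6 hops, 10 Gb/s, 0.5 us total propagation, one cell of wait
# per hop, zero segmentation/reassembly (SAR) delay.
RATE_BPS = 10e9
HOPS = 6
PROP_US = 0.5

CELL_PAYLOAD = 48
CELL_HEADER = 5
cell_bytes = CELL_PAYLOAD + CELL_HEADER

cell_time_us = cell_bytes * 8 / RATE_BPS * 1e6  # time to serialize one cell
latency_us = HOPS * cell_time_us + PROP_US      # wait behind one cell per hop
header_overhead = CELL_HEADER / cell_bytes      # share of each cell spent on header

print(f"cell time:       {cell_time_us * 1000:.1f} ns")
print(f"6-hop latency:   {latency_us:.3f} us")
print(f"header overhead: {header_overhead:.1%}")
```

Under these assumptions the cell case lands at about 0.75 us versus roughly 0.81 us for 64-byte preemption slots, paid for with a header overhead of about 9.4% on every cell.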
 
jonathan
 
p.s.
Doing this with 4- to 8-byte preemption slots on a backplane (1 m) is even more interesting:
Without: 1 hop * 1500 bytes/hop * 8 bits/byte / 10 Gb/s + 0.005 = 1.2
With: 1 hop * 6 bytes/hop * 8 bits/byte / 10 Gb/s + 0.005 = 0.01
 
Could that be closing in on 2 orders of magnitude?
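A quick sketch of the backplane case, assuming the same 2/3 c propagation speed as the fiber example (1 m gives 0.005 us) and a 6-byte slot as the midpoint of the 4 to 8 byte range. The helper name is hypothetical.

```python
import math

# Backplane case: 1 hop over 1 m.
# ASSUMPTIONS: 10 Gb/s, propagation at 2/3 c, 1500-byte frame without
# preemption vs. a 6-byte preemption slot with it.
RATE_BPS = 10e9
V_M_PER_S = 2e8


def latency_us(hops: int, blocking_bytes: int, distance_m: float) -> float:
    """Per-hop wait behind `blocking_bytes` plus propagation, in microseconds."""
    return (hops * blocking_bytes * 8 / RATE_BPS + distance_m / V_M_PER_S) * 1e6


without = latency_us(1, 1500, 1.0)
with_pre = latency_us(1, 6, 1.0)
orders = math.log10(without / with_pre)
print(f"without: {without:.3f} us, with: {with_pre:.4f} us")
print(f"ratio: {without / with_pre:.0f}x (~{orders:.1f} orders of magnitude)")
```

The ratio comes out at roughly 120x, i.e. just over two orders of magnitude under these assumptions.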
 
Jumbo frames make it even more fun. But we don't talk about those.... No, smaller frames are much better than bigger ones :-)
-----Original Message-----
From: Hugh Barrass [mailto:hbarrass@cisco.com]
Sent: Friday, April 30, 2004 8:15 AM
To: jonathan.thatcher@ieee.org
Cc: STDS-802-3-CM@LISTSERV.IEEE.ORG
Subject: Re: [8023-CMSG] Purpose

Jonathan,

I only mentioned the average once; all other numbers are worst case. If you deal with averages then the numbers are much lower because you must account for the average frame size, the average time of arrival for the preempting packet and the average link utilization. Therefore the average latency will be much lower than the worst-case latency quoted (1.2 µs with maximum-length frames, 400 ns with MTU = 500 bytes). If we deal with averages then there is almost no perceptible benefit to preemption. Some algorithms are able to benefit from unpredictable but fast communication; others must deal with the worst case and may suffer more from jitter than from absolute delay. Network components may be architected to benefit either type of application without any change to current (layer 1 & 2) standards.

Hugh.

Jonathan Thatcher wrote:
Hugh,

Interesting how you are choosing to bias the numbers. In some cases you
choose to use averages, in other cases you choose to use worst case. Hmmmm.

If I were using averages, then the average distance would probably be under
20 m. If distance were required, then optics would probably be used when
latency was an issue. The average number of hops would not be one.

If I were using worst case, then the latency would be N hops times the wait
on a maximum-size packet. Etc.
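Written out as a formula, the worst case described here is sketched below. The symbols are introduced only for illustration: N hops, L_max the largest frame in bytes, R the link rate in bit/s, d the total distance and v the propagation speed (taken as 2/3 c, the figure used elsewhere in this thread). Plugging in 6 hops, 1500 bytes, 10 Gb/s and 100 m reproduces the 7.7 µs figure.

```latex
T_{\text{worst}} = N \cdot \frac{8\,L_{\max}}{R} + \frac{d}{v}
\qquad\Rightarrow\qquad
6 \cdot \frac{8 \cdot 1500\ \text{bit}}{10^{10}\ \text{bit/s}}
+ \frac{100\ \text{m}}{2 \times 10^{8}\ \text{m/s}}
= 7.2\ \mu\text{s} + 0.5\ \mu\text{s} = 7.7\ \mu\text{s}
```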

Implicit in your note is that this is somehow a difficult thing to do. It
isn't. It is far simpler than rate control.

jonathan

  
-----Original Message-----
From: owner-stds-802-3-cm@LISTSERV.IEEE.ORG
[mailto:owner-stds-802-3-cm@LISTSERV.IEEE.ORG]On Behalf Of
Hugh Barrass
Sent: Friday, April 30, 2004 6:03 AM
To: STDS-802-3-CM@LISTSERV.IEEE.ORG
Subject: Re: [8023-CMSG] Purpose


Arthur,

I agree that preemption is a fine idea, but in my view it falls into the
"not worth the effort" category. Assuming that any new definition that
we could make will not be standardized until 2006 & will be commonly
available in silicon at least a year later, I think we can safely ignore
any Ethernet interfaces below 1 Gb/s. Even Gigabit Ethernet seems
somewhat pedestrian for high-end data center applications, and I would
suggest that anyone concerned about the latency penalty of the frame in
progress at Gigabit speed would be well advised to migrate to 10G before
2007.

In that timeframe, a user will have the choice of 10GBASE-CX4 and
10GBASE-T for (cheap) copper interfaces. The former seems ideal for data
center use as it is extremely low latency and targeted at the shorter
distances necessary for system-system communication. If the application
requires distances of up to 100 m, making 10GBASE-T a necessity, then the
latency budget will be swamped by the propagation delay (500 ns @ 100 m)
and the PMA/PCS latency of 10GBASE-T (probably ~1 µs).
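A small sketch of the distance side of this argument, assuming a signal speed of 2/3 c (2e8 m/s, which is what the 500 ns @ 100 m figure implies); the actual velocity factor depends on the cable or fiber. The ~1 µs PMA/PCS number is the message's own rough estimate, not a value taken from a specification.

```python
# Propagation delay for a few cable lengths, alongside the rough ~1 us
# PMA/PCS estimate for 10GBASE-T quoted in the message.
# ASSUMPTION: signals travel at 2/3 c (2e8 m/s).
V_M_PER_S = 2e8
PMA_PCS_10GBASE_T_US = 1.0  # rough estimate from the message, not a spec value

prop_ns = {d: d / V_M_PER_S * 1e9 for d in (1, 20, 100)}

for distance_m, delay_ns in prop_ns.items():
    total_us = delay_ns / 1000 + PMA_PCS_10GBASE_T_US
    print(f"{distance_m:>3} m: propagation {delay_ns:5.0f} ns, "
          f"with 10GBASE-T PMA/PCS ~{total_us:.3f} us")
```

At 100 m the propagation delay alone (500 ns) is already of the same order as the 1.2 µs maximum-frame blocking time, which is the point being made.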

A maximum-length frame in progress at 10 Gb/s will take ~1.2 µs, making
the average gain due to preemption ~600 ns (ignoring packet mix and link
utilization). Even taking the maximum delay (which maps to the delay
jitter component), the order of magnitude is similar to the fixed delay
of 10GBASE-T, and therefore it cannot possibly lead to a significant
reduction for systems using that technology.
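The ~600 ns average follows from a simple model, sketched here under the stated assumption that the urgent frame arrives at a uniformly random point while a 1500-byte frame is being sent and must wait for the remainder (packet mix and utilization ignored, as above).

```python
# Average vs. maximum wait behind a frame already in progress.
# ASSUMPTION: the urgent frame's arrival is uniformly distributed over the
# transmission of one 1500-byte frame at 10 Gb/s.
RATE_BPS = 10e9
FRAME_BYTES = 1500

max_wait_us = FRAME_BYTES * 8 / RATE_BPS * 1e6  # arrival just as the frame starts

# Midpoint-rule average of the residual transmission time over N arrivals.
N = 10_000
residuals = [max_wait_us * (1 - (i + 0.5) / N) for i in range(N)]
avg_wait_us = sum(residuals) / N

print(f"max wait: {max_wait_us:.2f} us, average wait: {avg_wait_us * 1000:.0f} ns")
```

Analytically the mean residual is simply half the frame time, so the numeric average just confirms 0.6 µs.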

Assuming that the speed-crazed implementor chooses 10GBASE-CX4 and
wishes to eliminate the 1.2 µs maximum jitter, there are two options. The
first is preemption, which can significantly reduce this (depending on
the definition) but will involve significant new work. The alternative
is to reduce the MTU, which involves no new work. Changing the MTU from
1500 bytes to 500 bytes reduces the maximum jitter to 400 ns at the
expense of ~3% extra overhead. Further reductions can be achieved for
larger overheads, a tradeoff that can be made at system configuration
time. I'm fairly sure that some will argue that the MTU needs to be
increased (to 9k, 16k, 64k or higher) because software/firmware-based
NICs cannot encapsulate small frames at line speed and 1982-vintage
routers cannot switch line-rate streams of minimum-size packets. I would
suggest that anyone who is serious enough to be asking for a new
standard to improve latency should be using hardware acceleration for
packetization and true wire-speed switch fabrics.
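A sketch of the MTU tradeoff. Maximum jitter is just the serialization time of one MTU at 10 Gb/s. The overhead figure depends on what is counted per frame, so two assumptions are shown: 26 bytes (preamble/SFD 8, MAC header 14, FCS 4) and 38 bytes (the same plus the 12-byte minimum inter-packet gap). The helper names are hypothetical.

```python
# Maximum blocking time (jitter) vs. per-frame overhead at 10 Gb/s.
# ASSUMPTION: per-frame overhead is either 26 bytes (preamble/SFD + MAC
# header + FCS) or 38 bytes (the same plus the 12-byte inter-packet gap).
RATE_BPS = 10e9


def max_jitter_ns(mtu: int) -> float:
    """Serialization time of one MTU-sized payload (headers ignored, as in the message)."""
    return mtu * 8 / RATE_BPS * 1e9


def overhead(mtu: int, per_frame_bytes: int) -> float:
    """Fraction of wire time spent on non-payload bytes for full-size frames."""
    return per_frame_bytes / (mtu + per_frame_bytes)


for per_frame in (26, 38):
    extra = overhead(500, per_frame) - overhead(1500, per_frame)
    print(f"{per_frame}-byte overhead: MTU 1500 -> 500 adds {extra:.1%}")

for mtu in (1500, 500, 250):
    print(f"MTU {mtu:>4}: max jitter {max_jitter_ns(mtu):5.0f} ns")
```

With 26 bytes per frame the extra overhead is about 3.2%, in line with the ~3% quoted; counting the inter-packet gap as well raises it to about 4.6%.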

Assuming that the MTU has been reduced to bring the maximum jitter down
to 400 ns, smart switch fabric designers might wish to employ techniques
that reduce the jitter further at the expense of an increase in fixed
latency. Given that the fixed latency of the copper interconnect is
approaching the same magnitude, this seems like a reasonable tradeoff to
make for system performance (assuming that delay variation is the
problem).

In summary, the net gain that can be achieved by preemption is too small
to make a difference except in the most extreme circumstances. For most
applications, current standards can be used (at layer 1 & 2) to attain
acceptable performance; therefore the demand for silicon implementing a
new standard will be limited to a niche of a niche. If the application
area is sufficiently small, then more exotic (or targeted) technologies
may have a competitive edge, and there will be no "Ethernet advantage."

Hugh.

Arthur Marris wrote:

    
Jonathan,
  The presentation you gave in March at the Data Center Ethernet CFI
suggested preemption as an area for exploration.

  Preemption would require a minor change to the PCS to support extra
control codes.

  Supporting preemption seems like a worthwhile objective as every
microsecond is precious in cluster computing.

Arthur.