Re: [8023-CMSG] Server/NIC analogy
Hugh, Jonathan, Ben,
I think this discussion is off on a tangent. There are assumptions being
made here that are off-base. We need to focus our attention on what it
is we are trying to enable with new standards. (My numbered items are
responses to Hugh's numbered items.)
1. If what we are trying to enable is single-stage interconnects for
backplanes, then wrt the IEEE standards, we're done. We just need to get
good implementations of NICs and switches using 802.3x (rate control,
not XON/XOFF) to meet the requirements (e.g. good enough throughput, low
latency, low latency variation, no loss due to congestion). But ...
single-stage interconnects are not very interesting to people who want
to construct larger interconnects that tie multiple racks with multiple
shelves of blades together into a single system.
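For reference, 802.3x PAUSE (Annex 31B) already carries a timer, which
is what makes timed rate control possible rather than bare XON/XOFF
signaling. A rough sketch of the frame layout in C (struct and field
names are mine; the on-wire values are from the standard):

  #include <stdint.h>

  #define MAC_CONTROL_ETHERTYPE 0x8808  /* MAC Control frame */
  #define PAUSE_OPCODE          0x0001  /* PAUSE operation   */

  struct pause_frame {
      uint8_t  dest[6];     /* 01-80-C2-00-00-01 reserved multicast   */
      uint8_t  src[6];      /* station sending the PAUSE              */
      uint16_t ethertype;   /* 0x8808, network byte order on the wire */
      uint16_t opcode;      /* 0x0001 = PAUSE                         */
      uint16_t pause_time;  /* quanta of 512 bit times; 0 cancels     */
      uint8_t  pad[42];     /* pad to minimum frame size (FCS follows)*/
  };

By modulating pause_time (and how often PAUSEs are sent), a receiver
can meter its link partner's rate instead of just toggling it on and
off.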
2. (Putting on my server hat) We're NOT asking for IEEE to provide
end-to-end congestion management mechanisms. If IEEE can simply
standardize some tweaks to the current 802.3 (& 802.1) standards to
support better congestion visibility at layer 2 and better methods of
reacting to congestion at layer 2 (more selective rate control and no
frame drops), then the rest can be left up to the upper layers. There
are methods that can be implemented in layer 2 that don't prohibit
scalability. Scalability may be limited to a few hops, but that is all
that is needed.
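To make "more selective rate control" concrete, here is one
hypothetical shape such a tweak could take - a PAUSE-like frame with a
per-priority enable vector and per-priority timers, so a congested port
can throttle only the offending traffic class. This is purely
illustrative; no such 802.3 frame exists today:

  #include <stdint.h>

  /* Hypothetical per-priority pause - an illustration of selective
   * rate control, NOT a standardized frame format. */
  struct selective_pause {
      uint8_t  dest[6];
      uint8_t  src[6];
      uint16_t ethertype;      /* would be a MAC Control type      */
      uint16_t opcode;         /* would need a new opcode          */
      uint8_t  class_enable;   /* bit i set => pause_time[i] valid */
      uint16_t pause_time[8];  /* one timer per 802.1p priority    */
  };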
3. The assumption in item 3 is not entirely true. There are
relationships (that can be automatically discovered or configured) that
can be exploited for significantly improved layer 2 congestion control.
4. For backpressure to work, it requires neither that congestion be
pushed all the way back to the source nor that the backpressuring
device accurately predict the future. From the layer 2 perspective,
the source may be a router, so backpressure only needs to be pushed up
to the upper layers (which could be a source endpoint or a router).
Also, the backpressuring device simply needs to know its own state of
congestion and be able to convey clues about that state to the
surrounding devices. We don't need virtual circuits to be supported at
layer 2 to get "good enough" congestion control.
5. From an implementation perspective, I believe the queues can go
either in the MAC or the bridge, depending on the switch implementation.
(Am I wrong? I haven't seen anything in the interface between the bridge
and the MAC that would force the queues to be in the bridge.) IMO, where
they go should NOT be dictated by either 802.1 or 802.3. The interface
between the bridge and MAC should be defined so that the queues can be
placed where most appropriate for the switch architecture. In fact, a
switch could be implemented such that frame payloads bypass the bridge
and the bridge only deals with the task of routing frame handles from
MAC receivers to one or more MAC transmitters (do the 802.1 standards
prevent such a design?).
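To illustrate the handle-routing design I have in mind (all names and
sizes below are mine, a sketch only): the payload stays in shared
buffer memory, and the relay function of the bridge only ever touches a
small descriptor:

  #include <stdint.h>

  struct frame_handle {
      uint32_t buf_index;  /* payload's slot in shared buffer memory */
      uint16_t length;     /* frame length in bytes                  */
      uint8_t  rx_port;    /* ingress MAC                            */
      uint8_t  priority;   /* 802.1p priority, for queue selection   */
  };

  /* Provided by the MAC side: enqueue a handle on a port's TX queue. */
  void tx_enqueue(int port, struct frame_handle h);

  /* Bridge relay: route the handle to the egress MAC(s); the frame
   * payload itself is never copied through the bridge. */
  void relay(struct frame_handle h, uint32_t egress_port_mask)
  {
      for (int port = 0; port < 32; port++)
          if (egress_port_mask & (1u << port))
              tx_enqueue(port, h);
  }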
As far as the IETF standards go, they don't seem to rely on layer 2 to
drop frames (although we don't yet have a clear answer on this). If a
router gets overwhelmed, it will drop packets; but if it supports ECN,
it can start marking forwarded packets with congestion notices before
becoming overwhelmed. I think the jury is still out on whether the
upper layers (in a confined network) would work better with layer 2
backpressure or layer 2 drops. From a datacenter server perspective,
there is no doubt in my mind that backpressure would be preferable to
drops.
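For reference, the ECN mechanics are in RFC 3168: a 2-bit field in the
IP header lets a router mark "Congestion Experienced" on a forwarded
packet instead of dropping it, provided the endpoints have advertised
ECN capability. A quick sketch of the codepoints:

  #include <stdint.h>

  /* ECN codepoints (RFC 3168): the low two bits of the IPv4 TOS /
   * IPv6 Traffic Class byte. */
  #define ECN_NOT_ECT 0x0  /* transport is not ECN-capable  */
  #define ECN_ECT1    0x1  /* ECN-capable transport, ECT(1) */
  #define ECN_ECT0    0x2  /* ECN-capable transport, ECT(0) */
  #define ECN_CE      0x3  /* Congestion Experienced        */

  /* An ECN router marks CE instead of dropping - but only on
   * packets whose sender opted in. */
  static uint8_t mark_congestion(uint8_t tos)
  {
      if ((tos & 0x3) != ECN_NOT_ECT)
          return (uint8_t)(tos | ECN_CE);
      return tos;  /* not ECN-capable: router falls back to drops */
  }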
Gary
-----Original Message-----
From: owner-stds-802-3-cm@LISTSERV.IEEE.ORG
[mailto:owner-stds-802-3-cm@LISTSERV.IEEE.ORG] On Behalf Of Hugh Barrass
Sent: Saturday, June 05, 2004 8:06 AM
To: STDS-802-3-CM@LISTSERV.IEEE.ORG
Subject: Re: [8023-CMSG] Server/NIC analogy
Jonathan and Ben,
Jonathan's summary matches my perception of the problem; please correct
us if we're wrong. So a few points:
1. NIC - switch - NIC is not 1 hop; it is 2. If you say that the
backpressure must only traverse 1 hop, then that precludes a destination
pushing back to the source.
2. On the other hand, I believe that what is being requested is a
mechanism to signal congestion from the destination to the source. It
may be limited to structures with only one bridging element, but I think
it runs into a number of problems. The first (and IMHO most important)
is that such a scheme (effectively) prohibits scalability - there is no
possibility of moving to a multi-stage or multi-path fabric. How could
such a closed system be linked together transparently with another
system?
3. LANs defined by IEEE 802 are (generally) connectionless. This is
particularly the case for 802.3/802.1 "Ethernet networks." This means
that there is no specific relationship between the destination and the
source that can be exploited for such a backpressure mechanism.
4. For backpressure mechanisms to work, they require that the congestion
is pushed back to the source and that the backpressuring device can
accurately predict the future. This second part is difficult to achieve
with current technology... Imagine a situation where device B is
receiving too much traffic from device A. Device B sends a message to
device A to tell it to limit its transmit rate. However, device A is
just about to finish its transmission to device B and has a large
transmission pending for device C - which is currently uncongested - so
the rate limit needlessly penalizes an uncongested path. In the same
network at another time, device B is again receiving too much traffic
from device A, and again tells device A to limit its transmit rate. This
time, device D is just about to start transmitting to device B - causing
exactly the congestion that we tried to avoid. The solution requires
that all devices maintain separate input queues for all sources and
output queues for all destinations. Such a "virtual circuit"
architecture has already been standardized, and I suggest that it would
not be in the best interest of the networking industry to redefine it
inside Ethernet.
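To put some numbers on that queueing burden (a sketch; names and sizes
are illustrative): every device ends up holding a queue per (source,
destination) pair, i.e. O(N^2) state, which is exactly the
virtual-circuit bookkeeping I am referring to:

  #include <stdint.h>

  #define MAX_DEVICES 64
  #define QUEUE_DEPTH 256

  struct queue {
      uint32_t frame_handle[QUEUE_DEPTH];
      uint16_t head, tail;
  };

  /* One queue per (source, destination) pair: O(N^2) state per
   * device - the cost of making backpressure selective in a
   * connectionless LAN. */
  struct queue per_flow[MAX_DEVICES][MAX_DEVICES];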
5. Finally, 802.1 defines queues for LANs; 802.3 does not. The queue
definitions required for any such mechanism would have to be defined for
end-to-end operation and would therefore be out of scope for 802.3.
Given that such a mechanism would operate at the endpoints but might not
have any effect on the intermediate network elements, I think it might
even be out of scope for 802.1 bridging. I suggest that interested
parties might be well advised to consider a definition in IETF (or
elsewhere if appropriate) for the transport layer protocol that includes
congestion management.
Hugh.
Jonathan Thatcher wrote:
>Ben,
>
>If I get this right, you are painting a picture where the switch/bridge
>and the server(s)/NIC(s) are integrated into a single system.
>
>The feedback from the switch/bridge would extend back solely to the
>server(s)/NIC(s).
>
>It is presumed that the NIC already has an implementation specific
>means to throttle the processor. It is presumed that an implementation
>specific means will be created to tie the feedback mechanism to the
>throttle.
>
>It is further presumed that the switch/bridge/line-card can readily
>identify the source(s) of the traffic that are causing congestion.
>
>Finally, it is presumed that the congestion is, in Bob Grow's words,
>transitory. As Hugh implies below, if this problem is a subscription
>problem, then rate limiting is an adequate, if not ideal, solution.
>
>Did I capture this correctly?
>
>Presuming so, you have defined the problem as local to the specific
>system.
>
>You have also taken an interesting twist on Hugh's point of moving the
>"choke point" by putting it back to where it would have gone anyway, to
>the source. In short, as there is only one hop, there is no other place
>for it to migrate to.
>
>This is a curious concept in that there is no communication between
>bridges, nor is there an implied bridge in the NIC. If so, there is no
>question about ownership of the problem :-)
>
>Hmmmmmmmmm.