Gary,

You say you're seeing promising results from simulations but you're not ready to share the data. I certainly hope that will change before the presentation deadline for the July meeting in 4 weeks. I don't mean to pick on you but you seem to be the only one that is taking up the flag AND at least suggesting that there is simulation data to back up your claims.

As chair of this group, I'm trying to stir up discussion in order to get all the arguments on the table. If there are flaws in these arguments (the "gospels" as you call them) and the exploitation of these flaws has broad market potential and is both technically and economically feasible, then we need to get this information disseminated as soon as possible. I don't think we can try to go through the July meeting without this material and expect to get a continuation of this study group.

Regards,
Ben

McAlpine, Gary L wrote:

Norm,

I agree with you on many of your points below. A higher granularity of "flow" than 8 priorities is needed to get any significant improvement across multiple stages of switching. I know I'm being vague about the exact granularity of "flow" on which I want to exert targeted influence (rate control/backpressure). It's not because I don't know; it's because any discussions on the subject without data to back the proposals will "simply" turn into a big rathole. I am busy developing the data.

I understand all your arguments below. I've been listening to the same ones for the last 15 years and, until a few years ago, treating them as the gospel. It wasn't until I set out to thoroughly understand the gory details through simulations that I realized there were some interesting flaws in the "old" assumptions that can be very effectively exploited in confined networks such as multi-stage cluster interconnects.

I guess I don't see such a clear boundary of responsibility between 802.1 and 802.3 as you. I think it's an IEEE problem.
And since the target link technology is Ethernet, then the focus should be on the 802.3 support required to enable acceptable Ethernet based solutions. I think 802.1 needs to be part of a complete solution, but only to the extent of including support for the 802.3 mechanisms.

Gary

-----Original Message-----
From: owner-stds-802-3-cm@LISTSERV.IEEE.ORG [mailto:owner-stds-802-3-cm@LISTSERV.IEEE.ORG] On Behalf Of Norman Finn
Sent: Wednesday, June 09, 2004 2:33 AM
To: STDS-802-3-CM@LISTSERV.IEEE.ORG
Subject: Re: [8023-CMSG] Server/NIC analogy

Gary,

McAlpine, Gary L wrote:
> > I think this discussion is off on a tangent.

One can reasonably claim that you're the one who's off on a tangent. One man's tangent is another man's heart of the argument. You keep saying, "we're just ..." and "we're only ..." and "we're simply ..." and failing to acknowledge our "but you're ..." arguments. Specifically:

You want back pressure on some level finer than whole links. The heart of the argument, that you are not addressing in your last message, is, "On exactly what granularity do you want to exert back pressure?" The answer to that question is, inevitably, "flows". (I have no problem that "flows" are relatively undefined; we dealt with that in Link Aggregation.) Per-flow back pressure is the "but you're ..." argument. Hugh Barrass's comments boil down to exactly this point.

You want to have per-flow back pressure. The "per-something Pause" suggestions have mentioned VLANs and priority levels as the granularity. The use of only 8 priority levels, and thus only 8 flows, is demonstrably insufficient in any system with more than 9 ports.

For whatever granularity you name, you require at least one queue in each endstation transmitter for each flow in which that transmitter participates. Unfortunately, this o(n) problem in the endstations is an o(n**2) problem in the switch.
A simple-minded switch architecture requires one queue per flow on each inter-switch trunk port, which means o(n**2) queues per trunk port. The construction of switches to handle back-pressured flows without requiring o(n**2) queues per inter-switch port has been quite thoroughly explored by ATM and Fibre Channel, to name two. It is *not* an easy problem to solve.

At the scale of one switch, one flow per port, and only a few ports, as Ben suggests, it is easy and quite convenient to ignore the o(n**2) factor, and assume that the per-link back pressure protocol is the whole problem. Unfortunately, as you imply in your e-mail below, the trivial case of a one-switch "network" is insufficient. As soon as you scale the system up to even "a few hops", as you suggest, the number of ports has grown large enough to stress even a 12-bit tag per link. Furthermore, to assume that a given pair of physical ports will never want to have multiple flows, e.g. between different processes in the CPUs, is to deny the obvious.

In other words, implementing per-flow back pressure, even in networks with a very small number of switches, very quickly requires very sophisticated switch architectures.

For a historical example, just look at Fibre Channel. It started with very similar goals, and very similar scaling expectations, to what you're talking about, here. (The physical size was different because of the technology of the day, but the number of ports and flows was quite similar.) Fibre Channel switches are now quite sophisticated, because the problem they are solving becomes extraordinarily difficult even for relatively small networks.

Summary: This project, as described by its proponents, is per-flow switching. It is not the job of 802.3 to work on switching based even on MAC address, much less per-flow switching. It is essential that anyone who desires to work on per-flow switching in 802 or any forum become familiar with what the real problems are, and what solutions exist.
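
As a rough illustration of the scaling argument above, the queue counts can be sketched in a few lines of Python. This is only a back-of-the-envelope model under assumptions that are not spelled out in the thread: exactly one flow per ordered pair of endstation ports, and a worst case in which every such flow may cross a given trunk port. The function names are hypothetical and exist only for this sketch.

```python
def endstation_queues(n_ports: int) -> int:
    """Queues one endstation transmitter needs: one per peer port (o(n))."""
    return n_ports - 1


def trunk_queues_worst_case(n_ports: int) -> int:
    """Upper bound on queues per inter-switch trunk port (o(n**2)).

    Assumes one flow per ordered (source, destination) pair and that
    any of those flows may traverse the trunk.
    """
    return n_ports * (n_ports - 1)


def fits_in_tag(n_flows: int, tag_bits: int) -> bool:
    """True if every flow can get a distinct value of a tag_bits-wide tag."""
    return n_flows <= 2 ** tag_bits


if __name__ == "__main__":
    for n in (4, 9, 10, 64, 65):
        flows = trunk_queues_worst_case(n)
        print(
            f"ports={n:3d}  endstation queues={endstation_queues(n):3d}  "
            f"trunk queues={flows:5d}  fits 8 priorities={endstation_queues(n) <= 8}  "
            f"fits 12-bit tag={fits_in_tag(flows, 12)}"
        )
```

Under these assumptions, 8 priority levels stop being enough once there are more than 9 ports (each transmitter then has more than 8 peers), and a 12-bit tag (4096 values) is exhausted at 65 ports even with a single flow per port pair. Allowing multiple flows per port pair only makes the numbers worse.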

-- Norm

> ... There are assumptions being made here that are off-base. We need to focus our attention on what it is we are trying to enable with new standards. (My numbered items are responses to Hugh's numbered items.)

1. If what we are trying to enable are single stage interconnects for backplanes, then wrt the IEEE standards, we're done. We just need to get good implementations of NICs and switches using 802.3x (rate control, not XON/XOFF) to meet the requirements (e.g. good enough throughput, low latency, low latency variation, no loss due to congestion). But ... single stage interconnects are not very interesting to people who want to construct larger interconnects to tie multiple racks with multiple shelves of blades together into a single system.

2. (Putting on my server hat) We're NOT asking for IEEE to provide end-to-end congestion management mechanisms. If IEEE can simply standardize some tweaks to the current 802.3 (& 802.1) standards to support better congestion visibility at layer 2 and better methods of reacting to congestion at layer 2 (more selective rate control and no frame drops), then the rest can be left up to the upper layers. There are methods that can be implemented in layer 2 that don't prohibit scalability. Scalability may be limited to a few hops, but that is all that is needed.

3. The assumption in item 3 is not entirely true. There are relationships (that can be automatically discovered or configured) that can be exploited for significantly improved layer 2 congestion control.

4. For backpressure to work, it neither requires congestion to be pushed all the way back to the source nor does it require the backpressuring device to accurately predict the future. From the layer 2 perspective, the source may be a router. So back pressure only needs to be pushed up to the upper layers (which could be a source endpoint or a router).
Also, the backpressuring device simply needs to know its own state of congestion and be able to convey clues to that state to the surrounding devices. We don't need virtual circuits to be supported at layer 2 to get "good enough" congestion control.

5. From an implementation perspective, I believe the queues can go either in the MAC or the bridge, depending on the switch implementation. (Am I wrong? I haven't seen anything in the interface between the bridge and the MAC that would force the queues to be in the bridge.) IMO, where they go should NOT be dictated by either 802.1 or 802.3. The interface between the bridge and MAC should be defined to enable the queues to be placed where most appropriate for the switch architecture. In fact, a switch could be implemented such that frame payloads bypass the bridge and the bridge only deals with the task of routing frame handles from MAC receivers to one or more MAC transmitters. (Do the 802.1 standards prevent such a design?)

As far as the IETF standards go, they don't seem to rely on layer 2 to drop frames (although we don't yet have a clear answer on this). If a router gets overwhelmed, it will drop packets. But if it supports ECN, it can start forwarding ECN notices before becoming overwhelmed. I think the jury is still out on whether the upper layers (in a confined network) would work better with layer 2 backpressure or layer 2 drops.

From a datacenter server perspective, there is no doubt in my mind that backpressure would be preferable to drops.

Gary

--
-----------------------------------------
Benjamin Brown
178 Bear Hill Road
Chichester, NH 03258
603-491-0296 - Cell
603-798-4115 - Office
benjamin-dot-brown-at-ieee-dot-org
(Will this cut down on my spam???)
-----------------------------------------