I concur with Petar. At Intel, we consider the Top 500 supercomputers
to be a bellwether for future commercial performance compute
cluster trends. Infiniband has been growing at a faster rate than
Ethernet in the Top 500 supercomputers partly because of lower latency
and partly because of higher bandwidth/$ compared to 10GbE in the recent
past. Instead of giving up the large commercial server cluster
interconnect market to Infiniband, 802.3 should define standards that
allow Ethernet to be more competitive. A low-cost 100m cable spec for
40/100G is one of the required elements.
Rob Hays
Intel Corporation
------------------------------------------------------------------------
*From:* Petar Pepeljugoski [mailto:petarp@xxxxxxxxxx]
*Sent:* Friday, July 11, 2008 11:23 AM
*To:* STDS-802-3-HSSG@xxxxxxxxxxxxxxxxx
*Subject:* Re: [802.3BA] XR ad hoc Phone Conference Notice
Paul,
I can't disagree more with you. If you go to top500.org, and scroll down
to "Other highlights from the latest list", the last bullet says:
"The last system on the list would have been listed at position 200 in
the previous TOP500 just six months ago. This is the largest turnover
rate in the 16-year history of the TOP500 project."
So if the #200 from the list from six months ago is now listed as 500,
that means 300 NEW machines made it to the top 500 list in just six
months. This is a 60% change in the list.
Contrary to your claim, this is a look at the future.
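The turnover arithmetic above can be checked with a minimal sketch. It assumes, as the argument does, that every slot between the old rank of the last-place system and the end of the list was taken by a machine new to the list; the function name is hypothetical.

```python
def turnover_fraction(list_size: int, previous_rank_of_last: int) -> float:
    """Fraction of the list that is new, given the rank the current
    last-place system would have held on the previous list."""
    # Assumption: the systems that pushed the last entry down from its
    # previous rank to the bottom of the list are all new entries.
    new_entries = list_size - previous_rank_of_last
    return new_entries / list_size


# Figures quoted from top500.org: 500 entries, last system previously at 200.
fraction = turnover_fraction(500, 200)
print(f"{fraction:.0%} of the list is new")
```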
Regards,
Peter
Petar Pepeljugoski
IBM Research
P.O.Box 218 (mail)
1101 Kitchawan Road, Rte. 134 (shipping)
Yorktown Heights, NY 10598
e-mail: petarp@xxxxxxxxxx
phone: (914)-945-3761
fax: (914)-945-4134
From: Paul Kolesar <PKOLESAR@xxxxxxxxxxxx>
To: STDS-802-3-HSSG@xxxxxxxxxxxxxxxxx
Date: 07/11/2008 01:53 PM
Subject: Re: [802.3BA] XR ad hoc Phone Conference Notice
------------------------------------------------------------------------
Joel,
scanning the top 500 list is a look into the past. How many super
computers are produced each year? Probably a few tens. That means this
list is a compilation of more than 10 years' worth of super computers.
InfiniBand only arrived on the scene in more recent years, and really
started to gain some market in more recent years than that. Yet it
already represents a quarter of the super computing links. As proof of
the interconnect market share trends see
http://www.top500.org/overtime/list/31/conn
Here you will observe that InfiniBand arrived on super computing
machines in 2003 and since 2006 has been the fastest growing
interconnect protocol, at the expense of almost every other protocol,
including leveling out the spectacular growth of GbE.
The 802.1 activity is more proof that this issue needs to be addressed.
The degree of continued shift towards InfiniBand will likely depend on
the success of the 802.1 efforts.
My statement was that the latency issue _could_ drive this continued
trend, not that it absolutely would. I stand by my statement as I see
no evidence that refutes it, only evidence that supports it.
As I responded earlier, I am not advocating a change to the 100m
objective. But I will find it very difficult to support a baseline that
does not include a longer reach than this with a low-cost solution.
Regards,
Paul Kolesar
CommScope Inc.
Enterprise Solutions
1300 East Lookout Drive
Richardson, TX 75082
Phone: 972.792.3155
Fax: 972.792.3111
eMail: pkolesar@xxxxxxxxxxxxx
*Joel Goergen <joel@xxxxxxxxxxxxxxx>*
07/11/2008 11:32 AM
Please respond to
joel@xxxxxxxxxxxxxxx
To
STDS-802-3-HSSG@xxxxxxxxxxxxxxxxx
cc
Subject
Re: [802.3BA] XR ad hoc Phone Conference Notice
Paul,
I have an issue with the following statement:
“Indeed, given that latency is a major performance concern for HPC, the
vendors of such machines may prefer to use InfiniBand. This could mean
that one of the primary customers to which we have tuned our present
objective will actually not use Ethernet, but will benefit anyway by
driving InfiniBand to adopt the same 100m PMD specs that 802.3ba defines.”
I reviewed the top 500 org website for their June 2008 report on the top
500 supercomputers. Ethernet at this point has a 56.8% share of the
interconnects. See _http://www.top500.org/stats/list/31/connfam_.
Infiniband has 24.20%. So I believe that this demonstrates that this
area will use Ethernet. Next, in regards to latency, there is the Data
Center Bridging Task Group in 802.1 that is working on this.
Therefore, I do not agree with the statement that “one of the primary
customers to which we have tuned our present objective will actually not
use Ethernet.”
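To put the quoted shares in terms of machines, here is a small sketch that converts the June 2008 percentages into system counts. It assumes the percentages are shares of the 500 listed systems (not of aggregate performance); the helper name is hypothetical.

```python
LIST_SIZE = 500  # number of systems on the TOP500 list

# Interconnect family shares quoted above for the June 2008 list, in percent.
shares_percent = {"Ethernet": 56.8, "InfiniBand": 24.2}


def systems_from_share(share_percent: float, list_size: int = LIST_SIZE) -> int:
    """Convert a percentage share of the list into a whole number of systems."""
    return round(share_percent / 100 * list_size)


counts = {name: systems_from_share(pct) for name, pct in shares_percent.items()}
for name, count in counts.items():
    print(f"{name}: {count} systems")
```

Under that assumption, Ethernet accounts for more than twice as many listed systems as InfiniBand.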
thanks
-joel
Paul Kolesar wrote:
Steve,
thanks for furthering the discussion. Your views make sense to me.
I'd like to examine the super computer cabling distance distribution
that Petar shared with us yesterday in a bit more detail. I've plotted
it to allow folks to see it in graphical form.
This data has several features that are remarkably similar to that of
general data center cabling.
1) The distribution is highly skewed towards the shorter end of the
distribution.
2) The distribution has a very long tail relative to the position of the
mode, the most frequent length, at 20m.
3) The mode is at a distance that is one fifth of the maximum length.
The white dot on the graph marks the point of equivalent coverage: the
length that covers the same share of HPC channels as the 100m objective
covers in the data center cabling distribution. Speaking to Steve's point that questions the correctness
of the 100m objective for HPC environments, I would venture to say that
a 25m objective, which is roughly equivalent in coverage to the 100m
objective we are attempting to apply to data centers, would not be
satisfactory for the HPC environment, as it would leave a significant
portion of the channels without a low-cost solution.
It is clear that the 100m objective is a near-perfect match to the needs
of HPC. Yet I do not believe that HPC should be the primary focus of
our development. We must be developing a solution that properly
satisfies a much larger market than this or we are wasting our time.
Indeed, given that latency is a major performance concern for HPC, the
vendors of such machines may prefer to use InfiniBand. This could mean
that one of the primary customers to which we have tuned our present
objective will actually not use Ethernet, but will benefit anyway by
driving InfiniBand to adopt the same 100m PMD specs that 802.3ba
defines. This possibility reinforces my perspective that we need to
properly address a broader set of customers - those that operate in the
general data center environment. It is clear from all of the data and
surveys that remaining only with a 100m solution misses the mark for
this broader market. Continuing under this condition will mean that the
more attractive solution for links longer than 100m in the general data
center will be to deploy link aggregated 10GBASE-SR. Its cost will be
on par and it will reach the distances the customers need in their data
centers.
Is this the future you want for all our efforts, or do you want to face
the facts and address the issue head on with a solution that gives data
center customers what they need?
Next week these decisions will be placed before the Task Force. I hope
we choose wisely.
Regards,
Paul Kolesar
CommScope Inc.
Enterprise Solutions
1300 East Lookout Drive
Richardson, TX 75082
Phone: 972.792.3155
Fax: 972.792.3111
eMail: pkolesar@xxxxxxxxxxxxx
*"Swanson, Steven E" <SwansonSE@xxxxxxxxxxx>*
07/11/2008 07:32 AM
To
PKOLESAR@xxxxxxxxxxxx, STDS-802-3-HSSG@xxxxxxxxxxxxxxxxx
cc
Subject
RE: [802.3BA] XR ad hoc Phone Conference Notice
All,
I think Paul's suggestion is a good one; I would like to add some other
input (in the form of questions) from my point of view:
*1. Do we have the right MMF objective (support at least 100m on OM3
fiber)?*
My data suggests that we don't; we have tried to come at this from two
different directions, trying to be as unbiased as possible in assessing
the situation. I presented Corning sales data in November 2006 (see
_http://www.ieee802.org/3/hssg/public/nov06/swanson_01_1106.pdf_). This
data showed a need to support a link length longer than 100m and I
recommended that we support 200m at that time.
We also polled our customers, offering three options, a low cost, single
PMD at 100m on OM3, a slightly higher cost single PMD at 150-200m on
OM3, and a third option that would specify two PMDs consisting of both
option 1 and option 2. The results were overwhelmingly in favor of
Option 2, a single PMD at longer length. A small number supported Option
3 (2 PMDs) but NONE supported Option 1. While it is true that many of
our customers have a substantial portion of their link lengths that are
less than 100m, they all have link lengths longer than 100m. One
customer noted that more than half of his data center had link lengths
longer than 100m.
Kolesar presented his company's sales data in September 2006 (see
_http://www.ieee802.org/3/hssg/public/sep06/kolesar_01_0906.pdf_). His
data also suggested that longer link lengths were needed and he
recommended 150m at that time.
All the data for datacenter seems to suggest that 100m is TOO SHORT to
cover a significant portion of the datacenter application.
Pepeljugoski presented new data yesterday on HPC link lengths that show
85% being less than 20m and 98% less than 50m. This might suggest that
100m is TOO LONG for HPC applications. This leads to another question of
whether there is any economic or technical advantage to a shorter MMF
objective for HPC?
*2. Is there consensus on supporting a longer reach objective for MMF?*
I think there is; others on the call yesterday did not. I base my
opinion on the straw poll conducted in Munich:
Straw Poll #15: Should we continue to work on a proposal for an annex to
extend the reach of a 40GBASE-SR4 and 100GBASE-SR10 in addition to the
proposal (“pepeljugoski_01_0508.pdf”) as in “jewell_01_0508.pdf”?
Yes: 55
No: 3
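As a rough check of how the straw poll compares with the 75% approval level discussed in this thread, here is a minimal sketch. It assumes abstentions are not counted (none are recorded above) and that a non-binding straw poll can be compared with a formal voting threshold for illustration only.

```python
APPROVAL_THRESHOLD = 0.75  # the 75% level discussed in this thread


def approval_share(yes: int, no: int) -> float:
    """Share of Yes votes among Yes and No votes; abstentions are ignored."""
    return yes / (yes + no)


share = approval_share(yes=55, no=3)
meets_threshold = share >= APPROVAL_THRESHOLD
print(f"Yes share: {share:.1%}, at or above 75%: {meets_threshold}")
```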
*3. Could we achieve 75% support for adding a new MMF objective?*
I don't know but if we could not, I would be forced to vote against
adopting the current MMF baseline proposal (which I don't want to do)
and I think others may also. This may or may not lead to an impasse
similar to what we experienced in 802.3ae.
I understand the concern that adding the objective without a clear
consensus on how to support the new objective could lead to delay but I
have found this committee to be very resourceful in driving to a
solution after we have made a decision to go forward. 40G is one recent
example of a situation where no consensus turned very quickly to consensus.
I think adding a new objective is the right approach and in the long run
will save the task force valuable development time.
*4. Can we agree on the right assumptions on the 10G model to evaluate
the various proposals?*
Everyone seems to be using slightly different variations of the model to
evaluate the capability of the proposal; we need to agree on a common
approach of analysis.
*5. Can we not let the discussion on OM4 cloud the decision?*
We can get extended link lengths on OM3. By achieving longer lengths on
OM3, even longer lengths will be possible on OM4 with the same
specification. What I don't want people to think is that OM4 is required
to get longer lengths.
*6. Summary*
John D'Ambrosia has provided advice that if we want to move forward with
a new MMF objective, July is the time to do it - if we delay the
decision, it is guaranteed to delay the overall process. Some might
think if we make the decision, it will delay the overall process but we
don't know that yet. I don't think adding an informative specification
on a PMD is the right way to go - let's get the MMF objective(s) right -
we owe it to ourselves and to our customers. To do anything less is just
avoiding the issue. Let's get the objectives set, get the assumptions
correct and utilize the process set up by Petrilla and Barbieri to drive
toward the hard decisions that we are all very capable of making.
Sincerely,
Steve Swanson
------------------------------------------------------------------------
*From:* Paul Kolesar [mailto:PKOLESAR@xxxxxxxxxxxxx]
*Sent:* Thursday, July 10, 2008 7:19 PM
*To:* STDS-802-3-HSSG@xxxxxxxxxxxxxxxxx
*Subject:* Re: [802.3BA] XR ad hoc Phone Conference Notice
Alessandro,
I'd like to continue your thread with some observations that have driven
me to certain conclusions, and to follow that with a suggestion about
how to parse the approach and drive to a consensus position.
First let's consider what various customers are telling us. The Corning
survey of their customers, which has been presented to the Ethernet
Alliance, the XR ad-hoc, and will be presented next week to 802.3ba,
shows that the large majority of customers want a single PMD solution
that can provide 150m on OM3 and 250m on OM4. A minority were willing
to accept a two PMD solution set that delivers the lowest cost PMD to
serve up to 100 m and a second PMD to serve the extended distances as
above. Not a single response indicated a preference for a solution
limited to 100m. We also hear strongly expressed opinions from various
system vendors that a longer distance solution is not acceptable if it
raises cost or power consumption of the currently adopted 100m PMD.
Under these conditions, and given the options presented and debated
within the XR ad-hoc, I believe you are justified in concluding that a
single PMD cannot satisfy all these constraints. Yet it is clear to me
that the market will demand a low-cost PMD that can support more than
100m to fulfill the distance needs of data centers. Therefore I
conclude that the correct compromise position is to develop a two-PMD
solution. If the committee does not undertake this development, it is
likely that several different proprietary solutions will be brought to
the market, with the net result of higher overall cost structures.
So let's consider how to choose from among the various proposals for an
extended reach PMD and let the determination of how to document it
within the standard be addressed after that.
I would propose a series of polls at next week's meeting designed to
gauge the preferences of the Task Force. I do not think that any XR
proposal will garner >75% at the outset, so I would propose the use of
Chicago rules wherein members may vote for all the proposals they find
acceptable. From this we can see which of the solutions is least
acceptable. Then through a process of elimination from the bottom, and
repeated application of Chicago rules for the remainder, finally
determine the most acceptable solution.
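The polling procedure described above can be sketched as follows. This is only an illustration: the proposal names, the voter rankings, and the rule for how voters re-vote in each round are all hypothetical, and ties are broken alphabetically as an arbitrary assumption.

```python
from collections import Counter


def run_chicago_rounds(proposals, poll):
    """Poll repeatedly under Chicago rules, dropping the least-approved proposal.

    proposals: iterable of proposal names.
    poll: callable that takes the set of remaining proposals and returns one
          set of approved proposals per voter (a voter may approve several).
    Returns (winner, history), where history holds each round's tally.
    """
    remaining = set(proposals)
    history = []
    while len(remaining) > 1:
        tally = Counter({p: 0 for p in remaining})
        for ballot in poll(remaining):
            tally.update(ballot & remaining)
        history.append(dict(tally))
        # Eliminate from the bottom; ties broken alphabetically (assumption).
        loser = min(sorted(remaining), key=lambda p: tally[p])
        remaining.remove(loser)
    return remaining.pop(), history


# Hypothetical voters, each with a ranking of three made-up proposals.
rankings = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "B", "A"],
    ["B", "A", "C"],
]


def example_poll(remaining):
    # Assumed behavior: approve the top two choices while more than two
    # proposals remain, otherwise only the single preferred one.
    k = 2 if len(remaining) > 2 else 1
    return [set([p for p in r if p in remaining][:k]) for r in rankings]


winner, history = run_chicago_rounds(["A", "B", "C"], example_poll)
print(winner, history)
```

The sketch only identifies the most acceptable proposal; whether that proposal then reaches >75% would still need a separate vote.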
Depending on the degree of maturity of the specifications or other
considerations for the chosen solution, the Task Force will be better
able to determine how it should be handled within the standard. For
example, a proposal with a maturity on par with the adopted baseline
could be put forth under a new objective without undue concern of
becoming a drag on the timeline, while a proposal of lesser maturity
could be placed in an annex without an additional objective.
Regards,
Paul Kolesar
CommScope Inc.
Enterprise Solutions
1300 East Lookout Drive
Richardson, TX 75082
Phone: 972.792.3155
Fax: 972.792.3111
eMail: pkolesar@xxxxxxxxxxxxx
*"Alessandro Barbieri (abarbier)" <abarbier@xxxxxxxxx>*
07/10/2008 04:43 PM
Please respond to
"Alessandro Barbieri (abarbier)" <abarbier@xxxxxxxxx>
To
STDS-802-3-HSSG@xxxxxxxxxxxxxxxxx
cc
Subject
Re: [802.3BA] XR ad hoc Phone Conference Notice
Matt,
here is my *personal* read of the situation in the XR ad hoc:
a) I think there could be consensus on supporting XR, as long as we pick
a solution that does not impact the cost structure of the 100m PMD.
Because of that I also don't feel a single PMD is realistic at this point.
b) The trouble, however, is that there is no consensus (>75%) on any of
the technical proposals. No one proposal has a clear lead over the others.
Of the three options you list below, I think adding an objective for a
ribbon XR PMD could have a major impact on the project schedule, because
it seems we are nowhere near technical consensus. We could drag the
discussion out for several TF meetings... I am not sure delaying the project
over this specific topic is worth it.
We can always resort to non-standard solutions to fulfill market
requirements we can't address within IEEE, or come back in the future
with another CFI.
At the end of the conference call earlier today I requested that we get
together after hours next week to see if we can accelerate consensus
building.
All the data is on the table now, so if we don't show any material
progress, I am not sure we should extend this ad hoc.
Alessandro
------------------------------------------------------------------------
*From:* Matt Traverso [mailto:matt.traverso@xxxxxxxxxx]
*Sent:* Thursday, July 10, 2008 10:07 AM
*To:* STDS-802-3-HSSG@xxxxxxxxxxxxxxxxx
*Subject:* Re: [802.3BA] XR ad hoc Phone Conference Notice
Colleagues,
I feel that we are coming to a situation similar to the impasse at 40G
vs. 100G where different participants call different segments of the
networking industry their customer.
For MMF, I'd like to see an optimized solution at 100m per all of the
work that has been done.
I'd like to understand if folks feel that a different status for the
extended reach
a) Informative
b) Normative
c) New objective
would significantly alter the technically proposed solution from the Ad
Hoc. Opinions?
Chris,
The case of slow market/industry transition from LX4 to LRM is one of
the reasons why I would like to see the industry adopt 40G serial from
the launch. The slow adoption of LRM has primarily been limited by end
customer knowledge of the solution. 40G serial technology is available.
thanks
--matt
Hi Gourgen,
Some numbers might help clarify what close to 0 means.
For 2008, Lightcounting gives a shipment number of approximately 30,000
for 10GE-LRM (and for 10GE-LX4 it's about 60,000.) So close to 0 would
apply if we were rounding to the nearest 100K. As an aside, 10GE-LRM
supports 220m of MMF, not 300m.
300m of OM3 is supported by 10GE-SR, which Lightcounting gives as
approximately 400,000 in 2008, so that would be close to 0 if we were
rounding to the nearest 1M.
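The rounding argument can be made concrete with a short sketch using the approximate 2008 shipment figures quoted above. The helper name is hypothetical and halves are rounded up.

```python
# Approximate 2008 unit shipments quoted above (attributed to Lightcounting).
shipments_2008 = {"10GE-LRM": 30_000, "10GE-LX4": 60_000, "10GE-SR": 400_000}


def round_to_nearest(value: int, step: int) -> int:
    """Round a non-negative integer to the nearest multiple of step."""
    return (value + step // 2) // step * step


# 10GE-LRM only looks like 0 when rounding to the nearest 100K.
print(round_to_nearest(shipments_2008["10GE-LRM"], 100_000))
# 10GE-SR only looks like 0 when rounding to the nearest 1M.
print(round_to_nearest(shipments_2008["10GE-SR"], 1_000_000))
# Ratio of 10GE-LX4 to 10GE-LRM volume.
print(shipments_2008["10GE-LX4"] / shipments_2008["10GE-LRM"])
```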
Another interesting sideline in looking at these numbers is that 2 years
after the 10GE-LRM standard was adopted in 2006, despite the huge
investment being made in 10GE-LRM development, and despite very little
new investment being made in 10GE-LX4, the 10GE CWDM equivalent (i.e.
10GE-LX4, 4x3G) is chugging along at 2x the volume of the 10GE Serial
solution that was adopted to replace it.
This should put a damper on hopes that very low cost 40GE Serial
technology can be developed from scratch in two years and ship in volume
when the 40GE standard is adopted in 2010.
Chris
------------------------------------------------------------------------
*From:* Gourgen Oganessyan [mailto:gourgen@xxxxxxxxxxx]
*Sent:* Wednesday, July 09, 2008 8:02 PM
*To:* STDS-802-3-HSSG@xxxxxxxxxxxxxxxxx
*Subject:* Re: [802.3BA] XR ad hoc Phone Conference Notice
Petar,
Well, sadly that's what has been happening in the 10G world, people are
forced to amortize the cost of 300m reach (LRM), while in reality the
number of people who need 300m is close to 0.
That's why I am strongly in support of your approach of keeping the 100m
objective as primary goal.
Frank, OM4 can add as much cost as it wants to, the beauty is the added
cost goes directly where it's needed, which is the longer links.
Alternatives force higher cost/higher power consumption on all ports
regardless of whether it's needed there or not.
*Gourgen Oganessyan*
*Quellan Inc.*
Phone: (630)-802-0574 (cell)
Fax: (630)-364-5724
e-mail: gourgen@xxxxxxxxxxx
------------------------------------------------------------------------
*From:* Petar Pepeljugoski [mailto:petarp@xxxxxxxxxx]
*Sent:* Wednesday, July 09, 2008 7:51 PM
*To:* STDS-802-3-HSSG@xxxxxxxxxxxxxxxxx
*Subject:* Re: [802.3BA] XR ad hoc Phone Conference Notice
Frank,
If I interpret correctly, you are saying that all users should amortize
the cost of very few who need extended reach.
We need to be careful how we proceed here - we should not repeat the
mistakes of the past if we want a successful standard.
Regards,
Peter
Petar Pepeljugoski
IBM Research
P.O.Box 218 (mail)
1101 Kitchawan Road, Rte. 134 (shipping)
Yorktown Heights, NY 10598
e-mail: petarp@xxxxxxxxxx
phone: (914)-945-3761
fax: (914)-945-4134
From: Frank Chang <ychang@xxxxxxxxxxx>
To: STDS-802-3-HSSG@xxxxxxxxxxxxxxxxx
Date: 07/09/2008 10:29 PM
Subject: Re: [802.3BA] XR ad hoc Phone Conference Notice
------------------------------------------------------------------------
Hi Jeff;
Thanks for your comment. You missed one critical point: there is a
cost increase from OM3 to OM4. If you take ribbon cable cost into
perspective, the OM4 option is possibly the costliest of the 4 options.
Besides, the use of OM4 requires tightening the TX specs, which impacts TX
yield, so you are actually compromising the primary goal.
Frank
------------------------------------------------------------------------
*From:* Jeff Maki [mailto:jmaki@xxxxxxxxxxxx]
*Sent:* Wednesday, July 09, 2008 7:02 PM
*To:* STDS-802-3-HSSG@xxxxxxxxxxxxxxxxx
*Subject:* Re: [802.3BA] XR ad hoc Phone Conference Notice
Dear MMF XR Ad Hoc Committee Members,
I believe our current objective of "at least 100 meters on OM3 MMF"
should remain as a primary goal, the baseline. Support for any form of
extended reach should be considered only if it does not compromise this
primary goal. A single PMD for all reach objectives is indeed a good
starting premise; however, it should not be paramount. In the following
lists are factors, enhancements, or approaches I would like to put
forward as acceptable and not acceptable for obtaining extended reach.
Not Acceptable:
1. Cost increase for the baseline PMD (optic) in order to obtain greater
than 100-meter reach
2. EDC on the system/host board in any case
3. CDR on the system/host board as part of the baseline solution
4. EDC in the baseline PMD (optic)
5. CDR in the baseline PMD (optic)
Acceptable:
1. Use of OM4 fiber
2. Process maturity that yields longer reach with no cost increase
In summary, we should not burden the baseline solution with cost
increases to meet the needs of an extended-reach solution.
Sincerely,
Jeffery Maki
————————————————
Jeffery J. Maki, Ph.D.
Principal Optical Engineer
Juniper Networks, Inc.
1194 North Mathilda Avenue
Sunnyvale, CA 94089-1206
Voice +1-408-936-8575
FAX +1-408-936-3025
www.juniper.net
jmaki@xxxxxxxxxxx
————————————————