I concur with Petar. At Intel, we consider the Top 500
supercomputers to be a bellwether for future commercial performance
compute cluster trends. InfiniBand has been growing at a faster rate
than Ethernet in the Top 500 supercomputers, partly because of lower
latency and partly because of higher bandwidth/$ compared to 10GbE in
the recent past. Instead of ceding the large commercial server
cluster interconnect market to InfiniBand, 802.3 should define
standards that allow Ethernet to be more competitive. A low-cost
100m cable spec for 40/100G is one of the required elements.
Rob Hays
Intel Corporation
------------------------------------------------------------------------
*From:* Petar Pepeljugoski [mailto:petarp@xxxxxxxxxx]
*Sent:* Friday, July 11, 2008 11:23 AM
*To:* STDS-802-3-HSSG@xxxxxxxxxxxxxxxxx
*Subject:* Re: [802.3BA] XR ad hoc Phone Conference Notice
Paul,
I couldn't disagree with you more. If you go to top500.org and scroll
down to "Other highlights from the latest list", the last bullet says:
"The last system on the list would have been listed at position 200
in the previous TOP500 just six months ago. This is the largest
turnover rate in the 16-year history of the TOP500 project."
So if the #200 system from the list of six months ago is now listed at
position 500, that means 300 NEW machines made it onto the top 500 list
in just six months. That is a 60% turnover of the list.
Contrary to your claim, this is a look at the future.
Regards,
Peter
Petar Pepeljugoski
IBM Research
P.O.Box 218 (mail)
1101 Kitchawan Road, Rte. 134 (shipping)
Yorktown Heights, NY 10598
e-mail: petarp@xxxxxxxxxx
phone: (914)-945-3761
fax: (914)-945-4134
From: Paul Kolesar <PKOLESAR@xxxxxxxxxxxx>
To: STDS-802-3-HSSG@xxxxxxxxxxxxxxxxx
Date: 07/11/2008 01:53 PM
Subject: Re: [802.3BA] XR ad hoc Phone Conference Notice
------------------------------------------------------------------------
Joel,
Scanning the Top 500 list is a look into the past. How many
supercomputers are produced each year? Probably a few tens. That means
this list is a compilation of more than 10 years' worth of
supercomputers. InfiniBand only arrived on the scene in recent
years, and really started to gain market share more recently than
that. Yet it already represents a quarter of the supercomputing
links. For proof of the interconnect market share trends,
see http://www.top500.org/overtime/list/31/conn
There you will observe that InfiniBand arrived on supercomputing
machines in 2003 and since 2006 has been the fastest-growing
interconnect protocol, at the expense of almost every other protocol,
even leveling out the spectacular growth of GbE.
The 802.1 activity is more proof that this issue needs to be
addressed. The degree of continued shift towards InfiniBand will
likely depend on the success of the 802.1 efforts.
My statement was that the latency issue _could_ drive this continued
trend, not that it absolutely would. I stand by my statement as I
see no evidence that refutes it, only evidence that supports it.
As I responded earlier, I am not advocating a change to the 100m
objective. But I will find it very difficult to support a baseline
that does not include a longer reach than this with a low-cost
solution.
Regards,
Paul Kolesar
CommScope Inc.
Enterprise Solutions
1300 East Lookout Drive
Richardson, TX 75082
Phone: 972.792.3155
Fax: 972.792.3111
eMail: pkolesar@xxxxxxxxxxxxx
*Joel Goergen <joel@xxxxxxxxxxxxxxx>*
07/11/2008 11:32 AM
Please respond to joel@xxxxxxxxxxxxxxx
To: STDS-802-3-HSSG@xxxxxxxxxxxxxxxxx
cc:
Subject: Re: [802.3BA] XR ad hoc Phone Conference Notice
Paul,
I have an issue with the following statement:
“ Indeed, given that latency is a major performance concern for HPC,
the vendors of such machines may prefer to use InfiniBand. This
could mean that one of the primary customers to which we have tuned
our present objective will actually not use Ethernet, but will
benefit anyway by driving InfiniBand to adopt the same 100m PMD specs
that 802.3ba defines.”
I reviewed the top500.org website for their June 2008 report on the
top 500 supercomputers. Ethernet at this point has a 56.8% share of
the interconnects; see
http://www.top500.org/stats/list/31/connfam. InfiniBand has
24.2%. So I believe this demonstrates that this area will use
Ethernet. Next, regarding latency, the Data Center Bridging Task
Group in 802.1 is working on this.
Therefore, I do not agree with the statement that “one of the primary
customers to which we have tuned our present objective will actually
not use Ethernet.”
thanks
-joel
Paul Kolesar wrote:
Steve,
thanks for furthering the discussion. Your views make sense to me.
I'd like to examine the super computer cabling distance distribution
that Petar shared with us yesterday in a bit more detail. I've
plotted it to allow folks to see it in graphical form.
This data has several features that are remarkably similar to those of
general data center cabling:
1) The distribution is highly skewed towards the shorter end.
2) The distribution has a very long tail relative to the position of
the mode (the most frequent length) at 20m.
3) The mode is at a distance that is one fifth of the maximum length.
The white dot on the graph marks the point of coverage on the HPC
distribution equivalent to that of the 100m objective on the data
center cabling distribution. Speaking to Steve's point questioning the
correctness of the 100m objective for HPC environments, I would
venture to say that a 25m objective, which is roughly equivalent
in coverage to the 100m objective we are attempting to apply to data
centers, would not be satisfactory for the HPC environment, as it
would leave a significant portion of the channels without a low-cost
solution.
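Paul's coverage comparison can be sketched numerically. The snippet below is an illustration only: it draws a synthetic lognormal length distribution (an assumption for illustration, not the survey data Petar shared) shaped so the mode lands near 20m with a long tail, then computes what fraction of links a few candidate reach objectives would cover.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, skewed link-length distribution (hypothetical -- NOT the
# shared HPC survey data). The lognormal parameters put the mode near
# 20 m (mode = exp(mu - sigma^2) ~ 20.9 m) with a long tail.
lengths = rng.lognormal(mean=np.log(30), sigma=0.6, size=100_000)

def coverage(lengths, reach_m):
    """Fraction of links whose length fits within the given reach."""
    return float(np.mean(lengths <= reach_m))

for reach in (25, 50, 100, 150):
    print(f"{reach:>4} m reach covers {coverage(lengths, reach):6.1%} of links")
```

Under this toy distribution a 25m reach leaves a substantial fraction of links uncovered while 100m covers the vast majority, which is the shape of the argument being made; the real decision of course rests on the measured survey data.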
It is clear that the 100m objective is a near-perfect match to the
needs of HPC. Yet I do not believe that HPC should be the primary
focus of our development. We must be developing a solution that
properly satisfies a much larger market than this or we are wasting
our time. Indeed, given that latency is a major performance concern
for HPC, the vendors of such machines may prefer to use InfiniBand.
This could mean that one of the primary customers to which we have
tuned our present objective will actually not use Ethernet, but will
benefit anyway by driving InfiniBand to adopt the same 100m PMD specs
that 802.3ba defines. This possibility reinforces my perspective
that we need to properly address a broader set of customers - those
that operate in the general data center environment. It is clear
from all of the data and surveys that remaining only with a 100m
solution misses the mark for this broader market. Continuing under
this condition will mean that the more attractive solution for links
longer than 100m in the general data center will be to deploy link
aggregated 10GBASE-SR. Its cost will be on par and it will reach the
distances the customers need in their data centers.
Is this the future you want for all our efforts, or do you want to
face the facts and address the issue head on with a solution that
gives data center customers what they need?
Next week these decisions will be placed before the Task Force. I
hope we choose wisely.
Regards,
Paul Kolesar
CommScope Inc.
Enterprise Solutions
1300 East Lookout Drive
Richardson, TX 75082
Phone: 972.792.3155
Fax: 972.792.3111
eMail: pkolesar@xxxxxxxxxxxxx
*"Swanson, Steven E"* <SwansonSE@xxxxxxxxxxx>
07/11/2008 07:32 AM
To: PKOLESAR@xxxxxxxxxxxx, STDS-802-3-HSSG@xxxxxxxxxxxxxxxxx
cc:
Subject: RE: [802.3BA] XR ad hoc Phone Conference Notice
All,
I think Paul's suggestion is a good one; I would like to add some
other input (in the form of questions) from my point of view:
*1. Do we have the right MMF objective (support at least 100m on OM3
fiber)?*
My data suggests that we don't; we have tried to come at this from
two different directions, trying to be as unbiased as possible in
assessing the situation. I presented Corning sales data in November
2006 (see
http://www.ieee802.org/3/hssg/public/nov06/swanson_01_1106.pdf).
This data showed a need to support a link length longer than 100m and
I recommended that we support 200m at that time.
We also polled our customers, offering three options: (1) a low-cost
single PMD at 100m on OM3; (2) a slightly higher-cost single PMD at
150-200m on OM3; and (3) two PMDs, consisting of both option 1 and
option 2. The results were overwhelmingly in favor of Option 2, a
single PMD at longer length. A small number supported Option 3 (two
PMDs), but NONE supported Option 1.
While it is true that many of our customers have a substantial
portion of their link lengths that are less than 100m, they all have
link lengths longer than 100m. One customer noted that more than half
of his data center had link lengths longer than 100m.
Kolesar presented his company's sales data in September 2006 (see
http://www.ieee802.org/3/hssg/public/sep06/kolesar_01_0906.pdf).
His data also suggested that longer link lengths were needed and he
recommended 150m at that time.
All the data for the data center seems to suggest that 100m is TOO
SHORT to cover a significant portion of the data center application.
Pepeljugoski presented new data yesterday on HPC link lengths that
show 85% being less than 20m and 98% less than 50m. This might
suggest that 100m is TOO LONG for HPC applications. This leads to
another question: is there any economic or technical advantage to a
shorter MMF objective for HPC?
*2. Is there consensus on supporting a longer reach objective for MMF?*
I think there is; others on the call yesterday did not. I base my
opinion on the straw poll conducted in Munich:
Straw Poll #15: Should we continue to work on a proposal for an annex
to extend the reach of 40GBASE-SR4 and 100GBASE-SR10, in addition to
the proposal (“pepeljugoski_01_0508.pdf”), as in “jewell_01_0508.pdf”.
Yes: 55
No: 3
*3. Could we achieve 75% support for adding a new MMF objective?*
I don't know but if we could not, I would be forced to vote against
adopting the current MMF baseline proposal (which I don't want to do)
and I think others may also. This may or may not lead to an impasse
similar to what we experienced in 802.3ae.
I understand the concern that adding the objective without a clear
consensus on how to support the new objective could lead to delay but
I have found this committee to be very resourceful in driving to a
solution after we have made a decision to go forward. 40G is one
recent example of a situation where no consensus turned very quickly
to consensus.
I think adding a new objective is the right approach and in the long
run will save the task force valuable development time.
*4. Can we agree on the right assumptions on the 10G model to
evaluate the various proposals?*
Everyone seems to be using slightly different variations of the model
to evaluate the capability of the proposal; we need to agree on a
common approach of analysis.
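To make that point concrete, here is a deliberately toy power-budget check in Python. This is not the IEEE 10G spreadsheet model, and every parameter value is a hypothetical placeholder; it only illustrates how small differences in assumed penalties or losses move the margin at a given reach, which is why an agreed set of assumptions matters.

```python
# Toy optical power-budget check. All parameter values are hypothetical
# placeholders; this is NOT the IEEE spreadsheet model, only the shape
# of the calculation being argued about.

def link_margin_db(length_m,
                   tx_power_dbm=-1.0,          # assumed launch power
                   rx_sens_dbm=-9.0,           # assumed receiver sensitivity
                   fiber_atten_db_per_km=3.5,  # assumed MMF attenuation
                   connector_loss_db=1.5,      # assumed total connector loss
                   penalties_db=3.0):          # ISI/MPN/RIN etc., lumped
    """Margin left after subtracting losses and penalties from the budget."""
    budget_db = tx_power_dbm - rx_sens_dbm
    loss_db = fiber_atten_db_per_km * (length_m / 1000.0) + connector_loss_db
    return budget_db - loss_db - penalties_db

for length in (100, 150, 250):
    print(f"{length:>3} m: margin = {link_margin_db(length):+.2f} dB")
```

With these invented numbers all three reaches close with margin to spare; raise the lumped penalty by 3 dB and 250m no longer closes while 100m still does. Different assumed penalties lead directly to different conclusions about extended reach, hence the need for one agreed model.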
*5. Can we not let the discussion on OM4 cloud the decision?*
We can get extended link lengths on OM3. By achieving longer lengths
on OM3, even longer lengths will be possible on OM4 with the same
specification. What I don't want people to think is that OM4 is
required to get longer lengths.
*6. Summary*
John D'Ambrosia has provided advice that if we want to move forward
with a new MMF objective, July is the time to do it - if we delay the
decision, it is guaranteed to delay the overall process. Some might
think that if we make the decision now, it will delay the overall
process, but we don't know that yet. I don't think adding an informative
specification on a PMD is the right way to go - let's get the MMF
objective(s) right - we owe it to ourselves and to our customers. To
do anything less is just avoiding the issue. Let's get the objectives
set, get the assumptions correct and utilize the process set up by
Petrilla and Barbieri to drive toward the hard decisions that we are
all very capable of making.
Sincerely,
Steve Swanson
------------------------------------------------------------------------
*From:* Paul Kolesar [mailto:PKOLESAR@xxxxxxxxxxxxx]
*Sent:* Thursday, July 10, 2008 7:19 PM
*To:* STDS-802-3-HSSG@xxxxxxxxxxxxxxxxx
*Subject:* Re: [802.3BA] XR ad hoc Phone Conference Notice
Alessandro,
I'd like to continue your thread with some observations that have
driven me to certain conclusions, and to follow that with a
suggestion about how to parse the approach and drive to a consensus
position.
First let's consider what various customers are telling us. The
Corning survey of their customers, which has been presented to the
Ethernet Alliance, the XR ad-hoc, and will be presented next week to
802.3ba, shows that the large majority of customers want a single PMD
solution that can provide 150m on OM3 and 250m on OM4. A minority
were willing to accept a two PMD solution set that delivers the
lowest cost PMD to serve up to 100 m and a second PMD to serve the
extended distances as above. Not a single response indicated a
preference for a solution limited to 100m. We also hear strongly
expressed opinions from various system vendors that a longer distance
solution is not acceptable if it raises cost or power consumption of
the currently adopted 100m PMD. Under these conditions, and given
the options presented and debated within the XR ad-hoc, I believe you
are justified in concluding that a single PMD cannot satisfy all
these constraints. Yet it is clear to me that the market will demand
a low-cost PMD that can support more than 100m to fulfill the
distance needs of data centers. Therefore I conclude that the
correct compromise position is to develop a two-PMD solution. If the
committee does not undertake this development, it is likely that
several different proprietary solutions will be brought to the
market, with the net result of higher overall cost structures.
So let's consider how to choose from among the various proposals for
an extended reach PMD and let the determination of how to document it
within the standard be addressed after that.
I would propose a series of polls at next week's meeting designed to
gauge the preferences of the Task Force. I do not think that any XR
proposal will garner >75% at the outset, so I would propose the use
of Chicago rules wherein members may vote for all the proposals they
find acceptable. From this we can see which of the solutions is
least acceptable. Then, through a process of elimination from the
bottom and repeated application of Chicago rules to the remainder, we
can finally determine the most acceptable solution.
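A minimal sketch of that elimination procedure, assuming approval-style "Chicago rules" ballots (each member marks every proposal they find acceptable); the proposal names and votes below are entirely invented for illustration:

```python
# Elimination under "Chicago rules" (approval voting): each round,
# count approvals per surviving proposal and drop the least approved.
# Proposal names and ballots are invented for illustration.

def eliminate_round(ballots, proposals):
    """Tally approvals and return (tally, proposals minus the lowest)."""
    tally = {p: sum(p in b for b in ballots) for p in proposals}
    # Note: min() breaks ties by list order; a real process would need
    # an explicit tie-breaking rule.
    loser = min(proposals, key=lambda p: tally[p])
    return tally, [p for p in proposals if p != loser]

proposals = ["XR-A", "XR-B", "XR-C"]  # hypothetical XR proposals
ballots = [  # one set per member: every proposal that member accepts
    {"XR-A", "XR-B"},
    {"XR-B"},
    {"XR-A", "XR-C"},
    {"XR-B", "XR-C"},
]

while len(proposals) > 1:
    tally, proposals = eliminate_round(ballots, proposals)
    print(tally, "-> remaining:", proposals)

print("most acceptable:", proposals[0])
```

With these invented ballots the least-approved proposal is struck each round until one remains, which is the bottom-up elimination Paul describes.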
Depending on the degree of maturity of the specifications or other
considerations for the chosen solution, the Task Force will be better
able to determine how it should be handled within the standard. For
example, a proposal with a maturity on par with the adopted baseline
could be put forth under a new objective without undue concern of
becoming a drag on the timeline, while a proposal of lesser maturity
could be placed in an annex without an additional objective.
Regards,
Paul Kolesar
CommScope Inc.
Enterprise Solutions
1300 East Lookout Drive
Richardson, TX 75082
Phone: 972.792.3155
Fax: 972.792.3111
eMail: pkolesar@xxxxxxxxxxxxx
*"Alessandro Barbieri (abarbier)"* <abarbier@xxxxxxxxx>
07/10/2008 04:43 PM
Please respond to "Alessandro Barbieri (abarbier)" <abarbier@xxxxxxxxx>
To: STDS-802-3-HSSG@xxxxxxxxxxxxxxxxx
cc:
Subject: Re: [802.3BA] XR ad hoc Phone Conference Notice
Matt,
here is my *personal* read of the situation in the XR ad hoc:
a) I think there could be consensus on supporting XR, as long as we
pick a solution that does not impact the cost structure of the 100m
PMD. Because of that I also don't feel a single PMD is realistic at
this point.
b) The trouble, however, is that there is no consensus (>75%) on any
of the technical proposals. No one proposal has a clear lead over the
others.
Of the three options you list below, I think adding an objective for
a ribbon XR PMD could have a major impact on the project schedule,
because it seems we are nowhere near technical consensus. We could
drag the discussion out over several TF meetings... I am not sure
delaying the project over this specific topic is worth it. We can
always resort to non-standard solutions to fulfill market requirements
we can't address within IEEE, or come back in the future with another CFI.
At the end of the conference call earlier today I requested that we
get together after hours next week to see if we can accelerate
consensus building.
All the data is on the table now, so if we don't show any material
progress, I am not sure we should extend this ad hoc.
Alessandro
------------------------------------------------------------------------
*From:* Matt Traverso [mailto:matt.traverso@xxxxxxxxxx]
*Sent:* Thursday, July 10, 2008 10:07 AM
*To:* STDS-802-3-HSSG@xxxxxxxxxxxxxxxxx
*Subject:* Re: [802.3BA] XR ad hoc Phone Conference Notice
Colleagues,
I feel that we are coming to a situation similar to the impasse at
40G vs. 100G where different participants call different segments of
the networking industry their customer.
For MMF, I'd like to see an optimized solution at 100m per all of the
work that has been done.
I'd like to understand if folks feel that a different status for the
extended reach
a) Informative
b) Normative
c) New objective
would significantly alter the technically proposed solution from the
Ad Hoc. Opinions?
Chris,
The case of slow market/industry transition from LX4 to LRM is one of
the reasons why I would like to see the industry adopt 40G serial
from the launch. The slow adoption of LRM has primarily been limited
by end customer knowledge of the solution. 40G serial technology is
available.
thanks
--matt
Hi Gourgen,
Some numbers might help clarify what close to 0 means.
For 2008, LightCounting gives a shipment number of approximately
30,000 for 10GE-LRM (and about 60,000 for 10GE-LX4). So "close to 0"
would apply only if we were rounding to the nearest 100K. As an aside,
10GE-LRM supports 220m of MMF, not 300m.
300m of OM3 is supported by 10GE-SR, which LightCounting gives as
approximately 400,000 in 2008, so that would be "close to 0" only if
we were rounding to the nearest 1M.
Another interesting sideline in these numbers is that two years after
the 10GE-LRM standard was adopted in 2006, despite the huge investment
made in 10GE-LRM development, and despite very little new investment
in 10GE-LX4, the 10GE CWDM equivalent (i.e. 10GE-LX4, 4x3G) is
chugging along at 2x the volume of the 10GE serial solution that was
adopted to replace it.
This should dim hopes that very low-cost 40GE serial technology can
be developed from scratch in two years and ship in volume when the
40GE standard is adopted in 2010.
Chris
------------------------------------------------------------------------
*From:* Gourgen Oganessyan [mailto:gourgen@xxxxxxxxxxx]
*Sent:* Wednesday, July 09, 2008 8:02 PM
*To:* STDS-802-3-HSSG@xxxxxxxxxxxxxxxxx
*Subject:* Re: [802.3BA] XR ad hoc Phone Conference Notice
Petar,
Well, sadly, that's what has been happening in the 10G world: people
are forced to amortize the cost of 300m reach (LRM), while in reality
the number of people who need 300m is close to 0.
That's why I strongly support your approach of keeping the 100m
objective as the primary goal.
Frank, OM4 can add as much cost as it wants to; the beauty is that the
added cost goes directly where it's needed, which is the longer
links. The alternatives force higher cost and higher power consumption
on all ports, regardless of whether they are needed there or not.
*Gourgen Oganessyan*
*Quellan Inc.*
Phone: (630)-802-0574 (cell)
Fax: (630)-364-5724
e-mail: gourgen@xxxxxxxxxxx
------------------------------------------------------------------------
*From:* Petar Pepeljugoski [mailto:petarp@xxxxxxxxxx]
*Sent:* Wednesday, July 09, 2008 7:51 PM
*To:* STDS-802-3-HSSG@xxxxxxxxxxxxxxxxx
*Subject:* Re: [802.3BA] XR ad hoc Phone Conference Notice
Frank,
If I interpret you correctly, you are saying that all users should
amortize the cost for the very few who need extended reach.
We need to be careful how we proceed here; we should not repeat the
mistakes of the past if we want a successful standard.
Regards,
Peter
Petar Pepeljugoski
IBM Research
P.O.Box 218 (mail)
1101 Kitchawan Road, Rte. 134 (shipping)
Yorktown Heights, NY 10598
e-mail: petarp@xxxxxxxxxx
phone: (914)-945-3761
fax: (914)-945-4134
From: Frank Chang <ychang@xxxxxxxxxxx>
To: STDS-802-3-HSSG@xxxxxxxxxxxxxxxxx
Date: 07/09/2008 10:29 PM
Subject: Re: [802.3BA] XR ad hoc Phone Conference Notice
------------------------------------------------------------------------
Hi Jeff,
Thanks for your comment. You missed one critical point: there is a
cost increase from OM3 to OM4. If you put ribbon cable cost in
perspective, the OM4 option is possibly the most expensive of the four
options. Besides, using OM4 requires tightening the TX specs, which
impacts TX yield, so you are actually compromising the primary goal.
Frank
------------------------------------------------------------------------
*From:* Jeff Maki [mailto:jmaki@xxxxxxxxxxxx]
*Sent:* Wednesday, July 09, 2008 7:02 PM
*To:* STDS-802-3-HSSG@xxxxxxxxxxxxxxxxx
*Subject:* Re: [802.3BA] XR ad hoc Phone Conference Notice
Dear MMF XR Ad Hoc Committee Members,
I believe our current objective of "at least 100 meters on OM3 MMF"
should remain as a primary goal, the baseline. Support for any form
of extended reach should be considered only if it does not compromise
this primary goal. A single PMD for all reach objectives is indeed a
good starting premise; however, it should not be paramount. In the
following lists are factors, enhancements, or approaches I would like
to put forward as acceptable and not acceptable for obtaining
extended reach.
Not Acceptable:
1. Cost increase for the baseline PMD (optic) in order to obtain
greater than 100-meter reach
2. EDC on the system/host board in any case
3. CDR on the system/host board as part of the baseline solution
4. EDC in the baseline PMD (optic)
5. CDR in the baseline PMD (optic)
Acceptable:
1. Use of OM4 fiber
2. Process maturity that yields longer reach with no cost increase
In summary, we should not burden the baseline solution with cost
increases to meet the needs of an extended-reach solution.
Sincerely,
Jeffery Maki
————————————————
Jeffery J. Maki, Ph.D.
Principal Optical Engineer
Juniper Networks, Inc.
1194 North Mathilda Avenue
Sunnyvale, CA 94089-1206
Voice +1-408-936-8575
FAX +1-408-936-3025
www.juniper.net
jmaki@xxxxxxxxxxx
————————————————