
Re: [802.3BA] Longer OM3 Reach Objective



Petar

Please see my remarks below:

Petar Pepeljugoski wrote:
> Ali,
>
> To address some of your points.
>
> 1. For disaster recovery, you will need an SMF solution, since 300m is not enough.

For disaster recovery you will need SMF.

> 2. The longest link in the BG/L system is 30m. The current BG/P system (current #1 on top500) has a longest length of less than 50m. When we agreed to 100m, we really wanted 50m, but agreed because we thought that there is not much difference between 50m and 100m from a power budget perspective. HPC does not need more than 100m. Everything fits in a larger machine room.

It looks to me that the BG/L links are intra-system links, which would be outside the scope of IEEE, but you might be able to leverage the components
developed in 802.3ba.

> 3. Not likely.  BG/L uses 1GbE electrical for interconnects/switch, BG/P uses 10 GbE to switch.

If I understand you correctly, the Ethernet connection on the BG/L is 1GBase-T and on the BG/P it is 10GbE, and all high speed
links are internal and proprietary.  I guess IBM sells this as one unit with a very large price tag.  Some of the presentations
given in HSSG, for example http://www.ieee802.org/3/hssg/public/mar07/bechtel_01_0307.pdf, use an Ethernet fabric
with commodity servers to build a large cluster.

> 4. What higher performance are you going to achieve by going longer than 100m?

I agree: if you have one of these massively integrated IBM servers, with an internal clustering network, that meets your
computing requirements as a single unit, then you probably don't need more than 100m, unless you have a power user or
you make the BG/P or BG/L into a video server.

> 5. I agree with you that we can have a limiting option for 100m, without any extra features. Longer distances can be handled, for example, with an informative annex.

Just as important, if not more so, is to define a methodology similar to 8 Gig FC and SFP+ so you can utilize the benefit of the linear link.  As the editor of
SFP+, and from my experience with various limiting and linear SFP+ links, the limiting links in some scenarios are even more difficult to handle than the
scary LRM split symmetric.

> As a user, my needs are clearly defined. 100m is more than sufficient.

I value your input and appreciate you sharing your requirements.

Thanks,
Ali

Regards,

Peter


Petar Pepeljugoski
IBM Research
P.O.Box 218 (mail)
1101 Kitchawan Road, Rte. 134 (shipping)
Yorktown Heights, NY 10598

e-mail: petarp@us.ibm.com
phone: (914)-945-3761
fax:        (914)-945-4134



From: "Ali Ghiasi" <aghiasi@broadcom.com>
Sent: 03/24/2008 05:58 PM
To: Petar Pepeljugoski/Watson/IBM@IBMUS
cc: STDS-802-3-HSSG@LISTSERV.IEEE.ORG
Subject: Re: [802.3BA] Longer OM3 Reach Objective







Petar

Thanks for sending the pointer to the top 500 list and I do see the server at TJW.

In November 2007, 2 systems appeared in the TOP500 list.

Rank  System                           Procs  Memory (GB)  Rmax (GFlops)  Rpeak (GFlops)  Vendor
8     BGW eServer Blue Gene Solution   40960  N/A          91290          114688          IBM


They did not show a picture or say how big the server is, but based on your remarks it is small enough to fit in a modest room.

I assume the intra-links within the Blue Gene might be proprietary or IB.  What do the clustering system's intra-links have to do
with the Ethernet network connection?

I assume some of the users in the TJW lab may still want to connect to this server with higher speed Ethernet, in which case you will very likely need
links longer than 100 m.  In addition, higher speed Ethernet may be used to cluster several Blue Gene systems for failover,
redundancy, disaster tolerance, or higher performance, which will require links longer than 100 m.

We are both in agreement that parallel ribbon fiber will provide the highest density in the near future.  The module form factors with a gearbox
will be 3-4x larger.   Here is a rough estimate of BW/mm (linear faceplate) for several form factors:
Speed     Media Sig.  Form Factor                  Bandwidth (Gb/s per mm)
10 GbE    1x10G       SFP+ (SR/LR/LRM/Cu)          1.52 (assumes stacked cages)
40 GbE    4x10G       QSFP (SR or direct attach)   4.37 (assumes stacked cages)
40 GbE    TBD         Xenpak, if assumed (LR)      0.98
100 GbE   10x10G      CSFP (SR or direct attach)   3.85 (the proposed connector is already stacked)
100 GbE   4x25G       CFP (LR)                     1.23

As you can see, the form factors that allow you to go >100 m will be several times larger and not compatible
with the higher density solution based on nx10G.  Linear nx10G, as given in

http://www.ieee802.org/3/ba/public/jan08/ghiasi_02_0108.pdf
can extend the reach to 300 m on OM3 fiber and relax the transmitter and jitter budget.
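
As a rough check on the "several times larger" remark, the density figures from the table above can be compared pairwise at the same speed. This is only a sketch: the dictionary keys and the function name are illustrative, and the numbers are simply copied from the table, not independently derived.

```python
# Faceplate bandwidth density figures copied from the table above.
# The labels are illustrative shorthand for the table rows.
density = {
    "SFP+ (10GbE, 1x10G)": 1.52,
    "QSFP (40GbE, 4x10G)": 4.37,
    "Xenpak (40GbE, LR)": 0.98,
    "CSFP (100GbE, 10x10G)": 3.85,
    "CFP (100GbE, 4x25G)": 1.23,
}


def density_ratio(dense: str, sparse: str) -> float:
    """How many times more bandwidth per mm the denser form factor offers."""
    return density[dense] / density[sparse]


# Compare the nx10G parallel form factor with the longer-reach one at each speed.
ratio_40g = density_ratio("QSFP (40GbE, 4x10G)", "Xenpak (40GbE, LR)")
ratio_100g = density_ratio("CSFP (100GbE, 10x10G)", "CFP (100GbE, 4x25G)")

print(f"40GbE:  QSFP vs Xenpak = {ratio_40g:.1f}x")
print(f"100GbE: CSFP vs CFP    = {ratio_100g:.1f}x")
```

With the table's figures this gives roughly 4.5x at 40 GbE and 3.1x at 100 GbE, in line with the 3-4x estimate.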

You have stated strongly that you see no need for more than 100 m, but we have also heard from others who stated
there is a need for MMF at more than 100 m, especially if you have to change the form factor for more than
100 m!  Like FC and SFP+, we can define a limiting option for 100 m and a linear option for 300 m, and
let the market decide.

Thanks,
Ali

Petar Pepeljugoski wrote:


Frank,


You are missing my point. Even the best case stat, no matter how you twist it in your favor, is based on distances from yesterday. New servers are much smaller and require shorter interconnect distances. I wish you could come to see the room where the current #8 on the top500 list of supercomputers is (Rpeak 114 TFlops); maybe you'll understand then.


Instead of trying to design something that uses more power and goes unnecessarily longer distances, we should focus our effort on designing energy efficient, small footprint, cost effective modules.


Regards,


Petar Pepeljugoski
IBM Research
P.O.Box 218 (mail)
1101 Kitchawan Road, Rte. 134 (shipping)
Yorktown Heights, NY 10598

e-mail: petarp@us.ibm.com
phone: (914)-945-3761
fax:        (914)-945-4134



From: Frank Chang <ychang@VITESSE.COM>
Sent: 03/14/2008 09:23 PM
Please respond to: Frank Chang <ychang@VITESSE.COM>
To: STDS-802-3-HSSG@LISTSERV.IEEE.ORG
Subject: Re: [802.3BA] Longer OM3 Reach Objective








Petar;

Depending on the source of the link statistics, the 100m OM3 reach objective covers only 70% to 90% of the links, so 100m is not even close to 95% coverage.


Regards

Frank


From: Petar Pepeljugoski [mailto:petarp@US.IBM.COM]
Sent: Friday, March 14, 2008 5:09 PM
To: STDS-802-3-HSSG@listserv.ieee.org
Subject: Re: [802.3BA] Longer OM3 Reach Objective


Hello Jonathan,


While I am sympathetic with your view of the objectives, I disagree and oppose changing the current reach objective of 100m over OM3 fiber.


From my previous standards experience, I believe that all the difficulties arise in the last 0.5 dB or 1 dB of the power budget (as well as the jitter budget). That last fraction of a dB is responsible for most yield hits, making products much more expensive. It is worthwhile to ask module vendors how much their yield would improve if they were given 0.5 or 1 dB.
I believe that selecting specifications that penalize 95% of the customers to benefit 5% is the wrong design point.
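
For a sense of scale, the 0.5 dB and 1 dB figures can be converted into linear optical power ratios using the standard decibel definition. A minimal sketch follows; the function name is illustrative and not taken from the thread or from any 802.3 document.

```python
def db_to_power_ratio(db: float) -> float:
    """Convert a power difference in dB to a linear power ratio (10^(dB/10))."""
    return 10 ** (db / 10)


# The two budget margins mentioned above.
for margin_db in (0.5, 1.0):
    ratio = db_to_power_ratio(margin_db)
    extra_percent = (ratio - 1) * 100
    print(f"{margin_db} dB -> x{ratio:.3f} (about {extra_percent:.0f}% more power)")
```

So 0.5 dB corresponds to roughly 12% more optical power and 1 dB to roughly 26%.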

You make another point - that larger data centers have higher bandwidth needs. While it is true that the bandwidth needs increase, what you fail to mention is that the distance needs today are shorter than for previous server generations, since the processing power today is much more densely packed than before.

I believe that 100m is more than sufficient to address our customers' needs.  

Sincerely.


Petar Pepeljugoski
IBM Research
P.O.Box 218 (mail)
1101 Kitchawan Road, Rte. 134 (shipping)
Yorktown Heights, NY 10598

e-mail: petarp@us.ibm.com
phone: (914)-945-3761
fax:        (914)-945-4134

From: Jonathan Jew <jew@j-and-m.com>
Sent: 03/14/2008 01:32 PM
Please respond to: jew@j-and-m.com
To: STDS-802-3-HSSG@LISTSERV.IEEE.ORG
Subject: [802.3BA] Longer OM3 Reach Objective



I am a consultant with over 25 years of experience in data center
infrastructure design and data center relocations including in excess of 50
data centers totaling 2 million+ sq ft.  I am currently engaged in data
center projects for one of the two top credit card processing firms and one
of the two top computer manufacturers.

I'm concerned about the 100m OM3 reach objective, as it does not cover an
adequate number (>95%) of backbone (access-to-distribution and
distribution-to-core switch) channels for most of my clients' data centers.


Based on a review of my current and past projects, I expect that a 150m or
larger reach objective would be more suitable.  It appears that some of the
data presented by others to the task force, such as Alan Flatman's Data
Centre Link Survey, supports my impression.

There is a pretty strong correlation between the size of my clients' data
centers and the early adoption of new technologies such as higher speed LAN
connectivity.   It also stands to reason that larger data centers have
higher bandwidth needs, particularly at the network core.

I strongly encourage you to consider a longer OM3 reach objective than 100m.

Jonathan Jew
President
J&M Consultants, Inc

jew@j-and-m.com

co-chair BICSI data center standards committee
vice-chair TIA TR-42.6 telecom administration subcommittee
vice-chair TIA TR-42.1.1 data center working group (during development of
TIA-942)
USTAG representative to ISO/IEC JTC 1 SC25 WG3 data center standard adhoc

