
RE: Hari Byte vs. Word striping




My excuse for weekending: I was helping my daughter find something on the internet and glanced at my e-mail.  At least that's the story I'm telling my wife. 

Mike, I'm sorry, but I did not see anything in your embodiment that was specific to a word-striped approach.  In fact, it is generic enough that it could have been used for byte striping or multi-lane scrambling.  I'm not trying to pick on it; I'm honestly interested in understanding the benefits and challenges of the word-striped approach. 


Comments:  I agree with ~500 mW/channel in 0.25 um CMOS.  However, I do not agree that it is likely to be reduced by "future optimizations" (by which I assume you mean integration).  Most of our data shows that the power is dominated by the Tx buffer, the PLL, and the receive detect circuitry.  All of this is in the serial domain and would not benefit from integration.  It is also not likely to come down with migration to more advanced CMOS process technologies without completely new architectures (i.e., migrating serial-path logic into parallel-path logic). 



Regards,
shawn

-----Original Message-----
From: Mike Jenkins [mailto:jenkins@xxxxxxxx]
Sent: Friday, December 17, 1999 8:48 PM
To: HSSG
Subject: Re: Hari Byte vs. Word striping



Rich,

In your recent response to Mark Ritter, a frequent refrain was:

> Please prove your assertion by a means acceptable to this
> standards body such as product, prototype, illustration, etc.

This strikes me as a good idea, so I will attempt to detail a word
striped embodiment, invoking existing designs as proof of concept.
The diagram and description below show each word-striped HARI lane
as identical to existing functions in Fibre Channel designs.
We have done many of these designs.  As you have said, most of the
power and die area is in the serdes.  One lane is less than 2 mm**2
of die area and less than 500 mW in 0.25 um CMOS, making a full
word-striped HARI interface < 8 mm**2 and < 2 Watts.  Future
optimization to share resources, etc., would drive these numbers
lower. 

I would be happy to clarify any issues with the description below.
But, I would also very much appreciate a similar level of detail
regarding byte striping.  From the beginning, the uncertainty around
implementation of byte striping has bothered me.  The only independent
estimate of the implementation difficulty of byte striping solicited
by the HARI group came to the same conclusion, spawning this running
debate.  So, please provide whatever equivalent details there may be
for byte striping, especially in the deskew function.  Otherwise, to
quote you again, "your claims by emphatic assertion are just that."

Regards,
Mike


  _________LANE 3_______________________________         
 |  _________LANE 2_____________________________|_         
 | |  _________LANE 1_____________________________|_            __
 > | |  _________LANE 0_____________________________|_  ======>|  \
 | > | |   __________     __________     __________   |   ====>|MUX\==>
 | | > |  |          |   |          |   |          |  |     ==>|   /
 | | | >->|  DESER.  |==>|   FIFO   |==>|  DECODE  |==>=======>|__/
 | | | |  |__________|   |__________|   |__________|  |         __ 
 < | | |                                              | <======|  \
 | < | |   __________          _         __________   |   <====|DE \<==
 |_| < |  |          |        / |       |          |  |     <==|MUX/
   |_| <--|SERIALIZER|<======|  |<======|  ENCODE  |<========<=|__/
     |_|  |__________|        \_|       |__________|  |
       |______________________________________________|
                            

 * The functions within each lane are identical to existing Fibre
   Channel designs and similar to Gigabit Ethernet designs.

 * Clocks within each lane are 156 MHz or slower.

 * The mux in the ENCODE/SERIALIZER path permits the FIFO output to
   be retransmitted through the serializer for diagnostics or use as
   a retimer function.

 * DECODE & ENCODE blocks can be bypassed, depending on application.

 * MUX & DEMUX are optional, depending on data path width in the ASIC.

 * Control logic across four lanes determines when to add/delete a
   "skip" word in one FIFO for speed matching.  For delete operation,
   the normally rotating MUX address skips that lane.  For an add
   operation, the MUX address dwells on that lane for two words.
   (This logic is identical to existing control logic for speed
   matching buffers with the four FIFOs viewed as one address space.
   The logic size is trivial.)

 * If a protocol requires "trunking" (or "aggregation") wherein four
   separate data streams are transmitted, the add/delete operations
   are even simpler -- the MUX address always rotates one count per
   word and add/delete is autonomous within each lane (exactly as in
   1Gb/s Fibre Channel designs).  In this case, the four lanes can be
   (and usually would be) asynchronous.
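
The add/delete behavior of the rotating MUX address described in the
bullets above can be sketched in software.  This is only a rough
behavioral model, not the actual control logic: the SKIP marker, the
function names, and the use of Python deques as lane FIFOs are all
assumptions made for illustration, and no clocking or FIFO depth is
modeled.

```python
from collections import deque

SKIP = "SKIP"   # hypothetical marker for a deletable/insertable skip word
LANES = 4       # four lanes, as in the diagram above


def stripe(words, lanes=LANES):
    """Distribute successive words round-robin across the lane FIFOs."""
    fifos = [deque() for _ in range(lanes)]
    for i, word in enumerate(words):
        fifos[i % lanes].append(word)
    return fifos


def read_out(fifos, action=None):
    """Rotate the MUX address over the lanes and return the merged stream.

    action is None, "delete", or "add".  It is applied once, to the first
    SKIP word that reaches the head of a lane FIFO.
    """
    out = []
    lane = 0
    pending = action
    while fifos[lane]:
        word = fifos[lane].popleft()
        if word == SKIP and pending == "delete":
            # Delete: the MUX address skips this lane, so the skip word
            # is consumed from the FIFO but never reaches the output.
            pending = None
        elif word == SKIP and pending == "add":
            # Add: the MUX address dwells on this lane for two words,
            # modeled here by emitting the skip word twice.
            pending = None
            out.extend([word, word])
        else:
            out.append(word)
        lane = (lane + 1) % len(fifos)
    return out


if __name__ == "__main__":
    words = ["W0", "W1", "W2", "W3", "W4", SKIP, "W6", "W7"]
    print(read_out(stripe(words)))
    print(read_out(stripe(words), action="delete"))
    print(read_out(stripe(words), action="add"))
```

With no action the merged stream equals the input; with "delete" the
skip word is dropped and the rotation simply continues with the next
lane; with "add" the skip word appears twice in a row.  The trunking
case in the last bullet would correspond to running the add/delete
decision independently per lane while the MUX address always advances.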
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Mike Jenkins               Phone: 408.433.7901            _____    
 LSI Logic Corp, ms/G715      Fax: 408.433.7461        LSI|LOGIC| (R)  
 1525 McCarthy Blvd.       mailto:Jenkins@xxxxxxxx        |     |    
 Milpitas, CA  95035         http://www.lsilogic.com      |_____|   
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~