
Re: Hari Byte vs. Word striping




Shawn,

Apologies for being late in responding to your comments.  I just
returned from vacation, and it took some time to dig through the
heap of email.

I appreciate hearing your feedback.  I will try to highlight what 
I think are the most significant differences between the block
diagram below for word striping and a byte striping implementation:

 1) In byte striping, the data path within each lane is one byte
    wide.  That implies clocks at 312 MHz.  (Maybe there is a way
    around this, but I don't see it, and no one has come forward
    to give any such details for byte striping.)  For word striping,
    the data path can easily be 2 or 4 bytes wide, with consequently
    lower speed clocks.

 2) The need to deskew in byte striping seems to involve determining
    some pointer offset in each of the four FIFOs.  The logic to
    perform this determination must use some clock synchronous with 
    the received data.  But a clock derived from one lane might well
    produce metastability vis-a-vis data from another lane (due to
    static and dynamic skew).  The same issues arise with the logic
    to detect and operate on add/delete columns (i.e., one byte from
    each of the four lanes).

    In word striping, a clock from any lane will latch a word of data
    in each lane within that lane's data valid window.  No further
    deskew is necessary.  This same clock can then also run the 
    add/delete logic with no problems.
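
As a rough check on the clock rates in item 1, the arithmetic can be
sketched as below.  The 3.125 Gbaud per-lane line rate and the
10-bit (8b/10b) code group are assumptions not stated in this message,
and lane_clock_mhz is a hypothetical helper name.

```python
# Assumptions (not stated in the message above):
LINE_RATE_BAUD = 3.125e9   # assumed serial rate of one lane
BITS_PER_CODE_GROUP = 10   # assumed 8b/10b coding: 10 line bits per byte


def lane_clock_mhz(bytes_wide):
    """Parallel clock in MHz for a lane data path of the given byte width."""
    return LINE_RATE_BAUD / (BITS_PER_CODE_GROUP * bytes_wide) / 1e6


for width in (1, 2, 4):
    print(f"{width}-byte data path: {lane_clock_mhz(width):.3f} MHz")
```

Under these assumptions a one-byte path runs at about 312 MHz, and
each doubling of the path width halves the clock.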

Again, I appreciate your feedback.  If you (or anyone) want to follow
up, that would be great.  Especially, I would welcome anyone who wants 
to provide any details regarding a byte striping implementation.

Regarding your comments about power dissipation, I think we are close
enough to agreement that I won't quibble (and I suppose neither of us
wants to get too deep into technology or architectural details).

Regards,
Mike

> "Rogers, Shawn" wrote:
> 
> My excuse for weekending: I was helping my daughter find something on
> the internet and glanced at my e-mail.  At least that's the story I'm
> telling my wife.
> 
> Mike, I'm sorry but I did not see anything in your embodiment that was
> specific to a word striped approach.  In fact, it is generic enough
> that it could have been used for byte stripe, or multi-lane
> scrambling.  I'm not trying to pick on it, I'm honestly interested in
> understanding the benefits and challenges of the word striped
> approach.
> 
> Comments:  Agree with ~500mW/channel at 0.25um CMOS.  However, I do
> not agree that it will likely be reduced due to "future optimizations"
> (by this, I assume you mean integrations).  Most of our data shows
> that the power is dominated by the Tx buffer, PLL, and the receive
> detect circuitry.  All of this is in the serial domain, and would not
> benefit from integration.  It is also not likely to come down with
> migration to more advanced CMOS process technologies without
> completely new architectures (i.e. migrating serial path logic into
> parallel path logic).
> 
> Regards,
> shawn
> 
> -----Original Message-----
> From: Mike Jenkins [mailto:jenkins@xxxxxxxx]
> Sent: Friday, December 17, 1999 8:48 PM
> To: HSSG
> Subject: Re: Hari Byte vs. Word striping
> 
> Rich,
> 
> In your recent response to Mark Ritter, a frequent refrain was:
> 
> > Please prove your assertion by a means acceptable to this
> > standards body such as product, prototype, illustration, etc.
> 
> This strikes me as a good idea, so I will attempt to detail a word
> striped embodiment, invoking existing designs as proof of concept.
> The diagram and description below show each word-striped HARI lane
> as identical to existing functions in Fibre Channel designs.
> We have done many of these designs.  As you have said, most of the
> power and die area is in the serdes.  One lane is less than 2 mm**2
> of die area and less than 500 mW in 0.25 um CMOS, making a full
> word-striped HARI interface < 8 mm**2 and < 2 Watts.  Future
> optimization to share resources, etc., would drive these numbers
> lower.
> 
> I would be happy to clarify any issues with the description below.
> But, I would also very much appreciate a similar level of detail
> regarding byte striping.  From the beginning, the uncertainty around
> implementation of byte striping has bothered me.  The only independent
> estimate of the implementation difficulty of byte striping solicited
> by the HARI group came to the same conclusion, spawning this running
> debate.  So, please provide whatever equivalent details there may be
> for byte striping, especially in the deskew function.  Otherwise, to
> quote you again, "your claims by emphatic assertion are just that."
> 
> Regards,
> Mike
> 
>   _________LANE 3_______________________________
>  |  _________LANE 2_____________________________|_
>  | |  _________LANE 1_____________________________|_            __
>  > | |  _________LANE 0_____________________________|_  ======>|  \
>  | > | |   __________     __________     __________   |   ====>|MUX\==>
>  | | > |  |          |   |          |   |          |  |     ==>|   /
>  | | | >->|  DESER.  |==>|   FIFO   |==>|  DECODE  |==>=======>|__/
>  | | | |  |__________|   |__________|   |__________|  |         __
>  < | | |                                              | <======|  \
>  | < | |   __________          _         __________   |   <====|DE \<==
>  |_| < |  |          |        / |       |          |  |     <==|MUX/
>    |_| <--|SERIALIZER|<======|  |<======|  ENCODE  |<========<=|__/
>      |_|  |__________|        \_|       |__________|  |
>        |______________________________________________|
> 
> 
>  * The functions within each lane are identical to existing Fibre
>    Channel designs and similar to Gigabit Ethernet designs.
> 
>  * Clocks within each lane are 156 MHz or slower.
> 
>  * The mux in the ENCODE/SERIALIZER path permits the FIFO output to
>    be retransmitted through the serializer for diagnostics or use as
>    a retimer function.
> 
>  * DECODE & ENCODE blocks can be bypassed, depending on application.
> 
>  * MUX & DEMUX are optional, depending on data path width in the ASIC.
> 
>  * Control logic across four lanes determines when to add/delete a
>    "skip" word in one FIFO for speed matching.  For delete operation,
>    the normally rotating MUX address skips that lane.  For an add
>    operation, the MUX address dwells on that lane for two words.
>    (This logic is identical to existing control logic for speed
>    matching buffers with the four FIFOs viewed as one address space.
>    The logic size is trivial.)
> 
>  * If a protocol requires "trunking" (or "aggregation") wherein four
>    separate data streams are transmitted, the add/delete operations
>    are even simpler -- the MUX address always rotates one count per
>    word and add/delete is autonomous within each lane (exactly as in
>    1Gb/s Fibre Channel designs).  In this case, the four lanes can be
>    (and usually would be) asynchronous.
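
The MUX address behavior in the add/delete bullet above can be
illustrated with a minimal sketch.  It models only the sequence of
lane addresses, not the FIFO pointers or the skip words themselves;
mux_addresses and its events argument are hypothetical names chosen
for illustration, not part of any existing design.

```python
def mux_addresses(rotations, events=None, lanes=4):
    """Return the sequence of lane addresses the MUX visits.

    events maps a rotation index to ("add" or "delete", lane).
    A delete makes the rotating address skip that lane once;
    an add makes the address dwell on that lane for two words.
    """
    events = events or {}
    sequence = []
    for rotation in range(rotations):
        op, target = events.get(rotation, (None, None))
        for lane in range(lanes):
            if op == "delete" and lane == target:
                continue               # skip this lane for one rotation
            sequence.append(lane)
            if op == "add" and lane == target:
                sequence.append(lane)  # dwell: same lane, second word
    return sequence


print(mux_addresses(2))                       # normal rotation
print(mux_addresses(2, {1: ("delete", 2)}))   # lane 2 skipped once
print(mux_addresses(2, {1: ("add", 1)}))      # lane 1 visited twice
```

In the trunking case of the last bullet, the events argument would
simply stay empty, since the address always rotates one count per
word.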

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Mike Jenkins               Phone: 408.433.7901            _____     
 LSI Logic Corp, ms/G715      Fax: 408.433.7461        LSI|LOGIC| (R)   
 1525 McCarthy Blvd.       mailto:Jenkins@xxxxxxxx        |     |     
 Milpitas, CA  95035         http://www.lsilogic.com      |_____|    
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~