Re: [RPRWG] protection messages




I want to particularly second David's comment:

"It's not the architect's job
to prove that normally-illegal states are never entered; it's his
job to prove they are always quickly and robustly exited."

Bad things will happen to good networks.  The question is how we recover
when they do.  It is almost never acceptable to assume the error can't
happen.
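
To put rough numbers on that recovery question, here is a minimal sketch
(in Python) comparing the longest gap a station could wait for the next copy
of a protection message under a capped exponential backoff and under a fixed
period. The 1 second cap, the 10 ms fixed period, and the 50 ms target are
figures mentioned in this thread; the 1 ms initial gap, the doubling factor,
and all function names are assumptions made purely for illustration, not
values from the draft.

```python
from itertools import islice

def backoff_gaps_ms(initial_ms, cap_ms, factor=2):
    """Yield gaps between retransmissions, multiplying by `factor` up to `cap_ms`."""
    gap = min(initial_ms, cap_ms)
    while True:
        yield gap
        gap = min(gap * factor, cap_ms)

def fixed_gaps_ms(period_ms):
    """Yield the same gap forever (fixed-rate sending)."""
    while True:
        yield period_ms

def worst_gap_ms(gaps, count):
    """Longest wait for the next copy over the first `count` gaps."""
    return max(islice(gaps, count))

TARGET_MS = 50  # recovery target discussed in the thread

# Assumed: 1 ms initial gap and doubling. From the thread: 1 s cap, 10 ms period.
backoff_worst = worst_gap_ms(backoff_gaps_ms(1, 1000), 12)
fixed_worst = worst_gap_ms(fixed_gaps_ms(10), 12)

print("backoff worst gap:", backoff_worst, "ms; within target:", backoff_worst <= TARGET_MS)
print("fixed worst gap:", fixed_worst, "ms; within target:", fixed_worst <= TARGET_MS)
```

Under these assumed numbers, a station that loses a message late in the
backoff sequence could wait up to the full 1 second cap for the next copy,
while the fixed-rate worst case stays at one period.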

Best regards,

Robert D. Love
President, Resilient Packet Ring Alliance
President, LAN Connect Consultants
7105 Leveret Circle     Raleigh, NC 27615
Phone: 919 848-6773       Mobile: 919 810-7816
email: rdlove@xxxxxxxx          Fax: 208 978-1187
----- Original Message -----
From: "David V. James" <dvj@xxxxxxxxxxxx>
To: "John Lemon" <JLemon@xxxxxxxxxxxx>; "'Anoop Ghanwani'"
<anoop@xxxxxxxxxxxxxx>
Cc: <stds-802-17@xxxxxxxx>
Sent: Wednesday, May 29, 2002 1:20 AM
Subject: RE: [RPRWG] protection messages


>
> John (and Anoop),
>
> I tend to agree with Anoop on a few points, but perhaps
> for different reasons. I like fixed-rate sending because:
>   1) It's simpler to build.
>   2) It's simpler to test.
>   3) It's simpler to integrate into hardware, so that the
>      software is only interrupted on excess-delay conditions.
>   4) Exponential backoff raises the possibility of some info
>      being up-to-date while other info is stale, unless one tries
>      to synchronize the backoffs, which has its own set of
>      complexities and problems.
>   5) If it's periodic, we can (more likely) piggy-back on one of the
>      other periodic signals, such as type-A fairness, type-B fairness,
>      protection, or OAM. That would also save on processor interrupts
>      (the interrupt itself is the heavy overhead; processing time
>      per-quadlet is typically _very_ small).
>   6) Exponential backoff sometimes responds slowly, depending
>      on the node state. A couple of problems with this include:
>      a) It's hard to test, as this behavior is highly node-state
>         dependent.
>      b) Recovery from errors can be excessively long.
>
> Relative to (6b), one of my early Engineering Mentors at HP noted
> that devices should recover quickly from ANY_AND_ALL undefined
> and/or inconsistent states. It's not the architect's job
> to prove that normally-illegal states are never entered; it's his
> job to prove they are always quickly and robustly exited.
>
> My current boss has much the same feeling: a reset should never
> be needed because all state-machines should quickly and robustly
> transition from illegal-to-legal states.
>
> An exponential backoff has a _very_ long potential illegal-to-legal
> state transition, which therefore fails the aforementioned criterion.
>
> Please add these thoughts and concerns to those raised by Anoop.
>
> DVJ
>
>
> David V. James, PhD
> Chief Architect
> Network Processing Solutions
> Data Communications Division
> Cypress Semiconductor, Bldg #3
> 3901 North First Street
> San Jose, CA 95134-1599
> Work: +1.408.545.7560
> Cell: +1.650.954.6906
> Fax:  +1.408.456.1962
> Work: djz@xxxxxxxxxxx
> Base: dvj@xxxxxxxxxxxx
>
>
> >>-----Original Message-----
> >>From: owner-stds-802-17@xxxxxxxxxxxxxxxxxx
> >>[mailto:owner-stds-802-17@xxxxxxxxxxxxxxxxxx]On Behalf Of John Lemon
> >>Sent: Tuesday, May 28, 2002 8:04 PM
> >>To: 'Anoop Ghanwani'
> >>Cc: stds-802-17@xxxxxxxx
> >>Subject: RE: [RPRWG] protection messages
> >>
> >>
> >>
> >>Anoop,
> >>
> >>Since I was the one who proposed the exponential backoff, let me try to
> >>justify it.
> >>
> >>I believe it solves the following problems (listed in decreasing
> >>importance):
> >>a) improves timely notification
> >>b) assures notification
> >>c) reduces overhead
> >>d) reduces bandwidth
> >>
> >>a) By quickly repeating the message, a transient error is likely to be
> >>overcome with the next message. The more transient the error, the more
> >>likely a new message sent after a short period of delay will make it
> >>through. The more long standing the error, the less it helps to send
> >>messages quickly. In other words, send again quite quickly in order to
> >>overcome a message lost to a very transient error, send again after a
> >>longer delay to overcome a less transient error, etc.
> >>b) In the case of a longer term error, the message should eventually get
> >>through. After the initial time period has passed, the urgency declines,
> >>but the knowledge is still important.
> >>c) By backing off to a less frequent rate, you don't interrupt the MAC as
> >>often.
> >>d) By backing off to a less frequent rate, you use less bandwidth, albeit
> >>probably a negligible amount.
> >>
> >>I disagree that exponential backoff is the worst of both worlds. In fact,
> >>I believe the opposite. It has low processing overhead since the
> >>processing overhead backs off exponentially. It is entirely reliable
> >>since it keeps trying forever.
> >>
> >>jl
> >>
> >>-----Original Message-----
> >>From: Anoop Ghanwani [mailto:anoop@xxxxxxxxxxxxxx]
> >>Sent: Tuesday, May 28, 2002 8:50 AM
> >>To: 'Mike Takefman'; jan.van.ruymbeke@xxxxxxxxxxx
> >>Cc: Anoop Ghanwani; dzhu@xxxxxxxxx; nuzun@xxxxxxxxx;
> >>stds-802-17@xxxxxxxx
> >>Subject: RE: [RPRWG] protection messages
> >>
> >>
> >>
> >>
> >>Wow...there's been a lot of discussion on this thread
> >>over the weekend!
> >>
> >>Anyway, reliable delivery of protection messages is a
> >>bigger issue for networks with steering than it is for
> >>wrapping as noted by Mike.
> >>
> >>We also have to make sure that every station knows when
> >>a protection event occurs, preferably within 50 msec, but
> >>even after that it must know (the sooner the better), and
> >>so I don't quite agree with Leon's argument.
> >>
> >>And getting back to overhead: if bandwidth overhead
> >>is such a big deal, then why not piggyback the information
> >>on Type B fairness messages which are being sent anyway?
> >>Hardware/software can be easily designed to support this.
> >>If overhead is not a big deal, then why can't we just send
> >>them as separate messages but at a constant rate?
> >>
> >>The exponential back-off protocol specified currently
> >>gives us the worst of both worlds.  It has the high
> >>processing overhead that Mike is concerned about at the time
> >>immediately following the change in protection status,
> >>and yet it isn't quite as robust as we need it to be.
> >>
> >>-Anoop
> >>
> >>
> >>> -----Original Message-----
> >>> From: Mike Takefman [mailto:tak@xxxxxxxxx]
> >>> Sent: Tuesday, May 28, 2002 4:56 AM
> >>> To: jan.van.ruymbeke@xxxxxxxxxxx
> >>> Cc: anoop@xxxxxxxxxxxxxx; dzhu@xxxxxxxxx; nuzun@xxxxxxxxx;
> >>> stds-802-17@xxxxxxxx
> >>> Subject: Re: [RPRWG] protection messages
> >>>
> >>>
> >>> Jan,
> >>>
> >>> A certain company (whom I work for) can easily hit the
> >>> 50 ms time even for large rings with a message based
> >>> scheme. Of course since we do wrapping, the messages
> >>> only have to flow between the two adjacent nodes
> >>> to protect the ring.
> >>>
> >>> That being said, I am sure the companies who have
> >>> implemented steering will assure you that they can
> >>> hit 50 ms as well.
> >>>
> >>> Do you have actual proof that it cannot be done,
> >>> because that would be quite interesting to see.
> >>>
> >>> mike
> >>>
> >>> jan.van.ruymbeke@xxxxxxxxxxx wrote:
> >>> >
> >>> > Hello,
> >>> > By accident I had a similar discussion about ATM OAM (I.610)
> >>> > with SDH people.  The situation is similar and so is the solution:
> >>> > - forget 50 ms restoration if you rely only on packet messages;
> >>> > - use physical layer indications (LOS, AIS, ...)
> >>> >
> >>> > regards
> >>> > Jan Van Ruymbeke
> >>> > Belgacom Advanced Networks & Systems / Network Innovation &
> >>> > Strategy / Strategy Architecture & Economics of Core network.
> >>> > Koning Albert II laan 27, 1030 Brussel, Belgium
> >>> >
> >>> > T:    32 2 202 45 80
> >>> > GSM:  32 476 28 70 25
> >>> >
> >>> > -----Original Message-----
> >>> > From: Mike Takefman [mailto:tak@xxxxxxxxx]
> >>> > Sent: 27 May 2002 21:27
> >>> > To: Anoop Ghanwani
> >>> > Cc: 'Daniel Zhu'; 'Necdet Uzun'; 'stds-802-17@xxxxxxxx'
> >>> > Subject: Re: [RPRWG] protection messages
> >>> >
> >>> > Only partially joking, many reliable protocols run over
> >>> > Ethernet networks.
> >>> >
> >>> > I agree that we need to reliably have 50 ms reaction
> >>> > times to faults. It is not clear to me that sending protection
> >>> > information every 10 ms in non-fault conditions is a good
> >>> > idea.
> >>> >
> >>> > cheers,
> >>> >
> >>> > mike
> >>> >
> >>> > Anoop Ghanwani wrote:
> >>> > >
> >>> > > Mike,
> >>> > >
> >>> > > CSMA/CD is non-deterministic.  Anyway, I assume you were
> >>> > > joking (hence the chuckle?).
> >>> > >
> >>> > > -Anoop
> >>> > >
> >>> > > > -----Original Message-----
> >>> > > > From: Mike Takefman [mailto:tak@xxxxxxxxx]
> >>> > > > Sent: Friday, May 24, 2002 12:47 PM
> >>> > > > To: Anoop Ghanwani
> >>> > > > Cc: 'Daniel Zhu'; 'Necdet Uzun'; 'stds-802-17@xxxxxxxx'
> >>> > > > Subject: Re: [RPRWG] protection messages
> >>> > > >
> >>> > > >
> >>> > > > CSMA-CD comes to mind.
> >>> > > >
> >>> > > > he he he,
> >>> > > >
> >>> > > > mike
> >>> > > >
> >>> > > > Anoop Ghanwani wrote:
> >>> > > > >
> >>> > > > > Daniel,
> >>> > > > >
> >>> > > > > The exponential backoff is what I don't like.  I would
> >>> > > > > rather see it sent at a steady rate, or just transmitted
> >>> > > > > reliably so that there is no constant refresh.
> >>> > > > >
> >>> > > > > Are there any protocols that use a similar exponential
> >>> > > > > backoff to guarantee timely delivery?
> >>> > > > >
> >>> > > > > -Anoop
> >>> > > > >
> >>> > > > > > -----Original Message-----
> >>> > > > > > From: Daniel Zhu [mailto:dzhu@xxxxxxxxx]
> >>> > > > > > Sent: Friday, May 24, 2002 11:19 AM
> >>> > > > > > To: Anoop Ghanwani
> >>> > > > > > Cc: 'Necdet Uzun'; 'stds-802-17@xxxxxxxx'
> >>> > > > > > Subject: Re: [RPRWG] protection messages
> >>> > > > > >
> >>> > > > > >
> >>> > > > > > Anoop,
> >>> > > > > >
> >>> > > > > > I believe, in the current RPR draft, protection messages will
> >>> > > > > > be broadcast periodically every 1 second in steady state.
> >>> > > > > > During periods of change, protection messages will be sent
> >>> > > > > > much more frequently, with a backoff scheme up to 1 second.
> >>> > > > > >
> >>> > > > > > Is there something missing here?
> >>> > > > > >
> >>> > > > > > Daniel
> >>> > > > > >
> >>> > > > > > Anoop Ghanwani wrote:
> >>> > > > > >
> >>> > > > > > > Necdet,
> >>> > > > > > >
> >>> > > > > > > Thanks for pointing this out.  Per the current draft,
> >>> > > > > > > Type B's aren't sent that often (1/10-th the rate of
> >>> > > > > > > Type A's) and so it's possible that they can be
> >>> > > > > > > sourced in software.
> >>> > > > > > >
> >>> > > > > > > Anyway, let's assume for now that we absolutely had
> >>> > > > > > > to keep protection and fairness separate.  How would
> >>> > > > > > > you recommend that we address the issue of timely
> >>> > > > > > > delivery of the protection notification message?
> >>> > > > > > >
> >>> > > > > > > I see only 2 possibilities:
> >>> > > > > > >
> >>> > > > > > > - Periodic link status broadcasts (regardless of whether
> >>> > > > > > >   the link is up or not).
> >>> > > > > > >
> >>> > > > > > > - Hop-by-hop reliable broadcast when the link status
> >>> > > > > > >   changes.
> >>> > > > > > >
> >>> > > > > > > I'm OK with either.  Can you think of any other ways
> >>> > > > > > > to do this?
> >>> > > > > > >
> >>> > > > > > > -Anoop
> >>> > > > > > >
> >>> > > > > > > > -----Original Message-----
> >>> > > > > > > > From: Necdet Uzun [mailto:nuzun@xxxxxxxxx]
> >>> > > > > > > > Sent: Thursday, May 23, 2002 7:13 PM
> >>> > > > > > > > To: Anoop Ghanwani
> >>> > > > > > > > Cc: 'stds-802-17@xxxxxxxx'
> >>> > > > > > > > Subject: Re: [RPRWG] protection messages
> >>> > > > > > > >
> >>> > > > > > > >
> >>> > > > > > > > Anoop,
> >>> > > > > > > >
> >>> > > > > > > > Type B fairness messages are generated by the Fairness
> >>> > > > > > > > Control Unit (in hardware) and sent to the client, whereas
> >>> > > > > > > > protection messages are generated by the MAC control unit
> >>> > > > > > > > (which is implemented in software) and multicast to other
> >>> > > > > > > > MACs' control units. Combining them is the worst that can
> >>> > > > > > > > happen (HW vs SW, microsecond time frame vs millisecond
> >>> > > > > > > > time frame, etc.)
> >>> > > > > > > >
> >>> > > > > > > > Thanks.
> >>> > > > > > > >
> >>> > > > > > > > Necdet
> >>> > > > > > > >
> >>> > > > > > > > Anoop Ghanwani wrote:
> >>> > > > > > > >
> >>> > > > > > > > > I had a comment that expressed concern about the delivery
> >>> > > > > > > > > of protection notification messages.
> >>> > > > > > > > >
> >>> > > > > > > > > The way things are defined in D0.2, the messages are
> >>> > > > > > > > > neither reliable nor periodic.  There are no
> >>> > > > > > > > > acknowledgments, so we are never sure that all nodes
> >>> > > > > > > > > have seen the protection notification message.
> >>> > > > > > > > > Sending special protection messages periodically
> >>> > > > > > > > > increases the overhead (but even that is not specified).
> >>> > > > > > > > > Why can't we piggyback the protection notification
> >>> > > > > > > > > onto Type B fairness messages since they are required
> >>> > > > > > > > > to be sent frequently in any case (typically more
> >>> > > > > > > > > frequently than 1 msec)?
> >>> > > > > > > > >
> >>> > > > > > > > > The ad hoc's response to my comment says that Type B's
> >>> > > > > > > > > are optional.  This is not true.  Sending of both Type A
> >>> > > > > > > > > and Type B messages is mandatory per D0.2 and there have
> >>> > > > > > > > > been no comments to change that behavior.
> >>> > > > > > > > >
> >>> > > > > > > > > -Anoop
> >>> > > > > > > > > --
> >>> > > > > > > > > Anoop Ghanwani - Lantern Communications - 408-521-6707
> >>> > > > > > > >
> >>> > > > > >
> >>> > > >
> >>> > > > --
> >>> > > > Michael Takefman              tak@xxxxxxxxx
> >>> > > > Manager of Engineering,       Cisco Systems
> >>> > > > Chair IEEE 802.17 Stds WG
> >>> > > > 2000 Innovation Dr, Ottawa, Canada, K2K 3E8
> >>> > > > voice: 613-254-3399       fax: 613-254-4867
> >>> > > >