[RPRWG] [OAM-AH] Flows
Title: RE: A place for papers on RPR website
The issue of which OAM flows the RPR layer should support was discussed in
the OAM Ad-Hoc during the Ottawa meeting. The consensus of the group was that
the Echo flow should be supported, but opinions were divided regarding the CC
(Continuity Check), RDI (Remote Defect Indication) and Activation/Deactivation
flows.
We decided to ask for input from the whole group before we make changes to the
draft. So, as per Mike's email, anyone who has an opinion on this issue is
invited to send me an email. Next Sunday (5/26) I will send an email to all
interested people with the list of participants in the Ad-Hoc. From then on,
all OAM Ad-Hoc related emails should be sent to that list only (of course,
anyone who wants to be added to the list later on can do so by sending me
an email). I suggest we start with emails, and then set up a conference call to
reach the final decision of the Ad-Hoc.
To help you decide whether you want to be part of the OAM Ad-Hoc, and as a
starting point, here is my opinion on this issue (it reflects what has been
accepted, as of today, into the draft):
RPR defines a "network technology optimized for the use in the MAN" (from
draft 0.2), and as such it has to provide tools that enable fast and
cost-effective maintenance and fault localization in large networks (large in
both number of stations and ring circumference) that may be shared by different
managing entities. This is different from the LAN case, in which the network is
restricted to a limited geographical area managed by a single entity.
An important feature of OAM is the ability to discover faults fast (at least
before the user complains) and to indicate where the fault is: which Station
and which layer. Thus each layer should have its own OAM mechanisms, to allow
this segregation and the "agnostic" behavior.
One such mechanism is Echo, basically defined as: the source sends a frame to a
destination, the destination loops the Echo frame back, and the source expects
to receive the looped frame. The source waits for T sec, and if no response is
received it declares a failure. Echo is a very good tool during configuration
and fault localization, but it is not a continuous fault monitoring tool. We
could of course define a "Continuous Echo" mode in which the Echo frame is sent
continuously, but this method would not cover all faults.
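As a rough illustration of the Echo flow just described, here is a minimal
Python sketch. The names echo_check and send_echo are hypothetical and are not
taken from the draft; send_echo stands in for whatever the MAC would actually
use to transmit an Echo frame and wait for the looped copy.

```python
from typing import Callable, Optional


def echo_check(send_echo: Callable[[str], Optional[float]],
               destination: str,
               timeout_s: float) -> bool:
    """Return True if the looped Echo frame came back within timeout_s.

    send_echo is a hypothetical transport hook (an assumption of this
    sketch): it sends one Echo frame to `destination` and returns the
    round-trip time in seconds, or None if nothing came back.
    """
    rtt = send_echo(destination)
    # No response, or a response later than T, is declared a failure.
    return rtt is not None and rtt <= timeout_s


def healthy_path(destination: str) -> Optional[float]:
    """Fake transport: the destination loops the frame back in 2 ms."""
    return 0.002


def broken_path(destination: str) -> Optional[float]:
    """Fake transport: the frame is lost, so no looped copy arrives."""
    return None
```

Note that the check is one-shot: it says nothing about the path once the call
has returned, which is why Echo alone is not a continuous monitoring tool.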
As an example, let us assume that we have a ring and at some point in time a
configuration change in a Station, or a fault, makes that Station's address
identical to another Station's address. The Echo will still succeed, but it may
be looped by the wrong Station, so no fault indication will be raised (note
that lower layers will not see this fault).
Let us now look at how CC operates: each Station sends a frame to every other
Station once every T sec (continuously). The destination Station expects these
frames, and if it does not receive one within n x T sec it raises a fault
indication. In the example above, if a Station is stealing the CC frames, the
destination Station will not receive them, and a fault indication will be
raised. Now the network manager knows which Stations are affected, and the
failure location.
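The duplicate-address example can be played through in a small Python model.
This is only a sketch under simplifying assumptions of mine: a single ringlet,
destination stripping by the first Station whose address matches, and one CC
round instead of the n x T timer. The function names (deliver, cc_round,
missing_peers) are hypothetical.

```python
def deliver(ring, src_idx, dst_addr):
    """Walk the ringlet downstream from src_idx and return the index of the
    first station whose address matches dst_addr (it strips the frame),
    or None if no station claims the frame."""
    n = len(ring)
    for hop in range(1, n):
        idx = (src_idx + hop) % n
        if ring[idx] == dst_addr:
            return idx
    return None


def cc_round(ring):
    """One CC interval: every station sends one CC frame to every other
    address present on the ring. Returns, per station index, the set of
    source addresses whose CC frames that station actually received."""
    heard = {i: set() for i in range(len(ring))}
    for src_idx, src_addr in enumerate(ring):
        for dst_addr in set(ring) - {src_addr}:
            taker = deliver(ring, src_idx, dst_addr)
            if taker is not None:
                heard[taker].add(src_addr)
    return heard


def missing_peers(expected, heard):
    """Peers the sink was provisioned to hear from but did not."""
    return set(expected) - set(heard)


# Provisioned ring, by position: A, B, C, D.
# Fault: the station in position 1 (B) now also uses address "D".
faulty = ["A", "D", "C", "D"]

# Echo from A to "D": the wrong station takes the frame and loops it back,
# so the Echo still looks fine to A.
echo_taker = deliver(faulty, 0, "D")           # 1, not the real D at 3
echo_reply = deliver(faulty, echo_taker, "A")  # 0, the reply reaches A

# CC: the real D (position 3) stops hearing from A and from B.
heard = cc_round(faulty)
lost_at_d = missing_peers({"A", "B", "C"}, heard[3])
print(sorted(lost_at_d))  # ['A', 'B']
```

In this model every Station loses the CC from B, which points at the failure
location, while the real D additionally loses the CC from A, which shows which
Stations are affected.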
RDI is useful for single-sided failures (for example, a Station is
misconfigured in one ringlet only). In this case only one Station will discover
the fault; the other side will still receive the CC and operate normally. So
the RDI is the vehicle to tell that side that a failure has been detected by
the other Station, without the need to correlate faults through management
systems (which may belong to different service providers).
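To make the single-sided case concrete, here is a hedged Python sketch of a CC
sink with the n x T timeout, plus an RDI flag carried back in the sink's own CC
frames. The class CcSink, the function build_cc_frame, the dictionary frame
layout and the value n = 3 are all assumptions made for illustration; none of
them comes from the draft.

```python
from dataclasses import dataclass, field


@dataclass
class CcSink:
    """Tracks when CC frames were last received from each peer."""
    period_s: float = 1.0   # T: one CC frame per second (the rate proposed here)
    n_missed: int = 3       # n: assumed value, not specified in this email
    last_seen: dict = field(default_factory=dict)

    def on_cc(self, peer: str, now: float) -> None:
        """Record the arrival time of a CC frame from peer."""
        self.last_seen[peer] = now

    def loss_of_continuity(self, peer: str, now: float) -> bool:
        """True if no CC frame from peer arrived within n x T seconds."""
        last = self.last_seen.get(peer, float("-inf"))
        return now - last > self.n_missed * self.period_s


def build_cc_frame(sender: str, sink: CcSink, peer: str, now: float) -> dict:
    """Build the CC frame sender -> peer, setting RDI if the sender's own
    sink has lost continuity from that peer (hypothetical frame layout)."""
    return {"src": sender, "dst": peer,
            "rdi": sink.loss_of_continuity(peer, now)}


# Scenario: A -> B breaks after t = 0, while B -> A keeps working.
a_sink, b_sink = CcSink(), CcSink()
b_sink.on_cc("A", 0.0)              # last CC from A that B ever received
for t in (1.0, 2.0, 3.0, 4.0, 5.0):
    a_sink.on_cc("B", t)            # A still hears B every second

frame_to_a = build_cc_frame("B", b_sink, "A", now=5.0)
```

At t = 5, Station A's own sink sees nothing wrong, yet the frame it receives
from B carries the RDI flag, so A learns about the fault without any
correlation through management.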
The example above is a simple one. I suppose that the topology mechanism may
eventually find that something is wrong, but the CC will immediately add very
valuable information, and in my opinion the task of the topology discovery
mechanism is exactly what its name declares: to discover the topology, not to
discover faults and fault locations. Note also that the fairness keep-alive
will not discover this type of fault.
It has been claimed that RPR is a MAC and is not connection oriented, and that
CC is better suited to ATM and MPLS. My opinion is that CC does not dictate a
connection-oriented design; it only verifies connectivity in our shared medium
between any pair of Stations. In other words, since RPR is a shared medium
without a physical connection between non-adjacent Stations, the CC can be
viewed as a heartbeat between any pair of Stations.
Regarding the CC timeouts, this is something that has to be discussed. My
opinion is one CC per second; at this rate it can be implemented either in the
MAC hardware or in the MAC software (for a 128-Station ring, each Station has
to send 127 CC frames and monitor 127, every second). I also think that this
has to be an optional mechanism, since service providers may prefer to save
bandwidth at the cost of lower availability figures. The bandwidth required for
a 128-Station ring with CC enabled between all the Stations is ~1.5 Mbps per
ringlet (0.15% of a 1 Gbps ring).
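The frame counts and the percentage above can be checked with a few lines of
Python. This is only the arithmetic: the ~1.5 Mbps figure is taken as given,
since the CC frame size and the traffic model behind it are not stated here.
The function names are hypothetical.

```python
def cc_frames_per_station(n_stations: int, rate_hz: float = 1.0) -> float:
    """CC frames one station sends (and also monitors) per second when CC
    is enabled towards every other station on the ring."""
    return (n_stations - 1) * rate_hz


def utilization_percent(load_bps: float, link_bps: float) -> float:
    """Share of the link rate consumed by the given load, in percent."""
    return 100.0 * load_bps / link_bps


per_station = cc_frames_per_station(128)      # 127 frames sent per second
ring_total = 128 * per_station                # 16256 frames per second in total
share = utilization_percent(1.5e6, 1e9)       # about 0.15 (percent)
```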
Regarding the Activation/Deactivation flow, it allows a Station to start CC
without the need to coordinate the operation through management. It is
especially useful when the Stations are "owned" by different management
entities, but it also saves coordinating the activation and deactivation of the
CC sink side with the CC source side to avoid unwanted alarms during CC
configuration. It is my opinion that this flow will be handled by the MAC
software, and the timers were set accordingly. It is also optional, and the
process allows supporting Stations to interoperate gracefully with
non-supporting Stations.
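One possible shape for such a handshake is sketched below in Python. The
message names ("activate", "deactivate", "confirmed"), the Station class and
the rule that a non-supporting Station simply does not answer are all my
assumptions for the sake of illustration; the actual procedure is whatever the
draft defines.

```python
class Station:
    """Toy station holding only the CC sink state relevant to the handshake."""

    def __init__(self, name: str, supports_act_deact: bool = True):
        self.name = name
        self.supports_act_deact = supports_act_deact
        self.monitored = set()  # peers whose CC frames this sink expects

    def on_request(self, kind: str, peer: str):
        """Handle an activation or deactivation request from peer."""
        if not self.supports_act_deact:
            return None  # assumed behavior: the request goes unanswered
        if kind == "activate":
            self.monitored.add(peer)      # arm the sink before CC starts
        elif kind == "deactivate":
            self.monitored.discard(peer)  # disarm the sink before CC stops
        return "confirmed"


def start_cc(source: Station, sink: Station) -> bool:
    """Return True if the sink confirmed, so the source may start CC."""
    return sink.on_request("activate", source.name) == "confirmed"


def stop_cc(source: Station, sink: Station) -> bool:
    """Return True if the sink confirmed that it stopped monitoring."""
    return sink.on_request("deactivate", source.name) == "confirmed"


a = Station("A")
b = Station("B")
c = Station("C", supports_act_deact=False)

ok_with_b = start_cc(a, b)   # B arms its sink for A
ok_with_c = start_cc(a, c)   # C never answers, so it monitors nothing
```

Because the sink arms its monitor only on request and disarms it before the
source stops sending, no alarm is raised while CC is being configured, and a
Station that does not implement the flow never expects CC frames at all.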
Leon