RE: [EFM] OAM - Faye's seven points
Geoff,
I am pretty sure your points are valid. Based on my past experience
(that is not the only scenario, I do recognize that!), here are some
points.
The assumptions I stated 'to avoid sending a technician' are:
1. CPE is cheap, so it would be quite expensive to have an alternative
dial-up management interface dedicated to each CPE. The only way any
management command from the carrier can get to the CPE is through the
headend.
2. When a CPE is in trouble, it is not necessarily just the EFM link
that is bad. It can be hung software or a subscriber line that went
bad. (This means a reset can reset the CPE and/or the subscriber line.)
3. If the first 'reset CPE' command times out, should the protocol fall
back to the 'link keep-alive' stage and determine whether the link is
good or bad? Note that if the 'reset' command is issued by the EMS to
the headend or to the CPE directly, the EMS usually does have retries.
This becomes a design issue for the headend: it has to handle the
proxied commands correctly so that it does not congest its management
links to the CPEs.
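
To make point 3 concrete, here is a minimal sketch of how a headend
might absorb EMS retries instead of forwarding each one to the CPE, and
fall back to a keep-alive check when a reset times out. Everything in
it is an assumption for illustration only: the class name
HeadendResetProxy, the callbacks send_reset_to_cpe and link_is_alive,
and the returned status strings are hypothetical and do not come from
any EFM or OAM specification.

```python
from typing import Callable, Set


class HeadendResetProxy:
    """Hypothetical headend-side proxy for 'reset CPE' commands."""

    def __init__(
        self,
        send_reset_to_cpe: Callable[[str], None],  # assumed: sends one reset over the headend-to-CPE segment
        link_is_alive: Callable[[str], bool],      # assumed: result of the 'link keep-alive' check
    ) -> None:
        self._send = send_reset_to_cpe
        self._link_is_alive = link_is_alive
        self._pending: Set[str] = set()  # CPEs with a reset still outstanding

    def on_ems_reset(self, cpe_id: str) -> str:
        """Handle a reset request arriving from the EMS."""
        if cpe_id in self._pending:
            # An EMS retry while a reset is outstanding is absorbed here,
            # so it adds no traffic on the management link to the CPE.
            return "already-pending"
        self._pending.add(cpe_id)
        self._send(cpe_id)
        return "forwarded"

    def on_cpe_confirm(self, cpe_id: str) -> None:
        """The CPE confirmed the reset; clear the outstanding entry."""
        self._pending.discard(cpe_id)

    def on_timeout(self, cpe_id: str) -> str:
        """No confirmation arrived; fall back to the keep-alive check."""
        self._pending.discard(cpe_id)
        return "link-up" if self._link_is_alive(cpe_id) else "link-down"
```

With this shape, a second on_ems_reset("cpe-1") issued before the
confirmation or timeout returns "already-pending" and sends nothing
further to the CPE.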
In other words, there are two management segments: one from the EMS to
the headend, and one from the headend to the CPE(s). The latter is
closer to device management (like ILMI in ATM) than to legacy network
management.
These are my assumptions; please correct me if I am wrong.
-faye
Faye
At 04:54 PM 9/18/01 -0700, Faye Ly wrote:
Geoff,
Some OAM traffic is more critical than others. For example, an OAM
command like 'reset' (in our case, reset CPE) should not be retried.
Actually, I don't agree. Resets should be confirmed by the
entity being reset. If a confirmation is not received in a "reasonable" amount
of time then the protocol should try some set number of times before giving up
and assuming that communication has been lost with the entity.
We certainly don't want to reset the CPE a couple of times just because
the network is slow.
Agreed, but that doesn't negate what I said above; rather, it says that
your reset retry protocol should have a reasonable amount of time
between retries.
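
As a rough sketch of the behaviour argued for here (confirmed resets, a
bounded number of attempts, and a deliberate gap between them),
something like the following would do. The function name, the two
callbacks, and the default counts and timings are all assumptions
chosen for illustration; the thread does not specify any of them.

```python
import time
from typing import Callable


def reset_with_confirmation(
    send_reset: Callable[[], None],            # assumed: transmits one reset command
    wait_for_confirm: Callable[[float], bool],  # assumed: True if confirmed within the timeout
    max_attempts: int = 3,                      # illustrative "set number of times"
    confirm_timeout_s: float = 5.0,             # illustrative "reasonable" wait
    retry_gap_s: float = 10.0,                  # illustrative pause between retries
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the reset is confirmed by the entity being reset.

    Return False after max_attempts unconfirmed tries, at which point the
    caller assumes communication with the entity has been lost.
    """
    for attempt in range(1, max_attempts + 1):
        send_reset()
        if wait_for_confirm(confirm_timeout_s):
            return True
        if attempt < max_attempts:
            # Space the retries out so a slow network does not cause
            # the CPE to be reset several times in quick succession.
            sleep(retry_gap_s)
    return False
```

The gap between attempts is what addresses the slow-network concern:
a late confirmation has time to arrive before another reset is sent.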
Giving up means sending a technician to the field to actually toggle
the power button on the CPE.
If this is the case then the equipment vendor will have done a bad job
of systems design. Hopefully there will be enough information elsewhere
in the system to help figure out if the cable has gone open or
hopelessly noisy, or the far-end power has gone down, or any one of a
number of other real live faults, as opposed to a far-end
microprocessor program counter going off into the weeds. That is the
result of poor design.
This is very expensive.
Agreed
The whole reason for requesting a dedicated OAM channel/IPG/whatever is
to guarantee that no actual human needs to be sent to the field. Maybe
this is not doable, but we ought to try our best.
I do not believe that there is any correlation
between the need to send/not send a technician to the field and the presence
of a separate OAM channel.
On a side note -
Can you please clarify the statement "P2P PHYs do not drop packets"?
P2P PHYs don't drop packets any more (or any less) than any other piece
of pipe.
There are only 2 places for bits to go in a setup that consists of:

 ________                    _________
 P2P PHY |_____MEDIUM_______| P2P PHY
 ________|                  |_________

1) Go where they are supposed to.
2) Not go there, in which case you can't communicate with anything in
   the far end.
When the link is reestablished (or gone around) then there are already
plenty of counters to look at in the existing MAC management to count
the lost packets.
This is good. I don't need to keep all those dropped packets/bytes
error counters then.
Thanks.
-faye
Geoff