RE: [EFM] OAM - Faye's seven points
Geoff,
 
I am pretty sure your points are valid. Based on my past experience
(which I recognize is not the only scenario), here are some points.
 
The assumptions behind my 'to avoid sending a technician' statement are:
 
1. CPE is cheap, so it would be quite expensive to have an alternative
   dial-up management interface dedicated to each CPE. The only way any
   management command from the carrier can get to the CPE is through
   the headend.
 
2. When a CPE is in trouble, it is not necessarily just the EFM link
   that is bad. It can be hung software or a subscriber line that went
   bad. (This means a reset can reset either the CPE or the subscriber
   line.)
 
3. If the first reset CPE command times out, fall back to the 'link
   keep-alive' stage and determine whether the link is good or bad?
   Note that if the 'reset' command is issued by the EMS to the headend
   or to the CPE directly, the EMS usually does have retries. This
   becomes a design issue for the headend: it has to handle the proxy
   commands correctly so that it does not congest its management links
   to the CPEs.
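
A rough sketch of point 3, assuming a hypothetical headend that proxies
EMS reset commands. It allows only one outstanding reset per CPE, so EMS
retries are absorbed instead of multiplying on the headend-to-CPE
management link, and when a reset times out it falls back to a keep-alive
check to classify the failure. Every name here (HeadendProxy, send_reset,
link_alive, Outcome) is made up for illustration and does not come from
any EFM draft. The sketch is single-threaded; a real headend would need
locking around the outstanding set.

```python
from enum import Enum


class Outcome(Enum):
    RESET_OK = "reset confirmed by CPE"
    IN_PROGRESS = "reset already outstanding, EMS retry absorbed"
    CPE_HUNG = "reset timed out, link keep-alive still good"
    LINK_DOWN = "reset timed out, link keep-alive failed"


class HeadendProxy:
    """Hypothetical headend that proxies EMS reset commands to CPEs."""

    def __init__(self, send_reset, link_alive):
        # send_reset(cpe_id) -> True if the CPE confirmed before the timeout.
        # link_alive(cpe_id) -> True if the link keep-alive still succeeds.
        # Both callables are assumptions standing in for the real transport.
        self._send_reset = send_reset
        self._link_alive = link_alive
        self._outstanding = set()

    def reset(self, cpe_id):
        # Absorb an EMS retry that arrives while a reset is still pending,
        # rather than putting a second command on the management link.
        if cpe_id in self._outstanding:
            return Outcome.IN_PROGRESS
        self._outstanding.add(cpe_id)
        try:
            if self._send_reset(cpe_id):
                return Outcome.RESET_OK
            # Reset timed out: fall back to the keep-alive stage.
            if self._link_alive(cpe_id):
                return Outcome.CPE_HUNG
            return Outcome.LINK_DOWN
        finally:
            self._outstanding.discard(cpe_id)
```

Distinguishing CPE_HUNG from LINK_DOWN is what would let the EMS decide
whether further retries are worth the management bandwidth.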
 
In other words, there are two management segments:
 
One from the EMS to the headend,
and one from the headend to the CPE(s).
 
The latter is closer to device management (like ILMI in ATM) than to
legacy network management.
 
These are my assumptions; please correct me if I am wrong.
 
-faye
  Faye
At 04:54 PM 9/18/01 -0700, Faye Ly wrote:
  Geoff,
 
    Some OAM traffic is more critical than others.  For example -
 
    An OAM command like 'reset' (in our case, reset CPE) should not be
    retried.
  
  Actually, I don't agree. Resets should be confirmed by the entity
  being reset. If a confirmation is not received in a "reasonable"
  amount of time, then the protocol should retry some set number of
  times before giving up and assuming that communication with the
  entity has been lost.
    We certainly don't want to reset the CPE a couple of times just
    because the network is slow.
  Agreed, but that doesn't negate what I said above; rather, it says
  that your reset retry protocol should leave a reasonable amount of
  time between retries.
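
A minimal sketch of the confirmed-reset behaviour described above,
assuming two hypothetical callables: send_reset() issues the command and
wait_for_confirm(timeout_s) returns True if the reset entity confirmed in
time. The attempt count, timeout, and gap between retries are arbitrary
placeholder values, not figures from the thread or from any standard.

```python
import time


def reset_with_retries(send_reset, wait_for_confirm, max_attempts=3,
                       confirm_timeout_s=5.0, retry_gap_s=30.0,
                       sleep=time.sleep):
    """Return the attempt number that was confirmed, or None on giving up.

    All parameters are illustrative assumptions. `sleep` is injectable so
    the gap between retries can be faked in tests.
    """
    for attempt in range(1, max_attempts + 1):
        send_reset()
        if wait_for_confirm(confirm_timeout_s):
            return attempt
        # Leave a generous gap before retrying, so a slow network does
        # not turn one intended reset into several back-to-back resets.
        if attempt < max_attempts:
            sleep(retry_gap_s)
    # Retries exhausted: assume communication with the entity is lost.
    return None
```

Returning None only says that the entity could not be reached; working
out why is left to whatever other fault information the system has.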
    Giving up means sending a technician to the field to actually
    toggle the power button on the CPE.
  If this is the case, then the equipment vendor will have done a bad
  job of systems design. Hopefully there will be enough information
  elsewhere in the system to help figure out whether the cable has gone
  open or hopelessly noisy, or the far-end power has gone down, or any
  one of a number of other real live faults, as opposed to a far-end
  microprocessor program counter going off into the weeds. That is the
  result of poor design.
  This is very expensive.  
  
Agreed
    The whole reason for requesting a dedicated OAM
    channel/IPG/whatever is to guarantee that no actual human needs to
    be sent to the field.  Maybe this is not doable, but we ought to
    try our best.
  I do not believe that there is any correlation between the need to
  send/not send a technician to the field and the presence of a
  separate OAM channel.
  
On a side note -
 
    Can you please clarify the statement "P2P PHYs do not drop
    packets"?
  P2P PHYs don't drop packets any more (or any less) than any other
  piece of pipe.
  There are only 2 places for bits to go in a setup that consists of:

  ________                    _________
  P2P PHY |_____MEDIUM_______| P2P PHY
  ________|                  |_________

  1) Go where they are supposed to.
  2) Not go there, in which case you can't communicate with anything
     at the far end.
  When the link is reestablished (or routed around), there are already
  plenty of counters to look at in the existing MAC management to count
  the lost packets.
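
As an illustration of counting lost packets from existing counters, here
is a small sketch that compares counter snapshots taken at both ends
before and after an outage. The field names are modelled loosely on the
IEEE 802.3 management attributes aFramesTransmittedOK and
aFramesReceivedOK; the MacCounters class, the function, and the idea of
collecting both ends' snapshots in one place are assumptions made for
this example. Counter wrap-around and frames discarded for other reasons
are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MacCounters:
    frames_tx_ok: int  # modelled on aFramesTransmittedOK
    frames_rx_ok: int  # modelled on aFramesReceivedOK


def frames_not_delivered(sender_before, sender_after,
                         receiver_before, receiver_after):
    """Frames the sender transmitted that the receiver never counted as OK.

    Assumes the counters are monotonic and did not wrap between snapshots.
    """
    sent = sender_after.frames_tx_ok - sender_before.frames_tx_ok
    received = receiver_after.frames_rx_ok - receiver_before.frames_rx_ok
    return sent - received
```

For example, if the sender's transmit count grew by 60 while the
receiver's receive count grew by 50, then 10 frames went missing in that
direction.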
    This is good.  I don't need to keep all those dropped packets/bytes
    error counters then.  Thanks.
 
-faye
Geoff