Jason:

I got tied up in a few things, so I haven’t been able to spend as much time on this as I would have liked. I still have not received any e-mails from the IPS/Topology AdHoc, so I am still in the dark regarding any progress being made. I am assuming I was never added to that list.

Anyhow, some observations regarding IPS stability and the sequence checks…

I modified my code based on D2.2 to allow dynamic changes to the sequence number check algorithm used.

Using the current D2.2 check on sequence #, the ring never seems to converge (as per my original comments). Part of this was because I was operating my state machines event driven rather than storing packet info and processing it later when all other checks occur. That can be quite burdensome for the S/W. Also, there are some really bad startup conditions to attempt to overcome.

I tried the sliding window approach (Case 2 below), which is essentially your original Check1/Check2 in a more optimized form. If the sequence # is >= the stored one, I will process the packet. This works well, subject to some caveats described below.

Case 1 below simply processes all packets regardless of sequence number. This proves to be fairly unstable. It worked better than the plain Yes/No check in the current specification, but suffered from instability and severe protocol flapping when entering/exiting wrap modes (note: I did not try this in a steered mode). I would see several hundred transitions/packet generations happening from a single event (and that was when it stabilized fast). This would seem to indicate we do need some type of sequence checking to enforce “newness” rules.

The sliding window method of sequence number checking worked well until the S/W on a previously connected node is restarted, and the start sequence # is then in the wrong half of the sequence range. At this point, the protocol stalemates, just like the current D2.2 bad startup conditions. There are other ways this can happen, but this is the easiest to describe.

What I would recommend (and I will try to verify the concept in the near future) would be to store the last “bad” sequence number received. If 3 (or some configurable constant) consecutive sequences are received which are in the forward moving direction of the stored “bad” sequence #, reset the stored sequence # to adopt the new “bad” sequence. This will force a resynchronization, but still allow adoption of the windowed approach of sequence checking.

Additionally, I believe there needs to be a value for “sequence number currently not set”, and certain state changes (like signal fail, topology change, etc.) should be able to set the stored sequence number to that value, which would allow “any” new sequence number to be automatically declared valid.
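
The recommendation above (windowed check, resynchronization after a run of "bad" sequence numbers, and a "not set" value) could be sketched roughly as follows. This is only an illustration of the idea, separate from the code actually under test. The 6-bit sequence space is inferred from the 0x20 test in Case 2, and every name here (seq_state, seq_accept, SEQ_UNSET, RESYNC_COUNT) is hypothetical. It also assumes that the first rejected frame counts toward the run of 3 and that a repeated "bad" value counts as forward moving.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumptions, not taken from the draft: 6-bit sequence space, a sentinel
 * outside that space meaning "not set", and a resync threshold of 3. */
#define SEQ_MASK     0x3Fu
#define SEQ_HALF     0x20u
#define SEQ_UNSET    0xFFu
#define RESYNC_COUNT 3u

typedef struct {
    uint8_t current;   /* last accepted sequence number, or SEQ_UNSET */
    uint8_t last_bad;  /* last rejected ("bad") sequence number */
    uint8_t bad_run;   /* length of the current forward-moving bad run */
} seq_state;

/* Called at startup and on events such as signal fail or topology change. */
static void seq_reset(seq_state *s)
{
    s->current = SEQ_UNSET;
    s->last_bad = 0;
    s->bad_run = 0;
}

/* True when seq equals ref or is ahead of it by less than half the range. */
static bool seq_in_window(uint8_t seq, uint8_t ref)
{
    return (((seq - ref) & SEQ_MASK) & SEQ_HALF) == 0;
}

/* Returns true when the frame carrying seq should be processed. */
static bool seq_accept(seq_state *s, uint8_t seq)
{
    seq &= SEQ_MASK;

    /* Normal path: nothing stored yet, or seq is in the forward window. */
    if (s->current == SEQ_UNSET || seq_in_window(seq, s->current)) {
        s->current = seq;
        s->bad_run = 0;
        return true;
    }

    /* Rejected: extend the run only if seq moves forward from the last
     * rejected value, otherwise start a new run. */
    if (s->bad_run > 0 && seq_in_window(seq, s->last_bad))
        s->bad_run++;
    else
        s->bad_run = 1;
    s->last_bad = seq;

    /* Enough consecutive forward-moving bad values: adopt the sender's
     * numbering and process this frame. */
    if (s->bad_run >= RESYNC_COUNT) {
        s->current = seq;
        s->bad_run = 0;
        return true;
    }
    return false;
}
```

With this sketch, a node that had stored 10 and then sees a restarted sender emit 0, 0, 1 would reject the first two frames and resynchronize on the third. Calling seq_reset() makes the next frame valid whatever its sequence number.
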
switch (RPR802_17IPSCheckAndReserveInNodeArray (ring_unit, node_num,
                                                ringlet, mac_addr)) {
   case RPR_802_17_EXISTING_ENTRY:
      switch (rpr_802_17_sequence_check_type)
      {
         case 1:
            /* Skip the sequence check in this mode */
            ring_unit->node_info [ringlet][node_num]->current_sequence_num = sequence;
            process_data = TRUE;
            break;

         case 2:
            /* Try a sliding window sequence number */
            if (((sequence +
                  ~(ring_unit->node_info [ringlet][node_num]->current_sequence_num &
                    RPR_802_17_IPS_SEQNUM_MASK) + 1) & 0x20) == 0)
            {
               ring_unit->node_info [ringlet][node_num]->current_sequence_num = sequence;
               process_data = TRUE;
            }
            else
            {
               printk (KERN_WARNING "IPS: Node: %lu, Ringlet: %u - Sequence check "
                       "failed. Recv: %u, stored: %u.\n",
                       node_num, ringlet,
                       sequence,
                       ring_unit->node_info [ringlet][node_num]->current_sequence_num);
            }/*IF*/
            break;

         case 0:
         default:
            /* The official D2.2 specification check */
            if (sequence != ring_unit->node_info [ringlet][node_num]->current_sequence_num)
            {
               ring_unit->node_info [ringlet][node_num]->current_sequence_num = sequence;
               process_data = TRUE;
            }/*IF*/
            break;
      }/*SWITCH*/
      break;

   case RPR_802_17_NEW_ENTRY:
      ring_unit->node_info [ringlet][node_num]->current_sequence_num = sequence;
      process_data = TRUE;
      break;

   case RPR_802_17_REPLACED_ENTRY:
      ring_unit->node_info [ringlet][node_num]->current_sequence_num = sequence;
      process_data = TRUE;
      break;

   case RPR_802_17_ALLOC_FAILURE:
      bad_packet = TRUE;
      printk (KERN_WARNING "IPS: Node: %lu, Ringlet: %u - Unable to "
              "locate an empty node.\n",
              node_num,
              ringlet);
      break;

   default:
      /* All enums are accounted for...must be a program error */
      printk (KERN_WARNING "IPS: Program error. Invalid enumerated value "
              "received for type RPR_802_17_ALLOC_RESULT.\n");
      break;
}/*SWITCH*/

Hope this information helps,

Regards,
Michael Allen

-----Original Message-----

Hi Michael,

Thanks for your inputs. There is work going on in the PAH related to the protection state machine to simplify and clarify its presentation. Jim and I will make sure that you are added to the PAH mailing list. The normal PAH meeting time is 9:30 am on Tuesdays, and call-in information is sent weekly by Jim.

I'm glad that you are implementing and testing the topology and protection portions of the standard. This will be very important to determine problems in the state machines as currently defined. Your inputs at the PAH will be quite valuable.

In terms of your questions, the intent of the protection state machine is that all checks for a given state must be performed upon any trigger that causes entry to the state machine (until a check passes). This will be made clear in upcoming versions of the draft.

The information that needs to be stored for handling WTR expiration is neighbor station information that enables a given station to determine what action to take when its WTR timer expires. Only messages received from the short path neighbor station are relevant, so information in messages from other stations on the ring doesn't need to be separately stored.

Since you have a test implementation of the protection state machine, it would be great if you would bring up issues that you've seen with the definition of the sequence number check to the PAH, and also via comment.

-- Jason

-----Original Message-----

Hi Jason:

If I follow your logic to completion, this
would mean I have to cache *ALL* the IPS messages from *ALL* nodes, and then when any event occurs (like WTR), I would have to reprocess all those messages. This is because some of the tests are based on packets from neighbors, and some are tests against non-neighbors (state 28 for example).

In addition to the points I made below about sequence #’s, there can be problems with startup. When a neighbor is instantiated, the sequence # is often 0 (structures are normally zero initialized). As such, the first neighbor message will probably have a zero sequence too. Wouldn’t this cause that message to be ignored?

As a side note… I defeated the sequence check, and the state machine seems to work much more reliably. I actually end up with totally idle nodes rather than stuck WTRs.

Regards,
Michael

-----Original Message-----

Hi Michael,

The intent of line 36 is that if a TP frame is
received from a neighbor that meets the conditions of line 36 prior to the WTR expiring, the relevant information in the TP frame will still be available for the purposes of the check when WTR expires. At that point the transition will occur into the IDLE state. The expiration of WTR is a trigger for doing state machine processing just as when a new TP frame is received.

-- Jason

-----Original Message-----

I have been implementing the D2.2 IPS state
machine and have run into what I believe is an issue with the new sequence # check. If I interpret the check correctly, it looks like the first packet received with a new sequence # is processed & then all future ones are suppressed (until the sequence # changes again).

It seems like this can cause a problem when attempting to unwrap a link (on a wrapping ring), since a WTR node would require state 36 to fire to get out of the WTR and the neighbor is the same as the original. The problem is when the packet defined in state 36 arrives, but the WTR timer has not expired. Once the packet has been processed, and the timer later expires, it is not possible to get into state 36.

Also, state 40 is the only state that formally copies neighbor addresses. Since the whole state machine banks on the neighbor MAC addresses being updated at the right time, this should be in the state tables.

It seems as though ALL ClearXxxSideEdgeStatus() operations should remove a wrap. That way, reception of IPS with STEER mode can switch the ring from wrapping to steered immediately, and any current wrap conditions will get removed. Only the wrap operation should be conditional on all nodes being able to wrap.

Comments are welcome,

Regards,
Michael Allen
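
The WTR problem described in this first message, and the caching approach given in the reply to it, can be illustrated with a very reduced model. This is a sketch under stated assumptions, not the D2.2 state machine: the types, the idle_request flag (standing in for "meets the conditions of row 36"), and the function names are all hypothetical. It keeps the D2.2-style "process only when the sequence # changes" check, but caches the last neighbor frame and re-runs the checks when WTR expires.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified model of one station. */
typedef struct {
    uint8_t seq;           /* sequence number carried by the frame */
    bool    idle_request;  /* stand-in for the row 36 conditions */
} tp_frame;

typedef struct {
    bool     in_wtr;        /* station is in wait-to-restore */
    bool     wtr_expired;   /* WTR timer has run out */
    bool     have_neighbor; /* a neighbor frame has been cached */
    tp_frame neighbor;      /* last frame from the short path neighbor */
} station;

/* Evaluate the checks against cached state; run on every trigger. */
static void run_checks(station *st)
{
    if (st->in_wtr && st->wtr_expired &&
        st->have_neighbor && st->neighbor.idle_request)
        st->in_wtr = false;  /* leave WTR (transition to IDLE) */
}

/* Trigger 1: a frame arrives from the short path neighbor. */
static void on_frame(station *st, tp_frame f)
{
    /* D2.2-style check: a repeated sequence number is not "new". */
    if (st->have_neighbor && f.seq == st->neighbor.seq)
        return;
    st->neighbor = f;        /* cache for later triggers */
    st->have_neighbor = true;
    run_checks(st);
}

/* Trigger 2: the WTR timer expires. */
static void on_wtr_expiry(station *st)
{
    st->wtr_expired = true;
    run_checks(st);
}
```

In this model a frame that arrives before WTR expiry is cached, its repeats are suppressed, and the station still leaves WTR when the timer fires. A purely event-driven version without the cached frame would have nothing to evaluate at expiry, which is the stuck-WTR behavior reported above.
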