366 lines
16 KiB
Plaintext
366 lines
16 KiB
Plaintext
|
9/2/1998 JosephJ
|
||
|
As part of implementing promiscuous mode multicast, we need to make some changes
|
||
|
to the mars server.
|
||
|
|
||
|
Some observations:
|
||
|
* RCF2022 has special support for routers wanting to monitor blocks of
|
||
|
NONOVERLAPPING addresses. It handles these blocks somewhat independantly
|
||
|
from individual group joins: a block can overlap one or more individual
|
||
|
group joins. On leaving the block, the mars server maintains the
|
||
|
individual group joins.
|
||
|
|
||
|
Changes/fixes required:
|
||
|
1. MarsPunchHoles needs to be written to change time complexity from
|
||
|
O(number of addresses in block)
|
||
|
to
|
||
|
O(number of mcs address-ranges + number of individual groups).
|
||
|
|
||
|
Currently, it runs through all 268 million possible addresses
|
||
|
when processing a join of the entire class-D address space!
|
||
|
|
||
|
Fix is jump to the end of the hole (if in a hole) or to the
|
||
|
beginning of the next hole (if not in a hole).
|
||
|
|
||
|
Also, it was enumerating the list of groups wrongly -- using
|
||
|
pGroup->Next instead of pGroup->NextGroup.
|
||
|
|
||
|
2. MarsAddClusterMemberTogroups:
|
||
|
When processing an join of the entire class-D space, it needs to check if
|
||
|
the member has already joined the entire space, in which case it
|
||
|
should treat the join request as a duplicate (send it back on the private
|
||
|
vc). Currently it will treat it as a new entry and hence will send
|
||
|
it on the cluster-control-vc. We would end up with duplicate entries.
|
||
|
MarsDelClusterMemberFromGroups:
|
||
|
Similarly, when when processing a leave of the entire class-D space
|
||
|
when in fact the member has already left, we must make sure
|
||
|
that we reflect it privately.
|
||
|
|
||
|
3. MARS_MULTI processing: MarsHandleRequest had to be modified to
|
||
|
also return the hw addresses of members monitoring the entire class-D
|
||
|
address space (but also make sure that it doesn't send duplicate
|
||
|
hw addresses if the member is also minitoring the specific
|
||
|
group address).
|
||
|
|
||
|
9/14/1998 JosephJ
|
||
|
|
||
|
There are reports of bugchecks during boot up. One is bug# 212412
|
||
|
(ATMARPS: MarsSendRedirect referenced past end of pool). The stack was:
|
||
|
f7cfbcdc f276ba36 00000000 fa502000 00000000 ntoskrnl!KiTrap0E+0xc3
|
||
|
f7cfbd80 f276cc04 80a11828 f92cff18 00000000 atmarps!MarsSendRedirect+0xbe
|
||
|
|
||
|
MarsSendRedirect is a timer callback function. It it creates and sends
|
||
|
a redirect packet containing all the configured and registered addresses.
|
||
|
However it doesn't claim any lock. So it's possible that it's in the
|
||
|
middle of doing this when registered addresses are added (they are
|
||
|
added on ArpSCoRequestComplete callbacks.
|
||
|
|
||
|
So we fix this by adding a lock to the interface when reading/modifying
|
||
|
these addresses: in MarsSendRedirect and in ArpSCoRequestComplete. Checked
|
||
|
in today.
|
||
|
|
||
|
10/24/1998 JosephJ
|
||
|
|
||
|
Added statistics -- created ARP_SERVER_STATISTICS and MARS_SERVER_STATISTICS
|
||
|
(iocto.h, ioctl.c, also the added functionality to atmarp.exe to dump this
|
||
|
stuff).
|
||
|
|
||
|
10/25/1998 JosephJ
|
||
|
Added support to actually fill out the statistics: ndis.c, mars.c, arps.c, etc.
|
||
|
|
||
|
Some clarifications...
|
||
|
|
||
|
Mcast Joins: failed doesn't include dups.
|
||
|
|
||
|
Mcast requests: MCMesh requests include responses to those nodes that are in
|
||
|
promiscuous mode.
|
||
|
|
||
|
Discarded pkts: usually means pkts discarded due to error/unsupported format or
|
||
|
because a resource allocation failure. Also includes pkts discarded because
|
||
|
they are not from a cluster member. If the discarded pkt is known to be for a
|
||
|
particular task, say a join, then the count of failed joins is also incremented.
|
||
|
|
||
|
We do not explicitly keep track of failed mars or arp requests which are
|
||
|
ignored (either acked or nacked) -- so if you see the arp request count
|
||
|
increment but the ack+nak not incrementing, you can conclude that the request
|
||
|
is being ignored.
|
||
|
|
||
|
10/25/1998 JosephJ
|
||
|
While implementing the above, found the following existing bugs:
|
||
|
|
||
|
MarsReqThread -- if it gets a pkt with an unrecognized opcode, it simply
|
||
|
leaves it in limbo -- it should put it back into the free list! Otherwise
|
||
|
we'll soon run out of pkts if we get barraged with bad pkts.
|
||
|
|
||
|
MarsDelClusterMemberFromGroups: it should
|
||
|
(a) delete a mars entry when
|
||
|
it's list of members goes to NULL -- delete consists of removing
|
||
|
it from the hash table and calling ArpSFreeBlock on it.
|
||
|
(b) actually free the pGroup, calling FREE_MEM.
|
||
|
|
||
|
checked in 10/28/1998
|
||
|
|
||
|
12/22/1998
|
||
|
|
||
|
Made a change to the following code in ArpSCoRequestComplete:
|
||
|
[Old code]
|
||
|
pIntF->NumAddressesRegd ++;
|
||
|
if (pIntF->NumAddressesRegd < pIntF->NumRegdAddresses)
|
||
|
{
|
||
|
...
|
||
|
}
|
||
|
[New code]
|
||
|
We inly increment NumAddressRegd if
|
||
|
(pIntF->NumAddressesRegd < pIntF->NumRegdAddresses).
|
||
|
We assert otherwise. See 12/22/1998 Note in
|
||
|
ArpSCoRequestComplete for details.
|
||
|
|
||
|
2/26/1999 JosephJ
|
||
|
Bug: #297656 ATMARPS: Unbinding arp/mars on server A does not unregister
|
||
|
the well-known address from server A.
|
||
|
Fix:
|
||
|
I unregister the registered addresses in shutdowninterface, and block until all
|
||
|
the unregistrations complete. I decided to block because the completions will
|
||
|
likely be asynchronous and we immediately go on to close the Af and then
|
||
|
deallocate the object, so I was not comfortable with not waiting for
|
||
|
completions) of the unregistrations.
|
||
|
|
||
|
2/29/1999 JosephJ
|
||
|
|
||
|
Stress hit the following assert with checked atmarpc.sys, 1990:
|
||
|
Assert NdisPartyHandle != NULL failed: file mars.c, line 3247
|
||
|
|
||
|
The assert was because the ndispartyhandle was NULL in the context of the
|
||
|
dropparty handler.
|
||
|
|
||
|
The crutial observation is the following 2 identical lines of debug spew before
|
||
|
the assert:
|
||
|
0:MARS: AddMemberToCC: pVc 868178c8, pMember 87736808, ConnState 2
|
||
|
0:MARS: AddMemberToCC: pVc 868178c8, pMember 87736808, ConnState 2
|
||
|
|
||
|
The 2nd call to NdisAddParty (from MarsAddMemberToClusterControlVc) clobbers
|
||
|
the previous value of pMember->NdisPartyHandle on entry and then probably fails
|
||
|
(because there's already a party).
|
||
|
|
||
|
Looking for ways that MarsAddMemberToClusterControlVc could be called twice
|
||
|
(which is not supposed to happen), the only way I can see this happening
|
||
|
consistant with the debug spew, is that an incoming registration-join from a
|
||
|
new member (MarsHandleJoin) came at the same time that we were handling
|
||
|
post-processing of the initial CC MakeCall complete
|
||
|
(ArpSMakeCallComplete, 1195,ndis.c).
|
||
|
|
||
|
There are holes in the way vc and member flags are set and checked that would
|
||
|
lead to MarsAddMemberToClusterControlVc being called from both code paths.
|
||
|
(The enumeration in line 1185 of ArpSMakeCallComplete is also dangerous in
|
||
|
that it could potentially deref a freed pMember, but that's not what happened
|
||
|
here).
|
||
|
|
||
|
The following functions have similary dangerous emumeration/assumption that
|
||
|
pMember will be valid:
|
||
|
MarsAbortAllMembers
|
||
|
ArpSDropPartyComplete
|
||
|
MarsDelMemberFromClusterControlVc
|
||
|
|
||
|
|
||
|
FIXES to all of the above:
|
||
|
1. Added new function MarsIsValidClusterMember that makes sure a particular
|
||
|
member is in the cluster list.
|
||
|
2. This function is called from:
|
||
|
MarsDelMemberFromClusterControlVc (which simply returns if invalid).
|
||
|
ArpSMakeCallComplete (which stops enumeration if a pMember is invalid).
|
||
|
3. MarsAbortAllMembers fixed to do a safe enumeration of all members.
|
||
|
4. MarsAddMemberToClusterControlVc returns without doing anything if
|
||
|
if (MARS_GET_CM_CONN_STATE(pMember) != CM_CONN_IDLE) (check is made
|
||
|
AFTER getting the IF lock).
|
||
|
|
||
|
3/3/1999 JosephJ Revisiting MarsAbortAllMembers and DelMembersfromVc
|
||
|
-- safe enumeration used.
|
||
|
|
||
|
|
||
|
3/3/1999 JosephJ CM_INVALID use
|
||
|
-- fix del member on add-party complete, but be careful on make-call complete
|
||
|
(maybe ok if there are no other members at this time).
|
||
|
|
||
|
3/16/1999 JosephJ Actually checked in the following files...
|
||
|
ioctl.c v13 fix uninit var
|
||
|
mars.c v12 299201 -- various robustness-re
|
||
|
mars.h v2 299201 -- various robustness-re
|
||
|
ndis.c v16 299201 -- various robustness-re
|
||
|
protos.h v9 299201 -- various robustness-re
|
||
|
|
||
|
All of the above are the fixes explained in 2/29 and 3/3 entries above
|
||
|
|
||
|
registry.c v5 ArpSReadAdapterConfig... p -- p
|
||
|
(make sure proper default values are in place in case the
|
||
|
corresponding call to read registry values fails --
|
||
|
The are two cases:
|
||
|
- pConfig->RegAddresses -- this is benign in all cases (because
|
||
|
pConfig->NumAllocedRegdAddresses is set to 0 on failure),
|
||
|
EXCEPT in ArpSReadAdapterConfiguration, where
|
||
|
PrevRegAddresses is freed if it is non NULL.
|
||
|
Fix is to initialize pConfig->RegAddresses to NULL before
|
||
|
calling NdisReadocnfiguration.
|
||
|
- pConfig->pMcsList -- this is not benign.
|
||
|
Fix is to init it to NULL before calling NdisReadocnfiguration.
|
||
|
|
||
|
04/20/1999 JosephJ Fix for 327626 Need Rouge ARP Server detection on ARP
|
||
|
registration.
|
||
|
|
||
|
1st version of fix:
|
||
|
BEFORE registering address, we try to make a call to the address. If it
|
||
|
succeeds, we log an event in the event log and fail the initialization.
|
||
|
|
||
|
2nd version of fix: keep call open -- if we get an incoming close, we
|
||
|
then try again.
|
||
|
|
||
|
Note: it's possible that if two notes are doing this at the same time they
|
||
|
may still both come up, i.e., rogue detection will fail.
|
||
|
|
||
|
05/05/1999 JosephJ 331517 - bugchecks due to wrong VC PacketSize
|
||
|
The bugchk was triggered by the fact that the mars server got several incoming VCs
|
||
|
with very large max packet size (they should all be 9188, but we saw 64008 for
|
||
|
several and 18200 for one).
|
||
|
|
||
|
Fix (in MarsHandleRequest (mars.c)) is to replace SHORT and USHORT local variables
|
||
|
by ULONG.
|
||
|
|
||
|
05/05/1999 JosephJ rogue ARP server detection contd...
|
||
|
ArpSCoAfRegisterNotify -> ArpSOpenAfComplete ->
|
||
|
ArpSRegisterSap -> ArpSRegisterSapComplete
|
||
|
|
||
|
ArpSBindAdapter -> ArpSReadAdapterConfiguration -> ArpSQueryAndSetAddresses
|
||
|
|
||
|
ArpSCoRequestComplete
|
||
|
|
||
|
Configured address (pIntF->ConfiguredAddress) filled by GET OID_CO_GET_ADDRESSES
|
||
|
registered addresses (pIntF->RegAddresses[])
|
||
|
|
||
|
Fri 05/14/1999 JosephJ Rogue ARP server detection contd.
|
||
|
Things are kicked off on getting a OID_CO_ADDRESS_CHANGE notification.
|
||
|
We (as before) query the adapter for the configured address.
|
||
|
On completion of this (OID_CO_GET_ADDRESS), we start the process of validating
|
||
|
and registering all addresses, by calling ArpSValidateAndSetRegdAddresses.
|
||
|
|
||
|
ArpSValidateAndSetRegdAddresses allocates and initializes pIntF->pRegAddrCtxt
|
||
|
(which keeps all context associated with the validation and registration of
|
||
|
addresses). A reference is added to pIntF for pRegAddrCtxt.
|
||
|
The function then calls ArpSValidateOneRegdAddress.
|
||
|
|
||
|
ArpSValidateOneRegdAddress attempts to initiate the validation & registration
|
||
|
of a single address. If there are no addresses left to be processed, it
|
||
|
will unlink pIntF->pRegAddrCtxt (deref pIntF and deallocate pRegAddrCtxt).
|
||
|
"Validation" consists of making a point-to-point to call to the address,
|
||
|
using the same call params as the atmarp client uses. If the call fails,
|
||
|
the address is considered "validated". The protocol's context for the VC
|
||
|
is pRegAddrCtxt itself.
|
||
|
|
||
|
The next stage happens in the make-call complete handler
|
||
|
(ArpSMakeRegAddrCallComplete).
|
||
|
|
||
|
ArpSMakeRegAddrCallComplete:
|
||
|
- on successful make call (which is a failed validation),
|
||
|
it it immediately closes the call. The close call handler
|
||
|
(ArpSCloseRegAddrCallComplete) deltes the vc, increments
|
||
|
pRegAddrCtxt->RegAddrIndex and calls
|
||
|
ArpSValidateOneRegdAddress (to initiate the validation & registration
|
||
|
of the NEXT address, if any).
|
||
|
|
||
|
-- On failed make call (which is a successful validation), it deletes the
|
||
|
vc and initiates registration of the address (calls NdisRequest with
|
||
|
OID_CO_ADD_ADDRESS). On completion of the OID_CO_ADD_ADDRESS
|
||
|
(ArpSCoRequestComplete), we do the following:
|
||
|
- on success, copy over the addres to
|
||
|
pIntF->RegAddresses[pIntF->NumAddressesRegd] and
|
||
|
increment pIntF->NumAddressesRegd ++
|
||
|
|
||
|
- on failure or success, we increment pRegAddrCtxt->RegAddrIndex
|
||
|
and call ArpSValidateOneRegdAddress (to initiate the validation &
|
||
|
registration of the NEXT address, if any).
|
||
|
|
||
|
A note on the use of pIntF->RegAddresses[...]
|
||
|
This array is initialized with all the user-specified addresses read from
|
||
|
the registry. As validation proceeds, however, the successfully validated
|
||
|
and registered addresses are copied sequentially into this array. If
|
||
|
*all* the addresses are successfully validated and registered, then the
|
||
|
end values in the array is the same as the initial values. However, if
|
||
|
some intermediate addresses are not validated or registered succcessfully,
|
||
|
then the end result will be different. In all cases, the first
|
||
|
pIntF->NumAddressesRegd entries will contain the registered addresses.
|
||
|
|
||
|
09/30/1999 JosephJ Bug 405851
|
||
|
*RC3SS: ATMARPS: Bugcheck unloading atmarps during shutdown
|
||
|
The basic problem is that ArpSIfList contains a pointer to a just-freed pIF.
|
||
|
|
||
|
The biggest problem I've found is that INTF_CLOSING state is set only in the
|
||
|
ArpSCloseAdapterComplete handler. This means that ArpSStopInterface (which
|
||
|
calls on ArpSReferenceIntF which *does* check the INTF_CLOSING flag) can be
|
||
|
called multiple times for the same adapter. Also ArpSReferenceIntF is called
|
||
|
from some other places. ArpSStopInterface is called from: ArpSShutDown
|
||
|
(called from arp's unload handler as WELL as when handling IRP_MJ_SHUTDOWN)
|
||
|
ArpSCoRequest(AF closing) and ArpSUnbindAdapter(unbind adapter handler).
|
||
|
|
||
|
ArpSStopInterface is NOT idempotent (it expects to be called only once per pIF),
|
||
|
but given the above, it CAN becalled multiple times per IF (i.e., because
|
||
|
INTF_CLOSING is only set on close adapter complete).
|
||
|
One specific problem with this: it assumes it can use pIntF->CleanupEvent.
|
||
|
|
||
|
Another problem:
|
||
|
ArpSShutDown: it goes through each IF in ArpSIfList, refs it (which would fail
|
||
|
if the IF is INTF_CLOSING), release list lock, then calls ArpSStopInterface.
|
||
|
It is definately flawed in the way it uses pIF->Next -- pIF->Next could well be
|
||
|
gone by the time it gets to it. HOWEVER that's not the problem we're seeing
|
||
|
-- we're seeing the case of the ArpSIfList ITSELF being corrupted.
|
||
|
|
||
|
FIXES:
|
||
|
1. arps.c ArpSShutdown now refs the pNext pointer -- a little bit of intricate
|
||
|
code that makes sure pNext is still around when we need it.
|
||
|
2. We use the INTF_ADAPTER_OPENED flag, set on successful completion of
|
||
|
open adapter (in ArpSOpenAdapterComplete). We use this to make sure
|
||
|
that NdisCloseAdapter is called only once. NdisCloseAdapter was called
|
||
|
in a bunch of places -- now we call ArpSTryCloseAdapter, which checks
|
||
|
the INTF_ADAPTER_OPENED flag first.
|
||
|
3. ArpSStopInterface now doesn't clobber pIntF->CleanupEvent -- if it's
|
||
|
NON-NULL it simply doesn't wait.
|
||
|
|
||
|
10/07/1999 JosephJ fix for 412018
|
||
|
*RC3SS: ATMARPS bugchecks on unloading if no call manager bound to adapter.
|
||
|
On unloading, we were trying to deregister addresses which had never
|
||
|
been registered (in fact the AF was never opened). Fix is in
|
||
|
DeregisterAllAddresses (ndis.c): it used to check that
|
||
|
pIntF->NumAllocedRegdAddresses is non-null ; now it checks that
|
||
|
NumAddressesRegd is non-null.
|
||
|
|
||
|
01/06/2000 JosephJ fix for 416301 corrupt ArpSIfList
|
||
|
|
||
|
|
||
|
ArpSOpenAfComplete(ndis.c): we were setting a pInfF flag without holding the IF
|
||
|
lock (we just had the IF list lock). This was possibly corrupting the other
|
||
|
bits in the flag field. This is most likely the cause of the problem.
|
||
|
Also got rid of INTF_IN_LIST field, which is not required.
|
||
|
|
||
|
ArpSDereferenceIntF(arps.c): Arvindm added code to make sure two threads don't
|
||
|
try to deref the interface to zero at the same time. This is not likely to
|
||
|
be the cause of the bug (bug is that the list is corrupted, not that
|
||
|
an IF entry was de-allocated prematurely or twice) but is a hole that needs to
|
||
|
be fixed.
|
||
|
03/30/2000 JosephJ Hit an assert in arps.c when the close-call handler for
|
||
|
the validation make call (ArpSCloseRegAddrCallComplete) is called during
|
||
|
shutdown -- we've nuked pIntF->NumAllocedRegdAddresses by this time as
|
||
|
part of shutdown, so the following assert in the function fails:
|
||
|
ASSERT(pRegAddrCtxt->RegAddrIndex < pIntF->NumAllocedRegdAddresses);
|
||
|
Fix is to bracket this code by if (!(pIntF->Flags & INTF_STOPPING)).
|
||
|
|
||
|
04/18/2000 JosephJ
|
||
|
Removed assert from the code below in ArpSValidateAndSetRegdAddresses
|
||
|
|
||
|
if (pIntF->pRegAddrCtxt != NULL)
|
||
|
{
|
||
|
ASSERT(FALSE);
|
||
|
break;
|
||
|
}
|
||
|
|
||
|
This could happen if we get an OID_CO_ADDRESS_CHANGE when we are
|
||
|
either processing an earlier one, or are in the process of
|
||
|
initializing. We get this case (and hit the assert) during pnp stress
|
||
|
( 1c_reset script against an Olicom 616X) -- Whistler bug#102805
|
||
|
|