[snmp] snmpv3 context receiver thread destroyed
Birgit Arkesteijn
birgit at westhawk.co.uk
Wed Apr 4 13:18:35 BST 2007
Hi Josh,
An interesting problem ... hence a very long answer.
Incoming discovery responses aren't treated any different from other
responses. This is because other (non discovery) responses contain
timeline info as well to keep the stack synchronised.
BTW, discovery is done blocked.
Here is a very short outline of what happens when a response (discovery
or not) is received and decoded:
**** AsnDecoderv3.processSNMPv3() calls:
- TimeWindow.isEngineIdOK():
+ if no engineId yet: store it
+ if engineId: compare
returns true/false
(debug > 4)
When false -> throws DecodingException("Received engine Id xxx is not
correct")
- TimeWindow.isOutsideTimeWindow():
returns true/false, does not store
+ if no timeline info yet: return false
+ if timeline info: check if in window (return true/false)
When true -> throws DecodingException("Message is outside time window")
- TimeWindow.updateTimeWindow():
stores when:
+ (no timeline yet) & (bootsA > 0 || timeA > 0)
+ received timeline data newer then previous one
(debug > 4)
**** In other words:
1 The engineId is stored the first time.
2 Timeline info is only stored if newer.
However, if an older timeline info is received after newer timeline
info, the older info might cause a DecodingException exception.
I don't think that's your case, since I cannot see a DecodingException
in your debug output.
**** As to your debug output:
I'm trying to highlight the most important lines of your debug output. I
removed "uk.co.westhawk." to make the lines shorter. The first number is
the line number:
<snippets>
284: Bringing down the loopback interface.
671: pdu.DiscoveryPdu.setResponseException(): Timed out
4284: pdu.DiscoveryPdu.setResponseException(): Timed out
4290: pdu.GetPdu_vec.setResponseException(): Timed out
4297: Bringing up the loopback interface.
4298: stack.SnmpContextv3.addPdu(): msgId=57, Pdu reqId=57
4304: stack.SnmpContextv3.addPdu(): msgId=58, Pdu reqId=58
4374: pdu.DiscoveryPdu.sendme(): Sent Pdu reqId 58, retries 1
4436: stack.SnmpContextv3.processIncomingResponse(): msgId=58, Pdu reqId=58
4462: stack.TimeWindow.setSnmpEngineId():
hostaddr '127.0.0.1',
port '161',
snmpEngineId '800007E580F3A63E214C07C945'',
key '127.0.0.1: 161'
4466: stack.TimeWindow.setTimeLine():
snmpEngineId 800007E580F3A63E214C07C945,
node stack.TimeWindowNode[
engineId=800007E580F3A63E214C07C945,
engineBoots=8,
engineTime=409479,
latestReceivedEngineTime=409479]
4469: stack.SnmpContextv3.processIncomingResponse(): rid2=58
4472: beans.UsmDiscoveryBean.startDiscovery(): Done
4474: stack.SnmpContextv3.actualEncodePacket(): msgId=57, Pdu reqId=57
4571: pdu.DiscoveryPdu.sendme(): Sent Pdu reqId 56, retries 5
4633: stack.SnmpContextv3.processIncomingResponse(): msgId=56, Pdu reqId=56
4659: pdu.DiscoveryPdu.setResponseException(): Timed out
4667: stack.SnmpContextv3.processIncomingResponse(): rid2=56
4670: beans.UsmDiscoveryBean.startDiscovery(): Done
4758: pdu.GetPdu_vec.setResponseException(): Timed out
4774: pdu.GetPdu_vec.setResponseException(): Timed out
4784: pdu.GetPdu_vec.setResponseException(): Timed out
4794: pdu.GetPdu_vec.setResponseException(): Timed out
4795: Done unblocked timed out.
4892: WARNING: Method: public abstract void fcncnm.ndm.commonobjects.EventStreamAPI.updateStatus(fcncnm.ndm.commonobjects.Status) throws fcncnm.ndm.commonobjects.CNPException,
couldn't add arg: DeviceStatus:
MeId: NMS ID = 25/SECRET/HOST/bogus/local
ME type = fcncnm.ndm.ams.adaptors.ics.ICSHost
ME instance = localhost,knode82'
date='Wed Mar 28 12: 05:11 EDT 2007'
4955: SEVERE: Couldn't find ip address bogus-ip cause:
"java.net.UnknownHostException: bogus-ip: bogus-ip"
4956: java.net.UnknownHostException: bogus-ip: bogus-ip
</snippets>
*** Comments on your debug output:
Two of the discovery packets are answered (58, 56). However, all your
original GetPdu_vec keep timing out. Message 57 (what I assume is one of
your GetPdu_vec), should have stand a chance.
As far as I can tell, the GetPdu_vec isn't even sent once (I would
expect a "GetPdu_vec.sendme()" in the log file), because it times out.
That's due to the waitForSelf(). The situation will improve if you use
the notification mechanism, since that starts the clock when the message
is actually sent (see Pdu.transmit()), whereas waitForSelf() immediately
starts waiting.
(BTW, your debug says "one unblocked timed out", but by calling
"waitForSelf()" you are sending the GetPdu_vec blocked.)
I realise that our debugging isn't sufficient, since not all the message
mention the request ID. It makes it very hard to track when a request is
added, sent and timed out. It won't help you now, but I'll put that on
my list.
BTW, what the level of debug are you using? I would have expected more
IOExceptions.
*** Overall conclusion:
The situation will improve dramatically if you use the latest CVS
source, since the discovery packets will use the same retry intervals.
The discovery requests and your requests won't drift so much.
When you send the GetPdu_vec via the notification mechanism, I expect the
pdu to be sent after the discovery succeeds, whereas now it has already
timed out.
Hope the above helps.
Cheers, Birgit
On Tue, Apr 03, 2007 at 03:06:48PM -0400, Josh Bers wrote:
> Birgit,
>
> Yeah, I'd prefer an official release but if this fixes the problem we
> might update to the CVS head... Did you and/or Tim take a look at the
> debug output? Any ideas about what's going wrong?
>
> > To answer your question:
> > The 5 retries all have the same request Id. The first one
> > that comes back is "accepted" (for better word of it), the
> > next ones are ignored.
> > In the later case you would get a message: Pdu of msgId XXX
> > is already answered.
> >
>
> Birgit the problem that I'm expecting is that many Discovery PDU's with
> different msgId's will be outstanding (being retried) say req id 55, 57,
> and 59. Let's say the original request of 59 comes back because the
> agent becomes reachable. Now retry # 2 for reqid 57 goes through... Does
> the stack handle this case of multiple USM discovery PDU's with
> responses? What happens to the timeline, engine id, etc.
>
> Josh
--
-- Birgit Arkesteijn, birgit at westhawk.co.uk,
-- Westhawk Ltd, Albion Wharf, 19 Albion Street, Manchester M1 5LN, UK
-- Company no: 1769350
-- Registered Office:
-- 15 London Road, Stockton Heath, Warrington WA4 6SJ. UK.
-- tel.: +44 (0)161 237 0660
-- <URL: http://www.westhawk.co.uk>
More information about the snmp
mailing list