[snmp] snmpv3 context receiver thread destroyed
Josh Bers
jbers at bbn.com
Tue May 8 15:33:42 BST 2007
Birgit,
I tried using the latest code from sourceforge, however, the problem still
appears. Strangely the first test fails (remote host not reachable then
reachable). If I run a subsequent test in the same JVM it succeeds. Is the
timeline /authentication info cached separately from contexts? If so that
might explain why only the first test fails (since the subsequent ones don't
have to send discovery PDU's).
I looked at some of the email archives and found a problem that sounds
related. There is a problem with transmitters not getting cleaned up when an
exception is thrown by discovery pdu (which is the case since it times out,
right?). I am including the email. Was this ever fixed in the code?
Thanks,
Josh
> -----Original Message-----
> From: snmp-bounces at snmp.westhawk.co.uk
> [mailto:snmp-bounces at snmp.westhawk.co.uk] On Behalf Of Birgit
> Arkesteijn
> Sent: Wednesday, April 04, 2007 10:56 AM
> To: List for discussion of the Westhawk SNMP stack
> Subject: Re: [snmp] snmpv3 context receiver thread destroyed
>
>
> Hi Josh,
>
> I don't have much time now, so just a quick answer:
>
> In your debug output:
> The lines:
> 4298: snmp.stack.SnmpContextv3.addPdu(): msgId=57, Pdu reqId=57
> 4304: snmp.stack.SnmpContextv3.addPdu(): msgId=58, Pdu reqId=58
>
> are the last where 'addPdu()' is called.
>
> - Pdu.send() calls
> - Pdu.send(int error_status, int error_index) calls
> - SnmpContextBasisFace.addPdu(this);
>
> Pdu.send() should be on your end.
> Are you sure you keep sending PDUs?
> Could you (maybe) add some debug every time you call send()?
>
> Cheers, Birgit
>
>
>
> On Wed, Apr 04, 2007 at 10:30:46AM -0400, Josh Bers wrote:
> > Birgit,
> >
> > Thank you for the detailed analysis. I am going to try
> doing a notify
> > instead of waitForSelf() to see if that improves things...however...
> >
> > I still do not understand why the Receiver thread for the localhost
> > context dies which would seem to explain why all subsequent
> > GetPdu_vec's timeout. Based upon your response I can
> understand that
> > the set of getPdu_vecs that are lined up before the agent
> comes up are
> > timing out, however, once the timeline is in and no more
> > discoverUSMbeans need to run... There should not be any delay in
> > sending the pdu's... However you are not seeing and sendme() calls.
> > Are the transmitter's getting killed too?
> >
> > BTW. We were using debug level of 15.
> >
> > Josh
> >
> > > -----Original Message-----
> > > From: snmp-bounces at snmp.westhawk.co.uk
> > > [mailto:snmp-bounces at snmp.westhawk.co.uk] On Behalf Of Birgit
> > > Arkesteijn
> > > Sent: Wednesday, April 04, 2007 7:19 AM
> > > To: List for discussion of the Westhawk SNMP stack
> > > Subject: Re: [snmp] snmpv3 context receiver thread destroyed
> > >
> > >
> > > Hi Josh,
> > >
> > > An interesting problem ... hence a very long answer.
> > >
> > > Incoming discovery responses aren't treated any different
> > > from other responses. This is because other (non discovery)
> > > responses contain timeline info as well to keep the stack
> > > synchronised.
> > >
> > > BTW, discovery is done blocked.
> > >
> > > Here is a very short outline of what happens when a response
> > > (discovery or not) is received and decoded:
> > >
> > > **** AsnDecoderv3.processSNMPv3() calls:
> > > - TimeWindow.isEngineIdOK():
> > > + if no engineId yet: store it
> > > + if engineId: compare
> > > returns true/false
> > > (debug > 4)
> > > When false -> throws DecodingException("Received engine Id
> > > xxx is not
> > > correct")
> > >
> > > - TimeWindow.isOutsideTimeWindow():
> > > returns true/false, does not store
> > > + if no timeline info yet: return false
> > > + if timeline info: check if in window (return true/false)
> > > When true -> throws DecodingException("Message is outside
> > > time window")
> > >
> > > - TimeWindow.updateTimeWindow():
> > > stores when:
> > > + (no timeline yet) & (bootsA > 0 || timeA > 0)
> > > + received timeline data newer then previous one
> > > (debug > 4)
> > >
> > >
> > > **** In other words:
> > > 1 The engineId is stored the first time.
> > > 2 Timeline info is only stored if newer.
> > > However, if an older timeline info is received after
> newer timeline
> > > info, the older info might cause a DecodingException exception.
> > > I don't think that's your case, since I cannot see a
> > > DecodingException
> > > in your debug output.
> > >
> > >
> > >
> > > **** As to your debug output:
> > > I'm trying to highlight the most important lines of your
> > > debug output. I removed "uk.co.westhawk." to make the lines
> > > shorter. The first number is the line number:
> > >
> > > <snippets>
> > > 284: Bringing down the loopback interface.
> > >
> > > 671: pdu.DiscoveryPdu.setResponseException(): Timed out
> > > 4284: pdu.DiscoveryPdu.setResponseException(): Timed out
> > > 4290: pdu.GetPdu_vec.setResponseException(): Timed out
> > >
> > > 4297: Bringing up the loopback interface.
> > >
> > > 4298: stack.SnmpContextv3.addPdu(): msgId=57, Pdu reqId=57
> > > 4304: stack.SnmpContextv3.addPdu(): msgId=58, Pdu reqId=58
> > > 4374: pdu.DiscoveryPdu.sendme(): Sent Pdu reqId 58, retries 1
> > >
> > > 4436: stack.SnmpContextv3.processIncomingResponse():
> > > msgId=58, Pdu reqId=58
> > >
> > > 4462: stack.TimeWindow.setSnmpEngineId():
> > > hostaddr '127.0.0.1',
> > > port '161',
> > > snmpEngineId '800007E580F3A63E214C07C945'',
> > > key '127.0.0.1: 161'
> > >
> > > 4466: stack.TimeWindow.setTimeLine():
> > > snmpEngineId 800007E580F3A63E214C07C945,
> > > node stack.TimeWindowNode[
> > > engineId=800007E580F3A63E214C07C945,
> > > engineBoots=8,
> > > engineTime=409479,
> > > latestReceivedEngineTime=409479]
> > >
> > > 4469: stack.SnmpContextv3.processIncomingResponse(): rid2=58
> > > 4472: beans.UsmDiscoveryBean.startDiscovery(): Done
> > > 4474: stack.SnmpContextv3.actualEncodePacket(): msgId=57, Pdu
> > > reqId=57
> > >
> > > 4571: pdu.DiscoveryPdu.sendme(): Sent Pdu reqId 56, retries 5
> > >
> > > 4633: stack.SnmpContextv3.processIncomingResponse():
> > > msgId=56, Pdu reqId=56
> > >
> > > 4659: pdu.DiscoveryPdu.setResponseException(): Timed out
> > >
> > > 4667: stack.SnmpContextv3.processIncomingResponse(): rid2=56
> > > 4670: beans.UsmDiscoveryBean.startDiscovery(): Done
> > >
> > > 4758: pdu.GetPdu_vec.setResponseException(): Timed out
> > > 4774: pdu.GetPdu_vec.setResponseException(): Timed out
> > > 4784: pdu.GetPdu_vec.setResponseException(): Timed out
> > > 4794: pdu.GetPdu_vec.setResponseException(): Timed out
> > > 4795: Done unblocked timed out.
> > >
> > > 4892: WARNING: Method: public abstract void
> > > fcncnm.ndm.commonobjects.EventStreamAPI.updateStatus(fcncnm.nd
> > > m.commonobjects.Status) throws
> fcncnm.ndm.commonobjects.CNPException,
> > > couldn't add arg: DeviceStatus:
> > > MeId: NMS ID = 25/SECRET/HOST/bogus/local
> > > ME type = fcncnm.ndm.ams.adaptors.ics.ICSHost
> > > ME instance = localhost,knode82'
> > > date='Wed Mar 28 12: 05:11 EDT 2007'
> > >
> > > 4955: SEVERE: Couldn't find ip address bogus-ip cause:
> > > "java.net.UnknownHostException: bogus-ip: bogus-ip"
> > >
> > > 4956: java.net.UnknownHostException: bogus-ip: bogus-ip
> </snippets>
> > >
> > >
> > > *** Comments on your debug output:
> > > Two of the discovery packets are answered (58, 56). However,
> > > all your original GetPdu_vec keep timing out. Message 57
> > > (what I assume is one of your GetPdu_vec), should have
> stand a chance.
> > >
> > > As far as I can tell, the GetPdu_vec isn't even sent once (I
> > > would expect a "GetPdu_vec.sendme()" in the log file),
> > > because it times out.
> > >
> > > That's due to the waitForSelf(). The situation will improve
> > > if you use the notification mechanism, since that starts the
> > > clock when the message is actually sent (see Pdu.transmit()),
> > > whereas waitForSelf() immediately starts waiting.
> > >
> > > (BTW, your debug says "one unblocked timed out", but by
> > > calling "waitForSelf()" you are sending the GetPdu_vec blocked.)
> > >
> > > I realise that our debugging isn't sufficient, since not all
> > > the message mention the request ID. It makes it very hard to
> > > track when a request is added, sent and timed out. It won't
> > > help you now, but I'll put that on my list.
> > >
> > > BTW, what the level of debug are you using? I would have
> > > expected more IOExceptions.
> > >
> > >
> > > *** Overall conclusion:
> > > The situation will improve dramatically if you use the latest
> > > CVS source, since the discovery packets will use the same
> > > retry intervals. The discovery requests and your requests
> > > won't drift so much.
> > >
> > > When you send the GetPdu_vec via the notification mechanism,
> > > I expect the pdu to be sent after the discovery succeeds,
> > > whereas now it has already timed out.
> > >
> > > Hope the above helps.
> > >
> > > Cheers, Birgit
> > >
> > >
> > > On Tue, Apr 03, 2007 at 03:06:48PM -0400, Josh Bers wrote:
> > > > Birgit,
> > > >
> > > > Yeah, I'd prefer an official release but if this fixes the
> > > problem we
> > > > might update to the CVS head... Did you and/or Tim take a
> > > look at the
> > > > debug output? Any ideas about what's going wrong?
> > > >
> > > > > To answer your question:
> > > > > The 5 retries all have the same request Id. The first
> one that
> > > > > comes back is "accepted" (for better word of it), the
> next ones
> > > > > are ignored. In the later case you would get a
> message: Pdu of
> > > > > msgId XXX is already answered.
> > > > >
> > > >
> > > > Birgit the problem that I'm expecting is that many
> Discovery PDU's
> > > > with different msgId's will be outstanding (being retried)
> > > say req id
> > > > 55, 57, and 59. Let's say the original request of 59 comes back
> > > > because the agent becomes reachable. Now retry # 2 for
> > > reqid 57 goes
> > > > through... Does the stack handle this case of multiple USM
> > > discovery
> > > > PDU's with responses? What happens to the timeline, engine id,
> > > > etc.
> > > >
> > > > Josh
> > >
> > > --
> > > -- Birgit Arkesteijn, birgit at westhawk.co.uk,
> > > -- Westhawk Ltd, Albion Wharf, 19 Albion Street,
> Manchester M1 5LN, UK
> > > -- Company no: 1769350
> > > -- Registered Office:
> > > -- 15 London Road, Stockton Heath, Warrington WA4 6SJ. UK.
> > > -- tel.: +44 (0)161 237 0660
> > > -- <URL: http://www.westhawk.co.uk>
> > > _______________________________________________
> > > snmp mailing list
> > > snmp at snmp.westhawk.co.uk
> > > http://snmp.westhawk.co.uk/mailman/listinfo/snmp
> > >
> >
> >
> >
> > _______________________________________________
> > snmp mailing list
> > snmp at snmp.westhawk.co.uk
> > http://snmp.westhawk.co.uk/mailman/listinfo/snmp
> >
>
> --
> -- Birgit Arkesteijn, birgit at westhawk.co.uk,
> -- Westhawk Ltd, Albion Wharf, 19 Albion Street, Manchester M1 5LN, UK
> -- Company no: 1769350
> -- Registered Office:
> -- 15 London Road, Stockton Heath, Warrington WA4 6SJ. UK.
> -- tel.: +44 (0)161 237 0660
> -- <URL: http://www.westhawk.co.uk>
> _______________________________________________
> snmp mailing list
> snmp at snmp.westhawk.co.uk
> http://snmp.westhawk.co.uk/mailman/listinfo/snmp
>
More information about the snmp
mailing list