[snmp] snmpv3 context receiver thread destroyed

Josh Bers jbers at bbn.com
Tue May 8 15:33:42 BST 2007


Birgit,

I tried using the latest code from sourceforge, however, the problem still
appears. Strangely the first test fails (remote host not reachable then
reachable). If I run a subsequent test in the same JVM it succeeds. Is the
timeline /authentication info cached separately from contexts? If so that
might explain why only the first test fails (since the subsequent ones don't
have to send discovery PDU's).

I looked at some of the email archives and found a problem that sounds
related. There is a problem with transmitters not getting cleaned up when an
exception is thrown by discovery pdu (which is the case since it times out,
right?). I am including the email. Was this ever fixed in the code?

Thanks,

Josh

> -----Original Message-----
> From: snmp-bounces at snmp.westhawk.co.uk 
> [mailto:snmp-bounces at snmp.westhawk.co.uk] On Behalf Of Birgit 
> Arkesteijn
> Sent: Wednesday, April 04, 2007 10:56 AM
> To: List for discussion of the Westhawk SNMP stack
> Subject: Re: [snmp] snmpv3 context receiver thread destroyed
> 
> 
> Hi Josh,
> 
> I don't have much time now, so just a quick answer:
> 
> In your debug output:
> The lines:
> 4298: snmp.stack.SnmpContextv3.addPdu(): msgId=57, Pdu reqId=57
> 4304: snmp.stack.SnmpContextv3.addPdu(): msgId=58, Pdu reqId=58
> 
> are the last where 'addPdu()' is called.
> 
> - Pdu.send() calls
> - Pdu.send(int error_status, int error_index) calls
> - SnmpContextBasisFace.addPdu(this);
> 
> Pdu.send() should be on your end.
> Are you sure you keep sending PDUs?
> Could you (maybe) add some debug every time you call send()?
> 
> Cheers, Birgit
> 
> 
> 
> On Wed, Apr 04, 2007 at 10:30:46AM -0400, Josh Bers wrote:
> > Birgit,
> > 
> > Thank you for the detailed analysis. I am going to try 
> doing a notify 
> > instead of waitForSelf() to see if that improves things...however...
> > 
> > I still do not understand why the Receiver thread for the localhost 
> > context dies which would seem to explain why all subsequent 
> > GetPdu_vec's timeout. Based upon your response I can 
> understand that 
> > the set of getPdu_vecs that are lined up before the agent 
> comes up are 
> > timing out, however, once the timeline is in and no more 
> > discoverUSMbeans need to run... There should not be any delay in 
> > sending the pdu's... However you are not seeing and sendme() calls. 
> > Are the transmitter's getting killed too?
> > 
> > BTW. We were using debug level of 15.
> > 
> > Josh
> > 
> > > -----Original Message-----
> > > From: snmp-bounces at snmp.westhawk.co.uk
> > > [mailto:snmp-bounces at snmp.westhawk.co.uk] On Behalf Of Birgit 
> > > Arkesteijn
> > > Sent: Wednesday, April 04, 2007 7:19 AM
> > > To: List for discussion of the Westhawk SNMP stack
> > > Subject: Re: [snmp] snmpv3 context receiver thread destroyed
> > > 
> > > 
> > > Hi Josh,
> > > 
> > > An interesting problem ... hence a very long answer.
> > > 
> > > Incoming discovery responses aren't treated any different
> > > from other responses. This is because other (non discovery) 
> > > responses contain timeline info as well to keep the stack 
> > > synchronised.
> > > 
> > > BTW, discovery is done blocked.
> > > 
> > > Here is a very short outline of what happens when a response
> > > (discovery or not) is received and decoded:
> > > 
> > > **** AsnDecoderv3.processSNMPv3() calls:
> > > - TimeWindow.isEngineIdOK():
> > >   + if no engineId yet: store it
> > >   + if engineId: compare
> > >   returns true/false
> > >   (debug > 4)
> > >   When false -> throws DecodingException("Received engine Id
> > > xxx is not
> > >   correct")
> > > 
> > > - TimeWindow.isOutsideTimeWindow():
> > >   returns true/false, does not store
> > >   + if no timeline info yet: return false
> > >   + if timeline info: check if in window (return true/false)
> > >   When true -> throws DecodingException("Message is outside
> > > time window")
> > > 
> > > - TimeWindow.updateTimeWindow():
> > >   stores when:
> > >   + (no timeline yet) & (bootsA > 0 || timeA > 0)
> > >   + received timeline data newer then previous one
> > >   (debug > 4)
> > > 
> > > 
> > > **** In other words:
> > > 1 The engineId is stored the first time.
> > > 2 Timeline info is only stored if newer. 
> > >   However, if an older timeline info is received after 
> newer timeline
> > >   info, the older info might cause a DecodingException exception.
> > >   I don't think that's your case, since I cannot see a
> > > DecodingException
> > >   in your debug output.
> > > 
> > > 
> > > 
> > > **** As to your debug output:
> > > I'm trying to highlight the most important lines of your
> > > debug output. I removed "uk.co.westhawk." to make the lines 
> > > shorter. The first number is the line number:
> > > 
> > > <snippets>
> > >  284: Bringing down the loopback interface.
> > > 
> > >  671: pdu.DiscoveryPdu.setResponseException(): Timed out
> > > 4284: pdu.DiscoveryPdu.setResponseException(): Timed out
> > > 4290: pdu.GetPdu_vec.setResponseException(): Timed out
> > > 
> > > 4297: Bringing up the loopback interface.
> > > 
> > > 4298: stack.SnmpContextv3.addPdu(): msgId=57, Pdu reqId=57
> > > 4304: stack.SnmpContextv3.addPdu(): msgId=58, Pdu reqId=58
> > > 4374: pdu.DiscoveryPdu.sendme(): Sent Pdu reqId 58, retries 1
> > > 
> > > 4436: stack.SnmpContextv3.processIncomingResponse():
> > > msgId=58, Pdu reqId=58
> > > 
> > > 4462: stack.TimeWindow.setSnmpEngineId(): 
> > >     hostaddr '127.0.0.1', 
> > >     port '161', 
> > >     snmpEngineId '800007E580F3A63E214C07C945'', 
> > >     key '127.0.0.1: 161'
> > > 
> > > 4466: stack.TimeWindow.setTimeLine(): 
> > >     snmpEngineId 800007E580F3A63E214C07C945, 
> > >     node stack.TimeWindowNode[
> > >         engineId=800007E580F3A63E214C07C945, 
> > >         engineBoots=8, 
> > >         engineTime=409479, 
> > >         latestReceivedEngineTime=409479]
> > > 
> > > 4469: stack.SnmpContextv3.processIncomingResponse(): rid2=58
> > > 4472: beans.UsmDiscoveryBean.startDiscovery(): Done
> > > 4474: stack.SnmpContextv3.actualEncodePacket(): msgId=57, Pdu 
> > > reqId=57
> > > 
> > > 4571: pdu.DiscoveryPdu.sendme(): Sent Pdu reqId 56, retries 5
> > > 
> > > 4633: stack.SnmpContextv3.processIncomingResponse():
> > > msgId=56, Pdu reqId=56
> > > 
> > > 4659: pdu.DiscoveryPdu.setResponseException(): Timed out
> > > 
> > > 4667: stack.SnmpContextv3.processIncomingResponse(): rid2=56
> > > 4670: beans.UsmDiscoveryBean.startDiscovery(): Done
> > > 
> > > 4758: pdu.GetPdu_vec.setResponseException(): Timed out
> > > 4774: pdu.GetPdu_vec.setResponseException(): Timed out
> > > 4784: pdu.GetPdu_vec.setResponseException(): Timed out
> > > 4794: pdu.GetPdu_vec.setResponseException(): Timed out
> > > 4795: Done unblocked timed out.
> > > 
> > > 4892: WARNING: Method: public abstract void
> > > fcncnm.ndm.commonobjects.EventStreamAPI.updateStatus(fcncnm.nd
> > > m.commonobjects.Status) throws 
> fcncnm.ndm.commonobjects.CNPException, 
> > >     couldn't add arg:  DeviceStatus: 
> > >     MeId:  NMS ID = 25/SECRET/HOST/bogus/local 
> > >     ME type = fcncnm.ndm.ams.adaptors.ics.ICSHost 
> > >     ME instance = localhost,knode82' 
> > >     date='Wed Mar 28 12: 05:11 EDT 2007'
> > > 
> > > 4955: SEVERE: Couldn't find ip address bogus-ip cause: 
> > >     "java.net.UnknownHostException:  bogus-ip: bogus-ip"
> > > 
> > > 4956: java.net.UnknownHostException: bogus-ip: bogus-ip 
> </snippets>
> > > 
> > > 
> > > *** Comments on your debug output:
> > > Two of the discovery packets are answered (58, 56). However,
> > > all your original GetPdu_vec keep timing out. Message 57 
> > > (what I assume is one of your GetPdu_vec), should have 
> stand a chance.
> > > 
> > > As far as I can tell, the GetPdu_vec isn't even sent once (I
> > > would expect a "GetPdu_vec.sendme()" in the log file), 
> > > because it times out.
> > > 
> > > That's due to the waitForSelf(). The situation will improve
> > > if you use the notification mechanism, since that starts the 
> > > clock when the message is actually sent (see Pdu.transmit()), 
> > > whereas waitForSelf() immediately starts waiting. 
> > > 
> > > (BTW, your debug says "one unblocked timed out", but by
> > > calling "waitForSelf()" you are sending the GetPdu_vec blocked.)
> > > 
> > > I realise that our debugging isn't sufficient, since not all
> > > the message mention the request ID. It makes it very hard to 
> > > track when a request is added, sent and timed out. It won't 
> > > help you now, but I'll put that on my list.
> > > 
> > > BTW, what the level of debug are you using? I would have
> > > expected more IOExceptions.
> > > 
> > > 
> > > *** Overall conclusion:
> > > The situation will improve dramatically if you use the latest
> > > CVS source, since the discovery packets will use the same 
> > > retry intervals. The discovery requests and your requests 
> > > won't drift so much.
> > > 
> > > When you send the GetPdu_vec via the notification mechanism,
> > > I expect the pdu to be sent after the discovery succeeds, 
> > > whereas now it has already timed out.
> > > 
> > > Hope the above helps.
> > > 
> > > Cheers, Birgit
> > > 
> > > 
> > > On Tue, Apr 03, 2007 at 03:06:48PM -0400, Josh Bers wrote:
> > > > Birgit,
> > > > 
> > > > Yeah, I'd prefer an official release but if this fixes the
> > > problem we
> > > > might update to the CVS head... Did you and/or Tim take a
> > > look at the
> > > > debug output? Any ideas about what's going wrong?
> > > > 
> > > > > To answer your question:
> > > > > The 5 retries all have the same request Id. The first 
> one that 
> > > > > comes back is "accepted" (for better word of it), the 
> next ones 
> > > > > are ignored. In the later case you would get a 
> message: Pdu of 
> > > > > msgId XXX is already answered.
> > > > > 
> > > > 
> > > > Birgit the problem that I'm expecting is that many 
> Discovery PDU's
> > > > with different msgId's will be outstanding (being retried) 
> > > say req id
> > > > 55, 57, and 59. Let's say the original request of 59 comes back
> > > > because the agent becomes reachable. Now retry # 2 for 
> > > reqid 57 goes
> > > > through... Does the stack handle this case of multiple USM
> > > discovery
> > > > PDU's with responses? What happens to the timeline, engine id, 
> > > > etc.
> > > > 
> > > > Josh
> > > 
> > > --
> > > -- Birgit Arkesteijn, birgit at westhawk.co.uk,
> > > -- Westhawk Ltd, Albion Wharf, 19 Albion Street, 
> Manchester M1 5LN, UK
> > > -- Company no: 1769350
> > > -- Registered Office: 
> > > -- 15 London Road, Stockton Heath, Warrington WA4 6SJ. UK.
> > > -- tel.: +44 (0)161 237 0660
> > > -- <URL: http://www.westhawk.co.uk> 
> > > _______________________________________________
> > > snmp mailing list
> > > snmp at snmp.westhawk.co.uk 
> > > http://snmp.westhawk.co.uk/mailman/listinfo/snmp
> > > 
> > 
> > 
> > 
> > _______________________________________________
> > snmp mailing list
> > snmp at snmp.westhawk.co.uk 
> > http://snmp.westhawk.co.uk/mailman/listinfo/snmp
> > 
> 
> -- 
> -- Birgit Arkesteijn, birgit at westhawk.co.uk,
> -- Westhawk Ltd, Albion Wharf, 19 Albion Street, Manchester M1 5LN, UK
> -- Company no: 1769350
> -- Registered Office: 
> -- 15 London Road, Stockton Heath, Warrington WA4 6SJ. UK.
> -- tel.: +44 (0)161 237 0660
> -- <URL: http://www.westhawk.co.uk> 
> _______________________________________________
> snmp mailing list
> snmp at snmp.westhawk.co.uk 
> http://snmp.westhawk.co.uk/mailman/listinfo/snmp
> 


More information about the snmp mailing list