[snmp] snmpv3 context receiver thread destroyed

Andrew Chandler andy at riftware.com
Tue May 8 15:00:01 BST 2007


You might want to try the code from cvs then because I think that bug was
fixed after the release available for download.
 

-----Original Message-----
From: snmp-bounces at snmp.westhawk.co.uk
[mailto:snmp-bounces at snmp.westhawk.co.uk] On Behalf Of Josh Bers
Sent: Tuesday, May 08, 2007 1:34 PM
To: 'List for discussion of the Westhawk SNMP stack'
Subject: RE: [snmp] snmpv3 context receiver thread destroyed

Birgit,

I tried using the latest code from sourceforge, however, the problem still
appears. Strangely the first test fails (remote host not reachable then
reachable). If I run a subsequent test in the same JVM it succeeds. Is the
timeline /authentication info cached separately from contexts? If so that
might explain why only the first test fails (since the subsequent ones don't
have to send discovery PDU's).

I looked at some of the email archives and found a problem that sounds
related. There is a problem with transmitters not getting cleaned up when an
exception is thrown by discovery pdu (which is the case since it times out,
right?). I am including the email. Was this ever fixed in the code?

Thanks,

Josh

> -----Original Message-----
> From: snmp-bounces at snmp.westhawk.co.uk 
> [mailto:snmp-bounces at snmp.westhawk.co.uk] On Behalf Of Birgit 
> Arkesteijn
> Sent: Wednesday, April 04, 2007 10:56 AM
> To: List for discussion of the Westhawk SNMP stack
> Subject: Re: [snmp] snmpv3 context receiver thread destroyed
> 
> 
> Hi Josh,
> 
> I don't have much time now, so just a quick answer:
> 
> In your debug output:
> The lines:
> 4298: snmp.stack.SnmpContextv3.addPdu(): msgId=57, Pdu reqId=57
> 4304: snmp.stack.SnmpContextv3.addPdu(): msgId=58, Pdu reqId=58
> 
> are the last where 'addPdu()' is called.
> 
> - Pdu.send() calls
> - Pdu.send(int error_status, int error_index) calls
> - SnmpContextBasisFace.addPdu(this);
> 
> Pdu.send() should be on your end.
> Are you sure you keep sending PDUs?
> Could you (maybe) add some debug every time you call send()?
> 
> Cheers, Birgit
> 
> 
> 
> On Wed, Apr 04, 2007 at 10:30:46AM -0400, Josh Bers wrote:
> > Birgit,
> > 
> > Thank you for the detailed analysis. I am going to try
> doing a notify
> > instead of waitForSelf() to see if that improves things...however...
> > 
> > I still do not understand why the Receiver thread for the localhost 
> > context dies which would seem to explain why all subsequent 
> > GetPdu_vec's timeout. Based upon your response I can
> understand that
> > the set of getPdu_vecs that are lined up before the agent
> comes up are
> > timing out, however, once the timeline is in and no more 
> > discoverUSMbeans need to run... There should not be any delay in 
> > sending the pdu's... However you are not seeing and sendme() calls.
> > Are the transmitter's getting killed too?
> > 
> > BTW. We were using debug level of 15.
> > 
> > Josh
> > 
> > > -----Original Message-----
> > > From: snmp-bounces at snmp.westhawk.co.uk 
> > > [mailto:snmp-bounces at snmp.westhawk.co.uk] On Behalf Of Birgit 
> > > Arkesteijn
> > > Sent: Wednesday, April 04, 2007 7:19 AM
> > > To: List for discussion of the Westhawk SNMP stack
> > > Subject: Re: [snmp] snmpv3 context receiver thread destroyed
> > > 
> > > 
> > > Hi Josh,
> > > 
> > > An interesting problem ... hence a very long answer.
> > > 
> > > Incoming discovery responses aren't treated any different from 
> > > other responses. This is because other (non discovery) responses 
> > > contain timeline info as well to keep the stack synchronised.
> > > 
> > > BTW, discovery is done blocked.
> > > 
> > > Here is a very short outline of what happens when a response 
> > > (discovery or not) is received and decoded:
> > > 
> > > **** AsnDecoderv3.processSNMPv3() calls:
> > > - TimeWindow.isEngineIdOK():
> > >   + if no engineId yet: store it
> > >   + if engineId: compare
> > >   returns true/false
> > >   (debug > 4)
> > >   When false -> throws DecodingException("Received engine Id xxx 
> > > is not
> > >   correct")
> > > 
> > > - TimeWindow.isOutsideTimeWindow():
> > >   returns true/false, does not store
> > >   + if no timeline info yet: return false
> > >   + if timeline info: check if in window (return true/false)
> > >   When true -> throws DecodingException("Message is outside time 
> > > window")
> > > 
> > > - TimeWindow.updateTimeWindow():
> > >   stores when:
> > >   + (no timeline yet) & (bootsA > 0 || timeA > 0)
> > >   + received timeline data newer then previous one
> > >   (debug > 4)
> > > 
> > > 
> > > **** In other words:
> > > 1 The engineId is stored the first time.
> > > 2 Timeline info is only stored if newer. 
> > >   However, if an older timeline info is received after
> newer timeline
> > >   info, the older info might cause a DecodingException exception.
> > >   I don't think that's your case, since I cannot see a 
> > > DecodingException
> > >   in your debug output.
> > > 
> > > 
> > > 
> > > **** As to your debug output:
> > > I'm trying to highlight the most important lines of your debug 
> > > output. I removed "uk.co.westhawk." to make the lines shorter. The 
> > > first number is the line number:
> > > 
> > > <snippets>
> > >  284: Bringing down the loopback interface.
> > > 
> > >  671: pdu.DiscoveryPdu.setResponseException(): Timed out
> > > 4284: pdu.DiscoveryPdu.setResponseException(): Timed out
> > > 4290: pdu.GetPdu_vec.setResponseException(): Timed out
> > > 
> > > 4297: Bringing up the loopback interface.
> > > 
> > > 4298: stack.SnmpContextv3.addPdu(): msgId=57, Pdu reqId=57
> > > 4304: stack.SnmpContextv3.addPdu(): msgId=58, Pdu reqId=58
> > > 4374: pdu.DiscoveryPdu.sendme(): Sent Pdu reqId 58, retries 1
> > > 
> > > 4436: stack.SnmpContextv3.processIncomingResponse():
> > > msgId=58, Pdu reqId=58
> > > 
> > > 4462: stack.TimeWindow.setSnmpEngineId(): 
> > >     hostaddr '127.0.0.1', 
> > >     port '161', 
> > >     snmpEngineId '800007E580F3A63E214C07C945'', 
> > >     key '127.0.0.1: 161'
> > > 
> > > 4466: stack.TimeWindow.setTimeLine(): 
> > >     snmpEngineId 800007E580F3A63E214C07C945, 
> > >     node stack.TimeWindowNode[
> > >         engineId=800007E580F3A63E214C07C945, 
> > >         engineBoots=8, 
> > >         engineTime=409479, 
> > >         latestReceivedEngineTime=409479]
> > > 
> > > 4469: stack.SnmpContextv3.processIncomingResponse(): rid2=58
> > > 4472: beans.UsmDiscoveryBean.startDiscovery(): Done
> > > 4474: stack.SnmpContextv3.actualEncodePacket(): msgId=57, Pdu
> > > reqId=57
> > > 
> > > 4571: pdu.DiscoveryPdu.sendme(): Sent Pdu reqId 56, retries 5
> > > 
> > > 4633: stack.SnmpContextv3.processIncomingResponse():
> > > msgId=56, Pdu reqId=56
> > > 
> > > 4659: pdu.DiscoveryPdu.setResponseException(): Timed out
> > > 
> > > 4667: stack.SnmpContextv3.processIncomingResponse(): rid2=56
> > > 4670: beans.UsmDiscoveryBean.startDiscovery(): Done
> > > 
> > > 4758: pdu.GetPdu_vec.setResponseException(): Timed out
> > > 4774: pdu.GetPdu_vec.setResponseException(): Timed out
> > > 4784: pdu.GetPdu_vec.setResponseException(): Timed out
> > > 4794: pdu.GetPdu_vec.setResponseException(): Timed out
> > > 4795: Done unblocked timed out.
> > > 
> > > 4892: WARNING: Method: public abstract void 
> > > fcncnm.ndm.commonobjects.EventStreamAPI.updateStatus(fcncnm.nd
> > > m.commonobjects.Status) throws
> fcncnm.ndm.commonobjects.CNPException,
> > >     couldn't add arg:  DeviceStatus: 
> > >     MeId:  NMS ID = 25/SECRET/HOST/bogus/local 
> > >     ME type = fcncnm.ndm.ams.adaptors.ics.ICSHost 
> > >     ME instance = localhost,knode82' 
> > >     date='Wed Mar 28 12: 05:11 EDT 2007'
> > > 
> > > 4955: SEVERE: Couldn't find ip address bogus-ip cause: 
> > >     "java.net.UnknownHostException:  bogus-ip: bogus-ip"
> > > 
> > > 4956: java.net.UnknownHostException: bogus-ip: bogus-ip
> </snippets>
> > > 
> > > 
> > > *** Comments on your debug output:
> > > Two of the discovery packets are answered (58, 56). However, all 
> > > your original GetPdu_vec keep timing out. Message 57 (what I 
> > > assume is one of your GetPdu_vec), should have
> stand a chance.
> > > 
> > > As far as I can tell, the GetPdu_vec isn't even sent once (I would 
> > > expect a "GetPdu_vec.sendme()" in the log file), because it times 
> > > out.
> > > 
> > > That's due to the waitForSelf(). The situation will improve if you 
> > > use the notification mechanism, since that starts the clock when 
> > > the message is actually sent (see Pdu.transmit()), whereas 
> > > waitForSelf() immediately starts waiting.
> > > 
> > > (BTW, your debug says "one unblocked timed out", but by calling 
> > > "waitForSelf()" you are sending the GetPdu_vec blocked.)
> > > 
> > > I realise that our debugging isn't sufficient, since not all the 
> > > message mention the request ID. It makes it very hard to track 
> > > when a request is added, sent and timed out. It won't help you 
> > > now, but I'll put that on my list.
> > > 
> > > BTW, what the level of debug are you using? I would have expected 
> > > more IOExceptions.
> > > 
> > > 
> > > *** Overall conclusion:
> > > The situation will improve dramatically if you use the latest CVS 
> > > source, since the discovery packets will use the same retry 
> > > intervals. The discovery requests and your requests won't drift so 
> > > much.
> > > 
> > > When you send the GetPdu_vec via the notification mechanism, I 
> > > expect the pdu to be sent after the discovery succeeds, whereas 
> > > now it has already timed out.
> > > 
> > > Hope the above helps.
> > > 
> > > Cheers, Birgit
> > > 
> > > 
> > > On Tue, Apr 03, 2007 at 03:06:48PM -0400, Josh Bers wrote:
> > > > Birgit,
> > > > 
> > > > Yeah, I'd prefer an official release but if this fixes the
> > > problem we
> > > > might update to the CVS head... Did you and/or Tim take a
> > > look at the
> > > > debug output? Any ideas about what's going wrong?
> > > > 
> > > > > To answer your question:
> > > > > The 5 retries all have the same request Id. The first
> one that
> > > > > comes back is "accepted" (for better word of it), the
> next ones
> > > > > are ignored. In the later case you would get a
> message: Pdu of
> > > > > msgId XXX is already answered.
> > > > > 
> > > > 
> > > > Birgit the problem that I'm expecting is that many
> Discovery PDU's
> > > > with different msgId's will be outstanding (being retried)
> > > say req id
> > > > 55, 57, and 59. Let's say the original request of 59 comes back 
> > > > because the agent becomes reachable. Now retry # 2 for
> > > reqid 57 goes
> > > > through... Does the stack handle this case of multiple USM
> > > discovery
> > > > PDU's with responses? What happens to the timeline, engine id, 
> > > > etc.
> > > > 
> > > > Josh
> > > 
> > > --
> > > -- Birgit Arkesteijn, birgit at westhawk.co.uk,
> > > -- Westhawk Ltd, Albion Wharf, 19 Albion Street,
> Manchester M1 5LN, UK
> > > -- Company no: 1769350
> > > -- Registered Office: 
> > > -- 15 London Road, Stockton Heath, Warrington WA4 6SJ. UK.
> > > -- tel.: +44 (0)161 237 0660
> > > -- <URL: http://www.westhawk.co.uk> 
> > > _______________________________________________
> > > snmp mailing list
> > > snmp at snmp.westhawk.co.uk
> > > http://snmp.westhawk.co.uk/mailman/listinfo/snmp
> > > 
> > 
> > 
> > 
> > _______________________________________________
> > snmp mailing list
> > snmp at snmp.westhawk.co.uk
> > http://snmp.westhawk.co.uk/mailman/listinfo/snmp
> > 
> 
> --
> -- Birgit Arkesteijn, birgit at westhawk.co.uk,
> -- Westhawk Ltd, Albion Wharf, 19 Albion Street, Manchester M1 5LN, UK
> -- Company no: 1769350
> -- Registered Office: 
> -- 15 London Road, Stockton Heath, Warrington WA4 6SJ. UK.
> -- tel.: +44 (0)161 237 0660
> -- <URL: http://www.westhawk.co.uk>
> _______________________________________________
> snmp mailing list
> snmp at snmp.westhawk.co.uk
> http://snmp.westhawk.co.uk/mailman/listinfo/snmp
> 



More information about the snmp mailing list