[snmp] snmpv3 context receiver thread destroyed

Josh Bers jbers at bbn.com
Tue May 8 16:21:21 BST 2007


I did try the latest from CVS on sourceforge and the problem persists (same
behavior as on 5_1).  Here is the email that I was talking about before
(maillist doesn't like attachments).

>From Cochran, Steve A. [steve at more.net]
Sent on 5/24/2006

Hi there,

I've been working on tracking down a couple of minor problems with the
(otherwise excellent) Westhawk stack, one of which is somewhat similar to
Vicne's.

Both bugs I've found are triggered only for SNMPv3 queries.  I noticed that
if the initial GET or GETNEXT to an SNMPv3 host fails then several threads
are left behind despite a call to destroy().

The first cause I found was in the UsmDiscoveryBean.  In it's
startDiscovery() method there are two places where exceptions are thrown
without first calling freeResources().  Thus the context created by the
UsmDiscoveryBean itself is never destroy()ed.  Adding a call to
freeResources() before each of these two throw statements seems to fix this
issue.

After correcting this I discovered that I still had some transmitter threads
left over when an SNMPv3 query failed.  My initial theory, which I exchanged
email with Birgit about privately, was wrong as he thought. However, I
*think* I have figured it out now.

The code for Pdu.send() first calls it's context's addPdu() which creates a
Transmitter thread to send the data packet.  After addPdu() returns
Pdu.send() calls the Pdu.addToTrans() method which actually starts the
Transmitter thread to working.  The Transmitter thread calls the context's
sendPacket() method, which in turn calls activate() to start
AbstractSnmpContext's main thread.

This all works fine for SNMPv1/v2.  However, SnmpContextv3Basis supplies its
own addPdu() which first calls AbstractSnmpContext's addPdu() and then
starts up a UsmDiscoveryBean.  The problem is that UsmDiscoveryBean can
throw a PduException if requests to the destination device time out. The
exception isn't caught in either SnmpContextv3Basis.addPdu() or Pdu.send(),
which means that Pdu.send() never completes and
Pdu.addToTrans() never gets called.  Because of this the Transmitter never
calls sendPacket() and sendPacket() never calls activate().  The active
Transmitter threads are never cleaned up despite a call to destroy().


protected boolean send(int error_status, int error_index) throws
java.io.IOException, PduException {
    if (added == false)
    {
        // Moved this statement from the constructor because it
        // conflicts with the way the SnmpContextXPool works.
        added = context.addPdu(this);   <-- EXCEPTION CAN OCCUR HERE!
    }
    Enumeration vbs = reqVarbinds.elements();
    encodedPacket = context.encodePacket(msg_type, req_id, error_status,
        error_index, vbs, snmpv3MsgId);
    addToTrans();
    return added;
}

Sorry if I didn't do a very good job of explaining all this.  Long story
short, I added a call to activate() in AbstractSnmpContext.addPdu() so that
a Transmitter couldn't be created without first creating the main thread.
It works for me, but Vicne's may be the more general solution.

Hope this is of some help to someone.  Otherwise I'm very happy with how
well this SNMP stack works.  My thanks to everyone who has contributed to
it.

Steve Cochran


_______________________________________________
snmp mailing list

> -----Original Message-----
> From: snmp-bounces at snmp.westhawk.co.uk 
> [mailto:snmp-bounces at snmp.westhawk.co.uk] On Behalf Of Andrew Chandler
> Sent: Tuesday, May 08, 2007 3:00 PM
> To: 'List for discussion of the Westhawk SNMP stack'
> Subject: RE: [snmp] snmpv3 context receiver thread destroyed
> 
> 
> You might want to try the code from cvs then because I think 
> that bug was fixed after the release available for download.
>  
> 
> -----Original Message-----
> From: snmp-bounces at snmp.westhawk.co.uk 
> [mailto:snmp-bounces at snmp.westhawk.co.uk] On Behalf Of Josh Bers
> Sent: Tuesday, May 08, 2007 1:34 PM
> To: 'List for discussion of the Westhawk SNMP stack'
> Subject: RE: [snmp] snmpv3 context receiver thread destroyed
> 
> Birgit,
> 
> I tried using the latest code from sourceforge, however, the 
> problem still appears. Strangely the first test fails (remote 
> host not reachable then reachable). If I run a subsequent 
> test in the same JVM it succeeds. Is the timeline 
> /authentication info cached separately from contexts? If so 
> that might explain why only the first test fails (since the 
> subsequent ones don't have to send discovery PDU's).
> 
> I looked at some of the email archives and found a problem 
> that sounds related. There is a problem with transmitters not 
> getting cleaned up when an exception is thrown by discovery 
> pdu (which is the case since it times out, right?). I am 
> including the email. Was this ever fixed in the code?
> 
> Thanks,
> 
> Josh
> 
> > -----Original Message-----
> > From: snmp-bounces at snmp.westhawk.co.uk
> > [mailto:snmp-bounces at snmp.westhawk.co.uk] On Behalf Of Birgit 
> > Arkesteijn
> > Sent: Wednesday, April 04, 2007 10:56 AM
> > To: List for discussion of the Westhawk SNMP stack
> > Subject: Re: [snmp] snmpv3 context receiver thread destroyed
> > 
> > 
> > Hi Josh,
> > 
> > I don't have much time now, so just a quick answer:
> > 
> > In your debug output:
> > The lines:
> > 4298: snmp.stack.SnmpContextv3.addPdu(): msgId=57, Pdu reqId=57
> > 4304: snmp.stack.SnmpContextv3.addPdu(): msgId=58, Pdu reqId=58
> > 
> > are the last where 'addPdu()' is called.
> > 
> > - Pdu.send() calls
> > - Pdu.send(int error_status, int error_index) calls
> > - SnmpContextBasisFace.addPdu(this);
> > 
> > Pdu.send() should be on your end.
> > Are you sure you keep sending PDUs?
> > Could you (maybe) add some debug every time you call send()?
> > 
> > Cheers, Birgit
> > 
> > 
> > 
> > On Wed, Apr 04, 2007 at 10:30:46AM -0400, Josh Bers wrote:
> > > Birgit,
> > > 
> > > Thank you for the detailed analysis. I am going to try
> > doing a notify
> > > instead of waitForSelf() to see if that improves 
> things...however...
> > > 
> > > I still do not understand why the Receiver thread for the 
> localhost
> > > context dies which would seem to explain why all subsequent 
> > > GetPdu_vec's timeout. Based upon your response I can
> > understand that
> > > the set of getPdu_vecs that are lined up before the agent
> > comes up are
> > > timing out, however, once the timeline is in and no more
> > > discoverUSMbeans need to run... There should not be any delay in 
> > > sending the pdu's... However you are not seeing and 
> sendme() calls.
> > > Are the transmitter's getting killed too?
> > > 
> > > BTW. We were using debug level of 15.
> > > 
> > > Josh
> > > 
> > > > -----Original Message-----
> > > > From: snmp-bounces at snmp.westhawk.co.uk
> > > > [mailto:snmp-bounces at snmp.westhawk.co.uk] On Behalf Of Birgit 
> > > > Arkesteijn
> > > > Sent: Wednesday, April 04, 2007 7:19 AM
> > > > To: List for discussion of the Westhawk SNMP stack
> > > > Subject: Re: [snmp] snmpv3 context receiver thread destroyed
> > > > 
> > > > 
> > > > Hi Josh,
> > > > 
> > > > An interesting problem ... hence a very long answer.
> > > > 
> > > > Incoming discovery responses aren't treated any different from
> > > > other responses. This is because other (non discovery) 
> responses 
> > > > contain timeline info as well to keep the stack synchronised.
> > > > 
> > > > BTW, discovery is done blocked.
> > > > 
> > > > Here is a very short outline of what happens when a response
> > > > (discovery or not) is received and decoded:
> > > > 
> > > > **** AsnDecoderv3.processSNMPv3() calls:
> > > > - TimeWindow.isEngineIdOK():
> > > >   + if no engineId yet: store it
> > > >   + if engineId: compare
> > > >   returns true/false
> > > >   (debug > 4)
> > > >   When false -> throws DecodingException("Received engine Id xxx
> > > > is not
> > > >   correct")
> > > > 
> > > > - TimeWindow.isOutsideTimeWindow():
> > > >   returns true/false, does not store
> > > >   + if no timeline info yet: return false
> > > >   + if timeline info: check if in window (return true/false)
> > > >   When true -> throws DecodingException("Message is outside time
> > > > window")
> > > > 
> > > > - TimeWindow.updateTimeWindow():
> > > >   stores when:
> > > >   + (no timeline yet) & (bootsA > 0 || timeA > 0)
> > > >   + received timeline data newer then previous one
> > > >   (debug > 4)
> > > > 
> > > > 
> > > > **** In other words:
> > > > 1 The engineId is stored the first time.
> > > > 2 Timeline info is only stored if newer. 
> > > >   However, if an older timeline info is received after
> > newer timeline
> > > >   info, the older info might cause a DecodingException 
> exception.
> > > >   I don't think that's your case, since I cannot see a
> > > > DecodingException
> > > >   in your debug output.
> > > > 
> > > > 
> > > > 
> > > > **** As to your debug output:
> > > > I'm trying to highlight the most important lines of your debug
> > > > output. I removed "uk.co.westhawk." to make the lines 
> shorter. The 
> > > > first number is the line number:
> > > > 
> > > > <snippets>
> > > >  284: Bringing down the loopback interface.
> > > > 
> > > >  671: pdu.DiscoveryPdu.setResponseException(): Timed out
> > > > 4284: pdu.DiscoveryPdu.setResponseException(): Timed out
> > > > 4290: pdu.GetPdu_vec.setResponseException(): Timed out
> > > > 
> > > > 4297: Bringing up the loopback interface.
> > > > 
> > > > 4298: stack.SnmpContextv3.addPdu(): msgId=57, Pdu reqId=57
> > > > 4304: stack.SnmpContextv3.addPdu(): msgId=58, Pdu reqId=58
> > > > 4374: pdu.DiscoveryPdu.sendme(): Sent Pdu reqId 58, retries 1
> > > > 
> > > > 4436: stack.SnmpContextv3.processIncomingResponse():
> > > > msgId=58, Pdu reqId=58
> > > > 
> > > > 4462: stack.TimeWindow.setSnmpEngineId(): 
> > > >     hostaddr '127.0.0.1', 
> > > >     port '161', 
> > > >     snmpEngineId '800007E580F3A63E214C07C945'', 
> > > >     key '127.0.0.1: 161'
> > > > 
> > > > 4466: stack.TimeWindow.setTimeLine(): 
> > > >     snmpEngineId 800007E580F3A63E214C07C945, 
> > > >     node stack.TimeWindowNode[
> > > >         engineId=800007E580F3A63E214C07C945, 
> > > >         engineBoots=8, 
> > > >         engineTime=409479, 
> > > >         latestReceivedEngineTime=409479]
> > > > 
> > > > 4469: stack.SnmpContextv3.processIncomingResponse(): rid2=58
> > > > 4472: beans.UsmDiscoveryBean.startDiscovery(): Done
> > > > 4474: stack.SnmpContextv3.actualEncodePacket(): msgId=57, Pdu 
> > > > reqId=57
> > > > 
> > > > 4571: pdu.DiscoveryPdu.sendme(): Sent Pdu reqId 56, retries 5
> > > > 
> > > > 4633: stack.SnmpContextv3.processIncomingResponse():
> > > > msgId=56, Pdu reqId=56
> > > > 
> > > > 4659: pdu.DiscoveryPdu.setResponseException(): Timed out
> > > > 
> > > > 4667: stack.SnmpContextv3.processIncomingResponse(): rid2=56
> > > > 4670: beans.UsmDiscoveryBean.startDiscovery(): Done
> > > > 
> > > > 4758: pdu.GetPdu_vec.setResponseException(): Timed out
> > > > 4774: pdu.GetPdu_vec.setResponseException(): Timed out
> > > > 4784: pdu.GetPdu_vec.setResponseException(): Timed out
> > > > 4794: pdu.GetPdu_vec.setResponseException(): Timed out
> > > > 4795: Done unblocked timed out.
> > > > 
> > > > 4892: WARNING: Method: public abstract void
> > > > fcncnm.ndm.commonobjects.EventStreamAPI.updateStatus(fcncnm.nd
> > > > m.commonobjects.Status) throws
> > fcncnm.ndm.commonobjects.CNPException,
> > > >     couldn't add arg:  DeviceStatus: 
> > > >     MeId:  NMS ID = 25/SECRET/HOST/bogus/local 
> > > >     ME type = fcncnm.ndm.ams.adaptors.ics.ICSHost 
> > > >     ME instance = localhost,knode82' 
> > > >     date='Wed Mar 28 12: 05:11 EDT 2007'
> > > > 
> > > > 4955: SEVERE: Couldn't find ip address bogus-ip cause: 
> > > >     "java.net.UnknownHostException:  bogus-ip: bogus-ip"
> > > > 
> > > > 4956: java.net.UnknownHostException: bogus-ip: bogus-ip
> > </snippets>
> > > > 
> > > > 
> > > > *** Comments on your debug output:
> > > > Two of the discovery packets are answered (58, 56). However, all
> > > > your original GetPdu_vec keep timing out. Message 57 (what I 
> > > > assume is one of your GetPdu_vec), should have
> > stand a chance.
> > > > 
> > > > As far as I can tell, the GetPdu_vec isn't even sent 
> once (I would
> > > > expect a "GetPdu_vec.sendme()" in the log file), 
> because it times 
> > > > out.
> > > > 
> > > > That's due to the waitForSelf(). The situation will 
> improve if you
> > > > use the notification mechanism, since that starts the 
> clock when 
> > > > the message is actually sent (see Pdu.transmit()), whereas 
> > > > waitForSelf() immediately starts waiting.
> > > > 
> > > > (BTW, your debug says "one unblocked timed out", but by calling
> > > > "waitForSelf()" you are sending the GetPdu_vec blocked.)
> > > > 
> > > > I realise that our debugging isn't sufficient, since not all the
> > > > message mention the request ID. It makes it very hard to track 
> > > > when a request is added, sent and timed out. It won't help you 
> > > > now, but I'll put that on my list.
> > > > 
> > > > BTW, what the level of debug are you using? I would 
> have expected
> > > > more IOExceptions.
> > > > 
> > > > 
> > > > *** Overall conclusion:
> > > > The situation will improve dramatically if you use the 
> latest CVS
> > > > source, since the discovery packets will use the same retry 
> > > > intervals. The discovery requests and your requests 
> won't drift so 
> > > > much.
> > > > 
> > > > When you send the GetPdu_vec via the notification mechanism, I
> > > > expect the pdu to be sent after the discovery succeeds, whereas 
> > > > now it has already timed out.
> > > > 
> > > > Hope the above helps.
> > > > 
> > > > Cheers, Birgit
> > > > 
> > > > 
> > > > On Tue, Apr 03, 2007 at 03:06:48PM -0400, Josh Bers wrote:
> > > > > Birgit,
> > > > > 
> > > > > Yeah, I'd prefer an official release but if this fixes the
> > > > problem we
> > > > > might update to the CVS head... Did you and/or Tim take a
> > > > look at the
> > > > > debug output? Any ideas about what's going wrong?
> > > > > 
> > > > > > To answer your question:
> > > > > > The 5 retries all have the same request Id. The first
> > one that
> > > > > > comes back is "accepted" (for better word of it), the
> > next ones
> > > > > > are ignored. In the later case you would get a
> > message: Pdu of
> > > > > > msgId XXX is already answered.
> > > > > > 
> > > > > 
> > > > > Birgit the problem that I'm expecting is that many
> > Discovery PDU's
> > > > > with different msgId's will be outstanding (being retried)
> > > > say req id
> > > > > 55, 57, and 59. Let's say the original request of 59 
> comes back
> > > > > because the agent becomes reachable. Now retry # 2 for
> > > > reqid 57 goes
> > > > > through... Does the stack handle this case of multiple USM
> > > > discovery
> > > > > PDU's with responses? What happens to the timeline, engine id,
> > > > > etc.
> > > > > 
> > > > > Josh
> > > > 
> > > > --
> > > > -- Birgit Arkesteijn, birgit at westhawk.co.uk,
> > > > -- Westhawk Ltd, Albion Wharf, 19 Albion Street,
> > Manchester M1 5LN, UK
> > > > -- Company no: 1769350
> > > > -- Registered Office:
> > > > -- 15 London Road, Stockton Heath, Warrington WA4 6SJ. UK.
> > > > -- tel.: +44 (0)161 237 0660
> > > > -- <URL: http://www.westhawk.co.uk> 
> > > > _______________________________________________
> > > > snmp mailing list
> > > > snmp at snmp.westhawk.co.uk
> > > > http://snmp.westhawk.co.uk/mailman/listinfo/snmp
> > > > 
> > > 
> > > 
> > > 
> > > _______________________________________________
> > > snmp mailing list
> > > snmp at snmp.westhawk.co.uk 
> > > http://snmp.westhawk.co.uk/mailman/listinfo/snmp
> > > 
> > 
> > --
> > -- Birgit Arkesteijn, birgit at westhawk.co.uk,
> > -- Westhawk Ltd, Albion Wharf, 19 Albion Street, Manchester 
> M1 5LN, UK
> > -- Company no: 1769350
> > -- Registered Office:
> > -- 15 London Road, Stockton Heath, Warrington WA4 6SJ. UK.
> > -- tel.: +44 (0)161 237 0660
> > -- <URL: http://www.westhawk.co.uk>
> > _______________________________________________
> > snmp mailing list
> > snmp at snmp.westhawk.co.uk
> > http://snmp.westhawk.co.uk/mailman/listinfo/snmp
> > 
> 
> _______________________________________________
> snmp mailing list
> snmp at snmp.westhawk.co.uk
> http://snmp.westhawk.co.uk/mailman/listinfo/snmp
> 





More information about the snmp mailing list