[snmp] snmpv3 context receiver thread destroyed
Birgit Arkesteijn
birgit at westhawk.co.uk
Wed Apr 4 16:55:58 BST 2007
Hi Josh,
I don't have much time now, so just a quick answer:
In your debug output:
The lines:
4298: snmp.stack.SnmpContextv3.addPdu(): msgId=57, Pdu reqId=57
4304: snmp.stack.SnmpContextv3.addPdu(): msgId=58, Pdu reqId=58
are the last where 'addPdu()' is called.
- Pdu.send() calls
- Pdu.send(int error_status, int error_index) calls
- SnmpContextBasisFace.addPdu(this);
Pdu.send() should be on your end.
Are you sure you keep sending PDUs?
Could you (maybe) add some debug every time you call send()?
Cheers, Birgit
On Wed, Apr 04, 2007 at 10:30:46AM -0400, Josh Bers wrote:
> Birgit,
>
> Thank you for the detailed analysis. I am going to try doing a notify
> instead of waitForSelf() to see if that improves things...however...
>
> I still do not understand why the Receiver thread for the localhost
> context dies which would seem to explain why all subsequent
> GetPdu_vec's timeout.
> Based upon your response I can understand that the set of getPdu_vecs
> that are lined up before the agent comes up are timing out, however,
> once the timeline is in and no more discoverUSMbeans need to run...
> There should not be any delay in sending the pdu's... However you are
> not seeing and sendme() calls. Are the transmitter's getting killed
> too?
>
> BTW. We were using debug level of 15.
>
> Josh
>
> > -----Original Message-----
> > From: snmp-bounces at snmp.westhawk.co.uk
> > [mailto:snmp-bounces at snmp.westhawk.co.uk] On Behalf Of Birgit
> > Arkesteijn
> > Sent: Wednesday, April 04, 2007 7:19 AM
> > To: List for discussion of the Westhawk SNMP stack
> > Subject: Re: [snmp] snmpv3 context receiver thread destroyed
> >
> >
> > Hi Josh,
> >
> > An interesting problem ... hence a very long answer.
> >
> > Incoming discovery responses aren't treated any different
> > from other responses. This is because other (non discovery)
> > responses contain timeline info as well to keep the stack
> > synchronised.
> >
> > BTW, discovery is done blocked.
> >
> > Here is a very short outline of what happens when a response
> > (discovery or not) is received and decoded:
> >
> > **** AsnDecoderv3.processSNMPv3() calls:
> > - TimeWindow.isEngineIdOK():
> > + if no engineId yet: store it
> > + if engineId: compare
> > returns true/false
> > (debug > 4)
> > When false -> throws DecodingException("Received engine Id
> > xxx is not
> > correct")
> >
> > - TimeWindow.isOutsideTimeWindow():
> > returns true/false, does not store
> > + if no timeline info yet: return false
> > + if timeline info: check if in window (return true/false)
> > When true -> throws DecodingException("Message is outside
> > time window")
> >
> > - TimeWindow.updateTimeWindow():
> > stores when:
> > + (no timeline yet) & (bootsA > 0 || timeA > 0)
> > + received timeline data newer then previous one
> > (debug > 4)
> >
> >
> > **** In other words:
> > 1 The engineId is stored the first time.
> > 2 Timeline info is only stored if newer.
> > However, if an older timeline info is received after newer timeline
> > info, the older info might cause a DecodingException exception.
> > I don't think that's your case, since I cannot see a
> > DecodingException
> > in your debug output.
> >
> >
> >
> > **** As to your debug output:
> > I'm trying to highlight the most important lines of your
> > debug output. I removed "uk.co.westhawk." to make the lines
> > shorter. The first number is the line number:
> >
> > <snippets>
> > 284: Bringing down the loopback interface.
> >
> > 671: pdu.DiscoveryPdu.setResponseException(): Timed out
> > 4284: pdu.DiscoveryPdu.setResponseException(): Timed out
> > 4290: pdu.GetPdu_vec.setResponseException(): Timed out
> >
> > 4297: Bringing up the loopback interface.
> >
> > 4298: stack.SnmpContextv3.addPdu(): msgId=57, Pdu reqId=57
> > 4304: stack.SnmpContextv3.addPdu(): msgId=58, Pdu reqId=58
> > 4374: pdu.DiscoveryPdu.sendme(): Sent Pdu reqId 58, retries 1
> >
> > 4436: stack.SnmpContextv3.processIncomingResponse():
> > msgId=58, Pdu reqId=58
> >
> > 4462: stack.TimeWindow.setSnmpEngineId():
> > hostaddr '127.0.0.1',
> > port '161',
> > snmpEngineId '800007E580F3A63E214C07C945'',
> > key '127.0.0.1: 161'
> >
> > 4466: stack.TimeWindow.setTimeLine():
> > snmpEngineId 800007E580F3A63E214C07C945,
> > node stack.TimeWindowNode[
> > engineId=800007E580F3A63E214C07C945,
> > engineBoots=8,
> > engineTime=409479,
> > latestReceivedEngineTime=409479]
> >
> > 4469: stack.SnmpContextv3.processIncomingResponse(): rid2=58
> > 4472: beans.UsmDiscoveryBean.startDiscovery(): Done
> > 4474: stack.SnmpContextv3.actualEncodePacket(): msgId=57, Pdu reqId=57
> >
> > 4571: pdu.DiscoveryPdu.sendme(): Sent Pdu reqId 56, retries 5
> >
> > 4633: stack.SnmpContextv3.processIncomingResponse():
> > msgId=56, Pdu reqId=56
> >
> > 4659: pdu.DiscoveryPdu.setResponseException(): Timed out
> >
> > 4667: stack.SnmpContextv3.processIncomingResponse(): rid2=56
> > 4670: beans.UsmDiscoveryBean.startDiscovery(): Done
> >
> > 4758: pdu.GetPdu_vec.setResponseException(): Timed out
> > 4774: pdu.GetPdu_vec.setResponseException(): Timed out
> > 4784: pdu.GetPdu_vec.setResponseException(): Timed out
> > 4794: pdu.GetPdu_vec.setResponseException(): Timed out
> > 4795: Done unblocked timed out.
> >
> > 4892: WARNING: Method: public abstract void
> > fcncnm.ndm.commonobjects.EventStreamAPI.updateStatus(fcncnm.nd
> > m.commonobjects.Status) throws fcncnm.ndm.commonobjects.CNPException,
> > couldn't add arg: DeviceStatus:
> > MeId: NMS ID = 25/SECRET/HOST/bogus/local
> > ME type = fcncnm.ndm.ams.adaptors.ics.ICSHost
> > ME instance = localhost,knode82'
> > date='Wed Mar 28 12: 05:11 EDT 2007'
> >
> > 4955: SEVERE: Couldn't find ip address bogus-ip cause:
> > "java.net.UnknownHostException: bogus-ip: bogus-ip"
> >
> > 4956: java.net.UnknownHostException: bogus-ip: bogus-ip </snippets>
> >
> >
> > *** Comments on your debug output:
> > Two of the discovery packets are answered (58, 56). However,
> > all your original GetPdu_vec keep timing out. Message 57
> > (what I assume is one of your GetPdu_vec), should have stand a chance.
> >
> > As far as I can tell, the GetPdu_vec isn't even sent once (I
> > would expect a "GetPdu_vec.sendme()" in the log file),
> > because it times out.
> >
> > That's due to the waitForSelf(). The situation will improve
> > if you use the notification mechanism, since that starts the
> > clock when the message is actually sent (see Pdu.transmit()),
> > whereas waitForSelf() immediately starts waiting.
> >
> > (BTW, your debug says "one unblocked timed out", but by
> > calling "waitForSelf()" you are sending the GetPdu_vec blocked.)
> >
> > I realise that our debugging isn't sufficient, since not all
> > the message mention the request ID. It makes it very hard to
> > track when a request is added, sent and timed out. It won't
> > help you now, but I'll put that on my list.
> >
> > BTW, what the level of debug are you using? I would have
> > expected more IOExceptions.
> >
> >
> > *** Overall conclusion:
> > The situation will improve dramatically if you use the latest
> > CVS source, since the discovery packets will use the same
> > retry intervals. The discovery requests and your requests
> > won't drift so much.
> >
> > When you send the GetPdu_vec via the notification mechanism,
> > I expect the pdu to be sent after the discovery succeeds,
> > whereas now it has already timed out.
> >
> > Hope the above helps.
> >
> > Cheers, Birgit
> >
> >
> > On Tue, Apr 03, 2007 at 03:06:48PM -0400, Josh Bers wrote:
> > > Birgit,
> > >
> > > Yeah, I'd prefer an official release but if this fixes the
> > problem we
> > > might update to the CVS head... Did you and/or Tim take a
> > look at the
> > > debug output? Any ideas about what's going wrong?
> > >
> > > > To answer your question:
> > > > The 5 retries all have the same request Id. The first one
> > > > that comes back is "accepted" (for better word of it), the
> > > > next ones are ignored.
> > > > In the later case you would get a message: Pdu of msgId XXX
> > > > is already answered.
> > > >
> > >
> > > Birgit the problem that I'm expecting is that many Discovery PDU's
> > > with different msgId's will be outstanding (being retried)
> > say req id
> > > 55, 57, and 59. Let's say the original request of 59 comes back
> > > because the agent becomes reachable. Now retry # 2 for
> > reqid 57 goes
> > > through... Does the stack handle this case of multiple USM
> > discovery
> > > PDU's with responses? What happens to the timeline, engine id, etc.
> > >
> > > Josh
> >
> > --
> > -- Birgit Arkesteijn, birgit at westhawk.co.uk,
> > -- Westhawk Ltd, Albion Wharf, 19 Albion Street, Manchester M1 5LN, UK
> > -- Company no: 1769350
> > -- Registered Office:
> > -- 15 London Road, Stockton Heath, Warrington WA4 6SJ. UK.
> > -- tel.: +44 (0)161 237 0660
> > -- <URL: http://www.westhawk.co.uk>
> > _______________________________________________
> > snmp mailing list
> > snmp at snmp.westhawk.co.uk
> > http://snmp.westhawk.co.uk/mailman/listinfo/snmp
> >
>
>
>
> _______________________________________________
> snmp mailing list
> snmp at snmp.westhawk.co.uk
> http://snmp.westhawk.co.uk/mailman/listinfo/snmp
>
--
-- Birgit Arkesteijn, birgit at westhawk.co.uk,
-- Westhawk Ltd, Albion Wharf, 19 Albion Street, Manchester M1 5LN, UK
-- Company no: 1769350
-- Registered Office:
-- 15 London Road, Stockton Heath, Warrington WA4 6SJ. UK.
-- tel.: +44 (0)161 237 0660
-- <URL: http://www.westhawk.co.uk>
More information about the snmp
mailing list