From thp at westhawk.co.uk Thu Feb 1 10:28:11 2007
From: thp at westhawk.co.uk (Tim Panton)
Date: Thu Feb 1 10:33:55 2007
Subject: [snmp] Possible flaw in transmitter?
In-Reply-To: <014d01c744b7$d6ee16a0$83b328c0@visionael.com>
References: <014d01c744b7$d6ee16a0$83b328c0@visionael.com>
Message-ID: <4D6A81BE-463F-490F-9ABF-6BA0D660E2E3@westhawk.co.uk>

On 30 Jan 2007, at 21:44, Andy Chandler wrote:

> I'm tracking an elusive nullpointer in a multi-threaded app that does
> quite a bit of snmp. I may have nailed down my original problem but I
> also saw this snippet of code in the transmitter and I'm hoping someone
> can put me at ease that the nullpointer couldn't come from here:
>
>     /**
>      * This method is the counterpart of sit().
>      *
>      * The Pdu will call this method when it is sent.
>      * This method will notify itself and in the end it is transmitted
>      * in run.
>      *
>      * @see #sit()
>      * @see #run()
>      * @see Pdu#send
>      */
>     synchronized void stand()
>     {
>         notifyAll();
>     }
>
>     /**
>      * It may be sleeping (as opposed to wait()ing
>      * so send it a kick.
>      */
>     void interruptMe()
>     {
>         // unsafe ?
>         me.interrupt();
>     }
>
>     void destroy()
>     {
>         me = null;
>         pdu=null;
>         stand();
>     }
>
> When context.destroy() is called it calls destroy on the transmitters
> (by actually calling the freetransmitters method). My thought is this:
> If a transmitter is "sit" and a destroy comes in, it will set
> "me = null". What if a latent host responds at that moment? It would
> call pdufillin which in turn would call interruptMe() - however since
> "me" is null courtesy of the non synchronized destroy then you would
> get a nullpointer from "me.interrupt()"
>
> I know it sounds a long shot but when you are running 99 major threads,
> each running 3 discovery subthreads across 65k - 16Million addresses
> the statistically improbable can happen.
>
> If I'm in any way correct about this could the following changes
> alleviate the risk?
>
> destroy calls interruptMe() first and THEN sets me = null ?
>
> Synchronizing destroy in the past led to some stack deadlocks so I'm
> avoiding that as a suggestion.

Andy, thanks,

Yes, and I don't think it is strictly needed here.

How about something as simple as:
    if( me!= null){
        me.interrupt();
    }

If you follow Andy's case, then interrupt won't get called.
That isn't a problem because all the interrupt does is speed things up;
the recv will time out and the thread will clean itself up, just a bit
later than it would have.

I'll do some testing to see if this has any unpleasant side-effects, but
it looks like the simplest solution, and one that won't slow down the
vast majority of cases. (i.e. the _rare_ case is slower, but the rest not)

Tim.

From birgit at westhawk.co.uk Mon Feb 5 15:12:03 2007
From: birgit at westhawk.co.uk (Birgit Arkesteijn)
Date: Mon Feb 5 15:17:15 2007
Subject: [snmp] SNMP stack on SourceForge
Message-ID: <20070205151203.GA26026@westhawk.co.uk>

Hi everyone,

The source code of the stack can now be downloaded via cvs from
SourceForge; see
http://sourceforge.net/projects/westhawksnmp/

There isn't much else (yet) to be found on SourceForge, but that will
happen in due time.

Cheers, Birgit

--
-- Birgit Arkesteijn, birgit@westhawk.co.uk,
-- Westhawk Ltd, Albion Wharf, 19 Albion Street, Manchester M1 5LN, UK
-- tel.: +44 (0)161 237 0660
--

From birgit at westhawk.co.uk Mon Feb 5 15:13:47 2007
From: birgit at westhawk.co.uk (Birgit Arkesteijn)
Date: Mon Feb 5 15:18:38 2007
Subject: [snmp] Possible flaw in transmitter?
In-Reply-To: <4D6A81BE-463F-490F-9ABF-6BA0D660E2E3@westhawk.co.uk>
References: <014d01c744b7$d6ee16a0$83b328c0@visionael.com>
 <4D6A81BE-463F-490F-9ABF-6BA0D660E2E3@westhawk.co.uk>
Message-ID: <20070205151347.GB26026@westhawk.co.uk>

I added Tim's code suggestion to Transmitter.interruptMe(), but didn't
test it. The code is checked into cvs (on SF), see previous announcement.
Birgit

On Thu, Feb 01, 2007 at 10:28:11AM +0000, Tim Panton wrote:
> Andy, thanks,
>
> Yes, and I don't think it is strictly needed here.
>
> How about something as simple as:
>     if( me!= null){
>         me.interrupt();
>     }
>
> If you follow Andy's case, then interrupt won't get called.
> That isn't a problem because all the interrupt does is to speed things
> up, the recv will timeout and the thread will clean itself up, just a
> bit later than it would have.
>
> I'll do some testing to see if this has any unpleasant side-effects,
> but it looks like the simplest solution, and one that won't slow down
> the vast majority of cases. (i.e. the _rare_ case is slower, but the
> rest not)
>
> Tim.

--
-- Birgit Arkesteijn, birgit@westhawk.co.uk,
-- Westhawk Ltd, Albion Wharf, 19 Albion Street, Manchester M1 5LN, UK
-- tel.: +44 (0)161 237 0660
--

From birgit at westhawk.co.uk Mon Feb 5 15:15:59 2007
From: birgit at westhawk.co.uk (Birgit Arkesteijn)
Date: Mon Feb 5 15:20:47 2007
Subject: [snmp] slow snmp v3 discovery timeout
In-Reply-To: <20070130105737.GA2350@westhawk.co.uk>
References: <005f01c74411$a43ad3c0$010010ac@JBERS2>
 <20070130105737.GA2350@westhawk.co.uk>
Message-ID: <20070205151559.GC26026@westhawk.co.uk>

I amended the code and checked it into cvs on SourceForge.

Cheers, Birgit

On Tue, Jan 30, 2007 at 10:57:37AM +0000, Birgit Arkesteijn wrote:
> I agree with Josh, parameters such as the retry_interval should be
> copied from Pdu to DiscoveryPdu via UsmDiscoveryBean.
> That probably means new methods
> - Pdu.getRetryIntervals()
> - UsmDiscoveryBean.setRetryIntervals()
>
> Cheers, Birgit

--
-- Birgit Arkesteijn, birgit@westhawk.co.uk,
-- Westhawk Ltd, Albion Wharf, 19 Albion Street, Manchester M1 5LN, UK
-- tel.: +44 (0)161 237 0660
--

From andy at riftware.com Wed Feb 7 10:36:19 2007
From: andy at riftware.com (Andrew Chandler)
Date: Wed Feb 7 16:39:09 2007
Subject: [snmp] Possible flaw in transmitter?
In-Reply-To: <20070205151347.GB26026@westhawk.co.uk>
References: <014d01c744b7$d6ee16a0$83b328c0@visionael.com>
 <4D6A81BE-463F-490F-9ABF-6BA0D660E2E3@westhawk.co.uk>
 <20070205151347.GB26026@westhawk.co.uk>
Message-ID: <00ce01c74ad6$1c217540$83b328c0@visionael.com>

Thanks - I've got it in play and it, combined with something else I did,
has apparently fixed my 5+ year running bug!

Interesting thing to note folks, although probably many of you realized
this already: The operation ++ is not threadsafe. By this I mean: if you
have a member int x in an object and you update that count by doing an
"x++" operation inside an update method, the x++ is actually a multi-step
read-modify-write process, so two threads can interleave and lose an
update. JDK 5 and above provides an AtomicInteger (and other types as
well) that allows you to do these things in a thread safe way. (They
probably just synchronize all access but hey it's easier than doing it
myself).

I post this here because I suspect that, with regards to timeouts etc,
there might be a couple of areas in the stack that warrant scrutiny.
After changing the me.interrupt in the stack the problem had not gone
away entirely. However, after changing to an AtomicInteger that I
incremented to show how many of the tried community strings had returned,
my problem seems to have gone away.

-----Original Message-----
From: snmp-bounces@snmp.westhawk.co.uk
[mailto:snmp-bounces@snmp.westhawk.co.uk] On Behalf Of Birgit Arkesteijn
Sent: Monday, February 05, 2007 9:14 AM
To: List for discussion of the Westhawk SNMP stack
Subject: Re: [snmp] Possible flaw in transmitter?

I added Tim's code suggestion to Transmitter.interruptMe(), but didn't
test it. The code is checked into cvs (on SF), see previous announcement.

Birgit

On Thu, Feb 01, 2007 at 10:28:11AM +0000, Tim Panton wrote:
> Andy, thanks,
>
> Yes, and I don't think it is strictly needed here.
>
> How about something as simple as:
>     if( me!= null){
>         me.interrupt();
>     }
>
> If you follow Andy's case, then interrupt won't get called.
> That isn't a problem because all the interrupt does is to speed things
> up, the recv will timeout and the thread will clean itself up, just a
> bit later than it would have.
>
> I'll do some testing to see if this has any unpleasant side-effects,
> but it looks like the simplest solution, and one that won't slow down
> the vast majority of cases. (i.e. the _rare_ case is slower, but the
> rest not)
>
> Tim.

--
-- Birgit Arkesteijn, birgit@westhawk.co.uk,
-- Westhawk Ltd, Albion Wharf, 19 Albion Street, Manchester M1 5LN, UK
-- tel.: +44 (0)161 237 0660
--

_______________________________________________
snmp mailing list
snmp@snmp.westhawk.co.uk
http://snmp.westhawk.co.uk/mailman/listinfo/snmp
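
For readers following the x++ point in Andrew's last message, here is a
minimal sketch of the lost-update problem and the AtomicInteger
alternative. It is not code from the Westhawk stack: the class
ReplyCounterDemo and the methods onReply() and run() are hypothetical
names made up for illustration. Note also that AtomicInteger is
documented as lock-free; it relies on atomic compare-and-set style
operations rather than on synchronized blocks.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of the Westhawk SNMP stack.
public class ReplyCounterDemo {
    // Plain field: "plainReplies++" is a read, an add and a write,
    // so two threads can interleave and one update can be lost.
    private int plainReplies = 0;

    // incrementAndGet() performs the whole update as one atomic step.
    private final AtomicInteger atomicReplies = new AtomicInteger();

    // Stands in for "a reply to one of the tried community strings arrived".
    void onReply() {
        plainReplies++;
        atomicReplies.incrementAndGet();
    }

    // Starts the given number of threads, each recording repliesPerThread
    // replies, and returns { plain total, atomic total }.
    static int[] run(int threads, int repliesPerThread) {
        ReplyCounterDemo demo = new ReplyCounterDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < repliesPerThread; j++) {
                    demo.onReply();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join(); // wait for every worker before reading the totals
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return new int[] { demo.plainReplies, demo.atomicReplies.get() };
    }

    public static void main(String[] args) {
        int[] totals = run(8, 100_000);
        System.out.println("plain int     : " + totals[0]);
        System.out.println("AtomicInteger : " + totals[1]);
    }
}
```

The AtomicInteger total always equals threads * repliesPerThread. The
plain int total can come out lower on any given run, which is the kind
of rare, load-dependent miscount described in the thread.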