Message-ID: <20090309150453.GB7627@kroah.com>
Date: Mon, 9 Mar 2009 08:04:53 -0700
From: Greg KH <greg@...ah.com>
To: Alex Chiang <achiang@...com>, kay.sievers@...y.org, rjw@...k.pl,
linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org
Subject: Re: kobj refcounting weirdness
On Mon, Mar 09, 2009 at 12:36:54AM -0600, Alex Chiang wrote:
> Hi Kay, Greg,
>
> I've been working on this patch series recently that adds
> function and device level hotplug into the PCI core:
>
> http://thread.gmane.org/gmane.linux.kernel.pci/3495
>
> For the last two weeks, I've been beating my head against a
> refcounting/kobject problem, and was hoping you could give me
> some advice, since I seem to have run into a wall.
>
> My test case has been removing device 0000:04:00.0, which should
> remove all the devices below it.
You are removing the children before the parent device, right? If not,
you have to be _very_ careful (personally, I don't think you should be
allowed to do that, but others, like the scsi developers, like doing
things like this...)
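To make the ordering question concrete, here is a minimal userspace sketch of children-before-parent removal over the topology quoted below. It is an illustration only, not the PCI core's code: `struct node`, `remove_subtree()` and `demo_remove_04_00_0()` are hypothetical names, and the device names are taken from the tree in the original mail.

```c
#include <stdio.h>
#include <string.h>

#define MAX_CHILDREN 4
#define MAX_LOG 16

/* Hypothetical stand-in for a device in the hierarchy. */
struct node {
    const char *name;
    struct node *children[MAX_CHILDREN];
    int nr_children;
};

static const char *removal_log[MAX_LOG]; /* order in which nodes were removed */
static int removal_count;

/* Post-order walk: every child is removed before its parent. */
static void remove_subtree(struct node *n)
{
    for (int i = 0; i < n->nr_children; i++)
        remove_subtree(n->children[i]);

    if (removal_count < MAX_LOG)
        removal_log[removal_count++] = n->name;
    printf("removed %s\n", n->name);
}

/* Topology assumed from the quoted tree: 04:00.0 is the top of the subtree. */
static struct node f06_0 = { "0000:06:00.0", { 0 }, 0 };
static struct node f06_1 = { "0000:06:00.1", { 0 }, 0 };
static struct node f07_0 = { "0000:07:00.0", { 0 }, 0 };
static struct node f07_1 = { "0000:07:00.1", { 0 }, 0 };
static struct node b05_02 = { "0000:05:02.0", { &f06_0, &f06_1 }, 2 };
static struct node b05_04 = { "0000:05:04.0", { &f07_0, &f07_1 }, 2 };
static struct node b04_00 = { "0000:04:00.0", { &b05_02, &b05_04 }, 2 };

static void demo_remove_04_00_0(void)
{
    removal_count = 0;
    remove_subtree(&b04_00);
}
```

Calling `demo_remove_04_00_0()` removes the leaf functions first and 0000:04:00.0 last, so no parent goes away while a child still points at it.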
> +-[0000:03]---00.0-[0000:04-07]----00.0-[0000:05-07]--+-02.0-[0000:06]--+-00.0 Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
> |                                                     |                 \-00.1 Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
> |                                                     \-04.0-[0000:07]--+-00.0 Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
> |                                                                       \-00.1 Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
>
> I can remove the device and rescan the bus once, and it works
> fine. The second removal works fine, and then, unpredictably,
> later rescan/remove cycles eventually end up producing a warning
> and oops every time. Sometimes I die on the 2nd rescan, sometimes
> not until the 4th or 5th remove/rescan cycle.
What is the warning and oops?
> In this data set, I turned on kobject debugging, and managed to
> capture a trace where we die on the 2nd rescan.
>
> In this data set, we:
>
> - create a kobject for 0000:04:00.0 (e00000018cac2920)
> - remove the device
> - observe '0000:04:00.0' (e00000018cac2920): calling ktype release
> - rescan the bus
> - discover that e00000018cac2920 is still hanging around!
What do you mean by "rescan"? And sure, if you create a new device, it
could be allocated at the same location, that's what the slab allocators
do, right?
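The address-reuse point can be shown with a small userspace sketch. It assumes plain `malloc()`/`free()` as a stand-in for the slab allocator, and `struct fake_kobj`, `fake_kobj_create()`, `fake_kobj_put()` and `demo_address_reuse()` are hypothetical names, not kernel API. Whether the second allocation lands at the old address depends on the allocator, so the sketch only reports it and never relies on it.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a refcounted kernel object. */
struct fake_kobj {
    int refcount;
    unsigned long generation; /* tells apart objects that share an address */
};

static unsigned long next_generation = 1;
static int releases; /* number of times the release path ran */

static struct fake_kobj *fake_kobj_create(void)
{
    struct fake_kobj *k = malloc(sizeof(*k));

    if (!k)
        return NULL;
    k->refcount = 1;
    k->generation = next_generation++;
    return k;
}

/* Drop one reference; free the object when the last one goes away. */
static void fake_kobj_put(struct fake_kobj *k)
{
    if (--k->refcount == 0) {
        releases++;
        free(k);
    }
}

/* Returns 1 if the allocator reused the address, 0 if not, -1 on failure. */
static int demo_address_reuse(void)
{
    struct fake_kobj *first = fake_kobj_create();
    struct fake_kobj *second;
    uintptr_t old_addr;
    unsigned long old_gen;
    int same;

    if (!first)
        return -1;
    old_addr = (uintptr_t)first; /* save before the object is freed */
    old_gen = first->generation;
    fake_kobj_put(first);        /* refcount hits zero, release runs */

    second = fake_kobj_create(); /* may or may not reuse old_addr */
    if (!second)
        return -1;
    same = ((uintptr_t)second == old_addr);
    printf("old gen %lu, new gen %lu, same address: %s\n",
           old_gen, second->generation, same ? "yes" : "no");
    fake_kobj_put(second);
    return same;
}
```

If the address does repeat, the generation numbers still differ, which is why a matching pointer value in a debug log does not by itself prove that the released object is still around.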
Can you provide the full debug log that shows the problem?

thanks,

greg k-h