linux-kernel - Re: kobj refcounting weirdness


Message-ID: <20090309150453.GB7627@kroah.com>
Date:	Mon, 9 Mar 2009 08:04:53 -0700
From:	Greg KH <greg@...ah.com>
To:	Alex Chiang <achiang@...com>, kay.sievers@...y.org, rjw@...k.pl,
	linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org
Subject: Re: kobj refcounting weirdness

On Mon, Mar 09, 2009 at 12:36:54AM -0600, Alex Chiang wrote:
> Hi Kay, Greg,
> 
> I've been working on this patch series recently that adds
> function and device level hotplug into the PCI core:
> 
> 	http://thread.gmane.org/gmane.linux.kernel.pci/3495
> 
> For the last two weeks, I've been beating my head against a
> refcounting/kobject problem, and was hoping you could give me
> some advice, since I seem to have run into a wall.
> 
> My test case has been removing device 0000:04:00.0, which should
> remove all the devices below it.

You are removing the children before the parent device, right?  If not,
you have to be _very_ careful (personally, I don't think you should be
allowed to do that, but others, like the scsi developers, like doing
things like this...)
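
The children-first ordering asked about above amounts to a post-order walk of the device tree. The following is only a minimal user-space sketch of that ordering, not the PCI core's code: `struct node`, `struct removal_log`, `add_child()` and `remove_subtree()` are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4
#define MAX_LOG 16

/* Hypothetical stand-in for a device; not struct device or struct pci_dev. */
struct node {
	const char *name;
	struct node *child[MAX_CHILDREN];
	size_t nr_children;
	int removed;
};

/* Records the order in which nodes were removed, for inspection. */
struct removal_log {
	const char *name[MAX_LOG];
	size_t count;
};

static void add_child(struct node *parent, struct node *child)
{
	assert(parent->nr_children < MAX_CHILDREN);
	parent->child[parent->nr_children++] = child;
}

/* Post-order walk: tear down every child subtree before the node itself. */
static void remove_subtree(struct node *n, struct removal_log *log)
{
	for (size_t i = 0; i < n->nr_children; i++)
		remove_subtree(n->child[i], log);

	n->nr_children = 0;	/* all children are gone by now */
	n->removed = 1;

	assert(log->count < MAX_LOG);
	log->name[log->count++] = n->name;
}
```

Calling `remove_subtree()` on the node standing in for 0000:04:00.0 logs the leaf functions first and the bridge last, so a parent is never torn down while a child still points at it.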

>  +-[0000:03]---00.0-[0000:04-07]----00.0-[0000:05-07]--+-02.0-[0000:06]--+-00.0  Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
>  |                                                     |                 \-00.1  Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
>  |                                                     \-04.0-[0000:07]--+-00.0  Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
>  |                                                                       \-00.1  Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
> 
> I can remove the device and rescan the bus once, and it works
> fine. The second removal works fine, and then, unpredictably,
> later rescan/remove cycles eventually end up producing a warning
> and oops every time. Sometimes I die on the 2nd rescan, sometimes
> not until the 4th or 5th remove/rescan cycle.

What is the warning and oops?

> In this data set, I turned on kobject debugging, and managed to
> capture a trace where we die on the 2nd rescan.
> 
> In this data set, we:
> 
> 	- create a kobject for 0000:04:00.0 (e00000018cac2920)
> 	- remove the device
> 	- observe '0000:04:00.0' (e00000018cac2920): calling ktype release
> 	- rescan the bus
> 	- discover that e00000018cac2920 is still hanging around!

What do you mean by "rescan"?  And sure, if you create a new device, it
could be allocated at the same location, that's what the slab allocators
do, right?
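
To illustrate the address-reuse point, here is a small user-space sketch. It assumes plain malloc()/free() in place of the kernel slab allocator, and `struct obj`, `obj_create()`, `obj_get()`, `obj_put()` and `address_reused()` are hypothetical names, not the kobject API. It shows a refcounted object whose release runs when the last reference is dropped, followed by a fresh allocation that may or may not land on the old address.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical miniature refcounted object; not the kernel's struct kobject. */
struct obj {
	int refcount;
	int generation;
	void (*release)(struct obj *);
};

static int releases;	/* how many times a release callback has run */

static void obj_release(struct obj *o)
{
	releases++;
	free(o);
}

static struct obj *obj_create(int generation)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	o->refcount = 1;
	o->generation = generation;
	o->release = obj_release;
	return o;
}

static void obj_get(struct obj *o)
{
	o->refcount++;
}

/* Drop one reference; the release callback runs only when the count hits 0. */
static void obj_put(struct obj *o)
{
	assert(o->refcount > 0);
	if (--o->refcount == 0)
		o->release(o);
}

/*
 * Returns 1 if a new object was handed the address of a released one,
 * 0 otherwise. Either result is legal; it depends on the allocator.
 */
static int address_reused(void)
{
	struct obj *first = obj_create(1);
	struct obj *second;
	uintptr_t old_addr;
	int same;

	if (!first)
		return 0;
	old_addr = (uintptr_t)first;	/* saved before the object is freed */

	obj_get(first);		/* refcount 2 */
	obj_put(first);		/* refcount 1, still alive */
	obj_put(first);		/* refcount 0, release runs and frees it */

	second = obj_create(2);	/* a genuinely new object */
	if (!second)
		return 0;
	same = ((uintptr_t)second == old_addr);
	obj_put(second);
	return same;
}
```

If `address_reused()` returns 1, the pointer value alone cannot distinguish "the old object is still around" from "a new object happens to sit where the old one was", which is why the full debug log matters here.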

Can you provide the full debug log that shows the problem?

thanks,

greg k-h