lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aHfXrT_rU0JAjnVD@google.com>
Date: Wed, 16 Jul 2025 09:47:41 -0700
From: Brian Norris <briannorris@...omium.org>
To: Manivannan Sadhasivam <mani@...nel.org>
Cc: Bartosz Golaszewski <brgl@...ev.pl>,
	Bjorn Helgaas <bhelgaas@...gle.com>, linux-pci@...r.kernel.org,
	Rob Herring <robh@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] PCI/pwrctrl: Only destroy alongside host bridge

On Wed, Jul 16, 2025 at 09:27:55PM +0530, Manivannan Sadhasivam wrote:
> On Tue, Jul 15, 2025 at 02:21:47PM GMT, Brian Norris wrote:
> > OTOH, I also see that part of my change is not really doing quite what I
> > thought it was -- so far, I think there may be some kind of resource
> > leak (kobj ref), since I'm not seeing pci_release_host_bridge_dev()
> > called when I think it should be. If I perform cleanup in
> > pci_free_host_bridge() instead, then I do indeed see
> > of_platform_device_destroy() tear things down the way I expect.
> > 
> 
> Oh, that's bad! Which controller it is? I played with making the pcie-qcom
> driver modular and I unloaded/loaded multiple times, but never saw any
> refcount warning (I really hope if there was any leak, it would've tripped over
> during insmod).

I'm still trying to tease this apart, and I'm not sure when I'll have
plenty of time to get further on this. I'm also primarily using a
non-upstream DWC-based driver, which isn't really ready to be published.
I also have some systems that use
drivers/pci/controller/pcie-rockchip-host.c and are fully
upstream-supported, so I'll see if I can replicate my observations
there.

But I think there are at least two problems:

(1) I'm adding code to bridge->dev.release(). release() is only called
    when the device's refcount drops to zero. And child devices hold a
    refcount on their parent (the bridge). So, I have a circular
    refcount, if there were any pwrctrl children present.

    I think this is easily solved by moving the child destruction to
    pci_free_host_bridge() instead.

(2) Even after resolving 1, I'm seeing pci_free_host_bridge() exit with
    a bridge->dev.kboj.kref refcount of 1 in some cases. I don't yet
    have an explanation of that one.

IIUC, this kind of error would be considered a leak, but crucially, I
also don't think it would produce any kind of refcount warning or other
error. It's "just" a device that has been removed (a la, device_del()),
but still has some client holding a reference count (i.e., not enough
put_device()).

Brian

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ