linux-kernel - Re: [PATCH 2/2] genirq: proc: fix a procfs entry leak

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <2023082908-bulb-scrubbed-32af@gregkh>
Date:   Tue, 29 Aug 2023 22:18:32 +0200
From:   Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To:     Bartosz Golaszewski <brgl@...ev.pl>
Cc:     Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org,
        Bartosz Golaszewski <bartosz.golaszewski@...aro.org>,
        Marc Zyngier <maz@...nel.org>,
        Linus Walleij <linus.walleij@...aro.org>
Subject: Re: [PATCH 2/2] genirq: proc: fix a procfs entry leak

On Tue, Aug 29, 2023 at 02:24:21PM +0200, Bartosz Golaszewski wrote:
> On Tue, Aug 29, 2023 at 11:11 AM Thomas Gleixner <tglx@...utronix.de> wrote:
> >
> > On Tue, Aug 29 2023 at 08:26, Bartosz Golaszewski wrote:
> > > On Mon, Aug 28, 2023 at 11:44 PM Thomas Gleixner <tglx@...utronix.de> wrote:
> > >> That's the module which provides the interrupt domain and hid-whatever
> > >> is the one which requests the interrupt, no?
> > >>
> > > Not at all! This is what I said: "we have the hid-cp2112 module which
> > > drives a GPIO-and-I2C-expander-on-a-USB-stick". Meaning: the
> > > hid-cp2112 module registers a USB driver for a GPIO expander on a
> > > stick. This GPIO chip is the interrupt controller here. It's the USB
> > > stick that provides interrupts for whatever else needs them (in real
> > > life: it can be an IIO device on the I2C bus which signals some events
> > > over the GPIOs). The user can get the interrupt number using the
> > > gpiod_to_irq() function. It can be unplugged at any moment and module
> > > references will not stop the USB bus from unbinding it.
> >
> > Sorry for my confusion, but this all is confusing at best.
> >
> > So what you are saying is that the GPIO driver, which creates the
> > interrupt domain is unbound and that unbind destroys the interrupt
> > domain, right? IOW, the wonderful world of plug and pray.
> >
> > Let's look at the full picture again.
> >
> >    USB -> USB-device
> >           |----------- GPIO
> >           |------------I2C  ---------- I2C-device
> >                  (hid-cp2112 driver)   (i2c-xx-driver)
> >
> > i2x-xx-driver is the one which requests the interrupt from
> > hid-cp2112-GPIO, right?
> >
> 
> Yes! Sorry if I was not being clear about it.
> 
> > So when the USB device disconnects, then something needs to tell the
> > i2c-xx-driver that the I2C interface is not longer available, right?
> >
> > IOW, the unbind operation needs the following notification and teardown
> > order:
> >
> >    1) USB core notifies hid-cp2112
> >
> >    2) hid-cp2112 notifies i2c-xx-driver
> >
> >    3) i2c-xx-driver mops up and invokes free_irq()
> >
> >    4) hid-cp2112 removes the interrupt domain
> >
> > But it seems that you end up with a situation where the notification of
> > the i2c-xx-driver is either not happening or the driver in question is
> > simply failing to mop up and free the requested interrupt.
> >
> 
> Yes, there's no notification of any kind.

Why not fix that?

> It's a common problem unfortunately across different subsystems. We
> have hot-unpluggable consumers using resources that don't support it
> (like interrupts in this example).

Then the driver for the controller of that hot-pluggable irq controller
should be fixed.

> > As a consequence you want to work around it by mopping up the requested
> > interrupts somewhere else.
> >
> 
> The approach I'm proposing - and that we implement in GPIO - is
> treating the "handle" to the resource as what's often called in
> programming - a weak reference. The resource itself is released not by
> the consumer, but the provider. The consumer in turn can get the weak
> reference from the provider and has to have some way of converting it
> to a strong one for the duration of any of the API calls. It can be
> implemented internally with a mutex, spinlock, an RCU read section or
> otherwise (in GPIO we're using rw_semaphores but I'm working on
> migrating to SRCU in order to protect the functions called from
> interrupt context too which is missing ATM). If for any reason the
> provider vanishes, then the next API call will fail. If it vanishes
> during a call, then we'll wait for the call to exit before freeing the
> resources, even if the underlying HW is already gone (the call in
> progress may fail, that's alright).
> 
> For interrupts it would mean that when the consumer calls
> request_irq(), the number it gets is a weak reference to the irq_desc.
> For any management operation we lock irq_desc. If the domain is
> destroyed, irq_descs get destroyed with it (after all users leave the
> critical section). Next call to any of the functions looks up the irq
> number and sees it's gone. It fails or silently returns depending on
> the function (e.g. irq_free() would have to ignore the missing
> lookup).
> 
> But I'm just floating ideas here.

That's a nice idea, but a lot of work implementing this.  Good luck!

Fixing the driver might be simpler :)

greg k-h