lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87sf87aw36.ffs@tglx>
Date:   Fri, 25 Aug 2023 19:08:29 +0200
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Bartosz Golaszewski <brgl@...ev.pl>
Cc:     linux-kernel@...r.kernel.org,
        Bartosz Golaszewski <bartosz.golaszewski@...aro.org>,
        Marc Zyngier <marc.zyngier@....com>,
        Linus Walleij <linus.walleij@...aro.org>
Subject: Re: [PATCH 2/2] genirq: proc: fix a procfs entry leak

On Fri, Aug 25 2023 at 13:01, Bartosz Golaszewski wrote:
> On Fri, Aug 25, 2023 at 10:11 AM Thomas Gleixner <tglx@...utronix.de> wrote:
>> On Fri, Aug 25 2023 at 00:36, brgl@...ev.pl wrote:
>> > On Thu, 24 Aug 2023 22:12:41 +0200, Thomas Gleixner <tglx@...utronix.de> said:
>> > Here a GPIO device - that is also an irq chip - is unbound (this is a testing
>> > module unbound during a test-case but it can be anything else, like an I2C
>> > expander for which the driver is unloaded) while some users called
>> > request_irq() on its interrupts (this is orthogonal to gpiod_get() and doesn't
>> > take a reference to the module, so nothing is stopping us from
>> > unloading it)
>>
>> You just described the real problem in this sentence. So why are you
>> trying to cure a symptom?
>
> Honestly I'm not following. Even if we did have a way of taking the
> reference to the underlying GPIO module (I'm 99% percent sure, it's
> not possible: we're not using any of the GPIO interfaces for that,

The point is that something frees an in-use interrupt descriptor, which
is obviously wrong to begin with.

Now you go and cure the symptom of a stale procfs file, but as I said
before this is the least of the worries.

Care to look at what free_irq() does and you might figure out why this
stale file is just an easy to observe symptom, but obviously not the
real problem.

This leaves an activated interrupt around with stale pointers to the
descriptor. The interrupt could be on the flight. The associated thread
could be running. There can be resources claimed via request_irq() which
will not be freed either. There can be management operations on the
interrupt in parallel.

A driver which successfully requests an interrupt can rightfully expect
that any management operation on the interrupt, e.g. disable(),
enable(), set_*() is valid until the point it invokes free_irq().

IOW, the descriptor including the associated interrupt chips (software
representation) in the hierarchy must be at least in a consistent state
and accessible. If the underlying hardware vanished, e.g. USB
disconnect, then still the software side must be intact. Of course an
disable_irq() won't reach the hardware anymore, but that's something the
relevant driver has to handle correctly.

So just claiming that it's fine to free in-use interrupts and everything
is greate by removing the procfs entries is just wishful thinking.

You simply cannot free an in-use interrupt descriptor and I'm going to
add a big fat warning into that code to that effect.

So if it turns out that this is a general problem, then we have to sit
down and solve it from ground up.

Thanks,

        tglx

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ