linux-kernel - Re: [PATCH v3] genirq: Check irq disabled & masked states in irq

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAAFQd5BxsZAosNu6AJrcTehLsYOwKPoLWk14H6OTkrMEXkDmBw@mail.gmail.com>
Date:   Sun, 28 May 2017 00:11:48 +0900
From:   Tomasz Figa <tfiga@...omium.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Jeffy Chen <jeffy.chen@...k-chips.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Brian Norris <briannorris@...omium.org>,
        Douglas Anderson <dianders@...omium.org>
Subject: Re: [PATCH v3] genirq: Check irq disabled & masked states in irq_shutdown

Hi Thomas,

Thank you for your comments. Please see my replies inline.

On Sat, May 27, 2017 at 8:12 PM, Thomas Gleixner <tglx@...utronix.de> wrote:
> On Sat, 27 May 2017, Jeffy Chen wrote:
>
>> If a irq is already disabled, irq_shutdown may try to disable it again,
>> for example:
>> devm_request_irq->irq_startup->irq_enable
>> disable_irq                                   <-- disabled
>> devm_free_irq->irq_shutdown                   <-- disable it again
>>
>> This would confuse some chips which require balanced irq enable and
>> disable, for example pinctrl-rockchip & pinctrl-nomadik.
>
> "would confuse" is an interesting technical term. Can you please start to
> explain things in coherent sentences so people who do not have detailed
> knowledge of the problem at hand can understand it?

I think we might simply have a language barrier here unfortunately. I
agree, though, that we need a better description of the problem. Next
time we will help Jeffy with polishing the commit message. Please
forgive him this time, as he is still learning "the art" of mainline
patch submission.

>
>> Add a state check before calling irq_disable to prevent that.
>
> So you analyzed in half an hour that all of this is correct and fixes all
> possible imbalances, right?  Really impressive!
>
> You just forgot to mention this in the changelog.

I think this patch should have been an RFC to begin with, as it's just
supposed to show the problem we found and start a discussion on a way
to fix it.

>
>> v2: Rewrite commit message.
>> v3: Rewrite commit message and not skip irq_shutdown.
>
> This does not belong into the changelog and having that information twice
> is more than pointless.
>
> Before you send yet another version of this within 5 minutes, can you
> please sit down and analyze all the possible imbalance scenarios?

I think we should start from actually figuring out where the problem is.

The IRQ functionality provided by the pinctrl-rockchip has a power
saving mechanism that attempts to gate clocks whenever there are no
enabled interrupts. Currently the driver calls clk_enable() in
.irq_enable() and clk_disable() in .irq_disable() callbacks of its IRQ
chip. However there is no mention about ordering or reference counting
of those in code documentation (which is likely to mean that there are
no ordering guarantees and/or unbalanced calls may happen, please
correct me if I'm wrong).

We noticed that following scenario causes unbalanced clk_disable() calls:
 1) request_irq(),
 2) disable_irq(),
 3) free_irq().

After checking what's going on, we found that free_irq() ends up
calling irq_shutdown(), which defaults to chip's .irq_disable() if it
doesn't have .irq_shutdown() specified. This means that regardless of
whether disable_irq() was or wasn't called before, there is one more
call to .irq_disable() after free_irq(), which is the reason for our
unbalanced clk_disable() call from pinctrl-rockchip IRQ chip.

Now, we can simply hack the driver to not rely on any ordering of
.enable/disable_irq() calls, but we thought it would make more sense
to actually try to figure out whether it's the expected behavior from
the IRQ chip core code.

>
> I'm not going to apply hastiliy cobbled together workarounds which "fix"
> just half of the problem space. This is not random driver code which breaks
> ONE type of machine. This is core code which has the potential to break ALL
> machines out there in one go.
>
> Either you come up with a properly analyzed solution which addresses all
> possible imbalanced invocations or you have to wait until I find some time
> to look at it myself.

As I explained above, we might have introduced unnecessary confusion
here. Please forgive us. On the other hand, I'd like to ask for a bit
more understanding, as we have people involved in this, who are still
learning the tough art of upstream submission, potentially also behind
a language barrier and I'm pretty much sure they are more than happy
to address all your concerns, but would be much more motivated to do
the right thing if guided in a bit more humane manner. Thanks in
advance.

Best regards,
Tomasz