linux-kernel - Re: MSI interrupt for xhci still lost on 5.6-rc6 after cpu hotplug

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87tv2dd17z.fsf@nanos.tec.linutronix.de>
Date:   Tue, 24 Mar 2020 20:03:44 +0100
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Evan Green <evgreen@...omium.org>
Cc:     Mathias Nyman <mathias.nyman@...ux.intel.com>, x86@...nel.org,
        linux-pci <linux-pci@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        "Ghorai\, Sukumar" <sukumar.ghorai@...el.com>,
        "Amara\, Madhusudanarao" <madhusudanarao.amara@...el.com>,
        "Nandamuri\, Srikanth" <srikanth.nandamuri@...el.com>
Subject: Re: MSI interrupt for xhci still lost on 5.6-rc6 after cpu hotplug

Evan Green <evgreen@...omium.org> writes:
> On Mon, Mar 23, 2020 at 5:24 PM Thomas Gleixner <tglx@...utronix.de> wrote:
>> And of course all of this is so well documented that all of us can
>> clearly figure out what's going on...
>
> I won't pretend to know what's going on, so I'll preface this by
> labeling it all as "flailing", but:
>
> I wonder if there's some way the interrupt can get delayed between
> XHCI snapping the torn value and it finding its way into the IRR. For
> instance, if xhci read this value at the start of their interrupt
> moderation timer period, that would be awful (I hope they don't do
> this). One test patch would be to carve out 8 vectors reserved for
> xhci on all cpus. Whenever you change the affinity, the assigned
> vector is always reserved_base + cpu_number. That lets you exercise
> the affinity switching code, but in a controlled manner where torn
> interrupts could be easily seen (ie hey I got an interrupt on cpu 4's
> vector but I'm cpu 2). I might struggle to write such a change, but in
> theory it's doable.

Well, the point is that we don't see a spurious interrupt on any
CPU. We added a traceprintk into do_IRQ() and that would immediately
tell us where the thing goes off into lala land. Which it didn't.

> I was alternately trying to build a theory in my head about the write
> somehow being posted and getting out of order, but I don't think that
> can happen.

If that happens then the lost XHCI interrupt is the least of your
worries.

Thanks,

        tglx