lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 22 Jan 2020 10:00:59 -0800
From:   Evan Green <evgreen@...omium.org>
To:     Rajat Jain <rajatja@...gle.com>
Cc:     Bjorn Helgaas <bhelgaas@...gle.com>,
        linux-pci <linux-pci@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs

On Wed, Jan 22, 2020 at 3:26 AM Rajat Jain <rajatja@...gle.com> wrote:
>
> On Fri, Jan 17, 2020 at 4:26 PM Evan Green <evgreen@...omium.org> wrote:
> >
> > __pci_write_msi_msg() updates three registers in the device: address
> > high, address low, and data. On x86 systems, address low contains
> > CPU targeting info, and data contains the vector. The order of writes
> > is address, then data.
> >
> > This is problematic if an interrupt comes in after address has
> > been written, but before data is updated, and both the SMP affinity
> > and target vector are being changed. In this case, the interrupt targets
> > the wrong vector on the new CPU.
> >
> > This case is pretty easy to stumble into using xhci and CPU hotplugging.
> > Create a script that repeatedly targets interrupts at a set of cores and
> > then offlines those cores. Put some stress on USB, and then watch xhci
> > lose an interrupt and die.
>
> Do I understand it right, that even with this patch, the driver might
> still miss the same interrupt (because we are disabling the interrupt
> for that time) -  the improvement this patch brings is that it will at
> least not be delivered to the wrong CPU or via a wrong vector?

In my experiments, the driver no longer misses the interrupt. XHCI is
particularly sensitive to this, if it misses one interrupt it seems to
completely wedge the driver.

I think in my case the device pends the interrupts until MSIs are
re-enabled, because I don't see anything other than MSI for xhci in
/proc/interrupts. But I'm not sure if other devices may fall back to
line-based interrupts for a moment, and if that's a problem.

Although, I already see we call pci_msi_set_enable(0) whenever we set
up MSIs, presumably for this same reason of avoiding torn MSIs. So my
fix is really just doing the same thing for an additional case. And if
getting stuck in a never-to-be-handled line based interrupt were a
problem, you'd think it would also be a problem in
pci_restore_msi_state(), where the same thing is done.

Maybe my fix is at the wrong level, and should be up in
pci_msi_domain_write_msg() instead? Though I see a lot of callers to
pci_write_msi_msg() that I worry have the same problem.
-Evan

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ