linux-kernel - Re: [PATCH] nvme-pci: Prevent mmio reads if pci channel offline


Message-ID: <CAHk-=wgRdATtsBzo-LDkWn_sW-PJvq95SahZcsTxxDMPzLWxKA@mail.gmail.com>
Date:   Tue, 26 Feb 2019 17:01:49 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Alex Gagniuc <Alex_Gagniuc@...lteam.com>
Cc:     Keith Busch <keith.busch@...el.com>, Jens Axboe <axboe@...com>,
        Sagi Grimberg <sagi@...mberg.me>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        linux-nvme@...ts.infradead.org, mr.nuke.me@...il.com,
        Christoph Hellwig <hch@....de>,
        Jon Derrick <jonathan.derrick@...el.com>
Subject: Re: [PATCH] nvme-pci: Prevent mmio reads if pci channel offline

On Tue, Feb 26, 2019 at 2:37 PM <Alex_Gagniuc@...lteam.com> wrote:
>
> Then nobody gets the (error) message. You can go a bit further and try
> "pcie_ports=native". Again, nobody gets the memo. ):

So? The error was bogus to begin with. Why would we care?

Yes, yes, PCI bridges have the ability to return errors in accesses to
non-existent devices. But that was always bogus, and is never useful.
The whole "you get an interrupt or NMI on a bad access" is simply a
horribly broken model. It's not useful.

We already have long depended on hotplug drivers noticing the "oh, I'm
getting all-ff returns, the device may be gone". It's usually trivial,
and works a whole lot better.
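
As an illustration of the all-ff check described above, here is a minimal
userspace sketch. It is not taken from the nvme-pci driver: struct fake_dev,
fake_readl() and fake_read_status() are hypothetical names that simulate a
32-bit register read, under the assumption that a read from a missing device
completes and returns all ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a hotpluggable device: one register plus a
 * flag saying whether the device is still physically present. */
struct fake_dev {
    uint32_t status_reg;
    bool present;
};

/* Simulated MMIO read. Assumption: a read from a removed device still
 * completes and simply returns all ones, with no error raised. */
uint32_t fake_readl(const struct fake_dev *dev)
{
    return dev->present ? dev->status_reg : UINT32_MAX;
}

/* Driver-side check: treat an all-ones value as "device may be gone".
 * Returns true and stores the value in *status if the device answered,
 * false (leaving *status untouched) if the read came back all ones. */
bool fake_read_status(const struct fake_dev *dev, uint32_t *status)
{
    uint32_t val = fake_readl(dev);

    if (val == UINT32_MAX)
        return false;

    *status = val;
    return true;
}
```

The check is only a heuristic when all ones is also a legal register value,
which is why a caller would treat a false return as "may be gone" and stop
touching the device rather than as a reported error.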

It's not an error. Trying to force it to be an NMI or SCI or machine
check is bogus. It causes horrendous pain, because asynchronous
reporting doesn't work reliably anyway, and *synchronous* reporting is
impossible to sanely handle without crazy problems.

So the only sane model for hotplug devices is "IO still works, and
returns all ones". Maybe with an async, one-time and *recoverable*
machine check or some other report of the access after the fact.

Anything else is simply broken. It would be broken even if firmware
wasn't involved, but obviously firmware people tend to often make a
bad situation even worse.

              Linus