Date:   Sun, 3 May 2020 12:57:30 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Linus Torvalds' <>,
        Dan Williams <>
CC:     "Luck, Tony" <>,
        Andy Lutomirski <>,
        Thomas Gleixner <>,
        Ingo Molnar <>,
        "Peter Zijlstra" <>,
        Borislav Petkov <>,
        stable <>,
        the arch/x86 maintainers <>,
        "H. Peter Anvin" <>,
        Paul Mackerras <>,
        "Benjamin Herrenschmidt" <>,
        Erwin Tsaur <>,
        Michael Ellerman <>,
        "Arnaldo Carvalho de Melo" <>,
        linux-nvdimm <>,
        Linux Kernel Mailing List <>
Subject: RE: [PATCH v2 0/2] Replace and improve "mcsafe" with copy_safe()

From: Linus Torvalds
> Sent: 01 May 2020 19:29
> And as DavidL pointed out - if you ever have "iomem" as a source or
> destination, you need yet another case. Not because they can take
> another kind of fault (although on some platforms you have the machine
> checks for that too), but because they have *very* different
> performance profiles (and the ERMS "rep movsb" sucks baby donkeys
> through a straw).

I was actually thinking that the nvdimm accesses need to be treated
much more like (cached) memory mapped io space than normal system memory.
So treating them the same as "iomem" and then having access functions
that report access failures (which the current readq() doesn't)
might make sense.
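As a rough illustration of the shape such an accessor could take (a sketch only; `readq_checked` and the fault-simulation flag are hypothetical names, not an existing kernel API), the key difference from readq() is returning an error code and passing the value out through a pointer:

```c
#include <stdint.h>
#include <stdbool.h>

/* Stand-in for a machine check / IO fault during the access. */
static bool simulate_fault;

/*
 * Hypothetical failure-reporting read: unlike readq(), which can only
 * return the value, this returns 0 on success or a negative errno-style
 * code on failure, so the caller can tell a fault from real data.
 */
static int readq_checked(const volatile uint64_t *addr, uint64_t *val)
{
	if (simulate_fault)
		return -5;	/* -EIO: access failed, *val not written */
	*val = *addr;
	return 0;
}
```

A caller then checks the return value before trusting the data, rather than inferring failure from the value itself.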

If you are using memory that 'might fail' for kernel code or data
you really get what you deserve.

OTOH system response to PCIe errors is currently rather problematic.
Mostly reads time out and return ~0u.
This can be checked for and, if ~0u is a possibly valid value for that
register, a second location read to disambiguate.
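The usual form of that check can be sketched as follows (illustrative only; the register layout is hypothetical). Since a failed PCIe read typically completes as all-ones, a ~0u result is ambiguous, so the driver re-reads a register that can never legitimately be all-ones, such as a vendor ID:

```c
#include <stdint.h>

/*
 * Returns 1 if data_reg can be trusted, 0 if the device appears gone.
 * id_reg is a second register whose legitimate value is never all-ones
 * (a failed read of it also returns 0xffffffff, confirming the fault).
 */
static int pcie_read_valid(uint32_t data_reg, uint32_t id_reg)
{
	if (data_reg != 0xffffffffu)
		return 1;	/* definitely real data */
	/* Ambiguous: ~0u may be valid, or the link may be down. */
	return id_reg != 0xffffffffu;
}
```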

However we have an x86 server box (I've forgotten whether it is HP or
Dell) that generates an NMI whenever a PCIe link goes down.
(The 'platform' takes the AER interrupt and uses an NMI to pass
it to the kernel - whose bright idea was it to use an NMI???)
This happens even after we've done an 'echo 1 >remove'.
The system is supposed to be NEBS (I think that is the term) compliant
which is supposed to be suitable for telephony work (including
emergency calls), but any PCIe failure crashes the box!

I've another system here that sometimes fails to bring the PCIe
link back up.
I guess these code paths don't get regular testing.
In my case the PCIe slave is an fpga, reloading the fpga image
(either over JTAG or after rewriting eeprom) doesn't always work.


Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
