lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sat, 2 Mar 2024 10:06:56 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Tong Tiangen <tongtiangen@...wei.com>
Cc: Al Viro <viro@...nel.org>, David Howells <dhowells@...hat.com>, Jens Axboe <axboe@...nel.dk>, 
	Christoph Hellwig <hch@....de>, Christian Brauner <christian@...uner.io>, 
	David Laight <David.Laight@...lab.com>, Matthew Wilcox <willy@...radead.org>, 
	Jeff Layton <jlayton@...nel.org>, linux-fsdevel@...r.kernel.org, 
	linux-block@...r.kernel.org, linux-mm@...ck.org, netdev@...r.kernel.org, 
	linux-kernel@...r.kernel.org, Kefeng Wang <wangkefeng.wang@...wei.com>
Subject: Re: [bug report] dead loop in generic_perform_write() //Re: [PATCH v7
 07/12] iov_iter: Convert iterate*() to inline funcs

On Sat, 2 Mar 2024 at 01:37, Tong Tiangen <tongtiangen@...wei.com> wrote:
>
> I think this solution has two impacts:
> 1. Although it is not a performance-critical path, the CPU usage may be
> affected by one more memory copy in some large-memory applications.

Compared to the IO, the extra memory copy is a non-issue.

If anything, getting rid of the "copy_mc" flag removes extra code in a
much more important path (ie the normal iov_iter code).

> 2. If a hardware memory error occurs in "good location" and the
> ".copy_mc" is removed, the kernel will panic.

That's always true. We do not support non-recoverable machine checks
on kernel memory. Never have, and realistically probably never will.

In fact, as far as I know, the hardware that caused all this code in
the first place no longer exists, and never really made it to wide
production.

The machine checks in question happened on pmem, now killed by Intel.
It's possible that somebody wants to use it for something else, but
let's hope any future implementations are less broken than the
unbelievable sh*tshow that caused all this code in the first place.

The whole copy_mc_to_kernel() mess exists mainly due to broken pmem
devices along with old and broken CPU's that did not deal correctly
with machine checks inside the regular memory copy ('rep movs') code,
and caused hung machines.

IOW, notice how 'copy_mc_to_kernel()' just becomes a regular
'memcpy()' on fixed hardware, and how we have that disgusting
copy_mc_fragile_key that gets enabled for older CPU cores.

And yes, we then have copy_mc_enhanced_fast_string() which isn't
*that* disgusting, and that actually handles machine checks properly
on more modern hardware, but it's still very much "the hardware is
misdesiged, it has no testing, and nobody sane should depend on this"

In other words, it's the usual "Enterprise Hardware" situation. Looks
fancy on paper, costs an arm and a leg, and the reality is just sad,
sad, sad.

               Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ