Message-ID: <bc82e3e7-f301-3a40-cbf6-927351b6575d@huawei.com>
Date: Mon, 4 Mar 2024 16:45:30 +0800
From: Tong Tiangen <tongtiangen@...wei.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
CC: Al Viro <viro@...nel.org>, David Howells <dhowells@...hat.com>, Jens Axboe
<axboe@...nel.dk>, Christoph Hellwig <hch@....de>, Christian Brauner
<christian@...uner.io>, David Laight <David.Laight@...lab.com>, Matthew
Wilcox <willy@...radead.org>, Jeff Layton <jlayton@...nel.org>,
<linux-fsdevel@...r.kernel.org>, <linux-block@...r.kernel.org>,
<linux-mm@...ck.org>, <netdev@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, Kefeng Wang <wangkefeng.wang@...wei.com>
Subject: Re: [bug report] dead loop in generic_perform_write() //Re: [PATCH v7
07/12] iov_iter: Convert iterate*() to inline funcs
On 2024/3/3 2:06, Linus Torvalds wrote:
> On Sat, 2 Mar 2024 at 01:37, Tong Tiangen <tongtiangen@...wei.com> wrote:
>>
>> I think this solution has two impacts:
>> 1. Although it is not a performance-critical path, the CPU usage may be
>> affected by one more memory copy in some large-memory applications.
>
> Compared to the IO, the extra memory copy is a non-issue.
>
> If anything, getting rid of the "copy_mc" flag removes extra code in a
> much more important path (ie the normal iov_iter code).
Indeed. I'll test this solution. Theoretically, it should solve the problem.
>
>> 2. If a hardware memory error occurs in "good location" and the
>> ".copy_mc" is removed, the kernel will panic.
>
> That's always true. We do not support non-recoverable machine checks
> on kernel memory. Never have, and realistically probably never will.
>
> In fact, as far as I know, the hardware that caused all this code in
> the first place no longer exists, and never really made it to wide
> production.
Yes. The probability that newly allocated memory is faulty is low.
Thanks,
Tong.
>
> The machine checks in question happened on pmem, now killed by Intel.
> It's possible that somebody wants to use it for something else, but
> let's hope any future implementations are less broken than the
> unbelievable sh*tshow that caused all this code in the first place.
>
> The whole copy_mc_to_kernel() mess exists mainly due to broken pmem
> devices along with old and broken CPU's that did not deal correctly
> with machine checks inside the regular memory copy ('rep movs') code,
> and caused hung machines.
>
> IOW, notice how 'copy_mc_to_kernel()' just becomes a regular
> 'memcpy()' on fixed hardware, and how we have that disgusting
> copy_mc_fragile_key that gets enabled for older CPU cores.
>
> And yes, we then have copy_mc_enhanced_fast_string() which isn't
> *that* disgusting, and that actually handles machine checks properly
> on more modern hardware, but it's still very much "the hardware is
> misdesigned, it has no testing, and nobody sane should depend on this"
>
> In other words, it's the usual "Enterprise Hardware" situation. Looks
> fancy on paper, costs an arm and a leg, and the reality is just sad,
> sad, sad.
>
> Linus
> .
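
For reference, the dispatch described in the quoted text (copy_mc_to_kernel() becoming a plain memcpy() on fixed hardware, with a separate path when copy_mc_fragile_key is enabled) can be modelled roughly as below. This is only a userspace sketch, not the kernel source: the two flags and all model_* names are hypothetical stand-ins, and the "careful" copy paths here just copy bytes without any real machine-check recovery.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flags standing in for CPU detection; not kernel APIs. */
static bool cpu_is_fragile;          /* models copy_mc_fragile_key being enabled */
static bool cpu_recovers_in_rep_movs; /* models CPUs that handle MCEs in 'rep movs' */

/*
 * Placeholder for the slow, careful path used on older cores.
 * Returns the number of bytes NOT copied (always 0 in this model,
 * since userspace cannot simulate a poisoned cache line).
 */
static size_t model_copy_fragile(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	for (size_t i = 0; i < len; i++)
		d[i] = s[i];
	return 0;
}

/* Placeholder for the copy_mc_enhanced_fast_string() style path. */
static size_t model_copy_enhanced(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
	return 0;
}

/*
 * Model of the three-way dispatch: fragile path, enhanced path,
 * otherwise an ordinary memcpy(). Returns bytes left uncopied.
 */
size_t model_copy_mc_to_kernel(void *dst, const void *src, size_t len)
{
	if (cpu_is_fragile)
		return model_copy_fragile(dst, src, len);
	if (cpu_recovers_in_rep_movs)
		return model_copy_enhanced(dst, src, len);
	memcpy(dst, src, len);
	return 0;
}
```

With both flags clear, the function is nothing more than memcpy() plus a "0 bytes remaining" return value, which is the point made above about fixed hardware.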