Message-ID: <4d974c1e-b3a8-8b21-88f4-e5f20b2fb654@huawei.com>
Date: Sat, 3 Feb 2024 15:56:04 +0800
From: Tong Tiangen <tongtiangen@...wei.com>
To: "Luck, Tony" <tony.luck@...el.com>, Borislav Petkov <bp@...en8.de>
CC: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
	"wangkefeng.wang@...wei.com" <wangkefeng.wang@...wei.com>,
	Dave Hansen <dave.hansen@...ux.intel.com>, "x86@...nel.org" <x86@...nel.org>,
	"H. Peter Anvin" <hpa@...or.com>, Andy Lutomirski <luto@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Naoya Horiguchi <naoya.horiguchi@....com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>, Guohanjun <guohanjun@...wei.com>
Subject: Re: [PATCH -next v4 2/3] x86/mce: rename MCE_IN_KERNEL_COPYIN to
MCE_IN_KERNEL_COPY_MC

On 2024/2/3 6:46, Luck, Tony wrote:
>> Now, since you're explaining things today :) pls explain to me what this
>> patchset is all about? You having reviewed patch 3 and all?
>>
>> Why is this pattern:
>>
>> if (copy_mc_user_highpage(dst, src, addr, vma)) {
>> 	memory_failure_queue(page_to_pfn(src), 0);
>>
>> not good anymore?
>>
>> Or is the goal here to poison straight from the #MC handler and not
>> waste time and potentially get another #MC while memory_failure_queue()
>> on the source address is done?
>>
>> Or something completely different?
>
> See the comment above memory_failure_queue()
>
> * The function is primarily of use for corruptions that
> * happen outside the current execution context (e.g. when
> * detected by a background scrubber)
>
> In the copy_mc_user_highpage() case the fault happens in
> the current execution context. So scheduling someone else
> to handle it at some future point is risky. Just deal with it
> right away.
>
> -Tony
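
For reference, memory_failure_queue() itself only stashes the pfn in
a per-CPU kfifo and schedules a work item, so the actual
memory_failure() call happens later, in workqueue context. Roughly
(paraphrased from mm/memory-failure.c from memory, not verbatim):

void memory_failure_queue(unsigned long pfn, int flags)
{
	struct memory_failure_cpu *mf_cpu;
	unsigned long proc_flags;
	struct memory_failure_entry entry = {
		.pfn	= pfn,
		.flags	= flags,
	};

	mf_cpu = &get_cpu_var(memory_failure_cpu);
	spin_lock_irqsave(&mf_cpu->lock, proc_flags);
	if (kfifo_put(&mf_cpu->fifo, entry))
		/* memory_failure() runs later, from workqueue context */
		schedule_work_on(smp_processor_id(), &mf_cpu->work);
	else
		pr_err("buffer overflow when queuing memory failure at %#lx\n",
		       pfn);
	spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
	put_cpu_var(memory_failure_cpu);
}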

The goal of this patch:

When an #MC is triggered by copy_mc_user_highpage(), the #MC is
handled directly in the synchronously triggered path
do_machine_check() -> kill_me_never() -> memory_failure().

The current handling instead calls memory_failure_queue() ->
schedule_work_on() from that execution context; I think that is what
"scheduling someone else to handle it at some future point is risky"
refers to.
Thanks.
Tong.