Message-ID: <20250224220146.GBZ7zsSnXLftyqWzW_@fat_crate.local>
Date: Mon, 24 Feb 2025 23:01:46 +0100
From: Borislav Petkov <bp@...en8.de>
To: Shuai Xue <xueshuai@...ux.alibaba.com>
Cc: "Luck, Tony" <tony.luck@...el.com>,
"nao.horiguchi@...il.com" <nao.horiguchi@...il.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
"linmiaohe@...wei.com" <linmiaohe@...wei.com>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"peterz@...radead.org" <peterz@...radead.org>,
"jpoimboe@...nel.org" <jpoimboe@...nel.org>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"baolin.wang@...ux.alibaba.com" <baolin.wang@...ux.alibaba.com>,
"tianruidong@...ux.alibaba.com" <tianruidong@...ux.alibaba.com>
Subject: Re: [PATCH v2 0/5] mm/hwpoison: Fix regressions in memory failure
handling
On Fri, Feb 21, 2025 at 02:05:28PM +0800, Shuai Xue wrote:
> # perf script
> kworker/48:1-mm 25516 [048] 1713.893549: probe:memory_failure: (ffffffffaa622db4)
> ffffffffaa622db5 memory_failure+0x5 ([kernel.kallsyms])
> ffffffffaa25aa93 uc_decode_notifier+0x73 ([kernel.kallsyms])
> ffffffffaa3068bb notifier_call_chain+0x5b ([kernel.kallsyms])
> ffffffffaa306ae1 blocking_notifier_call_chain+0x41 ([kernel.kallsyms])
> ffffffffaa25bbfe mce_gen_pool_process+0x3e ([kernel.kallsyms])
> ffffffffaa2f455f process_one_work+0x19f ([kernel.kallsyms])
> ffffffffaa2f509c worker_thread+0x20c ([kernel.kallsyms])
> ffffffffaa2fec89 kthread+0xd9 ([kernel.kallsyms])
> ffffffffaa245131 ret_from_fork+0x31 ([kernel.kallsyms])
> ffffffffaa2076ca ret_from_fork_asm+0x1a ([kernel.kallsyms])
>
> einj_mem_uc 44530 [184] 1713.908089: probe:memory_failure: (ffffffffaa622db4)
> ffffffffaa622db5 memory_failure+0x5 ([kernel.kallsyms])
> ffffffffaa2594fb kill_me_maybe+0x5b ([kernel.kallsyms])
> ffffffffaa2fac29 task_work_run+0x59 ([kernel.kallsyms])
> ffffffffaaf52347 irqentry_exit_to_user_mode+0x1c7 ([kernel.kallsyms])
> ffffffffaaf50bce noist_exc_machine_check+0x3e ([kernel.kallsyms])
> ffffffffaa001303 asm_exc_machine_check+0x33 ([kernel.kallsyms])
> 405046 thread+0xe (/home/shawn.xs/ras-tools/einj_mem_uc)
>
> einj_mem_uc 44531 [089] 1713.916319: probe:memory_failure: (ffffffffaa622db4)
> ffffffffaa622db5 memory_failure+0x5 ([kernel.kallsyms])
> ffffffffaa2594fb kill_me_maybe+0x5b ([kernel.kallsyms])
> ffffffffaa2fac29 task_work_run+0x59 ([kernel.kallsyms])
> ffffffffaaf52347 irqentry_exit_to_user_mode+0x1c7 ([kernel.kallsyms])
> ffffffffaaf50bce noist_exc_machine_check+0x3e ([kernel.kallsyms])
> ffffffffaa001303 asm_exc_machine_check+0x33 ([kernel.kallsyms])
> 405046 thread+0xe (/home/shawn.xs/ras-tools/einj_mem_uc)
What are those stack traces supposed to say?
Two processes are injecting, each causing a #MC, and a kworker gets to
handle the UC?
All injecting into the same page?
What's the upper limit on CPUs seeing the same hw error and all raising
a CMCI/#MC?
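
For reference, the two paths into memory_failure() visible in the traces
above, schematically (a simplified sketch of the x86 MCE code, not the
verbatim source):

#include <linux/mm.h>
#include <linux/notifier.h>
#include <linux/sched.h>

/* Path 1: deferred, from the kworker processing the MCE gen pool. */
static int uc_decode_notifier(struct notifier_block *nb, unsigned long val,
			      void *data)
{
	struct mce *m = data;

	/* Asynchronous report: no MF_ACTION_REQUIRED in the flags. */
	memory_failure(m->addr >> PAGE_SHIFT, 0);
	return NOTIFY_OK;
}

/* Path 2: synchronous #MC, run as task_work on return to user mode. */
static void kill_me_maybe(struct callback_head *cb)
{
	struct task_struct *p = container_of(cb, struct task_struct,
					     mce_kill_me);

	/* The task touched the poison itself: action required. */
	memory_failure(p->mce_addr >> PAGE_SHIFT, MF_ACTION_REQUIRED);
}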
> - kill_accessing_process() is only called when MF_ACTION_REQUIRED is set in
> the flags, which means we are in the MCE path.
> - Whether the page is dirty or clean determines the behavior of
> try_to_unmap(). For a dirty page, try_to_unmap() uses TTU_HWPOISON to unmap
> the PTE and convert it into a hwpoison swap entry. For a clean page,
> try_to_unmap() clears TTU_HWPOISON and simply unmaps the PTE.
> - When does walk_page_range() with hwpoison_walk_ops return 1? (A sketch
> follows this list.)
> 1. If the poisoned page is still mapped, we should of course kill the
> current process.
> 2. If the poisoned page is no longer mapped but is_hwpoison_entry() is true,
> the page was dirty, so we should kill the current process too.
> 3. Otherwise, it returns 0, which means the page is clean.
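>
> A minimal sketch of those three cases, with approximate names (not the
> exact mm/memory-failure.c source):
>
> #include <linux/mm.h>
> #include <linux/swapops.h>
>
> /*
>  * Page-walk callback, schematic: returning 1 makes walk_page_range()
>  * stop and tells the caller "kill the current process".
>  */
> static int hwpoison_check_pte(pte_t pte, unsigned long poison_pfn)
> {
> 	if (pte_present(pte))
> 		/* 1. The poisoned page is still mapped: kill. */
> 		return pte_pfn(pte) == poison_pfn;
>
> 	/*
> 	 * 2. Unmapped, but try_to_unmap(TTU_HWPOISON) left a hwpoison
> 	 * swap entry behind, i.e. the page was dirty and its data is
> 	 * lost: kill too.
> 	 */
> 	if (is_hwpoison_entry(pte_to_swp_entry(pte)))
> 		return 1;
>
> 	/* 3. Clean page, simply unmapped: no need to kill. */
> 	return 0;
> }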
I think you're too deep in the details. What I'd do is step back, think about
what the *proper* recovery action would be, and then make sure memory_failure()
does that. If it doesn't, fix it to do so.
So, what should really happen wrt recovery action if any number of CPUs see
the same memory error?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette