linux-kernel - Re: Machine check recovery broken in v6.9-rc1


Message-ID: <3e49dd21-0aea-c7ac-1633-91764e759bf7@huawei.com>
Date: Sun, 7 Apr 2024 11:59:33 +0800
From: Miaohe Lin <linmiaohe@...wei.com>
To: "Luck, Tony" <tony.luck@...el.com>, Oscar Salvador <osalvador@...e.de>
CC: David Hildenbrand <david@...hat.com>, Borislav Petkov <bp@...en8.de>,
	Yazen Ghannam <yazen.ghannam@....com>, Naoya Horiguchi
	<naoya.horiguchi@....com>, "linux-mm@...ck.org" <linux-mm@...ck.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Machine check recovery broken in v6.9-rc1

On 2024/4/7 8:08, Luck, Tony wrote:
>> This one is against 6.1 (previous one was against v6.9-rc2):
>> Again, compile tested only
> 
> Oscar.
> 
> Both the 6.1 and 6.9-rc2 patches make the BUG (and subsequent issues) go away.
> 
> Here's what's happening.
> 
> When the machine check occurs there's a scramble from various subsystems
> to report the memory error.
> 
> ghes_do_memory_failure() calls memory_failure_queue() which later
> calls memory_failure() from a kernel thread. Side note: this happens TWICE
> for each error. Not sure yet if this is a BIOS issue logging more than once,
> or some Linux issue in the acpi/apei/ghes.c code.
> 
> uc_decode_notifier() [called from a different kernel thread] also calls
> do_memory_failure()
> 
> Finally kill_me_maybe() [called from task_work on return to the application
> when returning from the machine check handler] also calls memory_failure()
> 
> do_memory_failure() is somewhat prepared for multiple reports of the same
> error. It uses an atomic test and set operation to mark the page as poisoned.
> 
> The first caller to report the error does all the real work. Late arrivals take a
> shorter path, but may still take some action(s) depending on the "flags"
> passed in:
> 
>         if (TestSetPageHWPoison(p)) {
>                 pr_err("%#lx: already hardware poisoned\n", pfn);
>                 res = -EHWPOISON;
>                 if (flags & MF_ACTION_REQUIRED)
>                         res = kill_accessing_process(current, pfn, flags);
>                 if (flags & MF_COUNT_INCREASED)
>                         put_page(p);
>                 goto unlock_mutex;
>         }
> 
> In this case the last to arrive has MF_ACTION_REQUIRED set, so calls
> kill_accessing_process() ... which is in the stack trace that led to the:
> 
>    kernel BUG at include/linux/swapops.h:88!
> 
> I'm not sure that I fully understand your patch. I guess that it is making sure to
> handle the case that the page has already been marked as poisoned?
> 
> 
> Anyway ... thanks for the quick fix. I hope the above helps write a good
> commit message to get this applied and backported to stable.

Sorry for the late reply. I just got back from vacation.

> 
> Tested-by: Tony Luck <tony.luck@...el.com>

Thanks to both of you. This looks like an issue introduced by commit:

0d206b5d2e0d ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry")

That commit replaced hwpoison_entry_to_pfn() with swp_offset_pfn(), which might not
have been intended for use with hwpoison entries:

/*
 * A pfn swap entry is a special type of swap entry that always has a pfn stored
 * in the swap offset. *They are used to represent unaddressable device memory*
 * *and to restrict access to a page undergoing migration*
 */
static inline bool is_pfn_swap_entry(swp_entry_t entry)
{
	/* Make sure the swp offset can always store the needed fields */
	BUILD_BUG_ON(SWP_TYPE_SHIFT < SWP_PFN_BITS);

	return is_migration_entry(entry) || is_device_private_entry(entry) ||
	       is_device_exclusive_entry(entry);
}

I think Oscar's patch is the right fix, and it would be better to amend the corresponding comment too.
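
To make the mismatch concrete, here is a minimal userspace sketch, not kernel
code. It assumes that swp_offset_pfn() validates its argument with
is_pfn_swap_entry() before returning the pfn, which is consistent with the BUG
in swapops.h reported above. All model_* names, the enum values and the "fixed"
predicate are hypothetical and only illustrate the idea; they are not taken
from Oscar's patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical entry types; the real kernel values depend on the config. */
enum model_swp_type {
	MODEL_SWP_SWAP,			/* ordinary swap entry */
	MODEL_SWP_MIGRATION,
	MODEL_SWP_DEVICE_PRIVATE,
	MODEL_SWP_DEVICE_EXCLUSIVE,
	MODEL_SWP_HWPOISON,
};

struct model_swp_entry {
	enum model_swp_type type;
	uint64_t offset;		/* holds a pfn for pfn-style entries */
};

/* Mirrors the quoted is_pfn_swap_entry(): hwpoison is not in the list. */
static bool model_is_pfn_swap_entry(struct model_swp_entry e)
{
	return e.type == MODEL_SWP_MIGRATION ||
	       e.type == MODEL_SWP_DEVICE_PRIVATE ||
	       e.type == MODEL_SWP_DEVICE_EXCLUSIVE;
}

/* Assumed shape of a fix: also treat hwpoison entries as carrying a pfn. */
static bool model_is_pfn_swap_entry_fixed(struct model_swp_entry e)
{
	return model_is_pfn_swap_entry(e) || e.type == MODEL_SWP_HWPOISON;
}

/*
 * Stand-in for a pfn accessor guarded by a type check. Returns false where
 * the kernel would hit the BUG; the predicate is passed in so that both
 * variants can be compared side by side.
 */
static bool model_offset_pfn(struct model_swp_entry e,
			     bool (*is_pfn_entry)(struct model_swp_entry),
			     uint64_t *pfn)
{
	if (!is_pfn_entry(e))
		return false;
	*pfn = e.offset;
	return true;
}
```

With the original predicate, model_offset_pfn() rejects a MODEL_SWP_HWPOISON
entry even though its offset holds a valid pfn; with the extended predicate the
same entry is accepted and the pfn is returned.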

Thanks.

> 
> -Tony