linux-kernel - Re: [PATCH v3 2/3] mm: Change ghes code to allow poison of non-struct pfn


Message-ID: <134e43f7-583c-48c1-8ccc-dddc18700d3b@linux.alibaba.com>
Date: Fri, 24 Oct 2025 18:03:22 +0800
From: Shuai Xue <xueshuai@...ux.alibaba.com>
To: Ira Weiny <ira.weiny@...el.com>, "Luck, Tony" <tony.luck@...el.com>,
 "ankita@...dia.com" <ankita@...dia.com>,
 "aniketa@...dia.com" <aniketa@...dia.com>, "Sethi, Vikram"
 <vsethi@...dia.com>, "jgg@...dia.com" <jgg@...dia.com>,
 "mochs@...dia.com" <mochs@...dia.com>,
 "skolothumtho@...dia.com" <skolothumtho@...dia.com>,
 "linmiaohe@...wei.com" <linmiaohe@...wei.com>,
 "nao.horiguchi@...il.com" <nao.horiguchi@...il.com>,
 "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
 "david@...hat.com" <david@...hat.com>,
 "lorenzo.stoakes@...cle.com" <lorenzo.stoakes@...cle.com>,
 "Liam.Howlett@...cle.com" <Liam.Howlett@...cle.com>,
 "vbabka@...e.cz" <vbabka@...e.cz>, "rppt@...nel.org" <rppt@...nel.org>,
 "surenb@...gle.com" <surenb@...gle.com>, "mhocko@...e.com"
 <mhocko@...e.com>, "bp@...en8.de" <bp@...en8.de>,
 "rafael@...nel.org" <rafael@...nel.org>,
 "guohanjun@...wei.com" <guohanjun@...wei.com>,
 "mchehab@...nel.org" <mchehab@...nel.org>, "lenb@...nel.org"
 <lenb@...nel.org>, "Tian, Kevin" <kevin.tian@...el.com>,
 "alex@...zbot.org" <alex@...zbot.org>
Cc: "cjia@...dia.com" <cjia@...dia.com>,
 "kwankhede@...dia.com" <kwankhede@...dia.com>,
 "targupta@...dia.com" <targupta@...dia.com>,
 "zhiw@...dia.com" <zhiw@...dia.com>, "dnigam@...dia.com"
 <dnigam@...dia.com>, "kjaju@...dia.com" <kjaju@...dia.com>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 "linux-mm@...ck.org" <linux-mm@...ck.org>,
 "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
 "Jonathan.Cameron@...wei.com" <Jonathan.Cameron@...wei.com>,
 "Smita.KoralahalliChannabasappa@....com"
 <Smita.KoralahalliChannabasappa@....com>,
 "u.kleine-koenig@...libre.com" <u.kleine-koenig@...libre.com>,
 "peterz@...radead.org" <peterz@...radead.org>,
 "linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
 "kvm@...r.kernel.org" <kvm@...r.kernel.org>
Subject: Re: [PATCH v3 2/3] mm: Change ghes code to allow poison of non-struct
 pfn



On 2025/10/22 23:03, Ira Weiny wrote:
> Shuai Xue wrote:
>>
>>
>> On 2025/10/22 01:19, Luck, Tony wrote:
>>>>>       pfn = PHYS_PFN(physical_addr);
>>>>> -   if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
>>>>
>>>> Tony,
>>>>
>>>> I'm not an SGX expert but does this break SGX by removing
>>>> arch_is_platform_page()?
>>>>
>>>> See:
>>>>
>>>> 40e0e7843e23 ("x86/sgx: Add infrastructure to identify SGX EPC pages")
>>>> Cc: Tony Luck <tony.luck@...el.com>
>>>>
>>> Ira,
>>>
>>> I think this deletion makes the GHES code always call memory_failure()
>>> instead of bailing out here on "bad" page frame numbers.
>>>
>>> That centralizes the checks for different types of memory into
>>> memory_failure().
>>>
>>> -Tony
>>
>> Hi, Tony, Ankit and Ira,
>>
>> Finally, we're seeing other use cases that need to handle errors for
>> non-struct page PFNs :)
>>
>> IMHO, non-struct page PFNs are common in production environments.
>> Besides NVIDIA Grace GPU device memory, we also use reserved DRAM memory
>> managed by a separate VMEM allocator.
> 
> Can you elaborate on this more?

We reserve a significant portion of DRAM memory at boot time using
kernel command line parameters. This reserved memory is then managed by
our internal VMEM allocator, which handles memory allocation and
deallocation for virtual machines.

To minimize memory overhead, we intentionally avoid creating struct
pages for this reserved memory region. Instead, we've implemented the
following approach:

- Our VMEM allocator directly manages the physical memory without the
   overhead of struct page metadata.
- Error Handling: We register custom RAS operations (ras_ops) with the
   memory failure infrastructure. When poisoned memory is accessed within
   this region, our registered handler:
     - Tags the affected memory area as poisoned
     - Isolates the memory to prevent further access
     - Terminates any tasks that were using the poisoned memory
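
For illustration only, the handler steps listed above can be sketched in
user-space C. Every name here (vmem_region, vmem_ras_ops,
vmem_memory_failure, and so on) is hypothetical; this is neither the
actual VMEM allocator nor the kernel's memory_failure() interface, and
task termination is reduced to a counter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VMEM_MAX_POISON 16

/* Hypothetical reserved region with no struct page backing: [start_pfn, end_pfn). */
struct vmem_region {
    uint64_t start_pfn;
    uint64_t end_pfn;
    uint64_t poisoned[VMEM_MAX_POISON]; /* PFNs tagged as poisoned */
    size_t nr_poisoned;
    int tasks_killed;                   /* stand-in for terminating users */
};

/* Hypothetical callback table, loosely modelled on the "ras_ops" idea. */
struct vmem_ras_ops {
    int (*handle)(struct vmem_region *r, uint64_t pfn);
};

static bool vmem_is_poisoned(const struct vmem_region *r, uint64_t pfn)
{
    for (size_t i = 0; i < r->nr_poisoned; i++)
        if (r->poisoned[i] == pfn)
            return true;
    return false;
}

/* Tag the PFN as poisoned and account for the tasks that used it. */
static int vmem_handle_poison(struct vmem_region *r, uint64_t pfn)
{
    if (vmem_is_poisoned(r, pfn))
        return 0;                       /* already handled */
    if (r->nr_poisoned == VMEM_MAX_POISON)
        return -1;                      /* no room left to record it */
    r->poisoned[r->nr_poisoned++] = pfn;
    r->tasks_killed++;
    return 0;
}

static const struct vmem_ras_ops vmem_ops = { .handle = vmem_handle_poison };

/* Entry point: returns -1 when the PFN is outside the reserved region. */
static int vmem_memory_failure(struct vmem_region *r,
                               const struct vmem_ras_ops *ops, uint64_t pfn)
{
    if (pfn < r->start_pfn || pfn >= r->end_pfn)
        return -1;
    return ops->handle(r, pfn);
}

/* Isolation: the allocator refuses to hand out a poisoned PFN. */
static bool vmem_can_allocate(const struct vmem_region *r, uint64_t pfn)
{
    return pfn >= r->start_pfn && pfn < r->end_pfn &&
           !vmem_is_poisoned(r, pfn);
}
```

The point of the sketch is that the poison state lives in the
allocator's own bookkeeping rather than in struct page flags, so the
lookup is by PFN range.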

This approach allows us to handle memory errors effectively while
maintaining minimal memory overhead for large reserved regions. It's
similar in concept to how device memory (like NVIDIA Grace GPU memory
mentioned earlier) needs error handling without struct page backing.

Thanks.
Shuai