Message-ID: <20251021223850.GA21107@nvidia.com>
Date: Tue, 21 Oct 2025 19:38:50 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>, ankita@...dia.com,
aniketa@...dia.com, vsethi@...dia.com, mochs@...dia.com,
skolothumtho@...dia.com, linmiaohe@...wei.com,
nao.horiguchi@...il.com, akpm@...ux-foundation.org,
david@...hat.com, lorenzo.stoakes@...cle.com, vbabka@...e.cz,
rppt@...nel.org, surenb@...gle.com, mhocko@...e.com,
tony.luck@...el.com, bp@...en8.de, rafael@...nel.org,
guohanjun@...wei.com, mchehab@...nel.org, lenb@...nel.org,
kevin.tian@...el.com, alex@...zbot.org, cjia@...dia.com,
kwankhede@...dia.com, targupta@...dia.com, zhiw@...dia.com,
dnigam@...dia.com, kjaju@...dia.com, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, linux-edac@...r.kernel.org,
Jonathan.Cameron@...wei.com, ira.weiny@...el.com,
Smita.KoralahalliChannabasappa@....com,
u.kleine-koenig@...libre.com, peterz@...radead.org,
linux-acpi@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH v3 0/3] mm: Implement ECC handling for pfn with no struct page

On Tue, Oct 21, 2025 at 02:54:10PM -0400, Liam R. Howlett wrote:
> > > Surely it's not failing hardware that may cause performance impacts, so
> > > is this triggered in some other way that I'm missing or a conversation
> > > pointer?
> >
> > It is the splitting of a pgd/pmd level into PTEs that gets mirrored
> > into the S2 and then greatly increases the cost of table walks inside
> > a guest. The HW caches are sized for 1G S2 PTEs, not 4k.
>
> Ah, I see. Seems like a worthy addition to the commit message? I mean,
> this is really a choice of throwing away memory for the benefit of tlb
> performance. Seems like a valid choice in your usecase but less so for
> the average laptop.

No memory is being thrown away. The choice is whether the kernel will
protect itself from the load generated by userspace issuing repeated
reads to bad memory.

Ankit, please include some of these details in the commit message.

> Won't leaving the poisoned memory mapped cause migration issues? Even
> if the machine is migrated, my understanding is the poison follows
> through checkpoint restore.

The VMM has to keep track of this and not try to read the bad memory
during migration.

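As a rough illustration only (this is not from the patch series, and
every name below is hypothetical), the bookkeeping could look something
like the sketch that follows. It assumes the VMM keeps a set of poisoned
page frame numbers, and it assumes the destination is sent a marker for
each bad page instead of its contents. A real VMM would do this in its
own migration code and language.

```python
from typing import Callable, Set


def migrate(
    num_pages: int,
    poisoned: Set[int],
    read_page: Callable[[int], bytes],
    send_page: Callable[[int, bytes], None],
    send_poison: Callable[[int], None],
) -> None:
    """Copy guest pages to the destination without reading bad memory.

    All parameters are hypothetical stand-ins for whatever the VMM
    actually uses to read guest memory and talk to the destination.
    """
    for pfn in range(num_pages):
        if pfn in poisoned:
            # Never read a known-bad page; only forward the fact that
            # it is poisoned (assumed behaviour, see the text above).
            send_poison(pfn)
            continue
        send_page(pfn, read_page(pfn))
```

The point of the sketch is only that the poisoned check happens before
any read of the page, so the migration path itself never triggers
another memory error.
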
Jason