Message-ID: <aNMhiZ4FiEE1Rk_T@casper.infradead.org>
Date: Tue, 23 Sep 2025 23:39:05 +0100
From: Matthew Wilcox <willy@...radead.org>
To: Yin Tirui <yintirui@...wei.com>
Cc: akpm@...ux-foundation.org, david@...hat.com, lorenzo.stoakes@...cle.com,
Liam.Howlett@...cle.com, vbabka@...e.cz, rppt@...nel.org,
surenb@...gle.com, mhocko@...e.com, ziy@...dia.com,
baolin.wang@...ux.alibaba.com, npache@...hat.com,
ryan.roberts@....com, dev.jain@....com, baohua@...nel.org,
catalin.marinas@....com, will@...nel.org, paul.walmsley@...ive.com,
palmer@...belt.com, aou@...s.berkeley.edu, alex@...ti.fr,
anshuman.khandual@....com, yangyicong@...ilicon.com,
ardb@...nel.org, apopple@...dia.com, samuel.holland@...ive.com,
luxu.kernel@...edance.com, abrestic@...osinc.com,
yongxuan.wang@...ive.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-riscv@...ts.infradead.org, wangkefeng.wang@...wei.com,
chenjun102@...wei.com
Subject: Re: [PATCH RFC 2/2] mm: add PMD-level huge page support for
remap_pfn_range()
On Tue, Sep 23, 2025 at 09:31:04PM +0800, Yin Tirui wrote:
> + entry = pte_clrhuge(pfn_pte(pmd_pfn(old_pmd), pmd_pgprot(old_pmd)));
This doesn't make sense. And I'm not saying you got this wrong; I
suspect in terms of how things work today it's actually necessary.
But the way we handle this stuff is so insane.
pte_clrhuge() should not exist. If we have a PTE, it can't have the
huge bit set, by definition (don't anybody mention hugetlbfs because
that is an entirely separate pile of broken horrors). I understand what
you're trying to do here. You want to construct a PTE that points to
the same address as the first page of the PMD and has the same
permissions. But that *should* be written as:
entry = pfn_pte(pmd_pfn(old_pmd), pmd_pgprot(old_pmd));
right? Now, pmd_pgprot() might or might not want to return the huge bit
set. I'm not sure. Perhaps you could have a look through and figure it
out. But pfn_pte() should never return a PTE with the huge bit set.
So if it is set in the pgprot on entry, it should filter it out.
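To make the proposed behaviour concrete, here is a toy user-space model,
not kernel code. Every toy_* name, the 12-bit protection mask and the
position of the huge bit are assumptions made up for this illustration;
real architectures lay these bits out differently. It only shows the
idea that a pfn_pte()-style constructor could mask the huge bit itself,
so a caller splitting a PMD would not need pte_clrhuge():

```c
#include <stdint.h>

/* Toy model only: all types, names and bit positions are hypothetical. */
typedef struct { uint64_t val; } toy_pte_t;
typedef struct { uint64_t val; } toy_pmd_t;
typedef struct { uint64_t bits; } toy_pgprot_t;

#define TOY_PAGE_SHIFT 12
#define TOY_PROT_MASK  ((UINT64_C(1) << TOY_PAGE_SHIFT) - 1)
#define TOY_PRESENT    (UINT64_C(1) << 0)
#define TOY_WRITE      (UINT64_C(1) << 1)
#define TOY_HUGE       (UINT64_C(1) << 7)   /* arbitrary position for the sketch */

/* Frame number stored in the toy PMD. */
static uint64_t toy_pmd_pfn(toy_pmd_t pmd)
{
	return pmd.val >> TOY_PAGE_SHIFT;
}

/* Protection bits of the toy PMD; may still include TOY_HUGE. */
static toy_pgprot_t toy_pmd_pgprot(toy_pmd_t pmd)
{
	return (toy_pgprot_t){ pmd.val & TOY_PROT_MASK };
}

/* Build a PTE; the huge bit is filtered out here, never by the caller. */
static toy_pte_t toy_pfn_pte(uint64_t pfn, toy_pgprot_t prot)
{
	uint64_t bits = prot.bits & TOY_PROT_MASK & ~TOY_HUGE;

	return (toy_pte_t){ (pfn << TOY_PAGE_SHIFT) | bits };
}

/* PTE for the first page covered by old_pmd, with the same permissions. */
static toy_pte_t toy_first_pte_of_pmd(toy_pmd_t old_pmd)
{
	return toy_pfn_pte(toy_pmd_pfn(old_pmd), toy_pmd_pgprot(old_pmd));
}

/* Non-zero if the toy PTE has the huge bit set. */
static int toy_pte_huge(toy_pte_t pte)
{
	return (pte.val & TOY_HUGE) != 0;
}
```

In this model, a PMD with the present, write and huge bits set yields a
PTE with the same frame number and the present and write bits, but
without the huge bit. Whether the filtering belongs in pfn_pte() or in
pmd_pgprot() is the open question raised above; the sketch picks
pfn_pte() only to match the suggestion.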
There are going to be consequences to this. Maybe there's code
somewhere that relies on pfn_pte() returning a PTE with the huge bit
set. Perhaps it's hugetlbfs.
But we have to start cleaning this garbage up. I did some work with
e3981db444a0 and the commits leading up to that. See
https://lkml.kernel.org/r/20250402181709.2386022-12-willy@infradead.org
I'd like pte_clrhuge() to be deleted from x86, not added to arm and
riscv.