[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1daeaf4e-5477-40cb-bca0-e4cd5ad8a224@samsung.com>
Date: Wed, 11 Jun 2025 10:03:17 +0200
From: Marek Szyprowski <m.szyprowski@...sung.com>
To: Alistair Popple <apopple@...dia.com>
Cc: linux-mm@...ck.org, akpm@...ux-foundation.org,
"linux-riscv@...ts.infradead.org" <linux-riscv@...ts.infradead.org>, David
Hildenbrand <david@...hat.com>, Christoph Hellwig <hch@....de>, Jason
Gunthorpe <jgg@...dia.com>, gerald.schaefer@...ux.ibm.com,
dan.j.williams@...el.com, jgg@...pe.ca, willy@...radead.org,
linux-kernel@...r.kernel.org, nvdimm@...ts.linux.dev, jhubbard@...dia.com,
zhang.lyra@...il.com, debug@...osinc.com, bjorn@...nel.org,
balbirs@...dia.com, lorenzo.stoakes@...cle.com, John@...ves.net
Subject: Re: [PATCH] mm: Remove PFN_MAP, PFN_SPECIAL, PFN_SG_CHAIN and
PFN_SG_LAST
On 11.06.2025 04:38, Alistair Popple wrote:
> On Tue, Jun 10, 2025 at 06:18:09PM +0200, Marek Szyprowski wrote:
>> On 04.06.2025 05:21, Alistair Popple wrote:
>>> The PFN_MAP flag is no longer used for anything, so remove it.
>>> The PFN_SG_CHAIN and PFN_SG_LAST flags never appear to have been
>>> used so also remove them. The last user of PFN_SPECIAL was removed
>>> by 653d7825c149 ("dcssblk: mark DAX broken, remove FS_DAX_LIMITED
>>> support").
>>>
>>> Signed-off-by: Alistair Popple <apopple@...dia.com>
>>> Acked-by: David Hildenbrand <david@...hat.com>
>>> Reviewed-by: Christoph Hellwig <hch@....de>
>>> Reviewed-by: Jason Gunthorpe <jgg@...dia.com>
>>> Cc: gerald.schaefer@...ux.ibm.com
>>> Cc: dan.j.williams@...el.com
>>> Cc: jgg@...pe.ca
>>> Cc: willy@...radead.org
>>> Cc: david@...hat.com
>>> Cc: linux-kernel@...r.kernel.org
>>> Cc: nvdimm@...ts.linux.dev
>>> Cc: jhubbard@...dia.com
>>> Cc: hch@....de
>>> Cc: zhang.lyra@...il.com
>>> Cc: debug@...osinc.com
>>> Cc: bjorn@...nel.org
>>> Cc: balbirs@...dia.com
>>> Cc: lorenzo.stoakes@...cle.com
>>> Cc: John@...ves.net
>>>
>>> ---
>>>
>>> Splitting this off from the rest of my series[1] as a separate clean-up
>>> for consideration for the v6.16 merge window as suggested by Christoph.
>>>
>>> [1] - https://lore.kernel.org/linux-mm/cover.541c2702181b7461b84f1a6967a3f0e823023fcc.1748500293.git-series.apopple@nvidia.com/
>>> ---
>>> include/linux/pfn_t.h | 31 +++----------------------------
>>> mm/memory.c | 2 --
>>> tools/testing/nvdimm/test/iomap.c | 4 ----
>>> 3 files changed, 3 insertions(+), 34 deletions(-)
>> This patch landed in today's linux-next as commit 28be5676b4a3 ("mm:
>> remove PFN_MAP, PFN_SPECIAL, PFN_SG_CHAIN and PFN_SG_LAST"). In my tests
>> I've noticed that it breaks operation of all RISC-V 64bit boards on my
>> test farm (VisionFive2, BananaPiF3 as well as QEMU's Virt machine). I've
>> isolated the changes responsible for this issue, see the inline comments
>> in the patch below. Here is an example of the issues observed in the
>> logs from those machines:
> Thanks for the report. I'm really confused by this because this change should
> just be removal of dead code - nothing sets any of the removed PFN_* flags
> AFAICT.
>
> I don't have access to any RISC-V hardwdare but you say this reproduces under
> qemu - what do you run on the system to cause the error? Is it just a simple
> boot and load a module or are you running selftests or something else?
It fails a simple boot test. Here is a detailed instruction how to
reproduce this issue with the random Debian rootfs image found on the
internet (tested on Ubuntu 22.04, with next-20250610
kernel source):
cd linux-src
wget
http://vimer.7766.org:63015/images/Unmatched-debian/202503/nvme-rootfs.img.xz
xz -d nvme-rootfs.img.xz
make ARCH=riscv CROSS_COMPILE="riscv64-linux-gnu-" defconfig -j16 Image
qemu-system-riscv64 -kernel arch/riscv/boot/Image -append "console=ttyS0
earlycon no_console_suspend root=/dev/vda1 rw rootwait
ip=::::target::off" -M virt -smp 2 -m 1024 -device
virtio-blk-device,drive=virtio-blk0 -drive
file=nvme-rootfs.img,id=virtio-blk0,if=none,format=raw -netdev
user,id=user -device virtio-net-device,netdev=user -serial mon:stdio
-display none
("ctrl-a x" to exit)
> Also if possible it would be good to know if you still see the issue
> after applying the full series (https://lore.kernel.org/linux-mm/
> cover.541c2702181b7461b84f1a6967a3f0e823023fcc.1748500293.git-series.apopple@nvi
> https://protect2.fireeye.com/v1/url?k=9dffef89-c264d6ec-9dfe64c6-000babff32e3-99412cda4ccd1e25&q=1&e=26e3fd2c-72ca-4455-8caf-3257b47262e5&u=http%3A%2F%2Fdia.com%2F%29.
Well, that whole patchset applied on top of v6.15-rc7 seems to be
working fine. I was not able to rebase it onto v6.16-rc1 or
next-20250610 without conflicts, so I didn't test it further on newer
kernels.
However I've checked status after each patch from that patchset. After
applying the first one (which is $subject), the RISCV issue appears.
Then it gets fixed by the "[PATCH 04/12] mm: Convert vmf_insert_mixed()
from using pte_devmap to pte_special" patch. Applying the "[PATCH 04/12]
mm: Convert vmf_insert_mixed() from using pte_devmap to pte_special"
patch on top of next-20250610 fixes the RISCV issue too.
I hope this analysis somehow helps.
>> BUG: Bad page map in process modprobe pte:20682653 pmd:20f23c01
>> page: refcount:1 mapcount:-1 mapping:0000000000000000 index:0x0 pfn:0x81a09
>> flags: 0x2004(referenced|reserved|zone=0)
>> raw: 0000000000002004 ff1c000000068248 ff1c000000068248 0000000000000000
>> raw: 0000000000000000 0000000000000000 00000001fffffffe 0000000000000000
>> page dumped because: bad pte
>> addr:00007fff84619000 vm_flags:04044411 anon_vma:0000000000000000
>> mapping:0000000000000000 index:0
>> file:(null) fault:special_mapping_fault mmap:0x0 mmap_prepare: 0x0
>> read_folio:0x0
>> CPU: 1 UID: 0 PID: 58 Comm: modprobe Not tainted
>> 6.16.0-rc1-next-20250610+ #15719 NONE
>> Hardware name: riscv-virtio,qemu (DT)
>> Call Trace:
>> [<ffffffff80016152>] dump_backtrace+0x1c/0x24
>> [<ffffffff8000147a>] show_stack+0x28/0x34
>> [<ffffffff8000f61e>] dump_stack_lvl+0x5e/0x86
>> [<ffffffff8000f65a>] dump_stack+0x14/0x1c
>> [<ffffffff80234b7e>] print_bad_pte+0x1b4/0x1ee
>> [<ffffffff8023854a>] unmap_page_range+0x4da/0xf74
>> [<ffffffff80239042>] unmap_single_vma.constprop.0+0x5e/0x90
>> [<ffffffff8023913a>] unmap_vmas+0xc6/0x1c4
>> [<ffffffff80244a70>] exit_mmap+0xb6/0x418
>> [<ffffffff80021dc4>] mmput+0x56/0xf2
>> [<ffffffff8002b84e>] do_exit+0x182/0x80e
>> [<ffffffff8002c02a>] do_group_exit+0x24/0x70
>> [<ffffffff8002c092>] pid_child_should_wake+0x0/0x54
>> [<ffffffff80b66112>] do_trap_ecall_u+0x29c/0x4cc
>> [<ffffffff80b747e6>] handle_exception+0x146/0x152
>> Disabling lock debugging due to kernel taint
>>
>>
>>> diff --git a/include/linux/pfn_t.h b/include/linux/pfn_t.h
>>> index 2d9148221e9a..46afa12eb33b 100644
>>> --- a/include/linux/pfn_t.h
>>> +++ b/include/linux/pfn_t.h
>>> @@ -5,26 +5,13 @@
>>>
>>> /*
>>> * PFN_FLAGS_MASK - mask of all the possible valid pfn_t flags
>>> - * PFN_SG_CHAIN - pfn is a pointer to the next scatterlist entry
>>> - * PFN_SG_LAST - pfn references a page and is the last scatterlist entry
>>> * PFN_DEV - pfn is not covered by system memmap by default
>>> - * PFN_MAP - pfn has a dynamic page mapping established by a device driver
>>> - * PFN_SPECIAL - for CONFIG_FS_DAX_LIMITED builds to allow XIP, but not
>>> - * get_user_pages
>>> */
>>> #define PFN_FLAGS_MASK (((u64) (~PAGE_MASK)) << (BITS_PER_LONG_LONG - PAGE_SHIFT))
>>> -#define PFN_SG_CHAIN (1ULL << (BITS_PER_LONG_LONG - 1))
>>> -#define PFN_SG_LAST (1ULL << (BITS_PER_LONG_LONG - 2))
>>> #define PFN_DEV (1ULL << (BITS_PER_LONG_LONG - 3))
>>> -#define PFN_MAP (1ULL << (BITS_PER_LONG_LONG - 4))
>>> -#define PFN_SPECIAL (1ULL << (BITS_PER_LONG_LONG - 5))
>>>
>>> #define PFN_FLAGS_TRACE \
>>> - { PFN_SPECIAL, "SPECIAL" }, \
>>> - { PFN_SG_CHAIN, "SG_CHAIN" }, \
>>> - { PFN_SG_LAST, "SG_LAST" }, \
>>> - { PFN_DEV, "DEV" }, \
>>> - { PFN_MAP, "MAP" }
>>> + { PFN_DEV, "DEV" }
>>>
>>> static inline pfn_t __pfn_to_pfn_t(unsigned long pfn, u64 flags)
>>> {
>>> @@ -46,7 +33,7 @@ static inline pfn_t phys_to_pfn_t(phys_addr_t addr, u64 flags)
>>>
>>> static inline bool pfn_t_has_page(pfn_t pfn)
>>> {
>>> - return (pfn.val & PFN_MAP) == PFN_MAP || (pfn.val & PFN_DEV) == 0;
>>> + return (pfn.val & PFN_DEV) == 0;
>>> }
>>>
>>> static inline unsigned long pfn_t_to_pfn(pfn_t pfn)
>>> @@ -100,7 +87,7 @@ static inline pud_t pfn_t_pud(pfn_t pfn, pgprot_t pgprot)
>>> #ifdef CONFIG_ARCH_HAS_PTE_DEVMAP
>>> static inline bool pfn_t_devmap(pfn_t pfn)
>>> {
>>> - const u64 flags = PFN_DEV|PFN_MAP;
>>> + const u64 flags = PFN_DEV;
>>>
>>> return (pfn.val & flags) == flags;
>>> }
>> The above change causes the stability issues on RISC-V based boards. To
>> get them working again with today's linux-next I had to apply the
>> following change:
>>
>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>> index 6ff7dd305639..f502860f2a76 100644
>> --- a/arch/riscv/Kconfig
>> +++ b/arch/riscv/Kconfig
>> @@ -46,7 +46,6 @@ config RISCV
>> select ARCH_HAS_PREEMPT_LAZY
>> select ARCH_HAS_PREPARE_SYNC_CORE_CMD
>> select ARCH_HAS_PTDUMP if MMU
>> - select ARCH_HAS_PTE_DEVMAP if 64BIT && MMU
>> select ARCH_HAS_PTE_SPECIAL
>> select ARCH_HAS_SET_DIRECT_MAP if MMU
>> select ARCH_HAS_SET_MEMORY if MMU
>>
>> I'm not sure if this is really the desired solution and frankly speaking
>> I don't understand the code behind the 'devmap' related bits. I can help
>> testing other patches that will fix this issue properly.
>>
>>
>>> @@ -116,16 +103,4 @@ pmd_t pmd_mkdevmap(pmd_t pmd);
>>> pud_t pud_mkdevmap(pud_t pud);
>>> #endif
>>> #endif /* CONFIG_ARCH_HAS_PTE_DEVMAP */
>>> -
>>> -#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
>>> -static inline bool pfn_t_special(pfn_t pfn)
>>> -{
>>> - return (pfn.val & PFN_SPECIAL) == PFN_SPECIAL;
>>> -}
>>> -#else
>>> -static inline bool pfn_t_special(pfn_t pfn)
>>> -{
>>> - return false;
>>> -}
>>> -#endif /* CONFIG_ARCH_HAS_PTE_SPECIAL */
>>> #endif /* _LINUX_PFN_T_H_ */
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index 49199410805c..cc85f814bc1c 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -2569,8 +2569,6 @@ static bool vm_mixed_ok(struct vm_area_struct *vma, pfn_t pfn, bool mkwrite)
>>> return true;
>>> if (pfn_t_devmap(pfn))
>>> return true;
>>> - if (pfn_t_special(pfn))
>>> - return true;
>>> if (is_zero_pfn(pfn_t_to_pfn(pfn)))
>>> return true;
>>> return false;
>>> diff --git a/tools/testing/nvdimm/test/iomap.c b/tools/testing/nvdimm/test/iomap.c
>>> index e4313726fae3..ddceb04b4a9a 100644
>>> --- a/tools/testing/nvdimm/test/iomap.c
>>> +++ b/tools/testing/nvdimm/test/iomap.c
>>> @@ -137,10 +137,6 @@ EXPORT_SYMBOL_GPL(__wrap_devm_memremap_pages);
>>>
>>> pfn_t __wrap_phys_to_pfn_t(phys_addr_t addr, unsigned long flags)
>>> {
>>> - struct nfit_test_resource *nfit_res = get_nfit_res(addr);
>>> -
>>> - if (nfit_res)
>>> - flags &= ~PFN_MAP;
>>> return phys_to_pfn_t(addr, flags);
>>> }
>>> EXPORT_SYMBOL(__wrap_phys_to_pfn_t);
>>>
>>>
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
Powered by blists - more mailing lists