Message-ID: <9a8254b9-92f8-4530-88e8-fca3b7465908@redhat.com>
Date: Wed, 29 Oct 2025 13:15:44 +0100
From: David Hildenbrand <david@...hat.com>
To: Heiko Carstens <hca@...ux.ibm.com>
Cc: Luiz Capitulino <luizcap@...hat.com>, borntraeger@...ux.ibm.com,
 joao.m.martins@...cle.com, mike.kravetz@...cle.com,
 linux-kernel@...r.kernel.org, linux-mm@...ck.org,
 linux-s390@...r.kernel.org, gor@...ux.ibm.com,
 gerald.schaefer@...ux.ibm.com, agordeev@...ux.ibm.com, osalvador@...e.de,
 akpm@...ux-foundation.org, aneesh.kumar@...nel.org
Subject: Re: [PATCH v2] s390: fix HugeTLB vmemmap optimization crash

On 29.10.25 11:44, Heiko Carstens wrote:
> On Wed, Oct 29, 2025 at 10:57:15AM +0100, David Hildenbrand wrote:
>> On 28.10.25 22:15, Luiz Capitulino wrote:
>>> A reproducible crash occurs when enabling HugeTLB vmemmap optimization (HVO)
>>> on s390. The crash and the proposed fix were worked on an s390 KVM guest
>>> running on an older hypervisor, as I don't have access to an LPAR. However,
>>> the same issue should occur on bare-metal.
> ...
>>> This commit fixes this by implementing flush_tlb_all() on s390 as an
>>> alias to __tlb_flush_global(). This should cause a flush on all TLB
>>> entries on all CPUs as expected by the flush_tlb_all() semantics.
>>>
>>> Fixes: f13b83fdd996 ("hugetlb: batch TLB flushes when freeing vmemmap")
>>> Signed-off-by: Luiz Capitulino <luizcap@...hat.com>
>>> ---
>>
>> Nice finding!
>>
>> Makes me wonder whether the default flush_tlb_all() should actually map to a
>> BUILD_BUG(), such that we don't silently not-flush on archs that don't
>> implement it.
> 
> Which default flush_tlb_all()? :)

What I meant is: all such functions that an architecture doesn't expect
to be called because they are effectively unimplemented.

Taking a look at flush_tlb_all(), there is really only a dummy
implementation on s390x and on riscv without MMU.

So yeah, there is no "default" fallback one :)

BTW, I'm staring at s390x's flush_tlb() function and wonder why that one
is defined. I'm sure there is a good reason ;)

> 
> There was a no-op implementation for s390, and besides drivers/xen/balloon.c
> there is only mm/hugetlb_vmemmap.c in common code which makes use of this. To
> me it looks like both call sites only need to flush TLB entries of the kernel
> address space. So I'd rather prefer if flush_tlb_all() would die instead.

I'd assume that we only modify the kernel virtual address space, so I agree.

> 
> But I'm also wondering about the correctness of the whole thing even with this
> patch. If I'm not mistaken then vmemmap_split_pmd() changes an active pmd
> entry of the kernel mapping. That is: an active leaf entry (aka large page) is
> changed to an active entry pointing to a page table.

That's my understanding as well.

> 
> Changing active entries without the detour over an invalid entry or using
> proper instructions like crdte or cspg is not allowed on s390. This was solved
> for other parts that change active entries of the kernel mapping in an
> architecture compliant way for s390 (see arch/s390/mm/pageattr.c).

Good point. I recall arm64 has similar break-before-make requirements
because it cannot tolerate two different TLB entries (small vs. large)
for the same virtual address.

And if I remember correctly, that's the reason why arm64 does not enable
ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP just yet.
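
To illustrate the ordering that "the detour over an invalid entry" (break-before-make) implies, here is a minimal user-space sketch. It is a toy model, not kernel code: the bit definitions, toy_flush_tlb() and toy_replace_entry_bbm() are all hypothetical names made up for this example, and the "flush" is only a counter. It does not show how vmemmap_split_pmd() or arch/s390/mm/pageattr.c actually work.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model, not kernel code: bits of one "table entry". */
#define ENTRY_INVALID 0x1u	/* made-up invalid bit */
#define ENTRY_LARGE   0x2u	/* made-up "leaf maps a large page" bit */

static unsigned int nr_flushes;	/* counts simulated TLB flushes */

/* Stand-in for an architecture-specific TLB flush (assumption). */
static void toy_flush_tlb(void)
{
	nr_flushes++;
}

/*
 * Replace a live entry via break-before-make:
 * 1. mark the old entry invalid, 2. flush, 3. install the new entry.
 * At no point are the old and the new valid entry visible together.
 */
static void toy_replace_entry_bbm(volatile uint64_t *entry, uint64_t new_val)
{
	*entry |= ENTRY_INVALID;	/* break: old translation no longer usable */
	toy_flush_tlb();		/* drop cached copies of the old entry */
	*entry = new_val;		/* make: install the replacement entry */
}
```

In this model, going from a large-page leaf to an entry pointing at a page table would be toy_replace_entry_bbm(&entry, new_value). The cost of the approach is the window in which the address is not mapped at all, which is presumably why the thread also mentions dedicated instructions such as crdte or cspg as an alternative on s390.
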
-- 
Cheers
David / dhildenb