Message-ID: <20251029104457.8393B96-hca@linux.ibm.com>
Date: Wed, 29 Oct 2025 11:44:57 +0100
From: Heiko Carstens <hca@...ux.ibm.com>
To: David Hildenbrand <david@...hat.com>
Cc: Luiz Capitulino <luizcap@...hat.com>, borntraeger@...ux.ibm.com,
        joao.m.martins@...cle.com, mike.kravetz@...cle.com,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        linux-s390@...r.kernel.org, gor@...ux.ibm.com,
        gerald.schaefer@...ux.ibm.com, agordeev@...ux.ibm.com,
        osalvador@...e.de, akpm@...ux-foundation.org, aneesh.kumar@...nel.org
Subject: Re: [PATCH v2] s390: fix HugeTLB vmemmap optimization crash

On Wed, Oct 29, 2025 at 10:57:15AM +0100, David Hildenbrand wrote:
> On 28.10.25 22:15, Luiz Capitulino wrote:
> > A reproducible crash occurs when enabling HugeTLB vmemmap optimization (HVO)
> > on s390. The crash and the proposed fix were worked on an s390 KVM guest
> > running on an older hypervisor, as I don't have access to an LPAR. However,
> > the same issue should occur on bare-metal.
...
> > This commit fixes this by implementing flush_tlb_all() on s390 as an
> > alias to __tlb_flush_global(). This should cause a flush on all TLB
> > entries on all CPUs as expected by the flush_tlb_all() semantics.
> > 
> > Fixes: f13b83fdd996 ("hugetlb: batch TLB flushes when freeing vmemmap")
> > Signed-off-by: Luiz Capitulino <luizcap@...hat.com>
> > ---
> 
> Nice finding!
> 
> Makes me wonder whether the default flush_tlb_all() should actually map to a
> BUILD_BUG(), such that we don't silently not-flush on archs that don't
> implement it.

Which default flush_tlb_all()? :)

There was a no-op implementation for s390, and besides drivers/xen/balloon.c
there is only mm/hugetlb_vmemmap.c in common code which makes use of this. To
me it looks like both call sites only need to flush TLB entries of the kernel
address space. So I would rather see flush_tlb_all() die instead.

But I'm also wondering about the correctness of the whole thing even with this
patch. If I'm not mistaken, vmemmap_split_pmd() changes an active pmd entry of
the kernel mapping. That is: an active leaf entry (aka large page) is changed
to an active entry pointing to a page table.

Changing active entries without a detour over an invalid entry, or without
using proper instructions like crdte or cspg, is not allowed on s390. Other
code that changes active entries of the kernel mapping already does so in an
architecture-compliant way on s390 (see arch/s390/mm/pageattr.c).
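
To illustrate the ordering concern, here is a toy user-space model of a single
pmd entry and one cached translation. It is only a sketch under simplifying
assumptions: it is not kernel code, it models neither crdte/cspg nor real s390
DAT semantics, and every name in it (toy_pmd, toy_tlb, toy_split_direct,
toy_split_via_invalid, ...) is hypothetical. It only contrasts overwriting a
valid entry with another valid entry against the detour over an invalid entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: hypothetical names, no relation to the real s390 code. */
enum entry_kind { ENTRY_INVALID, ENTRY_LARGE_PAGE, ENTRY_PAGE_TABLE };

struct toy_pmd {
	enum entry_kind kind;
	uint64_t target;	/* large-page frame or page-table address */
};

struct toy_tlb {
	bool cached;		/* is a translation for this entry cached? */
	struct toy_pmd copy;	/* what the "hardware" believes the entry is */
};

/* Model a memory access: a valid entry gets cached if nothing is cached yet. */
static void toy_access(struct toy_tlb *tlb, const struct toy_pmd *pmd)
{
	if (!tlb->cached && pmd->kind != ENTRY_INVALID) {
		tlb->copy = *pmd;
		tlb->cached = true;
	}
}

/* Model a flush: drop the cached translation. */
static void toy_flush(struct toy_tlb *tlb)
{
	tlb->cached = false;
}

/* True if a cached translation no longer matches the table entry. */
static bool toy_tlb_is_stale(const struct toy_tlb *tlb,
			     const struct toy_pmd *pmd)
{
	return tlb->cached && (tlb->copy.kind != pmd->kind ||
			       tlb->copy.target != pmd->target);
}

/* Valid -> valid overwrite: the cached large-page translation survives. */
static void toy_split_direct(struct toy_pmd *pmd, uint64_t table)
{
	pmd->kind = ENTRY_PAGE_TABLE;
	pmd->target = table;
}

/* Valid -> invalid -> flush -> valid: nothing stale can remain cached. */
static void toy_split_via_invalid(struct toy_pmd *pmd, struct toy_tlb *tlb,
				  uint64_t table)
{
	assert(pmd->kind == ENTRY_LARGE_PAGE);
	pmd->kind = ENTRY_INVALID;
	pmd->target = 0;
	toy_flush(tlb);
	pmd->kind = ENTRY_PAGE_TABLE;
	pmd->target = table;
}
```

In this model, toy_split_direct() leaves the old large-page translation cached
until some later flush, while toy_split_via_invalid() never has two different
valid views of the same entry. Note that the detour makes the range
temporarily inaccessible, which matters for an active kernel mapping.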

Am I missing something?

Gerald, since you enabled the corresponding Kconfig option for s390: is there
any reason why this should work in an architecture-compliant way?