Message-ID: <20240726040052.hs2gvpktrnlbvhsq@oppo.com>
Date: Fri, 26 Jul 2024 12:00:52 +0800
From: Hailong Liu <hailong.liu@...o.com>
To: Baoquan He <bhe@...hat.com>
CC: Barry Song <21cnbao@...il.com>, Andrew Morton <akpm@...ux-foundation.org>,
Uladzislau Rezki <urezki@...il.com>, Christoph Hellwig <hch@...radead.org>,
Lorenzo Stoakes <lstoakes@...il.com>, Vlastimil Babka <vbabka@...e.cz>,
Michal Hocko <mhocko@...e.com>, Matthew Wilcox <willy@...radead.org>,
Tangquan Zheng <zhengtangquan@...o.com>, <linux-mm@...ck.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH v2] mm/vmalloc: fix incorrect
__vmap_pages_range_noflush() if vm_area_alloc_pages() from high order
fallback to order0
On Fri, 26. Jul 10:31, Baoquan He wrote:
[...]
> > The logic of this patch is somewhat similar to my first one. If the high-order
> > allocation fails, it falls back to normal mapping.
> >
> > However, I also save the fallback position. Pages before this position are
> > used for huge mapping, and pages >= the position for normal mapping, as Barry
> > said: "support the combination of PMD and PTE mapping". This will take some
> > time, as it needs to address the corner cases and do some tests.
>
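A minimal userspace sketch of the split described above, assuming the area start is PMD-aligned and every page before the saved fallback position belongs to a whole order-9 chunk. The range is cut into one part mapped with the huge page_shift and one part mapped with PAGE_SHIFT. struct map_range and split_at_fallback() are hypothetical names for illustration only, not existing mm/vmalloc.c code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one mapping call over part of pages[]. */
struct map_range {
	size_t first_page;  /* index of the first page in pages[] */
	size_t nr_pages;    /* number of base pages in this range */
	unsigned int shift; /* page_shift to map this range with */
};

/*
 * Hypothetical helper: split nr_pages at fallback_pos.
 * [0, fallback_pos) keeps huge_shift, [fallback_pos, nr_pages) uses
 * page_shift. Writes 0..2 ranges to out[] and returns how many.
 */
static size_t split_at_fallback(size_t nr_pages, size_t fallback_pos,
				unsigned int huge_shift, unsigned int page_shift,
				struct map_range out[2])
{
	const size_t chunk_pages = (size_t)1 << (huge_shift - page_shift);
	size_t n = 0;

	/* The huge part must consist of whole high-order chunks. */
	assert(fallback_pos <= nr_pages);
	assert(fallback_pos % chunk_pages == 0);

	if (fallback_pos > 0)
		out[n++] = (struct map_range){ 0, fallback_pos, huge_shift };
	if (fallback_pos < nr_pages)
		out[n++] = (struct map_range){ fallback_pos,
					       nr_pages - fallback_pos,
					       page_shift };
	return n;
}
```

For a 4M area of 1024 base pages whose fallback happened after the first order-9 chunk (position 512), this yields a 512-page range with shift 21 and a 512-page range with shift 12; each range would then be passed to the matching mapping path.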
> Hmm, we may not need to worry about the imperfect mapping. Currently
> there are two places setting VM_ALLOW_HUGE_VMAP: __kvmalloc_node_noprof()
> and vmalloc_huge().
>
> vmalloc_huge() is called from the three interfaces below, which are all
> invoked during boot, so they can basically always get the required contiguous
> physical memory. I guess that's why Tangquan only spotted this issue on kvmalloc
> invocations where the required size exceeds e.g. 2M. For kvmalloc_node(),
> the code comment above __kvmalloc_node_noprof() already states that
> it is best-effort behaviour.
>
Take __vmalloc_node_range(2.1M, VM_ALLOW_HUGE_VMAP) as an example.
Because of the huge mapping alignment requirement, the real size is 4M.
If the first order-9 allocation succeeds and the next one fails, then
because of the fallback the page layout would be: order9 - 512 * order0.
The order-9 part supports huge mapping, but the order-0 pages do not.
With the patch above, vmap_small_pages_range_noflush() would be called to do
a normal mapping of the whole area, so even the order-9 part would not get a
huge mapping.
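The size and layout arithmetic in this example can be checked with a small userspace model. It assumes 4K pages and a 2M PMD (as on x86-64), rounds the size up to a PMD_SIZE multiple as described above, and lets only the first huge_ok order-9 attempts succeed before filling the rest with order-0 pages. The SKETCH_* macros, struct layout, model_alloc() and whole_area_pmd_mappable() are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed geometry (x86-64-like): 4K base pages, 2M PMD mappings. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PMD_SHIFT  21
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_PMD_SIZE   (1UL << SKETCH_PMD_SHIFT)
#define SKETCH_HUGE_ORDER (SKETCH_PMD_SHIFT - SKETCH_PAGE_SHIFT) /* 9 */

/* Hypothetical summary of what the page allocator handed back. */
struct layout {
	unsigned long total_pages;  /* base pages backing the rounded-up area */
	unsigned long huge_chunks;  /* order-9 chunks actually obtained */
	unsigned long order0_pages; /* base pages filled by the order-0 fallback */
};

/* Round size up to a multiple of align; align must be a power of two. */
static unsigned long align_up(unsigned long size, unsigned long align)
{
	return (size + align - 1) & ~(align - 1);
}

/*
 * Model: the first huge_ok order-9 attempts succeed, the next one fails,
 * and the rest of the area is filled with order-0 pages.
 */
static struct layout model_alloc(unsigned long size, unsigned long huge_ok)
{
	const unsigned long chunk_pages = 1UL << SKETCH_HUGE_ORDER; /* 512 */
	struct layout l;

	l.total_pages = align_up(size, SKETCH_PMD_SIZE) / SKETCH_PAGE_SIZE;
	l.huge_chunks = l.total_pages / chunk_pages;
	if (huge_ok < l.huge_chunks)
		l.huge_chunks = huge_ok;
	l.order0_pages = l.total_pages - l.huge_chunks * chunk_pages;
	assert(l.huge_chunks * chunk_pages + l.order0_pages == l.total_pages);
	return l;
}

/* A single PMD-sized page_shift for the whole area is only valid with no fallback. */
static bool whole_area_pmd_mappable(const struct layout *l)
{
	return l->order0_pages == 0;
}
```

For roughly 2.1M (modelled here as 2M + 100K) with one successful order-9 attempt, model_alloc() reports 1024 pages: one order-9 chunk plus 512 order-0 pages, so mapping the whole area with a single PMD page_shift would be wrong, while mapping all of it with PAGE_SHIFT gives up the huge mapping the first chunk could have had.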
> mm/mm_init.c <<alloc_large_system_hash>>
> table = vmalloc_huge(size, gfp_flags);
> net/ipv4/inet_hashtables.c <<inet_pernet_hashinfo_alloc>>
> new_hashinfo->ehash = vmalloc_huge(ehash_entries * sizeof(struct inet_ehash_bucket),
> net/ipv4/udp.c <<udp_pernet_table_alloc>>
> udptable->hash = vmalloc_huge(hash_entries * 2 * sizeof(struct udp_hslot)
>
> Maybe we should add a code comment or documentation to notify people that
> contiguous physical pages are not guaranteed for vmalloc_huge() if it is
> used after boot.
>
> >
> > IMO, the draft can fix the current issue, and it does not have significant side
> > effects. Barry, what do you think about this patch? If you think it's okay,
> > I will split this patch into two: one to remove VM_ALLOW_HUGE_VMAP and the
> > other to address the current mapping issue.
> >
> > --
> > help you, help me,
> > Hailong.
> >
>
>
--
help you, help me,
Hailong.