Message-ID: <CAGsJ_4yRJFEUrqxO0guv2kVUgj4aDSzvjkUimg9Bkn5ADNWJNg@mail.gmail.com>
Date: Thu, 25 Jul 2024 18:22:08 +0800
From: Barry Song <21cnbao@...il.com>
To: Hailong Liu <hailong.liu@...o.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Uladzislau Rezki <urezki@...il.com>, 
	Christoph Hellwig <hch@...radead.org>, Lorenzo Stoakes <lstoakes@...il.com>, Vlastimil Babka <vbabka@...e.cz>, 
	Michal Hocko <mhocko@...e.com>, Baoquan He <bhe@...hat.com>, Matthew Wilcox <willy@...radead.org>, 
	"Tangquan . Zheng" <zhengtangquan@...o.com>, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v2] mm/vmalloc: fix incorrect __vmap_pages_range_noflush()
 if vm_area_alloc_pages() from high order fallback to order0

On Thu, Jul 25, 2024 at 5:58 PM Hailong Liu <hailong.liu@...o.com> wrote:
>
> On Thu, 25. Jul 21:34, Barry Song wrote:
> > On Thu, Jul 25, 2024 at 9:17 PM Hailong Liu <hailong.liu@...o.com> wrote:
> > >
> > > On Thu, 25. Jul 18:21, Barry Song wrote:
> > > > On Thu, Jul 25, 2024 at 3:53 PM <hailong.liu@...o.com> wrote:
> > > [snip]
> > > >
> > > > This is still incorrect because it undoes Michal's work. We also need to break
> > > > the loop if (!nofail), which you're currently omitting.
> > >
> > > IIUC, the original issue was kvcalloc with __GFP_NOFAIL returning NULL.
> > > https://lore.kernel.org/all/ZAXynvdNqcI0f6Us@dhcp22.suse.cz/T/#u
> > > If we disable the huge flag in kmalloc_node, the issue will be fixed.
> >
> > No, this just bypasses kvmalloc and doesn't solve the underlying issue. Problems
> > can still be triggered by vmalloc_huge() even after the bypass. Once we
> > reorganize vmap_huge to support the combination of PMD and PTE
> > mapping, we should re-enable HUGE_VMAP for kvmalloc.
> Totally agree. This will take some time to support. As in [1], I plan to fix
> it with an offset in page_private to indicate where the fallback happened.
>
> >
> > I would consider dropping VM_ALLOW_HUGE_VMAP for kvmalloc as
> > a short-term "optimization" to save memory rather than a long-term fix. This
> > "optimization" is only valid until we reorganize HUGE_VMAP in a way
> > similar to THP. I mean, for a 2.1MB kvmalloc, we can map 2MB as PMD
> > and the remaining 0.1MB as PTEs.
> However, this only fixes kvmalloc_node; for others who call
> vmalloc_huge(), the issue still exists. So I removed Michal's code. Sorry for this.

My proposal was to fall back to order-0 for __GFP_NOFAIL even before
calling vm_area_alloc_pages(), as a short-term quick "fix".

We need to meet three conditions to do HUGE_VMAP:
1. vmap_allow_huge
2. vm_flags & VM_ALLOW_HUGE_VMAP
3. __GFP_NOFAIL not set in gfp_flags

This is because if we fall back within vm_area_alloc_pages(), the caller
still expects vm_area_alloc_pages() to return contiguous 2MB memory. By
removing this assumption from its callers, the callers will know that
vm_area_alloc_pages() is returning small pages. That means the vm_area
gets 0 as page_order from the very beginning if we have __GFP_NOFAIL in
gfp_flags.
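
To make the three conditions concrete, here is a minimal user-space sketch
(not kernel code). The DEMO_* bit values, the demo_* helper names, and the
PMD order of 9 (2MB with 4KB base pages) are placeholders I made up for
illustration; the real definitions live in the kernel headers and may differ.

```c
#include <stdbool.h>

/* Placeholder values for illustration only; not the kernel's definitions. */
#define DEMO_VM_ALLOW_HUGE_VMAP 0x1u
#define DEMO_GFP_NOFAIL         0x2u
#define DEMO_PMD_ORDER          9u   /* assumes 2MB PMD and 4KB base pages */

/* All three conditions must hold before a huge mapping is attempted. */
static bool demo_can_use_huge_vmap(bool vmap_allow_huge,
                                   unsigned int vm_flags,
                                   unsigned int gfp_mask)
{
    return vmap_allow_huge &&
           (vm_flags & DEMO_VM_ALLOW_HUGE_VMAP) &&
           !(gfp_mask & DEMO_GFP_NOFAIL);
}

/*
 * Page order decided up front: 0 whenever huge mapping is ruled out,
 * so nothing downstream assumes 2MB contiguous chunks.
 */
static unsigned int demo_page_order(bool vmap_allow_huge,
                                    unsigned int vm_flags,
                                    unsigned int gfp_mask)
{
    return demo_can_use_huge_vmap(vmap_allow_huge, vm_flags, gfp_mask)
               ? DEMO_PMD_ORDER
               : 0u;
}
```

With __GFP_NOFAIL set, demo_page_order() returns 0 regardless of the other
two inputs, which is the "order 0 from the very beginning" behaviour
described above.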

Other fixes appear to require significant changes to the source code
and can't be done quickly.


>
> >
> > > >
> > > > To avoid reverting Michal's work, the simplest "fix" would be,
> > > >
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index caf032f0bd69..0011ca30df1c 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -3775,7 +3775,7 @@ void *__vmalloc_node_range_noprof(unsigned long
> > > > size, unsigned long align,
> > > >                 return NULL;
> > > >         }
> > > >
> > > > -       if (vmap_allow_huge && (vm_flags & VM_ALLOW_HUGE_VMAP)) {
> > > > +       if (vmap_allow_huge && (vm_flags & VM_ALLOW_HUGE_VMAP) &
> > > > !(gfp_mask & __GFP_NOFAIL)) {
> > > >                 unsigned long size_per_node;
> > > >
> > > >                 /*
> > > > >
> > > > > [1] https://lore.kernel.org/lkml/20240724182827.nlgdckimtg2gwns5@oppo.com/
> > > > > 2.34.1
> > > >
> > > > Thanks
> > > > Barry
> > >
> > > --
> > > help you, help me,
> > > Hailong.
>
> --
> help you, help me,
> Hailong.
