lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPhsuW4A8H00MV=tT9Gs_mtjn7o9KOzPjhwhdWDJMbY_9=9RrQ@mail.gmail.com>
Date:   Thu, 18 May 2023 13:51:08 -0700
From:   Song Liu <song@...nel.org>
To:     Kent Overstreet <kent.overstreet@...ux.dev>
Cc:     Mike Rapoport <rppt@...nel.org>, linux-mm@...ck.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Rick Edgecombe <rick.p.edgecombe@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Vlastimil Babka <vbabka@...e.cz>, linux-kernel@...r.kernel.org,
        x86@...nel.org
Subject: Re: [RFC PATCH 1/5] mm: intorduce __GFP_UNMAPPED and unmapped_alloc()

On Thu, May 18, 2023 at 12:15 PM Kent Overstreet
<kent.overstreet@...ux.dev> wrote:
>
> On Thu, May 18, 2023 at 12:03:03PM -0700, Song Liu wrote:
> > On Thu, May 18, 2023 at 11:47 AM Song Liu <song@...nel.org> wrote:
> > >
> > > On Thu, May 18, 2023 at 10:24 AM Kent Overstreet
> > > <kent.overstreet@...ux.dev> wrote:
> > > >
> > > > On Thu, May 18, 2023 at 10:00:39AM -0700, Song Liu wrote:
> > > > > On Thu, May 18, 2023 at 9:48 AM Kent Overstreet
> > > > > <kent.overstreet@...ux.dev> wrote:
> > > > > >
> > > > > > On Thu, May 18, 2023 at 09:33:20AM -0700, Song Liu wrote:
> > > > > > > I am working on patches based on the discussion in [1]. I am planning to
> > > > > > > send v1 for review in a week or so.
> > > > > >
> > > > > > Hey Song, I was reviewing that thread too,
> > > > > >
> > > > > > Are you taking a different approach based on Thomas's feedback? I think
> > > > > > he had some fair points in that thread.
> > > > >
> > > > > Yes, the API is based on Thomas's suggestion, like 90% from the discussions.
> > > > >
> > > > > >
> > > > > > My own feeling is that the buddy allocator is our tool for allocating
> > > > > > larger variable sized physically contiguous allocations, so I'd like to
> > > > > > see something based on that - I think we could do a hybrid buddy/slab
> > > > > > allocator approach, like we have for regular memory allocations.
> > > > >
> > > > > I am planning to implement the allocator based on this (reuse
> > > > > vmap_area logic):
> > > >
> > > > Ah, you're still doing vmap_area approach.
> > > >
> > > > Mike's approach looks like it'll be _much_ lighter weight and higher
> > > > performance, to me. vmalloc is known to be slow compared to the buddy
> > > > allocator, and with Mike's approach we're only modifying mappings once
> > > > per 2 MB chunk.
> > > >
> > > > I don't see anything in your code for sub-page sized allocations too, so
> > > > perhaps I should keep going with my slab allocator.
> > >
> > > The vmap_area approach handles sub-page allocations. In 5/5 of set [2],
> > > we showed that multiple BPF programs share the same page with some
> > > kernel text (_etext).
> > >
> > > > Could you share your thoughts on your approach vs. Mike's? I'm newer to
> > > > this area of the code than you two so maybe there's an angle I've missed
> > > > :)
> > >
> > > AFAICT, tree based solution (vmap_area) is more efficient than bitmap
> > > based solution.
>
> Tree based requires quite a bit of overhead for the rbtree pointers, and
> additional vmap_area structs.
>
> With a buddy allocator based approach, there's no additional state that
> needs to be allocated, since it all fits in struct page.

To allocate memory for text, we will allocate 2MiB, make it ROX, and then
use it for many small allocations. IIUC, buddy allocator will use unallocated
parts of this page for metadata. I guess this may be a problem, as the
whole page is ROX now, and we have to use text_poke to write to it.

OTOH, if we allocate extra memory for metadata (tree based solution),
all the metadata operations can be regular read/write.

Thanks,
Song

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ