Message-ID: <CAAa6QmTN2B-JAO=38A09hMtUp=srLiUfs=sDbck7Chkr=W-dCw@mail.gmail.com>
Date: Thu, 18 Jan 2024 06:58:42 -0800
From: "Zach O'Keefe" <zokeefe@...gle.com>
To: Michal Hocko <mhocko@...e.com>
Cc: Lance Yang <ioworker0@...il.com>, akpm@...ux-foundation.org, david@...hat.com, 
	songmuchun@...edance.com, shy828301@...il.com, peterx@...hat.com, 
	mknyszek@...gle.com, minchan@...nel.org, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, linux-api@...r.kernel.org
Subject: Re: [PATCH v2 1/1] mm/madvise: add MADV_F_COLLAPSE_LIGHT to process_madvise()

On Thu, Jan 18, 2024 at 5:43 AM Michal Hocko <mhocko@...e.com> wrote:
>
> Dang, forgot to cc linux-api...
>
> On Thu 18-01-24 14:40:19, Michal Hocko wrote:
> > On Thu 18-01-24 20:03:46, Lance Yang wrote:
> > [...]
> >
> > before we discuss the semantic, let's focus on the usecase.
> >
> > > Use Cases
> > >
> > > An immediate user of this new functionality is the Go runtime heap allocator
> > > that manages memory in hugepage-sized chunks. In the past, whether it was a
> > > newly allocated chunk through mmap() or a reused chunk released by
> > > madvise(MADV_DONTNEED), the allocator attempted to eagerly back memory with
> > > huge pages using madvise(MADV_HUGEPAGE)[2] and madvise(MADV_COLLAPSE)[3]
> > > respectively. However, both approaches resulted in performance issues; for
> > > both scenarios, there could be entries into direct reclaim and/or compaction,
> > > leading to unpredictable stalls[4]. Now, the allocator can confidently use
> > > process_madvise(MADV_F_COLLAPSE_LIGHT) to attempt the allocation of huge pages.

Aside: The thought was a MADV_F_COLLAPSE_LIGHT _flag_; so it'd be
process_madvise(..., MADV_COLLAPSE, MADV_F_COLLAPSE_LIGHT)
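
To make the shape concrete, here is a rough userspace sketch of that
call. MADV_F_COLLAPSE_LIGHT is only the proposal under discussion: it
is in no released kernel, and the bit value below is a placeholder made
up for illustration. The helper names and the 2 MiB hugepage size are
likewise assumptions. As far as I know, current kernels reject any
non-zero process_madvise() flags with EINVAL, so today this only
exercises the failure path.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* value from the Linux 6.1+ uapi headers */
#endif

/* HYPOTHETICAL: proposed in this thread only; the bit is a placeholder. */
#define MADV_F_COLLAPSE_LIGHT (1U << 0)

/* ASSUMPTION: 2 MiB PMD-sized huge pages (true on x86-64 with 4 KiB pages). */
#define HPAGE_SIZE ((uintptr_t)2 << 20)

/* Round an address down to a hugepage boundary. */
uintptr_t hpage_align_down(uintptr_t addr)
{
	assert((HPAGE_SIZE & (HPAGE_SIZE - 1)) == 0); /* must be a power of two */
	return addr & ~(HPAGE_SIZE - 1);
}

/*
 * Ask for a "light" collapse of [addr, addr + len) in the process that
 * pidfd refers to. Returns the syscall result: -1 with errno set on
 * failure, including EINVAL on kernels that do not know the flag.
 */
ssize_t try_collapse_light(int pidfd, void *addr, size_t len)
{
	struct iovec iov = { .iov_base = addr, .iov_len = len };

#ifdef SYS_process_madvise
	return syscall(SYS_process_madvise, pidfd, &iov, (size_t)1,
		       MADV_COLLAPSE, MADV_F_COLLAPSE_LIGHT);
#else
	(void)iov;
	errno = ENOSYS; /* headers predate process_madvise() */
	return -1;
#endif
}
```

An allocator calling this would presumably treat -1 as "no huge page
this time", carry on with small pages, and leave the range to
khugepaged.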

> > IIUC the primary reason is the cost of the huge page allocation which
> > can be really high if the memory is heavily fragmented and it is called
> > synchronously from the process directly, correct? Can that be worked
> > around by process_madvise and performing the operation from a different
> > context? Are there any other reasons to have a different mode?
> >
> > I mean I can think of a more relaxed (opportunistic) MADV_COLLAPSE -
> > e.g. non blocking one to make sure that the caller doesn't really block
> > on resource contention (be it locks or memory availability) because that
> > matches our non-blocking interface in other areas but having a LIGHT
> > operation sounds really vague and the exact semantic would be
> > implementation specific and might change over time. Non-blocking has a
> > clear semantic but it is not really clear whether that is what you
> > really need/want.

IIUC, the Go usecase is unbounded latency from sync compaction in a
context where that latency is unacceptable. I'm working w/ them to
understand how things can be improved -- it's possible the changes can
occur entirely on their side, w/o any additional kernel support.

The non-blocking case sits awkwardly between today's MADV_COLLAPSE and
khugepaged, esp. when the common case is that the allocation can
probably be satisfied in the fast path.

The suggestion for something like "LIGHT" was intentionally vague
because it could allow for other optimizations / changes down the
line, as you point out. I think that might be a win, vs tying it to a
specific optimization (e.g. something like a MADV_F_COLLAPSE_NODEFRAG).
But I could be alone on that front, given the design of
/sys/kernel/mm/transparent_hugepage.

But circling back, I agree w/ you that the first order of business is to
iron out a real usecase. As of right now, it's not clear something
like this is required or helpful.

Thanks,
Zach




> > > [1] https://github.com/torvalds/linux/commit/7d8faaf155454f8798ec56404faca29a82689c77
> > > [2] https://github.com/golang/go/commit/8fa9e3beee8b0e6baa7333740996181268b60a3a
> > > [3] https://github.com/golang/go/commit/9f9bb26880388c5bead158e9eca3be4b3a9bd2af
> > > [4] https://github.com/golang/go/issues/63334
> > >
> > > [v1] https://lore.kernel.org/lkml/20240117050217.43610-1-ioworker0@gmail.com/
> > --
> > Michal Hocko
> > SUSE Labs
>
> --
> Michal Hocko
> SUSE Labs
