Message-ID: <ZakqQyL9t2ffNUIf@tiehlicka>
Date: Thu, 18 Jan 2024 14:40:19 +0100
From: Michal Hocko <mhocko@...e.com>
To: Lance Yang <ioworker0@...il.com>
Cc: akpm@...ux-foundation.org, zokeefe@...gle.com, david@...hat.com,
	songmuchun@...edance.com, shy828301@...il.com, peterx@...hat.com,
	mknyszek@...gle.com, minchan@...nel.org, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/1] mm/madvise: add MADV_F_COLLAPSE_LIGHT to
 process_madvise()

On Thu 18-01-24 20:03:46, Lance Yang wrote:
[...]

Before we discuss the semantics, let's focus on the use case.

> Use Cases
> 
> An immediate user of this new functionality is the Go runtime heap allocator
> that manages memory in hugepage-sized chunks. In the past, whether it was a
> newly allocated chunk through mmap() or a reused chunk released by
> madvise(MADV_DONTNEED), the allocator attempted to eagerly back memory with
> huge pages using madvise(MADV_HUGEPAGE)[2] and madvise(MADV_COLLAPSE)[3]
> respectively. However, both approaches resulted in performance issues; for
> both scenarios, there could be entries into direct reclaim and/or compaction,
> leading to unpredictable stalls[4]. Now, the allocator can confidently use
> process_madvise(MADV_F_COLLAPSE_LIGHT) to attempt the allocation of huge pages.

IIUC the primary reason is the cost of the huge page allocation which
can be really high if the memory is heavily fragmented and it is called
synchronously from the process directly, correct? Can that be worked
around by process_madvise and performing the operation from a different
context? Are there any other reasons to have a different mode?
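
To make the "different context" idea concrete, here is a minimal sketch,
not taken from the patch or from the Go runtime. It hands the collapse of
one chunk to a helper thread so that any reclaim/compaction stall is paid
there rather than on the allocating path. The names collapse_req,
collapse_worker, collapse_async and map_aligned_chunk are hypothetical, a
2 MiB huge page size is assumed, and MADV_COLLAPSE is defined by hand in
case the libc headers predate Linux 6.1. A helper in a separate process
would use pidfd_open() and process_madvise() instead of plain madvise().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* value from the Linux 6.1+ uapi headers */
#endif

/* Assumption: 2 MiB huge pages (the x86-64 default PMD size). */
#define CHUNK_SIZE (2UL << 20)

/* Hypothetical request descriptor shared with the helper thread. */
struct collapse_req {
    void *addr;
    size_t len;
    int err; /* 0 on success, otherwise the errno from madvise() */
};

/* Runs in the helper context; a stall here does not block the caller. */
static void *collapse_worker(void *arg)
{
    struct collapse_req *req = arg;

    req->err = madvise(req->addr, req->len, MADV_COLLAPSE) == 0 ? 0 : errno;
    return NULL;
}

/* Map one anonymous chunk aligned to CHUNK_SIZE; returns NULL on failure. */
static void *map_aligned_chunk(void)
{
    size_t span = 2 * CHUNK_SIZE;
    uintptr_t raw, start, end;
    void *p;

    assert((CHUNK_SIZE & (CHUNK_SIZE - 1)) == 0); /* must be a power of two */

    p = mmap(NULL, span, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    raw = (uintptr_t)p;
    start = (raw + CHUNK_SIZE - 1) & ~(uintptr_t)(CHUNK_SIZE - 1);
    end = start + CHUNK_SIZE;

    /* Trim the unaligned head and the unused tail of the over-sized mapping. */
    if (start > raw)
        munmap((void *)raw, start - raw);
    if (raw + span > end)
        munmap((void *)end, raw + span - end);

    return (void *)start;
}

/* Start the collapse in a helper thread; returns 0 or a pthread error code. */
static int collapse_async(struct collapse_req *req, pthread_t *tid)
{
    return pthread_create(tid, NULL, collapse_worker, req);
}
```

The caller would touch the chunk, call collapse_async(), and either join
the thread later or ignore the result. Since the collapse is only an
optimization, a failure (e.g. EINVAL on a kernel without MADV_COLLAPSE or
with THP disabled) is not fatal. Note that this only moves the stall off
the hot path; it does not change what the kernel does during the
collapse.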

I mean I can think of a more relaxed (opportunistic) MADV_COLLAPSE,
e.g. a non-blocking one which makes sure that the caller doesn't block
on resource contention (be it locks or memory availability), because
that matches our non-blocking interfaces in other areas. But a LIGHT
operation sounds really vague: the exact semantics would be
implementation specific and might change over time. Non-blocking has
clear semantics, but it is not clear whether that is what you really
need/want.

> [1] https://github.com/torvalds/linux/commit/7d8faaf155454f8798ec56404faca29a82689c77
> [2] https://github.com/golang/go/commit/8fa9e3beee8b0e6baa7333740996181268b60a3a
> [3] https://github.com/golang/go/commit/9f9bb26880388c5bead158e9eca3be4b3a9bd2af
> [4] https://github.com/golang/go/issues/63334
> 
> [v1] https://lore.kernel.org/lkml/20240117050217.43610-1-ioworker0@gmail.com/
-- 
Michal Hocko
SUSE Labs