Message-ID: <5dca8600-0352-4b5b-acb0-0cd4f84733f4@isc.org>
Date: Fri, 30 Aug 2024 16:28:33 +0200
From: Petr Špaček <pspacek@....org>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Vlastimil Babka <vbabka@...e.cz>, Liam Howlett <liam.howlett@...cle.com>
Subject: Re: [PATCH RFC] mm: mmap: Change DEFAULT_MAX_MAP_COUNT to INT_MAX
On 30. 08. 24 14:01, Lorenzo Stoakes wrote:
> On Fri, Aug 30, 2024 at 12:41:37PM GMT, Lorenzo Stoakes wrote:
>> On Fri, Aug 30, 2024 at 11:56:36AM GMT, Petr Spacek wrote:
>>> From: Petr Spacek <pspacek@....org>
>>>
>>> Raise default sysctl vm.max_map_count to INT_MAX, which effectively
>>> disables the limit for all sane purposes. The sysctl is kept around in
>>> case there is some use-case for this limit.
[snip]
>> NACK.
>
> Sorry this may have come off as more hostile than intended... we are
> welcoming of patches, promise :)
[snip]
Understood. The RFC in the subject was honest - and we are having the
discussion now, so all's good!
I also apologize for not Ccing the right people. This is my first patch
here and I'm still trying to grasp the process.
> It is only because we want to be _super_ careful about things like this
> that can have potentially problematic impact if you have a buggy program
> that allocates too many VMAs.
Now I understand your concern. From the docs and code comments I've seen,
it was not clear that the limit serves any purpose _other_ than being a
compatibility shim for old ELF tools.
> It is a NACK, but it's a NACK because of the limit being so high.
>
> With steam I believe it is a product of how it performs allocations, and
> unfortunately this causes it to allocate quite a bit more than you would
> expect.
FTR, a selection of non-game applications:
- ElasticSearch and OpenSearch insist on at least 262144.
- The DNS server BIND 9.18.28 linked against jemalloc 5.2.1 was observed
  with usage around 700000.
- OpenJDK GC sometimes weeps about values < 737280.
- The SAP docs I was able to access use 1000000.
- MariaDB is tested by their QA with 1048576.
- The Fedora, Ubuntu, NixOS, and Arch distros went with 1048576.
Is it worth sending a patch with the default raised to 1048576?
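For reference, the usual way an admin raises the limit today is either a
one-shot sysctl -w, or a sysctl.d drop-in so it survives reboots (the file
name below is just an example):

  # /etc/sysctl.d/99-max-map-count.conf  (example path)
  vm.max_map_count = 1048576

applied with 'sysctl --system', or 'sysctl -w vm.max_map_count=1048576'
for the one-shot variant.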
> With jemalloc() that seems strange, perhaps buggy behaviour?
Good question. In the case of the BIND DNS server, jemalloc handles mmap()
and we keep statistics on the bytes requested from malloc().
When we hit the max_map_count limit,
  (sum of not-yet-freed malloc(size)) / (vm.max_map_count)
gives an average mmapped block size of ~ 100 kB.
Is 100 kB way too low / does it indicate a bug? It does not seem terrible
to me - the application is handling ~ 100-1500 B packets at a rate
somewhere between 10-200 k packets per second, so it is expected to do
lots of small, short-lived allocations.
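To make the division concrete with made-up round numbers (illustrative
only, not measured BIND figures): 70 GB of not-yet-freed allocations
spread over 700000 mappings averages out to the ~ 100 kB above:

  70 000 000 000 B / 700 000 mappings ~= 100 000 B ~= 100 kB per mapping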
A complicating factor is that the process itself does not see the current
counter value (unless BPF is involved), so it's hard to monitor this until
the limit is hit.
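The closest userspace approximation I'm aware of is counting lines in
/proc/self/maps (one line per mapping) and comparing that against
/proc/sys/vm/max_map_count - a rough and racy sketch, not a real counter:

  #include <stdio.h>

  /* Count '\n' in a /proc file; maps prints one line per mapping. */
  static long count_lines(const char *path)
  {
      FILE *f = fopen(path, "r");
      long n = 0;
      int c;

      if (!f)
          return -1;
      while ((c = fgetc(f)) != EOF)
          if (c == '\n')
              n++;
      fclose(f);
      return n;
  }

  int main(void)
  {
      long limit = -1;
      FILE *f = fopen("/proc/sys/vm/max_map_count", "r");

      if (f) {
          if (fscanf(f, "%ld", &limit) != 1)
              limit = -1;
          fclose(f);
      }
      printf("~%ld mappings of %ld allowed\n",
             count_lines("/proc/self/maps"), limit);
      return 0;
  }

but that is expensive enough that it is not something we'd want to poll
from a busy server.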
> It may be reasonable to adjust the default limit higher, and I'm not
> opposed to that, but it might be tricky to find a level that is sensible
> across all arches including ones with significantly smaller memory
> availability.
Hmm... Thinking aloud:
Are VMA sizes included in cgroup v2 memory accounting? Maybe the safety
limit can be handled there?
If sizing based on available memory is a concern, then a fixed value is
probably already wrong? I mean, current boxes range from a dozen MB to
512 GB of RAM.
For a box with 16 MB of RAM we get ~ 16 MB / (sizeof VMA ~ 184 B) = 91 180
VMAs to fill RAM, while the current limit is 65 530 _per process_.
A threat model which allows an attacker to mmap() but not fork() seems
theoretical to me. I.e. an insane (or rogue) application can eat up to
  (max # of processes) * (max_map_count) * (sizeof VMA)
bytes of memory, not just the
  max_map_count * (sizeof VMA)
we were talking about before.
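Putting illustrative numbers on that (sizeof VMA ~ 184 B as above, the
current default max_map_count = 65530, and assuming a pid limit of 32768
just for the sake of the example):

  65530 * 184 B          ~= 12 MB   (single process)
  32768 * 65530 * 184 B  ~= 395 GB  (across all processes)

so the per-process cap by itself is not what keeps a rogue workload from
exhausting memory.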
Apologies for having more questions than answers. I'm trying to
understand what purpose the limit serves and whether we can improve the
user experience.
Thank you for your patience, and have a great weekend!
--
Petr Špaček
Internet Systems Consortium