Message-ID: <b8436b35-0c39-4471-baf7-ec9a07537f9f@isc.org>
Date: Fri, 30 Aug 2024 19:00:33 +0200
From: Petr Špaček <pspacek@....org>
To: Pedro Falcato <pedro.falcato@...il.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Vlastimil Babka <vbabka@...e.cz>,
Liam Howlett <liam.howlett@...cle.com>
Subject: Re: [PATCH RFC] mm: mmap: Change DEFAULT_MAX_MAP_COUNT to INT_MAX
On 30. 08. 24 17:04, Pedro Falcato wrote:
> On Fri, Aug 30, 2024 at 04:28:33PM GMT, Petr Špaček wrote:
>> Now I understand your concern. From the docs and code comments I've seen it
>> was not clear that the limit serves _another_ purpose than mere
>> compatibility shim for old ELF tools.
>>
>>> It is a NACK, but it's a NACK because of the limit being so high.
>>>
>>> With steam I believe it is a product of how it performs allocations, and
>>> unfortunately this causes it to allocate quite a bit more than you would
>>> expect.
>>
>> FTR select non-game applications:
>>
>> ElasticSearch and OpenSearch insist on at least 262144.
>> DNS server BIND 9.18.28 linked to jemalloc 5.2.1 was observed with usage
>> around 700000.
>> OpenJDK GC sometimes weeps about values < 737280.
>> SAP docs I was able to access use 1000000.
>> MariaDB is being tested by their QA with 1048576.
>> Fedora, Ubuntu, NixOS, and Arch distros went with value 1048576.
>>
>> Is it worth sending a patch with the default raised to 1048576?
>>
>>
>>> With jemalloc() that seems strange, perhaps buggy behaviour?
>>
>> Good question. In case of BIND DNS server, jemalloc handles mmap() and we
>> keep statistics about bytes requested from malloc().
>>
>> When we hit max_map_count limit the
>> (sum of not-yet-freed malloc(size)) / (vm.max_map_count)
>> gives average size of mmaped block ~ 100 k.
>>
>> Is 100 k way too low / does it indicate a bug? It does not seem terrible to
>> me - the application is handling ~ 100-1500 B packets at rate somewhere
>> between 10-200 k packets per second so it's expected it does lots of small
>> short lived allocations.
>>
>> A complicating factor is that the process itself does not see the current
>> counter value (unless BPF is involved) so it's hard to monitor this until
>> the limit is hit.
>
> Can you get us a dump of the /proc/<pid>/maps? It'd be interesting to see how
> exactly you're hitting this.
The only thing I have immediately available is a coredump taken when the
process hit the default limit. GDB apparently does not show these regions
in "info proc mappings", but I was able to extract the section addresses
from the coredump:
https://users.isc.org/~pspacek/sf1717/elf-sections.csv
The distribution of section sizes, one "size,count" pair per line, is
here:
https://users.isc.org/~pspacek/sf1717/sizes.csv
Cumulative statistics are available as an OpenDocument spreadsheet here:
https://users.isc.org/~pspacek/sf1717/sizes.ods
From a quick glance it is obvious that single-page blocks eat most of
the quota.
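For anyone who wants to check that claim against the "size,count" data, a
minimal sketch follows. It is not the script used to produce the files
above. It assumes that sizes are in bytes, that a page is 4096 bytes, and
that the file may or may not start with a header row; the function name
and the sample data are made up for illustration.

```python
import csv
import io

PAGE_SIZE = 4096  # assumption: 4 KiB pages, sizes given in bytes


def single_page_share(csv_text, page_size=PAGE_SIZE):
    """Return (single_page_mappings, total_mappings, fraction)."""
    single = 0
    total = 0
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank lines and a possible header row.
        if len(row) < 2 or not row[0].strip().isdigit():
            continue
        size, count = int(row[0]), int(row[1])
        total += count
        if size == page_size:
            single += count
    fraction = single / total if total else 0.0
    return single, total, fraction


# Made-up sample data, NOT taken from sizes.csv.
sample = "size,count\n4096,50\n8192,30\n1048576,20\n"
print(single_page_share(sample))  # (50, 100, 0.5)
```

The fraction is relative to the number of mappings, not to the bytes
mapped, because the limit counts mappings regardless of their size.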
I don't know if it is a bug or just memory fragmentation caused by a
long-running server application.
I can try to get data from a production system to you next week if needed.
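On a live system the same "size,count" distribution could be collected
from /proc/<pid>/maps, roughly as sketched below. This is an untested
outline, not something that has been run in production: the function
names are hypothetical, and the number of lines in the maps file is only
an approximation of the kernel's internal mapping counter, so the two may
differ slightly.

```python
from collections import Counter


def mapping_sizes(maps_text):
    """Count mappings by size from the text of /proc/<pid>/maps."""
    sizes = Counter()
    for line in maps_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The first field is the address range "start-end" in hexadecimal.
        start, _, end = fields[0].partition("-")
        sizes[int(end, 16) - int(start, 16)] += 1
    return sizes


def read_maps(pid="self"):
    """Read the maps file of a process (needs suitable permissions)."""
    with open(f"/proc/{pid}/maps") as f:
        return f.read()


if __name__ == "__main__":
    sizes = mapping_sizes(read_maps())
    print("mappings:", sum(sizes.values()))
    for size, count in sorted(sizes.items()):
        print(f"{size},{count}")
```

Reading the maps file of a process with hundreds of thousands of mappings
is not cheap, so this is suited to occasional sampling rather than
continuous monitoring.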
--
Petr Špaček
Internet Systems Consortium