Message-ID: <90f07fec-3f46-4b38-86fd-07c9f8201904@lucifer.local>
Date: Fri, 30 Aug 2024 12:41:37 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Petr Spacek <pspacek@....org>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Vlastimil Babka <vbabka@...e.cz>,
Liam Howlett <liam.howlett@...cle.com>
Subject: Re: [PATCH RFC] mm: mmap: Change DEFAULT_MAX_MAP_COUNT to INT_MAX
On Fri, Aug 30, 2024 at 11:56:36AM GMT, Petr Spacek wrote:
> From: Petr Spacek <pspacek@....org>
>
> Raise default sysctl vm.max_map_count to INT_MAX, which effectively
> disables the limit for all sane purposes. The sysctl is kept around in
> case there is some use-case for this limit.
>
> The old default value of vm.max_map_count=65530 provided compatibility
> with ELF format predating year 2000 and with binutils predating 2010. At
> the same time the old default caused issues with applications deployed
> in 2024.
>
> State since 2012: Linux 3.2.0 correctly generates coredump from a
> process with 100 000 mmapped files. GDB 7.4.1, binutils 2.22 work with
> this coredump fine and can actually read data from the mmaped addresses.
>
> Signed-off-by: Petr Spacek <pspacek@....org>
NACK.
> ---
>
> Downstream distributions started to override the default a while ago.
> Individual distributions are summarized at the end of this message:
> https://lists.archlinux.org/archives/list/arch-dev-public@lists.archlinux.org/thread/5GU7ZUFI25T2IRXIQ62YYERQKIPE3U6E/
Did they change them to 2.14 billion?
>
> Please note it's not only games in emulator which hit this default
> limit. Larger instances of server applications are also suffering from
> this. Couple examples here:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2057792/comments/24
>
> SAP documentation behind paywall also mentions this limit:
> https://service.sap.com/sap/support/notes/2002167
>
> And finally, it is also an issue for BIND DNS server compiled against
> jemalloc, which is what brought me here.
>
> System V gABI draft dated 2000-07-17 already extended the ELF numbering:
> https://www.sco.com/developers/gabi/2000-07-17/ch4.sheader.html
>
> binutils support is in commit ecd12bc14d85421fcf992cda5af1d534cc8736e0
> dated 2010-01-19. IIUC this goes a bit beyond what is described in the
> gABI document and extends ELF's e_phnum.
>
> Linux coredumper support is in commit
> 8d9032bbe4671dc481261ccd4e161cd96e54b118 dated 2010-03-06.
>
> As mentioned above, this all works for the last 12 years and the
> conservative limit seems to do more harm than good.
>
> include/linux/mm.h | 21 +++++++++------------
> 1 file changed, 9 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 6549d0979..3e1ed3b80 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -178,22 +178,19 @@ static inline void __mm_zero_struct_page(struct page *page)
>
> /*
> * Default maximum number of active map areas, this limits the number of vmas
> - * per mm struct. Users can overwrite this number by sysctl but there is a
> - * problem.
> + * per mm struct. Users can overwrite this number by sysctl. Historically
> + * this limit was a compatibility measure for ELF format predating year 2000.
> *
> * When a program's coredump is generated as ELF format, a section is created
> - * per a vma. In ELF, the number of sections is represented in unsigned short.
> - * This means the number of sections should be smaller than 65535 at coredump.
> - * Because the kernel adds some informative sections to a image of program at
> - * generating coredump, we need some margin. The number of extra sections is
> - * 1-3 now and depends on arch. We use "5" as safe margin, here.
> + * per a vma. In ELF before year 2000, the number of sections was represented
> + * as unsigned short e_shnum. This means the number of sections should be
> + * smaller than 65535 at coredump.
> *
> - * ELF extended numbering allows more than 65535 sections, so 16-bit bound is
> - * not a hard limit any more. Although some userspace tools can be surprised by
> - * that.
> + * ELF extended numbering was added into System V gABI spec around 2000.
> + * It allows more than 65535 sections, so 16-bit bound is not a hard limit any
> + * more.
> */
> -#define MAPCOUNT_ELF_CORE_MARGIN (5)
> -#define DEFAULT_MAX_MAP_COUNT (USHRT_MAX - MAPCOUNT_ELF_CORE_MARGIN)
> +#define DEFAULT_MAX_MAP_COUNT INT_MAX
NACK, you can't arbitrarily change an established limit like this.
Also, VMAs have a non-zero size. On my system, 184 bytes each. So your change
allows ~395 GB (~368 GiB) to be assigned to VMAs. Does that seem reasonable?
It _might_ be sensible to increase the default, but not to INT_MAX.
Also note that you _can_ change this limit, it's a tunable. It's not egregious
to, you know, change a tunable.
Also, please cc the MEMORY MAPPING reviewers on changes like this. It wasn't
obvious, because include/linux/mm.h isn't listed in that MAINTAINERS block, but
that's me, Liam and Vlastimil, all cc'd now.
>
> extern int sysctl_max_map_count;
>
>
> base-commit: d5d547aa7b51467b15d9caa86b116f8c2507c72a
> --
> 2.46.0
>
>