lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <43ryds7hzhs5bpaxznco7fppmakdb4f46agwtsc5erudqfoz2x@7y4jgbtft7jj>
Date: Thu, 4 Sep 2025 13:22:51 -0400
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Kalesh Singh <kaleshsingh@...gle.com>, akpm@...ux-foundation.org,
        minchan@...nel.org, kernel-team@...roid.com, android-mm@...gle.com,
        David Hildenbrand <david@...hat.com>, Vlastimil Babka <vbabka@...e.cz>,
        Mike Rapoport <rppt@...nel.org>,
        Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
        Jann Horn <jannh@...gle.com>, Pedro Falcato <pfalcato@...e.de>,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: centralize and fix max map count limit checking

* Lorenzo Stoakes <lorenzo.stoakes@...cle.com> [250904 12:02]:
> On Wed, Sep 03, 2025 at 04:24:35PM -0700, Kalesh Singh wrote:
> > The check against the max map count (sysctl_max_map_count) was
> > open-coded in several places. This led to inconsistent enforcement
> > and subtle bugs where the limit could be exceeded.
> >
> > For example, some paths would check map_count > sysctl_max_map_count
> > before allocating a new VMA and incrementing the count, allowing the
> > process to reach sysctl_max_map_count + 1:
> >
> >     int do_brk_flags(...)
> >     {
> >         if (mm->map_count > sysctl_max_map_count)
> >             return -ENOMEM;
> >
> >         /* We can get here with mm->map_count == sysctl_max_map_count */
> >
> >         vma = vm_area_alloc(mm);
> >         ...
> >         mm->map_count++   /* We've now exceeded the threshold. */
> >     }
> >
> > To fix this and unify the logic, introduce a new function,
> > exceeds_max_map_count(), to consolidate the check. All open-coded
> > checks are replaced with calls to this new function, ensuring the
> > limit is applied uniformly and correctly.
> >
> > To improve encapsulation, sysctl_max_map_count is now static to
> > mm/mmap.c. The new helper also adds a rate-limited warning to make
> > debugging applications that exhaust their VMA limit easier.
> >
> 
> Yeah this is nice thanks!
> 
> > Cc: Andrew Morton <akpm@...ux-foundation.org>
> > Cc: Minchan Kim <minchan@...nel.org>
> > Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> > Signed-off-by: Kalesh Singh <kaleshsingh@...gle.com>
> > ---
> >  include/linux/mm.h | 11 ++++++++++-
> >  mm/mmap.c          | 15 ++++++++++++++-
> >  mm/mremap.c        |  7 ++++---
> >  mm/nommu.c         |  2 +-
> >  mm/util.c          |  1 -
> >  mm/vma.c           |  6 +++---
> >  6 files changed, 32 insertions(+), 10 deletions(-)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 1ae97a0b8ec7..d4e64e6a9814 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -192,7 +192,16 @@ static inline void __mm_zero_struct_page(struct page *page)
> >  #define MAPCOUNT_ELF_CORE_MARGIN	(5)
> >  #define DEFAULT_MAX_MAP_COUNT	(USHRT_MAX - MAPCOUNT_ELF_CORE_MARGIN)
> >
> > -extern int sysctl_max_map_count;
> > +/**
> > + * exceeds_max_map_count - check if a VMA operation would exceed max_map_count
> > + * @mm: The memory descriptor for the process.
> > + * @new_vmas: The number of new VMAs the operation will create.
> > + *
> > + * Returns true if the operation would cause the number of VMAs to exceed
> > + * the sysctl_max_map_count limit, false otherwise. A rate-limited warning
> > + * is logged if the limit is exceeded.
> > + */
> > +extern bool exceeds_max_map_count(struct mm_struct *mm, unsigned int new_vmas);
> 
> No new externs please (as Pedro also pointed out! :)
> 
> >
> >  extern unsigned long sysctl_user_reserve_kbytes;
> >  extern unsigned long sysctl_admin_reserve_kbytes;
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 7306253cc3b5..693a0105e6a5 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -374,7 +374,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> >  		return -EOVERFLOW;
> >
> >  	/* Too many mappings? */
> > -	if (mm->map_count > sysctl_max_map_count)
> > +	if (exceeds_max_map_count(mm, 0))
> 
> Yeah this last 'mystery meat' parameter isn't amazing though to be honest, it's
> kinda hard to read/understand.
> 
> E.g.:
> 
> int map_count_capacity(const struct *mm)
> {
> 	const int map_count = mm->map_count;
> 	const int max_count = sysctl_max_map_count;
> 
> 	return max_count > map_count ? max_count - map_count : 0;
> }
> 
> Then this would become;
> 
> 	if (!map_count_capacity(mm)) {
> 		// blah.
> 	}
> 
> 
> >  		return -ENOMEM;
> >
> >  	/*
> > @@ -1504,6 +1504,19 @@ struct vm_area_struct *_install_special_mapping(
> >  int sysctl_legacy_va_layout;
> >  #endif
> >
> > +static int sysctl_max_map_count __read_mostly = DEFAULT_MAX_MAP_COUNT;
> > +
> > +bool exceeds_max_map_count(struct mm_struct *mm, unsigned int new_vmas)
> > +{
> > +	if (unlikely(mm->map_count + new_vmas > sysctl_max_map_count)) {
> > +		pr_warn_ratelimited("%s (%d): Map count limit %u exceeded\n",
> > +				    current->comm, current->pid,
> > +				    sysctl_max_map_count);
> 
> Yeah not a fan of this, we aren't warning elsewhere, it's actually perfectly
> normal for somebody to exceed this, this isn't at the level of a warning.
> 
> > +		return true;
> > +	}
> > +	return false;
> > +}
> > +
> >  static const struct ctl_table mmap_table[] = {
> >  		{
> >  				.procname       = "max_map_count",
> > diff --git a/mm/mremap.c b/mm/mremap.c
> > index e618a706aff5..793fad58302c 100644
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -1040,7 +1040,7 @@ static unsigned long prep_move_vma(struct vma_remap_struct *vrm)
> >  	 * We'd prefer to avoid failure later on in do_munmap:
> >  	 * which may split one vma into three before unmapping.
> >  	 */
> > -	if (current->mm->map_count >= sysctl_max_map_count - 3)
> > +	if (exceeds_max_map_count(current->mm, 4))
> >  		return -ENOMEM;
> 
> In my version this would be:
> 
> 	if (map_count_capacity(current->mm) < 4)
> 		return -ENOMEM;
> 

Someone could write map_count_capacity(current->mm) <= 4 and reintroduce
what this is trying to solve.  And with the way it is written in this
patch, someone could pass in the wrong number.

I'm not sure this is worth doing.  There are places we allow the count
to go higher.

Certainly fix the brk < to be <= and any other calculations, but the
rest seem okay as-is to me.  The only real way to be sure we don't cause
a bug in the future is to have better testing.

Thanks,
Liam

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ