lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <af0618c0-03c5-9133-bb14-db8ddb72b8de@google.com>
Date: Wed, 15 Oct 2025 22:05:20 -0700 (PDT)
From: Hugh Dickins <hughd@...gle.com>
To: Kalesh Singh <kaleshsingh@...gle.com>
cc: Hugh Dickins <hughd@...gle.com>, akpm@...ux-foundation.org, 
    minchan@...nel.org, lorenzo.stoakes@...cle.com, david@...hat.com, 
    Liam.Howlett@...cle.com, rppt@...nel.org, pfalcato@...e.de, 
    kernel-team@...roid.com, android-mm@...gle.com, stable@...r.kernel.org, 
    SeongJae Park <sj@...nel.org>, Alexander Viro <viro@...iv.linux.org.uk>, 
    Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>, 
    Kees Cook <kees@...nel.org>, Vlastimil Babka <vbabka@...e.cz>, 
    Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>, 
    Jann Horn <jannh@...gle.com>, Steven Rostedt <rostedt@...dmis.org>, 
    Masami Hiramatsu <mhiramat@...nel.org>, 
    Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, 
    Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, 
    Juri Lelli <juri.lelli@...hat.com>, 
    Vincent Guittot <vincent.guittot@...aro.org>, 
    Dietmar Eggemann <dietmar.eggemann@....com>, 
    Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, 
    Valentin Schneider <vschneid@...hat.com>, Shuah Khan <shuah@...nel.org>, 
    linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org, 
    linux-mm@...ck.org, linux-trace-kernel@...r.kernel.org, 
    linux-kselftest@...r.kernel.org
Subject: Re: [PATCH v3 1/5] mm: fix off-by-one error in VMA count limit
 checks

On Tue, 14 Oct 2025, Kalesh Singh wrote:
> On Mon, Oct 13, 2025 at 11:28 PM Hugh Dickins <hughd@...gle.com> wrote:
> >
> > Sorry for letting you go so far before speaking up (I had to test what
> > I believed to be true, and had hoped that meanwhile one of your many
> > illustrious reviewers would say so first, but no): it's a NAK from me.
> >
> > These are not off-by-ones: at the point of these checks, it is not
> > known whether an additional map/vma will have to be added, or the
> > addition will be merged into an existing map/vma.  So the checks
> > err on the lenient side, letting you get perhaps one more than the
> > sysctl said, but not allowing any more than that.
> >
> > Which is all that matters, isn't it? Limiting unrestrained growth.
> >
> > In this patch you're proposing to change it from erring on the
> > lenient side to erring on the strict side - prohibiting merges
> > at the limit which have been allowed for many years.
> >
> > Whatever one thinks about the merits of erring on the lenient versus
> > erring on the strict side, I see no reason to make this change now,
> > and most certainly not with a Fixes Cc: stable. There is no danger
> > in the current behaviour; there is danger in prohibiting what was
> > allowed before.
> >
> > As to the remainder of your series: I have to commend you for doing
> > a thorough and well-presented job, but I cannot myself see the point in
> > changing 21 files for what almost amounts to a max_map_count subsystem.
> > I call it misdirected effort, not at all to my taste, which prefers the
> > straightforward checks already there; but accept that my taste may be
> > out of fashion, so won't stand in the way if others think it worthwhile.
> 
> Hi Hugh,
> 
> Thanks for the detailed review and for taking the time to test the behavior.
> 
> You've raised a valid point. I wasn't aware of the history behind the
> lenient check for merges. The lack of a comment, like the one that
> exists for exceeding the limit in munmap(), led me to misinterpret
> this as an off-by-one bug. The convention makes sense if we consider
> potential merges.

Yes, a comment there would be helpful (and I doubt it's worth more
than adding a comment); but I did not understand at all, Liam's
suggestion for the comment "to state that the count may not change".

> 
> If it was in-fact the intended behavior, then I agree we should keep
> it lenient. It would mean though, that munmap() being able to free a
> VMA if a split is required (by permitting exceeding the limit by 1)
> would not work in the case where we have already exceeded the limit. I
> find this to be inconsistent but this is also the current behavior ...

You're saying that once we go one over the limit, say with a new mmap,
an munmap check makes it impossible to munmap that or any other vma?

If that's so, I do agree with you, that's nasty, and I would hate any
new code to behave that way.  In code that's survived as long as this
without troubling anyone, I'm not so sure: but if it's easily fixed
(a more lenient check at the munmap end?) that would seem worthwhile.

Ah, but reading again, you say "if a split is required": I guess
munmapping the whole vma has no problem; and it's fine for a middle
munmap, splitting into three before munmapping the middle, to fail.
I suppose it would be nicer if munmaping start or end succeeeded,
but I don't think that matters very much in this case.

> 
> I will drop this patch and the patch that introduces the
> vma_count_remaining() helper, as I see your point about it potentially
> being unnecessary overhead.
> 
> Regarding your feedback on the rest of the series, I believe the 3
> remaining patches are still valuable on their own.
> 
>  - The selftest adds a comprehensive tests for VMA operations at the
> sysctl_max_map_count limit. This will self-document the exact behavior
> expected, including the leniency for potential merges that you
> highlighted, preventing the kind of misunderstanding that led to my
> initial patch.
> 
>  - The rename of mm_struct->map_count to vma_count, is a
> straightforward cleanup for code clarity that makes the purpose of the
> field more explicit.
> 
>  - The tracepoint adds needed observability for telemetry, allowing us
> to see when processes are failing in the field due to VMA count limit.
> 
> The  selftest, is what  makes up a large portion of the diff you
> sited, and with vma_count_remaining() gone the series will not touch
> nearly as many files.
> 
> Would this be an acceptable path forward?

Possibly, if others like it: my concern was to end a misunderstanding
(I'm generally much too slow to get involved in cleanups).

Though given that the sysctl is named "max_map_count", I'm not very
keen on renaming everything else from map_count to vma_count
(and of course I'm not suggesting to rename the sysctl).

Hugh

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ