[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJuCfpFkwZAw-qcD6E5SchHNXf8MmzyWtuWaHOpFhid3m5bg8A@mail.gmail.com>
Date: Tue, 22 Feb 2022 19:02:08 -0800
From: Suren Baghdasaryan <surenb@...gle.com>
To: Michal Hocko <mhocko@...e.com>
Cc: akpm@...ux-foundation.org, ccross@...gle.com,
sumit.semwal@...aro.org, dave.hansen@...el.com,
keescook@...omium.org, willy@...radead.org,
kirill.shutemov@...ux.intel.com, vbabka@...e.cz,
hannes@...xchg.org, ebiederm@...ssion.com, brauner@...nel.org,
legion@...nel.org, ran.xiaokai@....com.cn, sashal@...nel.org,
chris.hyser@...cle.com, dave@...olabs.net, pcc@...gle.com,
caoxiaofeng@...ong.com, david@...hat.com, gorcunov@...il.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
kernel-team@...roid.com
Subject: Re: [PATCH 2/3] mm: prevent vm_area_struct::anon_name refcount saturation
On Tue, Feb 22, 2022 at 7:56 AM Suren Baghdasaryan <surenb@...gle.com> wrote:
>
> On Tue, Feb 22, 2022 at 1:17 AM Michal Hocko <mhocko@...e.com> wrote:
> >
> > On Mon 21-02-22 21:40:24, Suren Baghdasaryan wrote:
> > > A deep process chain with many vmas could grow really high.
> >
> > This would really benefit from some numbers. With default
> > sysctl_max_map_count (64k) and default pid_max (32k) the INT_MAX could
> > be theoretically reached but I find it impractical because not all vmas
> > can be anonymous same as all available pids can be consumed for a
> > theoretical attack (if my counting is proper).
> > On the other hand any non-default configuration with any of the values
> > increased could hit this theoretically.
>
> re: This would really benefit from some numbers
> Should I just add the details you provided above into the description?
> Would that suffice?
Hmm. According to the defaults you posted, with max number of
processes being 32k and max number of vmas per process 64k, the max
number of vmas in the system is 2147450880. That's 32767 less than
REFCOUNT_MAX=INT_MAX (2147483647) and 1073774592 less than
REFCOUNT_SATURATED (3221225472). So with those defaults we should
never hit these limits. Are we adding this protection for systems that
set non-default higher limits or am I miscalculating something?
>
> >
> > > kref
> > > refcounting interface used in anon_vma_name structure will detect
> > > a counter overflow when it reaches REFCOUNT_SATURATED value but will
> > > only generate a warning about broken refcounting.
> > > To ensure anon_vma_name refcount does not overflow, stop anon_vma_name
> > > sharing when the refcount reaches INT_MAX, which still leaves INT_MAX/2
> > > values before the counter reaches REFCOUNT_SATURATED. This should provide
> > > enough headroom for raising the refcounts temporarily.
> > >
> > > Suggested-by: Michal Hocko <mhocko@...e.com>
> > > Signed-off-by: Suren Baghdasaryan <surenb@...gle.com>
> > > ---
> > > include/linux/mm_inline.h | 18 ++++++++++++++----
> > > mm/madvise.c | 3 +--
> > > 2 files changed, 15 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
> > > index 70b619442d56..b189e2638843 100644
> > > --- a/include/linux/mm_inline.h
> > > +++ b/include/linux/mm_inline.h
> > > @@ -156,15 +156,25 @@ static inline void anon_vma_name_get(struct anon_vma_name *anon_name)
> > >
> > > extern void anon_vma_name_put(struct anon_vma_name *anon_name);
> > >
> > > +static inline
> > > +struct anon_vma_name *anon_vma_name_reuse(struct anon_vma_name *anon_name)
> > > +{
> > > + /* Prevent anon_name refcount saturation early on */
> > > + if (kref_read(&anon_name->kref) < INT_MAX) {
> >
> > REFCOUNT_MAX seems to be defined by the kref framework.
>
> Ah, indeed. I missed that. Will change to use it.
>
> >
> > Other than that looks good to me.
>
> Thanks for the review!
>
> >
> > > + anon_vma_name_get(anon_name);
> > > + return anon_name;
> > > +
> > > + }
> > > + return anon_vma_name_alloc(anon_name->name);
> > > +}
> > > +
> > > static inline void dup_vma_anon_name(struct vm_area_struct *orig_vma,
> > > struct vm_area_struct *new_vma)
> > > {
> > > struct anon_vma_name *anon_name = vma_anon_name(orig_vma);
> > >
> > > - if (anon_name) {
> > > - anon_vma_name_get(anon_name);
> > > - new_vma->anon_name = anon_name;
> > > - }
> > > + if (anon_name)
> > > + new_vma->anon_name = anon_vma_name_reuse(anon_name);
> > > }
> > >
> > > static inline void free_vma_anon_name(struct vm_area_struct *vma)
> > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > index f81d62d8ce9b..a395884aeecb 100644
> > > --- a/mm/madvise.c
> > > +++ b/mm/madvise.c
> > > @@ -122,8 +122,7 @@ static int replace_vma_anon_name(struct vm_area_struct *vma,
> > > if (anon_vma_name_eq(orig_name, anon_name))
> > > return 0;
> > >
> > > - anon_vma_name_get(anon_name);
> > > - vma->anon_name = anon_name;
> > > + vma->anon_name = anon_vma_name_reuse(anon_name);
> > > anon_vma_name_put(orig_name);
> > >
> > > return 0;
> > > --
> > > 2.35.1.473.g83b2b277ed-goog
> >
> > --
> > Michal Hocko
> > SUSE Labs
Powered by blists - more mailing lists