lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHbLzkoB+oFTxtVYpeXQvko2q9HUVzUYrr83S6M6PUmXDQpkag@mail.gmail.com>
Date: Fri, 31 May 2024 17:59:22 -0700
From: Yang Shi <shy828301@...il.com>
To: Peter Xu <peterx@...hat.com>
Cc: David Hildenbrand <david@...hat.com>, kernel test robot <oliver.sang@...el.com>, 
	Jason Gunthorpe <jgg@...dia.com>, Vivek Kasireddy <vivek.kasireddy@...el.com>, 
	Rik van Riel <riel@...riel.com>, oe-lkp@...ts.linux.dev, lkp@...el.com, 
	linux-kernel@...r.kernel.org, Andrew Morton <akpm@...ux-foundation.org>, 
	Matthew Wilcox <willy@...radead.org>, Christopher Lameter <cl@...ux.com>, linux-mm@...ck.org
Subject: Re: [linus:master] [mm] efa7df3e3b: kernel_BUG_at_include/linux/page_ref.h

On Fri, May 31, 2024 at 5:01 PM Yang Shi <shy828301@...il.com> wrote:
>
> On Fri, May 31, 2024 at 4:25 PM Peter Xu <peterx@...hat.com> wrote:
> >
> > On Fri, May 31, 2024 at 07:46:41PM +0200, David Hildenbrand wrote:
> > > try_grab_folio()->try_get_folio()->folio_ref_try_add_rcu()
> > >
> > > Is called (mm-unstable) from:
> > >
> > > (1) gup_fast function, here IRQs are disable
> > > (2) gup_hugepte(), possibly problematic
> > > (3) memfd_pin_folios(), possibly problematic
> > > (4) __get_user_pages(), likely problematic
> > >
> > > (1) should be fine.
> > >
> > > (2) is possibly problematic on the !fast path. If so, due to commit
> > >     a12083d721d7 ("mm/gup: handle hugepd for follow_page()") ? CCing Peter.
> > >
> > > (3) is possibly wrong. CCing Vivek.
> > >
> > > (4) is what we hit here
> >
> > I guess it was overlooked because try_grab_folio() didn't have any comment
> > or implication on RCU or IRQ internal helpers being used, hence a bit
> > confusing.  E.g. it has different context requirement on try_grab_page(),
> > even though they look like sister functions.  It might be helpful to have a
> > better name, something like try_grab_folio_rcu() in this case.
> >
> > Btw, none of above cases (2-4) have real bug, but we're looking at some way
> > to avoid triggering the sanity check, am I right?  I hope besides the host
> > splash I didn't overlook any other side effect this issue would cause, and
> > the splash IIUC should so far be benign, as either gup slow (2,4) or the
> > newly added memfd_pin_folios() (3) look like to have the refcount stablized
> > anyway.
> >
> > Yang's patch in the other email looks sane to me, just that then we'll add
> > quite some code just to avoid this sanity check in paths 2-4 which seems
> > like an slight overkill.
> >
> > One thing I'm thinking is whether folio_ref_try_add_rcu() can get rid of
> > its RCU limitation. It boils down to whether we can use atomic_add_unless()
> > on TINY_RCU / UP setup too?  I mean, we do plenty of similar things
> > (get_page_unless_zero, etc.) in generic code and I don't understand why
> > here we need to treat folio_ref_try_add_rcu() specially.
> >
> > IOW, the assertions here we added:
> >
> >         VM_BUG_ON(!in_atomic() && !irqs_disabled());
> >
> > Is because we need atomicity of below sequences:
> >
> >         VM_BUG_ON_FOLIO(folio_ref_count(folio) == 0, folio);
> >         folio_ref_add(folio, count);
> >
> > But atomic ops avoids it.
>
> Yeah, I didn't think of why atomic can't do it either. But is it
> written in this way because we want to catch the refcount == 0 case
> since it means a severe bug? Did we miss something?

Thought more about it and disassembled the code. IIUC, this is an
optimization for non-SMP kernel. When in rcu critical section or irq
is disabled, we just need an atomic add instruction.
folio_ref_add_unless() would yield more instructions, including branch
instruction. But I'm wondering how useful it would be nowadays. Is it
really worth the complexity? AFAIK, for example, ARM64 has not
supported non-SMP kernel for years.

My patch actually replaced all folio_ref_add_unless() to
folio_ref_add() for slow paths, so it is supposed to run faster, but
we are already in slow path, it may be not measurable at all. So
having more simple and readable code may outweigh the potential slight
performance gain in this case?

>
> >
> > This chunk of code was (mostly) originally added in 2008 in commit
> > e286781d5f2e ("mm: speculative page references").
> >
> > In short, I'm wondering whether something like below would make sense and
> > easier:
> >
> > ===8<===
> > diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
> > index 1acf5bac7f50..c89a67d239d1 100644
> > --- a/include/linux/page_ref.h
> > +++ b/include/linux/page_ref.h
> > @@ -258,26 +258,9 @@ static inline bool folio_try_get(struct folio *folio)
> >         return folio_ref_add_unless(folio, 1, 0);
> >  }
> >
> > -static inline bool folio_ref_try_add_rcu(struct folio *folio, int count)
> > -{
> > -#ifdef CONFIG_TINY_RCU
> > -       /*
> > -        * The caller guarantees the folio will not be freed from interrupt
> > -        * context, so (on !SMP) we only need preemption to be disabled
> > -        * and TINY_RCU does that for us.
> > -        */
> > -# ifdef CONFIG_PREEMPT_COUNT
> > -       VM_BUG_ON(!in_atomic() && !irqs_disabled());
> > -# endif
> > -       VM_BUG_ON_FOLIO(folio_ref_count(folio) == 0, folio);
> > -       folio_ref_add(folio, count);
> > -#else
> > -       if (unlikely(!folio_ref_add_unless(folio, count, 0))) {
> > -               /* Either the folio has been freed, or will be freed. */
> > -               return false;
> > -       }
> > -#endif
> > -       return true;
> > +static inline bool folio_ref_try_add(struct folio *folio, int count)
> > +{
> > +       return folio_ref_add_unless(folio, count, 0);
> >  }
> >
> >  /**
> > @@ -305,7 +288,7 @@ static inline bool folio_ref_try_add_rcu(struct folio *folio, int count)
> >   */
> >  static inline bool folio_try_get_rcu(struct folio *folio)
> >  {
> > -       return folio_ref_try_add_rcu(folio, 1);
> > +       return folio_ref_try_add(folio, 1);
> >  }
> >
> >  static inline int page_ref_freeze(struct page *page, int count)
> > diff --git a/mm/gup.c b/mm/gup.c
> > index e17466fd62bb..17f89e8d31f1 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -78,7 +78,7 @@ static inline struct folio *try_get_folio(struct page *page, int refs)
> >         folio = page_folio(page);
> >         if (WARN_ON_ONCE(folio_ref_count(folio) < 0))
> >                 return NULL;
> > -       if (unlikely(!folio_ref_try_add_rcu(folio, refs)))
> > +       if (unlikely(!folio_ref_try_add(folio, refs)))
> >                 return NULL;
> >
> >         /*
> > ===8<===
> >
> > So instead of adding new code, we fix it by removing some.  There might be
> > some implication on TINY_RCU / UP config on using the atomic_add_unless()
> > to replace one atomic_add(), but I'm not sure whether that's a major issue.
> >
> > The benefit is try_get_folio() then don't need a renaming then, because the
> > rcu implication just went away.
> >
> > Thanks,
> >
> > --
> > Peter Xu
> >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ