Message-ID: <8fc9b4e3-55d2-48e2-a9ad-4f21dc283f35@lucifer.local>
Date: Wed, 4 Jun 2025 16:44:11 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: David Hildenbrand <david@...hat.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
        Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
        Jason Gunthorpe <jgg@...pe.ca>, John Hubbard <jhubbard@...dia.com>,
        Peter Xu <peterx@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH v1] mm/gup: remove (VM_)BUG_ONs

On Wed, Jun 04, 2025 at 04:58:25PM +0200, David Hildenbrand wrote:
> On 04.06.25 16:48, Lorenzo Stoakes wrote:
> > +Linus in case he has an opinion about BUG_ON() in general...
> >
> > On Wed, Jun 04, 2025 at 04:05:44PM +0200, David Hildenbrand wrote:
> > > Especially once we hit one of the assertions in
> > > sanity_check_pinned_pages(), observing follow-up assertions failing
> > > in other code can give good clues about what went wrong, so use
> > > VM_WARN_ON_ONCE instead.
> >
> > I guess the situation where you'd actually want a BUG_ON() is one where
> > carrying on might cause further corruption so you just want things to stop.
>
> Yes. Like, serious data corruption would be avoidable.

Yeah, I just wonder how often that is ever reliably the case...

>
> >
> > But usually we're already pretty screwed if the thing happened right? So
> > it's rare if ever that this would be legit?
> >
> > Linus's point of view is that we shouldn't use them _at all_ right? So
> > maybe even this situation isn't one where we'd want to use one?
>
> I think the grey zone is actual data corruption. But one has to have a
> pretty good reason to use a BUG_ON and not a WARN_ON_ONCE() + recovery.

Right.
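
To make the "WARN_ON_ONCE() + recovery" pattern concrete, here is a minimal
userspace sketch. It is not kernel code: SKETCH_WARN_ON_ONCE() and
sketch_pin_pages() are hypothetical names made up for illustration, the
warning is just an fprintf(), and it relies on GNU statement expressions
(gcc/clang), as the kernel macros do.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Userspace stand-in for WARN_ON_ONCE() (hypothetical, illustration only):
 * prints a warning the first time cond is true at this call site, and
 * evaluates to the truth value of cond on every call.
 */
#define SKETCH_WARN_ON_ONCE(cond) ({					\
	static bool sketch_warned;					\
	bool sketch_ret = !!(cond);					\
									\
	if (sketch_ret && !sketch_warned) {				\
		sketch_warned = true;					\
		fprintf(stderr, "WARNING: %s at %s:%d\n",		\
			#cond, __FILE__, __LINE__);			\
	}								\
	sketch_ret;							\
})

/*
 * Hypothetical caller: on a "should never happen" condition, warn once
 * and return an error so the caller can recover, instead of halting the
 * way BUG_ON() would.
 */
static int sketch_pin_pages(long nr_pages)
{
	if (SKETCH_WARN_ON_ONCE(nr_pages < 0))
		return -22;	/* -EINVAL */
	return 0;
}
```

Calling sketch_pin_pages(-1) repeatedly returns the error every time but
prints the warning only once.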

>
> >
> > >
> > > While at it, let's just convert all VM_BUG_ON to VM_WARN_ON_ONCE as
> > > well. Add one comment for the pfn_valid() check.
> >
> > Yeah VM_BUG_ON() is just _weird_. Maybe we should get rid of all of them
> > full stop?
>
> That's my thinking as well.

:)

>
> >
> > >
> > > We have to introduce VM_WARN_ON_ONCE_VMA() to make that fly.
> >
> > I checked the implementation vs. the other VM_WARN_ON_ONCE_*()'s and it
> > looks good.
> >
> > I wonder if we can find a way to not duplicate this code... but one for a
> > follow up I think :>)
> >
> > >
> > > Drop the BUG_ON after mmap_read_lock_killable(), if that ever returns
> > > something > 0 we're in bigger trouble. Convert the other BUG_ON's into
> > > VM_WARN_ON_ONCE as well, they are in a similar domain "should never
> > > happen", but more reasonable to check for during early testing.
> > >
> > > Cc: Andrew Morton <akpm@...ux-foundation.org>
> > > Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> > > Cc: "Liam R. Howlett" <Liam.Howlett@...cle.com>
> > > Cc: Vlastimil Babka <vbabka@...e.cz>
> > > Cc: Mike Rapoport <rppt@...nel.org>
> > > Cc: Suren Baghdasaryan <surenb@...gle.com>
> > > Cc: Michal Hocko <mhocko@...e.com>
> > > Cc: Jason Gunthorpe <jgg@...pe.ca>
> > > Cc: John Hubbard <jhubbard@...dia.com>
> > > Cc: Peter Xu <peterx@...hat.com>
> > > Signed-off-by: David Hildenbrand <david@...hat.com>
> >
> > LGTM so,
> >
> > Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> >
> >
>
> Thanks!
>
> > One nit below.
> >
> > > ---
> > >
> > > Wanted to do this for a long time, but my todo list keeps growing ...
> >
> > Sounds familiar :) The merge window is a chance to do some of these things...
> >
> > >
> > > Based on mm/mm-unstable
> > >
> > > ---
> > >   include/linux/mmdebug.h | 12 ++++++++++++
> > >   mm/gup.c                | 41 +++++++++++++++++++----------------------
> > >   2 files changed, 31 insertions(+), 22 deletions(-)
> > >
> > > diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
> > > index a0a3894900ed4..14a45979cccc9 100644
> > > --- a/include/linux/mmdebug.h
> > > +++ b/include/linux/mmdebug.h
> > > @@ -89,6 +89,17 @@ void vma_iter_dump_tree(const struct vma_iterator *vmi);
> > >   	}								\
> > >   	unlikely(__ret_warn_once);					\
> > >   })
> > > +#define VM_WARN_ON_ONCE_VMA(cond, vma)		({			\
> > > +	static bool __section(".data..once") __warned;			\
> > > +	int __ret_warn_once = !!(cond);					\
> > > +									\
> > > +	if (unlikely(__ret_warn_once && !__warned)) {			\
> > > +		dump_vma(vma);						\
> > > +		__warned = true;					\
> > > +		WARN_ON(1);						\
> > > +	}								\
> > > +	unlikely(__ret_warn_once);					\
> > > +})
> >
> > An aside, I wonder if we could somehow make this generic for various
> > WARN_ON_ONCE()'s?
>
> Yeah, probably. Maybe it will get .... ugly :)
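
One possible shape for a generic version, purely as a userspace sketch: a
single warn-once macro that takes the dump function as a parameter, with the
per-object variants as thin wrappers. Everything here is an assumption for
illustration (SKETCH_WARN_ON_ONCE_DUMP(), struct sketch_vma,
sketch_dump_vma() and sketch_check_vma() are hypothetical, and fprintf()
stands in for WARN_ON(1)); it is not a proposal for the actual mmdebug.h
code. It uses GNU statement expressions (gcc/clang).

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical generic helper: on the first failure at this call site,
 * dump the object via dump_fn, mark the site as warned, then warn.
 * Evaluates to the truth value of cond on every call.
 */
#define SKETCH_WARN_ON_ONCE_DUMP(cond, dump_fn, obj) ({		\
	static bool sketch_warned;					\
	bool sketch_ret = !!(cond);					\
									\
	if (sketch_ret && !sketch_warned) {				\
		dump_fn(obj);						\
		sketch_warned = true;					\
		fprintf(stderr, "WARNING: %s\n", #cond);		\
	}								\
	sketch_ret;							\
})

/* Toy stand-in for a VMA; only the fields this sketch needs. */
struct sketch_vma {
	unsigned long start;
	unsigned long end;
};

/* Counts dumps so the once-only behaviour can be observed. */
static int sketch_dump_count;

static void sketch_dump_vma(const struct sketch_vma *vma)
{
	sketch_dump_count++;
	fprintf(stderr, "vma %#lx-%#lx\n", vma->start, vma->end);
}

/* A per-object variant becomes a one-line wrapper. */
#define SKETCH_WARN_ON_ONCE_VMA(cond, vma)				\
	SKETCH_WARN_ON_ONCE_DUMP(cond, sketch_dump_vma, vma)

/* Hypothetical check with a single call site: warns if the range is empty. */
static bool sketch_check_vma(const struct sketch_vma *vma)
{
	return SKETCH_WARN_ON_ONCE_VMA(vma->start >= vma->end, vma);
}
```

The once-only state is a static local, so it is per call site, same as in
the existing macros. Whether this ends up less ugly than the duplication
depends on how different the dump helpers' signatures are.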
>
> >
> > >   #define VM_WARN_ON_VMG(cond, vmg)		({			\
> > >   	int __ret_warn = !!(cond);					\
> > >   									\
> > > @@ -115,6 +126,7 @@ void vma_iter_dump_tree(const struct vma_iterator *vmi);
> > >   #define VM_WARN_ON_FOLIO(cond, folio)  BUILD_BUG_ON_INVALID(cond)
> > >   #define VM_WARN_ON_ONCE_FOLIO(cond, folio)  BUILD_BUG_ON_INVALID(cond)
> > >   #define VM_WARN_ON_ONCE_MM(cond, mm)  BUILD_BUG_ON_INVALID(cond)
> > > +#define VM_WARN_ON_ONCE_VMA(cond, vma)  BUILD_BUG_ON_INVALID(cond)
> > >   #define VM_WARN_ON_VMG(cond, vmg)  BUILD_BUG_ON_INVALID(cond)
> > >   #define VM_WARN_ONCE(cond, format...) BUILD_BUG_ON_INVALID(cond)
> > >   #define VM_WARN(cond, format...) BUILD_BUG_ON_INVALID(cond)
> > > diff --git a/mm/gup.c b/mm/gup.c
> > > index e065a49842a87..3c3931fcdd820 100644
> > > --- a/mm/gup.c
> > > +++ b/mm/gup.c
> > > @@ -64,11 +64,11 @@ static inline void sanity_check_pinned_pages(struct page **pages,
> > >   		    !folio_test_anon(folio))
> > >   			continue;
> > >   		if (!folio_test_large(folio) || folio_test_hugetlb(folio))
> > > -			VM_BUG_ON_PAGE(!PageAnonExclusive(&folio->page), page);
> > > +			VM_WARN_ON_ONCE_PAGE(!PageAnonExclusive(&folio->page), page);
> > >   		else
> > >   			/* Either a PTE-mapped or a PMD-mapped THP. */
> > > -			VM_BUG_ON_PAGE(!PageAnonExclusive(&folio->page) &&
> > > -				       !PageAnonExclusive(page), page);
> > > +			VM_WARN_ON_ONCE_PAGE(!PageAnonExclusive(&folio->page) &&
> > > +					     !PageAnonExclusive(page), page);
> >
> > Nit but wouldn't VM_WARN_ON_ONCE_FOLIO() work better here?
>
> No, we want the actual problematic page here, as that can give us clues what
> is going wrong.

Ah yeah... didn't notice we're checking both folio and
page... PageAnonExclusive() seems to be a weird beast:

	/*
	 * HugeTLB stores this information on the head page; THP keeps it per
	 * page
	 */

But anyway I'm digressing :)

>
> For the small-folio case above we could use it, though.

Ack, no big deal though.

>
> --
> Cheers,
>
> David / dhildenb
>
