Message-ID: <5242fca8-4a17-4ff8-a624-08778fc64f19@lucifer.local>
Date: Thu, 10 Jul 2025 18:02:07 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Alexey Dobriyan <adobriyan@...il.com>
Cc: akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, David Hildenbrand <david@...hat.com>,
        "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
        Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>
Subject: Re: [PATCH] mm: implement "memory.oops_if_bad_pte=1" boot option

OK I wasn't clear enough I guess - NAK.

This is not upstreamable, nor anything like it.

On Thu, Jul 10, 2025 at 07:57:00PM +0300, Alexey Dobriyan wrote:
> On Thu, Jul 10, 2025 at 05:16:52PM +0100, Lorenzo Stoakes wrote:
> > Sorry but no - this seems to me to just be a hack. And it also appears to
> > violate the rules on BUG_ON() (see [0]) so this is just a no.
> >
> > [0]:https://lore.kernel.org/linux-mm/CAHk-=wjO1xL_ZRKUG_SJuh6sPTQ-6Lem3a3pGoo26CXEsx_w0g@mail.gmail.com/
> >
> > On Wed, Jul 09, 2025 at 09:10:59PM +0300, Alexey Dobriyan wrote:
> > > Implement
> > >
> > > 	memory.oops_if_bad_pte=1
> >
> > This is a totally new paradigm afaict - introducing an oops based on user
> > input, I really don't think that's sensible.
> >
> > Unless kernel.panic_on_oops is set this won't necessarily cause anything to
> > halt. Really you want a panic_on_bad_pte here, but that would be way way
> > too specific.
> >
> > So it seems like a hack just so you can get a vmcore?
> >
> > You seem to be using BUG_ON() to _maybe_ cause a panic, maybe not, but by
> > doing this you're inferring that there's unrecoverable system instability,
> > which is clearly not the case.
> >
> > All of the bad PTE handling seems to be intended to be recoverable and
> > handled by calling code.
> >
> > Additionally we have uses like zap_present_folio_ptes() which use it to
> > output PTE state in the instance of an invalid mapcount value - I don't
> > think oopsing there would really be what you wanted right?
> >
> > >
> > > boot option which oopses the machine instead of dreadful
> > >
> > > 	BUG: Bad page map in process
> > >
> > > message.
> >
> > I'm not sure what's so dreadful about it?
>
> Because the root cause is unknown, happened at unknown time, dmesg
> rotated away and nobody bothered to coredump the machine because it
> didn't oops!
>
> > And why is an oops better?
>
> I apologize for stating the obvious but the less time between the bug
> and coredump collection the better.
>
> > > This is intended
> > > for people who want to panic at the slightest provocation and
> > > for people who ruled out hardware problems which in turn means that
> > > delaying vmcore collection is counter-productive.
> >
> > Seems to be a specific edge case.
>
> Yes, but the option is not enabled by default and costs 2 instructions
> on the coldest code path, so...
>
> > > Linux doesn't panic (and never has?) on PTE corruption, and it even
> > > rate-limits the message, meaning it can go for minutes or even hours
> > > without anyone noticing, which is exactly the opposite of what should
> > > be done to facilitate debugging.
> >
> > But are there real situations you can cite where this has been problematic?
> >
> > >
> > > Not enabled by default.
> >
> > Yeah, obviously.
> >
> > >
> > > Not advertised.
> >
> > Umm why? Seems like you just want to add this for your own very specific
> > purpose?
>
> Sort of, I don't want to patch and unpatch things every time.
>
> > > +/*
> > > + * Oops instead of printing "Bad page map in process" message and
> > > + * trying to continue.
> > > + */
> > > +static bool oops_if_bad_pte __ro_after_init = false;
> > > +module_param(oops_if_bad_pte, bool, 0444);
> > > +
> > >  /*
> > >   * This function is called to print an error when a bad pte
> > >   * is found. For example, we might have a PFN-mapped pte in
> > > @@ -490,6 +498,13 @@ static inline void add_mm_rss_vec(struct mm_struct *mm, int *rss)
> > >  static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
> > >  			  pte_t pte, struct page *page)
> > >  {
> > > +	/*
> > > +	 * This line is a formality to collect vmcore ASAP. Real bug
> > > +	 * (hardware or software) happened earlier, current registers and
> > > +	 * backtrace aren't interesting.
> > > +	 */
> > > +	BUG_ON(oops_if_bad_pte);
> >
> > Except that it won't without panic_on_oops?
>
> Yes, I'll update the comment. It is supposed to be used with
> panic_on_oops=1 for maximum effect.
>
> > I mean we can't just go around putting arbitrary BUG_ON()'s like this for
> > cases we want data on.
>
> Yes, we can!
>
> > And far worse here - this is a print_xxx() function, and you're making it
> > oops? That's really bad.
>
> It's fine because it is a conditional BUG_ON().
>
> > Note that other page table levels can be 'bad' as well, see pgd_bad() et
> > al. - none of these will be caught.
>
> Sure, I didn't think much about spreading this option to other places.
> It can be spread independently.
>
> > Overall I suspect there's one single case you're worried about, that really
> > you want to put a WARN_ON_ONCE() against - then you can panic_on_warn and
> > get what you want.
>
> Ehh, no. WARN is for home users who can maybe photograph the oops or fish
> it out of dmesg and file a bug report -- so that the system survives until
> their data are flushed to disk.
>
> I suspect users are very bifurcated: some want to panic always, some
> want to panic during QA but not in the field, and then there are users
> whose only hope is cellphone camera.
>
> > If you can make an argument in favour of this that's convincing then that
> > would be a potentially upstreamable patch, but this one isn't, in my view.
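
To make the WARN_ON_ONCE() plus panic_on_warn suggestion quoted above concrete, here is a rough userspace model of it. This is only a sketch: in the kernel WARN_ON_ONCE() is a macro and panic_on_warn is an existing sysctl/boot parameter, whereas every model_* and sim_* name below is hypothetical, and the "panic" is merely counted rather than halting anything.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Userspace model only. Everything named model_* or sim_* is hypothetical
 * and exists purely to illustrate the control flow.
 */
static bool sim_panic_on_warn;	/* stand-in for the panic_on_warn knob */
static int sim_warnings;	/* warnings actually printed */
static int sim_panics;		/* panics that would have been triggered */

/* Print one warning; count a panic if the policy knob is set. */
static void model_warn(const char *what)
{
	sim_warnings++;
	fprintf(stderr, "WARNING: %s\n", what);
	if (sim_panic_on_warn)
		sim_panics++;	/* the real kernel would panic here */
}

/*
 * Warn the first time cond is true for a given call site (tracked by
 * *warned), stay silent afterwards, and always return cond so that the
 * caller can still take its recovery path.
 */
static bool model_warn_on_once(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		model_warn(what);
	}
	return cond;
}
```

With sim_panic_on_warn left false, a bad value produces a single log line and the caller carries on. Setting it makes the same call site fatal on first hit, so whether to halt stays a system-wide policy decision rather than a new per-check boot option.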