linux-kernel - Re: [PATCH v1] mm/gup: remove (VM_)BUG

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f33f6f73-58bd-4877-a2cc-5436943da292@redhat.com>
Date: Thu, 5 Jun 2025 08:08:24 +0200
From: David Hildenbrand <david@...hat.com>
To: Vlastimil Babka <vbabka@...e.cz>, John Hubbard <jhubbard@...dia.com>,
 linux-kernel@...r.kernel.org
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
 Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
 "Liam R. Howlett" <Liam.Howlett@...cle.com>, Mike Rapoport
 <rppt@...nel.org>, Suren Baghdasaryan <surenb@...gle.com>,
 Michal Hocko <mhocko@...e.com>, Jason Gunthorpe <jgg@...pe.ca>,
 Peter Xu <peterx@...hat.com>
Subject: Re: [PATCH v1] mm/gup: remove (VM_)BUG_ONs

On 05.06.25 07:37, Vlastimil Babka wrote:
> On 6/5/25 03:07, John Hubbard wrote:
>> On 6/4/25 7:05 AM, David Hildenbrand wrote:
>>> Especially once we hit one of the assertions in
>>> sanity_check_pinned_pages(), observing follow-up assertions failing
>>> in other code can give good clues about what went wrong, so use
>>> VM_WARN_ON_ONCE instead.
>>>
>>> While at it, let's just convert all VM_BUG_ON to VM_WARN_ON_ONCE as
>>> well. Add one comment for the pfn_valid() check.
>>
>> It would be a nice touch to add Linus' notes here, with the BUG() history
>> and all. It answers a FAQ about BUG vs. WARN* that is really nice
>> to have in the commit log.
> 
> Perhaps then rather put it somewhere appropriate in Documentation/process/
> than a random commit log?

I mean, I documented most of that already in coding-style.rst. :)

The full BUG history is not in there, but not sure if that is really required if ...
we're not supposed to use it.

Is there anything important missing?


commit 1cfd9d7e43d5a1cf739d1420b10b1e65feb02f88
Author: David Hildenbrand <david@...hat.com>
Date:   Fri Sep 23 13:34:24 2022 +0200

     coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")
     
     Linus notes [1] that the introduction of new code that uses VM_BUG_ON()
     is just as bad as BUG_ON(), because it will crash the kernel on
     distributions that enable CONFIG_DEBUG_VM (like Fedora):
     
         VM_BUG_ON() has the exact same semantics as BUG_ON. It is literally
         no different, the only difference is "we can make the code smaller
         because these are less important". [2]
     
     This resulted in a more generic discussion about usage of BUG() and
     friends. While there might be corner cases that still deserve a BUG_ON(),
     most BUG_ON() cases should simply use WARN_ON_ONCE() and implement a
     recovery path if reasonable:
     
         The only possible case where BUG_ON can validly be used is "I have
         some fundamental data corruption and cannot possibly return an
         error". [2]
     
     As a very good approximation is the general rule:
     
         "absolutely no new BUG_ON() calls _ever_" [2]
     
     ... not even if something really shouldn't ever happen and is merely for
     documenting that an invariant always has to hold. However, there are sill
     exceptions where BUG_ON() may be used:
     
         If you have a "this is major internal corruption, there's no way we can
         continue", then BUG_ON() is appropriate. [3]
     
     There is only one good BUG_ON():
     
         Now, that said, there is one very valid sub-form of BUG_ON():
         BUILD_BUG_ON() is absolutely 100% fine. [2]
     
     While WARN will also crash the machine with panic_on_warn set, that's
     exactly to be expected:
     
         So we have two very different cases: the "virtual machine with good
         logging where a dead machine is fine" - use 'panic_on_warn'. And
         the actual real hardware with real drivers, running real loads by
         users. [4]
     
     The basic idea is that warnings will similarly get reported by users
     and be found during testing. However, in contrast to a BUG(), there is a
     way to actually influence the expected behavior (e.g., panic_on_warn)
     and to eventually keep the machine alive to extract some debug info.
     
     Ingo notes that not all WARN_ON_ONCE cases need recovery. If we don't ever
     expect this code to trigger in any case, recovery code is not really
     helpful.
     
         I'd prefer to keep all these warnings 'simple' - i.e. no attempted
         recovery & control flow, unless we ever expect these to trigger.
         [5]
     
     There have been different rules floating around that were never properly
     documented. Let's try to clarify.
     
     [1] https://lkml.kernel.org/r/CAHk-=wiEAH+ojSpAgx_Ep=NKPWHU8AdO3V56BXcCsU97oYJ1EA@mail.gmail.com
     [2] https://lore.kernel.org/r/CAHk-=wg40EAZofO16Eviaj7mfqDhZ2gVEbvfsMf6gYzspRjYvw@mail.gmail.com
     [3] https://lkml.kernel.org/r/CAHk-=wit-DmhMfQErY29JSPjFgebx_Ld+pnerc4J2Ag990WwAA@mail.gmail.com
     [4] https://lore.kernel.org/r/CAHk-=wgF7K2gSSpy=m_=K3Nov4zaceUX9puQf1TjkTJLA2XC_g@mail.gmail.com
     [5] https://lore.kernel.org/r/YwIW+mVeZoTOxn%2F4@gmail.com
     


-- 
Cheers,

David / dhildenb