linux-kernel - Re: [PATCH] debug: Deprecate BUG_ON() use in new code, introduce CRASH


Message-ID: <20150608080903.GA1236@gmail.com>
Date:	Mon, 8 Jun 2015 10:09:04 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	Alexander Holler <holler@...oftware.de>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Tejun Heo <htejun@...il.com>,
	Louis Langholtz <lou_langholtz@...com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	trivial@...nel.org, Rusty Russell <rusty@...tcorp.com.au>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH] debug: Deprecate BUG_ON() use in new code, introduce
 CRASH_ON()

* Alexander Holler <holler@...oftware.de> wrote:

> On 08.06.2015 at 09:12, Ingo Molnar wrote:
> >
> >* Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> >
> >>Stop with the random BUG_ON() additions.
> >
> > Yeah, so I propose the attached patch which attempts to resist new BUG_ON() 
> > additions.
> 
> As this reminded me of a flame I once received from a maintainer because I wanted
> to avoid a disastrous memory corruption by using a BUG_ON(), maybe someone
> should mention that a BUG_ON() or now CRASH_ON() should still be preferred over
> some random memory corruption which might lead to worse things. Or what is the
> viewpoint of the kernel masters with regard to memory corruption and the use of
> BUG_ON(), WARN_ON() or CRASH_ON()?

So it depends on the actual change, but there are very few cases where a BUG_ON()
is justified, even if the code detects memory corruption.

Most instances of memory corruption either come from the hardware or come from 
some other piece of code, so _your_ code crashing the system will be unexpected, 
and in most cases unproductive to finding the cause of the corruption.

The best action is to stop doing whatever your code was doing and to try to bail
out while making as few further changes to the system as possible.
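
A minimal userspace sketch of that "warn and bail out" pattern follows. It is
only an illustration: MY_WARN_ON(), struct ring and ring_push() are hypothetical
names made up for this example, standing in for the kernel's WARN_ON() and for
whatever data structure your code happens to manage.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the kernel's WARN_ON(): report the failed
 * condition and return its truth value so the caller can branch on it.
 */
static int my_warn_on(int cond, const char *expr, const char *file, int line)
{
	if (cond)
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	return cond;
}
#define MY_WARN_ON(cond) my_warn_on(!!(cond), #cond, __FILE__, __LINE__)

/* Hypothetical fixed-size ring buffer used only for illustration. */
struct ring {
	size_t head;
	size_t size;
	int slots[8];
};

int ring_push(struct ring *r, int val)
{
	/*
	 * The state looks corrupted: warn, modify nothing, and hand an
	 * error back to the caller instead of crashing the system.
	 */
	if (MY_WARN_ON(r->size != 8 || r->head >= r->size))
		return -EINVAL;

	r->slots[r->head] = val;
	r->head = (r->head + 1) % r->size;
	return 0;
}
```

The point of the sketch is that the corrupted structure is left exactly as it
was found, and the caller decides how to unwind.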

Lockdep's asserts are an example of that. An actual lockdep warning in a
released, production kernel is frequently connected to a real risk of data
corruption - yet what we do is report the bug non-intrusively and turn off
lockdep completely, so that it does not make the situation worse and so that
the messages have a chance of being saved and reported back to kernel developers.
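
The "report once, then switch the checker off" idea can be sketched in userspace
roughly as below. This is not lockdep's actual code: checker_enabled,
checker_off() and check_invariant() are hypothetical names, and the sketch is
single-threaded, whereas a real implementation would have to clear the flag
atomically.

```c
#include <stdio.h>

static int checker_enabled = 1;	/* hypothetical global on/off switch */
static int reports_printed;	/* how many reports were emitted so far */

/* Disable the checker; returns 1 only if it was still enabled. */
static int checker_off(void)
{
	int was_enabled = checker_enabled;

	checker_enabled = 0;
	return was_enabled;
}

/*
 * Returns 1 if nothing was reported (invariant holds, or the checker is
 * already off) and 0 if a violation was detected and reported.
 */
int check_invariant(int holds, const char *what)
{
	if (!checker_enabled || holds)
		return 1;

	/* First violation: print one report, then stay out of the way. */
	if (checker_off()) {
		fprintf(stderr, "WARNING: invariant violated: %s\n", what);
		fprintf(stderr, "turning off the checker\n");
		reports_printed++;
	}
	return 0;
}
```

After the first violation the checker stays silent, so a single underlying bug
produces one report instead of a flood that pushes the useful message out of
the log.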

The origins of widespread BUG_ON() use are twofold:

 - 20 years ago we didn't have much of any locking in the kernel, so a BUG_ON()
   resulted in essence in a graceful segfault of the application that happened to
   trigger it, in most cases. Kernel logs were still possible to retrieve if the
   bug did not trigger too often - and if not (because for example the crash
   happened in the idle thread) then the backtrace was still visible on the VGA
   text console.

 - in the early days we didn't have WARN_ON(), we only had BUG_ON(), so people
   used that. BUG_ON() used to be the 'graceful' assert, panic() was the
   equivalent of CRASH_ON().

These days a BUG_ON() is almost always fatal due to unreleased locks, plus we 
still don't print kernel crashes to the graphical console, so they are silent hard 
lockups in 99% of the cases.

Thanks,

	Ingo