Message-ID: <20161005190604.GA8116@1wt.eu>
Date:   Wed, 5 Oct 2016 21:06:04 +0200
From:   Willy Tarreau <w@....eu>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Paul Gortmaker <paul.gortmaker@...driver.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Antonio SJ Musumeci <trapexit@...wn.link>,
        Miklos Szeredi <miklos@...redi.hu>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        stable <stable@...r.kernel.org>
Subject: Re: BUG_ON() in workingset_node_shadows_dec() triggers

On Wed, Oct 05, 2016 at 08:52:54AM -0700, Linus Torvalds wrote:
> On Tue, Oct 4, 2016 at 10:44 PM, Willy Tarreau <w@....eu> wrote:
> >
> > I think instead we should completely remove any simple way to halt the
> > system and document how to do it.
> 
> Having slept on it, I suspect you're right. I worry about some
> BUG_ON() that really relies on the killing behavior, but if it takes a
> "real" fault later, that is when it gets killed. And on the whole,
> we've had lots of problems with the killing behavior over the years,
> so we should just try switching BUG_ON() over to non-fatal. It's
> unlikely to be worse than what we have now, as exemplified by this
> event.

I have the same doubts, so I would not want to run the "sed" immediately,
at least to keep the initial intent. But I think that while everyone is
right in his own yard when he puts a BUG_ON() where he doesn't know how
to handle an unsafe situation, he's wrong from a global perspective.

For example, it could be seen as safe to crash the system in a filesystem
driver to protect against the risk of data corruption resulting from an
impossible condition, but when this happens due to a dirty FS on a USB
stick that a person inserts on the PC to save her work, actually the
BUG_ON() is the one responsible for the data loss. Even something as
painful as leaving a process in D state in this situation would have
been cleaner as it would let the admin reboot when he wants and not
have to experience it at the worst moment.

I've already met 100% reproducible panics that I never had the time to
investigate (one involving running an mmap-based hex editor on /dev/mem,
and the other one doing stupid things with mount --move), and I'm sure
once I find the cause I'll see a BUG_ON() that should have been a warning.

I'm pretty sure there are historically valid BUG_ON() that are probably
not needed anymore just like I'm also convinced that some of them are
hard to get rid of. Maybe at least having the same as WARN_ON() but
prepending the dump with a message saying "you encountered a critical
bug which should have crashed the kernel, you must absolutely report it"
would help at the beginning.
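
To make that last idea concrete, here is a minimal userspace sketch of it,
not kernel code. It assumes only what is said above: behave like WARN_ON()
(report, hand the condition back to the caller, keep running) but print the
"critical bug" message first. The names SOFT_BUG_ON(), soft_bug_report(),
soft_bug_count and shadows_dec() are made up for illustration, and
fprintf() stands in for the stack dump the kernel would produce.

```c
#include <stdio.h>

/* Hypothetical counter so a caller can see how often the check fired. */
static int soft_bug_count;

/*
 * Userspace stand-in for the reporting path. In the kernel, WARN_ON()
 * would dump a stack trace here; this sketch only prints to stderr.
 */
static int soft_bug_report(int cond, const char *expr,
			   const char *file, int line)
{
	if (cond) {
		soft_bug_count++;
		fprintf(stderr,
			"you encountered a critical bug which should have "
			"crashed the kernel, you must absolutely report it\n");
		fprintf(stderr, "SOFT_BUG_ON(%s) at %s:%d\n",
			expr, file, line);
	}
	return cond;
}

/*
 * SOFT_BUG_ON() is a made-up name. Like WARN_ON(), it evaluates to the
 * truth value of the condition, so the caller can bail out instead of
 * halting the machine.
 */
#define SOFT_BUG_ON(cond) \
	soft_bug_report(!!(cond), #cond, __FILE__, __LINE__)

/*
 * Hypothetical caller: refuse to decrement a counter below zero and
 * return an error instead of crashing.
 */
static int shadows_dec(unsigned int *shadows)
{
	if (SOFT_BUG_ON(*shadows == 0))
		return -1;
	(*shadows)--;
	return 0;
}
```

With a counter at zero, shadows_dec() prints the two report lines, leaves
the counter untouched and returns -1, so the "impossible" state is made
loud without taking the whole system down.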

Cheers,
Willy
