linux-kernel - Re: [PATCH] firmware: dmi_scan: Drop dmi

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Sun, 24 Sep 2017 11:16:25 +0200
From:   Ingo Molnar <mingo@...nel.org>
To:     Jean Delvare <jdelvare@...e.de>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] firmware: dmi_scan: Drop dmi_initialized

* Jean Delvare <jdelvare@...e.de> wrote:

> Hi Ingo,
> 
> On Sat, 23 Sep 2017 12:50:31 +0200, Ingo Molnar wrote:
> > * Jean Delvare <jdelvare@...e.de> wrote:
> > 
> > > I don't think it makes sense to check for a possible bad
> > > initialization order at run time on every system when it is all
> > > decided at build time.
> > > 
> > > A more efficient way to make sure developers do not introduce new
> > > calls to dmi_check_system() too early in the initialization sequence
> > > is to simply document the expected call order. That way, developers
> > > have a chance to get it right immediately, without having to
> > > test-boot their kernel, wonder why it does not work, and parse the
> > > kernel logs for a warning message. And we get rid of the run-time
> > > performance penalty as a nice side effect.  
> > 
> > Huh? Initialization ordering requirements are very opaque,
> 
> They were. Now they are very documented.
> 
> > and by removing the debug check any such bugs are actively hidden. How
> > is documentation supposed to uncover such bugs once they happen?
> 
> You are looking at it the wrong way around. Documentation is how they
> do not happen in the first place.

That expectation, as a general statement, is very naive and contrary to 
experience: documentation is fine for one layer of defense to prevent bugs, but 
_when_ they happen and a bug slips through, documentation does not help anymore, 
because the dependencies in the _code_ are opaque and non-obvious ...

For example during the early SMP efforts of Linux we used to document lock 
dependencies as well, but once the kernel had more than a dozen spinlocks we 
periodically ran into deadlocks and the whole design became unmaintainable 
quickly. So we have lockdep in addition to documentation.

> You hit this problem once, 9 years ago. You thought it would have been easier to 
> debug if there was a warning, and you added it.

I did not just 'think' it would have been easier to debug, I wasted time on that 
bug and a warning would have helped so I added it. That was and remains 
objectively true.

While I expect most such warnings to never see any public email lists (because 
once a developer triggers it it gets fixed without the bug ever getting triggered 
by others), yet searching for "dmi check: not initialized yet" still finds a 
couple of incidents where real or potential bugs were found by this init 
dependency check, such as:

   http://lists.infradead.org/pipermail/linux-arm-kernel/2014-September/289347.html

or this:

   https://www.spinics.net/lists/linux-acpi/msg28698.html

... so this warning actually helped a number of kernel developers to not
waste time on the opaque dependency. This is a warning that was added due
to an _actual category of bugs_, which has been triggered subsequently as
well, so it's not a frivolous warning by any meaning.

> [...] It was one way to solve the problem but I claim it was not the best.
> 
> What I expect from developers calling a function they aren't familiar
> with is to read its documentation first. That's the very reason why we
> spend time writing the documentation. They should not just call the
> function, boot and see if it works or not. Software engineering vs.
> trial and error.

This statement is breathtaking in its ignorance :-(

> > So NAK.
> 
> This was FYI. I maintain this subsystem, and you did not convince me. I also 
> can't see a general trend of implementing what you suggest in the rest of the 
> kernel. Thankfully.

I find the arrogance displayed here breathtaking as well - the last thing we need 
is for firmware interfacing kernel code to become _more_ fragile.

This was and continues to be a useful warning - but what worries me even more is 
not just the removal of the warning, but the false and technically invalid 
justifications under which it is removed...

For those reasons I maintain my NAK:

  Nacked-by: Ingo Molnar <mingo@...nel.org>

Thanks,

	Ingo