linux-kernel - Re: [PATCH] Kbuild: Move -Wmaybe-uninitialized to W=1

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160728164956.GA2865@gmail.com>
Date:	Thu, 28 Jul 2016 18:49:56 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	Borislav Petkov <bp@...en8.de>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Sam Ravnborg <sam@...nborg.org>,
	lkml <linux-kernel@...r.kernel.org>, Michael Matz <matz@...e.de>,
	linux-kbuild@...r.kernel.org, x86-ml <x86@...nel.org>,
	Arnaldo Carvalho de Melo <acme@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [PATCH] Kbuild: Move -Wmaybe-uninitialized to W=1


Firstly, I'd like to stress it that I know that the commit is upstream and there's 
a less than 1% chance for it to be reverted. I'm not arguing with the expectation 
to see it reverted, I'm arguing because I'm not convinced about the underlying 
technical arguments.

Secondly, I'd like to stress it that there are a number of truly crappy GCC 
warnings that not only are obnoxious but also actively create the *wrong* kind of 
'fixes': -Wsign-compare or -fno-strict-aliasing for example.

* Borislav Petkov <bp@...en8.de> wrote:

> On Thu, Jul 28, 2016 at 10:29:15AM +0200, Ingo Molnar wrote:
> > BUT, isn't this the natural state of things, that the 'final' warnings
> > that don't get fixed are the obnoxious, false positive ones - because
> > anyone who looks at them will say "oh crap, idiotic compiler!"?
> 
> Hmm, so my experience is like Linus' - that -Wmaybe thing generates too
> much noise and a lot of false positives. [...]

But it's only the *residual* warnings we get to see in a huge, 1000+ commits per 
week, 20+ MLOC code base, every 3 months.

We are only seeing a very small minority of warnings, and are seeing them again 
and again in the allmodconfig output because nobody is fixing them because 'GCC is 
obviously wrong'.

So I might be wrong about this, but by its very nature this warning, even if it's 
implemented well, has the natural property that 'crappy corner cases coalesce at 
the bottom'.

> > But over the last couple of years I think we probably had hundreds of
> > bugs avoided due to the warning (both at the development and at the
> > integration stage) - and
> 
> Really?

Yes, really - even many of the false positives were useful to me, because they 
flagged questionable coding practices and overly complex code.

> And I've yet to see an example where it actually helped :-\

I think partly because you are used to working on x86 architecture code where we 
take a look at every warning ASAP.

Check out:

 commit 8e8a6b23f115906678252190c8fcf4168cc60fd5
 Author: Hans Verkuil <hverkuil@...all.nl>
 Date:   Tue Jul 28 05:33:55 2015 -0300

    [media] mt9v032: fix uninitialized variable warning

    It can indeed be uninitialized in one corner case. Initialize to NULL.

... we now turn these warnings into false negatives.

This one's a real fix too I think:

  c30400fcffb7 drm/i915: set FDI translations to NULL on SKL

and these were literally the first two random examples I took with a bit of 
grepping.

Plus I'd say that in many cases even if it's a false positive, these warnings 
often flag borderline code complexity bug: code flow should essentially never be 
so complex as to confuse a compiler. If it confuses a compiler it will confuse 9 
coders out of 10.

These two commits:

  3354781a2184 sched/numa: Reflow task_numa_group() to avoid a compiler warning
  719038de98bc x86/intel/cacheinfo: Shut up last long-standing warning

show cases where it flagged actively confusing code.

But these are only those rare warnings that make it upstream - I'd expect more 
than 90% of them to be caught by developers.

> >   commit e01d8718de4170373cd7fbf5cf6f9cb61cebb1e9
> >   Author: Peter Zijlstra <peterz@...radead.org>
> >   Date:   Wed Jan 27 23:24:29 2016 +0100
> > 
> >     perf/x86: Fix uninitialized value usage
> > 
> >   ...
> > 
> >   Only took 6 hours of painful debugging to find this. Neither GCC nor
> >   Smatch warnings flagged this bug.
> 
> So that warning didn't help here either.

Exactly, and now we'll get more such false negatives!

> > ... and my worry here is that we are now telling GCC: "don't you dare
> > generate a false positive warning!" - at which point GCC folks will
> > add even MORE heuristics to avoid false positives that generate even
> > more false negatives
> 
> Why?

Because people react to incentives, and the incentives we are creating here is for 
compiler writers to choose between two options:

1) Solve a really difficult code flow analysis problem and implement the perfect 
   warning detector.

2) Slap even more heuristics on top of existing heuristics to make warnings go
   away.

Guess which one is more likely to be picked?

Now granted, GCC folks will probably not react to 'incentives' created by the 
kernel project (we are one project out of thousands), but still.

> I think we should enable only the real warnings and turn off the stuff which 
> generates a lot of false positives. Or, we could put them behind the -W= switch, 
> so that people can still build the kernel with it but not have them enabled by 
> default.

But that's my point, I believe the false positive rate is pretty low in fact, due 
to three factors:

 - 90% of the warnings get fixed by developers, we never see them upstream

 - I'd say a majority (say 70%) of the remaining warnings are flagging 'complexity 
   bugs'
 
 - only a residual 3% are obnoxious ones.

But these remaining 3% are the ones we are seeing again and again in various 
compiler output, so we tend to get a subjective impression that this warning 
produces countless false positives.

So I *think* the better option would be to do what we are doing in the perf 
tooling: force a build error for these warnings (by default, with an option 
available to make it build). That flushes them out and also makes it sure that 
those questionable sequences of code never get upstream to begin with.

Thanks,

	Ingo