Message-Id: <f6eb011b-40fb-409a-b2b2-a09d0e770bbd@app.fastmail.com>
Date: Fri, 21 Feb 2025 18:15:30 +0100
From: "Arnd Bergmann" <arnd@...db.de>
To: "Andy Shevchenko" <andriy.shevchenko@...ux.intel.com>
Cc: "Raag Jadav" <raag.jadav@...el.com>,
 "Greg Kroah-Hartman" <gregkh@...uxfoundation.org>,
 "Dave Airlie" <airlied@...il.com>, "Simona Vetter" <simona@...ll.ch>,
 dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 0/2] Cleanup io.h

On Fri, Feb 21, 2025, at 17:50, Andy Shevchenko wrote:
> On Fri, Feb 21, 2025 at 11:15:47AM +0100, Arnd Bergmann wrote:
>> As you already found, removing an old indirect #include that is
>> no longer needed usually leads to some files breaking. The more
>> impactful your change is in terms of build speed, the more
>> things break! I think in this case, removing linux/err.h and
>> linux/bug.h made very little difference because they are very
>> small files in terms of what else they include.
>
> While this is all true, removing unneeded inclusions can occasionally
> lead to "extra work with little gain". When there is a replacement for
> the low-level ones, it's also an improvement in my opinion and won't
> be harmful in the future. But I agree that the stuff is way too
> tangled already and requires enormous work to untangle, even if done
> structurally.

The problem I see with prematurely applying small improvements like this
one is that they always cause build regressions, at least if the change
is any good. If we can find some more impactful changes like this one,
we can group them together in a branch and test them a lot better before
they even reach linux-next.

I mainly want to avoid people getting angry at Raag for repeatedly
breaking their subsystems by pushing small patches one at a time.

> Do you have your scripts for the showed statistics being published somewhere?

I had a good set of scripts on an older machine and might still
have some backups of that somewhere, but today I just hacked up
something ad hoc based on what I remembered from that time. Here
are the snippets that you might find useful.

A patch to Kbuild to create a list of each included header for each
object file built in a given configuration (similar to the .filename.o.d
files, but in a format I found more convenient):

--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -307,7 +307,8 @@ cmd_ld_single = $(if $(objtool-enabled)$(is-single-obj-m), ; $(LD) $(ld_flags) -
 endif
 
 quiet_cmd_cc_o_c = CC $(quiet_modtag)  $@
-      cmd_cc_o_c = $(CC) $(c_flags) -c -o $@ $< \
+      cmd_cc_o_c = $(CC) $(c_flags) -c -o $@ $< ; \
+                   $(CC) $(c_flags) -E -o - $< | grep ^\#.*include | cut -f 2 -d\" | sort -u > $@...cludes \
                $(cmd_ld_single) \
                $(cmd_objtool)
 
Shell one-liner to find the header files that are most commonly
included from those files:

$ find -name \*includes | xargs cat | sort | uniq -c | sed -e 's:\./\|\././::g' | sort -rn | head -n 1000 > mostincluded

One-liner to preprocess each of those headers:

$ cat mostincluded | grep include/linux | while read a i ; do gcc -E $i -o ${i%.h}.i ${GCCARGS} ; done

One-liner to sort by the product of include count and line count:

$ cat mostincluded | grep include/linux/ | while read a b ; do if [ -e ${b%.h}.i ] ; then echo $a `wc -l ${b%.h}.i` ; fi ; done | sort -n -k2 | while read a b c ; do echo $[$a * $b] $a $b $c ; done | sort -nr > fulllist
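The two ranking steps can also be folded into one function; this is
only a sketch of the same idea (rank_headers is a name I made up here,
and it weights by the header's raw line count rather than the
preprocessed .i size used above):

```shell
# rank_headers DIR: rank headers under DIR by (times included) * (line
# count), approximating the include-count * size metric from the
# one-liners above. Hypothetical helper, not one of the original scripts.
rank_headers() {
	dir=$1
	# tally how often each header shows up across the *.includes files
	find "$dir" -name '*.includes' -exec cat {} + | sort | uniq -c |
	while read count hdr; do
		[ -f "$dir/$hdr" ] || continue
		lines=$(($(wc -l < "$dir/$hdr")))
		# weight = inclusions * size, as in the fulllist one-liner
		echo "$((count * lines)) $count $lines $hdr"
	done | sort -nr
}
```

e.g. "rank_headers . | head" lists the most expensive headers first.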

In the old days, I had cleaner versions of these in an automated script,
and produced a .dot file to visualize the dependencies with graphviz.
I did get to the point of more than doubling compile speed, so there was
a clear incentive to continue. In fact, the further I got along the way,
the better the savings became. In the end I gave up when I could not
find a useful subset to upstream first that wouldn't already break
hundreds of drivers.

The best idea I have for avoiding that is to pick one header from my
list to clean up and do all the prerequisites first, without actually
breaking anything to start with.

      Arnd
