linux-kernel - Re: [PATCH] proc: use #pragma once

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 4 May 2018 00:14:57 +0200
From:   Rasmus Villemoes <linux@...musvillemoes.dk>
To:     Andrew Morton <akpm@...ux-foundation.org>,
        Alexey Dobriyan <adobriyan@...il.com>
Cc:     dsterba@...e.cz, Christoph Hellwig <hch@...radead.org>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] proc: use #pragma once

On 2018-05-02 00:13, Andrew Morton wrote:
> On Thu, 26 Apr 2018 22:24:44 +0300 Alexey Dobriyan <adobriyan@...il.com> wrote:
> 
>>> The LOC argument also does not sound very convincing.
>>
>> When was the last time you did -80 kLOC patch for free?
> 
> That would be the way to do it - sell the idea to Linus, send him a
> script to do it then stand back.  The piecemeal approach is ongoing
> pain.
> 

FWIW, it's not just removing some identifiers from cpp's hash tables, it
also reduces I/O: Due to our header mess, we have some cyclic includes,
e.g mm.h -> memremap.h -> mm.h. While parsing mm.h, cpp sees the #define
_LINUX_MM_H, then goes parsing memremap.h, but since it hasn't reached
the end of mm.h yet (seeing that there's nothing but comments outside
the #ifndef/#endif pair), it hasn't had a chance to set the internal
flag for mm.h, so it goes slurping in mm.h again. Obviously, the
definedness of _LINUX_MM_H at that point means it "only" has to parse
those 87K for comments and matching up #ifs, #ifdefs,#endifs etc. With
#pragma once, the flag gets set for mm.h immediately, so the #include
from memremap.h is entirely ignored. This can easily be verified with
strace. And mm.h is not the only header getting read twice.

I had some "extract the include guard" line noise lying around, so I
hacked up the below if someone wants to play some more with this. A few
not-very-careful kbuild timings didn't show anything significant, but
both the before and after times were way too noisy, and I only patched
include/linux/*.h.

Anyway, the first order of business is to figure out which ones to leave
alone. We have a bunch of #ifndef THAT_ONE #error "don't include
$this_one directly". The brute-force way is to simply record all macros
which are checked for definedness at least twice.

git grep -h -E '^\s*#\s*if(.*defined\s*\(|n?def)\s*[A-Za-z0-9_]+' | grep
-o -E '[A-Za-z_][A-Za-z_0-9]*' | sort | uniq --repeated > multest.txt

But there's also stuff like arch/x86/boot/compressed/kaslr.c that plays
games with pre-defining _EXPORT_H to avoid parsing export.h when it
inevitably gets included. Oh well, just add the list of macros that have
at least two definitions.

git grep -h -E '^\s*#\s*define\s+[A-Za-z0-9_]+' | grep -o -E
'^\s*#\s*define\s+[A-Za-z0-9_]+' | grep -oE '[A-Za-z0-9_]+' | sort |
uniq --repeated > muldef.txt

With those, one can just do

cat muldef.txt multest.txt | scripts/replace_ig.pl ...

This ends up detecting a lot of copy-pasting (e.g.
__LINUX_MFD_MAX8998_H), as well as lots of headers that for no obvious
reason do not have an include guard. Oh, and once.h has a redundant \.

Rasmus

wear sunglasses...

=== scripts/replace_ig.pl ===

#!/usr/bin/perl

use strict;
use warnings;
use File::Slurp;

my %preserve;

sub strip_comments {
    my $txt = shift;

    # Line continuations are handled before comment stripping, so
    # <slash> <backslash> <newline> <star> actually starts a comment,
    # and a // comment can swallow the following line. Let's just
    # assume nobody has modified the #if control flow using such dirty
    # tricks when we do a more naive line-by-line parsing below to
    # actually remove the include guard deffery.
    $txt =~ s/\\\n//g;

    # http://stackoverflow.com/a/911583/722859
    $txt =~ s{
		 /\*         ##  Start of /* ... */ comment
		 [^*]*\*+    ##  Non-* followed by 1-or-more *'s
		 (?:
		     [^/*][^*]*\*+
		 )*          ##  0-or-more things which don't start with /
		 ##    but do end with '*'
		 /           ##  End of /* ... */ comment

	     |
		 //     ## Start of // comment
		 [^\n]* ## Anything which is not a newline
		 (?=\n) ## End of // comment; use look-ahead to avoid consuming the
newline

	     |         ##     OR  various things which aren't comments:

		 (
		     "           ##  Start of " ... " string
		     (?:
			 \\.           ##  Escaped char
		     |               ##    OR
			 [^"\\]        ##  Non "\
		     )*
		     "           ##  End of " ... " string

		 |         ##     OR

		     '           ##  Start of ' ... ' string
		     (
			 \\.           ##  Escaped char
		     |               ##    OR
			 [^'\\]        ##  Non '\
		     )*
		     '           ##  End of ' ... ' string

		 |         ##     OR

		     .           ##  Anything other char
		     [^/"'\\]*   ##  Chars which doesn't start a comment, string or escape
		 )
	 }{defined $1 ? $1 : " "}gxse;

    return $txt;
}

sub include_guard {
    my $txt = shift;
    my @lines = (split /^/, $txt);
    my $i = 0;
    my $level = 1;
    my $name;

   # The first non-empty line must be an #ifndef or an #if !defined().
    ++$i while ($i < @lines && $lines[$i] =~ m/^\s*$/);
    goto not_found if ($i == @lines);
    goto not_found
	if (!($lines[$i] =~
m/^\s*#\s*ifndef\s+(?<name>[A-Za-z_][A-Za-z_0-9]*)\s*$/) &&
	    !($lines[$i] =~
m/^\s*#\s*if\s+!\s*defined\s*\(\s*(?<name>[A-Za-z_][A-Za-z_0-9]*)\s*\)\s*$/));
    $name = $+{name};

    # The next non-empty line must be a #define of that macro.
    1 while (++$i < @lines && $lines[$i] =~ m/^\s*$/);
    goto not_found if ($i == @lines);
    goto not_found if !($lines[$i] =~ m/^\s*#\s*define\s+\b$name\b/);

    # Now track #ifs and #endifs. #elifs and #elses don't change the level.
    while (++$i < @lines && $level > 0) {
	if ($lines[$i] =~ m/^\s*#\s*(?:if|ifdef|ifndef)\b/) {
	    $level++;
	} elsif ($lines[$i] =~ m/^\s*#\s*endif\b/) {
	    $level--;
	}
    }
    goto not_found if ($level > 0); # issue a warning?
    # Check that the rest of the file consists of empty lines.
    ++$i while ($i < @lines && $lines[$i] =~ m/^\s*$/);
    goto not_found if ($i < @lines);
    return $name;

 not_found:
    return undef;
}

sub do_file {
    my $fn = shift;
    my $src = read_file($fn);
    my $ig = include_guard(strip_comments($src));
    if (not defined $ig) {
	printf STDERR "%s: no include guard\n", $fn;
	return;
    }
    if (exists $preserve{$ig}) {
	printf STDERR "%s: include guard %s exempted\n", $fn, $ig;
	return;
    }

    # OK, the entire text should match this horrible regexp.
    if ($src =~ m{
  (.*?) # arbitrary stuff before #ifndef
  (^\s*\#\s*if(?:\s*!\s*defined\s*\(\s*$ig\s*\)|ndef\s*$ig) .*? \n #
  (?:^\s*\n)*
   ^\s*\#\s*define\s*$ig .*? \n) # 2/3 of include guard
  (.*(?=^\s*\#\s*endif)) # body of file
  (^\s*\#\s*endif .*? \n) # last 1/3
  (.*) # rest of file (trailing comments)
	}smx) {
	my $pre = $1;
	my $define = $2;
	my $body = $3;
	my $endif = $4;
	my $post = $5;
	$body =~ s/\n[ \t]*\n$/\n/g;
	$src = $pre . "#pragma once\n";
	$src .= $body . $post;
    } else {
	printf STDERR "%s: has include guard %s, but I failed to replace it
with #pragma once\n",
	$fn, $ig;
	return;
    }
    write_file($fn, $src);
}

while (<STDIN>) {
    chomp;
    $preserve{$_} = 1;
}

for (@ARGV) {
    do_file($_);
}