linux-hardening - Re: mainline build failure of powerpc allmodconfig for prom_init

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20220718220839.GF25951@gate.crashing.org>
Date:   Mon, 18 Jul 2022 17:08:39 -0500
From:   Segher Boessenkool <segher@...nel.crashing.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Michael Ellerman <mpe@...erman.id.au>,
        Kees Cook <keescook@...omium.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Paul Mackerras <paulus@...ba.org>,
        linux-hardening@...r.kernel.org,
        linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>,
        Sudip Mukherjee <sudipm.mukherjee@...il.com>
Subject: Re: mainline build failure of powerpc allmodconfig for prom_init_check

On Mon, Jul 18, 2022 at 12:06:52PM -0700, Linus Torvalds wrote:
> On Sun, Jul 17, 2022 at 9:41 PM Michael Ellerman <mpe@...erman.id.au> wrote:
> > >         li 4,254                 #,
> >
> > Here we load 254 into r4, which is the 2nd parameter to memset (c).
> 
> I love how even powerpc people know that "4" is bogus, and have to
> make it clear that it means "r4".

This is compiler output.  Compiler output is mainly meant for the
assembler to produce object code from.  It isn't meant to be readable
(and e.g. -fverbose-asm didn't help much here, that's the "#," ;-) ).

The mnemonic determines what the operands mean.  It is much easier to
read and write "li 4,254" than "li r4,254" or "li %r4,254", all of which
are valid.  You can also write "li 3+1,2*127", but not with the other
forms (this is useful if you use assembler macros, which are way more
powerful and appropriate than abusing the C preprocessor, when writing
assembler code).

It matters more if you have three or four or five or six operands to an
assembler instruction, all the extra line noise makes things illegible.

The "%r4" variant hails from winnt.  It is a bit problematic in inline
assembler, because you need to escape the % in extended inline asm, but
not in basic inline asm.  It also is pure line noise to read.

The "r4" variant is problematic if you have symbols named the same.
When you use the -mregnames assembler option it is taken to mean the
register; you can write "(r6)" to mean the symbol.  (There also are "sp"
and "rtos" and "xer" and whatnot, not just "r4").

> I don't understand why the powerpc assembler is so messed up, and uses
> random integer constants for register "names".

360 was the same.  370 was the same.  390 is the same.  801 was the
same.  RIOS (aka POWER) was the same.  So yes, PowerPC inherited it, I
don't know how much thought was put into this, don't change a winning
team etc.

> And it gets even worse, when you start mixing FP, vector and integer "names".

It is clear from the mnemonic what the operands are: some register, an
immediate, a constant, etc.  An expression (which can include object
symbols) can be any of those.

Assembler language is unforgiving.  It isn't easy to write, and most
mistakes will not be diagnosed.  If the assmbler language makes it
easier to read the code, that makes it more likely correct code will be
written, and that correct code will be written in less time.

> I've seen many bad assemblers (in fact, I have *written* a couple of
> bad assemblers myself), but I have never seen anything quite that
> broken on any other architecture.
> 
> Oddities, yes ("$" as a prefix for register? Alpha asm is also very
> odd), but nothing *quite* as broken as "simple constants have entirely
> different meanings depending on the exact instruction and argument
> position".

What is broken about that?  It makes everything very consistent, and
very readable.  Sigils are just nasty, and having the register names the
same as valid symbol names is also problematic.

> It's not even an IBM thing. S390 uses perfectly sane register syntax,
> and calls things '%r4" etc.

s390 has the same syntax, and even inherited the GAS code for this from
the ppc port.

> The human-written asm files have those #define's in headers just to
> make things slightly more legible, because apparently the assembler
> doesn't even *accept* the sane names.

That was true a long time ago.  And the "#define r0 0" thing caused
quite a few bugs itself btw.

> So it's not even a "the compiler
> generates this abbreviated illegible mess". It's literally that the
> assembler is so horrid.

The disassembler has shown "r4" etc. by default since ages.  The
assembler needs -mregnames to accept it; enabling this by default would
be a compatibility break, not acceptable.

> Why do people put up with that?

Why are people misinformed?

Is there anything in particular in the documentation we could improve?

Hope this helps,

Segher