[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.21.2502172208570.65342@angie.orcam.me.uk>
Date: Mon, 17 Feb 2025 23:23:11 +0000 (GMT)
From: "Maciej W. Rozycki" <macro@...am.me.uk>
To: Thomas Weißschuh <linux@...ssschuh.net>
cc: Willy Tarreau <w@....eu>, Shuah Khan <shuah@...nel.org>,
Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
linux-mips@...r.kernel.org, llvm@...ts.linux.dev
Subject: Re: [PATCH] tools/nolibc: add support for N64 and N32 ABIs
Hi Thomas,
> thanks for your feedback!
You're welcome. Sadly little MIPS-fu seems still available nowadays and
for me it's often too easy to miss MIPS-related topics in the mailing list
flood.
> > Why is this code breaking stack alignment just to have to fix it up two
> > instructions down the line? Or is it that the incoming $sp is not aligned
> > in the first place (in which case we're having a deeper problem).
>
> nolibc itself does not assume that $sp is aligned.
> Maybe Willy can explain the historical background.
I'm all ears.
> The System V ABI MIPS supplement [0] says the following:
>
> The registers listed below have the specified contents at process entry:
> ...
>
> $sp The stack pointer holds the address of the bottom of the stack, which
> must be doubleword (8 byte) aligned.
> ...
>
> However "process entry" is main(), while this code is running before main.
Umm, no, process entry definitely is not main(); if you refer to the
somewhat obsolete and inaccurate MIPS psABI, then please note that the
paragraph right above your quote says:
$2 A non-zero value specifies a function pointer the application
should register with atexit(BA_OS). If $2 contains zero, no
action is required.
and one immediately below it says:
$31 The return address register is set to zero so that programs that
search backward through stack frames (stack backtracing) recognize
the last stack frame, that is, a stack frame with a zero in the
saved $31 slot.
and there is more on the initial process stack earlier on, including the
location of the auxiliary vector. All it making it clear it's not main()
this specification refers to, but the entry point from the OS kernel (then
of course you're aware already it's o32 it talks about; n64/n32 requires
16 bytes, but then again we only have secondary references[1][2], in this
case for SGI IRIX).
> The kernel always aligns the stack to a multiple of 16 bytes.
> See the usage of STACK_ROUND() in fs/binfmt_elf.c.
>
> So I guess we could remove the manual alignment.
> (At least for alignments of 16 bytes and less)
I think they all need to go then, but then stack pointer adjustments have
to be made in multiples of the alignment required by the psABI of course.
> > > +
> > > + /* ABI requires current function address in $t9 */
> > > +#if defined(_ABIO32) || defined(_ABIN32)
> > > + "lui $t9, %hi(_start_c)\n"
> > > "ori $t9, %lo(_start_c)\n"
> > > +#else
> > > + "lui $t9, %highest(_start_c)\n"
> > > + "ori $t9, %higher(_start_c)\n"
> > > + "dsll $t9, 0x10\n"
> > > + "ori $t9, %hi(_start_c)\n"
> > > + "dsll $t9, 0x10\n"
> > > + "ori $t9, %lo(_start_c)\n"
> >
> > This could be optimised using a temporary (e.g. $at, but I guess any will
> > do as I gather we don't have any ABI abnormalities here).
>
> clang rejects manual usage of $at without ".set noat".
> So $t0 is simpler.
It's always `.set at' that's required to use $at by hand; it's been so
long before clang was even thought of. Or you could use LA and DLA macros
for o32/n32 and n64 respectively for the assembler to do all this stuff
for you in the default `.set reorder' mode (assuming that clang is not
completely broken here).
> > > +#endif
> > > +
> > > "jalr $t9\n" /* transfer to c runtime
> > > */
> > > " nop\n" /* delayed slot
> >
> > On an unrelated matter JALR above ought to be JAL (or otherwise there's
> > no point in using the .cprestore pseudo-op). And I fail to see why this
> > code has to be "noreorder" (except for the .cpload piece, of course), it's
> > just asking for troubles.
>
> Thanks for the hints.
>
> Without "noreorder", is the manually addition of the delayed slot "nop"
> still necessary?
It's not. It's for the `.set noreorder' mode only, to fill the branch
delay slot by hand. Otherwise it'll become just a useless instruction
following the call sequence and executed after return.
> These points also apply to the existing O32 implementation, right?
Correct. Sadly it's the first time I see this code.
Overall I find it a bit of a chimera: it uses `.set noreorder' and
explicit relocations on one hand, and then high-level assembly `.cpload'
and `.cprestore' pseudo-ops on the other, effectively mixing the two
styles of assembly.
The pseudo-ops come from times when using assembly macros was the norm
and are there to support that coding model where macros rely on these
pseudo-ops, and before the use of explicit relocations became the norm at
least for GCC. In the absence of assembly macros you can write code
expansions for these pseudo-ops by hand, just as what GCC does nowadays
(in the `-mexplicit-relocs' mode, which is usually the default).
But due to architecture variations it's very hard to write handcoded
assembly in the `.set noreorder' mode: you need to take care of all the
data dependencies that appear due to the lack of interlocking between
pipeline stages, which results in code that either works correctly
everywhere, but is suboptimal for newer architecture revisions, or code
that works better with newer architecture revisions, but is actually
broken with earlier ones.
About the only typical case where you do want to use `.set noreorder' is
to schedule a branch delay slot by hand due to a data anti-dependency
between the two instructions. Patchable/self-modifying code is a less
frequent case. And I had literally one case in my entire career where I
wanted to actually jump to a branch delay slot instruction (it's still
there in arch/mips/include/asm/div64.h, in do_div64_32(); see label #2).
Also it appears no code in this function actually relies on $gp having
been set up, so perhaps this stuff can just be discarded in the first
place?
References:
[1] "MIPSpro 64-Bit Porting and Transition Guide",
<https://irix7.com/techpubs/007-2391-006.pdf>
[2] "MIPSpro N32 ABI Handbook",
<https://irix7.com/techpubs/007-2816-004.pdf>
Maciej
Powered by blists - more mailing lists