Message-ID: <alpine.LSU.2.20.2110131601000.26294@wotan.suse.de>
Date: Wed, 13 Oct 2021 16:24:28 +0000 (UTC)
From: Michael Matz <matz@...e.de>
To: Willy Tarreau <w@....eu>
cc: Borislav Petkov <bp@...en8.de>,
Ammar Faizi <ammar.faizi@...dents.amikom.ac.id>,
Paul Walmsley <paul.walmsley@...ive.com>,
Palmer Dabbelt <palmer@...belt.com>,
Albert Ou <aou@...s.berkeley.edu>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Andy Lutomirski <luto@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, x86@...nel.org,
"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH] tools/nolibc: x86: Remove `r8`, `r9` and `r10` from the
clobber list
Hello,
On Wed, 13 Oct 2021, Willy Tarreau wrote:
> On Wed, Oct 13, 2021 at 04:20:55PM +0200, Borislav Petkov wrote:
> > On Wed, Oct 13, 2021 at 04:07:23PM +0200, Willy Tarreau wrote:
> > > Yes I agree with the "potentially" here. If it can potentially be (i.e.
> > > the kernel is allowed by contract to later change the way it's currently
> > > done) then we have to save them even if it means lower code efficiency.
> > >
> > > If, however, the kernel performs such savings on purpose because it is
> > > willing to observe a stricter saving than the AMD64 ABI, we can follow
> > > it but only once it's written down somewhere that it is by contract and
> > > will not change.
> >
> > Right, and Micha noted that such a change to the document can be done.
>
> great.
>
> > And we're basically doing that registers restoring anyway, in POP_REGS.
>
> That's what I based my analysis on when I wanted to verify Ammar's
> finding. I would tend to think that if we're burning cycles popping
> plenty of registers it's probably for a reason, maybe at least a good
> one, which is that it's the only way to make sure we're not leaking
> internal kernel data! This is not a concern for kernel->kernel nor
> user->user calls but for user->kernel calls it definitely is one, and
> I don't think we could relax that series of pop without causing leaks
> anyway.
It might also be interesting to know that while the wording of the psABI
was indeed intended to imply that all argument registers are potentially
clobbered (like with normal calls), glibc's inline assembly for making
syscalls relies on most registers actually being preserved:
  # define REGISTERS_CLOBBERED_BY_SYSCALL "cc", "r11", "cx"
  ...
  #define internal_syscall6(number, arg1, arg2, arg3, arg4, arg5, arg6) \
  ({ \
      unsigned long int resultvar; \
      TYPEFY (arg6, __arg6) = ARGIFY (arg6); \
      TYPEFY (arg5, __arg5) = ARGIFY (arg5); \
      TYPEFY (arg4, __arg4) = ARGIFY (arg4); \
      TYPEFY (arg3, __arg3) = ARGIFY (arg3); \
      TYPEFY (arg2, __arg2) = ARGIFY (arg2); \
      TYPEFY (arg1, __arg1) = ARGIFY (arg1); \
      register TYPEFY (arg6, _a6) asm ("r9") = __arg6; \
      register TYPEFY (arg5, _a5) asm ("r8") = __arg5; \
      register TYPEFY (arg4, _a4) asm ("r10") = __arg4; \
      register TYPEFY (arg3, _a3) asm ("rdx") = __arg3; \
      register TYPEFY (arg2, _a2) asm ("rsi") = __arg2; \
      register TYPEFY (arg1, _a1) asm ("rdi") = __arg1; \
      asm volatile ( \
      "syscall\n\t" \
      : "=a" (resultvar) \
      : "0" (number), "r" (_a1), "r" (_a2), "r" (_a3), "r" (_a4), \
        "r" (_a5), "r" (_a6) \
      : "memory", REGISTERS_CLOBBERED_BY_SYSCALL); \
      (long int) resultvar; \
  })
Note in particular the missing clobbers or outputs of any of the argument
regs.
So, even though the psABI (might have) meant something else, as glibc is
doing the above we in fact have a de-facto standard that the kernel can't
clobber any of the argument regs. The wording and the linux x86-64
syscall implementation (and use in glibc) all come from the same time in
2001, so there never was a time when the kernel was not saving/restoring
the arg registers, so it can't stop now.
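
For illustration only, here is a minimal sketch of a wrapper that relies on
the same de-facto contract. It is not glibc's or nolibc's actual code: the
name `my_syscall3` is hypothetical, and it assumes x86-64 Linux with a
GCC-compatible compiler. The argument registers appear only as inputs; the
clobber list names just rcx, r11, "memory" and "cc".

```c
#include <sys/syscall.h>   /* SYS_* syscall numbers */

/* Hypothetical three-argument syscall wrapper for x86-64 Linux.
 * rdi, rsi and rdx are passed as plain inputs and are NOT listed as
 * clobbers or outputs, so the compiler may assume they still hold
 * their values after the syscall instruction. */
static inline long my_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    register long r_a1 __asm__("rdi") = a1;
    register long r_a2 __asm__("rsi") = a2;
    register long r_a3 __asm__("rdx") = a3;

    __asm__ volatile (
        "syscall"
        : "=a" (ret)                 /* rax carries the return value   */
        : "0" (nr), "r" (r_a1), "r" (r_a2), "r" (r_a3)
        : "rcx", "r11",              /* return address and saved rflags */
          "memory", "cc");
    return ret;
}
```

A call such as `my_syscall3(SYS_getpid, 0, 0, 0)` should return the caller's
PID. If the kernel ever did clobber an argument register, code generated
around a wrapper like this could silently misbehave, which is the point
above.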
In effect this means the psABI should be clarified to explicitly say that
the arg registers aren't clobbered, i.e. that the mentioned list of
clobbered regs is exhaustive, not just a partial list. I will do that.
When I was discussing this with Boris earlier I hadn't yet looked at glibc
use but only gave my interpretation from memory and reading. Obviously
reality trumps anything like that :-)
In short: Ammar's initial claim:
> Linux x86-64 syscall only clobbers rax, rcx and r11 (and "memory").
>
> - rax for the return value.
> - rcx to save the return address.
> - r11 to save the rflags.
>
> Other registers are preserved.
is accurate and I will clarify the psABI to make that explicit.
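
The claim can also be observed from userspace. The sketch below is only an
illustration under assumptions (x86-64 Linux, GCC-compatible compiler; the
function name `arg_regs_preserved` is hypothetical). It loads known values
into the six argument registers, issues getpid (which ignores its
arguments), and reads the registers back as asm outputs. One passing run on
one kernel shows the behaviour; it doesn't prove the contract.

```c
#include <sys/syscall.h>   /* SYS_getpid */

/* Returns 1 if all six argument registers hold the same values after
 * the syscall as before it, and the syscall itself succeeded. */
static int arg_regs_preserved(void)
{
    long ret;
    register long a1 __asm__("rdi") = 0x11;
    register long a2 __asm__("rsi") = 0x22;
    register long a3 __asm__("rdx") = 0x33;
    register long a4 __asm__("r10") = 0x44;
    register long a5 __asm__("r8")  = 0x55;
    register long a6 __asm__("r9")  = 0x66;

    /* "+r" makes each register both input and output, so the compiler
     * must re-read the values after the asm instead of assuming them. */
    __asm__ volatile (
        "syscall"
        : "=a" (ret),
          "+r" (a1), "+r" (a2), "+r" (a3),
          "+r" (a4), "+r" (a5), "+r" (a6)
        : "0" ((long) SYS_getpid)
        : "rcx", "r11", "memory", "cc");

    return ret > 0 &&
           a1 == 0x11 && a2 == 0x22 && a3 == 0x33 &&
           a4 == 0x44 && a5 == 0x55 && a6 == 0x66;
}
```

On a kernel that behaves as described, `arg_regs_preserved()` is expected to
return 1.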
Ciao,
Michael.