Message-ID: <3a502aae-4124-5cb2-1dac-bc18b8158fbe@zytor.com>
Date: Tue, 27 Apr 2021 17:20:55 -0700
From: "H. Peter Anvin" <hpa@...or.com>
To: Andy Lutomirski <luto@...nel.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Ingo Molnar <mingo@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Borislav Petkov <bp@...en8.de>,
LKML <linux-kernel@...r.kernel.org>,
Oleg Nesterov <oleg@...hat.com>,
Kees Cook <keescook@...omium.org>,
Will Drewry <wad@...omium.org>
Subject: Re: pt_regs->ax == -ENOSYS

On 4/27/21 5:11 PM, Andy Lutomirski wrote:
> On Tue, Apr 27, 2021 at 5:05 PM H. Peter Anvin <hpa@...or.com> wrote:
>>
>> On 4/27/21 4:23 PM, Andy Lutomirski wrote:
>>>
>>> I much prefer the model of saying that the bits that make sense for
>>> the syscall type (all 64 for 64-bit SYSCALL and the low 32 for
>>> everything else) are all valid. This way there are no weird reserved
>>> bits, no weird ptrace() interactions, etc. I'm a tiny bit concerned
>>> that this would result in a backwards compatibility issue, but not
>>> very. This would involve changing syscall_get_nr(), but that doesn't
>>> seem so bad. The biggest problem is that seccomp hardcoded syscall
>>> nrs to 32 bit.
>>>
>>> An alternative would be to declare that we always truncate to 32 bits,
>>> except that 64-bit SYSCALL with high bits set is an error and results
>>> in ENOSYS. The ptrace interaction there is potentially nasty.
>>>
>>> Basically, all choices here kind of suck, and I haven't done a real
>>> analysis of all the issues...
>>>
>>
>> OK, I really don't understand this. The *current* way of doing it causes
>> a bunch of ugly corner conditions, including in ptrace, which this would
>> get rid of. It isn't any different than passing any other argument which
>> is an int -- in fact we have this whole machinery to deal with that subcase.
>>
>
> Let's suppose we decide to truncate the syscall nr. What would the
> actual semantics be? Would ptrace see the truncated value in orig_ax?
> How about syscall user dispatch? What happens if ptrace writes a
> value with high bits set to orig_ax? Do we truncate it again? Or do
> we say that ptrace *can't* write too large a value?
>
> For better or for worse, RAX is 64 bits, orig_ax is a 64-bit field, and
> it currently has nonsensical semantics. Redefining orig_ax as a
> 32-bit field is surely possible, but doing so cleanly is not
> necessarily any easier than any other approach. If it weren't for
> seccomp, I would say that the obviously correct answer is to just
> treat it everywhere as a 64-bit number.
>
We *used* to truncate the system call number, and that truncation was
unsigned. It causes a massive headache for ptrace if a 32-bit tracer
wants to write -1, which is a bit hacky.

I would personally like orig_ax to hold the register value as passed in,
and for the truncation to happen in syscall_get_nr().

I also note that kernel/seccomp.c and the tracing infrastructure all
expect a signed int as the system call number. Yes, orig_ax is a 64-bit
field, but so are the other register fields, which don't necessarily
directly reflect the value of an argument. Take, say, %rdi in the case
of sys_write: it is an int argument, so it gets sign-extended; this is
*not* reflected in ptrace.
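
A minimal userspace sketch of the model described above, not kernel code:
the saved register keeps whatever was passed in, and only the accessor
narrows it to a signed int. The struct regs_sketch type and the
sketch_*() helpers are hypothetical names made up for illustration; the
real pt_regs layout and syscall_get_nr() signature differ. The -1 case
assumes, purely for illustration, that a 32-bit write of -1 ends up
stored as 0x00000000ffffffff in the 64-bit field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for two pt_regs fields; not the kernel's struct. */
struct regs_sketch {
    uint64_t orig_ax;   /* raw RAX exactly as userspace passed it in */
    uint64_t di;        /* raw RDI, the first argument register */
};

/*
 * Keep the low 32 bits and reinterpret them as a signed int.
 * The unsigned-to-signed conversion is implementation-defined before C23,
 * but wraps as two's complement on mainstream compilers.
 */
int sketch_low32_as_int(uint64_t reg)
{
    return (int)(int32_t)(uint32_t)reg;
}

/* Truncation happens in the accessor; regs->orig_ax is left untouched. */
int sketch_syscall_get_nr(const struct regs_sketch *regs)
{
    return sketch_low32_as_int(regs->orig_ax);
}

/* An int argument (such as the fd of write) is narrowed the same way. */
int sketch_int_arg(uint64_t reg)
{
    return sketch_low32_as_int(reg);
}

/* Small self-check showing the behaviour for a few register values. */
void sketch_demo(void)
{
    struct regs_sketch regs = {
        .orig_ax = 0xdeadbeef00000001ULL,   /* garbage in the high bits */
        .di      = 0x00000000ffffffffULL,   /* low 32 bits all ones */
    };

    assert(sketch_syscall_get_nr(&regs) == 1);      /* high bits ignored */
    assert(regs.orig_ax == 0xdeadbeef00000001ULL);  /* raw value preserved */
    assert(sketch_int_arg(regs.di) == -1);          /* seen as int -1 */

    regs.orig_ax = 0x00000000ffffffffULL;           /* assumed 32-bit "-1" */
    assert(sketch_syscall_get_nr(&regs) == -1);
}
```

With signed truncation in the accessor, a consumer that expects a signed
int sees -1 regardless of what the upper half of the 64-bit field holds,
while anything reading the raw field still sees the register as passed in.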
-hpa