Message-ID: <4E582577.2060805@zytor.com>
Date: Fri, 26 Aug 2011 16:00:07 -0700
From: "H. Peter Anvin" <hpa@...or.com>
To: LKML <linux-kernel@...r.kernel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
"H.J. Lu" <hjl.tools@...il.com>, Ingo Molnar <mingo@...e.hu>,
Thomas Gleixner <tglx@...utronix.de>
Subject: RFD: x32 ABI system call numbers
Hello all,
As most of you know, H.J. Lu and I have been working on a native 32-bit
ABI for x86-64 Linux. H.J. has had a prototype git tree for a while; I
am currently in the process of cleaning up the kernel patches to post.
Before posting, Ingo suggested that I discuss the handling of system
calls, as this affects some of the machinery that needs to go into the
patchset.
x32 uses mostly the compat system calls already available for the i386
ABI (which means it also uses i386 ABI numbers and data structure
layouts). There are only seven, mostly signal-related, entirely new
system calls, and most of them are trivial wrappers.
x32 uses the same SYSCALL64 instruction as native x86-64. Currently, on
x86, the choice of system call ABI is a purely local property -- a
64-bit process can call int $0x80 and get the i386 ABI. I have wanted
to keep this property and avoid testing global state for the meaning of
a system call. As such, the only thing that is available to distinguish
an x32 system call from an x86-64 system call is the system call number
itself.
In the current patchset, rather than having two separate system call
tables (which would add several instructions to the system call entry
path, including for native 64-bit binaries) we have added the x32 system
calls to the 64-bit system call table with a small gap (starting at 512)
to avoid adding to the cache footprint of native 64-bit processes.
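
A minimal sketch of the single-table layout described above, written as
user-space C rather than kernel code. The handler names, their return
values, and the table size are assumptions made for illustration; only
the gap starting at 512 and the numbers 0, 4 and 513 come from this
message.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical handler type; real system calls take up to six arguments. */
typedef long (*syscall_fn)(void);

/* Stand-in handlers that only report which entry ran. */
static long native_read(void) { return 1; }
static long native_stat(void) { return 2; }
static long x32_stat(void)    { return 3; }

#define X32_BASE 512   /* x32-only entries start here, after a gap */
#define NR_SLOTS 1024  /* assumed table size, illustration only */

/* One table for both ABIs; unset slots are NULL. */
static const syscall_fn table[NR_SLOTS] = {
    [0]            = native_read, /* shared by x86-64 and x32 */
    [4]            = native_stat, /* x86-64 layout */
    [X32_BASE + 1] = x32_stat,    /* x32 layout, number 513 */
};

/* Dispatch on the number alone, with no per-process state consulted. */
static long dispatch(unsigned int nr)
{
    if (nr >= NR_SLOTS || table[nr] == NULL)
        return -ENOSYS;
    return table[nr]();
}
```

With this layout, dispatch(4) and dispatch(513) reach different
handlers, while dispatch(0) reaches the same handler from either ABI.
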
However, this leads to an annoying problem for the system calls which do
*not* need to be duplicated between x86-64 and x32, which is actually
most system calls -- 218 of 310 in the current kernel. For those shared
calls the number alone does not say which ABI the caller is using.
Unfortunately, a single subsystem -- input -- uses is_compat() on a bunch
of the I/O paths, even changing things like the text format of sysfs
entries depending on the ABI of the user space process.
Rather than duplicating the system call table, we are proposing to deal
with that by setting bit 30 in the system call number across the board
when called from x32, so we end up with:
# Shared system call, sys_read (0)
x86-64: %eax = 0x00000000
x32: %eax = 0x40000000
# Unshared system call, sys_stat (4/513)
x86-64: %eax = 0x00000004
x32: %eax = 0x40000201
The extra bit would be masked off and would only affect device drivers
like input, which rely on is_compat().
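
The bit-30 scheme can be sketched in a few lines of C. The macro and
function names below are hypothetical and are not taken from the
patchset; only the bit position and the example values come from this
message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 30 marks a call as coming from x32 (hypothetical macro name). */
#define X32_SYSCALL_BIT UINT32_C(0x40000000)

/* What an x32 caller would load into %eax for system call number nr. */
static inline uint32_t x32_mark(uint32_t nr)
{
    return nr | X32_SYSCALL_BIT;
}

/* True if the value in %eax carries the x32 marker. */
static inline bool is_x32_call(uint32_t eax)
{
    return (eax & X32_SYSCALL_BIT) != 0;
}

/* Table index after masking off the marker bit. */
static inline uint32_t table_index(uint32_t eax)
{
    return eax & ~X32_SYSCALL_BIT;
}
```

For example, x32_mark(0) gives 0x40000000 and x32_mark(513) gives
0x40000201, matching the values listed above, and table_index() maps
both back to 0 and 513 respectively.
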
The question here is whether anyone has a reason to believe this would
be unacceptable.
-hpa