Message-ID: <20150213173345.GA26217@e104818-lin.cambridge.arm.com>
Date: Fri, 13 Feb 2015 17:33:46 +0000
From: Catalin Marinas <catalin.marinas@....com>
To: Rich Felker <dalias@...c.org>
Cc: "libc-alpha@...rceware.org" <libc-alpha@...rceware.org>,
"arnd@...db.de" <arnd@...db.de>,
"pinskia@...il.com" <pinskia@...il.com>,
"musl@...ts.openwall.com" <musl@...ts.openwall.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Andrew Pinski <apinski@...ium.com>,
Marcus Shawcroft <Marcus.Shawcroft@....com>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCHv3 00/24] ILP32 support in ARM64
On Fri, Feb 13, 2015 at 11:30:13AM -0500, Rich Felker wrote:
> On Fri, Feb 13, 2015 at 01:33:56PM +0000, Catalin Marinas wrote:
> > On Thu, Feb 12, 2015 at 07:59:24PM +0100, Arnd Bergmann wrote:
> > > Catalin Marinas <catalin.marinas@....com> wrote on 12 February 2015 at 19:17:
> > > > The solution (for new ports) could be similar to the other such
> > > > solutions in the compat layer. A kernel internal structure which is
> > > > binary-compatible with the ILP32 user one (as exported by the kernel):
> > > >
> > > > struct ilp32_timespec_kernel_internal_only {
> > > > __kernel_time_t tv_sec; /* seconds */
> > > > int tv_nsec; /* nanoseconds */
> > > > };
> > > >
> > > > and a syscall wrapper which converts between ilp32_timespec and timespec
> > > > (take compat_sys_clock_settime as an example).
> > >
> > > We then have to do this on all architectures, and not call it ilp32_timespec,
> > > but call it something else.
> > >
> > > I would much prefer to only have two versions of each syscall that takes a
> > > timespec rather than three versions, or having a version that behaves
> > > differently based on the type of program calling it. On native 32-bit
> > > systems, we should have the native syscall taking the 16-byte structure
> > > (using long long __kernel_time64_t)
> >
> > Can this also be 12 bytes in general if tv_nsec stays as 32-bit? The
> > size of such a structure would be 16 bytes on ARM but I guess this depends
> > on the long long alignment requirements of specific architectures.
>
> The only archs with modern relevance I'm aware of where 64-bit types
> are not aligned are i386 and, by a regrettable but hard-to-fix mistake,
> or1k. I don't have much opinion on whether the 64-bit-time_t timespec
> should be 12 bytes or 16 bytes on such archs. From my perspective it's
> a new ABI anyway so I'd like to be able to fix the 64-bit alignment
> issue at the same time, in which case the question would go away, but
> I'm sure others (glibc) will prefer a more transitional approach with
> symbol versioning or feature test macros or something.
The good thing about a 16-byte timespec64 with appropriate
(endianness-aware) struct padding is that the kernel can write tv_nsec to
user space as a 64-bit value (long on a 64-bit kernel). It's only when
reading from user space that the 32-bit value needs to be sign-extended
into the kernel structure.
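To make the padding argument concrete, here is a minimal user-space sketch. The struct and function names are made up for illustration and are not existing kernel interfaces; memcpy() stands in for copy_to_user(), and the byte-order test assumes the GCC/Clang __BYTE_ORDER__ macros:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical kernel-internal view: both fields are 64-bit. */
struct kernel_ts64_sketch {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/*
 * Hypothetical 32-bit user view: tv_nsec is 32-bit, and the padding is
 * placed so that tv_nsec overlays the low half of the kernel's 64-bit
 * field. If __BYTE_ORDER__ is not defined, little-endian is assumed.
 */
struct user_ts64_sketch {
	int64_t tv_sec;
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	int32_t pad;
	int32_t tv_nsec;
#else
	int32_t tv_nsec;
	int32_t pad;
#endif
};

static_assert(sizeof(struct kernel_ts64_sketch) == 16, "kernel view is 16 bytes");
static_assert(sizeof(struct user_ts64_sketch) == 16, "user view is 16 bytes");

/* Kernel -> user: a plain 16-byte copy; the pad receives the upper bits. */
static void put_ts64_sketch(struct user_ts64_sketch *uts,
			    const struct kernel_ts64_sketch *kts)
{
	memcpy(uts, kts, sizeof(*uts));
}

/* User -> kernel: ignore the pad and sign-extend the 32-bit tv_nsec. */
static void get_ts64_sketch(struct kernel_ts64_sketch *kts,
			    const struct user_ts64_sketch *uts)
{
	kts->tv_sec = uts->tv_sec;
	kts->tv_nsec = (int64_t)uts->tv_nsec;
}
```

With this layout the write path needs no per-task special case; only the read path has to discard whatever user space left in the padding.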
> > > In the kernel, it comes down to a function like
> > >
> > > int get_user_timespec64(struct timespec64 *ts,
> > > 			struct __kernel_timespec64 __user *uts,
> > > 			bool task_32bit)
> > > {
> > > 	struct __kernel_timespec64 input;
> > >
> > > 	if (copy_from_user(&input, uts, sizeof(input)))
> > > 		return -EFAULT;
> > >
> > > 	ts->tv_sec = input.tv_sec;
> > > 	if (task_32bit)
> > > 		ts->tv_nsec = (int)input.tv_nsec;	/* keep low 32 bits, sign-extend */
> > > 	else
> > > 		ts->tv_nsec = input.tv_nsec;
> > >
> > > 	return 0;
> > > }
> >
> > The only drawback is that native 64-bit and new 32-bit have the same
> > handling path, potentially slowing down the former (it may not be
> > noticeable).
>
> Offhand, I would not consider a single predictable branch on syscall
> entry or return to be noticeable relative to general syscall overhead.
It's not just a check+branch but accessing some TIF flag, which requires
reading current_thread_info()->flags and testing it. It is probably
lost in the noise, unless you do such calls in a loop, where you may notice
a slight variation (it depends on the branch predictor as well; on some
architectures we may be able to make use of unlikely(task_32bit)).
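As a rough illustration of the extra work on that path, here is a user-space sketch. The flag bit, structure and function names are hypothetical stand-ins for a TIF flag in thread_info, and the hint macro assumes the GCC/Clang __builtin_expect() builtin that the kernel's unlikely() is built on:

```c
#include <stdint.h>

/* Same idea as the kernel's unlikely(): hint that the condition is rarely true. */
#define unlikely_sketch(x) __builtin_expect(!!(x), 0)

/* Hypothetical flag bit; the real TIF bit numbers are per-architecture. */
#define TIF_32BIT_SKETCH (1UL << 0)

/* Hypothetical stand-in for the flags word in thread_info. */
struct thread_info_sketch {
	unsigned long flags;
};

/*
 * Load the flags word, test one bit, and only for 32-bit tasks keep the
 * low 32 bits of tv_nsec and sign-extend them. The narrowing cast is
 * implementation-defined in ISO C; GCC and Clang truncate modulo 2^32.
 */
static int64_t fixup_nsec_sketch(const struct thread_info_sketch *ti,
				 int64_t raw_nsec)
{
	if (unlikely_sketch(ti->flags & TIF_32BIT_SKETCH))
		return (int32_t)raw_nsec;
	return raw_nsec;
}
```

The hint only affects code layout and static prediction; the load of the flags word is paid by native 64-bit tasks either way.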
> > > The data structure definition is a little bit fragile, as it depends on
> > > user space not using the __BIG_ENDIAN symbol in a conflicting way. So
> > > far we have managed to keep that outside of general purpose headers, but
> > > it should at least blow up in an obvious way if it does, rather than
> > > breaking silently.
> > >
> > > I still think it's more practical to keep the zeroing in user space though.
> > > In that case, we keep defining __kernel_timespec64 with a 'typedef long
> > > long __kernel_snseconds_t', and it's up to the libc to either use
> > > __kernel_timespec64 as its timespec, or to define a C11-compliant
> > > timespec itself and zero out the bits before passing the data to the kernel.
> >
> > The problem with doing this in user space is syscall(2). If we don't
> > allow it, then it's fine to do the padding in libc.
>
> It's already the case that callers have to tiptoe around syscall(2)
> usage on a per-arch basis for silly things like the convention for
> passing 64-bit arguments on 32-bit archs, different arg orders to work
> around 64-bit alignment and issues with too many args, and various
> legacy issues. So I think manual use of syscall(2) is a less-critical
> issue, though of course from a libc perspective I would very much like
> for the kernel to handle it right.
I think there is another problem with sign-extending tv_nsec in libc.
The prototypes of functions like clock_settime(2) take a const struct
timespec *. There isn't anything to prevent such a structure from being in
a read-only section, even though it is unlikely. So libc would have to
duplicate the structure rather than just sign-extending tv_nsec in
place.
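A minimal sketch of what that duplication would look like in a libc wrapper. All names here are hypothetical; raw_clock_settime_sketch() merely records its argument and stands in for the real system call, and int32_t models a 32-bit long on ILP32:

```c
#include <stdint.h>

/* Hypothetical C11-style user timespec on ILP32: tv_nsec is a 32-bit long. */
struct ilp32_timespec_sketch {
	int64_t tv_sec;
	int32_t tv_nsec;
};

/* Hypothetical kernel-facing layout with a 64-bit tv_nsec. */
struct kernel_timespec64_sketch {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* What the stand-in syscall last received, kept only for inspection. */
static struct kernel_timespec64_sketch last_seen_sketch;

/* Stand-in for the real system call: records the argument and succeeds. */
static int raw_clock_settime_sketch(int clk,
				    const struct kernel_timespec64_sketch *kts)
{
	(void)clk;
	last_seen_sketch = *kts;
	return 0;
}

/*
 * The caller's structure is const and may live in read-only memory, so
 * the wrapper builds a widened copy on its own stack instead of writing
 * into *ts.
 */
static int clock_settime_sketch(int clk,
				const struct ilp32_timespec_sketch *ts)
{
	struct kernel_timespec64_sketch kts;

	kts.tv_sec = ts->tv_sec;
	kts.tv_nsec = ts->tv_nsec;	/* implicit sign extension to 64-bit */
	return raw_clock_settime_sketch(clk, &kts);
}
```

The copy is cheap, but it is needed in every wrapper that takes a const struct timespec *.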
BTW, I'll be offline for a week (holiday) and I won't be able to follow
up on this thread.
--
Catalin