Message-ID: <2233518.Z2Q4dpO62C@wuerfel>
Date:	Wed, 22 Apr 2015 13:07:44 +0200
From:	Arnd Bergmann <arnd@...db.de>
To:	y2038@...ts.linaro.org
Cc:	Thomas Gleixner <tglx@...utronix.de>, pang.xunlei@...aro.org,
	Peter Zijlstra <peterz@...radead.org>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Heiko Carstens <heiko.carstens@...ibm.com>,
	Paul Mackerras <paulus@...ba.org>, cl@...ux.com,
	Ingo Molnar <mingo@...nel.org>, heenasirwani@...il.com,
	linux-arch@...r.kernel.org, linux-s390@...r.kernel.org,
	mpe@...erman.id.au, rafael.j.wysocki@...el.com, ahh@...gle.com,
	Frederic Weisbecker <fweisbec@...il.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>, pjt@...gle.com,
	riel@...hat.com, richardcochran@...il.com,
	Tejun Heo <tj@...nel.org>,
	John Stultz <john.stultz@...aro.org>, rth@...ddle.net,
	Baolin Wang <baolin.wang@...aro.org>,
	gregkh@...uxfoundation.org, LKML <linux-kernel@...r.kernel.org>,
	netdev@...r.kernel.org,
	Martin Schwidefsky <schwidefsky@...ibm.com>,
	linux390@...ibm.com, linuxppc-dev@...ts.ozlabs.org
Subject: Re: [Y2038] [PATCH 04/11] posix timers:Introduce the 64bit methods with timespec64 type for k_clock structure

On Wednesday 22 April 2015 10:45:23 Thomas Gleixner wrote:
> On Tue, 21 Apr 2015, Thomas Gleixner wrote:

> So we could save one translation step if we implement new syscalls
> which have a scalar nsec interface instead of the timespec/timeval
> cruft and let user space do the translation to whatever it wants.
> 
> So
> 
> sys_clock_nanosleep(const clockid_t which_clock, int flags,
> 	            const struct timespec __user *expires,
> 		    struct timespec __user *reminder)
> 
> would get the new syscall variant:
> 
> sys_clock_nanosleep_ns(const clockid_t which_clock, int flags,
> 		       const s64 expires, s64 __user *reminder)

As you might expect, there are a number of complications with this
approach:

- John Stultz likes to point out that it's easier to do one change
  at a time, so extending the interface to 64-bit has less potential
  of breaking things than a more fundamental change. I think it's
  useful to drop a lot of the syscalls when a more modern version
  is around (e.g. let libc implement usleep and nanosleep through
  clock_nanosleep), but keep the syscalls as close to the known-working
  64-bit versions as we can.
- The inode timestamp related syscalls (stat, utimes and variants
  thereof) require the full range of time64_t and cannot use ktime_t.
- Converting between timespec types of different size is cheap,
  converting timespec to ktime_t is still relatively cheap, but
  converting ktime_t to timespec is rather expensive (at least eight
  32-bit multiplies, plus a few shifts and additions if you don't
  have 64-bit arithmetic).
- ioctls that pass a timespec need to keep doing that or would require
  a source-level change in user space instead of recompiling.
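
To illustrate the cost asymmetry in the third point, here is a minimal
user-space sketch of the two directions. The struct name ts64 and both
helper names are made up for this example and are not the kernel's
definitions; the scalar is assumed to be signed 64-bit nanoseconds, as
ktime_t is.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical stand-in for a 64-bit timespec; not the kernel's type. */
struct ts64 {
	int64_t tv_sec;
	long    tv_nsec;
};

/*
 * timespec -> scalar nanoseconds: one multiply and one add.
 * Only valid while the result fits in 64 bits (roughly +/- 292 years).
 */
static int64_t ts64_to_ns(struct ts64 ts)
{
	assert(ts.tv_nsec >= 0 && ts.tv_nsec < NSEC_PER_SEC);
	return ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
}

/*
 * Scalar nanoseconds -> timespec: needs a 64-bit division and remainder,
 * which is the expensive part on machines without 64-bit arithmetic.
 */
static struct ts64 ns_to_ts64(int64_t ns)
{
	struct ts64 ts;

	ts.tv_sec = ns / NSEC_PER_SEC;
	ts.tv_nsec = (long)(ns % NSEC_PER_SEC);
	if (ts.tv_nsec < 0) {
		/* C division truncates toward zero; keep tv_nsec non-negative. */
		ts.tv_sec--;
		ts.tv_nsec += NSEC_PER_SEC;
	}
	return ts;
}
```

On a 64-bit machine both directions compile to a handful of
instructions; on 32-bit, the division in ns_to_ts64() is typically
expanded into a multiply sequence or a library call.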

> I personally would welcome such an interface as it makes user space
> programming simpler. Just (re)arming a periodic nanosleep based on
> absolute expiry time is horribly stupid today:
> 
> 	 struct timespec expires;
> 	 ....
> 	 while ()
> 	       expires.tv_nsec += period.tv_nsec;
> 	       expires.tv_sec += period.tv_sec;
> 	       normalize_timespec(&expires);
> 	       sys_clock_nanosleep(CLOCK_ID, ABS, &expires, NULL);
> 
> So with a scalar interface this would reduce to:
> 
> 	 s64 expires;
> 	 ....
> 	 while ()
> 	       expires += period;
> 	       sys_clock_nanosleep_ns(CLOCK_ID, ABS, expires, NULL);
> 
> There is a difference both in text and storage size plus the avoidance
> of the two translation steps (one translation step on 64bit).
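
For reference, the timespec-based loop quoted above can be written
against the existing POSIX clock_nanosleep() roughly as follows. This is
only a sketch: timespec_add() and run_periodic() are names invented here,
CLOCK_MONOTONIC is an arbitrary choice of clock, and both timespecs are
assumed to be normalized on entry.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Add *period to *expires, carrying nanosecond overflow into seconds. */
static void timespec_add(struct timespec *expires,
			 const struct timespec *period)
{
	assert(period->tv_nsec >= 0 && period->tv_nsec < NSEC_PER_SEC);

	expires->tv_sec += period->tv_sec;
	expires->tv_nsec += period->tv_nsec;
	if (expires->tv_nsec >= NSEC_PER_SEC) {
		expires->tv_nsec -= NSEC_PER_SEC;
		expires->tv_sec++;
	}
}

/*
 * Sleep until "iterations" consecutive absolute deadlines, each one
 * period after the previous. Returns 0 on success, -1 if the clock
 * cannot be read, or the error number from clock_nanosleep().
 */
static int run_periodic(const struct timespec *period, int iterations)
{
	struct timespec expires;
	int err;

	if (clock_gettime(CLOCK_MONOTONIC, &expires) != 0)
		return -1;

	for (int i = 0; i < iterations; i++) {
		timespec_add(&expires, period);
		do {
			/* Absolute deadline, so a restart after a signal is safe. */
			err = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
					      &expires, NULL);
		} while (err == EINTR);
		if (err)
			return err;
	}
	return 0;
}
```

With a scalar nanosecond interface, timespec_add() would collapse into a
single 64-bit addition, which is the saving described in the quoted
text.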

We should probably look at it separately for each syscall. It's
quite possible that we find a number of them for which it helps
and others for which it hurts, so we need to see the big picture.

There are also a few other calls that will never need 64-bit
time_t because the range is limited by the need to only ever
pass relative timeouts (select, poll, io_getevents, recvmmsg,
clock_getres, rt_sigtimedwait, sched_rr_get_interval, getrusage,
waitid, semtimedop, sysinfo), so we could actually leave them
using a 32-bit structure and have the libc do the conversion.
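
A rough sketch of what such a libc-side conversion could look like for
a relative timeout. Everything here is hypothetical: the struct
timespec32 layout, the function name, and in particular the choice to
clamp timeouts longer than INT32_MAX seconds (about 68 years) rather
than reject them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit layout the kernel would keep accepting. */
struct timespec32 {
	int32_t tv_sec;
	int32_t tv_nsec;
};

/*
 * Convert a relative timeout with 64-bit seconds to the 32-bit layout.
 * Returns -1 for a negative or non-normalized input, 0 otherwise.
 * Overlong timeouts are clamped to the largest representable value
 * (an assumption of this sketch, not an agreed policy).
 */
static int timeout_to_32(int64_t sec, long nsec, struct timespec32 *out)
{
	assert(out != NULL);

	if (sec < 0 || nsec < 0 || nsec >= 1000000000L)
		return -1;

	if (sec > INT32_MAX) {
		out->tv_sec = INT32_MAX;
		out->tv_nsec = 999999999;
	} else {
		out->tv_sec = (int32_t)sec;
		out->tv_nsec = (int32_t)nsec;
	}
	return 0;
}
```

The conversion is a range check and two stores, so doing it in libc
costs next to nothing compared to the syscall itself.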

> I know that this is non portable, but OTOH if I look at the non
> portable mechanisms which are used by data bases, java VMs and other
> apps which exist to squeeze the last cycles out of the system, there
> is certainly some value to that.
> 
> The portable/spec conforming apps can still use the user space
> assisted translated timespec/timeval mechanisms.
> 
> There is one caveat though: sys_clock_gettime and sys_gettimeofday
> will still need a syscall_timespec64 variant. We have no double
> translation steps there because we maintain the timespec
> representation in the timekeeping code for performance reasons to
> avoid the division in the syscall interface. But everything else can
> do nicely without the timespec cruft.
> 
> We really should talk to libc folks and high performance users about
> this before blindly adding a gazillion of new timespec64 based
> interfaces.

I've started a list of affected syscalls at
https://docs.google.com/spreadsheets/d/1HCYwHXxs48TsTb6IGUduNjQnmfRvMPzCN6T_0YiQwis/edit?usp=sharing

Still adding more calls and description, let me know if you want edit
permissions.

	Arnd