Message-ID: <CAK8P3a2YuoMJ654sJtzE4mJN7wdd4o5JtY8W7c9QocZX8JP6cw@mail.gmail.com>
Date: Mon, 25 Jun 2018 15:42:54 +0200
From: Arnd Bergmann <arnd@...db.de>
To: Andi Kleen <ak@...ux.intel.com>
Cc: Jens Axboe <axboe@...nel.dk>, Jan Kara <jack@...e.cz>,
Jeff Layton <jlayton@...hat.com>,
"Darrick J. Wong" <darrick.wong@...cle.com>,
y2038 Mailman List <y2038@...ts.linaro.org>,
Brian Foster <bfoster@...hat.com>,
Miklos Szeredi <miklos@...redi.hu>,
Pavel Tatashin <pasha.tatashin@...cle.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linux FS-devel Mailing List <linux-fsdevel@...r.kernel.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
Andi Kleen <andi.kleen@...el.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Deepa Dinamani <deepa.kernel@...il.com>,
Daniel Lezcano <daniel.lezcano@...aro.org>,
Thomas Gleixner <tglx@...utronix.de>,
John Stultz <john.stultz@...aro.org>,
Stephen Boyd <sboyd@...nel.org>
Subject: Re: [PATCH] vfs: replace current_kernel_time64 with ktime equivalent
On Wed, Jun 20, 2018 at 9:35 PM, Arnd Bergmann <arnd@...db.de> wrote:
> On Wed, Jun 20, 2018 at 6:19 PM, Andi Kleen <ak@...ux.intel.com> wrote:
>> Arnd Bergmann <arnd@...db.de> writes:
>>>
>>> To clarify: current_kernel_time() uses at most millisecond resolution rather
>>> than microsecond, as tkr_mono.xtime_nsec only gets updated during the
>>> timer tick.
>>
>> Ah you're right. I remember now: the motivation was to make sure there
>> is basically no overhead. In some setups the full gtod can be rather
>> slow, particularly if it falls back to some crappy timer.
>
> This means we're probably fine with a compile-time option that
> distros can choose to enable depending on what classes of hardware
> they are targeting, like
>
> struct timespec64 current_time(struct inode *inode)
> {
> 	struct timespec64 now;
> 	u64 gran = inode->i_sb->s_time_gran;
>
> 	if (IS_ENABLED(CONFIG_HIRES_INODE_TIMES) &&
> 	    gran <= NSEC_PER_JIFFY)
> 		ktime_get_real_ts64(&now);
> 	else
> 		ktime_get_coarse_real_ts64(&now);
>
> 	return timespec64_trunc(now, gran);
> }
>
> With that implementation, we could still let file systems choose
> to get coarse timestamps by tuning the granularity in the
> superblock s_time_gran, which would result in nice round
> tv_nsec values that represent the actual accuracy.
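The selection-plus-truncation step in the proposal above can be illustrated with a small userspace sketch. This is not kernel code: `trunc_ts()` and `now_with_gran()` are hypothetical helpers, CLOCK_REALTIME and CLOCK_REALTIME_COARSE stand in for ktime_get_real_ts64() and ktime_get_coarse_real_ts64(), the `hires` flag stands in for the proposed CONFIG_HIRES_INODE_TIMES, and the 4 ms tick (HZ=250) used as the cutoff is an assumption in place of NSEC_PER_JIFFY.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L
/* Assumption: HZ=250, so one tick is 4 ms. The kernel's NSEC_PER_JIFFY
 * depends on the configured HZ. */
#define ASSUMED_NSEC_PER_TICK 4000000L

/* Hypothetical helper: round tv_nsec down to a multiple of gran ns,
 * in the spirit of the kernel's timespec64_trunc(). */
static struct timespec trunc_ts(struct timespec t, long gran)
{
	if (gran == NSEC_PER_SEC)
		t.tv_nsec = 0;
	else if (gran > 1 && gran < NSEC_PER_SEC)
		t.tv_nsec -= t.tv_nsec % gran;
	/* gran == 1 (or an out-of-range value): leave the timestamp as is */
	return t;
}

/* Hypothetical helper: read the fine or the coarse clock the way the
 * proposed current_time() would, then truncate to the granularity. */
static struct timespec now_with_gran(bool hires, long gran)
{
	struct timespec now;

	if (hires && gran <= ASSUMED_NSEC_PER_TICK)
		clock_gettime(CLOCK_REALTIME, &now);
	else
		clock_gettime(CLOCK_REALTIME_COARSE, &now);

	return trunc_ts(now, gran);
}
```

With gran = 1000 (microsecond granularity), a timestamp of 5.123456789 s becomes 5.123456000 s; with gran = NSEC_PER_SEC the tv_nsec field is always 0, which is the "nice round tv_nsec values" effect described above.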
I've done some simple tests and found that on a variety of
x86, arm32 and arm64 CPUs, it takes between 70 and 100
CPU cycles to read the hardware counter (the TSC on x86)
and add it to the coarse clock. For example, on a 3.1GHz
Ryzen, the little test program below reports:
vdso hires:   37.18ns
vdso coarse:    6.44ns
sysc hires:  161.62ns
sysc coarse:  133.87ns
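To relate these nanosecond figures to cycle counts, multiply by the clock frequency in GHz (cycles per ns). A minimal sketch, assuming the nominal 3.1GHz clock and ignoring turbo and power states; `ns_to_cycles()` and `vdso_hires_extra_cycles()` are hypothetical helpers:

```c
/* Hypothetical helper: convert a duration in ns to CPU cycles at a
 * fixed frequency given in GHz. Assumes a constant clock rate. */
static double ns_to_cycles(double ns, double ghz)
{
	return ns * ghz;
}

/* Extra cost of a hires vDSO read over a coarse one, using the
 * measurements above: (37.18 - 6.44) ns * 3.1 GHz, about 95 cycles. */
static double vdso_hires_extra_cycles(void)
{
	return ns_to_cycles(37.18 - 6.44, 3.1);
}
```

The result, about 95 cycles, lies within the 70 to 100 cycle range, and the same conversion turns the 400ns pwrite() into 1240 cycles.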
On the same machine, it takes around 400ns (1240 cycles)
to write one byte into a tmpfs file with pwrite(). Since an
accurate timestamp costs 70 to 100 extra cycles, that adds
5% to 10% overhead to such a write, which would definitely
be noticed. I guess we wouldn't enable that unconditionally,
but we could offer it as an opt-in mount option if someone
had a use case.
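The overhead estimate is the extra cycles per hires read divided by the cost of the write itself. A short sketch of that arithmetic, assuming one timestamp read per pwrite() call; `overhead_pct()` is a hypothetical helper:

```c
/* Hypothetical helper: extra cost as a percentage of the base cost. */
static double overhead_pct(double extra_cycles, double base_cycles)
{
	return 100.0 * extra_cycles / base_cycles;
}
```

For example, overhead_pct(70, 1240) is about 5.6% and overhead_pct(100, 1240) is about 8.1%, covering the low and high ends of the measured range.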
Arnd
---
/* measure times for high-resolution clocksource access from userspace */
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <stdbool.h>
#include <sys/syscall.h>

/*
 * Read clkid either through the libc clock_gettime() wrapper, which
 * uses the vDSO when available, or by forcing a real system call.
 */
static int do_clock_gettime(clockid_t clkid, struct timespec *tp, bool vdso)
{
	if (vdso)
		return clock_gettime(clkid, tp);
	return syscall(__NR_clock_gettime, clkid, tp);
}

/*
 * Count how many times clkid can be read before that same clock
 * has advanced by one full second.
 */
static int loop1sec(clockid_t clkid, bool vdso)
{
	int i;
	struct timespec t, start;

	do_clock_gettime(clkid, &start, vdso);
	i = 0;
	do {
		do_clock_gettime(clkid, &t, vdso);
		i++;
	} while (t.tv_sec == start.tv_sec || t.tv_nsec < start.tv_nsec);

	return i;
}

int main(void)
{
	/* 1e9 ns divided by the number of calls per second = ns per call */
	printf("vdso hires: %7.2fns\n", 1000000000.0 /
	       loop1sec(CLOCK_REALTIME, true));
	printf("vdso coarse: %7.2fns\n", 1000000000.0 /
	       loop1sec(CLOCK_REALTIME_COARSE, true));
	printf("sysc hires: %7.2fns\n", 1000000000.0 /
	       loop1sec(CLOCK_REALTIME, false));
	printf("sysc coarse: %7.2fns\n", 1000000000.0 /
	       loop1sec(CLOCK_REALTIME_COARSE, false));
	return 0;
}
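The timings say nothing about how fine-grained each clock is. POSIX clock_getres() reports that; on Linux, CLOCK_REALTIME_COARSE is expected to report one timer tick, which is the resolution limit discussed at the start of the thread. A small sketch, where `clock_res_ns()` is a hypothetical helper:

```c
#define _GNU_SOURCE
#include <time.h>

/* Hypothetical helper: resolution of clkid in nanoseconds, or -1 on error. */
static long long clock_res_ns(clockid_t clkid)
{
	struct timespec res;

	if (clock_getres(clkid, &res) != 0)
		return -1;
	return (long long)res.tv_sec * 1000000000LL + res.tv_nsec;
}
```

On a system with high-resolution timers enabled, clock_res_ns(CLOCK_REALTIME) typically returns 1, while clock_res_ns(CLOCK_REALTIME_COARSE) returns the tick length (for example 4000000 with HZ=250).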