linux-kernel - Re: [PATCH] vfs: replace current_kernel

Message-ID: <CAK8P3a1Lr2y1SZonodsyUETkc3d_NBOzyOGGv0OFSW=Xo_RVsA@mail.gmail.com>
Date:   Wed, 20 Jun 2018 21:35:00 +0200
From:   Arnd Bergmann <arnd@...db.de>
To:     Andi Kleen <ak@...ux.intel.com>
Cc:     Jens Axboe <axboe@...nel.dk>, Jan Kara <jack@...e.cz>,
        Jeff Layton <jlayton@...hat.com>,
        "Darrick J. Wong" <darrick.wong@...cle.com>,
        y2038 Mailman List <y2038@...ts.linaro.org>,
        Brian Foster <bfoster@...hat.com>,
        Miklos Szeredi <miklos@...redi.hu>,
        Pavel Tatashin <pasha.tatashin@...cle.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux FS-devel Mailing List <linux-fsdevel@...r.kernel.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Andi Kleen <andi.kleen@...el.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Deepa Dinamani <deepa.kernel@...il.com>,
        Daniel Lezcano <daniel.lezcano@...aro.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        John Stultz <john.stultz@...aro.org>,
        Stephen Boyd <sboyd@...nel.org>
Subject: Re: [PATCH] vfs: replace current_kernel_time64 with ktime equivalent

On Wed, Jun 20, 2018 at 6:19 PM, Andi Kleen <ak@...ux.intel.com> wrote:
> Arnd Bergmann <arnd@...db.de> writes:
>>
>> To clarify: current_kernel_time() uses at most millisecond resolution rather
>> than microsecond, as tkr_mono.xtime_nsec only gets updated during the
>> timer tick.
>
> Ah you're right. I remember now: the motivation was to make sure there
> is basically no overhead. In some setups the full gtod can be rather
> slow, particularly if it falls back to some crappy timer.
>
> I think it would be ok if it falls back to jiffies if TSC or a similar
> fast timer doesn't work. But the function you're using likely
> doesn't do that?

My patch as posted just uses ktime_get_coarse_real_ts64(), which
never accesses the hires clocksource, so the change is purely
cosmetic so far.
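The coarse/fine distinction has a userspace counterpart that makes it easy to see. On Linux, `CLOCK_REALTIME_COARSE` returns the time as of the last tick update, much like ktime_get_coarse_real_ts64(), while `CLOCK_REALTIME` reads the clocksource. A minimal sketch, assuming a Linux/glibc userspace (this is not kernel code), that compares the advertised resolutions with `clock_getres()`; `clock_res_ns` and `show_realtime_resolutions` are illustrative names of our own:

```c
#define _GNU_SOURCE /* CLOCK_REALTIME_COARSE is Linux-specific */
#include <stdio.h>
#include <time.h>

#ifndef CLOCK_REALTIME_COARSE
#define CLOCK_REALTIME_COARSE 5 /* Linux value, in case the header hides it */
#endif

/* Resolution of a clock in nanoseconds, or -1 if the clock is unsupported. */
static long clock_res_ns(clockid_t id)
{
        struct timespec res;

        if (clock_getres(id, &res) != 0)
                return -1;
        return (long)res.tv_sec * 1000000000L + res.tv_nsec;
}

/* Print the fine and coarse realtime resolutions side by side. */
static void show_realtime_resolutions(void)
{
        printf("CLOCK_REALTIME:        %ld ns\n", clock_res_ns(CLOCK_REALTIME));
        printf("CLOCK_REALTIME_COARSE: %ld ns\n", clock_res_ns(CLOCK_REALTIME_COARSE));
}
```

On a kernel with high-resolution timers and HZ=250, the first line usually reports 1 ns and the second one tick (4000000 ns).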

The timekeeping and clocksource core code (maintainers added to Cc)
doesn't yet export an API that we can use to determine whether the
clocksource is "fast" or not, but I would expect that we could add
one if needed.
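While no such in-kernel query exists, the name of the active clocksource is visible from userspace, which is enough to prototype a heuristic. A minimal sketch, assuming the standard sysfs file `/sys/devices/system/clocksource/clocksource0/current_clocksource`; the list of names treated as "fast" is a hypothetical heuristic of our own, not a kernel API:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical heuristic: clocksources assumed cheap to read. */
static const char *const fast_clocksources[] = {
        "tsc",               /* x86 */
        "arch_sys_counter",  /* ARM architected timer (arm_arch_timer.c) */
        "timebase",          /* powerpc */
        "tod",               /* s390 */
        "riscv_clocksource", /* RISC-V */
};

/* Return 1 if name is on the assumed-fast list, 0 otherwise. */
static int is_fast_clocksource(const char *name)
{
        size_t i;

        for (i = 0; i < sizeof(fast_clocksources) / sizeof(fast_clocksources[0]); i++)
                if (strcmp(name, fast_clocksources[i]) == 0)
                        return 1;
        return 0;
}

/*
 * Read the active clocksource name into buf.
 * Returns 0 on success, -1 if sysfs is unavailable (e.g. in a container).
 */
static int current_clocksource(char *buf, size_t len)
{
        FILE *f = fopen("/sys/devices/system/clocksource/clocksource0/current_clocksource", "r");

        if (!f)
                return -1;
        if (!fgets(buf, (int)len, f)) {
                fclose(f);
                return -1;
        }
        fclose(f);
        buf[strcspn(buf, "\n")] = '\0'; /* strip the trailing newline */
        return 0;
}
```

Under this heuristic, fallback timers such as `acpi_pm` or `hpet` would count as slow.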

This is also something that has definitely changed over the years
since your patch was originally added. Back then, the x86 TSC probably
wasn't reliable enough to depend on, but today I would guess that
very few x86 machines still in production use would be affected. On
embedded systems, we used to have all kinds of clocksource drivers
with varying characteristics, but nowadays the embedded market is
dominated by ARMv7VE (Cortex-A7/A15/A17) or ARMv8, which are required
to have a fast clocksource (drivers/clocksource/arm_arch_timer.c),
and many of the others have one too (RISC-V, modern MIPS, all ppc32,
most ARM Cortex-A9, ...).
The traditional non-x86 architectures (s390, powerpc, sparc) that
are still in use have of course had low-latency clocksource
access for much longer.

This means we're probably fine with a compile-time option that
distros can choose to enable depending on which classes of hardware
they are targeting, like

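struct timespec64 current_time(struct inode *inode)
{
        struct timespec64 now;
        u64 gran = inode->i_sb->s_time_gran;

        /*
         * CONFIG_HIRES_INODE_TIMES is the proposed new option: only read
         * the (possibly slow) clocksource when the file system can store
         * sub-tick precision; otherwise use the tick-based coarse time.
         */
        if (IS_ENABLED(CONFIG_HIRES_INODE_TIMES) &&
            gran <= NSEC_PER_JIFFY)
                ktime_get_real_ts64(&now);
        else
                ktime_get_coarse_real_ts64(&now);

        /* round tv_nsec down to the file system's granularity */
        return timespec64_trunc(now, gran);
}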
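The decision in the snippet above depends only on the config option and on how the superblock granularity compares with one timer tick. A userspace model of that choice, assuming `CLOCK_REALTIME` stands in for ktime_get_real_ts64() and `CLOCK_REALTIME_COARSE` for ktime_get_coarse_real_ts64(); `inode_time_clock`, `hires_inode_times` and `tick_ns` are hypothetical names standing in for current_time(), CONFIG_HIRES_INODE_TIMES and NSEC_PER_JIFFY:

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#ifndef CLOCK_REALTIME_COARSE
#define CLOCK_REALTIME_COARSE 5 /* Linux value, in case the header hides it */
#endif

/*
 * Pick the clock the proposed current_time() would read: the fine-grained
 * one only when the option is on and the granularity is at most one tick.
 */
static clockid_t inode_time_clock(bool hires_inode_times, uint32_t gran,
                                  uint32_t tick_ns)
{
        if (hires_inode_times && gran <= tick_ns)
                return CLOCK_REALTIME;
        return CLOCK_REALTIME_COARSE;
}
```

For example, a file system with nanosecond granularity (`gran == 1`) gets the fine clock when the option is enabled, while one with 1-second granularity (`gran == 1000000000`) always stays on the coarse clock.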

With that implementation, we could still let file systems choose
to get coarse timestamps by setting the superblock's s_time_gran
above one jiffy, which would result in nice round tv_nsec values
that represent the actual accuracy.
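The "nice round tv_nsec values" come from the truncation step, which drops the part of tv_nsec below the granularity. A userspace sketch of that arithmetic on a plain `struct timespec`; `timespec_trunc` is our own name, and its special cases follow our understanding of timespec64_trunc() rather than being copied from it:

```c
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Round t down to a multiple of gran nanoseconds (1 <= gran <= NSEC_PER_SEC). */
static struct timespec timespec_trunc(struct timespec t, unsigned int gran)
{
        if (gran == 1) {
                /* nanosecond granularity: nothing to do */
        } else if (gran == NSEC_PER_SEC) {
                t.tv_nsec = 0; /* whole seconds only */
        } else if (gran > 1 && gran < NSEC_PER_SEC) {
                t.tv_nsec -= t.tv_nsec % gran;
        }
        /* any other gran is invalid; this sketch leaves t unchanged */
        return t;
}
```

With `gran = 1000` (microseconds), 5.123456789 becomes 5.123456000; with `gran = NSEC_PER_SEC` it becomes 5.000000000.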

Obviously this still needs performance testing on various bits
of real hardware, but I would expect the overhead to be rather
small on hardware from the past five years.
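As a rough first data point, the relative cost of the two clocks can be compared from userspace. This sketch times `clock_gettime()` on `CLOCK_REALTIME` and `CLOCK_REALTIME_COARSE`; those calls go through the vDSO (or a syscall if the clocksource cannot be read from userspace), so they only approximate the cost of ktime_get_real_ts64() vs ktime_get_coarse_real_ts64() and are no substitute for measuring the kernel change itself. `ns_per_call` and `compare_realtime_clocks` are illustrative names of our own:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>

#ifndef CLOCK_REALTIME_COARSE
#define CLOCK_REALTIME_COARSE 5 /* Linux value, in case the header hides it */
#endif

/* Average cost in nanoseconds of one clock_gettime(id) call over n calls. */
static double ns_per_call(clockid_t id, long n)
{
        struct timespec start, end, scratch;
        long i;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (i = 0; i < n; i++)
                clock_gettime(id, &scratch);
        clock_gettime(CLOCK_MONOTONIC, &end);

        return ((end.tv_sec - start.tv_sec) * 1e9 +
                (end.tv_nsec - start.tv_nsec)) / (double)n;
}

/* Print the per-call cost of the fine and coarse realtime clocks. */
static void compare_realtime_clocks(void)
{
        const long n = 1000000;

        printf("CLOCK_REALTIME:        %.1f ns/call\n", ns_per_call(CLOCK_REALTIME, n));
        printf("CLOCK_REALTIME_COARSE: %.1f ns/call\n", ns_per_call(CLOCK_REALTIME_COARSE, n));
}
```

Running it on machines with different active clocksources (for instance `tsc` vs `hpet` on x86) gives a quick sense of how much the fallback timer matters.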

     Arnd