lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.21.1809272247270.8118@nanos.tec.linutronix.de>
Date:   Thu, 27 Sep 2018 23:30:09 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     "Eric W. Biederman" <ebiederm@...ssion.com>
cc:     Andrey Vagin <avagin@...tuozzo.com>,
        Dmitry Safonov <dima@...sta.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Dmitry Safonov <0x7f454c46@...il.com>,
        Adrian Reber <adrian@...as.de>,
        Andy Lutomirski <luto@...nel.org>,
        Christian Brauner <christian.brauner@...ntu.com>,
        Cyrill Gorcunov <gorcunov@...nvz.org>,
        "H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
        Jeff Dike <jdike@...toit.com>, Oleg Nesterov <oleg@...hat.com>,
        Pavel Emelianov <xemul@...tuozzo.com>,
        Shuah Khan <shuah@...nel.org>,
        "containers@...ts.linux-foundation.org" 
        <containers@...ts.linux-foundation.org>,
        "criu@...nvz.org" <criu@...nvz.org>,
        "linux-api@...r.kernel.org" <linux-api@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>,
        Alexey Dobriyan <adobriyan@...il.com>,
        "linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>
Subject: Re: [RFC 00/20] ns: Introduce Time Namespace

On Wed, 26 Sep 2018, Eric W. Biederman wrote:
> Reading the code the calling sequence there is:
> tick_sched_do_timer
>    tick_do_update_jiffies64
>       update_wall_time
>           timekeeping_advance
>              timekeepging_update
> 
> If I read that properly under the right nohz circumstances that update
> can be delayed indefinitely.
> 
> So I think we could prototype a time namespace that was per
> timekeeping_update and just had update_wall_time iterate through
> all of the time namespaces.

Please don't go there. timekeeping_update() is already heavy and walking
through a gazillion of namespaces will just make it horrible,

> I don't think the naive version would scale to very many time
> namespaces.

:)

> At the same time using the techniques from the nohz work and a little
> smarts I expect we could get the code to scale.

You'd need to invoke the update when the namespace is switched in and
hasn't been updated since the last tick happened. That might be doable, but
you also need to take the wraparound constraints of the underlying
clocksources into account, which again can cause walking all name spaces
when they are all idle long enough.

>From there it becomes hairy, because it's not only timekeeping,
i.e. reading time, this is also affecting all timers which are armed from a
namespace.

That gets really ugly because when you do settimeofday() or adjtimex() for
a particular namespace, then you have to search for all armed timers of
that namespace and adjust them.

The original posix timer code had the same issue because it mapped the
clock realtime timers to the timer wheel so any setting of the clock caused
a full walk of all armed timers, disarming, adjusting and requeing
them. That's horrible not only performance wise, it's also a locking
nightmare of all sorts.

Add time skew via NTP/PTP into the picture and you might have to adjust
timers as well, because you need to guarantee that they are not expiring
early.

I haven't looked through Dimitry's patches yet, but I don't see how this
can work at all without introducing subtle issues all over the place.

Thanks,

	tglx



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ