linux-kernel - Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

Open Source and information security mailing list archives
Message-ID: <CALAqxLVq+cQ4VS0rNYkGAt-d=+8yFr4Uy8D3-LbnjoZC_+niPA@mail.gmail.com>
Date:	Wed, 3 Jun 2015 10:44:18 -0700
From:	John Stultz <john.stultz@...aro.org>
To:	Ingo Molnar <mingo@...nel.org>
Cc:	lkml <linux-kernel@...r.kernel.org>,
	Prarit Bhargava <prarit@...hat.com>,
	Daniel Bristot de Oliveira <bristot@...hat.com>,
	Richard Cochran <richardcochran@...il.com>,
	Jan Kara <jack@...e.cz>, Jiri Bohac <jbohac@...e.cz>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	Shuah Khan <shuahkh@....samsung.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

On Wed, Jun 3, 2015 at 2:04 AM, Ingo Molnar <mingo@...nel.org> wrote:
>
> * John Stultz <john.stultz@...aro.org> wrote:
>
>> > Instead of having these super rare special events, how about implementing leap
>> > second smearing instead? That's far less radical and a lot easier to test as
>> > well, as it's a continuous mechanism. It will also confuse user-space a lot
>> > less, because there are no sudden time jumps.
>>
>> So yea. Leap smearing/slewing is an attractive solution. The first issue is that
>> there's no standard yet for the range of time that the slew occurs (or even if
>> the slew is linear or a curve). The second is I don't think we can actually get
>> away from supporting UTC w/ leap, as applications may depend on precision. Also
>> things like NTP sync w/ mixed systems would be problematic, as NTPd and others
>> need to become savvy of which mode they are working with.
>
> Supporting it minimally is fine - supporting it with clearly unmaintainable
> complexity is not.
>
> So as long as we offer good smearing of the leap second (with a configurable
> parameter for how long the period should be), people in need of better leap
> second handling can take that.
>
>> The leap smearing method of only doing it in private networks and controlling it
>> by the NTP server is becoming more widespread, but it has its own problems,
>> since it doesn't handle CLOCK_TAI properly, and since CLOCK_REALTIME isn't yet
>> frequency steerable separately from the other clockids, this method ends up
>> slowing down CLOCK_TAI and CLOCK_MONOTONIC as well.
>
> All real time clock derived clocks should smear in sync as well.

Eeerrr.. So CLOCK_TAI is UTC without leapseconds; to smear TAI would
be wrong. Similarly, CLOCK_MONOTONIC/BOOTTIME probably shouldn't be
smeared either (but those are defined less strictly).

>> I'd like to try to get something working in the kernel so we could support
>> CLOCK_UTC and CLOCK_UTCSLS (smeared-leap-second) clockids, then allow
>> applications that care to migrate explicitly to the one they care about.
>> Possibly allowing CLOCK_REALTIME to be compile-time directed to CLOCK_UTCSLS so
>> that most applications that don't care can just ignore it.  But finding time to
>> do this has been hard (if anyone is interested in working on it, I'd be excited
>> to hear!).
>
> There should definitely be a Kconfig option to just map all relevant clocks to
> smeared seconds. Hopefully this ends up being the standard in a few years and we
> can pin down the exact parameters as well.
>
> Having separate clockids for mixed uses would be fine as well. Maybe.
>
>> But if you think this patch is complicated, creating a new separately steered
>> clockid is not going to be trivial (as there will be lots of ugly edge cases,
>> like what if a leap second is cancelled mid-way through the slewing adjustment,
>> etc).
>
> Well, I think the main advantage of leap second smearing is that it's not a
> binary, but a continuous interface, and so it's way easier to test than 'sudden'
> leap second insertions.
>
> In fact we could essentially implement leap second smearing via the usual adjtimex
> mechanisms: as far as the time code is concerned it does not matter why a gradual
> adjustment occurs, only the rate of change and the method of convergence are
> open parameters.
>
> In fact I'd suggest we implement even original leap seconds by doing a high-rate
> 'smearing' in the final X minutes leading up to the leap second, where 'X' could
> be 1 by default. This way we could eliminate leap seconds as a separate logical
> entity mostly.
>
> This should be far more gentle to applications as well than sudden jumps, and
> timers will just work fine as well.
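A minimal sketch of the linear smear being discussed, in Python purely for illustration. It assumes the smear window ends exactly at the leap edge (as in the "final X minutes" suggestion; other placements of the window are possible), that times are seconds on a continuous scale that does not pause for the leap, and that smear_offset is a hypothetical helper, not a kernel or adjtimex interface.

```python
def smear_offset(t, leap_at, window, leap=1):
    """Seconds of the leap already absorbed at continuous time t.

    t, leap_at: seconds on a continuous (non-leaping) timescale
    window:     smear length in seconds; assumed to end at leap_at
    leap:       +1 for an inserted leap second, -1 for a deleted one
    The smeared clock reads t - smear_offset(t, leap_at, window, leap).
    """
    start = leap_at - window
    if t <= start:
        return 0.0  # smear has not started yet
    if t >= leap_at:
        return float(leap)  # the whole leap has been absorbed
    # linear ramp: a constant rate of leap / window seconds per second
    return leap * (t - start) / window
```

With window=60 this is the 1-minute default suggested above; with window=86400 the same second is spread over a day. A curved ramp would only change the last line.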

Well, again the problem with high-rate smearing as you describe is
that it would affect CLOCK_MONOTONIC as well, which could cause
periodic timers used for sampling (imagine recording audio) to slow
down too, possibly causing application problems. This is why
smeared leap seconds are usually spread across a day at a slow rate.
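To put numbers on the rate argument: a linear smear of one second over a window of W seconds needs a frequency offset of 1/W, i.e. 10^6/W parts per million. The sketch below only does that arithmetic. The 500 ppm figure is the frequency bound commonly associated with the NTP kernel discipline and is included as an assumption for comparison, not as a statement about this patch.

```python
def smear_rate_ppm(window_s, leap_s=1.0):
    """Frequency offset, in parts per million, for a linear smear of
    leap_s seconds spread over window_s seconds."""
    return leap_s / window_s * 1e6


ASSUMED_NTP_MAX_PPM = 500.0  # assumption: typical NTP kernel frequency bound

one_minute = smear_rate_ppm(60)   # ~16,667 ppm: clocks run ~1.7% slow
one_day = smear_rate_ppm(86400)   # ~11.6 ppm

for name, ppm in (("1 minute", one_minute), ("1 day", one_day)):
    within = ppm <= ASSUMED_NTP_MAX_PPM
    print(f"{name}: {ppm:.2f} ppm, within assumed NTP bound: {within}")
```

If CLOCK_MONOTONIC shares that frequency, a 10 ms periodic sampling timer would stretch by roughly 0.17 ms per period during a 1-minute smear, versus about 0.12 microseconds per period for a day-long one.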

To allow CLOCK_REALTIME to be frequency-adjusted separately from
CLOCK_MONOTONIC/CLOCK_TAI, which would have the fewest unwanted
side-effects, we're probably going to have to manage it separately
(like we do w/ MONOTONIC_RAW time). But again, this creates a lot more
complexity.
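A toy model of the two steering options, under the simplifying assumptions that all clocks derive from one free-running counter and that frequency adjustments are plain multipliers. ToyClocks and its fields are hypothetical and do not mirror the kernel's timekeeper; the point is only that steering the shared frequency (what a smearing NTP server effectively does) slows CLOCK_MONOTONIC too, while a REALTIME-only adjustment, managed separately the way MONOTONIC_RAW is, leaves it alone.

```python
from dataclasses import dataclass


@dataclass
class ToyClocks:
    """Hypothetical model: one counter, two steered clocks."""
    shared_ppm: float = 0.0         # NTP steering seen by MONOTONIC and REALTIME
    realtime_only_ppm: float = 0.0  # extra steering applied to REALTIME alone
    raw_ns: int = 0                 # unsteered, like MONOTONIC_RAW
    monotonic_ns: float = 0.0
    realtime_ns: float = 0.0

    def advance(self, elapsed_raw_ns: int) -> None:
        """Account for elapsed_raw_ns nanoseconds of counter time."""
        self.raw_ns += elapsed_raw_ns
        steered = elapsed_raw_ns * (1 + self.shared_ppm / 1e6)
        self.monotonic_ns += steered
        self.realtime_ns += steered * (1 + self.realtime_only_ppm / 1e6)


# Day-long smear through the shared frequency: MONOTONIC slows down as well.
shared = ToyClocks(shared_ppm=-1e6 / 86400)
shared.advance(86400 * 10**9)

# The same smear applied to REALTIME only: MONOTONIC tracks the raw counter.
separate = ToyClocks(realtime_only_ppm=-1e6 / 86400)
separate.advance(86400 * 10**9)
```

Even in this toy, the cost shows up: REALTIME now carries its own accumulated state that every update has to keep consistent with the other clocks.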


>> > Secondly, why is there a directional flag? I thought leap seconds can only be
>> > inserted.
>>
>> A leap delete isn't likely to occur, but it's supported by the adjtimex
>> interface. And given the irregularity of the Earth's rotation, I'm not sure I'd
>> rule it out completely.
>
> Well, the long term trend is clear and unambiguous: the rotation of Earth is
> slowing down (the main component of which is losing angular momentum to the Moon),
> hence the days are getting longer and we have to insert a leap second every second
> year or so.
>
> The short term trends (discounting massive asteroid strikes, at which point leap
> seconds will be the least of our problems) are somewhat chaotic:
>
>  - glaciation (which shifts water mass asymmetrically)
>
>  - global warming (one component of which is thermal expansion, which expands
>    oceans asymmetrically and shifts water mass - the other component is changing
>    climatology: different oceanic currents, etc. - which all shift mass around)
>
>  - tectonics (slow rearrangement of mass plus earthquakes).
>
>  - even slower scale rearrangement of mass (mantle plumes, etc.)
>
> but the long term trend still dominates. Look at this graph of measurements of the
> Earth's rotation:
>
>   http://en.wikipedia.org/wiki/File:Deviation_of_day_length_from_SI_day.svg
>
> See how the mean (the green line) was always above zero in the measured past. The
> monotonically increasing nature comes from that.
>
> and given how many problems we had with leap second insertion, on millions of
> installed systems, guess the likelihood of there being a leap second deleted? How
> many OSs that can do leap second insertion are unable to do leap second deletion?
>
> Also note that leap second deletion means a jump in time backward. Daylight saving
> time is already causing problems with that.

Err.. Other way around. Leap-second deletion is a jump in time forward
(jumping from 23:59:58 to 00:00:00, skipping 23:59:59), which is
simpler to deal with. And luckily (at least for us) daylight saving
time is handled in userspace (as UTC, including leapseconds, ideally
would be too, derived from TAI time provided by the kernel).
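A sketch of the two directions on a POSIX time_t scale, assuming s counts seconds on a continuous scale aligned with time_t before the leap, and edge is the time_t value of the 00:00:00 that follows the leap. posix_time is a hypothetical helper. An inserted second makes time_t repeat a value (23:59:59 twice); a deleted one skips 23:59:59 entirely, which is the forward jump described above.

```python
def posix_time(s, edge, leap):
    """time_t reported at continuous second s around a leap at edge.

    leap=+1: inserted second; the value edge - 1 (23:59:59) repeats.
    leap=-1: deleted second; edge - 1 is skipped, time jumps forward.
    leap=0:  no leap scheduled.
    """
    if leap == 1:
        return s if s < edge else s - 1
    if leap == -1:
        return s if s < edge - 1 else s + 1
    return s


EDGE = 1435708800  # 2015-07-01 00:00:00 UTC as time_t
inserted = [posix_time(s, EDGE, 1) for s in range(EDGE - 2, EDGE + 2)]
# Hypothetical: the 2015 leap was an insertion; deletion is shown for contrast.
deleted = [posix_time(s, EDGE, -1) for s in range(EDGE - 3, EDGE + 1)]
```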

But yes, I agree that the leap deletion logic is likely to never run
outside of testing.


>> > So all in one, the leap second code is fragile and complex - lets re-think the
>> > whole topic instead of complicating it even more ...
>>
>> So the core complexity with this patch is that we're basically having to do
>> state-machine transitions in a read-only path (since the reads may happen before
>> the update path runs). Since there are a number of read paths, there's some
>> duplication, and in some cases variance if the read path exports more state (i.e.,
>> adjtimex).
>
> My fundamental observation is: the cost/benefit ratio is insanely high.

I agree. In a perfect world, the kernel would export TAI, not UTC,
leaving the translation to UTC to userspace (take heed, developers of
new IoT OSes!). But the trouble is that historically POSIX/Linux has
provided UTC (without a representation for the leap second, which is
why we have to repeat a second).
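If the kernel exported only TAI, the userspace translation would look roughly like the sketch below. It assumes CLOCK_TAI-style values (POSIX seconds plus the TAI-UTC offset) and uses a deliberately tiny leap table holding just the 2012 and 2015 insertions (TAI-UTC of 35 s and 36 s); a real implementation would load a complete, maintained list such as the leap-seconds.list file distributed with the tz database. tai_to_posix is a hypothetical helper and only handles inserted leap seconds.

```python
# (TAI value at which the offset takes effect, TAI - UTC in seconds)
LEAP_TABLE = [
    (1341100800 + 35, 35),  # from 2012-07-01 00:00:00 UTC
    (1435708800 + 36, 36),  # from 2015-07-01 00:00:00 UTC
]


def tai_to_posix(tai):
    """Map a TAI seconds value to (time_t, in_leap_second).

    During an inserted leap second (23:59:60 UTC) time_t has no value
    of its own, so 23:59:59 is repeated and the flag is set.
    """
    offset = None
    for start, off in LEAP_TABLE:  # table is sorted by start
        if tai >= start:
            offset = off
    if offset is None:
        raise ValueError("time precedes this sketch's leap table")
    for start, off in LEAP_TABLE:
        if off == offset + 1 and tai == start - 1:
            return tai - off, True  # the inserted 23:59:60 second
    return tai - offset, False
```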

And as more folks (userspace developers, not really kernel developers)
care about strict UTC correctness around the leapsecond, it's
hard to justify avoiding the complexity (they don't really care about
the mechanism; they just don't want to deal with anything unexpected in
their application).

> Interrupts are fundamentally jittery, there's no guarantee of their accuracy - you
> yourself said that as a reply to PeterZ's suggestion to drive leap seconds via
> hrtimers - and the motivation was to make interrupts arrive more accurately around
> leap seconds.
>
> So why make the code more fragile, more complex, just to solve a scenario that
> cannot really be done perfectly?

So here I worry I didn't communicate clearly enough what the patch does. :(

It's not about making interrupts more accurate around the leapsecond;
it's about applying the leapsecond transition in the read path
precisely at the leapsecond edge (rather than a short while later when
the timer fires and we update the timekeeping structures).

But more importantly, this change to the read path prevents timers
that may be expired before the update_wall_time timer runs (most likely
on other CPUs) from being expired early, since the time read used
by the hrtimer expiration logic is adjusted properly right on that
edge.
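A toy model of that race, in whole seconds for readability. ToyTimekeeper, its update_wall_time and read_realtime methods, and timer_expired are hypothetical stand-ins (the real code works in nanoseconds under a seqcount), and now is assumed to be continuous time that equals CLOCK_REALTIME until the leap. Without the read-side check, a reader at the edge but before the periodic update sees 00:00:00 one second early, so an absolute timer for 00:00:00 fires during what should be the repeated 23:59:59.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyTimekeeper:
    leap_edge: Optional[int] = None  # continuous second where an inserted leap starts
    realtime_offset: int = 0         # committed only by the periodic update
    leap_applied: bool = False

    def _leap_pending(self, now: int) -> bool:
        return (self.leap_edge is not None and not self.leap_applied
                and now >= self.leap_edge)

    def update_wall_time(self, now: int) -> None:
        """Periodic update; may run some time after the edge."""
        if self._leap_pending(now):
            self.realtime_offset -= 1  # repeat a second
            self.leap_applied = True

    def read_realtime(self, now: int, read_side_leap: bool = True) -> int:
        t = now + self.realtime_offset
        if read_side_leap and self._leap_pending(now):
            t -= 1  # apply the pending leap exactly at the edge
        return t


def timer_expired(tk: ToyTimekeeper, now: int, expires: int,
                  read_side_leap: bool = True) -> bool:
    """hrtimer-style check: expired once the clock read reaches expires."""
    return tk.read_realtime(now, read_side_leap) >= expires


tk = ToyTimekeeper(leap_edge=100)  # 100 plays the role of 00:00:00
early = timer_expired(tk, 100, 100, read_side_leap=False)  # True: a second early
on_time = timer_expired(tk, 100, 100)                      # False: still 23:59:59
```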


> Especially as second smearing appears to be the way superior future method of
> handling leap seconds.
>

So here the problem is that it depends on the user. For probably most
users, who really don't care, the leap-smear is ideal behavior for
CLOCK_REALTIME (I think leap-smears causing any change to other
clockids would be surprising). However, there are some users who
expect POSIX UTC leapsecond behavior, either because they're
positioning telescopes or doing other things that depend on strict solar
time, or because they are required (in some cases by law) to use UTC.

I don't think we can just abandon/break those users in favor of
leap-smearing. So I don't know if we can get away from that
complexity. But maybe I'm not thinking "boldly" here.

>> I do agree that the complexity of the time subsystem is getting hard to manage.
>
> That's rather an understatement.
>
>> I'm at the point where I think we need to avoid keeping duplicated timespec and
>> ktime_t data (we can leave the ktime->timespec caching to the VDSOs). That will
>> help cut down the read paths a bit, but will also simplify updates since we'll
>> have less data to keep in sync.  How we manage the ntp state also needs a
>> rework, since the locking rules are getting too complex (bit me in an earlier
>> version of this patch), and we're in effect duplicating some of that state in
>> the timekeeper with this patch to handle the reads safely.
>
> Agreed.
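The "keep one representation" idea, sketched: store a single signed nanosecond count (the kernel's ktime_t is essentially a 64-bit nanosecond value) and derive the seconds/nanoseconds split only when a timespec-style caller needs it. The helper names are hypothetical. Python's divmod floors, which yields the normalized form 0 <= nsec < 10^9 even for negative times.

```python
from typing import Tuple

NSEC_PER_SEC = 1_000_000_000


def ns_to_timespec(ns: int) -> Tuple[int, int]:
    """Derive (tv_sec, tv_nsec) from the single nanosecond count."""
    sec, nsec = divmod(ns, NSEC_PER_SEC)  # floor division keeps 0 <= nsec < 1e9
    return sec, nsec


def timespec_to_ns(sec: int, nsec: int) -> int:
    """Inverse conversion, for callers that hand in a timespec."""
    return sec * NSEC_PER_SEC + nsec
```

Only the conversion has to be right; there is no second cached copy of the time that can drift out of sync during an update.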
>
>> But even assuming all those changes were already made, I think we'd still need
>> something close to this patch.
>
> I disagree rather strongly.

I do really appreciate the review and thoughts here, and respect and
share your concern about complexity, but I'm not yet seeing a viable
path forward with your proposals above. So additional ideas or
clarifications would be welcome.

So, I think with this pushback, we're unlikely to have a solution
that will be deployable by the leap second at the end of the month
(though the issue was reported late enough that getting something
merged/backported/deployed en masse wasn't super realistic). So we'll
get to hear how much folks actually care about this issue.

Since the leap is a discontinuity, and there is no way to set an
absolute (TIMER_ABSTIME) CLOCK_REALTIME timer for the 23:59:60 leap
second, having a few very early timers targeted for the next second
expire early on that repeated second is probably not a major issue in
practice.

thanks
-john