[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150218171404.GA31353@midget.suse.cz>
Date: Wed, 18 Feb 2015 18:14:04 +0100
From: Jiri Bohac <jbohac@...e.cz>
To: John Stultz <john.stultz@...aro.org>,
Roman Zippel <zippel@...ux-m68k.org>
Cc: Prarit Bhargava <prarit@...hat.com>,
lkml <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Miroslav Lichvar <mlichvar@...hat.com>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] time, ntp: Do not update time_state in middle of leap
second [v3]
On Tue, Feb 17, 2015 at 03:16:18PM -0800, John Stultz wrote:
> Ok, thanks for the more verbose explanation. Although this is more a
> history of what you've seen rather then the crux of the change.
>
> To distill this down just a bit, the point is the usual mode for NTP
> time_state machine looks like:
>
> TIME_OK -> TIME_INS -> TIME_OOP
> | |
> v v
> TIME_DEL ------------> TIME_WAIT -(back)-> TIME_OK
>
> (hopefully the ascii art survives here)
>
> Now, from any of these states, currently if adjtimex is called w/ the
> STA_PLL bit cleared (after STA_PLL was set), we reset back to TIME_OK,
> effectively cancelling any transitions. (You'll have to imagine a line
> from any of the states back to TIME_OK, since that's going to be too
> ugly to do in ascii)
>
> Your patch is trying to remove the line back from TIME_OOP back to
> TIME_OK. Basically stopping the ability to reset the ntp state during
> a leapsecond.
>
> I do get that the behavior seen was strange due to a bug in the test
> code which caused unexpected cancellation of state, but I'm not sure
> if we should change the behavior to enforce that cancellation not be
> possible. I could imagine some logic which really wants to reset the
> state, which just by chance lands during a leap second, and the
> application is confused since the state change didn't occur as
> expected.
>
> So I guess I'm not seeing that the state machine is actually "broken"
> in this case that you've outlined. If you can articulate better why
> the OOP -> OK transition is truly invalid, I'd be interested in
> hearing, but I'm not sure I want to risk a behavioral change unless
> there's wide agreement.
I think the only real problem occurs when the adjtimex is called in the
the TIME_OOP state with STA_PLL cleared _and_ STA_INS set.
In this case the state machine is reset to TIME_OK but goes back
to TIME_INS on the next second_overflow, potentially causing
another false leap second to be inserted on the following
midnight.
The state machine is meant to only go back to TIME_INS once STA_INS is
cleared and then set again - this is what the TIME_WAIT state is
for.
In fact, I don't see a reason why the STA_PLL -> !STA_PLL transition should
ever set the time_state to TIME_OK.
- When the STA_INS/STA_DEL flag is removed from the status, the state
machine will end up in TIME_OK from any state.
- When STA_INS/STA_DEL is set in
the status, the state mchine will transition from TIME_OK to
TIME_INS/TIME_DEL anyway.
I think the "time_status = TIME_OK" should be just dropped.
It has been added by eea83d896e318bda54be2d2770d2c5d6668d11db
(ntp: NTP4 user space bits update) and it's not clear why.
Roman?
--
Jiri Bohac <jbohac@...e.cz>
SUSE Labs, SUSE CZ
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists