Message-ID: <87ttiwkel8.ffs@tglx>
Date: Fri, 17 May 2024 10:49:39 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Justin Stitt <justinstitt@...gle.com>
Cc: John Stultz <jstultz@...gle.com>, Stephen Boyd <sboyd@...nel.org>,
Nathan Chancellor <nathan@...nel.org>, Bill Wendling <morbo@...gle.com>,
linux-kernel@...r.kernel.org, llvm@...ts.linux.dev,
linux-hardening@...r.kernel.org
Subject: Re: [PATCH] ntp: remove accidental integer wrap-around
On Thu, May 16 2024 at 16:40, Justin Stitt wrote:
> On Tue, May 14, 2024 at 3:38 AM Thomas Gleixner <tglx@...utronix.de> wrote:
>> So how can 0xf42400 + 500000/1000 overflow in the first place?
>>
>> It can't unless time_maxerror is somehow initialized to a bogus
>> value and indeed it is:
>>
>> process_adjtimex_modes()
>>     ....
>>     if (txc->modes & ADJ_MAXERROR)
>>             time_maxerror = txc->maxerror;
>>
>> So that wants to be fixed and not the symptom.
>
> Isn't this usually supplied from the user and can be some pretty
> random stuff?

Sure, it comes from user space and can contain random nonsense, as
syzkaller demonstrated.

> Are you suggesting we update timekeeping_validate_timex() to include a
> check to limit the maxerror field to (NTP_PHASE_LIMIT-(MAXFREQ /
> NSEC_PER_USEC))? It seems like we should handle the overflow case
> where it happens: in second_overflow().
>
> The clear intent of the existing code was to saturate at
> NTP_PHASE_LIMIT, they just did it in a way where the check itself
> triggers overflow sanitizers.
The clear intent of the code is to do saturation of a bound value.
Clearly the user space interface fails to validate the input to be in a
sane range and that makes you go and prevent the resulting overflow at
the usage site. Seriously?
Obviously the sanitizer detects the stupid in second_overflow(), but
that does not mean that the proper solution is to add overflow
protection to that code.

Tools are good at pinpointing symptoms, but they are by definition
bad at root cause analysis. Otherwise we could just let the
tool write the "fix".

The obvious first question in such a case is to ask _WHY_
time_maxerror has a bogus value, which clearly cannot be achieved from
regular operation.
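
As an illustration of why regular operation cannot produce such a value,
here is a minimal userspace sketch, not the kernel code. It assumes the two
constants from the arithmetic quoted above (0xf42400 and 500000/1000);
maxerror_tick() and MAXERROR_STEP are hypothetical names chosen for the
example.

```c
#include <assert.h>
#include <limits.h>

#define NTP_PHASE_LIMIT 0xf42400L          /* 16000000, value quoted in the thread */
#define MAXERROR_STEP   (500000L / 1000L)  /* MAXFREQ / NSEC_PER_USEC = 500 */

/*
 * Compile-time proof that a value inside [0, NTP_PHASE_LIMIT] can take
 * one more step without wrapping a signed long.
 */
static_assert(NTP_PHASE_LIMIT <= LONG_MAX - MAXERROR_STEP,
              "an in-range maxerror plus one step must fit in a long");

/*
 * Hypothetical model of the once-per-second update: add the step, then
 * saturate. Precondition: 0 <= maxerror <= NTP_PHASE_LIMIT.
 */
static long maxerror_tick(long maxerror)
{
    maxerror += MAXERROR_STEP;
    if (maxerror > NTP_PHASE_LIMIT)
        maxerror = NTP_PHASE_LIMIT;
    return maxerror;
}
```

Starting from any in-range value, repeated calls stay within
[0, NTP_PHASE_LIMIT], so the addition only becomes a problem when something
else stores an out-of-range value first.
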

Once you have figured out that the only way to set time_maxerror to a bogus
value is the user space interface, the obvious follow-up question is
whether such a value has to be considered valid or not.

As it is obviously invalid, the logical consequence is to add a sanity
check and reject that nonsense at that boundary, no?
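
A rough sketch of what such a boundary check could look like, in plain
userspace C. struct timex_sketch and validate_maxerror() are hypothetical
stand-ins, and the accepted range of 0..NTP_PHASE_LIMIT is an assumption
made for illustration, not a statement of what an actual patch should use.

```c
#include <errno.h>

#define NTP_PHASE_LIMIT 0xf42400L  /* value quoted earlier in the thread */
#define ADJ_MAXERROR    0x0004     /* same bit value as in the Linux timex UAPI header */

/* Hypothetical, cut-down stand-in for the user-supplied struct timex. */
struct timex_sketch {
    unsigned int modes;
    long maxerror;
};

/*
 * Reject an out-of-range maxerror where it enters from user space, so
 * later arithmetic on the stored value never sees a bogus number.
 * Returns 0 if acceptable, -EINVAL otherwise.
 */
static int validate_maxerror(const struct timex_sketch *txc)
{
    /* The field is only consumed when the caller asked to set it. */
    if (!(txc->modes & ADJ_MAXERROR))
        return 0;

    /* Assumed valid range for this sketch: 0..NTP_PHASE_LIMIT. */
    if (txc->maxerror < 0 || txc->maxerror > NTP_PHASE_LIMIT)
        return -EINVAL;

    return 0;
}
```

With a check of this kind in the validation path, the saturation logic at
the usage site can stay as it is, because every value it ever sees is
already bounded.
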
Thanks,
tglx