linux-kernel - Re: [PATCH] ntp: remove accidental integer wrap-around


Message-ID: <87ttiwkel8.ffs@tglx>
Date: Fri, 17 May 2024 10:49:39 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Justin Stitt <justinstitt@...gle.com>
Cc: John Stultz <jstultz@...gle.com>, Stephen Boyd <sboyd@...nel.org>,
 Nathan Chancellor <nathan@...nel.org>, Bill Wendling <morbo@...gle.com>,
 linux-kernel@...r.kernel.org, llvm@...ts.linux.dev,
 linux-hardening@...r.kernel.org
Subject: Re: [PATCH] ntp: remove accidental integer wrap-around

On Thu, May 16 2024 at 16:40, Justin Stitt wrote:
> On Tue, May 14, 2024 at 3:38 AM Thomas Gleixner <tglx@...utronix.de> wrote:
>> So how can 0xf42400 + 500000/1000 overflow in the first place?
>>
>> It can't unless time_maxerror is somehow initialized to a bogus
>> value and indeed it is:
>>
>> process_adjtimex_modes()
>>         ....
>>         if (txc->modes & ADJ_MAXERROR)
>>                 time_maxerror = txc->maxerror;
>>
>> So that wants to be fixed and not the symptom.
>
> Isn't this usually supplied from the user and can be some pretty
> random stuff?

Sure it comes from user space and can contain random nonsense as
syzkaller demonstrated.

> Are you suggesting we update timekeeping_validate_timex() to include a
> check to limit the maxerror field to (NTP_PHASE_LIMIT-(MAXFREQ /
> NSEC_PER_USEC))? It seems like we should handle the overflow case
> where it happens: in second_overflow().
>
> The clear intent of the existing code was to saturate at
> NTP_PHASE_LIMIT, they just did it in a way where the check itself
> triggers overflow sanitizers.

The clear intent of the code is to do saturation of a bound value.

Clearly the user space interface fails to validate the input to be in a
sane range and that makes you go and prevent the resulting overflow at
the usage site. Seriously?

Obviously the sanitizer detects the stupid in second_overflow(), but
that does not mean that the proper solution is to add overflow
protection to that code.
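
To make the arithmetic concrete, here is a minimal sketch in plain C. It assumes the constants quoted earlier in the thread (0xf42400 for NTP_PHASE_LIMIT, 500000/1000 for MAXFREQ / NSEC_PER_USEC). The helper names maxerror_increment_wraps() and maxerror_step() are hypothetical and are not kernel functions. __builtin_add_overflow() is a GCC/Clang builtin.

```c
#include <limits.h>
#include <stdbool.h>

/* Constants as quoted in the thread: 0xf42400 and 500000/1000. */
#define NSEC_PER_USEC   1000L
#define MAXFREQ         500000L
#define NTP_PHASE_LIMIT 0xf42400L       /* 16000000 */

/*
 * Hypothetical helper: reports whether adding the per-second increment
 * to the given time_maxerror would wrap a signed long.
 */
static bool maxerror_increment_wraps(long time_maxerror)
{
        long sum;

        return __builtin_add_overflow(time_maxerror,
                                      MAXFREQ / NSEC_PER_USEC, &sum);
}

/*
 * Hypothetical model of the saturating update: add first, clamp after.
 * Only well defined if the input is small enough that the addition
 * cannot wrap, which holds for every value the update itself produces.
 */
static long maxerror_step(long time_maxerror)
{
        time_maxerror += MAXFREQ / NSEC_PER_USEC;
        if (time_maxerror > NTP_PHASE_LIMIT)
                time_maxerror = NTP_PHASE_LIMIT;
        return time_maxerror;
}
```

Starting from any value the update can produce itself (at most NTP_PHASE_LIMIT), the sum is at most 16000500 and cannot wrap. Only a value near LONG_MAX injected from outside makes maxerror_increment_wraps() return true.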

Tools are good at pinpointing symptoms, but they are by definition
patently bad at root cause analysis. Otherwise we could just let the
tool write the "fix".

The obvious first question in such a case is to ask _WHY_ time_maxerror
has a bogus value, which clearly cannot be reached in regular
operation.

Once you have figured out that the only way to set time_maxerror to a
bogus value is the user space interface, the obvious follow-up question
is whether such a value has to be considered as valid or not.

As it is obviously invalid, the logical consequence is to add a sanity
check and reject that nonsense at that boundary, no?
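
A sanity check at that boundary could look roughly like the sketch below. It is plain user-space C, not a kernel patch: struct timex_sketch and validate_maxerror_sketch() are hypothetical stand-ins, and the accepted range [0, NTP_PHASE_LIMIT] is an assumption chosen for illustration. The thread itself does not settle the exact bound.

```c
#include <errno.h>

#define ADJ_MAXERROR    0x0004          /* mode bit for the maxerror field */
#define NTP_PHASE_LIMIT 0xf42400L       /* value quoted in the thread */

/* Hypothetical, reduced stand-in for the user-supplied timex data. */
struct timex_sketch {
        unsigned int modes;
        long maxerror;
};

/*
 * Hypothetical boundary check: reject a maxerror outside the assumed
 * sane range before it can ever be stored in time_maxerror.
 * Returns 0 if acceptable, -EINVAL otherwise.
 */
static int validate_maxerror_sketch(const struct timex_sketch *txc)
{
        if (!(txc->modes & ADJ_MAXERROR))
                return 0;       /* field not requested, nothing to check */

        if (txc->maxerror < 0 || txc->maxerror > NTP_PHASE_LIMIT)
                return -EINVAL;

        return 0;
}
```

With the input bounded like this, the existing add-then-clamp code in second_overflow() can no longer see a value large enough to wrap, so it needs no overflow handling of its own.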

Thanks,

        tglx