linux-kernel - Re: [PATCH] ntp: remove accidental integer wrap-around

Message-ID: <CAFhGd8p94sHpDc8MApZK7q9iEQ_C8c5frwZx9v_bTnhwtAM=HQ@mail.gmail.com>
Date: Thu, 16 May 2024 16:40:01 -0700
From: Justin Stitt <justinstitt@...gle.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: John Stultz <jstultz@...gle.com>, Stephen Boyd <sboyd@...nel.org>, 
	Nathan Chancellor <nathan@...nel.org>, Bill Wendling <morbo@...gle.com>, linux-kernel@...r.kernel.org, 
	llvm@...ts.linux.dev, linux-hardening@...r.kernel.org
Subject: Re: [PATCH] ntp: remove accidental integer wrap-around

Hi,

On Tue, May 14, 2024 at 3:38 AM Thomas Gleixner <tglx@...utronix.de> wrote:
>
> On Tue, May 07 2024 at 04:34, Justin Stitt wrote:
> > Using syzkaller alongside the newly reintroduced signed integer overflow
> > sanitizer spits out this report:
> >
> > [  138.454979] ------------[ cut here ]------------
> > [  138.458089] UBSAN: signed-integer-overflow in ../kernel/time/ntp.c:461:16
> > [  138.462134] 9223372036854775807 + 500 cannot be represented in type 'long'
> > [  138.466234] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0-rc2-00038-gc0a509640e93-dirty #10
> > [  138.471498] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> > [  138.477110] Call Trace:
> > [  138.478657]  <IRQ>
> > [  138.479964]  dump_stack_lvl+0x93/0xd0
> > [  138.482276]  handle_overflow+0x171/0x1b0
> > [  138.484699]  second_overflow+0x2d6/0x500
> > [  138.487133]  accumulate_nsecs_to_secs+0x60/0x160
> > [  138.489931]  timekeeping_advance+0x1fe/0x890
> > [  138.492535]  update_wall_time+0x10/0x30
>
> Same comment vs. trimming.

Gotcha, I'll trim this in the next version.

>
> > Historically, the signed integer overflow sanitizer did not work in the
> > kernel due to its interaction with `-fwrapv` but this has since been
> > changed [1] in the newest version of Clang. It was re-enabled in the
> > kernel with Commit 557f8c582a9ba8ab ("ubsan: Reintroduce signed overflow
> > sanitizer").
>
> Again. Irrelevant to the problem.

Right, I'll move it below the fold.

>
> > Let's introduce a new macro and use that against NTP_PHASE_LIMIT to
> > properly limit the max size of time_maxerror without overflowing during
> > the check itself.
>
> This fails to tell what is causing the issue and just talks about what
> the patch is doing. The latter can be seen from the patch itself, no?
>
> Something like this:
>
>    On second overflow time_maxerror is unconditionally incremented and
>    the result is checked against NTP_PHASE_LIMIT, but the increment can
>    overflow into negative space.
>
>    Prevent this by checking the overflow condition before incrementing.
>
> Hmm?

Sounds better, I'll use this!

>
> But that obviously begs the question why this can happen at all.
>
> #define MAXPHASE 500000000L
> #define NTP_PHASE_LIMIT ((MAXPHASE / NSEC_PER_USEC) << 5)
>
> ==> NTP_PHASE_LIMIT = 1.6e+07 = 0xf42400
>
> #define MAXFREQ 500000
>
> So how can 0xf42400 + 500000/1000 overflow in the first place?
>
> It can't unless time_maxerror is somehow initialized to a bogus
> value and indeed it is:
>
> process_adjtimex_modes()
>         ....
>         if (txc->modes & ADJ_MAXERROR)
>                 time_maxerror = txc->maxerror;
>
> So that wants to be fixed and not the symptom.
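
For reference, the arithmetic above can be checked with a small userspace
sketch. The macro values are the ones quoted above, except NSEC_PER_USEC,
which is assumed to be 1000L here. The helper names maxerror_step() and
increment_would_overflow() are made up for illustration and are not
kernel functions.

```c
#include <limits.h>

/* Values as quoted in the review; NSEC_PER_USEC is assumed to be 1000L. */
#define NSEC_PER_USEC 1000L
#define MAXPHASE 500000000L
#define MAXFREQ 500000
#define NTP_PHASE_LIMIT ((MAXPHASE / NSEC_PER_USEC) << 5)

/* Hypothetical helper: the amount added to time_maxerror each second. */
static long maxerror_step(void)
{
	return MAXFREQ / NSEC_PER_USEC;
}

/*
 * Hypothetical helper: returns 1 if adding the per-second step to v
 * would exceed LONG_MAX, 0 otherwise. The comparison is rearranged so
 * that it cannot overflow itself.
 */
static int increment_would_overflow(long v)
{
	return v > LONG_MAX - maxerror_step();
}
```

With these values the limit is 16000000 (0xf42400) and the step is 500,
so a value at or below the limit cannot overflow on increment. Only a
value close to LONG_MAX, such as the one in the UBSAN report, can.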

Isn't this usually supplied by the user, so it can hold pretty much
arbitrary values? Are you suggesting we update
timekeeping_validate_timex() to add a check that limits the maxerror
field to (NTP_PHASE_LIMIT - (MAXFREQ / NSEC_PER_USEC))? It seems to me
that we should handle the overflow case where it happens: in
second_overflow().
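
To make the first option concrete, here is a rough userspace sketch of
such an input-side check. maxerror_in_range() and MAXERROR_INPUT_LIMIT
are hypothetical names, not existing kernel symbols. NSEC_PER_USEC is
assumed to be 1000L, and only the upper bound discussed above is
checked.

```c
#include <limits.h>

/* Values as quoted in the review; NSEC_PER_USEC is assumed to be 1000L. */
#define NSEC_PER_USEC 1000L
#define MAXPHASE 500000000L
#define MAXFREQ 500000
#define NTP_PHASE_LIMIT ((MAXPHASE / NSEC_PER_USEC) << 5)

/* Hypothetical bound: largest input that can still take one increment. */
#define MAXERROR_INPUT_LIMIT (NTP_PHASE_LIMIT - (MAXFREQ / NSEC_PER_USEC))

/*
 * Hypothetical validation helper: returns 1 if a user-supplied maxerror
 * can be incremented once without passing NTP_PHASE_LIMIT, 0 otherwise.
 */
static int maxerror_in_range(long maxerror)
{
	return maxerror <= MAXERROR_INPUT_LIMIT;
}
```

A caller would reject (or clamp) the value when this returns 0. What the
right behavior is for out-of-range input is exactly the open question
here.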

The clear intent of the existing code was to saturate at
NTP_PHASE_LIMIT; it just does so in a way where the check itself
triggers overflow sanitizers.
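
A minimal sketch of what saturating without the wrap-around could look
like, written as a standalone userspace function rather than the actual
patch. bump_maxerror() is a hypothetical name, NSEC_PER_USEC is assumed
to be 1000L, and any status-flag handling is left out.

```c
#include <limits.h>

/* Values as quoted in the review; NSEC_PER_USEC is assumed to be 1000L. */
#define NSEC_PER_USEC 1000L
#define MAXPHASE 500000000L
#define MAXFREQ 500000
#define NTP_PHASE_LIMIT ((MAXPHASE / NSEC_PER_USEC) << 5)

/*
 * Hypothetical helper: returns maxerror advanced by one per-second
 * step, saturated at NTP_PHASE_LIMIT. The comparison happens before
 * the addition, so the addition only runs when the sum is known to
 * fit below the limit and signed overflow cannot occur.
 */
static long bump_maxerror(long maxerror)
{
	const long step = MAXFREQ / NSEC_PER_USEC;

	if (maxerror > NTP_PHASE_LIMIT - step)
		return NTP_PHASE_LIMIT;

	return maxerror + step;
}
```

For inputs where the original add-then-compare sequence does not
overflow, this gives the same result. For LONG_MAX it returns
NTP_PHASE_LIMIT instead of wrapping.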

>
> Thanks,
>
>         tglx

Thanks
Justin