linux-kernel - Re: [PATCH RFC] tick/nohz: fix data races in get_cpu_idle_time


Message-ID: <Y9lfe54aoCWlmyqy@p183>
Date:   Tue, 31 Jan 2023 21:35:39 +0300
From:   Alexey Dobriyan <adobriyan@...il.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Yu Liao <liaoyu15@...wei.com>, fweisbec@...il.com,
        mingo@...nel.org, liwei391@...wei.com,
        mirsad.todorovac@....unizg.hr, linux-kernel@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH RFC] tick/nohz: fix data races in get_cpu_idle_time_us()

On Tue, Jan 31, 2023 at 03:44:00PM +0100, Thomas Gleixner wrote:
> On Sat, Jan 28 2023 at 10:00, Yu Liao wrote:
> > selftest/proc/proc-uptime-001 complains:
> >   Euler:/mnt # while true; do ./proc-uptime-001; done
> >   proc-uptime-001: proc-uptime-001.c:41: main: Assertion `i1 >= i0' failed.
> >   proc-uptime-001: proc-uptime-001.c:41: main: Assertion `i1 >= i0' failed.
> >
> > /proc/uptime should be monotonically increasing. The failure is caused
> > by data races between get_cpu_idle_time_us() and
> > tick_nohz_stop_idle()/tick_nohz_start_idle(), for example:
> >
> > CPU0                        CPU1
> > get_cpu_idle_time_us
> >
> >                             tick_nohz_idle_exit
> >                               now = ktime_get();
> >                               tick_nohz_stop_idle
> >                                 update_ts_time_stats
> >                                   delta = ktime_sub(now, ts->idle_entrytime);
> >                                   ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta)
> >                                   ts->idle_entrytime = now
> >
> > now = ktime_get();
> > if (ts->idle_active && !nr_iowait_cpu(cpu)) {
> >     ktime_t delta = ktime_sub(now, ts->idle_entrytime);
> >     idle = ktime_add(ts->idle_sleeptime, delta);
> >     //idle is slightly greater than the actual value
> > } else {
> >     idle = ts->idle_sleeptime;
> > }
> >                             ts->idle_active = 0
> >
> > After this, idle = idle_sleeptime(actual idle value) + now(CPU0) - now(CPU1).
> > If get_cpu_idle_time_us() is called immediately after ts->idle_active = 0,
> > only ts->idle_sleeptime is returned, which is smaller than the previously
> > read one, resulting in a non-monotonically increasing idle time. In
> > addition, there are other data race scenarios not listed here.
> 
> Seriously, this procfs accuracy is the least of the problems, and if this
> were the only issue then we could trivially fix it by declaring that
> the procfs output might go backwards.

Declarations on l-k are meaningless.

> If there were a real reason to ensure monotonicity there then we could
> easily do that in the readout code.

People expect it to be monotonic. I wrote this test fully expecting
that /proc/uptime is monotonic. It never occurred to me that
idle time could go backwards (nor uptime, but uptime is not buggy).

> But the real issue is that both get_cpu_idle_time_us() and
> get_cpu_iowait_time_us() can invoke update_ts_time_stats() which is way
> worse than the above procfs idle time going backwards.
> 
> If update_ts_time_stats() is invoked concurrently for the same CPU then
> ts->idle_sleeptime and ts->iowait_sleeptime turn into random
> numbers.
> 
> This was broken 12 years ago in commit 595aac488b54 ("sched:
> Introduce a function to update the idle statistics").