linux-kernel - Re: [PATCH] nohz/cpuset: Make a CPU stick with do_timer() duty in the presence of nohz cpusets

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CABVzNXp0pLFPP_0NhJfRmi_88Hcmw=ebX7Y+0C+3ZrPNOWmKnA@mail.gmail.com>
Date:	Fri, 30 Nov 2012 12:41:43 -0700
From:	Hakan Akkan <hakanakkan@...il.com>
To:	Frederic Weisbecker <fweisbec@...il.com>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Steven Rostedt <rostedt@...dmis.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>
Subject: Re: [PATCH] nohz/cpuset: Make a CPU stick with do_timer() duty in the
 presence of nohz cpusets

On Fri, Nov 30, 2012 at 10:38 AM, Frederic Weisbecker
<fweisbec@...il.com> wrote:
> 2012/11/28 Hakan Akkan <hakanakkan@...il.com>:
>> +static int check_drop_timer_duty(int cpu)
>> +{
>> +       int curr_handler, prev_handler, new_handler;
>> +       int nrepeat = -1;
>> +       bool drop_recheck;
>> +
>> +repeat:
>> +       WARN_ON_ONCE(++nrepeat > 1);
>> +       drop_recheck = false;
>> +       curr_handler = cpu;
>> +       new_handler = TICK_DO_TIMER_NONE;
>> +
>> +#ifdef CONFIG_CPUSETS_NO_HZ
>> +       if (atomic_read(&nr_cpus_user_nohz) > 0) {
>
> Note atomic_read() is not SMP ordered. If another CPU does
> atomic_add() or atomic_add_return(), you may not see the new value as
> expected with atomic_read(). Doing atomic_add_return(0, &atomic_thing)
> returns the fully ordered result.

That code doesn't necessarily try to achieve a prefect ordering. I wasn't very
sure if we want the overhead of atomic_add_return(0, ...) in such a hot path
so this part of the function is racy. Nevertheless, it makes sure that someone
gets the duty.

#ifdef CONFIG_CPUSETS_NO_HZ
       if (atomic_read(&nr_cpus_user_nohz) > 0) {

The CPU who updated nr_cpus_user_nohz will certainly see the new
value and claim the duty if someone else hasn't yet. Another regular
CPU will eventually see the new value and assign itself to the duty.
So we can tolerate some race here for the sake of avoiding heavier locks.

>
> You also need to do that to ensure full ordering against
> tick_cpu_sched.user_nohz.
>
> On the current layout we have:
>
> (Write side)                                      (Read side)
>
> ts->user_nohz = 1;
> atomic_inc(&nr_cpus_user_nohz)
>
>                                                       if
> (atomic_read(&nr_cpus_user_nohz))
>                                                             if
> (per_cpu(tick_cpu_sched, curr_handler).user_nohz)
>                                                                 ....
>
> If you want to make sure that you see the expected value on user_nohz
> from the read side, you need to correctly order the write and read
> against nr_cpus_user_nohz.

Again, perfect ordering is not critical here. In the worst case we miss that
CPUs ts->user_nohz = 1 update and that adaptive nohz CPU will keep
the duty for a short while until someone else takes it. Trying to avoid this
would require more locking instructions like you suggest below.

If we really want to fully isolate nohz CPUs and avoid "being stuck with
jiffies update until someone else takes it away" situations, we can have a
cpumask denoting idle and/or non-adaptive-nohz CPUs and dump the duty
once we discover that we're "stuck".

>
> For this you can use atomic_add_return() which implies the full barrier:
>
>
> (Write side)                                      (Read side)
>
> ts->user_nohz = 1;
> atomic_inc_return(&nr_cpus_user_nohz)
>
>                                                       if
> (atomic_add_return(0, &nr_cpus_user_nohz))
>                                                             if
> (per_cpu(tick_cpu_sched, curr_handler).user_nohz)
>                                                                 ....
>
> I have much more comments on this patch, I will come back on this soon.
>
> Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/