lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKfTPtD5rxB29t1mCDBVTVv-RVFv9xMjhRaUeaxCU6znZGPJrw@mail.gmail.com>
Date:	Tue, 23 Apr 2013 09:52:58 +0200
From:	Vincent Guittot <vincent.guittot@...aro.org>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	linux-kernel <linux-kernel@...r.kernel.org>,
	"linaro-kernel@...ts.linaro.org" <linaro-kernel@...ts.linaro.org>,
	Ingo Molnar <mingo@...nel.org>,
	Frédéric Weisbecker <fweisbec@...il.com>,
	Paul Turner <pjt@...gle.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Mike Galbraith <efault@....de>
Subject: Re: [PATCH v7] sched: fix init NOHZ_IDLE flag

On 22 April 2013 22:10, Peter Zijlstra <peterz@...radead.org> wrote:
> On Mon, 2013-04-22 at 17:38 +0200, Vincent Guittot wrote:
>
>> ---
>>  include/linux/sched.h |    1 +
>>  kernel/sched/fair.c   |   34 ++++++++++++++++++++++++----------
>>  2 files changed, 25 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index d35d2b6..cde4f7f 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -899,6 +899,7 @@ struct sched_domain {
>>       unsigned int wake_idx;
>>       unsigned int forkexec_idx;
>>       unsigned int smt_gain;
>> +     unsigned long nohz_flags;       /* NOHZ_IDLE flag status */
>>       int flags;                      /* See SD_* */
>>       int level;
>
> There was a 4 byte hole here, I'm assuming you used unsigned long and
> not utilized the hole because of the whole atomic bitop interface
> taking unsigned long?
>
> Bit wasteful but ok..
>
> So we're only pulling NOHZ_IDLE out of nohz_flags()? Should we then not
> also amend the rq_nohz_flag_bits enum? And it seems pointless to me to
> set bit 2 our nohz_flags word if all other bits are unused.

Yes, i have been too quick to update the patchset. i'm going to remove
NOHZ_IDLE from the rq_nohz_flag_bits

>
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 7a33e59..09e440f 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5394,14 +5394,21 @@ static inline void set_cpu_sd_state_busy(void)
>>  {
>>       struct sched_domain *sd;
>>       int cpu = smp_processor_id();
>> -
>> -     if (!test_bit(NOHZ_IDLE, nohz_flags(cpu)))
>> -             return;
>> -     clear_bit(NOHZ_IDLE, nohz_flags(cpu));
>> +     int first_nohz_idle = 1;
>>
>>       rcu_read_lock();
>> -     for_each_domain(cpu, sd)
>> +     for_each_domain(cpu, sd) {
>> +             if (first_nohz_idle) {
>> +                     if (!test_bit(NOHZ_IDLE, &sd->nohz_flags))
>> +                             goto unlock;
>> +
>> +                     clear_bit(NOHZ_IDLE, &sd->nohz_flags);
>> +                     first_nohz_idle = 0;
>> +             }
>> +
>>               atomic_inc(&sd->groups->sgp->nr_busy_cpus);
>> +     }
>
> the mind boggles... what's wrong with something like:
>
> static inline unsigned long *rq_nohz_flags(int cpu)
> {
>         return rcu_dereference(cpu_rq(cpu)->sd)->nohz_flags;
> }
>
>         if (!test_bit(0, rq_nohz_flags(cpu)))
>                 return;
>         clear_bit(0, rq_nohz_flags(cpu));
>

AFAICT, if we use different rcu_dereferences for modifying nohz_flags
and for updating the nr_busy_cpus, we open a time window for
desynchronization, isn't it ?
That why i'm doing the update of rq_nohz_flags with the same
rcu_dereference than nr_busy_cpus

>> +unlock:
>>       rcu_read_unlock();
>>  }
>>
>> @@ -5409,14 +5416,21 @@ void set_cpu_sd_state_idle(void)
>>  {
>>       struct sched_domain *sd;
>>       int cpu = smp_processor_id();
>> -
>> -     if (test_bit(NOHZ_IDLE, nohz_flags(cpu)))
>> -             return;
>> -     set_bit(NOHZ_IDLE, nohz_flags(cpu));
>> +     int first_nohz_idle = 1;
>>
>>       rcu_read_lock();
>> -     for_each_domain(cpu, sd)
>> +     for_each_domain(cpu, sd) {
>> +             if (first_nohz_idle) {
>> +                     if (test_bit(NOHZ_IDLE, &sd->nohz_flags))
>> +                             goto unlock;
>> +
>> +                     set_bit(NOHZ_IDLE, &sd->nohz_flags);
>> +                     first_nohz_idle = 0;
>> +             }
>> +
>>               atomic_dec(&sd->groups->sgp->nr_busy_cpus);
>> +     }
>> +unlock:
>
> Same here, .. why on earth do it for every sched_domain for that cpu?

The update of rq_nohz_flags is done only once on the top level
sched_domain and the nr_busy_cpus is updated on each sched_domain
level in order to have a quick access to the number of busy cpu when
we check for the need to kick an idle load balance

>
>>       rcu_read_unlock();
>>  }
>>
>
> And since its now only a single bit in a single word, we can easily
> change it to something like:
>
>   if (*rq_nohz_idle(cpu))
>         return;
>   xchg(rq_nohz_idle(cpu), 1); /* set; use 0 for clear */
>
> which drops the unsigned long nonsense..

that's fair. i'm going to use xchg instead of atomic

>
> Or am I completely missing something obvious here?
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ