linux-kernel - Re: [PATCH 2/4] nohz: Synchronize sleep time stats with seqlock


Message-ID: <CAFTL4hzxN2nj8H+ycPaFLzvM0NK5TtfWtfVJVTSEtNzofTvqSw@mail.gmail.com>
Date:	Tue, 1 Oct 2013 16:26:01 +0200
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Oleg Nesterov <oleg@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>
Cc:	Arjan van de Ven <arjan@...ux.intel.com>,
	Fernando Luis Vázquez Cao 
	<fernando_b1@....ntt.co.jp>, Ingo Molnar <mingo@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	LKML <linux-kernel@...r.kernel.org>,
	Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 2/4] nohz: Synchronize sleep time stats with seqlock

2013/10/1 Frederic Weisbecker <fweisbec@...il.com>:
> On Wed, Aug 21, 2013 at 06:41:46PM +0200, Oleg Nesterov wrote:
>> On 08/21, Peter Zijlstra wrote:
>> >
>> > The other consideration is that this adds two branches to the normal
>> > schedule path. I really don't know what the regular ratio between
>> > schedule() and io_schedule() is -- and I suspect it can very much depend
>> > on workload -- but it might be a net loss due to that, even if it makes
>> > io_schedule() 'lots' cheaper.
>>
>> Yes, agreed. Please ignore it for now, I didn't try to actually suggest
>> this change. And even if this is fine performance-wise, this needs some
>> benchmarking.
>>
>> Well, actually I have a vague feeling that _perhaps_ this change can
>> help to solve other problems we are discussing, but I am not sure and
>> right now I can't even explain the idea to myself.
>>
>> In short: please ignore ;)
>>
>> Oleg.
>>
>
> Ok, the discussion is hard to synthesize but I think I now have more
> clues to send a better new iteration.
>
> So we have the choice between keeping atomics or using rq->lock. It seems
> that using rq->lock is going to be worrisome for several reasons. So let's
> stick to atomics (but tell me if you prefer the other way).
>
> So the idea for the next try is to do something along the lines of:
>
> struct cpu_idletime {
>        nr_iowait,
>        seqlock,
>        idle_start,
>        idle_time,
>        iowait_time,
> } __cacheline_aligned_in_smp;
>
> DEFINE_PER_CPU(struct cpu_idletime, cpu_idletime);
>
> io_schedule()
> {
>         struct cpu_idletime *prev_cpu_idletime;
>
>         preempt_disable();
>         prev_cpu_idletime = __this_cpu_ptr(&cpu_idletime);
>         atomic_inc(&prev_cpu_idletime->nr_iowait);
>         WARN_ON_ONCE(is_idle_task(current));
>         preempt_enable_no_resched();
>
>         schedule();
>
>         write_seqlock(&prev_cpu_idletime->seqlock);
>         if (!atomic_dec_return(&prev_cpu_idletime->nr_iowait))
>            flush_cpu_idle_time(prev_cpu_idletime, 1);

I forgot...
              cpu_idletime->idle_start;

after the update.

Also now I wonder if we actually should lock the inc part. Otherwise
it may be hard to get the readers right...

Thanks.

>         write_sequnlock(&prev_cpu_idletime->seqlock);
>
> }
>
> flush_cpu_idle_time(cpu_idletime, iowait)
> {
>        if (!cpu_idletime->idle_start)
>             return;
>
>        if (iowait)
>             cpu_idletime->iowait_time += NOW() - cpu_idletime->idle_start;
>        else
>             cpu_idletime->idle_time += NOW() - cpu_idletime->idle_start;
> }
>
> idle_entry()
> {
>         write_seqlock(&this_cpu_idletime->seqlock);
>         this_cpu_idletime->idle_start = NOW();
>         write_sequnlock(&this_cpu_idletime->seqlock);
> }
>
> idle_exit()
> {
>         write_seqlock(&this_cpu_idletime->seqlock);
>         flush_cpu_idle_time(this_cpu_idletime, atomic_read(&this_cpu_idletime->nr_iowait));
>         this_cpu_idletime->idle_start = 0;
>         write_sequnlock(&this_cpu_idletime->seqlock);
> }
>
>
> Now this all relies on the fact that atomic_inc(&cpu_idletime->nr_iowait) can't happen
> on a CPU that is already idle. So it can't happen between idle_entry() and idle_exit().
> Hence the WARN_ON_ONCE(is_idle_task(current)) above, after the inc.
>
> Hmm?
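
To make the reader side of the scheme above easier to reason about, here is
a rough userspace sketch in C11. It is not kernel code and not necessarily
what the next iteration will look like. Everything in it is an assumption
made for illustration: the hand-rolled seqlock stand-in, the explicit "now"
timestamps instead of NOW(), the names iowait_begin()/iowait_end() for the
two halves of io_schedule(), accumulating with += in the flush, and
restarting idle_start after the iowait flush (one possible reading of the
"I forgot..." remark).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the proposed per-CPU structure. */
struct cpu_idletime {
    atomic_bool lock;             /* serializes writers (local and remote CPUs) */
    atomic_uint seq;              /* odd while a writer is inside the section */
    atomic_int nr_iowait;
    _Atomic uint64_t idle_start;  /* 0 means "this CPU is not idle" */
    _Atomic uint64_t idle_time;
    _Atomic uint64_t iowait_time;
};

static void write_begin(struct cpu_idletime *it)
{
    while (atomic_exchange(&it->lock, 1))
        ;                         /* spin until the writer lock is free */
    atomic_fetch_add(&it->seq, 1);
}

static void write_end(struct cpu_idletime *it)
{
    atomic_fetch_add(&it->seq, 1);
    atomic_store(&it->lock, 0);
}

/* Account the idle period that started at idle_start. Caller holds the lock. */
static void flush_cpu_idle_time(struct cpu_idletime *it, int iowait, uint64_t now)
{
    if (!it->idle_start)
        return;
    if (iowait)
        it->iowait_time += now - it->idle_start;
    else
        it->idle_time += now - it->idle_start;
}

static void idle_entry(struct cpu_idletime *it, uint64_t now)
{
    write_begin(it);
    it->idle_start = now;
    write_end(it);
}

static void idle_exit(struct cpu_idletime *it, uint64_t now)
{
    write_begin(it);
    flush_cpu_idle_time(it, atomic_load(&it->nr_iowait), now);
    it->idle_start = 0;
    write_end(it);
}

/* First half of io_schedule(): only ever runs on a CPU that is not idle. */
static void iowait_begin(struct cpu_idletime *it)
{
    assert(it->idle_start == 0);  /* plays the role of the WARN_ON_ONCE() */
    atomic_fetch_add(&it->nr_iowait, 1);
}

/* Second half of io_schedule(): may run on a different CPU than iowait_begin(). */
static void iowait_end(struct cpu_idletime *it, uint64_t now)
{
    write_begin(it);
    if (atomic_fetch_sub(&it->nr_iowait, 1) == 1) {
        flush_cpu_idle_time(it, 1, now);
        if (it->idle_start)
            it->idle_start = now; /* assumed: the rest counts as plain idle */
    }
    write_end(it);
}

/* Reader: retry until a snapshot is taken with no writer in between. */
static uint64_t read_time(struct cpu_idletime *it, int iowait, uint64_t now)
{
    unsigned int seq;
    uint64_t total, start;

    do {
        seq = atomic_load(&it->seq);
        total = iowait ? it->iowait_time : it->idle_time;
        start = it->idle_start;
        /* Add the pending period if it belongs to the requested bucket. */
        if (start && (atomic_load(&it->nr_iowait) > 0) == (iowait != 0))
            total += now - start;
    } while ((seq & 1) || atomic_load(&it->seq) != seq);

    return total;
}

static uint64_t get_idle_time(struct cpu_idletime *it, uint64_t now)
{
    return read_time(it, 0, now);
}

static uint64_t get_iowait_time(struct cpu_idletime *it, uint64_t now)
{
    return read_time(it, 1, now);
}
```

In this sketch the increment in iowait_begin() stays outside the write
section. A reader only looks at nr_iowait when idle_start is set, and by the
rule above the counter can only be decremented (under the lock) while the
CPU is idle, so the snapshot stays consistent as long as that rule holds.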