linux-kernel - Re: [PATCH 2/2] sched: Rewrite per entity runnable load average tracking


Message-ID: <xm2638e9xhws.fsf@sword-of-the-dawn.mtv.corp.google.com>
Date:	Thu, 10 Jul 2014 10:06:27 -0700
From:	bsegall@...gle.com
To:	Yuyang Du <yuyang.du@...el.com>
Cc:	Peter Zijlstra <peterz@...radead.org>, mingo@...hat.com,
	linux-kernel@...r.kernel.org, rafael.j.wysocki@...el.com,
	arjan.van.de.ven@...el.com, len.brown@...el.com,
	alan.cox@...el.com, mark.gross@...el.com, pjt@...gle.com,
	fengguang.wu@...el.com
Subject: Re: [PATCH 2/2] sched: Rewrite per entity runnable load average tracking

Yuyang Du <yuyang.du@...el.com> writes:

> Thanks, Peter.
>
> On Wed, Jul 09, 2014 at 08:45:43PM +0200, Peter Zijlstra wrote:
>
>> Nope :-).. we got rid of that lock for a good reason.
>> 
>> Also, this is one area where I feel performance really trumps
>> correctness, we can fudge the blocked load a little. So the
>> sched_clock_cpu() difference is a strict upper bound on the
>> rq_clock_task() difference (and under 'normal' circumstances shouldn't
>> be much off).
>
> Strictly, migrating wakee task on remote CPU entails two steps:
>
> (1) Catch up with task's queue's last_update_time, and then subtract
>
> (2) Catch up with "current" time of remote CPU (for comparable matter), and then
>     on new CPU, change to the new timing source (when enqueue)
>
> So I will try sched_clock_cpu(remote_cpu) for step (2). For step (2), maybe we
> should not use cfs_rq_clock_task anyway, since the task is about to go
> to another CPU/queue. Is this right?

So, sched_clock(_cpu) can be arbitrarily far off of cfs_rq_clock_task, so you
can't really do that. Ideally, yes, you would account for any time since
the last update and account that time as !runnable. However, I don't
think there is any good way to do that, and the current code doesn't.
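
To make "account that time as !runnable" concrete, here is a minimal
userspace sketch, not kernel code. It assumes the usual geometric decay
with y^32 = 0.5 over ~1ms periods. All toy_* names and constants are
hypothetical, partial periods are simply dropped, and both timestamps
must come from the same clock, which is exactly what we don't have in
the remote wakeup case.

```c
#include <stdint.h>

/* Assumed constants for illustration; not taken from the patch. */
#define TOY_PERIOD_NS	(1024ULL * 1024ULL)	/* one ~1ms accounting period */
#define TOY_HALFLIFE	32ULL			/* periods until load halves */
#define TOY_Y		0.97857206		/* approximately 0.5^(1/32) */

/* Decay 'load' over 'periods' full periods with no runnable time. */
static double toy_decay_load(double load, uint64_t periods)
{
	uint64_t i;

	/* Each full half-life halves the load; stop once it reaches zero. */
	for (i = periods / TOY_HALFLIFE; i > 0 && load > 0.0; i--)
		load *= 0.5;
	/* The remaining periods each decay the load by a factor of y. */
	for (i = periods % TOY_HALFLIFE; i > 0; i--)
		load *= TOY_Y;
	return load;
}

/*
 * Treat everything between last_update_ns and now_ns as !runnable.
 * Only meaningful if both values were read from the same clock.
 */
static double toy_blocked_load(double load_avg, uint64_t last_update_ns,
			       uint64_t now_ns)
{
	if (now_ns <= last_update_ns)
		return load_avg;	/* clock did not advance: nothing to do */
	return toy_decay_load(load_avg,
			      (now_ns - last_update_ns) / TOY_PERIOD_NS);
}
```

If now_ns comes from a different clock than last_update_ns, the delta
fed into the decay is meaningless, which is the objection above.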

>
> I made another mistake. We should not only track task entity load; group entity
> load (as an entity) is also needed. Otherwise, task_h_load can't be computed
> correctly... Sorry for the mess-up. But this won't change the code much.

This will increase it to 2x __update_load_avg per cgroup per
enqueue/dequeue. What does this (and this patch in general) do to
context switch cost at cgroup depth 1/2/3?
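
As a rough counting model of why depth matters (a sketch with
hypothetical toy_* names, not the patch's code): assume a task in a
cgroup at depth d has d group entities above it, and that every level
walked on enqueue/dequeue pays a fixed number of __update_load_avg
calls.

```c
#include <stddef.h>

#define TOY_MAX_DEPTH 8	/* arbitrary cap for this illustration */

/* Hypothetical stand-in for a scheduling entity with a parent link. */
struct toy_entity {
	struct toy_entity *parent;
};

/* Walk from the task up to the root, paying per_level updates each level. */
static unsigned int toy_updates_for_entity(const struct toy_entity *se,
					   unsigned int per_level)
{
	unsigned int n = 0;

	for (; se != NULL; se = se->parent)
		n += per_level;
	return n;
}

/* Build a task plus 'depth' group entities and count the updates. */
static unsigned int toy_updates_at_depth(unsigned int depth,
					 unsigned int per_level)
{
	struct toy_entity chain[TOY_MAX_DEPTH + 1];
	unsigned int i;

	if (depth > TOY_MAX_DEPTH)
		depth = TOY_MAX_DEPTH;
	/* chain[0] is the task; chain[depth] sits on the root runqueue. */
	for (i = 0; i <= depth; i++)
		chain[i].parent = (i < depth) ? &chain[i + 1] : NULL;
	return toy_updates_for_entity(&chain[0], per_level);
}
```

Under these assumptions, going from one to two updates per level takes
depth 3 from 4 calls to 8 per enqueue or dequeue. Only a measurement
tells you what that does to the actual context switch cost.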

>
> Thanks,
> Yuyang
>  
>> So we could simply use a timestamp from dequeue and one from enqueue,
>> and use that.
>> 
>> As to the remote subtraction, a RMW on another cacheline than the
>> rq->lock one should be good; esp since we don't actually observe the
>> per-rq total often (once per tick or so) I think, no?
>> 
>> The thing is, we do not want to disturb scheduling on whatever cpu the
>> task last ran on if we wake it to another cpu. Taking rq->lock wrecks
>> that for sure. 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/