Message-ID: <5127135E.7030502@linux.vnet.ibm.com>
Date:	Fri, 22 Feb 2013 14:42:38 +0800
From:	Michael Wang <wangyun@...ux.vnet.ibm.com>
To:	Mike Galbraith <efault@....de>
CC:	Ingo Molnar <mingo@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Paul Turner <pjt@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>, alex.shi@...el.com,
	Ram Pai <linuxram@...ibm.com>,
	"Nikunj A. Dadhania" <nikunj@...ux.vnet.ibm.com>,
	Namhyung Kim <namhyung@...nel.org>
Subject: Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()

On 02/22/2013 01:02 PM, Mike Galbraith wrote:
> On Fri, 2013-02-22 at 10:36 +0800, Michael Wang wrote: 
>> On 02/21/2013 05:43 PM, Mike Galbraith wrote:
>>> On Thu, 2013-02-21 at 17:08 +0800, Michael Wang wrote:
>>>
>>>> But does this patch set really cause a regression on your Q6600? It
>>>> may sacrifice some things, but I still think it will benefit far
>>>> more, especially on huge systems.
>>>
>>> We spread on FORK/EXEC, and will no longer pull communicating tasks
>>> back to a shared cache, with the new logic preferring to leave the
>>> wakee remote, so while no, I haven't tested it (will try to find a
>>> round tuit), it seems it _must_ hurt.  Dragging data from one llc to
>>> the other on Q6600 hurts a LOT.  Every time a client and server are
>>> cross-llc, it's a huge hit.  The previous logic pulled communicating
>>> tasks together right when it matters the most: intermittent load...
>>> or interactive use.
>>
>> I agree that this is a problem that needs to be solved, but I don't
>> agree that wake_affine() is the solution.
> 
> It's not perfect, but it's better than no countering force at all.  It's
> a relic of the dark ages, when affine meant L2, ie this cpu.  Nowadays,
> affine has a whole new meaning, L3, so it could be done differently, but
> _some_ kind of opposing force is required.
> 
>> According to my understanding, in the old world, wake_affine() will
>> only be used if curr_cpu and prev_cpu share cache, which means they are
>> in one package; whether we search in the llc sd of curr_cpu or of
>> prev_cpu, we won't have the chance to spread the task out of that
>> package.
> 
> ? affine_sd is the first domain spanning both cpus; that may be NODE.
> True, we won't ever spread in the wakeup path unless SD_WAKE_BALANCE is
> set, that is.  Would be nice to be able to do that without shredding
> performance.

That's right, so in each select instance we need two conditions to hold
before taking the balance path (rough sketch below):
1. prev_cpu and curr_cpu are not affine (do not share cache)
2. SD_WAKE_BALANCE is set on the spanning domain
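
In select_task_rq_fair() terms I picture it roughly like this (just a
sketch, not tested; the domain walk and cpus_share_cache() are existing
helpers, while try_wake_balance() is a made-up placeholder and the flag
name follows your mail):

	int cpu = smp_processor_id();
	int prev_cpu = task_cpu(p);
	int new_cpu = prev_cpu;
	struct sched_domain *tmp, *affine_sd = NULL;

	rcu_read_lock();
	for_each_domain(cpu, tmp) {
		/* the first domain spanning both curr_cpu and prev_cpu */
		if (cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))) {
			affine_sd = tmp;
			break;
		}
	}

	/*
	 * Condition 1: the two cpus do not share cache.
	 * Condition 2: the spanning domain allows balancing at wakeup.
	 */
	if (affine_sd && !cpus_share_cache(cpu, prev_cpu) &&
	    (affine_sd->flags & SD_WAKE_BALANCE))
		new_cpu = try_wake_balance(p, affine_sd);
	rcu_read_unlock();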

> 
> Off the top of my pointy head, I can think of a way to _maybe_ improve
> the "affine" wakeup criteria:  Add a small (package size? and very fast)
> FIFO queue to the task struct, recording the waker/wakee relationship.
> If the relationship exists in that queue (rbtree?), try to wake local;
> if not, wake remote.  The thought is to identify situations ala 1:N
> pgbench where you really need to keep the load spread.  That need
> arises when the sum of wakees + waker won't fit in one cache.  True
> buddies would always hit (hm, hit rate), always trying to become affine
> where they thrive.  1:N stuff starts missing when the client count
> exceeds package size, and starts expanding its horizons.  'Course you
> would still need to NAK if imbalanced too badly, and let the NUMA stuff
> NAK touching lard-balls and whatnot.  With a little more smarts, we
> could have happy 1:N, and buddies wouldn't have to chat through 2m
> thick walls, so 1:N could scale as well as it can before it dies of
> stupidity.

So this is trying to take care of the case where curr_cpu (local) and
prev_cpu (remote) are on different nodes, in which case, in the old
world, wake_affine() won't be invoked, correct?

Hmm... I think this may be a good additional check before entering the
balance path, but I cannot estimate the cost of recording the
relationship at this moment...
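
Just to make sure I understand the shape of it, is it something like
the below?  (A minimal sketch only; the size, field and function names
are all invented here, and the per-wakeup cost is exactly the part I
cannot estimate.)

#define WAKEE_FIFO_SIZE	8		/* ~package size? */

struct wakee_fifo {
	pid_t	entries[WAKEE_FIFO_SIZE];
	int	head;
};

/* in the wakeup path: remember p as a recent wakee of the waker */
static inline void wakee_fifo_record(struct wakee_fifo *f,
				     struct task_struct *p)
{
	f->entries[f->head] = p->pid;
	f->head = (f->head + 1) % WAKEE_FIFO_SIZE;
}

/* did the waker wake p recently?  a miss hints at 1:N, keep it remote */
static inline bool wakee_fifo_hit(struct wakee_fifo *f,
				  struct task_struct *p)
{
	int i;

	for (i = 0; i < WAKEE_FIFO_SIZE; i++)
		if (f->entries[i] == p->pid)
			return true;
	return false;
}

The record is O(1) and the lookup is a short scan that fits in one
cache line, so it may be cheap enough, but that is the part that would
need measuring.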

Anyway, after applying the affine logic to the new world, it gains the
ability to spread tasks across nodes just like the old world; your idea
may be an optimization on top of that, but that logic is outside the
scope of the changes in this patch set, which means that if it helps,
the beneficiary will be not only the new world but also the old one.

Regards,
Michael Wang

> 
> -Mike
> 

