Date:	Mon, 07 Feb 2011 14:50:42 +0100
From:	Peter Zijlstra <>
To:	Venkatesh Pallipadi <>
Cc:	Ingo Molnar <>,,
	Paul Turner <>,
	Suresh Siddha <>,
	Mike Galbraith <>
Subject: Re: [PATCH] sched: Resolve sd_idle and first_idle_cpu Catch-22 - v1

On Fri, 2011-02-04 at 13:25 -0800, Venkatesh Pallipadi wrote:
> Consider a system with { [ (A B) (C D) ] [ (E F) (G H) ] },
> () denoting SMT siblings, [] cores on the same socket and {} system wide.
> Further, A, C and D are idle, B is busy and one of EFGH has excess load.
> With sd_idle logic, a check in rebalance_domains() converts tick
> based load balance requests from CPU A to busy load balance for core
> and above domains (lower rate of balance and higher load_idx).

The
	if (load_balance())
		idle = CPU_NOT_IDLE;
bit, right?

> With first_idle_cpu logic, when CPU C or D tries to balance across domains
> the logic finds CPU A as first idle CPU in the group and nominates CPU A to
> idle balance across sockets.


> But, sd_idle above would not allow CPU A to do cross socket idle balance
> as CPU A switches its higher level balancing to busy balance.
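A toy model of the Catch-22 described above may make it easier to follow. It is a sketch under stated assumptions, not kernel code: one socket [ (A B) (C D) ] is modelled as CPUs 0..3 with cpu ^ 1 as the SMT sibling, the nomination mimics the first_idle_cpu behaviour (first idle CPU, else the first CPU), and the sd_idle effect is assumed to be "a nominee with a busy sibling only busy-balances above the SMT level". The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one socket [ (A B) (C D) ] as CPUs 0..3, where
 * cpu ^ 1 is the SMT sibling of cpu. Not kernel code. */

/* first_idle_cpu-style nomination: the first idle CPU in the group does
 * the balancing for the parent domain; with no idle CPU, the first CPU. */
static int model_first_idle_cpu(const bool idle[], int n)
{
    assert(n > 0 && n % 2 == 0);
    for (int cpu = 0; cpu < n; cpu++)
        if (idle[cpu])
            return cpu;
    return 0;
}

/* Assumed sd_idle effect: a nominee whose SMT sibling is busy is demoted
 * to busy balancing above the SMT level, so nobody idle-balances across
 * sockets on behalf of this group. */
static bool model_cross_socket_idle_balance(const bool idle[], int n)
{
    int nominee = model_first_idle_cpu(idle, n);

    return idle[nominee] && idle[nominee ^ 1];
}
```

With A idle, B busy and C, D idle ({ true, false, true, true }), A (CPU 0) is nominated and model_cross_socket_idle_balance() returns false, even though C and D are both fully idle and could have done the idle balance.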

Because it fails the sd->flags & SD_SHARE_CPUPOWER test at the beginning
of load_balance() and hence sd_idle will remain 0, right?

I'm just not quite sure how we then end up returning non-zero from
load_balance(): both branches returning -1 seem conditional on
SD_SHARE_CPUPOWER, but the [ (A B) (C D) ] domain doesn't have that set.

> So, this can result in no cross-socket balancing for extended periods.

Which is bad.

> The fix here adds an additional check to detect sd_idle logic in the
> first_idle_cpu code path. We will now nominate (in order of preference):
> * First fully idle CPU
> * First semi-idle CPU
> * First CPU
> Note that this solution works fine for the 2 SMT siblings case, but won't be
> perfect at picking the proper semi-idle CPU when there are more than 2 SMT threads.
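The nomination order listed above can be sketched as a single pass over the group. This assumes exactly two SMT threads per core with cpu ^ 1 as the sibling (the case the patch says it handles well), and that "fully idle" means the CPU and its sibling are both idle; model_pick_balance_cpu() is a hypothetical name, not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the proposed nomination order, assuming exactly
 * two SMT threads per core with cpu ^ 1 as the sibling. */
static int model_pick_balance_cpu(const bool idle[], int n)
{
    int semi_idle = -1;

    assert(n > 0 && n % 2 == 0);
    for (int cpu = 0; cpu < n; cpu++) {
        if (!idle[cpu])
            continue;
        if (idle[cpu ^ 1])
            return cpu;            /* 1. first fully idle CPU */
        if (semi_idle < 0)
            semi_idle = cpu;       /* remember the first semi-idle CPU */
    }
    /* 2. first semi-idle CPU, else 3. first CPU */
    return semi_idle >= 0 ? semi_idle : 0;
}
```

For the example socket ({ true, false, true, true }: A idle, B busy, C and D idle) this picks C (CPU 2) instead of A; with { true, false, false, true } no pair is fully idle and it falls back to the semi-idle A (CPU 0).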

All these SMT exceptions make my head hurt, can't we clean that up
instead of making them worse?

Why is SMT treated differently from, say, a shared cache? In both cases we
want to spread the load as wide as possible to provide as much of the
resources to the few runnable tasks.
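To make the "spread as wide as possible" argument concrete, here is a minimal sketch of a topology-agnostic placement policy: at every level (socket, core, SMT pair or shared cache) descend into the child group running the fewest tasks, with no special case for SMT. The struct group layout and model_spread_pick_cpu() are hypothetical illustrations of the idea, not the scheduler's actual algorithm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical topology tree: every level is handled the same way. */
struct group {
    int nr_running;             /* tasks running in this group */
    const struct group *child;  /* array of child groups, NULL for a CPU */
    int nr_children;
    int cpu;                    /* valid only at the leaf level */
};

/* Descend into the least-loaded child at every level; ties go to the
 * first child. SMT pairs get no special treatment. */
static int model_spread_pick_cpu(const struct group *g)
{
    assert(g != NULL);
    while (g->child != NULL) {
        const struct group *best = &g->child[0];

        for (int i = 1; i < g->nr_children; i++)
            if (g->child[i].nr_running < best->nr_running)
                best = &g->child[i];
        g = best;
    }
    return g->cpu;
}
```

With B busy on the socket [ (A B) (C D) ], the core (A B) carries one task and (C D) none, so a new task lands on C (CPU 2) rather than on A, which would share execution resources with B.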
