linux-kernel - Re: [PATCH] sched: prefer an idle cpu vs an idle sibling for BALANCE

Message-ID: <55662460.2050501@fb.com>
Date:	Wed, 27 May 2015 16:09:04 -0400
From:	Josef Bacik <jbacik@...com>
To:	<riel@...hat.com>, <mingo@...hat.com>, <peterz@...radead.org>,
	<linux-kernel@...r.kernel.org>, <kernel-team@...com>
Subject: Re: [PATCH] sched: prefer an idle cpu vs an idle sibling for BALANCE_WAKE

On 05/26/2015 05:31 PM, Josef Bacik wrote:
> At Facebook we have a pretty heavily multi-threaded application that is
> sensitive to latency.  We have been pulling forward the old SD_WAKE_IDLE code
> because it gives us a pretty significant performance gain (around 20%).  It
> turns out this is because there are cases where the scheduler puts our task on
> a busy CPU when there are idle CPUs in the system.  We verified this by reading
> cpu_delay_req_avg_us from the scheduler netlink interface.  With our crappy
> patch we get much lower numbers than baseline.
>
> SD_BALANCE_WAKE is supposed to find us an idle CPU to run on; however, it only
> looks for an idle sibling, preferring affinity over all else.  This is not
> helpful in all cases: SD_BALANCE_WAKE's job is to find us an idle CPU, not to
> guarantee affinity.  Fix this by first trying to find an idle sibling, and
> then, if that CPU is not idle, falling through to the logic that finds an idle
> CPU.  With this patch we get slightly better performance than with our forward
> port of SD_WAKE_IDLE.  Thanks,
>
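A toy model of the wake-up CPU choice described above may make the change easier to see. This is a sketch under loose assumptions, not kernel code: CPUs are plain integers, "idle" is one boolean per CPU, each CPU's siblings are a hand-written list, and the function names (pick_idle_sibling, pick_any_idle_cpu, pick_cpu_old, pick_cpu_patched) are hypothetical. The real wake-up path in kernel/sched/fair.c walks sched domains instead.

```python
from typing import Dict, List, Optional, Sequence

# Toy model only: hypothetical names, not the kernel's data structures.
Siblings = Dict[int, List[int]]


def pick_idle_sibling(target: int, idle: Sequence[bool], siblings: Siblings) -> int:
    """Return target if idle, else an idle sibling of target, else target itself."""
    if idle[target]:
        return target
    for cpu in siblings.get(target, []):
        if idle[cpu]:
            return cpu
    return target  # no idle sibling: stay on target even though it is busy


def pick_any_idle_cpu(idle: Sequence[bool]) -> Optional[int]:
    """Return the first idle CPU anywhere in the system, or None."""
    for cpu, is_idle in enumerate(idle):
        if is_idle:
            return cpu
    return None


def pick_cpu_old(target: int, idle: Sequence[bool], siblings: Siblings) -> int:
    # Behaviour described as the problem: only idle siblings are considered.
    return pick_idle_sibling(target, idle, siblings)


def pick_cpu_patched(target: int, idle: Sequence[bool], siblings: Siblings) -> int:
    # Behaviour described by the patch: try siblings first, then any idle CPU.
    cpu = pick_idle_sibling(target, idle, siblings)
    if idle[cpu]:
        return cpu
    fallback = pick_any_idle_cpu(idle)
    return fallback if fallback is not None else cpu


# Two sibling pairs, {0, 1} and {2, 3}; only CPU 2 is idle.
siblings = {0: [1], 1: [0], 2: [3], 3: [2]}
idle = [False, False, True, False]
print(pick_cpu_old(0, idle, siblings))      # 0 (busy)
print(pick_cpu_patched(0, idle, siblings))  # 2 (idle)
```

Trying the sibling first keeps cache affinity whenever it is free; the system-wide scan only runs when that attempt lands on a busy CPU.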

I rigged up a test script to run the perf bench sched tests and report the
numbers.  Here they are:

4.0

Messaging: 56.934 sec total runtime
Pipe: 105620.762 ops/sec

4.0 + my patch

Messaging: 47.374 sec total runtime
Pipe: 113691.199 ops/sec
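The relative changes can be checked directly from these figures. For the messaging test lower runtime is better, so the speedup is old/new runtime; for the pipe test higher ops/sec is better, so it is new/old. The constant names are illustrative only.

```python
# Figures copied from the results above.
BASELINE_MESSAGING_S = 56.934
PATCHED_MESSAGING_S = 47.374
BASELINE_PIPE_OPS = 105620.762
PATCHED_PIPE_OPS = 113691.199

# Lower runtime is better, so the speedup is old / new.
messaging_speedup = BASELINE_MESSAGING_S / PATCHED_MESSAGING_S
# Fraction of the baseline runtime removed by the patch.
messaging_runtime_cut = 1 - PATCHED_MESSAGING_S / BASELINE_MESSAGING_S
# Higher throughput is better, so the speedup is new / old.
pipe_speedup = PATCHED_PIPE_OPS / BASELINE_PIPE_OPS

print(f"messaging: {messaging_speedup:.2f}x faster, {messaging_runtime_cut:.1%} less runtime")
print(f"pipe:      {pipe_speedup:.2f}x throughput")
```

This gives about 1.20x for messaging (a 16.8% cut in runtime) and about 1.08x for pipe, so the ~20% messaging figure is the speedup ratio rather than the drop in runtime.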

So that's ~20% better performance on the messaging test, which behaves
somewhat like HHVM, and ~8% better pipe performance.  This is a 2-socket,
16-core box.  I've attached the script I'm using: basically I run each test
5 times, and for the perf bench sched pipe run I launch NR_CPUS/2 instances
in parallel.
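The attached script is not reproduced in this message. A rough Python stand-in for the procedure described (5 runs of each test, NR_CPUS/2 parallel pipe instances) might look like the sketch below. It assumes `perf` is on PATH, takes NR_CPUS to mean the online CPU count reported by os.cpu_count(), and only collects raw output without parsing it; the helper names (pipe_instances, run, run_pipe_batch, run_all) are hypothetical.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional

RUNS = 5  # each benchmark is repeated 5 times

MESSAGING_CMD = ["perf", "bench", "sched", "messaging"]
PIPE_CMD = ["perf", "bench", "sched", "pipe"]


def pipe_instances(nr_cpus: int) -> int:
    """NR_CPUS/2 parallel pipe benchmarks, but always at least one."""
    return max(1, nr_cpus // 2)


def run(cmd: List[str]) -> str:
    """Run one benchmark and return its stdout; raise if perf fails."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def run_pipe_batch(nr_cpus: int) -> List[str]:
    """Launch the pipe benchmark NR_CPUS/2 times concurrently."""
    n = pipe_instances(nr_cpus)
    # The threads only wait on child processes, so the GIL is not a bottleneck.
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(run, [PIPE_CMD] * n))


def run_all(nr_cpus: Optional[int] = None) -> Dict[str, list]:
    """Run messaging once and one pipe batch per iteration, RUNS times."""
    nr = nr_cpus or os.cpu_count() or 1
    results: Dict[str, list] = {"messaging": [], "pipe": []}
    for _ in range(RUNS):
        results["messaging"].append(run(MESSAGING_CMD))
        results["pipe"].append(run_pipe_batch(nr))
    return results
```

On the test machine, `run_all()` returns the raw perf output for each run, which can then be compared between the baseline and patched kernels.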

If you're interested, I'd be happy to share numbers from our HHVM test, but
they are less straightforward and need pretty pictures and a book on how to
read them.  Thanks,

Josef