Message-ID: <bc5730d4-b3ee-1fcc-7f57-824db606734f@oracle.com>
Date: Wed, 1 Nov 2017 01:08:59 -0500
From: Atish Patra <atish.patra@...cle.com>
To: Mike Galbraith <efault@....de>,
Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org, joelaf@...gle.com,
brendan.jackman@....com, jbacik@...com, mingo@...hat.com
Subject: Re: [PATCH RFC 1/2] sched: Minimize the idle cpu selection race
window.
On 10/31/2017 03:48 AM, Mike Galbraith wrote:
> On Tue, 2017-10-31 at 09:20 +0100, Peter Zijlstra wrote:
>> On Tue, Oct 31, 2017 at 12:27:41AM -0500, Atish Patra wrote:
>>> Currently, multiple tasks can wake up on the same cpu via the
>>> select_idle_sibling() path if they wake up simultaneously and last
>>> ran on the same LLC. This happens because an idle cpu is not marked
>>> busy until the idle task is scheduled out, so any task waking during
>>> that period may select that cpu as its wakeup candidate.
>>>
>>> Introduce a per cpu variable that is set as soon as a cpu is
>>> selected for the wakeup of any task. This prevents other tasks from
>>> selecting the same cpu again. Note: this does not close the race
>>> window, but narrows it to the access of the per-cpu variable. If two
>>> wakee tasks read the per cpu variable at the same time, they may
>>> still select the same cpu, but the window is considerably smaller.
>> The very most important question; does it actually help? What
>> benchmarks, give what numbers?
Here are the numbers from one of the OLTP configurations on an 8-socket
x86 machine:

kernel     txn/minute (normalized)   user/sys
baseline   1.0                       80/5
pcpu       1.021                     84/5

The throughput gain is small, close to the run-to-run variation.
The schedstat data (added for testing in patch 2/2) indicates there are
many instances of the race condition being addressed, but perhaps not
enough to produce a significant throughput change.
All other benchmarks I tested (TPCC, hackbench, schbench, swingbench)
showed no regression.
I will let Joel post numbers from Android benchmarks.
> I played with something ~similar (cmpxchg() idle cpu reservation)
I had an atomic version earlier as well. Peter's suggestion of a per cpu
variable seems to perform slightly better than the atomic one, so this
patch uses the per cpu version.
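The cmpxchg reservation Mike mentions can be sketched the same way (again a hedged user-space analogue with hypothetical names; C11 atomics stand in for the kernel's cmpxchg()). The compare-and-swap closes the window entirely, since only one contender can flip the flag, but the atomic RMW is the extra overhead being weighed against the cheap per-cpu store:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 8

/* Hypothetical reservation word, one per cpu. */
static atomic_bool reserved[NR_CPUS];

/* Only one of several simultaneous wakers can win the
 * false -> true transition; the losers must look elsewhere. */
static bool try_reserve_cpu(int cpu)
{
    bool expected = false;
    return atomic_compare_exchange_strong(&reserved[cpu],
                                          &expected, true);
}

/* Released once the woken task is running on the cpu. */
static void release_cpu(int cpu)
{
    atomic_store(&reserved[cpu], false);
}
```

The design trade-off the thread is circling: the cmpxchg variant is race-free but pays for an atomic on every wakeup, while the per-cpu store is nearly free but only shrinks the window.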
> a
> while back in the context of schbench, and it did help that,
Do you have the schbench configuration somewhere so that I can test it?
I tried various configurations but did not see any improvement or
regression.
> but for
> generic fast mover benchmarks, the added overhead had the expected
> effect, it shaved throughput a wee bit (rob Peter, pay Paul, repeat).
Which benchmark? Is it hackbench or something else?
I have not found any regression yet in my testing. I would be happy to
test any other benchmark, or a different configuration for hackbench.
Regards,
Atish
> I still have the patch lying about in my rubbish heap, but didn't
> bother to save any of the test results.
>
> -Mike
>
>