Message-ID: <CAEWA0a5hRPLwxxQHLx58C9v5QXqFiN=1e7gfM9hojhS-VNu-OQ@mail.gmail.com>
Date: Tue, 1 Nov 2022 17:18:57 -0700
From: Andrei Vagin <avagin@...gle.com>
To: Mel Gorman <mgorman@...e.de>
Cc: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
linux-kernel@...r.kernel.org, Andrei Vagin <avagin@...il.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>
Subject: Re: [PATCH] sched: consider WF_SYNC to find idle siblings
On Tue, Nov 1, 2022 at 2:42 AM Mel Gorman <mgorman@...e.de> wrote:
>
> On Thu, Oct 27, 2022 at 01:26:03PM -0700, Andrei Vagin wrote:
> > From: Andrei Vagin <avagin@...il.com>
> >
> > WF_SYNC means that the waker goes to sleep after wakeup, so the current
> > cpu can be considered idle if the waker is the only process that is
> > running on it.
> >
> > The perf pipe benchmark shows that this change reduces the average time
> > per operation from 8.8 usecs/op to 3.7 usecs/op.
> >
> > Before:
> > $ ./tools/perf/perf bench sched pipe
> > # Running 'sched/pipe' benchmark:
> > # Executed 1000000 pipe operations between two processes
> >
> > Total time: 8.813 [sec]
> >
> > 8.813985 usecs/op
> > 113456 ops/sec
> >
> > After:
> > $ ./tools/perf/perf bench sched pipe
> > # Running 'sched/pipe' benchmark:
> > # Executed 1000000 pipe operations between two processes
> >
> > Total time: 3.743 [sec]
> >
> > 3.743971 usecs/op
> > 267096 ops/sec
> >
>
> The WF_SYNC hint is unreliable as the waking process does not always
> go to sleep immediately. While it's great for a benchmark like a pipe
> benchmark as the relationship is strictly synchronous, it does not work
> out as well for networking which can use WF_SYNC for wakeups but either
> multiple tasks are being woken up or the waker does not go to sleep as
> there is sufficient inbound traffic to keep it awake.
This change should work fine when we wake up multiple tasks. If the waker
doesn't go to sleep, that sounds like a misuse of WF_SYNC. For example,
wake_affine_idle contains the same check as the one introduced in this
patch. At first glance, wake_affine_weight handles WF_SYNC incorrectly in
this case too.
As for benchmarks, tbench shows much better numbers with this change:
$ tbench_srv & tbench -t 15 4 127.0.0.1
Before: Throughput 733.44 MB/sec 4 clients 4 procs max_latency=0.935 ms
After: Throughput 1778.94 MB/sec 4 clients 4 procs max_latency=0.882 ms
I know it is just another synchronous benchmark...
I am working on the synchronous mode of seccomp user notifications [1]. In
the first two versions, I used the WF_CURRENT_CPU [2] flag borrowed from
the umcg patchset [3]. But when I was preparing the third version of the
patchset, I wondered why WF_SYNC didn't work in this case and ended up with
this patch. For the seccomp patchset, fast synchronous context switches are
the most critical part, so any advice on how to do that properly is welcome.
[1] https://lore.kernel.org/lkml/20221020011048.156415-1-avagin@gmail.com/T/
[2] https://lore.kernel.org/lkml/20221020011048.156415-1-avagin@gmail.com/T/#m8a597d43764aa8ded2788ea7ce4276f9045668d1
[3] https://lkml.iu.edu/hypermail/linux/kernel/2111.0/04473.html
Thanks,
Andrei
> There used to be
> an attempt to track how accurate WF_SYNC was, using avg_overlap I think,
> but it was ultimately removed.
>
> --
> Mel Gorman
> SUSE Labs