Message-ID: <CANaxB-wcpKS64q6_0+r+OwoZupRN-A-PQvPRiVsMmEgB1TRSrw@mail.gmail.com>
Date: Fri, 13 Jan 2023 13:39:46 -0800
From: Andrei Vagin <avagin@...il.com>
To: Chen Yu <yu.c.chen@...el.com>
Cc: Andrei Vagin <avagin@...gle.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
linux-kernel@...r.kernel.org, Kees Cook <keescook@...omium.org>,
Christian Brauner <brauner@...nel.org>,
Andy Lutomirski <luto@...capital.net>,
Juri Lelli <juri.lelli@...hat.com>,
Peter Oskolkov <posk@...gle.com>,
Tycho Andersen <tycho@...ho.pizza>,
Will Drewry <wad@...omium.org>
Subject: Re: [PATCH 2/5] sched: add WF_CURRENT_CPU and externise ttwu
On Wed, Jan 11, 2023 at 11:36 PM Chen Yu <yu.c.chen@...el.com> wrote:
>
> On 2023-01-10 at 13:30:07 -0800, Andrei Vagin wrote:
> > From: Peter Oskolkov <posk@...gle.com>
> >
> > Add WF_CURRENT_CPU wake flag that advices the scheduler to
> > move the wakee to the current CPU. This is useful for fast on-CPU
> > context switching use cases.
> >
> > In addition, make ttwu external rather than static so that
> > the flag could be passed to it from outside of sched/core.c.
> >
> > Signed-off-by: Peter Oskolkov <posk@...gle.com>
> > Signed-off-by: Andrei Vagin <avagin@...il.com>
> > @@ -7380,6 +7380,10 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
> > if (wake_flags & WF_TTWU) {
> > record_wakee(p);
> >
> > + if ((wake_flags & WF_CURRENT_CPU) &&
> > + cpumask_test_cpu(cpu, p->cpus_ptr))
> > + return cpu;
> I agree that cross-CPU wake-ups bring pain to fast context-switching
> use cases, especially on high-core-count systems. We suffered from this
> issue as well and presented it previously. The difference is that we
> used a dynamic "WF_CURRENT_CPU" mechanism[1] to deal with it: if the
> waker and wakee are both short-duration tasks, the waker wakes up the
> wakee on the current CPU. That way, not only seccomp but also other
> components/workloads could benefit without having to set the
> WF_CURRENT_CPU flag.
>
> Link [1]:
> https://lore.kernel.org/lkml/cover.1671158588.git.yu.c.chen@intel.com/
Thanks for the link. I like the idea, but this change has no impact on the
seccomp notify case. I used the benchmark from the fifth patch. It is a
ping-pong benchmark in which one process triggers system calls and another
process handles them. It measures the number of system calls that can be
processed within a specified time slice.
$ cd tools/testing/selftests/seccomp/
$ make
$ ./seccomp_bpf 2>&1| grep user_notification_sync
# RUN global.user_notification_sync ...
# seccomp_bpf.c:4281:user_notification_sync:basic: 8489 nsec/syscall
# seccomp_bpf.c:4281:user_notification_sync:sync: 3078 nsec/syscall
# OK global.user_notification_sync
ok 51 global.user_notification_sync
The results are the same with and without your change. I expected your
change to improve the basic case so that it reaches the sync one.
I did some experiments and found that we can achieve the desired outcome
if we move the "short-task" checks before considering a wakeup on
prev_cpu.
For example, with this patch:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2f89e44e237d..af20b58e3972 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6384,6 +6384,11 @@ static int wake_wide(struct task_struct *p)
static int
wake_affine_idle(int this_cpu, int prev_cpu, int sync)
{
+ /* The only running task is a short duration one. */
+ if (cpu_rq(this_cpu)->nr_running == 1 &&
+ is_short_task(cpu_curr(this_cpu)))
+ return this_cpu;
+
/*
* If this_cpu is idle, it implies the wakeup is from interrupt
* context. Only allow the move if cache is shared. Otherwise an
@@ -6405,11 +6410,6 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
if (available_idle_cpu(prev_cpu))
return prev_cpu;
- /* The only running task is a short duration one. */
- if (cpu_rq(this_cpu)->nr_running == 1 &&
- is_short_task(cpu_curr(this_cpu)))
- return this_cpu;
-
return nr_cpumask_bits;
}
@@ -6897,6 +6897,10 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
asym_fits_cpu(task_util, util_min, util_max, target))
return target;
+ if (!has_idle_core && cpu_rq(target)->nr_running == 1 &&
+ is_short_task(cpu_curr(target)) && is_short_task(p))
+ return target;
+
/*
* If the previous CPU is cache affine and idle, don't be stupid:
*/
the basic test case shows almost the same results as the sync one:
$ ./seccomp_bpf 2>&1| grep user_notification_sync
# RUN global.user_notification_sync ...
# seccomp_bpf.c:4281:user_notification_sync:basic: 3082 nsec/syscall
# seccomp_bpf.c:4281:user_notification_sync:sync: 2690 nsec/syscall
# OK global.user_notification_sync
ok 51 global.user_notification_sync
If you want to do any experiments, you can find my tree here:
[1] https://github.com/avagin/linux-task-diag/tree/wip/seccom-notify-sync-and-shed-short-task
Thanks,
Andrei