Message-ID: <CAFTs51XyGDNj89+FCn4HZqMHuenjQu2wqTOW8ow4hSUbdGrGhw@mail.gmail.com>
Date: Mon, 29 Nov 2021 15:38:38 -0800
From: Peter Oskolkov <posk@...k.io>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Andy Lutomirski <luto@...nel.org>,
Linux Memory Management List <linux-mm@...ck.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-api@...r.kernel.org, Paul Turner <pjt@...gle.com>,
Ben Segall <bsegall@...gle.com>,
Peter Oskolkov <posk@...gle.com>,
Andrei Vagin <avagin@...gle.com>, Jann Horn <jannh@...gle.com>,
Thierry Delisle <tdelisle@...terloo.ca>
Subject: Re: [PATCH v0.9.1 3/6] sched/umcg: implement UMCG syscalls

On Mon, Nov 29, 2021 at 1:08 PM Peter Zijlstra <peterz@...radead.org> wrote:
[...]
> > > > Another big concern I have is that you removed UMCG_TF_LOCKED. I
> > >
> > > OOh yes, I forgot to mention that. I couldn't figure out what it was
> > > supposed to do.
[...]
>
> So then A does:
>
> A::next_tid = C.tid;
> sys_umcg_wait();
>
> Which will:
>
> pin(A);
> pin(S0);
>
> cmpxchg(A::state, RUNNING, RUNNABLE);

Hmm... That's another difference between your patch and mine: my
approach was "the side that initiates the change updates the state".
So in my code, userspace changes the current task's state RUNNING
=> RUNNABLE and the next task's state, or the server's state, RUNNABLE
=> RUNNING before calling sys_umcg_wait(). The kernel changes worker
states to BLOCKED/RUNNABLE during block/wake detection, and marks
servers RUNNING when waking them during block/wake detection; but all
applicable state changes for sys_umcg_wait() happen in userspace.

The reasoning behind this approach was:
- do in the kernel only what cannot be done in userspace, to
  make the kernel code smaller/simpler
- similar to how futexes work: futex_wait does not change the futex
  value to the desired value, but just checks whether the futex value
  matches the desired value
- similar to how futexes work, concurrent state changes can happen in
  userspace without calling into the kernel at all

  for example:
    - (a): worker A goes to sleep into sys_umcg_wait()
    - (b): worker B wants to context switch into worker A "a moment" later
    - due to preemption/interrupts/pagefaults/whatnot, (b) happens
      in reality before (a)

  in my patchset, the situation above happily resolves in
  userspace so that worker A keeps running without ever calling
  sys_umcg_wait().

Again, I don't think this is a deal breaker, and your approach will
work, just a bit less efficiently in some cases :)
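
To make the "initiator updates the state" idea concrete, here is a
minimal userspace-only sketch in C11 atomics. It is not code from either
patchset: the struct, the state values and the helper names
(prepare_to_wait, switch_into, must_enter_kernel) are all hypothetical,
and it assumes worker A rechecks its own state word before entering the
kernel, much like a futex fast path.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative state values; the names echo this thread, the numbers
 * are assumptions and not taken from any UMCG header. */
enum { ST_RUNNING = 1, ST_RUNNABLE = 2 };

struct task {
	_Atomic int state;
};

/* (a) The task that wants to sleep flips its own state in userspace:
 * RUNNING => RUNNABLE. Returns false if the state was not RUNNING. */
static inline bool prepare_to_wait(struct task *self)
{
	int expected = ST_RUNNING;

	return atomic_compare_exchange_strong(&self->state, &expected,
					      ST_RUNNABLE);
}

/* (b) The task that initiates a context switch flips the target's
 * state in userspace: RUNNABLE => RUNNING. */
static inline bool switch_into(struct task *next)
{
	int expected = ST_RUNNABLE;

	return atomic_compare_exchange_strong(&next->state, &expected,
					      ST_RUNNING);
}

/* Futex-like check: the wait call would only sleep if the state is
 * still RUNNABLE, so the caller can skip the syscall otherwise.
 * (Assumption: the caller rechecks before entering the kernel.) */
static inline bool must_enter_kernel(struct task *self)
{
	return atomic_load(&self->state) == ST_RUNNABLE;
}
```

If A runs prepare_to_wait() and B's switch_into() on A lands before A
reaches the syscall, must_enter_kernel() is false for A and A simply
keeps running.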

I'm still not sure we can live without UMCG_TF_LOCKED. What if worker
A transfers its server to worker B, which A intends to context switch
into, and then worker A pagefaults or gets interrupted before calling
sys_umcg_wait()? The server will be woken up and will see that it is
assigned to worker B; now what? If worker A is "locked" before the
whole thing starts, the pagefault/interrupt will not trigger
block/wake detection, worker A will keep RUNNING for all intents and
purposes, and eventually will call sys_umcg_wait() as it had
intended...
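
A rough model of what "locked" buys here, as a C11 sketch. Only the name
UMCG_TF_LOCKED comes from this thread; the bit position, the struct
layout and the helpers are hypothetical, and detection_would_fire() is a
userspace stand-in for the kernel's decision on a pagefault/interrupt,
not the actual kernel logic.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define ST_MASK		0xffu
#define ST_RUNNING	1u
#define ST_RUNNABLE	2u
#define TF_LOCKED	(1u << 8)	/* hypothetical bit position */

struct worker {
	_Atomic uint32_t state;		/* low bits: state, high bits: flags */
	_Atomic int server_tid;		/* 0 means "no server assigned" */
};

static inline void lock_worker(struct worker *w)
{
	atomic_fetch_or(&w->state, TF_LOCKED);
}

static inline void unlock_worker(struct worker *w)
{
	atomic_fetch_and(&w->state, ~TF_LOCKED);
}

/* Stand-in for the kernel side: would a pagefault/interrupt on this
 * worker trigger block/wake detection (and so wake its server)? */
static inline bool detection_would_fire(struct worker *w)
{
	uint32_t s = atomic_load(&w->state);

	return (s & ST_MASK) == ST_RUNNING && !(s & TF_LOCKED);
}

/* Worker A hands its server over to worker B before switching into B. */
static inline void transfer_server(struct worker *from, struct worker *to)
{
	int tid = atomic_exchange(&from->server_tid, 0);

	atomic_store(&to->server_tid, tid);
}
```

The intended order in this model is lock_worker(A), transfer_server(A, B),
then the wait call; between the first two steps and the syscall,
detection_would_fire(A) stays false, so a stray pagefault does not wake
a server that already belongs to B.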
[...]