Message-ID: <20211215222524.GH16608@worktop.programming.kicks-ass.net>
Date:   Wed, 15 Dec 2021 23:25:24 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Peter Oskolkov <posk@...gle.com>
Cc:     Peter Oskolkov <posk@...k.io>, Ingo Molnar <mingo@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>, juri.lelli@...hat.com,
        Vincent Guittot <vincent.guittot@...aro.org>,
        dietmar.eggemann@....com, Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, mgorman@...e.de,
        bristot@...hat.com,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        linux-api@...r.kernel.org, x86@...nel.org,
        Paul Turner <pjt@...gle.com>, Andrei Vagin <avagin@...gle.com>,
        Jann Horn <jannh@...gle.com>,
        Thierry Delisle <tdelisle@...terloo.ca>
Subject: Re: [RFC][PATCH 0/3] sched: User Managed Concurrency Groups

On Wed, Dec 15, 2021 at 11:49:51AM -0800, Peter Oskolkov wrote:

> TL;DR: our models are different here. In your model a single server
> can have a bunch of workers interacting with it; in my model only a
> single RUNNING worker is assigned to a server, and the worker wakes
> that server when it blocks.

So part of the problem is that none of that was evident from the code.
It is also completely different from the scheduler code it lives in,
making it doubly confusing.

After having read the code, I still had no clue whatsoever how it was
supposed to be used. Which is where my reverse engineering started :/
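
If I'm reading the above right, the ownership model amounts to
something like this (a rough sketch; the struct and field names are
mine, not the actual uapi):

struct worker;

struct server {
	struct worker *running;		/* the single RUNNING worker, or NULL */
	struct server *next_idle;	/* link on the userspace idle-server list */
};

struct worker {
	struct server *server;		/* woken when this worker blocks */
	struct worker *next_runnable;	/* link on the shared runnable-worker list */
};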

> More details:
> 
> "Working servers" cannot get wakeups, because a "working server" has a
> single RUNNING worker attached to it. When a worker blocks, it wakes
> its attached server and becomes a detached blocked worker (same is
> true if the worker is "preempted": it blocks and wakes its assigned
> server).

But who would do the preemption if the server isn't allowed to run?

> Blocked workers upon wakeup do this, in order:
> 
> - always add themselves to the runnable worker list (the list is
> shared among ALL servers, it is NOT per server);

That seems like a scalability issue. And, as said, it is completely
alien when compared to the way Linux itself does scheduling.

> - wake a server pointed to by idle_server_ptr, if not NULL;
> - sleep, waiting for a wakeup from a server;
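
IOW, the wakeup path of a blocked worker, as described, would be
something like the below. This is only a sketch in userspace C; aside
from idle_server_ptr the helpers are placeholders I made up:

#include <stdatomic.h>
#include <stddef.h>

struct server;

extern _Atomic(struct server *) idle_server_ptr;

extern void runnable_list_add_self(void);	/* the list shared among ALL servers */
extern void wake_server(struct server *s);
extern void worker_sleep(void);			/* wait for a server to run us */

void worker_woken(void)
{
	struct server *s;

	/* 1) always publish ourselves on the shared runnable worker list */
	runnable_list_add_self();

	/* 2) wake the server advertised in idle_server_ptr, if any */
	s = atomic_exchange(&idle_server_ptr, NULL);
	if (s)
		wake_server(s);

	/* 3) sleep until a server picks us up */
	worker_sleep();
}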
> 
> Server S, upon becoming IDLE (no worker to run, or woken on idle
> server list) does this, in order, in userspace (simplified, see
> umcg_get_idle_worker() in
> https://lore.kernel.org/lkml/20211122211327.5931-5-posk@google.com/):
> - take a userspace (spin) lock (so the steps below are all within a
> single critical section):

Don't ever suggest userspace spinlocks; they're horrible crap.

> - compare_xchg(idle_server_ptr, NULL, S);
>   - if failed, there is another server in idle_server_ptr, so S adds
> itself to the userspace idle server list, releases the lock, goes to
> sleep;
>   - if succeeded:
>     - check the runnable worker list;
>         - if empty, release the lock, sleep;
>         - if not empty:
>            - get the list
>            - xchg(idle_server_ptr, NULL) (either S removes itself, or
> a worker in the kernel does it first, does not matter);
>            - release the lock;
>            - wake server S1 on idle server list. S1 goes through all
> of these steps.
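
So the server side of that critical section reads something like this
(again only a sketch; the lock and list helpers are placeholders):

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct server;

extern _Atomic(struct server *) idle_server_ptr;

extern void lock(void);					/* the userspace (spin) lock */
extern void unlock(void);
extern void idle_server_list_add(struct server *s);
extern struct server *idle_server_list_pop(void);
extern bool runnable_list_empty(void);
extern void runnable_list_take(struct server *s);	/* grab the whole runnable list */
extern void server_sleep(struct server *s);
extern void wake_server(struct server *s);

void server_idle(struct server *S)
{
	struct server *expected = NULL, *s1;

	lock();

	if (!atomic_compare_exchange_strong(&idle_server_ptr, &expected, S)) {
		/* another server already advertised itself as idle */
		idle_server_list_add(S);
		unlock();
		server_sleep(S);
		return;
	}

	if (runnable_list_empty()) {
		unlock();
		server_sleep(S);
		return;
	}

	/* grab the runnable workers and stop advertising ourselves as idle;
	 * a worker in the kernel may have cleared idle_server_ptr already */
	runnable_list_take(S);
	atomic_exchange(&idle_server_ptr, NULL);
	unlock();

	/* hand the idle slot over to the next idle server, if any */
	s1 = idle_server_list_pop();
	if (s1)
		wake_server(s1);
}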
> 
> The protocol above serializes userspace access to the idle server
> ptr/list. Wakeups in the kernel will be caught if there are idle
> servers. Yes, the userspace protocol is complicated (more complicated
> than outlined above, since the idle/runnable worker list reaped from
> the kernel is appended to the userspace idle/runnable worker list),
> but the kernel side is very simple. I've tested this interaction
> extensively; I'm reasonably sure that no worker wakeups are lost.

Sure, but also seems somewhat congestion prone :/
