linux-kernel - Re: [PATCH 5/5 v0.6] sched/umcg: add Documentation/userspace-api/umcg.txt


Message-ID: <40c37212-ab15-01ac-f5c5-e3f53c9b8e4e@uwaterloo.ca>
Date:   Tue, 12 Oct 2021 14:46:50 -0400
From:   Thierry Delisle <tdelisle@...terloo.ca>
To:     Peter Oskolkov <posk@...k.io>
CC:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        <linux-api@...r.kernel.org>, Paul Turner <pjt@...gle.com>,
        Ben Segall <bsegall@...gle.com>,
        Peter Oskolkov <posk@...gle.com>,
        Andrei Vagin <avagin@...gle.com>, Jann Horn <jannh@...gle.com>
Subject: Re: [PATCH 5/5 v0.6] sched/umcg: add
 Documentation/userspace-api/umcg.txt

 >> Just to be clear, sys_umcg_wait supports an operation that, when called
 >> from a worker, puts the worker to sleep without triggering block detection
 >> or context-switching back to the server?
 >
 > Potentially, yes - when a worker wants to yield (e.g. as part of a
 > custom UMCG-aware mutex/condvar code), and calls into the userspace
 > scheduler, it may be faster to skip the server wakeup (e.g. reassign
 > the server to another sleeping worker and wake this worker). This is
 > not a supported operation right now, but I see how it could be used to
 > optimize some things in the future.
 >
 > Do you have any concerns here?

To be honest, I did not realize this was a possibility until your previous
email. I'm not sure I buy your example; it just sounds like worker-to-worker
context-switching. But I could imagine "stop the world" cases or some "race
to idle" policy using this feature.

It seems to me the corresponding wake needs to know whether it should enqueue
the worker into the idle workers list or just schedule the worker as it
would a server.

How does the wake know which to do?



 > I don't see a big difference here, sorry. We are mixing levels of
 > abstraction here again, I think. For example, the higher level
 > userspace scheduling code will have more nuanced treatment of IDLE
 > workers; but down at the kernel they are all the same: IDLE worker is
 > a worker that the userspace can "schedule" by marking it RUNNING,
 > regardless of whether the worker is "parked", or "woke from a blocking
 > op", or whatever other semantically different state the worker can be.
 > For the kernel, they are all the same, idle, not runnable, waiting for
 > the userspace to explicitly "schedule" them.
 >
 > Similarly, I don't see a need to semantically distinguish "yield" from
 > "park" at the kernel level of things; this distinction seems to be a
 > higher-level abstraction that can be properly expressed in the
 > userspace, and does not need to be explicitly addressed in the kernel
 > (to make the code faster and simpler, for example).

From the kernel's perspective, I can see two distinct operations:

1 - Mark the worker as IDLE and put it to sleep.
2 - Mark the worker as IDLE, put it to sleep *and* immediately add it
     to the idle workers list.

The wait in operation 1 expects an outside wakeup call to match it and resume
the worker, while operation 2 is its own wakeup. To me, that is the
distinction between wait/park and yield, respectively.
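
To make the two operations concrete, here is a minimal userspace model in C.
It is only a sketch under stated assumptions: the types and function names
(worker_park, worker_yield, worker_wake, server_schedule) are hypothetical
and are not part of the UMCG patches, and a real worker would sleep in the
kernel rather than just change a field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; none of these names come from the UMCG ABI. */
enum worker_state { WORKER_RUNNING, WORKER_IDLE };

struct worker {
    enum worker_state state;
    struct worker *next_idle;   /* link in the idle workers list */
    int on_idle_list;           /* 1 while queued for the server */
};

struct idle_list {
    struct worker *head;
};

static void idle_list_push(struct idle_list *l, struct worker *w)
{
    assert(!w->on_idle_list);   /* a worker is queued at most once */
    w->next_idle = l->head;
    l->head = w;
    w->on_idle_list = 1;
}

static struct worker *idle_list_pop(struct idle_list *l)
{
    struct worker *w = l->head;

    if (w) {
        l->head = w->next_idle;
        w->next_idle = NULL;
        w->on_idle_list = 0;
    }
    return w;
}

/* Operation 1 (wait/park): IDLE and asleep, but not visible to the server. */
static void worker_park(struct worker *w)
{
    w->state = WORKER_IDLE;
}

/* Operation 2 (yield): IDLE, asleep, and already on the idle workers list. */
static void worker_yield(struct worker *w, struct idle_list *l)
{
    w->state = WORKER_IDLE;
    idle_list_push(l, w);
}

/* The outside wakeup that operation 1 relies on: another thread enqueues. */
static void worker_wake(struct worker *w, struct idle_list *l)
{
    if (w->state == WORKER_IDLE && !w->on_idle_list)
        idle_list_push(l, w);
}

/* Server side: take the next idle worker, mark it RUNNING; NULL if none. */
static struct worker *server_schedule(struct idle_list *l)
{
    struct worker *w = idle_list_pop(l);

    if (w)
        w->state = WORKER_RUNNING;
    return w;
}
```

In this model a parked worker never shows up in server_schedule() until
something else calls worker_wake() on it, whereas a yielded worker is picked
up with no further action from anyone.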

Is operation 2 supported?

I'm not sure this distinction can be handled in userspace in all cases.
Waking oneself is generally not a possibility.