Message-ID: <acad5960-30b2-3693-9117-e0b054ee97a7@uwaterloo.ca>
Date:   Mon, 12 Jul 2021 17:44:18 -0400
From:   Thierry Delisle <tdelisle@...terloo.ca>
To:     <posk@...gle.com>
CC:     <avagin@...gle.com>, <bsegall@...gle.com>, <jannh@...gle.com>,
        <jnewsome@...project.org>, <joel@...lfernandes.org>,
        <linux-api@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <mingo@...hat.com>, <mkarsten@...terloo.ca>, <pabuhr@...terloo.ca>,
        <peterz@...radead.org>, <pjt@...gle.com>, <posk@...k.io>,
        <tdelisle@...terloo.ca>, <tglx@...utronix.de>
Subject: Re: [RFC PATCH 3/3 v0.2] sched/umcg: RFC: implement UMCG syscalls

 > sys_umcg_wait without next_tid puts the task in UMCG_IDLE state; wake
 > wakes it. These are standard sched operations. If they are emulated
 > via futexes, fast context switching will require something like
 > FUTEX_SWAP that was NACKed last year.

I understand these wait and wake semantics and the need for the fast
context switch (SWAP). As I see it, you need 3 operations:

- SWAP: context-switch directly to a different thread, no scheduler involved
- WAIT: block current thread, go back to server thread
- WAKE: unblock target thread, add it to the scheduler, e.g. through
         idle_workers_ptr

There is no existing syscall to handle SWAP, so I agree sys_umcg_wait is
needed for this to work.

However, there already exists sys_futex to handle WAIT and WAKE. When a
worker calls either sys_futex WAIT or sys_umcg_wait with next_tid == NULL,
in both cases the worker will block, SWAP to the server and wait for
FUTEX_WAKE or UMCG_WAIT_WAKE_ONLY, respectively. It's not obvious to me
that there would be a performance difference, and the semantics seem to be
the same to me.

So what I am asking is: is UMCG_WAIT_WAKE_ONLY needed?
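
For reference, this is roughly the futex-based WAIT/WAKE I have in mind.
It's only a userspace sketch; the per-worker blocked flag and its layout
are my own assumptions, not something taken from the patch, and it does
not cover SWAP:

#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

static long futex(uint32_t *uaddr, int op, uint32_t val)
{
        return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* One 32-bit futex word per worker: 0 = running, 1 = blocked. */

/* WAIT: worker marks itself blocked and sleeps until woken. */
static void worker_wait(uint32_t *state)
{
        __atomic_store_n(state, 1, __ATOMIC_RELEASE);
        while (__atomic_load_n(state, __ATOMIC_ACQUIRE) == 1)
                futex(state, FUTEX_WAIT, 1); /* retried on spurious wakeup */
}

/* WAKE: clear the blocked flag and wake the worker if it is sleeping. */
static void worker_wake(uint32_t *state)
{
        __atomic_store_n(state, 0, __ATOMIC_RELEASE);
        futex(state, FUTEX_WAKE, 1);
}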

Is the idea to support workers directly context-switching among each other,
without involving server threads and without going through idle_servers_ptr?

If so, can you explain some of the intended state transitions in this case?


 > > However, I do not understand how the userspace is expected to use it.
 > > I also do not understand if these link fields form a stack or a queue
 > > and where is the head.
 >
 > When a server has nothing to do (no work to run), it is put into IDLE
 > state and added to the list. The kernel wakes an IDLE server if a
 > blocked worker unblocks.

From the code in umcg_wq_worker_running (Step 3), I am guessing users are
expected to provide a global head somewhere in memory and
umcg_task.idle_servers_ptr points to the head of the list for all workers.
Servers are then added in user space using atomic_stack_push_user. Is this
correct? I did not find any documentation on the list head.
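
Assuming that guess is right, I would expect the userspace push to look
something like this (a sketch only; the node and field names are mine,
not from the patch, and the kernel's atomic_stack_push_user is presumably
the same CAS loop done with user accessors):

#include <stdatomic.h>

/* Hypothetical idle-server node; in the patch this would be the server's
 * struct umcg_task plus whatever link field the kernel pops. */
struct idle_server {
        struct idle_server *next;
};

/* Push `server` onto the global idle list whose head every worker's
 * idle_servers_ptr points at. */
static void idle_servers_push(_Atomic(struct idle_server *) *head,
                              struct idle_server *server)
{
        struct idle_server *old =
                atomic_load_explicit(head, memory_order_relaxed);

        do {
                server->next = old;
        } while (!atomic_compare_exchange_weak_explicit(head, &old, server,
                                                        memory_order_release,
                                                        memory_order_relaxed));
}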

I like the idea that each worker thread points to a given list; it allows
the possibility of separate containers with their own independent servers,
workers and scheduling. However, it seems that the list itself could be
implemented using existing kernel APIs, for example a futex or an eventfd.
Like so:

struct umcg_task {
      [...]

      /**
       * @idle_futex_ptr: pointer to a futex used for idle server threads.
       *
       * When waking a worker, the kernel decrements the pointed-to futex
       * value if it is non-zero and wakes a server if the decrement
       * occurred.
       *
       * Server threads that have no work to do should increment the futex
       * value and FUTEX_WAIT.
       */
      uint64_t    idle_futex_ptr;    /* r/w */

      [...]
} __attribute__((packed, aligned(8 * sizeof(__u64))));

I believe the futex approach, like the list, has the advantage that when
there are no idle servers, checking for one requires no locking. I don't
know if that can be achieved with an eventfd.
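
To illustrate, the two sides of that protocol could look roughly like this
(userspace sketch; the waker half is what I imagine the kernel doing
internally when a worker unblocks, and callers of server_idle are expected
to re-check for work since spurious returns are possible):

#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

static long futex(uint32_t *uaddr, int op, uint32_t val)
{
        return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Server side: advertise as idle, then sleep until a wake consumes us.
 * FUTEX_WAIT returns immediately with EAGAIN if the value has already
 * changed, so the caller simply re-checks for runnable workers. */
static void server_idle(uint32_t *idle_futex)
{
        uint32_t val = __atomic_add_fetch(idle_futex, 1, __ATOMIC_ACQ_REL);

        futex(idle_futex, FUTEX_WAIT, val);
}

/* Waker side: decrement only if non-zero, wake one server only if the
 * decrement happened.  With no idle servers this is a single atomic
 * load, no write and no syscall. */
static void wake_one_idle_server(uint32_t *idle_futex)
{
        uint32_t val = __atomic_load_n(idle_futex, __ATOMIC_ACQUIRE);

        while (val != 0) {
                if (__atomic_compare_exchange_n(idle_futex, &val, val - 1,
                                                0, __ATOMIC_ACQ_REL,
                                                __ATOMIC_ACQUIRE)) {
                        futex(idle_futex, FUTEX_WAKE, 1);
                        return;
                }
                /* val was reloaded by the failed CAS; retry or stop at 0 */
        }
}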
