Message-ID: <d0ed1dbd-1b7e-bf98-65c0-7f61dd1a3228@ddn.com>
Date: Fri, 24 Mar 2023 19:50:12 +0000
From: Bernd Schubert <bschubert@....com>
To: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Miklos Szeredi <miklos@...redi.hu>
CC: Dharmendra Singh <dsingh@....com>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
Amir Goldstein <amir73il@...il.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: fuse uring / wake_up on the same core
Ingo, Peter,
I would like to ask how to wake up a task waiting on a waitq on the
same core it went to sleep on. I have tried __wake_up_sync()/WF_SYNC,
but I do not see any effect.
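In fuse terms the attempt is essentially this, on the completion side
(sketch; req->waitq is fuse's existing per-request waitq):

	/* daemon/completion path, instead of a plain wake_up(): */
	__wake_up_sync(&req->waitq, TASK_NORMAL);

The waiting application still seems to get woken on an arbitrary core.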
I'm currently working on fuse/uring communication patches; besides
uring communication they also add a queue per core. Basic bonnie++
benchmarks with a zero file size, to just create/read(0)/delete, show a
~3x IOPS difference between CPU-bound and unbound bonnie++ - i.e. with
these patches it is _not_ the fuse daemon that needs to be bound, but
the application doing I/O to the file system. We basically have
bonnie -> vfs                                  (app/vfs)
  fuse_req                                     (app/fuse.ko)
  qid = task_cpu(current)                      (app/fuse.ko)
  ring(qid) / SQE completion (fuse.ko)         (app/fuse.ko/uring)
  wait_event(req->waitq, ...)                  (app/fuse.ko)
    [app wait]
  daemon ring / handle CQE                     (daemon)
  send-back result as SQE                      (daemon/uring)
  fuse_request_end                             (daemon/uring/fuse.ko)
  wake_up() ---> random core                   (daemon/uring/fuse.ko)
    [app wakeup/fuse/vfs/syscall return]
bonnie ==> different core
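In code, the queue selection is conceptually this (simplified sketch;
ring->queues and fuse_ring_queue_req() are placeholder names, not
necessarily what the final patches will use):

	/* app context: pick the queue of the current core and wait */
	qid = task_cpu(current);
	queue = &ring->queues[qid];
	fuse_ring_queue_req(queue, req);  /* completes an SQE on that queue */
	wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));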
1) bound
[root@...srv1 ~]# numactl --localalloc --physcpubind=0 bonnie++ -q -x 1 \
    -s0 -d /scratch/dest/ -n 20:1:1:20 -r 0 -u 0 | bon_csv2txt

                   ------Sequential Create------ --------Random Create--------
                   -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min       /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
imesrv1 20:1:1:20   6229  28 11289  41 12785  24  6615  28  7769  40 10020  25
Latency              411us     824us     816us     298us   10473us     200ms
2) not bound
[root@...srv1 ~]# bonnie++ -q -x 1 -s0 -d /scratch/dest/ -n 20:1:1:20 \
    -r 0 -u 0 | bon_csv2txt

                   ------Sequential Create------ --------Random Create--------
                   -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min       /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
imesrv1 20:1:1:20   2064  33  2923  43  4556  28  2061  33  2186  42  4245  30
Latency              850us    3914us    2496us     738us     758us    6469us
With fewer files the difference becomes a bit smaller, but it is still
very visible. Besides cache line bouncing, I'm sure that CPU frequency
and C-states matter - I could tune that in the lab, but in the end I
want to test what users do (I recently checked with a large HPC center
- Forschungszentrum Juelich - and their HPC compute nodes are not tuned
up, in order to save energy).
Also, in order to really tune down latencies, I want to add a struct
file_operations::uring_cmd_iopoll thread, which will spin for a short
time and avoid most of the kernel/userspace communication. If
applications (with n-threads < n-cores) then get scheduled on different
cores, different rings will be used, resulting in

n-threads-spinning > n-threads-application
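The spinning part would be roughly this (pure sketch; fuse_req_poll()
and poll_spin_jiffies are made-up names, just to illustrate the
spin-then-sleep scheme):

	/* Spin briefly on the current core in the hope the reply
	 * arrives, so that neither side has to sleep and be woken.
	 * poll_spin_jiffies is an illustrative tunable, not existing code.
	 */
	static bool fuse_req_poll(struct fuse_req *req)
	{
		unsigned long end = jiffies + poll_spin_jiffies;

		while (time_before(jiffies, end)) {
			if (test_bit(FR_FINISHED, &req->flags))
				return true;	/* reply already there */
			cpu_relax();
		}
		return false;	/* give up, fall back to wait_event() */
	}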
There was already a related thread about fuse before:
https://lore.kernel.org/lkml/1638780405-38026-1-git-send-email-quic_pragalla@quicinc.com/
With the fuse-uring patches that part is basically solved - the waitq
that thread is about is not used anymore. But as per above, what
remains is the waitq of the incoming workq (not mentioned in the thread
above). As I wrote, I have tried __wake_up_sync((x), TASK_NORMAL), but
it does not make a difference for me - similar to Miklos' earlier
testing. I have also tried struct completion / swait - neither makes a
difference.
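(For completeness, the completion variant was the obvious pattern, with
a hypothetical 'done' member added to struct fuse_req:

	/* app side */
	wait_for_completion(&req->done);

	/* daemon/completion side */
	complete(&req->done);

and the swait variant analogous - same result.)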
I can see task_struct has wake_cpu, but there doesn't seem to be a good
interface to set it.
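For reference, that is (include/linux/sched.h, roughly):

	struct task_struct {
		...
	#ifdef CONFIG_SMP
		...
		int			wake_cpu;
	#endif
		...
	};

i.e. it is scheduler-internal state, with no exported setter that
fuse.ko could use.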
Any ideas?
Thanks,
Bernd