linux-kernel - Re: fuse uring / wake

Message-ID: <3c0facd0-e3c7-0aa1-8b2e-961120d4f43d@ddn.com>
Date:   Thu, 27 Apr 2023 13:35:31 +0000
From:   Bernd Schubert <bschubert@....com>
To:     Hillf Danton <hdanton@...a.com>
CC:     Peter Zijlstra <peterz@...radead.org>,
        Miklos Szeredi <miklos@...redi.hu>,
        K Prateek Nayak <kprateek.nayak@....com>,
        Andrei Vagin <avagin@...il.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: fuse uring / wake_up on the same core

On 4/27/23 14:24, Hillf Danton wrote:
> On 26 Apr 2023 22:40:32 +0000 Bernd Schubert <bschubert@....com>
>> My issue is now that these patches are not enough and contrary to
>> previous testing, forcefully disabling cpu migration using
>> migrate_disable() before wait_event_* in fuse's request_wait_answer()
>> and enabling it after does not help either - my process to create files
>> (bonnie++) somewhere migrates to another cpu at a later time.
> 
> Less than 2 migrates every ten minutes?

The test does not run that long... the process migrates almost immediately,
I think in less than a second.

> 
>> The only workaround I currently have is to set the ring thread
>> processing vfs/fuse events in userspace to SCHED_IDLE. In combination
>> with WF_CURRENT_CPU performance then goes from ~2200 to ~9000 file
>> creates/s for a single thread in the latest branch (should be scalable).
>> Which is very close to binding the bonnie++ process to a single core
>> (~9400 creates/s).
> 
> The scheduler is good at dispatching tasks to CPUs at least, and it works
> better with userspace hints as both Prateek and Andrei's works propose. 9400
> shows positive feedback from kernel, and the question is, is it feasible
> in your production environment to set CPU affinity? If yes, what else do
> you want?

Well, this is the fuse file system - each and every user would need to do that
and get core affinity right. I'm personally not setting core affinity for
any 'cp' or 'rsync' I'm doing.
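
To illustrate what "setting core affinity" would mean for each application,
here is a minimal userspace sketch that pins the calling thread to the core
it is currently running on. The helper names pin_to_cpu() and
pin_to_current_cpu() are made up for this example and are not part of fuse
or libfuse; only the glibc calls sched_getcpu() and sched_setaffinity() are
real.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>

/*
 * Hypothetical helper: restrict the calling thread to a single CPU.
 * Returns 0 on success or a negative errno value on failure.
 */
static int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	/* CPU_SET() with an out-of-range index is undefined, so guard it. */
	assert(cpu >= 0 && cpu < CPU_SETSIZE);

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);

	/* pid 0 means "the calling thread". */
	if (sched_setaffinity(0, sizeof(set), &set) != 0)
		return -errno;

	return 0;
}

/*
 * Hypothetical helper: pin the calling thread to the CPU it runs on now.
 * Returns the CPU number on success or a negative errno value on failure.
 */
static int pin_to_current_cpu(void)
{
	int cpu = sched_getcpu();
	int rc;

	if (cpu < 0)
		return -errno;
	if (cpu >= CPU_SETSIZE)
		return -EINVAL;

	rc = pin_to_cpu(cpu);
	return rc < 0 ? rc : cpu;
}
```

From the shell, 'taskset -c N cp ...' does the same thing. Either way the
user has to pick the core, which is exactly the burden described above.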

Btw, a very hackish way to 'solve' the issue is this:


diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index cd7aa679c3ee..dd32effb5010 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -373,6 +373,26 @@ static void request_wait_answer(struct fuse_req *req)
         int err;
         int prev_cpu = task_cpu(current);
  
+       /* When running over uring and core affined userspace threads, we
+        * do not want to let migrate away the request submitting process.
+        * Issue is that even after waking up on the right core, processes
+        * that have submitted requests might get migrated away, because
+        * the ring thread is still doing a bit of work or is in the process
+        * to go to sleep. Assumption here is that processes are started on
+        * the right core (i.e. idle cores) and can then stay on that core
+        * when they come and do file system requests.
+        * Another alternative way is to set SCHED_IDLE for ring threads,
+        * but that would have an issue if there are other processes keeping
+        * the cpu busy.
+        * SCHED_IDLE or this hack here result in about factor 3.5 for
+        * max meta request performance.
+        *
+        * Ideal would be to tell the scheduler that ring threads are not disturbing,
+        * so that migration away from them should only very rarely happen.
+        */
+       if (fc->ring.ready)
+               migrate_disable();
+
         if (!fc->no_interrupt) {
                 /* Any signal may interrupt this */
                 err = wait_event_interruptible(req->waitq,


So it disables migration and never re-enables it...
I'm still digging for a better way; any
hints are very welcome.
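
For reference, the SCHED_IDLE alternative mentioned in the patch comment
could look roughly like the sketch below when called from a ring thread in
the userspace daemon. This is only an illustration under assumptions: the
function name set_sched_idle() is made up here, and the actual daemon code
may set the policy differently. As noted in the comment above, this approach
has a problem if other processes keep the cpu busy.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/*
 * Hypothetical helper: switch a thread to the SCHED_IDLE policy.
 * tid == 0 means "the calling thread".
 * Returns 0 on success or a negative errno value on failure.
 */
static int set_sched_idle(pid_t tid)
{
	/* SCHED_IDLE requires a static priority of 0. */
	struct sched_param param = { .sched_priority = 0 };

	assert(tid >= 0);

	if (sched_setscheduler(tid, SCHED_IDLE, &param) != 0)
		return -errno;

	return 0;
}
```

A ring thread would call set_sched_idle(0) once at startup, before it starts
processing requests.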


Thanks,
Bernd