linux-kernel - Re: single aio thread is migrated crazily by scheduler


Message-ID: <93de0f75-3664-c71e-9947-5b37ae935ddc@plexistor.com>
Date:   Thu, 21 Nov 2019 17:02:42 +0200
From:   Boaz Harrosh <boaz@...xistor.com>
To:     Phil Auld <pauld@...hat.com>, Ming Lei <ming.lei@...hat.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Dave Chinner <david@...morbit.com>,
        linux-block@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-xfs@...r.kernel.org, linux-kernel@...r.kernel.org,
        Jeff Moyer <jmoyer@...hat.com>,
        Dave Chinner <dchinner@...hat.com>,
        Eric Sandeen <sandeen@...hat.com>,
        Christoph Hellwig <hch@....de>, Jens Axboe <axboe@...nel.dk>,
        Ingo Molnar <mingo@...hat.com>, Tejun Heo <tj@...nel.org>,
        Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: single aio thread is migrated crazily by scheduler

On 21/11/2019 16:12, Phil Auld wrote:
<>
> 
> The scheduler doesn't know if the queued_work submitter is going to go to sleep.
> That's why I was singling out AIO. My understanding of it is that you submit the IO
> and then keep going. So in that case it might be better to pick a node-local nearby
> cpu instead. But this is a user of work queue issue not a scheduler issue. 
> 

We have a very similar long-standing problem in our system (zufs), which we had to
work around with hacks.

We have seen this CPU bouncing exactly as above in fio and in other benchmarks. Our final
analysis was:
 One thread is in wait_event(). If the wake_up() is issued on the same CPU as the
waiter, then on some systems (usually real HW, not VMs) the waiter bounces to a different CPU.
Now, our system has an array of worker threads, one bound to each CPU. An incoming thread
chooses the worker thread for its CPU, lets it run, and waits for a reply; when the
worker thread is done it does a wake_up(). Usually it's fine and the wait_event() caller
stays on the same CPU. But on some systems it wakes up on a different CPU.

Now this is a great pity, because in our case, in the work_queue case, and in a high
percentage of other places, the thread calling wake_up() will then immediately go to
sleep on something. (Work done, let's wait for new work.)

I wish there were a flag to wake_up(), or on the event object, that says to relinquish
the remainder of the time slice to the waiter on the same CPU, since I will soon be sleeping.

Then the scheduler need not guess whether the wake_up() caller is going to sleep soon or
going to continue. Let the coder give a hint about that?

(The hack was to set the waiter's CPU mask to the incoming CPU and restore it after wakeup.)
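
As a rough illustration of that hack, here is a user-space sketch only, not the zufs
code (which is not shown in this thread). It assumes Linux with glibc; the helper names
pin_to_current_cpu() and restore_affinity() are made up for this example. The waiter
pins itself to the CPU it is on before sleeping and puts the old mask back once woken.

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Hypothetical helper: save the calling thread's affinity mask in *saved,
 * then restrict the thread to the CPU it is currently running on.
 * Returns that CPU number, or -1 on error.
 * Assumes the machine's CPUs fit in a fixed-size cpu_set_t.
 */
int pin_to_current_cpu(cpu_set_t *saved)
{
    cpu_set_t one;
    int cpu = sched_getcpu();

    if (cpu < 0 || sched_getaffinity(0, sizeof(*saved), saved) != 0)
        return -1;

    CPU_ZERO(&one);
    CPU_SET(cpu, &one);
    /* pid 0 means "the calling thread" */
    if (sched_setaffinity(0, sizeof(one), &one) != 0)
        return -1;

    return cpu;
}

/*
 * Hypothetical helper: restore the mask saved by pin_to_current_cpu().
 * Returns 0 on success, -1 on error.
 */
int restore_affinity(const cpu_set_t *saved)
{
    return sched_setaffinity(0, sizeof(*saved), saved);
}
```

A waiter would call pin_to_current_cpu() just before blocking and restore_affinity()
right after it is woken. This costs two extra syscalls per wait, which is part of why
a hint passed with the wake_up() itself would be nicer.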

> Interestingly in our fio case the 4k one does not sleep and we get the active balance
> case where it moves the actually running thread.  The 512 byte case seems to be 
> sleeping since the migrations are all at wakeup time I believe. 
> 

Yes, this is the same thing we saw in our system. (And it happens only sometimes.)

> Cheers,
> Phil
> 
> 
>> Thanks,
>> Ming
> 

Many thanks,
Boaz