linux-kernel - Re: io_uring vs CPU hotplug, was Re: [PATCH 5/9] blk-mq: don't set data->ctx and data->hctx in blk_mq_alloc_request

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20200522015719.GB755458@T590>
Date:   Fri, 22 May 2020 09:57:19 +0800
From:   Ming Lei <ming.lei@...hat.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Jens Axboe <axboe@...nel.dk>, Christoph Hellwig <hch@....de>,
        linux-kernel@...r.kernel.org, linux-block@...r.kernel.org,
        John Garry <john.garry@...wei.com>,
        Bart Van Assche <bvanassche@....org>,
        Hannes Reinecke <hare@...e.com>, io-uring@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: io_uring vs CPU hotplug, was Re: [PATCH 5/9] blk-mq: don't set
 data->ctx and data->hctx in blk_mq_alloc_request_hctx

On Thu, May 21, 2020 at 08:39:16PM +0200, Thomas Gleixner wrote:
> Ming,
> 
> Ming Lei <ming.lei@...hat.com> writes:
> > On Thu, May 21, 2020 at 10:13:59AM +0200, Thomas Gleixner wrote:
> >> Ming Lei <ming.lei@...hat.com> writes:
> >> > On Thu, May 21, 2020 at 12:14:18AM +0200, Thomas Gleixner wrote:
> >> > - otherwise, the kthread just retries and retries to allocate & release,
> >> > and sooner or later, its time slice is consumed, and migrated out, and the
> >> > cpu hotplug handler will get chance to run and move on, then the cpu is
> >> > shutdown.
> >> 
> >> 1) This is based on the assumption that the kthread is in the SCHED_OTHER
> >>    scheduling class. Is that really a valid assumption?
> >
> > Given it is unlikely path, we can add msleep() before retrying when INACTIVE bit
> > is observed by current thread, and this way can avoid spinning and should work
> > for other schedulers.
> 
> That should work, but pretty is something else
> 
> >> 
> >> 2) What happens in the following scenario:
> >> 
> >>    unplug
> >> 
> >>      mq_offline
> >>        set_ctx_inactive()
> >>        drain_io()
> >>        
> >>    io_kthread()
> >>        try_queue()
> >>        wait_on_ctx()
> >> 
> >>    Can this happen and if so what will wake up that thread?
> >
> > drain_io() releases all tag of this hctx, then wait_on_ctx() will be waken up
> > after any tag is released.
> 
> drain_io() is already done ...
> 
> So looking at that thread function:
> 
> static int io_sq_thread(void *data)
> {
> 	struct io_ring_ctx *ctx = data;
> 
>         while (...) {
>               ....
> 	      to_submit = io_sqring_entries(ctx);
> 
> --> preemption
> 
> hotplug runs
>    mq_offline()
>       set_ctx_inactive();
>       drain_io();
>       finished();
> 
> --> thread runs again
> 
>       mutex_lock(&ctx->uring_lock);
>       ret = io_submit_sqes(ctx, to_submit, NULL, -1, true);
>       mutex_unlock(&ctx->uring_lock);
> 
>       ....
> 
>       if (!to_submit || ret == -EBUSY)
>           ...
>       	  wait_on_ctx();
> 
> Can this happen or did drain_io() already take care of the 'to_submit'
> items and the call to io_submit_sqes() turns into a zero action ?
> 
> If the above happens then nothing will wake it up because the context
> draining is done and finished.

As Jens replied, you mixed the ctx from io uring and blk-mq, both are in
two worlds.

Any wait in this percpu kthread should just wait for generic resource,
not directly related with blk-mq's inactive hctx. Once this thread is
migrated to other online cpu, it will move on.


Thanks,
Ming