Message-ID: <CADBw62rWWk=Jeg7=3nEBqK2prQjLzCGcFyJ+WkZ0D6_grcETHA@mail.gmail.com>
Date:   Fri, 20 Mar 2020 18:27:41 +0800
From:   Baolin Wang <baolin.wang7@...il.com>
To:     Ming Lei <ming.lei@...hat.com>
Cc:     axboe@...nel.dk, Paolo Valente <paolo.valente@...aro.org>,
        Ulf Hansson <ulf.hansson@...aro.org>,
        Adrian Hunter <adrian.hunter@...el.com>,
        Arnd Bergmann <arnd@...db.de>,
        Linus Walleij <linus.walleij@...aro.org>,
        Orson Zhai <orsonzhai@...il.com>,
        Chunyan Zhang <zhang.lyra@...il.com>,
        linux-mmc <linux-mmc@...r.kernel.org>,
        linux-block <linux-block@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RESEND RFC PATCH 2/8] block: Allow sending a batch of requests
 from the scheduler to hardware

Hi Ming,

On Wed, Mar 18, 2020 at 6:26 PM Baolin Wang <baolin.wang7@...il.com> wrote:
>
> Hi Ming,
>
> On Wed, Mar 18, 2020 at 6:01 PM Ming Lei <ming.lei@...hat.com> wrote:
> >
> > On Mon, Mar 16, 2020 at 06:01:19PM +0800, Baolin Wang wrote:
> > > As we know, some SD/MMC host controllers support packed requests,
> > > which means we can send several requests to the host controller at
> > > one time to improve performance. So the hardware driver expects
> > > blk-mq to dispatch a batch of requests at one time, and the driver
> > > can use bd.last to indicate whether a request is the last one in
> > > the batch, to help combine requests as much as possible.
> > >
> > > Thus we should add a batch-requests setting from the block driver
> > > to tell the scheduler how many requests can be dispatched in one
> > > batch, and change the scheduler to dispatch more than one request
> > > when the maximum batch-requests number is set.
> > >
> >
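To make the idea concrete, a host driver would consume bd.last roughly
like the sketch below (the my_* helpers are hypothetical, this is not
the real mmc code):

	static blk_status_t my_queue_rq(struct blk_mq_hw_ctx *hctx,
					const struct blk_mq_queue_data *bd)
	{
		struct my_host *host = hctx->queue->queuedata;

		/* Stage the request into the pending packed command. */
		my_add_to_packed_cmd(host, bd->rq);

		/*
		 * Kick the hardware only when blk-mq says this is the
		 * last request it has for us right now; otherwise keep
		 * packing requests together.
		 */
		if (bd->last)
			my_issue_packed_cmd(host);

		return BLK_STS_OK;
	}
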
> > I feel this batch dispatch style is more complicated, and some other
> > drivers (virtio blk/scsi) may still benefit if we can pass a real
> > 'last' flag in .queue_rq().
> >
> > So what about the following way of extending .commit_rqs() to this
> > usage? You can do whatever batch processing you need in .commit_rqs(),
> > which is guaranteed to be called if BLK_MQ_F_FORCE_COMMIT_RQS is set
> > by the driver.
>
> I really appreciate your suggestion, which is much simpler than mine.
> It seems to solve my problem; I will try it on my platform to see
> if it works and give you feedback. Thanks again.

I tried your approach on my platform, but ran into some problems; see below.

>
> > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > index 856356b1619e..cd2bbe56f83f 100644
> > --- a/block/blk-mq-sched.c
> > +++ b/block/blk-mq-sched.c
> > @@ -85,11 +85,12 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
> >   * its queue by itself in its completion handler, so we don't need to
> >   * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> >   */
> > -static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > +static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> >  {
> >         struct request_queue *q = hctx->queue;
> >         struct elevator_queue *e = q->elevator;
> >         LIST_HEAD(rq_list);
> > +       bool ret = false;
> >
> >         do {
> >                 struct request *rq;
> > @@ -112,7 +113,10 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> >                  * in blk_mq_dispatch_rq_list().
> >                  */
> >                 list_add(&rq->queuelist, &rq_list);
> > -       } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > +               ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > +       } while (ret);
> > +
> > +       return ret;
> >  }
> >
> >  static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > @@ -131,11 +135,12 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> >   * its queue by itself in its completion handler, so we don't need to
> >   * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> >   */
> > -static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > +static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> >  {
> >         struct request_queue *q = hctx->queue;
> >         LIST_HEAD(rq_list);
> >         struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
> > +       bool ret = false;
> >
> >         do {
> >                 struct request *rq;
> > @@ -161,10 +166,12 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> >
> >                 /* round robin for fair dispatch */
> >                 ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
> > -
> > -       } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > +               ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > +       } while (ret);
> >
> >         WRITE_ONCE(hctx->dispatch_from, ctx);
> > +
> > +       return ret;
> >  }
> >
> >  void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > @@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> >         struct elevator_queue *e = q->elevator;
> >         const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
> >         LIST_HEAD(rq_list);
> > +       bool dispatch_ret;
> >
> >         /* RCU or SRCU read lock is needed before checking quiesced flag */
> >         if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > @@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> >          */
> >         if (!list_empty(&rq_list)) {
> >                 blk_mq_sched_mark_restart_hctx(hctx);
> > -               if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
> > +               dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > +               if (dispatch_ret) {
> >                         if (has_sched_dispatch)
> > -                               blk_mq_do_dispatch_sched(hctx);
> > +                               dispatch_ret = blk_mq_do_dispatch_sched(hctx);

Suppose we dispatched a request successfully via blk_mq_dispatch_rq_list()
and got dispatch_ret = true. Then we try to dispatch more requests from
the scheduler via blk_mq_do_dispatch_sched(), but if the scheduler has no
more requests, dispatch_ret becomes false. In that case we will not issue
commit_rqs() to tell the hardware to handle the previous requests already
dispatched from &rq_list.

So I think we should not overwrite 'dispatch_ret' here? Or do you have
any other thoughts on how to fix this?
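
For example, maybe it is enough to just not capture the scheduler
dispatch result once something has already been sent to the driver (a
rough idea on top of your diff, not tested):

	dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
	if (dispatch_ret) {
		/*
		 * Keep dispatch_ret true here: at least one request
		 * from &rq_list has already reached the driver and
		 * still needs the final commit_rqs() kick, no matter
		 * what the scheduler dispatch below returns.
		 */
		if (has_sched_dispatch)
			blk_mq_do_dispatch_sched(hctx);
		else
			blk_mq_do_dispatch_ctx(hctx);
	}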

> >                         else
> > -                               blk_mq_do_dispatch_ctx(hctx);
> > +                               dispatch_ret = blk_mq_do_dispatch_ctx(hctx);
> >                 }
> >         } else if (has_sched_dispatch) {
> > -               blk_mq_do_dispatch_sched(hctx);
> > +               dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> >         } else if (hctx->dispatch_busy) {
> >                 /* dequeue request one by one from sw queue if queue is busy */
> > -               blk_mq_do_dispatch_ctx(hctx);
> > +               dispatch_ret = blk_mq_do_dispatch_ctx(hctx);
> >         } else {
> >                 blk_mq_flush_busy_ctxs(hctx, &rq_list);
> > -               blk_mq_dispatch_rq_list(q, &rq_list, false);
> > +               dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > +       }
> > +
> > +       if (dispatch_ret) {
> > +               if (hctx->flags & BLK_MQ_F_FORCE_COMMIT_RQS)
> > +                       hctx->queue->mq_ops->commit_rqs(hctx);
> >         }
> >  }
> >
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 87c6699f35ae..9b46f5d6c7fd 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -1238,11 +1238,15 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
> >                  * Flag last if we have no more requests, or if we have more
> >                  * but can't assign a driver tag to it.
> >                  */
> > -               if (list_empty(list))
> > -                       bd.last = true;
> > -               else {
> > -                       nxt = list_first_entry(list, struct request, queuelist);
> > -                       bd.last = !blk_mq_get_driver_tag(nxt);
> > +               if (!(hctx->flags & BLK_MQ_F_FORCE_COMMIT_RQS)) {
> > +                       if (list_empty(list))
> > +                               bd.last = true;
> > +                       else {
> > +                               nxt = list_first_entry(list, struct request, queuelist);
> > +                               bd.last = !blk_mq_get_driver_tag(nxt);
> > +                       }
> > +               } else {
> > +                       bd.last = false;

If we enable the BLK_MQ_F_FORCE_COMMIT_RQS flag, we will always get
bd.last = false, even for the real last request in the I/O scheduler. I
know you already use commit_rqs() to help kick the driver, but I am
worried whether it is reasonable for drivers to always see bd.last = false.
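
With the flag set, bd.last never fires, so .commit_rqs() becomes the
driver's only doorbell. Every driver setting the flag would then need
something like the sketch below (the my_* helpers are hypothetical,
not real code):

	static void my_commit_rqs(struct blk_mq_hw_ctx *hctx)
	{
		struct my_host *host = hctx->queue->queuedata;

		/*
		 * queue_rq() only staged the requests; nothing was
		 * sent to the hardware because bd->last was never
		 * true. Flush the whole packed command now.
		 */
		if (my_packed_cmd_pending(host))
			my_issue_packed_cmd(host);
	}

A driver that keys its doorbell off bd.last alone would never issue
anything with this flag set, which is what worries me.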

Thanks for your approach.

> >                 }
> >
> >                 ret = q->mq_ops->queue_rq(hctx, &bd);
> > diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> > index 07fa767bff86..c0ef6990b698 100644
> > --- a/include/linux/blk-mq.h
> > +++ b/include/linux/blk-mq.h
> > @@ -394,6 +394,7 @@ enum {
> >         BLK_MQ_F_SHOULD_MERGE   = 1 << 0,
> >         BLK_MQ_F_TAG_SHARED     = 1 << 1,
> >         BLK_MQ_F_NO_MANAGED_IRQ = 1 << 2,
> > +       BLK_MQ_F_FORCE_COMMIT_RQS = 1 << 3,
> >         BLK_MQ_F_BLOCKING       = 1 << 5,
> >         BLK_MQ_F_NO_SCHED       = 1 << 6,
> >         BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
> >
> > Thanks,
> > Ming
> >
>
>
> --
> Baolin Wang



-- 
Baolin Wang
