Date:   Tue, 26 Nov 2019 10:54:01 +0100
From:   Arnd Bergmann <arnd@...db.de>
To:     Paolo Valente <paolo.valente@...aro.org>
Cc:     "(Exiting) Baolin Wang" <baolin.wang@...aro.org>,
        Baolin Wang <baolin.wang7@...il.com>,
        Adrian Hunter <adrian.hunter@...el.com>,
        Ulf Hansson <ulf.hansson@...aro.org>,
        Asutosh Das <asutoshd@...eaurora.org>,
        Orson Zhai <orsonzhai@...il.com>,
        Lyra Zhang <zhang.lyra@...il.com>,
        Linus Walleij <linus.walleij@...aro.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        linux-mmc <linux-mmc@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Hannes Reinecke <hare@...e.com>,
        linux-block <linux-block@...r.kernel.org>
Subject: Re: [PATCH v6 0/4] Add MMC software queue support

On Tue, Nov 26, 2019 at 8:41 AM Paolo Valente <paolo.valente@...aro.org> wrote:
> > On 22 Nov 2019, at 10:50, Arnd Bergmann <arnd@...db.de> wrote:
> > On Mon, Nov 18, 2019 at 11:04 AM (Exiting) Baolin Wang <baolin.wang@...aro.org> wrote:
> > Paolo, can you comment on why this is currently done, or if it can
> > be changed? It seems to me that sending multiple requests at
> > once would also have a significant benefit on the per-request overhead
> > on NVMe devices with bfq.
> >
>
> Hi,
> actually, "one request dispatched at a time" is not a peculiarity of
> bfq.  With the current blk-mq API for I/O schedulers, any scheduler
> can provide only one request at a time.
>
> Yet, when it is time to refill a hardware queue, blk-mq pulls as many
> requests as it deems appropriate from the scheduler, by invoking the
> latter multiple times.  See blk_mq_do_dispatch_sched() in
> block/blk-mq-sched.c.
>
> I don't know where the glitch for MMC is with respect to this scheme.

Right, this is what is puzzling me as well: in both blk_mq_do_dispatch_sched()
and blk_mq_do_dispatch_ctx(), we seem to always take one request from
the scheduler and dispatch it to the device, regardless of the driver or
the scheduler, so there should only ever be one request in the local list.
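
For reference, the scheduler path boils down to a loop like this
(paraphrased from blk_mq_do_dispatch_sched() in block/blk-mq-sched.c;
details may differ slightly in this tree):

        do {
                struct request *rq;

                if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
                        break;

                if (!blk_mq_get_dispatch_budget(hctx))
                        break;

                /* the scheduler hands back a single request per call */
                rq = e->type->ops.dispatch_request(hctx);
                if (!rq) {
                        blk_mq_put_dispatch_budget(hctx);
                        break;
                }

                /* so rq_list never holds more than this one request */
                list_add(&rq->queuelist, &rq_list);
        } while (blk_mq_dispatch_rq_list(q, &rq_list, true));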

Yet, both the blk_mq_dispatch_rq_list() function and the NVMe driver
appear to be written based on the idea that there are multiple entries
in this list. The one place that I see putting multiple requests on the
local list before dispatching them is the end of
blk_mq_sched_dispatch_requests():

        if (!list_empty(&rq_list)) {
                ...
        } else if (has_sched_dispatch) {
                blk_mq_do_dispatch_sched(hctx);
        } else if (hctx->dispatch_busy) {
                /* dequeue request one by one from sw queue if queue is busy */
                blk_mq_do_dispatch_ctx(hctx);
        } else {
->             blk_mq_flush_busy_ctxs(hctx, &rq_list);        <----
                blk_mq_dispatch_rq_list(q, &rq_list, false);
        }

So as you said, if we use an elevator (has_sched_dispatch == true),
we only get one request at a time, but without an elevator, we get into
this optimized path, where blk_mq_flush_busy_ctxs() batches up all
requests from the software queues.

Could we perhaps change the ops.dispatch_request() function to pass
down the list as in https://paste.ubuntu.com/p/MfSRwKqFCs/ ?
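
Not copying the paste inline, but roughly the idea would be for the
scheduler to fill a list instead of returning one request per call,
something like this sketch (names invented here, the actual diff is
in the paste):

        /* hypothetical batched variant of ops.dispatch_request() */
        static void bfq_dispatch_requests(struct blk_mq_hw_ctx *hctx,
                                          struct list_head *rq_list)
        {
                struct request *rq;

                /* drain the scheduler, up to whatever batch limit we pick */
                while ((rq = bfq_dispatch_request(hctx)) != NULL)
                        list_add_tail(&rq->queuelist, rq_list);
        }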

      Arnd
