linux-kernel - Re: [rfc][patch 2.6.18-rc7] block: explicit plugging

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Mon, 18 Sep 2006 13:10:13 -0700
From:	"Nate Diller" <nate.diller@...il.com>
To:	"Nick Piggin" <npiggin@...e.de>
Cc:	"Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>,
	"Jens Axboe" <axboe@...nel.dk>
Subject: Re: [rfc][patch 2.6.18-rc7] block: explicit plugging

On 9/16/06, Nick Piggin <npiggin@...e.de> wrote:
> Hi,
>
> I've been tinkering with this idea for a while, and I'd be interested
> in seeing what people think about it. The patch isn't in a great state
> of commenting or splitting ;) but I'd be interested feelings about the
> general approach, and whether I'm going to hit any bad problems (eg.
> with SCSI or IDE).

I *really* like this idea, and I would like to help get it merged.  I
can even run some benchmarks for you once I get my test rig running
again.

>
> This is a patch to perform block device plugging explicitly in the submitting
> process context rather than implicitly by the block device.
>
> There are several advantages to plugging in process context over plugging
> by the block device:
>
> - Implicit plugging is only active when the queue empties, so any
>   advantages are lost if there is parallel IO occuring. Not so with
>   explicit plugging.
>
> - Implicit plugging relies on a timer and watermarks and a kind-of-explicit
>   directive in lock_page which directs plugging. These are heuristics and
>   can cost performance due to holding a block device idle longer than it
>   should be. Explicit plugging avoids most of these issues by only holding
>   the device idle when it is known more requests will be submitted.
>
> - This lock_page directive uses a roundabout way to attempt to minimise
>   intrusiveness of plugging on the VM. In doing so, it gets needlessly
>   complex: the VM really is in a good position to direct the block layer
>   as to the nature of its requests, so there is no need to try to hide
>   the fact.

I had a related idea that I have not been able to work on yet.  I
called it "kernel anticipation", and it explicitly instructs the
scheduler when a function is submitting I/O that subsequent I/O is
dependent on.  In other words, when we are composing a bio and get the
"BH_Boundary" flag in a buffer head, mark the bio for mandatory
anticipation, since we know we'll have a hit.  This would enable the
anticipation code to act in some cases even for processes with very
high thinktimes.

> On a parallel tiobench benchmark, of the 800 000 calls to __make_request
> performed, this patch avoids 490 000 (62%) of queue_lock aquisitions by
> early merging on the private plugged list.

Have you run any FS performance benchmorks to check for regressions in
performance?  Who knows, you might even see be able to show a visible
increase :)

> @@ -2865,68 +2762,48 @@ static int __make_request(request_queue_
>          */
>         blk_queue_bounce(q, &bio);
>
> -       spin_lock_prefetch(q->queue_lock);
> -
>         barrier = bio_barrier(bio);
>         if (unlikely(barrier) && (q->next_ordered == QUEUE_ORDERED_NONE)) {
>                 err = -EOPNOTSUPP;
>                 goto end_io;
>         }
>
> +       /* Attempt merging with the plugged list before taking locks */
> +       ioc = current->io_context;
> +       if (ioc && ioc->plugged && !list_empty(&ioc->plugged_list)) {
> +               struct request *rq;
> +               rq = list_entry_rq(ioc->plugged_list.prev);
> +
> +               el_ret = elv_try_merge(rq, bio);
> +               if (el_ret == ELEVATOR_BACK_MERGE) {
> +                       if (bio_attempt_back_merge(q, rq, nr_sectors, bio))
> +                               goto out;
> +               } else if (el_ret == ELEVATOR_FRONT_MERGE) {
> +                       if (bio_attempt_front_merge(q, rq, nr_sectors, bio))
> +                               goto out;
> +               }
> +       }
> +
>         spin_lock_irq(q->queue_lock);
>
> -       if (unlikely(barrier) || elv_queue_empty(q))
> +       if (elv_queue_empty(q))
>                 goto get_rq;
>
>         el_ret = elv_merge(q, &req, bio);
> -       switch (el_ret) {

Have you considered skipping the queue merge entirely, if there is not
a hit in the plugged_list?  I would be interested to see a "hit rate"
of how many queue merge attempts are successful here.  I bet it's
pretty low.  The difference froim just skipping these entirely might
not even be visible in a benchmark.  It'd be pretty cool to be able to
eliminate the queue merging interface altogether.

Thanks for doing this work, it looks really promising.

NATE
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/