Message-ID: <170fa0d20803191707y1591d389y898b34ba7b30e7be@mail.gmail.com>
Date: Wed, 19 Mar 2008 20:07:32 -0400
From: "Mike Snitzer" <snitzer@...il.com>
To: "Daniel Phillips" <phillips@...nq.net>
Cc: linux-kernel@...r.kernel.org
Subject: Re: [ANNOUNCE] ddtree: A git kernel tree for storage servers
On Wed, Mar 19, 2008 at 7:33 PM, Daniel Phillips <phillips@...nq.net> wrote:
> On Wednesday 19 March 2008 13:23, Mike Snitzer wrote:
>
> > > * Block layer deadlock fixes (Status: production)
> >
> > Do you happen to have a pointer to where these block layer deadlock
> > fixes are? Or will you be committing them shortly?
>
> Hi Mike,
>
> OK, this is committed now, but caveat: improved, untested except for
> booting. But what could possibly go wrong? :-/
>
> http://phunq.net/ddtree?p=ddtree/.git;a=blob;f=patches/bio-throttle
>
> The production version is sitting in the code.google.com svn repository
> in ddsnap/patches/2.6.23.8. That one has a known bug that has somehow
> escaped being stomped with a new commit, since it only manifests if you
> stack one stacking block device on top of another one. I will post here
> when we have an official, torture tested version of the patch.
You mean like an LVM2 LV on top of MD? Or stacking purely DM-based
devices (maybe an LVM2 LV on top of mpath, or dm-crypt on LVM2)?
> The patch above is improved from the most recently posted version by
> using the ->bi_max_vecs field for throttle accounting instead of
> calling out to a per-driver metric. This works nicely because the
> max_vecs field cannot change during the life of the bio, and it gives
> a decent upper bound on the resource consumption of the bio, better
> than simply counting bios in flight. The queue->metric() method is
> still in there as a stub, some more cleanup to do there (and further
> shrinking of the patch). It does no harm.
>
> This improvement shrinks the throttled version of struct bio by 4
> bytes.
Cool. I looked briefly at the ddsnap DM target some time ago and saw
that it had to take special care to hook into this particular
throttle (I think that was the per-driver metric?). My memory is
fuzzy there, but what I'm wondering is how "general" this new patch
is. Do additional steps still need to be taken to _really_ guarantee
devices won't deadlock?
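Just to check my reading of the bi_max_vecs accounting, here is
roughly what I picture the throttle doing -- my own simplified sketch
with made-up names (throttle_wait, bio_in_flight, bio_throttle_limit),
not your actual patch:

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/wait.h>

/*
 * Illustration only: throttle_wait, bio_in_flight and bio_throttle_limit
 * are invented fields, and the check+add below would need to be atomic
 * in real code.
 */
static void throttle_bio(struct request_queue *q, struct bio *bio)
{
	/*
	 * ->bi_max_vecs cannot change during the life of the bio, so it
	 * gives a stable upper bound on what the bio can pin down.
	 */
	unsigned int units = bio->bi_max_vecs;

	/* Stall the submitter once the device's in-flight budget is spent. */
	wait_event(q->throttle_wait,
		   atomic_read(&q->bio_in_flight) + units <= q->bio_throttle_limit);
	atomic_add(units, &q->bio_in_flight);
}

static void unthrottle_bio(struct request_queue *q, struct bio *bio)
{
	/* Completion path: give the units back and wake stalled submitters. */
	atomic_sub(bio->bi_max_vecs, &q->bio_in_flight);
	wake_up(&q->throttle_wait);
}

If that's the general shape, I'd guess the accounting comes for free
at submit/completion time regardless of which drivers sit in the
stack -- which is why I'm asking whether anything driver-specific is
still required.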
I typically use dm-linear devices built on MD (raid1 with one member
remote via nbd). The per-bdi dirty writeback accounting has proven
useful, but I've recently hit a nasty livelock where the bdi
accounting for a device no longer allows writeback progress to be
made, e.g.:
BdiWriteback: 0 kB
BdiReclaimable: 321408 kB
BdiDirtyThresh: 316364 kB
DirtyThresh: 381284 kB
BackgroundThresh: 190640 kB
With an all-too-familiar trace like the following:
..
[<ffffffff8044cda6>] io_schedule_timeout+0x4b/0x79
[<ffffffff80271371>] congestion_wait+0x66/0x80
[<ffffffff802457bd>] autoremove_wake_function+0x0/0x2e
[<ffffffff8026c64d>] balance_dirty_pages_ratelimited_nr+0x21d/0x2b1
[<ffffffff80268191>] generic_file_buffered_write+0x5f3/0x711
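The way I read those numbers, we end up spinning in something shaped
roughly like this -- pseudo-code paraphrasing balance_dirty_pages()
with the values from the stats above plugged in, not the real
mm/page-writeback.c:

#include <linux/backing-dev.h>
#include <linux/fs.h>

static void where_i_think_we_spin(void)
{
	long bdi_reclaimable = 321408;	/* BdiReclaimable, kB */
	long bdi_writeback   = 0;	/* BdiWriteback, kB   */
	long bdi_thresh      = 316364;	/* BdiDirtyThresh, kB */

	while (bdi_reclaimable + bdi_writeback > bdi_thresh) {
		/*
		 * 321408 + 0 > 316364, and with nothing in flight the
		 * reclaimable count never drains, so the condition never
		 * clears and we sit in congestion_wait() ->
		 * io_schedule_timeout(), which is exactly the trace above.
		 */
		congestion_wait(WRITE, HZ / 10);
	}
}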
I'm _hoping_ your simple/elegant patch can enable me to drop my 2.6.22
per-bdi backport and all will be right with the world.
What do you think?
Mike