linux-kernel - Re: merging the per-bdi writeback patchset

Message-ID: <20090624100414.GJ31415@kernel.dk>
Date:	Wed, 24 Jun 2009 12:04:14 +0200
From:	Jens Axboe <jens.axboe@...cle.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Linux Kernel <linux-kernel@...r.kernel.org>, hch@...radead.org
Subject: Re: merging the per-bdi writeback patchset

On Tue, Jun 23 2009, Andrew Morton wrote:
> On Tue, 23 Jun 2009 10:55:05 +0200 Jens Axboe <jens.axboe@...cle.com> wrote:
> 
> > On Tue, Jun 23 2009, Andrew Morton wrote:
> > > On Tue, 23 Jun 2009 10:11:56 +0200 Jens Axboe <jens.axboe@...cle.com> wrote:
> > > 
> > > > Things are looking good for this patchset and it's been in -next for
> > > > almost a week without any reports of problems. So I'd like to merge it
> > > > for 2.6.31 if at all possible. Any objections?
> > > 
> > > erk.  I was rather expecting I'd have time to have a look at it all.
> > 
> > OK, we can wait if we have to, just trying to avoid having to keep this
> > fresh for one full cycle. I have posted this patchset 11 times though
> > over months, so it's not like it's a new piece of work :-)
> 
> Yeah, sorry.  
> 
> > > It's unclear to me actually _why_ the performance changes which were
> > > observed have actually occurred.  In fact it's a bit unclear (to me)
> > > why the patchset was written and what it sets out to achieve :(
> > 
> > It started out trying to get rid of the pdflush uneven writeout. If you
> > look at various pdflush intensive workloads, even on a single disk you
> > often have 5 or more pdflush threads working the same device. It's just
> > not optimal.
> 
> That's a bug, isn't it?  This
> 
> 		/* Is another pdflush already flushing this queue? */
> 		if (current_is_pdflush() && !writeback_acquire(bdi))
> 			break;
> 
> isn't working.

But that's on a per-inode basis. I didn't look further into the problem
to be honest, just noticed that you very quickly get a handful of
pdflush threads ticking along.
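
For illustration, here is a minimal userspace sketch of the kind of
"acquire or back off" gate the quoted check relies on. It is a toy model,
not the kernel code: struct toy_bdi and the toy_* functions are
hypothetical names, and a C11 atomic_flag stands in for whatever state
the real backing device keeps.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a backing device; not the kernel's struct. */
struct toy_bdi {
    atomic_flag flushing;   /* set while one flusher owns the device */
};

static void toy_bdi_init(struct toy_bdi *bdi)
{
    atomic_flag_clear(&bdi->flushing);
}

/*
 * Try to become the only flusher for this device.
 * Returns true on success, false if another flusher already holds it.
 */
static bool toy_writeback_acquire(struct toy_bdi *bdi)
{
    /* test_and_set returns the previous value: false means it was free */
    return !atomic_flag_test_and_set(&bdi->flushing);
}

static void toy_writeback_release(struct toy_bdi *bdi)
{
    atomic_flag_clear(&bdi->flushing);
}

/* Usage: a second flusher backs off until the first one releases. */
static void toy_acquire_demo(void)
{
    struct toy_bdi bdi;

    toy_bdi_init(&bdi);
    assert(toy_writeback_acquire(&bdi));    /* first flusher gets in */
    assert(!toy_writeback_acquire(&bdi));   /* second one must back off */
    toy_writeback_release(&bdi);
    assert(toy_writeback_acquire(&bdi));    /* free again after release */
}
```

A gate like this only limits concurrency at the granularity where it is
checked. If it is evaluated once per inode rather than once per device
pass, several threads can still end up taking turns on the same device.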

> > Another issue was starvation with request allocation. Given
> > that pdflush does non-blocking writes (it has to, by design), pdflush
> > can potentially be starved if someone else is working the device.
> 
> hm, true.  100% starved, or just "slowed down"?  The latter I trust -
> otherwise there are still failure modes?

Just slowed down, at least in the cases I have seen, and I suspect this
is where the lumpiness comes from as well. In theory, though, you could
starve the pdflush thread indefinitely.
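
The starvation argument can be sketched with a toy request pool. This is
only a model under stated assumptions: struct toy_queue, the toy_*
functions and the pool size are hypothetical, and the real block layer
allocator is considerably more involved. A competing submitter takes
request slots first, then a non-blocking flusher takes whatever is left
and gives up as soon as an allocation fails.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request pool for one device; the size is arbitrary. */
struct toy_queue {
    int free_requests;
};

/* Non-blocking allocation: fail immediately when the pool is empty. */
static bool toy_get_request_nowait(struct toy_queue *q)
{
    if (q->free_requests == 0)
        return false;
    q->free_requests--;
    return true;
}

/*
 * One round of competition. Another submitter grabs up to 'hog' slots
 * first, then the flusher tries 'want' non-blocking allocations.
 * No request completes during the round (a simplifying assumption).
 * Returns how many requests the flusher actually got.
 */
static int toy_flusher_round(struct toy_queue *q, int hog, int want)
{
    int got = 0;

    for (int i = 0; i < hog; i++)
        (void)toy_get_request_nowait(q);

    for (int i = 0; i < want; i++) {
        if (!toy_get_request_nowait(q))
            break;      /* a non-blocking writer gives up, it never waits */
        got++;
    }
    return got;
}

/* Usage: the flusher's share shrinks as the competitor takes more. */
static void toy_starvation_demo(void)
{
    struct toy_queue idle = { 128 }, busy = { 128 }, full = { 128 };

    assert(toy_flusher_round(&idle, 0, 32) == 32);   /* no competition */
    assert(toy_flusher_round(&busy, 100, 32) == 28); /* slowed down */
    assert(toy_flusher_round(&full, 128, 32) == 0);  /* starved this round */
}
```

If the competitor refills the pool every round, the last case repeats
forever, which is the indefinite starvation described above. A writer
that is allowed to block would instead wait for a slot to free up.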

> > > A long time ago the XFS guys (Dave Chinner iirc) said that XFS needs
> > > more than one thread per device to keep the device saturated.  Did that
> > > get addressed?
> > 
> > It supports up to 32 threads per device, but Chinner et al. have been
> > silent. So the support is there and there's a
> > super_operations->inode_get_wb() to map a dirty inode to a writeback
> > device. Nobody is doing that yet though.
> 
> OK.
> 
> How many kernel threads do the 1000-spindle people end up with?

If all 1000 spindles are exposed and flushing dirty data, you get 1000
threads. Realistically, you'll likely use some sort of dm/md frontend
though. And then you only get 1 thread per dm/md device.

-- 
Jens Axboe
