Date:	Thu, 24 Sep 2009 21:22:52 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Jens Axboe <jens.axboe@...cle.com>
Cc:	"Li, Shaohua" <shaohua.li@...el.com>,
	lkml <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Chris Mason <chris.mason@...cle.com>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	Jan Kara <jack@...e.cz>
Subject: Re: [RFC] page-writeback: move indoes from one superblock together

On Thu, Sep 24, 2009 at 08:35:19PM +0800, Jens Axboe wrote:
> On Thu, Sep 24 2009, Wu Fengguang wrote:
> > On Thu, Sep 24, 2009 at 02:54:20PM +0800, Li, Shaohua wrote:
> > > __mark_inode_dirty adds inodes to the wb dirty list in random order. If a
> > > disk has several partitions, writeback might keep the spindle moving
> > > between partitions. To reduce the movement, it is better to write a big
> > > chunk of one partition and then move to another. Inodes from one fs are
> > > usually in one partition, so ideally moving inodes from one fs together
> > > should reduce spindle movement. This patch tries to address this. Before
> > > per-bdi writeback was added, the behavior was to write inodes from one fs
> > > first and then another, so the patch restores the previous behavior.
> > > The loop in the patch is a bit ugly; should we add a dirty list for each
> > > superblock in bdi_writeback?
> > > 
> > > A test on a two-partition disk with the attached fio script shows about
> > > a 3% to 6% improvement.
> > 
> > A side note: given the noticeable performance gain, I wonder if it
> > is worth generalizing the idea to do whole-disk, location-ordered
> > writeback. That should benefit many small-file workloads by more than
> > 10%, because this patch only sorts 2 partitions and inodes in a 5s
> > time window, while the patch below will roughly divide the disk into
> > 5 areas and sort inodes in a larger 25s time window.
> > 
> >         http://lkml.org/lkml/2007/8/27/45
> > 
> > Judging from this old patch, the complexity cost would be about 250
> > lines of code (it needs an rbtree).
> 
> First of all, nice patch, I'll add it to the current tree. I too was

You mean Shaohua's patch? It should be a good addition for 2.6.32.

In the long term, move_expired_inodes() needs some rework, because it
could be time-consuming to move around all the inodes in a large
system, and thus hold inode_lock() for too long (and this patch
increases the time the lock is held).

So we would need to split the list moves into smaller pieces in the
future, or change the data structure.
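
As a rough illustration of "splitting the list moves into smaller pieces",
here is a minimal userspace sketch, not kernel code. Everything named toy_*
(struct toy_inode, the list helpers, the max_batch parameter) is hypothetical
and made up for this example; it uses a plain integer comparison for the
expiry test and only marks with comments where a lock would be taken and
dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a dirty inode; NOT the kernel's struct inode. */
struct toy_inode {
	unsigned long dirtied_when;	/* time the inode was first dirtied */
	struct toy_inode *prev, *next;	/* circular list links */
};

/* Initialize a sentinel head of a circular doubly linked list. */
void toy_list_init(struct toy_inode *head)
{
	head->prev = head->next = head;
}

/* Unlink n from whatever list it is on. */
void toy_list_del(struct toy_inode *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Insert n right after the sentinel head (newest entries at the front). */
void toy_list_add_front(struct toy_inode *n, struct toy_inode *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/*
 * Move at most max_batch expired inodes from the tail of src (the oldest
 * end) to the front of dst. Returns the number of inodes moved.
 */
size_t toy_move_expired_batch(struct toy_inode *src, struct toy_inode *dst,
			      unsigned long older_than, size_t max_batch)
{
	size_t moved = 0;

	assert(max_batch > 0);
	while (moved < max_batch && src->prev != src) {
		struct toy_inode *oldest = src->prev;

		/* src is time-ordered, so everything else is newer. */
		if (oldest->dirtied_when > older_than)
			break;
		toy_list_del(oldest);
		toy_list_add_front(oldest, dst);
		moved++;
	}
	return moved;
}

/*
 * Drain all expired inodes in bounded batches. Returns the number of
 * non-empty batches. The comments mark where a lock would be taken and
 * released, so the hold time is bounded by max_batch list moves.
 */
size_t toy_move_expired_all(struct toy_inode *src, struct toy_inode *dst,
			    unsigned long older_than, size_t max_batch)
{
	size_t batches = 0;

	for (;;) {
		/* assumption: the lock would be acquired here */
		size_t n = toy_move_expired_batch(src, dst, older_than,
						  max_batch);
		/* assumption: the lock would be released here */
		if (n > 0)
			batches++;
		if (n < max_batch)
			break;
	}
	return batches;
}
```

The batched version leaves dst in the same order as a single pass would; the
only thing it changes is how many moves happen per lock hold.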

> pondering using an rbtree for sb+dirty_time insertion and extraction.

FYI Michael Rubin did some work on an rbtree implementation, just
in case you are interested:

        http://lkml.org/lkml/2008/1/15/25

> But for 100 inodes or less, I bet that just doing the re-sort in
> writeback time ends up being cheaper on the CPU cycle side.

Yeah.
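
For a sense of what "just doing the re-sort in writeback time" can look like
on a short list, here is a minimal userspace sketch under stated assumptions.
It is not the code from Shaohua's patch: struct toy_dirty, sb_id and
toy_group_by_sb() are hypothetical names, and it works on plain arrays rather
than kernel lists. It groups entries by superblock while keeping the dirty
time order within each group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record for an expired dirty inode. */
struct toy_dirty {
	int sb_id;			/* stands in for inode->i_sb */
	unsigned long dirtied_when;	/* time the inode was dirtied */
};

/*
 * Copy the n entries of in[] to out[] so that entries sharing an sb_id
 * are adjacent. Groups appear in order of first occurrence, and the
 * original order is kept inside each group. out[] must have room for n
 * entries. Cost is quadratic in n at worst, which is acceptable for the
 * short lists (around 100 entries) this sketch is aimed at.
 */
void toy_group_by_sb(const struct toy_dirty *in, struct toy_dirty *out,
		     size_t n)
{
	size_t filled = 0;

	for (size_t i = 0; i < n; i++) {
		bool seen = false;

		/* Skip superblocks already emitted by an earlier pass. */
		for (size_t j = 0; j < i; j++) {
			if (in[j].sb_id == in[i].sb_id) {
				seen = true;
				break;
			}
		}
		if (seen)
			continue;

		/* Emit every entry that belongs to this superblock. */
		for (size_t j = i; j < n; j++) {
			if (in[j].sb_id == in[i].sb_id)
				out[filled++] = in[j];
		}
	}
	/* Every input entry is emitted exactly once. */
	assert(filled == n);
}
```

No extra data structure has to be maintained at dirty time, which is the
trade-off against an rbtree keyed on sb+dirty_time.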

Thanks,
Fengguang
