Message-ID: <20090317093728.GB27476@kernel.dk>
Date:	Tue, 17 Mar 2009 10:37:29 +0100
From:	Jens Axboe <jens.axboe@...cle.com>
To:	Dave Chinner <david@...morbit.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
	chris.mason@...cle.com, npiggin@...e.de
Subject: Re: [PATCH 2/7] writeback: switch to per-bdi threads for flushing
	data

On Tue, Mar 17 2009, Dave Chinner wrote:
> On Mon, Mar 16, 2009 at 08:33:21AM +0100, Jens Axboe wrote:
> > On Mon, Mar 16 2009, Dave Chinner wrote:
> > > On Fri, Mar 13, 2009 at 11:54:46AM +0100, Jens Axboe wrote:
> > > > On Thu, Mar 12 2009, Andrew Morton wrote:
> > > > > On Thu, 12 Mar 2009 15:33:43 +0100 Jens Axboe <jens.axboe@...cle.com> wrote:
> > > > > Bear in mind that the XFS guys found that one thread per fs had
> > > > > insufficient CPU power to keep up with fast devices.
> > > > 
> > > > Yes, I definitely want to experiment with > 1 thread per device in the
> > > > near future.
> > > 
> > > The question here is how to do this efficiently. Even if XFS is
> > > operating on a single device, it is not optimal just to throw
> > > multiple threads at the bdi. Ideally we want a thread per region
> > > (allocation group) of the filesystem, as each allocation group has
> > > its own inode cache (radix tree) to traverse. These traversals can
> > > be done completely in parallel and won't contend either at the
> > > traversal level or in the IO hardware....
> > > 
> > > i.e. what I'd like to see is the ability for any new flushing
> > > mechanism to offload responsibility for tracking, traversing and
> > > flushing of dirty inodes to the filesystem. Filesystems that
> > > don't do such things could use a generic bdi-based
> > > implementation.
> > > 
> > > FWIW, we also want to avoid the current pattern of flushing
> > > data, then the inode, then data, then the inode, ....
> > > By offloading into the filesystem, this writeback ordering can
> > > be done as efficiently as possible for each given filesystem.
> > > XFS already has all the hooks to be able to do this
> > > effectively....
> > > 
> > > I know that Christoph was doing some work towards this end;
> > > perhaps he can throw his 2c worth in here...
> > 
> > This is very useful feedback, thanks Dave. So on the filesystem vs bdi
> > side, XFS could register a bdi per allocation group.
> 
> How do multiple bdis on a single block device interact?

I think that part should be revised: the structure I have now has a
list of flusher containers hanging off the bdi. Backing device
registration will fork the single flusher, and the intention is that
users like XFS can then add more flusher threads to the bdi. Basically
what I wrote further down :-)
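
Very roughly, and with purely made-up names rather than anything from
the actual patches, that structure could look something like this:

/* Illustrative only: names and layout are invented, not the real patch. */
#include <linux/list.h>
#include <linux/spinlock.h>

struct task_struct;

struct bdi_flusher {
        struct list_head list;          /* link on the bdi's flusher list */
        struct task_struct *task;       /* the flusher thread itself */
        struct list_head b_dirty;       /* dirty inodes this flusher handles */
        void *fs_private;               /* e.g. an XFS allocation group */
};

/* Only the additions to struct backing_dev_info, existing fields omitted. */
struct backing_dev_info {
        spinlock_t flusher_lock;
        struct list_head flusher_list;  /* default flusher plus any extras */
        struct bdi_flusher default_flusher; /* forked at bdi registration */
        /* ... existing bdi fields ... */
};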

> > Then set the proper
> > inode->mapping->backing_dev_info from sb->s_op->alloc_inode, and
> > __mark_inode_dirty() should get the placement right. For private
> > traverse and flush, provide some address_space op to override
> > generic_sync_bdi_inodes().
> 
> Yes, that seems like it would support the sort of internal XFS
> structure I've been thinking of.

Goodie!
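
As a rough illustration of the placement bit quoted above -- the
xfs_ag_bdi() lookup and the override op below are invented names, only
the inode/mapping fields are the real ones:

/*
 * Sketch only: xfs_ag_bdi() and the sync override below are invented
 * for illustration; the inode/mapping fields are the real ones.
 */
#include <linux/fs.h>
#include <linux/backing-dev.h>

/* Imagined lookup: map an inode to its allocation group's bdi. */
struct backing_dev_info *xfs_ag_bdi(struct super_block *sb,
                                    unsigned long ino);

/* Something a filesystem's ->alloc_inode could do for each new inode. */
static inline void xfs_set_inode_bdi(struct inode *inode)
{
        /*
         * Pointing the mapping at a per-AG bdi is what lets
         * __mark_inode_dirty() get the placement right, as noted above.
         */
        inode->i_mapping->backing_dev_info =
                        xfs_ag_bdi(inode->i_sb, inode->i_ino);
}

/*
 * Imagined shape of the override op: if the filesystem provides it,
 * the flusher calls this instead of generic_sync_bdi_inodes().
 */
struct writeback_control;
typedef void (*sync_bdi_inodes_fn)(struct backing_dev_info *bdi,
                                   struct writeback_control *wbc);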

> > It sounds like I should split the bdi flushing bits out from the bdi
> > itself: embed one in the bdi, but allow outside registration of
> > others. That will fit better with the need for more than one flusher
> > per backing device.
> 
> *nod*
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@...morbit.com
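
To make the split in the quoted paragraph above (one flusher embedded
in the bdi, others registered from outside) a little more concrete, the
registration side might look roughly like this; the helper names are
invented, not an existing interface:

/*
 * Sketch with invented names: bdi_fork_default_flusher() and
 * bdi_add_flusher() are placeholders, not existing interfaces.
 */
#include <linux/backing-dev.h>
#include <linux/device.h>
#include <linux/err.h>
#include <linux/kthread.h>

int bdi_flusher_thread(void *data);     /* the flush loop itself, elided */

/* Called from bdi registration: fork the single default flusher. */
static int bdi_fork_default_flusher(struct backing_dev_info *bdi)
{
        struct task_struct *task;

        task = kthread_run(bdi_flusher_thread, bdi, "flush-%s",
                           dev_name(bdi->dev));
        if (IS_ERR(task))
                return PTR_ERR(task);
        /* stash the task in the bdi's default flusher container */
        return 0;
}

/*
 * What a filesystem could then call to attach extra flushers, e.g.
 * one per XFS allocation group; fs_private identifies the region the
 * new flusher is responsible for.
 */
int bdi_add_flusher(struct backing_dev_info *bdi, void *fs_private);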

-- 
Jens Axboe

