lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <E1M2ts6-0005jr-Cg@closure.thunk.org>
Date:	Sat, 09 May 2009 17:14:14 -0400
From:	"Theodore Ts'o" <tytso@....edu>
To:	Matthew Wilcox <willy@...ux.intel.com>,
	Jens Axboe <jens.axboe@...cle.com>,
	Ric Wheeler <rwheeler@...hat.com>
cc:	linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org
Subject: Is TRIM/DISCARD going to be a performance problem?


Currently, ext4 is wired up to call sb_issue_discard, which is a wrapper
around blkdev_issue_discard().  The way we do this is we keep track of
deleted extents, coalescing them as much as possible, and then once we
commit the transaction where they are deleted, we send the discards down
the pipe via sb_issue_discard.  For example, after marking approximately
200 mail messages as deleted, and running the mbsync command which
synchronizes my local Maildir store with my IMAP server (and thus
deleting approximately 200 files), and the next commit, we see this:

3480.770129: jbd2_start_commit: dev dm-0 transaction 760204 sync 0
3480.783797: ext4_discard_blocks: dev dm-0 blk 15967955 count 1
3480.783830: ext4_discard_blocks: dev dm-0 blk 15970048 count 104
3480.783839: ext4_discard_blocks: dev dm-0 blk 17045096 count 14
3480.783842: ext4_discard_blocks: dev dm-0 blk 15702398 count 2
	     .
	     .
	     .
3480.784009: ext4_discard_blocks: dev dm-0 blk 15461632 count 32
3480.784015: ext4_discard_blocks: dev dm-0 blk 17057632 count 32
3480.784023: ext4_discard_blocks: dev dm-0 blk 17049120 count 32
3480.784026: ext4_discard_blocks: dev dm-0 blk 17045408 count 32
3480.784031: ext4_discard_blocks: dev dm-0 blk 15448634 count 6
3480.784036: ext4_discard_blocks: dev dm-0 blk 17146618 count 1
3480.784039: ext4_discard_blocks: dev dm-0 blk 17146370 count 1
3480.784043: ext4_discard_blocks: dev dm-0 blk 15967947 count 6
3480.784046: jbd2_end_commit: dev dm-0 transaction 760204 sync 0 head 758551

There were 42 calls to blkdev_issue_discard (I ommitted some for the
sake of brevity), and that's a relatively minimal example.  A "make
mrclean" in the kernel tree, especially one that tends to be more
fragmented due to a mix of source and binary files getting updated via
"git pull", will be much, much worse, and could result in potential
hundreds of calls to blkev_issue_discard().  Given that each call to
blkdeV_issue_discard() acts like a barrier command and requires that the
queue be completely drained (of both read and write requests, if I
understand things correctly) if there's anything else happening in
parallel, such as other write or read requests, performance is going to
go down the tubes.

What I'm thinking that we might have to do is:

*)  Batch the trim requests more than a single commit, by having a
	separate rbtree for trim requests
*)  If blocks get reused, we'll need to remove them from the rbtree
*)  In some cases, we may be able to collapse the rbtree by querying the
	filesystem block allocation data structures to determine that if
	we have an entry for blocks 1003-1008 and 1011-1050, and block
	1009 and 1010 are unused, we can combine this into a single
	trim request for 1003-1050.
*)  Create an upcall from the block layer to the trim management layer
	indicating that the I/O device is idle, so this would be a good
	time to send down a whole bunch of trim requeusts.
*)  Optionally have a mode to support stupid thin-provision
	devices that require the trim request to be aligned on some
	large 1 or 4 megabyte boundaries, and be multiples of 1-4
	megabyte ranges, or they will ignroe them.
*)  Optionally have a mode which allows the filesystem's block allocator
	to query the list of blocks on the "to be trimmed" list, so they
	can be reused and hopefully avoid needing to send the trim
	request in the first place.

This could either be done as ext4-specific code, or as a generic "trim
management layer" which could be utilized by any filesystem.

So, a couple of questions:  First of all, do people agree with my
concerns?   Secondly, does the above design seem sane?   And finally, if
the answers to the first two questions are yes, I'm rather busy and
could really use a minion to implement my evil plans --- anyone have any
ideas about how to contact the vendors of these large thin-provisioning
devices, and perhaps gently suggest to them that if they plan to make
$$$ off their devices, maybe they should fund this particular piece of
work?   :-)

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ