lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <930A5754-5CE6-4567-8CF0-62447C97825C@dilger.ca>
Date:   Tue, 7 Apr 2020 10:51:52 -0600
From:   Andreas Dilger <adilger@...ger.ca>
To:     Harshad Shirwadkar <harshadshirwadkar@...il.com>
Cc:     linux-ext4@...r.kernel.org
Subject: Re: [PATCH v2 2/3] ext4: shrink directories on dentry delete

On Apr 7, 2020, at 12:46 AM, Harshad Shirwadkar <harshadshirwadkar@...il.com> wrote:
> 
> This patch adds shrinking support for htree based directories. The
> high level algorithm is as follows:
> 
> * If after dentry removal the dirent block (let's call it B) becomes
>  empty, then remove its references in its dx parent.
> * Swap its contents with that of the last block (L) in directory.
> * Update L's parents to point to B instead.
> * Remove L
> * Repeat this for all the ancestors of B.
> 
> We add variants of dx_probe that allow us perform reverse lookups from
> a logical block to its dx parents.
> 
> Ran kvm-xfstests smoke and verified that no new failures are
> introduced. Ran shrinking for directories with following number of
> files and then deleted files one by one:
> * 1000 (size before deletion 36K, after deletion 4K)
> * 10000 (size before deletion 196K, after deletion 4K)
> * 100000 (size before deletion 2.1M, after deletion 4K)
> * 200000 (size before deletion 4.2M, after deletion 4K)
> 
> In all cases directory shrunk significantly. We fallback to linear
> directories if the directory becomes empty.
> 
> But note that most of the shrinking happens during last 1-2% deletions
> in an average case. Therefore, the next step here is to merge dx nodes
> when possible. That can be achieved by storing the fullness index in
> htree nodes.
> 
> Performance Testing:
> -------------------
> 
> Created 1 million files and unlinked all of them. Did this with and without
> directory shrinking. Journalling was on. Used ftrace to measure time
> spent in ext4_unlink:
> 
> * Without directory shrinking
> 
>  Size Before: 22M     /mnt/1
>  1000000 files created and deleted.
>  Average Total Elapsed Time (across 3 runs): 43.787s
>  Size After: 22M     /mnt/1
> 
>  useconds          #count
>  ------------------------
>  0-9                3690
>  10-19           2790131
>  20-29            179209
>  30-39             14270
>  40-49              7981
>  50-59              2212
>  60-69              1132
>  70-79               703
>  80-89               403
>  90-99               274
> 
>  Num samples > 40us: 12615
> 
> * With directory shrinking
> 
>  Size Before: 22M     /mnt/1
>  1000000 files created and deleted.
>  Average Total Elapsed Time(across 3 runs): 44.230s
>  Size After: 4.0K    /mnt/1
> 
>  useconds         #count
>  -----------------------
>  0-9                3316
>  10-19           2786451
>  20-29            172015
>  30-39             13259
>  40-49             17847
>  50-59              4843
>  60-69               924
>  70-79               569
>  80-89               390
>  90-99               389
> 
>  Num samples > 40us: 24962
> 
>  We see doubled number of samples of >40us in case of directory shrinking.
>  Because these constitute to < 1% of total samples, the overall effect of
>  direcotry shrinking on unlink / rmdir performance is negligible. There
>  was no notable difference in cpu usage.
> 
> This patch supersedes the other directory shrinking patch sent in Aug
> 2019 ("ext4: attempt to shrink directory on dentry removal").
> 
> Changes since V1:
>  * ext4_remove_dx_entry(), dx_probe_dx_node() fixes
>  * dx_probe_dirent_blk() continuation fix
>  * Added performance evaluation
> 
> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@...il.com>

Some *very* minor cleanups *iff* the patch is refreshed, but otherwise LGTM.
Thanks also for the good performance numbers, that makes it pretty clear
there is very little downside to enabling this all the time.

Reviewed-by: Andreas Dilger <adilger@...ger.ca>

> ---
> fs/ext4/ext4.h      |   3 +-
> fs/ext4/ext4_jbd2.h |   7 +
> fs/ext4/inline.c    |   2 +-
> fs/ext4/namei.c     | 357 ++++++++++++++++++++++++++++++++++++++++++--
> 4 files changed, 356 insertions(+), 13 deletions(-)
> 
> @@ -820,7 +826,9 @@ dx_probe(struct ext4_filename *fname, struct inode *dir,
> 	dxtrace(printk("Look up %x", hash));
> 	while (1) {
> 		count = dx_get_count(entries);
> -		if (!count || count > dx_get_limit(entries)) {
> +		/* If we are called from the shrink path */
> +		if (count > dx_get_limit(entries) ||
> +		    (strict && !count)) {

(minor) I was wondering if (strict && !count) should be kept first, since it
is a simpler check, but then realized this whole check should be marked
unlikely(), since it should never happen unless the filesystem is corrupted.

> + * This function tries to remove the entry of a dirent block (which was just
> + * emptied by the caller) from the dx frame. It does so by reducing the count by

(very minor) could move "by" to next line to balance line length... :-)

> + * 1 and left shifting all the entries after the deleted entry.
> + * ext4_remove_dx_entry() does NOT mark dx_frame dirty, that's left to the
> + * caller. The reason for doing this is that this function removes entry from
> + * a dx_node and can potentially leave dx_node with no entries. Since older
> + * kernels don't allow dx_parent nodes to have 0 count, the caller should
> + * only mark buffer dirty when we are sure that we won't leave dx_node
> + * with 0 count.
> + */
> +int
> +ext4_remove_dx_entry(handle_t *handle, struct inode *dir,

(style) normally the return type is on the same line as the function
declaration, unless (like dx_probe()) it is so long as to consume a
lot of space.  That isn't the case here.

> +/*
> + * Checks if dirent block is empty. If de is passed, we
> + * skip directory entries before de in the block.
> + */
> +static inline bool is_empty_dirent_block(struct inode *dir,
> +					struct buffer_head *bh,
> +					 struct ext4_dir_entry_2 *de)
> +{
> +	if (!de)
> +		de = (struct ext4_dir_entry_2 *)bh->b_data;
> +	while ((u8 *)de < (u8 *)bh->b_data + dir->i_sb->s_blocksize) {
> +		if (de->inode)
> +			return false;
> +		de = ext4_next_entry(de, dir->i_sb->s_blocksize);
> +	}
> +	return true;
> +}
> +
> @@ -2486,6 +2643,9 @@ int ext4_generic_delete_entry(handle_t *handle,
> 					blocksize);
> 			else
> 				de->inode = 0;
> +			if (empty)
> +				*empty = is_empty_dirent_block(dir, bh,
> +								pde ? pde : de);

Nice.  I was going to say that this shouldn't re-check the whole block,
but it doesn't do that.  is_empty_dirent_block() only checks entries after
the removed entry, exits early if any non-empty dirent is found, and only
checks if we care whether the block is empty or not, and which is ideal.

Cheers, Andreas






Download attachment "signature.asc" of type "application/pgp-signature" (874 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ