lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Mon, 6 Apr 2020 23:39:22 -0700
From:   harshad shirwadkar <harshadshirwadkar@...il.com>
To:     Andreas Dilger <adilger@...ger.ca>
Cc:     Ext4 Developers List <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH 2/2] ext4: shrink directories on dentry delete

Thanks for the review Andreas.

On Sat, Mar 28, 2020 at 5:01 PM Andreas Dilger <adilger@...ger.ca> wrote:
>
> On Mar 25, 2020, at 3:37 AM, Harshad Shirwadkar <harshadshirwadkar@...il.com> wrote:
> >
> > This patch adds shrinking support for htree based directories. The
> > high level algorithm is as follows:
> >
> > * If after dentry removal the dirent block (let's call it B) becomes
> >  empty, then remove its references in its dx parent.
> > * Swap its contents with that of the last block (L) in directory.
> > * Update L's parents to point to B instead.
> > * Remove L
> > * Repeat this for all the ancestors of B.
> >
> > We add variants of dx_probe that allow us perform reverse lookups from
> > a logical block to its dx parents.
> >
> > Ran kvm-xfstests smoke and verified that no new failures are
> > introduced. Ran shrinking for directories with following number of
> > files and then deleted files one by one:
> > * 1000 (size before deletion 36K, after deletion 4K)
> > * 10000 (size before deletion 196K, after deletion 4K)
> > * 100000 (size before deletion 2.1M, after deletion 4K)
> > * 200000 (size before deletion 4.2M, after deletion 4K)
> >
> > In all cases directory shrunk significantly. We fallback to linear
> > directories if the directory becomes empty.
> >
> > But note that most of the shrinking happens during last 1-2% deletions
> > in an average case. Therefore, the next step here is to merge dx nodes
> > when possible. That can be achieved by storing the fullness index in
> > htree nodes. But that's an on-disk format change. We can instead build
> > on tooling added by this patch to perform reverse lookup on a dx
> > node and then reading adjacent nodes to check their fullness.
> >
> > This patch supersedes the other directory shrinking patch sent in Aug
> > 2019 ("ext4: attempt to shrink directory on dentry removal").
> >
> > Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@...il.com>
> > ---
> > fs/ext4/ext4_jbd2.h |   7 +
> > fs/ext4/namei.c     | 355 ++++++++++++++++++++++++++++++++++++++++++--
> > 2 files changed, 353 insertions(+), 9 deletions(-)
> >
> > diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> > index d567b9589875..b78c6f9a6eba 100644
> > --- a/fs/ext4/namei.c
> > +++ b/fs/ext4/namei.c
> >
> > +/*
> > + * dx_probe with relaxed checks. This function is used in the directory
> > + * shrinking code since we can run into intermediate states where we have
> > + * internal dx nodes with count = 0.
> > + */
> > +static inline struct dx_frame *
> > +dx_probe_relaxed(struct ext4_filename *fname, struct inode *dir,
> > +             struct dx_hash_info *hinfo, struct dx_frame *frame_in)
> > +{
> > +     return __dx_probe(fname, dir, hinfo, frame_in, 0, false);
> > +}
> > +
> > +/*
> > + * Perform only a parttial dx_probe until we find block end_lblk.
>
> (typo) "partial"
Ack
>
> > +static inline struct dx_frame *
> > +dx_probe_partial(struct ext4_filename *fname, struct inode *dir,
> > +              struct dx_hash_info *hinfo, struct dx_frame *frame_in,
> > +              ext4_lblk_t end_lblk)
> > +{
> > +     return __dx_probe(fname, dir, hinfo, frame_in, end_lblk, false);
> > +}
> > +
> [snip]
Ack
> > +/*
> > + * This function tries to remove the entry of a dirent block (which was just
> > + * emptied by the caller) from the dx frame. It does so by reducing the count by
> > + * 1 and left shifting all the entries after the deleted entry.
> > + */
> > +int
> > +ext4_remove_dx_entry(handle_t *handle, struct inode *dir,
> > +                  struct dx_frame *dx_frame)
> > +{
> > +     err = ext4_journal_get_write_access(handle, dx_frame->bh);
> > +     if (err) {
> > +             ext4_std_error(dir->i_sb, err);
> > +             return -EINVAL;
> > +     }
> > +
> > +     for (; i < count - 1; i++)
> > +             entries[i] = entries[i + 1];
>
> It would be more efficient to do this with "memmove()" rather than copying
> each entry separately.
Ack, implemented in V2.
>
> > +     /*
> > +      * If i was 0 when we began above loop, we would have overwritten count
> > +      * and limit values since those values live in dx_entry->hash of the
> > +      * first entry. We need to update count but we should set limit as well.
> > +      */
> > +     dx_set_count(entries, count - 1);
> > +     dx_set_limit(entries, limit);
>
> How hard is it to avoid clobbering these fields in the first place?
> I'm just thinking that "clobber + fixup" is subject to race conditions
> at various times in the past, and may become an issue in the future
> (e.g. with parallel directory operations).
>
Ack, I agree better to not clobber in the first place. Implemented in V2.
> > static inline bool is_empty_dirent_block(struct inode *dir,
> > +                                      struct buffer_head *bh)
> > +{
>
> This should be combined with ext4_empty_dir() to avoid code duplication.
Thanks, this also simplifies ext4_empty_dir().
>
> > +     struct ext4_dir_entry_2 *de = (struct ext4_dir_entry_2 *)bh->b_data;
> > +     int     csum_size = 0;
> > +
> > +     if (ext4_has_metadata_csum(dir->i_sb) && is_dx(dir))
> > +             csum_size = sizeof(struct ext4_dir_entry_tail);
> > +
> > +     return ext4_rec_len_from_disk(de->rec_len, dir->i_sb->s_blocksize) ==
> > +                     dir->i_sb->s_blocksize - csum_size && de->inode == 0;
> > +}
>
> This looks like a low cost way to determine the leaf block is empty,
> but checking this for every unlink likely has a non-zero cost.
Ack
>
> > @@ -2530,6 +2864,9 @@ static int ext4_delete_entry(handle_t *handle,
> >       if (unlikely(err))
> >               goto out;
> >
> > +     if (is_dx(dir))
> > +             ext4_try_dir_shrink(handle, dir, lblk, bh);
> > +
> >       return 0;
>
> It would be useful to run a comparison benchmark between the patched ext4
> and unpatched when deleting a large number of entries that checks both CPU
> usage and performance.  That will give us an idea of how much this costs
> to be checked for every entry.
CPU performance wasn't that different fo both the cases, but there are
some differences in overall unlink performance. I added a performance
evaluation section in V2.
>
> Also, rather than calling ext4_try_dir_shrink() and is_empty_dirent_block()
> for every entry, couldn't this be returned from ext4_generic_delete_entry(),
> since it has that information already.
Thanks, implemented this in V2.

Thanks,
Harshad
>
> Cheers, Andreas
>
>
>
>
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ