Message-ID: <20200803203400.GB1214@quack2.suse.cz>
Date:   Mon, 3 Aug 2020 22:34:00 +0200
From:   Jan Kara <jack@...e.cz>
To:     Ted Tso <tytso@....edu>
Cc:     Alex Zhuravlev <azhuravlev@...mcloud.com>,
        "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
        Ritesh Harjani <riteshh@...ux.ibm.com>
Subject: Re: [PATCH 2/2] ext4: mballoc - limit scanning of uninitialized
 groups

On Thu 21-05-20 12:34:29, Ritesh Harjani wrote:
> 
> 
> On 5/20/20 3:15 PM, Alex Zhuravlev wrote:
> > cr=0 is supposed to be an optimization to save CPU cycles, but if the
> > in-memory buddy data is not initialized, it makes no sense: we have to
> > do synchronous IO, which takes a lot of cycles.
> > Also, at cr=0 mballoc doesn't store any available chunk. cr=1 also
> > skips groups using a heuristic based on average fragment size. It's more
> > useful to skip such groups and switch to cr=2, where groups will be
> > scanned for available chunks.
> > 
> > The goal group is not skipped to prevent allocations in foreign groups,
> > which can happen after mount while buddy data is still being populated.
> > 
> > A 120TB virtual device was simulated using a sparse image and dm-slow.
> > The image was then formatted and filled using debugfs to mark ~85% of
> > the available space as busy. Without the patch, the very first allocation
> > could not complete in half an hour (according to vmstat it would take ~10-1
> > hours). With the patch applied, the allocation took ~20 seconds.
> > 
> > Signed-off-by: Alex Zhuravlev <bzzz@...mcloud.com>
> > Reviewed-by: Andreas Dilger <adilger@...mcloud.com>
> 
> This looks even better to me. Feel free to add:
> Reviewed-by: Ritesh Harjani <riteshh@...ux.ibm.com>

Going through some old email... Ted, why wasn't this patch merged?

								Honza

> >   fs/ext4/mballoc.c | 30 +++++++++++++++++++++++++++++-
> >   1 file changed, 29 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> > index 30d5d97548c4..f719714862b5 100644
> > --- a/fs/ext4/mballoc.c
> > +++ b/fs/ext4/mballoc.c
> > @@ -1877,6 +1877,21 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac,
> >   	return 0;
> >   }
> > +static inline int ext4_mb_uninit_on_disk(struct super_block *sb,
> > +				    ext4_group_t group)
> > +{
> > +	struct ext4_group_desc *desc;
> > +
> > +	if (!ext4_has_group_desc_csum(sb))
> > +		return 0;
> > +
> > +	desc = ext4_get_group_desc(sb, group, NULL);
> > +	if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))
> > +		return 1;
> > +
> > +	return 0;
> > +}
> > +
> >   /*
> >    * The routine scans buddy structures (not bitmap!) from given order
> >    * to max order and tries to find big enough chunk to satisfy the req
> > @@ -2060,7 +2075,20 @@ static int ext4_mb_good_group(struct ext4_allocation_context *ac,
> >   	/* We only do this if the grp has never been initialized */
> >   	if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) {
> > -		int ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS);
> > +		int ret;
> > +
> > +		/* cr=0/1 is a very optimistic search to find large
> > +		 * good chunks almost for free. if buddy data is
> > +		 * not ready, then this optimization makes no sense.
> > +		 * instead it leads to loading (synchronously) lots
> > +		 * of groups and very slow allocations.
> > +		 * but don't skip the goal group to keep blocks in
> > +		 * the inode's group. */
> > +
> > +		if (cr < 2 && !ext4_mb_uninit_on_disk(ac->ac_sb, group) &&
> > +		    ac->ac_g_ex.fe_group != group)
> > +			return 0;
> > +		ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS);
> >   		if (ret)
> >   			return ret;
> >   	}
> > 
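
For readers skimming the thread, the skip condition the patch adds to ext4_mb_good_group() can be modeled in userspace roughly as below. This is only a sketch of the decision logic, not kernel code: struct group_state, its two fields, and skip_group() are hypothetical names made up for illustration. They stand in for EXT4_MB_GRP_NEED_INIT(), the EXT4_BG_BLOCK_UNINIT descriptor flag, and the new early "return 0".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-group state the patch consults. */
struct group_state {
	bool buddy_loaded;   /* in-memory buddy data already initialized */
	bool uninit_on_disk; /* models EXT4_BG_BLOCK_UNINIT in the descriptor */
};

/*
 * Returns true when the scan should skip this group instead of
 * initializing its buddy data. Mirrors the condition in the patch:
 * skip only at cr < 2, only when initialization would need disk IO,
 * and never for the goal group.
 */
static bool skip_group(const struct group_state *g, int cr,
		       unsigned int group, unsigned int goal_group)
{
	assert(g != NULL);

	if (g->buddy_loaded)
		return false;	/* nothing to initialize, normal checks apply */
	if (cr >= 2)
		return false;	/* cr=2 and above always load the group */
	if (g->uninit_on_disk)
		return false;	/* bitmap is generated in memory, no read needed */
	if (group == goal_group)
		return false;	/* keep blocks in the inode's group */
	return true;
}
```

With this model, a group whose buddy data is not loaded and whose bitmap must be read from disk is skipped at cr=0 and cr=1 unless it is the goal group, and is loaded as before once the scan reaches cr=2.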
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR
