Message-ID: <ZBRHCHySeQ0KC/f7@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com>
Date:   Fri, 17 Mar 2023 16:25:04 +0530
From:   Ojaswin Mujoo <ojaswin@...ux.ibm.com>
To:     Jan Kara <jack@...e.cz>
Cc:     linux-ext4@...r.kernel.org, "Theodore Ts'o" <tytso@....edu>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        Ritesh Harjani <ritesh.list@...il.com>,
        Andreas Dilger <adilger@...ger.ca>
Subject: Re: [RFC 08/11] ext4: Don't skip prefetching BLOCK_UNINIT groups

On Thu, Mar 09, 2023 at 03:14:22PM +0100, Jan Kara wrote:
> On Fri 27-01-23 18:07:35, Ojaswin Mujoo wrote:
> > Currently, ext4_mb_prefetch() and ext4_mb_prefetch_fini() skip
> > BLOCK_UNINIT groups since fetching their bitmaps doesn't need disk IO.
> > As a consequence, we end up not initializing the buddy structures and CR0/1
> > lists for these BGs, even though it can be done without any disk IO
> > overhead. Hence, don't skip such BGs during prefetch and prefetch_fini.
> > 
> > This improves the accuracy of CR0/1 allocation as earlier, we could have
> > essentially empty BLOCK_UNINIT groups being ignored by CR0/1 due to their buddy
> > not being initialized, leading to slower CR2 allocations. With this patch CR0/1
> > will be able to discover these groups as well, thus improving performance.
> > 
> > Signed-off-by: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
> > Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@...il.com>
> 
> The patch looks good. I just somewhat wonder - this change may result in
> uninitialized groups being initialized and used earlier (previously we'd
> rather search in other already initialized groups) which may spread
> allocations more. But I suppose that's fine and uninit groups are not
> really a feature meant to limit fragmentation and as the filesystem ages
> the differences should be minimal. So feel free to add:
> 
> Reviewed-by: Jan Kara <jack@...e.cz>
> 
> 								Honza

Thanks for the review. As for the allocation spread, I agree that it is
something our goal determination logic should take care of, rather than
something we handle by limiting the BGs available to the allocator.

Another point I wanted to discuss wrt this patch series is why the
BLOCK_UNINIT groups were not being prefetched earlier. One reason I can
think of is that prefetching them might lead to memory pressure when
there are many empty BGs on a very large (say, multi-terabyte) disk.
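
To put a rough number on that concern, here is a back-of-the-envelope
sketch in C. It assumes no bigalloc (so one bitmap block covers
8 * block_size blocks) and that each initialized group occupies two blocks
in the buddy cache (the bitmap copy plus the buddy bitmap). The function
names are made up for illustration and are not from mballoc.c, and the
result is only an upper bound, since these pages can be reclaimed.

```c
#include <assert.h>
#include <stdint.h>

/* Blocks covered by one group without bigalloc: one bitmap block holds
 * 8 * block_size bits, one bit per block. */
uint64_t blocks_per_group(uint32_t block_size)
{
	assert(block_size > 0);
	return 8ULL * block_size;
}

/* Number of block groups needed to cover fs_bytes, rounded up. */
uint64_t group_count(uint64_t fs_bytes, uint32_t block_size)
{
	uint64_t group_bytes = blocks_per_group(block_size) * block_size;

	return (fs_bytes + group_bytes - 1) / group_bytes;
}

/* ASSUMPTION: every group is initialized and each one holds two blocks
 * in the buddy cache (bitmap copy + buddy bitmap), none reclaimed. */
uint64_t buddy_cache_bytes(uint64_t fs_bytes, uint32_t block_size)
{
	return group_count(fs_bytes, block_size) * 2ULL * block_size;
}
```

For a 16 TiB filesystem with 4 KiB blocks, group_count() gives 131072
groups and buddy_cache_bytes() gives 1 GiB under these assumptions.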

But I'd still like to know if there's some history behind not
prefetching BLOCK_UNINIT groups.

Cc'ing Andreas as well to check if they came across anything in Lustre
in the past.
> 
> > ---
> >  fs/ext4/mballoc.c | 8 ++------
> >  1 file changed, 2 insertions(+), 6 deletions(-)
> > 
> > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> > index 14529d2fe65f..48726a831264 100644
> > --- a/fs/ext4/mballoc.c
> > +++ b/fs/ext4/mballoc.c
> > @@ -2557,9 +2557,7 @@ ext4_group_t ext4_mb_prefetch(struct super_block *sb, ext4_group_t group,
> >  		 */
> >  		if (!EXT4_MB_GRP_TEST_AND_SET_READ(grp) &&
> >  		    EXT4_MB_GRP_NEED_INIT(grp) &&
> > -		    ext4_free_group_clusters(sb, gdp) > 0 &&
> > -		    !(ext4_has_group_desc_csum(sb) &&
> > -		      (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)))) {
> > +		    ext4_free_group_clusters(sb, gdp) > 0 ) {
> >  			bh = ext4_read_block_bitmap_nowait(sb, group, true);
> >  			if (bh && !IS_ERR(bh)) {
> >  				if (!buffer_uptodate(bh) && cnt)
> > @@ -2600,9 +2598,7 @@ void ext4_mb_prefetch_fini(struct super_block *sb, ext4_group_t group,
> >  		grp = ext4_get_group_info(sb, group);
> >  
> >  		if (EXT4_MB_GRP_NEED_INIT(grp) &&
> > -		    ext4_free_group_clusters(sb, gdp) > 0 &&
> > -		    !(ext4_has_group_desc_csum(sb) &&
> > -		      (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)))) {
> > +		    ext4_free_group_clusters(sb, gdp) > 0) {
> >  			if (ext4_mb_init_group(sb, group, GFP_NOFS))
> >  				break;
> >  		}
> > -- 
> > 2.31.1
> > 
> -- 
> Jan Kara <jack@...e.com>
> SUSE Labs, CR
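
For reference, the condition changed by the hunks above can be modelled in
isolation roughly as follows. This is a simplified sketch, not the kernel
code: struct group_model and both function names are hypothetical, and the
fields stand in for EXT4_MB_GRP_NEED_INIT(), ext4_free_group_clusters()
and gdp->bg_flags (already converted to CPU byte order).

```c
#include <stdbool.h>
#include <stdint.h>

#define BG_BLOCK_UNINIT 0x0002	/* stand-in for EXT4_BG_BLOCK_UNINIT */

/* Hypothetical, flattened view of the state the check looks at. */
struct group_model {
	bool need_init;		/* buddy not initialized yet */
	uint32_t free_clusters;	/* free clusters per the group descriptor */
	uint16_t bg_flags;	/* group descriptor flags, CPU byte order */
};

/* Before the patch: BLOCK_UNINIT groups are skipped when group
 * descriptor checksums are enabled. */
bool should_init_old(const struct group_model *g, bool has_desc_csum)
{
	return g->need_init && g->free_clusters > 0 &&
	       !(has_desc_csum && (g->bg_flags & BG_BLOCK_UNINIT));
}

/* After the patch: any uninitialized group with free clusters qualifies. */
bool should_init_new(const struct group_model *g, bool has_desc_csum)
{
	(void)has_desc_csum;	/* no longer part of the decision */
	return g->need_init && g->free_clusters > 0;
}
```

With a completely empty BLOCK_UNINIT group and checksums enabled,
should_init_old() returns false while should_init_new() returns true, which
is the case the commit message describes.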
