Message-ID: <20230818050036.GG3464136@mit.edu>
Date: Fri, 18 Aug 2023 01:00:36 -0400
From: "Theodore Ts'o" <tytso@....edu>
To: Kemeng Shi <shikemeng@...weicloud.com>
Cc: adilger.kernel@...ger.ca, linux-ext4@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 12/13] ext4: remove unnecessary check for avoiding
multiple update_backups in ext4_flex_group_add
On Fri, Aug 18, 2023 at 11:16:35AM +0800, Kemeng Shi wrote:
> Ah, I guess here is the thing I missed that makes this confusing:
> sbi->s_group_desc contains only primary block of each group. i.e.
> sbi->s_group_desc['i'] is the primary gdb block of group 'i'.
Correct. In fact, when we need to modify a block group descriptor for
a group, we call ext4_get_group_desc(), and it references
sbi->s_group_desc to find the appropriate buffer_head for the bg
descriptor that we want to modify.
I'm not sure "only" is the right adjective to use here, since the
whole *point* of s_group_desc[] is to keep the buffer_heads for the
block group descriptor blocks in memory, so we can modify them when we
allocate or free blocks, inodes, etc. And we only modify the primary
block group descriptors.
The secondary, or backup, block group descriptors are only used by
e2fsck when the primary block group descriptor has been overwritten,
so we can find the inode table, allocation bitmaps, and so on. So we
do *not* modify them in the course of normal operations, and that's by
design. The only time the kernel will update those block group
descriptors is when we are doing an online resize, and we need to make
sure the backup descriptors are updated, so that if the primary
descriptors get completely smashed, we can still get critical
information such as the location of that block group's inode table.
> From add_new_gdb and add_new_gdb_meta_bg, we can find that we always
> add primary gdb block of new group to s_group_desc. To be more specific:
> add_new_gdb
> gdblock = EXT4_SB(sb)->s_sbh->b_blocknr + 1 + gdb_num;
> gdb_bh = ext4_sb_bread(sb, gdblock, 0);
> n_group_desc[gdb_num] = gdb_bh;
>
> add_new_gdb_meta_bg
> gdblock = ext4_meta_bg_first_block_no(sb, group) +
> ext4_bg_has_super(sb, group);
> gdb_bh = ext4_sb_bread(sb, gdblock, 0);
> n_group_desc[gdb_num] = gdb_bh;
Put another way, there are EXT4_DESC_PER_BLOCK(sb) bg descriptors in a
block. For a file system with the 64-bit feature enabled, the size of
the block group descriptor is 128. If the block size is 4096, then we
can fit 32 block group descriptors in a block.
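
To make the arithmetic concrete, here is a minimal userspace sketch (not kernel code). The struct gd_location and both helper names are made up for illustration; the kernel does the equivalent calculation with EXT4_DESC_PER_BLOCK(sb) when ext4_get_group_desc() picks an entry out of s_group_desc[]. The block size and descriptor size are plain inputs here, using the figures from the paragraph above.

```c
#include <stdint.h>

/* Hypothetical type for illustration; it does not exist in the kernel. */
struct gd_location {
	uint32_t gdb_index;	/* which descriptor block, i.e. the index into s_group_desc[] */
	uint32_t offset;	/* byte offset of the descriptor inside that block */
};

/* Number of block group descriptors that fit in one block. */
static uint32_t desc_per_block(uint32_t block_size, uint32_t desc_size)
{
	return block_size / desc_size;
}

/* Map a block group number to its descriptor block and the offset within it. */
static struct gd_location locate_group_desc(uint32_t group,
					    uint32_t block_size,
					    uint32_t desc_size)
{
	uint32_t per_block = desc_per_block(block_size, desc_size);
	struct gd_location loc = {
		.gdb_index = group / per_block,
		.offset = (group % per_block) * desc_size,
	};

	return loc;
}
```

With the figures above, desc_per_block(4096, 128) is 32, so group 33 lands in descriptor block 1 at byte offset 128.
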
When we add a new block group such that its block group descriptor
will spill into a new block, then we need to expand the s_group_desc[]
array, and initialize the new block group descriptor block. And
that's the job of add_new_gdb() and add_new_gdb_meta_bg().
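
As a rough userspace model of that "spill into a new block" condition (a sketch under stated assumptions, not the resize code itself): a new descriptor block is needed exactly when the new group is the first one in its descriptor block. The struct gdb_table and all function names below are hypothetical, and the table is grown with realloc() purely to keep the example short; it says nothing about how the kernel manages the real array.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the in-memory table of descriptor blocks. */
struct gdb_table {
	void **blocks;		/* one pointer per descriptor block */
	uint32_t count;		/* number of descriptor blocks held */
	uint32_t block_size;	/* size of each descriptor block in bytes */
};

/* A group needs a new descriptor block when it is the first group in it. */
static int needs_new_gdb(uint32_t group, uint32_t per_block)
{
	return group % per_block == 0;
}

/* Grow the table by one zero-filled block; 0 on success, -1 on failure. */
static int gdb_table_add_block(struct gdb_table *t)
{
	void *blk = calloc(1, t->block_size);
	void **grown;

	if (!blk)
		return -1;
	grown = realloc(t->blocks, (t->count + 1) * sizeof(*grown));
	if (!grown) {
		free(blk);
		return -1;
	}
	grown[t->count] = blk;
	t->blocks = grown;
	t->count++;
	return 0;
}

/* Account for one new group, adding a descriptor block only if required. */
static int gdb_table_add_group(struct gdb_table *t, uint32_t group,
			       uint32_t per_block)
{
	if (needs_new_gdb(group, per_block))
		return gdb_table_add_block(t);
	return 0;
}
```

With 32 descriptors per block, adding groups 0 through 32 in order grows the table twice: once for group 0 and once for group 32.
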
> > More generally, this whole patch series is making it clear that the
> > online resize code is hard to understand, because it's super
> > complicated, leading to potential bugs, and very clearly code which is
> > very hard to maintain. So this may mean we need better comments to
> > make it clear *when* the backup block group descriptors are going to
> > be updated, given the various different cases (e.g., no flex_bg or
> > meta_bg, with flex_bg, or with meta_bg).
> >
> Yes, I agree with this. I wonder if a series to add comments of some
> common rules is good to you. Like the information mentioned above
> that s_group_desc contains primary gdb block of each group.
Well, the meaning of s_group_desc[] was obvious to me, but that's why
it's useful to have someone with "new eyes" take a look at code, since
what may be obvious to old hands might not be obvious to someone
looking at the code for the first time --- and so yes, it's probably
worth documenting. The question is where is the best place to put it,
since the primary place where s_group_desc[] is used is *not* online resize.
s_group_desc[] is initialized in ext4_group_desc_init() in
fs/ext4/super.c, and it is used in fs/ext4/balloc.c, and of course, it
is defined in fs/ext4/ext4.h.
- Ted