Message-ID: <8f8dc5cf-cfd9-eb90-9f09-ee2dc89de537@huaweicloud.com>
Date: Fri, 18 Aug 2023 16:42:31 +0800
From: Kemeng Shi <shikemeng@...weicloud.com>
To: Theodore Ts'o <tytso@....edu>
Cc: adilger.kernel@...ger.ca, linux-ext4@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 12/13] ext4: remove unnecessary check for avoiding
multiple update_backups in ext4_flex_group_add

On 8/18/2023 1:00 PM, Theodore Ts'o wrote:
> On Fri, Aug 18, 2023 at 11:16:35AM +0800, Kemeng Shi wrote:
>> Ah, I guess here is the thing I missed that makes this confusing:
>> sbi->s_group_desc contains only primary block of each group. i.e.
>> sbi->s_group_desc['i'] is the primary gdb block of group 'i'.
>
> Correct. In fact, when we need to modify a block group descriptor for
> a group, we call ext4_get_group_desc(), and it references
> sbi->s_group_desc to find the appropriate buffer_head for the bg
> descriptor that we want to modify.
>
> I'm not sure "only" is the right adjective to use here, since the
> whole *point* of s_group_desc[] is to keep the buffer_heads for the
> block group descriptor blocks in memory, so we can modify them when we
> allocate or free blocks, inodes, etc. And we only modify the primary
> block group descriptors.
>
> The secondary, or backup block group descriptors are only used by
> e2fsck when the primary block group descriptor has been overwritten,
> so we can find the inode table, allocation bitmaps, and so on. So we
> do *not* modify them in the course of normal operations, and that's by
> design. The only time the kernel will update those block group
> descriptors is when we are doing an online resize, and we need to make
> sure the backup descriptors are updated, so that if the primary
> descriptors get completely smashed, we can still get critical
> information such as the location of that block group's inode table.
>
>> From add_new_gdb and add_new_gdb_meta_bg, we can find that we always
>> add primary gdb block of new group to s_group_desc. To be more specific:
>> add_new_gdb
>>     gdblock = EXT4_SB(sb)->s_sbh->b_blocknr + 1 + gdb_num;
>>     gdb_bh = ext4_sb_bread(sb, gdblock, 0);
>>     n_group_desc[gdb_num] = gdb_bh;
>>
>> add_new_gdb_meta_bg
>>     gdblock = ext4_meta_bg_first_block_no(sb, group) +
>>               ext4_bg_has_super(sb, group);
>>     gdb_bh = ext4_sb_bread(sb, gdblock, 0);
>>     n_group_desc[gdb_num] = gdb_bh;
>
> Put another way, there are EXT4_DESC_PER_BLOCK(sb) bg descriptors in a
> block. For a file system with the 64-bit feature enabled, the size of
> the block group descriptor is 128. If the block size is 4096, then we
> can fit 32 block group descriptors in a block.
>
> When we add a new block group such that its block group descriptor
> will spill into a new block, then we need to expand the s_group_desc[]
> array, and initialize the new block group descriptor block. And
> that's the job of add_new_gdb() and add_new_gdb_meta_bg().
>
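To make the arithmetic quoted above concrete, here is a minimal userspace sketch. It is not the kernel code: the helper names (desc_per_block, gdb_index, gdb_offset, needs_new_gdb) are hypothetical, and the kernel itself uses EXT4_DESC_PER_BLOCK(sb). It only shows how a group number maps to an index in s_group_desc[] and when a new descriptor block would be needed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the arithmetic discussed above.
 * Number of block group descriptors that fit in one block.
 */
static uint32_t desc_per_block(uint32_t block_size, uint32_t desc_size)
{
	assert(desc_size != 0 && block_size >= desc_size);
	return block_size / desc_size;
}

/* Index into s_group_desc[] of the gdb block holding this group's descriptor. */
static uint32_t gdb_index(uint32_t group, uint32_t per_block)
{
	return group / per_block;
}

/* Slot of this group's descriptor inside that gdb block. */
static uint32_t gdb_offset(uint32_t group, uint32_t per_block)
{
	return group % per_block;
}

/*
 * A new gdb block is needed when the new group's descriptor would be the
 * first one in its block, i.e. it spills past the previous block.
 */
static int needs_new_gdb(uint32_t group, uint32_t per_block)
{
	return gdb_offset(group, per_block) == 0;
}
```

With a 4096-byte block and 128-byte descriptors this gives 32 descriptors per block, so groups 0..31 share s_group_desc[0] and group 32 is the first one that needs s_group_desc[1].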
Hi Ted, thanks for the explanation. Here is more of what I found:
descriptor_loc shows the layout of the gdb blocks in s_group_desc[]:
the block of s_group_desc[i] is superblock + i + 1 for non-meta_bg and
'first block of meta_bg' + has_super for meta_bg.
Although descriptor_loc is only called to initialize s_group_desc[], the
gdb blocks that add_new_gdb appends to s_group_desc[] obey the same
layout.
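The layout rule above can be written down as a small userspace model. This is only a sketch under stated assumptions, not the kernel's descriptor_loc: struct fs_model and all helper names are hypothetical, has_super is modelled with the sparse_super rule (group 0, group 1, and powers of 3, 5 and 7), and the special case for 1k block sizes is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the superblock fields involved. */
struct fs_model {
	uint64_t sb_block;         /* block holding the primary superblock */
	uint64_t first_data_block; /* first block of group 0 */
	uint64_t blocks_per_group;
	uint32_t desc_per_block;   /* descriptors per gdb block */
	uint32_t first_meta_bg;    /* first gdb index laid out as meta_bg */
	int      meta_bg;          /* nonzero if meta_bg is enabled */
};

/* Returns 1 if n == base^k for some k >= 0, otherwise 0. */
static int is_power_of(uint32_t n, uint32_t base)
{
	while (n > 1 && n % base == 0)
		n /= base;
	return n == 1;
}

/* Assumption: sparse_super rule for which groups carry a superblock copy. */
static int group_has_super(uint32_t group)
{
	if (group == 0)
		return 1;
	return is_power_of(group, 3) || is_power_of(group, 5) ||
	       is_power_of(group, 7);
}

static uint64_t group_first_block(const struct fs_model *fs, uint32_t group)
{
	return fs->first_data_block + (uint64_t)group * fs->blocks_per_group;
}

/* Block number expected in s_group_desc[gdb_num] under the layout above. */
static uint64_t primary_gdb_block(const struct fs_model *fs, uint32_t gdb_num)
{
	uint32_t group;

	assert(fs->desc_per_block != 0);

	/* non-meta_bg: descriptor blocks directly follow the superblock */
	if (!fs->meta_bg || gdb_num < fs->first_meta_bg)
		return fs->sb_block + 1 + gdb_num;

	/* meta_bg: first block of the meta group, after its superblock copy */
	group = gdb_num * fs->desc_per_block;
	return group_first_block(fs, group) + group_has_super(group);
}
```

In this model two different values of gdb_num never map to the same block, which is the property the equality check was guarding against.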
Back to the original purpose of this patch, which is to remove the
unnecessary equality check of s_group_desc[gdb_num - 1].b_blocknr and
s_group_desc[gdb_num].b_blocknr: from the layout above we can see that
each s_group_desc[] entry has its own block, so the check should be
unnecessary.
But I won't insist on this if you still have concerns about it.
>>> More generally, this whole patch series is making it clear that the
>>> online resize code is hard to understand, because it's super
>>> complicated, leading to potential bugs, and very clearly code which is
>>> very hard to maintain. So this may mean we need better comments to
>>> make it clear *when* the backup block groups are going to be
>>> accomplished, given the various different cases (e.g., no flex_bg or
>>> meta_bg, with flex_bg, or with meta_bg).
>>>
>> Yes, I agree with this. I wonder if a series to add comments of some
>> common rules is good to you. Like the information mentioned above
>> that s_group_desc contains primary gdb block of each group.
>
> Well, the meaning of s_group_desc[] was obvious to me, but that's why
> it's useful to have someone with "new eyes" take a look at code, since
> what may be obvious to old hands might not be obvious to someone
> looking at the code for the first time --- and so yes, it's probably
Yes, this is just for anyone starting to read this code.
> worth documenting. The question is where is the best place to put it,
> since the primary place where s_group_desc[] is used is *not* online resize.
>
> s_group_desc[] is initialized in ext4_group_desc_init() in
> fs/ext4/super.c, and it is used in fs/ext4/balloc.c, and of course, it
> is defined in fs/ext4/ext4.h.
I plan to add a comment in fs/ext4/ext4.h as follows:
struct ext4_sb_info {
	...
	struct buffer_head * __rcu *s_group_desc; /* Primary gdb blocks of online groups */
But I'm not sure it's appropriate now, as you mentioned that
s_group_desc[] exists to keep the buffer_heads for the block group
descriptor blocks in memory, and that it holds the primary gdb blocks
only because the kernel modifies only the primary blocks.
Besides, I plan to go through the resize code again in the future and
add some comments to make it easier for anyone starting to read it
and easier to maintain. Please let me know if you dislike it.
>
> - Ted
>