[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3930aad6-174d-4422-944e-6c90a3ea065a@huawei.com>
Date: Wed, 16 Oct 2024 16:02:40 +0800
From: Baokun Li <libaokun1@...wei.com>
To: Jan Kara <jack@...e.cz>
CC: Qianqiang Liu <qianqiang.liu@....com>, <tytso@....edu>,
<adilger.kernel@...ger.ca>, syzbot
<syzbot+f792df426ff0f5ceb8d1@...kaller.appspotmail.com>,
<linux-ext4@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<syzkaller-bugs@...glegroups.com>, Yang Erkun <yangerkun@...wei.com>, Baokun
Li <libaokun1@...wei.com>
Subject: Re: [PATCH] ext4: fix out-of-bounds issue in ext4_xattr_set_entry
On 2024/10/15 0:31, Jan Kara wrote:
> On Fri 11-10-24 10:18:04, Baokun Li wrote:
>> On 2024/10/9 23:50, Jan Kara wrote:
>>>> Or go one step further and add a mechanism like xfs Reverse-Mapping, which
>>>> makes sure that allocated blocks do point to the target inode, which could
>>>> replace the current block_validity, and could also be used in future online
>>>> fscks.
>>> Well, that is a rather big change. It requires significant on-disk format
>>> change and also performance cost when to maintain. Furthermore for xattr
>>> blocks which can be shared by many inodes it is not even clear how to
>>> implement this... So I'm not sure we really want to do this either.
>> Yes, there can be a lot of work involved.
>>
>> * Perhaps we could create an rmap file to store the rmap tree to avoid
>> on-disk format changes.
>> * The performance impact of maintaining rmap really needs to be evaluated,
>> perhaps by writing a simple DEMO to test it.
>> * XFS supports shared blocks(A.K.A. reflink.), so even if the physical
>> blocks are the same, but the inodes are different or the logical blocks
>> are different, they will be recorded multiple times in the tree. So the
>> shared xattr block can be handled similarly.
>>
>> We have plans to support online fsck in the future, and implementing rmap
>> is one of the steps. Perhaps one can wait until rmap is implemented to
>> assess whether it is worth a strict check here.
> Yes, we could implement something like this be as you wrote, it's going to
> be a lot of work. We've briefly discussed this with Ted on ext4 call and we
> came to a conclusion that this is a type of corruption ext4 may never
> protect agaist. You simply should not mount arbitrarily corrupted
> filesystems...
Thank you for discussing this with Ted.
This kind of problem is really tricky.
> But if you want to try, sure go ahead :)
As server clusters get larger and larger, server maintenance becomes very
difficult. Therefore, timely detection of problems (i.e. online scanning,
similar to e2fsck -fn) and timely and non-stop fixing of problems (i.e.
online fsck, similar to e2fsck -a) have always been the requirements of
our customers. Thus online fsck has been on our TODO list, and it's really
time to start doing it. 😀
> One relatively easy solution to similar class of problems would be to store
> the type of metadata buffer inside the buffer_head when we are verifying
> checksum, clear the info when freeing the block in __ext4_forget(), and
> fail with EFSCORRUPTED error when one type -> another type transition would
> happen.
This solution looks great. If we save checksum further, we can sense not
only the change of block type, but also the change of block data.
But now journal_head takes up bh->b_private, so to record information,
we need to add new fields to buffer_head (e.g. b_info), or modify the logic
of journal_head (e.g. make bh->b_private hold ext4_bufdata, and include the
journal_head in ext4_bufdata. ocfs2 needs a similar adaptation).
>> Implementing rmap may take some time, until then we can avoid the problem
>> as much as possible by checking the magic and xattr block csum.
>> Maybe something like this?
>>
>> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
>> index 7647e9f6e190..cd3ae1e3371c 100644
>> --- a/fs/ext4/xattr.c
>> +++ b/fs/ext4/xattr.c
>> @@ -1676,6 +1676,13 @@ static int ext4_xattr_set_entry(struct
>> ext4_xattr_info *i,
>> }
>> }
>>
>> + if (WARN_ON_ONCE(last < here)) {
>> + EXT4_ERROR_INODE(inode, "corrupted xattr entries in %s",
>> + is_block ? "block" : "ibody");
>> + ret = -EFSCORRUPTED;
>> + goto out;
>> + }
>> +
>> /* Check whether we have enough space. */
>> if (i->value) {
>> size_t free;
>> @@ -1923,6 +1930,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode
>> *inode,
>> }
>>
>> if (s->base) {
>> + struct ext4_xattr_header *hdr;
>> int offset = (char *)s->here - bs->bh->b_data;
>>
>> BUFFER_TRACE(bs->bh, "get_write_access");
>> @@ -1932,6 +1940,16 @@ ext4_xattr_block_set(handle_t *handle, struct inode
>> *inode,
>> goto cleanup;
>>
>> lock_buffer(bs->bh);
>> + hdr = header(s->base);
>> +
>> + if (hdr->h_magic != cpu_to_le32(EXT4_XATTR_MAGIC) ||
>> + (ext4_has_metadata_csum(inode->i_sb) &&
>> + (ext4_xattr_block_csum(inode, bs->bh->b_blocknr, hdr)
>> !=
>> + hdr->h_checksum))) {
>> + unlock_buffer(bs->bh);
>> + error = -EFSCORRUPTED;
>> + goto bad_block;
>> + }
>>
>> if (header(s->base)->h_refcount == cpu_to_le32(1)) {
>> __u32 hash = le32_to_cpu(BHDR(bs->bh)->h_hash);
> Hum, there are more places in xattr code that access a buffer that could
> have been modified. So why do you add check into this place? Is it somehow
> special?
>
> Honza
The out-of-bounds access occurs here because the last dentry obtained
in ext4_xattr_set_entry() is not the same as the last dentry obtained
in ext4_xattr_block_find().
When we modify an xattr, we always hold the inode lock, so the ibody
case is not considered. Check_xattrs() is called to do a full check
when looking up an xattr, so it's not necessary to consider that either.
When inserting an xattr into an xattr block, there are three cases:
* xattr block is newly allocated, and the block is allocated only
after the entry is set in memory.
* xattr block is unshared, insert the new xattr directly.
* xattr block is shared, copy the xattr block and insert the new xattr.
Only in the last two cases can a multiply claimed xattr block result in
out-of-bounds access, so the buffer lock is held to verify that the data
is correct.
(It looks like kmemdup should be moved to the buffer lock here as well.🤔)
Thanks again!
Regards,
Baokun
Powered by blists - more mailing lists