linux-kernel - Re: [PATCH] ext4: fix out-of-bounds issue in ext4_xattr_set

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3930aad6-174d-4422-944e-6c90a3ea065a@huawei.com>
Date: Wed, 16 Oct 2024 16:02:40 +0800
From: Baokun Li <libaokun1@...wei.com>
To: Jan Kara <jack@...e.cz>
CC: Qianqiang Liu <qianqiang.liu@....com>, <tytso@....edu>,
	<adilger.kernel@...ger.ca>, syzbot
	<syzbot+f792df426ff0f5ceb8d1@...kaller.appspotmail.com>,
	<linux-ext4@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	<syzkaller-bugs@...glegroups.com>, Yang Erkun <yangerkun@...wei.com>, Baokun
 Li <libaokun1@...wei.com>
Subject: Re: [PATCH] ext4: fix out-of-bounds issue in ext4_xattr_set_entry

On 2024/10/15 0:31, Jan Kara wrote:
> On Fri 11-10-24 10:18:04, Baokun Li wrote:
>> On 2024/10/9 23:50, Jan Kara wrote:
>>>> Or go one step further and add a mechanism like xfs Reverse-Mapping, which
>>>> makes sure that allocated blocks do point to the target inode, which could
>>>> replace the current block_validity, and could also be used in future online
>>>> fscks.
>>> Well, that is a rather big change. It requires significant on-disk format
>>> change and also performance cost when to maintain. Furthermore for xattr
>>> blocks which can be shared by many inodes it is not even clear how to
>>> implement this... So I'm not sure we really want to do this either.
>> Yes, there can be a lot of work involved.
>>
>>   * Perhaps we could create an rmap file to store the rmap tree to avoid
>>     on-disk format changes.
>>   * The performance impact of maintaining rmap really needs to be evaluated,
>>     perhaps by writing a simple DEMO to test it.
>>   * XFS supports shared blocks(A.K.A. reflink.), so even if the physical
>>     blocks are the same, but the inodes are different or the logical blocks
>>     are different, they will be recorded multiple times in the tree. So the
>>     shared xattr block can be handled similarly.
>>
>> We have plans to support online fsck in the future, and implementing rmap
>> is one of the steps. Perhaps one can wait until rmap is implemented to
>> assess whether it is worth a strict check here.
> Yes, we could implement something like this be as you wrote, it's going to
> be a lot of work. We've briefly discussed this with Ted on ext4 call and we
> came to a conclusion that this is a type of corruption ext4 may never
> protect agaist. You simply should not mount arbitrarily corrupted
> filesystems...

Thank you for discussing this with Ted.

This kind of problem is really tricky.

>   But if you want to try, sure go ahead :)
As server clusters get larger and larger, server maintenance becomes very
difficult. Therefore, timely detection of problems (i.e. online scanning,
similar to e2fsck -fn) and timely and non-stop fixing of problems (i.e.
online fsck, similar to e2fsck -a) have always been the requirements of
our customers. Thus online fsck has been on our TODO list, and it's really
time to start doing it. 😀
> One relatively easy solution to similar class of problems would be to store
> the type of metadata buffer inside the buffer_head when we are verifying
> checksum, clear the info when freeing the block in __ext4_forget(), and
> fail with EFSCORRUPTED error when one type -> another type transition would
> happen.
This solution looks great. If we save checksum further, we can sense not
only the change of block type, but also the change of block data.

But now journal_head takes up bh->b_private, so to record information,
we need to add new fields to buffer_head (e.g. b_info), or modify the logic
of journal_head (e.g. make bh->b_private hold ext4_bufdata, and include the
journal_head in ext4_bufdata. ocfs2 needs a similar adaptation).
>> Implementing rmap may take some time, until then we can avoid the problem
>> as much as possible by checking the magic and xattr block csum.
>> Maybe something like this?
>>
>> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
>> index 7647e9f6e190..cd3ae1e3371c 100644
>> --- a/fs/ext4/xattr.c
>> +++ b/fs/ext4/xattr.c
>> @@ -1676,6 +1676,13 @@ static int ext4_xattr_set_entry(struct
>> ext4_xattr_info *i,
>>                  }
>>          }
>>
>> +       if (WARN_ON_ONCE(last < here)) {
>> +               EXT4_ERROR_INODE(inode, "corrupted xattr entries in %s",
>> +                                       is_block ? "block" : "ibody");
>> +               ret = -EFSCORRUPTED;
>> +               goto out;
>> +       }
>> +
>>          /* Check whether we have enough space. */
>>          if (i->value) {
>>                  size_t free;
>> @@ -1923,6 +1930,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode
>> *inode,
>>          }
>>
>>          if (s->base) {
>> +               struct ext4_xattr_header *hdr;
>>                  int offset = (char *)s->here - bs->bh->b_data;
>>
>>                  BUFFER_TRACE(bs->bh, "get_write_access");
>> @@ -1932,6 +1940,16 @@ ext4_xattr_block_set(handle_t *handle, struct inode
>> *inode,
>>                          goto cleanup;
>>
>>                  lock_buffer(bs->bh);
>> +               hdr = header(s->base);
>> +
>> +               if (hdr->h_magic != cpu_to_le32(EXT4_XATTR_MAGIC) ||
>> +                   (ext4_has_metadata_csum(inode->i_sb) &&
>> +                    (ext4_xattr_block_csum(inode, bs->bh->b_blocknr, hdr)
>> !=
>> +                     hdr->h_checksum))) {
>> +                       unlock_buffer(bs->bh);
>> +                       error = -EFSCORRUPTED;
>> +                       goto bad_block;
>> +               }
>>
>>                  if (header(s->base)->h_refcount == cpu_to_le32(1)) {
>>                          __u32 hash = le32_to_cpu(BHDR(bs->bh)->h_hash);
> Hum, there are more places in xattr code that access a buffer that could
> have been modified. So why do you add check into this place? Is it somehow
> special?
>
> 								Honza

The out-of-bounds access occurs here because the last dentry obtained
in ext4_xattr_set_entry() is not the same as the last dentry obtained
in ext4_xattr_block_find().

When we modify an xattr, we always hold the inode lock, so the ibody
case is not considered. Check_xattrs() is called to do a full check
when looking up an xattr, so it's not necessary to consider that either.

When inserting an xattr into an xattr block, there are three cases:

  * xattr block is newly allocated, and the block is allocated only
    after the entry is set in memory.
  * xattr block is unshared, insert the new xattr directly.
  * xattr block is shared, copy the xattr block and insert the new xattr.

Only in the last two cases can a multiply claimed xattr block result in
out-of-bounds access, so the buffer lock is held to verify that the data
is correct.
(It looks like kmemdup should be moved to the buffer lock here as well.🤔)

Thanks again!

Regards,
Baokun