linux-ext4 - Re: [PATCH] ext4: if zeroout fails fall back to splitting the extent node

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <9e47b349-0360-3426-dfa3-cc77f444fac3@huawei.com>
Date:   Wed, 24 Nov 2021 20:11:43 +0800
From:   yangerkun <yangerkun@...wei.com>
To:     Jan Kara <jack@...e.cz>
CC:     Theodore Ts'o <tytso@....edu>,
        Ext4 Developers List <linux-ext4@...r.kernel.org>,
        <yukuai3@...wei.com>
Subject: Re: [PATCH] ext4: if zeroout fails fall back to splitting the extent
 node



On 2021/11/24 18:37, Jan Kara wrote:
> On Wed 24-11-21 17:01:12, yangerkun wrote:
>> On 2021/11/23 17:27, Jan Kara wrote:
>>> Hello,
>>>
>>> On Sun 26-09-21 19:35:01, yangerkun wrote:
>>>> Rethink about this problem. Should we consider other place which call
>>>> ext4_issue_zeroout? Maybe it can trigger the problem too(in theory, not
>>>> really happened)...
>>>>
>>>> How about include follow patch which not only transfer ENOSPC to EIO. But
>>>> also stop to overwrite the error return by ext4_ext_insert_extent in
>>>> ext4_split_extent_at.
>>>>
>>>> Besides, 308c57ccf431 ("ext4: if zeroout fails fall back to splitting the
>>>> extent node") can work together with this patch.
>>>
>>> I've got back to this. The ext4_ext_zeroout() calls in
>>> ext4_split_extent_at() seem to be there as fallback when insertion of a new
>>> extent fails due to ENOSPC / EDQUOT. If even ext4_ext_zeroout(), then I
>>> think returning an error as the code does now is correct and we don't have
>>> much other option. Also we are really running out of disk space so I think
>>> returning ENOSPC is fine. What exact scenario are you afraid of?
>>
>> I am afraid about the EDQUOT from ext4_ext_insert_extent may be overwrite by
>> ext4_ext_zeroout with ENOSPC. And this may lead to dead loop since
>> ext4_writepages will retry once get ENOSPC? Maybe I am wrong...
> 
> OK, so passing back original error instead of the error from
> ext4_ext_zeroout() makes sense. But I don't think doing much more is needed
> - firstly, ENOSPC or EDQUOT should not happen in ext4_split_extent_at()
> called from ext4_writepages() because we should have reserved enough
> space for extent splits when writing data. So hitting that is already

ext4_da_write_begin
   ext4_da_get_block_prep
     ext4_insert_delayed_block
       ext4_da_reserve_space

It seems we will only reserve space for data, no for metadata...


> unexpected. Committing transaction holding blocks that are expected to be
> free is the most likely reason for us seeing ENOSPC and returning EIO in
> that case would be bug. 

Agree. EIO from ext4_ext_zeroout that overwrite the ENOSPC from
ext4_ext_insert_extent seems buggy too. Maybe we should ignore the error
from ext4_ext_zeroout and return the error from ext4_ext_insert_extent
once ext4_ext_zeroout in ext4_split_extent_at got a error. Something
like this:

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 0ecf819bf189..56cc00ee42a1 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3185,6 +3185,7 @@ static int ext4_split_extent_at(handle_t *handle,
         struct ext4_extent *ex2 = NULL;
         unsigned int ee_len, depth;
         int err = 0;
+       int err1;

         BUG_ON((split_flag & (EXT4_EXT_DATA_VALID1 | 
EXT4_EXT_DATA_VALID2)) ==
                (EXT4_EXT_DATA_VALID1 | EXT4_EXT_DATA_VALID2));
@@ -3255,7 +3256,7 @@ static int ext4_split_extent_at(handle_t *handle,
         if (EXT4_EXT_MAY_ZEROOUT & split_flag) {
                 if (split_flag & 
(EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) {
                         if (split_flag & EXT4_EXT_DATA_VALID1) {
-                               err = ext4_ext_zeroout(inode, ex2);
+                               err1 = ext4_ext_zeroout(inode, ex2);
                                 zero_ex.ee_block = ex2->ee_block;
                                 zero_ex.ee_len = cpu_to_le16(
 
ext4_ext_get_actual_len(ex2));
@@ -3270,7 +3271,7 @@ static int ext4_split_extent_at(handle_t *handle,
                                                       ext4_ext_pblock(ex));
                         }
                 } else {
-                       err = ext4_ext_zeroout(inode, &orig_ex);
+                       err1 = ext4_ext_zeroout(inode, &orig_ex);
                         zero_ex.ee_block = orig_ex.ee_block;
                         zero_ex.ee_len = cpu_to_le16(
 
ext4_ext_get_actual_len(&orig_ex));
@@ -3278,7 +3279,7 @@ static int ext4_split_extent_at(handle_t *handle,
                                               ext4_ext_pblock(&orig_ex));
                 }

-               if (!err) {
+               if (!err1) {
                         /* update the extent length and mark as 
initialized */
                         ex->ee_len = cpu_to_le16(ee_len);
                         ext4_ext_try_to_merge(handle, inode, path, ex);



> Secondly, returning EIO instead of ENOSPC is IMO a
> bit confusing for upper layers and makes it harder to analyze where the
> real problem is...
> 
> 								Honza
>