Message-ID: <6af394f4-7cfd-5303-0042-9e37e43cf346@yandex-team.ru>
Date: Thu, 9 Apr 2020 15:04:39 +0300
From: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
To: Amir Goldstein <amir73il@...il.com>
Cc: linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Miklos Szeredi <miklos@...redi.hu>,
linux-kernel <linux-kernel@...r.kernel.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
overlayfs <linux-unionfs@...r.kernel.org>,
Dmitry Monakhov <dmtrmonakhov@...dex-team.ru>,
Linux Containers <containers@...ts.linux-foundation.org>,
Theodore Tso <tytso@....edu>
Subject: Re: [PATCH] ovl: skip overlayfs superblocks at global sync
On 09/04/2020 14.48, Amir Goldstein wrote:
> On Thu, Apr 9, 2020 at 2:28 PM Konstantin Khlebnikov
> <khlebnikov@...dex-team.ru> wrote:
>>
>> On 09/04/2020 13.23, Amir Goldstein wrote:
>>> On Thu, Apr 9, 2020 at 11:30 AM Konstantin Khlebnikov
>>> <khlebnikov@...dex-team.ru> wrote:
>>>>
>>>> Stacked filesystems like overlayfs have no writeback of their own, but they
>>>> have to forward syncfs() requests to the backing filesystem to preserve
>>>> data integrity.
>>>>
>>>> During a global sync() each overlayfs instance calls the backing
>>>> filesystem's ->sync_fs() method, even though the backing superblock is
>>>> itself on the global list of superblocks. As a result, a single sync()
>>>> syscall can write the same superblock several times and send multiple
>>>> disk barriers.
>>>>
>>>> This patch adds the flag SB_I_SKIP_SYNC to sb->s_iflags to avoid that.
>>>>
>>>> Reported-by: Dmitry Monakhov <dmtrmonakhov@...dex-team.ru>
>>>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
>>>> ---
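[A minimal sketch of how such a flag could be wired up, based on the current
VFS structures; the actual hunks are not quoted in this message, and the flag
value below is illustrative only.]

/* include/linux/fs.h: new superblock-internal flag (value illustrative) */
#define SB_I_SKIP_SYNC	0x00000100	/* skip this sb during global sync() */

/* fs/overlayfs/super.c: overlayfs marks its own superblock, since it only
 * forwards sync work to the backing filesystem's superblock */
sb->s_iflags |= SB_I_SKIP_SYNC;

/* fs/sync.c: the per-superblock callback used by sync(2) honours the flag */
static void sync_fs_one_sb(struct super_block *sb, void *arg)
{
	if (!(sb->s_iflags & SB_I_SKIP_SYNC) && sb->s_op->sync_fs)
		sb->s_op->sync_fs(sb, *(int *)arg);
}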
>>>
>>> Seems reasonable.
>>> You may add:
>>> Reviewed-by: Amir Goldstein <amir73il@...il.com>
>>>
>>> +CC: containers list
>>
>> Thanks
>>
>>>
>>> This brings up old memories.
>>> I posted this way back to fix handling of emergency_remount() in the
>>> presence of loop mounted fs:
>>> https://lore.kernel.org/linux-ext4/CAA2m6vfatWKS1CQFpaRbii2AXiZFvQUjVvYhGxWTSpz+2rxDyg@mail.gmail.com/
>>>
>>> But it seems to me that emergency_sync() and sync(2) are equally broken
>>> for this use case.
>>>
>>> I wonder if anyone cares enough about the resilience of loop-mounted filesystems
>>> to try changing the iterate_* functions to iterate supers/bdevs in reverse order...
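[For illustration, a reverse walk modeled on fs/super.c:iterate_supers() might
look roughly like the hypothetical function below; the surrounding locking and
refcounting mirrors the forward variant, and details differ between kernel
versions.]

static void iterate_supers_reverse(void (*f)(struct super_block *, void *),
				   void *arg)
{
	struct super_block *sb, *p = NULL;

	spin_lock(&sb_lock);
	/* newest superblocks first, so a loop-mounted fs (registered after
	 * its host fs) would be flushed before the host fs */
	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
		if (hlist_unhashed(&sb->s_instances))
			continue;
		sb->s_count++;
		spin_unlock(&sb_lock);

		down_read(&sb->s_umount);
		if (sb->s_root && (sb->s_flags & SB_BORN))
			f(sb, arg);
		up_read(&sb->s_umount);

		spin_lock(&sb_lock);
		if (p)
			__put_super(p);
		p = sb;
	}
	if (p)
		__put_super(p);
	spin_unlock(&sb_lock);
}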
>>
>> Now I see the reason behind "sync; sync; sync; reboot" =)
>>
>> Iterating in old -> new order makes it possible not to miss newly added
>> items if the list is modified during the walk. That might be important for some users.
>>
>
> That's not the reason I suggested reverse order.
> The reason is that with loop mounted fs, the correct order of flushing is:
> 1. sync loop mounted fs inodes => writes to loop image file
> 2. sync loop mounted fs sb => fsyncs the loop image file
> 3. sync the loop image host fs sb
>
> With forward sb iteration order, #3 happens before #1, so the
> loop mounted fs changes are not really being made durable by
> a single sync(2) call.
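[As a rough userspace illustration of that ordering; all paths here are
hypothetical, and a single sync(2) would need to achieve the same effect that
these explicit flushes do.]

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	int loop_fs = open("/mnt/loopfs", O_RDONLY | O_DIRECTORY); /* loop-mounted fs */
	int image   = open("/srv/fs.img", O_RDWR);                  /* its image file */
	int host_fs = open("/srv", O_RDONLY | O_DIRECTORY);         /* host fs */

	syncfs(loop_fs);	/* steps 1+2: flush loop fs inodes and superblock */
	fsync(image);		/* make the image file itself durable */
	syncfs(host_fs);	/* step 3: flush the host filesystem */

	close(loop_fs);
	close(image);
	close(host_fs);
	return 0;
}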
If the filesystem on the loop device is mounted with barriers, then its
sync_fs() will issue REQ_OP_FLUSH to the loop device, which in turn triggers
an fsync() of the image file (see the sketch below). Sync() might write some
data twice, but it should be safe. Without barriers this scenario is
certainly broken.
Emergency remount R/O is a different matter; that really does need reverse order.
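[For reference, the loop driver maps such a flush request onto an fsync() of
the backing file, roughly as below; paraphrased from drivers/block/loop.c, and
the exact code differs between kernel versions.]

static int lo_req_flush(struct loop_device *lo, struct request *rq)
{
	struct file *file = lo->lo_backing_file;
	int ret = vfs_fsync(file, 0);

	if (unlikely(ret && ret != -EINVAL))
		ret = -EIO;

	return ret;
}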
>
>> bdev iteration already seems to be reversed: inode_sb_list_add() adds to the head
>>
>
> I think the bdev iteration order will not make a difference in this case.
> Flushing /dev/loopX will not be needed, and it happens too late
> anyway.
>
> Thanks,
> Amir.
>