linux-kernel - Re: [loop] 322c4293ec: xfstests.xfs.049.fail


Message-ID: <20211220115823.GB20005@quack2.suse.cz>
Date:   Mon, 20 Dec 2021 12:58:23 +0100
From:   Jan Kara <jack@...e.cz>
To:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Cc:     kernel test robot <oliver.sang@...el.com>,
        Jens Axboe <axboe@...nel.dk>,
        syzbot <syzbot+643e4ce4b6ad1347d372@...kaller.appspotmail.com>,
        Christoph Hellwig <hch@...radead.org>, Jan Kara <jack@...e.cz>,
        Christoph Hellwig <hch@....de>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        lkp@...ts.01.org, lkp@...el.com
Subject: Re: [loop] 322c4293ec: xfstests.xfs.049.fail

On Mon 20-12-21 00:45:46, Tetsuo Handa wrote:
> On 2021/12/20 0:09, kernel test robot wrote:
> >     @@ -13,3 +13,5 @@
> >      --- clean
> >      --- umount ext2 on xfs
> >      --- umount xfs
> >     +!!! umount xfs failed
> >     +(see /lkp/benchmarks/xfstests/results//xfs/049.full for details)
> >     ...
> >     (Run 'diff -u /lkp/benchmarks/xfstests/tests/xfs/049.out /lkp/benchmarks/xfstests/results//xfs/049.out.bad'  to see the entire diff)
> 
> Yes, we know this race condition can happen.
> 
> https://lkml.kernel.org/r/16c7d304-60ef-103f-1b2c-8592b48f47c6@i-love.sakura.ne.jp
> https://lkml.kernel.org/r/YaYfu0H2k0PSQL6W@infradead.org
> 
> Should we try to wait for autoclear operation to complete?

So I think we should try to fix this because, as Dave writes in the
changelog for a1ecac3b0656 ("loop: Make explicit loop device destruction
lazy") which started all this, having random EBUSY failures (either from
losetup or umount) is annoying and you need to work around it in lots of
unexpected places.

We cannot easily wait for work completion in the loop device code without
reintroducing the deadlock - the whole lo_release() is called under
disk->open_mutex, which you also need to grab in __loop_clr_fd(). So to
avoid holding the backing file busy longer than expected, we could use
task_work instead of ordinary work as I suggested - but you were right that
we need to be somewhat careful: in case we are running in a kthread, we
would still need to offload to a normal work (but in that case we don't
care about delaying file release anyway).
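
To illustrate the dispatch I have in mind, here is a small userspace model
in C. It is only a sketch and not kernel code: every name in it
(model_task, model_loop, model_lo_release() and so on) is made up for
illustration, the one-slot "queues" stand in for the real task_work list
and workqueue, and is_kthread stands in for a check of the kthread flag on
the current task. It shows only the ordering: the clear operation never
runs inside the release path itself, it runs before a normal task gets
back to userspace, and it runs at some later point for a kthread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; none of these names are kernel APIs. */

enum defer_kind { DEFER_TASK_WORK, DEFER_WORKQUEUE };

struct model_task {
	bool is_kthread;          /* stand-in for "current is a kthread" */
	void (*pending)(void *);  /* one-slot stand-in for the task_work list */
	void *pending_arg;
};

struct model_loop {
	bool backing_file_busy;   /* true while the backing file is still held */
};

/* One-slot stand-in for an ordinary workqueue. */
static void (*wq_fn)(void *);
static void *wq_arg;

/* Stand-in for the clear operation that drops the backing file. */
static void model_clr_fd(void *arg)
{
	struct model_loop *lo = arg;

	lo->backing_file_busy = false;
}

/*
 * Stand-in for the release path. It is assumed to run with the open mutex
 * held, so it must not run model_clr_fd() directly; it only picks where
 * the clear operation is deferred to.
 */
enum defer_kind model_lo_release(struct model_task *t, struct model_loop *lo)
{
	if (t->is_kthread) {
		/* A kthread never returns to userspace: use normal work. */
		assert(wq_fn == NULL);
		wq_fn = model_clr_fd;
		wq_arg = lo;
		return DEFER_WORKQUEUE;
	}
	/* Normal task: run the clear before it returns to userspace. */
	assert(t->pending == NULL);
	t->pending = model_clr_fd;
	t->pending_arg = lo;
	return DEFER_TASK_WORK;
}

/* Models the point where a task is about to return to userspace. */
void model_return_to_user(struct model_task *t)
{
	if (t->pending) {
		void (*fn)(void *) = t->pending;

		t->pending = NULL;
		fn(t->pending_arg);
	}
}

/* Models the workqueue getting around to its pending item, some time later. */
void model_run_workqueue(void)
{
	if (wq_fn) {
		void (*fn)(void *) = wq_fn;

		wq_fn = NULL;
		fn(wq_arg);
	}
}
```

In this model, a normal task that calls model_lo_release() and then
model_return_to_user() sees backing_file_busy already cleared, which is the
property that would let a following umount succeed. For a kthread the file
stays busy until model_run_workqueue() runs.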

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR