Message-ID: <20211220115823.GB20005@quack2.suse.cz>
Date:   Mon, 20 Dec 2021 12:58:23 +0100
From:   Jan Kara <jack@...e.cz>
To:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Cc:     kernel test robot <oliver.sang@...el.com>,
        Jens Axboe <axboe@...nel.dk>,
        syzbot <syzbot+643e4ce4b6ad1347d372@...kaller.appspotmail.com>,
        Christoph Hellwig <hch@...radead.org>, Jan Kara <jack@...e.cz>,
        Christoph Hellwig <hch@....de>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        lkp@...ts.01.org, lkp@...el.com
Subject: Re: [loop] 322c4293ec: xfstests.xfs.049.fail

On Mon 20-12-21 00:45:46, Tetsuo Handa wrote:
> On 2021/12/20 0:09, kernel test robot wrote:
> >     @@ -13,3 +13,5 @@
> >      --- clean
> >      --- umount ext2 on xfs
> >      --- umount xfs
> >     +!!! umount xfs failed
> >     +(see /lkp/benchmarks/xfstests/results//xfs/049.full for details)
> >     ...
> >     (Run 'diff -u /lkp/benchmarks/xfstests/tests/xfs/049.out /lkp/benchmarks/xfstests/results//xfs/049.out.bad'  to see the entire diff)
> 
> Yes, we know this race condition can happen.
> 
> https://lkml.kernel.org/r/16c7d304-60ef-103f-1b2c-8592b48f47c6@i-love.sakura.ne.jp
> https://lkml.kernel.org/r/YaYfu0H2k0PSQL6W@infradead.org
> 
> Should we try to wait for autoclear operation to complete?

So I think we should try to fix this because, as Dave writes in the
changelog for a1ecac3b0656 ("loop: Make explicit loop device destruction
lazy") which started all this, having random EBUSY failures (either from
losetup or umount) is annoying and you need to work around it in lots of
unexpected places.

We cannot easily wait for work completion in the loop device code without
reintroducing the deadlock - the whole of lo_release() is called under
disk->open_mutex, which you also need to grab in __loop_clr_fd(). So to
avoid holding the backing file busy longer than expected, we could use
task_work instead of ordinary work as I suggested - but you were right that
we need to be somewhat careful: in case we are running in a kthread, we
would still need to offload to a normal work (but in that case we don't
care about delaying file release anyway).
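
To make the dispatch concrete, here is a rough userspace model of it. This
is not kernel code: every name below (task_model, defer_clear, model_close
and so on) is made up for illustration. It assumes that the deferred item
of a user task runs after the release path has dropped the mutex but before
the task gets back to userspace, while a kthread hands the item to a worker
that runs at some later point.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only. None of these names are real kernel APIs. */
typedef void (*cleanup_fn)(void *arg);

struct deferred {
	cleanup_fn fn;
	void *arg;
	bool pending;
};

struct task_model {
	bool is_kthread;		/* stands in for current->flags & PF_KTHREAD */
	struct deferred task_work;	/* run before this task returns to userspace */
};

/* Stands in for an ordinary work item run later by a worker thread. */
static struct deferred global_work;

/* Called with the modelled open_mutex held: only queue, never wait. */
static void defer_clear(struct task_model *t, cleanup_fn fn, void *arg)
{
	struct deferred *slot = t->is_kthread ? &global_work : &t->task_work;

	assert(!slot->pending);	/* the model has one slot per path */
	slot->fn = fn;
	slot->arg = arg;
	slot->pending = true;
}

/* Run a queued item once; does nothing if the slot is empty. */
static void run_deferred(struct deferred *d)
{
	if (d->pending) {
		d->pending = false;
		d->fn(d->arg);
	}
}

/* Models the last close() of the loop device by task t. */
static void model_close(struct task_model *t, cleanup_fn fn, void *arg)
{
	/* open_mutex "held" from here */
	defer_clear(t, fn, arg);
	/* open_mutex "dropped" here */

	/* A user task runs its task work before close() returns to it. */
	if (!t->is_kthread)
		run_deferred(&t->task_work);
}

/* Models __loop_clr_fd() dropping the backing file reference. */
static void clear_backing_file(void *arg)
{
	*(bool *)arg = false;
}
```

In this model, a user task that calls model_close() finds the backing file
no longer busy by the time the call returns, so a following umount would
not race with the cleanup. For a kthread the file stays busy until
run_deferred(&global_work) is called, which is the delay we said we don't
care about.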

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR
