Message-ID: <20171012165707.GJ10593@eguan.usersys.redhat.com>
Date:   Fri, 13 Oct 2017 00:57:07 +0800
From:   Eryu Guan <eguan@...hat.com>
To:     Jan Kara <jack@...e.cz>
Cc:     linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org,
        lczerner@...hat.com
Subject: Re: [v4.14-rc3 bug] scheduling while atomic in generic/451 test on
 extN

On Thu, Oct 12, 2017 at 05:07:40PM +0200, Jan Kara wrote:
> Hi Eryu!
> 
> On Thu 05-10-17 14:07:00, Eryu Guan wrote:
> > I hit a "scheduling while atomic" bug by running fstests generic/451 on
> > extN filesystems in v4.14-rc3 testing. It didn't reproduce for me on
> > every host I tried, but I've seen it multiple times on multiple hosts. A
> > test vm of mine with 4 vcpus and 8G memory reproduced the bug reliably,
> > while a bare metal host with 8 cpus and 8G mem couldn't.
> > 
> > This is due to commit 332391a9935d ("fs: Fix page cache inconsistency
> > when mixing buffered and AIO DIO"), which defers AIO DIO completion
> > to a workqueue if the inode has mapped pages, so that the page cache
> > invalidation is done in process context. I think the problem is that
> > pages can be mapped after the dio->inode->i_mapping->nrpages check,
> > so we end up doing page cache invalidation, which could sleep, in
> > interrupt context; hence the "scheduling while atomic" bug.
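> > 
> > For reference, the completion-side check in dio_bio_end_aio() looks
> > roughly like this (simplified from memory, some details omitted):
> > 
> > 	if (dio->result)
> > 		defer_completion = dio->defer_completion ||
> > 				   (dio->op == REQ_OP_WRITE &&
> > 				    dio->inode->i_mapping->nrpages);
> > 	if (defer_completion) {
> > 		INIT_WORK(&dio->complete_work, dio_aio_complete_work);
> > 		queue_work(dio->inode->i_sb->s_dio_done_wq,
> > 			   &dio->complete_work);
> > 	} else {
> > 		/* irq context: dio_complete() may still invalidate
> > 		 * pages that got mapped after the nrpages check
> > 		 * above, and invalidation can sleep */
> > 		dio_complete(dio, 0, true);
> > 	}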
> > 
> > Deferring all AIO DIO completion to a workqueue unconditionally (as the
> > iomap-based path does) fixed the problem for me. But performance
> > concerns were raised about doing so in the original discussion:
> > 
> > https://www.spinics.net/lists/linux-fsdevel/msg112669.html
> 
> Thanks for the report and the detailed analysis. I think your analysis is
> correct and the nrpages check in dio_bio_end_aio() is racy. My solution to
> this would be to pass to dio_complete() as an argument whether invalidation
> is required or not (and set it to true for deferred completion and to false
> when we decide not to defer completion since nrpages is 0 at that moment).
> Lukas?
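
If I understand the proposal correctly, it'd be something like the
following (hypothetical sketch, not an actual patch; the 'invalidate'
argument name is made up):

	/* pass down whether dio_complete() must invalidate pagecache */
	static ssize_t dio_complete(struct dio *dio, ssize_t ret,
				    bool is_async, bool invalidate)

	/* deferred completion: workqueue, process context */
	dio_complete(dio, 0, true, true);

	/* not deferred: nrpages was 0 at the check, skip invalidation */
	dio_complete(dio, 0, true, false);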

But wouldn't that bring the original bug back, i.e. reading stale data
from the page cache? It's still possible that pages get mapped after the
check, so we'd need to invalidate the caches but wouldn't.
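
E.g. something like this (assuming such a window exists; the exact
timing here is my speculation):

	completion (irq context)         another task
	------------------------         ------------
	nrpages == 0, don't defer,
	latch invalidate = false
	                                 faults in / buffer-reads a page
	                                 that is stale w.r.t. the DIO write
	dio_complete(..., false)
	  -> stale page is never invalidated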

Thanks,
Eryu
