Date:   Fri, 10 Aug 2018 09:24:54 -1000
From:   Ross Zwisler <zwisler@...il.com>
To:     dave.jiang@...el.com
Cc:     Eric Sandeen <sandeen@...hat.com>, "Theodore Ts'o" <tytso@....edu>,
        darrick.wong@...cle.com, Jan Kara <jack@...e.cz>,
        linux-nvdimm@...ts.01.org, Dave Chinner <david@...morbit.com>,
        linux-xfs <linux-xfs@...r.kernel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        lczerner@...hat.com, linux-ext4 <linux-ext4@...r.kernel.org>,
        Christoph Hellwig <hch@....de>
Subject: Re: [PATCH v2 2/2] [PATCH] xfs: Close race between direct IO and xfs_break_layouts()

On Fri, Aug 10, 2018 at 9:23 AM Dave Jiang <dave.jiang@...el.com> wrote:
> On 08/10/2018 11:31 AM, Eric Sandeen wrote:
> > On 8/8/18 12:31 PM, Dave Jiang wrote:
> >> This patch duplicates Ross's ext4 fix for XFS.
> >>
> >> If the refcount of a page is lowered between the time that it is returned
> >> by dax_layout_busy_page() and when the refcount is again checked in
> >> xfs_break_layouts() => ___wait_var_event(), the waiting function
> >> xfs_wait_dax_page() will never be called.  This means that
> >> xfs_break_layouts() will still have 'retry' set to false, so we'll stop
> >> looping and never check the refcount of other pages in this inode.
> >>
> >> Instead, always continue looping as long as dax_layout_busy_page() gives us
> >> a page which it found with an elevated refcount.
> >
> > Hi Dave, does this have a testcase?  Have you seen the issue using Ross's
> > xfstest generic/503 or is there some other test?  Apologies if I missed
> > prior discussion on a testcase or race frequency...
>
> I do not have a testcase. I know Ross replicated it on ext4. And Jan
> asked to create the same fix with XFS when he reviewed Ross's fix for ext4.

In my testing I couldn't get this race to hit with XFS.  I couldn't
even get generic/503 to fail on XFS before Dan's initial patches,
which added xfs_break_layouts() et al., went in.  I think that Dan
had to manually insert timing delays to get the warning to hit for
XFS when testing his patches.

The race we're fixing happens consistently with ext4, and through code
inspection we can see that the same race exists in XFS.
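
For anyone following along without the code in front of them, here is
a rough user-space sketch of the loop shape described in the commit
message.  It is not the kernel code or the actual patch: struct page,
find_busy_page(), wait_for_idle() and simulate_race() are hypothetical
stand-ins for the DAX page, dax_layout_busy_page(), the
___wait_var_event() step and the racing refcount drop, and "busy" is
assumed to mean refcount > 1.

```c
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 3

/* Hypothetical stand-in for a DAX page; only the refcount matters here. */
struct page {
	int refcount;
};

struct page pages[NPAGES];

/* Mark every page busy (assumption: refcount > 1 means "still in use"). */
void reset_pages(void)
{
	for (size_t i = 0; i < NPAGES; i++)
		pages[i].refcount = 2;
}

int count_busy(void)
{
	int n = 0;

	for (size_t i = 0; i < NPAGES; i++)
		if (pages[i].refcount > 1)
			n++;
	return n;
}

/* Stand-in for dax_layout_busy_page(): first busy page, or NULL. */
struct page *find_busy_page(void)
{
	for (size_t i = 0; i < NPAGES; i++)
		if (pages[i].refcount > 1)
			return &pages[i];
	return NULL;
}

/*
 * Stand-in for the wait step: the "wait callback" part only runs (and
 * only sets *waited) if the page is still busy when it is re-checked.
 */
void wait_for_idle(struct page *page, bool *waited)
{
	if (page->refcount == 1)
		return;			/* already idle: callback never runs */
	*waited = true;
	page->refcount = 1;		/* pretend the holder dropped its ref */
}

/* Drop the refcount once, between the lookup and the wait. */
void simulate_race(struct page *page, bool *armed)
{
	if (*armed) {
		page->refcount = 1;
		*armed = false;
	}
}

/* Old shape: keep looping only if the wait callback actually ran. */
int break_layouts_old(void)
{
	bool race = true, retry;

	do {
		struct page *page = find_busy_page();

		retry = false;
		if (!page)
			break;
		simulate_race(page, &race);
		wait_for_idle(page, &retry);
	} while (retry);
	return count_busy();		/* busy pages left behind */
}

/* New shape: keep looping as long as the lookup returns a busy page. */
int break_layouts_new(void)
{
	bool race = true, unused = false;
	struct page *page;

	while ((page = find_busy_page()) != NULL) {
		simulate_race(page, &race);
		wait_for_idle(page, &unused);
	}
	return count_busy();
}
```

Calling reset_pages() and then break_layouts_old() returns 2 in this
model: the first page goes idle before the wait, 'retry' stays false,
and the other two pages are never checked.  The same sequence with
break_layouts_new() returns 0.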
