lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y4jonnIvY2F/dD9y@magnolia>
Date:   Thu, 1 Dec 2022 09:47:10 -0800
From:   "Darrick J. Wong" <djwong@...nel.org>
To:     Shiyang Ruan <ruansy.fnst@...itsu.com>
Cc:     Dan Williams <dan.j.williams@...el.com>,
        linux-kernel@...r.kernel.org, linux-xfs@...r.kernel.org,
        nvdimm@...ts.linux.dev, linux-fsdevel@...r.kernel.org,
        david@...morbit.com, akpm@...ux-foundation.org
Subject: Re: [PATCH 0/2] fsdax,xfs: fix warning messages

On Thu, Dec 01, 2022 at 11:39:12PM +0800, Shiyang Ruan wrote:
> 
> 
> 在 2022/12/1 5:08, Darrick J. Wong 写道:
> > On Tue, Nov 29, 2022 at 11:05:30PM -0800, Dan Williams wrote:
> > > Darrick J. Wong wrote:
> > > > On Tue, Nov 29, 2022 at 07:59:14PM -0800, Dan Williams wrote:
> > > > > [ add Andrew ]
> > > > > 
> > > > > Shiyang Ruan wrote:
> > > > > > Many testcases failed in dax+reflink mode with warning message in dmesg.
> > > > > > This also effects dax+noreflink mode if we run the test after a
> > > > > > dax+reflink test.  So, the most urgent thing is solving the warning
> > > > > > messages.
> > > > > > 
> > > > > > Patch 1 fixes some mistakes and adds handling of CoW cases not
> > > > > > previously considered (srcmap is HOLE or UNWRITTEN).
> > > > > > Patch 2 adds the implementation of unshare for fsdax.
> > > > > > 
> > > > > > With these fixes, most warning messages in dax_associate_entry() are
> > > > > > gone.  But honestly, generic/388 will randomly failed with the warning.
> > > > > > The case shutdown the xfs when fsstress is running, and do it for many
> > > > > > times.  I think the reason is that dax pages in use are not able to be
> > > > > > invalidated in time when fs is shutdown.  The next time dax page to be
> > > > > > associated, it still remains the mapping value set last time.  I'll keep
> > > > > > on solving it.
> > > > > > 
> > > > > > The warning message in dax_writeback_one() can also be fixed because of
> > > > > > the dax unshare.
> > > > > 
> > > > > Thank you for digging in on this, I had been pinned down on CXL tasks
> > > > > and worried that we would need to mark FS_DAX broken for a cycle, so
> > > > > this is timely.
> > > > > 
> > > > > My only concern is that these patches look to have significant collisions with
> > > > > the fsdax page reference counting reworks pending in linux-next. Although,
> > > > > those are still sitting in mm-unstable:
> > > > > 
> > > > > http://lore.kernel.org/r/20221108162059.2ee440d5244657c4f16bdca0@linux-foundation.org
> > > > > 
> > > > > My preference would be to move ahead with both in which case I can help
> > > > > rebase these fixes on top. In that scenario everything would go through
> > > > > Andrew.
> > > > > 
> > > > > However, if we are getting too late in the cycle for that path I think
> > > > > these dax-fixes take precedence, and one more cycle to let the page
> > > > > reference count reworks sit is ok.
> > > > 
> > > > Well now that raises some interesting questions -- dax and reflink are
> > > > totally broken on 6.1.  I was thinking about cramming them into 6.2 as a
> > > > data corruption fix on the grounds that is not an acceptable state of
> > > > affairs.
> > > 
> > > I agree it's not an acceptable state of affairs, but for 6.1 the answer
> > > may be to just revert to dax+reflink being forbidden again. The fact
> > > that no end user has noticed is probably a good sign that we can disable
> > > that without any one screaming. That may be the easy answer for 6.2 as
> > > well given how late this all is.
> > > 
> > > > OTOH we're past -rc7, which is **really late** to be changing core code.
> > > > Then again, there aren't so many fsdax users and nobody's complained
> > > > about 6.0/6.1 being busted, so perhaps the risk of regression isn't so
> > > > bad?  Then again, that could be a sign that this could wait, if you and
> > > > Andrew are really eager to merge the reworks.
> > > 
> > > The page reference counting has also been languishing for a long time. A
> > > 6.2 merge would be nice, it relieves maintenance burden, but they do not
> > > start to have real end user implications until CXL memory hotplug
> > > platforms arrive and the warts in the reference counting start to show
> > > real problems in production.
> > 
> > Hm.  How bad *would* it be to rebase that patchset atop this one?
> > 
> > After overnight testing on -rc7 it looks like Ruan's patchset fixes all
> > the problems AFAICT.  Most of the remaining regressions are to mask off
> > fragmentation testing because fsdax cow (like the directio write paths)
> > doesn't make much use of extent size hints.
> > 
> > > > Just looking at the stuff that's still broken with dax+reflink -- I
> > > > noticed that xfs/550-552 (aka the dax poison tests) are still regressing
> > > > on reflink filesystems.
> > > 
> > > That's worrying because the whole point of reworking dax, xfs, and
> > > mm/memory-failure all at once was to handle the collision of poison and
> > > reflink'd dax files.
> > 
> > I just tried out -rc7 and all three pass, so disregard this please.
> > 
> > > > So, uh, what would this patchset need to change if the "fsdax page
> > > > reference counting reworks" were applied?  Would it be changing the page
> > > > refcount instead of stashing that in page->index?
> > > 
> > > Nah, it's things like switching from pages to folios and shifting how
> > > dax goes from pfns to pages.
> > > 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-unstable&id=cca48ba3196
> > > 
> > > Ideally fsdax would never deal in pfns at all and do everything in terms
> > > of offsets relative to a 'struct dax_device'.
> > > 
> > > My gut is saying these patches, the refcount reworks, and the
> > > dax+reflink fixes, are important but not end user critical. One more
> > > status quo release does not hurt, and we can circle back to get this all
> > > straightened early in v6.3.
> > 
> > Being a data corruption fix, I don't see why we shouldn't revisit this
> > during the 6.2 cycle, even if it comes after merging the refcounting
> > stuff.
> > 
> > Question for Ruan: Would it be terribly difficult to push out a v2 with
> > the review comments applied so that we have something we can backport to
> > 6.1; and then rebase the series atop 6.2-rc1 so we can apply it to
> > upstream (and then apply the 6.1 version to the LTS)?  Or is this too
> > convoluted...?
> 
> It's fine to me.  V2 has been posted just now.  The big patch has been
> separated.

Ok, thank you.

Since akpm/Dan aren't moving forward with the page refcounting changes
for 6.2, I think I'll try to merge these fixes for 6.2-rc1 without so
much rebasing. :)

--D

> 
> --
> Thanks,
> Ruan.
> 
> > 
> > > I.e. just revert:
> > > 
> > > 35fcd75af3ed xfs: fail dax mount if reflink is enabled on a partition
> > > 
> > > ...for v6.1-rc8 and get back to this early in the New Year.
> > 
> > Hm.  Tempting.
> > 
> > --D
> > 
> > > > 
> > > > --D
> > > > 
> > > > > > Shiyang Ruan (2):
> > > > > >    fsdax,xfs: fix warning messages at dax_[dis]associate_entry()
> > > > > >    fsdax,xfs: port unshare to fsdax
> > > > > > 
> > > > > >   fs/dax.c             | 166 ++++++++++++++++++++++++++++++-------------
> > > > > >   fs/xfs/xfs_iomap.c   |   6 +-
> > > > > >   fs/xfs/xfs_reflink.c |   8 ++-
> > > > > >   include/linux/dax.h  |   2 +
> > > > > >   4 files changed, 129 insertions(+), 53 deletions(-)
> > > > > > 
> > > > > > -- 
> > > > > > 2.38.1
> > > 
> > > 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ