linux-kernel - RE: [PATCH v4 03/10] fs: Introduce ->corrupted

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <OSBPR01MB292031EC271D4AD843389A3FF40E9@OSBPR01MB2920.jpnprd01.prod.outlook.com>
Date:   Thu, 17 Jun 2021 08:12:49 +0000
From:   "ruansy.fnst@...itsu.com" <ruansy.fnst@...itsu.com>
To:     Dan Williams <dan.j.williams@...el.com>
CC:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-xfs <linux-xfs@...r.kernel.org>,
        Linux MM <linux-mm@...ck.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        device-mapper development <dm-devel@...hat.com>,
        "Darrick J. Wong" <darrick.wong@...cle.com>,
        david <david@...morbit.com>, Christoph Hellwig <hch@....de>,
        Alasdair Kergon <agk@...hat.com>,
        Mike Snitzer <snitzer@...hat.com>,
        Goldwyn Rodrigues <rgoldwyn@...e.de>,
        Linux NVDIMM <nvdimm@...ts.linux.dev>
Subject: RE: [PATCH v4 03/10] fs: Introduce ->corrupted_range() for superblock

> -----Original Message-----
> From: Dan Williams <dan.j.williams@...el.com>
> Subject: Re: [PATCH v4 03/10] fs: Introduce ->corrupted_range() for superblock
> 
> On Wed, Jun 16, 2021 at 11:51 PM ruansy.fnst@...itsu.com
> <ruansy.fnst@...itsu.com> wrote:
> >
> > > -----Original Message-----
> > > From: Dan Williams <dan.j.williams@...el.com>
> > > Subject: Re: [PATCH v4 03/10] fs: Introduce ->corrupted_range() for
> > > superblock
> > >
> > > [ drop old linux-nvdimm@...ts.01.org, add nvdimm@...ts.linux.dev ]
> > >
> > > On Thu, Jun 3, 2021 at 6:19 PM Shiyang Ruan <ruansy.fnst@...itsu.com>
> wrote:
> > > >
> > > > Memory failure occurs in fsdax mode will finally be handled in
> > > > filesystem.  We introduce this interface to find out files or
> > > > metadata affected by the corrupted range, and try to recover the
> > > > corrupted data if possiable.
> > > >
> > > > Signed-off-by: Shiyang Ruan <ruansy.fnst@...itsu.com>
> > > > ---
> > > >  include/linux/fs.h | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/include/linux/fs.h b/include/linux/fs.h index
> > > > c3c88fdb9b2a..92af36c4225f 100644
> > > > --- a/include/linux/fs.h
> > > > +++ b/include/linux/fs.h
> > > > @@ -2176,6 +2176,8 @@ struct super_operations {
> > > >                                   struct shrink_control *);
> > > >         long (*free_cached_objects)(struct super_block *,
> > > >                                     struct shrink_control *);
> > > > +       int (*corrupted_range)(struct super_block *sb, struct
> > > > + block_device
> > > *bdev,
> > > > +                              loff_t offset, size_t len, void
> > > > + *data);
> > >
> > > Why does the superblock need a new operation? Wouldn't whatever
> > > function is specified here just be specified to the dax_dev as the
> > > ->notify_failure() holder callback?
> >
> > Because we need to find out which file is effected by the given poison page so
> that memory-failure code can do collect_procs() and kill_procs() jobs.  And it
> needs filesystem to use its rmap feature to search the file from a given offset.
> So, we need this implemented by the specified filesystem and called by
> dax_device's holder.
> >
> > This is the call trace I described in cover letter:
> > memory_failure()
> >  * fsdax case
> >  pgmap->ops->memory_failure()      => pmem_pgmap_memory_failure()
> >   dax_device->holder_ops->corrupted_range() =>
> >                                       - fs_dax_corrupted_range()
> >                                       - md_dax_corrupted_range()
> >    sb->s_ops->currupted_range()    => xfs_fs_corrupted_range()  <==
> **HERE**
> >     xfs_rmap_query_range()
> >      xfs_currupt_helper()
> >       * corrupted on metadata
> >           try to recover data, call xfs_force_shutdown()
> >       * corrupted on file data
> >           try to recover data, call mf_dax_kill_procs()
> >  * normal case
> >  mf_generic_kill_procs()
> >
> > As you can see, this new added operation is an important for the whole
> progress.
> 
> I don't think you need either fs_dax_corrupted_range() nor
> sb->s_ops->corrupted_range(). In fact that fs_dax_corrupted_range()
> looks broken because the filesystem may not even be mounted on the device
> associated with the error. 

If filesystem is not mounted, then there won't be any process using the broken page and no one need to be killed in memory-failure.  So, I think we can just return and handle the error on driver level if needed.

> The holder_data and holder_op should be sufficient
> from communicating the stack of notifications:
> 
> pgmap->notify_memory_failure() => pmem_pgmap_notify_failure()
> pmem_dax_dev->holder_ops->notify_failure(pmem_dax_dev) =>
> md_dax_notify_failure()
> md_dax_dev->holder_ops->notify_failure() => xfs_notify_failure()
> 
> I.e. the entire chain just walks dax_dev holder ops.

Oh, I see.  Just need to implement holder_ops in filesystem or mapped_device directly.  I made the routine complicated.


--
Thanks,
Ruan.