linux-ext4 - Re: [Lsf-pc] [LSF TOPIC] online repair of filesystems: what next?

Message-ID: <CAOQ4uxgV5LiUdJqAGDaGyCUMSM9LG+CBbj0sS8-ZVHw-Q+Y7yg@mail.gmail.com>
Date:   Sun, 16 Apr 2023 11:43:21 +0300
From:   Amir Goldstein <amir73il@...il.com>
To:     Qu Wenruo <quwenruo.btrfs@....com>
Cc:     "Darrick J. Wong" <djwong@...nel.org>,
        lsf-pc@...ts.linux-foundation.org, linux-fsdevel@...r.kernel.org,
        xfs <linux-xfs@...r.kernel.org>,
        linux-ext4 <linux-ext4@...r.kernel.org>,
        linux-btrfs <linux-btrfs@...r.kernel.org>
Subject: Re: [Lsf-pc] [LSF TOPIC] online repair of filesystems: what next?

On Sun, Apr 16, 2023 at 11:11 AM Qu Wenruo <quwenruo.btrfs@....com> wrote:
>
>
>
> On 2023/3/1 04:49, Darrick J. Wong wrote:
> > Hello fsdevel people,
> >
> > Five years ago[0], we started a conversation about cross-filesystem
> > userspace tooling for online fsck.  I think enough time has passed for
> > us to have another one, since a few things have happened since then:
> >
> > 1. ext4 has gained the ability to send corruption reports to a userspace
> >     monitoring program via fsnotify.  Thanks, Collabora!
>
> Not familiar with the new fsnotify thing, any article to start?

https://docs.kernel.org/admin-guide/filesystem-monitoring.html#file-system-error-reporting

A filesystem needs to opt in by calling fsnotify_sb_error(); currently
only ext4 does that.

>
> I really believe we should have a generic interface to report errors.
> Currently btrfs reports extra details only through dmesg (like the
> logical/physical address of the corruption, the reason, the involved
> inodes, etc.), which is far from ideal.
>
> >
> > 2. XFS now tracks successful scrubs and corruptions seen during runtime
> >     and during scrubs.  Userspace can query this information.
> >
> > 3. Directory parent pointers, which enable online repair of the
> >     directory tree, are nearing completion.
> >
> > 4. Dave and I are working on merging online repair of space metadata for
> >     XFS.  Online repair of directory trees is feature complete, but we
> >     still have one or two unresolved questions in the parent pointer
> >     code.
> >
> > 5. I've gotten a bit better[1] at writing systemd service descriptions
> >     for scheduling and performing background online fsck.
> >
> > Now that fsnotify_sb_error exists as a result of (1), I think we
> > should figure out how to plumb calls into the readahead and writeback
> > code so that IO failures can be reported to the fsnotify monitor.  I
> > suspect there may be a few difficulties here since fsnotify (iirc)
> > allocates memory and takes locks.
> >
> > As a result of (2), XFS now retains quite a bit of incore state about
> > its own health.  The structure that fsnotify gives to userspace is very
> > generic (superblock, inode, errno, errno count).  How might XFS export
> > a greater amount of information via this interface?  We can provide
> > details at finer granularity -- for example, a specific data structure
> > under an allocation group or an inode, or specific quota records.
>
> The same for btrfs.
>
> Some btrfs-specific info like the subvolume id is also needed to locate
> the corrupted inode (an inode number is not unique across the whole fs,
> only within one subvolume).
>

The fanotify error event (which btrfs does not currently generate)
contains an "FID record"; an FID is fsid+file_handle.
For btrfs, the file_handle would be of type FILEID_BTRFS_WITHOUT_PARENT,
so it would include the subvol root objectid (i.e. the subvolume id).
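The sketch below shows how a listener might recover (subvolume id, inode number, generation) from such an FID record if btrfs were to generate FAN_FS_ERROR. It rests on several assumptions. First, it assumes the packed `struct btrfs_fid` layout from fs/btrfs/export.h: objectid, root_objectid and gen, in CPU byte order. Second, it assumes the value 0x4d for FILEID_BTRFS_WITHOUT_PARENT from include/linux/exportfs.h. Both are kernel-internal definitions, not stable UAPI. The names `decode_btrfs_fid` and `btrfs_inode_ref` are hypothetical.

```c
/*
 * Sketch: decode a btrfs inode reference from an FID info record.
 * Assumes the kernel-internal btrfs_fid layout (fs/btrfs/export.h);
 * this is not a stable UAPI and could change between kernels.
 */
#define _GNU_SOURCE /* struct file_handle in <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/fanotify.h>

/* From include/linux/exportfs.h (not exported to userspace headers). */
#define FILEID_BTRFS_WITHOUT_PARENT 0x4d

/* Leading fields of the packed struct btrfs_fid, in CPU byte order. */
struct btrfs_inode_ref {
	uint64_t ino;    /* objectid: unique only within one subvolume */
	uint64_t subvol; /* root_objectid: the subvolume (fs tree) id */
	uint32_t gen;    /* inode generation */
};

/* rec/reclen: one FAN_EVENT_INFO_TYPE_FID record, header included.
 * Returns 0 and fills *out for a btrfs handle, -1 otherwise. */
int decode_btrfs_fid(const unsigned char *rec, size_t reclen,
		     struct btrfs_inode_ref *out)
{
	/* Record layout: info header, fsid, then struct file_handle. */
	const size_t fh_off = offsetof(struct fanotify_event_info_fid, handle);
	const size_t data_off = fh_off + offsetof(struct file_handle, f_handle);
	const size_t packed_len = 8 + 8 + 4; /* objectid, root_objectid, gen */
	struct file_handle fh;

	assert(rec != NULL && out != NULL);
	if (reclen < data_off)
		return -1;
	memcpy(&fh, rec + fh_off, sizeof(fh));
	if (fh.handle_type != FILEID_BTRFS_WITHOUT_PARENT ||
	    fh.handle_bytes < packed_len || reclen - data_off < fh.handle_bytes)
		return -1;

	/* Copy field by field; the packed handle bytes may be unaligned. */
	const unsigned char *h = rec + data_off;
	memcpy(&out->ino, h, 8);
	memcpy(&out->subvol, h + 8, 8);
	memcpy(&out->gen, h + 16, 4);
	return 0;
}
```

Pairing the subvolume id with the inode number is what makes the reference unique across the whole filesystem. A privileged listener could instead pass the opaque file_handle to open_by_handle_at(2) without interpreting it at all.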

> And something like file paths for the corrupted inode is also very
> helpful for end users to locate (and normally delete) the offending inode.
>

This interface was merged without the ability to report an fs-specific
info blob, but it was designed in a way that would allow adding that blob.

Thanks,
Amir.