Date: Sat, 15 Apr 2023 15:18:05 +0300
From: Amir Goldstein <amir73il@...il.com>
To: "Darrick J. Wong" <djwong@...nel.org>
Cc: lsf-pc@...ts.linux-foundation.org, linux-fsdevel@...r.kernel.org,
xfs <linux-xfs@...r.kernel.org>,
linux-ext4 <linux-ext4@...r.kernel.org>,
linux-btrfs <linux-btrfs@...r.kernel.org>
Subject: Re: [Lsf-pc] [LSF TOPIC] online repair of filesystems: what next?
On Tue, Feb 28, 2023 at 10:49 PM Darrick J. Wong <djwong@...nel.org> wrote:
>
> Hello fsdevel people,
>
> Five years ago[0], we started a conversation about cross-filesystem
> userspace tooling for online fsck. I think enough time has passed for
> us to have another one, since a few things have happened since then:
>
> 1. ext4 has gained the ability to send corruption reports to a userspace
> monitoring program via fsnotify. Thanks, Collabora!
>
> 2. XFS now tracks successful scrubs and corruptions seen during runtime
> and during scrubs. Userspace can query this information.
>
> 3. Directory parent pointers, which enable online repair of the
> directory tree, are nearing completion.
>
> 4. Dave and I are working on merging online repair of space metadata for
> XFS. Online repair of directory trees is feature complete, but we
> still have one or two unresolved questions in the parent pointer
> code.
>
> 5. I've gotten a bit better[1] at writing systemd service descriptions
> for scheduling and performing background online fsck.
>
> Now that fsnotify_sb_error exists as a result of (1), I think we
> should figure out how to plumb calls into the readahead and writeback
> code so that IO failures can be reported to the fsnotify monitor. I
> suspect there may be a few difficulties here since fsnotify (iirc)
> allocates memory and takes locks.
>
> As a result of (2), XFS now retains quite a bit of incore state about
> its own health. The structure that fsnotify gives to userspace is very
> generic (superblock, inode, errno, errno count). How might XFS export
> a greater amount of information via this interface? We can provide
> details at finer granularity -- for example, a specific data structure
> under an allocation group or an inode, or specific quota records.
>
> With (4) on the way, I can envision wanting a system service that would
> watch for these fsnotify events, and transform the error reports into
> targeted repair calls in the kernel. This of course would be very
> filesystem specific, but I would also like to hear from anyone pondering
> other usecases for fsnotify filesystem error monitors.
>
> Once (3) lands, XFS gains the ability to translate a block device IO
> error to an inode number and file offset, and then the inode number to a
> path. In other words, your file breaks and now we can tell applications
> which file it was so they can failover or redownload it or whatever.
> Ric Wheeler mentioned this in 2018's session.
>
> The final topic from that 2018 session concerned generic wrappers for
> fsscrub. I haven't pushed hard on that topic because XFS hasn't had
> much to show for that. Now that I'm better versed in systemd services,
> I envision three ways to interact with online fsck:
>
> - A CLI program that can be run by anyone.
>
> - Background systemd services that fire up periodically.
>
> - A dbus service that programs can bind to and request a fsck.
>
> I still think there's an opportunity to standardize the naming to make
> it easier to use a variety of filesystems. I propose for the CLI:
>
> /usr/sbin/fsscrub $mnt that calls /usr/sbin/fsscrub.$FSTYP $mnt
>
> For systemd services, I propose "fsscrub@<escaped mountpoint>". I
> suspect we want a separate background service that itself runs
> periodically and invokes the fsscrub@...t services. xfsprogs already
> has a xfs_scrub_all service that does that. The services are nifty
> because it's really easy to restrict privileges, implement resource
> usage controls, and use private name/mountspaces to isolate the process
> from the rest of the system.
>
> dbus is a bit trickier, since there's no precedent at all. I guess
> we'd have to define an interface for filesystem "object". Then we could
> write a service that establishes a well-known bus name and maintains
> object paths for each mounted filesystem. Each of those objects would
> export the filesystem interface, and that's how programs would call
> online fsck as a service.
>
> Ok, that's enough for a single session topic. Thoughts? :)
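
For reference, the CLI naming proposed in the quoted message (a generic /usr/sbin/fsscrub that calls /usr/sbin/fsscrub.$FSTYP) could be sketched roughly as below. This is only an illustration under stated assumptions, not an existing tool: the function names are hypothetical, and looking up the filesystem type in /proc/self/mounts is one possible choice that the proposal does not specify.

```python
"""Sketch of a generic fsscrub dispatcher (hypothetical, not an existing tool)."""
import os
import re
import sys


def _unescape(field):
    # /proc/self/mounts encodes spaces, tabs, etc. as octal escapes (e.g. \040).
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def fstype_for_mount(mnt, mounts_text):
    """Return the filesystem type mounted at mnt, or None if not found.

    mounts_text uses the /proc/self/mounts layout:
    device mountpoint fstype options dump pass
    """
    target = os.path.normpath(mnt)
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and _unescape(fields[1]) == target:
            fstype = fields[2]  # a later mount on the same path shadows earlier ones
    return fstype


def helper_path(fstype, sbindir="/usr/sbin"):
    # Assumed naming from the proposal: fsscrub.$FSTYP next to the wrapper.
    return os.path.join(sbindir, "fsscrub." + fstype)


def main(argv):
    if len(argv) != 2:
        print("usage: fsscrub MOUNTPOINT", file=sys.stderr)
        return 2
    mnt = argv[1]
    with open("/proc/self/mounts") as f:
        fstype = fstype_for_mount(mnt, f.read())
    if fstype is None:
        print(f"fsscrub: {mnt} is not a mount point", file=sys.stderr)
        return 1
    helper = helper_path(fstype)
    if not os.access(helper, os.X_OK):
        print(f"fsscrub: no scrub helper for {fstype}", file=sys.stderr)
        return 1
    # Replace this process with the filesystem-specific helper.
    os.execv(helper, [helper, mnt])
```

To use this as a script, one would call sys.exit(main(sys.argv)) at the bottom. Keeping the lookup in a function that takes the mount table as text makes it easy to check without touching the real system.
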
Darrick,

Quick question.

You indicated that you would like to discuss the topics:

- Atomic file contents exchange
- Atomic directio writes

Are those intended to be in a separate session from online fsck,
or both in the same session?

I know you posted patches for FIEXCHANGE_RANGE [1],
but they were hiding inside a huge DELUGE and people
were on New Year's holidays, so nobody commented.

Perhaps you should consider posting an up-to-date
topic suggestion to give people an opportunity to
start a discussion before LSFMM.

Thanks,
Amir.

[1] https://lore.kernel.org/linux-fsdevel/167243843494.699466.5163281976943635014.stgit@magnolia/