[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOQ4uxjcQWQ9n1rO7=js2SQ8-ZEbX2Wjvq-6ZGCyy5X5CJcTbw@mail.gmail.com>
Date: Fri, 19 Mar 2021 12:35:40 +0200
From: Amir Goldstein <amir73il@...il.com>
To: Miklos Szeredi <miklos@...redi.hu>
Cc: harshad shirwadkar <harshadshirwadkar@...il.com>,
Ext4 <linux-ext4@...r.kernel.org>, Theodore Tso <tytso@....edu>,
overlayfs <linux-unionfs@...r.kernel.org>
Subject: Re: [PATCH] ext4: add rename whiteout support for fast commit
On Fri, Mar 19, 2021 at 10:36 AM Miklos Szeredi <miklos@...redi.hu> wrote:
>
> On Fri, Mar 19, 2021 at 6:52 AM Amir Goldstein <amir73il@...il.com> wrote:
> >
> > [adding overlayfs list]
> >
> > On Fri, Mar 19, 2021 at 3:32 AM harshad shirwadkar
> > <harshadshirwadkar@...il.com> wrote:
> > >
> > > Thanks for the review Amir.
> > >
> > > Sure changing the subject makes sense.
> > >
> > > Also, on further discussions on Ext4 conference call, we also thought
> > > that with this patch, overlayfs customers would not benefit from fast
> > > commits much if they call renames often. So, in order to really make
> > > rename whiteout a fast commit compatible operation, we probably would
> > > need to add support in fast commit to replay a char device creation
> > > event (since whiteout object is a char device). That would imply, we
> > > would need to do careful versioning and would need to burn an on-disk
> > > feature flag.
> > >
> > > An alternative to this would be to have a static whiteout object with
> > > irrelevant nlink count and to have every rename point to that object
> > > instead. Based on how we decide to implement that, at max only the
> > > first rename operation would be fast commit incompatible since that's
> > > when this object would get created. All the further operations would
> > > be fast commit compatible. The big benefit of this approach is that
> > > this way we don't have to add support for char device creation in fast
> > > commit replay code and thus we don't have to worry about versioning.
> > >
> >
> > I'm glad to hear that, Harshad.
> >
> > Please note that creating a static whiteout object on-disk is one possible
> > implementation option. Not creating any object on-disk may be even better.
>
> I don't really get it. What's the advantage of not having an object?
>
> Readdir returning DT_WHT internally might be nice, but I'd be careful
> with exporting that to userspace, since it's likely to cause more
> problems that it solves. And on the stat(2) interface adding S_IFWHT
> or even worse: ENOENT are really out of the question due to backward
> incompatibility with almost every application.
>
> > One other challenge is how to handle users trying to make operations
> > on the upper layer directly (migrating images etc).
> > As long as the tools still observe the whiteout as a chadev (with stat(2))
> > then export and import should work fine (creating a real chardev on import).
>
> Right.
>
> Can't mkfs.ext4 just create the static object? That sounds to me like
> the simplest approach.
>
Sure. I think that was the original intention and it is probably the easier way.
One thing that we will probably need to do is use the RENAME_WHITEOUT
interface as the explicit way to create the shared whiteout instead of using
vfs_whiteout() for filesystems that support RENAME_WHITEOUT
(we check for RENAME_WHITEOUT support anyway).
The only thing that bothered me in moving from per-ovl-instance singleton
to per-ext4-singleton is what happens if someone tries to (say) chown -R
the upper layer or some other offline modification that was working up to
now and seemed to make sense.
Surely, the ext4 singleton whiteout cannot allow modifications like that,
so what do we do about this? Let those scripts fail (if they exist) and
let their owners fix them to skip errors on whiteouts?
Thanks,
Amir.
Powered by blists - more mailing lists