linux-kernel - Re: [PATCH 3/3] overlayfs: Report writeback errors on upper

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201224101215.GA23046@ircssh-2.c.rugged-nimbus-611.internal>
Date:   Thu, 24 Dec 2020 10:12:16 +0000
From:   Sargun Dhillon <sargun@...gun.me>
To:     Amir Goldstein <amir73il@...il.com>
Cc:     Matthew Wilcox <willy@...radead.org>,
        Vivek Goyal <vgoyal@...hat.com>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        overlayfs <linux-unionfs@...r.kernel.org>,
        Jeff Layton <jlayton@...nel.org>,
        Miklos Szeredi <miklos@...redi.hu>, Jan Kara <jack@...e.cz>,
        NeilBrown <neilb@...e.com>, Al Viro <viro@...iv.linux.org.uk>,
        Christoph Hellwig <hch@....de>,
        Chengguang Xu <cgxu519@...ernel.net>
Subject: Re: [PATCH 3/3] overlayfs: Report writeback errors on upper

On Thu, Dec 24, 2020 at 11:32:55AM +0200, Amir Goldstein wrote:
> On Wed, Dec 23, 2020 at 10:44 PM Matthew Wilcox <willy@...radead.org> wrote:
> >
> > On Wed, Dec 23, 2020 at 08:21:41PM +0000, Sargun Dhillon wrote:
> > > On Wed, Dec 23, 2020 at 08:07:46PM +0000, Matthew Wilcox wrote:
> > > > On Wed, Dec 23, 2020 at 07:29:41PM +0000, Sargun Dhillon wrote:
> > > > > On Wed, Dec 23, 2020 at 06:50:44PM +0000, Matthew Wilcox wrote:
> > > > > > On Wed, Dec 23, 2020 at 06:20:27PM +0000, Sargun Dhillon wrote:
> > > > > > > I fail to see why this is neccessary if you incorporate error reporting into the
> > > > > > > sync_fs callback. Why is this separate from that callback? If you pickup Jeff's
> > > > > > > patch that adds the 2nd flag to errseq for "observed", you should be able to
> > > > > > > stash the first errseq seen in the ovl_fs struct, and do the check-and-return
> > > > > > > in there instead instead of adding this new infrastructure.
> > > > > >
> > > > > > You still haven't explained why you want to add the "observed" flag.
> > > > >
> > > > >
> > > > > In the overlayfs model, many users may be using the same filesystem (super block)
> > > > > for their upperdir. Let's say you have something like this:
> > > > >
> > > > > /workdir [Mounted FS]
> > > > > /workdir/upperdir1 [overlayfs upperdir]
> > > > > /workdir/upperdir2 [overlayfs upperdir]
> > > > > /workdir/userscratchspace
> > > > >
> > > > > The user needs to be able to do something like:
> > > > > sync -f ${overlayfs1}/file
> > > > >
> > > > > which in turn will call sync on the the underlying filesystem (the one mounted
> > > > > on /workdir), and can check if the errseq has changed since the overlayfs was
> > > > > mounted, and use that to return an error to the user.
> > > >
> > > > OK, but I don't see why the current scheme doesn't work for this.  If
> > > > (each instance of) overlayfs samples the errseq at mount time and then
> > > > check_and_advances it at sync time, it will see any error that has occurred
> > > > since the mount happened (and possibly also an error which occurred before
> > > > the mount happened, but hadn't been reported to anybody before).
> > > >
> > >
> > > If there is an outstanding error at mount time, and the SEEN flag is unset,
> > > subsequent errors will not increment the counter, until the user calls sync on
> > > the upperdir's filesystem. If overlayfs calls check_and_advance on the upperdir's
> > > super block at any point, it will then set the seen block, and if the user calls
> > > syncfs on the upperdir, it will not return that there is an outstanding error,
> > > since overlayfs just cleared it.
> >
> > Your concern is this case:
> >
> > fs is mounted on /workdir
> > /workdir/A is written to and then closed.
> > writeback happens and -EIO happens, but there's nobody around to care.
> > /workdir/upperdir1 becomes part of an overlayfs mount
> > overlayfs samples the error
> > a user writes to /workdir/B, another -EIO occurs, but nothing happens
> > someone calls syncfs on /workdir/upperdir/A, gets the EIO.
> > a user opens /workdir/B and calls syncfs, but sees no error
> >
> > do i have that right?  or is it something else?
> 
> IMO it is something else. Others may disagree.
> IMO the level of interference between users accessing overlay and users
> accessing upper fs directly is not well defined and it can stay this way.
> 
> Concurrent access to  /workdir/upperdir/A via overlay and underlying fs
> is explicitly warranted against in Documentation/filesystems/overlayfs.rst#
> Changes to underlying filesystems:
> "Changes to the underlying filesystems while part of a mounted overlay
> filesystem are not allowed.  If the underlying filesystem is changed,
> the behavior of the overlay is undefined, though it will not result in
> a crash or deadlock."
> 
> The question is whether syncfs(open(/workdir/B)) is considered
> "Changes to the underlying filesystems". Regardless of the answer,
> this is not an interesting case IMO.
> 
> The real issue is with interference between overlays that share the
> same upper fs, because this is by far and large the common use case
> that is creating real problems for a lot of container users.
> 
> Workloads running inside containers (with overlayfs storage driver)
> will never be as isolated as workloads running inside VMs, but it
> doesn't mean we cannot try to improve.
> 
> In current master, syncfs() on any file by any container user will
> result in full syncfs() of the upperfs, which is very bad for container
> isolation. This has been partly fixed by Chengguang Xu [1] and I expect
> his work will be merged soon. Overlayfs still does not do the writeback
> and syncfs() in overlay still waits for all upper fs writeback to complete,
> but at least syncfs() in overlay only kicks writeback for upper fs files
> dirtied by this overlay.
> 
> [1] https://lore.kernel.org/linux-unionfs/CAJfpegsbb4iTxW8ZyuRFVNc63zg7Ku7vzpSNuzHASYZH-d5wWA@mail.gmail.com/
> 
> Sharing the same SEEN flag among thousands of containers is also
> far from ideal, because effectively this means that any given workload
> in any single container has very little chance of observing the SEEN flag.
> 
> To this end, I do agree with Matthew that overlayfs should sample errseq
> and the best patchset to implement it so far IMO is Jeff's patchset [2].
> This patch set was written to cater only "volatile" overlayfs mount, but
> there is no reason not to use the same mechanism for regular overlay
> mount. The only difference being that "volatile" overlay only checks for
> error since mount on syncfs() (because "volatile" overlay does NOT
> syncfs upper fs) and regular overlay checks and advances the overlay's
> errseq sample on syncfs (and does syncfs upper fs).
> 
> Matthew, I hope that my explanation of the use case and Jeff's answer
> is sufficient to understand why the split of the SEEN flag is needed.
> 
> [2] https://lore.kernel.org/linux-unionfs/20201213132713.66864-1-jlayton@kernel.org/
> 
> w.r.t Vivek's patchset (this one), I do not object to it at all, but it fixes
> a problem that Jeff's patch had already solved with an ugly hack:
> 
>   /* Propagate errors from upper to overlayfs */
>   ret = errseq_check(&upper_sb->s_wb_err, ofs->err_mark);
>   errseq_set(&sb->s_wb_err, ret);
> 
> Since Jeff's patch is minimal, I think that it should be the fix applied
> first and proposed for stable (with adaptations for non-volatile overlay).
> 
> I guess that Vivek's patch 1/3 from this series [3] is also needed to
> complement the work that should go to stable.
> 
> Vivek, Sargun,
> 
> Do you understand my proposal?
Yes. I agree that Jeff's patch should be added to stable. The fact we don't
bubble up writeback errors turns out to be a real problem that I never knew
was happening, but upon investigating, it looks like a real thing.

I think we can use Jeff's hacky approach in stable, as it's far more minimal, 
and has a much lower chance of causing issues, but if we make further 
improvements, we wont be able to backport them to stable as easily. I have
nothing explicitly against Vivek's approach though.

> Do you agree with it as a way forward to address the various syncfs
> issues for volatile/non-volatile that both of you were trying to address?
Yes. I think Vivek's patchset of introducing a new superblock callback is
the best approach.

> 
> Sargun, I know this all discussion has forked from your volatile re-use
> patch set, but let's not confuse fsdevel forks more than we have to.
> The way forward for volatile re-use from this proposal is straight forward.

I think that Vivek's patchset of adding a new callback, plus Jeff's new flag 
solves detection of errors for normal mounts, volatile mounts, and volatile 
remounts.

I think I responded to Jeff's patch and it looked good, bar one of the loops.
My only suggestion is that someone add the intended behaviour here as comment
to super_ops of the new callback.
> 
> Thanks,
> Amir.