[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOQ4uxhxvFt3_Wb3BGcjj4pGp=OFTBHNPJ4r4eH8245t-+CW+g@mail.gmail.com>
Date: Sat, 16 Dec 2023 13:38:16 +0200
From: Amir Goldstein <amir73il@...il.com>
To: Vinicius Costa Gomes <vinicius.gomes@...el.com>
Cc: hu1.chen@...el.com, miklos@...redi.hu, malini.bhandaru@...el.com,
tim.c.chen@...el.com, mikko.ylinen@...el.com, lizhen.you@...el.com,
linux-unionfs@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-fsdevel <linux-fsdevel@...r.kernel.org>, Christian Brauner <brauner@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>, David Howells <dhowells@...hat.com>
Subject: Re: [RFC] HACK: overlayfs: Optimize overlay/restore creds
On Sat, Dec 16, 2023 at 12:16 PM Amir Goldstein <amir73il@...il.com> wrote:
>
> On Fri, Dec 15, 2023 at 10:00 PM Vinicius Costa Gomes
> <vinicius.gomes@...el.com> wrote:
> >
> > Hi Amir,
> >
> > Amir Goldstein <amir73il@...il.com> writes:
> >
> > > +fsdevel because this may be relevant to any subsystem that
> > > keeps a long live cred copy (e.g. nfsd, ksmbd, cachefiles).
> > >
> > > +linus who wrote
> > > d7852fbd0f04 access: avoid the RCU grace period for the temporary
> > > subjective credentials
> > >
> > >
> > > On Fri, Dec 15, 2023 at 12:02 AM Vinicius Costa Gomes
> > > <vinicius.gomes@...el.com> wrote:
> > >>
> > >> Permission checks in overlayfs also check against the credentials
> > >> associated with the superblock, which are assigned at mount() time, so
> > >> pretty long lived. So, we can omit the reference counting for this
> > >> case.
> > >
> > > You forgot to mention WHY you are proposing this and to link to the
> > > original report with the first optimization attempt:
> > >
> > > https://lore.kernel.org/linux-unionfs/20231018074553.41333-1-hu1.chen@intel.com/
> > >
> >
> > I thought that the "in-reply-to" would do that, I should have been more
> > explicit on the context. Sorry.
> >
> > >>
> > >> This (very early) proof of concept does two things:
> > >>
> > >> Add a flag "immutable" (TODO: find a better name) to struct cred to
> > >> indicate that it is long lived, and that refcount can be omitted.
> > >>
> > >
> > > This reminds me of the many discussions about Rust abstractions
> > > that are going on right now.
> > > I think an abstraction like this one is called a "borrowed reference".
> > >
> >
> > Yeah, very similar to a borrow in rust.
> >
> > >> Add "guard" helpers, so we can use automatic cleanup to be sure
> > >> override/restore are always paired. (I didn't like that I have
> > >> 'ovl_cred' to be a container for the credentials, but couldn't think
> > >> of other solutions)
> > >>
> > >
> > > I like the guard but see comments below...
> > >
> > >> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@...el.com>
> > >> ---
> > >> Hi Amir,
> > >>
> > >> Just to know if I am more or less on right track.
> > >>
> > >> This is a different attempt, instead of the local copy idea, I am
> > >> using the fact that the credentials associated with the mount() will
> > >> be alive for a long time. I think the result is almost the same. But I
> > >> could be missing something.
> > >>
> > >> TODO:
> > >> - Add asserts.
> > >> - Replace ovl_override_creds()/revert_Creds() by
> > >> ovl_creator_cred()/guard() everywhere.
> > >> - Probably more.
> > >>
> > >>
> > >> fs/overlayfs/inode.c | 7 ++++---
> > >> fs/overlayfs/overlayfs.h | 18 ++++++++++++++++++
> > >> fs/overlayfs/params.c | 4 +++-
> > >> fs/overlayfs/super.c | 10 +++++++---
> > >> fs/overlayfs/util.c | 10 ++++++++++
> > >> include/linux/cred.h | 12 ++++++++++--
> > >> 6 files changed, 52 insertions(+), 9 deletions(-)
> > >>
> > >> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> > >> index c63b31a460be..2c016a3bbe2d 100644
> > >> --- a/fs/overlayfs/inode.c
> > >> +++ b/fs/overlayfs/inode.c
> > >> @@ -290,9 +290,9 @@ int ovl_permission(struct mnt_idmap *idmap,
> > >> struct inode *inode, int mask)
> > >> {
> > >> struct inode *upperinode = ovl_inode_upper(inode);
> > >> + struct ovl_cred ovl_cred;
> > >> struct inode *realinode;
> > >> struct path realpath;
> > >> - const struct cred *old_cred;
> > >
> > > Nit: please don't reorder the variable definitions.
> > >
> >
> > Sorry about that. "bad" habits from the networking side :-)
> >
>
> I fully support all forms and shapes of OCD ;-)
>
> > >> int err;
> > >>
> > >> /* Careful in RCU walk mode */
> > >> @@ -310,7 +310,9 @@ int ovl_permission(struct mnt_idmap *idmap,
> > >> if (err)
> > >> return err;
> > >>
> > >> - old_cred = ovl_override_creds(inode->i_sb);
> > >> + ovl_cred = ovl_creator_cred(inode->i_sb);
> > >> + guard(ovl_creds)(&ovl_cred);
> > >> +
> > >> if (!upperinode &&
> > >> !special_file(realinode->i_mode) && mask & MAY_WRITE) {
> > >> mask &= ~(MAY_WRITE | MAY_APPEND);
> > >> @@ -318,7 +320,6 @@ int ovl_permission(struct mnt_idmap *idmap,
> > >> mask |= MAY_READ;
> > >> }
> > >> err = inode_permission(mnt_idmap(realpath.mnt), realinode, mask);
> > >> - revert_creds(old_cred);
> > >>
> > >> return err;
> > >> }
> > >> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> > >> index 05c3dd597fa8..22ea3066376e 100644
> > >> --- a/fs/overlayfs/overlayfs.h
> > >> +++ b/fs/overlayfs/overlayfs.h
> > >> @@ -416,6 +416,24 @@ static inline int ovl_do_getattr(const struct path *path, struct kstat *stat,
> > >> return vfs_getattr(path, stat, request_mask, flags);
> > >> }
> > >>
> > >> +struct ovl_cred {
> > >> + const struct cred *cred;
> > >> +};
> > >> +
> > >> +static inline struct ovl_cred ovl_creator_cred(struct super_block *sb)
> > >> +{
> > >> + struct ovl_fs *ofs = OVL_FS(sb);
> > >> +
> > >> + return (struct ovl_cred) { .cred = ofs->creator_cred };
> > >> +}
> > >> +
> > >> +void ovl_override_creds_new(struct ovl_cred *creator_cred);
> > >> +void ovl_revert_creds_new(struct ovl_cred *creator_cred);
> > >> +
> > >> +DEFINE_GUARD(ovl_creds, struct ovl_cred *,
> > >> + ovl_override_creds_new(_T),
> > >> + ovl_revert_creds_new(_T));
> > >> +
> > >
> > > This pattern is not unique to overlayfs.
> > > It is probably better to define a common container type struct override_cred
> > > in cred.h/cred.c that other code could also use.
> > >
> >
> > Good idea.
> >
> > >> /* util.c */
> > >> int ovl_get_write_access(struct dentry *dentry);
> > >> void ovl_put_write_access(struct dentry *dentry);
> > >> diff --git a/fs/overlayfs/params.c b/fs/overlayfs/params.c
> > >> index 3fe2dde1598f..008377b9241a 100644
> > >> --- a/fs/overlayfs/params.c
> > >> +++ b/fs/overlayfs/params.c
> > >> @@ -770,8 +770,10 @@ void ovl_free_fs(struct ovl_fs *ofs)
> > >> kfree(ofs->config.lowerdirs);
> > >> kfree(ofs->config.upperdir);
> > >> kfree(ofs->config.workdir);
> > >> - if (ofs->creator_cred)
> > >> + if (ofs->creator_cred) {
> > >> + cred_set_immutable(ofs->creator_cred, false);
> > >> put_cred(ofs->creator_cred);
> > >
> > > Not happy about this API.
> > >
> > > Two solutions I can think of:
> > > 1. (my preference) keep two copies of creator_cred, one refcounted copy
> > > and one non-refcounted that is used for override_creds()
> > > 2. put_cred_ref() which explicitly opts-in to dropping refcount on
> > > a borrowed reference, same as you do above but hidden behind
> > > a properly documented helper
> > >
> >
> > Probably because I already have option (2) more or less understood, but
> > I think that having a single creator_cred marked as
> > "non-refcounted/long-lived" is simpler than having two copies, even the
> > the extra copy only exists for the duration of the override.
> >
> > But it could be that I still can't imagine what you have in mind about
> > (1).
> >
>
> (1) is actually quite similar to your first proposal of copying creator_cred
> to a local stack variable.
> My only concern about this proposal was that the core code was not aware
> of the fact that it is working with an object that must not be freed.
> And this is where non_refcount comes in to help.
>
> With (2) prepare_creds_ref()/put_cred_ref() can be used to manage the
> lifetime of a long-lived cred object from the cred_jar memory pool.
>
> With (1) the caller is responsible to manage the lifetime of the non-refcounted
> copy, for example, if the object lives on the stack.
>
> For overlayfs this could mean, either create a non-refcounted copy on the
> stack in every operator using ovl_creator_cred() helper as your first proposal
> did, or keep another non-refcounted copy in ofs->override_cred, which gets
> allocated/freed by overlayfs (i.e. kmalloc/kfree).
>
> > >> + }
> > >> kfree(ofs);
> > >> }
> > >>
> > >> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > >> index a0967bb25003..1ffb4f0f8186 100644
> > >> --- a/fs/overlayfs/super.c
> > >> +++ b/fs/overlayfs/super.c
> > >> @@ -1304,6 +1304,13 @@ int ovl_fill_super(struct super_block *sb, struct fs_context *fc)
> > >> if (!cred)
> > >> goto out_err;
> > >>
> > >> + /* Never override disk quota limits or use reserved space */
> > >> + cap_lower(cred->cap_effective, CAP_SYS_RESOURCE);
> > >> + /* The cred that is going to be associated with the super
> > >> + * block will not change.
> > >> + */
> > >> + cred_set_immutable(cred, true);
> > >> +
> > >
> > > Likewise, either:
> > > 1. Create a non-refcounted copy of creator_cred
> >
> > Ah! I think now I see what you meant. Not sure I like, I think it's a
> > bit too error prone, it's hard to enforce that the copies would be kept
> > in sync in general. (even if in practice the only thing that would need
> > to be kept in sync is the destruction of both, at least for now).
> >
>
> I agree that it makes more sense to create the non-refcounted copy
> on the stack.
>
> As a matter of fact, maybe it makes sense to embed a non-refcounted
> copy in the struct used for the guard:
>
> struct override_cred {
> struct cred copy;
> const struct cred *cred;
> };
>
> Then override_cred_save() can create the non-refcounted copy:
>
> void override_cred_save(struct override_cred *override)
> {
> override->copy = *override;
> override->copy.non_refcount = 1;
> override->cred = override_creds(&override->copy);
> override->cred = override_creds(override->cred);
This second line is leftover of course...
> }
>
> > > or
> > > 2. Use a documented helper prepare_creds_ref() to hide
> > > this implementation detail
> >
> > This I like more, having the properties documented in the constructor.
> > And much better than my _set_immutable().
> >
>
> Yes, the important thing is that an object cannot change
> its non_refcount property during its lifetime -
... which means that put_creds_ref() should assert that
there is only a single refcount - the one handed out by
prepare_creds_ref() before removing non_refcount or
directly freeing the cred object.
I must say that the semantics of making a non-refcounted copy
to an object whose lifetime is managed by the caller sounds a lot
less confusing to me.
Thanks,
Amir.
Powered by blists - more mailing lists