[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <z2yuwgzsbbirtfmr2rkgbq3yhjtvihumdxp4bvwgkybikubhgp@lfjfhlvtckmr>
Date: Mon, 11 Aug 2025 13:30:53 -0500
From: John Groves <John@...ves.net>
To: "Darrick J. Wong" <djwong@...nel.org>
Cc: Amir Goldstein <amir73il@...il.com>,
Dan Williams <dan.j.williams@...el.com>, Miklos Szeredi <miklos@...redi.hu>,
Bernd Schubert <bschubert@....com>, John Groves <jgroves@...ron.com>,
Jonathan Corbet <corbet@....net>, Vishal Verma <vishal.l.verma@...el.com>,
Dave Jiang <dave.jiang@...el.com>, Matthew Wilcox <willy@...radead.org>, Jan Kara <jack@...e.cz>,
Alexander Viro <viro@...iv.linux.org.uk>, Christian Brauner <brauner@...nel.org>,
Randy Dunlap <rdunlap@...radead.org>, Jeff Layton <jlayton@...nel.org>,
Kent Overstreet <kent.overstreet@...ux.dev>, linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
nvdimm@...ts.linux.dev, linux-cxl@...r.kernel.org, linux-fsdevel@...r.kernel.org,
Jonathan Cameron <Jonathan.Cameron@...wei.com>, Stefan Hajnoczi <shajnocz@...hat.com>,
Joanne Koong <joannelkoong@...il.com>, Josef Bacik <josef@...icpanda.com>,
Aravind Ramesh <arramesh@...ron.com>, Ajay Joshi <ajayjoshi@...ron.com>, john@...ves.net
Subject: Re: [RFC V2 10/18] famfs_fuse: Basic fuse kernel ABI enablement for
famfs
On 25/07/10 08:32PM, John Groves wrote:
> On 25/07/08 06:53PM, Darrick J. Wong wrote:
> > On Tue, Jul 08, 2025 at 07:02:03AM -0500, John Groves wrote:
> > > On 25/07/07 10:39AM, Darrick J. Wong wrote:
> > > > On Fri, Jul 04, 2025 at 08:39:59AM -0500, John Groves wrote:
> > > > > On 25/07/04 09:54AM, Amir Goldstein wrote:
> > > > > > On Thu, Jul 3, 2025 at 8:51 PM John Groves <John@...ves.net> wrote:
> > > > > > >
> > > > > > > * FUSE_DAX_FMAP flag in INIT request/reply
> > > > > > >
> > > > > > > * fuse_conn->famfs_iomap (enable famfs-mapped files) to denote a
> > > > > > > famfs-enabled connection
> > > > > > >
> > > > > > > Signed-off-by: John Groves <john@...ves.net>
> > > > > > > ---
> > > > > > > fs/fuse/fuse_i.h | 3 +++
> > > > > > > fs/fuse/inode.c | 14 ++++++++++++++
> > > > > > > include/uapi/linux/fuse.h | 4 ++++
> > > > > > > 3 files changed, 21 insertions(+)
> > > > > > >
> > > > > > > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> > > > > > > index 9d87ac48d724..a592c1002861 100644
> > > > > > > --- a/fs/fuse/fuse_i.h
> > > > > > > +++ b/fs/fuse/fuse_i.h
> > > > > > > @@ -873,6 +873,9 @@ struct fuse_conn {
> > > > > > > /* Use io_uring for communication */
> > > > > > > unsigned int io_uring;
> > > > > > >
> > > > > > > + /* dev_dax_iomap support for famfs */
> > > > > > > + unsigned int famfs_iomap:1;
> > > > > > > +
> > > > > >
> > > > > > pls move up to the bit fields members.
> > > > >
> > > > > Oops, done, thanks.
> > > > >
> > > > > >
> > > > > > > /** Maximum stack depth for passthrough backing files */
> > > > > > > int max_stack_depth;
> > > > > > >
> > > > > > > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> > > > > > > index 29147657a99f..e48e11c3f9f3 100644
> > > > > > > --- a/fs/fuse/inode.c
> > > > > > > +++ b/fs/fuse/inode.c
> > > > > > > @@ -1392,6 +1392,18 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
> > > > > > > }
> > > > > > > if (flags & FUSE_OVER_IO_URING && fuse_uring_enabled())
> > > > > > > fc->io_uring = 1;
> > > > > > > + if (IS_ENABLED(CONFIG_FUSE_FAMFS_DAX) &&
> > > > > > > + flags & FUSE_DAX_FMAP) {
> > > > > > > + /* XXX: Should also check that fuse server
> > > > > > > + * has CAP_SYS_RAWIO and/or CAP_SYS_ADMIN,
> > > > > > > + * since it is directing the kernel to access
> > > > > > > + * dax memory directly - but this function
> > > > > > > + * appears not to be called in fuse server
> > > > > > > + * process context (b/c even if it drops
> > > > > > > + * those capabilities, they are held here).
> > > > > > > + */
> > > > > > > + fc->famfs_iomap = 1;
> > > > > > > + }
> > > > > >
> > > > > > 1. As long as the mapping requests are checking capabilities we should be ok
> > > > > > Right?
> > > > >
> > > > > It depends on the definition of "are", or maybe of "mapping requests" ;)
> > > > >
> > > > > Forgive me if this *is* obvious, but the fuse server capabilities are what
> > > > > I think need to be checked here - not the app that it accessing a file.
> > > > >
> > > > > An app accessing a regular file doesn't need permission to do raw access to
> > > > > the underlying block dev, but the fuse server does - becuase it is directing
> > > > > the kernel to access that for apps.
> > > > >
> > > > > > 2. What's the deal with capable(CAP_SYS_ADMIN) in process_init_limits then?
> > > > >
> > > > > I *think* that's checking the capabilities of the app that is accessing the
> > > > > file, and not the fuse server. But I might be wrong - I have not pulled very
> > > > > hard on that thread yet.
> > > >
> > > > The init reply should be processed in the context of the fuse server.
> > > > At that point the kernel hasn't exposed the fs to user programs, so
> > > > (AFAICT) there won't be any other programs accessing that fuse mount.
> > >
> > > Hmm. It would be good if you're right about that. My fuse server *is* running
> > > as root, and when I check those capabilities in process_init_reply(), I
> > > find those capabilities. So far so good.
> > >
> > > Then I added code to my fuse server to drop those capabilities prior to
> > > starting the fuse session (prctl(PR_CAPBSET_DROP, CAP_SYS_RAWIO) and
> > > prctl(PR_CAPBSET_DROP, CAP_SYS_ADMIN). I expected (hoped?) to see those
> > > capabilities disappear in process_init_reply() - but they did not disappear.
> > >
> > > I'm all ears if somebody can see a flaw in my logic here. Otherwise, the
> > > capabilities need to be stashed away before the reply is processsed, when
> > > fs/fuse *is* running in fuse server context.
> > >
> > > I'm somewhat surprised if that isn't already happening somewhere...
> >
> > Hrm. I *thought* that since FUSE_INIT isn't queued as a background
> > command, it should still execute in the same process context as the fuse
> > server.
> >
> > OTOH it also occurs to me that I have this code in fuse_send_init:
> >
> > if (has_capability_noaudit(current, CAP_SYS_RAWIO))
> > flags |= FUSE_IOMAP | FUSE_IOMAP_DIRECTIO | FUSE_IOMAP_PAGECACHE;
> > ...
> > ia->in.flags = flags;
> > ia->in.flags2 = flags >> 32;
> >
> > which means that we only advertise iomap support in FUSE_INIT if the
> > process running fuse_fill_super (which you hope is the fuse server)
> > actually has CAP_SYS_RAWIO. Would that work for you? Or are you
> > dropping privileges before you even open /dev/fuse?
>
> Ah - that might be the answer. I will check if dropped capabilities
> disappear in fuse_send_init. If so, I can work with that - not advertising
> the famfs capability unless the capability is present at that point looks
> like a perfectly good option. Thanks for that idea!
Review: the famfs fuse server directs the kernel to provide access to raw
(memory) devices, so it should should be required to have have the
CAP_SYS_RAWIO capability. fs/fuse needs to detect this at init time,
and fail the connection/mount if the capability is missing.
I initially attempted to do this verification in process_init_reply(), but
that doesn't run in the fuse server process context.
I am now checking the capability in fuse_send_init(), and not advertising
the FUSE_DAX_FMAP capability (in in_args->flags[2]) unless the server has
CAP_SYS_RAWIO.
That requires that process_init_reply() reject FUSE_DAX_FMAP from a server
if FUSE_DAX_FMAP was not set in in_args->flags[2]. process_init_reply() was
not previously checking the in_args, but no big deal - this works.
This leads to an apparent dilemma in libfuse. In fuse_lowlevel_ops->init(),
I should check for (flags & FUSE_DAX_IOMAP), and fail the connection if
that capability is not on offer. But fuse_lowlevel_ops->init() doesn't
have an obvious way to fail the connection.
How should I do that? Hoping Bernd, Amir or the other libfuse people may
have "the answer" (tm).
And of course if any of this doesn't sound like the way to go, let me know...
Thanks!
John
Powered by blists - more mailing lists