Message-ID: <87eczu41r9.fsf@igalia.com>
Date: Wed, 19 Feb 2025 11:23:06 +0000
From: Luis Henriques <luis@...lia.com>
To: Dave Chinner <david@...morbit.com>
Cc: Miklos Szeredi <miklos@...redi.hu>,  Bernd Schubert <bschubert@....com>,
  Alexander Viro <viro@...iv.linux.org.uk>,  Christian Brauner
 <brauner@...nel.org>,  Jan Kara <jack@...e.cz>,  Matt Harvey
 <mharvey@...ptrading.com>,  linux-fsdevel@...r.kernel.org,
  linux-kernel@...r.kernel.org,  Valentin Volkl <valentin.volkl@...n.ch>,
  Laura Promberger <laura.promberger@...n.ch>
Subject: Re: [PATCH v6 2/2] fuse: add new function to invalidate cache for
 all inodes

On Wed, Feb 19 2025, Dave Chinner wrote:

> On Tue, Feb 18, 2025 at 06:11:17PM +0000, Luis Henriques wrote:
>> On Tue, Feb 18 2025, Miklos Szeredi wrote:
>> 
>> > On Tue, 18 Feb 2025 at 12:51, Luis Henriques <luis@...lia.com> wrote:
>> >>
>> >> On Tue, Feb 18 2025, Miklos Szeredi wrote:
>> >>
>> >> > On Tue, 18 Feb 2025 at 11:04, Luis Henriques <luis@...lia.com> wrote:
>> >> >
>> >> >> The problem I'm trying to solve is that, if a filesystem wants to ask the
>> >> >> kernel to get rid of all inodes, it has to request the kernel to forget
>> >> >> each one, individually.  The specific filesystem I'm looking at is CVMFS,
>> >> >> which is a read-only filesystem that needs to be able to update the full
>> >> >> set of filesystem objects when a new generation snapshot becomes
>> >> >> available.
>> >> >
>> >> > Yeah, we talked about this use case.  As I remember there was a
>> >> > proposal to set an epoch, marking all objects for "revalidate needed",
>> >> > which I think is a better solution to the CVMFS problem, than just
>> >> > getting rid of unused objects.
>> >>
>> >> OK, so I think I'm missing some context here.  And, obviously, I also miss
>> >> some more knowledge on the filesystem itself.  But, if I understand it
>> >> correctly, the concept of 'inode' in CVMFS is very loose: when a new
>> >> snapshot generation is available (you mentioned 'epoch', which is, I
>> >> guess, the same thing) the inodes are all renewed -- the inode numbers
>> >> aren't kept between generations/epochs.
>> >>
>> >> Do you have any links for such discussions, or any details on how this
>> >> proposal is being implemented?  This would probably be done mostly in
>> >> user-space I guess, but it would still need a way to get rid of the unused
>> >> inodes from old snapshots, right?  (inodes from old snapshots still in use
>> >> would obviously be kept around).
>> >
>> > I don't have links.  Adding Valentin Volkl and Laura Promberger to the
>> > Cc list, maybe they can help with clarification.
>> >
>> > As far as I understand it would work by incrementing fc->epoch on
>> > FUSE_INVALIDATE_ALL. When an object is looked up/created the current
>> > epoch is copied to e.g. dentry->d_time.  fuse_dentry_revalidate() then
>> > compares d_time with fc->epoch and forces an invalidate on mismatch.
>> 
>> OK, so hopefully Valentin or Laura will be able to help providing some
>> more details.  But, from your description, we would still require this
>> FUSE_INVALIDATE_ALL operation to exist in order to increment the epoch.
>> And this new operation could do that *and* also already invalidate those
>> unused objects.
>
> I think you are still looking at this from the wrong direction.
>
> Invalidation is -not the operation- that is being requested. The
> CVMFS fuse server needs to update some global state in the kernel
> side fuse mount (i.e. the snapshot ID/epoch), and the need to evict
> cached inodes from previous IDs is a CVMFS implementation
> optimisation related to changing the global state.
>
>> > Only problem with this is that it seems very CVMFS specific, but I
>> > guess so is your proposal.
>> >
>> > Implementing the LRU purge is more generally useful, but I'm not sure
>> > if that helps CVMFS, since it would only get rid of unused objects.
>> 
>> The LRU inodes purge can indeed work for me as well, because my patch is
>> also only getting rid of unused objects, right?  Any inode still being
>> referenced will be kept around.
>> 
>> So, based on your reply, let me try to summarize a possible alternative
>> solution, that I think would be useful for CVMFS but also generic enough
>> for other filesystems:
>> 
>> - Add a new operation FUSE_INVAL_LRU_INODES, which would get rid of, at
>>   most, 'N' unused inodes.
>>
>> - This operation would have an argument 'N' with the maximum number of
>>   inodes to invalidate.
>>
>> - In addition, it would also increment this new fuse_connection attribute
>>   'epoch', to be used in the dentry revalidation as you suggested above
>
> As per above: invalidation is an implementation optimisation for the
> CVMFS epoch update. Invalidation, OTOH, does not imply that any fuse
> mount/connector global state (e.g. the epoch) needs to change...
>
> i.e. the operation should be FUSE_UPDATE_EPOCH, not
> FUSE_INVAL_LRU_INODES...
>
>> 
>> - This 'N' could also be set to a pre-#define'ed value that would mean
>>   *all* (unused) inodes.
>
> Saying "only invalidate N inodes" makes no sense to me - it is
> fundamentally impossible for userspace to get right. Either the
> epoch update should evict all unreferenced inodes immediately, or it
> should leave them all behind to be purged by memory pressure or
> other periodic garbage collection mechanisms.

So, below is a patch that is totally untested (not even compile-tested).
It's unlikely to be fully correct, but I just wanted to make sure I got
the main idea right.

What I'm trying to do there is to initialize this new 'epoch'
counter, both in the fuse connection and in every new dentry.  Then,
->d_revalidate() simply invalidates a dentry if its epoch is older than
the connection's.  Finally, there's the new fuse notify operation that
increments the epoch and shrinks the dcache (I dropped the call to
{evict,invalidate}_inodes(), as Miklos suggested elsewhere).

Does this look reasonable?

(I may be missing other places where epoch should be checked or
initialized.)
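To make the mechanism above concrete, here is a small user-space model of it in C: a per-connection epoch that starts at 1, a stamp taken at lookup time, a revalidation check that rejects older stamps, and an update that bumps the epoch and drops unreferenced entries. All names (model_conn, model_dentry, model_*) are hypothetical stand-ins for struct fuse_conn, struct dentry, fuse_lookup(), fuse_dentry_revalidate() and fuse_notify_update_epoch(). This is a sketch of the idea under those assumptions, not kernel code, and the loop over unreferenced entries is only a crude stand-in for shrink_dcache_sb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct fuse_conn: only the epoch counter. */
struct model_conn {
	atomic_int epoch;		/* models fc->epoch */
};

/* Hypothetical stand-in for a cached dentry. */
struct model_dentry {
	unsigned long d_time;		/* models dentry->d_time (epoch stamp) */
	int refcount;			/* > 0 means the entry is in use */
	bool cached;			/* still present in the model "dcache" */
};

/* Like the fuse_conn_init() hunk: start at 1 so a zero stamp is stale. */
static void model_conn_init(struct model_conn *fc)
{
	atomic_init(&fc->epoch, 1);
}

/* Like the fuse_lookup() hunk: stamp the entry with the current epoch. */
static void model_lookup(struct model_conn *fc, struct model_dentry *d)
{
	d->d_time = (unsigned long)atomic_load(&fc->epoch);
	d->cached = true;
}

/* Like the fuse_dentry_revalidate() hunk: an older stamp means invalid. */
static bool model_revalidate(struct model_conn *fc,
			     const struct model_dentry *d)
{
	return d->d_time >= (unsigned long)atomic_load(&fc->epoch);
}

/*
 * Like fuse_notify_update_epoch(): bump the epoch, then drop every cached
 * entry that nobody references (crude stand-in for shrink_dcache_sb()).
 * Referenced entries stay cached but will now fail revalidation.
 */
static void model_update_epoch(struct model_conn *fc,
			       struct model_dentry *cache, size_t n)
{
	atomic_fetch_add(&fc->epoch, 1);
	for (size_t i = 0; i < n; i++) {
		assert(cache[i].refcount >= 0);
		if (cache[i].refcount == 0)
			cache[i].cached = false;
	}
}
```

For example, after model_lookup() on an entry with refcount 1 and another with refcount 0, model_update_epoch() drops only the second one; the first stays cached, but model_revalidate() now rejects it until it is looked up again. That mirrors the point made earlier in the thread that only unused objects can actually be dropped.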

Cheers,
-- 
Luís

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 5b5f789b37eb..f560d1bc327e 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1902,6 +1902,26 @@ static int fuse_notify_resend(struct fuse_conn *fc)
 	return 0;
 }
 
+static int fuse_notify_update_epoch(struct fuse_conn *fc)
+{
+	struct fuse_mount *fm;
+	struct inode *inode;
+	int err = -ENOENT;
+
+	/* fuse_ilookup() must be called with fc->killsb held */
+	down_read(&fc->killsb);
+	inode = fuse_ilookup(fc, FUSE_ROOT_ID, &fm);
+	if (inode) {
+		iput(inode);
+		atomic_inc(&fc->epoch);
+		shrink_dcache_sb(fm->sb);
+		err = 0;
+	}
+	up_read(&fc->killsb);
+
+	return err;
+}
+
 static int fuse_notify(struct fuse_conn *fc, enum fuse_notify_code code,
 		       unsigned int size, struct fuse_copy_state *cs)
 {
@@ -1930,6 +1950,9 @@ static int fuse_notify(struct fuse_conn *fc, enum fuse_notify_code code,
 	case FUSE_NOTIFY_RESEND:
 		return fuse_notify_resend(fc);
 
+	case FUSE_NOTIFY_UPDATE_EPOCH:
+		return fuse_notify_update_epoch(fc);
+
 	default:
 		fuse_copy_finish(cs);
 		return -EINVAL;
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 198862b086ff..d4d58b169c57 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -204,6 +204,12 @@ static int fuse_dentry_revalidate(struct inode *dir, const struct qstr *name,
 	int ret;
 
 	inode = d_inode_rcu(entry);
+	if (inode) {
+		fm = get_fuse_mount(inode);
+		if (entry->d_time < atomic_read(&fm->fc->epoch))
+			goto invalid;
+	}
+
 	if (inode && fuse_is_bad(inode))
 		goto invalid;
 	else if (time_before64(fuse_dentry_time(entry), get_jiffies_64()) ||
@@ -446,6 +452,12 @@ static struct dentry *fuse_lookup(struct inode *dir, struct dentry *entry,
 		goto out_err;
 
 	entry = newent ? newent : entry;
+	if (inode) {
+		struct fuse_mount *fm = get_fuse_mount(inode);
+		entry->d_time = atomic_read(&fm->fc->epoch);
+	} else {
+		entry->d_time = 0;
+	}
 	if (outarg_valid)
 		fuse_change_entry_timeout(entry, &outarg);
 	else
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index fee96fe7887b..bb6b1ebaa42d 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -611,6 +611,8 @@ struct fuse_conn {
 	/** Number of fuse_dev's */
 	atomic_t dev_count;
 
+	atomic_t epoch;
+
 	struct rcu_head rcu;
 
 	/** The user id for this mount */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index e9db2cb8c150..5d2d29fad658 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -959,6 +959,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm,
 	init_rwsem(&fc->killsb);
 	refcount_set(&fc->count, 1);
 	atomic_set(&fc->dev_count, 1);
+	atomic_set(&fc->epoch, 1);
 	init_waitqueue_head(&fc->blocked_waitq);
 	fuse_iqueue_init(&fc->iq, fiq_ops, fiq_priv);
 	INIT_LIST_HEAD(&fc->bg_queue);
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 5e0eb41d967e..62cc60e61cca 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -666,6 +666,7 @@ enum fuse_notify_code {
 	FUSE_NOTIFY_RETRIEVE = 5,
 	FUSE_NOTIFY_DELETE = 6,
 	FUSE_NOTIFY_RESEND = 7,
+	FUSE_NOTIFY_UPDATE_EPOCH = 8,
 	FUSE_NOTIFY_CODE_MAX,
 };
 
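On the server side, the existing FUSE_NOTIFY_* messages are sent by writing a struct fuse_out_header to the /dev/fuse descriptor with unique set to 0 and error holding the notify code; a payload-less epoch update would presumably be just that header. The sketch below assumes the value 8 proposed in the uapi hunk above. The struct and macro names are hypothetical, redeclared locally so the sketch builds without a patched <linux/fuse.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Mirrors the layout of struct fuse_out_header from <linux/fuse.h>;
 * redeclared here (hypothetical name) so no kernel headers are needed.
 */
struct out_header_sketch {
	uint32_t len;		/* total message length, header included */
	int32_t  error;		/* for notifications: the notify code */
	uint64_t unique;	/* 0 marks the message as a notification */
};

static_assert(sizeof(struct out_header_sketch) == 16,
	      "fuse_out_header is expected to be 16 bytes");

/* Value proposed by the patch above; assumption, not a released UAPI. */
#define SKETCH_NOTIFY_UPDATE_EPOCH 8

/*
 * Write the epoch-update notification to fuse_fd. The message carries no
 * payload, only the header. Returns 0 on success or a negative errno.
 */
static int send_update_epoch(int fuse_fd)
{
	struct out_header_sketch oh = {
		.len = sizeof(oh),
		.error = SKETCH_NOTIFY_UPDATE_EPOCH,
		.unique = 0,
	};
	ssize_t n = write(fuse_fd, &oh, sizeof(oh));

	if (n < 0)
		return -errno;
	/* A short write would leave a truncated message behind. */
	return (size_t)n == sizeof(oh) ? 0 : -EIO;
}
```

Any writable descriptor (a pipe, for instance) is enough to check the encoding; only a /dev/fuse descriptor on a kernel carrying this patch would actually act on the message.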
