Message-ID: <afeb9082273098f47b26371a7e252381d1268c8e.camel@ibm.com>
Date: Thu, 13 Mar 2025 21:46:54 +0000
From: Viacheslav Dubeyko <Slava.Dubeyko@....com>
To: "slava@...eyko.com" <slava@...eyko.com>,
	David Howells <dhowells@...hat.com>
CC: Xiubo Li <xiubli@...hat.com>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	"ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>,
	"brauner@...nel.org" <brauner@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Alex Markuze <amarkuze@...hat.com>,
	"jlayton@...nel.org" <jlayton@...nel.org>,
	"idryomov@...il.com" <idryomov@...il.com>,
	"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>
Subject: RE: Does ceph_fill_inode() mishandle I_NEW?

On Thu, 2025-03-13 at 20:47 +0000, David Howells wrote:
> slava@...eyko.com wrote:
>
> > What do you mean by mishandling? Do you imply that Ceph has to set up
> > the I_NEW somehow? Is it not VFS responsibility?
>
> No - I mean that if I_NEW *isn't* set when the function is called,
> ceph_fill_inode() will go and partially reinitialise the inode. Now, having
> reviewed the code in more depth and talked to Jeff Layton about it, I think
> that the non-I_NEW pass will only change pointers with some sort of locking
> and will release the old target - though it may overwrite some pointers with
> the same value without protection (i_fops for example).
>
> That said, if it's possible for *two* processes to be going through that
> function without I_NEW set, you can get places where both of them will try
> freeing the old data and replacing it with new without any locking - but I
> don't know if that can happen.
>
I see your point now.

As far as I can see, ceph_fill_inode() carries the comment: "Populate an inode
based on info from mds. May be called on new or existing inodes". My reading is
that a particular CephFS kernel client can hold an obsolete copy of an inode's
state compared with the MDS's state, so we need to refresh the existing inode
with the current state received from the MDS side. My view is that we have to
take the distributed nature of Ceph into account: inode metadata can be updated
through multiple CephFS kernel client instances. Am I right here?
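
Just to illustrate what I mean, here is a minimal sketch of the generic VFS
pattern around I_NEW (not the actual Ceph code; example_test(), example_set()
and example_fill() are made-up helpers standing in for the real lookup and
fill logic):

/*
 * Sketch only: iget5_locked() returns the inode with I_NEW set when it had
 * to allocate a fresh one; an existing inode comes back without I_NEW, so
 * the same "fill" helper is then re-populating live state.
 */
#include <linux/fs.h>
#include <linux/err.h>

static int example_test(struct inode *inode, void *data)
{
	/* match on the inode number passed via @data */
	return inode->i_ino == *(unsigned long *)data;
}

static int example_set(struct inode *inode, void *data)
{
	inode->i_ino = *(unsigned long *)data;
	return 0;
}

static void example_fill(struct inode *inode)
{
	/* populate/update inode fields from (hypothetical) MDS reply data */
}

static struct inode *example_get_inode(struct super_block *sb,
				       unsigned long ino)
{
	struct inode *inode = iget5_locked(sb, ino, example_test,
					   example_set, &ino);

	if (!inode)
		return ERR_PTR(-ENOMEM);

	if (inode->i_state & I_NEW) {
		/*
		 * Freshly allocated inode: nobody else can see it until
		 * unlock_new_inode(), so filling it needs no extra locking.
		 */
		example_fill(inode);
		unlock_new_inode(inode);
	} else {
		/*
		 * Existing inode: the same fill now updates live state and
		 * must serialise against other lookups/MDS replies that may
		 * be refreshing the inode at the same time.
		 */
		example_fill(inode);
	}
	return inode;
}

If that understanding is correct, then the non-I_NEW path is exactly where the
locking question you raised applies.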
Thanks,
Slava.