[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1edf9fe3-5e39-463b-8825-67b4d1ad01be@linux.alibaba.com>
Date: Tue, 17 Sep 2024 15:14:58 +0800
From: Gao Xiang <hsiangkao@...ux.alibaba.com>
To: Yiyang Wu <toolmanp@...p.cc>, Al Viro <viro@...iv.linux.org.uk>
Cc: linux-erofs@...ts.ozlabs.org, rust-for-linux@...r.kernel.org,
linux-fsdevel@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 19/24] erofs: introduce namei alternative to C
On 2024/9/17 14:48, Yiyang Wu wrote:
> On Mon, Sep 16, 2024 at 06:08:01PM GMT, Al Viro wrote:
>> On Mon, Sep 16, 2024 at 09:56:29PM +0800, Yiyang Wu wrote:
>>> +/// Lookup function for dentry-inode lookup replacement.
>>> +#[no_mangle]
>>> +pub unsafe extern "C" fn erofs_lookup_rust(
>>> + k_inode: NonNull<inode>,
>>> + dentry: NonNull<dentry>,
>>> + _flags: c_uint,
>>> +) -> *mut c_void {
>>> + // SAFETY: We are sure that the inode is a Kernel Inode since alloc_inode is called
>>> + let erofs_inode = unsafe { &*container_of!(k_inode.as_ptr(), KernelInode, k_inode) };
>>
>> Ummm... A wrapper would be highly useful. And the reason why
>> it's safe is different - your function is called only via ->i_op->lookup,
>> the is only one instance of inode_operations that has that ->lookup
>> method, and the only place where an inode gets ->i_op set to that
>> is erofs_fill_inode(). Which is always passed erofs_inode::vfs_inode.
>>
> So my original intention behind this is that all vfs_inodes come from
> that erofs_iget function and it's always gets initialized in this case
> And this just followes the same convention here. I can document this
> more precisely.
I think Al just would like a wrapper here, like the current C EROFS_I().
>>> + // SAFETY: The super_block is initialized when the erofs_alloc_sbi_rust is called.
>>> + let sbi = erofs_sbi(unsafe { NonNull::new(k_inode.as_ref().i_sb).unwrap() });
>>
>> Again, that calls for a wrapper - this time not erofs-specific;
>> inode->i_sb is *always* non-NULL, is assign-once and always points
>> to live struct super_block instance at least until the call of
>> destroy_inode().
>>
>
> Will be modified correctly, I'm not a native speaker and I just can't
> find a better way, I will take my note here.
Same here, like the current EROFS_I_SB().
>>> + // SAFETY: this is backed by qstr which is c representation of a valid slice.
>>
>> What is that sentence supposed to mean? Nevermind "why is it correct"...
>>
>>> + let name = unsafe {
>>> + core::str::from_utf8_unchecked(core::slice::from_raw_parts(
>>> + dentry.as_ref().d_name.name,
>>> + dentry.as_ref().d_name.__bindgen_anon_1.__bindgen_anon_1.len as usize,
>>
...
>
>> Current erofs_lookup() (and your version as well) *is* indeed
>> safe in that respect, but the proof (from filesystem POV) is that "it's
>> called only as ->lookup() instance, so dentry is initially unhashed
>> negative and will remain such until it's passed to d_splice_alias();
>> until that point it is guaranteed to have ->d_name and ->d_parent stable".
Agreed.
>>
>> Note that once you _have_ called d_splice_alias(), you can't
>> count upon the ->d_name stability - or, indeed, upon ->d_name.name you've
>> sampled still pointing to allocated memory.
>>
>> For directory-modifying methods it's "stable, since parent is held
>> exclusive". Some internal function called from different environments?
>> Well... Swear, look through the call graph and see what can be proven
>> for each.
>
> Sorry for my ignorance.
> I mean i just borrowed the code from the fs/erofs/namei.c and i directly
> translated that into Rust code. That might be a problem that also
> exists in original working C code.
As for EROFS (an immutable fs), I think after d_splice_alias(), d_name is
still stable (since we don't have rename semantics likewise for now).
But as the generic filesystem POV, d_name access is actually tricky under
RCU walk path indeed.
>
>> Expressing that kind of fun in any kind of annotations (Rust type
>> system included) is not pleasant. _Probably_ might be handled by a type
>> that would be a dentry pointer with annotation along the lines "->d_name
>> and ->d_parent of that one are stable". Then e.g. ->lookup() would
>> take that thing as an argument and d_splice_alias() would consume it.
>> ->mkdir() would get the same thing, etc. I hadn't tried to get that
>> all way through (the amount of annotation churn in existing filesystems
>> would be high and hard to split into reviewable patch series), so there
>> might be dragons - and there definitely are places where the stability is
>> proven in different ways (e.g. if dentry->d_lock is held, we have the damn
>> thing stable; then there's a "take a safe snapshot of name" API; etc.).
>
> That's kinda interesting, I originally thought that VFS will make sure
> its d_name / d_parent is stable in the first place.
> Again, I just don't have a full picture or understanding of VFS and my
> code is just basic translation of original C code, Maybe we can address
> this later.
d_alloc will allocate an unhashed dentry which is almost unrecognized
by VFS dcache (d_name is stable of course).
After d_splice_alias() and d_add(), rename() could change d_name. So
either we take d_lock or with rcu_read_lock() to take a snapshot of
d_name in the RCU walk path. That is my overall understanding.
But for EROFS, since we don't have rename, so it doesn't matter.
Thanks,
Gao Xiang
Powered by blists - more mailing lists