linux-kernel - Re: [RFC PATCH 19/24] erofs: introduce namei alternative to C

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1edf9fe3-5e39-463b-8825-67b4d1ad01be@linux.alibaba.com>
Date: Tue, 17 Sep 2024 15:14:58 +0800
From: Gao Xiang <hsiangkao@...ux.alibaba.com>
To: Yiyang Wu <toolmanp@...p.cc>, Al Viro <viro@...iv.linux.org.uk>
Cc: linux-erofs@...ts.ozlabs.org, rust-for-linux@...r.kernel.org,
 linux-fsdevel@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 19/24] erofs: introduce namei alternative to C



On 2024/9/17 14:48, Yiyang Wu wrote:
> On Mon, Sep 16, 2024 at 06:08:01PM GMT, Al Viro wrote:
>> On Mon, Sep 16, 2024 at 09:56:29PM +0800, Yiyang Wu wrote:
>>> +/// Lookup function for dentry-inode lookup replacement.
>>> +#[no_mangle]
>>> +pub unsafe extern "C" fn erofs_lookup_rust(
>>> +    k_inode: NonNull<inode>,
>>> +    dentry: NonNull<dentry>,
>>> +    _flags: c_uint,
>>> +) -> *mut c_void {
>>> +    // SAFETY: We are sure that the inode is a Kernel Inode since alloc_inode is called
>>> +    let erofs_inode = unsafe { &*container_of!(k_inode.as_ptr(), KernelInode, k_inode) };
>>
>> 	Ummm...  A wrapper would be highly useful.  And the reason why
>> it's safe is different - your function is called only via ->i_op->lookup,
>> the is only one instance of inode_operations that has that ->lookup
>> method, and the only place where an inode gets ->i_op set to that
>> is erofs_fill_inode().  Which is always passed erofs_inode::vfs_inode.
>>
> So my original intention behind this is that all vfs_inodes come from
> that erofs_iget function and it's always gets initialized in this case
> And this just followes the same convention here. I can document this
> more precisely.

I think Al just would like a wrapper here, like the current C EROFS_I().

>>> +    // SAFETY: The super_block is initialized when the erofs_alloc_sbi_rust is called.
>>> +    let sbi = erofs_sbi(unsafe { NonNull::new(k_inode.as_ref().i_sb).unwrap() });
>>
>> 	Again, that calls for a wrapper - this time not erofs-specific;
>> inode->i_sb is *always* non-NULL, is assign-once and always points
>> to live struct super_block instance at least until the call of
>> destroy_inode().
>>
> 
> Will be modified correctly, I'm not a native speaker and I just can't
> find a better way, I will take my note here.

Same here, like the current EROFS_I_SB().

>>> +    // SAFETY: this is backed by qstr which is c representation of a valid slice.
>>
>> 	What is that sentence supposed to mean?  Nevermind "why is it correct"...
>>
>>> +    let name = unsafe {
>>> +        core::str::from_utf8_unchecked(core::slice::from_raw_parts(
>>> +            dentry.as_ref().d_name.name,
>>> +            dentry.as_ref().d_name.__bindgen_anon_1.__bindgen_anon_1.len as usize,
>>

...

> 
>> 	Current erofs_lookup() (and your version as well) *is* indeed
>> safe in that respect, but the proof (from filesystem POV) is that "it's
>> called only as ->lookup() instance, so dentry is initially unhashed
>> negative and will remain such until it's passed to d_splice_alias();
>> until that point it is guaranteed to have ->d_name and ->d_parent stable".

Agreed.

>>
>> 	Note that once you _have_ called d_splice_alias(), you can't
>> count upon the ->d_name stability - or, indeed, upon ->d_name.name you've
>> sampled still pointing to allocated memory.
>>
>> 	For directory-modifying methods it's "stable, since parent is held
>> exclusive".  Some internal function called from different environments?
>> Well...  Swear, look through the call graph and see what can be proven
>> for each.
> 
> Sorry for my ignorance.
> I mean i just borrowed the code from the fs/erofs/namei.c and i directly
> translated that into Rust code. That might be a problem that also
> exists in original working C code.

As for EROFS (an immutable fs), I think after d_splice_alias(), d_name is
still stable (since we don't have rename semantics likewise for now).

But as the generic filesystem POV, d_name access is actually tricky under
RCU walk path indeed.

> 
>> 	Expressing that kind of fun in any kind of annotations (Rust type
>> system included) is not pleasant.  _Probably_ might be handled by a type
>> that would be a dentry pointer with annotation along the lines "->d_name
>> and ->d_parent of that one are stable".  Then e.g. ->lookup() would
>> take that thing as an argument and d_splice_alias() would consume it.
>> ->mkdir() would get the same thing, etc.  I hadn't tried to get that
>> all way through (the amount of annotation churn in existing filesystems
>> would be high and hard to split into reviewable patch series), so there
>> might be dragons - and there definitely are places where the stability is
>> proven in different ways (e.g. if dentry->d_lock is held, we have the damn
>> thing stable; then there's a "take a safe snapshot of name" API; etc.).
> 
> That's kinda interesting, I originally thought that VFS will make sure
> its d_name / d_parent is stable in the first place.
> Again, I just don't have a full picture or understanding of VFS and my
> code is just basic translation of original C code, Maybe we can address
> this later.

d_alloc will allocate an unhashed dentry which is almost unrecognized
by VFS dcache (d_name is stable of course).

After d_splice_alias() and d_add(), rename() could change d_name.  So
either we take d_lock or with rcu_read_lock() to take a snapshot of
d_name in the RCU walk path.  That is my overall understanding.

But for EROFS, since we don't have rename, so it doesn't matter.

Thanks,
Gao Xiang