linux-kernel - Re: [PATCH V3] smb: client: Fix mount deadlock by avoiding super block iteration in DFS reconnect

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f0490a5be0ebe2eac03be48ec787e1ec@manguebit.org>
Date: Thu, 14 Aug 2025 10:47:41 -0300
From: Paulo Alcantara <pc@...guebit.org>
To: Wang Zhaolong <wangzhaolong@...weicloud.com>, sfrench@...ba.org
Cc: linux-cifs@...r.kernel.org, samba-technical@...ts.samba.org,
 linux-kernel@...r.kernel.org, chengzhihao1@...wei.com,
 yi.zhang@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH V3] smb: client: Fix mount deadlock by avoiding super
 block iteration in DFS reconnect

Wang Zhaolong <wangzhaolong@...weicloud.com> writes:

> An AA deadlock occurs when network interruption during mount triggers
> DFS reconnection logic that calls iterate_supers_type().
>
> The detailed call process is as follows:
>
>       mount.cifs
> -------------------------
> path_mount
>   do_new_mount
>     vfs_get_tree
>       smb3_get_tree
>         cifs_smb3_do_mount
>           sget
>             alloc_super
>               down_write_nested(&s->s_umount, ..);  // Hold lock
>           cifs_root_iget
>             cifs_get_inode_info
>               smb2_query_path_info
>                 smb2_compound_op
>                   SMB2_open_init
>                     smb2_plain_req_init
>                       smb2_reconnect           // Trigger reconnection
>                         cifs_tree_connect
>                           cifs_get_dfs_tcon_super
>                             __cifs_get_super
>                               iterate_supers_type
>                                 super_lock_shared
>                                   super_lock
>                                     __super_lock
>                                       down_read(&sb->s_umount); // Deadlock
>     do_new_mount_fc
>       up_write(&sb->s_umount);  // Release lock
>
> During mount phase, if reconnection is triggered, the foreground mount
> process may enter smb2_reconnect prior to the reconnect worker being
> scheduled, leading to a deadlock when subsequent DFS tree connect
> attempts reacquire the s_umount lock.
>
> The essential condition for triggering the issue is that the API
> iterate_supers_type() reacquires the s_umount lock. Therefore, one
> possible solution is to avoid using iterate_supers_type() and instead
> directly access the superblock through internal data structures.
>
> This patch fixes the problem by:
> - Add vfs_sb back-pointer to cifs_sb_info for direct access
> - Protect list traversal with existing tcon->sb_list_lock
> - Use atomic operations to safely manage super block references
> - Remove complex callback-based iteration in favor of simple loop
> - Rename cifs_put_tcp_super() to cifs_put_super() to avoid confusion
>
> Fixes: 3ae872de4107 ("smb: client: fix shared DFS root mounts with different prefixes")
> Signed-off-by: Wang Zhaolong <wangzhaolong@...weicloud.com>
> ---
>
> V3:
>  - Adjust the trace diagram for the super_lock_shared() section to align with
>    the latest mainline call flow. 
> V2:
>  - Adjust the trace diagram in the commit message to indicate when the lock
>    is released
>
>  fs/smb/client/cifs_fs_sb.h |  1 +
>  fs/smb/client/cifsfs.c     |  1 +
>  fs/smb/client/cifsproto.h  |  2 +-
>  fs/smb/client/dfs.c        |  2 +-
>  fs/smb/client/misc.c       | 84 ++++++++++++++------------------------
>  5 files changed, 34 insertions(+), 56 deletions(-)

Thanks for the patch.  Unfortunately this introduces a NULL ptr
dereference with DFS multiuser mounts:

[  725.383434] ==================================================================                                                  
[  725.383647] BUG: KASAN: null-ptr-deref in cifs_tree_connect+0x23c/0xc10 [cifs]                                                  
[  725.383983] Read of size 8 at addr 0000000000000680 by task kworker/2:1/82                                                      
[  725.384189]                                                                                                                     
[  725.384241] CPU: 2 UID: 0 PID: 82 Comm: kworker/2:1 Not tainted 6.17.0-rc1 #2 PREEMPT(voluntary)                                
[  725.384245] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-5.fc42 04/01/2014                                   
[  725.384246] Workqueue: cifsiod smb2_reconnect_server [cifs]                                                                     
[  725.384360] Call Trace:                                                                                                         
[  725.384361]  <TASK>                                                                                                             
[  725.384362]  dump_stack_lvl+0x5d/0x80                                                                                           
[  725.384368]  kasan_report+0xdf/0x1a0                                                                                            
[  725.384371]  ? cifs_tree_connect+0x23c/0xc10 [cifs]                                                                             
[  725.384543]  cifs_tree_connect+0x23c/0xc10 [cifs]                                                                               
[  725.384650]  ? __lock_acquire+0xc46/0x19d0                                                                                      
[  725.384653]  ? rcu_lockdep_current_cpu_online+0x62/0xb0                                                                         
[  725.384656]  ? __pfx_cifs_tree_connect+0x10/0x10 [cifs]                                                                         
[  725.384762]  ? lock_acquire+0x157/0x2e0                                                                                         
[  725.384765]  ? lock_acquire+0x157/0x2e0                                                                                         
[  725.384767]  ? __pfx_do_raw_spin_trylock+0x10/0x10                                                                              
[  725.384769]  ? find_held_lock+0x32/0x90                                                                                         
[  725.384772]  ? cifs_mark_open_files_invalid+0xe5/0x130 [cifs]                                                                   
[  725.384887]  ? local_clock_noinstr+0xd/0xd0                                                                                     
[  725.384889]  ? smb2_reconnect+0x31f/0xe10 [cifs]                                                                                
[  725.385004]  ? local_clock+0x15/0x30                                                                                            
[  725.385006]  ? lock_release+0x29b/0x390                                                                                         
[  725.385009]  smb2_reconnect+0x347/0xe10 [cifs]                                                                                  
[  725.385116]  smb2_reconnect_server+0x413/0xd30 [cifs]                                                                           
[  725.385225]  ? __pfx_smb2_reconnect_server+0x10/0x10 [cifs]                                                                     
[  725.385331]  ? local_clock_noinstr+0xd/0xd0                                                                                     
[  725.385333]  ? local_clock+0x15/0x30                                                                                            
[  725.385334]  ? lock_release+0x29b/0x390                                                                                         
[  725.385337]  process_one_work+0x4c5/0xa10                                                                                       
[  725.385341]  ? __pfx_process_one_work+0x10/0x10                                                                                 
[  725.385343]  ? __list_add_valid_or_report+0x37/0x120                                                                            
[  725.385349]  worker_thread+0x2f1/0x5a0                                                                                          
[  725.385351]  ? __kthread_parkme+0xde/0x100                                                                                      
[  725.385354]  ? __pfx_worker_thread+0x10/0x10                                                                                    
[  725.385356]  kthread+0x1fe/0x380                                                                                                
[  725.385358]  ? kthread+0x10f/0x380                                                                                              
[  725.385360]  ? __pfx_kthread+0x10/0x10                                                                                          
[  725.385363]  ? local_clock_noinstr+0x3e/0xd0                                                                                    
[  725.385365]  ? ret_from_fork+0x1b/0x1f0                                                                                         
[  725.385367]  ? local_clock+0x15/0x30                                                                                            
[  725.385369]  ? lock_release+0x29b/0x390                                                                                         
[  725.385370]  ? rcu_is_watching+0x20/0x50                                                                                        
[  725.385373]  ? __pfx_kthread+0x10/0x10                                                                                          
[  725.385375]  ret_from_fork+0x15b/0x1f0                                                                                          
[  725.385377]  ? __pfx_kthread+0x10/0x10                                                                                          
[  725.385380]  ret_from_fork_asm+0x1a/0x30
[  725.385384]  </TASK>

$ ./scripts/faddr2line --list fs/smb/client/cifs.o cifs_tree_connect+0x23c
cifs_tree_connect+0x23c/0xc10:

CIFS_SB at /home/pc/g/linux/fs/smb/client/cifsglob.h:1624
 1619   }
 1620 
 1621   static inline struct cifs_sb_info *
 1622   CIFS_SB(struct super_block *sb)
 1623   {
>1624<          return sb->s_fs_info;
 1625   }
 1626 
 1627   static inline struct cifs_sb_info *
 1628   CIFS_FILE_SB(struct file *file)
 1629   {

(inlined by) cifs_tree_connect at /home/pc/g/linux/fs/smb/client/dfs.c:435
 430                    goto out;
 431            }
 432 
 433            sb = cifs_get_dfs_tcon_super(tcon);
 434            if (!IS_ERR(sb))
>435<                   cifs_sb = CIFS_SB(sb);
 436 
 437            /* Tree connect to last share in @tcon->tree_name if no DFS referral */
 438            if (!server->leaf_fullpath ||
 439                dfs_cache_noreq_find(server->leaf_fullpath + 1, &ref, &tl)) {
 440                    rc = ops->tree_connect(xid, tcon->ses, tcon->tree_name,

You can reproduce with the following:

$ kinit administrator@...DA.TEST
$ su testuser -c 'kinit administrator@...DA.TEST'
$ mount.cifs //w22-dc1.zelda.test/dfstest/link2 /mnt/1 -o sec=krb5,multiuser,echo_interval=10
$ mount -t cifs
//w22-dc1.zelda.test/dfstest/link2 on /mnt/1 type cifs (rw,relatime,vers=3.1.1,sec=krb5,cruid=0,cache=strict,upcall_target=app,multiuser,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.124.33,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,noperm,reparse=nfs,nativesocket,symlink=native,rsize=4194304,wsize=4194304,bsize=1048576,retrans=1,echo_interval=10,actimeo=1,closetimeo=1)
$ su testuser -c 'ls /mnt/1'
dir1  dir10  dir3  dir5  dir6  dir8  dir9  target1_file.txt  tsub
# disconnect target server 192.168.124.33 and then wait for oops...

The problem seems related to the tcon created for testuser not having a
link to the CIFS superblock, hence the oops when reconnecting the tcon.

Could you please verify?  Thanks.