lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250407104559.GB395307@horms.kernel.org>
Date: Mon, 7 Apr 2025 11:45:59 +0100
From: Simon Horman <horms@...nel.org>
To: Kuniyuki Iwashima <kuniyu@...zon.com>
Cc: "David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Willem de Bruijn <willemb@...gle.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Steve French <sfrench@...ba.org>,
	Enzo Matsumiya <ematsumiya@...e.de>,
	Wang Zhaolong <wangzhaolong1@...wei.com>,
	Kuniyuki Iwashima <kuni1840@...il.com>, netdev@...r.kernel.org,
	stable@...r.kernel.org
Subject: Re: [PATCH v2 net] net: Fix null-ptr-deref by
 sock_lock_init_class_and_name() and rmmod.

On Thu, Apr 03, 2025 at 01:34:31PM -0700, Kuniyuki Iwashima wrote:
> When I ran the repro [0] and waited a few seconds, I observed two
> LOCKDEP splats: a warning immediately followed by a null-ptr-deref. [1]
> 
> Reproduction Steps:
> 
>   1) Mount CIFS
>   2) Add an iptables rule to drop incoming FIN packets for CIFS
>   3) Unmount CIFS
>   4) Unload the CIFS module
>   5) Remove the iptables rule
> 
> At step 3), the CIFS module calls sock_release() for the underlying
> TCP socket, and it returns quickly.  However, the socket remains in
> FIN_WAIT_1 because incoming FIN packets are dropped.
> 
> At this point, the module's refcnt is 0 while the socket is still
> alive, so the following rmmod command succeeds.
> 
>   # ss -tan
>   State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
>   FIN-WAIT-1 0      477        10.0.2.15:51062   10.0.0.137:445
> 
>   # lsmod | grep cifs
>   cifs                 1159168  0
> 
> This highlights a discrepancy between the lifetime of the CIFS module
> and the underlying TCP socket.  Even after CIFS calls sock_release()
> and it returns, the TCP socket does not die immediately in order to
> close the connection gracefully.
> 
> While this is generally fine, it causes an issue with LOCKDEP because
> CIFS assigns a different lock class to the TCP socket's sk->sk_lock
> using sock_lock_init_class_and_name().
> 
> Once an incoming packet is processed for the socket or a timer fires,
> sk->sk_lock is acquired.
> 
> Then, LOCKDEP checks the lock context in check_wait_context(), where
> hlock_class() is called to retrieve the lock class.  However, since
> the module has already been unloaded, hlock_class() logs a warning
> and returns NULL, triggering the null-ptr-deref.
> 
> If LOCKDEP is enabled, we must ensure that a module calling
> sock_lock_init_class_and_name() (CIFS, NFS, etc) cannot be unloaded
> while such a socket is still alive to prevent this issue.
> 
> Let's hold the module reference in sock_lock_init_class_and_name()
> and release it when the socket is freed in __sk_destruct().
> 
> Note that sock_lock_init() clears sk->sk_owner for svc_create_socket()
> that calls sock_lock_init_class_and_name() for a listening socket,
> which clones a socket by sk_clone_lock() without GFP_ZERO.
> 
> [0]:
> CIFS_SERVER="10.0.0.137"
> CIFS_PATH="//${CIFS_SERVER}/Users/Administrator/Desktop/CIFS_TEST"
> DEV="enp0s3"
> CRED="/root/WindowsCredential.txt"
> 
> MNT=$(mktemp -d /tmp/XXXXXX)
> mount -t cifs ${CIFS_PATH} ${MNT} -o vers=3.0,credentials=${CRED},cache=none,echo_interval=1
> 
> iptables -A INPUT -s ${CIFS_SERVER} -j DROP
> 
> for i in $(seq 10);
> do
>     umount ${MNT}
>     rmmod cifs
>     sleep 1
> done
> 
> rm -r ${MNT}
> 
> iptables -D INPUT -s ${CIFS_SERVER} -j DROP
> 
> [1]:
> DEBUG_LOCKS_WARN_ON(1)
> WARNING: CPU: 10 PID: 0 at kernel/locking/lockdep.c:234 hlock_class (kernel/locking/lockdep.c:234 kernel/locking/lockdep.c:223)
> Modules linked in: cifs_arc4 nls_ucs2_utils cifs_md4 [last unloaded: cifs]
> CPU: 10 UID: 0 PID: 0 Comm: swapper/10 Not tainted 6.14.0 #36
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> RIP: 0010:hlock_class (kernel/locking/lockdep.c:234 kernel/locking/lockdep.c:223)
> ...
> Call Trace:
>  <IRQ>
>  __lock_acquire (kernel/locking/lockdep.c:4853 kernel/locking/lockdep.c:5178)
>  lock_acquire (kernel/locking/lockdep.c:469 kernel/locking/lockdep.c:5853 kernel/locking/lockdep.c:5816)
>  _raw_spin_lock_nested (kernel/locking/spinlock.c:379)
>  tcp_v4_rcv (./include/linux/skbuff.h:1678 ./include/net/tcp.h:2547 net/ipv4/tcp_ipv4.c:2350)
> ...
> 
> BUG: kernel NULL pointer dereference, address: 00000000000000c4
>  PF: supervisor read access in kernel mode
>  PF: error_code(0x0000) - not-present page
> PGD 0
> Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 10 UID: 0 PID: 0 Comm: swapper/10 Tainted: G        W          6.14.0 #36
> Tainted: [W]=WARN
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> RIP: 0010:__lock_acquire (kernel/locking/lockdep.c:4852 kernel/locking/lockdep.c:5178)
> Code: 15 41 09 c7 41 8b 44 24 20 25 ff 1f 00 00 41 09 c7 8b 84 24 a0 00 00 00 45 89 7c 24 20 41 89 44 24 24 e8 e1 bc ff ff 4c 89 e7 <44> 0f b6 b8 c4 00 00 00 e8 d1 bc ff ff 0f b6 80 c5 00 00 00 88 44
> RSP: 0018:ffa0000000468a10 EFLAGS: 00010046
> RAX: 0000000000000000 RBX: ff1100010091cc38 RCX: 0000000000000027
> RDX: ff1100081f09ca48 RSI: 0000000000000001 RDI: ff1100010091cc88
> RBP: ff1100010091c200 R08: ff1100083fe6e228 R09: 00000000ffffbfff
> R10: ff1100081eca0000 R11: ff1100083fe10dc0 R12: ff1100010091cc88
> R13: 0000000000000001 R14: 0000000000000000 R15: 00000000000424b1
> FS:  0000000000000000(0000) GS:ff1100081f080000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000000000c4 CR3: 0000000002c4a003 CR4: 0000000000771ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
>  <IRQ>
>  lock_acquire (kernel/locking/lockdep.c:469 kernel/locking/lockdep.c:5853 kernel/locking/lockdep.c:5816)
>  _raw_spin_lock_nested (kernel/locking/spinlock.c:379)
>  tcp_v4_rcv (./include/linux/skbuff.h:1678 ./include/net/tcp.h:2547 net/ipv4/tcp_ipv4.c:2350)
>  ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205 (discriminator 1))
>  ip_local_deliver_finish (./include/linux/rcupdate.h:878 net/ipv4/ip_input.c:234)
>  ip_sublist_rcv_finish (net/ipv4/ip_input.c:576)
>  ip_list_rcv_finish (net/ipv4/ip_input.c:628)
>  ip_list_rcv (net/ipv4/ip_input.c:670)
>  __netif_receive_skb_list_core (net/core/dev.c:5939 net/core/dev.c:5986)
>  netif_receive_skb_list_internal (net/core/dev.c:6040 net/core/dev.c:6129)
>  napi_complete_done (./include/linux/list.h:37 ./include/net/gro.h:519 ./include/net/gro.h:514 net/core/dev.c:6496)
>  e1000_clean (drivers/net/ethernet/intel/e1000/e1000_main.c:3815)
>  __napi_poll.constprop.0 (net/core/dev.c:7191)
>  net_rx_action (net/core/dev.c:7262 net/core/dev.c:7382)
>  handle_softirqs (kernel/softirq.c:561)
>  __irq_exit_rcu (kernel/softirq.c:596 kernel/softirq.c:435 kernel/softirq.c:662)
>  irq_exit_rcu (kernel/softirq.c:680)
>  common_interrupt (arch/x86/kernel/irq.c:280 (discriminator 14))
>   </IRQ>
>  <TASK>
>  asm_common_interrupt (./arch/x86/include/asm/idtentry.h:693)
> RIP: 0010:default_idle (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:92 arch/x86/kernel/process.c:744)
> Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d c3 2b 15 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
> RSP: 0018:ffa00000000ffee8 EFLAGS: 00000202
> RAX: 000000000000640b RBX: ff1100010091c200 RCX: 0000000000061aa4
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff812f30c5
> RBP: 000000000000000a R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>  ? do_idle (kernel/sched/idle.c:186 kernel/sched/idle.c:325)
>  default_idle_call (./include/linux/cpuidle.h:143 kernel/sched/idle.c:118)
>  do_idle (kernel/sched/idle.c:186 kernel/sched/idle.c:325)
>  cpu_startup_entry (kernel/sched/idle.c:422 (discriminator 1))
>  start_secondary (arch/x86/kernel/smpboot.c:315)
>  common_startup_64 (arch/x86/kernel/head_64.S:421)
>  </TASK>
> Modules linked in: cifs_arc4 nls_ucs2_utils cifs_md4 [last unloaded: cifs]
> CR2: 00000000000000c4
> 
> Fixes: ed07536ed673 ("[PATCH] lockdep: annotate nfs/nfsd in-kernel sockets")
> Signed-off-by: Kuniyuki Iwashima <kuniyu@...zon.com>
> Cc: stable@...r.kernel.org
> ---
> v2:
>   * Clear sk_owner in sock_lock_init()
>   * Define helper under the same #if guard
>   * Remove redundant null check before module_put()
> 
> v1: https://lore.kernel.org/netdev/20250403020837.51664-1-kuniyu@amazon.com/
> ---
>  include/net/sock.h | 38 ++++++++++++++++++++++++++++++++++++--
>  net/core/sock.c    |  4 ++++
>  2 files changed, 40 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 8daf1b3b12c6..4216d7d86150 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -547,6 +547,10 @@ struct sock {
>  	struct rcu_head		sk_rcu;
>  	netns_tracker		ns_tracker;
>  	struct xarray		sk_user_frags;
> +
> +#if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(CONFIG_MODULES)
> +	struct module		*sk_owner;
> +#endif

Not a proper review, but FWIIW, sk_owner should be added to the Kernel doc
for struct sock.

>  };
>  
>  struct sock_bh_locked {

...

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ