Open Source and information security mailing list archives
 
Message-ID: <aD6Y7b8xnObUjeJn@mev-dev.igk.intel.com>
Date: Tue, 3 Jun 2025 08:40:45 +0200
From: Michal Swiatkowski <michal.swiatkowski@...ux.intel.com>
To: Antonio Quartulli <antonio@...nvpn.net>
Cc: netdev@...r.kernel.org, Sabrina Dubroca <sd@...asysnail.net>,
	"David S . Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Oleksandr Natalenko <oleksandr@...alenko.name>,
	Qingfang Deng <dqfext@...il.com>,
	Gert Doering <gert@...enie.muc.de>
Subject: Re: [PATCH net 2/5] ovpn: ensure sk is still valid during cleanup

On Fri, May 30, 2025 at 12:12:51PM +0200, Antonio Quartulli wrote:
> Removing a peer while userspace attempts to close its transport
> socket triggers a race condition resulting in the following
> crash:
> 
> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
> KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
> CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G           O        6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
> Tainted: [O]=OOT_MODULE
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
> Workqueue: events ovpn_peer_keepalive_work [ovpn]
> RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
> Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
> RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
> RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
> RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
> RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
> R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
> R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
> FS:  0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
> PKRU: 55555554
> Call Trace:
>  <TASK>
>  unlock_ovpn+0x8b/0xe0 [ovpn]
>  ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
>  ? ovpn_peers_free+0x780/0x780 [ovpn]
>  ? lock_acquire+0x56/0x70
>  ? process_one_work+0x888/0x1740
>  process_one_work+0x933/0x1740
>  ? pwq_dec_nr_in_flight+0x10b0/0x10b0
>  ? move_linked_works+0x12d/0x2c0
>  ? assign_work+0x163/0x270
>  worker_thread+0x4d6/0xd90
>  ? preempt_count_sub+0x4c/0x70
>  ? process_one_work+0x1740/0x1740
>  kthread+0x36c/0x710
>  ? trace_preempt_on+0x8c/0x1e0
>  ? kthread_is_per_cpu+0xc0/0xc0
>  ? preempt_count_sub+0x4c/0x70
>  ? _raw_spin_unlock_irq+0x36/0x60
>  ? calculate_sigpending+0x7b/0xa0
>  ? kthread_is_per_cpu+0xc0/0xc0
>  ret_from_fork+0x3a/0x80
>  ? kthread_is_per_cpu+0xc0/0xc0
>  ret_from_fork_asm+0x11/0x20
>  </TASK>
> Modules linked in: ovpn(O)
> 
> This happens because the peer deletion operation reaches
> ovpn_socket_release() while ovpn_sock->sock (struct socket *)
> and its sk member (struct sock *) are both still valid.
> Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
> becomes NULL, due to the concurrent socket close triggered
> from userspace.
> 
> After having invoked synchronize_rcu(), ovpn_socket_release() will
> attempt to dereference ovpn_sock->sock->sk, triggering the crash
> reported above.
> 
> The reason for accessing sk is that we need to retrieve its
> protocol and continue the cleanup routine accordingly.
> 
> This crash can be easily reproduced by running openvpn userspace in
> client mode with `--keepalive 10 20`, while entirely omitting this
> option on the server side.
> After 20 seconds ovpn will assume the peer (server) to be dead,
> will start removing it and will notify userspace. The latter will
> receive the notification and close the transport socket, thus
> triggering the crash.
> 
> To fix the race condition for good, we need to refactor struct ovpn_socket.
> Since ovpn is always only interested in the sock->sk member (struct sock *)
> we can directly hold a reference to it, rather than accessing it via
> its struct socket container.
> 
> This means changing "struct socket *ovpn_socket->sock" to
> "struct sock *ovpn_socket->sk".
> 
> While acquiring a reference to sk, we can increase its refcounter
> without affecting the socket close()/destroy() notification
> (which we rely on when userspace closes a socket we are using).
> 
> By increasing sk's refcounter we know we can dereference it
> in ovpn_socket_release() without running into any race condition
> anymore.
> 
> ovpn_socket_release() will ultimately decrease the reference
> counter.
> 
> Cc: Oleksandr Natalenko <oleksandr@...alenko.name>
> Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
> Reported-by: Qingfang Deng <dqfext@...il.com>
> Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
> Tested-by: Gert Doering <gert@...enie.muc.de>
> Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
> Signed-off-by: Antonio Quartulli <antonio@...nvpn.net>
> ---
>  drivers/net/ovpn/io.c      |  8 ++---
>  drivers/net/ovpn/netlink.c | 16 ++++-----
>  drivers/net/ovpn/peer.c    |  4 +--
>  drivers/net/ovpn/socket.c  | 68 +++++++++++++++++++++-----------------
>  drivers/net/ovpn/socket.h  |  4 +--
>  drivers/net/ovpn/tcp.c     | 65 ++++++++++++++++++------------------
>  drivers/net/ovpn/tcp.h     |  3 +-
>  drivers/net/ovpn/udp.c     | 34 +++++++------------
>  drivers/net/ovpn/udp.h     |  4 +--
>  9 files changed, 102 insertions(+), 104 deletions(-)
> 

Thanks for the detailed description in the commit message. The changes
look fine to me.

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@...ux.intel.com>

> -- 
> 2.49.0
