Message-ID: <aD6Y7b8xnObUjeJn@mev-dev.igk.intel.com>
Date: Tue, 3 Jun 2025 08:40:45 +0200
From: Michal Swiatkowski <michal.swiatkowski@...ux.intel.com>
To: Antonio Quartulli <antonio@...nvpn.net>
Cc: netdev@...r.kernel.org, Sabrina Dubroca <sd@...asysnail.net>,
"David S . Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Oleksandr Natalenko <oleksandr@...alenko.name>,
Qingfang Deng <dqfext@...il.com>,
Gert Doering <gert@...enie.muc.de>
Subject: Re: [PATCH net 2/5] ovpn: ensure sk is still valid during cleanup
On Fri, May 30, 2025 at 12:12:51PM +0200, Antonio Quartulli wrote:
> Removing a peer while userspace attempts to close its transport
> socket triggers a race condition resulting in the following
> crash:
>
> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
> KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
> CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
> Tainted: [O]=OOT_MODULE
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
> Workqueue: events ovpn_peer_keepalive_work [ovpn]
> RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
> Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
> RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
> RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
> RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
> RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
> R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
> R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
> FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
> PKRU: 55555554
> Call Trace:
> <TASK>
> unlock_ovpn+0x8b/0xe0 [ovpn]
> ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
> ? ovpn_peers_free+0x780/0x780 [ovpn]
> ? lock_acquire+0x56/0x70
> ? process_one_work+0x888/0x1740
> process_one_work+0x933/0x1740
> ? pwq_dec_nr_in_flight+0x10b0/0x10b0
> ? move_linked_works+0x12d/0x2c0
> ? assign_work+0x163/0x270
> worker_thread+0x4d6/0xd90
> ? preempt_count_sub+0x4c/0x70
> ? process_one_work+0x1740/0x1740
> kthread+0x36c/0x710
> ? trace_preempt_on+0x8c/0x1e0
> ? kthread_is_per_cpu+0xc0/0xc0
> ? preempt_count_sub+0x4c/0x70
> ? _raw_spin_unlock_irq+0x36/0x60
> ? calculate_sigpending+0x7b/0xa0
> ? kthread_is_per_cpu+0xc0/0xc0
> ret_from_fork+0x3a/0x80
> ? kthread_is_per_cpu+0xc0/0xc0
> ret_from_fork_asm+0x11/0x20
> </TASK>
> Modules linked in: ovpn(O)
>
> This happens because the peer deletion operation reaches
> ovpn_socket_release() while ovpn_sock->sock (struct socket *)
> and its sk member (struct sock *) are still both valid.
> Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
> becomes NULL, due to the concurrent socket closing triggered
> from userspace.
>
> After having invoked synchronize_rcu(), ovpn_socket_release() will
> attempt dereferencing ovpn_sock->sock->sk, triggering the crash
> reported above.
>
> The reason for accessing sk is that we need to retrieve its
> protocol and continue the cleanup routine accordingly.
>
> This crash can be easily produced by running openvpn userspace in
> client mode with `--keepalive 10 20`, while entirely omitting this
> option on the server side.
> After 20 seconds ovpn will assume the peer (server) to be dead,
> will start removing it and will notify userspace. The latter will
> receive the notification and close the transport socket, thus
> triggering the crash.
>
> To fix the race condition for good, we need to refactor struct ovpn_socket.
> Since ovpn is only ever interested in the sock->sk member (struct sock *),
> we can directly hold a reference to it, rather than accessing it via
> its struct socket container.
>
> This means changing "struct socket *ovpn_socket->sock" to
> "struct sock *ovpn_socket->sk".
>
> While acquiring a reference to sk, we can increase its refcounter
> without affecting the socket close()/destroy() notification
> (which we rely on when userspace closes a socket we are using).
>
> By increasing sk's refcounter we know we can dereference it
> in ovpn_socket_release() without running into any race condition.
>
> ovpn_socket_release() will ultimately decrease the reference
> counter.
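
For readers following the thread, the refcounting scheme described above can
be modelled in plain userspace C. This is only a minimal sketch under stated
assumptions, not the code from the patch: every model_* name is hypothetical,
the structs are stand-ins for struct sock, struct socket and struct
ovpn_socket, and the kernel's own locking and RCU handling are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sock: a refcount and a protocol. */
struct model_sk {
	atomic_int refcnt;
	int protocol;
};

/* Hypothetical stand-in for struct socket: its sk pointer is cleared on close. */
struct model_socket {
	struct model_sk *sk;
};

/* Hypothetical stand-in for the refactored ovpn_socket: it points at sk directly. */
struct model_ovpn_socket {
	struct model_sk *sk;
};

/* Allocate an sk owned by its container (refcount starts at 1). */
static struct model_sk *model_sk_alloc(int protocol)
{
	struct model_sk *sk = malloc(sizeof(*sk));

	if (!sk)
		return NULL;
	atomic_init(&sk->refcnt, 1);
	sk->protocol = protocol;
	return sk;
}

static void model_sk_hold(struct model_sk *sk)
{
	atomic_fetch_add(&sk->refcnt, 1);
}

/* Drop one reference; returns 1 if this was the last one and sk was freed. */
static int model_sk_put(struct model_sk *sk)
{
	if (atomic_fetch_sub(&sk->refcnt, 1) == 1) {
		free(sk);
		return 1;
	}
	return 0;
}

/* Attach: take our own reference instead of remembering the container. */
static void model_attach(struct model_ovpn_socket *os, struct model_socket *sock)
{
	model_sk_hold(sock->sk);
	os->sk = sock->sk;
}

/* Models the userspace close(): the container forgets sk and drops its ref. */
static int model_close(struct model_socket *sock)
{
	struct model_sk *sk = sock->sk;

	sock->sk = NULL;
	return model_sk_put(sk);
}

/*
 * Models the release path: the protocol is read through our own pointer,
 * which stays valid even after model_close(), then our reference is dropped.
 */
static int model_release(struct model_ovpn_socket *os, int *freed)
{
	int protocol = os->sk->protocol;

	*freed = model_sk_put(os->sk);
	os->sk = NULL;
	return protocol;
}
```

In this model, calling model_close() before model_release() leaves the
container's sk pointer NULL, yet the release path can still read the protocol
because it never goes through the container.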
>
> Cc: Oleksandr Natalenko <oleksandr@...alenko.name>
> Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
> Reported-by: Qingfang Deng <dqfext@...il.com>
> Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
> Tested-by: Gert Doering <gert@...enie.muc.de>
> Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
> Signed-off-by: Antonio Quartulli <antonio@...nvpn.net>
> ---
> drivers/net/ovpn/io.c | 8 ++---
> drivers/net/ovpn/netlink.c | 16 ++++-----
> drivers/net/ovpn/peer.c | 4 +--
> drivers/net/ovpn/socket.c | 68 +++++++++++++++++++++-----------------
> drivers/net/ovpn/socket.h | 4 +--
> drivers/net/ovpn/tcp.c | 65 ++++++++++++++++++------------------
> drivers/net/ovpn/tcp.h | 3 +-
> drivers/net/ovpn/udp.c | 34 +++++++------------
> drivers/net/ovpn/udp.h | 4 +--
> 9 files changed, 102 insertions(+), 104 deletions(-)
>
Thanks for the detailed description in the commit message. The changes look
fine to me.
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@...ux.intel.com>
> --
> 2.49.0