netdev - Re: [PATCH net-next] net: stream: don't purge sk_error_queue in sk_stream_kill

Message-ID: <bb4e66df-7639-0797-49ed-0909fb83a85a@gmail.com>
Date:   Fri, 15 Oct 2021 12:59:00 -0700
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     Jakub Kicinski <kuba@...nel.org>, davem@...emloft.net
Cc:     netdev@...r.kernel.org
Subject: Re: [PATCH net-next] net: stream: don't purge sk_error_queue in
 sk_stream_kill_queues()



On 10/15/21 6:37 AM, Jakub Kicinski wrote:
> sk_stream_kill_queues() can be called on close when there are
> still outstanding skbs to transmit. Those skbs may try to queue
> notifications to the error queue (e.g. timestamps).
> If sk_stream_kill_queues() purges the queue without taking
> its lock the queue may get corrupted, and skbs leaked.
> 
> This shows up as a warning about an rmem leak:
> 
> WARNING: CPU: 24 PID: 0 at net/ipv4/af_inet.c:154 inet_sock_destruct+0x...
> 
> The leak is always a multiple of 0x300 bytes (the value is in
> %rax on my builds, so RAX: 0000000000000300). 0x300 is truesize of
> an empty sk_buff. Indeed if we dump the socket state at the time
> of the warning the sk_error_queue is often (but not always)
> corrupted. The ->next pointer points back at the list head,
> but not the ->prev pointer. Indeed we can find the leaked skb
> by scanning the kernel memory for something that looks like
> an skb with ->sk = socket in question, and ->truesize = 0x300.
> The contents of ->cb[] of the skb confirms the suspicion that
> it is indeed a timestamp notification (as generated in
> __skb_complete_tx_timestamp()).
> 
> Removing purging of sk_error_queue should be okay, since
> inet_sock_destruct() does it again once all socket refs
> are gone. Eric suggests this may cause sockets that go
> through disconnect() to retain notifications from the
> previous incarnations of the socket, but that should be
> okay since the race was there anyway, and disconnect()
> is not exactly dependable.
> 
> Thanks to Jonathan Lemon and Omar Sandoval for help at various
> stages of tracing the issue.
> 
> Fixes: cb9eff097831 ("net: new user space API for time stamping of incoming and outgoing packets")
> Signed-off-by: Jakub Kicinski <kuba@...nel.org>
> ---
> v1: delete the purge completely
> 
> Sorry for the delay from RFC, took a while to get enough
> production signal to confirm the fix.
> ---
>  net/core/stream.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/net/core/stream.c b/net/core/stream.c
> index e09ffd410685..06b36c730ce8 100644
> --- a/net/core/stream.c
> +++ b/net/core/stream.c
> @@ -195,9 +195,6 @@ void sk_stream_kill_queues(struct sock *sk)
>  	/* First the read buffer. */
>  	__skb_queue_purge(&sk->sk_receive_queue);
>  
> -	/* Next, the error queue. */
> -	__skb_queue_purge(&sk->sk_error_queue);
> -
>  	/* Next, the write queue. */
>  	WARN_ON(!skb_queue_empty(&sk->sk_write_queue));
>  
> 

Thanks, Jakub!

Reviewed-by: Eric Dumazet <edumazet@...gle.com>