[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200725055840.GD1047853@kroah.com>
Date: Sat, 25 Jul 2020 07:58:40 +0200
From: Greg KH <greg@...ah.com>
To: Dexuan Cui <decui@...rosoft.com>
Cc: Al Viro <viro@...iv.linux.org.uk>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"stable@...r.kernel.org" <stable@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
'Eric Dumazet' <edumazet@...gle.com>,
'Willy Tarreau' <w@....eu>,
Joseph Salisbury <Joseph.Salisbury@...rosoft.com>,
Michael Kelley <mikelley@...rosoft.com>
Subject: Re: UDP data corruption in v4.4
On Sat, Jul 25, 2020 at 02:21:06AM +0000, Dexuan Cui wrote:
> Hi,
> The v4.4 stable kernel (currently it's v4.4.231) lacks this bugfix:
> 327868212381 ("make skb_copy_datagram_msg() et.al. preserve ->msg_iter on error")
> , as a result, the v4.4 kernel can deliver corrupt data to the application
> when a corrupt UDP packet is closely followed by a valid UDP packet:
> when the same invocation of the recvmsg() syscall delivers the corrupt
> packet's UDP payload to the application's receive buffer, it provides the
> UDP payload length and the "from IP/Port" of the valid packet to the
> application -- this mismatch makes the issue worse.
>
> Details:
>
> For a UDP packet longer than 76 bytes (see the v5.8-rc6 kernel's
> include/linux/skbuff.h:3951), Linux delays the UDP checksum verification
> until the application invokes the syscall recvmsg().
>
> In the recvmsg() syscall handler, while Linux is copying the UDP payload
> to the application's memory, it calculates the UDP checksum. If the
> calculated checksum doesn't match the received checksum, Linux drops the
> corrupt UDP packet, and then starts to process the next packet (if any),
> and if the next packet is valid (i.e. the checksum is correct), Linux will
> copy the valid UDP packet's payload to the application's receiver buffer.
>
> The bug is: before Linux starts to copy the valid UDP packet, the data
> structure used to track how many more bytes should be copied to the
> application memory is not reset to what it was when the application just
> entered the kernel by the syscall! Consequently, only a small portion or
> none of the valid packet's payload is copied to the application's receive
> buffer, and later when the application exits from the kernel, actually
> most of the application's receive buffer contains the payload of the
> corrupt packet while recvmsg() returns the length of the UDP payload of
> the valid packet.
>
> For the mainline kernel, the bug was fixed by Al Viro in the commit
> 327868212381, but unluckily the bugfix is only backported to the
> upstream v4.9+ kernels. I hope the bugfix can be backported to the
> v4.4 stable kernel, since it's a "longterm" kernel and is still used by
> some Linux distros.
>
> It turns out backporting 327868212381 to v4.4 means that some
> Supporting patches must be backported first, so the overall changes
> are pretty big...
>
> I made the below one-line workaround patch to force the recvmsg() syscall
> handler to return to the userspace when Linux detects a corrupt UDP packet,
> so the application will invoke the syscall again to receive the following valid
> UDP packet (note: the patch may not work well with blocking sockets, for
> which typically the application doesn't expect an error of -EAGAIN. I
> guess it would be safer to return -EINTR instead?):
>
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1367,6 +1367,7 @@ csum_copy_err:
> /* starting over for a new packet, but check if we need to yield */
> cond_resched();
> msg->msg_flags &= ~MSG_TRUNC;
> + return -EAGAIN;
> goto try_again;
> }
>
>
> Eric Dumazet made an alternative that performs the csum validation earlier:
>
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1589,8 +1589,7 @@ int udp_queue_rcv_skb(struct sock *sk, struct
> sk_buff *skb)
> }
> }
>
> - if (rcu_access_pointer(sk->sk_filter) &&
> - udp_lib_checksum_complete(skb))
> + if (udp_lib_checksum_complete(skb))
> goto csum_error;
>
> if (sk_rcvqueues_full(sk, sk->sk_rcvbuf)) {
>
> I personally like Eric's fix and IMHO we'd better have it in v4.4 rather than
> trying to backport 327868212381.
Does Eric's fix work with your testing? If so, great, can you turn it
into something I can apply to the 4.4.y stable tree and send it to
stable@...r.kernel.org?
thanks,
greg k-h
Powered by blists - more mailing lists