Message-ID: <cb5993a4-9b00-2c9a-60ca-9cfa4c5c15b3@gmail.com>
Date: Mon, 4 Mar 2019 07:51:07 -0800
From: Eric Dumazet <eric.dumazet@...il.com>
To: Vasily Averin <vvs@...tuozzo.com>,
Eric Dumazet <edumazet@...gle.com>
Cc: netdev <netdev@...r.kernel.org>, Al Viro <viro@...iv.linux.org.uk>
Subject: Re: [PATCH] tcp: detect use of sendpage for slab-based objects
On 03/04/2019 04:58 AM, Vasily Averin wrote:
> On 2/21/19 7:00 PM, Eric Dumazet wrote:
>> On Thu, Feb 21, 2019 at 7:30 AM Vasily Averin <vvs@...tuozzo.com> wrote:
>>>
>>> There were a few incidents in which XFS over a network block device
>>> generated I/O requests with slab-based metadata. If these requests are
>>> processed via the sendpage path, tcp_sendpage() calls skb_can_coalesce()
>>> and merges neighbouring slab objects into one skb fragment.
>>>
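>>> skb_can_coalesce() only checks that the new chunk is physically contiguous
>>> with the previous fragment, so two adjacent slab objects merge silently.
>>> Simplified from include/linux/skbuff.h, the check is roughly:
>>>
>>> 	static inline bool skb_can_coalesce(struct sk_buff *skb, int i,
>>> 					    const struct page *page, int off)
>>> 	{
>>> 		if (i) {
>>> 			const skb_frag_t *frag = &skb_shinfo(skb)->frags[i - 1];
>>>
>>> 			/* Same page and directly adjacent: extend the frag,
>>> 			 * even if the chunks are distinct kmalloc objects.
>>> 			 */
>>> 			return page == skb_frag_page(frag) &&
>>> 			       off == frag->page_offset + skb_frag_size(frag);
>>> 		}
>>> 		return false;
>>> 	}
>>>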
>>> If the receiving side is located on the same host, tcp_recvmsg() can
>>> trigger the following BUG_ON:
>>> usercopy: kernel memory exposure attempt detected
>>> from XXXXXX (kmalloc-512) (1024 bytes)
>>>
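>>> The report comes from hardened usercopy (CONFIG_HARDENED_USERCOPY), which
>>> refuses copies that cross slab object boundaries. A simplified sketch of
>>> the idea (names approximate, see mm/usercopy.c and the per-allocator
>>> __check_heap_object()):
>>>
>>> 	/* Hypothetical simplified check: a fragment coalesced across two
>>> 	 * kmalloc objects makes offset + n exceed the object size.
>>> 	 */
>>> 	static void check_heap_object_sketch(struct kmem_cache *s,
>>> 					     struct page *page,
>>> 					     const void *ptr, unsigned long n)
>>> 	{
>>> 		unsigned long offset = (ptr - page_address(page)) % s->size;
>>>
>>> 		if (offset + n > s->object_size)
>>> 			usercopy_abort("SLUB object", s->name, true, offset, n);
>>> 	}
>>>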
>>> This patch helps to detect the cause of similar incidents on the sending side.
>>>
>>> Signed-off-by: Vasily Averin <vvs@...tuozzo.com>
>>> ---
>>> net/ipv4/tcp.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>> index 2079145a3b7c..cf9572f4fc0f 100644
>>> --- a/net/ipv4/tcp.c
>>> +++ b/net/ipv4/tcp.c
>>> @@ -996,6 +996,7 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
>>>  			goto wait_for_memory;
>>>  
>>>  		if (can_coalesce) {
>>> +			WARN_ON_ONCE(PageSlab(page));
>>
>> Please use VM_WARN_ON_ONCE() to make this a nop for CONFIG_DEBUG_VM=n.
>> Also, the whole tcp_sendpage() should be protected, not only the coalescing part.
>> (The get_page() done a few lines later should not be attempted either.)
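>>
>> For reference, VM_WARN_ON_ONCE() compiles away when CONFIG_DEBUG_VM is not
>> set; from include/linux/mmdebug.h, roughly:
>>
>> 	#ifdef CONFIG_DEBUG_VM
>> 	#define VM_WARN_ON_ONCE(cond)	(void)WARN_ON_ONCE(cond)
>> 	#else
>> 	/* Type-checks 'cond' at build time, generates no code */
>> 	#define VM_WARN_ON_ONCE(cond)	BUILD_BUG_ON_INVALID(cond)
>> 	#endif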
>
> Eric, what do you think about the following patch?
> I validated its backported version on a RHEL7-based OpenVZ kernel before sending it to mainline.
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index cf3c5095c10e..7be7b6abe8b5 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -943,6 +943,11 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
>  	ssize_t copied;
>  	long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
>  
> +	if (PageSlab(page)) {
> +		VM_WARN_ONCE(true, "sendpage should not handle Slab objects,"
> +			     " please fix callers\n");
> +		return sock_no_sendpage_locked(sk, page, offset, size, flags);
> +	}
>  	/* Wait for a connection to finish. One exception is TCP Fast Open
>  	 * (passive side) where data is allowed to be sent before a connection
>  	 * is fully established.
>
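> sock_no_sendpage_locked() copies the data through the sendmsg path instead of
> taking a page reference; from net/core/sock.c, roughly:
>
> 	ssize_t sock_no_sendpage_locked(struct sock *sk, struct page *page,
> 					int offset, size_t size, int flags)
> 	{
> 		ssize_t res;
> 		struct msghdr msg = { .msg_flags = flags };
> 		struct kvec iov;
> 		char *kaddr = kmap(page);
>
> 		/* Data is copied, so the slab object is never exposed
> 		 * in an skb fragment.
> 		 */
> 		iov.iov_base = kaddr + offset;
> 		iov.iov_len = size;
> 		res = kernel_sendmsg_locked(sk, &msg, &iov, 1, size);
> 		kunmap(page);
> 		return res;
> 	}
>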
There are at least four problems with this approach:

1) VM_WARN_ONCE() might be a NOP, and even when it is not, it only adds a few
lines to syslog, among thousands.

2) Falling back gives callers no incentive to fix their code.

3) It slows down TCP just to accommodate a few odd kernel users.
I agree with adding sanity checks for everything user space can think of
(aka syzbot), but kernel users need to be fixed, without adding code to TCP.

4) The sendpage() API provides one page at a time, so we call the very
expensive lock_sock() and release_sock() for every page (see the sketch below).
sendfile() is suboptimal compared to sendmsg(MSG_ZEROCOPY), and there is an
effort to provide batches of pages per round; your patch would cancel this
effort, or make it very complicated.
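
To make the per-page cost concrete: each sendpage() call round-trips the
socket lock. From net/ipv4/tcp.c, roughly:

	int tcp_sendpage(struct sock *sk, struct page *page, int offset,
			 size_t size, int flags)
	{
		int ret;

		/* Taken and released once per page pushed by sendfile() */
		lock_sock(sk);
		ret = tcp_sendpage_locked(sk, page, offset, size, flags);
		release_sock(sk);

		return ret;
	}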