[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <875yxrs2sc.fsf@cloudflare.com>
Date: Sat, 03 Jul 2021 19:52:35 +0200
From: Jakub Sitnicki <jakub@...udflare.com>
To: John Fastabend <john.fastabend@...il.com>
Cc: Cong Wang <xiyou.wangcong@...il.com>, netdev@...r.kernel.org,
bpf@...r.kernel.org, Cong Wang <cong.wang@...edance.com>,
Jiang Wang <jiang.wang@...edance.com>,
Daniel Borkmann <daniel@...earbox.net>,
Lorenz Bauer <lmb@...udflare.com>
Subject: Re: [Patch bpf v2] skmsg: check sk_rcvbuf limit before queuing to
ingress_skb
On Thu, Jul 01, 2021 at 06:23 PM CEST, John Fastabend wrote:
[...]
>> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
>> index 9b6160a191f8..a5185c781332 100644
>> --- a/net/core/skmsg.c
>> +++ b/net/core/skmsg.c
>> @@ -854,7 +854,8 @@ static int sk_psock_skb_redirect(struct sk_psock *from, struct sk_buff *skb)
>> return -EIO;
>> }
>> spin_lock_bh(&psock_other->ingress_lock);
>> - if (!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) {
>> + if (!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED) ||
>> + atomic_read(&sk_other->sk_rmem_alloc) > READ_ONCE(sk_other->sk_rcvbuf)) {
>> spin_unlock_bh(&psock_other->ingress_lock);
>> skb_bpf_redirect_clear(skb);
>> sock_drop(from->sk, skb);
>> @@ -930,7 +931,8 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb,
>> }
>> if (err < 0) {
>> spin_lock_bh(&psock->ingress_lock);
>> - if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
>> + if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED) &&
>> + atomic_read(&sk_other->sk_rmem_alloc) <= READ_ONCE(sk_other->sk_rcvbuf)) {
>> skb_queue_tail(&psock->ingress_skb, skb);
>
> We can't just drop the packet in the memory overrun case here. This will
> break TCP because the data will be gone and no one will retransmit.
I don't think it's always the case that data will be gone. But you're
right that it breaks TCP. I was too quick to Ack this patch.
When running with just the verdict prog attached, the -EIO error from
sk_psock_verdict_apply is propagated up to tcp_read_sock. That is, it
maps to 0 bytes used by recv_actor. sk_psock_verdict_recv in this case.
tcp_read_sock, if 0 bytes were used = copied, won't sk_eat_skb. It stays
on sk_receive_queue.
sk->sk_data_ready
sk_psock_verdict_data_ready
->read_sock(..., sk_psock_verdict_recv)
tcp_read_sock (used = copied = 0)
sk_psock_verdict_recv -> ret = 0
sk_psock_verdict_apply -> -EIO
sk_psock_skb_redirect -> -EIO
However, I think this gets us stuck. What if no more data gets queued,
and sk_data_ready doesn't get called again?
Then there is the case when a parser prog is attached. In this case the
skb is really gone if we drop it on redirect.
In sk_psock_strp_read, we ignore the -EIO error from
sk_psock_verdict_apply, and return to tcp_read_sock how many bytes have
been parsed.
sk->sk_data_ready
sk_psock_verdict_data_ready
->read_sock(..., sk_psock_verdict_recv)
tcp_read_sock (used = copied = eaten)
strp_recv -> ret = eaten
__strp_recv -> ret = eaten
strp->cb.rcv_msg -> -EIO
sk_psock_verdict_apply -> -EIO
sk_psock_redirect -> -EIO
Maybe we could put the skb back on strp->skb_head list on error, though?
But again some notification would need to trigger a re-read, or we are
stuck.
Powered by blists - more mailing lists