lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <675b8f8f65e28_ff0720890@john.notmuch>
Date: Thu, 12 Dec 2024 17:36:15 -0800
From: John Fastabend <john.fastabend@...il.com>
To: Jiayuan Chen <mrpre@....com>, 
 John Fastabend <john.fastabend@...il.com>
Cc: bpf@...r.kernel.org, 
 martin.lau@...ux.dev, 
 ast@...nel.org, 
 edumazet@...gle.com, 
 jakub@...udflare.com, 
 davem@...emloft.net, 
 dsahern@...nel.org, 
 kuba@...nel.org, 
 pabeni@...hat.com, 
 linux-kernel@...r.kernel.org, 
 song@...nel.org, 
 andrii@...nel.org, 
 mhal@...x.co, 
 yonghong.song@...ux.dev, 
 daniel@...earbox.net, 
 xiyou.wangcong@...il.com, 
 horms@...nel.org
Subject: Re: [PATCH bpf v2 1/2] bpf: fix wrong copied_seq calculation

Jiayuan Chen wrote:
> On Tue, Dec 10, 2024 at 06:11:26PM -0800, John Fastabend wrote:
> > Jiayuan Chen wrote:
> > > 'sk->copied_seq' was updated in the tcp_eat_skb() function when the
> > > action of a BPF program was SK_REDIRECT. For other actions, like SK_PASS,
> > > the update logic for 'sk->copied_seq' was moved to
> > > tcp_bpf_recvmsg_parser() to ensure the accuracy of the 'fionread' feature.
> > > 
> > > It works for a single stream_verdict scenario, as it also modified
> > > 'sk_data_ready->sk_psock_verdict_data_ready->tcp_read_skb'
> > > to remove updating 'sk->copied_seq'.
> > > 
> > > However, for programs where both stream_parser and stream_verdict are
> > > active(strparser purpose), tcp_read_sock() was used instead of
> > > tcp_read_skb() (sk_data_ready->strp_data_ready->tcp_read_sock)
> > > tcp_read_sock() now still update 'sk->copied_seq', leading to duplicated
> > > updates.
> > > 
> > > In summary, for strparser + SK_PASS, copied_seq is redundantly calculated
> > > in both tcp_read_sock() and tcp_bpf_recvmsg_parser().
> > > 
> > > The issue causes incorrect copied_seq calculations, which prevent
> > > correct data reads from the recv() interface in user-land.
> > > 
> > > Modifying tcp_read_sock() or strparser implementation directly is
> > > unreasonable, as it is widely used in other modules.
> > > 
> > > Here, we introduce a method tcp_bpf_read_sock() to replace
> > > 'sk->sk_socket->ops->read_sock' (like 'tls_build_proto()' does in
> > > tls_main.c). Such replacement action was also used in updating
> > > tcp_bpf_prots in tcp_bpf.c, so it's not weird.
> > > (Note that checkpatch.pl may complain missing 'const' qualifier when we
> > > define the bpf-specified 'proto_ops', but we have to do because we need
> > > update it).
> > > 
> > > Also we remove strparser check in tcp_eat_skb() since we implement custom
> > > function tcp_bpf_read_sock() without copied_seq updating.
> > > 
> > > Since strparser currently supports only TCP, it's sufficient for 'ops' to
> > > inherit inet_stream_ops.
> > > 
> > > In strparser's implementation, regardless of partial or full reads,
> > > it completely clones the entire skb, allowing us to unconditionally
> > > free skb in tcp_bpf_read_sock().
> > > 
> > > Fixes: e5c6de5fa025 ("bpf, sockmap: Incorrectly handling copied_seq")
> > > Signed-off-by: Jiayuan Chen <mrpre@....com>
> > 
> > [...]
> > 
> > > +/* The tcp_bpf_read_sock() is an alternative implementation
> > > + * of tcp_read_sock(), except that it does not update copied_seq.
> > > + */
> > > +static int tcp_bpf_read_sock(struct sock *sk, read_descriptor_t *desc,
> > > +			     sk_read_actor_t recv_actor)
> > > +{
> > > +	struct sk_buff *skb;
> > > +	int copied = 0;
> > > +
> > > +	if (sk->sk_state == TCP_LISTEN)
> > > +		return -ENOTCONN;
> > > +
> > > +	while ((skb = skb_peek(&sk->sk_receive_queue)) != NULL) {
> > > +		u8 tcp_flags;
> > > +		int used;
> > > +
> > > +		WARN_ON_ONCE(!skb_set_owner_sk_safe(skb, sk));
> > > +		tcp_flags = TCP_SKB_CB(skb)->tcp_flags;
> > > +		used = recv_actor(desc, skb, 0, skb->len);
> > 
> > Here the skb is still on the receive_queue how does this work with
> > tcp_try_coalesce()? So I believe you need to unlink before you
> > call the actor which creates a bit of trouble if recv_actor
> > doesn't want the entire skb.  
> > 
> > I think easier is to do similar logic to read_sock and track
> > offset and len? Did I miss something.
> 
> Thanks to John Fastabend.
> 
> Let me explain it.
> Now I only replace the read_sock handler when using strparser.
> 
> My previous implementation added the replacement of read_sock in
> sk_psock_start_strp() to more explicitly require replacement when using
> strparser, for instance:
> '''skmsg.c
> void sk_psock_start_strp(struct sock *sk, struct sk_psock *psock)
> {
>     ...
>     sk->sk_data_ready = sk_psock_strp_data_ready;
>     /* Replacement */
>     sk->sk_socket->ops->read_sock = tcp_bpf_read_sock;
> }
> '''
> 
> As you can see that it only works for strparser.
> (The current implementation of replacement in tcp_bpf_update_proto()
> achieves the same effect just not as obviously.)
> 
> So the current implementation of recv_actor() can only be strp_recv(),
> with the call stack as follows:
> '''
> sk_psock_strp_data_ready
>   -> strp_data_ready
>     -> strp_read_sock
>       -> strp_recv
> '''
> 
> The implementation of strp_recv() will consume all input skb. Even if it
> reads part of the data, it will clone it, then place it into its own
> queue, expecting the caller to release the skb. I didn't find any
> logic like tcp_try_coalesce() (fix me if i miss something).


So here is what I believe the flow is,

sk_psock_strp_data_ready
  -> str_data_ready
     -> strp_read_sock
        -> sock->ops->read_sock(.., strp_recv)


We both have the same idea up to here. But then the proposed data_ready()
call

+	while ((skb = skb_peek(&sk->sk_receive_queue)) != NULL) {
+		u8 tcp_flags;
+		int used;
+
+		WARN_ON_ONCE(!skb_set_owner_sk_safe(skb, sk));
+		tcp_flags = TCP_SKB_CB(skb)->tcp_flags;
+		used = recv_actor(desc, skb, 0, skb->len);

The recv_actor here is strp_recv() all good so far. But, because
that skb is still on the sk_receive_queue() the TCP stack may
at the same time do

 tcp_data_queue
  -> tcp_queue_rcv
     -> tail = skb_peek_tail(&sk->sk_receive_queue);
        tcp_try_coalesce(sk, tail, skb, fragstolen)
         -> skb_try_coalesce()
            ... skb->len += len

So among other things you will have changed the skb->len and added some
data to it. If this happens while you are running the recv actor we will
eat the data by calling tcp_eat_recv_skb(). Any data added from the
try_coalesce will just be dropped and never handled? The clone() from
the strparser side doesn't help you the tcp_eat_recv_skb call will
unlik the skb from the sk_receive_queue.

I don't think you have any way to protect this at the moment.

> 
> The record of the 'offset' is stored within its own context(strparser/_strp_msg).
> With all skbs and offset saved in strp context, the caller does not need to
> maintain it.
> 
> I have also added various logic tests for this situation in the test
> case, and it works correctly. 
> > > +		/* strparser clone and consume all input skb
> > > +		 * even in waiting head or body status
> > > +		 */
> > > +		tcp_eat_recv_skb(sk, skb);
> > > +		if (used <= 0) {
> > > +			if (!copied)
> > > +				copied = used;
> > > +			break;
> > > +		}
> > > +		copied += used;
> > > +		if (!desc->count)
> > > +			break;
> > > +		if (tcp_flags & TCPHDR_FIN)
> > > +			break;
> > > +	}
> > > +	return copied;
> > > +}
> > > +
> > >  enum {
> > >  	TCP_BPF_IPV4,
> > >  	TCP_BPF_IPV6,
> > > @@ -595,6 +636,10 @@ enum {
> 
> 



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ