Message-ID: <bfey2fu3e74d52wjnoimu5ra7wqox2idnc2syzlrvsyjzezdli@lhywkrucesbf>
Date: Sat, 24 May 2025 14:09:41 -0700
From: Jordan Rife <jordan@...fe.io>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: Kuniyuki Iwashima <kuniyu@...zon.com>, alexei.starovoitov@...il.com,
bpf@...r.kernel.org, daniel@...earbox.net, netdev@...r.kernel.org,
willemdebruijn.kernel@...il.com
Subject: Re: [PATCH v1 bpf-next 03/10] bpf: tcp: Get rid of st_bucket_done

On Fri, May 23, 2025 at 03:07:32PM -0700, Martin KaFai Lau wrote:
> On 5/22/25 1:42 PM, Kuniyuki Iwashima wrote:
> > From: Jordan Rife <jordan@...fe.io>
> > Date: Thu, 22 May 2025 11:16:13 -0700
> > > > > > static void bpf_iter_tcp_put_batch(struct bpf_tcp_iter_state *iter)
> > > > > > {
> > > > > > -	while (iter->cur_sk < iter->end_sk)
> > > > > > -		sock_gen_put(iter->batch[iter->cur_sk++]);
> > > > > > +	unsigned int cur_sk = iter->cur_sk;
> > > > > > +
> > > > > > +	while (cur_sk < iter->end_sk)
> > > > > > +		sock_gen_put(iter->batch[cur_sk++]);
> > > > >
> > > > > Why is this chunk included in this patch ?
> > > >
> > > > This should be in patch 5 to keep cur_sk for find_cookie
> > >
> > > Without this, iter->cur_sk is mutated when iteration stops, and we lose
> > > our place. When iteration resumes and we call bpf_iter_tcp_batch the
> > > iter->cur_sk == iter->end_sk condition will always be true, so we will
> > > skip to the next bucket without seeking to the offset.
> > >
> > > Before, we relied on st_bucket_done to tell us if we had remaining items
> > > in the current bucket to process but now need to preserve iter->cur_sk
> > > through iterations to make the behavior equivalent to what we had before.
> >
> > Thanks for explanation, I was confused by calling tcp_seek_last_pos()
> > multiple times, and I think we need to preserve/restore st->offset too
> > in patch 2 and need this change.
> >
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index ac00015d5e7a..0816f20bfdff 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -2791,6 +2791,7 @@ static void *tcp_seek_last_pos(struct seq_file *seq)
> >  			break;
> >  		st->bucket = 0;
> >  		st->state = TCP_SEQ_STATE_ESTABLISHED;
> > +		offset = 0;
>
> This seems like an existing bug not necessarily related to this set.

Agreed, this is more of an existing bug.
> The patch 5 has also removed the tcp_seek_last_pos() dependency, so I think
> it can be a standalone fix on its own.

With the tcp_seq_* ops there are also other corner cases that can lead
to skips, since they rely on st->offset to seek to the last position.
In the scenario described above, sockets disappearing from the last lhash
bucket leads to skipped sockets in the first ehash bucket, but you could
also hit a scenario where, for example, the current lhash bucket has 6
sockets and st->offset is currently 3: if 3 sockets then disappear from
the start of that bucket, tcp_seek_last_pos skips the remaining 3
sockets and goes on to the next bucket.
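
To make that failure mode concrete, here is a minimal user-space sketch
(not the kernel code; struct sk, resume_from_offset, and the socket ids
are made up for illustration) of the resume-by-count pattern that
tcp_seek_last_pos() follows:

#include <stdio.h>

struct sk {
	int id;
	struct sk *next;
};

/* Mimic the tcp_seek_last_pos() pattern: walk 'offset' entries from the
 * head of the bucket and resume there, regardless of which sockets those
 * entries now are.
 */
static struct sk *resume_from_offset(struct sk *bucket, int offset)
{
	struct sk *pos = bucket;

	while (pos && offset-- > 0)
		pos = pos->next;
	return pos;	/* NULL means "fall through to the next bucket" */
}

int main(void)
{
	/* The bucket originally held ids 1..6 and iteration stopped after
	 * id 3, so the saved offset is 3.  Ids 1..3 have since been
	 * removed, leaving 4..6 at the head of the bucket.
	 */
	struct sk s6 = { 6, NULL };
	struct sk s5 = { 5, &s6 };
	struct sk s4 = { 4, &s5 };
	struct sk *pos = resume_from_offset(&s4, 3);

	/* pos is NULL here: ids 4..6 are never visited, mirroring the
	 * "skip the remaining 3 sockets" case above.
	 */
	printf("resume at: %s\n", pos ? "a socket" : "next bucket");
	return 0;
}

The same counting logic is why the offset = 0 reset above is only a
partial fix: it addresses wrapping into the first ehash bucket, but a
bucket that shrinks underneath a saved offset can still skip sockets, as
in the 6-socket example.
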
I'm not sure it's worth fixing just this one case without also
overhauling the tcp_seq_* logic to prevent these other cases. Otherwise,
it seems more like a Band-aid fix. Perhaps a later series could explore
a more comprehensive solution there.

Jordan