netdev - Re: [PATCH v3 bpf 2/3] bpf: Avoid iter->offset making backward progress in bpf_iter

lists.openwall.net: Open Source and information security mailing list archives

Message-Id: <B865DECA-2350-471F-B75B-A9D2F33672CA@isovalent.com>
Date: Fri, 12 Jan 2024 15:57:55 -0800
From: Aditi Ghag <aditi.ghag@...valent.com>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: bpf@...r.kernel.org,
 Alexei Starovoitov <ast@...nel.org>,
 Andrii Nakryiko <andrii@...nel.org>,
 Daniel Borkmann <daniel@...earbox.net>,
 netdev@...r.kernel.org,
 kernel-team@...a.com,
 Yonghong Song <yonghong.song@...ux.dev>
Subject: Re: [PATCH v3 bpf 2/3] bpf: Avoid iter->offset making backward
 progress in bpf_iter_udp



> On Jan 12, 2024, at 11:05 AM, Martin KaFai Lau <martin.lau@...ux.dev> wrote:
> 
> From: Martin KaFai Lau <martin.lau@...nel.org>
> 
> There is a bug in the bpf_iter_udp_batch() function that stops
> the userspace from making forward progress.
> 
> The bug is triggered when userspace passes in a very small read
> buffer. When the bpf prog calls bpf_seq_printf(), the userspace read
> buffer is not large enough to capture the whole bucket.
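
A minimal sketch of the userspace side of the trigger, written for this review and not part of the patch. It assumes the caller has already obtained a readable fd for the UDP bpf_iter (for example by opening a link pinned in bpffs; the pin path is up to the user). The helper name drain_small_reads() and the max_reads bound are hypothetical; the bound only keeps the sketch from hanging.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: drain fd using read() calls of at most bufsz
 * bytes each, i.e. a deliberately small userspace read buffer.
 * Returns the total number of bytes read once read() reports EOF,
 * -1 on a read error, or -2 if max_reads calls did not reach EOF. */
static long drain_small_reads(int fd, size_t bufsz, long max_reads)
{
	char buf[64];
	long total = 0;

	assert(bufsz > 0 && bufsz <= sizeof(buf));
	for (long i = 0; i < max_reads; i++) {
		ssize_t n = read(fd, buf, bufsz);

		if (n == 0)
			return total;      /* EOF: the iteration finished */
		if (n < 0) {
			if (errno == EINTR)
				continue;  /* interrupted, try again */
			return -1;
		}
		total += n;
	}
	return -2;                         /* bound hit before EOF */
}
```

Per the commit message, on an affected kernel such a reader never sees EOF once a bucket's output does not fit, so a bounded loop like this one would report -2 instead of hanging.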
> 
> When the read buffer is not large enough, the kernel remembers the
> offset within the bucket in iter->offset so that the next userspace
> read() can continue from where it left off.
> 
> In the next read(), the kernel skips "iter->offset" sockets in the
> bucket. However, the code skips them by directly decrementing
> iter->offset ("--iter->offset"). This is incorrect because the next
> read() may not consume the whole bucket either, and then the
> next-next read() will start from offset 0. The net effect is that
> userspace keeps reading from the beginning of a bucket and the
> process never finishes. "iter->offset" must always go forward until
> the whole bucket is consumed.
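
A simplified single-bucket model of the pre-patch offset handling, written for this review and not taken from net/ipv4/udp.c. Assumptions: every socket in the bucket matches, each read() has room for exactly `cap` sockets, and the per-socket `++iter->offset` done in the seq_next path is folded into the same function. The names model_read_buggy() and reads_to_finish_buggy() are hypothetical, and the model's exact repetition pattern may differ from the real kernel's.

```c
#include <assert.h>

/* Hypothetical model of one bucket, not kernel code.
 * nsocks:  matching sockets in the bucket.
 * cap:     sockets that fit in one userspace read().
 * *offset: plays the role of iter->offset.
 * Returns the index of the last socket shown, or -1 if none. */
static int model_read_buggy(int nsocks, int cap, int *offset)
{
	int pos = 0;
	int shown = 0;

	/* Pre-patch skip: "--iter->offset" consumes the resume position,
	 * so it is back to 0 once the skip is done. */
	while (pos < nsocks && *offset) {
		--*offset;
		pos++;
	}

	/* Show up to cap sockets; seq_next does ++iter->offset for each
	 * socket it finishes showing. */
	while (pos < nsocks && shown < cap) {
		pos++;
		shown++;
		++*offset;
	}
	return shown ? pos - 1 : -1;
}

/* Number of reads until the bucket's last socket has been shown,
 * or -1 if that does not happen within max_reads reads. */
static int reads_to_finish_buggy(int nsocks, int cap, int max_reads)
{
	int offset = 0;

	assert(nsocks > 0 && cap > 0);
	for (int r = 1; r <= max_reads; r++) {
		if (model_read_buggy(nsocks, cap, &offset) == nsocks - 1)
			return r;
	}
	return -1;
}
```

In this model reads_to_finish_buggy(5, 2, 100) returns -1: from the third read on, the offset only holds the count shown by the previous read, so the skip lands on the same position every time and socket 4 is never reached. A bucket that fits in two reads (4 sockets, cap 2) still finishes.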
> 
> This patch fixes it by using the local variables "resume_offset"
> and "resume_bucket". "iter->offset" is always reset to 0 before
> it may be used. "iter->offset" is advanced to "resume_offset" only
> when the iteration continues from "resume_bucket" (i.e.
> "state->bucket == resume_bucket"). This brings it closer to
> bpf_iter_tcp's offset handling, which does not suffer from the
> same bug.
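
The same hypothetical single-bucket model with the patch's approach applied: save resume_offset, reset the offset to 0, then advance it while skipping until it reaches resume_offset. Because the model has only one bucket, the `state->bucket == resume_bucket` check is always true and is omitted. Names other than resume_offset are hypothetical.

```c
#include <assert.h>

/* Hypothetical model of one bucket, not kernel code; same parameters
 * as the pre-patch model. Returns the last socket index shown, or -1. */
static int model_read_fixed(int nsocks, int cap, int *offset)
{
	int resume_offset = *offset;   /* saved before any use */
	int pos = 0;
	int shown = 0;

	*offset = 0;                   /* always reset before use */
	while (pos < nsocks && *offset < resume_offset) {
		++*offset;             /* skip an already-shown socket */
		pos++;
	}

	while (pos < nsocks && shown < cap) {
		pos++;
		shown++;
		++*offset;             /* seq_next: one per socket shown */
	}
	return shown ? pos - 1 : -1;
}

/* Number of reads until the last socket has been shown, or -1. */
static int reads_to_finish_fixed(int nsocks, int cap, int max_reads)
{
	int offset = 0;

	assert(nsocks > 0 && cap > 0);
	for (int r = 1; r <= max_reads; r++) {
		if (model_read_fixed(nsocks, cap, &offset) == nsocks - 1)
			return r;
	}
	return -1;
}
```

Here the offset grows by `cap` on every read instead of being consumed, so the model finishes in ceil(nsocks / cap) reads; for example reads_to_finish_fixed(5, 2, 100) returns 3.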
> 
> Cc: Aditi Ghag <aditi.ghag@...valent.com>
> Fixes: c96dac8d369f ("bpf: udp: Implement batching for sockets iterator")
> Acked-by: Yonghong Song <yonghong.song@...ux.dev>
> Signed-off-by: Martin KaFai Lau <martin.lau@...nel.org>

Reviewed-by: Aditi Ghag <aditi.ghag@...valent.com>
 
Thanks!

> ---
> net/ipv4/udp.c | 21 ++++++++++-----------
> 1 file changed, 10 insertions(+), 11 deletions(-)
> 
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 978b83d3c094..04c534a9ef89 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -3137,16 +3137,18 @@ static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
> 	struct bpf_udp_iter_state *iter = seq->private;
> 	struct udp_iter_state *state = &iter->state;
> 	struct net *net = seq_file_net(seq);
> +	int resume_bucket, resume_offset;
> 	struct udp_table *udptable;
> 	unsigned int batch_sks = 0;
> 	bool resized = false;
> 	struct sock *sk;
> 
> +	resume_bucket = state->bucket;
> +	resume_offset = iter->offset;
> +
> 	/* The current batch is done, so advance the bucket. */
> -	if (iter->st_bucket_done) {
> +	if (iter->st_bucket_done)
> 		state->bucket++;
> -		iter->offset = 0;
> -	}
> 
> 	udptable = udp_get_table_seq(seq, net);
> 
> @@ -3166,19 +3168,19 @@ static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
> 	for (; state->bucket <= udptable->mask; state->bucket++) {
> 		struct udp_hslot *hslot2 = &udptable->hash2[state->bucket];
> 
> -		if (hlist_empty(&hslot2->head)) {
> -			iter->offset = 0;
> +		if (hlist_empty(&hslot2->head))
> 			continue;
> -		}
> 
> +		iter->offset = 0;
> 		spin_lock_bh(&hslot2->lock);
> 		udp_portaddr_for_each_entry(sk, &hslot2->head) {
> 			if (seq_sk_match(seq, sk)) {
> 				/* Resume from the last iterated socket at the
> 				 * offset in the bucket before iterator was stopped.
> 				 */
> -				if (iter->offset) {
> -					--iter->offset;
> +				if (state->bucket == resume_bucket &&
> +				    iter->offset < resume_offset) {

I like this invariant: the resume skip is applied only while batching the same bucket that the previous read stopped in.


> +					++iter->offset;
> 					continue;
> 				}
> 				if (iter->end_sk < iter->max_sk) {
> @@ -3192,9 +3194,6 @@ static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
> 
> 		if (iter->end_sk)
> 			break;
> -
> -		/* Reset the current bucket's offset before moving to the next bucket. */
> -		iter->offset = 0;
> 	}
> 
> 	/* All done: no batch made. */
> -- 
> 2.34.1
>