netdev - Re: [PATCH bpf-next v4 16/20] bpf: Add a special case for bitwise AND on range [-1, 0]

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b6xgxp45bisr2rygzzs6rwwtyaq23e6vwf6wpfnzho4fxrgijw@e4imapyktalv>
Date: Tue, 16 Jul 2024 23:19:16 +0800
From: Shung-Hsi Yu <shung-hsi.yu@...e.com>
To: Xu Kuohai <xukuohai@...weicloud.com>
Cc: Eduard Zingerman <eddyz87@...il.com>, bpf@...r.kernel.org, 
	netdev@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>, 
	Andrii Nakryiko <andrii@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, 
	Martin KaFai Lau <martin.lau@...ux.dev>, Song Liu <song@...nel.org>, 
	Yonghong Song <yonghong.song@...ux.dev>, John Fastabend <john.fastabend@...il.com>, 
	KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...gle.com>, Hao Luo <haoluo@...gle.com>, 
	Jiri Olsa <jolsa@...nel.org>, Roberto Sassu <roberto.sassu@...wei.com>, 
	Edward Cree <ecree.xilinx@...il.com>, Eric Dumazet <edumazet@...gle.com>, 
	Jakub Kicinski <kuba@...nel.org>, Harishankar Vishwanathan <harishankar.vishwanathan@...il.com>, 
	Santosh Nagarakatte <santosh.nagarakatte@...gers.edu>, Srinivas Narayana <srinivas.narayana@...gers.edu>, 
	Matan Shachnai <m.shachnai@...gers.edu>
Subject: Re: [PATCH bpf-next v4 16/20] bpf: Add a special case for bitwise
 AND on range [-1, 0]

On Tue, Jul 16, 2024 at 03:05:11PM GMT, Xu Kuohai wrote:
> On 7/15/2024 11:29 PM, Shung-Hsi Yu wrote:
> > Cc Harishankar Vishwanathan, Prof. Srinivas Narayana and Prof. Santosh
> > Nagarakatte, and Matan Shachnai, whom have recently work on
> > scalar*_min_max_and(); also dropping LSM/FS related mails from Cc since
> > it's a bit long and I'm not sure whether the mailing list will reject
> > due to too many email in Cc.
> > 
> > On Thu, Jul 11, 2024 at 07:38:24PM GMT, Xu Kuohai wrote:
> > > With lsm return value check, the no-alu32 version test_libbpf_get_fd_by_id_opts
> > > is rejected by the verifier, and the log says:
> > > 
> > > 0: R1=ctx() R10=fp0
> > > ; int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode) @ test_libbpf_get_fd_by_id_opts.c:27
> > > 0: (b7) r0 = 0                        ; R0_w=0
> > > 1: (79) r2 = *(u64 *)(r1 +0)
> > > func 'bpf_lsm_bpf_map' arg0 has btf_id 916 type STRUCT 'bpf_map'
> > > 2: R1=ctx() R2_w=trusted_ptr_bpf_map()
> > > ; if (map != (struct bpf_map *)&data_input) @ test_libbpf_get_fd_by_id_opts.c:29
> > > 2: (18) r3 = 0xffff9742c0951a00       ; R3_w=map_ptr(map=data_input,ks=4,vs=4)
> > > 4: (5d) if r2 != r3 goto pc+4         ; R2_w=trusted_ptr_bpf_map() R3_w=map_ptr(map=data_input,ks=4,vs=4)
> > > ; int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode) @ test_libbpf_get_fd_by_id_opts.c:27
> > > 5: (79) r0 = *(u64 *)(r1 +8)          ; R0_w=scalar() R1=ctx()
> > > ; if (fmode & FMODE_WRITE) @ test_libbpf_get_fd_by_id_opts.c:32
> > > 6: (67) r0 <<= 62                     ; R0_w=scalar(smax=0x4000000000000000,umax=0xc000000000000000,smin32=0,smax32=umax32=0,var_off=(0x0; 0xc000000000000000))
> > > 7: (c7) r0 s>>= 63                    ; R0_w=scalar(smin=smin32=-1,smax=smax32=0)
> > > ;  @ test_libbpf_get_fd_by_id_opts.c:0
> > > 8: (57) r0 &= -13                     ; R0_w=scalar(smax=0x7ffffffffffffff3,umax=0xfffffffffffffff3,smax32=0x7ffffff3,umax32=0xfffffff3,var_off=(0x0; 0xfffffffffffffff3))
> > > ; int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode) @ test_libbpf_get_fd_by_id_opts.c:27
> > > 9: (95) exit
> > > 
> > > And here is the C code of the prog.
> > > 
> > > SEC("lsm/bpf_map")
> > > int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode)
> > > {
> > >      if (map != (struct bpf_map *)&data_input)
> > > 	    return 0;
> > > 
> > >      if (fmode & FMODE_WRITE)
> > > 	    return -EACCES;
> > > 
> > >      return 0;
> > > }
> > > 
> > > It is clear that the prog can only return either 0 or -EACCESS, and both
> > > values are legal.
> > > 
> > > So why is it rejected by the verifier?
> > > 
> > > The verifier log shows that the second if and return value setting
> > > statements in the prog is optimized to bitwise operations "r0 s>>= 63"
> > > and "r0 &= -13". The verifier correctly deduces that the value of
> > > r0 is in the range [-1, 0] after verifing instruction "r0 s>>= 63".
> > > But when the verifier proceeds to verify instruction "r0 &= -13", it
> > > fails to deduce the correct value range of r0.
> > > 
> > > 7: (c7) r0 s>>= 63                    ; R0_w=scalar(smin=smin32=-1,smax=smax32=0)
> > > 8: (57) r0 &= -13                     ; R0_w=scalar(smax=0x7ffffffffffffff3,umax=0xfffffffffffffff3,smax32=0x7ffffff3,umax32=0xfffffff3,var_off=(0x0; 0xfffffffffffffff3))
> > > 
> > > So why the verifier fails to deduce the result of 'r0 &= -13'?
> > > 
> > > The verifier uses tnum to track values, and the two ranges "[-1, 0]" and
> > > "[0, -1ULL]" are encoded to the same tnum. When verifing instruction
> > > "r0 &= -13", the verifier erroneously deduces the result from
> > > "[0, -1ULL] AND -13", which is out of the expected return range
> > > [-4095, 0].
> > > 
> > > As explained by Eduard in [0], the clang transformation that generates this
> > > pattern is located in DAGCombiner::SimplifySelectCC() method (see [1]).
> > ...
> > > As suggested by Eduard and Andrii, this patch makes a special case
> > > for source or destination register of '&=' operation being in
> > > range [-1, 0].
> > ...
> > 
> > Been wonder whether it possible for a more general approach ever since I
> > saw the discussion back in April. I think I've finally got something.
> > 
> > The problem we face here is that the tightest bound for the [-1, 0] case
> > was tracked with signed ranges, yet the BPF verifier looses knowledge of
> > them all too quickly in scalar*_min_max_and(); knowledge of previous
> > signed ranges were not used at all to derive the outcome of signed
> > ranges after BPF_AND.
> > 
> > 	static void scalar_min_max_and(...) {
> > 		...
> > 		if ((s64)dst_reg->umin_value <= (s64)dst_reg->umax_value) {
> > 			dst_reg->smin_value = dst_reg->umin_value;
> > 			dst_reg->smax_value = dst_reg->umax_value;
> > 		} else {
> > 			dst_reg->smin_value = S64_MIN;
> > 			dst_reg->smax_value = S64_MAX;
> > 		}
> > 		...
> > 	}
> > 
> 
> This is indeed the root cause.
> 
> > So looks like its time to be nobody[1] and try to teach BPF verifier how
> > track signed ranges when ANDing two (possibly) negative numbers. Luckily
> > bitwise AND is comparatively easier to do than other bitwise operations:
> > non-negative range & non-negative range is always non-negative,
> > non-negative range & negative range is still always non-negative, and
> > negative range & negative range is always negative.
> > 
> 
> Right, only bitwise ANDing two negatives yields to a negative result.
> 
> > smax_value is straight forwards, we can just do
> > 
> > 	max(dst_reg->smax_value, src_reg->smax_value)
> > 
> > which works across all sign combinations. Technically for non-negative &
> > non-negative we can use min() instead of max(), but the non-negative &
> > non-negative case should be handled pretty well by the unsigned ranges
> > already; it seems simpler to let such knowledge flows from unsigned
> > ranges to signed ranges during reg_bounds_sync(). Plus we are not wrong
> > for non-negative & non-negative by using max(), just imprecise, so no
> > correctness/soundness issue here.
> > 
> 
> I think this is correct, since in two's complement, more '1' bits means
> more large, regardless of sign, and bitwise AND never generates more '1'
> bits.
> 
> > smin_value is the tricker one, but doable with
> > 
> > 	masked_negative(min(dst_reg->smin_value, src_reg->smin_value))
> > 
> > where masked_negative(v) basically just clear all bits after the most
> > significant unset bit, effectively rounding a negative value down to a
> > negative power-of-2 value, and returning 0 for non-negative values. E.g.
> > for some 8-bit, negative value
> > 
> > 	masked_negative(0b11101001) == 0b11100000
> > 
> 
> Ah, it's really tricky. Seems it's the longest high '1' bits sequence
> in both operands. This '1' bits should remain unchanged by the bitwise
> AND operation. So this sequence must be in the result, making it the
> minimum possible value.
> 
> > This can be done with a tweaked version of "Round up to the next highest
> > power of 2"[2],
> > 
> > 	/* Invert the bits so the first unset bit can be propagated with |= */
> > 	v = ~v;
> > 	/* Now propagate the first (previously unset, now set) bit to the
> > 	 * trailing positions */
> > 	v |= v >> 1;
> > 	v |= v >> 2;
> > 	v |= v >> 4;
> > 	...
> > 	v |= v >> 32; /* Assuming 64-bit */
> > 	/* Propagation done, now invert again */
> > 	v = ~v;
> > 
> > Again, we technically can do better if we take sign bit into account,
> > but deriving smin_value this way should still be correct/sound across
> > different sign combinations, and overall should help us derived [-16, 0]
> > from "[-1, 0] AND -13", thus preventing BPF verifier from rejecting the
> > program.
> > 
> > ---
> > 
> > Alternatively we can employ a range-splitting trick (think I saw this in
> > [3]) that allow us to take advantage of existing tnum_and() by splitting
> > the signed ranges into two if the range crosses the sign boundary (i.e.
> > contains both non-negative and negative values), one range will be
> > [smin, U64_MAX], the other will be [0, smax]. This way we get around
> > tnum's weakness of representing [-1, 0] as [0, U64_MAX].
> > 
> > 	if (src_reg->smin_value < 0 && src_reg->smax_value >= 0) {
> > 		src_lower = tnum_range(src_reg->smin_value, U64_MAX);
> > 		src_higher = tnum_range(0, src_reg->smax_value);
> > 	} else {
> > 		src_lower = tnum_range(src_reg->smin_value, src_reg->smax_value);
> > 		src_higher = tnum_range(src_reg->smin_value, src_reg->smax_value);
> > 	}
> > 
> > 	if (dst_reg->smin_value < 0 && dst_reg->smax_value >= 0) {
> > 		dst_lower = tnum_range(dst_reg->smin_value, U64_MAX);
> > 		dst_higher = tnum_range(0, dst_reg->smax_value);
> > 	} else {
> > 		dst_lower = tnum_range(dst_reg->smin_value, dst_reg->smax_value);
> > 		dst_higher = tnum_range(dst_reg->smin_value, dst_reg->smax_value);
> > 	}
> > 
> > 	lower = tnum_and(src_lower, dst_lower);
> > 	higher = tnum_and(src_higher, dst_higher);
> > 	dst->smin_value = lower.value;
> > 	dst->smax_value = higher.value | higher.mask;
> 
> This looks even more tricky...

Indeed, and I think the above is still wrong because it did not proper
set smin_value to S64_MIN when needed.

> > Personally I like the first method better as it is simpler yet still
> > does the job well enough. I'll work on that in the next few days and see
> > if it actually works.
> 
> This really sounds great. Thank you for the excellent work!

Sent RFC in sibling thread. I think it would be better if the patch was
included as part of your series. But let's see what the other think of
it first.