[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b6xgxp45bisr2rygzzs6rwwtyaq23e6vwf6wpfnzho4fxrgijw@e4imapyktalv>
Date: Tue, 16 Jul 2024 23:19:16 +0800
From: Shung-Hsi Yu <shung-hsi.yu@...e.com>
To: Xu Kuohai <xukuohai@...weicloud.com>
Cc: Eduard Zingerman <eddyz87@...il.com>, bpf@...r.kernel.org,
netdev@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>,
Andrii Nakryiko <andrii@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
Martin KaFai Lau <martin.lau@...ux.dev>, Song Liu <song@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>, John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...gle.com>, Hao Luo <haoluo@...gle.com>,
Jiri Olsa <jolsa@...nel.org>, Roberto Sassu <roberto.sassu@...wei.com>,
Edward Cree <ecree.xilinx@...il.com>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Harishankar Vishwanathan <harishankar.vishwanathan@...il.com>,
Santosh Nagarakatte <santosh.nagarakatte@...gers.edu>, Srinivas Narayana <srinivas.narayana@...gers.edu>,
Matan Shachnai <m.shachnai@...gers.edu>
Subject: Re: [PATCH bpf-next v4 16/20] bpf: Add a special case for bitwise
AND on range [-1, 0]
On Tue, Jul 16, 2024 at 03:05:11PM GMT, Xu Kuohai wrote:
> On 7/15/2024 11:29 PM, Shung-Hsi Yu wrote:
> > Cc Harishankar Vishwanathan, Prof. Srinivas Narayana and Prof. Santosh
> > Nagarakatte, and Matan Shachnai, whom have recently work on
> > scalar*_min_max_and(); also dropping LSM/FS related mails from Cc since
> > it's a bit long and I'm not sure whether the mailing list will reject
> > due to too many email in Cc.
> >
> > On Thu, Jul 11, 2024 at 07:38:24PM GMT, Xu Kuohai wrote:
> > > With lsm return value check, the no-alu32 version test_libbpf_get_fd_by_id_opts
> > > is rejected by the verifier, and the log says:
> > >
> > > 0: R1=ctx() R10=fp0
> > > ; int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode) @ test_libbpf_get_fd_by_id_opts.c:27
> > > 0: (b7) r0 = 0 ; R0_w=0
> > > 1: (79) r2 = *(u64 *)(r1 +0)
> > > func 'bpf_lsm_bpf_map' arg0 has btf_id 916 type STRUCT 'bpf_map'
> > > 2: R1=ctx() R2_w=trusted_ptr_bpf_map()
> > > ; if (map != (struct bpf_map *)&data_input) @ test_libbpf_get_fd_by_id_opts.c:29
> > > 2: (18) r3 = 0xffff9742c0951a00 ; R3_w=map_ptr(map=data_input,ks=4,vs=4)
> > > 4: (5d) if r2 != r3 goto pc+4 ; R2_w=trusted_ptr_bpf_map() R3_w=map_ptr(map=data_input,ks=4,vs=4)
> > > ; int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode) @ test_libbpf_get_fd_by_id_opts.c:27
> > > 5: (79) r0 = *(u64 *)(r1 +8) ; R0_w=scalar() R1=ctx()
> > > ; if (fmode & FMODE_WRITE) @ test_libbpf_get_fd_by_id_opts.c:32
> > > 6: (67) r0 <<= 62 ; R0_w=scalar(smax=0x4000000000000000,umax=0xc000000000000000,smin32=0,smax32=umax32=0,var_off=(0x0; 0xc000000000000000))
> > > 7: (c7) r0 s>>= 63 ; R0_w=scalar(smin=smin32=-1,smax=smax32=0)
> > > ; @ test_libbpf_get_fd_by_id_opts.c:0
> > > 8: (57) r0 &= -13 ; R0_w=scalar(smax=0x7ffffffffffffff3,umax=0xfffffffffffffff3,smax32=0x7ffffff3,umax32=0xfffffff3,var_off=(0x0; 0xfffffffffffffff3))
> > > ; int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode) @ test_libbpf_get_fd_by_id_opts.c:27
> > > 9: (95) exit
> > >
> > > And here is the C code of the prog.
> > >
> > > SEC("lsm/bpf_map")
> > > int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode)
> > > {
> > > if (map != (struct bpf_map *)&data_input)
> > > return 0;
> > >
> > > if (fmode & FMODE_WRITE)
> > > return -EACCES;
> > >
> > > return 0;
> > > }
> > >
> > > It is clear that the prog can only return either 0 or -EACCESS, and both
> > > values are legal.
> > >
> > > So why is it rejected by the verifier?
> > >
> > > The verifier log shows that the second if and return value setting
> > > statements in the prog is optimized to bitwise operations "r0 s>>= 63"
> > > and "r0 &= -13". The verifier correctly deduces that the value of
> > > r0 is in the range [-1, 0] after verifing instruction "r0 s>>= 63".
> > > But when the verifier proceeds to verify instruction "r0 &= -13", it
> > > fails to deduce the correct value range of r0.
> > >
> > > 7: (c7) r0 s>>= 63 ; R0_w=scalar(smin=smin32=-1,smax=smax32=0)
> > > 8: (57) r0 &= -13 ; R0_w=scalar(smax=0x7ffffffffffffff3,umax=0xfffffffffffffff3,smax32=0x7ffffff3,umax32=0xfffffff3,var_off=(0x0; 0xfffffffffffffff3))
> > >
> > > So why the verifier fails to deduce the result of 'r0 &= -13'?
> > >
> > > The verifier uses tnum to track values, and the two ranges "[-1, 0]" and
> > > "[0, -1ULL]" are encoded to the same tnum. When verifing instruction
> > > "r0 &= -13", the verifier erroneously deduces the result from
> > > "[0, -1ULL] AND -13", which is out of the expected return range
> > > [-4095, 0].
> > >
> > > As explained by Eduard in [0], the clang transformation that generates this
> > > pattern is located in DAGCombiner::SimplifySelectCC() method (see [1]).
> > ...
> > > As suggested by Eduard and Andrii, this patch makes a special case
> > > for source or destination register of '&=' operation being in
> > > range [-1, 0].
> > ...
> >
> > Been wonder whether it possible for a more general approach ever since I
> > saw the discussion back in April. I think I've finally got something.
> >
> > The problem we face here is that the tightest bound for the [-1, 0] case
> > was tracked with signed ranges, yet the BPF verifier looses knowledge of
> > them all too quickly in scalar*_min_max_and(); knowledge of previous
> > signed ranges were not used at all to derive the outcome of signed
> > ranges after BPF_AND.
> >
> > static void scalar_min_max_and(...) {
> > ...
> > if ((s64)dst_reg->umin_value <= (s64)dst_reg->umax_value) {
> > dst_reg->smin_value = dst_reg->umin_value;
> > dst_reg->smax_value = dst_reg->umax_value;
> > } else {
> > dst_reg->smin_value = S64_MIN;
> > dst_reg->smax_value = S64_MAX;
> > }
> > ...
> > }
> >
>
> This is indeed the root cause.
>
> > So looks like its time to be nobody[1] and try to teach BPF verifier how
> > track signed ranges when ANDing two (possibly) negative numbers. Luckily
> > bitwise AND is comparatively easier to do than other bitwise operations:
> > non-negative range & non-negative range is always non-negative,
> > non-negative range & negative range is still always non-negative, and
> > negative range & negative range is always negative.
> >
>
> Right, only bitwise ANDing two negatives yields to a negative result.
>
> > smax_value is straight forwards, we can just do
> >
> > max(dst_reg->smax_value, src_reg->smax_value)
> >
> > which works across all sign combinations. Technically for non-negative &
> > non-negative we can use min() instead of max(), but the non-negative &
> > non-negative case should be handled pretty well by the unsigned ranges
> > already; it seems simpler to let such knowledge flows from unsigned
> > ranges to signed ranges during reg_bounds_sync(). Plus we are not wrong
> > for non-negative & non-negative by using max(), just imprecise, so no
> > correctness/soundness issue here.
> >
>
> I think this is correct, since in two's complement, more '1' bits means
> more large, regardless of sign, and bitwise AND never generates more '1'
> bits.
>
> > smin_value is the tricker one, but doable with
> >
> > masked_negative(min(dst_reg->smin_value, src_reg->smin_value))
> >
> > where masked_negative(v) basically just clear all bits after the most
> > significant unset bit, effectively rounding a negative value down to a
> > negative power-of-2 value, and returning 0 for non-negative values. E.g.
> > for some 8-bit, negative value
> >
> > masked_negative(0b11101001) == 0b11100000
> >
>
> Ah, it's really tricky. Seems it's the longest high '1' bits sequence
> in both operands. This '1' bits should remain unchanged by the bitwise
> AND operation. So this sequence must be in the result, making it the
> minimum possible value.
>
> > This can be done with a tweaked version of "Round up to the next highest
> > power of 2"[2],
> >
> > /* Invert the bits so the first unset bit can be propagated with |= */
> > v = ~v;
> > /* Now propagate the first (previously unset, now set) bit to the
> > * trailing positions */
> > v |= v >> 1;
> > v |= v >> 2;
> > v |= v >> 4;
> > ...
> > v |= v >> 32; /* Assuming 64-bit */
> > /* Propagation done, now invert again */
> > v = ~v;
> >
> > Again, we technically can do better if we take sign bit into account,
> > but deriving smin_value this way should still be correct/sound across
> > different sign combinations, and overall should help us derived [-16, 0]
> > from "[-1, 0] AND -13", thus preventing BPF verifier from rejecting the
> > program.
> >
> > ---
> >
> > Alternatively we can employ a range-splitting trick (think I saw this in
> > [3]) that allow us to take advantage of existing tnum_and() by splitting
> > the signed ranges into two if the range crosses the sign boundary (i.e.
> > contains both non-negative and negative values), one range will be
> > [smin, U64_MAX], the other will be [0, smax]. This way we get around
> > tnum's weakness of representing [-1, 0] as [0, U64_MAX].
> >
> > if (src_reg->smin_value < 0 && src_reg->smax_value >= 0) {
> > src_lower = tnum_range(src_reg->smin_value, U64_MAX);
> > src_higher = tnum_range(0, src_reg->smax_value);
> > } else {
> > src_lower = tnum_range(src_reg->smin_value, src_reg->smax_value);
> > src_higher = tnum_range(src_reg->smin_value, src_reg->smax_value);
> > }
> >
> > if (dst_reg->smin_value < 0 && dst_reg->smax_value >= 0) {
> > dst_lower = tnum_range(dst_reg->smin_value, U64_MAX);
> > dst_higher = tnum_range(0, dst_reg->smax_value);
> > } else {
> > dst_lower = tnum_range(dst_reg->smin_value, dst_reg->smax_value);
> > dst_higher = tnum_range(dst_reg->smin_value, dst_reg->smax_value);
> > }
> >
> > lower = tnum_and(src_lower, dst_lower);
> > higher = tnum_and(src_higher, dst_higher);
> > dst->smin_value = lower.value;
> > dst->smax_value = higher.value | higher.mask;
>
> This looks even more tricky...
Indeed, and I think the above is still wrong because it did not proper
set smin_value to S64_MIN when needed.
> > Personally I like the first method better as it is simpler yet still
> > does the job well enough. I'll work on that in the next few days and see
> > if it actually works.
>
> This really sounds great. Thank you for the excellent work!
Sent RFC in sibling thread. I think it would be better if the patch was
included as part of your series. But let's see what the other think of
it first.
Powered by blists - more mailing lists