netdev - Re: [RFC bpf-next] bpf, verifier: improve signed ranges inference for BPF

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <be239a5581e5b7d5c6f310c2a4c11282aa5896b5.camel@gmail.com>
Date: Wed, 17 Jul 2024 14:10:35 -0700
From: Eduard Zingerman <eddyz87@...il.com>
To: Shung-Hsi Yu <shung-hsi.yu@...e.com>, Xu Kuohai
 <xukuohai@...weicloud.com>
Cc: bpf@...r.kernel.org, netdev@...r.kernel.org, Alexei Starovoitov
 <ast@...nel.org>, Andrii Nakryiko <andrii@...nel.org>, Daniel Borkmann
 <daniel@...earbox.net>, Martin KaFai Lau <martin.lau@...ux.dev>, Song Liu
 <song@...nel.org>, Yonghong Song <yonghong.song@...ux.dev>, John Fastabend
 <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>, Stanislav
 Fomichev <sdf@...gle.com>, Hao Luo <haoluo@...gle.com>, Jiri Olsa
 <jolsa@...nel.org>,  Roberto Sassu <roberto.sassu@...wei.com>, Edward Cree
 <ecree.xilinx@...il.com>, Eric Dumazet <edumazet@...gle.com>, Jakub
 Kicinski <kuba@...nel.org>, Harishankar Vishwanathan
 <harishankar.vishwanathan@...il.com>, Santosh Nagarakatte
 <santosh.nagarakatte@...gers.edu>,  Srinivas Narayana
 <srinivas.narayana@...gers.edu>, Matan Shachnai <m.shachnai@...gers.edu>
Subject: Re: [RFC bpf-next] bpf, verifier: improve signed ranges inference
 for BPF_AND

On Tue, 2024-07-16 at 22:52 +0800, Shung-Hsi Yu wrote:

[...]

> To allow verification of such instruction pattern, update
> scalar*_min_max_and() to infer signed ranges directly from signed ranges
> of the operands. With BPF_AND, the resulting value always gains more
> unset '0' bit, thus it only move towards 0x0000000000000000. The
> difficulty lies with how to deal with signs. While non-negative
> (positive and zero) value simply grows smaller, a negative number can
> grows smaller, but may also underflow and become a larger value.
> 
> To better address this situation we split the signed ranges into
> negative range and non-negative range cases, ignoring the mixed sign
> cases for now; and only consider how to calculate smax_value.
> 
> Since negative range & negative range preserve the sign bit, so we know
> the result is still a negative value, thus it only move towards S64_MIN,
> but never underflow, thus a save bet is to use a value in ranges that is
> closet to 0, thus "max(dst_reg->smax_value, src->smax_value)". For
> negative range & positive range the sign bit is always cleared, thus we
> know the resulting is a non-negative, and only moves towards 0, so a
> safe bet is to use smax_value of the non-negative range. Last but not
> least, non-negative range & non-negative range is still a non-negative
> value, and only moves towards 0; however same as the unsigned range
> case, the maximum is actually capped by the lesser of the two, and thus
> min(dst_reg->smax_value, src_reg->smax_value);
> 
> Listing out the above reasoning as a table (dst_reg abbreviated as dst,
> src_reg abbreviated as src, smax_value abbrivated as smax) we get:
> 
>                         |                         src_reg
>        smax = ?         +---------------------------+---------------------------
>                         |        negative           |       non-negative
> ---------+--------------+---------------------------+---------------------------
>          | negative     | max(dst->smax, src->smax) |         src->smax
> dst_reg  +--------------+---------------------------+---------------------------
>          | non-negative |         dst->smax         | min(dst->smax, src->smax)
> 
> However this is quite complicated, luckily it can be simplified given
> the following observations
> 
>     max(dst_reg->smax_value, src_reg->smax_value) >= src_reg->smax_value
>     max(dst_reg->smax_value, src_reg->smax_value) >= dst_reg->smax_value
>     max(dst_reg->smax_value, src_reg->smax_value) >= min(dst_reg->smax_value, src_reg->smax_value)
> 
> So we could substitute the cells in the table above all with max(...),
> and arrive at:
> 
>                         |                         src_reg
>       smax' = ?         +---------------------------+---------------------------
>                         |        negative           |       non-negative
> ---------+--------------+---------------------------+---------------------------
>          | negative     | max(dst->smax, src->smax) | max(dst->smax, src->smax)
> dst_reg  +--------------+---------------------------+---------------------------
>          | non-negative | max(dst->smax, src->smax) | max(dst->smax, src->smax)
> 
> Meaning that simply using
> 
>   max(dst_reg->smax_value, src_reg->smax_value)
> 
> to calculate the resulting smax_value would work across all sign combinations.
> 
> 
> For smin_value, we know that both non-negative range & non-negative
> range and negative range & non-negative range both result in a
> non-negative value, so an easy guess is to use the minimum non-negative
> value, thus 0.
> 
>                         |                         src_reg
>        smin = ?         +----------------------------+---------------------------
>                         |          negative          |       non-negative
> ---------+--------------+----------------------------+---------------------------
>          | negative     |             ?              |             0
> dst_reg  +--------------+----------------------------+---------------------------
>          | non-negative |             0              |             0
> 
> This leave the negative range & negative range case to be considered. We
> know that negative range & negative range always yield a negative value,
> so a preliminary guess would be S64_MIN. However, that guess is too
> imprecise to help with the r0 <<= 62, r0 s>>= 63, r0 &= -13 pattern
> we're trying to deal with here.
> 
> This can be further improve with the observation that for negative range
> & negative range, the smallest possible value must be one that has
> longest _common_ most-significant set '1' bits sequence, thus we can use
> min(dst_reg->smin_value, src->smin_value) as the starting point, as the
> smaller value will be the one with the shorter most-significant set '1'
> bits sequence. But that alone is not enough, as we do not know whether
> rest of the bits would be set, so the safest guess would be one that
> clear alls bits after the most-significant set '1' bits sequence,
> something akin to bit_floor(), but for rounding to a negative power-of-2
> instead.
> 
>     negative_bit_floor(0xffff000000000003) == 0xffff000000000000
>     negative_bit_floor(0xf0ff0000ffff0000) == 0xf000000000000000
>     negative_bit_floor(0xfffffb0000000000) == 0xfffff80000000000
> 
> With negative range & negative range solve, we now have:
> 
>                         |                         src_reg
>        smin = ?         +----------------------------+---------------------------
>                         |        negative            |       non-negative
> ---------+--------------+----------------------------+---------------------------
>          |   negative   |negative_bit_floor(         |             0
>          |              |  min(dst->smin, src->smin))|
> dst_reg  +--------------+----------------------------+---------------------------
>          | non-negative |           0                |             0
> 
> This can be further simplied since min(dst->smin, src->smin) < 0 when both
> dst_reg and src_reg have a negative range. Which means using
> 
>     negative_bit_floor(min(dst_reg->smin_value, src_reg->smin_value)
> 
> to calculate the resulting smin_value would work across all sign combinations.
> 
> Together these allows us to infer the signed range of the result of BPF_AND
> operation using the signed range from its operands.


Hi Shung-Hsi,

This seems quite elegant.
As an additional check, I did a simple brute-force for all possible
ranges of 6-bit integers and bounds are computed safely.

[...]