Message-ID: <CAEf4BzaqCgW9keiT+tJUBQWT6Q+jMwuvn4O2ZghO0c+ZvACNrw@mail.gmail.com>
Date: Fri, 13 Dec 2024 15:02:11 -0800
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Daniel Xu <dxu@...uu.xyz>
Cc: andrii@...nel.org, ast@...nel.org, eddyz87@...il.com, shuah@...nel.org,
daniel@...earbox.net, john.fastabend@...il.com, martin.lau@...ux.dev,
song@...nel.org, yonghong.song@...ux.dev, kpsingh@...nel.org, sdf@...ichev.me,
haoluo@...gle.com, jolsa@...nel.org, mykolal@...com, bpf@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next v5 4/5] bpf: verifier: Support eliding map lookup nullness
On Thu, Dec 12, 2024 at 3:23 PM Daniel Xu <dxu@...uu.xyz> wrote:
>
> This commit allows progs to elide a null check on statically known map
> lookup keys. In other words, if the verifier can statically prove that
> the lookup will be in-bounds, allow the prog to drop the null check.
>
> This is useful for two reasons:
>
> 1. Large numbers of nullness checks (especially when they cannot fail)
> unnecessarily push the prog towards BPF_COMPLEXITY_LIMIT_JMP_SEQ.
> 2. It forms a tighter contract between programmer and verifier.
>
> For (1), bpftrace is starting to make heavier use of percpu scratch
> maps. As a result, for user scripts with a large number of unrolled loops,
> we are starting to hit jump complexity verification errors. These
> percpu lookups cannot fail anyway, as we only use static key values.
> Eliding nullness probably results in less work for the verifier as well.
>
> For (2), percpu scratch maps are often used as a larger stack, as the
> current stack is limited to 512 bytes. In these situations, it is
> desirable for the programmer to express: "this lookup should never fail,
> and if it does, it means I messed up the code". By omitting the null
> check, the programmer can "ask" the verifier to double-check the logic.
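For readers following along, here is a minimal sketch of the scratch-map
pattern described above. It assumes the elision also applies to percpu
arrays, as the commit message implies; the map, struct, section, and
function names below are made up for illustration, not taken from the
patch or from bpftrace.

	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	struct scratch {
		char buf[4096];		/* "larger stack" scratch space */
	};

	struct {
		__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
		__uint(max_entries, 1);
		__type(key, __u32);
		__type(value, struct scratch);
	} scratch_map SEC(".maps");

	SEC("kprobe/do_nanosleep")
	int use_scratch(void *ctx)
	{
		__u32 key = 0;		/* statically known, in-bounds */
		struct scratch *s;

		s = bpf_map_lookup_elem(&scratch_map, &key);
		/* With this change the verifier can prove the constant
		 * key 0 is in-bounds, so the usual `if (!s) return 0;`
		 * could be dropped; without the change, dereferencing s
		 * here is rejected as a possible NULL pointer.
		 */
		s->buf[0] = 1;
		return 0;
	}

	char LICENSE[] SEC("license") = "GPL";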
>
> Tests also have to be updated in sync with these changes, as the
> verifier is more efficient with this change. Notably, the iters.c tests had
> to be changed to use a map type that still requires null checks, as they
> exercise verifier tracking logic w.r.t. iterators.
>
> Signed-off-by: Daniel Xu <dxu@...uu.xyz>
> ---
> kernel/bpf/verifier.c | 80 ++++++++++++++++++-
> tools/testing/selftests/bpf/progs/iters.c | 14 ++--
> .../selftests/bpf/progs/map_kptr_fail.c | 2 +-
> .../selftests/bpf/progs/verifier_map_in_map.c | 2 +-
> .../testing/selftests/bpf/verifier/map_kptr.c | 2 +-
> 5 files changed, 87 insertions(+), 13 deletions(-)
>
Eduard has great points. I've added a few more comments below.
pw-bot: cr
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 58b36cc96bd5..4947ef884a18 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -287,6 +287,7 @@ struct bpf_call_arg_meta {
> u32 ret_btf_id;
> u32 subprogno;
> struct btf_field *kptr_field;
> + s64 const_map_key;
> };
>
> struct bpf_kfunc_call_arg_meta {
> @@ -9163,6 +9164,53 @@ static int check_reg_const_str(struct bpf_verifier_env *env,
> return 0;
> }
>
> +/* Returns constant key value if possible, else -1 */
> +static s64 get_constant_map_key(struct bpf_verifier_env *env,
> + struct bpf_reg_state *key,
> + u32 key_size)
> +{
> + struct bpf_func_state *state = func(env, key);
> + struct bpf_reg_state *reg;
> + int zero_size = 0;
> + int stack_off;
> + u8 *stype;
> + int slot;
> + int spi;
> + int i;
> +
> + if (!env->bpf_capable)
> + return -1;
> + if (key->type != PTR_TO_STACK)
> + return -1;
> + if (!tnum_is_const(key->var_off))
> + return -1;
> +
> + stack_off = key->off + key->var_off.value;
> + slot = -stack_off - 1;
> + spi = slot / BPF_REG_SIZE;
> +
> + /* First handle precisely tracked STACK_ZERO, up to BPF_REG_SIZE */
> + stype = state->stack[spi].slot_type;
> + for (i = 0; i < BPF_REG_SIZE && stype[i] == STACK_ZERO; i++)
it's Friday and I'm lazy, but please double-check that this works for
both big-endian and little-endian :)
with Eduard's suggestion this also becomes interesting when you have a
000mmm mix (as one example), because that gives you a small range, and
all values in that range might be valid keys for an array
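for example (a hypothetical fragment, not from the patch, assuming the
usual <linux/bpf.h> / <bpf/bpf_helpers.h> includes): a key the verifier
only knows to be in [0, 255] is not a single constant, yet an array with
max_entries >= 256 can never return NULL for it

	struct {
		__uint(type, BPF_MAP_TYPE_ARRAY);
		__uint(max_entries, 256);
		__type(key, __u32);
		__type(value, __u64);
	} range_arr SEC(".maps");	/* name made up for illustration */

	SEC("tp/syscalls/sys_enter_getpid")
	int small_range_key(void *ctx)
	{
		__u32 key = bpf_get_prandom_u32() & 0xff; /* range [0, 255] */
		__u64 *val = bpf_map_lookup_elem(&range_arr, &key);

		if (!val)	/* still required: this patch only elides
				 * the check for a single constant key
				 */
			return 0;
		*val += 1;
		return 0;
	}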
> + zero_size++;
> + if (zero_size == key_size)
> + return 0;
> +
> + if (!is_spilled_reg(&state->stack[spi]))
> + /* Not pointer to stack */
!is_spilled_reg and "Not pointer to stack" don't seem to be exactly the
same thing?
btw, we also have is_spilled_scalar_reg() which you can use here
instead of two separate checks?
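something like this, roughly (an untested sketch just to illustrate the
suggestion; is_spilled_scalar_reg() already exists in verifier.c):

	if (!is_spilled_scalar_reg(&state->stack[spi]))
		/* Not a spilled scalar: can't be a constant array key */
		return -1;

	reg = &state->stack[spi].spilled_ptr;
	if (!tnum_is_const(reg->var_off))
		/* Stack value not statically known */
		return -1;

	return reg->var_off.value;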
> + return -1;
> +
> + reg = &state->stack[spi].spilled_ptr;
> + if (reg->type != SCALAR_VALUE)
> + /* Only scalars are valid array map keys */
> + return -1;
> + else if (!tnum_is_const(reg->var_off))
> + /* Stack value not statically known */
> + return -1;
> +
> + return reg->var_off.value;
> +}
> +
[...]