netdev - Re: [PATCH bpf-next 1/2] bpf: track contents of read-only maps as scalars

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20191009003424.ewbra36vpgla2rlj@kafai-mbp.dhcp.thefacebook.com>
Date:   Wed, 9 Oct 2019 00:34:32 +0000
From:   Martin Lau <kafai@...com>
To:     Andrii Nakryiko <andrii.nakryiko@...il.com>
CC:     "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Alexei Starovoitov <ast@...com>,
        "daniel@...earbox.net" <daniel@...earbox.net>,
        Kernel Team <Kernel-team@...com>
Subject: Re: [PATCH bpf-next 1/2] bpf: track contents of read-only maps as
 scalars

On Tue, Oct 08, 2019 at 04:49:30PM -0700, Andrii Nakryiko wrote:
> On Tue, Oct 8, 2019 at 2:53 PM Martin Lau <kafai@...com> wrote:
> >
> > On Tue, Oct 08, 2019 at 12:45:47PM -0700, Andrii Nakryiko wrote:
> > > Maps that are read-only both from BPF program side and user space side
> > > have their contents constant, so verifier can track referenced values
> > > precisely and use that knowledge for dead code elimination, branch
> > > pruning, etc. This patch teaches BPF verifier how to do this.
> > >
> > > Signed-off-by: Andrii Nakryiko <andriin@...com>
> > > ---
> > >  kernel/bpf/verifier.c | 58 +++++++++++++++++++++++++++++++++++++++++--
> > >  1 file changed, 56 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index ffc3e53f5300..1e4e4bd64ca5 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -2739,6 +2739,42 @@ static void coerce_reg_to_size(struct bpf_reg_state *reg, int size)
> > >       reg->smax_value = reg->umax_value;
> > >  }
> > >
> > > +static bool bpf_map_is_rdonly(const struct bpf_map *map)
> > > +{
> > > +     return (map->map_flags & BPF_F_RDONLY_PROG) &&
> > > +            ((map->map_flags & BPF_F_RDONLY) || map->frozen);
> > > +}
> > > +
> > > +static int bpf_map_direct_read(struct bpf_map *map, int off, int size, u64 *val)
> > > +{
> > > +     void *ptr;
> > > +     u64 addr;
> > > +     int err;
> > > +
> > > +     err = map->ops->map_direct_value_addr(map, &addr, off + size);
> > Should it be "off" instead of "off + size"?
> 
> From array_map_direct_value_addr() code, offset is used only to check
> that access is happening within array value bounds.
The "size" check is done separately in the check_map_access(),
so "off" is offset alone, I think.

> It's not used to
> calculate returned pointer.
> But now re-reading its code again, I think this check is wrong:
> 
> if (off >= map->value_size)
>         break;
> 
> It has to be (off > map->value_size). But it seems like this whole
> interface is counter-intuitive.
> 
> I'm wondering if Daniel can clarify the intent behind this particular behavior.
> 
> For now the easiest fix is to pass (off + size - 1). But maybe we
> should change the contract to be something like
> 
> int map_direct_value_addr(const struct bpf_map *map, u64 off, int
> size, void *ptr)
> 
> This then can validate that entire access in the range of [off, off +
> size) is acceptable to a map, and then return void * pointer according
> to given off. Thoughts?
> 
> >
> > > +     if (err)
> > > +             return err;
> > > +     ptr = (void *)addr + off;
> > > +
> > > +     switch (size) {
> > > +     case sizeof(u8):
> > > +             *val = (u64)*(u8 *)ptr;
> > > +             break;
> > > +     case sizeof(u16):
> > > +             *val = (u64)*(u16 *)ptr;
> > > +             break;
> > > +     case sizeof(u32):
> > > +             *val = (u64)*(u32 *)ptr;
> > > +             break;
> > > +     case sizeof(u64):
> > > +             *val = *(u64 *)ptr;
> > > +             break;
> > > +     default:
> > > +             return -EINVAL;
> > > +     }
> > > +     return 0;
> > > +}
> > > +
> > >  /* check whether memory at (regno + off) is accessible for t = (read | write)
> > >   * if t==write, value_regno is a register which value is stored into memory
> > >   * if t==read, value_regno is a register which will receive the value from memory
> > > @@ -2776,9 +2812,27 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
> > >               if (err)
> > >                       return err;
> > >               err = check_map_access(env, regno, off, size, false);
> > > -             if (!err && t == BPF_READ && value_regno >= 0)
> > > -                     mark_reg_unknown(env, regs, value_regno);
> > > +             if (!err && t == BPF_READ && value_regno >= 0) {
> > > +                     struct bpf_map *map = reg->map_ptr;
> > > +
> > > +                     /* if map is read-only, track its contents as scalars */
> > > +                     if (tnum_is_const(reg->var_off) &&
> > > +                         bpf_map_is_rdonly(map) &&
> > > +                         map->ops->map_direct_value_addr) {
> > > +                             int map_off = off + reg->var_off.value;
> > > +                             u64 val = 0;
> > >
> > > +                             err = bpf_map_direct_read(map, map_off, size,
> > > +                                                       &val);
> > > +                             if (err)
> > > +                                     return err;
> > > +
> > > +                             regs[value_regno].type = SCALAR_VALUE;
> > > +                             __mark_reg_known(&regs[value_regno], val);
> > > +                     } else {
> > > +                             mark_reg_unknown(env, regs, value_regno);
> > > +                     }
> > > +             }
> > >       } else if (reg->type == PTR_TO_CTX) {
> > >               enum bpf_reg_type reg_type = SCALAR_VALUE;
> > >
> > > --
> > > 2.17.1
> > >