netdev - Re: [Potential Spoof] [PATCH bpf-next] libbpf: improve handling of failed CO-RE relocations

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAEf4BzbM7s8JWM8bPq=JdFX-ujkbYUifD7hNUQOGSJpJ7x5NJw@mail.gmail.com>
Date:   Fri, 24 Jan 2020 10:30:12 -0800
From:   Andrii Nakryiko <andrii.nakryiko@...il.com>
To:     Martin Lau <kafai@...com>
Cc:     Andrii Nakryiko <andriin@...com>,
        "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Alexei Starovoitov <ast@...com>,
        "daniel@...earbox.net" <daniel@...earbox.net>,
        Kernel Team <Kernel-team@...com>
Subject: Re: [Potential Spoof] [PATCH bpf-next] libbpf: improve handling of
 failed CO-RE relocations

On Thu, Jan 23, 2020 at 11:21 PM Martin Lau <kafai@...com> wrote:
>
> On Thu, Jan 23, 2020 at 09:38:37PM -0800, Andrii Nakryiko wrote:
> > Previously, if libbpf failed to resolve CO-RE relocation for some
> > instructions, it would either return error immediately, or, if
> > .relaxed_core_relocs option was set, would replace relocatable offset/imm part
> > of an instruction with a bogus value (-1). Neither approach is good, because
> > there are many possible scenarios where relocation is expected to fail (e.g.,
> > when some field knowingly can be missing on specific kernel versions). On the
> > other hand, replacing offset with invalid one can hide programmer errors, if
> > this relocation failue wasn't anticipated.
> >
> > This patch deprecates .relaxed_core_relocs option and changes the approach to
> > always replacing instruction, for which relocation failed, with invalid BPF
> > helper call instruction. For cases where this is expected, BPF program should
> > already ensure that that instruction is unreachable, in which case this
> > invalid instruction is going to be silently ignored. But if instruction wasn't
> > guarded, BPF program will be rejected at verification step with verifier log
> > pointing precisely to the place in assembly where the problem is.
> >
> > Signed-off-by: Andrii Nakryiko <andriin@...com>
> > ---
> >  tools/lib/bpf/libbpf.c | 95 +++++++++++++++++++++++++-----------------
> >  tools/lib/bpf/libbpf.h |  6 ++-
> >  2 files changed, 61 insertions(+), 40 deletions(-)
> >
> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > index ae34b681ae82..39f1b7633a7c 100644
> > --- a/tools/lib/bpf/libbpf.c
> > +++ b/tools/lib/bpf/libbpf.c
> > @@ -345,7 +345,6 @@ struct bpf_object {
> >
> >       bool loaded;
> >       bool has_pseudo_calls;
> > -     bool relaxed_core_relocs;
> >
> >       /*
> >        * Information when doing elf related work. Only valid if fd
> > @@ -4238,25 +4237,38 @@ static int bpf_core_calc_field_relo(const struct bpf_program *prog,
> >   */
> >  static int bpf_core_reloc_insn(struct bpf_program *prog,
> >                              const struct bpf_field_reloc *relo,
> > +                            int relo_idx,
> >                              const struct bpf_core_spec *local_spec,
> >                              const struct bpf_core_spec *targ_spec)
> >  {
> > -     bool failed = false, validate = true;
> >       __u32 orig_val, new_val;
> >       struct bpf_insn *insn;
> > +     bool validate = true;
> >       int insn_idx, err;
> >       __u8 class;
> >
> >       if (relo->insn_off % sizeof(struct bpf_insn))
> >               return -EINVAL;
> >       insn_idx = relo->insn_off / sizeof(struct bpf_insn);
> > +     insn = &prog->insns[insn_idx];
> > +     class = BPF_CLASS(insn->code);
> >
> >       if (relo->kind == BPF_FIELD_EXISTS) {
> >               orig_val = 1; /* can't generate EXISTS relo w/o local field */
> >               new_val = targ_spec ? 1 : 0;
> >       } else if (!targ_spec) {
> > -             failed = true;
> > -             new_val = (__u32)-1;
> > +             pr_debug("prog '%s': relo #%d: substituting insn #%d w/ invalid insn\n",
> > +                      bpf_program__title(prog, false), relo_idx, insn_idx);
> > +             insn->code = BPF_JMP | BPF_CALL;
> > +             insn->dst_reg = 0;
> > +             insn->src_reg = 0;
> > +             insn->off = 0;
> > +             /* if this instruction is reachable (not a dead code),
> > +              * verifier will complain with the following message:
> > +              * invalid func unknown#195896080
> > +              */
> > +             insn->imm = 195896080; /* => 0xbad2310 => "bad relo" */
> Should this value become a binded contract in uapi/bpf.h so
> that the verifier can print a more meaningful name than "unknown#195896080"?
>

It feels a bit premature to fix this in kernel. It's one of many ways
we can do this, e.g., alternative would be using invalid opcode
altogether. It's not yet clear what's the best way to report this from
kernel. Maybe in the future verifier will have some better way to
pinpoint where and what problem there is in user's program through
some more structured approach than current free-form log.

So what I'm trying to say is that we should probably get a bit more
experience using these features first and understand what
kernel/userspace interface should be for reporting issues like this,
before setting anything in stone in verifier. For now, this
"unknown#195896080" should be a pretty unique search term :)

> > +             return 0;
> >       } else {
> >               err = bpf_core_calc_field_relo(prog, relo, local_spec,
> >                                              &orig_val, &validate);