linux-kernel - Re: [PATCH v2 2/2] x86/cfi,bpf: Fix BPF JIT call

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ivhrgimonsvy3tyj5iidoqmlcyqvtsh2ay3cm3ouemsdbvjzs4@6jlt6zv55tgh>
Date:   Thu, 7 Dec 2023 14:32:12 -0800
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Jiri Olsa <olsajiri@...il.com>, Song Liu <song@...nel.org>,
        Song Liu <songliubraving@...a.com>,
        Paul Walmsley <paul.walmsley@...ive.com>,
        Palmer Dabbelt <palmer@...belt.com>,
        Albert Ou <aou@...s.berkeley.edu>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        X86 ML <x86@...nel.org>, "H. Peter Anvin" <hpa@...or.com>,
        "David S. Miller" <davem@...emloft.net>,
        David Ahern <dsahern@...nel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <martin.lau@...ux.dev>,
        Yonghong Song <yonghong.song@...ux.dev>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...nel.org>,
        Stanislav Fomichev <sdf@...gle.com>,
        Hao Luo <haoluo@...gle.com>, Arnd Bergmann <arnd@...db.de>,
        Sami Tolvanen <samitolvanen@...gle.com>,
        Kees Cook <keescook@...omium.org>,
        Nathan Chancellor <nathan@...nel.org>,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        linux-riscv <linux-riscv@...ts.infradead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Network Development <netdev@...r.kernel.org>,
        bpf <bpf@...r.kernel.org>,
        linux-arch <linux-arch@...r.kernel.org>,
        clang-built-linux <llvm@...ts.linux.dev>,
        Josh Poimboeuf <jpoimboe@...nel.org>,
        Joao Moreira <joao@...rdrivepizza.com>,
        Mark Rutland <mark.rutland@....com>
Subject: Re: [PATCH v2 2/2] x86/cfi,bpf: Fix BPF JIT call

On Thu, Dec 07, 2023 at 10:31:05AM +0100, Peter Zijlstra wrote:
> On Wed, Dec 06, 2023 at 01:39:43PM -0800, Alexei Starovoitov wrote:
> 
> 
> > All is ok until kCFI comes into picture.
> > Here we probably need to teach arch_prepare_bpf_trampoline() to emit
> > different __kcfi_typeid depending on kernel function proto,
> > so that caller hash checking logic won't be tripped.
> > I suspect that requires to reverse engineer an algorithm of computing kcfi from clang.
> > other ideas?
> 
> I was going to try and extend bpf_struct_ops with a pointer, this
> pointer will point to a struct of the right type with all ops filled out
> as stubs.

Right. Something like this, but it's more nuanced.

The bpf_struct_ops concept is a generic mechanism to provide bpf-based callback
to any set of kernel callbacks.

bpf tcp CC plugs into:
struct tcp_congestion_ops {
        /* do new cwnd calculation (required) */
        void (*cong_avoid)(struct sock *sk, u32 ack, u32 acked);

        /* call before changing ca_state (optional) */
        void (*set_state)(struct sock *sk, u8 new_state);

        /* call when cwnd event occurs (optional) */
        void (*cwnd_event)(struct sock *sk, enum tcp_ca_event ev);
  ...
};

and from bpf side we don't touch tcp kernel bits at all.
tcp stack doesn't know whether it's calling bpf based CC or builtin CC or CC provided by kernel module.

bpf struct_ops mechanim is a zero cost extension to potentially any kernel mechanism
that is already setup with callbacks. tcp_congestion_ops is one of them.

The allowlisting of tcp_congestion_ops for bpf use is done in net/ipv4/bpf_tcp_ca.c via:

struct bpf_struct_ops bpf_tcp_congestion_ops = {
        ...
        .reg = bpf_tcp_ca_reg,
        .unreg = bpf_tcp_ca_unreg,
        ...
        .name = "tcp_congestion_ops",
};
static int bpf_tcp_ca_reg(void *kdata)
{
        return tcp_register_congestion_control(kdata);
}
and
 int tcp_register_congestion_control(struct tcp_congestion_ops *type);
is a normal TCP CC registration routine that is used by all CCs.

The bpf struct_ops infra prepares everything inside
'struct tcp_congestion_ops' that makes it indistinguishable from normal kernel CC,
except kCFI part. sadly.

The kCFI challenge is that clang may not generate any __cfi + __kcfi_typeid at all.
Like if vmlinux doesn't have any TCP CCs built-in there will be no kCFI hashes
in the kernel that represent a required hash to call cong_avoid, set_state, cwnd_event.

At the callsite like in net/ipv4/tcp_input.c
  icsk->icsk_ca_ops->cong_avoid(sk, ack, acked);
the clang will compute the kcfi hash, but there will be no __cfi symbol in vmlinux.

If there was one we could teach the verifier to look for __kcfi...cong_avoid
in kallsyms, then read cfi hash from there and populate it into generated asm
in arch_prepare_bpf_trampoline.

So I'm thinking to do this:

diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index c7bbd8f3c708..afaadc2c0827 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -283,6 +283,12 @@ struct bpf_struct_ops bpf_tcp_congestion_ops = {
        .name = "tcp_congestion_ops",
 };

+/* never accessed, here for clang kCFI only */
+extern void __kcfi_cong_avoid(struct sock *sk, u32 ack, u32 acked);
+__ADDRESSABLE(__kcfi_cong_avoid);
+extern void __kcfi_set_state(struct sock *sk, u8 new_state);
+__ADDRESSABLE(__kcfi_set_state);

To force kcfi generation and then teach struct_ops infra to look
for __kcfi_typeid___kcfi_##callbackname in kallsyms,
read kcfi from there and populate into bpf trampoline.

Since kcfi and bpf are not working well, I believe it's bpf folks job to fix it.
Especially since it's starting to look bpf infra heavy.

If you're interested I can, of course, point to relevant bits in kernel/bpf/bpf_struct_ops.c
that would need to be extended to support such kcfi_typeid search,
but I think it's my job to fix it.

If I get stuck, I'll ask for help.

I also researched a different approach.
llvm does the following to compute the kcfi hash:

llvm::ConstantInt *CodeGenModule::CreateKCFITypeId(QualType T) {
  if (auto *FnType = T->getAs<FunctionProtoType>())
    T = getContext().getFunctionType(
        FnType->getReturnType(), FnType->getParamTypes(),
        FnType->getExtProtoInfo().withExceptionSpec(EST_None));

  std::string OutName;
  llvm::raw_string_ostream Out(OutName);
  getCXXABI().getMangleContext().mangleCanonicalTypeName(
      T, Out, getCodeGenOpts().SanitizeCfiICallNormalizeIntegers);

  if (getCodeGenOpts().SanitizeCfiICallNormalizeIntegers)
    Out << ".normalized";

  return llvm::ConstantInt::get(Int32Ty,
                                static_cast<uint32_t>(llvm::xxHash64(OutName)));
}

xxhash is already available in the kernel.
We can add type mangling logic and convert prototype of cong_avoid, set_state, etc
(that are already available in vmlinux BTF) into a mangled string and
apply xxhash on that string.
This way we wouldn't need to add __kcfi stubs to net/ipv4/bpf_tcp_ca.c.
kcfi will be computed on demand.

> > 
> > This case is easier to make work with kCFI.
> > The JIT will use:
> > cfi_bpf_hash:
> >       .long   __kcfi_typeid___bpf_prog_runX  
> > like your patch already does.
> > And will use
> > extern u64 __bpf_callback_fn(u64, u64, u64, u64, u64);
> > cfi_bpf_subprog_hash:
> >       .long   __kcfi_typeid___bpf_callback_fn
> > to JIT all subprogs. See bpf_is_subprog().
> 
> Aaah!, yes it should be trivial to use another hash value when
> is_subprog in emit_prologue().

Great. I'll wait for your respin and then will start building "kcfi for struct-ops"
via one of the two approaches above.

> Yes, we can do that. Plans have changed on my side too -- I'm taking a 6
> week break soon, so I'll do whatever I can before I'm out, and then
> continue from whatever state I find when I get back.

6 weeks! Nice. Enjoy the long break.
Last time I took 3 week PTO in a row I got bored after week 2 and went back to hacking :)