[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <5f249f5a74483_54fa2b1d9fe285b4c5@john-XPS-13-9370.notmuch>
Date: Fri, 31 Jul 2020 15:46:50 -0700
From: John Fastabend <john.fastabend@...il.com>
To: Daniel Borkmann <daniel@...earbox.net>,
John Fastabend <john.fastabend@...il.com>, kafai@...com,
ast@...nel.org
Cc: netdev@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [bpf PATCH v2 1/5] bpf: sock_ops ctx access may stomp registers
in corner case
Daniel Borkmann wrote:
> On 7/29/20 6:22 PM, John Fastabend wrote:
> > I had a sockmap program that after doing some refactoring started spewing
> > this splat at me:
> >
> > [18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> > [...]
> > [18610.807359] Call Trace:
> > [18610.807370] ? 0xffffffffc114d0d5
> > [18610.807382] __cgroup_bpf_run_filter_sock_ops+0x7d/0xb0
> > [18610.807391] tcp_connect+0x895/0xd50
> > [18610.807400] tcp_v4_connect+0x465/0x4e0
> > [18610.807407] __inet_stream_connect+0xd6/0x3a0
> > [18610.807412] ? __inet_stream_connect+0x5/0x3a0
> > [18610.807417] inet_stream_connect+0x3b/0x60
> > [18610.807425] __sys_connect+0xed/0x120
> >
> > After some debugging I was able to build this simple reproducer,
> >
> > __section("sockops/reproducer_bad")
> > int bpf_reproducer_bad(struct bpf_sock_ops *skops)
> > {
> > volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> > return 0;
> > }
> >
> > And along the way noticed that below program ran without splat,
> >
> > __section("sockops/reproducer_good")
> > int bpf_reproducer_good(struct bpf_sock_ops *skops)
> > {
> > volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> > volatile __maybe_unused __u32 family;
> >
> > compiler_barrier();
> >
> > family = skops->family;
> > return 0;
> > }
> >
> > So I decided to check out the code we generate for the above two
> > programs and noticed each generates the BPF code you would expect,
> >
> > 0000000000000000 <bpf_reproducer_bad>:
> > ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> > 0: r1 = *(u32 *)(r1 + 96)
> > 1: *(u32 *)(r10 - 4) = r1
> > ; return 0;
> > 2: r0 = 0
> > 3: exit
> >
> > 0000000000000000 <bpf_reproducer_good>:
> > ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> > 0: r2 = *(u32 *)(r1 + 96)
> > 1: *(u32 *)(r10 - 4) = r2
> > ; family = skops->family;
> > 2: r1 = *(u32 *)(r1 + 20)
> > 3: *(u32 *)(r10 - 8) = r1
> > ; return 0;
> > 4: r0 = 0
> > 5: exit
> >
> > So we get reasonable assembly, but still something was causing the null
> > pointer dereference. So, we load the programs and dump the xlated version
> > observing that line 0 above 'r* = *(u32 *)(r1 +96)' is going to be
> > translated by the skops access helpers.
> >
> > int bpf_reproducer_bad(struct bpf_sock_ops * skops):
> > ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> > 0: (61) r1 = *(u32 *)(r1 +28)
> > 1: (15) if r1 == 0x0 goto pc+2
> > 2: (79) r1 = *(u64 *)(r1 +0)
> > 3: (61) r1 = *(u32 *)(r1 +2340)
> > ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> > 4: (63) *(u32 *)(r10 -4) = r1
> > ; return 0;
> > 5: (b7) r0 = 0
> > 6: (95) exit
> >
> > int bpf_reproducer_good(struct bpf_sock_ops * skops):
> > ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> > 0: (61) r2 = *(u32 *)(r1 +28)
> > 1: (15) if r2 == 0x0 goto pc+2
> > 2: (79) r2 = *(u64 *)(r1 +0)
> > 3: (61) r2 = *(u32 *)(r2 +2340)
> > ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> > 4: (63) *(u32 *)(r10 -4) = r2
> > ; family = skops->family;
> > 5: (79) r1 = *(u64 *)(r1 +0)
> > 6: (69) r1 = *(u16 *)(r1 +16)
> > ; family = skops->family;
> > 7: (63) *(u32 *)(r10 -8) = r1
> > ; return 0;
> > 8: (b7) r0 = 0
> > 9: (95) exit
> >
> > Then we look at lines 0 and 2 above. In the good case we do the zero
> > check in r2 and then load 'r1 + 0' at line 2. Do a quick cross-check
> > into the bpf_sock_ops check and we can confirm that is the 'struct
> > sock *sk' pointer field. But, in the bad case,
> >
> > 0: (61) r1 = *(u32 *)(r1 +28)
> > 1: (15) if r1 == 0x0 goto pc+2
> > 2: (79) r1 = *(u64 *)(r1 +0)
> >
> > Oh no, we read 'r1 +28' into r1, this is skops->fullsock and then in
> > line 2 we read the 'r1 +0' as a pointer. Now jumping back to our spat,
> >
> > [18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> >
> > The 0x01 makes sense because that is exactly the fullsock value. And
> > its not a valid dereference so we splat.
> >
> > To fix we need to guard the case when a program is doing a sock_ops field
> > access with src_reg == dst_reg. This is already handled in the load case
> > where the ctx_access handler uses a tmp register being careful to
> > store the old value and restore it. To fix the get case test if
> > src_reg == dst_reg and in this case do the is_fullsock test in the
> > temporary register. Remembering to restore the temporary register before
> > writing to either dst_reg or src_reg to avoid smashing the pointer into
> > the struct holding the tmp variable.
> >
> > Adding this inline code to test_tcpbpf_kern will now be generated
> > correctly from,
> >
> > 9: r2 = *(u32 *)(r2 + 96)
> >
> > to xlated code,
I have this in my logs at line 12,
*(u64 *)(r2 +32) = r9
> > 13: (61) r9 = *(u32 *)(r2 +28)
> > 14: (15) if r9 == 0x0 goto pc+4
> > 15: (79) r9 = *(u64 *)(r2 +32)
> > 16: (79) r2 = *(u64 *)(r2 +0)
> > 17: (61) r2 = *(u32 *)(r2 +2348)
> > 18: (05) goto pc+1
> > 19: (79) r9 = *(u64 *)(r2 +32)
>
> The diff below looks good to me, but I'm confused on this one above. I'm probably
> missing something, but given this is the dst == src case with the r2 register, where
> in the dump do we first saves the content of r9 into the scratch tmp store?
> Line 19 seems to restore it, but the save is missing, no?
>
> Please double check whether this was just omitted, but I would really like to have
> the commit message 100% correct as it otherwise causes confusion when we stare at it
> again a month later wrt what was the original intention.
off-by-one on the cut'n'paste into the commit message. Let me send a v3
with a correction to the commit. I do want this to be correct.
Powered by blists - more mailing lists