[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAADnVQL3m_pDaTF9aL85Sz1Fr8UWcWmnKe2foqzVj35GiEofYA@mail.gmail.com>
Date: Sun, 24 Mar 2024 12:12:00 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Puranjay Mohan <puranjay12@...il.com>
Cc: Daniel Borkmann <daniel@...earbox.net>, "David S. Miller" <davem@...emloft.net>,
David Ahern <dsahern@...nel.org>, Alexei Starovoitov <ast@...nel.org>, Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>, Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>, John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...gle.com>, Hao Luo <haoluo@...gle.com>,
Jiri Olsa <jolsa@...nel.org>, Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>, X86 ML <x86@...nel.org>,
"H. Peter Anvin" <hpa@...or.com>, Jean-Philippe Brucker <jean-philippe@...aro.org>,
Network Development <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>, Ilya Leoshkevich <iii@...ux.ibm.com>
Subject: Re: [PATCH bpf v4] bpf: verifier: prevent userspace memory access
On Sun, Mar 24, 2024 at 3:44 AM Puranjay Mohan <puranjay12@...il.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@...il.com> writes:
>
> > On Fri, Mar 22, 2024 at 9:28 AM Daniel Borkmann <daniel@...earbox.net> wrote:
> >>
> >> On 3/22/24 4:05 PM, Puranjay Mohan wrote:
> >> [...]
> >> >>> + /* Make it impossible to de-reference a userspace address */
> >> >>> + if (BPF_CLASS(insn->code) == BPF_LDX &&
> >> >>> + (BPF_MODE(insn->code) == BPF_PROBE_MEM ||
> >> >>> + BPF_MODE(insn->code) == BPF_PROBE_MEMSX)) {
> >> >>> + struct bpf_insn *patch = &insn_buf[0];
> >> >>> + u64 uaddress_limit = bpf_arch_uaddress_limit();
> >> >>> +
> >> >>> + if (!uaddress_limit)
> >> >>> + goto next_insn;
> >> >>> +
> >> >>> + *patch++ = BPF_MOV64_REG(BPF_REG_AX, insn->src_reg);
> >> >>> + if (insn->off)
> >> >>> + *patch++ = BPF_ALU64_IMM(BPF_ADD, BPF_REG_AX, insn->off);
> >> >>> + *patch++ = BPF_ALU64_IMM(BPF_RSH, BPF_REG_AX, 32);
> >> >>> + *patch++ = BPF_JMP_IMM(BPF_JLE, BPF_REG_AX, uaddress_limit >> 32, 2);
> >> >>> + *patch++ = *insn;
> >> >>> + *patch++ = BPF_JMP_IMM(BPF_JA, 0, 0, 1);
> >> >>> + *patch++ = BPF_MOV64_IMM(insn->dst_reg, 0);
> >> >>
> >> >> But how does this address other cases where we could fault e.g. non-canonical,
> >> >> vsyscall page, etc? Technically, we would have to call to copy_from_kernel_nofault_allowed()
> >> >> to really address all the cases aside from the overflow (good catch btw!) where kernel
> >> >> turns into user address.
> >> >
> >> > So, we are trying to ~simulate a call to
> >> > copy_from_kernel_nofault_allowed() here. If the address under
> >> > consideration is below TASK_SIZE (TASK_SIZE + 4GB to be precise) then we
> >> > skip that load because that address could be mapped by the user.
> >> >
> >> > If the address is above TASK_SIZE + 4GB, we allow the load and it could
> >> > cause a fault if the address is invalid, non-canonical etc. Taking the
> >> > fault is fine because JIT will add an exception table entry for
> >> > for that load with BPF_PBOBE_MEM.
> >>
> >> Are you sure? I don't think the kernel handles non-canonical fixup.
> >
> > I believe it handles it fine otherwise our selftest bpf_testmod_return_ptr:
> > case 4: return (void *)(1ull << 60); /* non-canonical and invalid */
> > would have been crashing for the last 3 years,
> > since we've been running it.
> >
> >> > The vsyscall page is special, this approach skips all loads from this
> >> > page. I am not sure if that is acceptable.
> >>
> >> The bpf_probe_read_kernel() does handle it fine via copy_from_kernel_nofault().
> >>
> >> So there is tail risk that BPF_PROBE_* could trigger a crash.
> >
> > For this patch let's do
> > return max(TASK_SIZE_MAX + PAGE_SIZE, VSYSCALL_ADDR)
> > to cover both with one check?
>
> I agree, will add this in the next version.
Sorry. I didn't look at actual value when I suggested this.
Let's think how to check for them with one conditional branch.
Maybe we can REG_AX >>= 24 instead of 32
and do some clever math, since we need to:
addr < 0x7ffffful || addr == 0xfffffffffful
?
If we have to do two branches that's fine for now,
but we probably need to leave it to JITs,
since they can emit two faster conditional moves and use
single conditional jump or
introduce new pseudo bpf insns equivalent to x86 set[ae|ne|..]
Powered by blists - more mailing lists