Message-ID: <CABEBQikk7=cL_Y1CYrSig4okjcjzVUU_f051+FHD96qnvSyCYg@mail.gmail.com>
Date:   Wed, 9 Mar 2022 11:37:54 +0000
From:   Frank Hofmann <fhofmann@...udflare.com>
To:     netdev@...r.kernel.org, kernel-team <kernel-team@...udflare.com>
Subject: #GP in fnhe_dump_bucket()

Hi,

We've had crashes on two of our systems, with identical backtraces:

general protection fault, probably for non-canonical address
0x200000001820f0b6: 0000 [#1] SMP NOPTI
CPU: 74 PID: 243831 Comm: conduit-edge Tainted: G           O
5.15.26-cloudflare-2022.3.4 #1
Hardware name: HYVE EDGE-METAL-GEN11/HS1811D_Lite, BIOS V0.07-sig 05/20/2021
RIP: 0010:fib_dump_info_fnhe+0x11c/0x250
Code: 24 28 48 8b 7c 24 28 e8 02 9e ff ff 48 83 c4 20 85 c0 75 79 8b
45 00 83 c0 01 89 45 00 48 8b 1b 48 85 db 74 40 41 39 c4 7f ed <44> 3b
73 08 75 e7 48 8b 53 20 48 85 d2 74 0c 48 8b 0d 2e 0d 1a 01
RSP: 0018:ffffba03f44db9e8 EFLAGS: 00010297
RAX: 0000000000000057 RBX: 200000001820f0ae RCX: 00000001030a5ecf
RDX: 0000000100a2576e RSI: ffffba03f44db908 RDI: 00000000000003e6
RBP: ffffba03f44dba94 R08: 0000000000000000 R09: ffff980fae5630c4
R10: 00000000000003e6 R11: ffff980d106f6368 R12: 0000000000000000
R13: 00000000000003e6 R14: 0000000000000000 R15: ffff980d106f6368
FS:  00007fe58effd700(0000) GS:ffff98179fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fdc68503000 CR3: 0000003219c22003 CR4: 0000000000770ee0
PKRU: 55555554
Call Trace:
 <TASK>
 fib_table_dump+0x210/0x300
 inet_dump_fib+0x136/0x270
 rtnl_dump_all+0xaf/0xe0
 netlink_dump+0x168/0x3d0
 ? validate_linkmsg+0x100/0x100
 __netlink_dump_start+0x1c4/0x2a0
 rtnetlink_rcv_msg+0x290/0x380
 ? 0xffffffffc0e460c8
 ? validate_linkmsg+0x100/0x100
 ? rtnl_calcit.isra.0+0x130/0x130
 netlink_rcv_skb+0x50/0xf0
 netlink_unicast+0x1fc/0x2c0
 netlink_sendmsg+0x255/0x4d0
 sock_sendmsg+0x5e/0x60
 __sys_sendto+0xee/0x150
 ? call_rcu+0x91/0x250
 ? auditd_test_task+0x33/0x40
 ? __audit_syscall_entry+0xe6/0x110
 __x64_sys_sendto+0x25/0x30
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x48060a
Code: e8 9b ca fe ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 4c
8b 54 24 28 4c 8b 44 24 30 4c 8b 4c 24 38 48 8b 44 24 08 0f 05 <48> 3d
01 f0 ff ff 76 20 48 c7 44 24 40 ff ff ff ff 48 c7 44 24 48
RSP: 002b:000000c00935efb8 EFLAGS: 00000212 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 000000c00004b800 RCX: 000000000048060a
RDX: 0000000000000020 RSI: 000000c002f73ac0 RDI: 0000000000000024
RBP: 000000c00935f020 R08: 000000c00487d780 R09: 000000000000000c
R10: 0000000000000000 R11: 0000000000000212 R12: 0000000000000000
R13: 0000000000000001 R14: 000000c000438680 R15: ffffffffffffffff
</TASK>

The faulting code is actually the inlined fnhe_dump_bucket():

static int fnhe_dump_bucket(struct net *net, struct sk_buff *skb,
                            struct netlink_callback *cb, u32 table_id,
                            struct fnhe_hash_bucket *bucket, int genid,
                            int *fa_index, int fa_start, unsigned int flags)
{
        int i;
        for (i = 0; i < FNHE_HASH_SIZE; i++) {
                struct fib_nh_exception *fnhe;
                for (fnhe = rcu_dereference(bucket[i].chain); fnhe;
                     fnhe = rcu_dereference(fnhe->fnhe_next)) {
                        struct rtable *rt;
                        int err;
                        if (*fa_index < fa_start)
                                goto next;
                        if (fnhe->fnhe_genid != genid)
                            ^^^^^^^^^^^^^^^^
                                goto next;
...

I can't tell from the trace above whether the non-canonical address
came from the bucket array or from a chain entry (outer or inner loop).

The code there appears pretty much unchanged since its introduction in
ee28906fd7a1437ca77a60a99b6b9c6d676220f8 and the addition of RCU
protection around the call to it in fib_dump_info_fnhe(), via
93ed54b15b2aae060c75ac00eb251ed02745eed1 (following a 2019 syzkaller report,
https://syzkaller.appspot.com/bug?id=2987ec035b602681b9c84de888615ea144f3c8ea).

We haven't found a reproducer yet. Could it be that
fnhe_dump_bucket(), as it iterates over the whole hash table and every
bucket chain, needs
spin_lock_bh(&fnhe_lock)/spin_unlock_bh(&fnhe_lock) as well?

This was seen on 5.15.26.

Thanks in advance for casting an eye over this!

Frank Hofmann
