linux-kernel - infinite loop in read_hpet from ktime_get_boot_fast

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHmME9qBDtO1vJrA2Ch3SQigsu435wR7Q3vTm_3R=u=BE49S-Q@mail.gmail.com>
Date:   Fri, 7 Jun 2019 16:14:00 +0200
From:   "Jason A. Donenfeld" <Jason@...c4.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     LKML <linux-kernel@...r.kernel.org>, clemens@...isch.de,
        Sultan Alsawaf <sultan@...neltoast.com>
Subject: infinite loop in read_hpet from ktime_get_boot_fast_ns

Hey Thomas,

After some discussions here prior about the different clocks
available, WireGuard uses ktime_get_boot_fast_ns() pretty extensively.
The requirement is for a quasi-accurate monotonic counter that takes
into account sleep time, and this seems to fit the bill pretty well.
Sultan (CC'd) reported to me a non-reproducible bug he encountered in
4.19.47 (arch's linux-lts package), where the CPU was hung in
read_hpet.

CPU: 1 PID: 7927 Comm: kworker/1:3 Tainted: G           OE     4.19.47-1-lts #1
Hardware name: Dell Inc. XPS 15 9570/02MJVY, BIOS 1.10.1 04/26/2019
Workqueue: wg-crypt-interface wg_packet_tx_worker [wireguard]
RIP: 0010:read_hpet+0x67/0xc0
Code: c0 75 11 ba 01 00 00 00 f0 0f b1 15 a3 3d 1a 01 85 c0 74 37 48
89 cf 57 9d 0f 1f 44 00 00 48 c1 ee 20 eb 04 85 c9 74 12 f3 90 <49> 8b
08 48 89 ca 48 c1 ea 20 89 d0 39 f2 74 ea c3 48 8b 05 89 56
RSP: 0018:ffffb8d382533e18 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
RAX: 0000000018a4c89e RBX: 0000000000000000 RCX: 18a4c89e00000001
RDX: 0000000018a4c89e RSI: 0000000018a4c89e RDI: ffffffffb8227980
RBP: 000006c1c3f602a2 R08: ffffffffb8205040 R09: 0000000000000000
R10: 000001d58fd28efc R11: 0000000000000000 R12: ffffffffb8259a80
R13: 00000000ffffffff R14: 0000000518a0d8c4 R15: 000000000010fa5a
FS:  0000000000000000(0000) GS:ffff9b90ac240000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00003663b14d9ce8 CR3: 000000030f20a006 CR4: 00000000003606e0
Call Trace:
 ktime_get_mono_fast_ns+0x53/0xa0
 ktime_get_boot_fast_ns+0x5/0x10
 wg_packet_tx_worker+0x183/0x220 [wireguard]
 process_one_work+0x1f4/0x3e0
 worker_thread+0x2d/0x3e0
 ? process_one_work+0x3e0/0x3e0
 kthread+0x112/0x130
 ? kthread_park+0x80/0x80
 ret_from_fork+0x35/0x40
watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [kworker/1:3:7927]

It looks like RIP is spinning in this loop in read_hpet:

do {
    cpu_relax();
    new.lockval = READ_ONCE(hpet.lockval);
} while ((new.value == old.value) && arch_spin_is_locked(&new.lock));

I imagine this could be a bug in the hpet code, or a failure of the
hpet hardware. But I thought it'd be most prudent to check, first,
whether there are actually very particular conditions on when and
where ktime_get_boot_fast_ns and friends can be called. In other
words, maybe the bug is actually in my code. I was under the
impression that invoking it from anywhere was fine, given the
documentation says "NMI safe", but maybe there are still some
requirements I should keep in mind?

Thanks,
Jason