linux-kernel - Re: Oops from calibrate_delay_is_known on qemu machine with Linux v4.5-1523-g271ecc5253e2

Message-ID: <alpine.DEB.2.11.1603172146420.3978@nanos>
Date:	Thu, 17 Mar 2016 22:01:38 +0100 (CET)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Josh Boyer <jwboyer@...oraproject.org>
cc:	"Richard W.M. Jones" <rjones@...hat.com>, x86 <x86@...nel.org>,
	"Linux-Kernel@...r. Kernel. Org" <linux-kernel@...r.kernel.org>
Subject: Re: Oops from calibrate_delay_is_known on qemu machine with Linux
 v4.5-1523-g271ecc5253e2

Josh,

On Thu, 17 Mar 2016, Josh Boyer wrote:
> We've had a report [1] of the mainline kernel crashing on a single-cpu
> QEMU machine (not kvm) in Fedora.  It looks as if the emulated machine
> is failing to provide a TSC and the calibrate_delay_is_known function
> is passing NULL to cpumask_any_but for the mask parameter.  At least
> that's all I've been able to discern thus far.
> 
> I was wondering if you had any insight into this issue, given your
> recent commit to change calibrate_delay_is_known to use
> topology_core_cpumask.  The backtrace is below.

> at           (null)
> [    0.010000] IP: [<ffffffff814698b5>] _find_next_bit.part.0+0x15/0x70
> [    0.010000] PGD 0
>
> [    0.010000] RSP: 0000:ffffffff81e03e40  EFLAGS: 00000246
> [    0.010000] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [    0.010000] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
> [    0.010000] RBP: ffffffff81e03e50 R08: ffffffffffffffff R09: 0000000000000000
> [    0.010000] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [    0.010000] R13: ffffffff82248960 R14: ffffffff822562e0 R15: 0000000000000000
> [    0.010000] FS:  0000000000000000(0000) GS:ffff88001ee00000(0000) knlGS:0000000000000000
> [    0.010000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.010000] CR2: 0000000000000000 CR3: 0000000001e06000 CR4: 00000000000006b0
> [    0.010000] Stack:
> [    0.010000]  ffffffff81e03e50 ffffffff81469928 ffffffff81e03e70 ffffffff81453d56
> [    0.010000]  0000000000000000 ffff88001f3fa780 ffffffff81e03e80 ffffffff81040495
> [    0.010000]  ffffffff81e03f40 ffffffff8100285a ffffffff810eefb3 ffffffff00000000
> [    0.010000] Call Trace:
> [    0.010000]  [<ffffffff81469928>] ? find_next_bit+0x18/0x20
> [    0.010000]  [<ffffffff81453d56>] cpumask_any_but+0x26/0x50

Yuck. That requires that topology_core_cpumask(cpu) is NULL.

#define topology_core_cpumask(cpu)        (per_cpu(cpu_core_map, cpu))

...

DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_map);

So that can only result in a NULL pointer if you have CONFIG_CPUMASK_OFFSTACK
enabled and the allocation fails, which is not checked !?@!

I tried to reproduce with Richard's script, but so far no dice. Can you please
provide your kernel config?

Thanks,

	tglx