linux-kernel - Re: [PREEMPT-RT] Oops in rapl_cpu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <26425897-5229-d2c5-1e1b-a08442441f68@runbox.com>
Date:   Tue, 1 Nov 2016 13:15:53 +0300
From:   "M. Vefa Bicakci" <m.v.b@...box.com>
To:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Charles (Chas) Williams" <ciwillia@...cade.com>
Subject: Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()

> On 2016-10-27 15:00:32 [-0400], Charles (Chas) Williams wrote:
>>
>> [snip]
>>
>> But sometimes the topology info is correct and if I get lucky, the
>> package id could be valid for all the CPU's.  Given the behavior,
>> I have seen so far it makes me thing the RAPL isn't being emulated.
>> So even if I did boot onto a "valid" set of cores, would I always be
>> certain that I will be on those cores?
> 
> I don't what vmware does here. Nor do they ship source to check. So if
> you have a big HW box with say two packages, it might make sense to give
> this information to the guest _if_ the CPUs are pinned and the guest
> never migrates.
> 
>> Per your request in your next email:
>> 
>> > One thing I forgot to ask: Could you please check if you get the same
>> > pkgid reported for cpu 0-3 on a pre-v4.8 kernel? (before the hotplug
>> > rework).
>> 
>> Our previous kernel was 4.4, and didn't use the logical package id:
>
> I see.
> 
> Did the patch I sent fixed it for you and were you not able to test?

Hello Sebastian,

The patch fixes the kernel oops for me.

I am using a custom 4.8.5-based kernel on Qubes OS R3.2, which is based
on Xen 4.6.3. Apparently, Xen also has a similar bug/flaw/quirk regarding
the allocation of package identifiers for the virtual CPUs.

Prior to your patch, my Xen-based virtual machines would intermittently
crash most of the time at boot-up with the backtrace reported by Charles.
Due to this, I was under the impression that this is a subtle race
condition.

With your patch, the virtual machines boot-up successfully, all the time.
Here are the relevant excerpts from dmesg:

=== 8< ===
[    0.263936] RAPL PMU: rapl pmu error: max package: 1 but CPU0 belongs to 65535
...
[    2.213669] intel_rapl: Found RAPL domain package
[    2.213689] intel_rapl: Found RAPL domain core
[    2.216337] intel_rapl: Found RAPL domain uncore
[    2.216370] intel_rapl: RAPL package 0 domain package locked by BIOS
=== >8 ===

Thank you,

Vefa

Please note: I am not subscribed to the Linux kernel mailing list, so
I had to manually construct the headers of this reply with the proper
In-Reply-To and References values (which were extracted from marc.info).
As a result, this e-mail may not show up as a reply to your earlier
conversation with Charles.