Message-ID: <26425897-5229-d2c5-1e1b-a08442441f68@runbox.com>
Date:   Tue, 1 Nov 2016 13:15:53 +0300
From:   "M. Vefa Bicakci" <m.v.b@...box.com>
To:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Charles (Chas) Williams" <ciwillia@...cade.com>
Subject: Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()

> On 2016-10-27 15:00:32 [-0400], Charles (Chas) Williams wrote:
>>
>> [snip]
>>
>> But sometimes the topology info is correct and if I get lucky, the
>> package id could be valid for all the CPU's.  Given the behavior,
>> I have seen so far it makes me thing the RAPL isn't being emulated.
>> So even if I did boot onto a "valid" set of cores, would I always be
>> certain that I will be on those cores?
> 
> I don't what vmware does here. Nor do they ship source to check. So if
> you have a big HW box with say two packages, it might make sense to give
> this information to the guest _if_ the CPUs are pinned and the guest
> never migrates.
> 
>> Per your request in your next email:
>> 
>> > One thing I forgot to ask: Could you please check if you get the same
>> > pkgid reported for cpu 0-3 on a pre-v4.8 kernel? (before the hotplug
>> > rework).
>> 
>> Our previous kernel was 4.4, and didn't use the logical package id:
>
> I see.
> 
> Did the patch I sent fixed it for you and were you not able to test?

Hello Sebastian,

The patch fixes the kernel oops for me.

I am using a custom 4.8.5-based kernel on Qubes OS R3.2, which is based
on Xen 4.6.3. Apparently, Xen also has a similar bug/flaw/quirk regarding
the allocation of package identifiers for the virtual CPUs.

Prior to your patch, my Xen-based virtual machines would crash at boot-up
most of the time, though not every time, with the backtrace reported by
Charles. Because the crash was intermittent, I was under the impression
that it was a subtle race condition.

With your patch, the virtual machines boot up successfully every time.
Here are the relevant excerpts from dmesg:

=== 8< ===
[    0.263936] RAPL PMU: rapl pmu error: max package: 1 but CPU0 belongs to 65535
...
[    2.213669] intel_rapl: Found RAPL domain package
[    2.213689] intel_rapl: Found RAPL domain core
[    2.216337] intel_rapl: Found RAPL domain uncore
[    2.216370] intel_rapl: RAPL package 0 domain package locked by BIOS
=== >8 ===
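
For anyone who wants to check their own logs for the same symptom, below
is a minimal Python sketch that scans dmesg text for the error line shown
above and reports any CPU whose package ID is not below the reported
package count. The regular expression is modelled only on the single line
quoted in this e-mail, and the function name is made up for illustration;
the message wording comes from the patch under test and may differ
elsewhere. Note that 65535 is 0xFFFF, which is what -1 looks like in an
unsigned 16-bit field, so it presumably means that no valid package ID was
assigned (this is my assumption, not something I verified in the source).

```python
import re

# Pattern modelled on the single log line quoted above; the exact wording
# comes from the patch under test and may differ in other kernels (assumption).
RAPL_ERR = re.compile(
    r"rapl pmu error: max package: (?P<max_pkg>\d+) "
    r"but CPU(?P<cpu>\d+) belongs to (?P<pkg>\d+)"
)


def find_bad_package_ids(dmesg_text):
    """Return (cpu, package_id, max_packages) for each out-of-range report."""
    hits = []
    for line in dmesg_text.splitlines():
        m = RAPL_ERR.search(line)
        if m is None:
            continue
        cpu = int(m.group("cpu"))
        pkg = int(m.group("pkg"))
        max_pkg = int(m.group("max_pkg"))
        # A usable package index must be smaller than the package count.
        if pkg >= max_pkg:
            hits.append((cpu, pkg, max_pkg))
    return hits


sample = (
    "[    0.263936] RAPL PMU: rapl pmu error: "
    "max package: 1 but CPU0 belongs to 65535"
)
print(find_bad_package_ids(sample))
```

The output of "dmesg" can be passed to the function as one string; an
empty list means that no such line was found.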

Thank you,

Vefa

Please note: I am not subscribed to the Linux kernel mailing list, so
I had to manually construct the headers of this reply with the proper
In-Reply-To and References values (which were extracted from marc.info).
As a result, this e-mail may not show up as a reply to your earlier
conversation with Charles.
