Message-ID: <26425897-5229-d2c5-1e1b-a08442441f68@runbox.com>
Date: Tue, 1 Nov 2016 13:15:53 +0300
From: "M. Vefa Bicakci" <m.v.b@...box.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Charles (Chas) Williams" <ciwillia@...cade.com>
Subject: Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()
> On 2016-10-27 15:00:32 [-0400], Charles (Chas) Williams wrote:
>>
>> [snip]
>>
>> But sometimes the topology info is correct and if I get lucky, the
>> package id could be valid for all the CPUs. Given the behavior
>> I have seen so far, it makes me think RAPL isn't being emulated.
>> So even if I did boot onto a "valid" set of cores, would I always be
>> certain that I will be on those cores?
>
> I don't know what VMware does here. Nor do they ship source to check. So if
> you have a big HW box with say two packages, it might make sense to give
> this information to the guest _if_ the CPUs are pinned and the guest
> never migrates.
>
>> Per your request in your next email:
>>
>> > One thing I forgot to ask: Could you please check if you get the same
>> > pkgid reported for cpu 0-3 on a pre-v4.8 kernel? (before the hotplug
>> > rework).
>>
>> Our previous kernel was 4.4, and didn't use the logical package id:
>
> I see.
>
> Did the patch I sent fix it for you, or were you not able to test it?
Hello Sebastian,
The patch fixes the kernel oops for me.
I am using a custom 4.8.5-based kernel on Qubes OS R3.2, which is based
on Xen 4.6.3. Apparently, Xen also has a similar bug/flaw/quirk regarding
the allocation of package identifiers for the virtual CPUs.
Prior to your patch, my Xen-based virtual machines would crash at boot-up
most of the time, with the backtrace reported by Charles. Because the
crashes were intermittent, I was under the impression that this was a
subtle race condition.
With your patch, the virtual machines boot up successfully every time.
Here are the relevant excerpts from dmesg:
=== 8< ===
[ 0.263936] RAPL PMU: rapl pmu error: max package: 1 but CPU0 belongs to 65535
...
[ 2.213669] intel_rapl: Found RAPL domain package
[ 2.213689] intel_rapl: Found RAPL domain core
[ 2.216337] intel_rapl: Found RAPL domain uncore
[ 2.216370] intel_rapl: RAPL package 0 domain package locked by BIOS
=== >8 ===
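For anyone triaging similar logs, here is a minimal Python sketch (my own
illustration, not part of the patch) that pulls the numbers out of the
"rapl pmu error" line in the excerpt above. The name parse_rapl_error is
hypothetical, and treating a package id greater than or equal to the reported
maximum as out of range is an assumption about how the message should be read.
The value 65535 equals 0xFFFF, which looks like an unset 16-bit id, but that
interpretation is also an assumption.

```python
import re

# Matches the error text seen in the dmesg excerpt, e.g.
# "max package: 1 but CPU0 belongs to 65535".
RAPL_ERR = re.compile(r"max package: (\d+) but CPU(\d+) belongs to (\d+)")


def parse_rapl_error(line):
    """Return the fields of a 'rapl pmu error' line, or None if it does not match.

    Assumption: valid package ids are 0 .. max_package - 1, so any id at or
    above max_package is flagged as out of range.
    """
    m = RAPL_ERR.search(line)
    if m is None:
        return None
    max_pkg, cpu, pkg = (int(g) for g in m.groups())
    return {
        "max_package": max_pkg,
        "cpu": cpu,
        "package": pkg,
        "out_of_range": pkg >= max_pkg,
    }


if __name__ == "__main__":
    sample = ("[ 0.263936] RAPL PMU: rapl pmu error: "
              "max package: 1 but CPU0 belongs to 65535")
    print(parse_rapl_error(sample))
```

Feeding it the output of dmesg line by line should flag any CPU whose reported
package id falls outside the expected range.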
Thank you,
Vefa
Please note: I am not subscribed to the Linux kernel mailing list, so
I had to manually construct the headers of this reply with the proper
In-Reply-To and References values (which were extracted from marc.info).
As a result, this e-mail may not show up as a reply to your earlier
conversation with Charles.