Message-ID: <bf0a41c61939d50d24585a1769b80738@eikelenboom.it>
Date:	Wed, 02 Dec 2015 11:04:28 +0100
From:	Sander Eikelenboom <linux@...elenboom.it>
To:	Boris Ostrovsky <boris.ostrovsky@...cle.com>
Cc:	linux-kernel@...r.kernel.org, xen-devel@...ts.xen.org,
	david.vrabel@...rix.com
Subject: Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv
 guest under Xen with single vcpu.

On 2015-12-02 00:41, Boris Ostrovsky wrote:
> On 12/01/2015 06:30 PM, Sander Eikelenboom wrote:
>> On 2015-12-02 00:19, Boris Ostrovsky wrote:
>>> On 12/01/2015 06:00 PM, Sander Eikelenboom wrote:
>>>> On 2015-12-01 23:47, Boris Ostrovsky wrote:
>>>>> On 11/30/2015 05:55 PM, Sander Eikelenboom wrote:
>>>>>> On 2015-11-30 23:54, Boris Ostrovsky wrote:
>>>>>>> On 11/30/2015 04:46 PM, Sander Eikelenboom wrote:
>>>>>>>> On 2015-11-30 22:45, Konrad Rzeszutek Wilk wrote:
>>>>>>>>> On Sat, Nov 28, 2015 at 04:47:43PM +0100, Sander Eikelenboom 
>>>>>>>>> wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>> 
>>>>>>>>>> I have just tested a 4.4-rc2 kernel (current linus tree) + the 
>>>>>>>>>> tip tree
>>>>>>>>>> pulled on top.
>>>>>>>>>> 
>>>>>>>>>> Running this kernel under Xen on PV-guests with multiple vcpus 
>>>>>>>>>> goes well (on
>>>>>>>>>> idle < 10% cpu usage),
>>>>>>>>>> but a guest with only a single vcpu doesn't idle at all, it 
>>>>>>>>>> seems a kworker
>>>>>>>>>> thread is stuck:
>>>>>>>>>> root       569 98.0  0.0      0     0 ?        R 16:02 12:47
>>>>>>>>>> [kworker/0:1]
>>>>>>>>>> 
>>>>>>>>>> Running a 4.3 kernel works fine with a single vcpu, bisecting 
>>>>>>>>>> would probably be
>>>>>>>>>> quite painful since there were some breakages this merge 
>>>>>>>>>> window with respect
>>>>>>>>>> to Xen pv-guests.
>>>>>>>>>> 
>>>>>>>>>> There are some differences in the diff's from booting a 4.3, 
>>>>>>>>>> 4.4-single,
>>>>>>>>>> 4.4-multi cpu boot:
>>>>>>>>> 
>>>>>>>>> Boris has been tracking a bunch of them. I am attaching the 
>>>>>>>>> latest set of
>>>>>>>>> patches I've to carry on top of v4.4-rc3.
>>>>>>>> 
>>>>>>>> Hi Konrad,
>>>>>>>> 
>>>>>>>> i will test those, see if it fixes all my issues and report back
>>>>>>> 
>>>>>>> They shouldn't help you ;-( (and I just saw a message from you 
>>>>>>> confirming this)
>>>>>>> 
>>>>>>> The first one fixes a 32-bit bug (on bare metal too). The second 
>>>>>>> fixes
>>>>>>> a fatal bug for 32-bit PV guests. The other two are code
>>>>>>> improvements/cleanup.
>>>>>> 
>>>>>> One of these patches also fixes a bug i was having with a 
>>>>>> pci-passthrough device in
>>>>>> a HVM that wasn't working (depending on which dom0-kernel i was 
>>>>>> using (4.3 or 4.4)),
>>>>>> but didn't report yet.
>>>>>> 
>>>>>> Fingers crossed but i think this pv-guest single vcpu issue is the 
>>>>>> last i'm troubled by for now ;)
>>>>> 
>>>>> I could not reproduce this, including with your kernel config file.
>>>> 
>>>> Hmm that's unpleasant :-\
>>>> 
>>>> Hmm, another strange thing is that it doesn't seem to affect dom0 
>>>> (which is also a PV guest), but only unprivileged ones.
>>>> All unprivileged pv-guests seem to have the irq issue, but only with 
>>>> a single vcpu do i seem to get the stuck kworker thread that got my 
>>>> attention. With 2 vcpus that doesn't seem to happen, but you still 
>>>> get the dmesg output and warnings about hvc.
>>>> 
>>>> Could it be that:
>>>> 
>>>> arch/x86/include/asm/i8259.h
>>>> static inline int nr_legacy_irqs(void)
>>>> {
>>>>         return legacy_pic->nr_legacy_irqs;
>>>> }
>>>> 
>>>> returns something different in some circumstances ?
>>> 
>>> It should return 16 pre-8c058b0b9c34d8c8d7912880956543769323e2d8 and 
>>> 0
>>> after that commit.
>>> 
>>> This is the last number that you see in
>>>     NR_IRQS:4352 nr_irqs:48 0
>>> line.
>>> 
>>> I think you should be able to safely revert both
>>> b4ff8389ed14b849354b59ce9b360bdefcdbf99c and
>>> 8c058b0b9c34d8c8d7912880956543769323e2d8 and see if it makes any
>>> difference.
>>> 
>>> 
>>> -boris
>>> 
>> 
>> That was already underway compiling :)
>> 
>> And it does reveal that reverting both fixes the issue, no stuck 
>> kworker thread .. and no:
>>    genirq: Flags mismatch irq 8. 00000000 (hvc_console) vs. 00000000 (rtc0)
>>    hvc_open: request_irq failed with rc -16.
> 
> 
> Let me try it again tomorrow. Can you post your guest config file, Xen
> version and host HW (Intel or AMD)? 'xl info' maybe?
> 
> -boris

Hi Boris,

A fresh new day .. a fresh new thought.
If i look at the /proc/interrupts from a broken kernel and from a kernel 
with both commits reverted, the thing that catches the eye is irq8, just 
as the dmesg message was telling.

In my PV guest rtc0 now seems to try to take irq8, which was already 
assigned to HVC ?
It sounds like some assumptions around the legacy range are broken 
somewhere.

What is the benefit of not just reserving the legacy range ?

Attached the /proc/interrupts from both boots.

--
Sander


> 
>> 
>> What i did get was a conflict when reverting 
>> b4ff8389ed14b849354b59ce9b360bdefcdbf99c:
>> arch/arm64/include/asm/irq.h, although that shouldn't matter because 
>> we are on x86 and not on arm.
>> 
>> -- Sander
>> 
>> 
>>>> 
>>>> -- Sander
>>>> 
>>>>> 
>>>>> -boris
>>>> 
View attachment "interrupts-after-reverts.txt" of type "text/plain" (1304 bytes)

View attachment "interrupts-broken.txt" of type "text/plain" (1297 bytes)
