lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <22c1eb51-1e09-3eb4-f88a-d404b4ffcdb4@oracle.com>
Date:   Tue, 5 Mar 2019 10:19:42 +0800
From:   Dongli Zhang <dongli.zhang@...cle.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     xen-devel@...ts.xenproject.org, stable@...r.kernel.org,
        linux-kernel@...r.kernel.org, boris.ostrovsky@...cle.com,
        sstabellini@...nel.org, jgross@...e.com, joe.jin@...cle.com,
        Herbert Van Den Bergh <herbert.van.den.bergh@...cle.com>,
        sboyd@...nel.org, john.stultz@...aro.org
Subject: Re: [BUG linux-4.9.x] xen hotplug cpu leads to 100% steal usage

Hi Thomas,

On 3/2/19 7:43 AM, Thomas Gleixner wrote:
> On Thu, 28 Feb 2019, Dongli Zhang wrote:
>>
>> The root cause is that the return type of jiffies_to_usecs() is 'unsigned int',
>> but not 'unsigned long'. As a result, the leading 32 bits are discarded.
> 
> Errm. No. The root cause is that jiffies_to_usecs() is used for that in the
> first place. The function has been that way forever and all usage sites
> (except a broken dev_debug print in infiniband) feed delta values. Yes, it
> could have documentation....

Thank you very much for the explanation. It would help the developers clarify
the usage of jiffies_to_usecs() (which we should always feed with dealt value)
with comments above it.

Indeed, the input value in this bug is also a delta value. Because of the
special mechanisms used by xen to account steal clock, the initial delta value
is always very large, only when the new cpu is added after the VM is already up
for very long time.

Dongli Zhang


> 
>> jiffies_to_usecs() is indirectly triggered by cputime_to_nsecs() at line 264.
>> If guest is already up for long time, the initial steal time for new vcpu might
>> be large and the leading 32 bits of jiffies_to_usecs() would be discarded.
> 
>> So far, I have two solutions:
>>
>> 1. Change the return type from 'unsigned int' to 'unsigned long' as in above
>> link and I am afraid it would bring side effect. The return type in latest
>> mainline kernel is still 'unsigned int'.
> 
> Changing it to unsigned long would just solve the issue for 64bit.
> 
> Thanks,
> 
> 	tglx
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ