Date:   Sat, 24 Mar 2018 07:29:48 -0700
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     Ingo Molnar <mingo@...nel.org>, Eric Dumazet <edumazet@...gle.com>
Cc:     x86 <x86@...nel.org>, lkml <linux-kernel@...r.kernel.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Hugh Dickins <hughd@...gle.com>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [PATCH v3 1/2] x86, msr: allow rdmsr_safe_on_cpu() to schedule



On 03/24/2018 01:09 AM, Ingo Molnar wrote:
> 
> * Eric Dumazet <edumazet@...gle.com> wrote:
> 
>> I noticed high latencies caused by a daemon periodically reading
>> various MSRs on all CPUs. KASAN kernels would see ~10ms latencies
>> simply reading one MSR. Even without KASAN, sending an IPI to a CPU
>> that is in a deep sleep state, or that blocks hard IRQs for a long
>> section, then waiting for the answer, can consume hundreds of usec.
>>
>> Convert rdmsr_safe_on_cpu() to use a completion instead
>> of busy polling.
>>
>> Overall daemon CPU usage was reduced by 35%,
>> and latencies caused by msr_read() disappeared.
> 
> What "daemon" is this and why is it reading MSRs?

It is named gsysd ("Google System Tool"), a daemon plus CLI that runs
on all production machines to provide a generic interface
for interacting with the system hardware.

I am not sure if this answers your question; I could probably
give a rough estimate of the MWh this daemon consumes across the planet,
if that helps.

Note that the source of the problem is not reading the MSR, but CPUs
blocking hard IRQs for a long time.

Ingo, it looks like any loop run under lock_task_sighand() (that is, with
hard IRQs disabled until unlock_task_sighand()) might be the main offender.

Application writers seem to love getrusage(), for example.
Can we rewrite it so that it does not block hard IRQs?

Thanks!
