Date:   Thu, 25 May 2017 16:35:08 -0300
From:   Marcelo Tosatti <mtosatti@...hat.com>
To:     Luiz Capitulino <lcapitulino@...hat.com>,
        Christoph Lameter <cl@...ux.com>
Cc:     Christoph Lameter <cl@...ux.com>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, Rik van Riel <riel@...hat.com>,
        Linux RT Users <linux-rt-users@...r.kernel.org>,
        cmetcalf@...lanox.com
Subject: Re: [patch 2/2] MM: allow per-cpu vmstat_threshold and vmstat_worker
 configuration

On Fri, May 19, 2017 at 01:49:34PM -0400, Luiz Capitulino wrote:
> On Fri, 19 May 2017 12:13:26 -0500 (CDT)
> Christoph Lameter <cl@...ux.com> wrote:
> 
> > > So why are you against integrating this simple, isolated patch which
> > > does not affect how current logic works?  
> > 
> > Frankly the argument does not make sense. Vmstat updates occur very
> > infrequently (probably even less than your IPIs and the other OS stuff that
> > also causes additional latencies that you seem to be willing to tolerate).
> 
> Infrequently is not good enough. It only has to happen once to
> cause a problem.
> 
> Also, IPIs take a few us, usually less. That's not a problem. In our
> testing we see the preemption caused by the kworker take 10us or
> even more. I've never seen it take 3us. I'm not saying this is not
> true, I'm saying that if this is causing a problem for us, it will
> cause a problem for other people too.

Christoph, 

Some data:

 qemu-system-x86-12902 [003] ....1..  6517.621557: kvm_exit: reason EXTERNAL_INTERRUPT rip 0x4004f1 info 0 800000fc
 qemu-system-x86-12902 [003] d...2..  6517.621557: kvm_entry: vcpu 2
 qemu-system-x86-12902 [003] ....1..  6517.621560: kvm_exit: reason EXTERNAL_INTERRUPT rip 0x4004f1 info 0 800000fc
 qemu-system-x86-12902 [003] d...2..  6517.621561: kvm_entry: vcpu 2
 qemu-system-x86-12902 [003] ....1..  6517.621563: kvm_exit: reason EXTERNAL_INTERRUPT rip 0x4004f1 info 0 800000fc
 qemu-system-x86-12902 [003] d...2..  6517.621564: kvm_entry: vcpu 2
 qemu-system-x86-12902 [003] d..h1..  6517.622037: empty_smp_call_func: empty_smp_call_func ran
 qemu-system-x86-12902 [003] ....1..  6517.622040: kvm_exit: reason EXTERNAL_INTERRUPT rip 0x4004f1 info 0 800000fb
 qemu-system-x86-12902 [003] d...2..  6517.622041: kvm_entry: vcpu 2

empty_smp_call_func: 3us.
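
(For context: empty_smp_call_func above is just a no-op
smp_call_function target, used to measure the bare cost of an IPI on
the isolated CPU. A minimal sketch of such a test module -- the module
scaffolding and the hard-coded CPU number are illustrative assumptions,
not the exact code used here:)

#include <linux/module.h>
#include <linux/smp.h>
#include <linux/ktime.h>

static void empty_smp_call_func(void *info)
{
	/* deliberately empty: we only want the IPI round-trip cost */
}

static int __init ipi_lat_init(void)
{
	ktime_t t0 = ktime_get();

	/* fire an IPI at CPU 3 (the traced CPU) and wait for the
	 * empty handler to complete */
	smp_call_function_single(3, empty_smp_call_func, NULL, 1);

	pr_info("empty IPI round trip: %lld ns\n",
		ktime_to_ns(ktime_sub(ktime_get(), t0)));
	return 0;
}

static void __exit ipi_lat_exit(void)
{
}

module_init(ipi_lat_init);
module_exit(ipi_lat_exit);
MODULE_LICENSE("GPL");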

 qemu-system-x86-12902 [003] ....1..  6517.702739: kvm_exit: reason EXTERNAL_INTERRUPT rip 0x4004f1 info 0 800000ef
 qemu-system-x86-12902 [003] d...2..  6517.702741: kvm_entry: vcpu 2
 qemu-system-x86-12902 [003] d..h1..  6517.702758: scheduler_tick <-update_process_times
 qemu-system-x86-12902 [003] ....1..  6517.702760: kvm_exit: reason EXTERNAL_INTERRUPT rip 0x4004f1 info 0 800000ef
 qemu-system-x86-12902 [003] d...2..  6517.702760: kvm_entry: vcpu 2

scheduler_tick: 2us.

 qemu-system-x86-12902 [003] ....1..  6518.194570: kvm_exit: reason EXTERNAL_INTERRUPT rip 0x4004f1 info 0 800000ef
 qemu-system-x86-12902 [003] d...2..  6518.194571: kvm_entry: vcpu 2
 qemu-system-x86-12902 [003] ....1..  6518.194591: kvm_exit: reason EXTERNAL_INTERRUPT rip 0x4004f1 info 0 800000ef
 qemu-system-x86-12902 [003] d...2..  6518.194593: kvm_entry: vcpu 2

That, and the 10us kworker number mentioned above, should change your
point of view on your statement:

"Frankly the argument does not make sense. Vmstat updates occur very
infrequently (probably even less than your IPIs and the other OS stuff that
also causes additional latencies that you seem to be willing to tolerate).
And you can configure the interval of vmstat updates freely.... Set
the vmstat_interval to 60 seconds instead of 2 for a try? Is that rare
enough?"

Argument? We're showing you data that demonstrates this is causing a
latency problem for us.
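
(For reference, the knob Christoph refers to is the global
vm.stat_interval sysctl. A minimal userspace sketch of setting it,
assuming the standard /proc/sys path -- equivalent to
"sysctl -w vm.stat_interval=60":)

#include <stdio.h>

int main(void)
{
	/* write the new interval, in seconds, to the sysctl file */
	FILE *f = fopen("/proc/sys/vm/stat_interval", "w");

	if (!f) {
		perror("fopen /proc/sys/vm/stat_interval");
		return 1;
	}
	fprintf(f, "60\n");
	fclose(f);
	return 0;
}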

Is there anything you'd like to see improved in the patch?
Is there anything you dislike about it?

> No, we'd have to set it high enough to disable it, and this would
> affect all CPUs.
> 
> Something that crossed my mind was to add a new tunable to set
> the vmstat_interval for each CPU; this way we could essentially
> disable it for the CPUs where DPDK is running. What are the
> implications of doing this, besides not getting up-to-date stats in
> /proc/vmstat (which I still have to confirm would be OK)? Can this
> break anything in the kernel, for example?

Well, you get incorrect statistics: with the periodic worker disabled,
per-CPU counter deltas below the threshold are never folded back into
the global counters, so /proc/vmstat can lag by up to
threshold * nr_cpus events per counter.
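
A toy userspace sketch of why (illustrative constants; not the actual
kernel code, which lives in mm/vmstat.c):

#include <stdio.h>

#define NCPUS     4
#define THRESHOLD 32	/* per-cpu delta bound, like stat_threshold */

static long global_counter;	/* what /proc/vmstat would report */
static int cpu_delta[NCPUS];	/* per-cpu deltas not yet folded  */

/* each CPU counts locally and only folds into the global counter
 * when its delta crosses the threshold */
static void inc_on_cpu(int cpu)
{
	if (++cpu_delta[cpu] >= THRESHOLD) {
		global_counter += cpu_delta[cpu];
		cpu_delta[cpu] = 0;
	}
}

int main(void)
{
	/* each CPU does THRESHOLD - 1 increments: nothing ever folds */
	for (int cpu = 0; cpu < NCPUS; cpu++)
		for (int i = 0; i < THRESHOLD - 1; i++)
			inc_on_cpu(cpu);

	/* with the periodic worker disabled, the global view lags by
	 * up to (THRESHOLD - 1) * NCPUS events */
	printf("global=%ld, hidden=%d\n",
	       global_counter, (THRESHOLD - 1) * NCPUS);
	return 0;
}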
