linux-kernel - Re: [PATCH v2 2/3] vmstat: skip periodic vmstat update for nohz full CPUs


Message-ID: <ZH4FhQ5Lh6xFBjyz@dhcp22.suse.cz>
Date:   Mon, 5 Jun 2023 17:55:49 +0200
From:   Michal Hocko <mhocko@...e.com>
To:     Marcelo Tosatti <mtosatti@...hat.com>
Cc:     Christoph Lameter <cl@...ux.com>,
        Aaron Tomlin <atomlin@...mlin.com>,
        Frederic Weisbecker <frederic@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        Vlastimil Babka <vbabka@...e.cz>
Subject: Re: [PATCH v2 2/3] vmstat: skip periodic vmstat update for nohz full
 CPUs

On Mon 05-06-23 11:53:56, Marcelo Tosatti wrote:
> On Mon, Jun 05, 2023 at 09:55:57AM +0200, Michal Hocko wrote:
> > On Fri 02-06-23 15:57:59, Marcelo Tosatti wrote:
> > > The interruption caused by vmstat_update is undesirable
> > > for certain applications:
> > > 
> > > oslat   1094.456862: sys_mlock(start: 7f7ed0000b60, len: 1000)
> > > oslat   1094.456971: workqueue_queue_work: ... function=vmstat_update ...
> > > oslat   1094.456974: sched_switch: prev_comm=oslat ... ==> next_comm=kworker/5:1 ...
> > > kworker 1094.456978: sched_switch: prev_comm=kworker/5:1 ==> next_comm=oslat ...
> > > 
> > > The example above shows an additional 7us for the
> > > 
> > >        	oslat -> kworker -> oslat
> > > 
> > > switches. In the case of a virtualized CPU, and the vmstat_update  
> > > interruption in the host (of a qemu-kvm vcpu), the latency penalty
> > > observed in the guest is higher than 50us, violating the acceptable
> > > latency threshold.
> > 
> > I personally find the above problem description insufficient. I have
> > asked several times and only got piece by piece information each time.
> > Maybe there is a reason to be secretive but it would be great to get at
> > least some basic expectations described and what they are based on.
> 
> There is no reason to be secretive. 
> 
> > 
> > E.g. workloads are running on isolated cpus with nohz full mode to
> > shield off any kernel interruption. Yet there are operations that update
> > counters (like mlock, but not mlock alone) that update per cpu counters
> > that will eventually get flushed and that will cause some interference.
> > Now the host/guest transition and interference. How does that happen
> > when the guest is running on an isolated and dedicated cpu?
> 
> The updated changelog follows. Does it contain the requested
> information?
> 
> ----
> 
> Performance details for the kworker interruption:
> 
> Consider workloads running on isolated cpus with nohz full mode to
> shield off any kernel interruption, for example a VM running a
> time sensitive application with a 50us maximum acceptable interruption
> (use case: soft PLC).
> 
> oslat   1094.456862: sys_mlock(start: 7f7ed0000b60, len: 1000)
> oslat   1094.456971: workqueue_queue_work: ... function=vmstat_update ...
> oslat   1094.456974: sched_switch: prev_comm=oslat ... ==> next_comm=kworker/5:1 ...
> kworker 1094.456978: sched_switch: prev_comm=kworker/5:1 ==> next_comm=oslat ...
> 
> The example above shows an additional 7us for the
> 
>         oslat -> kworker -> oslat
> 
> switches. In the case of a virtualized CPU, and the vmstat_update
> interruption in the host (of a qemu-kvm vcpu), the latency penalty
> observed in the guest is higher than 50us, violating the acceptable
> latency threshold.
> 
> The isolated vCPU can perform operations that modify per-CPU page counters,
> for example to complete I/O operations:
> 
>       CPU 11/KVM-9540    [001] dNh1.  2314.248584: mod_zone_page_state <-__folio_end_writeback
>       CPU 11/KVM-9540    [001] dNh1.  2314.248585: <stack trace>
>  => 0xffffffffc042b083
>  => mod_zone_page_state
>  => __folio_end_writeback
>  => folio_end_writeback
>  => iomap_finish_ioend
>  => blk_mq_end_request_batch
>  => nvme_irq
>  => __handle_irq_event_percpu
>  => handle_irq_event
>  => handle_edge_irq
>  => __common_interrupt
>  => common_interrupt
>  => asm_common_interrupt
>  => vmx_do_interrupt_nmi_irqoff
>  => vmx_handle_exit_irqoff
>  => vcpu_enter_guest
>  => vcpu_run
>  => kvm_arch_vcpu_ioctl_run
>  => kvm_vcpu_ioctl
>  => __x64_sys_ioctl
>  => do_syscall_64
>  => entry_SYSCALL_64_after_hwframe

OK, this is really useful. It is just not really clear whether the IO
triggered here is from the guest or is a host activity.

Overall, this is much better!
-- 
Michal Hocko
SUSE Labs
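
The 7us figure in the quoted changelog can be rechecked from the trace
timestamps. The following is a minimal sketch, not part of the patch. It
assumes the interruption is measured from the workqueue_queue_work event
to the sched_switch back to oslat; the helper names parse_events and
interruption_us are hypothetical.

```python
from decimal import Decimal

# Trace lines copied verbatim from the changelog quoted in this message.
TRACE = """\
oslat   1094.456862: sys_mlock(start: 7f7ed0000b60, len: 1000)
oslat   1094.456971: workqueue_queue_work: ... function=vmstat_update ...
oslat   1094.456974: sched_switch: prev_comm=oslat ... ==> next_comm=kworker/5:1 ...
kworker 1094.456978: sched_switch: prev_comm=kworker/5:1 ==> next_comm=oslat ...
"""


def parse_events(trace):
    """Return (timestamp_in_seconds, event_text) pairs, one per trace line."""
    events = []
    for line in trace.splitlines():
        if not line.strip():
            continue
        # Layout assumed here: "<comm> <seconds>.<microseconds>: <event text>"
        _comm, rest = line.split(None, 1)
        stamp, text = rest.split(": ", 1)
        # Decimal avoids binary floating-point rounding at microsecond scale.
        events.append((Decimal(stamp), text))
    return events


def interruption_us(events):
    """Microseconds from queuing vmstat_update until oslat runs again.

    Assumption: the interruption starts at workqueue_queue_work and ends
    at the first sched_switch whose next task is oslat.
    """
    start = next(t for t, text in events
                 if text.startswith("workqueue_queue_work"))
    end = next(t for t, text in events
               if text.startswith("sched_switch") and "next_comm=oslat" in text)
    return int((end - start) * 1_000_000)


if __name__ == "__main__":
    print(interruption_us(parse_events(TRACE)))
```

Under that assumption the result is 7, matching the changelog. This is
the bare-metal case only; the more-than-50us penalty observed in the
guest is not derivable from this trace.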