linux-kernel - Re: [PATCH v7 00/13] fold per-CPU vmstats remotely

Message-ID: <ZEA95uBeUECRvO5e@tpad>
Date:   Wed, 19 Apr 2023 16:15:50 -0300
From:   Marcelo Tosatti <mtosatti@...hat.com>
To:     Vlastimil Babka <vbabka@...e.cz>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Christoph Lameter <cl@...ux.com>,
        Aaron Tomlin <atomlin@...mlin.com>,
        Frederic Weisbecker <frederic@...nel.org>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        Russell King <linux@...linux.org.uk>,
        Huacai Chen <chenhuacai@...nel.org>,
        Heiko Carstens <hca@...ux.ibm.com>, x86@...nel.org,
        Michal Hocko <mhocko@...e.com>
Subject: Re: [PATCH v7 00/13] fold per-CPU vmstats remotely

On Wed, Apr 19, 2023 at 06:47:30PM +0200, Vlastimil Babka wrote:
> On 4/19/23 13:29, Marcelo Tosatti wrote:
> > On Wed, Apr 19, 2023 at 08:14:09AM -0300, Marcelo Tosatti wrote:
> >> This was tried before:
> >> https://lore.kernel.org/lkml/20220127173037.318440631@fedora.localdomain/
> >> 
> >> My conclusion from that discussion (and work) is that a special system
> >> call:
> >> 
> >> 1) Does not allow the benefits to be widely applied (only modified
> >> applications will benefit). Is not portable across different operating systems. 
> >> 
> >> Removing the vmstat_work interruption is a benefit for HPC workloads, 
> >> for example (in fact, it is a benefit for any kind of application, 
> >> since the interruption causes cache misses).
> >> 
> >> 2) Increases the system call cost for applications which would use
> >> the interface.
> >> 
> >> So avoiding the vmstat_update interruption, without userspace
> >> knowledge and modifications, is a better solution than a modified
> >> userspace.
> > 
> > Another important point is this: if an application dirties
> > its own per-CPU vmstat cache, while performing a system call,
> > and a vmstat sync event is triggered on a different CPU, you'd have to:
> > 
> > 1) Wait for that CPU to return to userspace and sync its stats
> > (unfeasible).
> > 
> > 2) Queue work to execute on that CPU (undesirable, as that causes
> > an interruption).
> 
> So you're saying the application might do a syscall from the isolcpu, so
> IIUC it cannot expect any latency guarantees at that very moment, 

Why not? cyclictest uses nanosleep, and it's the main tool for
measuring latency.
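
To make the point concrete, here is a minimal sketch of a
cyclictest-style measurement loop. It is not cyclictest's actual code:
the function names, and the choice of clock_nanosleep() with
CLOCK_MONOTONIC and TIMER_ABSTIME, are assumptions made for
illustration. It shows that the measured task enters the kernel on
every cycle and still expects a bounded wakeup latency.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper: returns (b - a) in nanoseconds. */
static int64_t ts_diff_ns(const struct timespec *a, const struct timespec *b)
{
	return (int64_t)(b->tv_sec - a->tv_sec) * NSEC_PER_SEC +
	       (b->tv_nsec - a->tv_nsec);
}

/*
 * Sleep until successive absolute wakeup times and return the largest
 * observed wakeup latency (actual wakeup minus programmed wakeup) in
 * nanoseconds, or -1 on error.
 */
static int64_t max_wakeup_latency_ns(long interval_ns, int loops)
{
	struct timespec next, now;
	int64_t max = 0;

	if (clock_gettime(CLOCK_MONOTONIC, &next))
		return -1;

	for (int i = 0; i < loops; i++) {
		/* Program the next absolute wakeup time. */
		next.tv_nsec += interval_ns;
		while (next.tv_nsec >= NSEC_PER_SEC) {
			next.tv_nsec -= NSEC_PER_SEC;
			next.tv_sec++;
		}

		/* The task performs a system call here on every cycle. */
		if (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL))
			return -1;
		if (clock_gettime(CLOCK_MONOTONIC, &now))
			return -1;

		int64_t lat = ts_diff_ns(&next, &now);
		if (lat > max)
			max = lat;
	}
	return max;
}
```

Any work queued on that CPU while the task sleeps, or right after it
wakes, shows up directly in the value this loop reports.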

> but then
> it immediately starts expecting them again after returning to userspace, 

No, the expectation more generally is this:

For certain types of applications (for example PLC software or
RAN processing), upon occurrence of an event, it is necessary to
complete a certain task in a maximum amount of time (deadline).

One way to express this requirement is with a pair of numbers,
deadline time and execution time, where:

        * deadline time: length of time between event and deadline.
        * execution time: length of time it takes for processing of event
                          to occur on a particular hardware platform
                          (uninterrupted).
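
As a rough sketch of that pair of numbers (the struct and function
names are hypothetical, chosen only for illustration), the total
interruption time a task can tolerate is the difference between the
two:

```c
#include <stdint.h>

/* Hypothetical representation of the requirement described above. */
struct rt_requirement {
	uint32_t deadline_us;	/* time between event and deadline */
	uint32_t exec_us;	/* uninterrupted processing time on this hardware */
};

/*
 * Slack: total interruption time the task can absorb per event.
 * A negative value means the deadline cannot be met even when the
 * task runs uninterrupted.
 */
static int64_t slack_us(const struct rt_requirement *r)
{
	return (int64_t)r->deadline_us - (int64_t)r->exec_us;
}
```

For example, with an assumed deadline of 1000us and an assumed
execution time of 940us, the task can absorb at most 60us of
interruptions per event.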

The particular values depend on use-case. For the case
where the realtime application executes in a virtualized
guest, an interruption which must be serviced in the host will cause
the following sequence of events:

        1) VM-exit
        2) execution of IPI (and function call), or switch to kworker
           thread to execute some work item
        3) VM-entry

This adds latency in excess of 50us, as observed by cyclictest (which
violates the latency requirement of a vRAN application with a 1ms TTI,
for example).

> and
> a single interruption for a one-time flush after the syscall would be too
> intrusive?

Generally, if you can't complete the task (which involves executing a
number of instructions) before the deadline, then it's a problem.

One-time flush? You mean to switch between:

rt-app -> kworker (to execute vmstat_update flush) -> rt-app

My measurement, which probably had vmstat_update code/data in cache, took 7us.
It might be the case that the code to execute must be brought in from
memory, which takes even longer.

> (elsewhere in the thread you described an RT app initialization that may
> generate vmstats to flush and then enter userspace loop, again, would a
> single interruption soon after entering the loop be so critical?)

1) It depends on the application. For the use-case above, where < 50us
interruption is desired, yes it is critical.

2) The interruptions can come from different sources.

Time
0			rt-app executing instruction 1
1			rt-app executing instruction 2
2			scheduler switches between rt-app and kworker
3			kworker runs vmstat_work
4			scheduler switches between kworker and rt-app
5			rt-app executing instruction 3
6			IPI to handle a KVM request
7			fill in your preferred IPI handler

So the argument "a single interruption might not cause your deadline
to be exceeded" fails, because the time spent handling the different
interruptions adds up.

Does that make sense?
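
A minimal sketch of that accumulation, with a hypothetical function
name and illustrative numbers only: each interruption is checked
against the task's slack on its own, and then all of them are checked
together.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if the sum of all interruption times (in us) still fits
 * within the slack, i.e. the deadline is still met.
 * "slack_us" is assumed to be deadline time minus execution time.
 */
static bool fits_in_slack(unsigned long slack_us,
			  const unsigned *intr_us, size_t n)
{
	unsigned long total = 0;

	for (size_t i = 0; i < n; i++)
		total += intr_us[i];

	return total <= slack_us;
}
```

With an assumed slack of 60us, a single 50us interruption fits, and so
does a single 7us one. A 7us kworker switch plus two 50us IPIs sums to
107us and misses the deadline, even though each of them would have been
acceptable in isolation.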

> > 3) Remotely sync the vmstat for that CPU.
> > 
> > 
> > 
> 
>