linux-kernel - Re: [PATCH v7 00/13] fold per-CPU vmstats remotely

Message-ID: <ZEoy2aYpGJ4/wOK7@dhcp22.suse.cz>
Date:   Thu, 27 Apr 2023 10:31:21 +0200
From:   Michal Hocko <mhocko@...e.com>
To:     Marcelo Tosatti <mtosatti@...hat.com>
Cc:     Frederic Weisbecker <frederic@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Christoph Lameter <cl@...ux.com>,
        Aaron Tomlin <atomlin@...mlin.com>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        Russell King <linux@...linux.org.uk>,
        Huacai Chen <chenhuacai@...nel.org>,
        Heiko Carstens <hca@...ux.ibm.com>, x86@...nel.org,
        Vlastimil Babka <vbabka@...e.cz>
Subject: Re: [PATCH v7 00/13] fold per-CPU vmstats remotely

On Wed 26-04-23 11:34:00, Marcelo Tosatti wrote:
> On Thu, Apr 20, 2023 at 10:45:20AM -0300, Marcelo Tosatti wrote:
[...]
> > There are additional details that were not mentioned. When we think
> > of flushing caches, or disabling per-CPU caches, this means that the
> > isolated application loses the benefit of those caches (which means you
> > are turning a "general purpose" programming environment into
> > potentially slower environment for applications to execute).

I do not really buy this argument! Nothing is really free and somebody
has to pay for the overhead. You want a highly specialized workload to
enjoy all the performance while making high demands on latency, yet
everybody else has to pay the overhead.

> https://www.uwsg.indiana.edu/hypermail/linux/kernel/2012.0/06823.html

This is just talking about who benefits from isolation and I do not
think there is any dispute in that regard. I haven't questioned that. My
main argument was that such workloads really need to be special and
careful to achieve their goal, and Thomas says a very similar thing. I
do not see any objection to an explicit programming model to achieve
that goal.

> > (yes, of course, one has to be mindful of which system calls can be
> > used, for example the execution time of system calls which take locks will
> > depend on whether, and how many, users of those locks there are at a
> > given moment).

This is simply not a maintainable state. Once you enter the kernel you
cannot really expect your _ultra low_ latency requirements to hold.

[...]
> > So it seems to me (unless there are points that show otherwise, which
> > would indicate that explicit userspace interfaces are preferred) _not_
> > requiring userspace changes is a superior solution.
> > 
> > Perhaps the complexity should be judged for individual cases 
> > of interruptions, and if a given interruption-free conversion 
> > is seen as too complex, then a "disable feature which makes use of per-CPU
> > caches" style solution can be made (and then userspace has to
> > explicitly request for that per-CPU feature to be disabled).
> > 
> > But i don't see that this patchset introduces unmanageable complexity,
> > neither: 

As I've tried to explain, I disagree with the approach you are taking.
You are fixing your problem at the wrong layer. You really need to
address the fundamental issue, which is that you do not want
housekeeping done on isolated CPU(s) while your workload is running
there.

vmstat updates are just one of the schedule_on_cpu users that could
disturb your workload. We do not want to chase every single one and keep
doing that forever as new callers of that API are added. See the
point? "Fixing" vmstat will not make your isolated workload more
reliable. You really need a more generic solution rather than a quick
hack.

Also vmstat already has a concept of silencing - i.e. quiet_vmstat. IIRC
this is used by NOHZ. I do not remember the details, but this is the
first thing I would look into.

There is close to zero benefit in teaching vmstat remote stat flushing.
As I've said, stats are only for debugging purposes and imprecise values
shouldn't matter. So this just adds complexity without any real
benefit.
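
To illustrate the point about imprecise values, here is a minimal
single-threaded userspace sketch of per-CPU deltas that are folded into
a shared counter only when they cross a threshold. It is not code from
the patchset or from mm/vmstat.c: NR_CPUS, STAT_THRESHOLD and all the
function names are hypothetical, and real kernel code would need
per-CPU accessors and atomics.

```c
/* Hypothetical sizes, chosen only for illustration. */
#define NR_CPUS        4
#define STAT_THRESHOLD 8   /* fold a CPU's delta once |delta| reaches this */

static long global_count;        /* shared counter, cheap to read, may lag */
static int  cpu_delta[NR_CPUS];  /* pending per-CPU updates, not yet folded */

/* Fast path: touch only the local delta; fold it when it gets large. */
static void stat_add(int cpu, int v)
{
    cpu_delta[cpu] += v;
    if (cpu_delta[cpu] >= STAT_THRESHOLD || cpu_delta[cpu] <= -STAT_THRESHOLD) {
        global_count += cpu_delta[cpu];
        cpu_delta[cpu] = 0;
    }
}

/*
 * Cheap read: ignores pending deltas, so it can be off by at most
 * NR_CPUS * (STAT_THRESHOLD - 1) in either direction.
 */
static long stat_read_approx(void)
{
    return global_count;
}

/* Precise read: adds the pending deltas without folding them. */
static long stat_read_precise(void)
{
    long sum = global_count;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += cpu_delta[cpu];
    return sum;
}

/* Fold every CPU's delta into the shared counter (what a flush does). */
static void stat_fold_all(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        global_count += cpu_delta[cpu];
        cpu_delta[cpu] = 0;
    }
}
```

In this sketch a reader that can live with the bounded error never
needs the fold at all, and one that cannot can sum the deltas itself
without disturbing the CPUs that own them.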

-- 
Michal Hocko
SUSE Labs