linux-kernel - Re: [PATCH v8 00/13] fold per-CPU vmstats remotely

Message-ID: <ZG4W01AcwhD5AiQU@tpad>
Date:   Wed, 24 May 2023 10:53:23 -0300
From:   Marcelo Tosatti <mtosatti@...hat.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Christoph Lameter <cl@...ux.com>,
        Aaron Tomlin <atomlin@...mlin.com>,
        Frederic Weisbecker <frederic@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        Russell King <linux@...linux.org.uk>,
        Huacai Chen <chenhuacai@...nel.org>,
        Heiko Carstens <hca@...ux.ibm.com>, x86@...nel.org,
        Vlastimil Babka <vbabka@...e.cz>
Subject: Re: [PATCH v8 00/13] fold per-CPU vmstats remotely

On Wed, May 24, 2023 at 02:51:55PM +0200, Michal Hocko wrote:
> [Sorry for the late response, but I was at conferences for the last two
> weeks and am now catching up]
> 
> On Mon 15-05-23 15:00:15, Marcelo Tosatti wrote:
> [...]
> > v8
> > - Add summary of discussion on -v7 to cover letter
> 
> Thanks, this is very useful! This helps to frame the further discussion.
> 
> I believe the most important question to answer is in fact this:
> > I think what needs to be done is to avoid new queue_work_on()
> > users from being introduced in the tree (the number of
> > existing ones is finite and can therefore be fixed).
> > 
> > Agree with the criticism here, however, I can't see other
> > options than the following:
> > 
> >         1) Given an activity, which contains a sequence of instructions
> >            to execute on a CPU, to change the algorithm
> >            to execute that code remotely (therefore avoid interrupting a CPU),
> >            or to avoid the interruption somehow (which must be dealt with
> >            on a case-by-case basis).
> > 
> >         2) To block that activity from happening in the first place,
> >            for the sites where it can be blocked (that return errors to
> >            userspace, for example).
> > 
> >         3) Completely isolate the CPU from the kernel (off-line it).
> 
> I agree that a reliable cpu isolation implementation needs to address
> the queue_work_on problem. And it has to do that _reliably_. This
> cannot be achieved by an endless whack-a-mole, chasing each new
> instance. There must be a more systematic approach. One way would be to
> change the semantics of schedule_work_on and fail the call for an
> isolated CPU. The caller would then have a way to fall back and handle
> the operation by other means. E.g. vmstat could simply skip folding pcp
> data because an imprecision shouldn't really matter. Other callers
> might choose to do the operation remotely. This is a lot of work, no
> doubt about that, but it is a long-term maintainable solution that
> doesn't give you new surprises with each newly released kernel. There
> are likely other remote interfaces that would need to follow that
> scheme.
> 
> If cpu isolation is not considered worth that time investment, then I
> do not think it is worth degrading highly optimized vmstat code either.
> These stats are updated from many hot paths and the per-cpu
> implementation has been optimized for that case.

It is exactly the same code, but now with a "LOCK" prefix on the CMPXCHG
instruction, which should not cost much due to cache locking (these are
per-CPU variables anyway).
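
To illustrate the idea (a userspace sketch, not the code from the
series): a per-CPU delta updated through an atomic compare-and-exchange
can be drained safely from another CPU. On x86 such a compare-exchange
is typically compiled to LOCK CMPXCHG. The names pcp_diff, global_stat,
pcp_add() and fold_remote() are made up for this example.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for one CPU's vmstat delta and the global counter. */
static _Atomic long pcp_diff;
static _Atomic long global_stat;

/* Local update: retry the compare-and-exchange until the delta is applied. */
static void pcp_add(long delta)
{
    long old = atomic_load(&pcp_diff);

    while (!atomic_compare_exchange_weak(&pcp_diff, &old, old + delta))
        ; /* on failure 'old' holds the current value; retry with it */
}

/* Remote fold: another CPU atomically takes the delta and adds it globally. */
static long fold_remote(void)
{
    long taken = atomic_exchange(&pcp_diff, 0);

    atomic_fetch_add(&global_stat, taken);
    return taken;
}
```

Because both the update and the drain are atomic, a remote fold cannot
lose an update that races with it; the owning CPU is never interrupted.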

> If your workload would
> like to avoid that disturbance, then you already have a quiet_vmstat
> precedent, so find a way to use it for your workload instead.
>  
> -- 
> Michal Hocko
> SUSE Labs

OK, so an alternative solution is to completely disable vmstat updates
for isolated CPUs. Are you OK with that?
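
For reference, the "fail the call for an isolated CPU" semantics quoted
above could look roughly like the userspace sketch below. Everything
here is an assumption for illustration: try_queue_work_on(), the
cpu_isolated[] mask and the -EBUSY return value are hypothetical and do
not correspond to an existing kernel interface.

```c
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical isolation mask; in this sketch CPU 3 is isolated. */
static const bool cpu_isolated[NR_CPUS] = { false, false, false, true };

typedef void (*work_fn)(int cpu);

/*
 * Hypothetical variant of queueing work on a CPU: refuse isolated CPUs
 * instead of interrupting them. The work runs synchronously here only
 * to keep the sketch self-contained.
 */
static int try_queue_work_on(int cpu, work_fn fn)
{
    if (cpu < 0 || cpu >= NR_CPUS)
        return -EINVAL;
    if (cpu_isolated[cpu])
        return -EBUSY;
    fn(cpu);
    return 0;
}

static int folds_done;

/* Stand-in for the per-CPU vmstat fold. */
static void fold_vmstat(int cpu)
{
    (void)cpu;
    folds_done++;
}

/* Caller-side fallback: skip the fold for CPUs that refuse the work. */
static int fold_all_cpus(void)
{
    int skipped = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (try_queue_work_on(cpu, fold_vmstat) != 0)
            skipped++;
    return skipped;
}
```

The vmstat caller simply tolerates the imprecision for the CPUs that
were skipped; other callers could instead perform the operation
remotely when the call fails.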