linux-kernel - Re: [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20210705144542.GA27275@fuller.cnet>
Date:   Mon, 5 Jul 2021 11:45:42 -0300
From:   Marcelo Tosatti <mtosatti@...hat.com>
To:     Christoph Lameter <cl@...two.de>
Cc:     linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
        Frederic Weisbecker <frederic@...nel.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Nitesh Lal <nilal@...hat.com>
Subject: Re: [patch 0/5] optionally sync per-CPU vmstats counter on return to
 userspace

On Mon, Jul 05, 2021 at 04:26:48PM +0200, Christoph Lameter wrote:
> On Fri, 2 Jul 2021, Marcelo Tosatti wrote:
> 
> > > > The logic to disable vmstat worker thread, when entering
> > > > nohz full, does not cover all scenarios. For example, it is possible
> > > > for the following to happen:
> > > >
> > > > 1) enter nohz_full, which calls refresh_cpu_vm_stats, syncing the stats.
> > > > 2) app runs mlock, which increases counters for mlock'ed pages.
> > > > 3) start -RT loop
> > > >
> > > > Since refresh_cpu_vm_stats from nohz_full logic can happen _before_
> > > > the mlock, vmstat shepherd can restart vmstat worker thread on
> > > > the CPU in question.
> > >
> > > Can we enter nohz_full after the app runs mlock?
> >
> > Hum, i don't think its a good idea to use that route, because
> > entering or exiting nohz_full depends on a number of variable
> > outside of one's control (and additional variables might be
> > added in the future).
> 
> Then I do not see any need for this patch. Because after a certain time
> of inactivity (after the mlock) the system will enter nohz_full again.
> If userspace has no direct control over nohz_full and can only wait then
> it just has to do so.

Sorry, fail to see what you mean.

The problem (well its not a bug per se, but basically the current
disablement of vmstat_worker thread is not aggressive enough).

>From the initial message:

1) enter nohz_full, which calls refresh_cpu_vm_stats, syncing the stats.
2) app runs mlock, which increases counters for mlock'ed pages.
3) start -RT loop

Note that any activity that triggers stat counter changes (other than
mlock, it just happens that it was mlock in the test application i was 
using, just replace with any other system call that triggers writes
to per-CPU vmstat counters), will cause this.

You said:

"Because after a certain time of inactivity (after the mlock) the 
system will enter nohz_full again."

Yes, but we can't tolerate any activity from vmstat worker thread
on this particular CPU.

Do you want the app to wait for an event saying: "vmstat_worker is now
disabled, as long as you don't dirty vmstat counters, vmstat_shepherd
won't wake it up".

Rather than that, what this patch does is to sync the vmstat counters on 
return to userspace, so that:

"We synced per-CPU vmstat counters to global counters, and disable
local-CPU vmstat worker (on return to userspace). As long as you 
don't dirty vmstat counters, vmstat_shepherd won't wake it up".

Makes sense?

> > So preparing the system to function
> > while entering nohz_full at any location seems the sane thing to do.
> >
> > And that would be at return to userspace (since, if mlocked, after
> > that point there will be no more changes to propagate to vmstat
> > counters).
> >
> > Or am i missing something else you can think of ?
> 
> I assumed that the "enter nohz full" was an action by the user
> space app because I saw some earlier patches to introduce such
> functionality in the past.

No, it meant "enter nohz full" (in the current Linux codebase, for
existing applications).