linux-kernel - Re: [PATCH v1 0/3] Avoid scheduling cache draining to isolated cpus

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <958969c204e1041dead005d1c801cf3c54ab86f1.camel@redhat.com>
Date:   Wed, 25 Jan 2023 04:44:12 -0300
From:   Leonardo Brás <leobras@...hat.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Shakeel Butt <shakeelb@...gle.com>,
        Muchun Song <songmuchun@...edance.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Frederic Weisbecker <frederic@...nel.org>,
        Phil Auld <pauld@...hat.com>,
        Marcelo Tosatti <mtosatti@...hat.com>,
        linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
        linux-mm@...ck.org
Subject: Re: [PATCH v1 0/3] Avoid scheduling cache draining to isolated cpus

On Wed, 2022-11-09 at 09:05 +0100, Michal Hocko wrote:
> On Tue 08-11-22 20:09:25, Leonardo Brás wrote:
> [...]
> > > Yes, with a notable difference that with your spin lock option there is
> > > still a chance that the remote draining could influence the isolated CPU
> > > workload throug that said spinlock. If there is no pcp cache for that
> > > cpu being used then there is no potential interaction at all.
> > 
> > I see. 
> > But the slow path is slow for some reason, right?
> > Does not it make use of any locks also? So on normal operation there could be a
> > potentially larger impact than a spinlock, even though there would be no
> > scheduled draining.
> 
> Well, for the regular (try_charge) path that is essentially page_counter_try_charge
> which boils down to atomic_long_add_return of the memcg counter + all
> parents up the hierarchy and high memory limit evaluation (essentially 2
> atomic_reads for the memcg + all parents up the hierchy). That is not
> whole of a lot - especially when the memcg hierarchy is not very deep.
> 
> Per cpu batch amortizes those per hierarchy updates as well as atomic
> operations + cache lines bouncing on updates.
> 
> On the other hand spinlock would do the unconditional atomic updates as
> well and even much more on CONFIG_RT. A plus is that the update will be
> mostly local so cache line bouncing shouldn't be terrible. Unless
> somebody heavily triggers pcp cache draining but this shouldn't be all
> that common (e.g. when a memcg triggers its limit.
> 
> All that being said, I am still not convinced that the pcp cache bypass
> for isolated CPUs would make a dramatic difference. Especially in the
> context of workloads that tend to run on isolated CPUs and rarely enter
> kernel.
>  
> > > It is true true that appart from user
> > > space memory which can be under full control of the userspace there are
> > > kernel allocations which can be done on behalf of the process and those
> > > could be charged to memcg as well. So I can imagine the pcp cache could
> > > be populated even if the process is not faulting anything in during RT
> > > sensitive phase.
> > 
> > Humm, I think I will apply the change and do a comparative testing with
> > upstream. This should bring good comparison results.
> 
> That would be certainly appreciated!
>  (
> > > > On the other hand, compared to how it works now now, this should be a more
> > > > controllable way of introducing latency than a scheduled cache drain.
> > > > 
> > > > Your suggestion on no-stocks/caches in isolated CPUs would be great for
> > > > predictability, but I am almost sure the cost in overall performance would not
> > > > be fine.
> > > 
> > > It is hard to estimate the overhead without measuring that. Do you think
> > > you can give it a try? If the performance is not really acceptable
> > > (which I would be really surprised) then we can think of a more complex
> > > solution.
> > 
> > Sure, I can try that.
> > Do you suggest any specific workload that happens to stress the percpu cache
> > usage, with usual drains and so? Maybe I will also try with synthetic worloads
> > also.
> 
> I really think you want to test it on the isolcpu aware workload.
> Artificial benchmark are not all that useful in this context.

Hello Michael,
I just sent a v2 for this patchset with a lot of changes.
https://lore.kernel.org/lkml/20230125073502.743446-1-leobras@redhat.com/

I have tried to gather some data on the performance numbers as suggested, but I
got carried away and the cover letter ended up too big. I hope it's not too much
trouble.

Best regards,
Leo