Message-ID: <20230422060450.GA264770@ziqianlu-desk2>
Date:   Sat, 22 Apr 2023 14:04:50 +0800
From:   Aaron Lu <aaron.lu@...el.com>
To:     Chen Yu <yu.c.chen@...el.com>
CC:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>,
        "Mel Gorman" <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        Tim Chen <tim.c.chen@...el.com>,
        Nitin Tekchandani <nitin.tekchandani@...el.com>,
        Waiman Long <longman@...hat.com>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH] sched/fair: Make tg->load_avg per node

On Sat, Apr 22, 2023 at 12:01:59PM +0800, Chen Yu wrote:
> On 2023-03-27 at 13:39:55 +0800, Aaron Lu wrote:
> > 
> > The reason why only cpus of one node have bigger overhead is: task_group
> > is allocated on demand from a slab, and whichever cpu happens to do the
> > allocation, the allocated tg will be located on that node. Accessing
> > tg->load_avg will then have a lower cost for cpus on the same node and
> > a higher cost for cpus on the remote node.
> [...]
> >  static inline void update_tg_load_avg(struct cfs_rq *cfs_rq)
> >  {
> >  	long delta = cfs_rq->avg.load_avg - cfs_rq->tg_load_avg_contrib;
> > +	int node = cpu_to_node(cfs_rq->rq->cpu);
> >  
> >  	/*
> >  	 * No need to update load_avg for root_task_group as it is not used.
> > @@ -3616,7 +3617,7 @@ static inline void update_tg_load_avg(struct cfs_rq *cfs_rq)
> >  		return;
> >  
> >  	if (abs(delta) > cfs_rq->tg_load_avg_contrib / 64) {
> > -		atomic_long_add(delta, &cfs_rq->tg->load_avg);
> > +		atomic_long_add(delta, &cfs_rq->tg->node_info[node]->load_avg);
> When entering enqueue_entity(cfs_rq, se) -> update_tg_load_avg(cfs_rq),
> cfs_rq->rq->cpu is not necessarily the current cpu, so the node returned by
> cpu_to_node(cfs_rq->rq->cpu) is not necessarily the node of the current
> CPU. Would atomic_add still introduce cross-node overhead due to remote access
> to cfs_rq->tg->node_info[node]->load_avg in this case, or am I missing something?
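
For reference, the per-node layout implied by the hunk above could look
roughly like the sketch below. Apart from the tg->node_info[node]->load_avg
access that appears in the patch, the field layout and the summing helper
here are assumptions for illustration, not the actual RFC code:

	struct tg_node_info {
		/* node-local slice of the old global tg->load_avg */
		atomic_long_t load_avg;
	} ____cacheline_aligned_in_smp;

	struct task_group {
		/* ... existing fields ... */
		/* one entry per NUMA node, each allocated on that node */
		struct tg_node_info *node_info[MAX_NUMNODES];
	};

	/* readers reconstruct the global value by summing the per-node slices */
	static long tg_load_avg(struct task_group *tg)
	{
		long load_avg = 0;
		int node;

		for_each_node(node)
			load_avg += atomic_long_read(&tg->node_info[node]->load_avg);

		return load_avg;
	}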

That's a good point.

The chance of the cpu being different is high, but the chance of the node
being different is pretty low. The wakeup path will not do cross-LLC
activation with TTWU_QUEUE, but the load balance path does make it possible
to dequeue a task that is on a remote node, and I think that's the only path
where the two cpus (current vs cfs_rq->rq->cpu) can be on different nodes.

A quick test shows that during a 5s window of running hackbench on a VM,
the number of times the two nodes derived from the current cpu and from
cfs_rq->rq->cpu were different was less than 100, while the number of times
they were equal was 992548.
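
(The check is conceptually something like the following; the counter names
are hypothetical, made up here for illustration:)

	/* debug counters for the quick test */
	static atomic_long_t nr_node_same, nr_node_diff;

	/* in update_tg_load_avg(): compare the node of cfs_rq->rq->cpu
	 * with the node the updater is currently running on */
	if (cpu_to_node(cfs_rq->rq->cpu) == cpu_to_node(smp_processor_id()))
		atomic_long_inc(&nr_node_same);
	else
		atomic_long_inc(&nr_node_diff);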

I'll switch to using smp_processor_id() in the next posting since it's the
right thing to do. Thanks for spotting this!
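
That is, the hunk above would presumably become something like:

-	int node = cpu_to_node(cfs_rq->rq->cpu);
+	int node = cpu_to_node(smp_processor_id());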
