Date:	Mon, 15 Sep 2014 09:46:47 +0530
From:	Preeti U Murthy <preeti@...ux.vnet.ibm.com>
To:	Vincent Guittot <vincent.guittot@...aro.org>,
	"peterz@...radead.org" <peterz@...radead.org>
CC:	Ingo Molnar <mingo@...nel.org>, Rik van Riel <riel@...hat.com>,
	Morten Rasmussen <Morten.Rasmussen@....com>,
	LKML <linux-kernel@...r.kernel.org>, Mike Galbraith <efault@....de>,
	Nicolas Pitre <nicolas.pitre@...aro.org>,
	"daniel.lezcano@...aro.org" <daniel.lezcano@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Kamalesh Babulal <kamalesh@...ux.vnet.ibm.com>,
	Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
Subject: Re: [QUERY] Confusing usage of rq->nr_running in load balancing

Hi Peter, Vincent,

On 09/03/2014 10:28 PM, Vincent Guittot wrote:
> On 3 September 2014 14:21, Preeti U Murthy <preeti@...ux.vnet.ibm.com> wrote:
>> Hi,
>
> Hi Preeti,
>
>>
>> There are places in kernel/sched/fair.c in the load balancing part where
>> rq->nr_running is used as against cfs_rq->nr_running. At least I could
>> not make out why the former was used in the following scenarios.
>> It looks to me that it can very well lead to incorrect load balancing.
>> Also I did not pay attention to the numa balancing part of the code
>> while skimming through this file to catch this scenario. There are a
>> couple of places there too which need to be scrutinized.
>>
>> 1. load_balance(): The check (busiest->nr_running > 1)
>> The load balancing would be futile if there are tasks of other
>> scheduling classes, wouldn't it?
>
> agree with you
>
>>
>> 2. active_load_balance_cpu_stop(): A similar check and a similar
>> consequence as 1 here.
>
> agree with you
>
>>
>> 3. nohz_kick_needed(): We check for more than one task on the runqueue
>> and hence trigger load balancing even if there are rt-tasks.
>
> I can see one potential reason why rq->nr_running is interesting: the
> group capacity might have changed because of non-cfs tasks since the
> last load balance. So we need to monitor the change of the groups'
> capacity to ensure that the average load of each group is still at the
> same level.

I tried a patch which changes nr_running to cfs.h_nr_running in the
above three scenarios and found that the performance of the workload
*drops significantly*. The workload I ran was ebizzy, with a few threads
running at rt priority and a few running at normal priority in parallel.
This was tried on a 16-core SMT-8 machine. The drop in performance was
around 18% with the patch, across different numbers of threads.

I figured that this was because, if we consider only cfs.h_nr_running in
the above cases, we reduce load balancing attempts even when the
capacity of the cpus to run fair tasks is significantly reduced. For
example, if a cpu is running two rt tasks and one fair task, we skip
load balancing altogether with the patch. Besides this, we may end up
doing active load balancing too often.

So I think we are good with nr_running, although that may mean
unnecessary load balancing attempts when only rt tasks are running on
the cpus. But evaluating nr_running in the above three scenarios is
better when there is a mix of rt and fair tasks, since it lets us check
whether the cpus have enough capacity to handle the one fair task they
may be running (if there are more fair tasks, we load balance anyway).
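To make the two-rt-plus-one-fair example above concrete, here is a quick
toy model of the two checks. This is plain userspace C, not the actual
fair.c code; the struct and function names are made up for illustration.

/*
 * Toy model (not kernel code) of the check in load_balance() /
 * nohz_kick_needed(), before and after the experimental change.
 */
#include <stdio.h>
#include <stdbool.h>

struct toy_rq {
	unsigned int nr_running;	/* tasks of all classes: rt + fair */
	unsigned int h_nr_running;	/* fair tasks only, like cfs.h_nr_running */
};

/* current behaviour: any runnable task counts */
static bool should_balance_all(const struct toy_rq *rq)
{
	return rq->nr_running > 1;
}

/* the experimental patch: count only fair tasks */
static bool should_balance_cfs(const struct toy_rq *rq)
{
	return rq->h_nr_running > 1;
}

int main(void)
{
	/* the regressing case: two rt tasks plus one fair task on the cpu */
	struct toy_rq rq = { .nr_running = 3, .h_nr_running = 1 };

	printf("rq->nr_running > 1    -> %s\n",
	       should_balance_all(&rq) ? "balance" : "skip");
	printf("cfs.h_nr_running > 1  -> %s\n",
	       should_balance_cfs(&rq) ? "balance" : "skip");
	return 0;
}

With the patched check this runqueue is skipped even though its one fair
task gets only a fraction of the cpu, which is the loss of capacity that
showed up as the ~18% ebizzy drop described above.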
As for the usage of nr_running in find_busiest_queue(), we are good
there, as Vincent pointed out below.

"
>> 8. find_busiest_queue(): This anomaly shows up when we filter against
>> rq->nr_running == 1 and imbalance cannot be taken care of by the
>> existing task on this rq.
>
> agree with you even if the test with wl should prevent a wrong decision,
> as wl will be null if no cfs tasks are present
"

So the only changes we require around this are the change of nr_running
to cfs.h_nr_running in update_sg_lb_stats() and cpu_avg_load_per_task(),
which is already being done by Vincent in the consolidation of
cpu_capacity patches, and I did not see regressions there during my
tests.

Thanks

Regards
Preeti U Murthy
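P.S. For completeness, a similar toy model (again not the actual kernel
code; the fields and helpers below are invented for illustration) of why
the per-task load used by update_sg_lb_stats() / cpu_avg_load_per_task()
should be divided by cfs.h_nr_running rather than rq->nr_running:
dividing the cfs load by all runnable tasks understates the load per
fair task whenever rt tasks are also on the runqueue.

#include <stdio.h>

struct toy_rq {
	unsigned int nr_running;	/* rt + fair */
	unsigned int h_nr_running;	/* fair only */
	unsigned long cfs_load;		/* runnable load of the fair tasks */
};

static unsigned long avg_load_per_task_all(const struct toy_rq *rq)
{
	return rq->nr_running ? rq->cfs_load / rq->nr_running : 0;
}

static unsigned long avg_load_per_task_fair(const struct toy_rq *rq)
{
	return rq->h_nr_running ? rq->cfs_load / rq->h_nr_running : 0;
}

int main(void)
{
	/* two rt tasks and one fair task contributing a load of 1024 */
	struct toy_rq rq = { .nr_running = 3, .h_nr_running = 1,
			     .cfs_load = 1024 };

	printf("divide by rq->nr_running:   %lu\n", avg_load_per_task_all(&rq));
	printf("divide by cfs.h_nr_running: %lu\n", avg_load_per_task_fair(&rq));
	return 0;
}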