lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aO5VK4PO_REXNhnN@linux.ibm.com>
Date: Tue, 14 Oct 2025 19:20:35 +0530
From: Srikar Dronamraju <srikar@...ux.ibm.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Tim Chen <tim.c.chen@...ux.intel.com>, Ingo Molnar <mingo@...nel.org>,
        Chen Yu <yu.c.chen@...el.com>, Doug Nelson <doug.nelson@...el.com>,
        Mohini Narkhede <mohini.narkhede@...el.com>,
        linux-kernel@...r.kernel.org,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Shrikanth Hegde <sshegde@...ux.ibm.com>,
        K Prateek Nayak <kprateek.nayak@....com>
Subject: Re: [RESEND PATCH] sched/fair: Skip sched_balance_running cmpxchg
 when balance is not due

* Peter Zijlstra <peterz@...radead.org> [2025-10-14 11:24:36]:

> On Mon, Oct 13, 2025 at 02:54:19PM -0700, Tim Chen wrote:
> 
> 
> Right, Yu Chen said something like that as well, should_we_balance() is
> too late.
> 
> Should we instead move the whole serialize thing inside
> sched_balance_rq() like so:
> 
> @@ -12122,21 +12148,6 @@ static int active_load_balance_cpu_stop(void *data)
>  	return 0;
>  }
>  
> -/*
> - * This flag serializes load-balancing passes over large domains
> - * (above the NODE topology level) - only one load-balancing instance
> - * may run at a time, to reduce overhead on very large systems with
> - * lots of CPUs and large NUMA distances.
> - *
> - * - Note that load-balancing passes triggered while another one
> - *   is executing are skipped and not re-tried.
> - *
> - * - Also note that this does not serialize rebalance_domains()
> - *   execution, as non-SD_SERIALIZE domains will still be
> - *   load-balanced in parallel.
> - */
> -static atomic_t sched_balance_running = ATOMIC_INIT(0);
> -
>  /*
>   * Scale the max sched_balance_rq interval with the number of CPUs in the system.
>   * This trades load-balance latency on larger machines for less cross talk.
> @@ -12192,7 +12203,7 @@ static void sched_balance_domains(struct rq *rq, enum cpu_idle_type idle)
>  	/* Earliest time when we have to do rebalance again */
>  	unsigned long next_balance = jiffies + 60*HZ;
>  	int update_next_balance = 0;
> -	int need_serialize, need_decay = 0;
> +	int need_decay = 0;
>  	u64 max_cost = 0;
>  
>  	rcu_read_lock();
> @@ -12216,13 +12227,6 @@ static void sched_balance_domains(struct rq *rq, enum cpu_idle_type idle)
>  		}
>  
>  		interval = get_sd_balance_interval(sd, busy);
> -
> -		need_serialize = sd->flags & SD_SERIALIZE;
> -		if (need_serialize) {
> -			if (atomic_cmpxchg_acquire(&sched_balance_running, 0, 1))
> -				goto out;
> -		}
> -
>  		if (time_after_eq(jiffies, sd->last_balance + interval)) {
>  			if (sched_balance_rq(cpu, rq, sd, idle, &continue_balancing)) {
>  				/*
> @@ -12236,9 +12240,7 @@ static void sched_balance_domains(struct rq *rq, enum cpu_idle_type idle)
>  			sd->last_balance = jiffies;
>  			interval = get_sd_balance_interval(sd, busy);
>  		}
> -		if (need_serialize)
> -			atomic_set_release(&sched_balance_running, 0);
> -out:
> +
>  		if (time_after(next_balance, sd->last_balance + interval)) {
>  			next_balance = sd->last_balance + interval;
>  			update_next_balance = 1;

I think this is better since previously the one CPU which was not suppose to
do the balancing may increment the atomic variable. If the CPU, that was
suppose to do the balance now tries it may fail since the variable was not
yet decremented.

-- 
Thanks and Regards
Srikar Dronamraju

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ