Message-ID: <20250508195952.391587-1-yury.norov@gmail.com>
Date: Thu,  8 May 2025 15:59:50 -0400
From: Yury Norov <yury.norov@...il.com>
To: Kristen Accardi <kristen.c.accardi@...el.com>,
	Vinicius Costa Gomes <vinicius.gomes@...el.com>,
	Herbert Xu <herbert@...dor.apana.org.au>,
	"David S. Miller" <davem@...emloft.net>,
	linux-crypto@...r.kernel.org,
	linux-kernel@...r.kernel.org
Cc: Yury Norov <yury.norov@...il.com>
Subject: [PATCH] crypto: iaa - Optimize rebalance_wq_table()

The function opencodes for_each_cpu() by using a plain for-loop. The
loop calls cpumask_weight() in its condition, so the weight is
recomputed on every iteration. Because cpumask_weight() is O(N) in the
size of the mask, the overall complexity of the function is
O(node * node_cpus^2). Also, cpumask_nth() internally calls hweight(),
which, if not hardware-accelerated, is slower than the cpumask_next()
used by for_each_cpu().

Switching to the dedicated for_each_cpu() lets rebalance_wq_table()
drop the cpumask_weight() call, together with some housekeeping code.
This makes the overall complexity O(node * node_cpus), or, simply put,
O(nr_cpu_ids).

While there, fix the opencoded for_each_possible_cpu() too.

Signed-off-by: Yury Norov <yury.norov@...il.com>
---
 drivers/crypto/intel/iaa/iaa_crypto_main.c | 35 +++++++++-------------
 1 file changed, 14 insertions(+), 21 deletions(-)

diff --git a/drivers/crypto/intel/iaa/iaa_crypto_main.c b/drivers/crypto/intel/iaa/iaa_crypto_main.c
index 09d9589f2d68..0c5ff1c6e335 100644
--- a/drivers/crypto/intel/iaa/iaa_crypto_main.c
+++ b/drivers/crypto/intel/iaa/iaa_crypto_main.c
@@ -894,7 +894,7 @@ static int wq_table_add_wqs(int iaa, int cpu)
 static void rebalance_wq_table(void)
 {
 	const struct cpumask *node_cpus;
-	int node, cpu, iaa = -1;
+	int node_cpu, node, cpu, iaa = 0;
 
 	if (nr_iaa == 0)
 		return;
@@ -905,36 +905,29 @@ static void rebalance_wq_table(void)
 	clear_wq_table();
 
 	if (nr_iaa == 1) {
-		for (cpu = 0; cpu < nr_cpus; cpu++) {
-			if (WARN_ON(wq_table_add_wqs(0, cpu))) {
-				pr_debug("could not add any wqs for iaa 0 to cpu %d!\n", cpu);
-				return;
-			}
+		for_each_possible_cpu(cpu) {
+			if (WARN_ON(wq_table_add_wqs(0, cpu)))
+				goto err;
 		}
 
 		return;
 	}
 
 	for_each_node_with_cpus(node) {
+		cpu = 0;
 		node_cpus = cpumask_of_node(node);
 
-		for (cpu = 0; cpu <  cpumask_weight(node_cpus); cpu++) {
-			int node_cpu = cpumask_nth(cpu, node_cpus);
-
-			if (WARN_ON(node_cpu >= nr_cpu_ids)) {
-				pr_debug("node_cpu %d doesn't exist!\n", node_cpu);
-				return;
-			}
-
-			if ((cpu % cpus_per_iaa) == 0)
-				iaa++;
-
-			if (WARN_ON(wq_table_add_wqs(iaa, node_cpu))) {
-				pr_debug("could not add any wqs for iaa %d to cpu %d!\n", iaa, cpu);
-				return;
-			}
+		for_each_cpu(node_cpu, node_cpus) {
+			iaa = cpu / cpus_per_iaa;
+			if (WARN_ON(wq_table_add_wqs(iaa, node_cpu)))
+				goto err;
+			cpu++;
 		}
 	}
+
+	return;
+err:
+	pr_debug("could not add any wqs for iaa %d to cpu %d!\n", iaa, cpu);
 }
 
 static inline int check_completion(struct device *dev,
-- 
2.43.0