lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250108172948.452158-1-richard120310@gmail.com>
Date: Thu,  9 Jan 2025 01:29:47 +0800
From: I Hsin Cheng <richard120310@...il.com>
To: mingo@...hat.com
Cc: peterz@...radead.org,
	juri.lelli@...hat.com,
	vincent.guittot@...aro.org,
	dietmar.eggemann@....com,
	rostedt@...dmis.org,
	bsegall@...gle.com,
	mgorman@...e.de,
	vschneid@...hat.com,
	linux-kernel@...r.kernel.org,
	I Hsin Cheng <richard120310@...il.com>
Subject: [RFC PATCH] sched/fair: Refactor can_migrate_task() to elimate looping

The function "can_migrate_task()" utilize "for_each_cpu_and" with a
"if" statement inside to find the destination cpu. It's the same logic
to find the first set bit of the result of the bitwise-AND of
"env->dst_grpmask", "env->cpus" and "p->cpus_ptr".

Refactor it by directly perform bitwise-AND for "env->dst_grpmask",
"env->cpus" and "p->cpus_ptr" and use "cpumask_first()" to select the
destination cpu, so we can elimate the need of looping and multiple
times of branch.

After the refactoring this part of the code can speed up from ~130ns to
~93ns, according to the test below.

Ran the test for 5 times and the result is showned in the following
table, and the test script is paste in next section.

-------------------------------------------------------
|Old method|  123|  135|  126|  129|  135|  avg ~130ns|
-------------------------------------------------------
|New method|   87|   97|   90|   92|   98|  avg  ~93ns|
-------------------------------------------------------

Signed-off-by: I Hsin Cheng <richard120310@...il.com>
---
Test is done on Linux 6.8.0-48-generic x86_64 with
Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz

Test is executed in the form of kernel module.

Test script:

int init_module(void)
{
    struct cpumask cur_mask, custom_mask, result_mask;
    struct task_struct *p = current;
    int cpu, cpu1 = nr_cpu_ids, cpu2 = nr_cpu_ids;
    unsigned tmp = 0;

    cpumask_copy(&cur_mask, cpu_online_mask);
    /* Self-implemented function, didn't paste here because the length */
    generate_random_cpumask(&custom_mask);

    ktime_t start_1 = ktime_get();
    for_each_cpu_and(cpu, &cur_mask, &custom_mask) {
        if (cpumask_test_cpu(cpu, p->cpus_ptr)) {
            /* imitate load balance operation */
            tmp |= 0x01010101;
            cpu1 = cpu;
            break;
        }
    }
    ktime_t end_1 = ktime_get();

    ktime_t start_2 = ktime_get();
    cpumask_and(&result_mask, &cur_mask, &custom_mask);
    cpumask_and(&result_mask, &result_mask, p->cpus_ptr);

    cpu = cpumask_first(&result_mask);

    if (cpu < nr_cpu_ids) {
        /* imitate load balance operation */
        tmp |= 0x01010101;
        cpu2 = cpu;
    }
    ktime_t end_2 = ktime_get();

    if (cpu1 != cpu2) {
        pr_err("Failed Assertion, cpu1 = %d, cpu2 = %d\n", cpu1, cpu2);
        return 0;
    }

    pr_info("Old method spend time : %lld\n", ktime_to_ns(end_1 - start_1));
    pr_info("New method spend time : %lld\n", ktime_to_ns(end_2 - start_2));

    return 0;
}
---
 kernel/sched/fair.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2d16c8545..ce46f61da 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9404,12 +9404,16 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 			return 0;
 
 		/* Prevent to re-select dst_cpu via env's CPUs: */
-		for_each_cpu_and(cpu, env->dst_grpmask, env->cpus) {
-			if (cpumask_test_cpu(cpu, p->cpus_ptr)) {
-				env->flags |= LBF_DST_PINNED;
-				env->new_dst_cpu = cpu;
-				break;
-			}
+		struct cpumask dst_mask;
+
+		cpumask_and(&dst_mask, env->dst_grpmask, env->cpus);
+		cpumask_and(&dst_mask, &dst_mask, p->cpus_ptr);
+
+		cpu = cpumask_first(&dst_mask);
+
+		if (cpu < nr_cpu_ids) {
+			env->flags |= LBF_DST_PINNED;
+			env->new_dst_cpu = cpu;
 		}
 
 		return 0;
-- 
2.43.0


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ