Message-ID: <a8e3dfbd-0efa-4b4e-bc18-d00abaa79f14@redhat.com>
Date: Sat, 9 Nov 2024 13:18:17 -0500
From: Waiman Long <llong@...hat.com>
To: Juri Lelli <juri.lelli@...hat.com>,
Joel Fernandes <joel@...lfernandes.org>
Cc: linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Suleiman Souhlal <suleiman@...gle.com>, Aashish Sharma <shraash@...gle.com>,
Shin Kawamura <kawasin@...gle.com>,
Vineeth Remanan Pillai <vineeth@...byteword.org>
Subject: Re: [PATCH] dl_server: Reset DL server params when rd changes
On 11/8/24 10:30 PM, Waiman Long wrote:
> I have the patchset to enforce that rebuild_sched_domains_locked()
> will only be called at most once per cpuset operation.
>
> By adding some debug code to further study the null total_bw issue
> when cpuset.cpus.partition is being changed, I found that eliminating
> the redundant rebuild_sched_domains_locked() calls reduced the chance
> of hitting null total_bw, but did not eliminate it. Running my cpuset
> test script, I hit 250 cases of null total_bw with the v6.12-rc6
> kernel. With my new cpuset patch applied, that drops to 120 cases.
>
> I will try to look further for the exact condition that triggers null
> total_bw generation.
After further testing, the 120 cases of null total_bw can be classified
into the following 3 categories.
1) 51 cases where an isolated partition with isolated CPUs is created.
Isolated CPUs are not subject to scheduling in a sched domain, so a
total_bw of 0 is fine and not really a problem.
2) 67 cases where nested partitions (A2 inside A1) are being removed.
This is probably caused by some kind of race condition. If I insert an
artificial delay between the removal of A2 and A1, total_bw is fine.
If there is no delay, I can see a null total_bw (see the sketch after
the example script below). That shouldn't really be a problem in
practice, though we may still need to figure out why it happens.
3) 2 cases where null total_bw is seen when a new partition is created
by moving all the CPUs of the parent cgroup into the new child
partition, leaving the parent as a null partition with no CPU. The
following example illustrates the steps.
#!/bin/bash
CGRP=/sys/fs/cgroup
cd $CGRP
echo +cpuset > cgroup.subtree_control
mkdir A1
cd A1
echo 0-1 > cpuset.cpus
echo root > cpuset.cpus.partition
echo "A1 partition"
echo +cpuset > cgroup.subtree_control
mkdir A2
cd A2
echo 0-1 > cpuset.cpus
echo root > cpuset.cpus.partition
echo "A2 partition"
cd ..
echo "Remove A2"
rmdir A2
cd ..
echo "Remove A1"
rmdir A1
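As a rough way to narrow down where total_bw goes null, the root
domain dl_bw values can be dumped after each step. The sketch below is
only an illustration with a hypothetical check_bw() helper: it assumes
a kernel built with CONFIG_SCHED_DEBUG and debugfs mounted, where
print_dl_rq() prints dl_bw->total_bw in the sched debug file (older
kernels expose the same data in /proc/sched_debug).

#!/bin/bash
# Sketch: print dl_bw->total_bw as seen from CPUs 0-1.
# Assumes CONFIG_SCHED_DEBUG and debugfs at /sys/kernel/debug.
check_bw() {
	echo "== $1 =="
	grep -A 3 '^dl_rq\[[01]\]' /sys/kernel/debug/sched/debug |
		grep 'dl_bw->total_bw'
}
check_bw "baseline"

Calling check_bw after each partition change in the script above shows
which transition zeroes total_bw. For the nested removal case in 2),
adding a "sleep 1" between "rmdir A2" and "rmdir A1" is enough to make
the null total_bw go away here.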
In this corner case, there is actually no change in the set of sched
domains: the sched domain covering CPUs 0-1 simply moves from
partition A1 to A2 when A2 is created, and back again when A2 is
removed. This is similar to calling rebuild_sched_domains_locked()
twice with the same input. I believe that is the condition that causes
the null total_bw.
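One way to check whether that same-input rebuild is really what
precedes the null total_bw is to function-trace the rebuild path while
the script runs. This is only a sketch, assuming tracefs is mounted at
/sys/kernel/tracing and that these symbols are not inlined (they have
to show up in available_filter_functions):

#!/bin/bash
# Sketch: trace sched domain rebuilds and DL bandwidth clearing.
cd /sys/kernel/tracing
echo partition_sched_domains_locked dl_clear_root_domain \
	dl_add_task_root_domain > set_ftrace_filter
echo function > current_tracer
echo 1 > tracing_on
# ... run the cpuset test script here ...
echo 0 > tracing_on
cat trace

If the trace shows dl_clear_root_domain() being called a second time
without the corresponding bandwidth being re-added afterwards, that
would point at the deadline side rather than at cpuset.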
Now the question is why the deadline code behaves this way. It is
probably a bug that needs to be addressed.
Cheers,
Longman