Message-ID: <e2e558b863c929c5019264b2ddefd4c0@codethink.co.uk>
Date: Thu, 25 Sep 2025 15:33:54 +0200
From: Matteo Martelli <matteo.martelli@...ethink.co.uk>
To: Aaron Lu <ziqianlu@...edance.com>, K Prateek Nayak
	<kprateek.nayak@....com>
Cc: Valentin Schneider <vschneid@...hat.com>, Ben Segall <bsegall@...gle.com>,
	Peter Zijlstra <peterz@...radead.org>, Chengming Zhou
	<chengming.zhou@...ux.dev>, Josh Don <joshdon@...gle.com>, Ingo Molnar
	<mingo@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>, Xi Wang
	<xii@...gle.com>, linux-kernel@...r.kernel.org, Juri Lelli
	<juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>,
	Chuyi Zhou <zhouchuyi@...edance.com>, Jan Kiszka <jan.kiszka@...mens.com>,
	Florian Bezdeka <florian.bezdeka@...mens.com>, Songtang Liu
	<liusongtang@...edance.com>, Chen Yu <yu.c.chen@...el.com>,
	Michal Koutný <mkoutny@...e.com>,
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Matteo Martelli <matteo.martelli@...ethink.co.uk>
Subject: Re: [PATCH 1/4] sched/fair: Propagate load for throttled cfs_rq

Hi Aaron,

On Thu, 25 Sep 2025 20:05:04 +0800, Aaron Lu <ziqianlu@...edance.com> wrote:
> On Thu, Sep 25, 2025 at 04:52:25PM +0530, K Prateek Nayak wrote:
> > 
> > On 9/25/2025 2:59 PM, Aaron Lu wrote:
> > > Hi Prateek,
> > > 
> > > On Thu, Sep 25, 2025 at 01:47:35PM +0530, K Prateek Nayak wrote:
> > >> Hello Aaron, Matteo,
> > >>
> > >> On 9/24/2025 5:03 PM, Aaron Lu wrote:
> > >>>> ...
> > >>>> The test setup is the same used in my previous testing for v3 [2], where
> > >>>> the CFS throttling events are mostly triggered by the first ssh logins
> > >>>> into the system as the systemd user slice is configured with CPUQuota of
> > >>>> 25%. Also note that the same systemd user slice is configured with CPU
> > >>>
> > >>> I tried to replicate this setup, below is my setup using a 4 cpu VM
> > >>> and rt kernel at commit fe8d238e646e("sched/fair: Propagate load for
> > >>> throttled cfs_rq"):
> > >>> # pwd
> > >>> /sys/fs/cgroup/user.slice
> > >>> # cat cpu.max
> > >>> 25000 100000
> > >>> # cat cpuset.cpus
> > >>> 0
> > >>>
> > >>> I then login using ssh as a normal user and I can see throttle happened
> > >>> but couldn't hit this warning. Do you have to do something special to
> > >>> trigger it?

It wasn't very reproducible in my setup either, but I found that the
warning was triggered more often when I ssh'd into the system just
after boot, probably due to additional load from processes spawned
during the boot phase. I therefore prepared a reproducer script that
resembles my initial setup and additionally runs a stress-ng worker in
the background while connecting to the system over ssh. I also reduced
the CPUQuota to 10%, which seemed to increase the probability of
triggering the warning. With this script I can reproduce the warning
about once or twice every 10 ssh executions. See the script at the end
of this email.
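
When replicating this, it may help to first confirm that the user slice
is actually being throttled before looking for the warning. A minimal
sketch, assuming cgroup v2 is mounted at /sys/fs/cgroup and that the
slice's cpu.stat exposes the usual nr_throttled counter; the helper name
nr_throttled is made up for illustration:

```shell
# Hypothetical helper: print the nr_throttled counter from a cgroup v2
# cpu.stat file. The default path is an assumption (cgroup v2 mounted at
# /sys/fs/cgroup, throttling configured on user.slice).
nr_throttled() {
    local stat_file=${1:-/sys/fs/cgroup/user.slice/cpu.stat}
    # cpu.stat holds "key value" pairs, one per line.
    awk '$1 == "nr_throttled" { print $2 }' "${stat_file}"
}
```

Calling it in the guest before and after an ssh login and comparing the
two values should show whether the login caused any throttling.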

> > >>>> [   18.421350] WARNING: CPU: 0 PID: 1 at kernel/sched/fair.c:400 enqueue_task_fair+0x925/0x980
> > >>>
> > >>> I stared at the code and haven't been able to figure out when
> > >>> enqueue_task_fair() would end up with a broken leaf cfs_rq list.
> > >>>

> > >>
> > >> Yeah neither could I. I tried running with PREEMPT_RT too and still
> > >> couldn't trigger it :(
> > >>
> > >> But I'm wondering if all we are missing is:
> > >>
> > >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > >> index f993de30e146..5f9e7b4df391 100644
> > >> --- a/kernel/sched/fair.c
> > >> +++ b/kernel/sched/fair.c
> > >> @@ -6435,6 +6435,7 @@ static void sync_throttle(struct task_group *tg, int cpu)
> > >>  
> > >>  	cfs_rq->throttle_count = pcfs_rq->throttle_count;
> > >>  	cfs_rq->throttled_clock_pelt = rq_clock_pelt(cpu_rq(cpu));
> > >> +	cfs_rq->pelt_clock_throttled = pcfs_rq->pelt_clock_throttled;
> > >>  }
> > >>  
> > >>  /* conditionally throttle active cfs_rq's from put_prev_entity() */
> > >> ---
> > >>
> > >> This is the only way we can currently have a break in
> > >> cfs_rq_pelt_clock_throttled() hierarchy.
> > >>
> ...
> 
> Hi Matteo,
> 
> Can you test the above diff Prateek sent in his last email? Thanks.
> 

Using the script below, I have just tested the diff sent by Prateek in
[1] (also quoted above), which changes sync_throttle(), and I couldn't
reproduce the warning.

Here's the script (I hope it doesn't add too much noise to the email
thread).

---
#!/usr/bin/bash
# Copyright (C) 2024 Codethink Limited
# SPDX-License-Identifier: GPL-2.0-only

script_path=$(realpath "$0")
wrk_dir=$(pwd)
kernel_version=fe8d238e646e
kernel_dir=${wrk_dir}/../linux
kernel_image=${kernel_dir}/arch/x86_64/boot/bzImage
qemu_image_url=https://cdimage.debian.org/images/cloud/sid/daily/20250919-2240/debian-sid-nocloud-amd64-daily-20250919-2240.qcow2
qemu_image_src=${wrk_dir}/$(basename "${qemu_image_url}")
qemu_image=${wrk_dir}/image.qcow2
guest_pkgs="stress-ng" # comma separated list of additional packages to install

run_qemu() {
    qemu-system-x86_64 \
        -m 2G -smp 4 \
        -nographic \
        -nic user,hostfwd=tcp::2222-:22 \
        -M q35,accel=kvm \
        -drive format=qcow2,file="${qemu_image}" \
        -virtfs local,path=.,mount_tag=shared,security_model=mapped-xattr \
        -serial file:console.log \
        -monitor none \
        -append "root=/dev/sda1 console=ttyS0,115200 sysctl.kernel.panic_on_oops=1" \
        -kernel "${kernel_image}"
}

run_ssh() {
    local cmd=$1
    ssh root@...alhost -p 2222 \
        -i ${wrk_dir}/id_ed25519 -F none \
        -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
        $cmd
}

setup_kernel() {
    echo "Building kernel ..."
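    # Bail out if the kernel tree is missing, so that the reset/clean
    # below never runs in the wrong directory.
    pushd "${kernel_dir}" || exit 1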
    git reset --hard $kernel_version && git clean -f -d
    make mrproper
    make defconfig
    scripts/config -k -e EXPERT
    scripts/config -k -e PREEMPT_RT
    scripts/config -k -e CFS_BANDWIDTH
    scripts/config -k -e FUNCTION_TRACER
    make olddefconfig
    make -j8
    popd
}

setup_image() {
    echo "Preparing qemu image ..."
    killall -9 qemu-system-x86_64

    if [ ! -f "${qemu_image_src}" ] ; then
        wget ${qemu_image_url} -O ${qemu_image_src}
    fi

    cp ${qemu_image_src} ${qemu_image}

    echo "[Unit]
Description=User and Session Slice
Before=slices.target

[Slice]
CPUQuota=10%
AllowedCPUs=0
    " > user.slice

    yes | ssh-keygen -t ed25519 -f id_ed25519 -N ''

    # https://wiki.debian.org/ThomasChung/CloudImage
    virt-customize -a ${qemu_image} \
        --install ssh,${guest_pkgs} \
        --append-line "/root/.ssh/authorized_keys:$(cat id_ed25519.pub)" \
        --upload "user.slice:/etc/systemd/system/user.slice" \
        --chown "0:0:/etc/systemd/system/user.slice"
}

setup_kernel
setup_image

echo "Run test..."
ts=$(date --utc "+%FT%TZ")
out_dir=${wrk_dir}/out/${ts}
out_log_dir=${out_dir}/logs
mkdir -p ${out_dir} ${out_log_dir}
cp ${script_path} ${out_dir}
cp ${kernel_dir}/.config ${out_dir}/kernel.config
trap "exit" INT TERM
trap "kill 0" EXIT
export -f run_ssh
export wrk_dir

while true; do
        run_qemu &
        qemu_pid=$!
        sleep 10
        run_ssh 'stress-ng --cpu 1 --timeout 60' &
        for i in $(seq 1 10) ; do
            echo "running ssh $i"
            timeout 3 bash -c run_ssh #launch interactive ssh
        done
        run_ssh "systemctl poweroff"
        wait $qemu_pid

        serial_out_file=${out_log_dir}/serial-$(date "+%F-%T").log
        mv ${wrk_dir}/console.log ${serial_out_file}
        grep -e "Kernel panic" -e "Call Trace" -e "Kernel BUG" -e "cut here" ${serial_out_file} && \
            echo "${serial_out_file}" >> ${out_dir}/panics-warns
        grep -e "rcu.*detected.*stall" ${serial_out_file} && \
            echo "${serial_out_file}" >> ${out_dir}/rcu-stalls
done
---
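
To get a rough reproduction rate out of the collected logs, something
like the following could be run on the host afterwards. It is only a
sketch: it assumes the out/<timestamp>/logs/serial-*.log layout created
by the script above, the helper name count_warn_logs is made up, and it
counts boots (each covering 10 ssh attempts) rather than individual ssh
executions.

```shell
# Hypothetical helper: report how many serial logs in a directory contain
# a kernel warning marker, printed as "<logs with warning>/<total logs>".
count_warn_logs() {
    local log_dir=$1
    local total=0 hits=0 f
    for f in "${log_dir}"/serial-*.log; do
        # Skip the unexpanded glob pattern when no log matches.
        [ -f "$f" ] || continue
        total=$((total + 1))
        if grep -q -e "cut here" -e "Call Trace" "$f"; then
            hits=$((hits + 1))
        fi
    done
    echo "${hits}/${total}"
}
```

For example, count_warn_logs out/<timestamp>/logs prints the number of
boots that logged a warning over the number of boots recorded.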

[1]: https://lore.kernel.org/all/db7fc090-5c12-450b-87a4-bcf06e10ef68@amd.com/

Best regards,
Matteo Martelli


