Message-ID: <20241120091314.GJ38972@noisy.programming.kicks-ass.net>
Date: Wed, 20 Nov 2024 10:13:14 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Chenbo Lu <chenbo.lu@...yaviation.com>
Cc: stable@...r.kernel.org, regressions@...ts.linux.dev, mingo@...hat.com,
	juri.lelli@...hat.com, linux-kernel@...r.kernel.org,
	vschneid@...hat.com
Subject: Re: Performance Degradation After Upgrading to Kernel 6.8

On Wed, Nov 20, 2024 at 10:03:54AM +0100, Peter Zijlstra wrote:
> On Tue, Nov 19, 2024 at 04:30:02PM -0800, Chenbo Lu wrote:
> > Hello,
> > 
> > I am experiencing a significant performance degradation after
> > upgrading my kernel from version 6.6 to 6.8 and would appreciate any
> > insights or suggestions.
> > 
> > I am running a high-load simulation system that spawns more than 1000
> > threads and the overall CPU usage is 30%+. Most of the threads are
> > using real-time scheduling (SCHED_RR), and the threads of a model are
> > using SCHED_DEADLINE. After upgrading the kernel, I noticed that the
> > execution time of my model has increased from 4.5ms to 6ms.
> > 
> > What I Have Done So Far:
> > 1. I found this [bug
> > report](https://bugzilla.kernel.org/show_bug.cgi?id=219366#c7) and
> > reverted the commit efa7df3e3bb5da8e6abbe37727417f32a37fba47 mentioned
> > in the post. Unfortunately, this did not resolve the issue.
> > 2. I performed a git bisect and found that after these two commits
> > related to scheduling (RT and deadline) were merged, the problem
> > happened. They are 612f769edd06a6e42f7cd72425488e68ddaeef0a,
> > 5fe7765997b139e2d922b58359dea181efe618f9
> 
> And yet you failed to Cc Valentin, the author of said commits :/
> 
> > After reverting these two commits, the model execution time improved
> > to around 5 ms.
> > 3. I reverted two more commits, and the execution time is back to 4.7ms:
> > 63ba8422f876e32ee564ea95da9a7313b13ff0a1,
> > efa7df3e3bb5da8e6abbe37727417f32a37fba47
> > 
> > My questions are:
> > 1. Has anyone else experienced similar performance degradation after
> > upgrading to kernel 6.8?
> 
> This is 4 kernel releases back; my memory isn't that long.
> 
> > 2. Can anyone explain why these two commits are causing the problem? I
> > am not very familiar with the kernel code and would appreciate any
> > insights.
> 
> There might be a race window between setting the tro and sending the
> IPI, such that previously the extra IPIs would sooner find the newly
> pushable task.
> 
> Valentin, would it make sense to set tro before enqueueing the pushable,
> instead of after it?

s/tro/rto/ clearly I'm consistently not capable of typing that :-)
