Message-Id: <cover.1648228023.git.tim.c.chen@linux.intel.com>
Date:   Fri, 25 Mar 2022 15:54:15 -0700
From:   Tim Chen <tim.c.chen@...ux.intel.com>
To:     Peter Zijlstra <peterz@...radead.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Ingo Molnar <mingo@...e.hu>, Juri Lelli <juri.lelli@...hat.com>
Cc:     Tim Chen <tim.c.chen@...ux.intel.com>,
        Yu Chen <yu.c.chen@...el.com>,
        Walter Mack <walter.mack@...el.com>,
        Mel Gorman <mgorman@...e.de>, linux-kernel@...r.kernel.org
Subject: [PATCH 0/2] sched/fair: Fix starvation caused by task migration 

Walter Mack noticed during stress testing on a 2-socket Sapphire Rapids
system that there were anomalies where tasks were starved for more
than 70 seconds before getting scheduled.

The stress test scenario is an extreme case where about 50 threads
per CPU are started, and each thread then hops continuously from
one core to another.

We discussed this issue with Peter Z., who narrowed it down
to a problem with the vruntime setting of a migrated task
being too far out of sync with the tasks on the target run queue.
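
To illustrate what "out of sync" means here, the following is a toy
user-space model, not the kernel code and not the patches in this series.
It assumes a task's vruntime is only meaningful relative to the
min_vruntime of the run queue it sits on.  The names toy_rq,
toy_vruntime_gap and toy_rebase_vruntime are hypothetical.

```c
#include <stdint.h>

/* Toy run queue: tracks only the smallest vruntime among its tasks. */
struct toy_rq {
	uint64_t min_vruntime;
};

/*
 * Signed distance between a task's vruntime and the queue's minimum.
 * A large positive gap means every other task on the queue is picked
 * first for a long time, which is how starvation can show up.
 */
static int64_t toy_vruntime_gap(uint64_t vruntime, const struct toy_rq *rq)
{
	return (int64_t)(vruntime - rq->min_vruntime);
}

/*
 * Rebase a vruntime from the source queue onto the destination queue,
 * keeping the task's offset from min_vruntime unchanged.
 */
static uint64_t toy_rebase_vruntime(uint64_t vruntime,
				    const struct toy_rq *src,
				    const struct toy_rq *dst)
{
	return vruntime - src->min_vruntime + dst->min_vruntime;
}
```

In this model, a task with vruntime 5000200 leaving a queue whose
min_vruntime is 5000000 and landing unadjusted on a queue whose
min_vruntime is 1000 has a gap of 4999200, while the rebased value
keeps the original gap of 200.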

Peter suggested the following two patches, which fixed
the starvation anomalies that Walter saw.

Yu Chen also ran the patches through our 0-day test infrastructure to
check for regressions.  The notable performance changes are below:

   5.15     Throughput   5.15+patchset  Test
            change
4634070      -7.5%         4285823      stress-ng.sigsuspend.ops_per_sec
  29934     +37.0%           41006      aim7.jobs-per-min

Stress-ng sigsuspend is the worst affected, with a 7.5% drop in
throughput.  Most workloads are not negatively impacted.  In fact,
we saw a 37% improvement in aim7 due to these patches.
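
For reference, the percentages in the table follow from the raw
numbers as a relative change against the 5.15 baseline.  A minimal
sketch, where pct_change is a hypothetical helper and not part of the
0-day tooling:

```c
/* Relative change of the patched result against the baseline, in percent. */
static double pct_change(double base, double patched)
{
	return (patched - base) / base * 100.0;
}
```

pct_change(4634070, 4285823) comes out at about -7.5 and
pct_change(29934, 41006) at about +37.0, matching the table.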

Tim

Peter Zijlstra (1):
  sched/fair: Don't rely on ->exec_start for migration

Peter Zijlstra (Intel) (1):
  sched/fair: Simple runqueue order on migrate

 include/linux/sched.h   |  1 +
 kernel/sched/fair.c     | 37 +++++++++++++++++++++++++++++++++----
 kernel/sched/features.h |  2 ++
 3 files changed, 36 insertions(+), 4 deletions(-)

-- 
2.32.0
