Message-ID: <f7046fcc-91e3-434e-930c-10259b36a90b@arm.com>
Date: Fri, 29 Nov 2024 11:12:15 +0100
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: Cristian Prundeanu <cpru@...zon.com>
Cc: abuehaze@...zon.com, alisaidi@...zon.com, benh@...nel.crashing.org,
blakgeof@...zon.com, csabac@...zon.com, doebel@...zon.com,
gautham.shenoy@....com, joseph.salisbury@...cle.com, kprateek.nayak@....com,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-tip-commits@...r.kernel.org, mingo@...hat.com, peterz@...radead.org,
x86@...nel.org
Subject: Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and
RUN_TO_PARITY and move them to sysctl
On 28/11/2024 11:32, Cristian Prundeanu wrote:
[...]
> On 2024-11-26, Dietmar Eggemann wrote:
>
>> SUT kernel arm64 (mysql-8.4.0)
>> (2) 6.12.0-rc4 -12.9%
>> (3) 6.12.0-rc4 NO_PLACE_LAG +6.4%
>> (4) v6.12-rc4 SCHED_BATCH +10.8%
>
> This is very interesting; our setups are close, yet I have not seen any
> feature or policy combination that performs above the 6.5 CFS baseline.
> I look forward to seeing your results with the repro when it's ready.
>
> Did you only use NO_PLACE_LAG or was it together with NO_RUN_TO_PARITY?

Only NO_PLACE_LAG.

> Was SCHED_BATCH used with the default feature set (all enabled)?

Yes.

> Which distro/version did you use for the SUT?

The default, Ubuntu 24.04 Arm64 server.

>> Maybe a difference in our test setup can explain the different test results:
>>
>> I use:
>>
>> HammerDB Load Generator <-> MySQL SUT
>> 192 VCPUs <-> 16 VCPUs
>>
>> Virtual users: 256
>> Warehouse count: 64
>> 3 min rampup
>> 10 min test run time
>> performance data: NOPM (New Operations Per Minute)
>>
>> So I have 256 'connection' tasks running on the 16 SUT VCPUS.
>
> My setup:
>
> SUT - 16 vCPUs, 32 GB RAM
> Loadgen - 64 vCPU, 128 GB RAM (anything large enough to not be a
> bottleneck should work)
>
> Virtual users: 4 x vCPUs = 64
> Warehouses: 24
> Rampup: 5 min
> Test runtime: 20 min x 10 times, each on 4 different SUT/Loadgen pairs
> Value recorded: geometric_mean(NOPM)

Looks like you have 4 times fewer 'connection' tasks on your 16 VCPUs
(64 vs. 256, i.e. 4 tasks per VCPU instead of 16). So much less
concurrency/preemption ...
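
To make the comparison concrete, here is a minimal sketch of the
arithmetic. It assumes one 'connection' task per HammerDB virtual user
on the SUT (true for my setup, inferred for yours); the `Setup` class
and its field names are made up for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setup:
    """Hypothetical helper describing one HammerDB test configuration."""
    name: str
    virtual_users: int  # assumed: one 'connection' task per virtual user
    sut_vcpus: int      # vCPUs available on the MySQL SUT

    def tasks_per_vcpu(self) -> float:
        # Average number of runnable 'connection' tasks competing per vCPU.
        return self.virtual_users / self.sut_vcpus


# Numbers taken from the two setups quoted in this thread.
dietmar = Setup("Dietmar", virtual_users=256, sut_vcpus=16)
cristian = Setup("Cristian", virtual_users=64, sut_vcpus=16)

for setup in (dietmar, cristian):
    print(f"{setup.name}: {setup.tasks_per_vcpu():.0f} tasks per SUT vCPU")

# How much more oversubscribed the first setup is than the second.
ratio = dietmar.tasks_per_vcpu() / cristian.tasks_per_vcpu()
print(f"oversubscription ratio: {ratio:.0f}x")
```

This only counts tasks per vCPU; it says nothing about how many of them
are runnable at any given time, which is what actually drives
preemption.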