Message-ID: <5fe8b838-ffe6-8f5d-e23d-61c58bc7f6b3@amd.com>
Date: Thu, 14 Dec 2023 10:45:18 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: John Stultz <jstultz@...gle.com>
Cc: LKML <linux-kernel@...r.kernel.org>,
Joel Fernandes <joelaf@...gle.com>,
Qais Yousef <qyousef@...gle.com>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Valentin Schneider <vschneid@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>,
Zimuzo Ezeozue <zezeozue@...gle.com>,
Youssef Esmat <youssefesmat@...gle.com>,
Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Will Deacon <will@...nel.org>,
Waiman Long <longman@...hat.com>,
Boqun Feng <boqun.feng@...il.com>,
"Paul E . McKenney" <paulmck@...nel.org>, kernel-team@...roid.com,
Metin Kaya <Metin.Kaya@....com>
Subject: Re: [PATCH v6 00/20] Proxy Execution: A generalized form of Priority
Inheritance v6
Hello John,
Thank you for taking a look at the report.
On 12/14/2023 12:41 AM, John Stultz wrote:
> On Tue, Dec 12, 2023 at 10:37 PM K Prateek Nayak <kprateek.nayak@....com> wrote:
>>
>> Hello John,
>>
>> I have some data that might help you debug the potential performance
>> issue mentioned below.
>
> Hey Prateek,
> Thank you so much for taking the time to try this out and providing
> such helpful analysis!
> More below.
>
>> On 11/7/2023 1:04 AM, John Stultz wrote:
>>> [..snip..]
>>>
>>> Performance:
>>> ------------
>>> This patch series switches mutexes to use handoff mode rather
>>> than optimistic spinning. This is a potential concern where locks
>>> are under high contention. However, earlier performance analysis
>>> (on both x86 and mobile devices) did not see major regressions.
>>> That said, Chenyu did report a regression[3], which I’ll need to
>>> look further into.
>>
>> I too see this as the most notable regression. Some of the other
>> benchmarks I've tested (schbench, tbench, netperf, ycsb-mongodb,
>> DeathStarBench) show little to no difference when running with Proxy
>
> This is great to hear! Thank you for providing this input!
>
>> Execution; however, sched-messaging sees a 10x blowup in the runtime.
>> (taskset -c 0-7,128-125 perf bench sched messaging -p -t -l 100000 -g 1)
>
> Oof. I appreciate you sharing this!
>
>> While investigating, I plotted the runqueue length when running
>> sched-messaging pinned to 1 CCX (CPUs 0-7,128-125 on my 3rd Generation
>> EPYC system) using the following bpftrace script that dumps it in CSV
>> format:
>
> Just so I'm following you properly on the processor you're using, cpus
> 0-7 and 128-125 are in the same CCX?
> (I thought there were only 8 cores per CCX?)
Sorry about that! It should be 0-7,128-135 (16 threads of 8 cores in the
same CCX). The pinning was added so that I could observe only a subset of
the total CPUs, since analyzing the behavior of 40 tasks on 256 CPUs was
much harder than analyzing it on 16 CPUs :)
>
>> rqlen.bt
>> ---
> <snip>
>> --
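(The script body got snipped above, so for anyone wanting to try something
similar: below is a rough sketch of the idea, not the exact script I ran.
It assumes a BTF-enabled kernel with CONFIG_FAIR_GROUP_SCHED so that
bpftrace can walk se.cfs_rq, and it simply samples each CPU's CFS runqueue
length periodically and prints it as CSV.)

---
profile:hz:50
{
    // nr_running of the CFS runqueue the current task is queued on
    $len = curtask->se.cfs_rq->nr_running;

    // don't count the currently running task itself
    if ($len > 0) {
        $len = $len - 1;
    }

    // columns: timestamp (ns), cpu, runqueue length
    printf("%llu,%d,%u\n", nsecs, cpu, $len);
}
---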
>>
>> I've attached the CSVs for tip (rqlen50-tip-pinned.csv) and proxy
>> execution (rqlen50-pe-pinned.csv) below.
>>
>> The trend I see with hackbench is that the chain migration leads
>> to a single runqueue being completely overloaded, followed by some
>> amount of idling across the entire CCX, and then a similar chain
>> appearing on a different CPU. The trace for tip shows a lot more CPUs
>> being utilized.
>>
>> Mathieu has been looking at hackbench and the effect of task migration
>> on the runtime, and it appears that reducing migrations improves
>> hackbench performance significantly [1][2][3].
>>
>> [1] https://lore.kernel.org/lkml/20230905072141.GA253439@ziqianlu-dell/
>> [2] https://lore.kernel.org/lkml/20230905171105.1005672-1-mathieu.desnoyers@efficios.com/
>> [3] https://lore.kernel.org/lkml/20231019160523.1582101-1-mathieu.desnoyers@efficios.com/
>>
>> Since migration itself is not cheap, I believe chain migration at the
>> current scale hampers performance, as sched-messaging emulates a
>> worst-case scenario for proxy execution.
>
> Hrm.
>
>> I'll update the thread once I have more information. I'll continue
>> testing and take a closer look at the implementation.
>>
>>> I also briefly re-tested with this v5 series
>>> and saw some average latencies grow vs v4, suggesting the changes
>>> to return-migration (and extra locking) have some impact. With v6
>>> the extra overhead is reduced but still not as nice as v4. I’ll
>>> be digging more there, but my priority is still stability over
>>> speed at this point (it’s easier to validate correctness of
>>> optimizations if the baseline isn’t crashing).
>>>
>>>
>>> If folks find it easier to test/tinker with, this patch series
>>> can also be found here:
>>> https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v6-6.6
>>> https://github.com/johnstultz-work/linux-dev.git proxy-exec-v6-6.6
>>
>> P.S. I was using the above tree.
>
> Ok, I've been working on getting v7 ready, which includes two main things:
> 1) I've reworked the return migration back into the ttwu path to avoid
> the lock juggling
> 2) I'm working to properly conditionalize and re-add Connor's
> chain-migration feature (which, when a migration happens, pulls the full
> blocked_donor list along with it)
>
> So I'll try to reproduce your results and see if these help any with
> this particular case, and then I'll start to look closer at what can
> be done.
>
> Again, thanks so much for your testing and analysis here. I really
> appreciate your feedback!
> Do let me know if you find anything further!
Sure thing! I'll keep you updated on any findings. Thank you for digging
further into this issue.
>
> thanks
> -john
--
Thanks and Regards,
Prateek