Message-ID: <88efe20e-75a9-6805-3ae4-dc67742f9057@oracle.com>
Date: Fri, 9 Feb 2018 12:33:03 -0500
From: Steven Sistare <steven.sistare@...cle.com>
To: Mike Galbraith <efault@....de>,
Rohit Jain <rohit.k.jain@...cle.com>,
linux-kernel@...r.kernel.org
Cc: peterz@...radead.org, mingo@...hat.com, joelaf@...gle.com,
jbacik@...com, riel@...hat.com, juri.lelli@...hat.com,
dhaval.giani@...cle.com
Subject: Re: [RFC 2/2] Introduce sysctl(s) for the migration costs

On 2/9/2018 12:08 PM, Mike Galbraith wrote:
> On Fri, 2018-02-09 at 11:10 -0500, Steven Sistare wrote:
>> On 2/8/2018 10:54 PM, Mike Galbraith wrote:
>>> On Thu, 2018-02-08 at 14:19 -0800, Rohit Jain wrote:
>>>> This patch introduces the sysctl for sched_domain based migration costs.
>>>> These in turn can be used for performance tuning of workloads.
>>>
>>> With this patch, we trade 1 completely bogus constant (cost is really
>>> highly variable) for 3, twiddling of which has zero effect unless you
>>> trigger a domain rebuild afterward, which is neither mentioned in the
>>> changelog, nor documented.
>>>
>>> bogo-numbers++ is kinda hard to love.
>>
>> Yup, the domain rebuild is missing.
>>
>> I am no fan of tunables, the fewer the better, but one of the several flaws
>> of the single figure for migration cost is that it ignores the very large
>> difference in cost when migrating between near vs far levels of the cache hierarchy.
>> Migration between CPUs of the same core should be free, as they share L1 cache.
>> Rohit defined a tunable for it, but IMO it could be hard coded to 0.
>
> That cost is never really 0 in the context of load balancing, as the
> load balancing machinery is non-free. When the idle_balance() throttle
> was added, that was done to mitigate the (at that time) quite high cost
> to high frequency cross core scheduling ala localhost communication.

I was imprecise. The cache-loss component of cost as represented by
sched_migration_cost should be 0 in this case. The cost of the machinery
is non-zero and remains in the code, and can still prevent migration.

>> Migration
>> between CPUs in different sockets is the most expensive and is represented by
>> the existing sysctl_sched_migration_cost tunable. Migration between CPUs in
>> the same core cluster, or in the same socket, is somewhere in between, as
>> they share L2 or L3 cache. We could avoid a separate tunable by setting it to
>> sysctl_sched_migration_cost / 10.
>
> Shrug. It's bogus no matter what we do. Once Upon A Time, a cost
> number was generated via measurement, but the end result was just as
> bogus as a number pulled out of the ether. How much bandwidth you have
> when blasting data to/from wherever says nothing about misses you avoid
> vs those you generate.

Yes, yes and yes. I cannot make the original tunable less bogus. Using a smaller
cost for closer caches still makes logical sense and is supported by the data.

- Steve
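
For illustration only, the tiering proposed in this thread could be sketched
roughly as follows. This is not code from the RFC patch or from the kernel.
The enum, the function names, and the idea of passing the base cost in as a
parameter are all hypothetical. The only values taken from the discussion are
0 for CPUs of the same core, sysctl_sched_migration_cost / 10 for CPUs that
share L2 or L3, and the full sysctl_sched_migration_cost across sockets.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of what the source and destination CPUs share. */
enum share_level {
    SHARE_CORE,   /* same core: L1 is shared */
    SHARE_CACHE,  /* same core cluster or socket: L2 or L3 is shared */
    SHARE_NONE    /* different sockets: no shared cache */
};

/*
 * Cache-loss component of the migration cost, in nanoseconds.
 * base_cost_ns stands in for sysctl_sched_migration_cost; taking it as a
 * parameter is an assumption made to keep the sketch self-contained.
 */
uint64_t cache_loss_cost_ns(enum share_level level, uint64_t base_cost_ns)
{
    switch (level) {
    case SHARE_CORE:
        return 0;                  /* "could be hard coded to 0" */
    case SHARE_CACHE:
        return base_cost_ns / 10;  /* the "/ 10" suggested in the thread */
    case SHARE_NONE:
        return base_cost_ns;       /* the existing single tunable */
    }
    return base_cost_ns;           /* unknown level: fall back to the full cost */
}

/*
 * Hypothetical cache-hot test: a task that ran less than the cache-loss cost
 * ago is treated as still cache hot. With a cost of 0 it is never hot, so
 * only the load balancing machinery itself can hold the migration back.
 */
bool is_cache_hot(uint64_t ns_since_last_run, enum share_level level,
                  uint64_t base_cost_ns)
{
    return ns_since_last_run < cache_loss_cost_ns(level, base_cost_ns);
}
```

With a base cost of 500000 ns, this gives 0 ns within a core, 50000 ns within
a shared L2 or L3, and 500000 ns across sockets. Whether any of these figures
is less bogus than the original is exactly the open question above.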