Message-ID: <af1e1aa1-7965-eb53-bbd9-079253eb83ba@oracle.com>
Date: Tue, 26 Mar 2019 18:02:13 -0700
From: Subhra Mazumdar <subhra.mazumdar@...cle.com>
To: Julien Desfossez <jdesfossez@...italocean.com>,
Peter Zijlstra <peterz@...radead.org>, mingo@...nel.org,
tglx@...utronix.de, pjt@...gle.com, tim.c.chen@...ux.intel.com,
torvalds@...ux-foundation.org
Cc: linux-kernel@...r.kernel.org, fweisbec@...il.com,
keescook@...omium.org, kerrnel@...gle.com,
Vineeth Pillai <vpillai@...italocean.com>,
Nishanth Aravamudan <naravamudan@...italocean.com>
Subject: Re: [RFC][PATCH 03/16] sched: Wrap rq::lock access
On 3/22/19 5:06 PM, Subhra Mazumdar wrote:
>
> On 3/21/19 2:20 PM, Julien Desfossez wrote:
>> On Tue, Mar 19, 2019 at 10:31 PM Subhra Mazumdar
>> <subhra.mazumdar@...cle.com>
>> wrote:
>>> On 3/18/19 8:41 AM, Julien Desfossez wrote:
>>>
>> On further investigation, we could see that the contention is mostly in
>> the way rq locks are taken. With this patchset, we lock the whole core
>> if cpu.tag is set for at least one cgroup. Because of this, __schedule()
>> is more or less serialized for the core, and that accounts for the
>> performance loss we are seeing. We also saw that newidle_balance() takes
>> a considerably long time in load_balance() due to the rq spinlock
>> contention. Do you think it would help if the core-wide locking were
>> only performed when absolutely needed?
>>
> Is the core-wide lock primarily responsible for the regression? I ran up
> to patch 12, which also has the core-wide lock for tagged cgroups and
> also calls newidle_balance() from pick_next_task(), and I don't see any
> regression. Of course the core sched version of pick_next_task() may be
> doing more, but compared with __pick_next_task() it doesn't look too
> horrible.
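(To be concrete about what "locking the whole core" means above: my
understanding of patch 03 is that every rq lock access goes through a
wrapper which, with core scheduling enabled, hands back a single lock
shared by all siblings of the core. The sketch below is paraphrased from
memory rather than copied from the series, so treat the names and
details as approximate.)

static inline raw_spinlock_t *rq_lockp(struct rq *rq)
{
        if (sched_core_enabled(rq))
                return &rq->core->__lock;  /* one lock for all SMT siblings */

        return &rq->__lock;                /* regular per-CPU rq lock */
}

Since all siblings funnel through the same raw spinlock, __schedule()
and newidle_balance() on those CPUs end up serialized, which matches the
contention described above.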
I gathered some data with only 1 DB instance running (which also shows
the 52% slowdown). Following are the numbers of pick_next_task() calls
and their average cost for patch 12 and patch 15. The total number of
calls is similar, but the average cost (in us) has more than doubled.
For both patches I had put the DB instance into a cpu-tagged cgroup.
                          patch12         patch15
count pick_next_task      62317898        58925395
avg cost pick_next_task   0.6566323209    1.4223810108
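In case it is useful, the numbers were collected with instrumentation
along the lines of the sketch below (a minimal illustration, not the
exact patch I used; pnt_calls/pnt_ns and the wrapper name are made up):
a per-CPU call counter plus an accumulated sched_clock() delta around
pick_next_task(), summed over all CPUs at the end of the run.

/* Hypothetical instrumentation sketch, kernel/sched context assumed. */
static DEFINE_PER_CPU(u64, pnt_calls);
static DEFINE_PER_CPU(u64, pnt_ns);

static struct task_struct *
pick_next_task_timed(struct rq *rq, struct task_struct *prev,
                     struct rq_flags *rf)
{
        u64 t0 = sched_clock();
        struct task_struct *p = pick_next_task(rq, prev, rf);

        this_cpu_inc(pnt_calls);                  /* call count */
        this_cpu_add(pnt_ns, sched_clock() - t0); /* accumulated ns */
        return p;
}

The avg cost in the table is then sum(pnt_ns) / sum(pnt_calls) across
CPUs, converted to microseconds.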