Date:   Tue, 26 Mar 2019 18:02:13 -0700
From:   Subhra Mazumdar <subhra.mazumdar@...cle.com>
To:     Julien Desfossez <jdesfossez@...italocean.com>,
        Peter Zijlstra <peterz@...radead.org>, mingo@...nel.org,
        tglx@...utronix.de, pjt@...gle.com, tim.c.chen@...ux.intel.com,
        torvalds@...ux-foundation.org
Cc:     linux-kernel@...r.kernel.org, fweisbec@...il.com,
        keescook@...omium.org, kerrnel@...gle.com,
        Vineeth Pillai <vpillai@...italocean.com>,
        Nishanth Aravamudan <naravamudan@...italocean.com>
Subject: Re: [RFC][PATCH 03/16] sched: Wrap rq::lock access


On 3/22/19 5:06 PM, Subhra Mazumdar wrote:
>
> On 3/21/19 2:20 PM, Julien Desfossez wrote:
>> On Tue, Mar 19, 2019 at 10:31 PM Subhra Mazumdar <subhra.mazumdar@...cle.com> wrote:
>>> On 3/18/19 8:41 AM, Julien Desfossez wrote:
>>>
>> On further investigation, we could see that the contention is mostly
>> in the way rq locks are taken. With this patchset, we lock the whole
>> core if cpu.tag is set for at least one cgroup. Due to this,
>> __schedule() is more or less serialized for the core, and that
>> accounts for the performance loss we are seeing. We also saw that
>> newidle_balance() takes a considerably long time in load_balance()
>> due to the rq spinlock contention. Do you think it would help if the
>> core-wide locking was only performed when absolutely needed?
>>
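For context, the serialization Julien describes falls out of the rq::lock
wrapper this patch adds. A simplified sketch of the idea (from memory, not
the exact diff; the core-wide indirection is only wired up later in the
series):

    /*
     * Sketch: once core scheduling is enabled, every SMT sibling's rq
     * resolves to the same per-core lock, so __schedule() running on
     * the siblings of a core serializes on a single raw_spinlock.
     */
    static inline raw_spinlock_t *rq_lockp(struct rq *rq)
    {
            if (sched_core_enabled(rq))
                    return &rq->core->__lock;  /* shared core-wide lock */
            return &rq->__lock;                /* normal per-rq lock */
    }

    static inline void rq_lock(struct rq *rq, struct rq_flags *rf)
    {
            raw_spin_lock(rq_lockp(rq));
    }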
> Is the core-wide lock primarily responsible for the regression? I ran
> up to patch 12, which also has the core-wide lock for tagged cgroups
> and also calls newidle_balance() from pick_next_task(), and I don't
> see any regression. Of course the core sched version of
> pick_next_task() may be doing more, but compared with
> __pick_next_task() it doesn't look too horrible.
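To make the comparison concrete, the extra work in the core sched pick
path is roughly of this shape (illustrative pseudocode only; pick_task(),
cookie_match() and the core_pick field stand in for what the later
patches in the series actually do):

    /*
     * Pseudocode: under core scheduling, one sibling picks for the
     * whole core while holding the core-wide lock. A sibling whose
     * best runnable task has a non-matching cookie is forced idle.
     */
    for_each_cpu(cpu, cpu_smt_mask(cpu_of(rq))) {
            struct rq *rq_i = cpu_rq(cpu);
            struct task_struct *p = pick_task(rq_i);

            if (!cookie_match(p, core_cookie))
                    p = rq_i->idle;        /* force-idle this sibling */
            rq_i->core_pick = p;
    }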
I gathered some data with only 1 DB instance running (which also shows
the 52% slowdown). Below are the number of pick_next_task() calls and
their average cost for patch 12 and patch 15. The total number of calls
is similar, but the average cost (in us) has more than doubled. For both
patches I had put the DB instance into a cpu-tagged cgroup.
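A minimal sketch of one way to collect such numbers in-kernel (the
counter names are hypothetical, not in the tree; assumes sched_clock()
resolution is adequate):

    /*
     * Hypothetical instrumentation around the call site in
     * __schedule(). Preemption is already disabled there, so plain
     * __this_cpu ops are safe; the per-cpu counters would be summed
     * and read out via debugfs.
     */
    static DEFINE_PER_CPU(u64, pnt_calls);
    static DEFINE_PER_CPU(u64, pnt_ns);

    u64 t0 = sched_clock();
    next = pick_next_task(rq, prev, &rf);
    __this_cpu_add(pnt_ns, sched_clock() - t0);
    __this_cpu_inc(pnt_calls);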

                                patch12        patch15
count pick_next_task           62317898       58925395
avg cost pick_next_task (us)   0.6566323209   1.4223810108
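(Back of the envelope: count * avg cost puts the aggregate time spent in
pick_next_task() at roughly 62317898 * 0.657 us ~= 41 s for patch 12
versus 58925395 * 1.422 us ~= 84 s for patch 15, i.e. total time in the
pick path roughly doubles even though the call count is flat.)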
