linux-kernel - Re: [LKP] [mm] 9bc8039e71: will-it-scale.per_thread


Message-ID: <51265121-6e54-ff3a-cdfa-e5a2b838268d@linux.alibaba.com>
Date:   Mon, 5 Nov 2018 12:17:59 -0800
From:   Yang Shi <yang.shi@...ux.alibaba.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     rong.a.chen@...el.com, vbabka@...e.cz,
        kirill.shutemov@...ux.intel.com, mhocko@...nel.org,
        Matthew Wilcox <willy@...radead.org>,
        ldufour@...ux.vnet.ibm.com,
        Andrew Morton <akpm@...ux-foundation.org>,
        Colin King <colin.king@...onical.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        lkp@...org
Subject: Re: [LKP] [mm] 9bc8039e71: will-it-scale.per_thread_ops -64.1%
 regression



On 11/5/18 10:35 AM, Linus Torvalds wrote:
> On Mon, Nov 5, 2018 at 10:28 AM Yang Shi <yang.shi@...ux.alibaba.com> wrote:
>> Actually, the commit is mainly for optimizing the long stall time caused
>> by holding mmap_sem by write when unmapping or shrinking large mapping.
>> It downgrades write mmap_sem to read when zapping pages. So, it looks
>> the downgrade incurs more context switches. This is kind of expected.
>>
>> However, the test looks just shrink the mapping with one normal 4K page
>> size. It sounds the overhead of context switches outpace the gain in
>> this case at the first glance.
> I'm not seeing why there should be a context switch in the first place.
>
> Even if you have lots of concurrent brk() users, they should all block
> exactly the same way as before (a write lock blocks against a write
> lock, but it *also* blocks against a downgraded read lock).

Yes, that is true. The brk() users will not get woken up. What I can think
of for now is that there might be other helper processes and/or kernel
threads waiting for mmap_sem for read. They might get woken up by the
downgrade.
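
To make that guess concrete, here is a toy, single-threaded model of a reader/writer semaphore with a downgrade step. It is only a sketch under stated assumptions: the class ToyRwsem, its method names, and the task names are hypothetical, and the wake policy (wake only the readers queued ahead of the first waiting writer) is a simplification, not the kernel's rwsem code.

```python
from collections import deque


class ToyRwsem:
    """Toy model of a rw-semaphore wait queue (NOT the kernel implementation)."""

    def __init__(self):
        self.writer = None      # task currently holding the lock for write
        self.readers = set()    # tasks currently holding the lock for read
        self.waiters = deque()  # (task, mode) pairs in arrival order

    def down_write(self, task):
        # A writer needs the lock completely free; otherwise it queues.
        if self.writer is None and not self.readers and not self.waiters:
            self.writer = task
            return True
        self.waiters.append((task, "write"))
        return False

    def down_read(self, task):
        # A reader queues if a writer holds the lock or anyone is waiting.
        if self.writer is None and not self.waiters:
            self.readers.add(task)
            return True
        self.waiters.append((task, "read"))
        return False

    def downgrade_write(self, task):
        # The write holder becomes a read holder. Assumed policy: readers at
        # the head of the queue are woken; the first writer stops the scan.
        assert self.writer == task
        self.writer = None
        self.readers.add(task)
        woken = []
        while self.waiters and self.waiters[0][1] == "read":
            reader, _ = self.waiters.popleft()
            self.readers.add(reader)
            woken.append(reader)
        return woken


def demo():
    sem = ToyRwsem()
    sem.down_write("brk0")    # first brk() caller takes the lock for write
    sem.down_read("helper")   # hypothetical helper thread queues for read
    sem.down_write("brk1")    # second brk() caller queues for write
    woken = sem.downgrade_write("brk0")
    return woken, list(sem.waiters)
```

In this model the second brk() caller stays queued after the downgrade, while the waiting reader is woken, which is the only source of extra wakeups the model can produce.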

But I also saw a huge increase in CPU idle time and sched_goidle events.
I have no clue yet why idle goes up.

20610709 ± 15%   +2376.0%  5.103e+08 ± 34%  cpuidle.C1.time
28753819 ± 39%   +1054.5%  3.319e+08 ± 49%  cpuidle.C3.time

175049 ± 72%    +840.7%    1646720 ± 72%  sched_debug.cpu.sched_goidle.stddev
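
The percentage column can be cross-checked from the two value columns. This is a minimal sketch, assuming the left value is the base kernel, the right value is the tested commit, and the change is computed as (new - base) / base; pct_change is a hypothetical helper, not part of the LKP tooling. Because the report prints rounded values, the first two rows only reproduce approximately.

```python
def pct_change(base, new):
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


# (metric, base value, new value) copied from the report above
rows = [
    ("cpuidle.C1.time", 20610709, 5.103e+08),
    ("cpuidle.C3.time", 28753819, 3.319e+08),
    ("sched_debug.cpu.sched_goidle.stddev", 175049, 1646720),
]

changes = {name: pct_change(base, new) for name, base, new in rows}

for name, value in changes.items():
    print(f"{value:+10.1f}%  {name}")
```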


Thanks,
Yang

>
> So no, I don't want just some limit to hide this problem for that
> particular test. There's something else going on.
>
>                   Linus