Message-ID: <CAHk-=wg=LX1WKioiA_Cn-5hp6Hce_kTZg23uksSjNWgV_Ofj1Q@mail.gmail.com>
Date: Fri, 4 Jun 2021 10:57:44 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Feng Tang <feng.tang@...el.com>
Cc: Jason Gunthorpe <jgg@...dia.com>,
kernel test robot <oliver.sang@...el.com>,
John Hubbard <jhubbard@...dia.com>, Jan Kara <jack@...e.cz>,
Peter Xu <peterx@...hat.com>,
Andrea Arcangeli <aarcange@...hat.com>,
"Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>,
Christoph Hellwig <hch@....de>,
Hugh Dickins <hughd@...gle.com>, Jann Horn <jannh@...gle.com>,
Kirill Shutemov <kirill@...temov.name>,
Kirill Tkhai <ktkhai@...tuozzo.com>,
Leon Romanovsky <leonro@...dia.com>,
Michal Hocko <mhocko@...e.com>,
Oleg Nesterov <oleg@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
kernel test robot <lkp@...el.com>,
"Huang, Ying" <ying.huang@...el.com>, zhengjun.xing@...el.com
Subject: Re: [mm/gup] 57efa1fe59: will-it-scale.per_thread_ops -9.2% regression
On Fri, Jun 4, 2021 at 12:52 AM Feng Tang <feng.tang@...el.com> wrote:
>
> On Fri, Jun 04, 2021 at 03:04:11PM +0800, Feng Tang wrote:
> > >
> > > The perf data doesn't even mention any of the GUP paths, and on the
> > > pure fork path the biggest impact would be:
> > >
> > > (a) maybe "struct mm_struct" changed in size or had a different cache layout
> >
> > Yes, this seems to be the cause of the regression.
> >
> > The test case has many threads doing map/unmap at the same time,
> > so the process's rw_semaphore 'mmap_lock' is highly contended.
> >
> > Before the patch (with 0day's kconfig), the mmap_lock was split
> > across 2 cachelines: the 'count' was in one line and the other
> > members sat in the next, so it luckily avoided some cache bouncing.
> > After the patch, the 'mmap_lock' is pushed into one cacheline, which
> > may cause the regression.
Ok, thanks for following up on this.
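As a rough illustration of the cacheline argument above, here is a minimal
userspace sketch (made-up field names and a stand-in lock type, not the real
mm_struct layout) showing how adding a 4-byte member in front of a lock-like
structure can change which 64-byte line its fields land in:

#include <stdio.h>
#include <stddef.h>

struct fake_rwsem {              /* stand-in for struct rw_semaphore */
	long count;              /* the hot atomic count */
	long owner;
	long wait_list[2];       /* placeholder for wait_lock + list_head */
};

struct mm_before {               /* hypothetical layout before the change */
	long a, b, c, d, e, f, g;        /* 56 bytes of earlier fields */
	struct fake_rwsem mmap_lock;     /* starts at offset 56 */
};

struct mm_after {                /* hypothetical layout after the change */
	long a, b, c, d, e, f, g;
	int write_protect_seq;           /* new 4-byte member */
	struct fake_rwsem mmap_lock;     /* padded up to offset 64 */
};

/* which 64-byte cacheline a member starts in */
#define CL(type, member) (offsetof(type, member) / 64)

int main(void)
{
	printf("before: count in line %zu, owner in line %zu\n",
	       CL(struct mm_before, mmap_lock.count),
	       CL(struct mm_before, mmap_lock.owner));
	printf("after:  count in line %zu, owner in line %zu\n",
	       CL(struct mm_after, mmap_lock.count),
	       CL(struct mm_after, mmap_lock.owner));
	return 0;
}

On a typical x86-64 build the "before" layout splits count and owner across
two lines, while the "after" layout puts the whole lock in one line, which
matches the behaviour described in the report.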
> We've tried a patch which can recover the regression. As the
> newly added member 'write_protect_seq' is 4 bytes long, putting
> it into an existing 4-byte hole recovers the regression
> while not affecting most of the other members' alignment. Please
> review the following patch, thanks!
The patch looks fine to me.
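For reference, the hole-filling trick can be checked mechanically. The
following is a hedged sketch with illustrative names (not the actual
mm_struct patch), assuming a typical 64-bit ABI where pointers are 8 bytes
and 8-byte aligned:

#include <stddef.h>
#include <assert.h>

struct with_hole {
	int flags;               /* 4 bytes ... */
	/* 4 bytes of padding here because 'ptr' needs 8-byte alignment */
	void *ptr;
	long big_hot_field;
};

struct hole_filled {
	int flags;
	int write_protect_seq;   /* new 4-byte member fills the old padding */
	void *ptr;
	long big_hot_field;
};

/* Later members keep their offsets, so nothing crosses a new cacheline. */
static_assert(offsetof(struct with_hole, ptr) ==
	      offsetof(struct hole_filled, ptr), "ptr moved");
static_assert(offsetof(struct with_hole, big_hot_field) ==
	      offsetof(struct hole_filled, big_hot_field), "hot field moved");
static_assert(sizeof(struct with_hole) == sizeof(struct hole_filled),
	      "struct size changed");

int main(void) { return 0; }

Running pahole on the real struct would show the same thing: filling
existing padding leaves every later offset, and therefore the cacheline
layout, untouched.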
At the same time, I do wonder whether it would be worth exploring
moving the 'mmap_sem' thing instead.
Or at least add a big comment. It's not clear to me exactly _which_
other fields are the ones that are so hot that the contention on
mmap_sem then causes even more cacheline bouncing.
For example, is it either
(a) we *want* the mmap_sem to be in the first 128-byte region,
because then when we get the mmap_sem, the other fields in that same
cacheline are hot
OR
(b) we do *not* want mmap_sem to be in the *second* 128-byte region,
because there is something *else* in that region that is touched
independently of mmap_sem that is very very hot and now you get even
more bouncing?
but I can't tell which one it is.
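One way to tell (a) from (b) would be a layout experiment: force the lock
onto its own cacheline and re-run will-it-scale. The sketch below is purely
illustrative (userspace types, made-up field names), not a proposed kernel
patch:

#include <pthread.h>
#include <stdio.h>
#include <stddef.h>

struct mm_layout_experiment {
	long hot_counter;        /* field touched without taking the lock */

	/* isolate the lock on its own 64-byte line for the experiment */
	pthread_rwlock_t mmap_lock __attribute__((aligned(64)));

	/* fields typically accessed while the lock is held */
	long fields_used_under_lock[4];
};

int main(void)
{
	printf("hot_counter in cacheline %zu, mmap_lock starts in cacheline %zu\n",
	       offsetof(struct mm_layout_experiment, hot_counter) / 64,
	       offsetof(struct mm_layout_experiment, mmap_lock) / 64);
	return 0;
}

If isolating the lock recovers the numbers, it's false sharing with an
independently hot neighbour (case b); if not, the lock probably wants to
share a line with the fields used right after it is taken (case a).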
It would be great to have a comment in the code - and in the commit
message - about exactly which fields are the critical ones. Because I
doubt it is 'write_protect_seq' itself that matters at all.
If it's "mmap_sem should be close to other commonly used fields",
maybe we should just move mmap_sem upwards in the structure?
Linus