linux-kernel - Re: [RFC PATCH] vm: align vma allocation and move the lock back into the struct

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <CAJuCfpEisU8Lfe96AYJDZ+OM4NoPmnw9bP53cT_kbfP_pR+-2g@mail.gmail.com>
Date: Sun, 11 Aug 2024 15:50:12 -0700
From: Suren Baghdasaryan <surenb@...gle.com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Mateusz Guzik <mjguzik@...il.com>, linux-kernel@...r.kernel.org, linux-mm@...ck.org, 
	Liam.Howlett@...cle.com, lstoakes@...il.com, pedro.falcato@...il.com
Subject: Re: [RFC PATCH] vm: align vma allocation and move the lock back into
 the struct

On Fri, Aug 9, 2024 at 9:56 AM Suren Baghdasaryan <surenb@...gle.com> wrote:
>
> On Fri, Aug 9, 2024 at 3:09 PM Vlastimil Babka <vbabka@...e.cz> wrote:
> >
> > On 8/9/24 05:57, Suren Baghdasaryan wrote:
> > > Maybe it has something to do with NUMA? The system I'm running has 2 NUMA nodes:
> >
> > I kinda doubt the NUMA aspect. Whether you allocate a vma that embeds a
> > lock, or a vma and immediately the separate lock, it's unlikely they would
> > end up on different nodes so from the NUMA perspective I don't see a
> > difference. And if they ended up on separate nodes, it would more likely be
> > worse for the case of separate locks, not better.
>
> I have a UMA machine. Will try the test there as well. It won't
> provide hard proof, but it may at least give some hints.

OK, disabling adjacent-cacheline prefetching seems to do the trick (or
at least drastically cuts down the regression):

Hmean     faults/cpu-1    470577.6434 (   0.00%)   470745.2649 *   0.04%*
Hmean     faults/cpu-4    445862.9701 (   0.00%)   445572.2252 *  -0.07%*
Hmean     faults/cpu-7    422516.4002 (   0.00%)   422677.5591 *   0.04%*
Hmean     faults/cpu-12   344483.7047 (   0.00%)   330476.7911 *  -4.07%*
Hmean     faults/cpu-21   192836.0188 (   0.00%)   195266.8071 *   1.26%*
Hmean     faults/cpu-30   140745.9472 (   0.00%)   140655.0459 *  -0.06%*
Hmean     faults/cpu-48   110507.4310 (   0.00%)   103802.1839 *  -6.07%*
Hmean     faults/cpu-56    93507.7919 (   0.00%)    95105.1875 *   1.71%*
Hmean     faults/sec-1    470232.3887 (   0.00%)   470404.6525 *   0.04%*
Hmean     faults/sec-4   1757368.9266 (   0.00%)  1752852.8697 *  -0.26%*
Hmean     faults/sec-7   2909554.8150 (   0.00%)  2915885.8739 *   0.22%*
Hmean     faults/sec-12  4033840.8719 (   0.00%)  3845165.3277 *  -4.68%*
Hmean     faults/sec-21  3845857.7079 (   0.00%)  3890316.8799 *   1.16%*
Hmean     faults/sec-30  3838607.4530 (   0.00%)  3838861.8142 *   0.01%*
Hmean     faults/sec-48  4882118.9701 (   0.00%)  4608985.0530 *  -5.59%*
Hmean     faults/sec-56  4933535.7567 (   0.00%)  5004208.3329 *   1.43%*

Now, how do we disable prefetching extra cachelines for vm_area_structs only?