Message-ID: <CAJuCfpEisU8Lfe96AYJDZ+OM4NoPmnw9bP53cT_kbfP_pR+-2g@mail.gmail.com>
Date: Sun, 11 Aug 2024 15:50:12 -0700
From: Suren Baghdasaryan <surenb@...gle.com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Mateusz Guzik <mjguzik@...il.com>, linux-kernel@...r.kernel.org, linux-mm@...ck.org, 
	Liam.Howlett@...cle.com, lstoakes@...il.com, pedro.falcato@...il.com
Subject: Re: [RFC PATCH] vm: align vma allocation and move the lock back into
 the struct

On Fri, Aug 9, 2024 at 9:56 AM Suren Baghdasaryan <surenb@...gle.com> wrote:
>
> On Fri, Aug 9, 2024 at 3:09 PM Vlastimil Babka <vbabka@...e.cz> wrote:
> >
> > On 8/9/24 05:57, Suren Baghdasaryan wrote:
> > > Maybe it has something to do with NUMA? The system I'm running has 2 NUMA nodes:
> >
> > I kinda doubt the NUMA aspect. Whether you allocate a vma that embeds a
> > lock, or a vma and immediately the separate lock, it's unlikely they would
> > end up on different nodes so from the NUMA perspective I don't see a
> > difference. And if they ended up on separate nodes, it would more likely be
> > worse for the case of separate locks, not better.
>
> I have an UMA machine. Will try the test there as well. It won't
> provide hard proof but at least some possible hints.
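
For anyone skimming the thread, the two layouts being compared look
roughly like this (field lists heavily abbreviated, so the real offsets
and sizes differ; this is only to illustrate embedded vs. separately
allocated lock):

struct vma_lock {
	struct rw_semaphore lock;
};

/* mainline today: lock allocated separately from the vma */
struct vm_area_struct {
	/* ... other fields ... */
	struct vma_lock *vm_lock;
};

/* this RFC: lock moved back into the vma itself */
struct vm_area_struct {
	/* ... other fields ... */
	struct vma_lock vm_lock;
};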

Ok, disabling adjacent cacheline prefetching seems to do the trick (or
at least cuts down the regression drastically):

Hmean     faults/cpu-1    470577.6434 (   0.00%)   470745.2649 *   0.04%*
Hmean     faults/cpu-4    445862.9701 (   0.00%)   445572.2252 *  -0.07%*
Hmean     faults/cpu-7    422516.4002 (   0.00%)   422677.5591 *   0.04%*
Hmean     faults/cpu-12   344483.7047 (   0.00%)   330476.7911 *  -4.07%*
Hmean     faults/cpu-21   192836.0188 (   0.00%)   195266.8071 *   1.26%*
Hmean     faults/cpu-30   140745.9472 (   0.00%)   140655.0459 *  -0.06%*
Hmean     faults/cpu-48   110507.4310 (   0.00%)   103802.1839 *  -6.07%*
Hmean     faults/cpu-56    93507.7919 (   0.00%)    95105.1875 *   1.71%*
Hmean     faults/sec-1    470232.3887 (   0.00%)   470404.6525 *   0.04%*
Hmean     faults/sec-4   1757368.9266 (   0.00%)  1752852.8697 *  -0.26%*
Hmean     faults/sec-7   2909554.8150 (   0.00%)  2915885.8739 *   0.22%*
Hmean     faults/sec-12  4033840.8719 (   0.00%)  3845165.3277 *  -4.68%*
Hmean     faults/sec-21  3845857.7079 (   0.00%)  3890316.8799 *   1.16%*
Hmean     faults/sec-30  3838607.4530 (   0.00%)  3838861.8142 *   0.01%*
Hmean     faults/sec-48  4882118.9701 (   0.00%)  4608985.0530 *  -5.59%*
Hmean     faults/sec-56  4933535.7567 (   0.00%)  5004208.3329 *   1.43%*

Now, how do we disable prefetching extra cachelines for vm_area_structs only?
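
One option (sketch only, not tested, and it trades memory for it): rather
than touching the prefetcher, pad/align the vma slab so every object owns
its 128-byte cacheline pair outright, e.g. something like:

	/* sketch: align vma objects to two cache lines so the
	 * adjacent-line prefetcher never crosses into a neighbor */
	vm_area_cachep = kmem_cache_create("vm_area_struct",
					   sizeof(struct vm_area_struct),
					   2 * L1_CACHE_BYTES,	/* 128B on x86 */
					   SLAB_PANIC | SLAB_ACCOUNT,
					   NULL);

That wouldn't disable the adjacent-line prefetch, but it should make it
harmless, since the prefetched buddy line can never hold another vma's
data, at the cost of rounding the object size up to a multiple of 128
bytes.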
