Message-ID: <CAJuCfpEisU8Lfe96AYJDZ+OM4NoPmnw9bP53cT_kbfP_pR+-2g@mail.gmail.com>
Date: Sun, 11 Aug 2024 15:50:12 -0700
From: Suren Baghdasaryan <surenb@...gle.com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Mateusz Guzik <mjguzik@...il.com>, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Liam.Howlett@...cle.com, lstoakes@...il.com, pedro.falcato@...il.com
Subject: Re: [RFC PATCH] vm: align vma allocation and move the lock back into
the struct
On Fri, Aug 9, 2024 at 9:56 AM Suren Baghdasaryan <surenb@...gle.com> wrote:
>
> On Fri, Aug 9, 2024 at 3:09 PM Vlastimil Babka <vbabka@...e.cz> wrote:
> >
> > On 8/9/24 05:57, Suren Baghdasaryan wrote:
> > > Maybe it has something to do with NUMA? The system I'm running has 2 NUMA nodes:
> >
> > I kinda doubt the NUMA aspect. Whether you allocate a vma that embeds a
> > lock, or a vma and immediately the separate lock, it's unlikely they would
> > end up on different nodes so from the NUMA perspective I don't see a
> > difference. And if they ended up on separate nodes, it would more likely be
> > worse for the case of separate locks, not better.
>
> I have a UMA machine. Will try the test there as well. It won't
> provide hard proof but at least some possible hints.
Ok, disabling adjacent cacheline prefetching seems to do the trick (or
at least cuts down the regression drastically):

Hmean     faults/cpu-1   470577.6434 (  0.00%)   470745.2649 *  0.04%*
Hmean     faults/cpu-4   445862.9701 (  0.00%)   445572.2252 * -0.07%*
Hmean     faults/cpu-7   422516.4002 (  0.00%)   422677.5591 *  0.04%*
Hmean     faults/cpu-12  344483.7047 (  0.00%)   330476.7911 * -4.07%*
Hmean     faults/cpu-21  192836.0188 (  0.00%)   195266.8071 *  1.26%*
Hmean     faults/cpu-30  140745.9472 (  0.00%)   140655.0459 * -0.06%*
Hmean     faults/cpu-48  110507.4310 (  0.00%)   103802.1839 * -6.07%*
Hmean     faults/cpu-56   93507.7919 (  0.00%)    95105.1875 *  1.71%*
Hmean     faults/sec-1   470232.3887 (  0.00%)   470404.6525 *  0.04%*
Hmean     faults/sec-4  1757368.9266 (  0.00%)  1752852.8697 * -0.26%*
Hmean     faults/sec-7  2909554.8150 (  0.00%)  2915885.8739 *  0.22%*
Hmean     faults/sec-12 4033840.8719 (  0.00%)  3845165.3277 * -4.68%*
Hmean     faults/sec-21 3845857.7079 (  0.00%)  3890316.8799 *  1.16%*
Hmean     faults/sec-30 3838607.4530 (  0.00%)  3838861.8142 *  0.01%*
Hmean     faults/sec-48 4882118.9701 (  0.00%)  4608985.0530 * -5.59%*
Hmean     faults/sec-56 4933535.7567 (  0.00%)  5004208.3329 * -1.43%*

Now, how do we disable prefetching extra cachelines for vm_area_structs only?
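
One possible direction, sketched here in userspace C rather than kernel
code: instead of turning the prefetcher off, align each object to the
prefetch granule so that the "adjacent" line always belongs to the same
object. This is only an illustration under stated assumptions: 64-byte
cachelines, a prefetcher that pairs lines into 128-byte aligned chunks,
and a made-up struct demo_vma standing in for vm_area_struct with an
embedded lock. None of the names below come from the patch.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64-byte cachelines. */
#define CACHELINE_SIZE 64u
/* Assumption: adjacent-line prefetch completes 128-byte aligned pairs. */
#define PREFETCH_PAIR  (2u * CACHELINE_SIZE)

/* Hypothetical stand-in for a vma with the lock embedded in the struct. */
struct demo_vma {
	/* Aligning the first member aligns (and pads) the whole struct. */
	alignas(PREFETCH_PAIR) unsigned long start;
	unsigned long end;
	int lock_count;	/* placeholder for the embedded per-VMA lock */
};

/* Index of the 128-byte aligned chunk that contains address p. */
static inline uintptr_t pair_index(const void *p)
{
	return (uintptr_t)p / PREFETCH_PAIR;
}

/*
 * Allocate one object on a PREFETCH_PAIR boundary. sizeof(struct demo_vma)
 * is a multiple of the alignment, as aligned_alloc() requires.
 */
static struct demo_vma *demo_vma_alloc(void)
{
	return aligned_alloc(PREFETCH_PAIR, sizeof(struct demo_vma));
}
```

With this layout two separately allocated objects can never share a
128-byte pair, so a fault touching one object should not drag a line of a
neighbouring object into the cache. In the kernel the rough equivalent
would be to request the larger alignment when the vm_area_struct slab
cache is created; whether that recovers the numbers above is untested
here, and it costs memory for every vma.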