Message-ID: <23fdd6b6-3922-43e8-9aeb-213534ba3e94@suse.cz>
Date: Wed, 13 Nov 2024 16:09:50 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Suren Baghdasaryan <surenb@...gle.com>, akpm@...ux-foundation.org,
willy@...radead.org, liam.howlett@...cle.com, mhocko@...e.com,
hannes@...xchg.org, mjguzik@...il.com, oliver.sang@...el.com,
mgorman@...hsingularity.net, david@...hat.com, peterx@...hat.com,
oleg@...hat.com, dave@...olabs.net, paulmck@...nel.org, brauner@...nel.org,
dhowells@...hat.com, hdanton@...a.com, hughd@...gle.com, minchan@...gle.com,
jannh@...gle.com, shakeel.butt@...ux.dev, souravpanda@...gle.com,
pasha.tatashin@...een.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
kernel-team@...roid.com
Subject: Re: [PATCH v2 2/5] mm: move per-vma lock into vm_area_struct
On 11/13/24 15:58, Lorenzo Stoakes wrote:
> On Wed, Nov 13, 2024 at 03:45:24PM +0100, Vlastimil Babka wrote:
>> On 11/13/24 15:28, Lorenzo Stoakes wrote:
>> > On Tue, Nov 12, 2024 at 11:46:32AM -0800, Suren Baghdasaryan wrote:
>> >> Back when per-vma locks were introduced, vm_lock was moved out of
>> >> vm_area_struct in [1] because of the performance regression caused by
>> >> false cacheline sharing. Recent investigation [2] revealed that the
>> >> regression is limited to a rather old Broadwell microarchitecture and
>> >> even there it can be mitigated by disabling adjacent cacheline
>> >> prefetching, see [3].
>> >
>> > I don't see a motivating reason as to why we want to do this? We increase
>> > memory usage here which is not good, but later lock optimisation mitigates
>>
>> I'd say we don't normally split logically single structures into multiple
>> structures just because they might pack better in multiple slabs vs single
>> slab. Because that means more complicated management, extra pointer
>> dereferences etc. And if that split away part is a lock, it even complicates
>> things further. So the only motivation for doing that split was that it
>> appeared to perform better, but that was found to be misleading.
>>
>> But sure it can be described better, and include the new
>> SLAB_TYPESAFE_BY_RCU conversion part as the motivation - that would be
>> likely impossible to do with a split away lock.
>
> Right, my point is that there is no justification given here at all, and we
> should give one. I understand the provenance of why we split the lock, but
> there has to be a motivating reason if everything is working fine right
> now.
>
> The SLAB_TYPESAFE_BY_RCU one seems to be the key one, but also something
> along the lines of complicated management, concern about ordering of
> allocating/freeing things, etc.
>
> Just needs some extra explanation here.
Right.
>>
>> > it, but why wouldn't we just do the lock optimisations and use less memory
>> > overall?
>>
>> If the lock is made much smaller then the packing benefit by split might
>> disappear, as is the case here.
>>
>
> Yeah, but the lock would be smaller so we'd save space still right?
If it becomes small enough that the vma including the lock fits in 192
bytes, then we're no longer saving space by keeping the lock in a separate
slab cache. Yes, it would make the motivation even more obvious to shrink
it first and then say "look, we can save even more space by not keeping it
separate anymore". But there's the downside of nonstandard locking, so I
think the order of changes that first unsplits the lock and only then
weighs the pros and cons of shrinking makes more sense.
>> >> This patchset moves vm_lock back into vm_area_struct, aligning it at the
>> >> cacheline boundary and changing the cache to be cache-aligned as well.
>> >> This causes VMA memory consumption to grow from 160 (vm_area_struct) + 40
>> >> (vm_lock) bytes to 256 bytes:
>> >>
>> >> slabinfo before:
>> >> <name> ... <objsize> <objperslab> <pagesperslab> : ...
>> >> vma_lock ... 40 102 1 : ...
>> >> vm_area_struct ... 160 51 2 : ...
>> >
>> > Pedantry, but it might be worth mentioning how much this can vary by config.
>> >
>> > For instance, on my machine:
>> >
>> > vm_area_struct 125238 138820 184 44
>> >
>> >>
>> >> slabinfo after moving vm_lock:
>> >> <name> ... <objsize> <objperslab> <pagesperslab> : ...
>> >> vm_area_struct ... 256 32 2 : ...
>> >>
>> >> Aggregate VMA memory consumption per 1000 VMAs grows from 50 to 64 pages,
>> >> which is 5.5MB per 100000 VMAs. This memory consumption growth can be
>> >> addressed later by optimizing the vm_lock.
>> >
>> > Yes grabbing this back is of critical importance I'd say! :)
>>
>> I doubt it's that critical. We'll have to weigh that against introducing
>> another non-standard locking primitive.
>
> Avoiding unnecessary VMA overhead is important, and a strong part of the
> motivation for the guard pages series for instance, so I don't think we
> should be unconcerned about unnecessary extra memory usage.
Yeah guard pages eliminate whole VMAs from existence so they are great :)
> I'm guessing from what you say you're not in favour of the subsequent
> 'non-standard' locking changes...
True.
>>
>> > Functionally it looks ok to me but would like to see a stronger
>> > justification in the commit msg! :)
>> >
>> >>
>> >> [1] https://lore.kernel.org/all/20230227173632.3292573-34-surenb@google.com/
>> >> [2] https://lore.kernel.org/all/ZsQyI%2F087V34JoIt@xsang-OptiPlex-9020/
>> >> [3] https://lore.kernel.org/all/CAJuCfpEisU8Lfe96AYJDZ+OM4NoPmnw9bP53cT_kbfP_pR+-2g@mail.gmail.com/
>> >>
>> >> Signed-off-by: Suren Baghdasaryan <surenb@...gle.com>
>> >> ---
>> >> include/linux/mm.h | 28 +++++++++++++----------
>> >> include/linux/mm_types.h | 6 +++--
>> >> kernel/fork.c | 49 ++++------------------------------------
>> >> 3 files changed, 25 insertions(+), 58 deletions(-)
>> >>