Message-ID: <173d4dbe-399d-4330-944c-9689588f18e8@suse.cz>
Date: Mon, 24 Feb 2025 21:53:11 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: Suren Baghdasaryan <surenb@...gle.com>,
Kent Overstreet <kent.overstreet@...ux.dev>
Cc: "Liam R. Howlett" <Liam.Howlett@...cle.com>,
Christoph Lameter <cl@...ux.com>, David Rientjes <rientjes@...gle.com>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Hyeonggon Yoo <42.hyeyoo@...il.com>, Uladzislau Rezki <urezki@...il.com>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org, rcu@...r.kernel.org,
maple-tree@...ts.infradead.org,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Alexei Starovoitov <ast@...nel.org>
Subject: Re: [PATCH RFC v2 00/10] SLUB percpu sheaves
On 2/24/25 02:36, Suren Baghdasaryan wrote:
> On Sat, Feb 22, 2025 at 8:44 PM Suren Baghdasaryan <surenb@...gle.com> wrote:
>>
>> I don't know about this particular part, but testing sheaves with the maple
>> node cache and stress testing mmap/munmap syscalls shows performance
>> benefits as long as there is some delay to let kfree_rcu() do its job.
>> I'm still gathering results and will most likely post them tomorrow.
Without such a delay, is the performance the same or worse?
> Here are the promised test results:
>
> First I ran an Android app cycle test comparing the baseline against sheaves
> used for maple tree nodes (as this patchset implements). I measured about a
> 3% improvement in app launch times, indicating an improvement in mmap syscall
> performance.
There was no artificial 500us delay added for this test, right?
> Next I ran an mmap stress test that maps five 1-page readable file-backed
> areas, faults them in, and finally unmaps them, timing the mmap syscalls.
> It repeats this for 200000 cycles and reports the total time. The average of
> 10 such runs is used as the final result.
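A minimal userspace sketch of a stress loop like the one just described, in C. The actual test program was not posted in the thread, so everything here is an assumption: the helper names (make_page_file, mmap_stress_run, mmap_stress_avg_ns) are hypothetical, all five areas map offset 0 of a single temporary file, each page is faulted in with one read, and only the mmap() calls are timed with CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>

#define NR_AREAS 5 /* five 1-page areas per cycle (assumed layout) */

/* Hypothetical helper: create an unlinked temp file exactly one page long. */
static int make_page_file(void)
{
	char path[] = "/tmp/mmap_stress_XXXXXX";
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path); /* the file stays alive until fd is closed */
	if (ftruncate(fd, sysconf(_SC_PAGESIZE)) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Nanoseconds elapsed from a to b. */
static long long ts_diff_ns(const struct timespec *a, const struct timespec *b)
{
	return (long long)(b->tv_sec - a->tv_sec) * 1000000000LL +
	       (b->tv_nsec - a->tv_nsec);
}

/*
 * Hypothetical sketch of one run: per cycle, mmap NR_AREAS read-only 1-page
 * views of fd, fault each in with a read, then munmap them. Only the mmap()
 * calls are timed. Returns the total mmap time in ns, or -1 on error.
 */
static long long mmap_stress_run(int fd, long cycles)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	long long total_ns = 0;
	volatile char sink = 0;

	for (long c = 0; c < cycles; c++) {
		void *areas[NR_AREAS];

		for (int i = 0; i < NR_AREAS; i++) {
			struct timespec t0, t1;

			clock_gettime(CLOCK_MONOTONIC, &t0);
			areas[i] = mmap(NULL, page, PROT_READ, MAP_PRIVATE, fd, 0);
			clock_gettime(CLOCK_MONOTONIC, &t1);
			if (areas[i] == MAP_FAILED) {
				while (i-- > 0) /* undo this cycle's mappings */
					munmap(areas[i], page);
				return -1;
			}
			total_ns += ts_diff_ns(&t0, &t1);
		}
		for (int i = 0; i < NR_AREAS; i++)
			sink = *(volatile const char *)areas[i]; /* fault the page in */
		for (int i = 0; i < NR_AREAS; i++)
			munmap(areas[i], page);
	}
	(void)sink;
	return total_ns;
}

/* Hypothetical helper: average the total mmap time over several runs. */
static double mmap_stress_avg_ns(int fd, long cycles, int runs)
{
	double sum = 0.0;

	for (int r = 0; r < runs; r++) {
		long long ns = mmap_stress_run(fd, cycles);

		if (ns < 0)
			return -1.0;
		sum += (double)ns;
	}
	return sum / runs;
}
```

Calling mmap_stress_avg_ns(fd, 200000, 10) would mirror the 200000 cycles and 10 runs mentioned in the thread. Mapping the same file offset every time should keep adjacent areas from being merged into one VMA, so each mmap() creates a new vm_area_struct; that is a design guess, not necessarily what the original test does.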
> 3 configurations were tested:
>
> 1. Sheaves used for maple tree nodes only (this patchset).
>
> 2. Sheaves used for maple tree nodes with the vm_lock to vm_refcnt conversion [1].
> That patchset avoids allocating an additional vm_lock structure on each mmap
> syscall and uses TYPESAFE_BY_RCU for the vm_area_struct cache.
>
> 3. Sheaves used for maple tree nodes and for the vm_area_struct cache, with the
> vm_lock to vm_refcnt conversion [1]. For the vm_area_struct cache I had to
> replace TYPESAFE_BY_RCU with sheaves, as we can't use both for the same cache.
Hm, why can't we use both? I don't think any kmem_cache_create check makes
them exclusive? TYPESAFE_BY_RCU only affects how slab pages are freed; it
doesn't, e.g., delay reuse of individual objects, and caching in a sheaf
doesn't write to the object. Am I missing something?
> The values represent the total time it took to perform the mmap syscalls;
> less is better.
>
> (1)            baseline   control
> Little core    7.58327    6.614939 (-12.77%)
> Medium core    2.125315   1.428702 (-32.78%)
> Big core       0.514673   0.422948 (-17.82%)
>
> (2)            baseline   control
> Little core    7.58327    5.141478 (-32.20%)
> Medium core    2.125315   0.427692 (-79.88%)
> Big core       0.514673   0.046642 (-90.94%)
>
> (3)            baseline   control
> Little core    7.58327    4.779624 (-36.97%)
> Medium core    2.125315   0.450368 (-78.81%)
> Big core       0.514673   0.037776 (-92.66%)
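The percentages in parentheses are consistent with the relative change of the control time against the baseline time. The thread does not spell the formula out, so this small C helper (pct_change is a hypothetical name) is only a sketch of that assumption.

```c
/*
 * Relative change of the control time against the baseline time, in
 * percent. Negative values mean the control configuration was faster.
 */
static double pct_change(double baseline, double control)
{
	return (control - baseline) / baseline * 100.0;
}
```

For example, pct_change(7.58327, 6.614939) evaluates to about -12.77, matching the little-core entry in (1).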
>
> The results in (3) vs (2) indicate that using sheaves for vm_area_struct
> yields slightly better averages. I noticed that this was mostly because
> the sheaves results lacked the occasional spikes that worsened the
> TYPESAFE_BY_RCU averages (the results seemed more stable with
> sheaves).
Thanks a lot, that looks promising!
> [1] https://lore.kernel.org/all/20250213224655.1680278-1-surenb@google.com/
>