Message-Id: <20221107213114.916231-1-nphamcs@gmail.com>
Date: Mon, 7 Nov 2022 13:31:14 -0800
From: Nhat Pham <nphamcs@...il.com>
To: senozhatsky@...omium.org
Cc: hannes@...xchg.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, minchan@...nel.org,
ngupta@...are.org, akpm@...ux-foundation.org, sjenning@...hat.com,
ddstreet@...e.org, vitaly.wool@...sulko.com
Subject: Re: [PATCH 2/5] zsmalloc: Consolidate zs_pool's migrate_lock and size_class's locks
We have benchmarked the lock consolidation to see the performance effect of
this change on zram. First, we ran a synthetic FS workload on a server machine
with 36 cores (same machine for all runs), using this benchmark script:
https://github.com/josefbacik/fs_mark
using 32 threads, and cranking the pressure up to > 80% FS usage.
Here are the results (in files/second):
With lock consolidation (btrfs):
Average: 13520.2, Median: 13531.0, Stddev: 137.5961482019028
Without lock consolidation (btrfs):
Average: 13487.2, Median: 13575.0, Stddev: 309.08283679298665
With lock consolidation (ext4):
Average: 16824.4, Median: 16839.0, Stddev: 89.97388510006668
Without lock consolidation (ext4):
Average: 16958.0, Median: 16986.0, Stddev: 194.7370021336469
As you can see, going by the medians, we observe a 0.3% regression for btrfs,
and a 0.9% regression for ext4. This is a small, barely measurable difference
in my opinion.
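For anyone who wants to re-derive the percentages: they appear to match the
medians rather than the averages. Below is a minimal sketch of that arithmetic.
The numbers are copied from the fs_mark results above; the helper name
regression_pct is only illustrative and is not part of any benchmark tooling.

```python
# Medians from the fs_mark runs above (files/second).
medians = {
    "btrfs": {"with": 13531.0, "without": 13575.0},
    "ext4": {"with": 16839.0, "without": 16986.0},
}


def regression_pct(with_lock, without_lock):
    """Throughput drop of the consolidated-lock run relative to the
    baseline (no consolidation), in percent. Hypothetical helper."""
    return (without_lock - with_lock) / without_lock * 100.0


results = {fs: regression_pct(m["with"], m["without"]) for fs, m in medians.items()}

for fs, pct in results.items():
    print(f"{fs}: {pct:.1f}% regression")
```

Note that the same calculation on the btrfs averages goes the other way
(13520.2 with vs. 13487.2 without), which is another sign that the difference
is down in the noise.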
For a more realistic scenario, we also tried building the kernel on zram.
Here is the time it takes (in seconds):
With lock consolidation (btrfs):
real
Average: 319.6, Median: 320.0, Stddev: 0.8944271909999159
user
Average: 6894.2, Median: 6895.0, Stddev: 25.528415540334656
sys
Average: 521.4, Median: 522.0, Stddev: 1.51657508881031
Without lock consolidation (btrfs):
real
Average: 319.8, Median: 320.0, Stddev: 0.8366600265340756
user
Average: 6896.6, Median: 6899.0, Stddev: 16.04057355583023
sys
Average: 520.6, Median: 521.0, Stddev: 1.140175425099138
With lock consolidation (ext4):
real
Average: 320.0, Median: 319.0, Stddev: 1.4142135623730951
user
Average: 6896.8, Median: 6878.0, Stddev: 28.621670111997307
sys
Average: 521.2, Median: 521.0, Stddev: 1.7888543819998317
Without lock consolidation (ext4):
real
Average: 319.6, Median: 319.0, Stddev: 0.8944271909999159
user
Average: 6886.2, Median: 6887.0, Stddev: 16.93221781102523
sys
Average: 520.4, Median: 520.0, Stddev: 1.140175425099138
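As a rough sanity check on the kernel build numbers, here is a small sketch
that compares the gap between the two averages against the smaller of the two
standard deviations. This is a crude heuristic that I am assuming is good
enough for eyeballing, not a proper significance test; the values are rounded
from the tables above and within_noise is a made-up helper name.

```python
# (average, stddev) in seconds, rounded from the kernel build results above.
runs = {
    ("btrfs", "real"): {"with": (319.6, 0.894), "without": (319.8, 0.837)},
    ("btrfs", "user"): {"with": (6894.2, 25.53), "without": (6896.6, 16.04)},
    ("btrfs", "sys"): {"with": (521.4, 1.517), "without": (520.6, 1.140)},
    ("ext4", "real"): {"with": (320.0, 1.414), "without": (319.6, 0.894)},
    ("ext4", "user"): {"with": (6896.8, 28.62), "without": (6886.2, 16.93)},
    ("ext4", "sys"): {"with": (521.2, 1.789), "without": (520.4, 1.140)},
}


def within_noise(a, b):
    """True if the two averages differ by no more than the smaller stddev.
    Hypothetical helper; a heuristic, not a statistical test."""
    (mean_a, sd_a), (mean_b, sd_b) = a, b
    return abs(mean_a - mean_b) <= min(sd_a, sd_b)


verdicts = {key: within_noise(r["with"], r["without"]) for key, r in runs.items()}

for (fs, metric), ok in verdicts.items():
    print(f"{fs} {metric}: {'within noise' if ok else 'outside noise'}")
```

By this measure, every real/user/sys pair lands inside one standard deviation
of the other.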
The difference is entirely within the noise of a typical run on zram. This
hardly justifies the complexity of maintaining both the pool lock and the class
lock. In fact, for writeback, we would need to introduce yet another lock to
prevent data races on the pool's LRU, further complicating the lock handling
logic. IMHO, it is just better to collapse all of these into a single
pool-level lock.