Date:   Mon,  7 Nov 2022 13:31:14 -0800
From:   Nhat Pham <nphamcs@...il.com>
To:     senozhatsky@...omium.org
Cc:     hannes@...xchg.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, minchan@...nel.org,
        ngupta@...are.org, akpm@...ux-foundation.org, sjenning@...hat.com,
        ddstreet@...e.org, vitaly.wool@...sulko.com
Subject: Re: [PATCH 2/5] zsmalloc: Consolidate zs_pool's migrate_lock and size_class's locks

We have benchmarked the lock consolidation to see the performance effect of
this change on zram. First, we ran a synthetic FS workload on a server machine
with 36 cores (same machine for all runs), using this benchmark script:

https://github.com/josefbacik/fs_mark

with 32 threads, pushing filesystem usage above 80%.

Here is the result (unit is file/second):

With lock consolidation (btrfs):
Average: 13520.2, Median: 13531.0, Stddev: 137.5961482019028

Without lock consolidation (btrfs):
Average: 13487.2, Median: 13575.0, Stddev: 309.08283679298665

With lock consolidation (ext4):
Average: 16824.4, Median: 16839.0, Stddev: 89.97388510006668

Without lock consolidation (ext4):
Average: 16958.0, Median: 16986.0, Stddev: 194.7370021336469

As you can see, we observe a 0.3% regression for btrfs and a 0.9% regression
for ext4. These are small, barely measurable differences in my opinion.
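(For reference, assuming the percentages are taken from the medians above:
btrfs (13575.0 - 13531.0) / 13575.0 ~= 0.3%, and
ext4 (16986.0 - 16839.0) / 16986.0 ~= 0.9%.)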

For a more realistic scenario, we also tried building the kernel on zram.
Here are the build times (in seconds):

With lock consolidation (btrfs):
real
Average: 319.6, Median: 320.0, Stddev: 0.8944271909999159
user
Average: 6894.2, Median: 6895.0, Stddev: 25.528415540334656
sys
Average: 521.4, Median: 522.0, Stddev: 1.51657508881031

Without lock consolidation (btrfs):
real
Average: 319.8, Median: 320.0, Stddev: 0.8366600265340756
user
Average: 6896.6, Median: 6899.0, Stddev: 16.04057355583023
sys
Average: 520.6, Median: 521.0, Stddev: 1.140175425099138

With lock consolidation (ext4):
real
Average: 320.0, Median: 319.0, Stddev: 1.4142135623730951
user
Average: 6896.8, Median: 6878.0, Stddev: 28.621670111997307
sys
Average: 521.2, Median: 521.0, Stddev: 1.7888543819998317

Without lock consolidation (ext4):
real
Average: 319.6, Median: 319.0, Stddev: 0.8944271909999159
user
Average: 6886.2, Median: 6887.0, Stddev: 16.93221781102523
sys
Average: 520.4, Median: 520.0, Stddev: 1.140175425099138

The differences are entirely within the noise of a typical run on zram. This
hardly justifies the complexity of maintaining both the pool lock and the class
lock. In fact, for writeback, we would need to introduce yet another lock to
prevent data races on the pool's LRU, further complicating the lock handling
logic. IMHO, it is just better to collapse all of these into a single
pool-level lock.
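
To make the intended end state concrete, here is a minimal sketch of the idea
(illustrative only, not taken from the patch; the struct layout and function
name below are made up for this example):

#include <linux/spinlock.h>

struct zs_pool {
	spinlock_t lock;	/* single lock: size classes, migration, LRU */
	/* ... other pool fields ... */
};

/*
 * Example path: everything that previously took the per-size_class lock
 * and/or the pool's migrate_lock now serializes on pool->lock, and the
 * writeback LRU can be updated under the same lock, so no extra LRU lock
 * is needed.
 */
static void example_touch_pool(struct zs_pool *pool)
{
	spin_lock(&pool->lock);
	/* allocate/free/migrate an object, update class stats,
	 * and update the writeback LRU, all under one lock */
	spin_unlock(&pool->lock);
}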
