linux-kernel - Re: [PATCHv4 14/17] zsmalloc: make zspage lock preemptible

Message-ID: <droaoze6w4atf7guiv6t4imhcmkpteyvoaigdnw5p3vdg75ebx@m56xi2y527i4>
Date: Wed, 12 Feb 2025 14:00:26 +0900
From: Sergey Senozhatsky <senozhatsky@...omium.org>
To: Yosry Ahmed <yosry.ahmed@...ux.dev>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>, 
	Andrew Morton <akpm@...ux-foundation.org>, Minchan Kim <minchan@...nel.org>, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, Kairui Song <ryncsn@...il.com>
Subject: Re: [PATCHv4 14/17] zsmalloc: make zspage lock preemptible

On (25/02/07 21:09), Yosry Ahmed wrote:
> Can we do some perf testing to make sure this custom locking is not
> regressing performance (selfishly I'd like some zswap testing too)?

So for zsmalloc I (usually) write some simple testing code which is
triggered via sysfs (device attr) and is completely reproducible,
so that I compare apples to apples.  In this particular case I just
have a loop that creates objects (we don't need to compress or decompress
anything, zsmalloc doesn't really care):

-	echo 1 > /sys/ ... / test_prepare

	for (sz = 32; sz < PAGE_SIZE; sz += 64) {
		for (i = 0; i < 4096; i++) {
			ent->handle = zs_malloc(zram->mem_pool, sz);
			list_add(ent);
		}
	}
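
For a sense of scale, the prepare loop above can be replayed in userspace
to count how many objects it creates.  This is only a sketch: it assumes
PAGE_SIZE is 4096 (the mail does not say), and the names `struct workload`,
`count_workload()` and `OBJS_PER_SIZE` are made up for the illustration.
It counts requested payload bytes only, not what zsmalloc actually consumes
after rounding to its size classes.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages; the original mail does not state PAGE_SIZE. */
#define ASSUMED_PAGE_SIZE	4096UL
/* Inner loop count taken from the pseudocode: 4096 objects per size. */
#define OBJS_PER_SIZE		4096UL

/* Hypothetical helper type, not part of zsmalloc or zram. */
struct workload {
	unsigned long nr_sizes;		/* distinct object sizes */
	unsigned long nr_objects;	/* total zs_malloc() calls */
	unsigned long payload_bytes;	/* sum of requested sizes */
};

/* Replays the same loop bounds as the test, without allocating anything. */
struct workload count_workload(unsigned long page_size)
{
	struct workload w = { 0, 0, 0 };
	unsigned long sz;

	for (sz = 32; sz < page_size; sz += 64) {
		w.nr_sizes++;
		w.nr_objects += OBJS_PER_SIZE;
		w.payload_bytes += sz * OBJS_PER_SIZE;
	}
	return w;
}
```

Under that page-size assumption, `count_workload(ASSUMED_PAGE_SIZE)` reports
64 distinct sizes, 262144 objects, and 512 MiB of requested payload, so each
exec pass walks 262144 handles.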


And now I just `perf stat` the writes that trigger each test:

-	perf stat echo 1 > /sys/ ... / test_exec_old

	list_for_each_entry
		zs_map_object(ent->handle, ZS_MM_RO);
		zs_unmap_object(ent->handle);

	list_for_each_entry
		dst = zs_map_object(ent->handle, ZS_MM_WO);
		memcpy(dst, tmpbuf, ent->sz);
		zs_unmap_object(ent->handle);



-	perf stat echo 1 > /sys/ ... / test_exec_new

	list_for_each_entry
		dst = zs_obj_read_begin(ent->handle, loc);
		zs_obj_read_end(ent->handle, dst);

	list_for_each_entry
		zs_obj_write(ent->handle, tmpbuf, ent->sz);


-	echo 1 > /sys/ ... / test_finish

	free all handles and ent-s


The nice part is that we don't depend on any of the upper layers, we
don't even need to compress/decompress anything; we allocate objects
of required sizes and memcpy static data there (zsmalloc doesn't have
any opinion on that) and that's pretty much it.


OLD API
=======

10 runs

       369,205,778      instructions                     #    0.80  insn per cycle            
        40,467,926      branches                         #  113.732 M/sec                     

       369,002,122      instructions                     #    0.62  insn per cycle            
        40,426,145      branches                         #  189.361 M/sec                     

       369,051,170      instructions                     #    0.45  insn per cycle            
        40,434,677      branches                         #  157.574 M/sec                     

       369,014,522      instructions                     #    0.63  insn per cycle            
        40,427,754      branches                         #  201.464 M/sec                     

       369,019,179      instructions                     #    0.64  insn per cycle            
        40,429,327      branches                         #  198.321 M/sec                     

       368,973,095      instructions                     #    0.64  insn per cycle            
        40,419,245      branches                         #  234.210 M/sec                     

       368,950,705      instructions                     #    0.64  insn per cycle            
        40,414,305      branches                         #  231.460 M/sec                     

       369,041,288      instructions                     #    0.46  insn per cycle            
        40,432,599      branches                         #  155.576 M/sec                     

       368,964,080      instructions                     #    0.67  insn per cycle            
        40,417,025      branches                         #  245.665 M/sec                     

       369,036,706      instructions                     #    0.63  insn per cycle            
        40,430,860      branches                         #  204.105 M/sec                     


NEW API
=======

10 runs

       265,799,293      instructions                     #    0.51  insn per cycle            
        29,834,567      branches                         #  170.281 M/sec                     

       265,765,970      instructions                     #    0.55  insn per cycle            
        29,829,019      branches                         #  161.602 M/sec                     

       265,764,702      instructions                     #    0.51  insn per cycle            
        29,828,015      branches                         #  189.677 M/sec                     

       265,836,506      instructions                     #    0.38  insn per cycle            
        29,840,650      branches                         #  124.237 M/sec                     

       265,836,061      instructions                     #    0.36  insn per cycle            
        29,842,285      branches                         #  137.670 M/sec                     

       265,887,080      instructions                     #    0.37  insn per cycle            
        29,852,881      branches                         #  126.060 M/sec                     

       265,769,869      instructions                     #    0.57  insn per cycle            
        29,829,873      branches                         #  210.157 M/sec                     

       265,803,732      instructions                     #    0.58  insn per cycle            
        29,835,391      branches                         #  186.940 M/sec                     

       265,766,624      instructions                     #    0.58  insn per cycle            
        29,827,537      branches                         #  212.609 M/sec                     

       265,843,597      instructions                     #    0.57  insn per cycle            
        29,843,650      branches                         #  171.877 M/sec                     


x old-api-insn
+ new-api-insn
+-------------------------------------------------------------------------------------+
|+                                                                                   x|
|+                                                                                   x|
|+                                                                                   x|
|+                                                                                   x|
|+                                                                                   x|
|+                                                                                   x|
|+                                                                                   x|
|+                                                                                   x|
|+                                                                                   x|
|+                                                                                   x|
|A                                                                                   A|
+-------------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  10  3.689507e+08 3.6920578e+08 3.6901918e+08 3.6902586e+08     71765.519
+  10  2.657647e+08 2.6588708e+08 2.6580373e+08 2.6580734e+08     42187.024
Difference at 95.0% confidence
	-1.03219e+08 +/- 55308.7
	-27.9705% +/- 0.0149878%
	(Student's t, pooled s = 58864.4)
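
The summary above can be cross-checked from the twenty instruction counts
listed earlier.  The following is a rough sketch of the usual equal-sample-size
pooled-variance comparison, not the code of the tool that produced the output.
The helper names (`summarize()`, `compare()`, `sqrt_newton()`) are invented
here, and the critical value 2.101 (two-sided 95%, 18 degrees of freedom) is
supplied by the caller from a standard Student's t table.

```c
#include <assert.h>
#include <stddef.h>

/* Instruction counts copied from the perf stat runs in this mail. */
const double old_insn[10] = {
	369205778, 369002122, 369051170, 369014522, 369019179,
	368973095, 368950705, 369041288, 368964080, 369036706,
};
const double new_insn[10] = {
	265799293, 265765970, 265764702, 265836506, 265836061,
	265887080, 265769869, 265803732, 265766624, 265843597,
};

struct summary {
	double mean;
	double var;	/* sample variance, n - 1 in the denominator */
};

struct comparison {
	double delta;		/* mean(new) - mean(old) */
	double pct;		/* delta as a percentage of mean(old) */
	double pooled_sd;	/* pooled standard deviation */
	double half_width;	/* +/- term of the confidence interval */
};

/* Newton's method square root, so the sketch does not need libm. */
double sqrt_newton(double x)
{
	double r = x > 1.0 ? x : 1.0;
	int i;

	if (x <= 0.0)
		return 0.0;
	for (i = 0; i < 100; i++)
		r = 0.5 * (r + x / r);
	return r;
}

struct summary summarize(const double *v, size_t n)
{
	struct summary s = { 0.0, 0.0 };
	size_t i;

	assert(n > 1);
	for (i = 0; i < n; i++)
		s.mean += v[i];
	s.mean /= (double)n;
	for (i = 0; i < n; i++)
		s.var += (v[i] - s.mean) * (v[i] - s.mean);
	s.var /= (double)(n - 1);
	return s;
}

/* Both sets must have n samples; t_crit is for 2n - 2 degrees of freedom. */
struct comparison compare(const double *old, const double *new_,
			  size_t n, double t_crit)
{
	struct summary so = summarize(old, n);
	struct summary sn = summarize(new_, n);
	struct comparison c;

	c.delta = sn.mean - so.mean;
	c.pct = 100.0 * c.delta / so.mean;
	/* With equal n the pooled variance is the plain average. */
	c.pooled_sd = sqrt_newton((so.var + sn.var) / 2.0);
	c.half_width = t_crit * c.pooled_sd * sqrt_newton(2.0 / (double)n);
	return c;
}
```

Calling `compare(old_insn, new_insn, 10, 2.101)` should land close to the
figures reported above: a difference of about -1.03e+08 instructions
(roughly -27.97%), a pooled s near 58864, and a half-width near 55300.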


> Perhaps Kairui can help with that since he was already testing this
> series.

Yeah, would be great.