lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20230105053510.1819862-4-senozhatsky@chromium.org>
Date:   Thu,  5 Jan 2023 14:35:09 +0900
From:   Sergey Senozhatsky <senozhatsky@...omium.org>
To:     Minchan Kim <minchan@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>
Cc:     linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        Sergey Senozhatsky <senozhatsky@...omium.org>
Subject: [PATCH 3/4] zsmalloc: make zspage chain size configurable

Remove hard coded limit on the maximum number of physical
pages per-zspage.

This will allow tuning of zsmalloc pool as zspage chain
size changes `pages per-zspage` and `objects per-zspage`
characteristics of size classes which also affects size
classes clustering (the way size classes are merged).

Signed-off-by: Sergey Senozhatsky <senozhatsky@...omium.org>
---
 .../admin-guide/blockdev/zsmalloc.rst         | 157 ++++++++++++++++++
 mm/Kconfig                                    |  19 +++
 mm/zsmalloc.c                                 |  15 +-
 3 files changed, 180 insertions(+), 11 deletions(-)
 create mode 100644 Documentation/admin-guide/blockdev/zsmalloc.rst

diff --git a/Documentation/admin-guide/blockdev/zsmalloc.rst b/Documentation/admin-guide/blockdev/zsmalloc.rst
new file mode 100644
index 000000000000..2e238afb1b4b
--- /dev/null
+++ b/Documentation/admin-guide/blockdev/zsmalloc.rst
@@ -0,0 +1,157 @@
+========================================
+zsmalloc allocator
+========================================
+
+Internals
+---------
+
+zsmalloc has 255 size classes. Size classes hold a number of zspages, each
+zspage can consist of up to ZSMALLOC_CHAIN_SIZE physical (0 order) pages.
+The exact (most optimal) zspage chain size is calculated for each size class
+during zsmalloc pool creation (see calculate_zspage_chain_size()).
+
+As a reasonable optimization, zsmalloc merges size classes that have
+similar characteristics: number of pages per zspage and number of
+objects zspage can store.
+
+For example, let's look at the following size classes:::
+
+class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
+..
+   94  1536           0            0             0          0          0                3        0
+  100  1632           0            0             0          0          0                2        0
+..
+
+Size classes #95-99 are merged with size class #100. That is, each time
+we store an object of size, say, 1568 bytes instead of using class #96
+we end up storing it in size class #100. Class #100 is for objects of
+1632 bytes in size, hence every 1568 bytes object wastes 1632-1568 bytes.
+Class #100 zspages consist of 2 physical pages and can hold 5 objects.
+When we need to store, say, 13 objects of size 1568 we end up allocating
+three zspages; in other words, 6 physical pages.
+
+However, if we'll look closer at size class #96 (which should hold objects
+of size 1568 bytes) and trace calculate_zspage_chain_size():::
+
+    pages per zspage      wasted bytes     used%
+           1                  960           76
+           2                  352           95
+           3                 1312           89
+           4                  704           95
+           5                   96           99
+
+We'd notice that the most optimal zspage configuration for this class is
+when it consists of 5 physical pages. A 5 page class #96 configuration
+would store 13 objects of size 1568 in a single zspage, allocating 5 physical
+pages, as opposed to 6 physical pages that class #100 would allocate otherwise.
+
+A larger zspage chain size for class #96 also changes its key characteristics:
+pages per-zspage and objects per-zspage. As a result we merge less classes. In
+other words classes are grouped in a more compact way, which decreases memory
+wastage.
+
+Let's take a closer look at the bottom of /sys/kernel/debug/zsmalloc/zramX/classes:::
+
+class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
+..
+  202  3264           0            0             0          0          0                4        0
+  254  4096           0            0             0          0          0                1        0
+..
+
+For exactly same reason - maximum 4 pages per zspage - the last non-huge
+size class is #202, which stores objects of size 3264 bytes. Any object
+larger than 3264 bytes, hence, is considered to be huge and lands in size
+class #254, which uses a whole physical page to store every object (objects
+in huge classes don't share physical pages).
+
+Another consequence of larger zspages chain sizes is that we move the huge
+size class watermark up and as a result have less huge classes and store
+large objects in a more compact way.
+
+For zspage chain size of 8, huge class watermark becomes 3632 bytes:::
+
+class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
+..
+  202  3264           0            0             0          0          0                4        0
+  211  3408           0            0             0          0          0                5        0
+  217  3504           0            0             0          0          0                6        0
+  222  3584           0            0             0          0          0                7        0
+  225  3632           0            0             0          0          0                8        0
+  254  4096           0            0             0          0          0                1        0
+..
+
+For zspage chain size of 16, huge class watermark becomes 3840 bytes:::
+
+class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
+..
+  202  3264           0            0             0          0          0                4        0
+  206  3328           0            0             0          0          0               13        0
+  207  3344           0            0             0          0          0                9        0
+  208  3360           0            0             0          0          0               14        0
+  211  3408           0            0             0          0          0                5        0
+  212  3424           0            0             0          0          0               16        0
+  214  3456           0            0             0          0          0               11        0
+  217  3504           0            0             0          0          0                6        0
+  219  3536           0            0             0          0          0               13        0
+  222  3584           0            0             0          0          0                7        0
+  223  3600           0            0             0          0          0               15        0
+  225  3632           0            0             0          0          0                8        0
+  228  3680           0            0             0          0          0                9        0
+  230  3712           0            0             0          0          0               10        0
+  232  3744           0            0             0          0          0               11        0
+  234  3776           0            0             0          0          0               12        0
+  235  3792           0            0             0          0          0               13        0
+  236  3808           0            0             0          0          0               14        0
+  238  3840           0            0             0          0          0               15        0
+  254  4096           0            0             0          0          0                1        0
+..
+
+Overall the combined zspage chain size effect on zsmalloc pool configuration:::
+
+pages per zspage   number of size classes (clusters)   huge size class watermark
+       4                        69                               3264
+       5                        86                               3408
+       6                        93                               3504
+       7                       112                               3584
+       8                       123                               3632
+       9                       140                               3680
+      10                       143                               3712
+      11                       159                               3744
+      12                       164                               3776
+      13                       180                               3792
+      14                       183                               3808
+      15                       188                               3840
+      16                       191                               3840
+
+A synthetic test:::
+
+CONFIG_ZSMALLOC_CHAIN_SIZE=4
+
+zsmalloc classes stats
+ class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
+ ..
+ Total                13           51        413836     412973     159955                         3
+
+zram mm_stat
+1691783168 628083717 655175680        0 655175680       60        0    34048    34049
+
+CONFIG_ZSMALLOC_CHAIN_SIZE=8
+
+zsmalloc classes stats
+ class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
+ ..
+ Total                18           87        414852     412978     156666                         0
+
+zram mm_stat
+1691803648 627793930 641703936        0 641703936       60        0    33591    33591
+
+Note that for the same amount of data zsmalloc uses less physical pages: down
+to 156666 from 159955, and maximum zsmalloc pool memory usage also went down
+from 655175680 to 641703936 bytes.
+
+The obvious downside of larger zspage chains is that some zspages require
+more physical pages, which can, in theory, increase system memory pressure
+in cases when zspool suffers from heavy internal fragmentation and zspool
+compaction cannot relocate objects and release some zspages. In such cases
+users are advised to lower zspage chain size limit (CONFIG_ZSMALLOC_CHAIN_SIZE
+option).
diff --git a/mm/Kconfig b/mm/Kconfig
index ff7b209dec05..995a7c4083c2 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -191,6 +191,25 @@ config ZSMALLOC_STAT
 	  information to userspace via debugfs.
 	  If unsure, say N.
 
+config ZSMALLOC_CHAIN_SIZE
+	int "Maximum number of physical pages per-zspage"
+	default 4
+	range 1 16
+	depends on ZSMALLOC
+	help
+	  Each zmalloc page (zspage) can consist of 1 or more physical
+	  (0 order) non contiguous pages. This option sets the upper
+	  (hard) limit on that number.
+
+	  The exact zspage chain size is calculated for each size class
+	  individually during pool initialisation. Changing this results
+	  in different size classes characteristics (pages per-zspage,
+	  objects per-zspage) which in turn results in different pool
+	  configurations: zsmalloc merges size classes that share key
+	  characteristics.
+
+	  Please read zsmalloc documentation for more details.
+
 menu "SLAB allocator options"
 
 choice
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 9a0f1963b803..34ba97d1175f 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -73,13 +73,6 @@
  */
 #define ZS_ALIGN		8
 
-/*
- * A single 'zspage' is composed of up to 2^N discontiguous 0-order (single)
- * pages. ZS_MAX_ZSPAGE_ORDER defines upper limit on N.
- */
-#define ZS_MAX_ZSPAGE_ORDER 2
-#define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER)
-
 #define ZS_HANDLE_SIZE (sizeof(unsigned long))
 
 /*
@@ -126,7 +119,7 @@
 #define MAX(a, b) ((a) >= (b) ? (a) : (b))
 /* ZS_MIN_ALLOC_SIZE must be multiple of ZS_ALIGN */
 #define ZS_MIN_ALLOC_SIZE \
-	MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS))
+	MAX(32, (CONFIG_ZSMALLOC_CHAIN_SIZE << PAGE_SHIFT >> OBJ_INDEX_BITS))
 /* each chunk includes extra space to keep handle */
 #define ZS_MAX_ALLOC_SIZE	PAGE_SIZE
 
@@ -1078,7 +1071,7 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
 					gfp_t gfp)
 {
 	int i;
-	struct page *pages[ZS_MAX_PAGES_PER_ZSPAGE];
+	struct page *pages[CONFIG_ZSMALLOC_CHAIN_SIZE];
 	struct zspage *zspage = cache_alloc_zspage(pool, gfp);
 
 	if (!zspage)
@@ -1910,7 +1903,7 @@ static void replace_sub_page(struct size_class *class, struct zspage *zspage,
 				struct page *newpage, struct page *oldpage)
 {
 	struct page *page;
-	struct page *pages[ZS_MAX_PAGES_PER_ZSPAGE] = {NULL, };
+	struct page *pages[CONFIG_ZSMALLOC_CHAIN_SIZE] = {NULL, };
 	int idx = 0;
 
 	page = get_first_page(zspage);
@@ -2293,7 +2286,7 @@ static int calculate_zspage_chain_size(int class_size)
 	if (is_power_of_2(class_size))
 		return chain_size;
 
-	for (i = 1; i <= ZS_MAX_PAGES_PER_ZSPAGE; i++) {
+	for (i = 1; i <= CONFIG_ZSMALLOC_CHAIN_SIZE; i++) {
 		int waste;
 
 		waste = (i * PAGE_SIZE) % class_size;
-- 
2.39.0.314.g84b9a713c41-goog

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ