[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20230830125654.21257-1-zhangpeng.00@bytedance.com>
Date: Wed, 30 Aug 2023 20:56:48 +0800
From: Peng Zhang <zhangpeng.00@...edance.com>
To: Liam.Howlett@...cle.com, corbet@....net, akpm@...ux-foundation.org,
willy@...radead.org, brauner@...nel.org, surenb@...gle.com,
michael.christie@...cle.com, peterz@...radead.org,
mathieu.desnoyers@...icios.com, npiggin@...il.com, avagin@...il.com
Cc: linux-mm@...ck.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
Peng Zhang <zhangpeng.00@...edance.com>
Subject: [PATCH v2 0/6] Introduce __mt_dup() to improve the performance of fork()
In the process of duplicating mmap in fork(), VMAs will be inserted into the new
maple tree one by one. When inserting into the maple tree, the maple tree will
be rebalanced multiple times. The rebalancing of maple tree is not as fast as
the rebalancing of red-black tree and will be slower. Therefore, __mt_dup() is
introduced to directly duplicate the structure of the old maple tree, and then
modify each element of the new maple tree. This avoids rebalancing and some extra
copying, so is faster than the original method.
More information can refer to [1].
There is a "spawn" in byte-unixbench[2], which can be used to test the performance
of fork(). I modified it slightly to make it work with different number of VMAs.
Below are the test numbers. There are 21 VMAs by default. The first row indicates
the number of added VMAs. The following two lines are the number of fork() calls
every 10 seconds. These numbers are different from the test results in v1 because
this time the benchmark is bound to a CPU. This way the numbers are more stable.
Increment of VMAs: 0 100 200 400 800 1600 3200 6400
6.5.0-next-20230829: 111878 75531 53683 35282 20741 11317 6110 3158
Apply this patchset: 114531 85420 64541 44592 28660 16371 9038 4831
+2.37% +13.09% +20.23% +26.39% +38.18% +44.66% +47.92% +52.98%
Todo:
- Update the documentation.
Changes since v1:
- Reimplement __mt_dup() and mtree_dup(). Loops are implemented without using
goto instructions.
- The new tree also needs to be locked to avoid some lockdep warnings.
- Drop and add some helpers.
- Add test for duplicating full tree.
- Drop mas_replace_entry(), it doesn't seem to have a big impact on the
performance of fork().
[1] https://lore.kernel.org/lkml/463899aa-6cbd-f08e-0aca-077b0e4e4475@bytedance.com/
[2] https://github.com/kdlucas/byte-unixbench/tree/master
v1: https://lore.kernel.org/lkml/20230726080916.17454-1-zhangpeng.00@bytedance.com/
Peng Zhang (6):
maple_tree: Add two helpers
maple_tree: Introduce interfaces __mt_dup() and mtree_dup()
maple_tree: Add test for mtree_dup()
maple_tree: Skip other tests when BENCH is enabled
maple_tree: Update check_forking() and bench_forking()
fork: Use __mt_dup() to duplicate maple tree in dup_mmap()
include/linux/maple_tree.h | 3 +
kernel/fork.c | 34 ++-
lib/maple_tree.c | 277 ++++++++++++++++++++++++-
lib/test_maple_tree.c | 69 +++---
mm/mmap.c | 14 +-
tools/testing/radix-tree/maple.c | 346 +++++++++++++++++++++++++++++++
6 files changed, 697 insertions(+), 46 deletions(-)
--
2.20.1
Powered by blists - more mailing lists