Date: Mon, 5 Feb 2024 14:22:29 +0800
From: JonasZhou <jonaszhou-oc@...oxin.com>
To: <willy@...radead.org>
CC: <CobeChen@...oxin.com>, <JonasZhou-oc@...oxin.com>,
	<JonasZhou@...oxin.com>, <LouisQi@...oxin.com>, <brauner@...nel.org>,
	<jack@...e.cz>, <linux-fsdevel@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <viro@...iv.linux.org.uk>
Subject: Re: [PATCH] fs/address_space: move i_mmap_rwsem to mitigate a false sharing with i_mmap.

> On Fri, Feb 02, 2024 at 03:03:51PM +0000, Matthew Wilcox wrote:
> > On Fri, Feb 02, 2024 at 05:34:07PM +0800, JonasZhou-oc wrote:
> > > In the struct address_space, there is a 32-byte gap between i_mmap
> > > and i_mmap_rwsem. Due to the alignment of struct address_space
> > > variables to 8 bytes, in certain situations, i_mmap and
> > > i_mmap_rwsem may end up in the same CACHE line.
> > > 
> > > While running Unixbench/execl, we observe high false sharing issues
> > > when accessing i_mmap against i_mmap_rwsem. We move i_mmap_rwsem
> > > after i_private_list, ensuring a 64-byte gap between i_mmap and
> > > i_mmap_rwsem.
> > 
> > I'm confused.  i_mmap_rwsem protects i_mmap.  Usually you want the lock
> > and the thing it's protecting in the same cacheline.  Why is that not
> > the case here?
>
> We actually had this seven months ago:
>
> https://lore.kernel.org/all/20230628105624.150352-1-lipeng.zhu@intel.com/
>
> Unfortunately, no argumentation was forthcoming about *why* this was
> the right approach.  All we got was a different patch and an assertion
> that it still improved performance.
>
> We need to understand what's going on!  Please don't do the same thing
> as the other submitter and just assert that it does.

When running UnixBench/execl, each execl process repeatedly performs
i_mmap_lock_write -> vma_interval_tree_remove/insert ->
i_mmap_unlock_write. As shown below, when i_mmap and i_mmap_rwsem
are in the same cache line, more HITM events occur.

Func0: i_mmap_lock_write
Func1: vma_interval_tree_remove/insert
Func2: i_mmap_unlock_write

In the same cache line:
Process A | Process B | Process C | Process D | Cache line state
----------+-----------+-----------+-----------+-----------------
Func0     |           |           |           | I->M
          | Func0     |           |           | HITM M->S
Func1     |           |           |           | may change to M
          |           | Func0     |           | HITM M->S
Func2     |           |           |           | S->M
          |           |           | Func0     | HITM M->S

In different cache lines:
Process A | Process B | Process C | Process D | Cache line state
----------+-----------+-----------+-----------+-----------------
Func0     |           |           |           | I->M
          | Func0     |           |           | HITM M->S
Func1     |           |           |           | 
          |           | Func0     |           | S->S
Func2     |           |           |           | S->M
          |           |           | Func0     | HITM M->S

The same issue occurs in UnixBench/shell, because the shell
launches many commands, each of which loads executable files and
dynamic libraries into memory, executes, and exits.

Yes, the commit from that earlier thread has been merged into the
Linux kernel, but an issue remains. With i_mmap_rwsem moved below
flags, there is a 32-byte gap between i_mmap_rwsem and i_mmap.
However, struct address_space is only aligned to sizeof(long), which
is 8 on the x86-64 architecture. As a result, i_mmap_rwsem and i_mmap
may still be placed in the same cache line, causing a false sharing
problem. We observed this issue with the perf c2c tool.
