lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date: Wed, 15 May 2024 17:17:27 +0800
From: Yafang Shao <laoar.shao@...il.com>
To: torvalds@...ux-foundation.org
Cc: brauner@...nel.org,
	jack@...e.cz,
	laoar.shao@...il.com,
	linux-fsdevel@...r.kernel.org,
	longman@...hat.com,
	viro@...iv.linux.org.uk,
	walters@...bum.org,
	wangkai86@...wei.com,
	linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	Matthew Wilcox <willy@...radead.org>
Subject: [PATCH] vfs: Delete the associated dentry when deleting a file

Our applications, built on Elasticsearch[0], frequently create and delete
files. These applications operate within containers, some with a memory
limit exceeding 100GB. Over prolonged periods, the accumulation of negative
dentries within these containers can amount to tens of gigabytes.

Upon container exit, directories are deleted. However, due to the numerous
associated dentries, this process can be time-consuming. Our users have
expressed frustration with this prolonged exit duration, which constitutes
our first issue.

Simultaneously, other processes may attempt to access the parent directory
of the Elasticsearch directories. Since the task responsible for deleting
the dentries holds the inode lock, processes attempting directory lookup
experience significant delays. This issue, our second problem, is easily
demonstrated:

  - Task 1 generates negative dentries:
  $ pwd
  ~/test
  $ mkdir es && cd es/ && ./create_and_delete_files.sh

  [ After generating tens of GB dentries ]

  $ cd ~/test && rm -rf es

  [ It will take a long duration to finish ]

  - Task 2 attempts to lookup the 'test/' directory
  $ pwd
  ~/test
  $ ls

  The 'ls' command in Task 2 experiences prolonged execution as Task 1
  is deleting the dentries.

We've devised a solution to address both issues by deleting associated
dentry when removing a file. Interestingly, we've noted that a similar
patch was proposed years ago[1], although it was rejected citing the
absence of tangible issues caused by negative dentries. Given our current
challenges, we're resubmitting the proposal. All relevant stakeholders from
previous discussions have been included for reference.

Some alternative solutions are also under discussion[2][3], such as
shrinking child dentries outside of the parent inode lock or even
asynchronously shrinking child dentries. However, given the straightforward
nature of the current solution, I believe this approach is still necessary.

[0]. https://github.com/elastic/elasticsearch
[1]. https://patchwork.kernel.org/project/linux-fsdevel/patch/1502099673-31620-1-git-send-email-wangkai86@huawei.com
[2]. https://lore.kernel.org/linux-fsdevel/20240511200240.6354-2-torvalds@linux-foundation.org/
[3]. https://lore.kernel.org/linux-fsdevel/CAHk-=wjEMf8Du4UFzxuToGDnF3yLaMcrYeyNAaH1NJWa6fwcNQ@mail.gmail.com/

Suggested-by: Linus Torvalds <torvalds@...ux-foundation.org>
Signed-off-by: Yafang Shao <laoar.shao@...il.com>
Cc: Al Viro <viro@...iv.linux.org.uk>
Cc: Christian Brauner <brauner@...nel.org>
Cc: Jan Kara <jack@...e.cz>
Cc: Waiman Long <longman@...hat.com>
Cc: Matthew Wilcox <willy@...radead.org>
Cc: Wangkai <wangkai86@...wei.com>
Cc: Colin Walters <walters@...bum.org>
---
 fs/dcache.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 71a8e943a0fa..2ffdb98e9166 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2360,19 +2360,17 @@ EXPORT_SYMBOL(d_hash_and_lookup);
  * - unhash this dentry and free it.
  *
  * Usually, we want to just turn this into
- * a negative dentry, but if anybody else is
- * currently using the dentry or the inode
- * we can't do that and we fall back on removing
- * it from the hash queues and waiting for
- * it to be deleted later when it has no users
+ * a negative dentry, but certain workloads can
+ * generate a large number of negative dentries.
+ * Therefore, it would be better to simply
+ * unhash it.
  */
  
 /**
  * d_delete - delete a dentry
  * @dentry: The dentry to delete
  *
- * Turn the dentry into a negative dentry if possible, otherwise
- * remove it from the hash queues so it can be deleted later
+ * Remove the dentry from the hash queues so it can be deleted later.
  */
  
 void d_delete(struct dentry * dentry)
@@ -2381,14 +2379,14 @@ void d_delete(struct dentry * dentry)
 
 	spin_lock(&inode->i_lock);
 	spin_lock(&dentry->d_lock);
+	__d_drop(dentry);
+
 	/*
 	 * Are we the only user?
 	 */
 	if (dentry->d_lockref.count == 1) {
-		dentry->d_flags &= ~DCACHE_CANT_MOUNT;
 		dentry_unlink_inode(dentry);
 	} else {
-		__d_drop(dentry);
 		spin_unlock(&dentry->d_lock);
 		spin_unlock(&inode->i_lock);
 	}
-- 
2.30.1 (Apple Git-130)


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ