Message-ID: <Z4WiOUZizUok2VAs@dread.disaster.area>
Date: Tue, 14 Jan 2025 10:31:05 +1100
From: Dave Chinner <david@...morbit.com>
To: "Artem S. Tashkinov" <aros@....com>
Cc: linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: Spooling large metadata updates / Proposal for a new API/feature
 in the Linux Kernel (VFS/Filesystems):

On Sat, Jan 11, 2025 at 09:17:49AM +0000, Artem S. Tashkinov wrote:
> Hello,
> 
> I had this idea on 2021-11-07, then I thought it was wrong/stupid, now
> I've asked AI and it said it was actually not bad,

Which shows just how poor "AI" is at analysing complex systems.

> so I'm bringing it
> forward now:
> 
> Imagine the following scenarios:
> 
>  * You need to delete tens of thousands of files.
>  * You need to change the permissions, ownership, or security context
> (chmod, chown, chcon) for tens of thousands of files.
>  * You need to update timestamps for tens of thousands of files.
> 
> All these operations are currently relatively slow because they are
> executed sequentially, generating significant I/O overhead.

Yes, they are executed sequentially by the -application- not the
filesystem. i.e. the performance limiter is application concurrency,
not the filesystem.

> What if these operations could be spooled and performed as a single
> transaction?

You get nothing extra - they'd still be executed "sequentially" within
the transaction. Operational concurrency is required to speed up
these operations, and we have io_uring and/or threads for that...
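To illustrate what application-level concurrency with threads might look like, here is a minimal sketch that spreads chmod calls over a worker pool. The helper name chmod_many and the default of 8 workers are assumptions made for this example, not anything from the original discussion, and the right worker count depends on the storage and filesystem in use.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def chmod_many(paths, mode, workers=8):
    """Apply os.chmod to every path using a pool of worker threads.

    Hypothetical helper for illustration only.
    Returns a list of (path, exception) pairs for the calls that failed.
    """
    def apply(path):
        try:
            os.chmod(path, mode)
            return None
        except OSError as exc:
            return (path, exc)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands the paths out to the workers; results come back
        # in input order, with None marking a successful call.
        results = pool.map(apply, paths)
        return [r for r in results if r is not None]
```

Each worker issues its own syscalls, so several modifications can be in flight at once instead of one at a time.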

> By bundling metadata updates into one atomic operation,
> such tasks could become near-instant or significantly faster.

No. The filesystem still has to do exactly the same amount of work
to modify thousands of files. Transactional atomicity has nothing to
do with the cost of modification of an otherwise unrelated set of
filesystem objects...

The overhead of 'rm -rf' could be optimised, but the filesystem
would still have to do the directory traversal first to find all the
inodes that have to be unlinked/rmdir'd and process them before the
parent directory is freed. i.e. we can make it *look fast*, but it
still has largely the same cost in terms of IO, CPU and memory
overhead.

And, of course, operations like sync() would have to block on an
offloaded 'rm -rf' operation. That is likely to cause people
more unexpected issues than userspace implementing a concurrent 'rm
-rf' based on sub-dir concurrency....
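A userspace 'rm -rf' with sub-directory concurrency could be sketched roughly as follows. This is an illustration under stated assumptions, not an existing tool: concurrent_rmtree is a hypothetical name, the fan-out happens only across the top-level sub-directories, and each subtree is removed with the standard shutil.rmtree.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def concurrent_rmtree(root, workers=8):
    """Remove root, deleting each top-level subdirectory in its own task.

    Hypothetical helper for illustration only.
    """
    subdirs, others = [], []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            else:
                others.append(entry.path)

    # Each subdirectory tree is an independent unit of work.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all tasks and re-raises the first worker error.
        list(pool.map(shutil.rmtree, subdirs))

    # Remove files and symlinks directly under root, then root itself.
    for path in others:
        os.unlink(path)
    os.rmdir(root)
```

The call returns only once everything is gone, so nothing is left running in the background for a later sync() to wait on.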

> This would
> also reduce the number of writes, leading to less wear and tear on
> storage devices.

Not for a typical journalling filesystem. They aggregate and
journal delta changes efficiently, then do batched writeback of the
modified metadata, eliding every write that can be avoided.

> Does this idea make sense? If it already exists, or if there’s a reason
> it wouldn’t work, please let me know.

Filesystems can already do operations concurrently. As long as
concurrency for directory traversal based operations is done on a
per-directory level, they scale out to the inherent concurrency the
filesystem can provide.

In the case of XFS, we can scale out to around 600,000-700,000 file
creates/s, about 1 million chmod/chown/chcon/futimes modifications
per second and about 500,000 unlinks per second. With some VFS
scalability mods, we can get up around the 1 million file creates/s
mark....

IOWs, if applications are having problems with sequential filesystem
operations being slow, the problem is generally application level
algorithms and concurrency and not the filesystem implementations or
syscall interfaces.

-Dave.
-- 
Dave Chinner
david@...morbit.com
