Date:	Fri, 9 Jan 2009 12:40:54 +1100
From:	Dave Chinner <david@...morbit.com>
To:	Arjan van de Ven <arjan@...ux.intel.com>
Cc:	Dave Kleikamp <shaggy@...ux.vnet.ibm.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Grissiom <chaos.proton@...il.com>, linux-kernel@...r.kernel.org,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH] async: Don't call async_synchronize_full_special()
	while holding sb_lock

On Thu, Jan 08, 2009 at 02:51:52PM -0800, Arjan van de Ven wrote:
> Dave Chinner wrote:
>> On Thu, Jan 08, 2009 at 09:46:31AM -0600, Dave Kleikamp wrote:
>>> sync_filesystems() shouldn't be calling async_synchronize_full_special
>>> while holding a spinlock.  The second while loop in that function is the
>>> right place for this anyway.
>>
>> Out of curiosity, what on earth does
>> async_synchronize_full_special() do and why does it need to be in
>> sync_filesystems()?
>>
> now that we have asynchronous operations, this function makes sure that all
> the functions that we started async before this point complete, so that when
> you sync, you sync all in-progress work.
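
(To illustrate the pattern being described: queue work asynchronously, then
block before syncing until everything queued so far has completed. Below is a
standalone userspace sketch of that idea, not the in-kernel code; async_run()
and async_wait_all() are invented names, standing in for the kernel's
async_schedule() and async_synchronize_full().)

/*
 * Userspace sketch: work items run in detached worker threads, and a
 * "synchronize" call blocks until everything queued before it has
 * completed.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  idle = PTHREAD_COND_INITIALIZER;
static int pending;                     /* queued but not yet completed */

struct work {
        void (*fn)(void *);
        void *arg;
};

static void *worker(void *p)
{
        struct work *w = p;

        w->fn(w->arg);                  /* run the async work */

        pthread_mutex_lock(&lock);
        if (--pending == 0)
                pthread_cond_broadcast(&idle);
        pthread_mutex_unlock(&lock);
        return NULL;
}

/* fire and forget: queue the work and return immediately */
static void async_run(struct work *w)
{
        pthread_t t;

        pthread_mutex_lock(&lock);
        pending++;
        pthread_mutex_unlock(&lock);
        pthread_create(&t, NULL, worker, w);
        pthread_detach(t);
}

/* barrier: wait for all work queued before this call to finish */
static void async_wait_all(void)
{
        pthread_mutex_lock(&lock);
        while (pending)
                pthread_cond_wait(&idle, &lock);
        pthread_mutex_unlock(&lock);
}

static void flush_one(void *arg)
{
        printf("flushed %s\n", (char *)arg);
}

int main(void)
{
        struct work w = { flush_one, "inode 42" };

        async_run(&w);
        /* a sync path must not proceed past this point until the
         * work queued above has completed */
        async_wait_all();
        return 0;
}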

So what you are saying is that we now have async operations that
need magical, undocumented sync primitives apparently randomly
strewn throughout the code base?

Could you name the interface somewhat better? It's an async
operation synchronisation primitive, but the current name is
made up entirely of words that are normally suffixes in a name
of the form "namespace_operation_suffix". The whole API looks
like "suffix_suffix_suffix_suffix".....

<rummage around the code>

Oh, fuck. You've made generic_delete_inode() an asynchronous operation.
This is bad. Really Bad.

XFS does transactions in ->clear_inode, so what you've done is
removed one of the synchronous throttles on inode removal that
prevents the build-up of massive numbers of deleted XFS inodes in
memory that have run the first unlink phase but not the second.

I know for a fact (because this has been done within XFS before)
that this causes serious writeback, sync and unmount latency
problems with XFS as well as introducing a whole new class of OOM
problems when under heavy load.  The depth of the async queues
requires serious throttling to prevent these issues from occurring.
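
(For illustration only, and not code from any tree: one way to bound the queue
depth is to cap the number of in-flight async deletes with a counting semaphore
so the producer blocks once the cap is hit, rather than running arbitrarily far
ahead of the disk. The watermark below is arbitrary.)

#include <semaphore.h>
#include <stdio.h>

#define MAX_IN_FLIGHT   128             /* arbitrary queue depth limit */

static sem_t slots;

/* called by the unlink path: blocks when MAX_IN_FLIGHT deletes are queued */
static void queue_async_delete(int ino)
{
        sem_wait(&slots);
        printf("queued async delete of inode %d\n", ino);
}

/* called when the async delete work completes: frees up a slot */
static void async_delete_done(void)
{
        sem_post(&slots);
}

int main(void)
{
        sem_init(&slots, 0, MAX_IN_FLIGHT);
        for (int ino = 0; ino < 4; ino++) {
                queue_async_delete(ino);
                async_delete_done();    /* pretend the work ran instantly */
        }
        return 0;
}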

How bad? For example, rm -rf of a large number of files can cause
unmount or sync to take hours to run the async queues under heavy
load, because the inodes existed only in memory and are randomly
spread around the filesystem (this is a real-world example). In
this case, under heavy memory load, XFS required an inode buffer
RMW to do the modification, and given that it was on software
RAID5 this also required a RAID5 RMW cycle. The result?  The async
inode delete queue drained at *50* deleted inodes per second.  The
synchronous unlink rate was around 5000 inodes per second.....
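
(To put numbers on that: taking the million-file removal mentioned below as
the workload, at 50 inodes/s the async queue takes 20,000 seconds, about five
and a half hours, to drain a backlog of a million deleted inodes, versus
roughly 200 seconds at the synchronous rate of 5000/s.)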

So, given the potential impact of this change, what testing have
you done in terms of:

	- performance impact
	- regression testing
	- sync() safety
	- removing a million files and queuing all of the
	  deletes in the async queues....

On all the different filesystems under heavy I/O workloads?

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
