Date:	Fri, 21 Dec 2012 18:03:35 -0500
From:	Theodore Ts'o <tytso@....edu>
To:	Jan Kara <jack@...e.cz>
Cc:	Dmitry Monakhov <dmonakhov@...nvz.org>, linux-ext4@...r.kernel.org
Subject: Re: Uninitialized extent races

On Fri, Dec 21, 2012 at 11:49:47PM +0100, Jan Kara wrote:
>   It's actually simpler than that. We wait for any pending DIO using
> inode_dio_wait() and i_mutex protects from new writes to be submitted. So
> that takes care of one possibility. truncate_inode_pages() waits for
> PageWriteback bit so that handles waiting for IO itself. 

Hmm, yes, I should have known/remembered that.  I've seen cases where
very rarely, it's possible for an unlink() or truncate() call to stall
for multiple minutes(!).  This can happen if you have writeback
happening in a container which has a very small (low priority)
constraint on its block I/O bandwidth.  If you try to delete an inode
which has writeback work pending, it's possible for the writeback to
take a looong time, which in turn causes the unlink to take a long
time.

It becomes worse if the process doing the unlink is a high priority
process (say, the cluster management daemon that is cleaning up after
said low-priority job has completed), but the writeback is happening
in the context of a low priority cgroup.  You can end up with a nasty
priority inversion.

And there's not a lot we can do at the kernel level.  We could
dispatch the truncate to a workqueue and just make sure the file name
has disappeared from the file system name space before the unlink()
returns to userspace, but then the disk space gets released after the unlink()
call returns, which can cause other problems.
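
The trade-off in that last paragraph can be sketched from userspace. This is only an analogy, not the kernel change discussed above: the helper name deferred_unlink and the trash_dir argument are hypothetical, and it assumes both paths live on the same file system so that the rename is a plain directory operation. The name vanishes before the call returns, while the unlink that may block behind writeback runs on a worker thread, so the disk space is freed at some later point.

```python
import os
import tempfile
import threading


def deferred_unlink(path, trash_dir):
    """Hide `path` right away and unlink it from a background thread.

    Hypothetical helper: `trash_dir` is assumed to be a directory on the
    same file system as `path`, otherwise the rename would fail.
    """
    # Reserve a unique name inside the trash directory.
    fd, hidden = tempfile.mkstemp(dir=trash_dir)
    os.close(fd)

    # Move the file over the empty placeholder. After this call the
    # original name no longer exists in the directory.
    os.replace(path, hidden)

    # The potentially slow part (waiting for the inode's pending I/O)
    # now happens off the caller's thread. The space is NOT yet free
    # when this function returns.
    worker = threading.Thread(target=os.unlink, args=(hidden,))
    worker.start()
    return worker
```

A caller that needs the space back (for example, before starting the next job on a nearly full disk) still has to join() the returned thread, which is exactly the "other problems" case: the wait has moved, it has not gone away.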

						- Ted