linux-kernel - Re: Terrible performance of sequential O_DIRECT 4k writes in SAN environment. ~3 times slower then Solars 10 with the same HBA/Storage.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20140108011713.GA5212@quack.suse.cz>
Date:	Wed, 8 Jan 2014 02:17:13 +0100
From:	Jan Kara <jack@...e.cz>
To:	Christoph Hellwig <hch@...radead.org>
Cc:	Jan Kara <jack@...e.cz>, Sergey Meirovich <rathamahata@...il.com>,
	linux-scsi <linux-scsi@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Gluk <git.user@...il.com>
Subject: Re: Terrible performance of sequential O_DIRECT 4k writes in SAN
 environment. ~3 times slower then Solars 10 with the same HBA/Storage.

On Tue 07-01-14 07:58:30, Christoph Hellwig wrote:
> On Mon, Jan 06, 2014 at 09:10:32PM +0100, Jan Kara wrote:
> >   This is likely a problem of Linux direct IO implementation. The thing is
> > that in Linux when you are doing appending direct IO (i.e., direct IO which
> > changes file size), the IO is performed synchronously so that we have our
> > life simpler with inode size update etc. (and frankly our current locking
> > rules make inode size update on IO completion almost impossible). Since
> > appending direct IO isn't very common, we seem to get away with this
> > simplification just fine...
> 
> Shouldn't be too much of a problem at least for XFS and maybe even ext4
> with the workqueue based I/O end handler.  For XFS we protect size
> updates by the ilock which we already taken in that handler, not sure
> what ext4 would do there.
  Well, I was specifically worried about i_mutex locking. In particular:
Before we report appending IO completion we need to update i_size.
To update i_size we need to grab i_mutex.

Now this is unpleasant because inode_dio_wait() happens under i_mutex so
the above would create lock inversion. And we cannot really do
inode_dio_done() before grabbing i_mutex as that would open interesting
races between truncate decreasing i_size and DIO increasing it.

								Honza
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/