Message-ID: <20140108011713.GA5212@quack.suse.cz>
Date: Wed, 8 Jan 2014 02:17:13 +0100
From: Jan Kara <jack@...e.cz>
To: Christoph Hellwig <hch@...radead.org>
Cc: Jan Kara <jack@...e.cz>, Sergey Meirovich <rathamahata@...il.com>,
linux-scsi <linux-scsi@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Gluk <git.user@...il.com>
Subject: Re: Terrible performance of sequential O_DIRECT 4k writes in SAN
environment. ~3 times slower than Solaris 10 with the same HBA/Storage.
On Tue 07-01-14 07:58:30, Christoph Hellwig wrote:
> On Mon, Jan 06, 2014 at 09:10:32PM +0100, Jan Kara wrote:
> > This is likely a problem of Linux direct IO implementation. The thing is
> > that in Linux when you are doing appending direct IO (i.e., direct IO which
> > changes file size), the IO is performed synchronously so that we have our
> > life simpler with inode size update etc. (and frankly our current locking
> > rules make inode size update on IO completion almost impossible). Since
> > appending direct IO isn't very common, we seem to get away with this
> > simplification just fine...
>
> Shouldn't be too much of a problem at least for XFS and maybe even ext4
> with the workqueue based I/O end handler. For XFS we protect size
> updates by the ilock which we have already taken in that handler, not sure
> what ext4 would do there.
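The rule in the quoted text (a direct write that changes the file size is performed synchronously) explains why the benchmark in the subject suffers: every sequential 4k write to a growing file extends it. A minimal userspace sketch of that classification, assuming only the rule as stated in the thread. It is not kernel code; `dio_is_extending` and `count_sync_writes` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not kernel code: a direct write of `count` bytes
 * at offset `pos` is "appending" when it ends past the current i_size.
 * Per the thread, Linux completes such writes synchronously. */
static bool dio_is_extending(uint64_t pos, uint64_t count, uint64_t i_size)
{
    assert(count > 0);
    return pos + count > i_size;
}

/* Simulate a sequential writer issuing `nr_writes` writes of `bs` bytes
 * from offset 0 into a file whose size starts at `initial_size`.
 * Returns how many writes would take the synchronous (extending) path. */
static size_t count_sync_writes(size_t nr_writes, uint64_t bs,
                                uint64_t initial_size)
{
    uint64_t i_size = initial_size;
    size_t sync = 0;

    for (size_t i = 0; i < nr_writes; i++) {
        uint64_t pos = (uint64_t)i * bs;

        if (dio_is_extending(pos, bs, i_size)) {
            sync++;
            /* i_size grows only once this IO has completed. */
            i_size = pos + bs;
        }
    }
    return sync;
}
```

Under this model, `count_sync_writes(1000, 4096, 0)` classifies all 1000 writes as extending, while the same writes into a file already sized to at least 1000 * 4096 bytes (for example one extended beforehand with ftruncate(2)) would not take the extending path at all.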
Well, I was specifically worried about i_mutex locking. In particular, before
we report completion of an appending IO we need to update i_size, and to
update i_size we need to grab i_mutex.
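Now this is unpleasant because inode_dio_wait() happens under i_mutex, so
the above would create a lock inversion: e.g. truncate holds i_mutex while
it waits in inode_dio_wait() for the DIO to finish, while the DIO completion
waits for i_mutex before it can finish. And we cannot really call
inode_dio_done() before grabbing i_mutex, as that would open interesting
races between truncate decreasing i_size and DIO increasing it.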
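The lock inversion described above can be reproduced in a small userspace model with POSIX threads. This is a sketch under stated assumptions, not kernel code: `struct model_inode`, `dio_complete` and `truncate_would_deadlock` are hypothetical, a pthread mutex stands in for i_mutex, and a counter plus condition variable stands in for the inode_dio_wait()/inode_dio_done() pair. The truncate side holds i_mutex while waiting for the in-flight DIO; the completion side needs i_mutex to update i_size before it can drop the count.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical userspace model of the inode state in the thread. */
struct model_inode {
    pthread_mutex_t i_mutex;   /* stands in for the VFS i_mutex */
    pthread_mutex_t dio_lock;  /* protects dio_count */
    pthread_cond_t  dio_cv;    /* signalled when dio_count drops */
    int             dio_count; /* in-flight direct IOs */
    uint64_t        i_size;
    uint64_t        new_size;  /* size the appending DIO wants to set */
};

static void model_inode_init(struct model_inode *ino, uint64_t size,
                             uint64_t new_size)
{
    pthread_mutex_init(&ino->i_mutex, NULL);
    pthread_mutex_init(&ino->dio_lock, NULL);
    pthread_cond_init(&ino->dio_cv, NULL);
    ino->dio_count = 1;        /* one appending DIO in flight */
    ino->i_size = size;
    ino->new_size = new_size;
}

/* Appending DIO completion in the required order: update i_size under
 * i_mutex first, then drop the in-flight count (like inode_dio_done()). */
static void *dio_complete(void *arg)
{
    struct model_inode *ino = arg;

    pthread_mutex_lock(&ino->i_mutex);   /* blocks while truncate holds it */
    if (ino->new_size > ino->i_size)
        ino->i_size = ino->new_size;
    pthread_mutex_unlock(&ino->i_mutex);

    pthread_mutex_lock(&ino->dio_lock);
    ino->dio_count--;
    pthread_cond_broadcast(&ino->dio_cv);
    pthread_mutex_unlock(&ino->dio_lock);
    return NULL;
}

/* Truncate path: take i_mutex, then wait for in-flight DIO like
 * inode_dio_wait(). In the scenario described, that wait never ends;
 * here it gives up after timeout_ms and reports the deadlock. */
static bool truncate_would_deadlock(struct model_inode *ino, long timeout_ms)
{
    pthread_t worker;
    struct timespec deadline;
    bool timed_out = false;

    pthread_mutex_lock(&ino->i_mutex);
    if (pthread_create(&worker, NULL, dio_complete, ino) != 0)
        abort();

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&ino->dio_lock);
    while (ino->dio_count > 0 && !timed_out)
        timed_out = pthread_cond_timedwait(&ino->dio_cv, &ino->dio_lock,
                                           &deadline) == ETIMEDOUT;
    pthread_mutex_unlock(&ino->dio_lock);

    /* Break the cycle so the model terminates. */
    pthread_mutex_unlock(&ino->i_mutex);
    pthread_join(worker, NULL);
    return timed_out;
}
```

Because i_mutex is taken before the completion thread starts, `truncate_would_deadlock()` always times out on a freshly initialised model; once i_mutex is released the completion runs, so afterwards i_size equals new_size and dio_count is 0. The timeout exists only so the sketch terminates; the second race mentioned above (calling inode_dio_done() before taking i_mutex) is not modelled.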
Honza
--
Jan Kara <jack@...e.cz>
SUSE Labs, CR