Date:	Wed, 14 Oct 2009 11:48:37 -0700
From:	Mingming <cmm@...ibm.com>
To:	Jiaying Zhang <jiayingz@...gle.com>
Cc:	ext4 development <linux-ext4@...r.kernel.org>,
	Andrew Morton <akpm@...gle.com>,
	Michael Rubin <mrubin@...gle.com>,
	Manuel Benitez <rickyb@...gle.com>
Subject: Re: ext4 DIO read performance issue on SSD

On Fri, 2009-10-09 at 16:34 -0700, Jiaying Zhang wrote:
> Hello,
> 
> We have recently been evaluating ext4 performance on a high-speed SSD.
> One problem we found is that ext4 performance doesn't scale well with
> multiple threads or multiple AIOs reading a single file with O_DIRECT.
> E.g., with a 4k block size, multi-threaded DIO AIO random reads on ext4
> can lose up to 50% throughput compared to the results we get via raw IO.
> 
> After some initial analysis, we think the ext4 performance problem is caused
> by the use of the i_mutex lock during DIO read. I.e., during DIO read, we grab
> the i_mutex lock in __blockdev_direct_IO because ext4 uses the default
> DIO_LOCKING from the generic fs code. I did a quick test by calling
> blockdev_direct_IO_no_locking() in ext4_direct_IO() and saw ext4 DIO read
> reach 99% of raw IO performance.
> 

This is very interesting... and an impressive number.

I tried changing ext4 to call blockdev_direct_IO_no_locking() directly,
but then realized that we can't do this all the time, because ext4 also
supports ext3 non-extent-based files, and uninitialized extents are not
supported on ext3-format files.
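
A minimal userspace sketch of that per-file decision, under stated
assumptions: model_inode, dio_op, dio_lock and pick_dio_lock() are
hypothetical names for illustration only, not kernel definitions, and
keeping the default locked path for writes is an assumption of the sketch
rather than something settled in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these are kernel definitions. */
enum dio_op   { DIO_OP_READ, DIO_OP_WRITE };
enum dio_lock { DIO_LOCK_I_MUTEX, DIO_LOCK_NONE };

struct model_inode {
    bool extent_based;  /* models the inode having EXT4_EXTENTS_FL set */
};

/*
 * Skip i_mutex only for reads of extent-based files. ext3-format
 * (indirect-block) files cannot carry uninitialized extents, so they
 * keep the default locked path. Writes also keep it (assumption).
 */
static enum dio_lock pick_dio_lock(const struct model_inode *inode,
                                   enum dio_op op)
{
    assert(inode != NULL);
    if (op == DIO_OP_READ && inode->extent_based)
        return DIO_LOCK_NONE;
    return DIO_LOCK_I_MUTEX;
}
```

In a real patch, the equivalent check would presumably sit in
ext4_direct_IO() and select between the locking and no-locking helpers.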

> As we understand it, the reason we take the i_mutex lock during DIO
> read is to prevent it from accessing stale data that may be exposed by a
> simultaneous write. We saw that Mingming Cao has implemented a patch set
> with which, when a get_block request comes from a direct write, ext4 only
> allocates or splits an uninitialized extent. That uninitialized extent
> is marked as initialized in the end_io callback.

Though I need to clarify that, with all the patches in mainline, we only
treat newly allocated blocks from direct IO writes to holes this way, not
writes to the end of the file. I actually proposed treating writes to the
end of the file as uninitialized extents too, but there were some concerns
that this gets tricky with updating the inode size when the direct IO is
async. So it hasn't been done yet.

>  We are wondering
> whether we can extend this idea to buffered writes as well. I.e., we always
> allocate an uninitialized extent first during any write and convert it
> to initialized at the time of the end_io callback. This would eliminate the
> need to hold the i_mutex lock during direct read, because a DIO read should
> never get a block marked initialized before the block has been written with
> new data.
> 

Oh, I don't think that is needed. For buffered IO, the data is copied
into the page cache, and a direct IO read first flushes what's in the page
cache to disk, then reads from disk. So if there is a concurrent buffered
write and direct read, removing the i_mutex lock from the direct IO path
should still guarantee the right order, without having to handle buffered
allocation with uninitialized extents and end_io.
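
The ordering argument above can be illustrated with a toy userspace model.
This is only a sketch under stated assumptions: model_file,
buffered_write(), flush_cache() and direct_read() are hypothetical names,
and a single fixed-size block stands in for the page cache and the disk.
It is not how the kernel implements either path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_BLOCK_SIZE 8

/* Hypothetical one-block file: a page-cache copy plus the on-disk copy. */
struct model_file {
    char cache[MODEL_BLOCK_SIZE];
    char disk[MODEL_BLOCK_SIZE];
    bool dirty;  /* cache holds data not yet written to disk */
};

/* Buffered write: the data lands in the page cache only. */
static void buffered_write(struct model_file *f, const char *data)
{
    assert(f != NULL && data != NULL);
    memcpy(f->cache, data, MODEL_BLOCK_SIZE);
    f->dirty = true;
}

/* Write back dirty cached data, as the flush before a direct read would. */
static void flush_cache(struct model_file *f)
{
    if (f->dirty) {
        memcpy(f->disk, f->cache, MODEL_BLOCK_SIZE);
        f->dirty = false;
    }
}

/* Direct read: flush first, then read the on-disk copy, bypassing the cache. */
static void direct_read(struct model_file *f, char *out)
{
    flush_cache(f);
    memcpy(out, f->disk, MODEL_BLOCK_SIZE);
}
```

Because the flush happens before the device read, a direct read issued
after a buffered write returns the buffered data in this model, with no
extent-state tracking involved.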

The i_mutex lock, from my understanding, is there to protect a direct IO
write to a hole against a concurrent direct IO read. We should be able to
remove this lock for extent-based ext4 files.
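
A toy model of why the uninitialized-extent scheme makes that safe. This is
a sketch under stated assumptions: model_extent and the dio_* helpers are
hypothetical names, one block stands in for an extent, and the device write
and the end_io conversion are collapsed into a single step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_BLOCK_SIZE 8

/* Hypothetical single-block extent; not an ext4 on-disk structure. */
struct model_extent {
    bool allocated;
    bool initialized;             /* false models an uninitialized extent */
    char disk[MODEL_BLOCK_SIZE];  /* may hold stale data from a previous owner */
};

/* get_block for a direct write into a hole: allocate, leave uninitialized. */
static void dio_write_begin(struct model_extent *e)
{
    assert(e != NULL);
    e->allocated = true;
    e->initialized = false;
}

/* Device write completes, then end_io converts the extent to initialized. */
static void dio_write_end_io(struct model_extent *e, const char *data)
{
    memcpy(e->disk, data, MODEL_BLOCK_SIZE);
    e->initialized = true;
}

/* Lockless direct read: holes and uninitialized extents read back as zeros. */
static void dio_read(const struct model_extent *e, char *out)
{
    if (e->allocated && e->initialized)
        memcpy(out, e->disk, MODEL_BLOCK_SIZE);
    else
        memset(out, 0, MODEL_BLOCK_SIZE);
}
```

A read that races with the write sees either zeros (before end_io) or the
new data (after end_io), never the stale contents of the block, which is
the exposure i_mutex guards against today.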

> We haven't implemented anything yet because we want to ask here first to
> see whether this proposal makes sense to you.
> 

It does make sense to me.

Mingming
> Regards,
> 
> Jiaying
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

