Message-ID: <CAPcyv4jnz69a3S+XZgLaLojHZmpfoVXGDkJkt_1Q=8kk0gik9w@mail.gmail.com>
Date: Mon, 2 May 2016 09:49:59 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Boaz Harrosh <boaz@...xistor.com>
Cc: Vishal Verma <vishal.l.verma@...el.com>,
"linux-nvdimm@...ts.01.org" <linux-nvdimm@...ts.01.org>,
linux-block@...r.kernel.org, Jan Kara <jack@...e.cz>,
Matthew Wilcox <matthew@....cx>,
Dave Chinner <david@...morbit.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
XFS Developers <xfs@....sgi.com>, Jens Axboe <axboe@...com>,
Linux MM <linux-mm@...ck.org>,
Al Viro <viro@...iv.linux.org.uk>,
Christoph Hellwig <hch@...radead.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-ext4 <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH v4 5/7] fs: prioritize and separate direct_io from dax_io
On Mon, May 2, 2016 at 9:22 AM, Boaz Harrosh <boaz@...xistor.com> wrote:
> On 05/02/2016 07:01 PM, Dan Williams wrote:
>> On Mon, May 2, 2016 at 8:41 AM, Boaz Harrosh <boaz@...xistor.com> wrote:
>>> On 04/29/2016 12:16 AM, Vishal Verma wrote:
>>>> All IO in a dax filesystem used to go through dax_do_io, which cannot
>>>> handle media errors, and thus cannot provide a recovery path that can
>>>> send a write through the driver to clear errors.
>>>>
>>>> Add a new iocb flag for DAX, and set it only for DAX mounts. In the IO
>>>> path for DAX filesystems, use the same direct_IO path for both DAX and
>>>> direct_io iocbs, but use the flags to identify when we are in O_DIRECT
>>>> mode vs non O_DIRECT with DAX, and for O_DIRECT, use the conventional
>>>> direct_IO path instead of DAX.
>>>>
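(For reference, a minimal sketch of the kind of dispatch the changelog
describes; the foo_* names are made up, and the calling conventions
assume the v4.6-era dax_do_io() / __blockdev_direct_IO() signatures,
so treat this as an illustration rather than the patch itself:

	#include <linux/fs.h>
	#include <linux/dax.h>
	#include <linux/uio.h>

	/* illustrative get_block callback; a real filesystem supplies its own */
	extern int foo_get_block(struct inode *inode, sector_t iblock,
				 struct buffer_head *bh_result, int create);

	/* sketch of a filesystem ->direct_IO that dispatches on the iocb flags */
	static ssize_t foo_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
				     loff_t offset)
	{
		struct inode *inode = file_inode(iocb->ki_filp);

		if (iocb->ki_flags & IOCB_DIRECT)
			/* true O_DIRECT: conventional dio, the request goes to the driver */
			return __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev,
						    iter, offset, foo_get_block,
						    NULL, NULL, DIO_LOCKING);

		/* DAX iocb without O_DIRECT: copy straight to/from persistent memory */
		return dax_do_io(iocb, inode, iter, offset, foo_get_block,
				 NULL, DIO_LOCKING);
	}

i.e. the conventional dio path when O_DIRECT is requested, and
dax_do_io() otherwise.)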
>>>
>>> Really? What is your thinking here?
>>>
>>> What about all the current users of O_DIRECT? You have just made them
>>> 4 times slower and "less concurrent*" than "buffered io" users, since
>>> the direct_IO path will queue an IO request and all.
>>> (And if it is not so slow, then why do we need dax_do_io at all? [Rhetorical])
>>>
>>> I hate that you overload the semantics of a known and expected
>>> O_DIRECT flag for special pmem quirks. This is an incompatible
>>> and unrelated overload of the semantics of O_DIRECT.
>>
>> I think it is the opposite situation: it is undoing the premature
>> overloading of O_DIRECT that went in without performance numbers.
>
> We have tons of measurements. It is not hard to imagine the results, though,
> especially the 1000-thread case.
>
>> This implementation clarifies that dax_do_io() handles the lack of a
>> page cache for buffered I/O, and that O_DIRECT behaves as it nominally
>> would by sending an I/O to the driver.
>
>> It has the benefit of matching the
>> error semantics of a typical block device where a buffered write could
>> hit an error filling the page cache, but an O_DIRECT write potentially
>> triggers the drive to remap the block.
>>
>
> I fail to see how, for writes, the device error semantics regarding remapping
> of blocks are any different between buffered and direct IO. As far as the block
> device is concerned it is the exact same code path. The big difference is
> higher up, in the VFS.
>
> And ... so you are willing to sacrifice the 99% hot path for the sake of the
> 1% error path, while piggybacking on poor O_DIRECT?
>
> Again, there are tons of O_DIRECT apps out there; why are you forcing them to
> change if they want true pmem performance?
This isn't forcing them to change. This is the path of least surprise,
as the error semantics are identical to a typical block device. Yes, an
application can go faster by switching to the "buffered" / dax_do_io()
path, and it can go even faster by switching to mmap() I/O and using
DAX directly. If we can later optimize the O_DIRECT path to bring its
performance more in line with dax_do_io(), great, but the
implementation should be correct first and optimized later.
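
For illustration, a minimal userspace sketch of that mmap() route (the
path and size below are made up); on a DAX mount the mapping goes
straight to the media, so the loads and stores bypass the page cache
entirely:

	/* sketch only: map a file on a DAX mount and store to it directly */
	#include <fcntl.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		int fd = open("/mnt/pmem/data", O_RDWR);	/* illustrative path */
		if (fd < 0)
			return 1;

		void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
		if (p == MAP_FAILED) {
			close(fd);
			return 1;
		}

		memcpy(p, "hello", 5);		/* store lands in pmem, no page cache copy */
		msync(p, 4096, MS_SYNC);	/* flush the mapping through to media */

		munmap(p, 4096);
		close(fd);
		return 0;
	}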