Date:	Fri, 17 Apr 2015 20:00:53 -0600
From:	Jens Axboe <axboe@...nel.dk>
To:	Dave Chinner <david@...morbit.com>, Jens Axboe <axboe@...com>
CC:	Ming Lin <mlin@...nel.org>, lkml <linux-kernel@...r.kernel.org>,
	linux-fsdevel@...r.kernel.org, ming.l@....samsung.com,
	"Kwan (Hingkwan) Huen-SSI" <kwan.huen@....samsung.com>
Subject: Re: [PATCH 3/6] direct-io: add support for write stream IDs

On 04/17/2015 05:51 PM, Dave Chinner wrote:
> On Fri, Apr 17, 2015 at 05:11:40PM -0600, Jens Axboe wrote:
>> On 04/17/2015 05:06 PM, Dave Chinner wrote:
>>> On Thu, Apr 16, 2015 at 11:20:45PM -0700, Ming Lin wrote:
>>>> On Sat, Apr 11, 2015 at 4:59 AM, Dave Chinner <david@...morbit.com> wrote:
>>>>> On Fri, Apr 10, 2015 at 04:50:05PM -0700, Ming Lin wrote:
>>>>>> On Wed, Mar 25, 2015 at 7:26 AM, Jens Axboe <axboe@...nel.dk> wrote:
>>>>>>>> If iocb->ki_filp->f_streamid is not set, then it should fall back to
>>>>>>>> whatever is set on the inode->i_streamid.
>>>>>>
>>>>>> Why should it fall back?
>>>>>
>>>>> Because then you have a method of using streams with applications
>>>>> that aren't aware of streams.
>>>>>
>>>>> Or perhaps you have a file you know has different access patterns to
>>>>> the rest of the files in a directory, and you don't want to have to
>>>>> set the stream on every process that opens and uses that file. e.g.
>>>>> database writeahead log files (sequential write, never read) vs
>>>>> database index/table files (random read/write).....
>>>>>
>>>>>>> Good point, agree. Will make that change.
>>>>>>
>>>>>> That change causes a problem for direct IO, for example:
>>>>>>
>>>>>> process 1:
>>>>>> fd = open("/dev/nvme0n1", O_DIRECT...);
>>>>>> //set stream_id 1
>>>>>> fadvise(fd, 1, 0, POSIX_FADV_STREAMID);
>>>>>> pwrite(fd, ....);
>>>>>>
>>>>>> process 2:
>>>>>> fd = open("/dev/nvme0n1", O_DIRECT...);
>>>>>> //should be legacy stream_id 0
>>>>>> pwrite(fd, ....);
>>>>>>
>>>>>> But now process 2 also sees stream_id 1, which is wrong.
>>>>>
>>>>> It's not wrong, your behaviour model is just different. You have
>>>>> defined a process/fd based stream model and not considered
>>>>> that admins and applications might want to use a file
>>>>> based stream model instead, so applications don't need to even be
>>>>> aware that write streams are in use...
>>>>
>>>> The stream must be opened, otherwise the device will return an error if the
>>>> application writes to a stream that has not been opened.
>>>
>>> That's an extremely device specific *implementation* of a write
>>> stream. The *concept* of a write stream being passed from userspace to
>>> the block layer doesn't have such constraints, and I get really
>>> concerned when implementations of a generic concept are so tightly
>>> focussed around one type of hardware implementation of the
>>> concept...
>>
>> Indeed, which is why the implementation posted cares ONLY about the
>> stream ID itself, and passing that through.
>>
>> The point about fallback is valid. For some use cases, however,
>> that will not be what you want. But we have to make some sort of
>> decision, and falling back to the inode set value (if one is set) is
>> probably the right thing to do in most use cases.
>
> Right, the question is then whether fadvise should set the value on
> the inode at all, because then the effect of setting it on a fd also
> changes the fallback. Perhaps we need a distinction between
> "setting the stream for this fd" which lasts as long as the fd is
> active, and "setting the default inode stream" which is potentially
> a persistent operation if the filesystem stores it on disk...

Yes, that might be a good compromise. The easiest would be to define a 
second fadvise advice, where the stronger advice would be file + inode. 
Another option would be changing the file approach to use fcntl(), and 
keeping the fadvise for the inode. I'll be happy to take input on what 
people would prefer here.
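
To make the two options concrete, here is a minimal userspace sketch of the
fallback rule plus the "weaker" (file only) and "stronger" (file + inode)
advice. It is not the posted patch and not kernel code: struct model_inode,
struct model_file, STREAM_NONE and the three helper functions are hypothetical
names made up for illustration, and it assumes that stream ID 0 means "no
stream set".

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 0 means "no stream set" (the legacy stream). */
#define STREAM_NONE 0u

/* Hypothetical stand-ins, NOT the kernel's struct inode / struct file. */
struct model_inode {
	unsigned int i_streamid;	/* default stream for the file */
};

struct model_file {
	struct model_inode *inode;
	unsigned int f_streamid;	/* per-open-file stream */
};

/* Weaker advice: only this open file is affected. */
void set_file_stream(struct model_file *f, unsigned int id)
{
	assert(f != NULL);
	f->f_streamid = id;
}

/* Stronger advice: this open file and the inode default. */
void set_file_and_inode_stream(struct model_file *f, unsigned int id)
{
	assert(f != NULL && f->inode != NULL);
	f->f_streamid = id;
	f->inode->i_streamid = id;
}

/* Stream ID for a write: the file's if set, else the inode default. */
unsigned int effective_stream(const struct model_file *f)
{
	assert(f != NULL && f->inode != NULL);
	if (f->f_streamid != STREAM_NONE)
		return f->f_streamid;
	return f->inode->i_streamid;
}
```

With this split, the two-process direct IO case quoted above behaves as Ming
expects as long as process 1 uses the weaker advice: process 2 keeps falling
back to the inode value, which stays at 0 until someone asks for the stronger
advice.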

>>>> The device has a limited number of streams, for example, 16 streams.
>>>> There are 2 APIs to open/close the stream.
>>>
>>> What's to stop me writing something for DM-thinp that understands
>>> write streams in bios and uses it to separate out the write streams
>>> into different regions of the thinp device to improve locality of
>>> its data placement and hence reduce fragmentation?
>>
>> Absolutely nothing, in fact that's one of the use cases that I had
>> in mind. Or for caching software.
>
> *nod*. We are on the same page, then :)

Yes, completely. I basically just wanted to clarify that.
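
As a rough illustration of the thinp idea (a consumer that only looks at the
stream ID carried with a write), the sketch below hands out blocks from a
separate region per stream so that each stream's data stays contiguous. It is
an assumption-laden toy, not DM-thinp code: struct region_alloc, NR_STREAMS,
REGION_BLOCKS and both functions are hypothetical, and folding unknown stream
IDs into stream 0 is just one possible policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_STREAMS	4u	/* assumption: small fixed number of streams */
#define REGION_BLOCKS	1024u	/* assumption: one equal-sized region each */

struct region_alloc {
	uint64_t next[NR_STREAMS];	/* next free offset inside each region */
};

void region_alloc_init(struct region_alloc *ra)
{
	assert(ra != NULL);
	for (unsigned int i = 0; i < NR_STREAMS; i++)
		ra->next[i] = 0;
}

/*
 * Pick a block for a write tagged with stream_id.
 * Stream IDs outside the table are folded into stream 0.
 * Returns the block number, or UINT64_MAX if that region is full.
 */
uint64_t region_alloc_block(struct region_alloc *ra, unsigned int stream_id)
{
	assert(ra != NULL);
	if (stream_id >= NR_STREAMS)
		stream_id = 0;
	if (ra->next[stream_id] >= REGION_BLOCKS)
		return UINT64_MAX;
	return (uint64_t)stream_id * REGION_BLOCKS + ra->next[stream_id]++;
}
```

Nothing here requires the stream to be "opened" first, which is the point of
keeping the stream ID a plain hint at the block layer.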

-- 
Jens Axboe

