Message-ID: <7aabb6b4-df8d-8554-fbe3-90504887fb8e@kernel.dk>
Date: Mon, 6 Mar 2017 10:06:43 -0700
From: Jens Axboe <axboe@...nel.dk>
To: Avi Kivity <avi@...lladb.com>, Jan Kara <jack@...e.cz>
Cc: Goldwyn Rodrigues <rgoldwyn@...e.de>, jack@...e.com,
hch@...radead.org, linux-fsdevel@...r.kernel.org,
linux-block@...r.kernel.org, linux-btrfs@...r.kernel.org,
linux-ext4@...r.kernel.org, linux-xfs@...r.kernel.org
Subject: Re: [PATCH 0/8 v2] Non-blocking AIO

On 03/06/2017 09:59 AM, Avi Kivity wrote:
>
>
> On 03/06/2017 06:08 PM, Jens Axboe wrote:
>> On 03/06/2017 08:59 AM, Avi Kivity wrote:
>>> On 03/06/2017 05:38 PM, Jens Axboe wrote:
>>>> On 03/06/2017 08:29 AM, Avi Kivity wrote:
>>>>> On 03/06/2017 05:19 PM, Jens Axboe wrote:
>>>>>> On 03/06/2017 01:25 AM, Jan Kara wrote:
>>>>>>> On Sun 05-03-17 16:56:21, Avi Kivity wrote:
>>>>>>>>> The goal of the patch series is to return -EAGAIN/-EWOULDBLOCK if
>>>>>>>>> any of these conditions are met. This way userspace can push most
>>>>>>>>> of the write()s to the kernel to the best of its ability to complete
>>>>>>>>> and if it returns -EAGAIN, can defer it to another thread.
>>>>>>>>>
>>>>>>>> Is it not possible to push the iocb to a workqueue? This will allow
>>>>>>>> existing userspace to work with the new functionality, unchanged. Any
>>>>>>>> userspace implementation would have to do the same thing, so it's not like
>>>>>>>> we're saving anything by pushing it there.
>>>>>>> That is not easy because until IO is fully submitted, you need some parts
>>>>>>> of the context of the process which submits the IO (e.g. memory mappings,
>>>>>>> but possibly also other credentials). So you would need to somehow transfer
>>>>>>> this information to the workqueue.
>>>>>> Outside of technical challenges, the API also needs to return EAGAIN or
>>>>>> start blocking at some point. We can't expose a direct connection to
>>>>>> queue work like that, and let any user potentially create millions of
>>>>>> pending work items (and IOs).
>>>>> You wouldn't expect more concurrent events than the maxevents parameter
>>>>> that was supplied to io_setup syscall; it should have reserved any
>>>>> resources needed.
>>>> Doesn't matter what limit you apply, my point still stands - at some
>>>> point you have to return EAGAIN, or block. Returning EAGAIN without
>>>> the caller having flagged support for that change of behavior would
>>>> be problematic.
>>> Doesn't it already return EAGAIN (or some other error) if you exceed
>>> maxevents?
>> It's a setup thing. We check these limits when someone creates an IO
>> context, and carve out the specified entries from our global pool. Then
>> we free those "resources" when the io context is freed.
>>
>> Right now I can set up an IO context with 1000 entries on it, yet that
>> number has NO bearing on when io_submit() would potentially block or
>> return EAGAIN.
>>
>> We can have a huge gap between the intent signaled by io context setup and
>> the reality imposed by what actually happens on the IO submission side.
>
> Isn't that a bug? Shouldn't that 1001st incomplete io_submit() return
> EAGAIN?
>
> Just tested it, and maxevents is not respected for this:
>
> io_setup(1, [0x7fc64537f000]) = 0
> io_submit(0x7fc64537f000, 10, [{pread, fildes=3, buf=0x1eb4000,
> nbytes=4096, offset=0}, {pread, fildes=3, buf=0x1eb4000, nbytes=4096,
> offset=0}, {pread, fildes=3, buf=0x1eb4000, nbytes=4096, offset=0},
> {pread, fildes=3, buf=0x1eb4000, nbytes=4096, offset=0}, {pread,
> fildes=3, buf=0x1eb4000, nbytes=4096, offset=0}, {pread, fildes=3,
> buf=0x1eb4000, nbytes=4096, offset=0}, {pread, fildes=3, buf=0x1eb4000,
> nbytes=4096, offset=0}, {pread, fildes=3, buf=0x1eb4000, nbytes=4096,
> offset=0}, {pread, fildes=3, buf=0x1eb4000, nbytes=4096, offset=0},
> {pread, fildes=3, buf=0x1eb4000, nbytes=4096, offset=0}]) = 10
>
> which is unexpected, to me.

ioctx_alloc()
{
	[...]
	/*
	 * We keep track of the number of available ringbuffer slots, to prevent
	 * overflow (reqs_available), and we also use percpu counters for this.
	 *
	 * So since up to half the slots might be on other cpu's percpu counters
	 * and unavailable, double nr_events so userspace sees what they
	 * expected: additionally, we move req_batch slots to/from percpu
	 * counters at a time, so make sure that isn't 0:
	 */
	nr_events = max(nr_events, num_possible_cpus() * 4);
	nr_events *= 2;
}
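
To make the arithmetic in that excerpt concrete, here is a minimal userspace sketch of the two sizing lines. It is not kernel code: the helper name effective_nr_events() and the explicit possible_cpus parameter are hypothetical, standing in for num_possible_cpus(), and it covers only the two statements quoted above, not any other sizing that ioctx_alloc() may do.

```c
/* Userspace sketch of the sizing arithmetic quoted above; not kernel code. */

/*
 * Hypothetical helper: the name and the explicit possible_cpus parameter
 * are assumptions made for illustration. In the kernel excerpt the CPU
 * count comes from num_possible_cpus().
 */
static unsigned int effective_nr_events(unsigned int requested,
					unsigned int possible_cpus)
{
	unsigned int nr_events = requested;

	/* Mirrors: nr_events = max(nr_events, num_possible_cpus() * 4); */
	if (nr_events < possible_cpus * 4)
		nr_events = possible_cpus * 4;

	/* Mirrors: nr_events *= 2; (overflow is ignored in this sketch) */
	return nr_events * 2;
}
```

With io_setup(1, ...) and, say, 4 possible CPUs (an assumed figure; the thread does not say what machine the test ran on), effective_nr_events(1, 4) gives 32, so a batch of 10 submissions fits even though only one event was requested.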
--
Jens Axboe