Message-ID: <c463ba61-440f-b8b6-9ece-5b625dab9e23@kot-begemot.co.uk>
Date:   Sun, 26 Nov 2017 14:42:27 +0000
From:   Anton Ivanov <anton.ivanov@...-begemot.co.uk>
To:     Richard Weinberger <richard@....at>
Cc:     user-mode-linux-devel <user-mode-linux-devel@...ts.sourceforge.net>,
        linux-kernel@...r.kernel.org, "hch@....de" <hch@....de>,
        Jens Axboe <axboe@...com>, linux-block@...r.kernel.org
Subject: Re: [uml-devel] [PATCH] [RFC] um: Convert ubd driver to blk-mq

On 26/11/17 13:56, Richard Weinberger wrote:
> Anton,
>
> please don't crop the CC list.

Apologies, I wanted to keep the discussion on the UML side until we had
come up with something.

Will not do it again.

>
> On Sunday, 26 November 2017, 14:41:12 CET, Anton Ivanov wrote:
>> I need to do some reading on this.
>>
>> First of all - a stupid question: mq's primary advantage is in
>> multi-core systems as it improves IO and core utilization. We are still
>> single-core in UML and AFAIK this is likely to stay that way, right?
> Well, someday blk-mq should completely replace the legacy block interface.
> Christoph asked me to convert the UML driver,
> and also to find corner cases in blk-mq.
>  
>> On 26/11/17 13:10, Richard Weinberger wrote:
>>> This is the first attempt to convert the UserModeLinux block driver
>>> (UBD) to blk-mq.
>>> While the conversion itself is rather trivial, a few questions
>>> popped up in my head. Maybe you can help me with them.
>>>
>>> MAX_SG is 64, used for blk_queue_max_segments(). This comes from
>>> a0044bdf60c2 ("uml: batch I/O requests"). Is this still a good/sane
>>> value for blk-mq?
>>>
>>> The driver does IO batching: for each request it issues many UML struct
>>> io_thread_req requests to the IO thread on the host side,
>>> one io_thread_req per SG page.
>>> Before the conversion the driver used blk_end_request() to indicate that
>>> a part of the request is done.
>>> blk_mq_end_request() does not take a length parameter, therefore we can
>>> only mark the whole request as done. See the new is_last property in the
>>> driver.
>>> Maybe there is a way to partially end requests too in blk-mq?
>>>
>>> Another obstacle with IO batching is that UML IO thread requests can
>>> fail. Not only due to OOM, but also because the pipe between the UML kernel
>>> process and the host IO thread can return EAGAIN.
>>> In this case the driver puts the request into a list and retries it
>>> later when the pipe turns writable.
>>> I’m not sure whether this restart logic makes sense with blk-mq, maybe
>>> there is a way in blk-mq to put back a (partial) request?
>> This all sounds to me as if blk-mq requests need a different inter-thread
>> IPC. We presently rely on the fact that each request to the IO thread is
>> fixed size and there is no natural request grouping coming from upper
>> layers.
>>
>> Unless I am missing something, this looks like we are now getting group
>> requests, right? We need to send a group at a time which is not
>> processed until the whole group has been received in the IO thread. We
>> can still batch groups, though we should not batch individual
>> requests, right?
> The question is, do we really need batching at all with blk-mq?
> Jeff implemented that 10 years ago.

Well, but in that case we need to match our IPC to the existing batching
in the block queue, right?

So my proposal still stands: I suggest we roll back my batching patch,
which is no longer needed, and change the IPC to match what comes out
of blk-mq :)
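
For reference, on the partial-completion and retry questions above: as far
as I can tell blk-mq can complete a request piecewise via blk_update_request()
followed by __blk_mq_end_request() once nothing remains, and a request we
could not push down the pipe can be handed back via blk_mq_requeue_request().
A rough, untested sketch only - the struct and field names below are made up
for illustration and are not the real ubd ones:

#include <linux/blk-mq.h>

/* Illustrative only - not the real ubd structures or code. */
struct sketch_io_req {
        struct request *req;    /* blk-mq request this chunk belongs to */
        unsigned long length;   /* bytes covered by this chunk */
        int error;              /* 0 on success */
};

static void sketch_complete_one(struct sketch_io_req *io_req)
{
        struct request *rq = io_req->req;
        blk_status_t err = io_req->error ? BLK_STS_IOERR : BLK_STS_OK;

        /* Account for this chunk only; returns true while bytes remain. */
        if (!blk_update_request(rq, err, io_req->length))
                __blk_mq_end_request(rq, err);  /* last chunk: finish it */
}

static void sketch_cannot_submit(struct request *rq)
{
        /* The pipe to the IO thread returned EAGAIN: hand the request back
         * to blk-mq, then kick the requeue list with
         * blk_mq_kick_requeue_list(rq->q) once the pipe is writable again. */
        blk_mq_requeue_request(rq, false);
}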

>
>> My first step (before moving to mq) would have been to switch to a unix
>> domain socket pair, probably using SOCK_SEQPACKET or SOCK_DGRAM. The
>> latter, for a socket pair, will return ENOBUFS if you try to push more than
>> the receiving side can handle, so we should not have IPC message loss.
>> This way, we can push request groups naturally instead of relying on a
>> "last" flag and keeping track of that for "end of request".
> The pipe is currently a socketpair. UML just calls it "pipe". :-(

I keep forgetting if we applied that patch or not :)

It was a pipe once upon a time and I suggested we change it to a socket
pair due to better buffering behavior for lots of small requests.
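
For completeness, what I mean on the host side is just the standard
non-blocking AF_UNIX pair; on a full receive buffer the send fails with
EAGAIN or ENOBUFS instead of silently dropping data. A minimal userspace
sketch, not the actual os_* helpers (payload and sizes are made up):

#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
        int fds[2];
        char msg[512] = "io_thread_req payload";        /* made-up payload */

        /* SOCK_SEQPACKET keeps message boundaries, unlike a pipe. */
        if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, fds) < 0)
                return 1;

        /* MSG_DONTWAIT stands in for the O_NONBLOCK setup we do today. */
        ssize_t n = send(fds[0], msg, sizeof(msg), MSG_DONTWAIT);
        if (n < 0 && (errno == EAGAIN || errno == ENOBUFS)) {
                /* Receiver's buffer is full: nothing was lost, we simply
                 * retry once the socket becomes writable again. */
                fprintf(stderr, "would block, retry later\n");
        }

        close(fds[0]);
        close(fds[1]);
        return 0;
}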

>
>> It will be easier to roll back the batching before we do that. Feel free
>> to roll back that commit.
>>
>> Once that is in, the whole batching will need to be redone as it should
>> account for variable IPC record size and use a sendmmsg/recvmmsg pair -
>> same as in the vector IO. I am happy to do the honors on that one :)
> Let's see what block guys say.

Ack.
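
To make the sendmmsg/recvmmsg idea a bit more concrete, something along
these lines is what I have in mind for the host side. Rough sketch only;
the record layout is hypothetical and the real code would reuse the
existing vector IO helpers:

#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical variable-size record: fixed header plus payload. */
struct sketch_req {
        void *hdr;
        size_t hdr_len;
        void *data;
        size_t data_len;
};

/* Batch a group of records into one sendmmsg() call: one message per
 * record, two iovecs per message (header + payload). */
static int sketch_send_group(int fd, struct sketch_req *reqs, unsigned int n)
{
        struct mmsghdr msgs[64];
        struct iovec iovs[64][2];
        unsigned int i;

        if (n > 64)
                n = 64;         /* sketch: cap batches at MAX_SG */

        memset(msgs, 0, sizeof(msgs));
        for (i = 0; i < n; i++) {
                iovs[i][0].iov_base = reqs[i].hdr;
                iovs[i][0].iov_len  = reqs[i].hdr_len;
                iovs[i][1].iov_base = reqs[i].data;
                iovs[i][1].iov_len  = reqs[i].data_len;
                msgs[i].msg_hdr.msg_iov = iovs[i];
                msgs[i].msg_hdr.msg_iovlen = 2;
        }

        /* Returns how many whole messages went out; the caller requeues
         * the remainder when the socket turns writable again. */
        return sendmmsg(fd, msgs, n, MSG_DONTWAIT);
}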

A.

>
> Thanks,
> //richard
>
