Message-ID: <4f22f15f-c15f-5fba-1569-3da8c0f37f0e@kernel.dk>
Date: Thu, 19 Jan 2023 11:49:04 -0700
From: Jens Axboe <axboe@...nel.dk>
To: Ming Lei <ming.lei@...hat.com>, io-uring@...r.kernel.org,
linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
nbd@...er.debian.org
Subject: Re: ublk-nbd: ublk-nbd is available
On 1/19/23 7:23 AM, Ming Lei wrote:
> Hi,
>
> ublk-nbd[1] is available now.
>
> Basically it is an nbd client, but implemented entirely in userspace;
> with the existing nbd-client in [2], by contrast, the transmission
> phase is handled by the in-kernel Linux nbd block driver.
>
> The handshake implementation is borrowed from the nbd project[2], so
> ublk-nbd basically just adds new code for the transmission phase; it
> can be thought of as moving the Linux nbd block driver into userspace.
>
> The new code is mostly in nbd/tgt_nbd.cpp. IO handling is based on
> liburing[3] and implemented with C++20 coroutines, so everything runs
> in a single pthread, completely lockless. It also turned out to be
> quite easy to design & implement, thanks to the ublk framework, C++20
> coroutines and liburing.
>
> ublk-nbd supports both TCP and unix sockets, and io_uring send zero
> copy can be enabled with the '--send_zc' command line option; see the
> README[4] for details.
>
> No regressions were found in xfstests with ublk-nbd used as both the
> test device and the scratch device, and the builtin test (make test
> T=nbd) runs fine.
>
> Fio test("make test T=nbd") shows that ublk-nbd performance is
> basically same with nbd-client/nbd driver when running fio on real
> ethernet link(1g, 10+g), but ublk-nbd IOPS is higher by ~40% than
> nbd-client(nbd driver) with 512K BS, which is because linux nbd
> driver sets max_sectors_kb as 64KB at default.
>
> But when running fio over a local TCP socket, on my test machine
> ublk-nbd performs better than nbd-client/the nbd driver, especially
> with 2 queues/2 jobs, where the gap is 10% ~ 30% depending on block
> size.
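[Editorial note: for illustration, here is a rough, self-contained C
sketch of the single-threaded, lockless submit/complete pattern the
quote above describes (one ring, one thread, send and recv driven from
the same loop). It is not the ublk-nbd code, which lives in
nbd/tgt_nbd.cpp as C++20 coroutines; it uses a socketpair in place of a
real NBD connection.]

#include <liburing.h>
#include <sys/socket.h>
#include <stdio.h>

enum { TAG_SEND = 1, TAG_RECV = 2 };

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	int sv[2], done = 0;
	char req[] = "fake nbd request";
	char reply[64];

	/* stand-in for the nbd socket */
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return 1;
	if (io_uring_queue_init(8, &ring, 0) < 0)
		return 1;

	/* queue the request send ... */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_send(sqe, sv[0], req, sizeof(req), 0);
	io_uring_sqe_set_data64(sqe, TAG_SEND);

	/* ... and a recv for the reply, both in flight at once */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_recv(sqe, sv[1], reply, sizeof(reply), 0);
	io_uring_sqe_set_data64(sqe, TAG_RECV);

	io_uring_submit(&ring);

	/* single completion loop: only this thread ever touches the
	 * ring, so no locking is needed anywhere */
	while (done < 2) {
		if (io_uring_wait_cqe(&ring, &cqe) < 0)
			break;
		if (cqe->user_data == TAG_RECV)
			printf("recv %d bytes: %s\n", cqe->res, reply);
		else
			printf("sent %d bytes\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);
		done++;
	}

	io_uring_queue_exit(&ring);
	return 0;
}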
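[Editorial note: and a tiny sketch for checking the max_sectors_kb cap
mentioned above. The device names (nbd0, ublkb0) are assumptions for a
typical setup; the limit can also be raised by writing to the same
sysfs file, up to max_hw_sectors_kb.]

#include <stdio.h>

static void show_limit(const char *dev)
{
	char path[128], buf[32];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/block/%s/queue/max_sectors_kb", dev);
	f = fopen(path, "r");
	if (!f)
		return;
	if (fgets(buf, sizeof(buf), f))
		printf("%s max_sectors_kb: %s", dev, buf);
	fclose(f);
}

int main(void)
{
	show_limit("nbd0");	/* per the mail above, defaults to 64 */
	show_limit("ublkb0");
	return 0;
}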
This is pretty nice! Just curious, have you tried setting up your
ring with
p.flags |= IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN;
to see if that yields any extra performance improvements for you?
Depending on how you do processing, you should not need to do any
further changes there.
A "lighter" version is just setting IORING_SETUP_COOP_TASKRUN.
--
Jens Axboe