Message-ID: <aSpLN3xPwCqToYrZ@fedora>
Date: Sat, 29 Nov 2025 09:24:07 +0800
From: Ming Lei <ming.lei@...hat.com>
To: Caleb Sander Mateos <csander@...estorage.com>
Cc: Jens Axboe <axboe@...nel.dk>, linux-block@...r.kernel.org,
Uday Shankar <ushankar@...estorage.com>,
Stefani Seibold <stefani@...bold.net>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH V4 00/27] ublk: add UBLK_F_BATCH_IO
On Fri, Nov 28, 2025 at 11:07:17AM -0800, Caleb Sander Mateos wrote:
> On Fri, Nov 28, 2025 at 8:19 AM Jens Axboe <axboe@...nel.dk> wrote:
> >
> > On 11/28/25 4:59 AM, Ming Lei wrote:
> > > On Fri, Nov 21, 2025 at 09:58:22AM +0800, Ming Lei wrote:
> > >> Hello,
> > >>
> > >> This patchset adds UBLK_F_BATCH_IO feature for communicating between kernel and ublk
> > >> server in batching way:
> > >>
> > >> - Per-queue vs Per-I/O: Commands operate on queues rather than individual I/Os
> > >>
> > >> - Batch processing: Multiple I/Os are handled in a single operation
> > >>
> > >> - Multishot commands: Use io_uring multishot commands to reduce submission overhead
> > >>
> > >> - Flexible task assignment: Any task can handle any I/O (no per-I/O daemons)
> > >>
> > >> - Better load balancing: Tasks can adjust their workload dynamically
> > >>
> > >> - Helps future optimizations:
> > >>   - batched blk-mq tag freeing
> > >>   - io-poll support
> > >>   - per-task batching to avoid the per-io lock
> > >>   - fetch command priority
> > >>
> > >> - Simplify the command cancel process with a per-queue lock
> > >>
> > >> Selftests are provided.
> > >>
> > >>
> > >> Performance test results (IOPS) on V3:
> > >>
> > >> - page copy
> > >>
> > >> tools/testing/selftests/ublk/kublk add -t null -q 16 [-b]
> > >>
> > >> - zero copy (--auto_zc)
> > >> tools/testing/selftests/ublk/kublk add -t null -q 16 --auto_zc [-b]
> > >>
> > >> - IO test
> > >> taskset -c 0-31 fio/t/io_uring -p0 -n $JOBS -r 30 /dev/ublkb0
> > >>
> > >> 1) 16 jobs IO
> > >> - page copy: 37.77M vs. 42.40M(BATCH_IO), +12%
> > >> - zero copy(--auto_zc): 42.83M vs. 44.43M(BATCH_IO), +3.7%
> > >>
> > >>
> > >> 2) single job IO
> > >> - page copy: 2.54M vs. 2.6M(BATCH_IO), +2.3%
> > >> - zero copy(--auto_zc): 3.13M vs. 3.35M(BATCH_IO), +7%
> > >>
> > >>
> > >> V4:
> > >> - fix handling when running out of the mshot buffer: the request has to
> > >>   be un-prepared for zero copy
> > >> - don't expose unused tags to userspace
> > >> - replace the fixed buffer with a plain user buffer for
> > >>   UBLK_U_IO_PREP_IO_CMDS and UBLK_U_IO_COMMIT_IO_CMDS
> > >> - replace the iov iterator with plain copy_from_user() in
> > >>   ublk_walk_cmd_buf(); this simplifies the code and improves performance
> > >> - don't touch sqe->len for UBLK_U_IO_PREP_IO_CMDS and
> > >>   UBLK_U_IO_COMMIT_IO_CMDS (Caleb Sander Mateos)
> > >> - use READ_ONCE() for accessing sqe->addr (Caleb Sander Mateos)
> > >> - various patch style fixes (Caleb Sander Mateos)
> > >> - inline __kfifo_alloc() (Caleb Sander Mateos)
> > >
> > > Hi Caleb Sander Mateos and Jens,
> > >
> > > Caleb has reviewed patches 1-8, but driver patches 9-18 have not been
> > > reviewed yet.
> > >
> > > I'd like to hear your thoughts on how to move forward. So far, there
> > > seem to be several options:
> > >
> > > 1) merge patches 1-6 into v6.19 first, which can serve as prep patches for BATCH_IO
> > >
> > > 2) delay the whole patchset to the v6.20 cycle
> > >
> > > 3) merge the whole patchset into v6.19
> > >
> > > I am fine with any of these; which one do you prefer?
> > >
> > > BTW, V4 passes all built-in functional and stress tests, and there is just
> > > one small bug fix not posted yet, which can be a follow-up. The new feature
> > > takes a standalone code path, so the regression risk is pretty small.
> >
> > I'm fine taking the whole thing for 6.19. Caleb, let me know if you
> > disagree. I'll queue 1..6 for now, then can follow up later today with
> > the rest as needed.
>
> Sorry I haven't gotten around to reviewing the rest of the series yet.
> I will try to take a look at them all this weekend. I'm not sure the
> batching feature would make sense for our ublk application use case,
> but I have no objection to it as long as it doesn't regress the
> non-batched ublk behavior/performance.
> No problem with queueing up patches 1-6 now (though patch 1 may need
> an ack from a kfifo maintainer?).
BTW, there are many good things about the BATCH_IO feature:
- batch blk-mq completion: the page copy IO mode has shown a >12% IOPS
  improvement, and there is a chance to apply it to zero copy too in the future
- io poll becomes much easier to support: it can be used to poll an nvme
  char/block device to get better IOPS
- the io cancel code path becomes less fragile and easier to debug: in a
  typical implementation there are only one or two per-queue FETCH (multishot)
  commands; the others are just sync one-shot commands
- more chances to improve performance: a lot of generic uring_cmd code path
  cost is saved, such as security_uring_cmd()
- a `perf bug fix` for UBLK_F_PER_IO_DAEMON, plus robust load balance
  support: IOPS is improved by 4X-5X in `fio/t/io_uring -p0 /dev/ublkbN` between:
      ./kublk add -t null --nthreads 8 -q 4 --per_io_tasks
  and
      ./kublk add -t null --nthreads 8 -q 4 -b
- with the per-io lock, the fast io path becomes more robust, and the lock
  can still be bypassed in the future in the per-io-daemon case
The cost is some complexity in the ublk server implementation for maintaining
one or two per-queue FETCH buffers and one or two per-queue COMMIT buffers.
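To make the trade-off above concrete, here is a minimal userspace model of a
per-queue commit buffer. This is only an illustrative sketch, not the ublk
API: struct commit_buf, commit_tag(), commit_flush(), and BATCH_DEPTH are
hypothetical stand-ins for the per-queue COMMIT buffer a real server would
hand to UBLK_U_IO_COMMIT_IO_CMDS; the point is just that N completions cost
roughly N/BATCH_DEPTH commit commands instead of N per-io commands.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_DEPTH 16	/* hypothetical per-queue commit buffer size */

struct commit_buf {
	unsigned short tags[BATCH_DEPTH];	/* completed tags waiting to be committed */
	size_t nr;				/* tags currently buffered */
	size_t flushes;				/* stand-in for COMMIT_IO_CMDS issued */
};

/* Pretend to issue one commit command covering all buffered tags. */
static void commit_flush(struct commit_buf *buf)
{
	if (buf->nr) {
		buf->flushes++;
		buf->nr = 0;
	}
}

/* Buffer one completed tag; flush automatically when the buffer fills. */
static void commit_tag(struct commit_buf *buf, unsigned short tag)
{
	buf->tags[buf->nr++] = tag;
	if (buf->nr == BATCH_DEPTH)
		commit_flush(buf);
}
```

With this model, completing 100 I/Os costs 7 flushes (6 full batches plus one
partial) rather than 100 per-io commands, which is the batching win the cover
letter measures.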
Thanks,
Ming