[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20250507-ublk_task_per_io-v6-8-a2a298783c01@purestorage.com>
Date: Wed, 07 May 2025 15:49:42 -0600
From: Uday Shankar <ushankar@...estorage.com>
To: Ming Lei <ming.lei@...hat.com>, Jens Axboe <axboe@...nel.dk>,
Caleb Sander Mateos <csander@...estorage.com>,
Andrew Morton <akpm@...ux-foundation.org>, Shuah Khan <shuah@...nel.org>,
Jonathan Corbet <corbet@....net>
Cc: linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-kselftest@...r.kernel.org, linux-doc@...r.kernel.org,
Uday Shankar <ushankar@...estorage.com>
Subject: [PATCH v6 8/8] Documentation: ublk: document UBLK_F_RR_TAGS
Document the new flag UBLK_F_RR_TAGS along with its intended use case.
Also describe the new restrictions on threading model imposed by
ublk_drv (one (qid,tag) pair is can be served by only one thread), and
remove references to ubq_daemon/per-queue threads, since such a concept
no longer exists.
Signed-off-by: Uday Shankar <ushankar@...estorage.com>
---
Documentation/block/ublk.rst | 83 ++++++++++++++++++++++++++++++++++++++------
1 file changed, 72 insertions(+), 11 deletions(-)
diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
index 854f823b46c2add01d0b65ba36aecd26c45bb65d..e9cbabdd69c5539a02463780ba5e51de0416c3f6 100644
--- a/Documentation/block/ublk.rst
+++ b/Documentation/block/ublk.rst
@@ -115,15 +115,15 @@ managing and controlling ublk devices with help of several control commands:
- ``UBLK_CMD_START_DEV``
- After the server prepares userspace resources (such as creating per-queue
- pthread & io_uring for handling ublk IO), this command is sent to the
+ After the server prepares userspace resources (such as creating I/O handler
+ threads & io_uring for handling ublk IO), this command is sent to the
driver for allocating & exposing ``/dev/ublkb*``. Parameters set via
``UBLK_CMD_SET_PARAMS`` are applied for creating the device.
- ``UBLK_CMD_STOP_DEV``
Halt IO on ``/dev/ublkb*`` and remove the device. When this command returns,
- ublk server will release resources (such as destroying per-queue pthread &
+ ublk server will release resources (such as destroying I/O handler threads &
io_uring).
- ``UBLK_CMD_DEL_DEV``
@@ -208,15 +208,15 @@ managing and controlling ublk devices with help of several control commands:
modify how I/O is handled while the ublk server is dying/dead (this is called
the ``nosrv`` case in the driver code).
- With just ``UBLK_F_USER_RECOVERY`` set, after one ubq_daemon(ublk server's io
- handler) is dying, ublk does not delete ``/dev/ublkb*`` during the whole
+ With just ``UBLK_F_USER_RECOVERY`` set, after the ublk server exits,
+ ublk does not delete ``/dev/ublkb*`` during the whole
recovery stage and ublk device ID is kept. It is ublk server's
responsibility to recover the device context by its own knowledge.
Requests which have not been issued to userspace are requeued. Requests
which have been issued to userspace are aborted.
- With ``UBLK_F_USER_RECOVERY_REISSUE`` additionally set, after one ubq_daemon
- (ublk server's io handler) is dying, contrary to ``UBLK_F_USER_RECOVERY``,
+ With ``UBLK_F_USER_RECOVERY_REISSUE`` additionally set, after the ublk server
+ exits, contrary to ``UBLK_F_USER_RECOVERY``,
requests which have been issued to userspace are requeued and will be
re-issued to the new process after handling ``UBLK_CMD_END_USER_RECOVERY``.
``UBLK_F_USER_RECOVERY_REISSUE`` is designed for backends who tolerate
@@ -241,10 +241,11 @@ can be controlled/accessed just inside this container.
Data plane
----------
-ublk server needs to create per-queue IO pthread & io_uring for handling IO
-commands via io_uring passthrough. The per-queue IO pthread
-focuses on IO handling and shouldn't handle any control & management
-tasks.
+The ublk server should create dedicated threads for handling I/O. Each
+thread should have its own io_uring through which it is notified of new
+I/O, and through which it can complete I/O. These dedicated threads
+should focus on IO handling and shouldn't handle any control &
+management tasks.
The's IO is assigned by a unique tag, which is 1:1 mapping with IO
request of ``/dev/ublkb*``.
@@ -265,6 +266,13 @@ with specified IO tag in the command data:
destined to ``/dev/ublkb*``. This command is sent only once from the server
IO pthread for ublk driver to setup IO forward environment.
+ Once a thread issues this command against a given (qid,tag) pair, the thread
+ registers itself as that I/O's daemon. In the future, only that I/O's daemon
+ is allowed to issue commands against the I/O. If any other thread attempts
+ to issue a command against a (qid,tag) pair for which the thread is not the
+ daemon, the command will fail. Daemons can be reset only be going through
+ recovery.
+
- ``UBLK_IO_COMMIT_AND_FETCH_REQ``
When an IO request is destined to ``/dev/ublkb*``, the driver stores
@@ -309,6 +317,59 @@ with specified IO tag in the command data:
``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy
the server buffer (pages) read to the IO request pages.
+Load balancing
+--------------
+
+A simple approach to designing a ublk server might involve selecting a
+number of I/O handler threads N, creating devices with N queues, and
+pairing up I/O handler threads with queues, so that each thread gets a
+unique qid, and it issues ``FETCH_REQ``s against all tags for that qid.
+Indeed, before the introduction of the ``UBLK_F_RR_TAGS`` feature, this
+was essentially the only option (*)
+
+This approach can run into performance issues under imbalanced load.
+This architecture taken together with the `blk-mq architecture
+<https://docs.kernel.org/block/blk-mq.html>`_ implies that there is a
+fixed mapping from I/O submission CPU to the ublk server thread that
+handles it. If the workload is CPU-bottlenecked, only allowing one ublk
+server thread to handle all the I/O generated from a single CPU can
+limit peak bandwidth.
+
+To address this issue, two changes were made:
+
+- ublk servers can now pair up threads with I/Os (i.e. (qid,tag) pairs)
+ arbitrarily. In particular, the preexisting restriction that all I/Os
+ in one queue must be served by the same thread is lifted.
+- ublk servers can now specify ``UBLK_F_RR_TAGS`` when creating a ublk
+ device to get round-robin tag allocation on each queue
+
+The ublk server can check for the presence of these changes by testing
+for the ``UBLK_F_RR_TAGS`` feature.
+
+With these changes, a ublk server can balance load as follows:
+
+- create the device with ``UBLK_F_RR_TAGS`` set in
+ ``ublksrv_ctrl_dev_info::flags`` when issuing the ``ADD_DEV`` command
+- issue ``FETCH_REQ``s from ublk server threads to (qid,tag) pairs in
+ a round-robin manner. For example, for a device configured with
+ ``nr_hw_queues=2`` and ``queue_depth=4``, and a ublk server having 4
+ I/O handling threads, ``FETCH_REQ``s could be issued as follows, where
+ each entry in the table is the pair (``ublksrv_io_cmd::q_id``,
+ ``ublksrv_io_cmd::tag``) in the payload of the ``FETCH_REQ``.
+
+ ======== ======== ======== ========
+ thread 0 thread 1 thread 2 thread 3
+ ======== ======== ======== ========
+ (0, 0) (0, 1) (0, 2) (0, 3)
+ (1, 3) (1, 0) (1, 1) (1, 2)
+
+With this setup, I/O submitted on a CPU which maps to queue 0 will be
+balanced across all threads instead of all landing on the same thread.
+Thus, a potential bottleneck is avoided.
+
+(*) technically, one I/O handling thread could service multiple queues
+if it wanted to, but that doesn't help with imbalanced load
+
Zero copy
---------
--
2.34.1
Powered by blists - more mailing lists