Message-ID: <e454bcf1-ae3c-41bd-b376-6560ea534925@infradead.org>
Date: Wed, 7 May 2025 16:08:49 -0700
From: Randy Dunlap <rdunlap@...radead.org>
To: Uday Shankar <ushankar@...estorage.com>, Ming Lei <ming.lei@...hat.com>,
Jens Axboe <axboe@...nel.dk>, Caleb Sander Mateos <csander@...estorage.com>,
Andrew Morton <akpm@...ux-foundation.org>, Shuah Khan <shuah@...nel.org>,
Jonathan Corbet <corbet@....net>
Cc: linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-kselftest@...r.kernel.org, linux-doc@...r.kernel.org
Subject: Re: [PATCH v6 8/8] Documentation: ublk: document UBLK_F_RR_TAGS
Hi,
On 5/7/25 2:49 PM, Uday Shankar wrote:
> Document the new flag UBLK_F_RR_TAGS along with its intended use case.
> Also describe the new restrictions on threading model imposed by
> ublk_drv (one (qid,tag) pair is can be served by only one thread), and
> remove references to ubq_daemon/per-queue threads, since such a concept
> no longer exists.
>
> Signed-off-by: Uday Shankar <ushankar@...estorage.com>
> ---
> Documentation/block/ublk.rst | 83 ++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 72 insertions(+), 11 deletions(-)
>
> diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
> index 854f823b46c2add01d0b65ba36aecd26c45bb65d..e9cbabdd69c5539a02463780ba5e51de0416c3f6 100644
> --- a/Documentation/block/ublk.rst
> +++ b/Documentation/block/ublk.rst
> @@ -115,15 +115,15 @@ managing and controlling ublk devices with help of several control commands:
>
> - ``UBLK_CMD_START_DEV``
>
> - After the server prepares userspace resources (such as creating per-queue
> - pthread & io_uring for handling ublk IO), this command is sent to the
> + After the server prepares userspace resources (such as creating I/O handler
> + threads & io_uring for handling ublk IO), this command is sent to the
> driver for allocating & exposing ``/dev/ublkb*``. Parameters set via
> ``UBLK_CMD_SET_PARAMS`` are applied for creating the device.
>
> - ``UBLK_CMD_STOP_DEV``
>
> Halt IO on ``/dev/ublkb*`` and remove the device. When this command returns,
> - ublk server will release resources (such as destroying per-queue pthread &
> + ublk server will release resources (such as destroying I/O handler threads &
> io_uring).
>
> - ``UBLK_CMD_DEL_DEV``
> @@ -208,15 +208,15 @@ managing and controlling ublk devices with help of several control commands:
> modify how I/O is handled while the ublk server is dying/dead (this is called
> the ``nosrv`` case in the driver code).
>
> - With just ``UBLK_F_USER_RECOVERY`` set, after one ubq_daemon(ublk server's io
> - handler) is dying, ublk does not delete ``/dev/ublkb*`` during the whole
> + With just ``UBLK_F_USER_RECOVERY`` set, after the ublk server exits,
> + ublk does not delete ``/dev/ublkb*`` during the whole
> recovery stage and ublk device ID is kept. It is ublk server's
> responsibility to recover the device context by its own knowledge.
> Requests which have not been issued to userspace are requeued. Requests
> which have been issued to userspace are aborted.
>
> - With ``UBLK_F_USER_RECOVERY_REISSUE`` additionally set, after one ubq_daemon
> - (ublk server's io handler) is dying, contrary to ``UBLK_F_USER_RECOVERY``,
> + With ``UBLK_F_USER_RECOVERY_REISSUE`` additionally set, after the ublk server
> + exits, contrary to ``UBLK_F_USER_RECOVERY``,
> requests which have been issued to userspace are requeued and will be
> re-issued to the new process after handling ``UBLK_CMD_END_USER_RECOVERY``.
> ``UBLK_F_USER_RECOVERY_REISSUE`` is designed for backends who tolerate
> @@ -241,10 +241,11 @@ can be controlled/accessed just inside this container.
> Data plane
> ----------
>
> -ublk server needs to create per-queue IO pthread & io_uring for handling IO
> -commands via io_uring passthrough. The per-queue IO pthread
> -focuses on IO handling and shouldn't handle any control & management
> -tasks.
> +The ublk server should create dedicated threads for handling I/O. Each
> +thread should have its own io_uring through which it is notified of new
> +I/O, and through which it can complete I/O. These dedicated threads
> +should focus on IO handling and shouldn't handle any control &
> +management tasks.
>
> The's IO is assigned by a unique tag, which is 1:1 mapping with IO
??? ("The's IO" above looks garbled.)
> request of ``/dev/ublkb*``.
> @@ -265,6 +266,13 @@ with specified IO tag in the command data:
> destined to ``/dev/ublkb*``. This command is sent only once from the server
> IO pthread for ublk driver to setup IO forward environment.
>
> + Once a thread issues this command against a given (qid,tag) pair, the thread
> + registers itself as that I/O's daemon. In the future, only that I/O's daemon
> + is allowed to issue commands against the I/O. If any other thread attempts
> + to issue a command against a (qid,tag) pair for which the thread is not the
> + daemon, the command will fail. Daemons can be reset only be going through
> + recovery.
> +
> - ``UBLK_IO_COMMIT_AND_FETCH_REQ``
>
> When an IO request is destined to ``/dev/ublkb*``, the driver stores
> @@ -309,6 +317,59 @@ with specified IO tag in the command data:
> ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy
> the server buffer (pages) read to the IO request pages.
>
> +Load balancing
> +--------------
> +
> +A simple approach to designing a ublk server might involve selecting a
> +number of I/O handler threads N, creating devices with N queues, and
> +pairing up I/O handler threads with queues, so that each thread gets a
> +unique qid, and it issues ``FETCH_REQ``s against all tags for that qid.
> +Indeed, before the introduction of the ``UBLK_F_RR_TAGS`` feature, this
> +was essentially the only option (*)
Add ending period (full stop), please.
> +
> +This approach can run into performance issues under imbalanced load.
> +This architecture taken together with the `blk-mq architecture
> +<https://docs.kernel.org/block/blk-mq.html>`_ implies that there is a
> +fixed mapping from I/O submission CPU to the ublk server thread that
> +handles it. If the workload is CPU-bottlenecked, only allowing one ublk
> +server thread to handle all the I/O generated from a single CPU can
> +limit peak bandwidth.
> +
> +To address this issue, two changes were made:
> +
> +- ublk servers can now pair up threads with I/Os (i.e. (qid,tag) pairs)
> + arbitrarily. In particular, the preexisting restriction that all I/Os
> + in one queue must be served by the same thread is lifted.
> +- ublk servers can now specify ``UBLK_F_RR_TAGS`` when creating a ublk
> + device to get round-robin tag allocation on each queue
Add ending period (full stop), please.
> +
> +The ublk server can check for the presence of these changes by testing
> +for the ``UBLK_F_RR_TAGS`` feature.
> +
> +With these changes, a ublk server can balance load as follows:
> +
> +- create the device with ``UBLK_F_RR_TAGS`` set in
> + ``ublksrv_ctrl_dev_info::flags`` when issuing the ``ADD_DEV`` command
> +- issue ``FETCH_REQ``s from ublk server threads to (qid,tag) pairs in
> + a round-robin manner. For example, for a device configured with
> + ``nr_hw_queues=2`` and ``queue_depth=4``, and a ublk server having 4
> + I/O handling threads, ``FETCH_REQ``s could be issued as follows, where
> + each entry in the table is the pair (``ublksrv_io_cmd::q_id``,
> + ``ublksrv_io_cmd::tag``) in the payload of the ``FETCH_REQ``.
> +
> + ======== ======== ======== ========
> + thread 0 thread 1 thread 2 thread 3
> + ======== ======== ======== ========
> + (0, 0) (0, 1) (0, 2) (0, 3)
> + (1, 3) (1, 0) (1, 1) (1, 2)
> +
> +With this setup, I/O submitted on a CPU which maps to queue 0 will be
> +balanced across all threads instead of all landing on the same thread.
> +Thus, a potential bottleneck is avoided.
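
For readers following the table above: one mapping that reproduces it is
thread = (q_id + tag) % nr_threads. The sketch below is only an illustration
under that assumption. The helper name rr_thread_for_io() is hypothetical and
is not part of ublk_drv or of this patch, and a real server may choose any
other assignment of (qid,tag) pairs to threads.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from ublk or this patch): choose which I/O
 * handler thread issues FETCH_REQ for a given (q_id, tag) pair.
 *
 * Offsetting the tag by q_id staggers consecutive queues, so that each
 * queue's tags are spread across all threads. The formula is an assumption
 * chosen to match the example table (nr_hw_queues=2, queue_depth=4,
 * 4 threads); it is not a rule imposed by the driver.
 */
static unsigned int rr_thread_for_io(unsigned int q_id, unsigned int tag,
				     unsigned int nr_threads)
{
	assert(nr_threads > 0);
	return (q_id + tag) % nr_threads;
}
```

With 4 threads this gives (0, 0) and (1, 3) to thread 0, (0, 1) and (1, 0) to
thread 1, and so on, as in the table.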
> +
> +(*) technically, one I/O handling thread could service multiple queues
Capitalize: "Technically,"
> +if it wanted to, but that doesn't help with imbalanced load
Add ending period (full stop), please.
> +
> Zero copy
> ---------
>
>
--
~Randy