lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e454bcf1-ae3c-41bd-b376-6560ea534925@infradead.org>
Date: Wed, 7 May 2025 16:08:49 -0700
From: Randy Dunlap <rdunlap@...radead.org>
To: Uday Shankar <ushankar@...estorage.com>, Ming Lei <ming.lei@...hat.com>,
 Jens Axboe <axboe@...nel.dk>, Caleb Sander Mateos <csander@...estorage.com>,
 Andrew Morton <akpm@...ux-foundation.org>, Shuah Khan <shuah@...nel.org>,
 Jonathan Corbet <corbet@....net>
Cc: linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-kselftest@...r.kernel.org, linux-doc@...r.kernel.org
Subject: Re: [PATCH v6 8/8] Documentation: ublk: document UBLK_F_RR_TAGS

Hi,

On 5/7/25 2:49 PM, Uday Shankar wrote:
> Document the new flag UBLK_F_RR_TAGS along with its intended use case.
> Also describe the new restrictions on threading model imposed by
> ublk_drv (one (qid,tag) pair is can be served by only one thread), and
> remove references to ubq_daemon/per-queue threads, since such a concept
> no longer exists.
> 
> Signed-off-by: Uday Shankar <ushankar@...estorage.com>
> ---
>  Documentation/block/ublk.rst | 83 ++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 72 insertions(+), 11 deletions(-)
> 
> diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
> index 854f823b46c2add01d0b65ba36aecd26c45bb65d..e9cbabdd69c5539a02463780ba5e51de0416c3f6 100644
> --- a/Documentation/block/ublk.rst
> +++ b/Documentation/block/ublk.rst
> @@ -115,15 +115,15 @@ managing and controlling ublk devices with help of several control commands:
>  
>  - ``UBLK_CMD_START_DEV``
>  
> -  After the server prepares userspace resources (such as creating per-queue
> -  pthread & io_uring for handling ublk IO), this command is sent to the
> +  After the server prepares userspace resources (such as creating I/O handler
> +  threads & io_uring for handling ublk IO), this command is sent to the
>    driver for allocating & exposing ``/dev/ublkb*``. Parameters set via
>    ``UBLK_CMD_SET_PARAMS`` are applied for creating the device.
>  
>  - ``UBLK_CMD_STOP_DEV``
>  
>    Halt IO on ``/dev/ublkb*`` and remove the device. When this command returns,
> -  ublk server will release resources (such as destroying per-queue pthread &
> +  ublk server will release resources (such as destroying I/O handler threads &
>    io_uring).
>  
>  - ``UBLK_CMD_DEL_DEV``
> @@ -208,15 +208,15 @@ managing and controlling ublk devices with help of several control commands:
>    modify how I/O is handled while the ublk server is dying/dead (this is called
>    the ``nosrv`` case in the driver code).
>  
> -  With just ``UBLK_F_USER_RECOVERY`` set, after one ubq_daemon(ublk server's io
> -  handler) is dying, ublk does not delete ``/dev/ublkb*`` during the whole
> +  With just ``UBLK_F_USER_RECOVERY`` set, after the ublk server exits,
> +  ublk does not delete ``/dev/ublkb*`` during the whole
>    recovery stage and ublk device ID is kept. It is ublk server's
>    responsibility to recover the device context by its own knowledge.
>    Requests which have not been issued to userspace are requeued. Requests
>    which have been issued to userspace are aborted.
>  
> -  With ``UBLK_F_USER_RECOVERY_REISSUE`` additionally set, after one ubq_daemon
> -  (ublk server's io handler) is dying, contrary to ``UBLK_F_USER_RECOVERY``,
> +  With ``UBLK_F_USER_RECOVERY_REISSUE`` additionally set, after the ublk server
> +  exits, contrary to ``UBLK_F_USER_RECOVERY``,
>    requests which have been issued to userspace are requeued and will be
>    re-issued to the new process after handling ``UBLK_CMD_END_USER_RECOVERY``.
>    ``UBLK_F_USER_RECOVERY_REISSUE`` is designed for backends who tolerate
> @@ -241,10 +241,11 @@ can be controlled/accessed just inside this container.
>  Data plane
>  ----------
>  
> -ublk server needs to create per-queue IO pthread & io_uring for handling IO
> -commands via io_uring passthrough. The per-queue IO pthread
> -focuses on IO handling and shouldn't handle any control & management
> -tasks.
> +The ublk server should create dedicated threads for handling I/O. Each
> +thread should have its own io_uring through which it is notified of new
> +I/O, and through which it can complete I/O. These dedicated threads
> +should focus on IO handling and shouldn't handle any control &
> +management tasks.
>  
>  The's IO is assigned by a unique tag, which is 1:1 mapping with IO

   ???

>  request of ``/dev/ublkb*``.
> @@ -265,6 +266,13 @@ with specified IO tag in the command data:
>    destined to ``/dev/ublkb*``. This command is sent only once from the server
>    IO pthread for ublk driver to setup IO forward environment.
>  
> +  Once a thread issues this command against a given (qid,tag) pair, the thread
> +  registers itself as that I/O's daemon. In the future, only that I/O's daemon
> +  is allowed to issue commands against the I/O. If any other thread attempts
> +  to issue a command against a (qid,tag) pair for which the thread is not the
> +  daemon, the command will fail. Daemons can be reset only be going through
> +  recovery.
> +
>  - ``UBLK_IO_COMMIT_AND_FETCH_REQ``
>  
>    When an IO request is destined to ``/dev/ublkb*``, the driver stores
> @@ -309,6 +317,59 @@ with specified IO tag in the command data:
>    ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy
>    the server buffer (pages) read to the IO request pages.
>  
> +Load balancing
> +--------------
> +
> +A simple approach to designing a ublk server might involve selecting a
> +number of I/O handler threads N, creating devices with N queues, and
> +pairing up I/O handler threads with queues, so that each thread gets a
> +unique qid, and it issues ``FETCH_REQ``s against all tags for that qid.
> +Indeed, before the introduction of the ``UBLK_F_RR_TAGS`` feature, this
> +was essentially the only option (*)

Add ending period (full stop), please.

> +
> +This approach can run into performance issues under imbalanced load.
> +This architecture taken together with the `blk-mq architecture
> +<https://docs.kernel.org/block/blk-mq.html>`_ implies that there is a
> +fixed mapping from I/O submission CPU to the ublk server thread that
> +handles it. If the workload is CPU-bottlenecked, only allowing one ublk
> +server thread to handle all the I/O generated from a single CPU can
> +limit peak bandwidth.
> +
> +To address this issue, two changes were made:
> +
> +- ublk servers can now pair up threads with I/Os (i.e. (qid,tag) pairs)
> +  arbitrarily. In particular, the preexisting restriction that all I/Os
> +  in one queue must be served by the same thread is lifted.
> +- ublk servers can now specify ``UBLK_F_RR_TAGS`` when creating a ublk
> +  device to get round-robin tag allocation on each queue

Add ending period (full stop), please.

> +
> +The ublk server can check for the presence of these changes by testing
> +for the ``UBLK_F_RR_TAGS`` feature.
> +
> +With these changes, a ublk server can balance load as follows:
> +
> +- create the device with ``UBLK_F_RR_TAGS`` set in
> +  ``ublksrv_ctrl_dev_info::flags`` when issuing the ``ADD_DEV`` command
> +- issue ``FETCH_REQ``s from ublk server threads to (qid,tag) pairs in
> +  a round-robin manner. For example, for a device configured with
> +  ``nr_hw_queues=2`` and ``queue_depth=4``, and a ublk server having 4
> +  I/O handling threads, ``FETCH_REQ``s could be issued as follows, where
> +  each entry in the table is the pair (``ublksrv_io_cmd::q_id``,
> +  ``ublksrv_io_cmd::tag``) in the payload of the ``FETCH_REQ``.
> +
> +  ======== ======== ======== ========
> +  thread 0 thread 1 thread 2 thread 3
> +  ======== ======== ======== ========
> +  (0, 0)   (0, 1)   (0, 2)   (0, 3)
> +  (1, 3)   (1, 0)   (1, 1)   (1, 2)
> +
> +With this setup, I/O submitted on a CPU which maps to queue 0 will be
> +balanced across all threads instead of all landing on the same thread.
> +Thus, a potential bottleneck is avoided.
> +
> +(*) technically, one I/O handling thread could service multiple queues

       Technically,

> +if it wanted to, but that doesn't help with imbalanced load

Add ending period (full stop), please.


> +
>  Zero copy
>  ---------
>  
> 

-- 
~Randy


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ