lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <697d9ed8-a9bd-cfc9-35a0-c1d8649a571a@infradead.org>
Date:   Wed, 22 Mar 2023 11:47:53 -0700
From:   Randy Dunlap <rdunlap@...radead.org>
To:     Jakub Kicinski <kuba@...nel.org>, davem@...emloft.net
Cc:     netdev@...r.kernel.org, edumazet@...gle.com, pabeni@...hat.com,
        Bagas Sanjaya <bagasdotme@...il.com>,
        Toke Høiland-Jørgensen <toke@...hat.com>,
        Pavel Pisa <pisa@....felk.cvut.cz>,
        Tony Nguyen <anthony.l.nguyen@...el.com>,
        Florian Fainelli <f.fainelli@...il.com>, corbet@....net,
        jesse.brandeburg@...el.com, mkl@...gutronix.de,
        linux-doc@...r.kernel.org, stephen@...workplumber.org,
        romieu@...zoreil.com
Subject: Re: [PATCH net-next v3] docs: networking: document NAPI

Hi,
I have a few comments for your consideration.

On 3/21/23 22:38, Jakub Kicinski wrote:
> Add basic documentation about NAPI. We can stop linking to the ancient
> doc on the LF wiki.
> 
> Signed-off-by: Jakub Kicinski <kuba@...nel.org>
> Link: https://lore.kernel.org/all/20230315223044.471002-1-kuba@kernel.org/
> Reviewed-by: Bagas Sanjaya <bagasdotme@...il.com>
> Reviewed-by: Toke Høiland-Jørgensen <toke@...hat.com>
> Acked-by: Pavel Pisa <pisa@....felk.cvut.cz> # for ctucanfd-driver.rst
> Reviewed-by: Tony Nguyen <anthony.l.nguyen@...el.com>
> Reviewed-by: Florian Fainelli <f.fainelli@...il.com>
> ---
> v3: rebase on net-next (to avoid ixgb conflict)
>     fold in grammar fixes from Stephen
> v2: https://lore.kernel.org/all/20230321050334.1036870-1-kuba@kernel.org/
>     remove the links in CAN and in ICE as well
>     improve the start of the threaded NAPI section
>     name footnote
>     internal links from the intro to sections
>     various clarifications from Florian and Stephen
> 
> CC: corbet@....net
> CC: jesse.brandeburg@...el.com
> CC: anthony.l.nguyen@...el.com
> CC: pisa@....felk.cvut.cz
> CC: mkl@...gutronix.de
> CC: linux-doc@...r.kernel.org
> CC: f.fainelli@...il.com
> CC: stephen@...workplumber.org
> CC: romieu@...zoreil.com
> ---
>  .../can/ctu/ctucanfd-driver.rst               |   3 +-
>  .../device_drivers/ethernet/intel/e100.rst    |   3 +-
>  .../device_drivers/ethernet/intel/i40e.rst    |   4 +-
>  .../device_drivers/ethernet/intel/ice.rst     |   4 +-
>  Documentation/networking/index.rst            |   1 +
>  Documentation/networking/napi.rst             | 251 ++++++++++++++++++
>  include/linux/netdevice.h                     |  13 +-
>  7 files changed, 266 insertions(+), 13 deletions(-)
>  create mode 100644 Documentation/networking/napi.rst


> diff --git a/Documentation/networking/napi.rst b/Documentation/networking/napi.rst
> new file mode 100644
> index 000000000000..4f848d86750c
> --- /dev/null
> +++ b/Documentation/networking/napi.rst
> @@ -0,0 +1,251 @@
> +.. _napi:
> +
> +====
> +NAPI
> +====
> +
> +NAPI is the event handling mechanism used by the Linux networking stack.
> +The name NAPI no longer stands for anything in particular [#]_.
> +
> +In basic operation device notifies the host about new events via an interrupt.

                     {a | the} device

> +The host then schedules a NAPI instance to process the events.
> +Device may also be polled for events via NAPI without receiving

A device may also be
The device may also be
Devices may also be

> +interrupts first (:ref:`busy polling<poll>`).
> +
> +NAPI processing usually happens in the software interrupt context,
> +but there is an option to use :ref:`separate kernel threads<threaded>`
> +for NAPI processing.
> +
> +All in all NAPI abstracts away from the drivers the context and configuration
> +of event (packet Rx and Tx) processing.
> +
> +Driver API
> +==========
> +
> +The two most important elements of NAPI are the struct napi_struct
> +and the associated poll method. struct napi_struct holds the state
> +of the NAPI instance while the method is the driver-specific event
> +handler. The method will typically free Tx packets that have been
> +transmitted and process newly received packets.
> +
> +.. _drv_ctrl:
> +
> +Control API
> +-----------
> +

[snip]

> +
> +Datapath API
> +------------
> +
> +napi_schedule() is the basic method of scheduling a NAPI poll.
> +Drivers should call this function in their interrupt handler
> +(see :ref:`drv_sched` for more info). A successful call to napi_schedule()
> +will take ownership of the NAPI instance.
> +
> +Later, after NAPI is scheduled, the driver's poll method will be
> +called to process the events/packets. The method takes a ``budget``
> +argument - drivers can process completions for any number of Tx
> +packets but should only process up to ``budget`` number of
> +Rx packets. Rx processing is usually much more expensive.
> +
> +In other words, it is recommended to ignore the budget argument when
> +performing TX buffer reclamation to ensure that the reclamation is not
> +arbitrarily bounded, however, it is required to honor the budget argument

               bounded; however
or
               bounded. However

> +for RX processing.
> +
> +.. warning::
> +
> +   The ``budget`` argument may be 0 if core tries to only process Tx completions
> +   and no Rx packets.
> +
> +The poll method returns the amount of work done. If the driver still
> +has outstanding work to do (e.g. ``budget`` was exhausted)
> +the poll method should return exactly ``budget``. In that case,

So it return the original 'budget' parameter value that was passed in to it?

> +the NAPI instance will be serviced/polled again (without the
> +need to be scheduled).
> +
> +If event processing has been completed (all outstanding packets
> +processed) the poll method should call napi_complete_done()
> +before returning. napi_complete_done() releases the ownership
> +of the instance.
> +
> +.. warning::
> +
> +   The case of finishing all events and using exactly ``budget``
> +   must be handled carefully. There is no way to report this
> +   (rare) condition to the stack, so the driver must either
> +   not call napi_complete_done() and wait to be called again,
> +   or return ``budget - 1``.
> +
> +   If the ``budget`` is 0 napi_complete_done() should never be called.
> +
> +Call sequence
> +-------------
> +

[snip]
> +
> +.. _drv_sched:
> +
> +Scheduling and IRQ masking
> +--------------------------
> +

[snip]

> +
> +Instance to queue mapping
> +-------------------------
> +
> +Modern devices have multiple NAPI instances (struct napi_struct) per
> +interface. There is no strong requirement on how the instances are
> +mapped to queues and interrupts. NAPI is primarily a polling/processing
> +abstraction without specific user-facing semantics. That said, most networking
> +devices end up using NAPI in fairly similar ways.
> +
> +NAPI instances most often correspond 1:1:1 to interrupts and queue pairs
> +(queue pair is a set of a single Rx and single Tx queue).
> +
> +In less common cases a NAPI instance may be used for multiple queues
> +or Rx and Tx queues can be serviced by separate NAPI instances on a single
> +core. Regardless of the queue assignment, however, there is usually still
> +a 1:1 mapping between NAPI instances and interrupts.
> +
> +It's worth noting that the ethtool API uses a "channel" terminology where
> +each channel can be either ``rx``, ``tx`` or ``combined``. It's not clear
> +what constitutes a channel, the recommended interpretation is to understand

                      channel;

> +a channel as an IRQ/NAPI which services queues of a given type. For example,
> +a configuration of 1 ``rx``, 1 ``tx`` and 1 ``combined`` channel is expected
> +to utilize 3 interrupts, 2 Rx and 2 Tx queues.
> +
> +User API
> +========
> +
> +User interactions with NAPI depend on NAPI instance ID. The instance IDs
> +are only visible to the user thru the ``SO_INCOMING_NAPI_ID`` socket option.
> +It's not currently possible to query IDs used by a given device.
> +
> +Software IRQ coalescing
> +-----------------------
> +

[snip]

> +
> +.. _poll:
> +
> +Busy polling
> +------------
> +
> +Busy polling allows user process to check for incoming packets before

                allows a user process
or
                allows user processes

> +the device interrupt fires. As is the case with any busy polling it trades
> +off CPU cycles for lower latency (in fact production uses of NAPI busy

I would drop "in fact".

> +polling are not well known).
> +
> +Busy polling is enabled by either setting ``SO_BUSY_POLL`` on
> +selected sockets or using the global ``net.core.busy_poll`` and
> +``net.core.busy_read`` sysctls. An io_uring API for NAPI busy polling
> +also exists.
> +
> +IRQ mitigation
> +---------------
> +

[snip]

and in any case:
Reviewed-by: Randy Dunlap <rdunlap@...radead.org>

Thanks.
-- 
~Randy

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ