lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALzJLG8EdBJFYv2++qg-ht5thQL-6M85onxQNR4ee7H3ryuJwQ@mail.gmail.com>
Date:   Fri, 24 Nov 2017 04:05:22 -0800
From:   Saeed Mahameed <saeedm@....mellanox.co.il>
To:     Andy Gospodarek <andy@...yhouse.net>,
        Tariq Toukan <tariqt@...lanox.com>, talg@...lanox.com
Cc:     Linux Netdev List <netdev@...r.kernel.org>,
        Michael Chan <mchan@...adcom.com>,
        Saeed Mahameed <saeedm@...lanox.com>,
        Andy Gospodarek <gospo@...adcom.com>
Subject: Re: [RFC 0/9] net: create adaptive software irq moderation library

On Sun, Nov 5, 2017 at 9:44 PM, Andy Gospodarek <andy@...yhouse.net> wrote:
> From: Andy Gospodarek <gospo@...adcom.com>
>
> This RFC converts the adaptive interrupt moderation library from the
> mlx5_en driver into a library so it can be used by any driver.  The last
> patch in this set adds support for interrupt moderation in the bnxt_en
> driver.
>
> The main purpose of this code in the mlx5_en driver is to allow an
> administrator to make sure that default coalesce settings are optimized
> for low latency, but quickly adapt to handle high throughput traffic and
> optimize how many packets are received during each napi poll.
>
> For any new driver the following changes would be needed to use this
> library:
>
> - add elements in ring struct to track items needed by this library
> - create function that can be called to actually set coalesce settings
>   for the driver
>
> My main reason for making this an RFC is that I would like verification
> from Mellanox that the performance of their driver does not change in a
> unintended way.  I did some basic testing (netperf) and did not note a
> statistically significant change in throughput or CPU utilization before
> and after this set.
>
> Credit to Rob Rice and Lee Reed for doing some of the initial proof of
> concept and testing for this patch.

Hi Andy,

Following our conversation in netdev 2.2,  i would like to suggest the
following:

Instead of introducing a new API which demands from the driver to
provide callbacks and function pointers to the adaptive moderation
logic, which might be called on every irq interrupt, and to avoid
performance hit, we can move the generic code and the core adaptive
moderation logic to a header file.

the mlx5e am logic and data structures are already written in a very
modular way and can be stripped out of mlx5e fairly easily.
And i would like to suggest to do it in the following manner:

1. naming convention:
I would like to change the generic code naming convention to have the
words DIM (Dynamically-Tuned Interrupt Moderation) instead of mlx5e_am
or am, Following our public blog [1] of the matter and the official
name we prefer for this feature.

[1] https://community.mellanox.com/docs/DOC-2511

Suggested naming convention instead of rx_am:  net_dim (DIM for net
applications).
As the rx_am or (dim) logic can be applied to other applications.

2. Data types:

All below mlx5e am data types can be used as is as they hold nothing
mlx5 related.

struct mlx5e_rx_am_sample
  - Holds the current stats sample with ktime stamp
  - rename to: net_dim_sample

struct mlx5e_rx_am_stats
 - Holds the needed stats (delta) calculation of last 2 samples
 - rename to: net_dim_stats

struct mlx5e_rx_am
 - Adaptive moderation handle
 - rename to: net_dim

3. static inline generic functions API (based on the usage from
mlx5e_rx_am function)

//Make a DIM measurement:
net_dim_sample(struct *net_dim_sample sample, packets, bytes, event_ctr)
- previously mlx5e_am_sample()
- Fills a sample struct with the provided stats and the current timestamp


//start a new DIM measurement and handles the DIM state machine initial state:
net_dim_start_sample(struct *net_dim rx_dim)
 - Makes a new measurement
 - stores it into rx_dim->start
 - rx_dim->state = DIM_MEASURE_IN_PROGRESS


// Takes a new sample (curr_sample) and makes the decision (handles
DIM_MEASURE_IN_PROGRESS state)
net_dim_decision(struct *net_dim rx_dim, curr_sample)
  -  previously mlx5e_am_decision
  - Note, instead of providing the current_stats (delta between start
and current_sample) I suggest to provide the current_sample and move
the stats calculation logic into net_dime_decision.
   - All the logic in this function will move to the generic code.

4. Driver implementation: (according to the above suggested API)
   -  Driver should initialize struct net_dim rx_dim, and provide a
work function to handle "dim apply new profile" decision.
   - in napi_poll driver should implement the rx_dim state machine
using the above API before arming the completion event queues as
follows:

mlx5e_rx_am:

void mlx5e_rx_am(struct mlx5e_rq *rq)
{
       struct net_dim *rx_dim = &rq->dim;
       struct net_dim_sample end_sample;
       u16 nevents;

       switch (rx_dim->state) {
       case DIM_MEASURE_IN_PROGRESS:
           // driver specific pre condition to decide whether to
continue or skip
           // Note that here we only sample and don't calc the delta
stats, this logic moved into net_dim_decision
           net_dim_sample(rq, &end_sample, rq->packets, rq->bytes, cq->events);
           if (net_dim_decision(rx_dim, &end_sample)) {
               rx_dim->state = DIM_APPLY_NEW_PROFILE;
               schedule_work(&rx_dim->work);
            }
        /* fall through */
       case DIM_START_MEASURE:
           net_dim_start_sample(rx_dim);
       break;
       case DIM_APPLY_NEW_PROFILE:
       break;
}

Thanks,
Saeed.

>
> Andy Gospodarek (9):
>   mlx5_en: move interrupt moderation structs to new file
>   mlx5_en: move interrupt moderation forward delcarations
>   mlx5_en: remove rq references in mlx5e_rx_am
>   mlx5_en: move AM logic enums
>   mlx5_en: move generic functions to new file
>   mlx5_en: rename en_rx_am.h to net_rx_am.h
>   mlx5_en: remove Mellanox references in AM code
>   net: move adaptive interrpt coalescing code to lib/
>   bnxt_en: add support for software adaptive interrupt moderation
>
>  drivers/net/ethernet/broadcom/bnxt/Makefile        |   2 +-
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c          |  51 ++++
>  drivers/net/ethernet/broadcom/bnxt/bnxt.h          |  34 ++-
>  drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c  |   7 +
>  drivers/net/ethernet/broadcom/bnxt/bnxt_rx_am.c    |  32 +++
>  drivers/net/ethernet/mellanox/mlx5/core/en.h       |  43 +--
>  .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   6 +-
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  18 +-
>  drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   4 +-
>  drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 298 +-------------------
>  drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |   5 +-
>  include/linux/mlx5/mlx5_ifc.h                      |   6 -
>  include/linux/net_rx_am.h                          | 109 ++++++++
>  lib/Makefile                                       |   2 +-
>  lib/net_rx_am.c                                    | 306 +++++++++++++++++++++
>  15 files changed, 558 insertions(+), 365 deletions(-)
>  create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_rx_am.c
>  create mode 100644 include/linux/net_rx_am.h
>  create mode 100644 lib/net_rx_am.c
>
> --
> 2.7.4
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ