Message-ID: <b63fcab6-ad2f-c356-6818-08e497ff13e1@gmail.com>
Date:   Thu, 1 Apr 2021 23:46:36 +0200
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     Gatis Peisenieks <gatis@...rotik.com>, chris.snook@...il.com,
        davem@...emloft.net, kuba@...nel.org, hkallweit1@...il.com,
        jesse.brandeburg@...el.com, dchickles@...vell.com,
        tully@...rotik.com
Cc:     netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net] atl1c: move tx cleanup processing out of interrupt



On 4/1/21 7:32 PM, Gatis Peisenieks wrote:
> Tx queue cleanup happens in the interrupt handler on the same core as rx queue
> processing. Both can take a considerable amount of processing in high
> packet-per-second scenarios.
> 
> Sending large amounts of packets can stall the rx processing, which is unfair
> and can also lead to an out-of-memory condition, since __dev_kfree_skb_irq
> queues the skbs for later kfree in softirq, which does not get a chance to run
> while the interrupt handler is under heavy load.
> 
> This puts tx cleanup in its own napi, to be executed on a different core than
> rx cleanup, which solves the mentioned problems and increases throughput.
> 
> The ability to sustain equal amounts of tx/rx traffic increased:
> from 280Kpps to 1130Kpps on a Threadripper 3960X with the upcoming Mikrotik
> 10/25G NIC, and from 520Kpps to 850Kpps on an Intel i3-3320 with a Mikrotik
> RB44Ge adapter.
> 
> Signed-off-by: Gatis Peisenieks <gatis@...rotik.com>



>              }
>          }
> -        if (status & ISR_TX_PKT)
> -            atl1c_clean_tx_irq(adapter, atl1c_trans_normal);
> +        if (status & ISR_TX_PKT) {
> +            if (napi_schedule_prep(&adapter->tx_napi)) {
> +                int tx_cpu = (smp_processor_id() + 1) %
> +                    num_online_cpus();

Ouch. Please do not bury such a hard-coded facility in a driver.

There is no guarantee that tx_cpu is an online cpu.
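
For illustration only (not an endorsement of keeping this in the driver):
online CPU ids are not necessarily contiguous, so with only CPUs 0 and 2
online, (0 + 1) % 2 == 1 picks CPU 1, which is offline. Selecting a
neighbouring online CPU would have to walk cpu_online_mask, something like:

	/* sketch only: next online CPU after the current one, wrapping around */
	int tx_cpu = cpumask_next(smp_processor_id(), cpu_online_mask);

	if (tx_cpu >= nr_cpu_ids)
		tx_cpu = cpumask_first(cpu_online_mask);

(and even that can race with CPU hotplug unless the hotplug lock is held).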

> +                spin_lock(&adapter->irq_mask_lock);
> +                hw->intr_mask &= ~ISR_TX_PKT;
> +                AT_WRITE_REG(hw, REG_IMR, hw->intr_mask);
> +                spin_unlock(&adapter->irq_mask_lock);
> +                smp_call_function_single_async(tx_cpu,
> +                                   &adapter->csd);
> +            }
> +        }
>
Have you tried using a second NAPI, but no csd?

We now have kthread NAPI [1].

By simply enabling kthread NAPI on your NIC, you should get the same nice behavior.

As a bonus, on moderate load you would use a single cpu instead of two.

[1]
cb038357937ee4f589aab2469ec3896dce90f317 net: fix race between napi kthread mode and busy poll
5fdd2f0e5c64846bf3066689b73fc3b8dddd1c74 net: add sysfs attribute to control napi threaded mode
29863d41bb6e1d969c62fdb15b0961806942960e net: implement threaded-able napi poll loop support
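
A rough sketch of that direction, purely illustrative (atl1c_tx_poll and the
tx_napi field are assumed names here, not existing atl1c code):

	/* dedicated tx NAPI, no csd/IPI; in threaded mode its poll loop runs
	 * in a per-NAPI kthread that the scheduler can place on another CPU
	 */
	netif_tx_napi_add(adapter->netdev, &adapter->tx_napi,
			  atl1c_tx_poll, NAPI_POLL_WEIGHT);

	/* opt in from the driver, equivalent to
	 * "echo 1 > /sys/class/net/<dev>/threaded"
	 */
	dev_set_threaded(adapter->netdev, true);

	/* in the interrupt handler, instead of the csd: */
	if (status & ISR_TX_PKT)
		napi_schedule(&adapter->tx_napi);

Whether the driver flips threaded mode itself or leaves it to the admin via
the sysfs attribute added in the commits above is a separate policy question.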
