Message-ID: <alpine.LNX.2.20.1712081528060.28747@cae-iprp-alln-lb.cisco.com>
Date: Fri, 8 Dec 2017 16:14:41 -0800
From: Govindarajulu Varadarajan <gvaradar@...co.com>
To: David Miller <davem@...emloft.net>
CC: <netdev@...r.kernel.org>, <govindarajulu90@...il.com>,
<benve@...co.com>
Subject: Re: [PATCH net] enic: add wq clean up budget
On Wed, 6 Dec 2017, David Miller wrote:
> From: Govindarajulu Varadarajan <gvaradar@...co.com>
> Date: Tue, 5 Dec 2017 11:14:41 -0800
>
>> In case of tx clean up, we set '-1' as budget. This means clean up until
>> wq is empty or till (1 << 32) pkts are cleaned. Under heavy load this
>> will run for long time and cause
>> "watchdog: BUG: soft lockup - CPU#25 stuck for 21s!" warning.
>>
>> This patch sets wq clean up budget to 256.
>>
>> Signed-off-by: Govindarajulu Varadarajan <gvaradar@...co.com>
>
> This driver with all of its indirection and layers upon layers of
> macros for queue processing is so difficult to read, and this can't
> be generating nice optimal code either...
>
> Anyways, I was walking over the driver to see if the logic is
> contributing to this.
>
> The limit you are proposing sounds unnecessary, nobody else I can
> see needs this, and that includes all of the most heavily used
> drivers under load.
I used 256 as the limit because most of the other drivers use it; a rough
sketch of such a bounded clean-up loop follows the list below.
* mlx4 uses MLX4_EN_DEFAULT_TX_WORK as the tx budget in mlx4_en_process_tx_cq().
  Added in commit fbc6daf19745 ("net/mlx4_en: Ignore budget on TX napi polling")
* i40e/i40evf use vsi->work_limit as the tx budget in i40e_clean_tx_irq(), which
  is set to I40E_DEFAULT_IRQ_WORK. Added in commit
  a619afe814453 ("i40e/i40evf: Add support for bulk free in Tx cleanup")
* ixgbe uses q_vector->tx.work_limit as the tx budget in ixgbe_clean_tx_irq(),
  which is set to IXGBE_DEFAULT_TX_WORK. Added in commit
  592245559e900 ("ixgbe: Change default Tx work limit size to 256 buffers")
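
For reference, a minimal sketch of a bounded tx clean-up loop of the kind
those drivers (and this patch) use. This is not the enic code; the struct
and helper names (tx_ring, desc_done, free_buf) are made up purely for
illustration.

/* Hypothetical sketch of a bounded tx clean-up loop; not the enic code. */
#define TX_CLEAN_BUDGET	256

struct tx_ring {
	unsigned int next_to_clean;	/* sw consumer index */
	unsigned int count;		/* ring size */
	int (*desc_done)(struct tx_ring *ring, unsigned int idx);
	void (*free_buf)(struct tx_ring *ring, unsigned int idx);
};

/* Clean at most TX_CLEAN_BUDGET completed descriptors and return how
 * many were cleaned, so the caller can decide whether to re-poll.
 */
static unsigned int tx_clean_bounded(struct tx_ring *ring)
{
	unsigned int idx = ring->next_to_clean;
	unsigned int cleaned = 0;

	while (cleaned < TX_CLEAN_BUDGET && ring->desc_done(ring, idx)) {
		ring->free_buf(ring, idx);
		if (++idx == ring->count)
			idx = 0;
		cleaned++;
	}

	ring->next_to_clean = idx;
	return cleaned;
}

The point is only that the loop stops after a fixed amount of work instead
of running until the wq is empty, which is what avoids the soft lockup.
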
>
> If I had to guess I'd say that the problem is that the queue loop
> keeps sampling the head and tail pointers, whereas it should just
> do that _once_ and only process the TX entries found in that
> snapshot and return to the poll() routine immediately afterwards.
The only way to know the tail pointer at the time napi is scheduled is to read
the hw fetch_index register, which is discouraged by the hw engineers.
We work around this by using a color bit. Every cq entry has a color bit, either
0 or 1. Hw flips the bit each time it writes a new cq entry, so a newly written
entry always carries the opposite color of the stale entry it overwrites. We have
reached the end of the queue when the current cq entry's color matches the
previous color, i.e. hw did not flip the bit, so it is not a new cq entry.
So the enic driver cannot know the tail pointer at the time napi is scheduled;
it only finds the tail when the clean-up loop reaches it.
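
A minimal sketch of that color-bit check, assuming simplified cq structures
(the struct and field names below are made up for illustration, not the
actual enic/vnic ones):

struct cq_desc {
	/* ... completion fields ... */
	u8 flags;			/* top bit: color */
};

struct cq {
	struct cq_desc *ring;
	unsigned int to_clean;		/* sw index of next entry to look at */
	unsigned int count;		/* ring size */
	u8 last_color;			/* color of entries from the previous pass */
};

/* Service up to 'budget' new cq entries; stop as soon as an entry still
 * carries the previous pass's color, i.e. hw has not written it yet.
 */
static unsigned int cq_service(struct cq *cq, unsigned int budget)
{
	unsigned int done = 0;

	while (done < budget) {
		struct cq_desc *desc = &cq->ring[cq->to_clean];
		u8 color = desc->flags >> 7;

		if (color == cq->last_color)
			break;	/* hw did not flip the bit: not a new entry */

		/* a real driver would need a read barrier here before
		 * using the rest of the descriptor */

		/* ... process the completion ... */

		done++;
		if (++cq->to_clean == cq->count) {
			cq->to_clean = 0;
			/* hw's color toggles once per ring wrap */
			cq->last_color ^= 1;
		}
	}

	return done;
}
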