Date: Tue, 19 Dec 2017 16:37:14 -0800
From: Govindarajulu Varadarajan <gvaradar@...co.com>
To: David Miller <davem@...emloft.net>
CC: <netdev@...r.kernel.org>, <govindarajulu90@...il.com>,
<benve@...co.com>
Subject: Re: [PATCH net] enic: add wq clean up budget
On Fri, 8 Dec 2017, Govindarajulu Varadarajan wrote:
> On Wed, 6 Dec 2017, David Miller wrote:
>
>> From: Govindarajulu Varadarajan <gvaradar@...co.com>
>> Date: Tue, 5 Dec 2017 11:14:41 -0800
>>
>>> In the tx clean-up path we pass '-1' as the budget, which means
>>> clean up until the wq is empty or until about 2^32 pkts are
>>> cleaned. Under heavy load this can run for a long time and trigger
>>> the "watchdog: BUG: soft lockup - CPU#25 stuck for 21s!" warning.
>>>
>>> This patch sets wq clean up budget to 256.
>>>
>>> Signed-off-by: Govindarajulu Varadarajan <gvaradar@...co.com>
>>
>> This driver, with all of its indirection and layers upon layers of
>> macros for queue processing, is so difficult to read, and this can't
>> be generating nice optimal code either...
>>
>> Anyways, I was walking over the driver to see if the logic is
>> contributing to this.
>>
>> The limit you are proposing sounds unnecessary; nobody else I can
>> see needs this, and that includes all of the most heavily used
>> drivers under load.
>
> I used 256 as the limit because most of the other drivers use it.
>
> * mlx4 uses MLX4_EN_DEFAULT_TX_WORK as the tx budget in
> mlx4_en_process_tx_cq()
> Added in commit fbc6daf19745 ("net/mlx4_en: Ignore budget on TX napi
> polling")
>
> * i40e/i40evf uses vsi->work_limit as the tx budget in
> i40e_clean_tx_irq(), which is set to I40E_DEFAULT_IRQ_WORK. Added in
> commit a619afe814453 ("i40e/i40evf: Add support for bulk free in Tx cleanup")
>
> * ixgbe uses q_vector->tx.work_limit as tx budget in ixgbe_clean_tx_irq(),
> which is set to IXGBE_DEFAULT_TX_WORK. Added in commit
> 592245559e900 ("ixgbe: Change default Tx work limit size to 256 buffers")
>
>>
>> If I had to guess I'd say that the problem is that the queue loop
>> keeps sampling the head and tail pointers, whereas it should just
>> do that _once_, only process the TX entries found in that
>> snapshot, and return to the poll() routine immediately afterwards.
>
> The only way to know the tail pointer at the time napi is scheduled is
> to read the hw fetch_index register, and that is discouraged by the hw
> engineers.
>
> We work around this by using a color bit. Every cq entry has a color
> bit, either 0 or 1. Hw flips the bit when it creates a new cq entry,
> so every new cq entry has a different color bit than the previous one.
> We reach the end of the queue when the current cq entry's color bit is
> the same as the previous one, i.e. hw did not flip the bit, so it is
> not a new cq entry.
>
> So the enic driver cannot know the tail pointer at the time napi is
> scheduled; we only learn it when the cleanup loop reaches it.
>
David,
How would you want us to fix this issue? Is doing an ioread on
fetch_index for every poll our only option (to get the head and tail
pointers once)? If 256 is not reasonable, would a wq_budget equal to
the wq ring size be acceptable? At any point the number of wq entries
to be cleaned cannot be more than the ring size.
Thanks
Govind