Date:   Tue, 19 Dec 2017 16:37:14 -0800
From:   Govindarajulu Varadarajan <gvaradar@...co.com>
To:     David Miller <davem@...emloft.net>
CC:     <netdev@...r.kernel.org>, <govindarajulu90@...il.com>,
        <benve@...co.com>
Subject: Re: [PATCH net] enic: add wq clean up budget

On Fri, 8 Dec 2017, Govindarajulu Varadarajan wrote:

> On Wed, 6 Dec 2017, David Miller wrote:
>
>> From: Govindarajulu Varadarajan <gvaradar@...co.com>
>> Date: Tue,  5 Dec 2017 11:14:41 -0800
>> 
>>> In the case of tx clean up, we set '-1' as the budget. This means clean up
>>> until the wq is empty or until (1 << 32) pkts are cleaned. Under heavy load
>>> this will run for a long time and cause a
>>> "watchdog: BUG: soft lockup - CPU#25 stuck for 21s!" warning.
>>> 
>>> This patch sets wq clean up budget to 256.
>>> 
>>> Signed-off-by: Govindarajulu Varadarajan <gvaradar@...co.com>
>> 
>> This driver with all of its indirection and layers upon layers of
>> macros for queue processing is so difficult to read, and this can't
>> be generating nice optimal code either...
>> 
>> Anyways, I was walking over the driver to see if the logic is
>> contributing to this.
>> 
>> The limit you are proposing sounds unnecessary; nobody else that I can
>> see needs this, and that includes all of the most heavily used
>> drivers under load.
>
> I used 256 as the limit because most of the other drivers use it.
>
> * mlx4 uses MLX4_EN_DEFAULT_TX_WORK as the tx budget in
>   mlx4_en_process_tx_cq(). Added in commit fbc6daf19745
>   ("net/mlx4_en: Ignore budget on TX napi polling")
>
> * i40e/i40evf uses vsi->work_limit as the tx budget in i40e_clean_tx_irq(),
>   which is set to I40E_DEFAULT_IRQ_WORK. Added in commit
>   a619afe814453 ("i40e/i40evf: Add support for bulk free in Tx cleanup")
>
> * ixgbe uses q_vector->tx.work_limit as the tx budget in ixgbe_clean_tx_irq(),
>   which is set to IXGBE_DEFAULT_TX_WORK. Added in commit
>   592245559e900 ("ixgbe: Change default Tx work limit size to 256 buffers")
>
>> 
>> If I had to guess I'd say that the problem is that the queue loop
>> keeps sampling the head and tail pointers, whereas it should just
>> do that _once_ and only process the TX entries found in that
>> snapshot and return to the poll() routine immediately afterwards.
>
> The only way to know the tail pointer at the time napi is scheduled is to
> read the hw fetch_index register. This is discouraged by the hw engineers.
>
> We work around this by using a color bit. Every cq entry has a color bit; it
> is either 0 or 1. Hw flips the bit when it creates a new cq entry, so every
> new cq entry will have a different color bit than the previous one. We reach
> the end of the queue when the previous color bit is the same as the current
> cq entry's color, i.e. hw did not flip the bit, so it is not a new cq entry.
>
> So the enic driver cannot know the tail pointer at the time napi is
> scheduled, until we reach the tail pointer.
>

David,

How would you want us to fix this issue? Is doing an ioread on fetch_index for
every poll our only option (to get the head and tail pointer once)?
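
Purely as a sketch of that option, this is roughly what I understand the
fetch_index-snapshot variant to look like. The struct, field, and helper
names below are placeholders, not the real enic/vnic code; ioread32() is the
only kernel API assumed:

#include <linux/io.h>
#include <linux/types.h>

struct wq_sketch {
	void __iomem *fetch_index_reg;	/* hw fetch_index register */
	unsigned int to_clean;		/* sw completion pointer */
	unsigned int ring_size;		/* assumed power of two */
};

/* Sample fetch_index once, clean only up to that snapshot, then return
 * to the poll() routine. */
static unsigned int wq_clean_snapshot(struct wq_sketch *wq)
{
	u32 fetch_index = ioread32(wq->fetch_index_reg); /* one MMIO read per poll */
	unsigned int cleaned = 0;

	while (wq->to_clean != fetch_index) {
		/* free the tx buffer for wq->to_clean here */
		wq->to_clean = (wq->to_clean + 1) & (wq->ring_size - 1);
		cleaned++;
	}

	return cleaned;
}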

If 256 is not reasonable, will a wq_budget equal to the wq ring size be
acceptable? At any point, the number of wq entries to be cleaned cannot be
more than the ring size.
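
To make the question concrete, this is roughly the loop shape being discussed:
the color-bit walk described above, bounded by a budget that could be 256 or
the wq ring size. The struct, field, and macro names are illustrative only,
not the actual enic/vnic code:

#include <linux/types.h>

#define WQ_CLEAN_BUDGET 256	/* or the wq ring size, as proposed above */

struct cq_desc_sketch {
	u8 flags;			/* top bit: color */
	/* ... rest of the completion descriptor ... */
};

struct cq_sketch {
	struct cq_desc_sketch *descs;
	unsigned int to_clean;		/* next cq index to service */
	unsigned int ring_size;
	u8 last_color;			/* color of already-consumed entries */
};

static unsigned int cq_service_budgeted(struct cq_sketch *cq)
{
	unsigned int done = 0;

	while (done < WQ_CLEAN_BUDGET) {
		struct cq_desc_sketch *desc = &cq->descs[cq->to_clean];
		u8 color = desc->flags >> 7;

		if (color == cq->last_color)
			break;	/* hw did not flip the bit: not a new entry */

		/* service the completion (free the corresponding wq buffer) */

		done++;
		if (++cq->to_clean == cq->ring_size) {
			cq->to_clean = 0;
			cq->last_color = !cq->last_color; /* color flips each wrap */
		}
	}

	/* if done == WQ_CLEAN_BUDGET, napi re-polls instead of looping on */
	return done;
}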

Thanks
Govind

