netdev - Re: [PATCH net] i40e: Enforce software interrupt during busy-poll exit

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0249d506-6ab2-485a-b95f-6e32e5a92d9e@intel.com>
Date: Thu, 14 Mar 2024 17:53:26 -0700
From: Jesse Brandeburg <jesse.brandeburg@...el.com>
To: Ivan Vecera <ivecera@...hat.com>, <netdev@...r.kernel.org>
CC: <pawel.chmielewski@...el.com>, <aleksandr.loktionov@...el.com>,
	<mschmidt@...hat.com>, Hugo Ferreira <hferreir@...hat.com>, Tony Nguyen
	<anthony.l.nguyen@...el.com>, "David S. Miller" <davem@...emloft.net>, "Eric
 Dumazet" <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni
	<pabeni@...hat.com>, "moderated list:INTEL ETHERNET DRIVERS"
	<intel-wired-lan@...ts.osuosl.org>, open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net] i40e: Enforce software interrupt during busy-poll
 exit

On 3/13/2024 5:54 AM, Ivan Vecera wrote:
> As for ice bug fixed by commit b7306b42beaf ("ice: manage interrupts
> during poll exit") I'm seeing the similar issue also with i40e driver.
> 
> In certain situation when busy-loop is enabled together with adaptive
> coalescing, the driver occasionally miss that there are outstanding
> descriptors to clean when exiting busy poll.
> 
> Try to catch the remaining work by triggering a software interrupt
> when exiting busy poll. No extra interrupts will be generated when
> busy polling is not used.
> 
> The issue was found when running sockperf ping-pong tcp test with
> adaptive coalescing and busy poll enabled (50 as value busy_pool
> and busy_read sysctl knobs) and results in huge latency spikes
> with more than 100000us.

I like the results of this fix! Thanks for working on it.

> 
> The fix is inspired from the ice driver and do the following:
> 1) During napi poll exit in case of busy-poll (napo_complete_done()
>    returns false) this is recorded to q_vector that we were in busy
>    loop.
> 2) In i40e_update_enable_itr()
>    - updates refreshed ITR intervals directly using PFINT_ITRN register
>    - if we are exiting ordinary poll then just enables the interrupt
>      using PFINT_DYN_CTLN
>    - if we are exiting busy poll then enables the interrupt and
>      additionally triggers an immediate software interrupt to catch any
>      pending clean-ups
> 3) Reuses unused 3rd ITR (interrupt throttle) index and set it to
>    20K interrupts per second to limit the number of these sw interrupts.

This is a good idea.

> 
> @@ -2702,8 +2716,8 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
>  	 */
>  	if (q_vector->rx.target_itr < q_vector->rx.current_itr) {
>  		/* Rx ITR needs to be reduced, this is highest priority */
> -		intval = i40e_buildreg_itr(I40E_RX_ITR,
> -					   q_vector->rx.target_itr);
> +		wr32(hw, I40E_PFINT_ITRN(I40E_RX_ITR, q_vector->reg_idx),
> +		     q_vector->rx.target_itr >> 1);

so here you write (this is a new write)

>  		q_vector->rx.current_itr = q_vector->rx.target_itr;
>  		q_vector->itr_countdown = ITR_COUNTDOWN_START;
>  	} else if ((q_vector->tx.target_itr < q_vector->tx.current_itr) ||
> @@ -2712,25 +2726,33 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
>  		/* Tx ITR needs to be reduced, this is second priority
>  		 * Tx ITR needs to be increased more than Rx, fourth priority
>  		 */
> -		intval = i40e_buildreg_itr(I40E_TX_ITR,
> -					   q_vector->tx.target_itr);
> +		wr32(hw, I40E_PFINT_ITRN(I40E_TX_ITR, q_vector->reg_idx),
> +		     q_vector->tx.target_itr >> 1);
>  		q_vector->tx.current_itr = q_vector->tx.target_itr;
>  		q_vector->itr_countdown = ITR_COUNTDOWN_START;
>  	} else if (q_vector->rx.current_itr != q_vector->rx.target_itr) {
>  		/* Rx ITR needs to be increased, third priority */
> -		intval = i40e_buildreg_itr(I40E_RX_ITR,
> -					   q_vector->rx.target_itr);
> +		wr32(hw, I40E_PFINT_ITRN(I40E_RX_ITR, q_vector->reg_idx),
> +		     q_vector->rx.target_itr >> 1);

or here (new write)

>  		q_vector->rx.current_itr = q_vector->rx.target_itr;
>  		q_vector->itr_countdown = ITR_COUNTDOWN_START;
>  	} else {
>  		/* No ITR update, lowest priority */
> -		intval = i40e_buildreg_itr(I40E_ITR_NONE, 0);
>  		if (q_vector->itr_countdown)
>  			q_vector->itr_countdown--;
>  	}
>  
> -	if (!test_bit(__I40E_VSI_DOWN, vsi->state))
> -		wr32(hw, INTREG(q_vector->reg_idx), intval);

The above used to be the *only* write.

> +	/* Do not enable interrupt if VSI is down */
> +	if (test_bit(__I40E_VSI_DOWN, vsi->state))
> +		return;
> +
> +	if (!q_vector->in_busy_poll) {
> +		intval = i40e_buildreg_itr(I40E_ITR_NONE, 0);
> +	} else {
> +		q_vector->in_busy_poll = false;
> +		intval = i40e_buildreg_swint(I40E_SW_ITR);
> +	}
> +	wr32(hw, I40E_PFINT_DYN_CTLN(q_vector->reg_idx), intval);

and then you write again here.

So this function will now regularly have two writes in hot-path. Before
it was very carefully crafted to reduce the number of writes to 1.

This is made possible because the PFINT_DYN_CTLN register can do
multiple tasks at once with a single write.

Can you just modify intval to *both* trigger a software interrupt, and
update the ITR simultaneously? I'm really not sure that's even possible.

It may make more sense to only do the second write when exiting busy
poll, what do you think?

If there is no way to get the functionality without the two writes
everytime I'd put up with it, but we can have a lot of interrupts per
second, per queue (especially with adaptive disabled and rate set to a
small number like 2us or 0) and doubling the number of writes will have
a performance effect.