[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <47003267-35b5-446e-aaca-f775b71bd01f@redhat.com>
Date: Fri, 15 Mar 2024 10:19:55 +0100
From: Ivan Vecera <ivecera@...hat.com>
To: Jesse Brandeburg <jesse.brandeburg@...el.com>, netdev@...r.kernel.org
Cc: pawel.chmielewski@...el.com, aleksandr.loktionov@...el.com,
mschmidt@...hat.com, Hugo Ferreira <hferreir@...hat.com>,
Tony Nguyen <anthony.l.nguyen@...el.com>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
"moderated list:INTEL ETHERNET DRIVERS" <intel-wired-lan@...ts.osuosl.org>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net] i40e: Enforce software interrupt during busy-poll
exit
On 15. 03. 24 1:53, Jesse Brandeburg wrote:
> On 3/13/2024 5:54 AM, Ivan Vecera wrote:
>> As for ice bug fixed by commit b7306b42beaf ("ice: manage interrupts
>> during poll exit") I'm seeing the similar issue also with i40e driver.
>>
>> In certain situation when busy-loop is enabled together with adaptive
>> coalescing, the driver occasionally miss that there are outstanding
>> descriptors to clean when exiting busy poll.
>>
>> Try to catch the remaining work by triggering a software interrupt
>> when exiting busy poll. No extra interrupts will be generated when
>> busy polling is not used.
>>
>> The issue was found when running sockperf ping-pong tcp test with
>> adaptive coalescing and busy poll enabled (50 as value busy_pool
>> and busy_read sysctl knobs) and results in huge latency spikes
>> with more than 100000us.
>
> I like the results of this fix! Thanks for working on it.
>
>>
>> The fix is inspired from the ice driver and do the following:
>> 1) During napi poll exit in case of busy-poll (napo_complete_done()
>> returns false) this is recorded to q_vector that we were in busy
>> loop.
>> 2) In i40e_update_enable_itr()
>> - updates refreshed ITR intervals directly using PFINT_ITRN register
>> - if we are exiting ordinary poll then just enables the interrupt
>> using PFINT_DYN_CTLN
>> - if we are exiting busy poll then enables the interrupt and
>> additionally triggers an immediate software interrupt to catch any
>> pending clean-ups
>> 3) Reuses unused 3rd ITR (interrupt throttle) index and set it to
>> 20K interrupts per second to limit the number of these sw interrupts.
>
> This is a good idea.
>
>>
>> @@ -2702,8 +2716,8 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
>> */
>> if (q_vector->rx.target_itr < q_vector->rx.current_itr) {
>> /* Rx ITR needs to be reduced, this is highest priority */
>> - intval = i40e_buildreg_itr(I40E_RX_ITR,
>> - q_vector->rx.target_itr);
>> + wr32(hw, I40E_PFINT_ITRN(I40E_RX_ITR, q_vector->reg_idx),
>> + q_vector->rx.target_itr >> 1);
>
> so here you write (this is a new write)
>
>> q_vector->rx.current_itr = q_vector->rx.target_itr;
>> q_vector->itr_countdown = ITR_COUNTDOWN_START;
>> } else if ((q_vector->tx.target_itr < q_vector->tx.current_itr) ||
>> @@ -2712,25 +2726,33 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
>> /* Tx ITR needs to be reduced, this is second priority
>> * Tx ITR needs to be increased more than Rx, fourth priority
>> */
>> - intval = i40e_buildreg_itr(I40E_TX_ITR,
>> - q_vector->tx.target_itr);
>> + wr32(hw, I40E_PFINT_ITRN(I40E_TX_ITR, q_vector->reg_idx),
>> + q_vector->tx.target_itr >> 1);
>> q_vector->tx.current_itr = q_vector->tx.target_itr;
>> q_vector->itr_countdown = ITR_COUNTDOWN_START;
>> } else if (q_vector->rx.current_itr != q_vector->rx.target_itr) {
>> /* Rx ITR needs to be increased, third priority */
>> - intval = i40e_buildreg_itr(I40E_RX_ITR,
>> - q_vector->rx.target_itr);
>> + wr32(hw, I40E_PFINT_ITRN(I40E_RX_ITR, q_vector->reg_idx),
>> + q_vector->rx.target_itr >> 1);
>
> or here (new write)
>
>> q_vector->rx.current_itr = q_vector->rx.target_itr;
>> q_vector->itr_countdown = ITR_COUNTDOWN_START;
>> } else {
>> /* No ITR update, lowest priority */
>> - intval = i40e_buildreg_itr(I40E_ITR_NONE, 0);
>> if (q_vector->itr_countdown)
>> q_vector->itr_countdown--;
>> }
>>
>> - if (!test_bit(__I40E_VSI_DOWN, vsi->state))
>> - wr32(hw, INTREG(q_vector->reg_idx), intval);
>
> The above used to be the *only* write.
>
>> + /* Do not enable interrupt if VSI is down */
>> + if (test_bit(__I40E_VSI_DOWN, vsi->state))
>> + return;
>> +
>> + if (!q_vector->in_busy_poll) {
>> + intval = i40e_buildreg_itr(I40E_ITR_NONE, 0);
>> + } else {
>> + q_vector->in_busy_poll = false;
>> + intval = i40e_buildreg_swint(I40E_SW_ITR);
>> + }
>> + wr32(hw, I40E_PFINT_DYN_CTLN(q_vector->reg_idx), intval);
>
> and then you write again here.
>
> So this function will now regularly have two writes in hot-path. Before
> it was very carefully crafted to reduce the number of writes to 1.
>
> This is made possible because the PFINT_DYN_CTLN register can do
> multiple tasks at once with a single write.
>
> Can you just modify intval to *both* trigger a software interrupt, and
> update the ITR simultaneously? I'm really not sure that's even possible.
>
> It may make more sense to only do the second write when exiting busy
> poll, what do you think?
Yeah, you are right, we can eliminate these two writes by one and also
for busy-poll exit. I'm setting up ITR2_IDX rate during MSI-X
initialization and as this is fixed we do not need to update it
everytime in i40e_update_enable_itr().
Per datasheet the PFINT_DYN_CTLN value can be encoded to do the
following at once:
- enable interrupt
- update interval for particular ITR index
- software interrupt trigger limited by interval of different ITR index
Will prepare, test and submit v3 with this change.
Thanks,
Ivan
Powered by blists - more mailing lists