[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <d07c513c-a12a-4163-a582-a3d88bbbf4a3@intel.com>
Date: Wed, 11 Dec 2024 09:33:12 -0700
From: Ahmed Zaki <ahmed.zaki@...el.com>
To: Joe Damato <jdamato@...tly.com>, <netdev@...r.kernel.org>,
<intel-wired-lan@...ts.osuosl.org>, <andrew+netdev@...n.ch>,
<edumazet@...gle.com>, <kuba@...nel.org>, <pabeni@...hat.com>,
<davem@...emloft.net>, <michael.chan@...adcom.com>, <tariqt@...dia.com>,
<anthony.l.nguyen@...el.com>, <przemyslaw.kitszel@...el.com>
Subject: Re: [PATCH v1 net-next 2/6] net: napi: add CPU affinity to
napi->config
On 2024-12-09 6:29 p.m., Joe Damato wrote:
> On Mon, Dec 09, 2024 at 05:26:22PM -0700, Ahmed Zaki wrote:
>> A common task for most drivers is to remember the user's CPU affinity to
>> its IRQs. On each netdev reset, the driver must then re-assign the
>> user's setting to the IRQs.
>>
>> Add CPU affinity mask to napi->config. To delegate the CPU affinity
>> management to the core, drivers must:
>> 1 - add a persistent napi config: netif_napi_add_config()
>> 2 - bind an IRQ to the napi instance: netif_napi_set_irq()
>>
>> the core will then make sure to use re-assign affinity to the napi's
>> IRQ.
>>
>> The default mask set to all IRQs is all online CPUs.
>>
>> Suggested-by: Jakub Kicinski <kuba@...nel.org>
>> Signed-off-by: Ahmed Zaki <ahmed.zaki@...el.com>
>> ---
>
> [...]
>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 6ef9eb401fb2..778ba27d2b83 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -6699,11 +6699,35 @@ void netif_queue_set_napi(struct net_device *dev, unsigned int queue_index,
>> }
>> EXPORT_SYMBOL(netif_queue_set_napi);
>>
>> +static void
>> +netif_napi_affinity_notify(struct irq_affinity_notify *notify,
>> + const cpumask_t *mask)
>> +{
>> + struct napi_struct *napi =
>> + container_of(notify, struct napi_struct, affinity_notify);
>> +
>> + if (napi->config)
>> + cpumask_copy(&napi->config->affinity_mask, mask);
>> +}
>> +
>> +static void
>> +netif_napi_affinity_release(struct kref __always_unused *ref)
>> +{
>> +}
>> +
>> static void napi_restore_config(struct napi_struct *n)
>> {
>> n->defer_hard_irqs = n->config->defer_hard_irqs;
>> n->gro_flush_timeout = n->config->gro_flush_timeout;
>> n->irq_suspend_timeout = n->config->irq_suspend_timeout;
>> +
>> + if (n->irq > 0 && n->irq_flags & NAPIF_F_IRQ_AFFINITY) {
>> + n->affinity_notify.notify = netif_napi_affinity_notify;
>> + n->affinity_notify.release = netif_napi_affinity_release;
>> + irq_set_affinity_notifier(n->irq, &n->affinity_notify);
>> + irq_set_affinity(n->irq, &n->config->affinity_mask);
>> + }
>> +
>> /* a NAPI ID might be stored in the config, if so use it. if not, use
>> * napi_hash_add to generate one for us. It will be saved to the config
>> * in napi_disable.
>> @@ -6720,6 +6744,8 @@ static void napi_save_config(struct napi_struct *n)
>> n->config->gro_flush_timeout = n->gro_flush_timeout;
>> n->config->irq_suspend_timeout = n->irq_suspend_timeout;
>> n->config->napi_id = n->napi_id;
>> + if (n->irq > 0 && n->irq_flags & NAPIF_F_IRQ_AFFINITY)
>> + irq_set_affinity_notifier(n->irq, NULL);
>
> My understanding when I attempted this was that using generic IRQ
> notifiers breaks ARFS [1], because IRQ notifiers only support a
> single notifier and so drivers with ARFS can't _also_ set their own
> notifiers for that.
Thank you for the review and reply. I was wondering why some drivers
check for ARFS (in buggy ways) before setting affinity notifiers. I now
have a better understanding.
>
> Two ideas were proposed in the thread I mentioned:
> 1. Have multiple notifiers per IRQ so that having a generic core
> based notifier wouldn't break ARFS.
> 2. Jakub mentioned calling cpu_rmap_update from the core so that a
> generic solution wouldn't be blocked.
>
> I don't know anything about option 1, so I looked at option 2.
>
> At the time when I read the code, it seemed that cpu_rmap_update
> required some state be passed in (struct irq_glue), so in that case,
> the only way to call cpu_rmap_update from the core would be to
> maintain some state about ARFS in the core, too, so that drivers
> which support ARFS won't be broken by this change.
>
> At that time there was no persistent per-NAPI config, but since
> there is now, there might be a way to solve this.
>
> Just guessing here, but maybe one way to solve this would be to move
> ARFS into the core by:
> - Adding a new bit in addition to NAPIF_F_IRQ_AFFINITY... I don't
> know NAPIF_F_ARFS_AFFINITY or something? so that drivers
> could express that they support ARFS.
> - Remove the driver calls to irq_cpu_rmap_add and make sure to
> pass the new bit in for drivers that support ARFS (in your
> changeset, I believe that would be at least ice, mlx4, and
> bnxt... possibly more?).
> - In the generic core code, if the ARFS bit is set then you pass
> in the state needed for ARFS to work, otherwise do what the
> proposed code is doing now.
>
> But, that's just a guess. Maybe there's a better way.
I will look into all of this and send a new version, but yes it is clear
that the core needs to manage ARFS rmap creation and updates beside the
affinity restoration.
Ahmed
Powered by blists - more mailing lists