netdev - Re: [PATCH v1 net-next 2/6] net: napi: add CPU affinity to napi->config

Message-ID: <d07c513c-a12a-4163-a582-a3d88bbbf4a3@intel.com>
Date: Wed, 11 Dec 2024 09:33:12 -0700
From: Ahmed Zaki <ahmed.zaki@...el.com>
To: Joe Damato <jdamato@...tly.com>, <netdev@...r.kernel.org>,
	<intel-wired-lan@...ts.osuosl.org>, <andrew+netdev@...n.ch>,
	<edumazet@...gle.com>, <kuba@...nel.org>, <pabeni@...hat.com>,
	<davem@...emloft.net>, <michael.chan@...adcom.com>, <tariqt@...dia.com>,
	<anthony.l.nguyen@...el.com>, <przemyslaw.kitszel@...el.com>
Subject: Re: [PATCH v1 net-next 2/6] net: napi: add CPU affinity to
 napi->config



On 2024-12-09 6:29 p.m., Joe Damato wrote:
> On Mon, Dec 09, 2024 at 05:26:22PM -0700, Ahmed Zaki wrote:
>> A common task for most drivers is to remember the user's CPU affinity to
>> its IRQs. On each netdev reset, the driver must then re-assign the
>> user's setting to the IRQs.
>>
>> Add CPU affinity mask to napi->config. To delegate the CPU affinity
>> management to the core, drivers must:
>>   1 - add a persistent napi config:     netif_napi_add_config()
>>   2 - bind an IRQ to the napi instance: netif_napi_set_irq()
>>
>> The core will then make sure to re-assign the affinity to the napi's
>> IRQ.
>>
>> The default mask for all IRQs is all online CPUs.
>>
>> Suggested-by: Jakub Kicinski <kuba@...nel.org>
>> Signed-off-by: Ahmed Zaki <ahmed.zaki@...el.com>
>> ---
> 
> [...]
> 
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 6ef9eb401fb2..778ba27d2b83 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -6699,11 +6699,35 @@ void netif_queue_set_napi(struct net_device *dev, unsigned int queue_index,
>>   }
>>   EXPORT_SYMBOL(netif_queue_set_napi);
>>   
>> +static void
>> +netif_napi_affinity_notify(struct irq_affinity_notify *notify,
>> +			   const cpumask_t *mask)
>> +{
>> +	struct napi_struct *napi =
>> +		container_of(notify, struct napi_struct, affinity_notify);
>> +
>> +	if (napi->config)
>> +		cpumask_copy(&napi->config->affinity_mask, mask);
>> +}
>> +
>> +static void
>> +netif_napi_affinity_release(struct kref __always_unused *ref)
>> +{
>> +}
>> +
>>   static void napi_restore_config(struct napi_struct *n)
>>   {
>>   	n->defer_hard_irqs = n->config->defer_hard_irqs;
>>   	n->gro_flush_timeout = n->config->gro_flush_timeout;
>>   	n->irq_suspend_timeout = n->config->irq_suspend_timeout;
>> +
>> +	if (n->irq > 0 && n->irq_flags & NAPIF_F_IRQ_AFFINITY) {
>> +		n->affinity_notify.notify = netif_napi_affinity_notify;
>> +		n->affinity_notify.release = netif_napi_affinity_release;
>> +		irq_set_affinity_notifier(n->irq, &n->affinity_notify);
>> +		irq_set_affinity(n->irq, &n->config->affinity_mask);
>> +	}
>> +
>>   	/* a NAPI ID might be stored in the config, if so use it. if not, use
>>   	 * napi_hash_add to generate one for us. It will be saved to the config
>>   	 * in napi_disable.
>> @@ -6720,6 +6744,8 @@ static void napi_save_config(struct napi_struct *n)
>>   	n->config->gro_flush_timeout = n->gro_flush_timeout;
>>   	n->config->irq_suspend_timeout = n->irq_suspend_timeout;
>>   	n->config->napi_id = n->napi_id;
>> +	if (n->irq > 0 && n->irq_flags & NAPIF_F_IRQ_AFFINITY)
>> +		irq_set_affinity_notifier(n->irq, NULL);
> 
> My understanding when I attempted this was that using generic IRQ
> notifiers breaks ARFS [1], because IRQ notifiers only support a
> single notifier and so drivers with ARFS can't _also_ set their own
> notifiers for that.

Thank you for the review and reply. I was wondering why some drivers 
check for ARFS (in buggy ways) before setting affinity notifiers. I now 
have a better understanding.
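
For readers following the thread, the single-notifier limitation can be illustrated with a minimal user-space sketch. This is only a model under stated assumptions: `struct irq_model`, `struct notifier`, `model_set_notifier()`, `model_set_affinity()` and `record_mask()` are hypothetical names invented for illustration and are not kernel APIs. The one behaviour it mirrors is that an IRQ holds a single affinity-notifier slot, so registering a second notifier displaces the first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these types exist in the kernel. */
struct notifier {
	void (*notify)(struct notifier *self, unsigned long mask);
	unsigned long seen_mask;	/* last mask delivered to this notifier */
};

struct irq_model {
	struct notifier *affinity_notify;	/* exactly one slot per IRQ */
};

/* Registering overwrites the slot; the previous notifier is returned. */
static struct notifier *
model_set_notifier(struct irq_model *irq, struct notifier *n)
{
	struct notifier *old;

	assert(irq != NULL);
	old = irq->affinity_notify;
	irq->affinity_notify = n;
	return old;
}

/* An affinity change is delivered only to the notifier currently in the slot. */
static void model_set_affinity(struct irq_model *irq, unsigned long mask)
{
	assert(irq != NULL);
	if (irq->affinity_notify)
		irq->affinity_notify->notify(irq->affinity_notify, mask);
}

/* Trivial callback that records the mask it was given. */
static void record_mask(struct notifier *self, unsigned long mask)
{
	self->seen_mask = mask;
}
```

In this model, if a driver first registers an "ARFS" notifier and the core then registers its own on the same IRQ, only the core's notifier sees later affinity changes. That is the conflict described in the quoted text above.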

> 
> Two ideas were proposed in the thread I mentioned:
>    1. Have multiple notifiers per IRQ so that having a generic core
>       based notifier wouldn't break ARFS.
>    2. Jakub mentioned calling cpu_rmap_update from the core so that a
>       generic solution wouldn't be blocked.
> 
> I don't know anything about option 1, so I looked at option 2.
> 
> At the time when I read the code, it seemed that cpu_rmap_update
> required some state be passed in (struct irq_glue), so in that case,
> the only way to call cpu_rmap_update from the core would be to
> maintain some state about ARFS in the core, too, so that drivers
> which support ARFS won't be broken by this change.
> 
> At that time there was no persistent per-NAPI config, but since
> there is now, there might be a way to solve this.
> 
> Just guessing here, but maybe one way to solve this would be to move
> ARFS into the core by:
>    - Adding a new bit in addition to NAPIF_F_IRQ_AFFINITY (I don't
>      know, NAPIF_F_ARFS_AFFINITY or something?) so that drivers
>      could express that they support ARFS.
>    - Remove the driver calls to irq_cpu_rmap_add and make sure to
>      pass the new bit in for drivers that support ARFS (in your
>      changeset, I believe that would be at least ice, mlx4, and
>      bnxt... possibly more?).
>    - In the generic core code, if the ARFS bit is set then you pass
>      in the state needed for ARFS to work, otherwise do what the
>      proposed code is doing now.
> 
> But, that's just a guess. Maybe there's a better way.

I will look into all of this and send a new version, but yes, it is clear
that the core needs to manage ARFS rmap creation and updates in addition
to the affinity restoration.
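
As a rough illustration of that direction (not the eventual patch), the sketch below models a single core-owned notifier that both saves the affinity mask and, when a driver opts in, updates ARFS state. Everything here is an assumption made for illustration: the `MODEL_F_*` flags, `struct model_napi`, `struct model_rmap` and `model_core_affinity_notify()` are hypothetical user-space stand-ins, and the rmap update is reduced to recording the mask rather than calling `cpu_rmap_update`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags; the thread only suggests that a second bit might exist. */
#define MODEL_F_IRQ_AFFINITY	(1u << 0)	/* core saves/restores affinity */
#define MODEL_F_ARFS_RMAP	(1u << 1)	/* core also maintains the rmap */

struct model_config {
	unsigned long affinity_mask;	/* persists across resets */
};

struct model_rmap {
	unsigned long last_mask;	/* stand-in for real rmap state */
	unsigned int updates;
};

struct model_napi {
	unsigned int flags;
	struct model_config *config;
	struct model_rmap *rmap;
};

/* One notifier owned by the core handles both duties, gated by flags. */
static void
model_core_affinity_notify(struct model_napi *napi, unsigned long mask)
{
	assert(napi != NULL);

	if ((napi->flags & MODEL_F_IRQ_AFFINITY) && napi->config)
		napi->config->affinity_mask = mask;

	if ((napi->flags & MODEL_F_ARFS_RMAP) && napi->rmap) {
		napi->rmap->last_mask = mask;
		napi->rmap->updates++;
	}
}
```

Because the core owns the only notifier in this model, drivers would no longer need to register their own for ARFS, which avoids the single-slot conflict.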

Ahmed