Date:	Mon, 26 May 2014 15:01:44 +0300
From:	Amir Vadai <amirv@...lanox.com>
To:	Thomas Gleixner <tglx@...utronix.de>
CC:	Ben Hutchings <ben@...adent.org.uk>,
	"David S. Miller" <davem@...emloft.net>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Or Gerlitz <ogerlitz@...lanox.com>, <idos@...lanox.com>,
	Yevgeny Petrilin <yevgenyp@...lanox.com>
Subject: Re: Extend irq_set_affinity_notifier() to use a call chain

On 5/26/2014 2:34 PM, Thomas Gleixner wrote:
> On Mon, 26 May 2014, Amir Vadai wrote:
>
>> On 5/26/2014 2:15 PM, Thomas Gleixner wrote:
>>> On Sun, 25 May 2014, Amir Vadai wrote:
>>>> In order to do that, I need to add a new irq affinity notification
>>>> callback (In addition to the existing cpu_rmap notification). For
>>>> that I would like to extend irq_set_affinity_notifier() to have a
>>>> notifier call-chain instead of a single notifier callback.
>>>
>>> Why? "I would like" is a non argument.
>>
>> Current implementation enables only one callback to be registered for irq
>> affinity change notifications.
>
> I'm well aware of that.
>
>> cpu_rmap is registered be notified - for RFS purposes.  mlx4_en (and
>> probably other network drivers) needs to be notified too, in order
>> to stop the napi polling on the old cpu and move to the new one.  To
>> enable more than 1 notification callbacks, I suggest to use a
>> notifier call chain.
>
> You are not describing what needs to be notified and why. Please
> explain the details of that and how the RFS (whatever that is) and the
> network driver are connected

The goal of RFS (Receive Flow Steering) is to increase the data-cache
hit rate by steering kernel processing of packets in multi-queue devices
to the CPU where the application thread consuming the packet is running.

In order to select the right queue, the networking stack needs a
reverse map of IRQ affinity. This is the rmap that was added by Ben
Hutchings [1]. To keep the rmap up to date, cpu_rmap registers for the
affinity notification.
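
As a rough user-space illustration of what such a reverse map does, here
is a simplified model. It is not the lib/cpu_rmap.c code: the names
build_rmap and on_affinity_change are made up for this sketch, and any
fallback for CPUs that have no queue of their own is left out.

```python
# Simplified model of an IRQ-affinity reverse map: CPU -> RX queue.
# All names here are hypothetical; this is not the kernel's cpu_rmap API.

def build_rmap(irq_affinity):
    """irq_affinity maps queue index -> set of CPUs its IRQ may fire on.
    Returns cpu -> queue index, keeping the lowest queue on conflicts."""
    rmap = {}
    for queue in sorted(irq_affinity):
        for cpu in irq_affinity[queue]:
            rmap.setdefault(cpu, queue)
    return rmap


def on_affinity_change(irq_affinity, queue, new_cpus):
    """Affinity-notifier stand-in: record the new mask and rebuild the map."""
    irq_affinity[queue] = set(new_cpus)
    return build_rmap(irq_affinity)


affinity = {0: {0}, 1: {1}, 2: {2}}
rmap = build_rmap(affinity)
# An application running on CPU 2 is served by queue 2.
queue_for_cpu2 = rmap[2]

# The user moves queue 2's IRQ to CPU 3. Without the callback the map
# would still point CPU 2 at queue 2 and know nothing about CPU 3.
rmap = on_affinity_change(affinity, 2, {3})
queue_for_cpu3 = rmap[3]
```

The point of the model is only that the map is derived from the current
affinity masks, so it has to be rebuilt whenever a mask changes.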

This is the first affinity callback. It lives in a general library and
not under net/...

The motivation for the second irq affinity callback is this:
When traffic starts, the first packet fires an interrupt, which starts
napi polling on the cpu selected by the irq affinity.
As long as there are packets for napi polling to consume, no further
interrupts fire, and napi consumes all the packets on the cpu where it
was started.
If the user changes the irq affinity, napi polling continues on the
original cpu.
Only when traffic pauses does the napi session finish, and when traffic
resumes, the new napi session runs on the new cpu.
This is problematic because, from the user's point of view, cpu
affinity can't be changed while traffic is flowing non-stop.

To solve this, the network driver should be notified of the irq
affinity change and restart the napi session. This could be done by
closing the napi session and arming the interrupts. The next packet to
arrive will trigger an interrupt and a new napi session will start, this
time on the new CPU.
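
The behavior described above can be sketched with a toy user-space
model. This is an illustration only: the Queue class and its methods are
hypothetical and do not correspond to any driver or napi code, and
"notify_driver" stands in for the proposed driver callback.

```python
# Toy model of NAPI polling under continuous traffic. Hypothetical names;
# this only illustrates the scheduling behavior, not real driver code.

class Queue:
    def __init__(self, irq_cpu):
        self.irq_cpu = irq_cpu      # CPU the IRQ is currently affine to
        self.poll_cpu = None        # CPU running the NAPI session, if any

    def packet_arrives(self):
        """Return the CPU that processes this packet."""
        if self.poll_cpu is None:
            # Interrupts are armed: the IRQ fires on irq_cpu and starts polling.
            self.poll_cpu = self.irq_cpu
        return self.poll_cpu

    def traffic_pause(self):
        # Ring drained: the NAPI session ends and the interrupt is re-armed.
        self.poll_cpu = None

    def set_affinity(self, cpu, notify_driver):
        self.irq_cpu = cpu
        if notify_driver:
            # Proposed driver callback: close the session, re-arm the IRQ.
            self.poll_cpu = None


def run(notify_driver):
    """Three packets, an affinity change from CPU 0 to 1, three more packets."""
    q = Queue(irq_cpu=0)
    cpus = [q.packet_arrives() for _ in range(3)]
    q.set_affinity(1, notify_driver)
    cpus += [q.packet_arrives() for _ in range(3)]
    return cpus


without_notify = run(notify_driver=False)
with_notify = run(notify_driver=True)
```

In this model, without the callback every packet stays on CPU 0 because
the session never ends; with it, the packets after the affinity change
are handled on CPU 1.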

> and why this notification cannot be
> propagated inside the network stack itself.

To my understanding, these are two different consumers of the same
event: one is a general library that maintains a reverse irq affinity
map, and the other is networking specific, and maybe even specific to a
single networking driver.
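
To make the limitation concrete, here is a small user-space model of a
single notifier slot next to a call chain. The class and method names
are hypothetical, and the model assumes that registering a second
callback in the single-slot case simply replaces the first; it is a
sketch of the idea, not of the genirq code or of a proposed patch.

```python
# Model of one notifier slot per IRQ versus a call chain.
# Hypothetical names; assumes a second registration replaces the first.

class SingleSlotIrq:
    def __init__(self):
        self.notify = None

    def set_affinity_notifier(self, callback):
        # Only one callback fits: a new registration overwrites the old one.
        self.notify = callback

    def affinity_changed(self, cpu):
        if self.notify is not None:
            self.notify(cpu)


class ChainIrq:
    def __init__(self):
        self.chain = []

    def register_affinity_notifier(self, callback):
        self.chain.append(callback)

    def affinity_changed(self, cpu):
        # Every registered consumer hears about the change, in order.
        for callback in self.chain:
            callback(cpu)


def notified_consumers(irq, register):
    """Register the rmap and the driver, change affinity, report who was called."""
    seen = []
    register(lambda cpu: seen.append(("cpu_rmap", cpu)))
    register(lambda cpu: seen.append(("driver", cpu)))
    irq.affinity_changed(1)
    return seen


single = SingleSlotIrq()
single_seen = notified_consumers(single, single.set_affinity_notifier)

chain = ChainIrq()
chain_seen = notified_consumers(chain, chain.register_affinity_notifier)
```

With the single slot only the last registered consumer is called, so the
rmap goes stale once the driver registers; with the chain both are
called.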

[1] - c39649c lib: cpu_rmap: CPU affinity reverse-mapping

Thanks,
Amir

>
> notifier chains are almost always a clear sign for a design disaster
> and I'm not going to even think about it before I do not have a
> concice explanation of the problem at hand and why a notifier chain is
> a good solution.
>
> Thanks,
>
> 	tglx
>
>

