Date:	Tue, 23 Sep 2008 14:10:21 -0700
From:	"Tom Herbert" <therbert@...gle.com>
To:	"Chris Friesen" <cfriesen@...tel.com>
Cc:	"David Miller" <davem@...emloft.net>, linux-kernel@...r.kernel.org,
	netdev@...r.kernel.org, jens.axboe@...cle.com,
	steffen.klassert@...unet.com
Subject: Re: [PATCH 0/2]: Remote softirq invocation infrastructure.

>
> That patch basically just picks an arbitrary cpu for each flow.  This would
> spread the load out across cpus, but it doesn't allow any input from
> userspace.
>

We've been running softRSS for a while
(http://marc.info/?l=linux-netdev&m=120475045519940&w=2), which I
believe provides very similar functionality to this patch.  From that
work we found some ways to improve scaling that might be applicable:

- When routing packets to a CPU based on the hash, sending to a CPU
that shares an L2 or L3 cache with the receiving CPU gives the best
performance.
- We added simple functionality to route packets to the CPU on which
the application last did a read on the socket.  This seems to be a
win for cache locality.
- We added a lookup table that maps the Toeplitz hash to the CPU
where the receiving application is running (a rough sketch follows
this list).  This is for devices that provide the Toeplitz hash in
the receive descriptor, and it is a win since the CPU taking the
interrupt never has to take cache misses on the packet itself to
make the steering decision.
- In our (preliminary) 10G testing we found that routing packets in
software with the above trick actually allows higher PPS and better
CPU utilization than using hardware RSS alone, and that combining
the software routing with hardware RSS yields the best results.
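
To illustrate the lookup-table idea, here is a minimal userspace
model.  All names in it (rx_cpu_map, record_app_cpu,
steer_cpu_for_hash) are invented for the sketch and are not the
actual softRSS code; the real table would live in the kernel, and
error handling is omitted:

/*
 * Toy model of a Toeplitz-hash -> CPU steering table.  This only
 * shows the data flow, not the kernel implementation.
 */
#include <stdint.h>
#include <stdio.h>

#define RX_CPU_MAP_SIZE 128	/* power of two, one byte per bucket */

static uint8_t rx_cpu_map[RX_CPU_MAP_SIZE];

/* Socket read path: remember which CPU the application last read
 * this flow on, keyed by the Toeplitz hash the NIC reported in the
 * receive descriptor. */
static void record_app_cpu(uint32_t hash, unsigned int cpu)
{
	rx_cpu_map[hash & (RX_CPU_MAP_SIZE - 1)] = (uint8_t)cpu;
}

/* Receive path: pick the target CPU from the hash alone, so the
 * interrupted CPU never touches (and never cache-misses on) the
 * packet payload to make the steering decision. */
static unsigned int steer_cpu_for_hash(uint32_t hash)
{
	return rx_cpu_map[hash & (RX_CPU_MAP_SIZE - 1)];
}

int main(void)
{
	record_app_cpu(0xdeadbeef, 5);	/* app read this flow on CPU 5 */
	printf("steer to CPU %u\n", steer_cpu_for_hash(0xdeadbeef));
	return 0;
}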

Tom

> We currently have an application with 16 threads running on 16 cores.  They
> would really like to be able to pin one thread to each core and tell the
> kernel which packets they're interested in, so that the kernel can process
> those packets on that core to gain the maximum caching benefit and to
> reduce reordering issues.  In our case the hardware supports multiqueue
> filtering, so we could pass this information down to the hardware and avoid
> software filtering.
>
> Either way, it requires some way for userspace to indicate interest in a
> particular flow.  Has anyone given any thought to what an API like this
> would look like?
>
> I suppose we could automatically look at bound network sockets owned by
> tasks that are affined to single cpus.  This would simplify userspace but
> would reduce flexibility for things like packet sockets with socket filters
> applied.
>
> Chris
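
To make the kind of hint described above concrete, here is a minimal
userspace sketch of the 16-thread setup: one thread pinned per core,
each binding its own socket.  Everything below uses standard pthread
and socket APIs; the kernel-side piece that would consult the
single-CPU affinity mask and the bound socket to steer the flow is
the hypothetical part, and the per-core port numbering is invented
for the example (error handling omitted):

#define _GNU_SOURCE		/* for pthread_setaffinity_np() */
#include <pthread.h>
#include <sched.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#define NCORES 16

static void *worker(void *arg)
{
	unsigned int cpu = (unsigned int)(long)arg;
	cpu_set_t mask;

	/* Pin this thread to exactly one core. */
	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);
	pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask);

	/* Bind a socket on a per-core port.  A kernel that inspects
	 * bound sockets owned by single-CPU tasks could steer this
	 * flow's packets to the same core. */
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	struct sockaddr_in addr;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(9000 + cpu);
	bind(fd, (struct sockaddr *)&addr, sizeof(addr));

	/* ... per-core receive loop would go here ... */
	close(fd);
	return NULL;
}

int main(void)
{
	pthread_t threads[NCORES];

	for (long cpu = 0; cpu < NCORES; cpu++)
		pthread_create(&threads[cpu], NULL, worker, (void *)cpu);
	for (int i = 0; i < NCORES; i++)
		pthread_join(threads[i], NULL);
	return 0;
}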
