Message-ID: <FCC0EC655BD1AE408C047268D1F5DF4C3BA85A83@NASANEXMB10.na.qualcomm.com>
Date: Thu, 30 Oct 2008 11:21:22 -0700
From: "Lovich, Vitali" <vlovich@...lcomm.com>
To: Johann Baudy <johaahn@...il.com>,
David Miller <davem@...emloft.net>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [PATCH] Packet socket: mmapped IO: PACKET_TX_RING
Hey Johann,
There's no need to keep the index (packet_index). Just store the pointer directly (change it to void *) - it saves an extra lookup. Also, I don't think setting TP_STATUS_COPY is necessary, since the user can't really do anything with that information. Simply stick with TP_STATUS_USER and TP_STATUS_KERNEL.
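Something along these lines is what I mean (sketch only - the struct and field names here are made up for illustration, not what's in your patch):

struct packet_skb_cb {
	void *tx_frame;		/* frame header pointer, instead of an index */
};

static void tpacket_destruct_skb(struct sk_buff *skb)
{
	struct packet_sock *po = pkt_sk(skb->sk);
	void *ph = ((struct packet_skb_cb *)skb->cb)->tx_frame;

	/* no ring lookup needed - the frame pointer is already at hand */
	__packet_set_status(po, ph, TP_STATUS_KERNEL);
}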
Also, when you set the status back to TP_STATUS_KERNEL in the destructor, you need
to add the following barriers:
__packet_set_status(po, ph, TP_STATUS_KERNEL);
smp_mb(); // make sure TP_STATUS_KERNEL is actually written to memory before the flush below - couldn't this actually be just an smp_wmb()?
flush_dcache_page(virt_to_page(ph)); // needed on non-x86 architectures like ARM that have a moronic cache (i.e. indexed by virtual rather than physical address); on x86 this is a no-op.
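The reason this matters is the user-space side that polls tp_status. Roughly something like this (illustrative only, assuming this patch's convention that TP_STATUS_KERNEL marks a frame the application may refill; the helper is mine, not from the patch):

#include <unistd.h>
#include <linux/if_packet.h>

/* Spin until the kernel hands the frame at *idx back to the application. */
static struct tpacket_hdr *wait_for_free_frame(void *ring,
		unsigned int frame_nr, unsigned int frame_size,
		unsigned int *idx)
{
	for (;;) {
		struct tpacket_hdr *hdr = (struct tpacket_hdr *)
			((char *)ring + *idx * frame_size);

		if (hdr->tp_status == TP_STATUS_KERNEL) {
			*idx = (*idx + 1) % frame_nr;
			return hdr;	/* free to fill with the next packet */
		}
		usleep(100);	/* frame still owned by the kernel */
	}
}

Without the barrier and the dcache flush on the kernel side, a loop like that can keep seeing the stale status on the architectures mentioned above.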
Also, I looked at your code while working on my version, and I think there may be some logical problems with the way you're building packets. I'll compare the two within the next few days as I start cleaning up my code.
The atomic_inc of pending_skb should happen after the skb has been allocated - the atomic_dec in out_status can then be removed. In fact, out_status can go away completely if you get rid of TP_STATUS_COPY. If you keep it, you still need the barriers above after changing tp_status, or the user may not see the change. Also, you've got a potential threading issue - you're not protecting the frame index with the spinlock.
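Roughly the ordering I have in mind (again just a sketch - the lock, helper, and field names are placeholders for whatever your patch actually uses; only pending_skb is from your code):

spin_lock_bh(&po->tx_lock);		/* placeholder: whatever lock guards the ring */
ph = packet_current_tx_frame(po);	/* grab the next frame...                     */
po->tx_head = (po->tx_head + 1) % po->frame_max;	/* ...and advance the index under the lock */
spin_unlock_bh(&po->tx_lock);

skb = sock_alloc_send_skb(sk, size, 0, &err);
if (skb == NULL)
	goto out;			/* counter untouched, nothing to undo */

atomic_inc(&po->pending_skb);		/* only bump once the allocation has succeeded */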
As a benchmark on a 10G card (without any optimizations like using syspart and dedicating a CPU for tx), I was able to hit 8.6 Gbit/s using a dedicated kernel thread for the transmission. With the dedicated CPU, I'm confident the line rate will go up significantly. I'll try to test your changes within the next few days. For comparison, tcpdump maxes out at around 1.5 Gbit/s.
As for CPU usage, there's a noticeable advantage over traditional send. Using tcpreplay, there's about 80% CPU utilization when sending; using the TX ring, it's maybe 10-15%.
I'm going to have latency numbers soon as well (i.e. how much jitter is introduced by the kernel).
Vitali