lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 18 Mar 2009 22:58:22 -0700 (PDT)
From:	David Miller <davem@...emloft.net>
To:	dada1@...mosbay.com
Cc:	sven@...bigcorporation.com, ghaskins@...ell.com, vernux@...ibm.com,
	andi@...stfloor.org, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-rt-users@...r.kernel.org,
	pmullaney@...ell.com
Subject: Re: High contention on the sk_buff_head.lock

From: Eric Dumazet <dada1@...mosbay.com>
Date: Thu, 19 Mar 2009 06:49:19 +0100

> Still, there is room for improvements, since :
> 
> 1) default qdisc is pfifo_fast. This beast uses three sk_buff_head (96 bytes)
>   where it could use 3 smaller list_head (3 * 16 = 48 bytes on x86_64)
> 
>  (assuming sizeof(spinlock_t) is only 4 bytes, but it's more than that
>  on various situations (LOCKDEP, ...)

I already plan on doing this, skb->{next,prev} will be replaced with a
list_head and nearly all of the sk_buff_head usage will simply
disappear.  It's a lot of work because every piece of SKB queue
handling code has to be sanitized to only use the interfaces in
linux/skbuff.h and lots of extremely ugly code like the PPP
defragmenter make many non-trivial direct skb->{next,prev}
manipulations.

> 2) struct Qdisc layout could be better, letting read mostly fields
>    at beginning of structure. (ie move 'dev_queue', 'next_sched', reshape_fail,
>    u32_node, __parent, ...)

I have no problem with your struct layout changes, submit it formally.

> 3) In stress situation a CPU A queues a skb to a sk_buff_head, but a CPU B
>    dequeues it to feed device, involving an expensive cache line miss
>    on the skb.{next|prev} (to set them to NULL)
> 
>    We could:
>       Use a special dequeue op that doesnt touch skb.{next|prev}
>    Eventually set next/prev to NULL after q.lock is released

You absolutely can't do this, as it would break GSO/GRO.  The whole
transmit path is littered with checks of skb->next being NULL for the
purposes of segmentation handling.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ