lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 28 Oct 2008 14:28:21 -0700
From:	Stephen Hemminger <shemminger@...tta.com>
To:	Eric Dumazet <dada1@...mosbay.com>
Cc:	David Miller <davem@...emloft.net>, benny+usenet@...rsen.dk,
	minyard@....org, netdev@...r.kernel.org,
	paulmck@...ux.vnet.ibm.com, Christoph Lameter <clameter@....com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Evgeniy Polyakov <johnpol@....mipt.ru>
Subject: Re: [PATCH 0/2] udp: Convert the UDP hash lock to RCU

On Tue, 28 Oct 2008 21:37:15 +0100
Eric Dumazet <dada1@...mosbay.com> wrote:

> UDP sockets are hashed in a 128 slots hash table.
> 
> This hash table is protected by *one* rwlock.
> 
> This rwlock is readlocked each time an incoming UDP message is handled.
> 
> This rwlock is writelocked each time a socket must be inserted in
> hash table (bind time), or deleted from this table (unbind time)
> 
> This is not scalable on SMP machines :
> 
> 1) Even in read mode, lock() and unlock() are atomic operations and
> must dirty a contended cache line, shared by all cpus.
> 
> 2) A writer might be starved if many readers are 'in flight'. This can
> happen on a machine with some NIC receiving many UDP messages. User
> process can be delayed a long time at socket creation/dismantle time.
> 
> 
> What Corey and I propose is to use RCU to protect this hash table.
> 
> Goals are :
> 
> 1) Optimizing handling of incoming Unicast UDP frames, so that no memory
> writes should happen in the fast path. Using an array of rwlocks (one per
> slot for example is not an option in this regard)
> 
> Note: Multicasts and broadcasts still will need to take a lock,
> because doing a full lockless lookup in this case is difficult.
> 
> 2) No expensive operations in the socket bind/unhash phases :
>   - No expensive synchronize_rcu() calls.
> 
>   - No added rcu_head in socket structure, increasing memory needs,
>   but more important, forcing us to use call_rcu() calls,
>   that have the bad property of making sockets structure cold.
>   (rcu grace period between socket freeing and its potential reuse
>    make this socket being cold in CPU cache).
>   David did a previous patch using call_rcu() and noticed a 20%
>   impact on TCP connection rates.
> 
>   Quoting Cristopher Lameter :
>   "Right. That results in cacheline cooldown. You'd want to recycle
>    the object as they are cache hot on a per cpu basis. That is screwed
>    up by the delayed regular rcu processing. We have seen multiple
>    regressions due to cacheline cooldown.
>    The only choice in cacheline hot sensitive areas is to deal with the
>    complexity that comes with SLAB_DESTROY_BY_RCU or give up on RCU."
> 
>   - Because udp sockets are allocated from dedicated kmem_cache,
>   use of SLAB_DESTROY_BY_RCU can help here.
> 
> Theory of operation :
> ---------------------
> 
> As the lookup is lockfree (using rcu_read_lock()/rcu_read_unlock()),
> special attention must be taken by readers and writers.
> 
> Use of SLAB_DESTROY_BY_RCU is tricky too, because a socket can be freed,
> reused, inserted in a different chain or in worst case in the same chain
> while readers could do lookups in the same time.
> 
> In order to avoid loops, a reader must check each socket found in a chain
> really belongs to the chain the reader was traversing. If it finds a
> mismatch, lookup must start again at the begining. This *restart* loop
> is the reason we had to use rdlock for the multicast case, because
> we dont want to send same message several times to the same socket.
> 
> We use RCU only for fast path. Thus, /proc/net/udp still take rdlocks.

We should just make it a spin_lock later and speed up udp socket creation.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ