lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090422111817.GA21190@elte.hu>
Date:	Wed, 22 Apr 2009 13:18:17 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Eric Dumazet <dada1@...mosbay.com>
Cc:	Stephen Hemminger <shemminger@...tta.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Paul Mackerras <paulus@...ba.org>, paulmck@...ux.vnet.ibm.com,
	Evgeniy Polyakov <zbr@...emap.net>,
	David Miller <davem@...emloft.net>, kaber@...sh.net,
	jeff.chua.linux@...il.com, laijs@...fujitsu.com,
	jengelh@...ozas.de, r000n@...0n.net, linux-kernel@...r.kernel.org,
	netfilter-devel@...r.kernel.org, netdev@...r.kernel.org,
	benh@...nel.crashing.org, mathieu.desnoyers@...ymtl.ca
Subject: Re: [PATCH] netfilter: use per-cpu recursive lock (v11)


* Eric Dumazet <dada1@...mosbay.com> wrote:

> netfilter in 2.6.2[0-9] used :
> 
> CPU 1
> 
> sofirq handles one packet from a NIC
> 
> ipt_do_table() /* to handle INPUT table for example, or FORWARD */
> read_lock_bh(&a_central_and_damn_rwlock)
> ... parse rules
>     -> calling some netfilter sub-function
>        re-entering network IP stack to send some packet (say a RST packet)
>        ...
>        ipt_do_table() /* to handle OUPUT table rules for example */
>        read_lock_bh() ; /* WE RECURSE here, but once in a while (if ever) */
> 
> This is one of the case, but other can happens with virtual 
> networks, tunnels, ... and so on. (Stephen had some cases with KVM 
> if I remember well)
> 
> If this could be done without recursion, I am pretty sure 
> netfilter and network guys would have done it. I found Linus 
> reaction quite shocking IMHO, considering hard work done by all 
> people on this.

Again ... i dont know this code well, but you yourself describe it 
as "a_central_and_damn_rwlock" above.

"Central and damn" locks of any type, anywhere tend to cause such 
trouble.

Often they are mini-BKL's in the making. Here's some historic 
patterns:

 1- First there's a convenient lock around a popular and useful 
    piece of data and code.

 2- As popularity (and reach of code) grows, and some non-trivial 
    interaction ensues, a little bit of self-recursion is added to 
    the mix (this is often easier to do than to fix the root cause 
    of the problem) - making critical sections even easier to grow 
    in size.

 3- Then attempts are made to make it scale all (it's a popular 
    piece of code), it's extended along a per-cpu axis, but 
    complexity of locking explode by taking a ton of locks all at 
    once. The solution: yell at the lock validator for not allowing
    this (or for exploding due to the sheer mathematically 
    large complexity of the locking rules). Frequent requests for
    an exclusion from those pesky validations are made, and various
    hacks are done to work it around. It's all the fault of the lock
    validator, of course.

 4- At this point it's rarely a clean, tidy data lock anymore - it 
    tends to grow into a code lock nobody really knows how it works, 
    except that it better be taken more often than not, badness may
    ensue otherwise. Nobody really knows when to take it, only that 
    it should be taken widely enough, that it should be recursive 
    enough to call even _more_ code from under it. Efforts to reduce 
    critical section length are rebuffed with: "this adds unlocking
    and relocking overhead". And it should then all scale as well.

 5- In the end it's a lock everyone curses but nobody is able to fix 
    anymore. "If it was easy to fix we'd have fixed it long ago 
    already" kind of fatalist thinking becomes widespread.

We saw many examples of this in the past: beyond the BKL (which is a 
bit special as its more of an UP legacy - but which hurt us the 
most), we had the tasklist_lock which was problematic for a long 
time, then there's also the reiser3 locking which was extremely 
monolithic. [ i'm only naming safe examples here, where nobody will 
flame me =B-) ]

Whether this particular case applies is up to you. I see certain 
matches up to phase 3 or so - but i also see certain dissimilarities 
as well. Nor do i claim that it's easy to fix or improve.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ