Message-ID: <1291582432.2806.300.camel@edumazet-laptop>
Date: Sun, 05 Dec 2010 21:53:52 +0100
From: Eric Dumazet <eric.dumazet@...il.com>
To: David Miller <davem@...emloft.net>
Cc: hagen@...u.net, xiaosuo@...il.com, wirelesser@...il.com,
netdev@...r.kernel.org, Pavel Emelyanov <xemul@...nvz.org>,
stable@...nel.org, "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Subject: [PATCH] filter: fix sk_filter rcu handling
On Friday, December 3, 2010 at 07:32 +0100, Eric Dumazet wrote:
> On Wednesday, December 1, 2010 at 10:18 -0800, David Miller wrote:
>
> > However, I think it's still valuable to write a few JIT compilers for
> > the existing BPF stuff. I considered working on a sparc64 JIT just to
> > see what it would look like.
> >
> > If people work on the BPF optimizer and BPF JITs in parallel, we'll have
> > both ready at the same time. win++
>
> I began work on implementing a BPF JIT for x86_64.
>
> My plan is to use external helpers to load skb data/metadata, to keep
> the BPF program very short and avoid dependencies on struct layouts.
>
> These helpers would be the three loaders: load_word, load_half and
> load_byte.
>
> In case the bits are in the skb head, these helpers should be fast.
>
> For practical reasons, they would be written in ASM for their fast
> path, and in C for the slow path. Being in ASM lets them take a
> shortcut on error (unwinding the stack to perform the "return 0;")
> so that we don't have to test their return value in the JIT-generated
> program.
>
>
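To make the plan quoted above concrete, here is a hypothetical sketch
of what the C slow path of load_word could look like. The helper name
and its signature are assumptions for illustration; only
skb_copy_bits() is the real kernel API for reading possibly
non-linear skb data.

#include <linux/skbuff.h>

/* Hypothetical C slow path: called when the requested bytes are not in
 * the linear skb head. skb_copy_bits() copies from fragments and the
 * frag list as needed.
 */
static u32 load_word_slowpath(const struct sk_buff *skb, int offset)
{
	__be32 val;

	if (skb_copy_bits(skb, offset, &val, sizeof(val)) < 0)
		return 0; /* the real ASM helper would instead unwind the
			   * stack so the filter itself returns 0 */
	return ntohl(val);
}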
While working on this, I found an annoying problem with the current
code.
This patch is a stable candidate.
Thanks
[PATCH] filter: fix sk_filter rcu handling
Pavel Emelyanov tried to fix a race between sk_filter_(de|at)tach and
sk_clone() in commit 47e958eac280c263397
The problem is that several clones can share a common sk_filter, and
these clones might want to sk_filter_attach() their own filters at the
same time, overwriting old_filter->rcu and corrupting the RCU callback
queues.
We cannot use filter->rcu without being sure no other thread could do
the same thing.
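Schematically, the pre-patch race looks like this (illustrative only;
sk_filter_rcu_release() is the old callback this patch removes):

	/* CPU0: clone A replaces its filter, drops a reference */
	call_rcu_bh(&old_fp->rcu, sk_filter_rcu_release);

	/* CPU1: clone B replaces the same shared filter concurrently */
	call_rcu_bh(&old_fp->rcu, sk_filter_rcu_release);

The second call reuses an rcu_head that is already queued, overwriting
its linkage and corrupting that CPU's RCU callback list.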
Switch the code to a more conventional refcounting technique: do the
atomic decrement immediately and queue one RCU callback only when the
last reference is released. Since only the thread that drops the last
reference queues the rcu_head, it can never be queued twice.
Signed-off-by: Eric Dumazet <eric.dumazet@...il.com>
Cc: Pavel Emelyanov <xemul@...nvz.org>
Cc: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
Cc: stable@...nel.org
---
include/net/sock.h | 4 +++-
net/core/filter.c | 19 ++++++-------------
2 files changed, 9 insertions(+), 14 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index a6338d0..4308af7 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1155,6 +1155,8 @@ extern void sk_common_release(struct sock *sk);
/* Initialise core socket variables */
extern void sock_init_data(struct socket *sock, struct sock *sk);
+extern void sk_filter_release_rcu(struct rcu_head *rcu);
+
/**
* sk_filter_release - release a socket filter
* @fp: filter to remove
@@ -1165,7 +1167,7 @@ extern void sock_init_data(struct socket *sock, struct sock *sk);
static inline void sk_filter_release(struct sk_filter *fp)
{
if (atomic_dec_and_test(&fp->refcnt))
- kfree(fp);
+ call_rcu_bh(&fp->rcu, sk_filter_release_rcu);
}
static inline void sk_filter_uncharge(struct sock *sk, struct sk_filter *fp)
diff --git a/net/core/filter.c b/net/core/filter.c
index c1ee800..ae21a0d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -589,23 +589,16 @@ int sk_chk_filter(struct sock_filter *filter, int flen)
EXPORT_SYMBOL(sk_chk_filter);
/**
- * sk_filter_rcu_release - Release a socket filter by rcu_head
+ * sk_filter_release_rcu - Release a socket filter by rcu_head
* @rcu: rcu_head that contains the sk_filter to free
*/
-static void sk_filter_rcu_release(struct rcu_head *rcu)
+void sk_filter_release_rcu(struct rcu_head *rcu)
{
struct sk_filter *fp = container_of(rcu, struct sk_filter, rcu);
- sk_filter_release(fp);
-}
-
-static void sk_filter_delayed_uncharge(struct sock *sk, struct sk_filter *fp)
-{
- unsigned int size = sk_filter_len(fp);
-
- atomic_sub(size, &sk->sk_omem_alloc);
- call_rcu_bh(&fp->rcu, sk_filter_rcu_release);
+ kfree(fp);
}
+EXPORT_SYMBOL(sk_filter_release_rcu);
/**
* sk_attach_filter - attach a socket filter
@@ -649,7 +642,7 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
rcu_assign_pointer(sk->sk_filter, fp);
if (old_fp)
- sk_filter_delayed_uncharge(sk, old_fp);
+ sk_filter_uncharge(sk, old_fp);
return 0;
}
EXPORT_SYMBOL_GPL(sk_attach_filter);
@@ -663,7 +656,7 @@ int sk_detach_filter(struct sock *sk)
sock_owned_by_user(sk));
if (filter) {
rcu_assign_pointer(sk->sk_filter, NULL);
- sk_filter_delayed_uncharge(sk, filter);
+ sk_filter_uncharge(sk, filter);
ret = 0;
}
return ret;
--