[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <49CA3542.5020601@cosmosbay.com>
Date: Wed, 25 Mar 2009 14:44:34 +0100
From: Eric Dumazet <dada1@...mosbay.com>
To: Patrick McHardy <kaber@...sh.net>
CC: Netfilter Developers <netfilter-devel@...r.kernel.org>,
Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: [PATCH] conntrack: Reduce conntrack count in nf_conntrack_free()
Patrick McHardy a écrit :
> Eric Dumazet wrote:
>> I have a litle problem on __nf_conntrack_find() being exported.
>>
>> Problem is that with SLAB_DESTROY_BY_RCU we must take a reference on
>> object
>> to recheck it. So ideally only nf_conntrack_find_get() should be used,
>> or callers of __nf_conntrack_find() should lock nf_conntrack_lock
>> (as properly done for example in net/netfilter/nf_conntrack_netlink.c,
>> line 1292)
>>
>> Here is preliminary patch for review (not tested at all, its 4h50 am
>> here :) )
>>
>> Could you help me, by checking __nf_conntrack_find() use in
>> net/netfilter/xt_connlimit.c ?
>> and line 1246 of net/netfilter/nf_conntrack_netlink.c
>>
>> This part is a litle bit gray for me. :)
>
> In case of xt_connlimit, it seems fine to just take a reference.
> In case of ctnetlink, keeping the unreferenced lookup under the
> lock seems fine. We unfortunately have to export some internals
> like nf_conntrack lock for ctnetlink anyways, so I don't think
> it would be worth to change it to take references and unexport
> the lookup function.
>
>> +/*
>> + * Warning :
>> + * - Caller must take a reference on returned object
>> + * and recheck nf_ct_tuple_equal(tuple, &h->tuple)
>> + * OR
>> + * - Caller must lock nf_conntrack_lock before calling this function
>> + */
>> struct nf_conntrack_tuple_hash *
>> __nf_conntrack_find(struct net *net, const struct nf_conntrack_tuple
>> *tuple)
>> {
>> struct nf_conntrack_tuple_hash *h;
>> - struct hlist_node *n;
>> + struct hlist_nulls_node *n;
>> unsigned int hash = hash_conntrack(tuple);
>>
>> /* Disable BHs the entire time since we normally need to disable
>> them
>> * at least once for the stats anyway.
>> */
>> local_bh_disable();
>> - hlist_for_each_entry_rcu(h, n, &net->ct.hash[hash], hnode) {
>> +begin:
>> + hlist_nulls_for_each_entry_rcu(h, n, &net->ct.hash[hash], hnnode) {
>> if (nf_ct_tuple_equal(tuple, &h->tuple)) {
>> NF_CT_STAT_INC(net, found);
>> local_bh_enable();
>> @@ -261,6 +270,13 @@ __nf_conntrack_find(struct net *net, const struct
>> nf_conntrack_tuple *tuple)
>> }
>> NF_CT_STAT_INC(net, searched);
>> }
>> + /*
>> + * if the nulls value we got at the end of this lookup is
>> + * not the expected one, we must restart lookup.
>> + * We probably met an item that was moved to another chain.
>> + */
>> + if (get_nulls_value(n) != hash)
>> + goto begin;
>
> Are you sure this is enough? An entry might have been reused and added
> to the same chain I think, so I think we need to recheck the tuple.
Yes, done in caller
>
>> local_bh_enable();
>>
>> return NULL;
>> @@ -275,11 +291,18 @@ nf_conntrack_find_get(struct net *net, const
>> struct nf_conntrack_tuple *tuple)
>> struct nf_conn *ct;
>>
>> rcu_read_lock();
>> +begin:
>> h = __nf_conntrack_find(net, tuple);
>> if (h) {
>> ct = nf_ct_tuplehash_to_ctrack(h);
>> if (unlikely(!atomic_inc_not_zero(&ct->ct_general.use)))
>> h = NULL;
>> + else {
>> + if (unlikely(!nf_ct_tuple_equal(tuple, &h->tuple))) {
>> + nf_ct_put(ct);
>> + goto begin;
>
> Ah I see, the hash comparison above is only an optimization?
>
>> + }
>> + }
>> }
>> rcu_read_unlock();
>>
>
>
check net/ipv4/udp.c for an example (__udp4_lib_lookup())
In case of UDP, key check is not returning true/false, but a score.
So UDP case is a litle bit more complex than conntrack case.
rcu_read_lock();
begin:
result = NULL;
badness = -1;
sk_nulls_for_each_rcu(sk, node, &hslot->head) {
score = compute_score(sk, net, saddr, hnum, sport,
daddr, dport, dif);
if (score > badness) {
result = sk;
badness = score;
}
}
/*
* if the nulls value we got at the end of this lookup is
* not the expected one, we must restart lookup.
* We probably met an item that was moved to another chain.
*/
if (get_nulls_value(node) != hash)
goto begin;
if (result) {
if (unlikely(!atomic_inc_not_zero(&result->sk_refcnt)))
result = NULL;
else if (unlikely(compute_score(result, net, saddr, hnum, sport,
daddr, dport, dif) < badness)) {
sock_put(result);
goto begin;
}
}
rcu_read_unlock();
return result;
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists