netdev - Re: [Patch net v2 2/2] net_sched: fix all the madness of tc filter chain

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170907205239.GF1967@nanopsycho>
Date:   Thu, 7 Sep 2017 22:52:39 +0200
From:   Jiri Pirko <jiri@...nulli.us>
To:     Cong Wang <xiyou.wangcong@...il.com>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Jakub Kicinski <jakub.kicinski@...ronome.com>,
        Jiri Pirko <jiri@...lanox.com>
Subject: Re: [Patch net v2 2/2] net_sched: fix all the madness of tc filter
 chain

Thu, Sep 07, 2017 at 07:45:49PM CEST, xiyou.wangcong@...il.com wrote:
>On Wed, Sep 6, 2017 at 11:32 PM, Jiri Pirko <jiri@...nulli.us> wrote:
>> Thu, Sep 07, 2017 at 06:26:07AM CEST, xiyou.wangcong@...il.com wrote:
>>>This patch fixes the following madness of tc filter chain:
>>
>> Could you avoid expressive words like "madness" and such?
>> Please be technical.
>>
>
>If the following 2a) 2b) 2c) 2d) are not enough to show the madness,
>I don't know any other to show it. Madness is for code, not for you
>or any other person, so 100% technical.

We hav eto agree to disagree. You want to be expressive, apparently
there is no way to talk you out of that. Sigh...


>
>>
>>>
>>>1) tcf_chain_destroy() is called by both tcf_block_put() and
>>>   tcf_chain_put().  tcf_chain_put() is correctly refcnt'ed and paired
>>>   with tcf_chain_get(), but tcf_block_put() is not, it should be paired
>>>   with tcf_block_get() which means we still need to decrease the refcnt.
>>>   Think it in another way: if we call tcf_bock_put() immediately after
>>>   tcf_block_get(), could we get effectively a nop? This causes a memory
>>>   leak as reported by Jakub.
>>>
>>>2) tp proto should hold a refcnt to the chain too. This significantly
>>>   simplifies the logic:
>>>
>>>2a) Chain 0 is no longer special, it is created and refcnted by tp
>>>    like any other chains. All the ugliness in tcf_chain_put() can be
>>>    gone!
>>>
>>>2b) No need to handle the flushing oddly, because block still holds
>>>    chain 0, it can not be released, this guarantees block is the last
>>>    user.
>>>
>>>2c) The race condition with RCU callbacks is easier to handle with just
>>>    a rcu_barrier()! Much easier to understand, nothing to hide! Thanks
>>>    to the previous patch. Please see also the comments in code.
>>>
>>>2d) Make the code understandable by humans, much less error-prone.
>>>
>>>Fixes: 744a4cf63e52 ("net: sched: fix use after free when tcf_chain_destroy is called multiple times")
>>>Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters")
>>>Reported-by: Jakub Kicinski <jakub.kicinski@...ronome.com>
>>>Cc: Jiri Pirko <jiri@...lanox.com>
>>>Signed-off-by: Cong Wang <xiyou.wangcong@...il.com>
>>>---
>>> net/sched/cls_api.c | 38 ++++++++++++++++++++++----------------
>>> 1 file changed, 22 insertions(+), 16 deletions(-)
>>>
>>>diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>>>index 6c5ea84d2682..e9060dc36519 100644
>>>--- a/net/sched/cls_api.c
>>>+++ b/net/sched/cls_api.c
>>>@@ -209,21 +209,20 @@ static void tcf_chain_flush(struct tcf_chain *chain)
>>>               RCU_INIT_POINTER(*chain->p_filter_chain, NULL);
>>>       while ((tp = rtnl_dereference(chain->filter_chain)) != NULL) {
>>>               RCU_INIT_POINTER(chain->filter_chain, tp->next);
>>>+              tcf_chain_put(chain);
>>>               tcf_proto_destroy(tp);
>>>       }
>>> }
>>>
>>> static void tcf_chain_destroy(struct tcf_chain *chain)
>>> {
>>>-      /* May be already removed from the list by the previous call. */
>>>-      if (!list_empty(&chain->list))
>>>-              list_del_init(&chain->list);
>>>+      list_del(&chain->list);
>>>+      kfree(chain);
>>>+}
>>>
>>>-      /* There might still be a reference held when we got here from
>>>-       * tcf_block_put. Wait for the user to drop reference before free.
>>>-       */
>>>-      if (!chain->refcnt)
>>>-              kfree(chain);
>>>+static void tcf_chain_hold(struct tcf_chain *chain)
>>>+{
>>>+      ++chain->refcnt;
>>> }
>>>
>>> struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
>>>@@ -233,7 +232,7 @@ struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
>>>
>>>       list_for_each_entry(chain, &block->chain_list, list) {
>>>               if (chain->index == chain_index) {
>>>-                      chain->refcnt++;
>>>+                      tcf_chain_hold(chain);
>>>                       return chain;
>>>               }
>>>       }
>>>@@ -246,10 +245,7 @@ EXPORT_SYMBOL(tcf_chain_get);
>>>
>>> void tcf_chain_put(struct tcf_chain *chain)
>>> {
>>>-      /* Destroy unused chain, with exception of chain 0, which is the
>>>-       * default one and has to be always present.
>>>-       */
>>>-      if (--chain->refcnt == 0 && !chain->filter_chain && chain->index != 0)
>>>+      if (--chain->refcnt == 0)
>>
>> Okay, so you take the reference for every goto_chain action and every
>> tp, right? Note that for chain 0, you hold one more reference (due to
>> the creation). That is probably ok as we need chain 0 not to go away
>> even if all tps and goto_chain actions are gone.
>
>Yeah, this is the core of the patch.
>
>>
>>
>>>               tcf_chain_destroy(chain);
>>> }
>>> EXPORT_SYMBOL(tcf_chain_put);
>>>@@ -294,10 +290,18 @@ void tcf_block_put(struct tcf_block *block)
>>>       if (!block)
>>>               return;
>>>
>>>-      list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
>>>+      /* Standalone actions are not allowed to jump to any chain, and
>>>+       * bound actions should be all removed after flushing. However,
>>>+       * filters are destroyed in RCU callbacks, we have to flush and wait
>>>+       * for them before releasing this refcnt, otherwise we race with RCU
>>>+       * callbacks!!!
>>
>> Why the "!!!"? Please avoid that. Not necessary at all.
>
>
>Because we don't have lock here, we have to pay more extra
>attention to it. Playing list without lock is like playing fire. I use "!!!" to
>draw people's attention when they touch it, perhaps "XXX" is better?
>
>
>>
>>
>>>+       */
>>>+      list_for_each_entry(chain, &block->chain_list, list)
>>>               tcf_chain_flush(chain);
>>>-              tcf_chain_destroy(chain);
>>>-      }
>>>+      rcu_barrier();
>>
>> This actually tries to fix another bug I discovered yesterday. Good.
>>
>
>This on the other hand proves we should make the code clean
>and understandable asap, not just wait for net-next.
>
>
>
>>
>>>+
>>>+      list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
>>>+              tcf_chain_put(chain);
>>
>> Which reference are you putting here? For chain 0, that is the original
>> reference due to creation from block_get. But how about the other
>> chains? If you do flush all in the previous list iteration, they are
>> removed there. Also note that they are removed from the list while
>> iterating it.
>
>
>Yes it is for chain 0, because block holds a reference to chain 0 during
>creation. Non-0 chains are created with refcnt==1 too but paired with
>tcf_chain_put() rather than tcf_block_put(). This is what makes chain 0
>not special w.r.t. refcnt.

So you need to tcf_chain_put only chain 0 here, right? The rest of the
chains get destroyed by the previous list_for_each_entry iteration when
flush happens and actions destroy happens what decrements recnt to 0
there.

What do I miss, who would still hold reference for non-0 chains when all
tps and all goto_chain actions are gone?


>
>
>>
>> I believe that you need to add tcf_chain_hold(chain) to the start of the
>> previous list iteration to ensure all existing chains will stay, then
>> you can put them here.
>
>We don't need it, it is already perfectly paired with tcf_block_get().
>Think about the following:
>
>1)
>tcf_block_get(...); // create chain 0 with refcnt=1
>tcf_block_put(); // no tp, only chain 0 in chain_list
>// chain 0 is put, refcnt=0, it is gone
>
>2)
>tcf_block_get(); // create chain 0 with refcnt=1
>...
>tcf_chain_get(11); // create chain 11 with refcnt=1
>// one tp is inserted to chain 11, now refcnt==2
>// in tc_ctl_tfilter()
>tcf_chain_put(11); // paired with above get, refcnt==1
>...
>tcf_block_put(); // flush chain 11, tp removed, refcnt==0
>// put chain 0 too, paired with block get, refcnt == 0
>// both chain 0 and chain11 are gone
>
>
>
>>
>> Did you test this? I believe we need some simple test script.
>
>Of course I did. I verified memleak is gone and tested basic
>filters and gact actions with chain 0 and chain 11, everything
>works as expected, I also added a printk() to verify the chain
>is really gone as soon as all references are gone.
>
>I will contribute my test cases back after I figure out how.
>Also net-next is already closed.