netdev - Re: [PATCH net-next 01/12] net: sched: flower: don't check for rtnl on head dereference

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <vbf36obu9y9.fsf@mellanox.com>
Date:   Mon, 25 Feb 2019 16:11:20 +0000
From:   Vlad Buslov <vladbu@...lanox.com>
To:     Cong Wang <xiyou.wangcong@...il.com>
CC:     Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Jamal Hadi Salim <jhs@...atatu.com>,
        Jiri Pirko <jiri@...nulli.us>,
        David Miller <davem@...emloft.net>
Subject: Re: [PATCH net-next 01/12] net: sched: flower: don't check for rtnl
 on head dereference


On Fri 22 Feb 2019 at 19:32, Cong Wang <xiyou.wangcong@...il.com> wrote:
> On Thu, Feb 21, 2019 at 9:45 AM Vlad Buslov <vladbu@...lanox.com> wrote:
>>
>>
>> On Wed 20 Feb 2019 at 22:33, Cong Wang <xiyou.wangcong@...il.com> wrote:
>> > On Tue, Feb 19, 2019 at 1:46 AM Vlad Buslov <vladbu@...lanox.com> wrote:
>> >>
>> >>
>> >> On Mon 18 Feb 2019 at 19:08, Cong Wang <xiyou.wangcong@...il.com> wrote:
>> >> > On Wed, Feb 13, 2019 at 11:47 PM Vlad Buslov <vladbu@...lanox.com> wrote:
>> >> >>
>> >> >> Flower classifier only changes root pointer during init and destroy. Cls
>> >> >> API implements reference counting for tcf_proto, so there is no danger of
>> >> >> concurrent access to tp when it is being destroyed, even without protection
>> >> >> provided by rtnl lock.
>> >> >
>> >> > How about atomicity? Refcnt doesn't guarantee atomicity, how do
>> >> > you make sure two concurrent modifications are atomic?
>> >>
>> >> In order to guarantee atomicity I lock shared flower classifier data
>> >> structures with tp->lock in following patches.
>> >
>> > Sure, I meant the atomicity of the _whole_ change, as you know
>> > the TC filters are stored in hierarchical structures: a block, a chain,
>> > a tp struct, some type-specific data structure like a hash table.
>> >
>> > Locking tp only solves a partial of the atomicity here. Are you
>> > going to restart the whole change from top down to the bottom?
>>
>> You mean in cases where there is a conflict? Yes, processing EAGAIN in
>> tc_new_tfilter() involves releasing and re-obtaining all locks and
>> references.
>
> Sure, restart only happens when a conflict is detected, this is
> why I called it worst scenario.
>
>
>>
>> >
>> >
>> >>
>> >> >
>> >> >
>> >> >>
>> >> >> Implement new function fl_head_dereference() to dereference tp->root
>> >> >> without checking for rtnl lock. Use it in all flower function that obtain
>> >> >> head pointer instead of rtnl_dereference().
>> >> >>
>> >> >
>> >> > So what lock protects RCU writers after this patch?
>> >>
>> >> I explained it in comment for fl_head_dereference(), but should have
>> >> copied this information to changelog as well:
>> >> Flower classifier only changes root pointer during init and destroy.
>> >> Cls API implements reference counting for tcf_proto, so there is no
>> >> danger of concurrent access to tp when it is being destroyed, even
>> >> without protection provided by rtnl lock.
>> >
>> > So you are saying an RCU pointer is okay to deference without
>> > any lock eve without RCU read lock, right?
>> >
>> >
>> >>
>> >> In initial version of this change I used tp->lock to protect tp->root
>> >> access and verified it with lockdep, but during internal review Jiri
>> >> noted that this is not needed in current flower implementation.
>> >
>> > Let's see what you have on top of your own branch
>> > unlocked_flower_cong_1:
>> >
>> > 1458 static int fl_change(struct net *net, struct sk_buff *in_skb,
>> > 1459                      struct tcf_proto *tp, unsigned long base,
>> > 1460                      u32 handle, struct nlattr **tca,
>> > 1461                      void **arg, bool ovr, bool rtnl_held,
>> > 1462                      struct netlink_ext_ack *extack)
>> > 1463 {
>> > 1464         struct cls_fl_head *head = fl_head_dereference(tp);
>> >
>> > At the point of line 1464, there is no lock taken, tp->lock is taken
>> > after it, block->lock or chain lock is already unlocked before ->change().
>> >
>> > So, what protects this RCU structure? According to RCU, it must be
>> > either RCU read lock or some writer lock. I see none here. :(
>> >
>> > What am I missing?
>>
>> Initially I had flower implementation that protected tp->root access
>> with tp->lock, re-obtaining lock and head reference each time it was
>> used, sometimes multiple times per function (head reference is usually
>> obtained in beginning of every flower API function and used multiple
>> times afterwards). This complicated the code and reduced parallelism.
>> However, in case of flower classifier tp->root is never reassigned after
>> init, so essentially there is no "CU" part in this "RCU" pointer. This
>> realization allowed me to refactor unlocked flower code to look much
>> nicer and perform better. Am I missing something in flower classifier
>> implementation?
>
> So if it is no longer RCU any more, why do you still use
> rcu_dereference_protected()? That is, why not just deref it as a raw
> pointer?
>
> And, I don't think I can buy your argument here. The RCU infrastructure
> should not be changed even after your patches, the fast path is still
> protocted by RCU read lock, while the slow path now is protected by
> some smaller-scope locks. What makes cls_flower so unique that
> it doesn't even need RCU here? tp->root is not reassigned but it is still
> freed via RCU infra, that is in fl_destroy_sleepable().
>
> Thanks.

My cls API patch set introduced reference counting for tcf_proto
structure. With that change tp->ops->destroy() (which calls fl_destroy()
and fl_destroy_sleepable(), in case of flower classifier) is only called
after last reference to tp is released. All slow path users of tp->ops
must obtain reference to tp, so concurrent call to fl_destroy() is not
possible. Before this change tcf_proto structure didn't have reference
counting support and required users to obtain rtnl mutex before calling
its ops callbacks. This was verified in flower by using rtnl_dereference
to obtain tp->root.