[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <vbfworyx4hr.fsf@reg-r-vrt-018-180.mtr.labs.mlnx>
Date:   Thu, 06 Sep 2018 14:14:08 +0300
From:   Vlad Buslov <vladbu@...lanox.com>
To:     Cong Wang <xiyou.wangcong@...il.com>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Jamal Hadi Salim <jhs@...atatu.com>,
        Jiri Pirko <jiri@...nulli.us>,
        David Miller <davem@...emloft.net>
Subject: Re: [PATCH net-next] net: sched: change tcf_del_walker() to use concurrent-safe delete
On Wed 05 Sep 2018 at 20:32, Cong Wang <xiyou.wangcong@...il.com> wrote:
> On Wed, Sep 5, 2018 at 12:05 AM Vlad Buslov <vladbu@...lanox.com> wrote:
>>
>>
>> On Tue 04 Sep 2018 at 22:41, Cong Wang <xiyou.wangcong@...il.com> wrote:
>> > On Mon, Sep 3, 2018 at 1:33 PM Vlad Buslov <vladbu@...lanox.com> wrote:
>> >>
>> >>
>> >> On Mon 03 Sep 2018 at 18:50, Cong Wang <xiyou.wangcong@...il.com> wrote:
>> >> > On Mon, Sep 3, 2018 at 12:06 AM Vlad Buslov <vladbu@...lanox.com> wrote:
>> >> >>
>> >> >> Action API was changed to work with actions and action_idr in concurrency
>> >> >> safe manner, however tcf_del_walker() still uses actions without taking
>> >> >> reference to them first and deletes them directly, disregarding possible
>> >> >> concurrent delete.
>> >> >>
>> >> >> Change tcf_del_walker() to use tcf_idr_delete_index() that doesn't require
>> >> >> caller to hold reference to action and accepts action id as argument,
>> >> >> instead of direct action pointer.
>> >> >
>> >> > Hmm, why doesn't tcf_del_walker() just take idrinfo->lock? At least
>> >> > tcf_dump_walker() already does.
>> >>
>> >> Because tcf_del_walker() calls __tcf_idr_release(), which take
>> >> idrinfo->lock itself (deadlock). It also calls sleeping functions like
>> >
>> > Deadlock can be easily resolved by moving the lock out.
>> >
>> >
>> >> tcf_action_goto_chain_fini(), so just implementing function that
>> >> releases action without taking idrinfo->lock is not enough.
>> >
>> > Sleeping can be resolved either by making it atomic or
>> > deferring it to a work queue.
>> >
>> > None of your arguments here is a blocker to locking
>> > idrinfo->lock. You really should focus on if it is really
>> > necessary to lock idrinfo->lock in tcf_del_walker(), rather
>> > than these details.
>> >
>> > For me, if you need idrinfo->lock for dump walker, you must
>> > need it for delete walker too, because deletion is a writer
>> > which should require stronger protection than the dumper,
>> > which merely a reader.
>>
>> I don't get how it is necessary. Dump walker uses pointers to actions
>> directly, and in order to be concurrency-safe it must either hold the
>
> It uses the pointer in a read-only way, what you said doesn't change
> the fact that it is a reader. And, like other readers, it may not need
> to lock at all, which is a different topic.
>
>
>> lock or obtain reference to action. Note that del walker doesn't use the
>> action pointer, it only passed action id to tcf_idr_delete_index()
>> function, which does all the necessary locking and can deal with
>> potential concurrency issues (concurrent delete, etc.). This approach
>> also benefits from code reuse from other code paths that delete actions,
>> instead of implementing its own.
>
> Look at the difference below.
>
> With your change:
>
> idr_for_each_entry_ul{
>    spin_lock(&idrinfo->lock);
>    idr_remove();
>    spin_unlock(&idrinfo->lock);
> }
>
> With what I suggest:
>
> spin_lock(&idrinfo->lock);
> idr_for_each_entry_ul{
>    idr_remove();
> }
> spin_unlock(&idrinfo->lock);
>
> Isn't a concurrent tcf_idr_check_alloc() able to livelock here with
> your change?
>
> idr_for_each_entry_ul{
>    spin_lock(&idrinfo->lock);
>    idr_remove();
>    spin_unlock(&idrinfo->lock);
>       // tcf_idr_check_alloc() jumps in,
>      // allocates next ID which can be found
>       // by idr_get_next_ul()
> } // the whole loop goes _literately_ infinite...
idr_for_each_entry_ul traverses idr entries with ascending order of
identifiers, so infinite livelock like this is not possible because it
never goes back to newly added entries with id<current_id.
>
>
> Also, idr_for_each_entry_ul() is supposed to be protected either
> by RCU or idrinfo->lock, no? With your change or without any change,
> it doesn't even have any lock after removing RTNL?
After reading this comment I checked actual idr implementation and I
think you are right. Even though idr_for_each_entry_ul() macro (and
function idr_get_next_ul() that it uses to iterate over idr entries)
doesn't specify any locking requirements in comment description (that is
why this patch doesn't use any), its implementation seems to require
external synchronization.
You suggest I should just hold idrinfo->lock for whole del_walker loop
duration, or play nicely with potential concurrent users and
take/release it per action?
Thanks,
Vlad
Powered by blists - more mailing lists
 
