[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <vbfpngk2r9a.fsf@mellanox.com>
Date: Thu, 19 Dec 2019 16:51:17 +0000
From: Vlad Buslov <vladbu@...lanox.com>
To: Jamal Hadi Salim <jhs@...atatu.com>
CC: Vlad Buslov <vladbu@...lanox.com>,
Davide Caratti <dcaratti@...hat.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
Cong Wang <xiyou.wangcong@...il.com>,
Jiri Pirko <jiri@...nulli.us>, Roman Mashak <mrv@...atatu.com>
Subject: Re: [PATCH net 1/2] net/sched: cls_u32: fix refcount leak in the
error path of u32_change()
On Thu 19 Dec 2019 at 18:33, Jamal Hadi Salim <jhs@...atatu.com> wrote:
> On 2019-12-19 11:15 a.m., Vlad Buslov wrote:
>
>
>> Hi Jamal,
>>
>> Just destroying tp unconditionally will break unlocked case (flower)
>> because of possibility of concurrent insertion of new filters to the
>> same tp instance.
>>
>
> I was worried about that. So rtnlheld doesnt help?
Rtnl_held flag can be set for multiple other reasons besides
locked/unlocked classifier implementation (parent Qdisc doesn't support
unlocked execution, retry in tc_new_tfilter(), etc.).
>
>> The root cause here is precisely described by Davide in cover letter -
>> to accommodate concurrent insertions cls API verifies that tp instance
>> is empty before deleting it and since there is no cls ops to do it
>> directly, it relies on checking that walk() stopped without accessing
>> any filters instead. Unfortunately, somw classifier implementations
>> assumed that there is always at least one filter on classifier (I fixed
>> several of these) and now Davide also uncovered this leak in u32.
>>
>> As a simpler solution to fix such issues once and for all I can propose
>> not to perform the walk() check at all and assume that any classifier
>> implementation that doesn't have TCF_PROTO_OPS_DOIT_UNLOCKED flag set is
>> empty in tcf_chain_tp_delete_empty() (there is no possibility of
>> concurrent insertion when synchronizing with rtnl).
>>
>> WDYT?
>
> IMO that would be a cleaner fix give walk() is used for other
> operations and this is a core cls issue.
> Also: documenting what it takes for a classifier to support
> TCF_PROTO_OPS_DOIT_UNLOCKED is useful (you may have done this
> in some commit already).
Well, idea was not to have classifier implement any specific API. If
classifier has TCF_PROTO_OPS_DOIT_UNLOCKED, then cls API doesn't obtain
rtnl and it is up to classifier to implement whatever locking necessary
internally. However, my approach to checking for empty classifier
instance uncovered multiple bugs and inconsistencies in classifier
implementations of ops->walk().
So I guess the requirement now is for unlocked classifier to have sane
implementation of ops->walk() that doesn't assume >1 filters and
correctly handles the case when insertion of first filter after
classifier instance is created fails.
>
> cheers,
> jamal
Powered by blists - more mailing lists