[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <585C6F49.5030900@iogearbox.net>
Date: Fri, 23 Dec 2016 01:26:49 +0100
From: Daniel Borkmann <daniel@...earbox.net>
To: Cong Wang <xiyou.wangcong@...il.com>
CC: David Miller <davem@...emloft.net>,
Shahar Klein <shahark@...lanox.com>,
Or Gerlitz <gerlitz.or@...il.com>,
Roi Dayan <roid@...lanox.com>, Jiri Pirko <jiri@...lanox.com>,
John Fastabend <john.fastabend@...il.com>,
Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: [PATCH net] net, sched: fix soft lockup in tc_classify
On 12/22/2016 08:05 PM, Cong Wang wrote:
> On Wed, Dec 21, 2016 at 1:07 PM, Daniel Borkmann <daniel@...earbox.net> wrote:
>>
>> Ok, you mean for net. In that case I prefer the smaller sized fix to be
>> honest. It also covers everything from the point where we fetch the chain
>> via cops->tcf_chain() to the end of the function, which is where most of
>> the complexity resides, and only the two mentioned commits do the relock,
>
> I really wish the problem is only about relocking, but look at the code,
> the deeper reason why we have this bug is the complexity of the logic
> inside tc_ctl_tfilter(): 1) the replay logic is hard, we have to make it
> idempotent; 2) the request logic itself is hard, because of tc filter design
> and implementation.
>
> This is why I worry more than just relocking.
But do you have a concrete 2nd issue/bug you're seeing? It rather sounds to
me your argument is more about fear of complexity on tc framework itself.
I agree it's complex, and tc_ctl_tfilter() is quite big in itself, where it
would be good to reduce it's complexity into smaller pieces. But it's not
really related to the fix itself, reducing complexity requires significantly
more and deeper work on the code. We can rework tc_ctl_tfilter() in net-next
to try to simplify it, sure, but I don't get why we have to discuss so much
on this matter in this context, really.
>> so as a fix I think it's fine as-is. As mentioned, if there's need to
>> refactor tc_ctl_tfilter() net-next would be better, imho.
>
> Refactor is a too strong word, just moving the replay out is not a refactor.
> The end result will be still smaller than your commit d936377414fadbafb4,
> which is already backported to stable.
Commit d936377414fa is a whole different thing, and not related to the
topic at all. The two-line changes with the initialization fix is quite
straight forward and fixes a bug in the logic. It's also less complex in
terms of lines but also in terms of complexity than to refactor or move the
replay out. Moving it out contextually means that you also wouldn't rule
out that things like nlmsg_parse(), or in-fact *any* of the __tc_ctl_tfilter()
return paths could potentially return an error that suddenly would require
a replay of the request. So, while moving it out fixes the bug, too, it also
adds more potential points where you would go and replay the request where
you didn't do so before and therefore also adds more complexity to the code
that needs review/audit when fixing bugs since you don't have these hard/direct
return paths anymore. Therefore I don't think it's better to go that way
for the fix.
Thanks,
Daniel
Powered by blists - more mailing lists