[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <585EE2AF.2000100@iogearbox.net>
Date: Sat, 24 Dec 2016 22:03:43 +0100
From: Daniel Borkmann <daniel@...earbox.net>
To: Cong Wang <xiyou.wangcong@...il.com>
CC: David Miller <davem@...emloft.net>,
Shahar Klein <shahark@...lanox.com>,
Or Gerlitz <gerlitz.or@...il.com>,
Roi Dayan <roid@...lanox.com>, Jiri Pirko <jiri@...lanox.com>,
John Fastabend <john.fastabend@...il.com>,
Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: [PATCH net] net, sched: fix soft lockup in tc_classify
On 12/24/2016 08:34 AM, Cong Wang wrote:
> On Thu, Dec 22, 2016 at 4:26 PM, Daniel Borkmann <daniel@...earbox.net> wrote:
>> On 12/22/2016 08:05 PM, Cong Wang wrote:
>>> On Wed, Dec 21, 2016 at 1:07 PM, Daniel Borkmann <daniel@...earbox.net>
>>> wrote:
>>>>
>>>> Ok, you mean for net. In that case I prefer the smaller sized fix to be
>>>> honest. It also covers everything from the point where we fetch the chain
>>>> via cops->tcf_chain() to the end of the function, which is where most of
>>>> the complexity resides, and only the two mentioned commits do the relock,
>>>
>>> I really wish the problem is only about relocking, but look at the code,
>>> the deeper reason why we have this bug is the complexity of the logic
>>> inside tc_ctl_tfilter(): 1) the replay logic is hard, we have to make it
>>> idempotent; 2) the request logic itself is hard, because of tc filter
>>> design
>>> and implementation.
>>>
>>> This is why I worry more than just relocking.
>>
>> But do you have a concrete 2nd issue/bug you're seeing? It rather sounds to
>> me your argument is more about fear of complexity on tc framework itself.
>> I agree it's complex, and tc_ctl_tfilter() is quite big in itself, where it
>> would be good to reduce it's complexity into smaller pieces. But it's not
>> really related to the fix itself, reducing complexity requires significantly
>> more and deeper work on the code. We can rework tc_ctl_tfilter() in net-next
>> to try to simplify it, sure, but I don't get why we have to discuss so much
>> on this matter in this context, really.
>
> Thanks for ignoring my point 1) above... You are dragging the discussion
> further.
I don't think so. The analysis and patch I proposed provides an explanation
of how we get into the seen endless loop, it provides a logical fix for it,
which has been reviewed by others and it has been tested extensively that it
resolves the issue, which was easily reproducible for the reporter and that
after the fix it never occurred again. The delta is absolutely simple and
really low risk. Given this function has not much changed over time, also
distros could pick it up that have a much older base kernel than current
stable ones. This initiated follow-up discussion we're having here in general
is dragging the focus away for everyone, and quite frankly I'm getting tired
of discussing it. I have stated my preferences, you have stated yours, and
we're only repeating ourselves in circles which isn't helpful in any way,
the discussion is not about some concrete bug in the logic to fix anymore
(otherwise please name it). Hence my proposal that everything else can wait
and be done in net-next.
Powered by blists - more mailing lists