netdev - Re: [PATCH net-next] tipc: correctly unlink packets from deferred queue

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:	Mon, 16 Dec 2013 13:11:49 -0500
From:	Paul Gortmaker <paul.gortmaker@...driver.com>
To:	Erik Hugne <erik.hugne@...csson.com>
CC:	<netdev@...r.kernel.org>, <jon.maloy@...csson.com>,
	<ying.xue@...driver.com>, <tipc-discussion@...ts.sourceforge.net>
Subject: Re: [PATCH net-next] tipc: correctly unlink packets from deferred
 queue

On 13-12-16 11:35 AM, Erik Hugne wrote:
> On Mon, Dec 16, 2013 at 10:30:42AM -0500, Paul Gortmaker wrote:
>> On 13-12-16 04:46 AM, erik.hugne@...csson.com wrote:
>>> From: Erik Hugne <erik.hugne@...csson.com>
>>>
>>> When we pull a packet from the deferred queue, the next
>>> pointer for the current packet being processed might still
>>> refer to deferred packets. This is incorrect, and will
>>> lead to an oops if the last fragment have once been put on
>>> the deferred queue, and at least one packet have been
>>
>> Once again, I have to ask when this behaviour was introduced.
>> This should always be a question that you ask yourself, and
>> that you consider putting in the commit log.  Please add it
>> to your self-check list.
>>
>> So, is this a fail we introduce with the pending two series,
>> or with the series already taken by DaveM?
> 
> The problem have always been there, but the window for when
> it may occur increased after commit 40ba3cdf5
> tipc: message reassembly using fragment chain

OK, so put that in the commit log:

"It is our understanding that this problem has always existed,
however, with the recent change of commit 40ba3cdf5
("tipc: message reassembly using fragment chain"), the window
for it possibly happening has increased."

In this case, your choice of net-next is probably OK, given
the above new information, but  we need to put the info in the
commit log, so it makes it more clear for net vs. net-next.

In the end, Dave makes the final decision of net vs net-next,
based on the information we provide him, along with information
you aren't aware of, like how deep we are into the current rcN,
whether Linus is in a good mood and so on.  If we aren't sure,
we can even specify that, after the three dashes in the commit,
by listing the pros/cons of each.

> 
>>
>> Otherwise, if it is an older problem than that, then why
>> is this tagged net-next?  It looks like a genuine bug fix
>> for an oops, if the existing mainline code has this bug.
>>
>>> deferred after this fragment. The result of this is that
>>> the fragment chain linked together with the defer-queue.
>>
>> "...chain is linked ..."   ?
>
> What we have seen is that after successful delivery of a 
> fragmented message, the last packet in the fragment chain
> will point into the deferred queue. When we later free the
> chain, kfree_skb_list will also free packets from the defer-queue.

My comment was wrt. the missing "if" in the sentence, but I
like the more detailed paragraph you have given above better.

> 
> In theory, the same thing can occur for non-fragmented traffic
> aswell.
> 
>>
>>>
>>> We fix this by clearing the next pointer for the current
>>> packet being processed.
>>>
>>> [...] general protection fault: 0000
>>
>> Was this all that was in the header?  Seems overly edited, and
>> missing content (registers, EIP, etc.)
>>
>>> [...]
>>> [...] ? trace_hardirqs_on+0xd/0x10
>>> [...] tipc_link_recv_fragment+0xd1/0x1b0 [tipc]
>>> [...] tipc_recv_msg+0x4e4/0x920 [tipc]
>>> [...] ? tipc_l2_rcv_msg+0x40/0x250 [tipc]
>>> [...] tipc_l2_rcv_msg+0xcc/0x250 [tipc]
>>> [...] ? tipc_l2_rcv_msg+0x40/0x250 [tipc]
>>> [...] __netif_receive_skb_core+0x80b/0xd00
>>> [...] ? __netif_receive_skb_core+0x144/0xd00
>>> [...] __netif_receive_skb+0x26/0x70
>>> [...] netif_receive_skb+0x2d/0x200
>>
>> Same here, why have you bothered to clobber the addresses?
>> Deleting the printk time prefix from non-time critical bugs is
>> fine, but don't delete the addresses, since they convey some
>> relative information about functions nearby etc.
> 
> Just trying to avoid an unnecessarily verbose commit message.
> As the oops was from Ying's test system with non-upstream tipc
> code i didn't think the addresses added any value

You don't need to mangle the log, but it is always good to
specify up front if non-merged code is in use.  We have had
people come to netdev in the past, saying there is a bug in
core code like net/core/dev.c -- and only after digging in
do we find they are using some broken (un-reviewed) patches
from some vendor SDK in their kernel, which actually caused
the issue in that particular case.

So, something like "Note that the above backtrace was observed
on a tree with some additional pending TIPC changes in place,
but nothing in those changes, or in the backtrace above appears
to play a role in causing this issue."  Or, even better, simply
prove to yourself that you _can_ reproduce it on what is already
accepted into the net-next code base, if the WIP in Ying's test
system isn't at all related to this problem.

> 
> Should i do an edit/resend anyway?

Given all of the above, yes please.

Paul.
--

> 
> //E
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html