[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAM_iQpX493zk7nAgv8xo0XN8sT9g5SUnrBC76Fvamc9Otfhfqg@mail.gmail.com>
Date: Wed, 27 Dec 2017 10:29:53 -0800
From: Cong Wang <xiyou.wangcong@...il.com>
To: John Fastabend <john.fastabend@...il.com>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>,
Jakub Kicinski <jakub.kicinski@...ronome.com>
Subject: Re: [Patch net-next] net_sched: remove the unsafe __skb_array_empty()
On Sat, Dec 23, 2017 at 10:57 PM, John Fastabend
<john.fastabend@...il.com> wrote:
> On 12/22/2017 12:31 PM, Cong Wang wrote:
>> I understand why you had it, but it is just not safe. You don't want
>> to achieve performance gain by crashing system, right?
>
> huh? So my point is the patch you submit here is not a
> real fix but a work around. To peek the head of a consumer/producer ring
> without a lock, _should_ be fine. This _should_ work as well with
> consumer or producer operations happening at the same time. After some
> digging the issue is in the ptr_ring code.
The comments disagree with you:
/* Might be slightly faster than skb_array_empty below, but only safe if the
* array is never resized. Also, callers invoking this in a loop must take care
* to use a compiler barrier, for example cpu_relax().
*/
If the comments are right, you miss a barrier here too.
>
> The peek code (what empty check calls) is the following,
>
> static inline void *__ptr_ring_peek(struct ptr_ring *r)
> {
> if (likely(r->size))
> return r->queue[r->consumer_head];
> return NULL;
> }
>
> So what the splat is detecting is consumer head being 'out of bounds'.
> This happens because ptr_ring_discard_one increments the consumer_head
> and then checks to see if it overran the array size. If above peek
> happens after the increment, but before the size check we get the
> splat. There are two ways, as far as I can see, to fix this. First
> do the check before incrementing the consumer head. Or the easier
> fix,
>
> --- a/include/linux/ptr_ring.h
> +++ b/include/linux/ptr_ring.h
> @@ -438,7 +438,7 @@ static inline int ptr_ring_consume_batched_bh(struct
> ptr_ring *r,
>
> static inline void **__ptr_ring_init_queue_alloc(unsigned int size,
> gfp_t gfp)
> {
> - return kcalloc(size, sizeof(void *), gfp);
> + return kcalloc(size + 1, sizeof(void *), gfp);
> }
>
> With Jakub's help (Thanks!) I was able to reproduce the original splat
> and also verify the above removes it.
>
> To be clear "resizing" a skb_array only refers to changing the actual
> array size not adding/removing elements.
I never look into the implementation, just simply trust the comments.
At least the comments above __skb_array_empty() need to improve.
>
>>
>>>
>>> Although its not logical IMO to have both reset and dequeue running at
>>> the same time. Some skbs would get through others would get sent, sort
>>> of a mess. I don't see how it can be an issue. The never resized bit
>>> in the documentation is referring to resizing the ring size _not_ popping
>>> off elements of the ring. array_empty just reads the consumer head.
>>> The only ring resizing in pfifo fast should be at init and destroy where
>>> enqueue/dequeue should be disconnected by then. Although based on the
>>> trace I missed a case.
>>
>>
>> Both pfifo_fast_reset() and pfifo_fast_dequeue() call
>> skb_array_consume_bh(), so there is no difference w.r.t. resizing.
>>
>
> Sorry not following.
>
>> And ->reset() is called in qdisc_graft() too. Let's say we have htb+pfifo_fast,
>> htb_graft() calls qdisc_replace() which calls qdisc_reset() on pfifo_fast,
>> so clearly pfifo_fast_reset() can run with pfifo_fast_dequeue()
>> concurrently.
>
> Yes and this _should_ be perfectly fine for pfifo_fast. I'm wondering
> though if this API can be cleaned up. What are the paths that do a reset
> without a destroy.. Do we really need to have this pattern where reset
> is called then later destroy. Seems destroy could do the entire cleanup
> and this would simplify things. None of this has to do with the splat
> though.
I don't follow your point any more.
We are talking about ->reset() race with ->dequeue() which is the
cause of the bug, right?
If you expect ->reset() runs in parallel with ->dequeue() for pfifo_fast,
why did you even mention synchronize_net() from the beginning???
Also you changed the code too, to adjust rcu grace period.
>
>>
>>
>>>
>>> I think the right fix is to only call reset/destroy patterns after
>>> waiting a grace period and for all tx_action calls in-flight to
>>> complete. This is also better going forward for more complex qdiscs.
>>
>> But we don't even have rcu read lock in TX BH, do we?
>>
>> Also, people certainly don't like yet another synchronize_net()...
>>
>
> This needs a fix and is a _real_ bug, but removing __skb_array_empty()
> doesn't help solve this at all. Will work on a fix after the holiday
> break. The fix here is to ensure the destroy is not going to happen
> while tx_action is in-flight. Can be done with qdisc_run and checking
> correct bits in lockless case.
Sounds like you missed a lot of things with your "lockless" patches....
First qdisc rcu callback, second rcu read lock in TX BH...
My quick one-line fix is to amend this bug before you going deeper
in this rabbit hole.
Powered by blists - more mailing lists