netdev - Re: [Patch net-next] net_sched: remove the unsafe __skb_array

Date:   Fri, 22 Dec 2017 12:31:13 -0800
From:   Cong Wang <xiyou.wangcong@...il.com>
To:     John Fastabend <john.fastabend@...il.com>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Jakub Kicinski <jakub.kicinski@...ronome.com>
Subject: Re: [Patch net-next] net_sched: remove the unsafe __skb_array_empty()

On Thu, Dec 21, 2017 at 7:06 PM, John Fastabend
<john.fastabend@...il.com> wrote:
> On 12/21/2017 04:03 PM, Cong Wang wrote:
>> __skb_array_empty() is only safe if the array is never resized.
>> pfifo_fast_dequeue() is called in TX BH context and without
>> the qdisc lock, so even after we disable BH on the ->reset() path
>> we can still race with other CPUs.
>>
>> Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array")
>> Reported-by: Jakub Kicinski <jakub.kicinski@...ronome.com>
>> Cc: John Fastabend <john.fastabend@...il.com>
>> Signed-off-by: Cong Wang <xiyou.wangcong@...il.com>
>> ---
>>  net/sched/sch_generic.c | 3 ---
>>  1 file changed, 3 deletions(-)
>>
>> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
>> index 00ddb5f8f430..9279258ce060 100644
>> --- a/net/sched/sch_generic.c
>> +++ b/net/sched/sch_generic.c
>> @@ -622,9 +622,6 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
>>       for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
>>               struct skb_array *q = band2list(priv, band);
>>
>> -             if (__skb_array_empty(q))
>> -                     continue;
>> -
>>               skb = skb_array_consume_bh(q);
>>       }
>>       if (likely(skb)) {
>>
>
>
> So this is a performance thing: we don't want to grab the consumer lock on
> empty bands, which can be fairly common depending on traffic patterns.


I understand why you had it, but it is just not safe. You don't want
to gain performance by crashing the system, right?
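
To make the trade-off concrete, here is a rough userspace model of the
dequeue loop once the unlocked check is gone. It is only a sketch:
struct ring, ring_produce(), ring_consume(), dequeue() and lock_count are
hypothetical names made up for illustration, the spinlock is a toy built
on C11 atomic_flag, and none of it is the kernel's skb_array code. It
shows that every band's consumer lock is taken on each dequeue, including
the empty bands, which is the cost the check was meant to avoid.

```c
#include <stdatomic.h>
#include <stddef.h>

#define BANDS 3       /* mirrors PFIFO_FAST_BANDS */
#define RING_SIZE 8   /* arbitrary size chosen for the sketch */

/* Hypothetical stand-in for struct skb_array: NULL marks an empty slot. */
struct ring {
    void *slots[RING_SIZE];
    size_t head;               /* producer index */
    size_t tail;               /* consumer index */
    atomic_flag consumer_lock; /* toy spinlock, not a kernel spinlock */
    unsigned long lock_count;  /* times the consumer lock was taken */
};

static void ring_init(struct ring *r)
{
    for (size_t i = 0; i < RING_SIZE; i++)
        r->slots[i] = NULL;
    r->head = r->tail = 0;
    r->lock_count = 0;
    atomic_flag_clear(&r->consumer_lock);
}

/* Returns 0 on success, -1 if the ring is full. */
static int ring_produce(struct ring *r, void *item)
{
    if (r->slots[r->head])
        return -1;
    r->slots[r->head] = item;
    r->head = (r->head + 1) % RING_SIZE;
    return 0;
}

/* Locked consume, playing the role of skb_array_consume_bh(). */
static void *ring_consume(struct ring *r)
{
    void *item;

    while (atomic_flag_test_and_set(&r->consumer_lock))
        ; /* spin until the lock is free */
    r->lock_count++;
    item = r->slots[r->tail];
    if (item) {
        r->slots[r->tail] = NULL;
        r->tail = (r->tail + 1) % RING_SIZE;
    }
    atomic_flag_clear(&r->consumer_lock);
    return item;
}

/* Loop shaped like the patched pfifo_fast_dequeue(): no unlocked check. */
static void *dequeue(struct ring bands[BANDS])
{
    void *item = NULL;

    for (int band = 0; band < BANDS && !item; band++)
        item = ring_consume(&bands[band]);
    return item;
}
```

With a single packet queued on the last band, one dequeue() call takes
the lock on all three bands before it finds the packet. The model is only
exercised single-threaded; it says nothing about how expensive the lock
is on real hardware.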

>
> Although it's not logical IMO to have both reset and dequeue running at
> the same time. Some skbs would get through, others would get sent, sort
> of a mess. I don't see how it can be an issue. The never resized bit
> in the documentation is referring to resizing the ring size _not_ popping
> off elements of the ring. array_empty just reads the consumer head.
> The only ring resizing in pfifo_fast should be at init and destroy where
> enqueue/dequeue should be disconnected by then. Although based on the
> trace I missed a case.


Both pfifo_fast_reset() and pfifo_fast_dequeue() call
skb_array_consume_bh(), so there is no difference w.r.t. resizing.

And ->reset() is called in qdisc_graft() too. Let's say we have
htb+pfifo_fast: htb_graft() calls qdisc_replace(), which calls
qdisc_reset() on pfifo_fast, so clearly pfifo_fast_reset() can run
concurrently with pfifo_fast_dequeue().
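
As a rough illustration of the point that both paths go through the same
locked consume, here is a small userspace model. Everything in it is
hypothetical (struct ring, ring_consume(), model_dequeue(), model_reset()
are names invented for this sketch, and the lock is a toy C11 atomic_flag
spinlock), so treat it as a picture of the call structure, not as the
kernel code.

```c
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 8   /* arbitrary size chosen for the sketch */

/* Hypothetical stand-in for struct skb_array: NULL marks an empty slot. */
struct ring {
    void *slots[RING_SIZE];
    size_t head;               /* producer index */
    size_t tail;               /* consumer index */
    atomic_flag consumer_lock; /* toy spinlock, not a kernel spinlock */
};

static void ring_init(struct ring *r)
{
    for (size_t i = 0; i < RING_SIZE; i++)
        r->slots[i] = NULL;
    r->head = r->tail = 0;
    atomic_flag_clear(&r->consumer_lock);
}

/* Returns 0 on success, -1 if the ring is full. */
static int ring_produce(struct ring *r, void *item)
{
    if (r->slots[r->head])
        return -1;
    r->slots[r->head] = item;
    r->head = (r->head + 1) % RING_SIZE;
    return 0;
}

/* Locked consume, playing the role of skb_array_consume_bh(). */
static void *ring_consume(struct ring *r)
{
    void *item;

    while (atomic_flag_test_and_set(&r->consumer_lock))
        ; /* spin until the lock is free */
    item = r->slots[r->tail];
    if (item) {
        r->slots[r->tail] = NULL;
        r->tail = (r->tail + 1) % RING_SIZE;
    }
    atomic_flag_clear(&r->consumer_lock);
    return item;
}

/* Dequeue path: hand one packet to the caller. */
static void *model_dequeue(struct ring *r)
{
    return ring_consume(r);
}

/* Reset path: drain the ring through the same locked consume and
 * return how many packets were dropped. */
static size_t model_reset(struct ring *r)
{
    size_t dropped = 0;

    while (ring_consume(r))
        dropped++;
    return dropped;
}
```

Queue three packets, dequeue one, then reset: the reset drops the other
two and a later dequeue finds nothing. The sketch is exercised
single-threaded only, so it does not demonstrate the race itself.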


>
> I think the right fix is to only call reset/destroy patterns after
> waiting a grace period and for all tx_action calls in-flight to
> complete. This is also better going forward for more complex qdiscs.

But we don't even hold the RCU read lock in TX BH, do we?

Also, people certainly don't like yet another synchronize_net()...