netdev - Re: [PATCH net-next v1 2/5] net: sched: Introduce helpers for qevent blocks

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAM_iQpUi-aKBLF5MkkSkCBchHeK5a_8OEDw3eXHZ4yPo=_hvsQ@mail.gmail.com>
Date:   Mon, 13 Jul 2020 19:51:41 -0700
From:   Cong Wang <xiyou.wangcong@...il.com>
To:     Petr Machata <petrm@...lanox.com>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>,
        David Miller <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Jiri Pirko <jiri@...lanox.com>,
        Ido Schimmel <idosch@...lanox.com>
Subject: Re: [PATCH net-next v1 2/5] net: sched: Introduce helpers for qevent blocks

On Fri, Jul 10, 2020 at 7:40 AM Petr Machata <petrm@...lanox.com> wrote:
>
>
> Cong Wang <xiyou.wangcong@...il.com> writes:
>
> > On Wed, Jul 8, 2020 at 5:13 PM Petr Machata <petrm@...lanox.com> wrote:
> >>
> >>
> >> Petr Machata <petrm@...lanox.com> writes:
> >>
> >> > Cong Wang <xiyou.wangcong@...il.com> writes:
> >> >
> >> > I'll think about it some more. For now I will at least fix the lack of
> >> > locking.
> >>
> >> I guess I could store smp_processor_id() that acquired the lock in
> >> struct qdisc_skb_head. Do a trylock instead of lock, and on fail check
> >> the stored value. I'll need to be careful about the race between
> >> unsuccessful trylock and the test, and about making sure CPU ID doesn't
> >> change after it is read. I'll probe this tomorrow.
> >
> > Like __netif_tx_lock(), right? Seems doable.
>
> Good to see it actually used, I wasn't sure if the idea made sense :)
>
> Unfortunately it is not enough.
>
> Consider two threads (A, B) and two netdevices (eth0, eth1):
>
> - "A" takes eth0's root lock and proceeds to classification
> - "B" takes eth1's root lock and proceeds to classification
> - "A" invokes mirror to eth1, waits on lock held by "B"
> - "B" invakes mirror to eth0, waits on lock held by "A"
> - Some say they are still waiting to this day.

Sure, AA or ABBA deadlock.

>
> So one option that I see is to just stash the mirrored packet in a queue
> instead of delivering it right away:
>
> - s/netif_receive_skb/netif_rx/ in act_mirred
>
> - Reuse the RX queue for TX packets as well, differentiating the two by
>   a bit in SKB CB. Then process_backlog() would call either
>   __netif_receive_skb() or dev_queue_transmit().
>
> - Drop mirred_rec_level guard.

I don't think I follow you, the root qdisc lock is on egress which has
nothing to do with ingress, so I don't see how netif_rx() is even involved.

>
> This seems to work, but I might be missing something non-obvious, such
> as CB actually being used for something already in that context. I would
> really rather not introduce a second backlog queue just for mirred
> though.
>
> Since mirred_rec_level does not kick in anymore, the same packet can end
> up being forwarded from the backlog queue, to the qdisc, and back to the
> backlog queue, forever. But that seems OK, that's what the admin
> configured, so that's what's happening.
>
> If this is not a good idea for some reason, this might work as well:
>
> - Convert the current root lock to an rw lock. Convert all current
>   lockers to write lock (which should be safe), except of enqueue, which
>   will take read lock. That will allow many concurrent threads to enter
>   enqueue, or one thread several times, but it will exclude all other
>   users.

Are you sure we can parallelize enqueue()? They all need to move
skb into some queue, which is not able to parallelize with just a read
lock. Even the "lockless" qdisc takes a spinlock, r->producer_lock,
for enqueue().


>
>   So this guards configuration access to the qdisc tree, makes sure
>   qdiscs don't go away from under one's feet.
>
> - Introduce another spin lock to guard the private data of the qdisc
>   tree, counters etc., things that even two concurrent enqueue
>   operations shouldn't trample on. Enqueue takes this spin lock after
>   read-locking the root lock. act_mirred drops it before injecting the
>   packet and takes it again afterwards.
>
> Any opinions y'all?

I thought about forbidding mirror/redirecting to the same device,
but there might be some legitimate use cases of such. So, I don't
have any other ideas yet, perhaps there is some way to refactor
dev_queue_xmit() to avoid this deadlock.

Thanks.