[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAM_iQpUi-aKBLF5MkkSkCBchHeK5a_8OEDw3eXHZ4yPo=_hvsQ@mail.gmail.com>
Date: Mon, 13 Jul 2020 19:51:41 -0700
From: Cong Wang <xiyou.wangcong@...il.com>
To: Petr Machata <petrm@...lanox.com>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>,
David Miller <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Eric Dumazet <eric.dumazet@...il.com>,
Jiri Pirko <jiri@...lanox.com>,
Ido Schimmel <idosch@...lanox.com>
Subject: Re: [PATCH net-next v1 2/5] net: sched: Introduce helpers for qevent blocks
On Fri, Jul 10, 2020 at 7:40 AM Petr Machata <petrm@...lanox.com> wrote:
>
>
> Cong Wang <xiyou.wangcong@...il.com> writes:
>
> > On Wed, Jul 8, 2020 at 5:13 PM Petr Machata <petrm@...lanox.com> wrote:
> >>
> >>
> >> Petr Machata <petrm@...lanox.com> writes:
> >>
> >> > Cong Wang <xiyou.wangcong@...il.com> writes:
> >> >
> >> > I'll think about it some more. For now I will at least fix the lack of
> >> > locking.
> >>
> >> I guess I could store smp_processor_id() that acquired the lock in
> >> struct qdisc_skb_head. Do a trylock instead of lock, and on fail check
> >> the stored value. I'll need to be careful about the race between
> >> unsuccessful trylock and the test, and about making sure CPU ID doesn't
> >> change after it is read. I'll probe this tomorrow.
> >
> > Like __netif_tx_lock(), right? Seems doable.
>
> Good to see it actually used, I wasn't sure if the idea made sense :)
>
> Unfortunately it is not enough.
>
> Consider two threads (A, B) and two netdevices (eth0, eth1):
>
> - "A" takes eth0's root lock and proceeds to classification
> - "B" takes eth1's root lock and proceeds to classification
> - "A" invokes mirror to eth1, waits on lock held by "B"
> - "B" invakes mirror to eth0, waits on lock held by "A"
> - Some say they are still waiting to this day.
Sure, AA or ABBA deadlock.
>
> So one option that I see is to just stash the mirrored packet in a queue
> instead of delivering it right away:
>
> - s/netif_receive_skb/netif_rx/ in act_mirred
>
> - Reuse the RX queue for TX packets as well, differentiating the two by
> a bit in SKB CB. Then process_backlog() would call either
> __netif_receive_skb() or dev_queue_transmit().
>
> - Drop mirred_rec_level guard.
I don't think I follow you, the root qdisc lock is on egress which has
nothing to do with ingress, so I don't see how netif_rx() is even involved.
>
> This seems to work, but I might be missing something non-obvious, such
> as CB actually being used for something already in that context. I would
> really rather not introduce a second backlog queue just for mirred
> though.
>
> Since mirred_rec_level does not kick in anymore, the same packet can end
> up being forwarded from the backlog queue, to the qdisc, and back to the
> backlog queue, forever. But that seems OK, that's what the admin
> configured, so that's what's happening.
>
> If this is not a good idea for some reason, this might work as well:
>
> - Convert the current root lock to an rw lock. Convert all current
> lockers to write lock (which should be safe), except of enqueue, which
> will take read lock. That will allow many concurrent threads to enter
> enqueue, or one thread several times, but it will exclude all other
> users.
Are you sure we can parallelize enqueue()? They all need to move
skb into some queue, which is not able to parallelize with just a read
lock. Even the "lockless" qdisc takes a spinlock, r->producer_lock,
for enqueue().
>
> So this guards configuration access to the qdisc tree, makes sure
> qdiscs don't go away from under one's feet.
>
> - Introduce another spin lock to guard the private data of the qdisc
> tree, counters etc., things that even two concurrent enqueue
> operations shouldn't trample on. Enqueue takes this spin lock after
> read-locking the root lock. act_mirred drops it before injecting the
> packet and takes it again afterwards.
>
> Any opinions y'all?
I thought about forbidding mirror/redirecting to the same device,
but there might be some legitimate use cases of such. So, I don't
have any other ideas yet, perhaps there is some way to refactor
dev_queue_xmit() to avoid this deadlock.
Thanks.
Powered by blists - more mailing lists