Message-ID: <875zavh1re.fsf@mellanox.com>
Date: Fri, 10 Jul 2020 16:40:21 +0200
From: Petr Machata <petrm@...lanox.com>
To: Cong Wang <xiyou.wangcong@...il.com>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>,
David Miller <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Eric Dumazet <eric.dumazet@...il.com>,
Jiri Pirko <jiri@...lanox.com>,
Ido Schimmel <idosch@...lanox.com>
Subject: Re: [PATCH net-next v1 2/5] net: sched: Introduce helpers for qevent blocks
Cong Wang <xiyou.wangcong@...il.com> writes:
> On Wed, Jul 8, 2020 at 5:13 PM Petr Machata <petrm@...lanox.com> wrote:
>>
>>
>> Petr Machata <petrm@...lanox.com> writes:
>>
>> > Cong Wang <xiyou.wangcong@...il.com> writes:
>> >
>> > I'll think about it some more. For now I will at least fix the lack of
>> > locking.
>>
>> I guess I could store smp_processor_id() that acquired the lock in
>> struct qdisc_skb_head. Do a trylock instead of lock, and on fail check
>> the stored value. I'll need to be careful about the race between
>> unsuccessful trylock and the test, and about making sure CPU ID doesn't
>> change after it is read. I'll probe this tomorrow.
>
> Like __netif_tx_lock(), right? Seems doable.
Good to see it actually used; I wasn't sure if the idea made sense :)
Unfortunately it is not enough.
Consider two threads (A, B) and two netdevices (eth0, eth1):
- "A" takes eth0's root lock and proceeds to classification
- "B" takes eth1's root lock and proceeds to classification
- "A" invokes mirror to eth1, waits on lock held by "B"
- "B" invokes mirror to eth0, waits on lock held by "A"
- Some say they are still waiting to this day.
So one option that I see is to just stash the mirrored packet in a queue
instead of delivering it right away:
- s/netif_receive_skb/netif_rx/ in act_mirred
- Reuse the RX queue for TX packets as well, differentiating the two by
a bit in SKB CB. Then process_backlog() would call either
  __netif_receive_skb() or dev_queue_xmit().
- Drop mirred_rec_level guard.
This seems to work, but I might be missing something non-obvious, such
as CB actually being used for something already in that context. I would
really rather not introduce a second backlog queue just for mirred
though.
Since mirred_rec_level does not kick in anymore, the same packet can end
up being forwarded from the backlog queue, to the qdisc, and back to the
backlog queue, forever. But that seems OK, that's what the admin
configured, so that's what's happening.
If this is not a good idea for some reason, this might work as well:
- Convert the current root lock to an rw lock. Convert all current
  lockers to write lock (which should be safe), except for enqueue, which
will take read lock. That will allow many concurrent threads to enter
enqueue, or one thread several times, but it will exclude all other
users.
So this guards configuration access to the qdisc tree, makes sure
qdiscs don't go away from under one's feet.
- Introduce another spin lock to guard the private data of the qdisc
tree, counters etc., things that even two concurrent enqueue
operations shouldn't trample on. Enqueue takes this spin lock after
read-locking the root lock. act_mirred drops it before injecting the
packet and takes it again afterwards.
Any opinions, y'all?