[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5d873c14-9d17-4c48-8e11-951b99270b75@intel.com>
Date: Thu, 9 Nov 2023 13:54:17 -0800
From: "Chittim, Madhu" <madhu.chittim@...el.com>
To: Maxim Mikityanskiy <maxtram95@...il.com>, Paolo Abeni <pabeni@...hat.com>
CC: <netdev@...r.kernel.org>, Jamal Hadi Salim <jhs@...atatu.com>, Cong Wang
<xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>, "David S. Miller"
<davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski
<kuba@...nel.org>, Tariq Toukan <tariqt@...dia.com>, Gal Pressman
<gal@...dia.com>, Saeed Mahameed <saeedm@...dia.com>,
<xuejun.zhang@...el.com>, <sridhar.samudrala@...el.com>
Subject: Re: [PATCH net] net: sched: fix warn on htb offloaded class creation
On 10/31/2023 7:40 AM, Maxim Mikityanskiy wrote:
> On Tue, 31 Oct 2023 at 10:11:14 +0100, Paolo Abeni wrote:
>> Hi,
>>
>> I'm sorry for the late reply.
>>
>> On Fri, 2023-10-27 at 16:57 +0300, Maxim Mikityanskiy wrote:
>>> I believe this is not the right fix.
>>>
>>> On Thu, 26 Oct 2023 at 17:36:48 +0200, Paolo Abeni wrote:
>>>> The following commands:
>>>>
>>>> tc qdisc add dev eth1 handle 2: root htb offload
>>>> tc class add dev eth1 parent 2: classid 2:1 htb rate 5mbit burst 15k
>>>>
>>>> yeld to a WARN in the HTB qdisc:
>>>
>>> Something is off here. These are literally the most basic commands one
>>> could invoke with HTB offload, I'm sure they worked. Is it something
>>> that broke recently? Tariq/Gal/Saeed, could you check them on a Mellanox
>>> NIC?
>>>
>>>>
>>>> WARNING: CPU: 2 PID: 1583 at net/sched/sch_htb.c:1959
>>>> CPU: 2 PID: 1583 Comm: tc Kdump: loaded 6.6.0-rc2.mptcp_7895773e5235+ #59
>>>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
>>>> RIP: 0010:htb_change_class+0x25c4/0x2e30 [sch_htb]
>>>> Code: 24 58 48 b8 00 00 00 00 00 fc ff df 48 89 ca 48 c1 ea 03 80 3c 02 00 0f 85 92 01 00 00 49 89 8c 24 b0 01 00 00 e9 77 fc ff ff <0f> 0b e9 15 ec ff ff 80 3d f8 35 00 00 00 0f 85 d4 f9 ff ff ba 32
>>>> RSP: 0018:ffffc900015df240 EFLAGS: 00010246
>>>> RAX: 0000000000000000 RBX: ffff88811b4ca000 RCX: ffff88811db42800
>>>> RDX: 1ffff11023b68502 RSI: ffffffffaf2e6a00 RDI: ffff88811db42810
>>>> RBP: ffff88811db45000 R08: 0000000000000001 R09: fffffbfff664bbc9
>>>> R10: ffffffffb325de4f R11: ffffffffb2d33748 R12: 0000000000000000
>>>> R13: ffff88811db43000 R14: ffff88811b4caaac R15: ffff8881252c0030
>>>> FS: 00007f6c1f126740(0000) GS:ffff88815aa00000(0000) knlGS:0000000000000000
>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> CR2: 000055dca8e5b4a8 CR3: 000000011bc7a006 CR4: 0000000000370ee0
>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> Call Trace:
>>>> <TASK>
>>>> tc_ctl_tclass+0x394/0xeb0
>>>> rtnetlink_rcv_msg+0x2f5/0xaa0
>>>> netlink_rcv_skb+0x12e/0x3a0
>>>> netlink_unicast+0x421/0x730
>>>> netlink_sendmsg+0x79e/0xc60
>>>> ____sys_sendmsg+0x95a/0xc20
>>>> ___sys_sendmsg+0xee/0x170
>>>> __sys_sendmsg+0xc6/0x170
>>>> do_syscall_64+0x58/0x80
>>>> entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>>>>
>>>> The first command creates per TX queue pfifo qdiscs in
>>>> tc_modify_qdisc() -> htb_init() and grafts the pfifo to each dev_queue
>>>> via tc_modify_qdisc() -> qdisc_graft() -> htb_attach().
>>>
>>> Not exactly; it grafts pfifo to direct queues only. htb_attach_offload
>>> explicitly grafts noop to all the remaining queues.
>>
>> num_direct_qdiscs == real_num_tx_queues:
>>
>> https://elixir.bootlin.com/linux/latest/source/net/sched/sch_htb.c#L1101
>>
>> pfifo will be configured on all the TX queues available at TC creation
>> time, right?
>
> Yes, all real TX queues will be used as direct queues (for unclassified
> traffic). num_tx_queues should be somewhat bigger than
> real_num_tx_queues - it should reserve a queue per potential leaf class.
>
> pfifo is configured on direct queues, and the reserved queues have noop.
> Then, when a new leaf class is added (TC_HTB_LEAF_ALLOC_QUEUE), the
> driver allocates a new queue and increases real_num_tx_queues. HTB
> assigns a pfifo qdisc to the newly allocated queue.
>
> Changing the hierarchy (deleting a node or converting an inner node to a
> leaf) may reorder the classful queues (with indexes >= the initial
> real_num_tx_queues), so that there are no gaps.
>
>> Lacking a mlx card with offload support I hack basic htb support in
>> netdevsim and I observe the splat on top of such device. I can as well
>> share the netdevsim patch - it will need some clean-up.
>
> I will be happy to review the netdevsim patch, but I don't promise
> prompt responsiveness.
>
>>>
>>>> When the command completes, the qdisc_sleeping for each dev_queue is a
>>>> pfifo one. The next class creation will trigger the reported splat.
>>>>
>>>> Address the issue taking care of old non-builtin qdisc in
>>>> htb_change_class().
>>>>
>>>> Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload")
>>>> Signed-off-by: Paolo Abeni <pabeni@...hat.com>
>>>> ---
>>>> net/sched/sch_htb.c | 3 +--
>>>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>>>
>>>> diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
>>>> index 0d947414e616..dc682bd542b4 100644
>>>> --- a/net/sched/sch_htb.c
>>>> +++ b/net/sched/sch_htb.c
>>>> @@ -1955,8 +1955,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
>>>> qdisc_refcount_inc(new_q);
>>>> }
>>>> old_q = htb_graft_helper(dev_queue, new_q);
>>>> - /* No qdisc_put needed. */
>>>> - WARN_ON(!(old_q->flags & TCQ_F_BUILTIN));
>>>> + qdisc_put(old_q);
>>>
>>> We can get here after one of two cases above:
>>>
>>> 1. A new queue is allocated with TC_HTB_LEAF_ALLOC_QUEUE. It's supposed
>>> to have a noop qdisc by default (after htb_attach_offload).
>>
>> So most likely the trivial netdevsim implementation I used was not good
>> enough.
>>
>> Which constrains should respect TC_HTB_LEAF_ALLOC_QUEUE WRT the
>> returned qid value? should it in the (real_num_tx_queues,
>> num_tx_queues] range?
>
> Let's say N is real_num_tx_queues as it was at the moment of attaching.
> HTB queues should be allocated from [N, num_tx_queues), and
> real_num_tx_queues should be increased accordingly. It should not return
> queues number [0, N).
>
> Deletions should fill the gaps: if queue X is being deleted, N <= X <
> real_num_tx_queues - 1, then the gap should be filled with queue number
> real_num_tx_queues - 1 by swapping the queues (real_num_tx_queues will
> be decreased by 1 accordingly). Some care also needs to be taken when
> converting inner-to-leaf (TC_HTB_LEAF_DEL_LAST) and leaf-to-inner (it's
> better to get insights from [1], there are also some comments).
>
>> Can HTB actually configure H/W shaping on
>> real_num_tx_queues?
>
> It will be on real_num_tx_queues, but after it's increased to add new
> HTB queues. The original queues [0, N) are used for direct traffic, same
> as the non-offloaded HTB's direct_queue (it's not shaped).
>
>> I find no clear documentation WRT the above.
>
> I'm sorry for the lack of documentation. All I have is the commit
> message [2] and a netdev talk [3]. Maybe the slides could be of some
> use...
>
> I hope the above explanation clarifies something, and feel free to ask
> further questions, I'll be glad to explain what hasn't been documented
> properly.
We would like to enable Tx rate limiting using htb offload on all the
existing queues. We are able to do with the following set of commands
with Paolo's patch
tc qdisc add dev enp175s0v0 handle 1: root htb offload
tc class add dev enp175s0v0 parent 1: classid 1:1 htb rate 1000mbit ceil
2000mbit burst 100k
where
classid 1:1 is tx queue0
tx_minrate is rate 1000mbps
tx_maxrate is ceil 2000mbps
In order to not break your implementation could bring in if condition
instead WARN_ON, something like below
if (!(old_q->flags & TCQ_F_BUILTIN))
qdisc_put(old_q);
Would this work for you, please advise.
Powered by blists - more mailing lists