lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZUEQzsKiIlgtbN-S@mail.gmail.com>
Date: Tue, 31 Oct 2023 16:40:44 +0200
From: Maxim Mikityanskiy <maxtram95@...il.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, Jamal Hadi Salim <jhs@...atatu.com>,
	Cong Wang <xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Tariq Toukan <tariqt@...dia.com>,
	Gal Pressman <gal@...dia.com>, Saeed Mahameed <saeedm@...dia.com>
Subject: Re: [PATCH net] net: sched: fix warn on htb offloaded class creation

On Tue, 31 Oct 2023 at 10:11:14 +0100, Paolo Abeni wrote:
> Hi,
> 
> I'm sorry for the late reply.
> 
> On Fri, 2023-10-27 at 16:57 +0300, Maxim Mikityanskiy wrote:
> > I believe this is not the right fix.
> > 
> > On Thu, 26 Oct 2023 at 17:36:48 +0200, Paolo Abeni wrote:
> > > The following commands:
> > > 
> > > tc qdisc add dev eth1 handle 2: root htb offload
> > > tc class add dev eth1 parent 2: classid 2:1 htb rate 5mbit burst 15k
> > > 
> > > yeld to a WARN in the HTB qdisc:
> > 
> > Something is off here. These are literally the most basic commands one
> > could invoke with HTB offload, I'm sure they worked. Is it something
> > that broke recently? Tariq/Gal/Saeed, could you check them on a Mellanox
> > NIC?
> > 
> > > 
> > >  WARNING: CPU: 2 PID: 1583 at net/sched/sch_htb.c:1959
> > >  CPU: 2 PID: 1583 Comm: tc Kdump: loaded 6.6.0-rc2.mptcp_7895773e5235+ #59
> > >  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
> > >  RIP: 0010:htb_change_class+0x25c4/0x2e30 [sch_htb]
> > >  Code: 24 58 48 b8 00 00 00 00 00 fc ff df 48 89 ca 48 c1 ea 03 80 3c 02 00 0f 85 92 01 00 00 49 89 8c 24 b0 01 00 00 e9 77 fc ff ff <0f> 0b e9 15 ec ff ff 80 3d f8 35 00 00 00 0f 85 d4 f9 ff ff ba 32
> > >  RSP: 0018:ffffc900015df240 EFLAGS: 00010246
> > >  RAX: 0000000000000000 RBX: ffff88811b4ca000 RCX: ffff88811db42800
> > >  RDX: 1ffff11023b68502 RSI: ffffffffaf2e6a00 RDI: ffff88811db42810
> > >  RBP: ffff88811db45000 R08: 0000000000000001 R09: fffffbfff664bbc9
> > >  R10: ffffffffb325de4f R11: ffffffffb2d33748 R12: 0000000000000000
> > >  R13: ffff88811db43000 R14: ffff88811b4caaac R15: ffff8881252c0030
> > >  FS:  00007f6c1f126740(0000) GS:ffff88815aa00000(0000) knlGS:0000000000000000
> > >  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >  CR2: 000055dca8e5b4a8 CR3: 000000011bc7a006 CR4: 0000000000370ee0
> > >  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > >  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > >  Call Trace:
> > >  <TASK>
> > >   tc_ctl_tclass+0x394/0xeb0
> > >   rtnetlink_rcv_msg+0x2f5/0xaa0
> > >   netlink_rcv_skb+0x12e/0x3a0
> > >   netlink_unicast+0x421/0x730
> > >   netlink_sendmsg+0x79e/0xc60
> > >   ____sys_sendmsg+0x95a/0xc20
> > >   ___sys_sendmsg+0xee/0x170
> > >   __sys_sendmsg+0xc6/0x170
> > >  do_syscall_64+0x58/0x80
> > >  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> > > 
> > > The first command creates per TX queue pfifo qdiscs in
> > > tc_modify_qdisc() -> htb_init() and grafts the pfifo to each dev_queue
> > > via tc_modify_qdisc() ->  qdisc_graft() -> htb_attach().
> > 
> > Not exactly; it grafts pfifo to direct queues only. htb_attach_offload
> > explicitly grafts noop to all the remaining queues.
> 
> num_direct_qdiscs == real_num_tx_queues:
> 
> https://elixir.bootlin.com/linux/latest/source/net/sched/sch_htb.c#L1101
> 
> pfifo will be configured on all the TX queues available at TC creation
> time, right?

Yes, all real TX queues will be used as direct queues (for unclassified
traffic). num_tx_queues should be somewhat bigger than
real_num_tx_queues - it should reserve a queue per potential leaf class.

pfifo is configured on direct queues, and the reserved queues have noop.
Then, when a new leaf class is added (TC_HTB_LEAF_ALLOC_QUEUE), the
driver allocates a new queue and increases real_num_tx_queues. HTB
assigns a pfifo qdisc to the newly allocated queue.

Changing the hierarchy (deleting a node or converting an inner node to a
leaf) may reorder the classful queues (with indexes >= the initial
real_num_tx_queues), so that there are no gaps.

> Lacking a mlx card with offload support I hack basic htb support in
> netdevsim and I observe the splat on top of such device. I can as well
> share the netdevsim patch - it will need some clean-up.

I will be happy to review the netdevsim patch, but I don't promise
prompt responsiveness.

> > 
> > > When the command completes, the qdisc_sleeping for each dev_queue is a
> > > pfifo one. The next class creation will trigger the reported splat.
> > > 
> > > Address the issue taking care of old non-builtin qdisc in
> > > htb_change_class().
> > > 
> > > Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload")
> > > Signed-off-by: Paolo Abeni <pabeni@...hat.com>
> > > ---
> > >  net/sched/sch_htb.c | 3 +--
> > >  1 file changed, 1 insertion(+), 2 deletions(-)
> > > 
> > > diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
> > > index 0d947414e616..dc682bd542b4 100644
> > > --- a/net/sched/sch_htb.c
> > > +++ b/net/sched/sch_htb.c
> > > @@ -1955,8 +1955,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
> > >  				qdisc_refcount_inc(new_q);
> > >  			}
> > >  			old_q = htb_graft_helper(dev_queue, new_q);
> > > -			/* No qdisc_put needed. */
> > > -			WARN_ON(!(old_q->flags & TCQ_F_BUILTIN));
> > > +			qdisc_put(old_q);
> > 
> > We can get here after one of two cases above:
> > 
> > 1. A new queue is allocated with TC_HTB_LEAF_ALLOC_QUEUE. It's supposed
> > to have a noop qdisc by default (after htb_attach_offload).
> 
> So most likely the trivial netdevsim implementation I used was not good
> enough.
> 
> Which constrains should respect TC_HTB_LEAF_ALLOC_QUEUE WRT the
> returned qid value? should it in the (real_num_tx_queues,
> num_tx_queues] range?

Let's say N is real_num_tx_queues as it was at the moment of attaching.
HTB queues should be allocated from [N, num_tx_queues), and
real_num_tx_queues should be increased accordingly. It should not return
queues number [0, N).

Deletions should fill the gaps: if queue X is being deleted, N <= X <
real_num_tx_queues - 1, then the gap should be filled with queue number
real_num_tx_queues - 1 by swapping the queues (real_num_tx_queues will
be decreased by 1 accordingly). Some care also needs to be taken when
converting inner-to-leaf (TC_HTB_LEAF_DEL_LAST) and leaf-to-inner (it's
better to get insights from [1], there are also some comments).

> Can HTB actually configure H/W shaping on
> real_num_tx_queues?

It will be on real_num_tx_queues, but after it's increased to add new
HTB queues. The original queues [0, N) are used for direct traffic, same
as the non-offloaded HTB's direct_queue (it's not shaped).

> I find no clear documentation WRT the above.

I'm sorry for the lack of documentation. All I have is the commit
message [2] and a netdev talk [3]. Maybe the slides could be of some
use...

I hope the above explanation clarifies something, and feel free to ask
further questions, I'll be glad to explain what hasn't been documented
properly.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en/htb.c?id=5a6a09e97199d6600d31383055f9d43fbbcbe86f
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d03b195b5aa015f6c11988b86a3625f8d5dbac52
[3]: https://netdevconf.info/0x14/session.html?talk-hierarchical-QoS-hardware-offload

> Thanks!
> 
> Paolo
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ