lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAM0EoMkFkJBGc5wsYec+1QZ_o6tEi6vm_KjAJV8SWB4EOPcppg@mail.gmail.com>
Date: Thu, 11 Jan 2024 14:55:04 -0500
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Petr Machata <petrm@...dia.com>
Cc: Ido Schimmel <idosch@...sch.org>, Jiri Pirko <jiri@...nulli.us>, netdev@...r.kernel.org, 
	kuba@...nel.org, pabeni@...hat.com, davem@...emloft.net, edumazet@...gle.com, 
	xiyou.wangcong@...il.com, victor@...atatu.com, pctammela@...atatu.com, 
	mleitner@...hat.com, vladbu@...dia.com, paulb@...dia.com
Subject: Re: [patch net-next] net: sched: move block device tracking into tcf_block_get/put_ext()

On Thu, Jan 11, 2024 at 11:55 AM Petr Machata <petrm@...dia.com> wrote:
>
>
> Jamal Hadi Salim <jhs@...atatu.com> writes:
>
> > On Thu, Jan 11, 2024 at 10:40 AM Jamal Hadi Salim <jhs@...atatu.com> wrote:
> >>
> >> On Wed, Jan 10, 2024 at 7:10 AM Ido Schimmel <idosch@...sch.org> wrote:
> >> >
> >> > On Thu, Jan 04, 2024 at 01:58:44PM +0100, Jiri Pirko wrote:
> >> > > diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> >> > > index adf5de1ff773..253b26f2eddd 100644
> >> > > --- a/net/sched/cls_api.c
> >> > > +++ b/net/sched/cls_api.c
> >> > > @@ -1428,6 +1428,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
> >> > >                     struct tcf_block_ext_info *ei,
> >> > >                     struct netlink_ext_ack *extack)
> >> > >  {
> >> > > +     struct net_device *dev = qdisc_dev(q);
> >> > >       struct net *net = qdisc_net(q);
> >> > >       struct tcf_block *block = NULL;
> >> > >       int err;
> >> > > @@ -1461,9 +1462,18 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
> >> > >       if (err)
> >> > >               goto err_block_offload_bind;
> >> > >
> >> > > +     if (tcf_block_shared(block)) {
> >> > > +             err = xa_insert(&block->ports, dev->ifindex, dev, GFP_KERNEL);
> >> > > +             if (err) {
> >> > > +                     NL_SET_ERR_MSG(extack, "block dev insert failed");
> >> > > +                     goto err_dev_insert;
> >> > > +             }
> >> > > +     }
> >> >
> >> > While this patch fixes the original issue, it creates another one:
> >> >
> >> > # ip link add name swp1 type dummy
> >> > # tc qdisc replace dev swp1 root handle 10: prio bands 8 priomap 7 6 5 4 3 2 1
> >> > # tc qdisc add dev swp1 parent 10:8 handle 108: red limit 1000000 min 200000 max 200001 probability 1.0 avpkt 8000 burst 38 qevent early_drop block 10
> >> > RED: set bandwidth to 10Mbit
> >> > # tc qdisc add dev swp1 parent 10:7 handle 107: red limit 1000000 min 500000 max 500001 probability 1.0 avpkt 8000 burst 63 qevent early_drop block 10
> >> > RED: set bandwidth to 10Mbit
> >> > Error: block dev insert failed.
> >> >
> >>
> >>
> >> +cc Petr
> >> We'll add a testcase on tdc - it doesnt seem we have any for qevents.
> >> If you have others that are related let us know.
> >> But how does this work? I see no mention of block on red code and i
>
> Look for qe_early_drop and qe_mark in sch_red.c.
>

I see it...

> >> see no mention of block on the reproducer above.
> >
> > Context: Yes, i see it on red setup but i dont see any block being setup.
>
> qevents are binding locations for blocks, similar in principle to
> clsact's ingress_block / egress_block. So the way to create a block is
> the same: just mention the block number for the first time.
>
> What qevents there are depends on the qdisc. They are supposed to
> reflect events that are somehow interesting, from the point of view of
> an skb within a qdisc. Thus RED has two qevents: early_drop for packets
> that were chosen to be, well, dropped early, and mark for packets that
> are ECN-marked. So when a packet is, say, early-dropped, the RED qdisc
> passes it through the TC block bound at that qevent (if any).
>

Ok, the confusing part was the missing block command. I am assuming in
addition to Ido's example one would need to create block 10 and then
attach a filter to it?
Something like:
tc qdisc add dev swp1 parent 10:7 handle 107: red limit 1000000 min
500000 max 500001 probability 1.0 avpkt 8000 burst 63 qevent
early_drop block 10
tc filter add block 10 ...

So a packet tagged for early drop will end up being processed in some
filter chain with some specified actions. So in the case of offload,
does it mean early drops will be sent to the kernel and land at the
specific chain? Also trying to understand (in retrospect, not armchair
lawyering): why was a block necessary? feels like the goto chain
action could have worked, no? i.e something like: qevent early_drop
goto chain x.. Is the block perhaps tied to something in the h/w or is
it just some clever metainfo that is used to jump to tc block when the
exceptions happen?

Important thing is we need tests so we can catch these regressions in
the future.  If you can, point me to some (outside of the ones Ido
posted) and we'll put them on tdc.

> > Also: Is it only Red or other qdiscs could behave this way?
>
> Currently only red supports any qevents at all, but in principle the
> mechanism is reusable. With my mlxsw hat on, an obvious next candidate
> would be tail_drop on FIFO qdisc.

Sounds cool. I can see use even for s/w only dpath.

cheers,
jamal

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ