[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAM0EoM=mG51sY9-9_Bm8Sb=nUmF33b=Vcf6PvuOPqhoAQgUvwg@mail.gmail.com>
Date: Fri, 15 Dec 2023 09:42:47 -0500
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Taehee Yoo <ap420073@...il.com>
Cc: Jakub Kicinski <kuba@...nel.org>, Jamal Hadi Salim <hadi@...atatu.com>,
Eric Dumazet <edumazet@...gle.com>, Victor Nogueira <victor@...atatu.com>, xiyou.wangcong@...il.com,
jiri@...nulli.us, davem@...emloft.net, pabeni@...hat.com,
daniel@...earbox.net, dcaratti@...hat.com, netdev@...r.kernel.org,
kernel@...atatu.com
Subject: Re: [PATCH net-next v3 2/3] net: sched: Make tc-related drop reason
more flexible for remaining qdiscs
On Thu, Dec 14, 2023 at 12:18 PM Taehee Yoo <ap420073@...il.com> wrote:
>
> On 12/15/23 00:31, Jamal Hadi Salim wrote:
>
> Hi Jamal,
>
> > On Wed, Dec 13, 2023 at 4:08 PM Jakub Kicinski <kuba@...nel.org> wrote:
> >>
> >> On Wed, 13 Dec 2023 13:36:31 -0500 Jamal Hadi Salim wrote:
> >>> Putting this to rest:
> >>> Other than fq codel, the others that deal with multiple skbs due to
> >>> gso segments. So the conclusion is: if we have a bunch in the list
> >>> then they all suffer the same fate. So a single reason for the list is
> >>> sufficient.
> >>
> >> Alright.
> >>
> >> I'm still a bit confused about the cb, tho.
> >>
> >> struct qdisc_skb_cb is the state struct.
> >
> > Per packet state within tc though, no? Once it leaves tc whatever sits
> > in that space cant be trusted to be valid.
> > To answer your earlier question tcf_result is not available at the
> > qdisc level (when we call free_skb_list() but cb is and thats why we
> > used it)
> >
> >> But we put the drop reason in struct tc_skb_cb.
> >> How does that work. Qdiscs will assume they own all of
> >> qdisc_skb_cb::data ?
> >>
> >
> > Short answer, yes. Anyone can scribble over that. And multiple
> > consumers have a food fight going on - but it is expected behavior:
> > ebpf's skb->cb, cake, fq_codel etc - all use qdisc_skb_cb::data.
> > Out of the 48B in skb->cb qdisc_skb_cb redefined the first 28B and
> > left in qdisc_skb_cb::data as free-for-all space. I think,
> > unfortunately, that is now cast in stone.
> > Which still leaves us 20 bytes which is now being squatered by
> > tc_skb_cb where the drop reason was placed. Shit, i just noticed this
> > is not exclusive - seems like
> > drivers/net/amt.c is using this space - not sure why it is even using
> > tc space. I am wondering if we can make it use the 20B scratch pad.
> > +Cc author Taehee Yoo - it doesnt seem to conflict but strange that it
> > is considering qdisc_skb_cb
>
> The reason why amt considers qdisc_skb_cb is to not use CB area the TC
> is using.
> When amt driver send igmp/mld packet, it stores tunnel data in CB before
> calling dev_queue_xmit().
> Then, it uses that tunnel data from CB in the amt_dev_xmit().
> So, amt CB area should not be used by TC, this is the reason why it
> considers qdisc_skb_cb size.
> But It looks wrong, it should use tc_skb_cb instead of qdisc_skb_cb to
> fully avoid CB area of TC, right?
Yeah, that doesnt seem safe if you are storing state expecting it to
be intact afterwards - tc will definitely overwrite parts of it today.
See this:
struct qdisc_skb_cb {
struct {
unsigned int pkt_len;
u16 slave_dev_queue_mapping;
u16 tc_classid;
};
#define QDISC_CB_PRIV_LEN 20
unsigned char data[QDISC_CB_PRIV_LEN];
};
struct tc_skb_cb {
struct qdisc_skb_cb qdisc_cb;
u16 mru;
u8 post_ct:1;
u8 post_ct_snat:1;
u8 post_ct_dnat:1;
u16 zone; /* Only valid if post_ct = true */
};
tc_skb_cb->mru etc are writting over the area you are using.
The rule is you cant trust skb->cb content after it goes out of your "domain".
cheers,
jamal
> >
> >> Maybe some documentation about the lifetimes of these things
> >> would clarify things?
> >
> > What text do you think makes sense and where should it go?
> >
> > cheers,
> > jamal
Powered by blists - more mailing lists