lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAM0EoM=mG51sY9-9_Bm8Sb=nUmF33b=Vcf6PvuOPqhoAQgUvwg@mail.gmail.com>
Date: Fri, 15 Dec 2023 09:42:47 -0500
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Taehee Yoo <ap420073@...il.com>
Cc: Jakub Kicinski <kuba@...nel.org>, Jamal Hadi Salim <hadi@...atatu.com>, 
	Eric Dumazet <edumazet@...gle.com>, Victor Nogueira <victor@...atatu.com>, xiyou.wangcong@...il.com, 
	jiri@...nulli.us, davem@...emloft.net, pabeni@...hat.com, 
	daniel@...earbox.net, dcaratti@...hat.com, netdev@...r.kernel.org, 
	kernel@...atatu.com
Subject: Re: [PATCH net-next v3 2/3] net: sched: Make tc-related drop reason
 more flexible for remaining qdiscs

On Thu, Dec 14, 2023 at 12:18 PM Taehee Yoo <ap420073@...il.com> wrote:
>
> On 12/15/23 00:31, Jamal Hadi Salim wrote:
>
> Hi Jamal,
>
>  > On Wed, Dec 13, 2023 at 4:08 PM Jakub Kicinski <kuba@...nel.org> wrote:
>  >>
>  >> On Wed, 13 Dec 2023 13:36:31 -0500 Jamal Hadi Salim wrote:
>  >>> Putting this to rest:
>  >>> Other than fq codel, the others that deal with multiple skbs due to
>  >>> gso segments. So the conclusion is: if we have a bunch in the list
>  >>> then they all suffer the same fate. So a single reason for the list is
>  >>> sufficient.
>  >>
>  >> Alright.
>  >>
>  >> I'm still a bit confused about the cb, tho.
>  >>
>  >> struct qdisc_skb_cb is the state struct.
>  >
>  > Per packet state within tc though, no? Once it leaves tc whatever sits
>  > in that space cant be trusted to be valid.
>  > To answer your earlier question tcf_result is not available at the
>  > qdisc level (when we call free_skb_list() but cb is and thats why we
>  > used it)
>  >
>  >> But we put the drop reason in struct tc_skb_cb.
>  >> How does that work. Qdiscs will assume they own all of
>  >> qdisc_skb_cb::data ?
>  >>
>  >
>  > Short answer, yes. Anyone can scribble over that. And multiple
>  > consumers have a food fight going on - but it is expected behavior:
>  > ebpf's skb->cb, cake, fq_codel etc - all use qdisc_skb_cb::data.
>  > Out of the 48B in skb->cb qdisc_skb_cb redefined the first 28B and
>  > left in qdisc_skb_cb::data as free-for-all space. I think,
>  > unfortunately, that is now cast in stone.
>  > Which still leaves us 20 bytes which is now being squatered by
>  > tc_skb_cb where the drop reason was placed. Shit, i just noticed this
>  > is not exclusive - seems like
>  > drivers/net/amt.c is using this space - not sure why it is even using
>  > tc space. I am wondering if we can make it use the 20B scratch pad.
>  > +Cc author Taehee Yoo - it doesnt seem to conflict but strange that it
>  > is considering qdisc_skb_cb
>
> The reason why amt considers qdisc_skb_cb is to not use CB area the TC
> is using.
> When amt driver send igmp/mld packet, it stores tunnel data in CB before
> calling dev_queue_xmit().
> Then, it uses that tunnel data from CB in the amt_dev_xmit().
> So, amt CB area should not be used by TC, this is the reason why it
> considers qdisc_skb_cb size.
> But It looks wrong, it should use tc_skb_cb instead of qdisc_skb_cb to
> fully avoid CB area of TC, right?


Yeah, that doesnt seem safe if you are storing state expecting it to
be intact afterwards - tc will definitely overwrite parts of it today.
See this:

struct qdisc_skb_cb {
        struct {
                unsigned int            pkt_len;
                u16                     slave_dev_queue_mapping;
                u16                     tc_classid;
        };
#define QDISC_CB_PRIV_LEN 20
        unsigned char           data[QDISC_CB_PRIV_LEN];
};

struct tc_skb_cb {
        struct qdisc_skb_cb qdisc_cb;

        u16 mru;
        u8 post_ct:1;
        u8 post_ct_snat:1;
        u8 post_ct_dnat:1;
        u16 zone; /* Only valid if post_ct = true */
};

tc_skb_cb->mru etc are writting over the area you are using.

The rule is you cant trust skb->cb content after it goes out of your "domain".

cheers,
jamal

>  >
>  >> Maybe some documentation about the lifetimes of these things
>  >> would clarify things?
>  >
>  > What text do you think makes sense and where should it go?
>  >
>  > cheers,
>  > jamal

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ