netdev - Re: [PATCH net-next 2/2] trace: tcp: fully support trace_tcp_send

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL+tcoD-2g2BTYOXfWEfrJ5DYh3PQcUn=Sk8_Zr6ZZLj6Pokzg@mail.gmail.com>
Date: Mon, 11 Mar 2024 13:00:14 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Ratheesh Kannoth <rkannoth@...vell.com>
Cc: edumazet@...gle.com, mhiramat@...nel.org, mathieu.desnoyers@...icios.com, 
	rostedt@...dmis.org, kuba@...nel.org, pabeni@...hat.com, davem@...emloft.net, 
	netdev@...r.kernel.org, linux-trace-kernel@...r.kernel.org, 
	Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next 2/2] trace: tcp: fully support trace_tcp_send_reset

On Mon, Mar 11, 2024 at 11:27 AM Ratheesh Kannoth <rkannoth@...vell.com> wrote:
>
> On 2024-03-11 at 08:11:04, Jason Xing (kerneljasonxing@...il.com) wrote:
> > From: Jason Xing <kernelxing@...cent.com>
> >
> > Prior to this patch, what we can see by enabling trace_tcp_send is
> > only happening under two circumstances:
> > 1) active rst mode
> > 2) non-active rst mode and based on the full socket
> >
> > That means the inconsistency occurs if we use tcpdump and trace
> > simultaneously to see how rst happens.
> >
> > It's necessary that we should take into other cases into considerations,
> > say:
> > 1) time-wait socket
> > 2) no socket
> > ...
> >
> > By parsing the incoming skb and reversing its 4-turple can
> > we know the exact 'flow' which might not exist.
> >
> > Samples after applied this patch:
> > 1. tcp_send_reset: skbaddr=XXX skaddr=XXX src=ip:port dest=ip:port
> > state=TCP_ESTABLISHED
> > 2. tcp_send_reset: skbaddr=000...000 skaddr=XXX src=ip:port dest=ip:port
> > state=UNKNOWN
> > Note:
> > 1) UNKNOWN means we cannot extract the right information from skb.
> > 2) skbaddr/skaddr could be 0
> >
> > Signed-off-by: Jason Xing <kernelxing@...cent.com>
> > ---
> >  include/trace/events/tcp.h | 39 ++++++++++++++++++++++++++++++++++++--
> >  net/ipv4/tcp_ipv4.c        |  4 ++--
> >  net/ipv6/tcp_ipv6.c        |  3 ++-
> >  3 files changed, 41 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
> > index 2495a1d579be..6c09d7941583 100644
> > --- a/include/trace/events/tcp.h
> > +++ b/include/trace/events/tcp.h
> > @@ -107,11 +107,46 @@ DEFINE_EVENT(tcp_event_sk_skb, tcp_retransmit_skb,
> >   * skb of trace_tcp_send_reset is the skb that caused RST. In case of
> >   * active reset, skb should be NULL
> >   */
> > -DEFINE_EVENT(tcp_event_sk_skb, tcp_send_reset,
> > +TRACE_EVENT(tcp_send_reset,
> >
> >       TP_PROTO(const struct sock *sk, const struct sk_buff *skb),
> >
> > -     TP_ARGS(sk, skb)
> > +     TP_ARGS(sk, skb),
> > +
> > +     TP_STRUCT__entry(
> > +             __field(const void *, skbaddr)
> > +             __field(const void *, skaddr)
> > +             __field(int, state)
> > +             __array(__u8, saddr, sizeof(struct sockaddr_in6))
> > +             __array(__u8, daddr, sizeof(struct sockaddr_in6))
> > +     ),
> > +
> > +     TP_fast_assign(
> > +             __entry->skbaddr = skb;
> > +             __entry->skaddr = sk;
> > +             /* Zero means unknown state. */
> > +             __entry->state = sk ? sk->sk_state : 0;
> > +
> > +             memset(__entry->saddr, 0, sizeof(struct sockaddr_in6));
> > +             memset(__entry->daddr, 0, sizeof(struct sockaddr_in6));
> > +
> > +             if (sk && sk_fullsock(sk)) {
> > +                     const struct inet_sock *inet = inet_sk(sk);
> > +
> > +                     TP_STORE_ADDR_PORTS(__entry, inet, sk);
> > +             } else {
> > +                     /*
> > +                      * We should reverse the 4-turple of skb, so later
> > +                      * it can print the right flow direction of rst.
> > +                      */
> > +                     TP_STORE_ADDR_PORTS_SKB(skb, entry->daddr, entry->saddr);
> > +             }
> > +     ),
> > +
> > +     TP_printk("skbaddr=%p skaddr=%p src=%pISpc dest=%pISpc state=%s",
> Could you consider using %px ? is it permitted ? it will be easy to track skb.

I prefer not to use %px because we cannot make use of the real address
of skb. Besides, using %px would leak kernel addresses.

Here is the Documentation (see Documentation/core-api/printk-formats.rst):
"Pointers printed without a specifier extension (i.e unadorned %p) are
hashed to give a unique identifier without leaking kernel addresses to user
space."

Perhaps, that's the reason why all the tracepoints didn't print in %px format:)

Thanks,
Jason

>
> > +               __entry->skbaddr, __entry->skaddr,
> > +               __entry->saddr, __entry->daddr,
> > +               __entry->state ? show_tcp_state_name(__entry->state) : "UNKNOWN")
> >  );
> >
> >  /*
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index a22ee5838751..d5c4a969c066 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -868,10 +868,10 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
> >        */
> >       if (sk) {
> >               arg.bound_dev_if = sk->sk_bound_dev_if;
> > -             if (sk_fullsock(sk))
> > -                     trace_tcp_send_reset(sk, skb);
> >       }
> >
> > +     trace_tcp_send_reset(sk, skb);
> > +
> >       BUILD_BUG_ON(offsetof(struct sock, sk_bound_dev_if) !=
> >                    offsetof(struct inet_timewait_sock, tw_bound_dev_if));
> >
> > diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> > index 3f4cba49e9ee..8e9c59b6c00c 100644
> > --- a/net/ipv6/tcp_ipv6.c
> > +++ b/net/ipv6/tcp_ipv6.c
> > @@ -1113,7 +1113,6 @@ static void tcp_v6_send_reset(const struct sock *sk, struct sk_buff *skb)
> >       if (sk) {
> >               oif = sk->sk_bound_dev_if;
> >               if (sk_fullsock(sk)) {
> > -                     trace_tcp_send_reset(sk, skb);
> >                       if (inet6_test_bit(REPFLOW, sk))
> >                               label = ip6_flowlabel(ipv6h);
> >                       priority = READ_ONCE(sk->sk_priority);
> > @@ -1129,6 +1128,8 @@ static void tcp_v6_send_reset(const struct sock *sk, struct sk_buff *skb)
> >                       label = ip6_flowlabel(ipv6h);
> >       }
> >
> > +     trace_tcp_send_reset(sk, skb);
> > +
> >       tcp_v6_send_response(sk, skb, seq, ack_seq, 0, 0, 0, oif, 1,
> >                            ipv6_get_dsfield(ipv6h), label, priority, txhash,
> >                            &key);
> > --
> > 2.37.3
> >