Message-ID: <CANn89iJa3FZHXfUWHw-OwOu8X_Cc0-YzxkgE_M=8DrBN1jWnAQ@mail.gmail.com>
Date:   Tue, 26 Apr 2022 06:32:31 -0700
From:   Eric Dumazet <edumazet@...gle.com>
To:     Menglong Dong <menglong8.dong@...il.com>
Cc:     Jakub Kicinski <kuba@...nel.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ingo Molnar <mingo@...hat.com>,
        David Miller <davem@...emloft.net>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        David Ahern <dsahern@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, benbjiang@...cent.com,
        flyingpeng@...cent.com, Menglong Dong <imagedong@...cent.com>,
        Martin KaFai Lau <kafai@...com>,
        Talal Ahmad <talalahmad@...gle.com>,
        Kees Cook <keescook@...omium.org>, mengensun@...cent.com,
        Dongli Zhang <dongli.zhang@...cle.com>,
        LKML <linux-kernel@...r.kernel.org>,
        netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next 1/2] net: add skb drop reasons to inet connect request

On Tue, Apr 26, 2022 at 1:07 AM <menglong8.dong@...il.com> wrote:
>
> From: Menglong Dong <imagedong@...cent.com>
>
> The 'conn_request()' in struct inet_connection_sock_af_ops is used to
> process connection requests for TCP/DCCP. Taking TCP as an example, it
> is just 'tcp_v4_conn_request()'.
>
> When a non-zero value is returned by 'tcp_v4_conn_request()', the skb
> will be freed by kfree_skb() and a 'reset' packet will be sent.
> Otherwise, it will be freed normally.
>
> In this code path, 'consume_skb()' is used in many abnormal cases, such
> as the accept queue of the listen socket being full, where
> 'kfree_skb()' should be used instead.
>
> Therefore, we make a small change to the 'conn_request()' interface.
> When 0 is returned, we call 'consume_skb()' as usual; when a negative
> value is returned, we call 'kfree_skb()' and send a 'reset' as usual;
> when a positive value is returned, which has not happened yet, we do
> nothing, and the skb will be freed in 'conn_request()'. Then, we can
> use drop reasons in 'conn_request()'.
>
> The following new drop reasons are added:
>
>   SKB_DROP_REASON_LISTENOVERFLOWS
>   SKB_DROP_REASON_TCP_REQQFULLDROP
>
> Reviewed-by: Jiang Biao <benbjiang@...cent.com>
> Reviewed-by: Hao Peng <flyingpeng@...cent.com>
> Signed-off-by: Menglong Dong <imagedong@...cent.com>
> ---
>  include/linux/skbuff.h     |  4 ++++
>  include/trace/events/skb.h |  2 ++
>  net/dccp/input.c           | 12 +++++-------
>  net/ipv4/tcp_input.c       | 21 +++++++++++++--------
>  net/ipv4/tcp_ipv4.c        |  3 ++-
>  5 files changed, 26 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 84d78df60453..f33b3636bbce 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -469,6 +469,10 @@ enum skb_drop_reason {
>         SKB_DROP_REASON_PKT_TOO_BIG,    /* packet size is too big (maybe exceed
>                                          * the MTU)
>                                          */
> +       SKB_DROP_REASON_LISTENOVERFLOWS, /* accept queue of the listen socket is full */
> +       SKB_DROP_REASON_TCP_REQQFULLDROP, /* request queue of the listen
> +                                          * socket is full
> +                                          */
>         SKB_DROP_REASON_MAX,
>  };
>
> diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
> index a477bf907498..de6c93670437 100644
> --- a/include/trace/events/skb.h
> +++ b/include/trace/events/skb.h
> @@ -80,6 +80,8 @@
>         EM(SKB_DROP_REASON_IP_INADDRERRORS, IP_INADDRERRORS)    \
>         EM(SKB_DROP_REASON_IP_INNOROUTES, IP_INNOROUTES)        \
>         EM(SKB_DROP_REASON_PKT_TOO_BIG, PKT_TOO_BIG)            \
> +       EM(SKB_DROP_REASON_LISTENOVERFLOWS, LISTENOVERFLOWS)    \
> +       EM(SKB_DROP_REASON_TCP_REQQFULLDROP, TCP_REQQFULLDROP)  \
>         EMe(SKB_DROP_REASON_MAX, MAX)
>
>  #undef EM
> diff --git a/net/dccp/input.c b/net/dccp/input.c
> index 2cbb757a894f..ed20dfe83f66 100644
> --- a/net/dccp/input.c
> +++ b/net/dccp/input.c
> @@ -574,8 +574,7 @@ int dccp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
>         struct dccp_sock *dp = dccp_sk(sk);
>         struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb);
>         const int old_state = sk->sk_state;
> -       bool acceptable;
> -       int queued = 0;
> +       int err, queued = 0;
>
>         /*
>          *  Step 3: Process LISTEN state
> @@ -606,13 +605,12 @@ int dccp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
>                          */
>                         rcu_read_lock();
>                         local_bh_disable();
> -                       acceptable = inet_csk(sk)->icsk_af_ops->conn_request(sk, skb) >= 0;
> +                       err = inet_csk(sk)->icsk_af_ops->conn_request(sk, skb);
>                         local_bh_enable();
>                         rcu_read_unlock();
> -                       if (!acceptable)
> -                               return 1;
> -                       consume_skb(skb);
> -                       return 0;
> +                       if (!err)
> +                               consume_skb(skb);
> +                       return err < 0;
>                 }
>                 if (dh->dccph_type == DCCP_PKT_RESET)
>                         goto discard;
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index daff631b9486..e0bbbd624246 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -6411,7 +6411,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
>         struct inet_connection_sock *icsk = inet_csk(sk);
>         const struct tcphdr *th = tcp_hdr(skb);
>         struct request_sock *req;
> -       int queued = 0;
> +       int err, queued = 0;
>         bool acceptable;
>         SKB_DR(reason);
>
> @@ -6438,14 +6438,13 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
>                          */
>                         rcu_read_lock();
>                         local_bh_disable();
> -                       acceptable = icsk->icsk_af_ops->conn_request(sk, skb) >= 0;
> +                       err = icsk->icsk_af_ops->conn_request(sk, skb);
>                         local_bh_enable();
>                         rcu_read_unlock();
>
> -                       if (!acceptable)
> -                               return 1;
> -                       consume_skb(skb);
> -                       return 0;
> +                       if (!err)
> +                               consume_skb(skb);

Please, do not add more mess like that, where skb is either freed by
the callee or the caller.


> +                       return err < 0;

Where is err set to a negative value?


>                 }
>                 SKB_DR_SET(reason, TCP_FLAGS);
>                 goto discard;
> @@ -6878,6 +6877,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
>         bool want_cookie = false;
>         struct dst_entry *dst;
>         struct flowi fl;
> +       SKB_DR(reason);
>
>         /* TW buckets are converted to open requests without
>          * limitations, they conserve resources and peer is
> @@ -6886,12 +6886,15 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
>         if ((net->ipv4.sysctl_tcp_syncookies == 2 ||
>              inet_csk_reqsk_queue_is_full(sk)) && !isn) {
>                 want_cookie = tcp_syn_flood_action(sk, rsk_ops->slab_name);
> -               if (!want_cookie)
> +               if (!want_cookie) {
> +                       SKB_DR_SET(reason, TCP_REQQFULLDROP);
>                         goto drop;
> +               }
>         }
>
>         if (sk_acceptq_is_full(sk)) {
>                 NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
> +               SKB_DR_SET(reason, LISTENOVERFLOWS);
>                 goto drop;
>         }
>
> @@ -6947,6 +6950,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
>                          */
>                         pr_drop_req(req, ntohs(tcp_hdr(skb)->source),
>                                     rsk_ops->family);
> +                       SKB_DR_SET(reason, TCP_REQQFULLDROP);
>                         goto drop_and_release;
>                 }
>
> @@ -7006,7 +7010,8 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
>  drop_and_free:
>         __reqsk_free(req);
>  drop:
> +       kfree_skb_reason(skb, reason);

Ugh no, prefer "return reason" and leave the freeing part to the caller.

Your changes are too invasive and will hurt future backports.


>         tcp_listendrop(sk);
> -       return 0;
> +       return 1;
>  }
>  EXPORT_SYMBOL(tcp_conn_request);
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 157265aecbed..b8daf49f54a5 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1470,7 +1470,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
>
>  drop:
>         tcp_listendrop(sk);
> -       return 0;

This return 0 meant: do not send a reset.


> +       kfree_skb_reason(skb, SKB_DROP_REASON_IP_INADDRERRORS);

double kfree_skb() ?

> +       return 1;

-> send RESET

>  }
>  EXPORT_SYMBOL(tcp_v4_conn_request);
>
> --
> 2.36.0
>

I have a hard time understanding this patch.

Where is the related IPv6 change ?

I really wonder if you actually have tested this.
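
To illustrate the "return reason, caller frees" shape suggested above, here is a minimal userspace sketch. It is not kernel code: every type and function name in it (mock_skb, mock_listener, mock_conn_request(), mock_free_skb(), mock_rcv_listen(), the DROP_REASON_* values) is a hypothetical stand-in invented for this example, and the real interface may end up looking different. The callee only reports why it would drop the packet; the caller remains the single owner of the buffer and frees it exactly once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum drop_reason {
	DROP_REASON_NOT_DROPPED = 0,	/* hypothetical: request was accepted */
	DROP_REASON_LISTENOVERFLOWS,	/* accept queue of the listener is full */
	DROP_REASON_REQQFULLDROP,	/* request queue of the listener is full */
};

struct mock_skb {
	int free_count;			/* how many times the buffer was freed */
	enum drop_reason freed_with;	/* reason recorded when it was freed */
};

struct mock_listener {
	bool reqq_full;
	bool acceptq_full;
	int listen_drops;		/* stand-in for the listen drop counter */
};

/* Callee: decides whether to drop and says why. It never frees the skb. */
static enum drop_reason mock_conn_request(struct mock_listener *lsk,
					  const struct mock_skb *skb)
{
	(void)skb;	/* a real handler would parse the SYN here */

	if (lsk->reqq_full) {
		lsk->listen_drops++;
		return DROP_REASON_REQQFULLDROP;
	}
	if (lsk->acceptq_full) {
		lsk->listen_drops++;
		return DROP_REASON_LISTENOVERFLOWS;
	}
	return DROP_REASON_NOT_DROPPED;
}

/* Single free helper; the assert catches any double free in this model. */
static void mock_free_skb(struct mock_skb *skb, enum drop_reason reason)
{
	assert(skb->free_count == 0);
	skb->free_count++;
	skb->freed_with = reason;
}

/*
 * Caller: sole owner of the skb. It frees the buffer exactly once, with
 * whatever reason the callee reported. Returning 0 models "do not send
 * a reset"; error paths that do want a reset are left out of the sketch.
 */
static int mock_rcv_listen(struct mock_listener *lsk, struct mock_skb *skb)
{
	enum drop_reason reason = mock_conn_request(lsk, skb);

	mock_free_skb(skb, reason);
	return 0;
}
```

With this split there is only one place where the buffer is released, so a double free cannot be introduced by a new drop path in the callee, and the caller's return value (reset or no reset) stays independent of the drop reason.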
