lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sat, 25 May 2024 10:14:09 -0400
From: Neal Cardwell <ncardwell@...gle.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, 
	Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org, eric.dumazet@...il.com
Subject: Re: [PATCH net 1/4] tcp: add tcp_done_with_error() helper

On Fri, May 24, 2024 at 3:36 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> tcp_reset() ends with a sequence that is carefuly ordered.
>
> We need to fix [e]poll bugs in the following patches,
> it makes sense to use a common helper.
>
> Suggested-by: Neal Cardwell <ncardwell@...gle.com>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> ---
>  include/net/tcp.h    |  1 +
>  net/ipv4/tcp.c       |  2 +-
>  net/ipv4/tcp_input.c | 25 +++++++++++++++++--------
>  3 files changed, 19 insertions(+), 9 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 060e95b331a286ad7c355be11dc03250d2944920..2e7150f6755a5f5bf7b45454da0b33c5fac78183 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -677,6 +677,7 @@ void tcp_skb_collapse_tstamp(struct sk_buff *skb,
>  /* tcp_input.c */
>  void tcp_rearm_rto(struct sock *sk);
>  void tcp_synack_rtt_meas(struct sock *sk, struct request_sock *req);
> +void tcp_done_with_error(struct sock *sk);
>  void tcp_reset(struct sock *sk, struct sk_buff *skb);
>  void tcp_fin(struct sock *sk);
>  void tcp_check_space(struct sock *sk);
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 681b54e1f3a64387787738ab6495531b8abe1771..2a8f8d8676ff1d30ea9f8cd47ccf9236940eb299 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -598,7 +598,7 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
>                  */
>                 mask |= EPOLLOUT | EPOLLWRNORM;
>         }
> -       /* This barrier is coupled with smp_wmb() in tcp_reset() */
> +       /* This barrier is coupled with smp_wmb() in tcp_done_with_error() */
>         smp_rmb();
>         if (READ_ONCE(sk->sk_err) ||
>             !skb_queue_empty_lockless(&sk->sk_error_queue))
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 9c04a9c8be9dfaa0ec2437b3748284e57588b216..5af716f1bc74e095d22f64d605624decfe27cefe 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -4436,6 +4436,22 @@ static enum skb_drop_reason tcp_sequence(const struct tcp_sock *tp,
>         return SKB_NOT_DROPPED_YET;
>  }
>
> +
> +void tcp_done_with_error(struct sock *sk)
> +{
> +       /* Our caller wrote a value into sk->sk_err.
> +        * This barrier is coupled with smp_rmb() in tcp_poll()
> +        */
> +       smp_wmb();
> +
> +       tcp_write_queue_purge(sk);
> +       tcp_done(sk);
> +
> +       if (!sock_flag(sk, SOCK_DEAD))
> +               sk_error_report(sk);
> +}
> +EXPORT_SYMBOL(tcp_done_with_error);
> +
>  /* When we get a reset we do this. */
>  void tcp_reset(struct sock *sk, struct sk_buff *skb)
>  {
> @@ -4460,14 +4476,7 @@ void tcp_reset(struct sock *sk, struct sk_buff *skb)
>         default:
>                 WRITE_ONCE(sk->sk_err, ECONNRESET);
>         }
> -       /* This barrier is coupled with smp_rmb() in tcp_poll() */
> -       smp_wmb();
> -
> -       tcp_write_queue_purge(sk);
> -       tcp_done(sk);
> -
> -       if (!sock_flag(sk, SOCK_DEAD))
> -               sk_error_report(sk);
> +       tcp_done_with_error(sk);
>  }
>
>  /*
> --

Thanks, Eric!

Thinking about this more, I wonder if there is another aspect to this issue.

I am thinking about this part of tcp_done():

void tcp_done(struct sock *sk)
{
...
        sk->sk_shutdown = SHUTDOWN_MASK;

        if (!sock_flag(sk, SOCK_DEAD))
                sk->sk_state_change(sk);

The tcp_poll() code reads sk->sk_shutdown to decide whether to set
EPOLLHUP and other bits. However, sk->sk_shutdown is not set until
here in tcp_done(). And in the tcp_done() code there is no smp_wmb()
to ensure that the sk->sk_shutdown is visible to other CPUs before
tcp_done() calls sk->sk_state_change() to wake up threads sleeping on
sk->sk_wq.

So AFAICT we could have cases where this sk->sk_state_change() (or the
later sk_error_report()?) wakes a thread doing a tcp_poll() on another
CPU, and the tcp_poll() code may correctly see the sk->sk_err because
it was updated before the smp_wmb() in tcp_done_with_error(), but may
fail to see the "sk->sk_shutdown = SHUTDOWN_MASK" write because that
happened after the smp_wmb() in tcp_done_with_error().

So AFAICT  maybe we need two changes?

(1) AFAICT the call to smp_wmb() should actually instead be inside
tcp_done(), after we set sk->sk_shutdown?

void tcp_done(struct sock *sk)
{
        ...
        sk->sk_shutdown = SHUTDOWN_MASK;

        /* Ensure previous writes to sk->sk_err, sk->sk_state,
         * sk->sk_shutdown are visible to others.
         * This barrier is coupled with smp_rmb() in tcp_poll()
         */
        smp_wmb();

        if (!sock_flag(sk, SOCK_DEAD))
                sk->sk_state_change(sk);

(2) Correspondingly, AFAICT the tcp_poll() call to smp_rmb() should be
before tcp_poll() first reads sk->sk_shutdown, rather than right
before it reads sk->sk_err?

thanks,
neal

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ