
Message-ID: <CALOAHbBwGAqia_2xJSNkaX7TzSPN89bRz71qHRH43FtbwNTP6w@mail.gmail.com>
Date:   Tue, 9 Oct 2018 22:52:35 +0800
From:   Yafang Shao <laoar.shao@...il.com>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     David Miller <davem@...emloft.net>,
        netdev <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net-next] tcp: forbid direct reclaim if MSG_DONTWAIT is
 set in send path

On Tue, Oct 9, 2018 at 10:12 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Tue, Oct 9, 2018 at 5:05 AM Yafang Shao <laoar.shao@...il.com> wrote:
> >
> > By default, sk->sk_allocation is GFP_KERNEL, which means that if there
> > is not enough memory it will do both direct reclaim and background
> > reclaim. If the system has a large amount of memory, direct reclaim
> > may cause a large latency spike.
> >
> > When we set MSG_DONTWAIT in send syscalls, we really don't want the
> > call to block, so we'd better clear __GFP_DIRECT_RECLAIM when allocating
> > the skb in the send path. It will then return immediately if there is
> > not enough memory to allocate, and the application has a chance to do
> > other work instead of being blocked here.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@...il.com>
> > ---
> >  net/ipv4/tcp.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 43ef83b..fe4f5ce 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -1182,6 +1182,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> >         bool process_backlog = false;
> >         bool zc = false;
> >         long timeo;
> > +       gfp_t gfp;
> >
> >         flags = msg->msg_flags;
> >
> > @@ -1255,6 +1256,9 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> >         /* Ok commence sending. */
> >         copied = 0;
> >
> > +       gfp = flags & MSG_DONTWAIT ? sk->sk_allocation & ~__GFP_DIRECT_RECLAIM :
> > +             sk->sk_allocation;
> > +
> >  restart:
> >         mss_now = tcp_send_mss(sk, &size_goal, flags);
> >
> > @@ -1283,8 +1287,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> >                         }
> >                         first_skb = tcp_rtx_and_write_queues_empty(sk);
> >                         linear = select_size(first_skb, zc);
> > -                       skb = sk_stream_alloc_skb(sk, linear, sk->sk_allocation,
> > -                                                 first_skb);
> > +                       skb = sk_stream_alloc_skb(sk, linear, gfp, first_skb);
> >                         if (!skb)
> >                                 goto wait_for_memory;
>
>
> How have you tested this patch exactly ?
>
We recently saw network latency spikes (hundreds of milliseconds, or
even a full second) in our production environment.
I eventually traced the latency to direct reclaim in tcp_sendmsg.
The issue can be worked around by keeping some memory in reserve,
but it made me wonder why we don't forbid direct reclaim when
MSG_DONTWAIT is set.
So I made this change and tested it. The application got an errno
back instead of being blocked in the send path.
That's why I submitted this patch.
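
To make the flag selection in the patch concrete, here is a minimal userspace sketch of the same mask logic. The DEMO_* bit values and the demo_send_gfp() helper are hypothetical placeholders for illustration, not the kernel's definitions (the real masks live in include/linux/gfp.h, and the real GFP_KERNEL also carries __GFP_IO and __GFP_FS). MSG_DONTWAIT comes from <sys/socket.h>.

```c
#include <assert.h>
#include <sys/socket.h>   /* MSG_DONTWAIT */

/* Placeholder bits for illustration only; NOT the kernel's real values. */
#define DEMO_GFP_DIRECT_RECLAIM 0x1u
#define DEMO_GFP_KSWAPD_RECLAIM 0x2u
#define DEMO_GFP_KERNEL (DEMO_GFP_DIRECT_RECLAIM | DEMO_GFP_KSWAPD_RECLAIM)

/*
 * Hypothetical helper mirroring the ternary in the patch:
 *   gfp = flags & MSG_DONTWAIT ? sk_allocation & ~__GFP_DIRECT_RECLAIM
 *                              : sk_allocation;
 * Because '&' binds tighter than '?:', the condition is
 * (flags & MSG_DONTWAIT), written out as an if statement here.
 */
unsigned int demo_send_gfp(int msg_flags, unsigned int sk_allocation)
{
    if (msg_flags & MSG_DONTWAIT)
        return sk_allocation & ~DEMO_GFP_DIRECT_RECLAIM; /* drop direct reclaim only */
    return sk_allocation;                                /* blocking send: unchanged */
}
```

With these placeholder bits, a MSG_DONTWAIT send keeps only the background (kswapd) reclaim bit, while a blocking send keeps the full mask.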

> Most of TCP payloads are added in page fragments, and you have not
> changed the page allocation fragments.
>
> Also, I do not see how an application will get future notifications
> that it can retry the failed system call ?
> How are you really going to deal with this in high performance applications ?
>

I think returning immediately with an errno is better than blocking.
Maybe this solution is not good enough, but at least it tells the
application that something is wrong and that it can't send right now.
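
For reference, a rough sketch of how an application conventionally reacts to a failed non-blocking send: try once, and on EAGAIN/EWOULDBLOCK wait for POLLOUT before one retry. demo_send_nowait() is a hypothetical helper, and it assumes the errno returned is EAGAIN, which is not stated above. Whether POLLOUT would actually be signalled at a useful time after a memory allocation failure is exactly the open question here, and this sketch does not answer it.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: attempt a non-blocking send; if the kernel reports
 * EAGAIN/EWOULDBLOCK, wait up to timeout_ms for POLLOUT and retry once.
 * Returns the number of bytes sent, or -1 with errno set.
 */
ssize_t demo_send_nowait(int fd, const void *buf, size_t len, int timeout_ms)
{
    assert(buf != NULL);

    ssize_t n = send(fd, buf, len, MSG_DONTWAIT);
    if (n >= 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
        return n;                 /* success, or an error we do not retry */

    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return -1;                /* poll() failed; errno set by poll() */
    if (ready == 0) {
        errno = EAGAIN;           /* not writable within the timeout */
        return -1;
    }
    return send(fd, buf, len, MSG_DONTWAIT);  /* single retry */
}
```

A caller that gets -1 with EAGAIN from this helper can go back to its event loop and do other work, which is the behaviour the patch is aiming for.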

> I would rather prefer a socket setsockopt() to eventually be able to
> flip __GFP_DIRECT_RECLAIM in sk->sk_allocation,
> to not add all these tests in fast path, but honestly I do not see how
> applications can really make use of this.

Maybe an event is needed to tell the application that it can send again.
I don't have a better idea either.

Thanks
Yafang