Message-ID: <CAOFY-A0w51=3M0Gj80wt8BhJryXeEgTTovWVLVieOTW=NwrRmA@mail.gmail.com>
Date: Fri, 10 Apr 2020 12:15:29 -0700
From: Arjun Roy <arjunroy@...gle.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Arjun Roy <arjunroy.kdev@...il.com>,
David Miller <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>, linux-mm@...ck.org,
Eric Dumazet <edumazet@...gle.com>,
Soheil Hassas Yeganeh <soheil@...gle.com>
Subject: Re: [PATCH resend mm,net-next 3/3] net-zerocopy: Use
vm_insert_pages() for tcp rcv zerocopy.

On Fri, Apr 10, 2020 at 12:04 PM Andrew Morton
<akpm@...ux-foundation.org> wrote:
>
> On Fri, 21 Feb 2020 13:21:41 -0800 Arjun Roy <arjunroy@...gle.com> wrote:
>
> > I remain a bit concerned regarding the merge process for this specific
> > patch (0003, the net/ipv4/tcp.c change) since I have other in-flight
> > changes for TCP receive zerocopy that I'd like to upstream for
> > net-next - and would like to avoid weird merge issues.
> >
> > So perhaps the following could work:
> >
> > 1. Andrew, perhaps we could remove this particular patch (0003, the
> > net/ipv4/tcp.c change) from mm-next; that way we merge
> > vm_insert_pages() but not the call-site within TCP, for now.
> > 2. net-next will eventually pick vm_insert_pages() up.
> > 3. I can modify the zerocopy code to use it at that point?
> >
> > Else I'm concerned a complicated merge situation may result.
>
> The merge situation is quite clean.
>
> I guess I'll hold off on
> net-zerocopy-use-vm_insert_pages-for-tcp-rcv-zerocopy.patch (below) and
> shall send it to davem after Linus has merged the prerequisites.
>
Acknowledged, thank you!
(resend because gmail conveniently forgot my plain text mode preference...)
-Arjun
>
> From: Arjun Roy <arjunroy@...gle.com>
> Subject: net-zerocopy: use vm_insert_pages() for tcp rcv zerocopy
>
> Use vm_insert_pages() for tcp receive zerocopy. Spin lock cycles (as
> reported by perf) drop from a couple of percentage points to a fraction of
> a percent. This results in a roughly 6% increase in efficiency, measured as
> zerocopy receive count divided by CPU utilization.
>
> The intention of this patchset is to reduce atomic ops for tcp zerocopy
> receives, which normally hit the same spinlock multiple times
> consecutively.
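>
> A minimal sketch of the batching idea, for illustration only (the helper
> name map_page_batch() is a placeholder and not something this patch adds):
>
>   #include <linux/mm.h>
>
>   /* Hypothetical helper: map a batch of pages into @vma starting at *@addr.
>    * vm_insert_pages() takes the page count by reference and, on return,
>    * leaves it set to the number of pages it could NOT map, so *@addr can
>    * still be advanced by the bytes actually mapped on a partial failure.
>    */
>   static int map_page_batch(struct vm_area_struct *vma, unsigned long *addr,
>                             struct page **pages, unsigned long nr)
>   {
>           unsigned long remaining = nr;
>           int err = vm_insert_pages(vma, *addr, pages, &remaining);
>
>           *addr += PAGE_SIZE * (nr - remaining);
>           return err;
>   }
>
> The receive loop below fills a small on-stack array of page pointers and
> flushes it this way once per batch, instead of taking the same spinlock
> once per page via vm_insert_page().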
>
> [akpm@...ux-foundation.org: suppress gcc-7.2.0 warning]
> Link: http://lkml.kernel.org/r/20200128025958.43490-3-arjunroy.kdev@gmail.com
> Signed-off-by: Arjun Roy <arjunroy@...gle.com>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@...gle.com>
> Cc: David Miller <davem@...emloft.net>
> Cc: Matthew Wilcox <willy@...radead.org>
> Cc: Jason Gunthorpe <jgg@...pe.ca>
> Cc: Stephen Rothwell <sfr@...b.auug.org.au>
> Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
> ---
>
> net/ipv4/tcp.c | 70 ++++++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 63 insertions(+), 7 deletions(-)
>
> --- a/net/ipv4/tcp.c~net-zerocopy-use-vm_insert_pages-for-tcp-rcv-zerocopy
> +++ a/net/ipv4/tcp.c
> @@ -1734,14 +1734,48 @@ int tcp_mmap(struct file *file, struct s
>  }
>  EXPORT_SYMBOL(tcp_mmap);
>
> +static int tcp_zerocopy_vm_insert_batch(struct vm_area_struct *vma,
> +                                        struct page **pages,
> +                                        unsigned long pages_to_map,
> +                                        unsigned long *insert_addr,
> +                                        u32 *length_with_pending,
> +                                        u32 *seq,
> +                                        struct tcp_zerocopy_receive *zc)
> +{
> +        unsigned long pages_remaining = pages_to_map;
> +        int bytes_mapped;
> +        int ret;
> +
> +        ret = vm_insert_pages(vma, *insert_addr, pages, &pages_remaining);
> +        bytes_mapped = PAGE_SIZE * (pages_to_map - pages_remaining);
> +        /* Even if vm_insert_pages fails, it may have partially succeeded in
> +         * mapping (some but not all of the pages).
> +         */
> +        *seq += bytes_mapped;
> +        *insert_addr += bytes_mapped;
> +        if (ret) {
> +                /* But if vm_insert_pages did fail, we have to unroll some state
> +                 * we speculatively touched before.
> +                 */
> +                const int bytes_not_mapped = PAGE_SIZE * pages_remaining;
> +                *length_with_pending -= bytes_not_mapped;
> +                zc->recv_skip_hint += bytes_not_mapped;
> +        }
> +        return ret;
> +}
> +
>  static int tcp_zerocopy_receive(struct sock *sk,
>                                  struct tcp_zerocopy_receive *zc)
>  {
>          unsigned long address = (unsigned long)zc->address;
>          u32 length = 0, seq, offset, zap_len;
> +        #define PAGE_BATCH_SIZE 8
> +        struct page *pages[PAGE_BATCH_SIZE];
>          const skb_frag_t *frags = NULL;
>          struct vm_area_struct *vma;
>          struct sk_buff *skb = NULL;
> +        unsigned long pg_idx = 0;
> +        unsigned long curr_addr;
>          struct tcp_sock *tp;
>          int inq;
>          int ret;
> @@ -1754,6 +1788,8 @@ static int tcp_zerocopy_receive(struct s
>
>          sock_rps_record_flow(sk);
>
> +        tp = tcp_sk(sk);
> +
>          down_read(&current->mm->mmap_sem);
>
>          ret = -EINVAL;
> @@ -1762,7 +1798,6 @@ static int tcp_zerocopy_receive(struct s
>                  goto out;
>          zc->length = min_t(unsigned long, zc->length, vma->vm_end - address);
>
> -        tp = tcp_sk(sk);
>          seq = tp->copied_seq;
>          inq = tcp_inq(sk);
>          zc->length = min_t(u32, zc->length, inq);
> @@ -1774,8 +1809,20 @@ static int tcp_zerocopy_receive(struct s
>                  zc->recv_skip_hint = zc->length;
>          }
>          ret = 0;
> +        curr_addr = address;
>          while (length + PAGE_SIZE <= zc->length) {
>                  if (zc->recv_skip_hint < PAGE_SIZE) {
> +                        /* If we're here, finish the current batch. */
> +                        if (pg_idx) {
> +                                ret = tcp_zerocopy_vm_insert_batch(vma, pages,
> +                                                                   pg_idx,
> +                                                                   &curr_addr,
> +                                                                   &length,
> +                                                                   &seq, zc);
> +                                if (ret)
> +                                        goto out;
> +                                pg_idx = 0;
> +                        }
>                          if (skb) {
>                                  if (zc->recv_skip_hint > 0)
>                                          break;
> @@ -1784,7 +1831,6 @@ static int tcp_zerocopy_receive(struct s
>                          } else {
>                                  skb = tcp_recv_skb(sk, seq, &offset);
>                          }
> -
>                          zc->recv_skip_hint = skb->len - offset;
>                          offset -= skb_headlen(skb);
>                          if ((int)offset < 0 || skb_has_frag_list(skb))
> @@ -1808,14 +1854,24 @@ static int tcp_zerocopy_receive(struct s
>                          zc->recv_skip_hint -= remaining;
>                          break;
>                  }
> -                ret = vm_insert_page(vma, address + length,
> -                                     skb_frag_page(frags));
> -                if (ret)
> -                        break;
> +                pages[pg_idx] = skb_frag_page(frags);
> +                pg_idx++;
>                  length += PAGE_SIZE;
> -                seq += PAGE_SIZE;
>                  zc->recv_skip_hint -= PAGE_SIZE;
>                  frags++;
> +                if (pg_idx == PAGE_BATCH_SIZE) {
> +                        ret = tcp_zerocopy_vm_insert_batch(vma, pages, pg_idx,
> +                                                           &curr_addr, &length,
> +                                                           &seq, zc);
> +                        if (ret)
> +                                goto out;
> +                        pg_idx = 0;
> +                }
> +        }
> +        if (pg_idx) {
> +                ret = tcp_zerocopy_vm_insert_batch(vma, pages, pg_idx,
> +                                                   &curr_addr, &length, &seq,
> +                                                   zc);
>          }
>  out:
>          up_read(&current->mm->mmap_sem);
> _
>