Message-Id: <200804190115.15983.rusty@rustcorp.com.au>
Date: Sat, 19 Apr 2008 01:15:15 +1000
From: Rusty Russell <rusty@...tcorp.com.au>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: netdev@...r.kernel.org, Max Krasnyansky <maxk@...lcomm.com>,
virtualization@...ts.linux-foundation.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 5/5] tun: vringfd xmit support.
On Friday 18 April 2008 21:31:20 Andrew Morton wrote:
> On Fri, 18 Apr 2008 14:43:24 +1000 Rusty Russell <rusty@...tcorp.com.au> wrote:
> > + /* How many pages will this take? */
> > + npages = 1 + (base + len - 1)/PAGE_SIZE - base/PAGE_SIZE;
>
> Brain hurts. I hope you got that right.
I tested it when I wrote it, but just wrote a tester again:
base        len   npages
0           1     1
0xfff       1     1
0x1000      1     1
0           4096  1
0x1         4096  2
0xfff       4096  2
0x1000      4096  1
0xfffff000  4096  1
0xfffff000  4097  4293918722
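For reference, a tester along these lines reproduces the table. This is only a userspace sketch, not the tester used above: it assumes 4096-byte pages and models a 32-bit unsigned long with uint32_t, and the names npages_for, range_wraps and check_table are made up for illustration. Under those assumptions the last row is what 32-bit wraparound gives when base + len - 1 overflows.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Page-span formula from the patch, in 32-bit unsigned arithmetic
 * (assumption: models a 32-bit unsigned long). */
uint32_t npages_for(uint32_t base, uint32_t len)
{
	return 1 + (base + len - 1) / PAGE_SIZE - base / PAGE_SIZE;
}

/* Hypothetical guard: nonzero if [base, base + len) wraps past 2^32. */
int range_wraps(uint32_t base, uint32_t len)
{
	return len != 0 && base + len - 1 < base;
}

/* Replays the rows of the table above. */
void check_table(void)
{
	assert(npages_for(0, 1) == 1);
	assert(npages_for(0xfff, 1) == 1);
	assert(npages_for(0x1000, 1) == 1);
	assert(npages_for(0, 4096) == 1);
	assert(npages_for(0x1, 4096) == 2);
	assert(npages_for(0xfff, 4096) == 2);
	assert(npages_for(0x1000, 4096) == 1);
	assert(npages_for(0xfffff000u, 4096) == 1);
	/* base + len - 1 wraps to 0 here, so the result is 1 - 0xfffff mod 2^32 */
	assert(npages_for(0xfffff000u, 4097) == 4293918722u);
}
```

Calling check_table() from a main() exercises every row. range_wraps() flags only the final row of the table, which is the one case where the page count is meaningless.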
> > + if (unlikely(num_pg + npages > MAX_SKB_FRAGS)) {
> > +     err = -ENOSPC;
> > +     goto fail;
> > + }
> > + n = get_user_pages(current, current->mm, base, npages,
> > +                    0, 0, pages, NULL);
>
> What is the maximum number of pages which an unprivileged user can
> concurrently pin with this code?
Since only root can open the tun device, it's currently OK. The old code
kmalloced and copied: is there some mm-fu reason why pinning userspace memory
is worse?
But I actually think it's OK even for non-root, since these become skbs, which
means they either go into an outgoing device queue or a socket queue which is
accounted for exactly for this reason.
> > + if (unlikely(n < 0)) {
> > +     err = n;
> > +     goto fail;
> > + }
> > +
> > + /* Transfer pages to the frag array */
> > + for (j = 0; j < n; j++) {
> > +     f[num_pg].page = pages[j];
> > +     if (j == 0) {
> > +         f[num_pg].page_offset = offset_in_page(base);
> > +         f[num_pg].size = min(len, PAGE_SIZE -
> > +                              f[num_pg].page_offset);
> > +     } else {
> > +         f[num_pg].page_offset = 0;
> > +         f[num_pg].size = min(len, PAGE_SIZE);
> > +     }
> > +     len -= f[num_pg].size;
> > +     base += f[num_pg].size;
> > +     num_pg++;
> > + }
>
> This loop is a fancy way of doing
>
> num_pg = n;
Damn, you had me reworking this until I realized why. It's not: we're
inside a loop, doing one iovec array element at a time.
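To illustrate the point, here is a userspace sketch of the same bookkeeping with the outer per-element loop made explicit. It is not the tun code: struct seg stands in for an iovec element, struct frag for the skb frag entry without its page pointer, MAX_FRAGS is a placeholder for MAX_SKB_FRAGS, and fill_frags is a hypothetical name. It assumes 4096-byte pages and that every element has a non-zero length.

```c
#include <stddef.h>

#define PAGE_SIZE 4096ul	/* assumption: 4 KiB pages */
#define MAX_FRAGS 18		/* placeholder standing in for MAX_SKB_FRAGS */

struct frag { unsigned long page_offset, size; };	/* frag entry minus the page */
struct seg  { unsigned long base, len; };		/* stand-in for one iovec element */

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Fills f[] from all elements; returns the total frag count, or -1 if
 * the elements would need more than MAX_FRAGS entries. */
int fill_frags(const struct seg *segs, size_t nsegs, struct frag *f)
{
	unsigned long num_pg = 0;	/* running total across ALL elements */

	for (size_t i = 0; i < nsegs; i++) {
		unsigned long base = segs[i].base, len = segs[i].len;
		/* pages spanned by this one element only */
		unsigned long n = 1 + (base + len - 1) / PAGE_SIZE - base / PAGE_SIZE;

		if (num_pg + n > MAX_FRAGS)
			return -1;

		for (unsigned long j = 0; j < n; j++) {
			/* only the first page of an element starts mid-page */
			f[num_pg].page_offset = (j == 0) ? base % PAGE_SIZE : 0;
			f[num_pg].size = min_ul(len, PAGE_SIZE - f[num_pg].page_offset);
			len -= f[num_pg].size;
			base += f[num_pg].size;
			num_pg++;
		}
	}
	return (int)num_pg;
}
```

With two elements, { 0x1ff0, 0x20 } and { 0x3000, 100 }, the first spans two pages and the second one, so when the second pass finishes n is 1 while num_pg is 3. That is why the inner loop cannot be replaced by num_pg = n.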
> > + if (unlikely(n != npages)) {
> > +     err = -EFAULT;
> > +     goto fail;
> > + }
>
> why not do this immediately after running get_user_pages()?
To simplify the failure path. Hmm, I would use release_pages here...
> > +fail:
> > + for (i = 0; i < num_pg; i++)
> > +     put_page(f[i].page);
>
> release_pages() could be a tad more efficient, but it's only error-path.
... but I didn't know that existed. Had to include pagemap.h, and it's not
exported. It seems to be a useful interface; see patch.
Cheers,
Rusty.
Subject: Export release_pages; nice undo for get_user_pages.
Andrew Morton suggests tun/tap use release_pages, but it's not
exported. It's not clear to me why this is in swap.c, but it exists
even without CONFIG_SWAP, so that's OK.
Signed-off-by: Rusty Russell <rusty@...tcorp.com.au>
diff -r abd2ad431e5c mm/swap.c
--- a/mm/swap.c Sat Apr 19 00:34:54 2008 +1000
+++ b/mm/swap.c Sat Apr 19 01:11:40 2008 +1000
@@ -346,6 +346,7 @@ void release_pages(struct page **pages,
 	pagevec_free(&pages_to_free);
 }
+EXPORT_SYMBOL(release_pages);
 /*
  * The pages which we're about to release may be in the deferred lru-addition