[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20171117213215.GQ21978@ZenIV.linux.org.uk>
Date: Fri, 17 Nov 2017 21:32:15 +0000
From: Al Viro <viro@...IV.linux.org.uk>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [git pull] vfs.git get_user_pages_fast() conversion
On Fri, Nov 17, 2017 at 12:50:47PM -0800, Linus Torvalds wrote:
> Not because the conversion was wrong, but because the original code is
> so broken.
>
> In particular, that "1" that is unchanged in the arguments is correct
> in the conversion, but it was completely wrong in the original, even
> if it happened to work.
>
> it _should_ have been a FOLL_WRITE. Yes, it happens to have that
> value, but it was broken.
>
> (I note that a bit of grepping shows we have the same issue in a stale
> comment in mm/ksm.c).
>
> It would have been nice to see things like this mentioned in the commit message.
>
> Because I'm pretty sure you actually _realized_ that as you made the
> conversion, but there's no sign of that in the logs, because the
> commit message just says
>
> atomisp: use get_user_pages_fast()
>
> without mentioning how broken the old case was (even it if happened to work).
Point... Frankly, my impression from the whole thing is that get_user_pages()
and get_user_pages_unlocked() calling conventions are overcomplicated, especially
since most of the callers either should be get_user_pages_fast() or happen
to be inside implementations of get_user_pages_fast(). Grepping for the
remaining callers right now yields just this:
arch/cris/arch-v32/drivers/cryptocop.c:2722: err = get_user_pages((unsigned long int)(oper.indata + prev_ix),
arch/cris/arch-v32/drivers/cryptocop.c:2736: err = get_user_pages((unsigned long int)oper.cipher_outdata,
arch/ia64/kernel/err_inject.c:145: ret = get_user_pages(virt_addr, 1, FOLL_WRITE, NULL, NULL);
arch/x86/mm/mpx.c:550: gup_ret = get_user_pages((unsigned long)addr, nr_pages,
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:646: r = get_user_pages(userptr, num_pages, flags, p, NULL);
drivers/gpu/drm/radeon/radeon_ttm.c:571: r = get_user_pages(userptr, num_pages, write ? FOLL_WRITE : 0,
drivers/infiniband/core/umem.c:194: ret = get_user_pages(cur_base,
drivers/infiniband/hw/mthca/mthca_memfree.c:475: ret = get_user_pages(uaddr & PAGE_MASK, 1, FOLL_WRITE, pages, NUL
L);
drivers/infiniband/hw/qib/qib_user_pages.c:70: ret = get_user_pages(start_page + got * PAGE_SIZE,
drivers/infiniband/hw/usnic/usnic_uiom.c:146: ret = get_user_pages(cur_base,
drivers/media/pci/ivtv/ivtv-udma.c:127: err = get_user_pages_unlocked(user_dma.uaddr, user_dma.page_count,
drivers/media/pci/ivtv/ivtv-yuv.c:78: y_pages = get_user_pages_unlocked(y_dma.uaddr,
drivers/media/pci/ivtv/ivtv-yuv.c:82: uv_pages = get_user_pages_unlocked(uv_dma.uaddr,
drivers/media/v4l2-core/videobuf-dma-sg.c:188: err = get_user_pages(data & PAGE_MASK, dma->nr_pages,
drivers/misc/mic/scif/scif_rma.c:1399: pinned_pages->nr_pages = get_user_pages(
drivers/misc/sgi-gru/grufault.c:201: if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
mm/mempolicy.c:829: err = get_user_pages(addr & PAGE_MASK, 1, 0, &p, NULL);
virt/kvm/kvm_main.c:1326: return get_user_pages(start, 1, flags, page, NULL);
virt/kvm/kvm_main.c:1333: rc = get_user_pages(addr, 1, flags, NULL, NULL);
virt/kvm/kvm_main.c:1395: npages = get_user_pages_unlocked(addr, 1, page, flags);
and even those are dubious - e.g. cris ones could bloody well become a pair of
get_user_pages_fast(), without ->mmap_sem being held across both calls. Itanic
one is almost certainly buggered - we are not holding ->mmap_sem there.
Hell knows... I wonder if exposing FOLL_... thing outside of mm/gup.c actually
makes sense. With the users so heavily skewed towards just two combinations of
flags (0 and FOLL_WRITE)...
And for get_user_pages() itself it's even more ridiculous - vmalist (the last
argument) is non-NULL in only one caller. Which uses it only to check if all
of the VMAs happen to be hugetlb ones, apparently.
FWIW, I wanted to trim the users of those two suckers and see what remains.
And then go through those with maintainers of subsystems in question, to
see what is really wanted there. That's for the coming cycle, though...
Powered by blists - more mailing lists