linux-kernel - Re: [PATCH 2/2] mm/frame-vec: use FOLL

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAKMK7uEsVOopEuWrjXOh+pjo=XZXkVf6Y6sC7WD7zr+t=rBdZg@mail.gmail.com>
Date:   Tue, 6 Oct 2020 15:08:50 +0200
From:   Daniel Vetter <daniel.vetter@...ll.ch>
To:     Jason Gunthorpe <jgg@...pe.ca>
Cc:     DRI Development <dri-devel@...ts.freedesktop.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Daniel Vetter <daniel.vetter@...el.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        John Hubbard <jhubbard@...dia.com>,
        Jérôme Glisse <jglisse@...hat.com>,
        Jan Kara <jack@...e.cz>,
        Dan Williams <dan.j.williams@...el.com>,
        Linux MM <linux-mm@...ck.org>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>,
        Pawel Osciak <pawel@...iak.com>,
        Marek Szyprowski <m.szyprowski@...sung.com>,
        Kyungmin Park <kyungmin.park@...sung.com>,
        Tomasz Figa <tfiga@...omium.org>,
        Inki Dae <inki.dae@...sung.com>,
        Joonyoung Shim <jy0922.shim@...sung.com>,
        Seung-Woo Kim <sw0312.kim@...sung.com>,
        linux-samsung-soc <linux-samsung-soc@...r.kernel.org>,
        "open list:DMA BUFFER SHARING FRAMEWORK" 
        <linux-media@...r.kernel.org>, Oded Gabbay <oded.gabbay@...il.com>
Subject: Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM

On Tue, Oct 6, 2020 at 2:26 PM Jason Gunthorpe <jgg@...pe.ca> wrote:
>
> On Tue, Oct 06, 2020 at 08:23:23AM +0200, Daniel Vetter wrote:
> > On Tue, Oct 6, 2020 at 1:41 AM Jason Gunthorpe <jgg@...pe.ca> wrote:
> > >
> > > On Tue, Oct 06, 2020 at 12:43:31AM +0200, Daniel Vetter wrote:
> > >
> > > > > iow I think I can outright delete the frame vector stuff.
> > > >
> > > > Ok this doesn't work, because dma_mmap always uses a remap_pfn_range,
> > > > which is a VM_IO | VM_PFNMAP vma and so even if it's cma backed and
> > > > not a carveout, we can't get the pages.
> > >
> > > If CMA memory has struct pages it probably should be mmap'd with
> > > different flags, and the lifecycle of the CMA memory needs to respect
> > > the struct page refcount?
> >
> > I guess yes and no. The problem is if there's pagecache in the cma
> > region, pup(FOLL_LONGTERM) needs to first migrate those pages out of
> > the cma range. Because all normal page allocation in cma regions must
> > be migratable at all times.
>
> Eh? Then how are we doing follow_pfn() on this stuff and not being
> completely broken?
>
> The entire point of this framevec API is to pin the memory and
> preventing it from moving around.
>
> Sounds like it is fundamentally incompatible with CMA. Why is
> something trying to mix the two?

I think the assumption way back when this started is that any VM_IO |
VM_PFNMAP vma is perma-pinned because it's just a piece of carveout.
Of course this ignored that it could also be a piece of iomem and
peer2peer dma doens't Just Work, so could result in all kinds of
hilarity and hw exceptions. But no leaks. Well, if you assume that the
ownership of a device never changes after you've booted the system.

But now we have dynamic gpu memory management, a bunch of subsystems
that fully support revoke semantics (in a subsystem specific way), and
CMA trying really hard to make the old carveouts useable for the
system at large when the memory isn't needed by the device. So all
these assumptions behind follow_pfn are out of the window, and
follow_pfn is pretty much broken.

What's worse I noticed that even for static pfnmaps (for userspace
drivers) we now revoke access to those mmaps. For example implemented
for /dev/mem in 3234ac664a87 ("/dev/mem: Revoke mappings when a driver
claims the region"). Which means follow_pfn isn't even working
correctly anymore for that case, and it's all pretty much broken.

> > This is actually worse than the gpu case I had in mind, where at most
> > you can sneak access other gpu buffers. With cma you should be able to
> > get at arbitrary pagecache (well anything that's GFP_MOVEABLE really).
> > Nice :-(
>
> Ah, we have a winner :\

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch