linux-kernel - Re: [Linaro-mm-sig] [PATCH 1/2] mm: replace BUG_ON in vm_insert

Message-ID: <YBwRU1nrE3mfYbWK@phenom.ffwll.local>
Date:   Thu, 4 Feb 2021 16:22:59 +0100
From:   Daniel Vetter <daniel@...ll.ch>
To:     Christian König <christian.koenig@....com>
Cc:     Suren Baghdasaryan <surenb@...gle.com>,
        Daniel Vetter <daniel.vetter@...ll.ch>,
        Matthew Wilcox <willy@...radead.org>,
        "moderated list:DMA BUFFER SHARING FRAMEWORK" 
        <linaro-mm-sig@...ts.linaro.org>,
        Sandeep Patil <sspatil@...gle.com>,
        Android Kernel Team <kernel-team@...roid.com>,
        James Jones <jajones@...dia.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Liam Mark <lmark@...eaurora.org>,
        Brian Starkey <Brian.Starkey@....com>,
        Christoph Hellwig <hch@...radead.org>,
        Minchan Kim <minchan@...nel.org>,
        Linux MM <linux-mm@...ck.org>,
        John Stultz <john.stultz@...aro.org>,
        dri-devel <dri-devel@...ts.freedesktop.org>,
        Chris Goldsworthy <cgoldswo@...eaurora.org>,
        Hridya Valsaraju <hridya@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Robin Murphy <robin.murphy@....com>,
        "open list:DMA BUFFER SHARING FRAMEWORK" 
        <linux-media@...r.kernel.org>
Subject: Re: [Linaro-mm-sig] [PATCH 1/2] mm: replace BUG_ON in vm_insert_page
 with a return of an error

On Thu, Feb 04, 2021 at 09:16:32AM +0100, Christian König wrote:
> Am 03.02.21 um 22:41 schrieb Suren Baghdasaryan:
> > [SNIP]
> > > > How many semi-unrelated buffer accounting schemes does Google come up with?
> > > > 
> > > > We're at three with this one.
> > > > 
> > > > And also we _cannot_ require that all dma-bufs are backed by struct
> > > > page, so requiring struct page to make this work is a no-go.
> > > > 
> > > > Second, we do not want to allow get_user_pages and friends to work on
> > > > dma-buf; it causes all kinds of pain. Yes, on SoCs where dma-bufs are
> > > > exclusively in system memory you can maybe get away with this, but
> > > > dma-buf is supposed to work in more places than just Android SoCs.
> > > I just realized that vm_insert_page doesn't even work for CMA; it would
> > > upset get_user_pages pretty badly: you're trying to pin a page in
> > > ZONE_MOVABLE but you can't move it because it's rather special.
> > > VM_SPECIAL is exactly meant to catch this stuff.
> > Thanks for the input, Daniel! Let me think about the cases you pointed out.
> > 
> > IMHO, the issue with PSS is the difficulty of calculating this metric
> > without struct page usage. I don't think that problem becomes easier
> > if we use cgroups or any other API. I wanted to enable existing PSS
> > calculation mechanisms for the dmabufs known to be backed by struct
> > pages (since we know how the heap allocated that memory), but it sounds
> > like this would lead to problems that I did not consider.
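PSS (proportional set size) charges each mapped page to a process as its size divided by the number of mappers. That is why it is hard to compute without struct page: the kernel takes the per-page sharing count from the page's map count. The sketch below is a simplified model. It assumes 4 KiB pages and a caller-supplied array of map counts, and it ignores huge pages. It uses floating point, whereas the kernel's own smaps code uses fixed-point arithmetic.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/* PSS in bytes: each page's size is split evenly among its mappers. */
static double pss_bytes(const unsigned int *mapcounts, size_t npages)
{
	double total = 0.0;

	for (size_t i = 0; i < npages; i++) {
		/* A page that is mapped at all has at least one mapper. */
		assert(mapcounts[i] > 0);
		total += (double)MODEL_PAGE_SIZE / mapcounts[i];
	}
	return total;
}
```

Three pages mapped by 1, 2 and 4 processes respectively contribute 4096 + 2048 + 1024 = 7168 bytes to this process's PSS.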
> 
> Yeah, using struct page indeed won't work. We discussed that multiple times
> now and Daniel even has a patch to mangle the struct page pointers inside
> the sg_table object to prevent abuse in that direction.
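The email does not show the mangling patch. Below is a minimal sketch of one way such mangling can work. The pointer-carrying word is XORed with a fixed mask, so an importer that dereferences it gets garbage. XOR is its own inverse, so the exporter can undo the change before the table goes back to it. In a scatterlist, the low bits of `page_link` encode chain/end markers. This sketch assumes, for illustration only, that the low 8 bits are such flags and must be preserved. `MODEL_FLAG_BITS`, `mangle()` and `unmangle()` are hypothetical names.

```c
#include <assert.h>

/* Assumption for the sketch: the low 8 bits carry flags that must survive. */
#define MODEL_FLAG_BITS 0xffUL

/* Flip every non-flag bit so the embedded pointer becomes unusable. */
static unsigned long mangle(unsigned long page_link)
{
	return page_link ^ ~MODEL_FLAG_BITS;
}

/* XOR with the same mask restores the original value. */
static unsigned long unmangle(unsigned long mangled)
{
	unsigned long orig = mangled ^ ~MODEL_FLAG_BITS;

	/* Sanity check: the mask never touches the flag bits. */
	assert((orig & MODEL_FLAG_BITS) == (mangled & MODEL_FLAG_BITS));
	return orig;
}
```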
> 
> On the other hand I totally agree that we need to do something on this side
> which goes beyond what cgroups provide.
> 
> A few years ago I came up with patches to improve the OOM killer to include
> resources bound to the processes through file descriptors. Unfortunately, I
> can't find them offhand anymore and I'm currently too busy to dig them up.
> 
> In general, I think we need to make it possible for both the in-kernel OOM
> killer and userspace processes and handlers to have access to that kind
> of data.
> 
> The fdinfo approach as suggested in the other thread sounds like the easiest
> solution to me.
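With the fdinfo approach, a userspace handler learns how much memory a process holds through a buffer fd by reading `/proc/<pid>/fdinfo/<fd>`. The sketch below assumes the buffer's fdinfo contains a `size:` line giving its size in bytes; that field name is an assumption of the sketch. `parse_fdinfo_size()` and `fd_buffer_size()` are hypothetical helpers, and the parser is kept separate from the file read so it can be exercised on a string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Find a line of the form "size:<ws><number>" in fdinfo-style text.
 * Returns 0 and stores the value on success, -1 if no such line exists. */
static int parse_fdinfo_size(const char *text, unsigned long long *size)
{
	assert(text != NULL && size != NULL);
	const char *line = text;

	while (line && *line) {
		/* The space in the format also matches the tab fdinfo uses. */
		if (sscanf(line, "size: %llu", size) == 1)
			return 0;
		line = strchr(line, '\n');
		if (line)
			line++; /* step past the newline to the next line */
	}
	return -1;
}

/* Read /proc/self/fdinfo/<fd> and extract its "size:" field, if any. */
static int fd_buffer_size(int fd, unsigned long long *size)
{
	char path[64], buf[4096];

	snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", fd);
	FILE *f = fopen(path, "r");
	if (!f)
		return -1;
	size_t n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return parse_fdinfo_size(buf, size);
}
```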

Yeah, for OOM handling, cgroups alone aren't enough as the interface; we
need to make sure that the OOM killer takes into account the system memory
usage (ideally zone-aware, for CMA pools).
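The kernel's oom_badness() scores a task mainly from its RSS, swap entries and page tables, adjusted by oom_score_adj. Memory a task holds only through file descriptors does not appear in that sum. The model below is hypothetical and shows how adding such memory can change which task gets picked. `struct model_task`, its `fd_bound_pages` field and both functions are illustrative names. oom_score_adj and the zone awareness mentioned above are not modeled.

```c
#include <assert.h>
#include <stddef.h>

struct model_task {
	unsigned long rss_pages;      /* resident pages */
	unsigned long swap_pages;     /* swapped-out pages */
	unsigned long fd_bound_pages; /* hypothetical: memory held via fds */
};

/* Simplified score: everything the task keeps allocated, in pages. */
static unsigned long model_badness(const struct model_task *t)
{
	return t->rss_pages + t->swap_pages + t->fd_bound_pages;
}

/* Index of the task with the highest score (first one wins ties). */
static size_t model_pick_victim(const struct model_task *tasks, size_t n)
{
	assert(n > 0);
	size_t victim = 0;

	for (size_t i = 1; i < n; i++)
		if (model_badness(&tasks[i]) > model_badness(&tasks[victim]))
			victim = i;
	return victim;
}
```

A task with a small RSS that holds large buffers through fds would be overlooked if only `rss_pages` and `swap_pages` were counted.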

But to track that, we still need that infrastructure first, I think.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch