linux-kernel - Re: On guest free page hinting and OOM


Message-ID: <CAKgT0UeZE29qBOAxDb2EmLr_hr1_W-m3Rw3gKs-UAPbD80K_+Q@mail.gmail.com>
Date:   Tue, 2 Apr 2019 13:32:03 -0700
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
Cc:     David Hildenbrand <david@...hat.com>,
        Nitesh Narayan Lal <nitesh@...hat.com>,
        kvm list <kvm@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-mm <linux-mm@...ck.org>,
        Paolo Bonzini <pbonzini@...hat.com>, lcapitulino@...hat.com,
        pagupta@...hat.com, Yang Zhang <yang.zhang.wz@...il.com>,
        Rik van Riel <riel@...riel.com>, dodgen@...gle.com,
        Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
        dhildenb@...hat.com, Andrea Arcangeli <aarcange@...hat.com>,
        Dave Hansen <dave.hansen@...el.com>
Subject: Re: On guest free page hinting and OOM

On Tue, Apr 2, 2019 at 10:53 AM Michael S. Tsirkin <mst@...hat.com> wrote:
>
> On Tue, Apr 02, 2019 at 10:45:43AM -0700, Alexander Duyck wrote:
> > We went through this back in the day with
> > networking. Adding more buffers is not the solution. The solution is
> > to have a way to gracefully recover and keep our hinting latency and
> > buffer bloat to a minimum.
>
> That's an interesting approach, I think that things that end up working
> well are NAPI (asynchronous notifications), limited batching, XDP (big
> aligned buffers) and BQL (accounting). Is that your perspective too?
>

Yes, that is kind of what I was getting at.

Basically we could have a kthread running somewhere that goes through
and pulls something like 64M of pages out of the MAX_ORDER - 1
freelist, does what is necessary to isolate them, puts them on a queue
somewhere, kicks the virtio ring, and waits for the response to come
back indicating that the hints have been processed. We would just have
to keep it running until the list doesn't have enough non-"Offline"
memory to fulfill the request. Then we just wait until we again reach
a level necessary to justify waking the thread back up and repeat.
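
As a rough illustration of the loop described above, here is a minimal
userspace model in C. It is only a sketch under stated assumptions, not
kernel code: every name in it (struct freelist, isolate_batch,
hint_and_wait, hinting_run) is hypothetical, the 4M block size assumes
4K pages with order-10 blocks, and the virtio kick plus the wait for
the host's response are reduced to a plain function call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; none of these names exist in the kernel. */
#define BLOCK_BYTES  (4UL << 20)   /* assumed size of one MAX_ORDER - 1 block */
#define BATCH_BYTES  (64UL << 20)  /* "something like 64M" per request */
#define BATCH_BLOCKS (BATCH_BYTES / BLOCK_BYTES)

enum block_state {
	BLOCK_IN_USE,    /* allocated, not on the freelist */
	BLOCK_FREE,      /* on the freelist, not yet hinted */
	BLOCK_ISOLATED,  /* pulled off the freelist, hint in flight */
	BLOCK_OFFLINE,   /* back on the freelist, already hinted */
};

struct freelist {
	enum block_state *blocks;
	size_t nr_blocks;
};

/* Count free blocks that have not been hinted yet. */
static size_t count_unhinted(const struct freelist *fl)
{
	size_t i, n = 0;

	for (i = 0; i < fl->nr_blocks; i++)
		if (fl->blocks[i] == BLOCK_FREE)
			n++;
	return n;
}

/* Isolate up to BATCH_BLOCKS unhinted blocks; record their indices. */
static size_t isolate_batch(struct freelist *fl, size_t *batch)
{
	size_t i, n = 0;

	for (i = 0; i < fl->nr_blocks && n < BATCH_BLOCKS; i++) {
		if (fl->blocks[i] == BLOCK_FREE) {
			fl->blocks[i] = BLOCK_ISOLATED;
			batch[n++] = i;
		}
	}
	return n;
}

/* Stand-in for kicking the virtio ring and waiting for the response. */
static void hint_and_wait(struct freelist *fl, const size_t *batch, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		assert(fl->blocks[batch[i]] == BLOCK_ISOLATED);
		fl->blocks[batch[i]] = BLOCK_OFFLINE;
	}
}

/*
 * One wakeup of the hinting thread: keep going until the freelist can
 * no longer fill a whole request. Returns the number of batches sent.
 */
static size_t hinting_run(struct freelist *fl)
{
	size_t batch[BATCH_BLOCKS];
	size_t rounds = 0;

	while (count_unhinted(fl) >= BATCH_BLOCKS) {
		size_t n = isolate_batch(fl, batch);

		hint_and_wait(fl, batch, n);
		rounds++;
	}
	return rounds;
}
```

The point of the model is that only one fixed-size batch is ever in
flight, so the amount of memory held off the freelist is bounded by the
request size no matter how much free memory there is.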

In my mind it looks a lot like your standard Rx ring in that we
allocate some fixed number of buffers and wait for hardware to tell us
when the buffers are ready. The only extra complexity is tracking
hinted pages with the PageType "Offline" bit, which should be cheap
since we already have to manipulate the "Buddy" PageType anyway.
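
To make the tracking idea concrete, here is a small C sketch of how an
"Offline" marker could ride along with the "Buddy" marker. The flag
values and helper names are made up for illustration, and the kernel's
real page_type encoding differs; the only point is that the extra bit
is touched at the same places where the "Buddy" bit already is.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, not the kernel's page_type encoding. */
#define PT_BUDDY   (1u << 0)  /* page sits on a buddy freelist */
#define PT_OFFLINE (1u << 1)  /* page has already been hinted to the host */

struct page_model {
	uint32_t type;
};

/* Page enters the freelist: only the "Buddy" marker is set. */
static void mark_free(struct page_model *p)
{
	p->type |= PT_BUDDY;
}

/* Host has processed the hint for this free page. */
static void mark_hinted(struct page_model *p)
{
	p->type |= PT_OFFLINE;
}

/* A page still needs a hint if it is free but not yet "Offline". */
static bool needs_hint(const struct page_model *p)
{
	return (p->type & (PT_BUDDY | PT_OFFLINE)) == PT_BUDDY;
}

/* Page leaves the freelist: both markers are cleared in one store. */
static void mark_allocated(struct page_model *p)
{
	p->type &= ~(PT_BUDDY | PT_OFFLINE);
}
```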

It would let us get away from having to do the per-cpu queues and
complicated coordination logic to translate free pages to their buddy.