Message-ID: <efe01b95-33d4-71ce-2a48-ec43f0846d68@redhat.com>
Date:   Mon, 8 Apr 2019 22:51:10 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Alexander Duyck <alexander.duyck@...il.com>
Cc:     "Michael S. Tsirkin" <mst@...hat.com>,
        Nitesh Narayan Lal <nitesh@...hat.com>,
        kvm list <kvm@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-mm <linux-mm@...ck.org>,
        Paolo Bonzini <pbonzini@...hat.com>, lcapitulino@...hat.com,
        pagupta@...hat.com, wei.w.wang@...el.com,
        Yang Zhang <yang.zhang.wz@...il.com>,
        Rik van Riel <riel@...riel.com>, dodgen@...gle.com,
        Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
        dhildenb@...hat.com, Andrea Arcangeli <aarcange@...hat.com>
Subject: Re: Thoughts on simple scanner approach for free page hinting

On 08.04.19 22:10, Alexander Duyck wrote:
> On Mon, Apr 8, 2019 at 11:40 AM David Hildenbrand <david@...hat.com> wrote:
>>
>>>>>
>>>>> In addition we will need some way to identify which pages have been
>>>>> hinted on and which have not. The way I believe easiest to do this
>>>>> would be to overload the PageType value so that we could essentially
>>>>> have two values for "Buddy" pages. We would have our standard "Buddy"
>>>>> pages, and "Buddy" pages that also have the "Offline" value set in the
>>>>> PageType field. Tracking the Online vs Offline pages this way would
>>>>> actually allow us to do this with almost no overhead as the mapcount
>>>>> value is already being reset to clear the "Buddy" flag so adding a
>>>>> "Offline" flag to this clearing should come at no additional cost.
>>>>
>>>> Just noting here that this will require modifications to kdump
>>>> (makedumpfile to be precise, and the vmcore information exposed from the
>>>> kernel), as kdump only checks for the actual mapcount value to
>>>> detect buddy and offline pages (to exclude them from dumps); they are
>>>> not treated as flags.
>>>>
>>>> For now, the mapcount values are really only separate values, meaning
>>>> the individual bits are not of interest the way flags would be. Reusing
>>>> other flags would make our life a lot easier, e.g. PG_young or so. But
>>>> clearing these is then the problematic part.
>>>>
>>>> Of course we could use two values in the kernel, Buddy and BuddyOffline.
>>>> But then we have to check for two different values whenever we want to
>>>> identify a buddy page in the kernel.
>>>
>>> Actually this may not be working the way you think it is working.
>>
>> Trust me, I know how it works. That's why I was giving you the notice.
>>
>> Read the first paragraph again and ignore the others. I am only
>> concerned about makedumpfile that has to be changed.
>>
>> PAGE_OFFLINE_MAPCOUNT_VALUE
>> PAGE_BUDDY_MAPCOUNT_VALUE
>>
>> Once you find out how these values are used, you should understand what
>> has to be changed and where.
> 
> Ugh. Is there an official repo I am supposed to refer to for makedumpfile?
> 
> As far as the changes needed I don't think this would necessitate
> additional exports. We could probably just get away with having
> makedumpfile generate a new value by simply doing an "&" of the two
> values to determine what an offline buddy would be. If need be I can
> submit a patch for that. I find it kind of annoying that the kernel is
> handling identifying these bits one way, and makedumpfile is doing it
> another way. It should have been set up to handle this all the same
> way.
> 
>>
>>>>>
>>>>> Lastly we would need to create a specialized function for allocating
>>>>> the non-"Offline" pages, and to tweak __free_one_page to tail enqueue
>>>>> "Offline" pages. I'm thinking the alloc function would look
>>>>> something like __rmqueue_smallest but without the "expand" and needing
>>>>> to modify the !page check to also include a check to verify the page
>>>>> is not "Offline". As far as the changes to __free_one_page it would be
>>>>> a 2 line change to test for the PageType being offline, and if it is
>>>>> to call add_to_free_area_tail instead of add_to_free_area.
>>>>
>>>> As already mentioned, there might be scenarios where the additional
>>>> hinting thread might consume too many CPU cycles, especially if there is
>>>> little guest activity and you mostly spend time scanning a handful of
>>>> free pages and reporting them. I wonder if we can somehow limit the
>>>> number of wakeups/scans for a given period to mitigate this issue.
>>>
>>> That is why I was talking about breaking nr_free into nr_freed and
>>> nr_bound. By doing that I can record the nr_free value to a
>>> virtio-balloon specific location at the start of any walk and should
>>> know exactly how many pages were freed between that call and the next
>>> one. By ordering things such that we place the "Offline" pages on the
>>> tail of the list it should make the search quite fast since we would
>>> just be always allocating off of the head of the queue until we have
>>> hinted everything in the queue. So when we hit the last call to alloc
>>> the non-"Offline" pages and shut down our thread we can use the
>>> nr_freed value that we recorded to know exactly how many pages have
>>> been added that haven't been hinted.
>>>
>>>> One main issue I see with your approach is that we need quite a lot of
>>>> core memory management changes. This is a problem. I wonder if we can
>>>> factor out most parts into callbacks.
>>>
>>> I think that is something we can't get away from. However if we make
>>> this generic enough there would likely be others beyond just the
>>> virtualization drivers that could make use of the infrastructure. For
>>> example being able to track the rate at which the free areas are
>>> cycling in and out pages seems like something that would be useful
>>> outside of just the virtualization areas.
>>
>> Might be, but might be the other extreme, people not wanting such
>> special cases in core mm. I assume the latter until I see a very clear
>> design where such stuff has been properly factored out.
> 
> The only real pain point I am seeing right now is the assumptions
> makedumpfile is currently making about how mapcount is being used to
> indicate pagetype. If we patch it to fix it most of the other bits are
> minor.
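
To make the mismatch discussed above concrete, here is a minimal userspace
sketch of the two ways of identifying a page type. The bit values are
assumptions modeled on include/linux/page-flags.h from around v5.1 and may
not match any given tree; page_type_has(), dump_is_buddy_exact() and
dump_is_buddy_patched() are hypothetical names used only for illustration,
not functions taken from the kernel or from makedumpfile.

```c
#include <stdint.h>

/* Assumed values, modeled on include/linux/page-flags.h (around v5.1). */
#define PAGE_TYPE_BASE 0xf0000000u
#define PG_buddy       0x00000080u
#define PG_offline     0x00000100u

/* The values exported via vmcoreinfo: page_type with exactly one type set. */
#define PAGE_BUDDY_MAPCOUNT_VALUE   (~PG_buddy)
#define PAGE_OFFLINE_MAPCOUNT_VALUE (~PG_offline)

/*
 * Kernel-style check: page_type starts out as all ones, and a type is
 * "set" by clearing its bit, so several types can be set at once.
 */
static int page_type_has(uint32_t page_type, uint32_t flag)
{
	return (page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE;
}

/* Dump-tool-style check (hypothetical): compares against one exact value. */
static int dump_is_buddy_exact(uint32_t page_type)
{
	return page_type == PAGE_BUDDY_MAPCOUNT_VALUE;
}

/*
 * Hypothetical patched check: derive the "offline buddy" value by ANDing
 * the two exported values, as suggested in the quoted mail.
 */
static int dump_is_buddy_patched(uint32_t page_type)
{
	uint32_t buddy_offline = PAGE_BUDDY_MAPCOUNT_VALUE &
				 PAGE_OFFLINE_MAPCOUNT_VALUE;

	return page_type == PAGE_BUDDY_MAPCOUNT_VALUE ||
	       page_type == buddy_offline;
}
```

Under these assumptions, a page with both Buddy and Offline set passes
page_type_has() for either flag, is missed by the exact comparison, and is
caught once the ANDed value is also compared against.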

I'll be curious how splitting etc. will be handled. Especially if you
want to set Offline for all affected sub pages.
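
For reference, the tail-enqueue scheme quoted above could be modeled roughly
as follows. This is only a userspace sketch under stated assumptions: struct
page_stub, struct free_area_stub and all helpers are hypothetical stand-ins,
not the kernel's struct page, struct free_area, add_to_free_area() or
add_to_free_area_tail(), and it ignores orders, migratetypes, locking and
splitting entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a free page; "offline" means already hinted. */
struct page_stub {
	bool offline;
	struct page_stub *prev, *next;
};

/* Hypothetical stand-in for one free list with a sentinel head node. */
struct free_area_stub {
	struct page_stub head;
	unsigned long nr_free;
};

static void area_init(struct free_area_stub *a)
{
	a->head.prev = a->head.next = &a->head;
	a->nr_free = 0;
}

/* Link p between two neighbouring nodes and account for it. */
static void area_link(struct free_area_stub *a, struct page_stub *p,
		      struct page_stub *prev, struct page_stub *next)
{
	p->prev = prev;
	p->next = next;
	prev->next = p;
	next->prev = p;
	a->nr_free++;
}

/* Free path: hinted ("Offline") pages go to the tail, others to the head. */
static void area_free_page(struct free_area_stub *a, struct page_stub *p)
{
	if (p->offline)
		area_link(a, p, a->head.prev, &a->head);
	else
		area_link(a, p, &a->head, a->head.next);
}

/*
 * Hinting path: take the first page only if it has not been hinted yet.
 * Because hinted pages are always queued at the tail, a hinted page at
 * the head means nothing is left to hint on this list.
 */
static struct page_stub *area_take_unhinted(struct free_area_stub *a)
{
	struct page_stub *p = a->head.next;

	if (p == &a->head || p->offline)
		return NULL;
	p->prev->next = p->next;
	p->next->prev = p->prev;
	a->nr_free--;
	return p;
}
```

The point of the ordering is that the scanner never has to walk past hinted
pages: it only looks at the head of the list and stops at the first page
that is already marked.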

-- 

Thanks,

David / dhildenb
