Message-ID: <20180718134308.GF7193@dhcp22.suse.cz>
Date:   Wed, 18 Jul 2018 15:43:08 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     David Hildenbrand <david@...hat.com>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Alexander Potapenko <glider@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Andrey Ryabinin <aryabinin@...tuozzo.com>,
        Balbir Singh <bsingharora@...il.com>,
        Baoquan He <bhe@...hat.com>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Dave Young <dyoung@...hat.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Hari Bathini <hbathini@...ux.vnet.ibm.com>,
        Huang Ying <ying.huang@...el.com>,
        Hugh Dickins <hughd@...gle.com>,
        Ingo Molnar <mingo@...nel.org>,
        Jaewon Kim <jaewon31.kim@...sung.com>, Jan Kara <jack@...e.cz>,
        Jérôme Glisse <jglisse@...hat.com>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Juergen Gross <jgross@...e.com>,
        Kate Stewart <kstewart@...uxfoundation.org>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Matthew Wilcox <mawilcox@...rosoft.com>,
        Mel Gorman <mgorman@...e.de>,
        Michael Ellerman <mpe@...erman.id.au>,
        Miles Chen <miles.chen@...iatek.com>,
        Oscar Salvador <osalvador@...hadventures.net>,
        Paul Mackerras <paulus@...ba.org>,
        Pavel Tatashin <pasha.tatashin@...cle.com>,
        Philippe Ombredanne <pombredanne@...b.com>,
        Rashmica Gupta <rashmica.g@...il.com>,
        Reza Arbab <arbab@...ux.vnet.ibm.com>,
        Souptick Joarder <jrdr.linux@...il.com>,
        Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
        Thomas Gleixner <tglx@...utronix.de>,
        Vlastimil Babka <vbabka@...e.cz>
Subject: Re: [PATCH v1 00/10] mm: online/offline 4MB chunks controlled by
 device driver

On Wed 18-07-18 15:39:29, David Hildenbrand wrote:
> On 18.07.2018 15:19, Michal Hocko wrote:
> > [got back to this really late. Sorry about that]
> > 
> > On Thu 24-05-18 23:07:23, David Hildenbrand wrote:
> >> On 24.05.2018 16:22, Michal Hocko wrote:
> >>> I will go over the rest of the email later; I just wanted to make this
> >>> point clear because I suspect we are talking past each other.
> >>
> >> It sounds like we are now talking about how to solve the problem. I like
> >> that :)
> >>
> >>>
> >>> On Thu 24-05-18 16:04:38, David Hildenbrand wrote:
> >>> [...]
> >>>> The point I was making is: I cannot allocate 8MB/128MB using the buddy
> >>>> allocator. All I want to do is manage the memory a virtio-mem device
> >>>> provides as flexibly as possible.
> >>>
> >>> I didn't mean to use the page allocator to isolate pages from it. We do
> >>> have other means. Have a look at the page isolation framework and have a
> >>> look at how the current memory hotplug (ab)uses it. In short you mark the
> >>> desired physical memory range as isolated (nobody can allocate from it)
> >>> and then simply remove it from the page allocator. And you are done with
> >>> it. Your particular range is gone, nobody will ever use it. If you mark
> >>> those struct pages reserved then pfn walkers should already ignore them.
> >>> If you keep those pages with ref count 0 then even hotplug should work
> >>> seamlessly (I would have to double check).
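The isolate, remove, mark-reserved sequence described above can be modelled in a few lines. This is a toy sketch, not kernel code: `Page`, `ToyZone`, `take_range()` and `walk_pfns()` are hypothetical names invented for illustration, and where the real kernel would try to migrate in-use pages out of the range, this model simply gives up and undoes the isolation.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a struct page; only the fields this sketch needs."""
    pfn: int
    isolated: bool = False  # allocator must not hand this page out
    reserved: bool = False  # "don't touch unless you own it"; pfn walkers skip it
    refcount: int = 0


class ToyZone:
    """Toy model of a page allocator over a flat pfn range (not kernel code)."""

    def __init__(self, nr_pages):
        self.pages = [Page(pfn) for pfn in range(nr_pages)]
        self.free = set(range(nr_pages))

    def alloc(self):
        # Hand out the lowest free pfn that is not in an isolated range.
        for pfn in sorted(self.free):
            if not self.pages[pfn].isolated:
                self.free.remove(pfn)
                self.pages[pfn].refcount = 1
                return pfn
        return None

    def take_range(self, start, nr):
        """Isolate [start, start + nr), pull it off the free list, mark it reserved."""
        rng = range(start, start + nr)
        # Step 1: isolate, so nobody can allocate from the range any more.
        for pfn in rng:
            self.pages[pfn].isolated = True
        # The real kernel would migrate in-use pages away here; this toy gives up.
        if any(pfn not in self.free for pfn in rng):
            for pfn in rng:
                self.pages[pfn].isolated = False
            return False
        for pfn in rng:
            self.free.remove(pfn)            # Step 2: remove it from the allocator.
            self.pages[pfn].reserved = True  # Step 3: mark it reserved...
            self.pages[pfn].refcount = 0     # ...and keep its refcount at 0.
        return True


def walk_pfns(zone):
    """A pfn walker that, as suggested above, ignores reserved pages."""
    return [p.pfn for p in zone.pages if not p.reserved]
```

In the real kernel the ordering matters: isolating first closes the window in which the allocator could hand out a page from the range while it is being taken away.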
> >>>
> >>> So all I am arguing is that whatever your driver wants to do can be
> >>> handled without touching the hotplug code much. You would still need
> >>> to add new ranges in the mem section units and manage on top of that.
> >>> You need to do that anyway to keep track of which parts are in use or
> >>> offlined, right? As for the mem sections, you need those anyway
> >>> for memmaps. Our sparse memory model simply works in those units. Even
> >>> if you make a part of that range unavailable then the section will still
> >>> be there.
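To make the section granularity concrete, the arithmetic below assumes x86_64 values from around the time of this thread (4 KiB pages, 128 MiB sections, a 64-byte `struct page`); other architectures and configurations differ. `pfn_to_section_nr()`, `section_nr_to_pfn()`, `PFN_SECTION_SHIFT` and `PAGES_PER_SECTION` mirror kernel macros of the same names, while `sections_spanned()`, `memmap_bytes_per_section()` and `chunks_per_section()` are hypothetical helpers for this sketch.

```python
# Assumed x86_64 values; other architectures differ.
PAGE_SHIFT = 12          # 4 KiB pages
SECTION_SIZE_BITS = 27   # 128 MiB memory sections
STRUCT_PAGE_SIZE = 64    # typical sizeof(struct page) on 64-bit builds
CHUNK_SIZE = 4 << 20     # the 4 MB chunk size from the patch series subject

PFN_SECTION_SHIFT = SECTION_SIZE_BITS - PAGE_SHIFT
PAGES_PER_SECTION = 1 << PFN_SECTION_SHIFT


def pfn_to_section_nr(pfn):
    return pfn >> PFN_SECTION_SHIFT


def section_nr_to_pfn(nr):
    return nr << PFN_SECTION_SHIFT


def sections_spanned(start_pfn, nr_pages):
    """Section numbers touched by [start_pfn, start_pfn + nr_pages)."""
    first = pfn_to_section_nr(start_pfn)
    last = pfn_to_section_nr(start_pfn + nr_pages - 1)
    return list(range(first, last + 1))


def memmap_bytes_per_section():
    # The memmap covers the whole section, however little of it is online.
    return PAGES_PER_SECTION * STRUCT_PAGE_SIZE


def chunks_per_section():
    return (1 << SECTION_SIZE_BITS) // CHUNK_SIZE
```

Under these assumptions one section holds 32 chunks of 4 MiB, and its 2 MiB memmap exists as soon as the section is added, even if only a single chunk is ever onlined.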
> >>>
> >>> Do I make at least some sense, or am I completely missing your point?
> >>>
> >>
> >> I think we're heading somewhere. I understand that you want to separate
> >> this "semi" offline part from the general offlining code. If so, we
> >> should definitely enforce segment alignment for online_pages/offline_pages.
> >>
> >> Importantly, what I need is:
> >>
> >> 1. Indicate and prepare memory sections to be used for adding memory
> >>    chunks (right now add_memory())
> > 
> > Yes, this is section based. So you will always get memmap (struct page)
> > for the whole section.
> > 
> >> 2. Make memory chunks of a section available to the system (right now
> >>    online_pages())
> > 
> > Yes, this doesn't have to be section based. All you need is to mark
> > remaining pages as offline. They are reserved at this moment so nobody
> > should touch them.
> > 
> >> 3. Remove memory chunks of a section from the system (right now
> >>    offline_pages())
> > 
> > Yes. All we need is to note that those reserved pages are actually good
> > to offline. I have mentioned that reserved pages are yours at this stage
> > so you can note the special state without an additional page flag.
> > 
> > The generic hotplug code just has to learn about this new state.
> > has_unmovable_pages sounds like a proper place to do that. You simply
> > clear the offline state and the PageReserved and you are done with the
> > page.
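A toy model of the suggested change: the offlining check accepts reserved pages that their owner has marked as logically offline, and offlining then clears that mark together with the reserved bit. The function is named after `has_unmovable_pages()` only for orientation; it does not mirror that kernel function's real signature or logic. `DRIVER_OFFLINE`, `owner_state`, `is_logically_offline()` and `offline_range()` are hypothetical, standing in for "note the special state without an additional page flag".

```python
from dataclasses import dataclass

# Hypothetical marker the owning driver stores in a reserved page it owns;
# because the driver owns the page, no new page flag is needed for this.
DRIVER_OFFLINE = "driver-offline"


@dataclass
class Page:
    reserved: bool = False
    movable: bool = False       # free or migratable: never blocks offlining
    owner_state: object = None  # only meaningful to the owner of a reserved page


def is_logically_offline(page):
    return page.reserved and page.owner_state == DRIVER_OFFLINE


def has_unmovable_pages(pages):
    """Toy check: does anything in the range block offlining?"""
    for page in pages:
        if page.movable or is_logically_offline(page):
            continue  # the new rule: driver-offlined reserved pages are fine
        return True   # any other reserved or pinned page still blocks
    return False


def offline_range(pages):
    if has_unmovable_pages(pages):
        return False
    for page in pages:
        if is_logically_offline(page):
            # Clear the offline state and the reserved bit; the page is done.
            page.owner_state = None
            page.reserved = False
    return True
```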
> > 
> 
> I agree. This would be minimally invasive - notifiers are still called on
> whole segment.

That shouldn't matter because notifiers should never step on pages they
do not manage or own.

> >> 4. Remove memory sections from the system (right now remove_memory())
> > 
> > no change needed
> > 
> >> 5. Hinder dumping tools from reading memory chunks that are logically
> >>    offline (right now PageOffline())
> > 
> > I still fail to see why we even care about some dumping tools. Pages
> > are reserved so they simply shouldn't touch that memory at all.
> > 
> 
> Thanks for having a look!
> 
> I wonder why reserved pages never got excluded by dump tools. So I
> assume there is some kind of magic hidden in it.
> 
> `git grep SetPageReserved` returns a number of buffers that are not to
> be swapped. So "reserved" there is used for:
>   "PG_reserved is set for special pages, which can never be swapped out"

That was an ancient meaning of the flag. The flag in general means that
you shouldn't touch it unless you own it.

> And my point would be that these pages are still to be dumped (just as
> it is being done now). They are valid memory.

Then fix kdump or whatever is touching them.

> It seems like this bit is used for two different purposes. My take would
> be then to have another way of indicating "don't swap" vs. "page not
> accessible / offline". And that's why I propose PageOffline.
> 
> I would even go one step further and rename "reserved" to "dontswap".

No, it really hasn't had that meaning for years.
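The disagreement about dump tools comes down to which bit a dump filter keys on. The toy comparison below is not kdump or makedumpfile code: `Page`, `offline`, `dump_skipping_offline()` and `dump_skipping_reserved()` are hypothetical, with `offline` standing in for the proposed PageOffline() state. It shows that a filter skipping every reserved page also drops reserved pages that are still valid memory, while a filter keyed on a separate offline mark drops only the logically unplugged ones.

```python
from dataclasses import dataclass


@dataclass
class Page:
    pfn: int
    reserved: bool = False  # PG_reserved: owned by someone, may still be valid memory
    offline: bool = False   # hypothetical PageOffline()-style mark: logically unplugged


def dump_skipping_offline(pages):
    """Filter keyed on a separate offline mark: reserved pages are still dumped."""
    return [p.pfn for p in pages if not p.offline]


def dump_skipping_reserved(pages):
    """Filter that treats every reserved page as off limits."""
    return [p.pfn for p in pages if not p.reserved]
```

For a range where pfn 1 is reserved but valid and pfn 2 is reserved and logically offline, the first filter dumps pfns 0, 1 and 3, while the second dumps only 0 and 3.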
-- 
Michal Hocko
SUSE Labs
