linux-kernel - Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps

Message-ID: <6c753cae-f8b6-5563-e5ba-7c1fefdeb74e@redhat.com>
Date:   Tue, 24 Jul 2018 16:13:09 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Vlastimil Babka <vbabka@...e.cz>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Baoquan He <bhe@...hat.com>, Dave Young <dyoung@...hat.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Hari Bathini <hbathini@...ux.vnet.ibm.com>,
        Huang Ying <ying.huang@...el.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Marc-André Lureau <marcandre.lureau@...hat.com>,
        Matthew Wilcox <mawilcox@...rosoft.com>,
        Miles Chen <miles.chen@...iatek.com>,
        Pavel Tatashin <pasha.tatashin@...cle.com>,
        Petr Tesarik <ptesarik@...e.cz>
Subject: Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps

On 24.07.2018 15:35, Michal Hocko wrote:
> On Tue 24-07-18 15:27:51, David Hildenbrand wrote:
>> On 24.07.2018 15:13, Michal Hocko wrote:
>>> On Tue 24-07-18 14:17:12, David Hildenbrand wrote:
>>>> On 24.07.2018 09:25, Michal Hocko wrote:
>>>>> On Mon 23-07-18 19:20:43, David Hildenbrand wrote:
>>>>>> On 23.07.2018 14:30, Michal Hocko wrote:
>>>>>>> On Mon 23-07-18 13:45:18, Vlastimil Babka wrote:
>>>>>>>> On 07/20/2018 02:34 PM, David Hildenbrand wrote:
>>>>>>>>> Dumping tools (like makedumpfile) right now don't exclude reserved pages.
>>>>>>>>> So reserved pages might be accessed by dump tools although nobody except
>>>>>>>>> the owner should touch them.
>>>>>>>>
>>>>>>>> Are you sure about that? Or maybe I understand wrong. Maybe it changed
>>>>>>>> recently, but IIRC pages that are backing memmap (struct pages) are also
>>>>>>>> PG_reserved. And you definitely do want those in the dump.
>>>>>>>
>>>>>>> You are right. reserve_bootmem_region will make all early bootmem
>>>>>>> allocations (including those backing memmaps) PageReserved. I have asked
>>>>>>> several times but I haven't seen a satisfactory answer yet. Why do we
>>>>>>> even care for kdump about those. If they are reserved then nobody should
>>>>>>> really look at those specific struct pages and manipulate them. Kdump
>>>>>>> tools are using a kernel interface to read the content. If the specific
>>>>>>> content is backed by a non-existing memory then they should simply not
>>>>>>> return anything.
>>>>>>>
>>>>>>
>>>>>> "new kernel" provides an interface to read memory from "old kernel".
>>>>>>
>>>>>> The new kernel has no idea about
>>>>>> - which memory was added/online in the old kernel
>>>>>> - where struct pages of the old kernel are and what their content is
>>>>>> - which memory is safe to touch and which not
>>>>>>
>>>>>> Dump tools figure all that out by interpreting the VMCORE. They e.g.
>>>>>> identify "struct pages" and see if they should be dumped. The "new
>>>>>> kernel" only allows reading that memory. It cannot prevent a crash of the
>>>>>> system (e.g. if a dump tool would try to read a hwpoison page).
>>>>>>
>>>>>> So how should the "new kernel" know if a page can be touched or not?
>>>>>
>>>>> I am sorry I am not familiar with kdump much. But from what I remember
>>>>> it reads from /proc/vmcore and implementation of this interface should
>>>>> simply return EINVAL or alike when you try to dump inaccessible memory
>>>>> range.
>>>>
>>>> Oh, and BTW, while something like -EINVAL could work, we usually don't
>>>> want to try to read certain pages at all (e.g. ballooned pages -
>>>> accessing the page might work but involves quite some overhead in the
>>>> hypervisor).
>>>>
>>>> So we should either handle this in dump tools (reserved + ...?) or while
>>>> doing the read similar to XEN (is_ram_page()).
>>>
>>> Yes, I think this is the proper way. Just test for PageOnline
>>> in read_from_oldmem/copy_oldmem_page. Btw. we already have
>>> pfn_to_online_page which checks the per-section online/offline
>>> status. This should be extendable to consider your new PageOffline
>>> state.
>>
>> That is the important bit:
>>
>> What the new kernel sees is not what the old kernel saw.
>>
>> Checking for pfn_to_online_page() from
>> read_from_oldmem/copy_oldmem_page() is plain wrong.
>>
>> E.g. ACPI hotplug memory is not even added in the new kernel - see
>> "acpi_no_memhotplug" which is used in kdump environments.
>>
>> The only thing we can do is
>> - query the hypervisor
>> - try to access and get an exception
> 
> But we do preserve struct page's (aka memmap) from the crash kernel,
> don't we? So you have the whole state there. Or am I missing something?
> 

Yes, they are preserved, but we don't interpret them; that is up to the
dump tools. We only provide access to the vmcore, which includes
reading/writing the memory indicated in it. The struct pages are simply
part of the vmcore, completely hidden from the new kernel.

Finding/interpreting the struct pages is not done in the kernel (and
most probably never should be).

E.g. the old kernel could be a different kernel version or use a
different memory configuration (!SPARSE, SPARSE ...), page flags could
be different ... it's not a straightforward access.
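
To make the versioning point concrete, here is a minimal Python sketch of
how a user space tool might look up page flag bit numbers from metadata
exported by the crashed kernel instead of hard-coding them. The
"NUMBER(name)=value" line shape follows the VMCOREINFO note that dump
tools read. The helper name parse_vmcoreinfo_numbers and all sample
values are made up for illustration, and whether a given flag is
actually exported depends on the kernel.

```python
import re

# Matches lines such as "NUMBER(PG_lru)=5" in a VMCOREINFO-style text blob.
_NUMBER_RE = re.compile(r"^NUMBER\((\w+)\)=(-?\d+)$")


def parse_vmcoreinfo_numbers(text):
    """Return {name: int} for every NUMBER(name)=value line; ignore the rest."""
    numbers = {}
    for line in text.splitlines():
        match = _NUMBER_RE.match(line.strip())
        if match:
            numbers[match.group(1)] = int(match.group(2))
    return numbers


# Illustrative input only: two old kernels that place the same flag at
# different bit positions. The values are assumptions, not real layouts.
OLD_KERNEL_A = "OSRELEASE=4.18.0\nNUMBER(PG_lru)=4\nNUMBER(PG_reserved)=10\n"
OLD_KERNEL_B = "OSRELEASE=4.9.0\nNUMBER(PG_lru)=5\nNUMBER(PG_reserved)=11\n"

bits_a = parse_vmcoreinfo_numbers(OLD_KERNEL_A)
bits_b = parse_vmcoreinfo_numbers(OLD_KERNEL_B)
```

The tool takes every bit position from the dump itself, so the same
binary can handle both layouts without knowing which kernel crashed.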

That's why dump tools interpret struct pages instead. And also why I
want a simple identifier in them so user space dump tools can figure out
"this page is better not to be touched, the content is stale or not
accessible".
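
As a rough sketch of what such an identifier would buy a dump tool, the
Python below filters on a decoded flags word. It assumes the tool has
already located the struct page and learned the bit positions from the
old kernel. FlagLayout, should_skip_page, the bit numbers, and the
"offline" bit are all hypothetical stand-ins; the real identifier might
be a page type rather than a flag bit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlagLayout:
    """Bit numbers as reported by the crashed kernel (values are hypothetical)."""
    reserved_bit: int
    offline_bit: int  # stand-in for the proposed "do not touch" identifier


def should_skip_page(flags, layout):
    """Skip only pages that are both reserved and carry the offline marker."""
    reserved = bool(flags & (1 << layout.reserved_bit))
    offline = bool(flags & (1 << layout.offline_bit))
    return reserved and offline


LAYOUT = FlagLayout(reserved_bit=10, offline_bit=20)
MEMMAP_PAGE = 1 << 10                    # reserved, but must stay in the dump
BALLOONED_PAGE = (1 << 10) | (1 << 20)   # reserved and marked offline
ORDINARY_PAGE = 0                        # neither flag set
```

With this rule, should_skip_page(MEMMAP_PAGE, LAYOUT) is False, so
reserved pages backing the memmap are still dumped, while
should_skip_page(BALLOONED_PAGE, LAYOUT) is True and the page is never
read.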

So I see right now:

- PG_reserved + e.g. new page type (or some other unique identifier in
  combination with PG_reserved)
 -> Avoid reads of pages we know are offline
- extend is_ram_page()
 -> Fake zero memory for pages we know are offline

Or even both (avoid reading and don't crash the kernel if it is being done).
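
The second option (faking zero memory on the read side) can be sketched
in Python as a user space model of the control flow only. It is not how
is_ram_page() is implemented. read_old_page, the offline pfn set, the
fake backend, and the 4096-byte page size are assumptions made for the
example.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def read_old_page(pfn, offline_pfns, read_raw):
    """Return one page of old-kernel memory, or zeros if the pfn is known offline.

    read_raw is a caller-supplied function that performs the real access;
    it is never invoked for pfns listed in offline_pfns.
    """
    if pfn in offline_pfns:
        return bytes(PAGE_SIZE)
    return read_raw(pfn)


touched = []


def fake_read_raw(pfn):
    # Hypothetical backend used only to demonstrate the control flow.
    touched.append(pfn)
    return bytes([pfn & 0xFF]) * PAGE_SIZE


offline = {7}
page_ok = read_old_page(3, offline, fake_read_raw)
page_zero = read_old_page(7, offline, fake_read_raw)
```

The point of the design is that the backend is never called for an
offline pfn, so a dump tool that ignores the identifier still gets a
page of zeros instead of triggering the access.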

I am not a friend of the "try to access and get an exception" approach.

-- 

Thanks,

David / dhildenb