Message-ID: <af5353ee-319e-17ec-3a39-df997a5adf43@redhat.com>
Date: Tue, 24 Jul 2018 15:27:51 +0200
From: David Hildenbrand <david@...hat.com>
To: Michal Hocko <mhocko@...nel.org>
Cc: Vlastimil Babka <vbabka@...e.cz>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
Baoquan He <bhe@...hat.com>, Dave Young <dyoung@...hat.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Hari Bathini <hbathini@...ux.vnet.ibm.com>,
Huang Ying <ying.huang@...el.com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Marc-André Lureau <marcandre.lureau@...hat.com>,
Matthew Wilcox <mawilcox@...rosoft.com>,
Miles Chen <miles.chen@...iatek.com>,
Pavel Tatashin <pasha.tatashin@...cle.com>,
Petr Tesarik <ptesarik@...e.cz>
Subject: Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
On 24.07.2018 15:13, Michal Hocko wrote:
> On Tue 24-07-18 14:17:12, David Hildenbrand wrote:
>> On 24.07.2018 09:25, Michal Hocko wrote:
>>> On Mon 23-07-18 19:20:43, David Hildenbrand wrote:
>>>> On 23.07.2018 14:30, Michal Hocko wrote:
>>>>> On Mon 23-07-18 13:45:18, Vlastimil Babka wrote:
>>>>>> On 07/20/2018 02:34 PM, David Hildenbrand wrote:
>>>>>>> Dumping tools (like makedumpfile) right now don't exclude reserved pages.
>>>>>>> So reserved pages might be accessed by dump tools, although nobody except
>>>>>>> the owner should touch them.
>>>>>>
>>>>>> Are you sure about that? Or maybe I misunderstand. Maybe it changed
>>>>>> recently, but IIRC pages that are backing memmap (struct pages) are also
>>>>>> PG_reserved. And you definitely do want those in the dump.
>>>>>
>>>>> You are right. reserve_bootmem_region will make all early bootmem
>>>>> allocations (including those backing memmaps) PageReserved. I have asked
>>>>> several times but I haven't seen a satisfactory answer yet: why do we
>>>>> even care about those for kdump? If they are reserved, then nobody should
>>>>> really look at those specific struct pages and manipulate them. Kdump
>>>>> tools are using a kernel interface to read the content. If the specific
>>>>> content is backed by non-existing memory, then they should simply not
>>>>> return anything.
>>>>>
>>>>
>>>> "new kernel" provides an interface to read memory from "old kernel".
>>>>
>>>> The new kernel has no idea about
>>>> - which memory was added/onlined in the old kernel
>>>> - where struct pages of the old kernel are and what their content is
>>>> - which memory is safe to touch and which is not
>>>>
>>>> Dump tools figure all that out by interpreting the VMCORE. They e.g.
>>>> identify "struct pages" and decide whether they should be dumped. The
>>>> "new kernel" only allows reading that memory. It cannot prevent the
>>>> system from crashing (e.g. if a dump tool were to read a hwpoison page).
>>>>
>>>> So how should the "new kernel" know if a page can be touched or not?
>>>
>>> I am sorry, I am not very familiar with kdump. But from what I remember,
>>> it reads from /proc/vmcore, and the implementation of this interface should
>>> simply return EINVAL or the like when you try to dump an inaccessible
>>> memory range.
>>
>> Oh, and BTW, while something like -EINVAL could work, we usually don't
>> want to try to read certain pages at all (e.g. ballooned pages -
>> accessing the page might work but involves quite some overhead in the
>> hypervisor).
>>
>> So we should either handle this in dump tools (reserved + ...?) or while
>> doing the read, similar to XEN (is_ram_page()).
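(For reference, the XEN mechanism mentioned here is the oldmem_pfn_is_ram
hook in fs/proc/vmcore.c; a simplified sketch, details trimmed:

    /* fs/proc/vmcore.c -- simplified */
    static int (*oldmem_pfn_is_ram)(unsigned long pfn);

    int register_oldmem_pfn_is_ram(int (*fn)(unsigned long pfn))
    {
            if (oldmem_pfn_is_ram)
                    return -EBUSY;
            oldmem_pfn_is_ram = fn;
            return 0;
    }

    static int pfn_is_ram(unsigned long pfn)
    {
            /* treat the pfn as RAM unless a registered hook (e.g. Xen's)
             * says otherwise */
            return oldmem_pfn_is_ram ? oldmem_pfn_is_ram(pfn) : 1;
    }

read_from_oldmem() fills the buffer with zeroes instead of calling
copy_oldmem_page() for pfns where pfn_is_ram() returns 0.)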
>
> Yes, I think this is the proper way. Just test for PageOnline
> in read_from_oldmem/copy_oldmem_page. Btw. we already have
> pfn_to_online_page which checks the per-section online/offline
> status. This should be extendable to consider your new PageOffline
> state.

That is the important bit:
What the new kernel sees is not what the old kernel saw.
Checking pfn_to_online_page() from
read_from_oldmem()/copy_oldmem_page() is plain wrong.
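For illustration only, such a check would look roughly like this (the helper
name is made up, and PageOffline is the state proposed in this series, not
something the kernel has today):

    /* hypothetical helper implementing the suggestion above */
    static bool oldmem_pfn_looks_accessible(unsigned long pfn)
    {
            /* reflects the *new* (kdump) kernel's view only! */
            struct page *page = pfn_to_online_page(pfn);

            if (!page)
                    return false;
            if (PageOffline(page))
                    return false;
            return true;
    }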
E.g. ACPI hotplug memory is not even added in the new kernel - see
"acpi_no_memhotplug" which is used in kdump environments.
The only things we can do are
- query the hypervisor
- try to access the page and catch the resulting exception (see the rough
  sketch below)
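A rough sketch of the second option (the helper name is made up, and whether
a fault - e.g. from a hwpoison page - can really be recovered this way on
every architecture is exactly the open question):

    /* hypothetical: copy from an old-kernel page, tolerating faults */
    static ssize_t copy_oldmem_page_nofault(unsigned long pfn, char *buf,
                                            size_t csize, unsigned long offset)
    {
            void *vaddr;
            long rc;

            vaddr = (__force void *)ioremap_cache(PFN_PHYS(pfn), PAGE_SIZE);
            if (!vaddr)
                    return -ENOMEM;

            /* probe_kernel_read() returns -EFAULT instead of oopsing */
            rc = probe_kernel_read(buf, vaddr + offset, csize);
            iounmap((void __iomem *)vaddr);

            return rc ? -EFAULT : csize;
    }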
--
Thanks,
David / dhildenb