Message-ID: <d4528eb7-9d8b-4073-afad-d8dd1390aa91@redhat.com>
Date:   Tue, 24 Jul 2018 10:46:20 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Vlastimil Babka <vbabka@...e.cz>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Baoquan He <bhe@...hat.com>, Dave Young <dyoung@...hat.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Hari Bathini <hbathini@...ux.vnet.ibm.com>,
        Huang Ying <ying.huang@...el.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Marc-André Lureau <marcandre.lureau@...hat.com>,
        Matthew Wilcox <mawilcox@...rosoft.com>,
        Miles Chen <miles.chen@...iatek.com>,
        Pavel Tatashin <pasha.tatashin@...cle.com>,
        Petr Tesarik <ptesarik@...e.cz>
Subject: Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps

On 24.07.2018 09:25, Michal Hocko wrote:
> On Mon 23-07-18 19:20:43, David Hildenbrand wrote:
>> On 23.07.2018 14:30, Michal Hocko wrote:
>>> On Mon 23-07-18 13:45:18, Vlastimil Babka wrote:
>>>> On 07/20/2018 02:34 PM, David Hildenbrand wrote:
>>>>> Dumping tools (like makedumpfile) right now don't exclude reserved pages.
>>>>> So reserved pages might be accessed by dump tools although nobody except
>>>>> the owner should touch them.
>>>>
>>>> Are you sure about that? Or maybe I understand wrong. Maybe it changed
>>>> recently, but IIRC pages that are backing memmap (struct pages) are also
>>>> PG_reserved. And you definitely do want those in the dump.
>>>
>>> You are right. reserve_bootmem_region will make all early bootmem
>>> allocations (including those backing memmaps) PageReserved. I have asked
>>> several times but I haven't seen a satisfactory answer yet. Why do we
>>> even care for kdump about those. If they are reserved then nobody should
>>> really look at those specific struct pages and manipulate them. Kdump
>>> tools are using a kernel interface to read the content. If the specific
>>> content is backed by a non-existing memory then they should simply not
>>> return anything.
>>>
>>
>> "new kernel" provides an interface to read memory from "old kernel".
>>
>> The new kernel has no idea about
>> - which memory was added/online in the old kernel
>> - where struct pages of the old kernel are and what their content is
>> - which memory is safe to touch and which not
>>
>> Dump tools figure all that out by interpreting the VMCORE. They e.g.
>> identify "struct pages" and see if they should be dumped. The "new
>> kernel" only allows reading that memory. It cannot prevent a system
>> crash (e.g. if a dump tool tried to read a hwpoison page).
>>
>> So how should the "new kernel" know if a page can be touched or not?
> 
> I am sorry I am not familiar with kdump much. But from what I remember
> it reads from /proc/vmcore and implementation of this interface should
> simply return EINVAL or alike when you try to dump inaccessible memory
> range.

I assume the main problem with this approach is that we would always
have to fall back to reading old memory from vmcore page by page. For
example, makedumpfile will always try to read bigger chunks. I also
assume this is why HWPOISON is handled in dump tools rather than in the
kernel using the mechanism you describe.

One way to avoid this would be to silently "read zero". Although not
nice, it avoids having to touch dump tools.
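
A minimal userspace sketch of the "read zero" idea, not kernel code.
All names here (read_old_range, pfn_accessible_fn, SKETCH_PAGE_SIZE,
example_pfn_accessible) are hypothetical and only illustrate that a
bulk read can succeed as a whole while inaccessible pages come back
zero-filled:

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size for this sketch */

/* Hypothetical predicate: nonzero if the page frame may be read. */
typedef int (*pfn_accessible_fn)(unsigned long pfn);

/* Example predicate for illustration only: pfn 1 must not be touched. */
static int example_pfn_accessible(unsigned long pfn)
{
	return pfn != 1;
}

/*
 * Copy nr_pages pages starting at start_pfn from "oldmem" (a plain
 * buffer standing in for the old kernel's memory) into buf.
 * Inaccessible pages are never read; their slot in buf is zero-filled,
 * so the caller can keep issuing large reads without error handling.
 * Returns the number of pages that were zero-filled.
 */
static size_t read_old_range(const unsigned char *oldmem,
			     unsigned long start_pfn, size_t nr_pages,
			     unsigned char *buf,
			     pfn_accessible_fn accessible)
{
	size_t zeroed = 0;

	for (size_t i = 0; i < nr_pages; i++) {
		unsigned long pfn = start_pfn + i;
		unsigned char *dst = buf + i * SKETCH_PAGE_SIZE;

		if (accessible(pfn)) {
			memcpy(dst, oldmem + pfn * SKETCH_PAGE_SIZE,
			       SKETCH_PAGE_SIZE);
		} else {
			memset(dst, 0, SKETCH_PAGE_SIZE);
			zeroed++;
		}
	}
	return zeroed;
}
```

The downside is visible in the sketch as well: the reader cannot tell a
page that really contained zeroes from one that was skipped, unless it
looks at the returned count or some other side channel.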

E.g. fs/proc/vmcore.c:read_from_oldmem() has a hook called
"pfn_is_ram()". This is the hook for XEN I mentioned previously.

-> register_oldmem_pfn_is_ram()

However this callback right now assumes that there is a "global
hypervisor implemented way of checking whether a page is accessible". We
don't want anything like that in KVM.

I could imagine extending this register mechanism in a way that
- we can have multiple callbacks
- we can return something like "Yes" / "No" / "Don't know"

So we could have multiple devices (controlling a memory area) register
there and when called, they could see if they are responsible for that
area and query the hypervisor (e.g. using virtio).
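
To make the idea a bit more concrete, here is a rough userspace sketch
of such an extended registration mechanism. It is not a proposal for
the actual kernel interface: the names (pfn_verdict, register_pfn_check,
pfn_is_accessible, mem_area, area_check) are made up for illustration,
and the "hypervisor query" is replaced by a simple range check.

```c
#include <stddef.h>

/* Tri-state answer: "Yes" / "No" / "Don't know". */
enum pfn_verdict { PFN_DONT_KNOW = 0, PFN_YES, PFN_NO };

/* Hypothetical callback type; ctx is the registering device's data. */
typedef enum pfn_verdict (*pfn_check_fn)(unsigned long pfn, void *ctx);

#define MAX_PFN_CHECKS 8

struct pfn_check {
	pfn_check_fn fn;
	void *ctx;
};

static struct pfn_check pfn_checks[MAX_PFN_CHECKS];
static size_t nr_pfn_checks;

/* Register one more callback. Returns 0 on success, -1 on failure. */
static int register_pfn_check(pfn_check_fn fn, void *ctx)
{
	if (!fn || nr_pfn_checks == MAX_PFN_CHECKS)
		return -1;
	pfn_checks[nr_pfn_checks].fn = fn;
	pfn_checks[nr_pfn_checks].ctx = ctx;
	nr_pfn_checks++;
	return 0;
}

/*
 * Ask every registered callback in turn. The first one that feels
 * responsible decides. If nobody knows the pfn, treat it as ordinary
 * RAM (assumption of this sketch).
 */
static int pfn_is_accessible(unsigned long pfn)
{
	for (size_t i = 0; i < nr_pfn_checks; i++) {
		enum pfn_verdict v = pfn_checks[i].fn(pfn, pfn_checks[i].ctx);

		if (v != PFN_DONT_KNOW)
			return v == PFN_YES;
	}
	return 1;
}

/* Hypothetical device state: owns [start, end), [bad_start, bad_end)
 * within it is currently inaccessible. */
struct mem_area {
	unsigned long start, end;
	unsigned long bad_start, bad_end;
};

/* A real device would query the hypervisor here (e.g. via virtio). */
static enum pfn_verdict area_check(unsigned long pfn, void *ctx)
{
	const struct mem_area *a = ctx;

	if (pfn < a->start || pfn >= a->end)
		return PFN_DONT_KNOW;	/* not our memory area */
	if (pfn >= a->bad_start && pfn < a->bad_end)
		return PFN_NO;
	return PFN_YES;
}
```

With two devices registered, each one only answers for its own area and
returns "Don't know" for everything else, so no global hypervisor-wide
check is needed.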

This might be complicated, but it could serve as a last resort.

-- 

Thanks,

David / dhildenb
