linux-kernel - Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device


Message-ID: <CAPcyv4g1ULSDXzp5ufpXzp5JB3z1TN_za=_AqUC8E6f2FNsycw@mail.gmail.com>
Date:   Tue, 23 Aug 2016 20:58:12 -0700
From:   Dan Williams <dan.j.williams@...el.com>
To:     "Kani, Toshimitsu" <toshi.kani@....com>
Cc:     "Mulumudi, Abhilash Kumar" <m.abhilash-kumar@....com>,
        "linux-nvdimm@...ts.01.org" <linux-nvdimm@...ts.01.org>,
        "ard.biesheuvel@...aro.org" <ard.biesheuvel@...aro.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "brian.starkey@....com" <brian.starkey@....com>
Subject: Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page()

On Tue, Aug 23, 2016 at 7:53 PM, Dan Williams <dan.j.williams@...el.com> wrote:
> On Tue, Aug 23, 2016 at 6:29 PM, Kani, Toshimitsu <toshi.kani@....com> wrote:
>>> On Tue, Aug 23, 2016 at 4:47 PM, Kani, Toshimitsu <toshi.kani@....com>
>>> wrote:
>>> > On Tue, 2016-08-23 at 15:32 -0700, Dan Williams wrote:
>>> >> On Tue, Aug 23, 2016 at 11:43 AM, Toshi Kani <toshi.kani@....com>
>>> >> wrote:
>>> >  :
>>> >> I'm not sure about this fix.  The point of honoring
>>> >> vmem_altmap_offset() is because a portion of the resource that is
>>> >> passed to devm_memremap_pages() also contains the metadata info block
>>> >> for the device.  The offset says "use everything past this point for
>>> >> pages".  This may work for avoiding a crash, but it may corrupt info
>>> >> block metadata in the process.  Can you provide more information
>>> >> about the failing scenario to be sure that we are not triggering a
>>> >> fault on an address that is not meant to have a page mapping?  I.e.
>>> >> what is the host physical address of the page that caused this fault,
>>> >> and is it valid?
>>> >
>>> > The fault address in question was the 2nd page of an NVDIMM range.  I
>>> > assumed this fault address was valid and needed to be handled.  Here is
>>> > some info about the base and patched cases.  Let me know if you need
>>> > more info.
>>> >
>>> > Base
>>> > ====
>>> >
>>> > The following NVDIMM range was set to /dev/dax.
>>>
>>> With ndctl create-namespace or manually via sysfs?  Specifically I'm
>>> looking for what the 'align' attribute was set to when the
>>> configuration was established.  Can you provide a dump of the sysfs
>>> attributes for the /dev/dax parent device?
>>
>> I used the ndctl command below.
>> ndctl create-namespace -f -e namespace0.0 -m dax
>>
>> Here is additional info from my note for the base case.
>>
>> crash> p {struct dev_pagemap} 0xffff88046d0453f0
>> $3 = {
>>   altmap = 0xffff88046d045410,
>>   res = 0xffff88046d0453a8,
>>   ref = 0xffff88046d0452f0,
>>   dev = 0xffff880464790410
>> }
>>
>> crash> p {struct vmem_altmap} 0xffff88046d045410
>> $6 = {
>>   base_pfn = 0x480000,
>>   reserve = 0x2,        // PHYS_PFN(SZ_8K)
>>   free = 0x101fe,
>>   align = 0x1fe,
>>   alloc = 0x10000
>> }
>
> Ah, so, on second look the 0x490200000 data offset looks correct.  The
> total size of the address range is 16GB which equates to 256MB needed
> for struct page, plus 2MB more to re-align the data on the next 2MB
> boundary.
>
> The question now is why is the guest faulting on an access to an
> address less than 0x490200000?
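
The arithmetic behind the 0x490200000 data offset can be reproduced with a minimal sketch. It rests on assumptions that are not stated in the thread: 4 KiB base pages, a 64-byte struct page, and a 16GB range that starts at base_pfn 0x480000. The helper names (align_up, data_offset) are hypothetical and are not kernel functions. The second check simply adds the reserve and free pfn counts from the vmem_altmap dump to base_pfn, which should land on the same address if the dump and the reasoning agree.

```python
PAGE_SIZE = 4096           # assumption: 4 KiB base pages
STRUCT_PAGE_SIZE = 64      # assumption: sizeof(struct page) is 64 bytes
SZ_2M = 2 * 1024 * 1024    # data must start on a 2MB boundary


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def data_offset(base_pfn, range_bytes, reserve_pfns):
    """Hypothetical helper: physical address where data pages begin.

    Layout assumed here: [reserved info block][struct page array][padding][data].
    """
    base = base_pfn * PAGE_SIZE
    # One struct page per base page in the whole range: 16GB -> 256MB.
    memmap_bytes = (range_bytes // PAGE_SIZE) * STRUCT_PAGE_SIZE
    end_of_memmap = base + reserve_pfns * PAGE_SIZE + memmap_bytes
    # Re-align the start of the data area on the next 2MB boundary.
    return align_up(end_of_memmap, SZ_2M)


# Values taken from the vmem_altmap dump quoted above.
base_pfn, reserve, free = 0x480000, 0x2, 0x101fe

offset = data_offset(base_pfn, 16 * 1024**3, reserve)
print(hex(offset))

# Cross-check using the pfn counts from the dump directly.
print(hex((base_pfn + reserve + free) * PAGE_SIZE))
```

Under those assumptions both lines give the same address, so an access below it falls in the reserved block or the struct page array rather than in the data area.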

Does the attached patch fix this for you?

View attachment "0001-dax-fix-device-dax-region-base.patch" of type "text/x-patch" (3372 bytes)