lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 6 Sep 2019 10:02:15 +0000
From:   Toshiki Fukasawa <t-fukasawa@...jp.nec.com>
To:     David Hildenbrand <david@...hat.com>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "dan.j.williams@...el.com" <dan.j.williams@...el.com>
CC:     Toshiki Fukasawa <t-fukasawa@...jp.nec.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "mhocko@...nel.org" <mhocko@...nel.org>,
        "adobriyan@...il.com" <adobriyan@...il.com>,
        "hch@....de" <hch@....de>,
        "longman@...hat.com" <longman@...hat.com>,
        "sfr@...b.auug.org.au" <sfr@...b.auug.org.au>,
        "mst@...hat.com" <mst@...hat.com>,
        Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
        Junichi Nomura <j-nomura@...jp.nec.com>
Subject: Re: [RFC PATCH v2] mm: initialize struct pages reserved by
 ZONE_DEVICE driver.

Thank you for your feedback.

On 2019/09/06 17:45, David Hildenbrand wrote:
> On 06.09.19 10:09, Toshiki Fukasawa wrote:
>> A kernel panic is observed during reading
>> /proc/kpage{cgroup,count,flags} for first few pfns allocated by
>> pmem namespace:
>>
>> BUG: unable to handle page fault for address: fffffffffffffffe
>> [  114.495280] #PF: supervisor read access in kernel mode
>> [  114.495738] #PF: error_code(0x0000) - not-present page
>> [  114.496203] PGD 17120e067 P4D 17120e067 PUD 171210067 PMD 0
>> [  114.496713] Oops: 0000 [#1] SMP PTI
>> [  114.497037] CPU: 9 PID: 1202 Comm: page-types Not tainted 5.3.0-rc1
>> [  114.497621] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
>> [  114.498706] RIP: 0010:stable_page_flags+0x27/0x3f0
>> [  114.499142] Code: 82 66 90 66 66 66 66 90 48 85 ff 0f 84 d1 03 00 00 41 54 55 48 89 fd 53 48 8b 57 08 48 8b 1f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 c4 02 0f 84 57 03 00 00 45 31 e4 48 8b 55 08 48 89 ef
>> [  114.500788] RSP: 0018:ffffa5e601a0fe60 EFLAGS: 00010202
>> [  114.501373] RAX: fffffffffffffffe RBX: ffffffffffffffff RCX: 0000000000000000
>> [  114.502009] RDX: 0000000000000001 RSI: 00007ffca13a7310 RDI: ffffd07489000000
>> [  114.502637] RBP: ffffd07489000000 R08: 0000000000000001 R09: 0000000000000000
>> [  114.503270] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000240000
>> [  114.503896] R13: 0000000000080000 R14: 00007ffca13a7310 R15: ffffa5e601a0ff08
>> [  114.504530] FS:  00007f0266c7f540(0000) GS:ffff962dbbac0000(0000) knlGS:0000000000000000
>> [  114.505245] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  114.505754] CR2: fffffffffffffffe CR3: 000000023a204000 CR4: 00000000000006e0
>> [  114.506401] Call Trace:
>> [  114.506660]  kpageflags_read+0xb1/0x130
>> [  114.507051]  proc_reg_read+0x39/0x60
>> [  114.507387]  vfs_read+0x8a/0x140
>> [  114.507686]  ksys_pread64+0x61/0xa0
>> [  114.508021]  do_syscall_64+0x5f/0x1a0
>> [  114.508372]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [  114.508844] RIP: 0033:0x7f0266ba426b
>>
>> The first few pages of ZONE_DEVICE expressed as the range
>> (altmap->base_pfn) to (altmap->base_pfn + altmap->reserve) are
>> skipped by struct page initialization. Some pfn walkers like
>> /proc/kpage{cgroup, count, flags} can't handle these uninitialized
>> struct pages, which causes the error.
>>
>> In previous discussion, Dan seemed to have concern that the struct
>> page area of some pages indicated by vmem_altmap->reserve may not
>> be allocated. (See https://lore.kernel.org/lkml/CAPcyv4i5FjTOnPbXNcTzvt+e6RQYow0JRQwSFuxaa62LSuvzHQ@mail.gmail.com/)
>> However, arch_add_memory() called by devm_memremap_pages() allocates
>> struct page area for pages containing addresses in the range
>> (res.start) to (res.start + resource_size(res)), which include the
>> pages indicated by vmem_altmap->reserve. If I read correctly, it is
>> allocated as requested at least on x86_64. Also, memmap_init_zone()
>> initializes struct pages in the same range.
>> So I think the struct pages should be initialized.>
> 
> For !ZONE_DEVICE memory, the memmap is valid with SECTION_IS_ONLINE -
> for the whole section. For ZONE_DEVICE memory we have no such
> indication. In any section that is !SECTION_IS_ONLINE and
> SECTION_MARKED_PRESENT, we could have any subsections initialized. >
> The only indication I am aware of is pfn_zone_device_reserved() - which
> seems to check exactly what you are trying to skip here.
> 
> Can't you somehow use pfn_zone_device_reserved() ? Or if you considered
> that already, why did you decide against it?

No, in current approach this function is no longer needed.
The reason why we change the approach is that all pfn walkers
have to be aware of the uninitialized struct pages.

As for SECTION_IS_ONLINE, I'm not sure now.
I will look into it next week.

Thanks,
Toshiki Fukasawa

> 
>> Signed-off-by: Toshiki Fukasawa <t-fukasawa@...jp.nec.com>
>> Cc: stable@...r.kernel.org
>> ---
>> Changes since rev 1:
>>   Instead of avoiding uninitialized pages on the pfn walker side,
>>   we initialize struct pages.
>>
>> mm/page_alloc.c | 5 +----
>>   1 file changed, 1 insertion(+), 4 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 9c91949..6d180ae 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5846,8 +5846,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>   
>>   #ifdef CONFIG_ZONE_DEVICE
>>   	/*
>> -	 * Honor reservation requested by the driver for this ZONE_DEVICE
>> -	 * memory. We limit the total number of pages to initialize to just
>> +	 * We limit the total number of pages to initialize to just
>>   	 * those that might contain the memory mapping. We will defer the
>>   	 * ZONE_DEVICE page initialization until after we have released
>>   	 * the hotplug lock.
>> @@ -5856,8 +5855,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>   		if (!altmap)
>>   			return;
>>   
>> -		if (start_pfn == altmap->base_pfn)
>> -			start_pfn += altmap->reserve;
>>   		end_pfn = altmap->base_pfn + vmem_altmap_offset(altmap);
>>   	}
>>   #endif
>>
> 
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ