lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Mon, 8 Oct 2018 15:07:42 -0700 From: Alexander Duyck <alexander.h.duyck@...ux.intel.com> To: Dan Williams <dan.j.williams@...el.com> Cc: Linux MM <linux-mm@...ck.org>, Andrew Morton <akpm@...ux-foundation.org>, Linux Kernel Mailing List <linux-kernel@...r.kernel.org>, linux-nvdimm <linux-nvdimm@...ts.01.org>, Pasha Tatashin <pavel.tatashin@...rosoft.com>, Michal Hocko <mhocko@...e.com>, Dave Jiang <dave.jiang@...el.com>, Dave Hansen <dave.hansen@...el.com>, Jérôme Glisse <jglisse@...hat.com>, rppt@...ux.vnet.ibm.com, Logan Gunthorpe <logang@...tatee.com>, Ingo Molnar <mingo@...nel.org>, "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com> Subject: Re: [PATCH v5 4/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap On 10/8/2018 3:00 PM, Dan Williams wrote: > On Mon, Oct 8, 2018 at 2:48 PM Alexander Duyck > <alexander.h.duyck@...ux.intel.com> wrote: >> >> On 10/8/2018 2:01 PM, Dan Williams wrote: >>> On Tue, Sep 25, 2018 at 1:29 PM Alexander Duyck >>> <alexander.h.duyck@...ux.intel.com> wrote: >>>> >>>> The ZONE_DEVICE pages were being initialized in two locations. One was with >>>> the memory_hotplug lock held and another was outside of that lock. The >>>> problem with this is that it was nearly doubling the memory initialization >>>> time. Instead of doing this twice, once while holding a global lock and >>>> once without, I am opting to defer the initialization to the one outside of >>>> the lock. This allows us to avoid serializing the overhead for memory init >>>> and we can instead focus on per-node init times. >>>> >>>> One issue I encountered is that devm_memremap_pages and >>>> hmm_devmmem_pages_create were initializing only the pgmap field the same >>>> way. One wasn't initializing hmm_data, and the other was initializing it to >>>> a poison value. Since this is something that is exposed to the driver in >>>> the case of hmm I am opting for a third option and just initializing >>>> hmm_data to 0 since this is going to be exposed to unknown third party >>>> drivers. >>>> >>>> Reviewed-by: Pavel Tatashin <pavel.tatashin@...rosoft.com> >>>> Signed-off-by: Alexander Duyck <alexander.h.duyck@...ux.intel.com> >>>> --- >>>> >>>> v4: Moved moved memmap_init_zone_device to below memmmap_init_zone to avoid >>>> merge conflicts with other changes in the kernel. >>>> v5: No change >>> >>> This patch appears to cause a regression in the "create.sh" unit test >>> in the ndctl test suite. >> >> So all you had to do is run the create.sh script to see the issue? I >> just want to confirm there isn't any additional information needed >> before I try chasing this down. > > From the ndctl source tree run: > > make -j TESTS="create.sh" check > > ...the readme has some more setup instructions: > https://github.com/pmem/ndctl/blob/master/README.md > > 0day has sometimes run this test suite automatically, but we need to > get that more robust because setting up this environment is a bit of a > hoop to jump through with the need to setup the nfit_test module. > >>> I tried to reproduce on -next with: >>> >>> 2302f5ee215e mm: defer ZONE_DEVICE page initialization to the point >>> where we init pgmap >>> >>> ...but -next does not even boot for me at that commit. >> >> What version of -next? There are a couple of patches probably needed >> depending on which version you are trying to boot. > > Today's -next, but backed up to that above commit. I was also seeing > CONFIG_DEBUG_LIST spamming the logs, and a crash in the crypto layer. > >>> Here is a warning signature that proceeds a hang with this patch >>> applied against v4.19-rc6: >>> >>> percpu ref (blk_queue_usage_counter_release) <= 0 (-1530626) after >>> switching to atomic >>> WARNING: CPU: 24 PID: 7346 at lib/percpu-refcount.c:155 >>> percpu_ref_switch_to_atomic_rcu+0x1f7/0x200 >>> CPU: 24 PID: 7346 Comm: modprobe Tainted: G OE 4.19.0-rc6+ #2458 >>> [..] >>> RIP: 0010:percpu_ref_switch_to_atomic_rcu+0x1f7/0x200 >>> [..] >>> Call Trace: >>> <IRQ> >>> ? percpu_ref_reinit+0x140/0x140 >>> rcu_process_callbacks+0x273/0x880 >>> __do_softirq+0xd2/0x428 >>> irq_exit+0xf6/0x100 >>> smp_apic_timer_interrupt+0xa2/0x220 >>> apic_timer_interrupt+0xf/0x20 >>> </IRQ> >>> RIP: 0010:lock_acquire+0xb8/0x1a0 >>> [..] >>> ? __put_page+0x55/0x150 >>> ? __put_page+0x55/0x150 >>> __put_page+0x83/0x150 >>> ? __put_page+0x55/0x150 >>> devm_memremap_pages_release+0x194/0x250 >>> release_nodes+0x17c/0x2c0 >>> device_release_driver_internal+0x1a2/0x250 >>> driver_detach+0x3a/0x70 >>> bus_remove_driver+0x58/0xd0 >>> __x64_sys_delete_module+0x13f/0x200 >>> ? trace_hardirqs_off_thunk+0x1a/0x1c >>> do_syscall_64+0x60/0x210 >>> entry_SYSCALL_64_after_hwframe+0x49/0xbe >>> >> >> So it looks like we are tearing down memory when this is triggered. Do >> we know if this is at the end of the test or if this is running in >> parallel with anything? > > Should not be running in parallel with anything this test is > performing a series of namespace setup and teardown events. > > Wait, where did the call to "percpu_ref_get()" go? I think that's the bug. Actually I think you are probably right. Do you want to get that or should I. Should be a quick patch since you could probably just add a call to percpu_ref_get_many to hold a reference for each page in the range of device pages before calling memmap_init_zone_device. - Alex
Powered by blists - more mailing lists