lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+CK2bA=E4zRFb0Qky=baOQi_LF4x4eu8KVdEkhPJo3wWr8dYQ@mail.gmail.com>
Date:   Thu, 2 May 2019 17:44:03 -0400
From:   Pavel Tatashin <pasha.tatashin@...een.com>
To:     "Verma, Vishal L" <vishal.l.verma@...el.com>
Cc:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "jmorris@...ei.org" <jmorris@...ei.org>,
        "tiwai@...e.de" <tiwai@...e.de>,
        "sashal@...nel.org" <sashal@...nel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "david@...hat.com" <david@...hat.com>, "bp@...e.de" <bp@...e.de>,
        "Williams, Dan J" <dan.j.williams@...el.com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "linux-nvdimm@...ts.01.org" <linux-nvdimm@...ts.01.org>,
        "jglisse@...hat.com" <jglisse@...hat.com>,
        "zwisler@...nel.org" <zwisler@...nel.org>,
        "mhocko@...e.com" <mhocko@...e.com>,
        "Jiang, Dave" <dave.jiang@...el.com>,
        "bhelgaas@...gle.com" <bhelgaas@...gle.com>,
        "Busch, Keith" <keith.busch@...el.com>,
        "thomas.lendacky@....com" <thomas.lendacky@....com>,
        "Huang, Ying" <ying.huang@...el.com>,
        "Wu, Fengguang" <fengguang.wu@...el.com>,
        "baiyaowei@...s.chinamobile.com" <baiyaowei@...s.chinamobile.com>
Subject: Re: [v5 0/3] "Hotremove" persistent memory

On Thu, May 2, 2019 at 4:50 PM Verma, Vishal L <vishal.l.verma@...el.com> wrote:
>
> On Thu, 2019-05-02 at 14:43 -0400, Pavel Tatashin wrote:
> > The series of operations look like this:
> >
> > 1. After boot restore /dev/pmem0 to ramdisk to be consumed by apps.
> >    and free ramdisk.
> > 2. Convert raw pmem0 to devdax
> >    ndctl create-namespace --mode devdax --map mem -e namespace0.0 -f
> > 3. Hotadd to System RAM
> >    echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
> >    echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
> >    echo online_movable > /sys/devices/system/memoryXXX/state
> > 4. Before reboot hotremove device-dax memory from System RAM
> >    echo offline > /sys/devices/system/memoryXXX/state
> >    echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
>
> Hi Pavel,
>
> I am working on adding this sort of a workflow into a new daxctl command
> (daxctl-reconfigure-device)- this will allow changing the 'mode' of a
> dax device to kmem, online the resulting memory, and with your patches,
> also attempt to offline the memory, and change back to device-dax.
>
> In running with these patches, and testing the offlining part, I ran
> into the following lockdep below.
>
> This is with just these three patches on top of -rc7.

Hi Verma,

Thank you for testing. I wonder if there is a command sequence that I
could run to reproduce it?
Also, could you please send your config and qemu arguments.

Thank you,
Pasha

>
>
> [  +0.004886] ======================================================
> [  +0.001576] WARNING: possible circular locking dependency detected
> [  +0.001506] 5.1.0-rc7+ #13 Tainted: G           O
> [  +0.000929] ------------------------------------------------------
> [  +0.000708] daxctl/22950 is trying to acquire lock:
> [  +0.000548] 00000000f4d397f7 (kn->count#424){++++}, at: kernfs_remove_by_name_ns+0x40/0x80
> [  +0.000922]
>               but task is already holding lock:
> [  +0.000657] 000000002aa52a9f (mem_sysfs_mutex){+.+.}, at: unregister_memory_section+0x22/0xa0
> [  +0.000960]
>               which lock already depends on the new lock.
>
> [  +0.001001]
>               the existing dependency chain (in reverse order) is:
> [  +0.000837]
>               -> #3 (mem_sysfs_mutex){+.+.}:
> [  +0.000631]        __mutex_lock+0x82/0x9a0
> [  +0.000477]        unregister_memory_section+0x22/0xa0
> [  +0.000582]        __remove_pages+0xe9/0x520
> [  +0.000489]        arch_remove_memory+0x81/0xc0
> [  +0.000510]        devm_memremap_pages_release+0x180/0x270
> [  +0.000633]        release_nodes+0x234/0x280
> [  +0.000483]        device_release_driver_internal+0xf4/0x1d0
> [  +0.000701]        bus_remove_device+0xfc/0x170
> [  +0.000529]        device_del+0x16a/0x380
> [  +0.000459]        unregister_dev_dax+0x23/0x50
> [  +0.000526]        release_nodes+0x234/0x280
> [  +0.000487]        device_release_driver_internal+0xf4/0x1d0
> [  +0.000646]        unbind_store+0x9b/0x130
> [  +0.000467]        kernfs_fop_write+0xf0/0x1a0
> [  +0.000510]        vfs_write+0xba/0x1c0
> [  +0.000438]        ksys_write+0x5a/0xe0
> [  +0.000521]        do_syscall_64+0x60/0x210
> [  +0.000489]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [  +0.000637]
>               -> #2 (mem_hotplug_lock.rw_sem){++++}:
> [  +0.000717]        get_online_mems+0x3e/0x80
> [  +0.000491]        kmem_cache_create_usercopy+0x2e/0x270
> [  +0.000609]        kmem_cache_create+0x12/0x20
> [  +0.000507]        ptlock_cache_init+0x20/0x28
> [  +0.000506]        start_kernel+0x240/0x4d0
> [  +0.000480]        secondary_startup_64+0xa4/0xb0
> [  +0.000539]
>               -> #1 (cpu_hotplug_lock.rw_sem){++++}:
> [  +0.000784]        cpus_read_lock+0x3e/0x80
> [  +0.000511]        online_pages+0x37/0x310
> [  +0.000469]        memory_subsys_online+0x34/0x60
> [  +0.000611]        device_online+0x60/0x80
> [  +0.000611]        state_store+0x66/0xd0
> [  +0.000552]        kernfs_fop_write+0xf0/0x1a0
> [  +0.000649]        vfs_write+0xba/0x1c0
> [  +0.000487]        ksys_write+0x5a/0xe0
> [  +0.000459]        do_syscall_64+0x60/0x210
> [  +0.000482]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [  +0.000646]
>               -> #0 (kn->count#424){++++}:
> [  +0.000669]        lock_acquire+0x9e/0x180
> [  +0.000471]        __kernfs_remove+0x26a/0x310
> [  +0.000518]        kernfs_remove_by_name_ns+0x40/0x80
> [  +0.000583]        remove_files.isra.1+0x30/0x70
> [  +0.000555]        sysfs_remove_group+0x3d/0x80
> [  +0.000524]        sysfs_remove_groups+0x29/0x40
> [  +0.000532]        device_remove_attrs+0x42/0x80
> [  +0.000522]        device_del+0x162/0x380
> [  +0.000464]        device_unregister+0x16/0x60
> [  +0.000505]        unregister_memory_section+0x6e/0xa0
> [  +0.000591]        __remove_pages+0xe9/0x520
> [  +0.000492]        arch_remove_memory+0x81/0xc0
> [  +0.000568]        try_remove_memory+0xba/0xd0
> [  +0.000510]        remove_memory+0x23/0x40
> [  +0.000483]        dev_dax_kmem_remove+0x29/0x57 [kmem]
> [  +0.000608]        device_release_driver_internal+0xe4/0x1d0
> [  +0.000637]        unbind_store+0x9b/0x130
> [  +0.000464]        kernfs_fop_write+0xf0/0x1a0
> [  +0.000685]        vfs_write+0xba/0x1c0
> [  +0.000594]        ksys_write+0x5a/0xe0
> [  +0.000449]        do_syscall_64+0x60/0x210
> [  +0.000481]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [  +0.000619]
>               other info that might help us debug this:
>
> [  +0.000889] Chain exists of:
>                 kn->count#424 --> mem_hotplug_lock.rw_sem --> mem_sysfs_mutex
>
> [  +0.001269]  Possible unsafe locking scenario:
>
> [  +0.000652]        CPU0                    CPU1
> [  +0.000505]        ----                    ----
> [  +0.000523]   lock(mem_sysfs_mutex);
> [  +0.000422]                                lock(mem_hotplug_lock.rw_sem);
> [  +0.000905]                                lock(mem_sysfs_mutex);
> [  +0.000793]   lock(kn->count#424);
> [  +0.000394]
>                *** DEADLOCK ***
>
> [  +0.000665] 7 locks held by daxctl/22950:
> [  +0.000458]  #0: 000000005f6d3c13 (sb_writers#4){.+.+}, at: vfs_write+0x159/0x1c0
> [  +0.000943]  #1: 00000000e468825d (&of->mutex){+.+.}, at: kernfs_fop_write+0xbd/0x1a0
> [  +0.000895]  #2: 00000000caa17dbb (&dev->mutex){....}, at: device_release_driver_internal+0x1a/0x1d0
> [  +0.001019]  #3: 000000002119b22c (device_hotplug_lock){+.+.}, at: remove_memory+0x16/0x40
> [  +0.000942]  #4: 00000000150c8efe (cpu_hotplug_lock.rw_sem){++++}, at: try_remove_memory+0x2e/0xd0
> [  +0.001019]  #5: 000000003d6b2a0f (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x25/0x120
> [  +0.001118]  #6: 000000002aa52a9f (mem_sysfs_mutex){+.+.}, at: unregister_memory_section+0x22/0xa0
> [  +0.001033]
>               stack backtrace:
> [  +0.000507] CPU: 5 PID: 22950 Comm: daxctl Tainted: G           O      5.1.0-rc7+ #13
> [  +0.000896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
> [  +0.001360] Call Trace:
> [  +0.000293]  dump_stack+0x85/0xc0
> [  +0.000390]  print_circular_bug.isra.41.cold.60+0x15c/0x195
> [  +0.000651]  check_prev_add.constprop.50+0x5fd/0xbe0
> [  +0.000563]  ? call_rcu_zapped+0x80/0x80
> [  +0.000449]  __lock_acquire+0xcee/0xfd0
> [  +0.000437]  lock_acquire+0x9e/0x180
> [  +0.000428]  ? kernfs_remove_by_name_ns+0x40/0x80
> [  +0.000531]  __kernfs_remove+0x26a/0x310
> [  +0.000451]  ? kernfs_remove_by_name_ns+0x40/0x80
> [  +0.000529]  ? kernfs_name_hash+0x12/0x80
> [  +0.000462]  kernfs_remove_by_name_ns+0x40/0x80
> [  +0.000513]  remove_files.isra.1+0x30/0x70
> [  +0.000483]  sysfs_remove_group+0x3d/0x80
> [  +0.000458]  sysfs_remove_groups+0x29/0x40
> [  +0.000477]  device_remove_attrs+0x42/0x80
> [  +0.000461]  device_del+0x162/0x380
> [  +0.000399]  device_unregister+0x16/0x60
> [  +0.000442]  unregister_memory_section+0x6e/0xa0
> [  +0.001232]  __remove_pages+0xe9/0x520
> [  +0.000443]  arch_remove_memory+0x81/0xc0
> [  +0.000459]  try_remove_memory+0xba/0xd0
> [  +0.000460]  remove_memory+0x23/0x40
> [  +0.000461]  dev_dax_kmem_remove+0x29/0x57 [kmem]
> [  +0.000603]  device_release_driver_internal+0xe4/0x1d0
> [  +0.000590]  unbind_store+0x9b/0x130
> [  +0.000409]  kernfs_fop_write+0xf0/0x1a0
> [  +0.000448]  vfs_write+0xba/0x1c0
> [  +0.000395]  ksys_write+0x5a/0xe0
> [  +0.000382]  do_syscall_64+0x60/0x210
> [  +0.000418]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [  +0.000573] RIP: 0033:0x7fd1f7442fa8
> [  +0.000407] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 75 77 0d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
> [  +0.002119] RSP: 002b:00007ffd48f58e28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [  +0.000833] RAX: ffffffffffffffda RBX: 000000000210c817 RCX: 00007fd1f7442fa8
> [  +0.000795] RDX: 0000000000000007 RSI: 000000000210c817 RDI: 0000000000000003
> [  +0.000816] RBP: 0000000000000007 R08: 000000000210c7d0 R09: 00007fd1f74d4e80
> [  +0.000808] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
> [  +0.000819] R13: 00007fd1f72b9ce8 R14: 0000000000000000 R15: 00007ffd48f58e70

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ