Message-ID: <36f35c25-73d3-5eb5-ef48-948d6eac997a@amd.com>
Date:   Thu, 31 Jan 2019 14:22:08 +0000
From:   "Yang, Philip" <Philip.Yang@....com>
To:     Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>,
        amd-gfx list <amd-gfx@...ts.freedesktop.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>
Subject: Re: Yet another RX Vega hang with another kernel panic signature.
 WARNING: inconsistent lock state

I found the same issue while debugging. I will submit a patch to fix this shortly.

Philip

On 2019-01-30 10:35 p.m., Mikhail Gavrilov wrote:
> Hi folks.
> Yet another kernel panic happened while the GPU hung again:
> 
> [ 1469.906798] ================================
> [ 1469.906799] WARNING: inconsistent lock state
> [ 1469.906801] 5.0.0-0.rc4.git2.2.fc30.x86_64 #1 Tainted: G         C
> [ 1469.906802] --------------------------------
> [ 1469.906804] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> [ 1469.906806] kworker/12:3/681 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [ 1469.906807] 00000000d591b82b
> (&(&adev->vm_manager.pasid_lock)->rlock){?...}, at:
> amdgpu_vm_get_task_info+0x23/0x80 [amdgpu]
> [ 1469.906851] {IN-HARDIRQ-W} state was registered at:
> [ 1469.906855]   _raw_spin_lock+0x31/0x80
> [ 1469.906893]   amdgpu_vm_get_task_info+0x23/0x80 [amdgpu]
> [ 1469.906936]   gmc_v9_0_process_interrupt+0x198/0x2b0 [amdgpu]
> [ 1469.906978]   amdgpu_irq_dispatch+0x90/0x1f0 [amdgpu]
> [ 1469.907018]   amdgpu_irq_callback+0x4a/0x70 [amdgpu]
> [ 1469.907061]   amdgpu_ih_process+0x89/0x100 [amdgpu]
> [ 1469.907103]   amdgpu_irq_handler+0x22/0x50 [amdgpu]
> [ 1469.907106]   __handle_irq_event_percpu+0x3f/0x290
> [ 1469.907108]   handle_irq_event_percpu+0x31/0x80
> [ 1469.907109]   handle_irq_event+0x34/0x51
> [ 1469.907111]   handle_edge_irq+0x7c/0x1a0
> [ 1469.907114]   handle_irq+0xbf/0x100
> [ 1469.907116]   do_IRQ+0x61/0x120
> [ 1469.907118]   ret_from_intr+0x0/0x22
> [ 1469.907121]   cpuidle_enter_state+0xbf/0x470
> [ 1469.907123]   do_idle+0x1ec/0x280
> [ 1469.907125]   cpu_startup_entry+0x19/0x20
> [ 1469.907127]   start_secondary+0x1b3/0x200
> [ 1469.907129]   secondary_startup_64+0xa4/0xb0
> [ 1469.907131] irq event stamp: 5546749
> [ 1469.907133] hardirqs last  enabled at (5546749):
> [<ffffffff9719112a>] ktime_get+0xfa/0x130
> [ 1469.907135] hardirqs last disabled at (5546748):
> [<ffffffff9719105b>] ktime_get+0x2b/0x130
> [ 1469.907137] softirqs last  enabled at (5498318):
> [<ffffffff97e0035f>] __do_softirq+0x35f/0x46a
> [ 1469.907140] softirqs last disabled at (5497393):
> [<ffffffff970ee119>] irq_exit+0x119/0x120
> [ 1469.907141]
>                 other info that might help us debug this:
> [ 1469.907142]  Possible unsafe locking scenario:
> 
> [ 1469.907143]        CPU0
> [ 1469.907144]        ----
> [ 1469.907144]   lock(&(&adev->vm_manager.pasid_lock)->rlock);
> [ 1469.907146]   <Interrupt>
> [ 1469.907147]     lock(&(&adev->vm_manager.pasid_lock)->rlock);
> [ 1469.907148]
>                  *** DEADLOCK ***
> 
> [ 1469.907150] 2 locks held by kworker/12:3/681:
> [ 1469.907152]  #0: 00000000953235a7 ((wq_completion)"events"){+.+.},
> at: process_one_work+0x1e9/0x5d0
> [ 1469.907157]  #1: 0000000071a3d218
> ((work_completion)(&(&sched->work_tdr)->work)){+.+.}, at:
> process_one_work+0x1e9/0x5d0
> [ 1469.907160]
>                 stack backtrace:
> [ 1469.907163] CPU: 12 PID: 681 Comm: kworker/12:3 Tainted: G
> C        5.0.0-0.rc4.git2.2.fc30.x86_64 #1
> [ 1469.907165] Hardware name: System manufacturer System Product
> Name/ROG STRIX X470-I GAMING, BIOS 1103 11/16/2018
> [ 1469.907169] Workqueue: events drm_sched_job_timedout [gpu_sched]
> [ 1469.907171] Call Trace:
> [ 1469.907176]  dump_stack+0x85/0xc0
> [ 1469.907180]  print_usage_bug.cold+0x1ae/0x1e8
> [ 1469.907183]  ? print_shortest_lock_dependencies+0x40/0x40
> [ 1469.907185]  mark_lock+0x50a/0x600
> [ 1469.907186]  ? print_shortest_lock_dependencies+0x40/0x40
> [ 1469.907189]  __lock_acquire+0x544/0x1660
> [ 1469.907191]  ? mark_held_locks+0x57/0x80
> [ 1469.907193]  ? trace_hardirqs_on_thunk+0x1a/0x1c
> [ 1469.907195]  ? lockdep_hardirqs_on+0xed/0x180
> [ 1469.907197]  ? trace_hardirqs_on_thunk+0x1a/0x1c
> [ 1469.907200]  ? retint_kernel+0x10/0x10
> [ 1469.907202]  lock_acquire+0xa2/0x1b0
> [ 1469.907242]  ? amdgpu_vm_get_task_info+0x23/0x80 [amdgpu]
> [ 1469.907245]  _raw_spin_lock+0x31/0x80
> [ 1469.907283]  ? amdgpu_vm_get_task_info+0x23/0x80 [amdgpu]
> [ 1469.907323]  amdgpu_vm_get_task_info+0x23/0x80 [amdgpu]
> [ 1469.907324] ------------[ cut here ]------------
> 
> 
> My kernel commit is: 62967898789d
> 
> 
> 
> --
> Best Regards,
> Mike Gavrilov.
> 
> 
> 
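For readers unfamiliar with this class of report: the lockdep output quoted above says that pasid_lock was first taken inside a hardirq handler (via gmc_v9_0_process_interrupt) and later taken from a kworker with hardirqs still enabled (HE1). If the interrupt fired on the same CPU while the worker held the lock, the handler would spin forever. Below is a minimal Python sketch of that consistency rule. It is only a toy model, not lockdep's real implementation, and the LockClass type and its acquire() method are hypothetical names made up for illustration.

```python
# Toy model of lockdep's hardirq usage check (NOT the kernel's real code).
# LockClass and acquire() are hypothetical names used only for illustration.
from dataclasses import dataclass, field

IN_HARDIRQ = "IN-HARDIRQ-W"   # lock was acquired inside a hardirq handler
HARDIRQ_ON = "HARDIRQ-ON-W"   # lock was acquired with hardirqs enabled


@dataclass
class LockClass:
    name: str
    usage: set = field(default_factory=set)  # usage states seen so far

    def acquire(self, in_hardirq: bool, hardirqs_enabled: bool):
        """Record one acquisition; return a warning string on conflict, else None."""
        if in_hardirq:
            new = IN_HARDIRQ
        elif hardirqs_enabled:
            new = HARDIRQ_ON
        else:
            # Process context with hardirqs disabled: the handler cannot
            # interrupt the lock holder on this CPU, so nothing to record.
            return None
        # The two states conflict with each other.
        conflict = HARDIRQ_ON if new == IN_HARDIRQ else IN_HARDIRQ
        warning = None
        if conflict in self.usage:
            warning = f"inconsistent {{{conflict}}} -> {{{new}}} usage."
        self.usage.add(new)
        return warning


pasid_lock = LockClass("&adev->vm_manager.pasid_lock")
# Interrupt path: taken from the hardirq handler.
pasid_lock.acquire(in_hardirq=True, hardirqs_enabled=False)
# Worker path: taken from the kworker with hardirqs enabled.
print(pasid_lock.acquire(in_hardirq=False, hardirqs_enabled=True))
```

In this model, an acquisition made in process context with hardirqs disabled never triggers the warning. That corresponds to the common kernel remedy of taking such a lock with spin_lock_irqsave() outside interrupt context. Whether the actual patch takes that approach is an assumption here, since the patch is not part of this message.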
