Message-ID: <CAF6AEGv3y3C6nAq7nrkgbv5-9-tVgj+BtY1yU+fXXFFm_N7fcQ@mail.gmail.com>
Date:   Thu, 8 Jun 2023 14:17:45 -0700
From:   Rob Clark <robdclark@...il.com>
To:     Johan Hovold <johan@...nel.org>
Cc:     Abhinav Kumar <quic_abhinavk@...cinc.com>,
        Dmitry Baryshkov <dmitry.baryshkov@...aro.org>,
        Sean Paul <sean@...rly.run>, David Airlie <airlied@...il.com>,
        Daniel Vetter <daniel@...ll.ch>,
        dri-devel@...ts.freedesktop.org, linux-arm-msm@...r.kernel.org,
        freedreno@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: Adreno devfreq lockdep splat with 6.3-rc2

On Thu, Jun 8, 2023 at 7:12 AM Johan Hovold <johan@...nel.org> wrote:
>
> Hi Rob,
>
> Have you had a chance to look at this regression yet? It prevents us
> from using lockdep on the X13s as it is disabled as soon as we start
> the GPU.

Hmm, curious what is different between the x13s and the sc7180/sc7280 devices?
Or did lockdep recently get more clever (or gain more annotations)?

I did spend some time a while back trying to bring some sense to
devfreq/pm-qos/icc locking:
https://patchwork.freedesktop.org/series/115028/

but I haven't had time to revisit it for a while.

BR,
-R

> On Wed, Mar 15, 2023 at 10:19:21AM +0100, Johan Hovold wrote:
> >
> > Since 6.3-rc2 (or possibly -rc1), I'm now seeing the below
> > devfreq-related lockdep splat.
> >
> > I noticed that you posted a fix for something similar here:
> >
> >       https://lore.kernel.org/r/20230312204150.1353517-9-robdclark@gmail.com
> >
> > but that particular patch makes no difference.
> >
> > From skimming the call traces below and the qos/devfreq-related changes in
> > 6.3-rc1, it seems like this could be related to:
> >
> >       fadcc3ab1302 ("drm/msm/gpu: Bypass PM QoS constraint for idle clamp")
>
> Below is an updated splat from 6.4-rc5.
>
> Johan
>
> [ 2941.931507] ======================================================
> [ 2941.931509] WARNING: possible circular locking dependency detected
> [ 2941.931513] 6.4.0-rc5 #64 Not tainted
> [ 2941.931516] ------------------------------------------------------
> [ 2941.931518] ring0/359 is trying to acquire lock:
> [ 2941.931520] ffff63310e35c078 (&devfreq->lock){+.+.}-{3:3}, at: qos_min_notifier_call+0x28/0x88
> [ 2941.931541]
>                but task is already holding lock:
> [ 2941.931543] ffff63310e3cace8 (&(c->notifiers)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x30/0x70
> [ 2941.931553]
>                which lock already depends on the new lock.
>
> [ 2941.931555]
>                the existing dependency chain (in reverse order) is:
> [ 2941.931556]
>                -> #4 (&(c->notifiers)->rwsem){++++}-{3:3}:
> [ 2941.931562]        down_write+0x50/0x198
> [ 2941.931567]        blocking_notifier_chain_register+0x30/0x8c
> [ 2941.931570]        freq_qos_add_notifier+0x68/0x7c
> [ 2941.931574]        dev_pm_qos_add_notifier+0xa0/0xf8
> [ 2941.931579]        devfreq_add_device.part.0+0x360/0x5a4
> [ 2941.931583]        devm_devfreq_add_device+0x74/0xe0
> [ 2941.931587]        msm_devfreq_init+0xa0/0x154 [msm]
> [ 2941.931624]        msm_gpu_init+0x2fc/0x588 [msm]
> [ 2941.931649]        adreno_gpu_init+0x160/0x2d0 [msm]
> [ 2941.931675]        a6xx_gpu_init+0x2c0/0x74c [msm]
> [ 2941.931699]        adreno_bind+0x180/0x290 [msm]
> [ 2941.931723]        component_bind_all+0x124/0x288
> [ 2941.931728]        msm_drm_bind+0x1d8/0x6cc [msm]
> [ 2941.931752]        try_to_bring_up_aggregate_device+0x1ec/0x2f4
> [ 2941.931755]        __component_add+0xa8/0x194
> [ 2941.931758]        component_add+0x14/0x20
> [ 2941.931761]        dp_display_probe+0x2b4/0x474 [msm]
> [ 2941.931785]        platform_probe+0x68/0xd8
> [ 2941.931789]        really_probe+0x184/0x3c8
> [ 2941.931792]        __driver_probe_device+0x7c/0x16c
> [ 2941.931794]        driver_probe_device+0x3c/0x110
> [ 2941.931797]        __device_attach_driver+0xbc/0x158
> [ 2941.931800]        bus_for_each_drv+0x84/0xe0
> [ 2941.931802]        __device_attach+0xa8/0x1d4
> [ 2941.931805]        device_initial_probe+0x14/0x20
> [ 2941.931807]        bus_probe_device+0xb0/0xb4
> [ 2941.931810]        deferred_probe_work_func+0xa0/0xf4
> [ 2941.931812]        process_one_work+0x288/0x5bc
> [ 2941.931816]        worker_thread+0x74/0x450
> [ 2941.931818]        kthread+0x124/0x128
> [ 2941.931822]        ret_from_fork+0x10/0x20
> [ 2941.931826]
>                -> #3 (dev_pm_qos_mtx){+.+.}-{3:3}:
> [ 2941.931831]        __mutex_lock+0xa0/0x840
> [ 2941.931833]        mutex_lock_nested+0x24/0x30
> [ 2941.931836]        dev_pm_qos_remove_notifier+0x34/0x140
> [ 2941.931838]        genpd_remove_device+0x3c/0x174
> [ 2941.931841]        genpd_dev_pm_detach+0x78/0x1b4
> [ 2941.931844]        dev_pm_domain_detach+0x24/0x34
> [ 2941.931846]        a6xx_gmu_remove+0x34/0xc4 [msm]
> [ 2941.931869]        a6xx_destroy+0xd0/0x160 [msm]
> [ 2941.931892]        adreno_unbind+0x40/0x64 [msm]
> [ 2941.931916]        component_unbind+0x38/0x6c
> [ 2941.931919]        component_unbind_all+0xc8/0xd4
> [ 2941.931921]        msm_drm_uninit.isra.0+0x150/0x1c4 [msm]
> [ 2941.931945]        msm_drm_bind+0x310/0x6cc [msm]
> [ 2941.931967]        try_to_bring_up_aggregate_device+0x1ec/0x2f4
> [ 2941.931970]        __component_add+0xa8/0x194
> [ 2941.931973]        component_add+0x14/0x20
> [ 2941.931976]        dp_display_probe+0x2b4/0x474 [msm]
> [ 2941.932000]        platform_probe+0x68/0xd8
> [ 2941.932003]        really_probe+0x184/0x3c8
> [ 2941.932005]        __driver_probe_device+0x7c/0x16c
> [ 2941.932008]        driver_probe_device+0x3c/0x110
> [ 2941.932011]        __device_attach_driver+0xbc/0x158
> [ 2941.932014]        bus_for_each_drv+0x84/0xe0
> [ 2941.932016]        __device_attach+0xa8/0x1d4
> [ 2941.932018]        device_initial_probe+0x14/0x20
> [ 2941.932021]        bus_probe_device+0xb0/0xb4
> [ 2941.932023]        deferred_probe_work_func+0xa0/0xf4
> [ 2941.932026]        process_one_work+0x288/0x5bc
> [ 2941.932028]        worker_thread+0x74/0x450
> [ 2941.932031]        kthread+0x124/0x128
> [ 2941.932035]        ret_from_fork+0x10/0x20
> [ 2941.932037]
>                -> #2 (&gmu->lock){+.+.}-{3:3}:
> [ 2941.932043]        __mutex_lock+0xa0/0x840
> [ 2941.932045]        mutex_lock_nested+0x24/0x30
> [ 2941.932047]        a6xx_gpu_set_freq+0x30/0x5c [msm]
> [ 2941.932071]        msm_devfreq_target+0xb8/0x1a8 [msm]
> [ 2941.932094]        devfreq_set_target+0x84/0x27c
> [ 2941.932098]        devfreq_update_target+0xc4/0xec
> [ 2941.932102]        devfreq_monitor+0x38/0x170
> [ 2941.932105]        process_one_work+0x288/0x5bc
> [ 2941.932108]        worker_thread+0x74/0x450
> [ 2941.932110]        kthread+0x124/0x128
> [ 2941.932113]        ret_from_fork+0x10/0x20
> [ 2941.932116]
>                -> #1 (&df->lock){+.+.}-{3:3}:
> [ 2941.932121]        __mutex_lock+0xa0/0x840
> [ 2941.932124]        mutex_lock_nested+0x24/0x30
> [ 2941.932126]        msm_devfreq_get_dev_status+0x48/0x134 [msm]
> [ 2941.932149]        devfreq_simple_ondemand_func+0x3c/0x144
> [ 2941.932153]        devfreq_update_target+0x4c/0xec
> [ 2941.932157]        devfreq_monitor+0x38/0x170
> [ 2941.932160]        process_one_work+0x288/0x5bc
> [ 2941.932162]        worker_thread+0x74/0x450
> [ 2941.932165]        kthread+0x124/0x128
> [ 2941.932168]        ret_from_fork+0x10/0x20
> [ 2941.932171]
>                -> #0 (&devfreq->lock){+.+.}-{3:3}:
> [ 2941.932175]        __lock_acquire+0x13d8/0x2188
> [ 2941.932178]        lock_acquire+0x1e8/0x310
> [ 2941.932180]        __mutex_lock+0xa0/0x840
> [ 2941.932182]        mutex_lock_nested+0x24/0x30
> [ 2941.932184]        qos_min_notifier_call+0x28/0x88
> [ 2941.932188]        notifier_call_chain+0xa0/0x17c
> [ 2941.932190]        blocking_notifier_call_chain+0x48/0x70
> [ 2941.932193]        pm_qos_update_target+0xdc/0x1d0
> [ 2941.932195]        freq_qos_apply+0x68/0x74
> [ 2941.932198]        apply_constraint+0x100/0x148
> [ 2941.932201]        __dev_pm_qos_update_request+0xb8/0x1fc
> [ 2941.932203]        dev_pm_qos_update_request+0x3c/0x64
> [ 2941.932206]        msm_devfreq_active+0xf8/0x194 [msm]
> [ 2941.932227]        msm_gpu_submit+0x18c/0x1a8 [msm]
> [ 2941.932249]        msm_job_run+0x98/0x11c [msm]
> [ 2941.932272]        drm_sched_main+0x1a0/0x444 [gpu_sched]
> [ 2941.932281]        kthread+0x124/0x128
> [ 2941.932284]        ret_from_fork+0x10/0x20
> [ 2941.932287]
>                other info that might help us debug this:
>
> [ 2941.932289] Chain exists of:
>                  &devfreq->lock --> dev_pm_qos_mtx --> &(c->notifiers)->rwsem
>
> [ 2941.932296]  Possible unsafe locking scenario:
>
> [ 2941.932298]        CPU0                    CPU1
> [ 2941.932300]        ----                    ----
> [ 2941.932301]   rlock(&(c->notifiers)->rwsem);
> [ 2941.932304]                                lock(dev_pm_qos_mtx);
> [ 2941.932307]                                lock(&(c->notifiers)->rwsem);
> [ 2941.932309]   lock(&devfreq->lock);
> [ 2941.932312]
>                 *** DEADLOCK ***
>
> [ 2941.932313] 4 locks held by ring0/359:
> [ 2941.932315]  #0: ffff633110966170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0x8c/0x11c [msm]
> [ 2941.932342]  #1: ffff633110966208 (&gpu->active_lock){+.+.}-{3:3}, at: msm_gpu_submit+0xdc/0x1a8 [msm]
> [ 2941.932368]  #2: ffffa40da2f91ed0 (dev_pm_qos_mtx){+.+.}-{3:3}, at: dev_pm_qos_update_request+0x30/0x64
> [ 2941.932374]  #3: ffff63310e3cace8 (&(c->notifiers)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x30/0x70
> [ 2941.932381]
>                stack backtrace:
> [ 2941.932383] CPU: 7 PID: 359 Comm: ring0 Not tainted 6.4.0-rc5 #64
> [ 2941.932386] Hardware name: LENOVO 21BYZ9SRUS/21BYZ9SRUS, BIOS N3HET53W (1.25 ) 10/12/2022
> [ 2941.932389] Call trace:
> [ 2941.932391]  dump_backtrace+0x9c/0x11c
> [ 2941.932395]  show_stack+0x18/0x24
> [ 2941.932398]  dump_stack_lvl+0x60/0xac
> [ 2941.932402]  dump_stack+0x18/0x24
> [ 2941.932405]  print_circular_bug+0x26c/0x348
> [ 2941.932407]  check_noncircular+0x134/0x148
> [ 2941.932409]  __lock_acquire+0x13d8/0x2188
> [ 2941.932411]  lock_acquire+0x1e8/0x310
> [ 2941.932414]  __mutex_lock+0xa0/0x840
> [ 2941.932416]  mutex_lock_nested+0x24/0x30
> [ 2941.932418]  qos_min_notifier_call+0x28/0x88
> [ 2941.932421]  notifier_call_chain+0xa0/0x17c
> [ 2941.932424]  blocking_notifier_call_chain+0x48/0x70
> [ 2941.932426]  pm_qos_update_target+0xdc/0x1d0
> [ 2941.932428]  freq_qos_apply+0x68/0x74
> [ 2941.932431]  apply_constraint+0x100/0x148
> [ 2941.932433]  __dev_pm_qos_update_request+0xb8/0x1fc
> [ 2941.932435]  dev_pm_qos_update_request+0x3c/0x64
> [ 2941.932437]  msm_devfreq_active+0xf8/0x194 [msm]
> [ 2941.932460]  msm_gpu_submit+0x18c/0x1a8 [msm]
> [ 2941.932482]  msm_job_run+0x98/0x11c [msm]
> [ 2941.932504]  drm_sched_main+0x1a0/0x444 [gpu_sched]
> [ 2941.932511]  kthread+0x124/0x128
> [ 2941.932514]  ret_from_fork+0x10/0x20
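
For anyone skimming the splat, here is a minimal userspace sketch of the cycle
lockdep is reporting. It is hypothetical illustration only, not kernel code:
the three mutexes stand in for &devfreq->lock, dev_pm_qos_mtx and the notifier
rwsem, the two threads stand in for the devfreq probe/monitor path and the
msm_devfreq_active() submit path, and the intermediate &gmu->lock/&df->lock
links from the chain above are collapsed.

/*
 * Illustrative only: two threads acquiring three locks in conflicting
 * orders, mirroring the summarized chain
 *   probe path:  devfreq->lock -> dev_pm_qos_mtx -> notifier rwsem
 *   submit path: dev_pm_qos_mtx -> notifier rwsem -> devfreq->lock
 * Running it may hang nondeterministically; a checker such as TSan (or
 * lockdep, in the kernel) flags the cycle even when it doesn't.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t devfreq_lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t qos_mtx       = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t notifier_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the devfreq_add_device()/devfreq_monitor() side:
 * devfreq->lock is held, then the QoS notifier machinery takes
 * dev_pm_qos_mtx and the notifier rwsem. */
static void *probe_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&devfreq_lock);
	pthread_mutex_lock(&qos_mtx);
	pthread_mutex_lock(&notifier_lock);
	puts("probe: devfreq->lock -> dev_pm_qos_mtx -> notifier rwsem");
	pthread_mutex_unlock(&notifier_lock);
	pthread_mutex_unlock(&qos_mtx);
	pthread_mutex_unlock(&devfreq_lock);
	return NULL;
}

/* Stand-in for msm_devfreq_active(): a QoS request is updated under
 * dev_pm_qos_mtx, the notifier chain runs under its rwsem, and the
 * qos_min_notifier_call() callback then wants devfreq->lock. */
static void *submit_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&qos_mtx);
	pthread_mutex_lock(&notifier_lock);
	pthread_mutex_lock(&devfreq_lock);	/* closes the cycle */
	puts("submit: dev_pm_qos_mtx -> notifier rwsem -> devfreq->lock");
	pthread_mutex_unlock(&devfreq_lock);
	pthread_mutex_unlock(&notifier_lock);
	pthread_mutex_unlock(&qos_mtx);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, probe_path, NULL);
	pthread_create(&b, NULL, submit_path, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}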
