linux-kernel - Re: [PATCH v6 00/22] Add generic memory shrinker to VirtIO-GPU and Panfrost DRM drivers

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAF6AEGsQBcHbU6Ps5fp5v6ANaZwMAtig-3i-ekQzwG=7BBDNwA@mail.gmail.com>
Date:   Tue, 28 Jun 2022 09:48:09 -0700
From:   Rob Clark <robdclark@...il.com>
To:     Dmitry Osipenko <dmitry.osipenko@...labora.com>
Cc:     Robin Murphy <robin.murphy@....com>,
        David Airlie <airlied@...ux.ie>,
        Gerd Hoffmann <kraxel@...hat.com>,
        Gurchetan Singh <gurchetansingh@...omium.org>,
        Chia-I Wu <olvaffe@...il.com>, Daniel Vetter <daniel@...ll.ch>,
        Daniel Almeida <daniel.almeida@...labora.com>,
        Gert Wollny <gert.wollny@...labora.com>,
        Gustavo Padovan <gustavo.padovan@...labora.com>,
        Daniel Stone <daniel@...ishbar.org>,
        Tomeu Vizoso <tomeu.vizoso@...labora.com>,
        Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
        Maxime Ripard <mripard@...nel.org>,
        Thomas Zimmermann <tzimmermann@...e.de>,
        Rob Herring <robh@...nel.org>,
        Steven Price <steven.price@....com>,
        Alyssa Rosenzweig <alyssa.rosenzweig@...labora.com>,
        Emil Velikov <emil.l.velikov@...il.com>,
        Qiang Yu <yuq825@...il.com>,
        Sumit Semwal <sumit.semwal@...aro.org>,
        Christian König <christian.koenig@....com>,
        "Pan, Xinhui" <Xinhui.Pan@....com>,
        Thierry Reding <thierry.reding@...il.com>,
        Tomasz Figa <tfiga@...omium.org>,
        Marek Szyprowski <m.szyprowski@...sung.com>,
        Mauro Carvalho Chehab <mchehab@...nel.org>,
        Alex Deucher <alexander.deucher@....com>,
        Jani Nikula <jani.nikula@...ux.intel.com>,
        Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
        Rodrigo Vivi <rodrigo.vivi@...el.com>,
        Tvrtko Ursulin <tvrtko.ursulin@...ux.intel.com>,
        dri-devel <dri-devel@...ts.freedesktop.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "open list:VIRTIO GPU DRIVER" 
        <virtualization@...ts.linux-foundation.org>,
        Dmitry Osipenko <digetx@...il.com>,
        linux-tegra@...r.kernel.org,
        "open list:DMA BUFFER SHARING FRAMEWORK" 
        <linux-media@...r.kernel.org>,
        "moderated list:DMA BUFFER SHARING FRAMEWORK" 
        <linaro-mm-sig@...ts.linaro.org>,
        amd-gfx list <amd-gfx@...ts.freedesktop.org>,
        Intel Graphics Development <intel-gfx@...ts.freedesktop.org>,
        kernel@...labora.com
Subject: Re: [PATCH v6 00/22] Add generic memory shrinker to VirtIO-GPU and
 Panfrost DRM drivers

On Tue, Jun 28, 2022 at 5:51 AM Dmitry Osipenko
<dmitry.osipenko@...labora.com> wrote:
>
> On 6/28/22 15:31, Robin Murphy wrote:
> > ----->8-----
> > [   68.295951] ======================================================
> > [   68.295956] WARNING: possible circular locking dependency detected
> > [   68.295963] 5.19.0-rc3+ #400 Not tainted
> > [   68.295972] ------------------------------------------------------
> > [   68.295977] cc1/295 is trying to acquire lock:
> > [   68.295986] ffff000008d7f1a0
> > (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_shmem_free+0x7c/0x198
> > [   68.296036]
> > [   68.296036] but task is already holding lock:
> > [   68.296041] ffff80000c14b820 (fs_reclaim){+.+.}-{0:0}, at:
> > __alloc_pages_slowpath.constprop.0+0x4d8/0x1470
> > [   68.296080]
> > [   68.296080] which lock already depends on the new lock.
> > [   68.296080]
> > [   68.296085]
> > [   68.296085] the existing dependency chain (in reverse order) is:
> > [   68.296090]
> > [   68.296090] -> #1 (fs_reclaim){+.+.}-{0:0}:
> > [   68.296111]        fs_reclaim_acquire+0xb8/0x150
> > [   68.296130]        dma_resv_lockdep+0x298/0x3fc
> > [   68.296148]        do_one_initcall+0xe4/0x5f8
> > [   68.296163]        kernel_init_freeable+0x414/0x49c
> > [   68.296180]        kernel_init+0x2c/0x148
> > [   68.296195]        ret_from_fork+0x10/0x20
> > [   68.296207]
> > [   68.296207] -> #0 (reservation_ww_class_mutex){+.+.}-{3:3}:
> > [   68.296229]        __lock_acquire+0x1724/0x2398
> > [   68.296246]        lock_acquire+0x218/0x5b0
> > [   68.296260]        __ww_mutex_lock.constprop.0+0x158/0x2378
> > [   68.296277]        ww_mutex_lock+0x7c/0x4d8
> > [   68.296291]        drm_gem_shmem_free+0x7c/0x198
> > [   68.296304]        panfrost_gem_free_object+0x118/0x138
> > [   68.296318]        drm_gem_object_free+0x40/0x68
> > [   68.296334]        drm_gem_shmem_shrinker_run_objects_scan+0x42c/0x5b8
> > [   68.296352]        drm_gem_shmem_shrinker_scan_objects+0xa4/0x170
> > [   68.296368]        do_shrink_slab+0x220/0x808
> > [   68.296381]        shrink_slab+0x11c/0x408
> > [   68.296392]        shrink_node+0x6ac/0xb90
> > [   68.296403]        do_try_to_free_pages+0x1dc/0x8d0
> > [   68.296416]        try_to_free_pages+0x1ec/0x5b0
> > [   68.296429]        __alloc_pages_slowpath.constprop.0+0x528/0x1470
> > [   68.296444]        __alloc_pages+0x4e0/0x5b8
> > [   68.296455]        __folio_alloc+0x24/0x60
> > [   68.296467]        vma_alloc_folio+0xb8/0x2f8
> > [   68.296483]        alloc_zeroed_user_highpage_movable+0x58/0x68
> > [   68.296498]        __handle_mm_fault+0x918/0x12a8
> > [   68.296513]        handle_mm_fault+0x130/0x300
> > [   68.296527]        do_page_fault+0x1d0/0x568
> > [   68.296539]        do_translation_fault+0xa0/0xb8
> > [   68.296551]        do_mem_abort+0x68/0xf8
> > [   68.296562]        el0_da+0x74/0x100
> > [   68.296572]        el0t_64_sync_handler+0x68/0xc0
> > [   68.296585]        el0t_64_sync+0x18c/0x190
> > [   68.296596]
> > [   68.296596] other info that might help us debug this:
> > [   68.296596]
> > [   68.296601]  Possible unsafe locking scenario:
> > [   68.296601]
> > [   68.296604]        CPU0                    CPU1
> > [   68.296608]        ----                    ----
> > [   68.296612]   lock(fs_reclaim);
> > [   68.296622] lock(reservation_ww_class_mutex);
> > [   68.296633]                                lock(fs_reclaim);
> > [   68.296644]   lock(reservation_ww_class_mutex);
> > [   68.296654]
> > [   68.296654]  *** DEADLOCK ***
>
> This splat could be ignored for now. I'm aware about it, although
> haven't looked closely at how to fix it since it's a kind of a lockdep
> misreporting.

The lockdep splat could be fixed with something similar to what I've
done in msm, ie. basically just not acquire the lock in the finalizer:

https://patchwork.freedesktop.org/patch/489364/

There is one gotcha to watch for, as danvet pointed out
(scan_objects() could still see the obj in the LRU before the
finalizer removes it), but if scan_objects() does the
kref_get_unless_zero() trick, it is safe.

BR,
-R