linux-kernel - Re: [Linaro-mm-sig] Re: [PATCH] dma-fence: allow dma fence to have their own lock

Message-ID: <ab42ca92-70e3-ec82-c52c-0fc41d5b4a53@gmail.com>
Date:   Wed, 1 Jun 2022 15:52:53 +0200
From:   Christian König <ckoenig.leichtzumerken@...il.com>
To:     Sergey Senozhatsky <senozhatsky@...omium.org>,
        Christian König <christian.koenig@....com>,
        Sumit Semwal <sumit.semwal@...aro.org>,
        Gustavo Padovan <gustavo@...ovan.org>,
        Tomasz Figa <tfiga@...omium.org>,
        Ricardo Ribalda <ribalda@...omium.org>,
        Christoph Hellwig <hch@...radead.org>,
        linux-media@...r.kernel.org, dri-devel@...ts.freedesktop.org,
        linaro-mm-sig@...ts.linaro.org, linux-kernel@...r.kernel.org
Subject: Re: [Linaro-mm-sig] Re: [PATCH] dma-fence: allow dma fence to have
 their own lock

On 01.06.22 at 15:22, Daniel Vetter wrote:
> On Wed, Jun 01, 2022 at 02:45:42PM +0200, Christian König wrote:
>> On 31.05.22 at 04:51, Sergey Senozhatsky wrote:
>>> On (22/05/30 16:55), Christian König wrote:
>>>> On 30.05.22 at 16:22, Sergey Senozhatsky wrote:
>>>>> [SNIP]
>>>>> So the `lock` should have at least the same lifespan as the DMA fence
>>>>> that borrows it, which is impossible to guarantee in our case.
>>>> Nope, that's not correct. The lock should have at least the same lifespan
>>>> as the context of the DMA fence.
>>> How does one know when it's safe to release the context? DMA fence
>>> objects are still transparently refcount-ed and "live their own lives",
>>> how does one synchronize lifespans?
>> Well, you don't.
>>
>> If you have a dynamic context structure you need to reference count that as
>> well. In other words, every time you create a fence in your context you need
>> to increment the reference count, and every time a fence is released you
>> decrement it.
>>
>> If you have a static context structure like most drivers have then you must
>> make sure that all fences at least signal before you unload your driver. We
>> still somewhat have a race when you try to unload a driver and the fence_ops
>> structure suddenly disappears, but we currently live with that.
>>
>> Apart from that you are right, fences can live forever and we need to deal
>> with that.
> Yeah, this entire thing is a bit of an "oops we might have screwed up" moment.
> I think the cleanest way is to essentially do what the drm/sched code
> does, which is split the gpu job into the public dma_fence (which can live
> forever) and the internal job fence (which has to deal with all the
> resource refcounting issues). And then make sure that only the public
> fence ever escapes to places where the fence can live forever (dma_resv,
> drm_syncobj, sync_file as our uapi container objects are the prominent
> cases really).
>
> It sucks a bit.

It's actually not that bad.

See, after signaling, the dma_fence_ops is mostly used for debugging, I
think, e.g. for the timeline name.

Christian.

> -Daniel
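
For illustration, the reference counting described above for a dynamic
context (take a reference whenever a fence is created, drop it whenever a
fence is released) could look roughly like the sketch below. This is a
single-threaded user-space model and not kernel code: my_ctx, my_fence and
all helper names are hypothetical, the plain int counter stands in for
something like a struct kref, and the int lock stands in for the spinlock
that the fences borrow from the context.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space model; none of these names are kernel API. */
struct my_ctx {
	int refcount;	/* stands in for a struct kref */
	int lock;	/* stands in for the spinlock the fences borrow */
};

struct my_fence {
	struct my_ctx *ctx;	/* reference that keeps the context alive */
	int *lock;		/* borrowed pointer into the context */
};

static int live_contexts;	/* instrumentation for the demo only */

/* Allocate a context; the creator holds the initial reference. */
static struct my_ctx *my_ctx_create(void)
{
	struct my_ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return NULL;
	ctx->refcount = 1;
	live_contexts++;
	return ctx;
}

/* Drop one reference; the context is freed when the last one goes away. */
static void my_ctx_put(struct my_ctx *ctx)
{
	assert(ctx->refcount > 0);
	if (--ctx->refcount == 0) {
		live_contexts--;
		free(ctx);
	}
}

/* Creating a fence increments the context reference count. */
static struct my_fence *my_fence_create(struct my_ctx *ctx)
{
	struct my_fence *f = malloc(sizeof(*f));

	if (!f)
		return NULL;
	ctx->refcount++;
	f->ctx = ctx;
	f->lock = &ctx->lock;
	return f;
}

/* Releasing a fence decrements it again, possibly freeing the context. */
static void my_fence_release(struct my_fence *f)
{
	struct my_ctx *ctx = f->ctx;

	free(f);
	my_ctx_put(ctx);
}
```

With this arrangement the owner can drop its own reference at any time; the
context, and with it the borrowed lock, only goes away once the last fence
has been released. A real driver would need an atomic counter and would do
the put from its fence release callback, which this model does not attempt
to show.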