Message-ID: <569E6062.6030309@Intel.com>
Date:	Tue, 19 Jan 2016 16:12:18 +0000
From:	John Harrison <John.C.Harrison@...el.com>
To:	Gustavo Padovan <gustavo@...ovan.org>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	linux-kernel@...r.kernel.org, devel@...verdev.osuosl.org,
	dri-devel@...ts.freedesktop.org, daniels@...labora.com,
	Arve Hjønnevåg <arve@...roid.com>,
	Riley Andrews <riandrews@...roid.com>,
	Rob Clark <robdclark@...il.com>,
	Greg Hackmann <ghackmann@...gle.com>,
	Maarten Lankhorst <maarten.lankhorst@...onical.com>,
	Gustavo Padovan <gustavo.padovan@...labora.co.uk>
Subject: Re: [RFC 00/29] De-stage android's sync framework

On 19/01/2016 15:23, Gustavo Padovan wrote:
> Hi Daniel,
>
> 2016-01-19 Daniel Vetter <daniel@...ll.ch>:
>
>> On Fri, Jan 15, 2016 at 12:55:10PM -0200, Gustavo Padovan wrote:
>>> From: Gustavo Padovan <gustavo.padovan@...labora.co.uk>
>>>
>>> This patch series de-stages the sync framework, and in order to accomplish that
>>> a bunch of cleanups/improvements to the sync and fence code were made.
>>>
>>> The sync framework contained some abstractions around struct fence and those
>>> were removed in the de-staging process among other changes:
>>>
>>> Userspace visible changes
>>> -------------------------
>>>
>>>   * The sw_sync file was moved from /dev/sw_sync to <debugfs>/sync/sw_sync. No
>>>   other change.
>>>
>>> Kernel API changes
>>> ------------------
>>>
>>>   * struct sync_timeline is now struct fence_timeline
>>>   * sync_timeline_ops is now fence_timeline_ops and they now carry struct
>>>   fence as a parameter instead of struct sync_pt
>>>   * a .cleanup() fence op was added to allow sync_fence to run a cleanup when
>>>   the fence_timeline is destroyed
>>>   * added fence_add_used_data() to pass a private pointer to struct fence. This
>>>   pointer is sent back on the .cleanup op.
>>>   * The sync timeline functions were moved to be fence_timeline functions:
>>> 	 - sync_timeline_create()	-> fence_timeline_create()
>>> 	 - sync_timeline_get()		-> fence_timeline_get()
>>> 	 - sync_timeline_put()		-> fence_timeline_put()
>>> 	 - sync_timeline_destroy()	-> fence_timeline_destroy()
>>> 	 - sync_timeline_signal()	-> fence_timeline_signal()
>>>
>>>   * sync_pt_create() was replaced by fence_create_on_timeline()
>>>
>>> Internal changes
>>> ----------------
>>>
>>>   * fence_timeline_ops was removed in favor of direct use of fence_ops
>>>   * default fence functions were created for fence_ops
>>>   * removed structs sync_pt, sw_sync_timeline and sw_sync_pt
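A minimal userspace sketch of the timeline semantics behind the renamed API, assuming sw_sync-style behaviour: a fence created at value N signals once the timeline counter reaches N, and signalling advances the counter by an increment. The sim_* names and struct layouts are hypothetical stand-ins, not the kernel's fence_timeline_* functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model of a sw_sync-style timeline; not kernel code.
 * Seqno wraparound is ignored for simplicity. */
#define SIM_MAX_FENCES 16

struct sim_fence {
	unsigned int seqno;   /* timeline value at which this fence signals */
	bool signaled;
};

struct sim_timeline {
	unsigned int value;   /* current timeline value, starts at 0 */
	struct sim_fence *fences[SIM_MAX_FENCES];
	size_t nfences;
};

/* Loosely mirrors fence_timeline_create(). */
static struct sim_timeline *sim_timeline_create(void)
{
	return calloc(1, sizeof(struct sim_timeline));
}

/* Loosely mirrors fence_create_on_timeline(): a fence that signals at @seqno. */
static struct sim_fence *sim_fence_create(struct sim_timeline *tl, unsigned int seqno)
{
	struct sim_fence *f;

	if (tl->nfences == SIM_MAX_FENCES)
		return NULL;
	f = calloc(1, sizeof(*f));
	if (!f)
		return NULL;
	f->seqno = seqno;
	f->signaled = seqno <= tl->value;  /* point already passed? */
	tl->fences[tl->nfences++] = f;
	return f;
}

/* Loosely mirrors fence_timeline_signal(): advance by @inc, then signal
 * every fence whose seqno the timeline has now reached. */
static void sim_timeline_signal(struct sim_timeline *tl, unsigned int inc)
{
	tl->value += inc;
	for (size_t i = 0; i < tl->nfences; i++)
		if (!tl->fences[i]->signaled && tl->fences[i]->seqno <= tl->value)
			tl->fences[i]->signaled = true;
}

/* Loosely mirrors fence_timeline_destroy(): frees the fences and the timeline. */
static void sim_timeline_destroy(struct sim_timeline *tl)
{
	for (size_t i = 0; i < tl->nfences; i++)
		free(tl->fences[i]);
	free(tl);
}
```

For example, fences at 1 and 3 stay pending until the timeline is signalled by 1 (first fires) and then by 2 (second fires).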
>> Bunch of fairly random comments all over:
>>
>> - include/uapi/linux/sw_sync.h imo should be dropped; it's just a private
>>    debugfs interface between fence fds and the testsuite. Since the plan is
>>    to have the testcases integrated into the kernel tree too, we don't need
>>    a public header.
>>
>> - Similarly for include/linux/sw_sync.h: imo that should all be moved into
>>    sync_debug.c. Same for sw_sync.c; that should all land in sync_debug
>>    imo, and be made optional with a Kconfig option. At least we should reuse
>>    CONFIG_DEBUG_FS.
> These two items sound reasonable to me.

I have just posted our in-progress IGT for testing i915 syncs (with Gustavo
on CC). It uses the sw_sync mechanisms. Can you take a quick look and see
if it is the kind of thing you would expect us to be doing? Or is it using
interfaces that you are planning to remove and/or make kernel-only?

I'm not sure having a kernel-only test is the best way to go. Having
userland tests like IGT would be much more versatile.


>> - fence_context and fence_timeline are really the same. timeline has some
>>    super-basic support for doing sw-only fence timelines, but imo that's
>>    not really worth keeping (and if it is, it's better kept separate in a
>>    sw-fence.c or similar, like seqno-fence.c). The other main thing
>>    timeline provides is support for cleaning up fences on a timeline. And
>>    imo that cleanup should be done by the core fence support, not by the
>>    add-on stuff.
> Yes, they are. But I currently don't know how best to merge them, so I
> decided to go for an RFC instead of trying some crazy solution touching
> all fence_context users.
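To make the "they are really the same" point concrete, here is a small userspace sketch, assuming that a context is just a number reserved from a global counter (the idea behind fence_context_alloc()) and that a fence is identified by a (context, seqno) pair. The merged sim_timeline object and all sim_* names are hypothetical; it only shows that a timeline can be a context plus a seqno counter.

```c
#include <assert.h>
#include <stdatomic.h>

/* Global counter modeled on the idea behind fence_context_alloc(): each
 * caller reserves @num consecutive context numbers. Userspace stand-in;
 * static atomics are zero-initialized. */
static atomic_uint sim_context_counter;

static unsigned int sim_context_alloc(unsigned int num)
{
	/* atomic_fetch_add() returns the old value, i.e. the first number reserved */
	return atomic_fetch_add(&sim_context_counter, num);
}

/* Hypothetical merged object: a "timeline" is a context plus seqno state. */
struct sim_timeline {
	unsigned int context;     /* what a bare context allocation hands out */
	unsigned int next_seqno;  /* monotonically increasing per timeline */
	const char *name;
};

/* A fence is identified by (context, seqno). */
struct sim_fence {
	unsigned int context;
	unsigned int seqno;
};

static void sim_timeline_init(struct sim_timeline *tl, const char *name)
{
	tl->context = sim_context_alloc(1);
	tl->next_seqno = 1;
	tl->name = name;
}

/* Every fence emitted on a timeline shares its context. */
static struct sim_fence sim_timeline_next_fence(struct sim_timeline *tl)
{
	struct sim_fence f = { tl->context, tl->next_seqno++ };
	return f;
}
```

Under this model, switching fence_context_alloc() users over to timelines amounts to wrapping the number they already allocate.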
>
>> Interlude about fence cleanup on driver unload:
>>
>> Working drivers imo should never call timeline_destroy when there's still
>> an unsignalled fence around for that timeline/context. That just means
>> they're broken and failed to clean up all the pending work. So the problem
>> really is only what to do with fences where the driver disappeared, and
>> for that we essentially need a fence_revoke() function (which could be
>> called internally from timeline_free). So here's what I think
>> timeline_free should do:
>>
>> for_each_fence_on_timeline(fence, timeline) { /* hypothetical iterator */
>> 	WARN_ON(!fence_is_signaled(fence));
>>
>> 	fence_revoke(fence); /* proposed helper, does not exist yet */
>> }
>>
>> Implementing fence_revoke is a bit tricky since we need to make sure the
>> memory containing ->ops and similar stuff doesn't disappear. Simplest
>> option might be to grab a temporary reference (using
>> kref_get_unless_zero), and then exchange ->ops with one that has only a
>> release function. We don't need anything else as long as all fence_*
>> functions the kernel might call check for signalling correctly first
>> (fence_wait is broken at least).
>>
>> Or we just give up (for now) and declare module unload as slightly racy.
>> dma-buf is similar. An intermediate option might be to at least add a
>> THIS_MODULE reference to each fence (but that's a bit expensive ...).
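The timeline_free loop and the fence_revoke idea above can be modeled in userspace as follows, assuming a C11 atomic refcount with the same contract as kref_get_unless_zero() (take a reference only while the count is nonzero) and an ops table whose only remaining entry after revocation is a core-owned release. fence_revoke does not exist; every sim_* name here is hypothetical, and this single-threaded sketch ignores races between the ops swap and concurrent callers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct sim_fence;

struct sim_fence_ops {
	const char *(*get_driver_name)(struct sim_fence *f);
	void (*release)(struct sim_fence *f);
};

struct sim_fence {
	atomic_int refcount;
	bool signaled;
	const struct sim_fence_ops *ops;
};

/* Same contract as kref_get_unless_zero(): only pin a live object. */
static bool sim_get_unless_zero(struct sim_fence *f)
{
	int old = atomic_load(&f->refcount);

	while (old != 0)
		if (atomic_compare_exchange_weak(&f->refcount, &old, old + 1))
			return true;
	return false;  /* already on its way to being freed */
}

static void sim_fence_put(struct sim_fence *f)
{
	if (atomic_fetch_sub(&f->refcount, 1) == 1)
		f->ops->release(f);
}

/* Core-owned release: survives the driver module going away. */
static void sim_core_release(struct sim_fence *f) { free(f); }

/* Stand-in for ops that live in a driver module about to unload. */
static const char *sim_drv_name(struct sim_fence *f) { (void)f; return "sim-drv"; }
static const struct sim_fence_ops sim_driver_ops = {
	.get_driver_name = sim_drv_name,
	.release = sim_core_release,
};

/* After revocation only release remains, so callers must check signalling first. */
static const struct sim_fence_ops sim_revoked_ops = {
	.get_driver_name = NULL,
	.release = sim_core_release,
};

/* Hypothetical revoke: pin the fence, swap in release-only ops, unpin. */
static void sim_fence_revoke(struct sim_fence *f)
{
	if (!sim_get_unless_zero(f))
		return;
	f->ops = &sim_revoked_ops;
	sim_fence_put(f);
}

/* Hypothetical timeline_free(): warn on unsignaled fences, then revoke all. */
static void sim_timeline_free(struct sim_fence **fences, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!fences[i]->signaled)
			fprintf(stderr, "WARN: unsignaled fence %zu at timeline free\n", i);
		sim_fence_revoke(fences[i]);
	}
}
```

A fence still referenced elsewhere keeps its refcount after revocation, and its final put frees it through the core release rather than through driver code.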
> I'd say we just give up for now, as no driver uses timeline_destroy
> yet. So we could go for other improvements first.
>
>> - back to timeline vs. context: I have no idea how best to clean up this
>>    mess, but the least painful option long-term is probably to switch over all
>>    current users of fence_context_alloc to timelines and remove the plain
>>    context interface.
> Agreed.
>
>> - Imo the interface in include/linux/sync.h is duplicating too much of
>>    fence.h. I think the only bits we need are the refcounting, creating,
>>    fd-install and that's it. Plus a macro to loop over all the fences in a
>>    sync_fence. With that drivers will only ever deal with a pile of
>>    struct fence, making implicit fencing (using the fence list in dma-buf)
>>    and explicit fencing (using the fence list in sync_fence) much more
>>    similar.
> Yes, most of the sync_fence waiting should not be exported. Drivers
> should only wait for fences imo, not sync_fences.
>
>>    And we can easily do that since there are no internal users ;-)
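A rough userspace sketch of the slimmed-down interface described above: creation, refcounting and one loop macro over the contained fences, so callers only ever handle struct fence pointers. fd installation has no userspace analogue here and is omitted; the plain int refcount is not thread-safe (a kernel version would use a kref). All sim_* names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for struct fence; only what the example needs. */
struct sim_fence {
	unsigned int context;
	unsigned int seqno;
};

/* Hypothetical slimmed-down sync_fence: a refcounted array of fences. */
struct sim_sync_fence {
	int refcount;
	size_t num_fences;
	struct sim_fence *fences[];  /* flexible array member */
};

static struct sim_sync_fence *sim_sync_fence_create(struct sim_fence **fences, size_t n)
{
	struct sim_sync_fence *sf = malloc(sizeof(*sf) + n * sizeof(sf->fences[0]));

	if (!sf)
		return NULL;
	sf->refcount = 1;
	sf->num_fences = n;
	for (size_t i = 0; i < n; i++)
		sf->fences[i] = fences[i];
	return sf;
}

static void sim_sync_fence_get(struct sim_sync_fence *sf) { sf->refcount++; }

static void sim_sync_fence_put(struct sim_sync_fence *sf)
{
	assert(sf->refcount > 0);
	if (--sf->refcount == 0)
		free(sf);
}

/* The one iteration helper drivers would need: visit each contained fence. */
#define sim_sync_fence_for_each_fence(sf, i, f) \
	for ((i) = 0; (i) < (sf)->num_fences && ((f) = (sf)->fences[(i)], 1); (i)++)
```

With this shape, a driver waiting on a sync_fence would loop over the fences and wait on each one with the ordinary fence functions.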
>>
>> - get_timeline_name and get_driver_name are imo too much indirection, just
>>    add a ->(drv_)name field to each of these.
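The difference between the two styles can be sketched in userspace: the callback pair is shaped like the get_driver_name/get_timeline_name entries in struct fence_ops, while the suggested alternative stores the names as plain fields. The sim_* structs, the field names and the example strings are hypothetical illustrations, not the actual fence layout.

```c
#include <assert.h>
#include <string.h>

struct sim_fence;

/* Indirect style: names come back through ops callbacks (simplified stand-in). */
struct sim_fence_ops {
	const char *(*get_driver_name)(struct sim_fence *f);
	const char *(*get_timeline_name)(struct sim_fence *f);
};

struct sim_fence {
	const struct sim_fence_ops *ops;
	/* Direct style as suggested: hypothetical plain name fields. */
	const char *drv_name;
	const char *timeline_name;
};

static const char *cb_drv(struct sim_fence *f) { (void)f; return "example-drv"; }
static const char *cb_tl(struct sim_fence *f) { (void)f; return "example-tl"; }
static const struct sim_fence_ops sim_ops = { cb_drv, cb_tl };

/* A debug printer needs one indirect call per name with callbacks... */
static const char *name_via_ops(struct sim_fence *f) { return f->ops->get_timeline_name(f); }

/* ...versus a plain load with the field approach. */
static const char *name_via_field(struct sim_fence *f) { return f->timeline_name; }
```

Both return the same string; the field version simply drops a function pointer per name that every driver would otherwise have to implement.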
>>
>> - struct sync_fence is a major source of confusion imo next to struct fence. It
>>    made much more sense in the pure-android world where fence == sync_pt.
>>    Maybe we can rename sync_fence to sync_fence_fd (a bit long, and fd is a
>>    bit inaccurate), sync_file (like this best), fence_file (sounds silly
>>    imo), or something else?
> sync_file sounds good to me. fence_file feels like a file for a
> single fence, but we may have many fences in one sync_file.
>
>> - I guess it's just not yet part of this RFC, but moving the testsuite and
>>    adding kerneldoc for this is planned, I guess? If you feel like doing it,
>>    I think it'd be best. We pull the current dma-buf stuff into
>>    device-drivers.tmpl, but it completely lacks overview docs and all
>>    that. And I'd like to duplicate at least the dma-buf/fence sections into
>>    the gpu.tmpl docbook.
> We have converted the testsuite from Android's libsync, but we need to wait
> for Google to re-license it before sending it upstream.
>
> kerneldoc is planned for sure, but I'd say it will be better to have
> some users first, DRM for example.
>
>> - If we make timelines first class objects I think we could move some of
>>    the fields from struct fence to struct fence_timeline. E.g. the ops
>>    struct. That also makes it clearer that some of the vfuncs really should
>>    be taking a struct fence_timeline *timeline instead of a struct fence
>>    *fence as their primary parameter.
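A userspace sketch of that layout, assuming the ops pointer moves from each fence into the timeline and the vfunc takes the timeline as its primary parameter, with the fence contributing only its seqno. The sim_* structs and the single is_signaled vfunc are hypothetical; they illustrate the direction, not a proposed kernel API.

```c
#include <assert.h>
#include <stdbool.h>

struct sim_timeline;

/* Hypothetical per-timeline ops: the timeline is the primary parameter. */
struct sim_timeline_ops {
	bool (*is_signaled)(struct sim_timeline *tl, unsigned int seqno);
};

struct sim_timeline {
	const struct sim_timeline_ops *ops;  /* one copy, moved out of each fence */
	const char *name;
	unsigned int value;                  /* last signalled seqno */
};

/* The fence shrinks to a back-pointer plus its seqno. */
struct sim_fence {
	struct sim_timeline *timeline;
	unsigned int seqno;
};

static bool sw_is_signaled(struct sim_timeline *tl, unsigned int seqno)
{
	return seqno <= tl->value;
}

static const struct sim_timeline_ops sw_ops = { sw_is_signaled };

/* Generic helper: dispatch through the fence's timeline. */
static bool sim_fence_is_signaled(struct sim_fence *f)
{
	return f->timeline->ops->is_signaled(f->timeline, f->seqno);
}
```

Every fence on a timeline then shares a single ops table, which is the clarity gain the point above is after.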
> I'll keep that as a final goal, work on RFC v2 and see how far we can
> get.
>
> 	Gustavo
