Message-ID: <9a9d4018-fd65-49be-9e0a-1eecc9cbf15d@redhat.com>
Date: Tue, 20 Feb 2024 12:11:26 +0100
From: Marco Pagani <marpagan@...hat.com>
To: Xu Yilun <yilun.xu@...ux.intel.com>
Cc: Moritz Fischer <mdf@...nel.org>, Wu Hao <hao.wu@...el.com>,
Xu Yilun <yilun.xu@...el.com>, Tom Rix <trix@...hat.com>,
Jonathan Corbet <corbet@....net>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Alan Tull <atull@...nsource.altera.com>, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org, linux-fpga@...r.kernel.org
Subject: Re: [RFC PATCH v5 1/1] fpga: add an owner and use it to take the
low-level module's refcount

On 2024-02-18 11:05, Xu Yilun wrote:
> On Mon, Feb 05, 2024 at 06:47:34PM +0100, Marco Pagani wrote:
>>
>>
>> On 2024-02-04 06:15, Xu Yilun wrote:
>>> On Fri, Feb 02, 2024 at 06:44:01PM +0100, Marco Pagani wrote:
>>>>
>>>>
>>>> On 2024-01-30 05:31, Xu Yilun wrote:
>>>>>> +#define fpga_mgr_register_full(parent, info) \
>>>>>> + __fpga_mgr_register_full(parent, info, THIS_MODULE)
>>>>>> struct fpga_manager *
>>>>>> -fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
>>>>>> +__fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
>>>>>> + struct module *owner);
>>>>>>
>>>>>> +#define fpga_mgr_register(parent, name, mops, priv) \
>>>>>> + __fpga_mgr_register(parent, name, mops, priv, THIS_MODULE)
>>>>>> struct fpga_manager *
>>>>>> -fpga_mgr_register(struct device *parent, const char *name,
>>>>>> - const struct fpga_manager_ops *mops, void *priv);
>>>>>> +__fpga_mgr_register(struct device *parent, const char *name,
>>>>>> + const struct fpga_manager_ops *mops, void *priv, struct module *owner);
>>>>>> +
>>>>>> void fpga_mgr_unregister(struct fpga_manager *mgr);
>>>>>>
>>>>>> +#define devm_fpga_mgr_register_full(parent, info) \
>>>>>> + __devm_fpga_mgr_register_full(parent, info, THIS_MODULE)
>>>>>> struct fpga_manager *
>>>>>> -devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
>>>>>> +__devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
>>>>>> + struct module *owner);
>>>>>
>>>>> Add a line here. I can do it myself if you agree.
>>>>
>>>> Sure, that is fine by me. I also spotted a typo in the commit log body
>>>> (in taken -> is taken). Do you want me to send a v6, or do you prefer
>>>> to fix that in place?
>>>
>>> No need, I can fix it.
>>>
>>>>
>>>>>
>>>>> There is still a RFC prefix for this patch. Are you ready to get it merged?
>>>>> If yes, Acked-by: Xu Yilun <yilun.xu@...el.com>
>>>>
>>>> I'm ready for the patch to be merged. However, I recently sent an RFC
>>>> to propose a safer implementation of try_module_get() that would
>>>> simplify the code and may also benefit other subsystems. What do you
>>>> think?
>>>>
>>>> https://lore.kernel.org/linux-modules/20240130193614.49772-1-marpagan@redhat.com/
>>>
>>> I suggest take your fix to linux-fpga/for-next now. If your try_module_get()
>>> proposal is applied before the end of this cycle, we could re-evaluate
>>> this patch.
>>
>> That's fine by me.
>
> Sorry, I still found issues about this solution.
>
> void fpga_mgr_unregister(struct fpga_manager *mgr)
> {
> dev_info(&mgr->dev, "%s %s\n", __func__, mgr->name);
>
> /*
> * If the low level driver provides a method for putting fpga into
> * a desired state upon unregister, do it.
> */
> fpga_mgr_fpga_remove(mgr);
>
> mutex_lock(&mgr->mops_mutex);
>
> mgr->mops = NULL;
>
> mutex_unlock(&mgr->mops_mutex);
>
> device_unregister(&mgr->dev);
> }
>
> Note that fpga_mgr_unregister() doesn't have to be called in module_exit().
> So if we do fpga_mgr_get() then fpga_mgr_unregister(), We finally had a
> fpga_manager dev without mops, this is not what the user want and cause
> problem when using this fpga_manager dev for other FPGA APIs.

How about moving mgr->mops = NULL from fpga_mgr_unregister() to
class->dev_release()? That way, mops will be set to NULL only when the
manager dev refcount reaches 0.

If fpga_mgr_unregister() is called from module_exit(), we are sure that nobody
got the manager dev earlier using fpga_mgr_get(), or it would have bumped up
the module's refcount, preventing its removal in the first place. In this case,
device_unregister() drops the last reference to the manager dev, so its
refcount reaches 0 and dev_release() is triggered.

If fpga_mgr_unregister() is called elsewhere in the module that registered the
manager (1), we have two subcases:

a) someone got the manager dev earlier and bumped the module's refcount. Hence,
the ops are safe since the module cannot be removed until the manager dev is
released by calling (the last) put_device(). This, in turn, will trigger
class->dev_release().

b) no one got the manager dev. In this case, class->dev_release() will be
called immediately.

(1) The caller of fpga_mgr_register_*() is responsible for calling
fpga_mgr_unregister(), as specified in the docs.

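
To make the cases above easier to follow, here is a minimal userspace model of
the idea. It is not kernel code and not the actual patch: every toy_* name is
hypothetical, and the two plain integer counters only stand in for the manager
dev refcount and the low-level module's refcount. The sketch assumes that
getting the manager takes one reference on each, and that unregistering only
drops the reference held since registration.

```c
/* Userspace model only, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stddef.h>

struct toy_ops {
	int (*state)(void);
};

struct toy_mgr {
	int dev_refs;               /* stands in for the manager dev refcount */
	int module_refs;            /* stands in for the low-level module's refcount */
	const struct toy_ops *mops; /* stands in for mgr->mops */
};

static int toy_state(void)
{
	return 1;
}

static const struct toy_ops toy_ll_ops = { .state = toy_state };

/* Stands in for class->dev_release(): the only place that clears mops. */
static void toy_dev_release(struct toy_mgr *mgr)
{
	mgr->mops = NULL;
}

/* Stands in for put_device(): the last put triggers the release. */
static void toy_put_device(struct toy_mgr *mgr)
{
	if (--mgr->dev_refs == 0)
		toy_dev_release(mgr);
}

/* Stands in for registration: the dev starts with one reference. */
static void toy_register(struct toy_mgr *mgr)
{
	mgr->dev_refs = 1;
	mgr->module_refs = 0;
	mgr->mops = &toy_ll_ops;
}

/* Stands in for fpga_mgr_get(): bumps both the module and the dev refcount. */
static void toy_get(struct toy_mgr *mgr)
{
	mgr->module_refs++;
	mgr->dev_refs++;
}

/* Stands in for fpga_mgr_put(): drops both references. */
static void toy_put(struct toy_mgr *mgr)
{
	mgr->module_refs--;
	toy_put_device(mgr);
}

/* Stands in for the proposed unregister: it no longer clears mops itself. */
static void toy_unregister(struct toy_mgr *mgr)
{
	toy_put_device(mgr);
}
```

In this model, calling toy_register() and then toy_unregister() clears mops
right away (subcase b). Calling toy_get() before toy_unregister() keeps mops
valid and the module pinned until the matching toy_put() (subcase a).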
> I have this concern when I was reviewing the same improvement for fpga
> bridge. The change for fpga bridge seems workable, the mutex keeps hold
> until fpga_bridge_put(). But I somewhat don't prefer the unregistration
> been unnecessarily blocked for long term.

I also don't like the idea of potentially blocking the unregistration, but I
could not find a better solution for the bridge at the moment.

> I think your try_module_get_safe() patch may finally solve the invalid
> module owner issue. Some options now, we ignore the invalid module owner
> issue (it exists before this change) and merge the rest of the
> improvements, or we wait for your patch accepted then re-evaluate. I
> prefer the former.

Yeah, try_module_get_safe() would make things simpler and easier. I'm currently
working on a series of selftests to demonstrate that the function is safe from
deadlocks, as requested by the maintainer. I hope I can convince people of the
advantages.

Thanks,
Marco