lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 20 Feb 2024 12:11:26 +0100
From: Marco Pagani <marpagan@...hat.com>
To: Xu Yilun <yilun.xu@...ux.intel.com>
Cc: Moritz Fischer <mdf@...nel.org>, Wu Hao <hao.wu@...el.com>,
 Xu Yilun <yilun.xu@...el.com>, Tom Rix <trix@...hat.com>,
 Jonathan Corbet <corbet@....net>,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 Alan Tull <atull@...nsource.altera.com>, linux-kernel@...r.kernel.org,
 linux-doc@...r.kernel.org, linux-fpga@...r.kernel.org
Subject: Re: [RFC PATCH v5 1/1] fpga: add an owner and use it to take the
 low-level module's refcount



On 2024-02-18 11:05, Xu Yilun wrote:
> On Mon, Feb 05, 2024 at 06:47:34PM +0100, Marco Pagani wrote:
>>
>>
>> On 2024-02-04 06:15, Xu Yilun wrote:
>>> On Fri, Feb 02, 2024 at 06:44:01PM +0100, Marco Pagani wrote:
>>>>
>>>>
>>>> On 2024-01-30 05:31, Xu Yilun wrote:
>>>>>> +#define fpga_mgr_register_full(parent, info) \
>>>>>> +	__fpga_mgr_register_full(parent, info, THIS_MODULE)
>>>>>>  struct fpga_manager *
>>>>>> -fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
>>>>>> +__fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
>>>>>> +			 struct module *owner);
>>>>>>  
>>>>>> +#define fpga_mgr_register(parent, name, mops, priv) \
>>>>>> +	__fpga_mgr_register(parent, name, mops, priv, THIS_MODULE)
>>>>>>  struct fpga_manager *
>>>>>> -fpga_mgr_register(struct device *parent, const char *name,
>>>>>> -		  const struct fpga_manager_ops *mops, void *priv);
>>>>>> +__fpga_mgr_register(struct device *parent, const char *name,
>>>>>> +		    const struct fpga_manager_ops *mops, void *priv, struct module *owner);
>>>>>> +
>>>>>>  void fpga_mgr_unregister(struct fpga_manager *mgr);
>>>>>>  
>>>>>> +#define devm_fpga_mgr_register_full(parent, info) \
>>>>>> +	__devm_fpga_mgr_register_full(parent, info, THIS_MODULE)
>>>>>>  struct fpga_manager *
>>>>>> -devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
>>>>>> +__devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
>>>>>> +			      struct module *owner);
>>>>>
>>>>> Add a line here. I can do it myself if you agree.
>>>>
>>>> Sure, that is fine by me. I also spotted a typo in the commit log body
>>>> (in taken -> is taken). Do you want me to send a v6, or do you prefer
>>>> to fix that in place?
>>>
>>> No need, I can fix it.
>>>
>>>>
>>>>>
>>>>> There is still a RFC prefix for this patch. Are you ready to get it merged?
>>>>> If yes, Acked-by: Xu Yilun <yilun.xu@...el.com>
>>>>
>>>> I'm ready for the patch to be merged. However, I recently sent an RFC
>>>> to propose a safer implementation of try_module_get() that would
>>>> simplify the code and may also benefit other subsystems. What do you
>>>> think?
>>>>
>>>> https://lore.kernel.org/linux-modules/20240130193614.49772-1-marpagan@redhat.com/
>>>
>>> I suggest take your fix to linux-fpga/for-next now. If your try_module_get()
>>> proposal is applied before the end of this cycle, we could re-evaluate
>>> this patch.
>>
>> That's fine by me.
> 
> Sorry, I still found issues about this solution.
> 
> void fpga_mgr_unregister(struct fpga_manager *mgr)
> {
>         dev_info(&mgr->dev, "%s %s\n", __func__, mgr->name);
> 
>         /*
>          * If the low level driver provides a method for putting fpga into
>          * a desired state upon unregister, do it.
>          */
>         fpga_mgr_fpga_remove(mgr);
> 
>         mutex_lock(&mgr->mops_mutex);
> 
>         mgr->mops = NULL;
> 
>         mutex_unlock(&mgr->mops_mutex);
> 
>         device_unregister(&mgr->dev);
> }
> 
> Note that fpga_mgr_unregister() doesn't have to be called in module_exit().
> So if we do fpga_mgr_get() then fpga_mgr_unregister(), We finally had a
> fpga_manager dev without mops, this is not what the user want and cause
> problem when using this fpga_manager dev for other FPGA APIs.

How about moving mgr->mops = NULL from fpga_mgr_unregister() to
class->dev_release()? In that way, mops will be set to NULL only when the
manager dev refcount reaches 0.

If fpga_mgr_unregister() is called from module_exit(), we are sure that nobody
got the manager dev earlier using fpga_mgr_get(), or it would have bumped up
the module's refcount, preventing its removal in the first place. In this case,
when device_unregister() is called, it will trigger dev_release() since the
manager dev refcount has reached 0.

If fpga_mgr_unregister() is called elsewhere in the module that registered the
manager (1), we have two subcases:

a) someone got the manager dev earlier and bumped the module's refcount. Hence,
the ops are safe since the module cannot be removed until the manager dev is
released by calling (the last) put_device(). This, in turn, will trigger
class->dev_release().

b) no one got manager dev. In this case, class->dev_release() will be called
immediately.

(1) The caller of fpga_mgr_register_*() is responsible for calling
fpga_mgr_unregister(), as specified in the docs.

> I have this concern when I was reviewing the same improvement for fpga
> bridge. The change for fpga bridge seems workable, the mutex keeps hold
> until fpga_bridge_put(). But I somewhat don't prefer the unregistration
> been unnecessarily blocked for long term.

I also don't like the idea of potentially blocking the unregistration, but I
could not find a better solution for the bridge at the moment.

> I think your try_module_get_safe() patch may finally solve the invalid
> module owner issue. Some options now, we ignore the invalid module owner
> issue (it exists before this change) and merge the rest of the
> improvements, or we wait for your patch accepted then re-evaluate. I
> prefer the former.

Yeah, try_module_get_safe() would make things simpler and easier. I'm currently
working on a series of selftests to demonstrate that the function is safe from
deadlocks, as requested by the maintainer. I hope I can convince people of the
advantages.

Thanks,
Marco


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ