[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f2a3bab9-296c-43a1-9b7e-944c5044feaf@redhat.com>
Date: Mon, 4 Dec 2023 14:51:49 +0100
From: Hans de Goede <hdegoede@...hat.com>
To: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
"David E. Box" <david.e.box@...ux.intel.com>
Cc: LKML <linux-kernel@...r.kernel.org>,
platform-driver-x86@...r.kernel.org, rajvi.jingar@...ux.intel.com
Subject: Re: [PATCH V6 01/20] platform/x86/intel/vsec: Fix xa_alloc memory
leak
Hi,
On 11/30/23 12:02, Ilpo Järvinen wrote:
> On Wed, 29 Nov 2023, David E. Box wrote:
>
>> Commit 936874b77dd0 ("platform/x86/intel/vsec: Add PCI error recovery
>> support to Intel PMT") added an xarray to track the list of vsec devices to
>> be recovered after a PCI error. But it did not provide cleanup for the list
>> leading to a memory leak that was caught by kmemleak. Do xa_alloc() before
>> devm_add_action_or_reset() so that the list may be cleaned up with
>> xa_erase() in the release function.
>>
>> Fixes: 936874b77dd0 ("platform/x86/intel/vsec: Add PCI error recovery support to Intel PMT")
>> Signed-off-by: David E. Box <david.e.box@...ux.intel.com>
>> ---
>>
>> V6 - Move xa_alloc() before ida_alloc() to reduce mutex use during error
>> recovery.
>> - Fix return value after id_alloc() fail
>> - Add Fixes tag
>> - Add more detail to changelog
>>
>> V5 - New patch
>>
>> drivers/platform/x86/intel/vsec.c | 24 ++++++++++++++----------
>> drivers/platform/x86/intel/vsec.h | 1 +
>> 2 files changed, 15 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/platform/x86/intel/vsec.c b/drivers/platform/x86/intel/vsec.c
>> index c1f9e4471b28..2d568466b4e2 100644
>> --- a/drivers/platform/x86/intel/vsec.c
>> +++ b/drivers/platform/x86/intel/vsec.c
>> @@ -120,6 +120,8 @@ static void intel_vsec_dev_release(struct device *dev)
>> {
>> struct intel_vsec_device *intel_vsec_dev = dev_to_ivdev(dev);
>>
>> + xa_erase(&auxdev_array, intel_vsec_dev->id);
>> +
>> mutex_lock(&vsec_ida_lock);
>> ida_free(intel_vsec_dev->ida, intel_vsec_dev->auxdev.id);
>> mutex_unlock(&vsec_ida_lock);
>> @@ -135,19 +137,27 @@ int intel_vsec_add_aux(struct pci_dev *pdev, struct device *parent,
>> struct auxiliary_device *auxdev = &intel_vsec_dev->auxdev;
>> int ret, id;
>>
>> - mutex_lock(&vsec_ida_lock);
>> - ret = ida_alloc(intel_vsec_dev->ida, GFP_KERNEL);
>> - mutex_unlock(&vsec_ida_lock);
>> + ret = xa_alloc(&auxdev_array, &intel_vsec_dev->id, intel_vsec_dev,
>> + PMT_XA_LIMIT, GFP_KERNEL);
>> if (ret < 0) {
>> kfree(intel_vsec_dev->resource);
>> kfree(intel_vsec_dev);
>> return ret;
>> }
>>
>> + mutex_lock(&vsec_ida_lock);
>> + id = ida_alloc(intel_vsec_dev->ida, GFP_KERNEL);
>> + mutex_unlock(&vsec_ida_lock);
>> + if (id < 0) {
>> + kfree(intel_vsec_dev->resource);
>> + kfree(intel_vsec_dev);
>> + return id;
>
> Thanks, this looks much better this way around but it is missing
> xa_alloc() rollback now that the order is reversed.
>
> Once that is fixed,
>
> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
I have fixed this up, adding the missing:
xa_erase(&auxdev_array, intel_vsec_dev->id);
to this error-exit path while merging this.
While looking into this I did find one other thing which
worries me (different issue, needs a separate fix):
intel_vsec_pci_slot_reset() uses
devm_release_action(&pdev->dev, intel_vsec_remove_aux,
&intel_vsec_dev->auxdev);
and seems to assume that after this intel_vsec_remove_aux()
has run for the auxdev-s. *But this is not the case*
devm_release_action() only removes the action from the list
of devres resources tied to the parent PCI device.
It does *NOT* call the registered action function,
so intel_vsec_remove_aux() is NOT called here.
And then on re-probing the device as is done in
intel_vsec_pci_slot_reset() a second set of aux
devices with the same parent will be created AFAICT.
So it seems that this also needs an explicit
intel_vsec_remove_aux() call for each auxdev!
###
This makes me wonder if the PCI error handling here
and specifically the reset code was ever tested ?
###
Note that simply forcing a reprobe using device_reprobe()
will cause all the aux-devices to also get removed through
the action on driver-unbind without ever needing
the auxdev_array at all!
I guess that you want the removal to happen before
the pci_restore_state(pdev) state though, so that
simply relying on the removal on driver unbind
is not an option ?
Regards,
Hans
Powered by blists - more mailing lists