lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d9afc6db-fd8a-4069-a8a8-1e2d74c1db3a@amd.com>
Date: Wed, 12 Nov 2025 14:32:51 +0800
From: "Du, Bin" <bin.du@....com>
To: Sultan Alsawaf <sultan@...neltoast.com>
Cc: mchehab@...nel.org, hverkuil@...all.nl,
 laurent.pinchart+renesas@...asonboard.com, bryan.odonoghue@...aro.org,
 sakari.ailus@...ux.intel.com, prabhakar.mahadev-lad.rj@...renesas.com,
 linux-media@...r.kernel.org, linux-kernel@...r.kernel.org,
 pratap.nirujogi@....com, benjamin.chan@....com, king.li@....com,
 gjorgji.rosikopulos@....com, Phil.Jawich@....com, Dominic.Antony@....com,
 mario.limonciello@....com, richard.gong@....com, anson.tsao@....com
Subject: Re: [PATCH v5 0/7] Add AMD ISP4 driver

Thanks Sultan for your information.

On 11/12/2025 9:21 AM, Sultan Alsawaf wrote:
> On Tue, Nov 11, 2025 at 03:33:42PM -0800, Sultan Alsawaf wrote:
>> On Tue, Nov 11, 2025 at 05:58:10PM +0800, Du, Bin wrote:
>>>
>>> On 11/11/2025 5:05 PM, Sultan Alsawaf wrote:
>>>
>>>> On Mon, Nov 10, 2025 at 05:46:28PM +0800, Du, Bin wrote:
>>>>> Thank you, Sultan, for your time, big effort, and constant support.
>>>>> Apologies for my delayed reply for being occupied a little with other
>>>>> matters.
>>>>>
>>>>> On 11/10/2025 4:33 PM, Sultan Alsawaf wrote:
>>>>>> Hi Bin,
>>>>>>
>>>>>> On Wed, Nov 05, 2025 at 01:25:58AM -0800, Sultan Alsawaf wrote:
>>>>>>> Hi Bin,
>>>>>>>
>>>>>>> To expedite review, I've attached a patch containing a bunch of fixes I've made
>>>>>>> on top of v5. Most of my changes should be self-explanatory; feel free to ask
>>>>>>> further about specific changes if you have any questions.
>>>>>>>
>>>>>>> I haven't had a chance to review all of the v4 -> v5 changes yet, but I figured
>>>>>>> I should send what I've got so far.
>>>>>>>
>>>>>>> FYI, there is a regression in isp4if_dequeue_buffer() where the bufq lock isn't
>>>>>>> protecting the list_del() anymore. I also checked the compiler output when using
>>>>>>> guard() versus scoped_guard() in that function and there is no difference:
>>>>>>>
>>>>>>>      sha1sum:
>>>>>>>      5652a0306da22ea741d80a9e03a787d0f71758a8  isp4_interface.o // guard()
>>>>>>>      5652a0306da22ea741d80a9e03a787d0f71758a8  isp4_interface.o // scoped_guard()
>>>>>>>
>>>>>>> So guard() should be used there again, which I've done in my patch.
>>>>>>>
>>>>>>> I also have a few questions:
>>>>>>>
>>>>>>> 1. Does ISP FW provide a register interface to disable the IRQ? If so, it is
>>>>>>>       faster to use that than using disable_irq_nosync() to disable the IRQ from
>>>>>>>       the CPU's side.
>>>>>>>
>>>>>>> 2. When the IRQ is re-enabled in isp4sd_fw_resp_func(), is there anything
>>>>>>>       beforehand to mask all pending interrupts from the ISP so that there isn't a
>>>>>>>       bunch of stale interrupts firing as soon the IRQ is re-enabled?
>>>>>>>
>>>>>>> 3. Is there any risk of deadlock due to the stream kthread racing with the ISP
>>>>>>>       when the ISP posts a new response _after_ the kthread determines there are no
>>>>>>>       more new responses but _before_ the enable_irq() in isp4sd_fw_resp_func()?
>>>>>>>
>>>>>>> 4. Why are some lines much longer than before? It seems inconsistent that now
>>>>>>>       there is a mix of several lines wrapped to 80 cols and many lines going
>>>>>>>       beyond 80 cols. And there are multiple places where code is wrapped before
>>>>>>>       reaching 80 cols even with lots of room left, specifically for cases where it
>>>>>>>       wouldn't hurt readability to put more characters onto each line.
>>>>>>
>>>>>> I've attached a new, complete patch of changes to apply on top of v5. You may
>>>>>> ignore the incomplete patch from my previous email and use the new one instead.
>>>>>>
>>>>>> I made many changes and also answered questions 1-3 myself.
>>>>>>
>>>>>> Please apply this on top of v5 and let me know if you have any questions.
>>>>>>
>>>>>
>>>>> Sure, will review, apply and test your patch accordingly. Your contribution
>>>>> is greatly appreciated, will let you know if there is any question or
>>>>> problem.
>>>>>
>>>>>> BTW, I noticed a strange regression in v5 even without any of my changes: every
>>>>>> time you use cheese after using it one time, the video will freeze after 30-60
>>>>>> seconds with this message printed to dmesg:
>>>>>>      [ 2032.716559] amd_isp_capture amd_isp_capture: -><- fail respid unknown respid(0x30002)
>>>>>>
>>>>>> And the video never unfreezes. I haven't found the cause for the regression yet;
>>>>>> can you try to reproduce it?
>>>>>>
>>>>>
>>>>> Really weird, we don't see this issue either in dev or QA test. Is it 100%
>>>>> reproducible and any other fail or err in the log?
>>>>
>>>> Yes, it's 100% reproducible. There's no other message in dmesg, only that one.
>>>>
>>>> Sometimes there is a stop stream error when I close cheese after it froze:
>>>>
>>>>     [  656.540307] amd_isp_capture amd_isp_capture: fail to disable stream
>>>>     [  657.046633] amd_isp_capture amd_isp_capture: fail to stop steam
>>>>     [  657.047224] amd_isp_capture amd_isp_capture: disabling streaming failed (-110)
>>>>
>>>> When I revert to v4 I cannot reproduce it at all. It seems to be something in
>>>> v4 -> v5 that is not fixed by any of my changes.
>>>>
>>>
>>> Hi Sultan, could you please try following modifications on top of v5 to see
>>> if it helps?
>>>
>>> diff --git a/drivers/media/platform/amd/isp4/isp4_fw_cmd_resp.h
>>> b/drivers/media/platform/amd/isp4/isp4_fw_cmd_resp.h
>>> index 39c2265121f9..d571b3873edb 100644
>>> --- a/drivers/media/platform/amd/isp4/isp4_fw_cmd_resp.h
>>> +++ b/drivers/media/platform/amd/isp4/isp4_fw_cmd_resp.h
>>> @@ -97,7 +97,7 @@
>>>
>>> #define ADDR_SPACE_TYPE_GPU_VA          4
>>>
>>> -#define FW_MEMORY_POOL_SIZE             (200 * 1024 * 1024)
>>> +#define FW_MEMORY_POOL_SIZE             (100 * 1024 * 1024)
>>>
>>> /*
>>>    * standard ISP mipicsi=>isp
>>> diff --git a/drivers/media/platform/amd/isp4/isp4_subdev.c
>>> b/drivers/media/platform/amd/isp4/isp4_subdev.c
>>> index 248d10076ae8..acbc80aa709e 100644
>>> --- a/drivers/media/platform/amd/isp4/isp4_subdev.c
>>> +++ b/drivers/media/platform/amd/isp4/isp4_subdev.c
>>> @@ -697,7 +697,7 @@ static int isp4sd_stop_resp_proc_threads(struct
>>> isp4_subdev *isp_subdev)
>>>          return 0;
>>> }
>>>
>>> -static int isp4sd_pwroff_and_deinit(struct isp4_subdev *isp_subdev)
>>> +static int isp4sd_pwroff_and_deinit(struct isp4_subdev *isp_subdev, bool
>>> irq_enabled)
>>> {
>>>          struct isp4sd_sensor_info *sensor_info = &isp_subdev->sensor_info;
>>>          unsigned int perf_state = ISP4SD_PERFORMANCE_STATE_LOW;
>>> @@ -716,8 +716,9 @@ static int isp4sd_pwroff_and_deinit(struct isp4_subdev
>>> *isp_subdev)
>>>                  return 0;
>>>          }
>>>
>>> -       for (int i = 0; i < ISP4SD_MAX_FW_RESP_STREAM_NUM; i++)
>>> -               disable_irq(isp_subdev->irq[i]);
>>> +       if (irq_enabled)
>>> +               for (int i = 0; i < ISP4SD_MAX_FW_RESP_STREAM_NUM; i++)
>>> +                       disable_irq(isp_subdev->irq[i]);
>>>
>>>          isp4sd_stop_resp_proc_threads(isp_subdev);
>>>          dev_dbg(dev, "isp_subdev stop resp proc streads suc");
>>> @@ -813,7 +814,7 @@ static int isp4sd_pwron_and_init(struct isp4_subdev
>>> *isp_subdev)
>>>
>>>          return 0;
>>> err_unlock_and_close:
>>> -       isp4sd_pwroff_and_deinit(isp_subdev);
>>> +       isp4sd_pwroff_and_deinit(isp_subdev, false);
>>>          return -EINVAL;
>>> }
>>>
>>> @@ -985,7 +986,7 @@ static int isp4sd_set_power(struct v4l2_subdev *sd, int
>>> on)
>>>          if (on)
>>>                  return isp4sd_pwron_and_init(isp_subdev);
>>>          else
>>> -               return isp4sd_pwroff_and_deinit(isp_subdev);
>>> +               return isp4sd_pwroff_and_deinit(isp_subdev, true);
>>> }
>>>
>>> static const struct v4l2_subdev_core_ops isp4sd_core_ops = {
>>
>> No difference sadly; same symptoms as before. FYI, your email client broke the
>> patch formatting so I couldn't apply it; it hard wrapped some lines to 80 cols,
>> replaced tabs with spaces, and removed leading spaces on each context line, so I
>> had to apply it manually. To confirm I applied it correctly, here is my diff:
>>
>> diff --git a/drivers/media/platform/amd/isp4/isp4_fw_cmd_resp.h b/drivers/media/platform/amd/isp4/isp4_fw_cmd_resp.h
>> index 39c2265121f9..d571b3873edb 100644
>> --- a/drivers/media/platform/amd/isp4/isp4_fw_cmd_resp.h
>> +++ b/drivers/media/platform/amd/isp4/isp4_fw_cmd_resp.h
>> @@ -97,7 +97,7 @@
>>   
>>   #define ADDR_SPACE_TYPE_GPU_VA          4
>>   
>> -#define FW_MEMORY_POOL_SIZE             (200 * 1024 * 1024)
>> +#define FW_MEMORY_POOL_SIZE             (100 * 1024 * 1024)
>>   
>>   /*
>>    * standard ISP mipicsi=>isp
>> diff --git a/drivers/media/platform/amd/isp4/isp4_subdev.c b/drivers/media/platform/amd/isp4/isp4_subdev.c
>> index 4bd2ebf0f694..500ef0af8a41 100644
>> --- a/drivers/media/platform/amd/isp4/isp4_subdev.c
>> +++ b/drivers/media/platform/amd/isp4/isp4_subdev.c
>> @@ -669,7 +669,7 @@ static int isp4sd_stop_resp_proc_threads(struct isp4_subdev *isp_subdev)
>>   	return 0;
>>   }
>>   
>> -static int isp4sd_pwroff_and_deinit(struct isp4_subdev *isp_subdev)
>> +static int isp4sd_pwroff_and_deinit(struct isp4_subdev *isp_subdev, bool irq_enabled)
>>   {
>>   	struct isp4sd_sensor_info *sensor_info = &isp_subdev->sensor_info;
>>   	unsigned int perf_state = ISP4SD_PERFORMANCE_STATE_LOW;
>> @@ -688,8 +688,9 @@ static int isp4sd_pwroff_and_deinit(struct isp4_subdev *isp_subdev)
>>   		return 0;
>>   	}
>>   
>> -	for (int i = 0; i < ISP4SD_MAX_FW_RESP_STREAM_NUM; i++)
>> -		disable_irq(isp_subdev->irq[i]);
>> +	if (irq_enabled)
>> +		for (int i = 0; i < ISP4SD_MAX_FW_RESP_STREAM_NUM; i++)
>> +			disable_irq(isp_subdev->irq[i]);
>>   
>>   	isp4sd_stop_resp_proc_threads(isp_subdev);
>>   	dev_dbg(dev, "isp_subdev stop resp proc streads suc");
>> @@ -785,7 +786,7 @@ static int isp4sd_pwron_and_init(struct isp4_subdev *isp_subdev)
>>   
>>   	return 0;
>>   err_unlock_and_close:
>> -	isp4sd_pwroff_and_deinit(isp_subdev);
>> +	isp4sd_pwroff_and_deinit(isp_subdev, false);
>>   	return -EINVAL;
>>   }
>>   
>> @@ -957,7 +958,7 @@ static int isp4sd_set_power(struct v4l2_subdev *sd, int on)
>>   	if (on)
>>   		return isp4sd_pwron_and_init(isp_subdev);
>>   	else
>> -		return isp4sd_pwroff_and_deinit(isp_subdev);
>> +		return isp4sd_pwroff_and_deinit(isp_subdev, true);
>>   }
>>   
>>   static const struct v4l2_subdev_core_ops isp4sd_core_ops = {
>>
>>> On the other hand, please also add the patch in amdgpu which sets *bo to
>>> NULL in isp_kernel_buffer_alloc() which you mentioned in another thread.
>>
>> Yep, I have been doing all v5 testing with that patch applied.
>>
>> BTW, I have verified the IRQ changes are not the cause for the regression; I
>> tested with IRQs kept enabled all the time and the issue still occurs.
>>
>> I did observe that ISP stops sending interrupts when the video stream freezes.
>> And, if I replicate the bug enough times, it seems to permanently break FW until
>> a full machine reboot. Reloading amd_capture with the v4 driver doesn't recover
>> the ISP when this happens.
>>
>> As an improvement to the driver, can we do a hard reset of ISP on driver probe?
>> I am assuming hardware doesn't need too long to settle after hard reset, maybe
>> a few hundred milliseconds? This would ensure ISP FW is always in a working
>> state when the driver is loaded.
>>

Actually, each time the camera is activated, the ISP driver powers on 
the ISP and initiates its firmware from the beginning; when the camera 
is closed, the ISP is powered off..

>> Thanks,
>> Sultan
> 
> A small update: I reproduced the issue on v4, but it took several more cycles of
> closing/opening cheese and waiting 30s compared to v5.
> 
> Right now my best guess is that this is a timing issue with respect to FW that
> was exposed by the v5 changes. v5 introduced slight changes in code timing, like
> with the mutex locks getting replaced by spin locks.
> 
> I'll try to insert mdelays to see if I can expose the issue that way on v4.
> 

Could you kindly provide the FW used?

> Sultan

-- 
Regards,
Bin


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ