lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <591efd28-805a-4a13-b7e2-0f78a3ca3eac@amd.com>
Date: Wed, 12 Nov 2025 18:21:33 +0800
From: "Du, Bin" <bin.du@....com>
To: Sultan Alsawaf <sultan@...neltoast.com>
Cc: mchehab@...nel.org, hverkuil@...all.nl,
 laurent.pinchart+renesas@...asonboard.com, bryan.odonoghue@...aro.org,
 sakari.ailus@...ux.intel.com, prabhakar.mahadev-lad.rj@...renesas.com,
 linux-media@...r.kernel.org, linux-kernel@...r.kernel.org,
 pratap.nirujogi@....com, benjamin.chan@....com, king.li@....com,
 gjorgji.rosikopulos@....com, Phil.Jawich@....com, Dominic.Antony@....com,
 mario.limonciello@....com, richard.gong@....com, anson.tsao@....com
Subject: Re: [PATCH v5 0/7] Add AMD ISP4 driver



On 11/12/2025 3:38 PM, Sultan Alsawaf wrote:
> On Tue, Nov 11, 2025 at 11:06:41PM -0800, Sultan Alsawaf wrote:
>> On Wed, Nov 12, 2025 at 02:32:51PM +0800, Du, Bin wrote:
>>> Thanks Sultan for your information.
>>>
>>> On 11/12/2025 9:21 AM, Sultan Alsawaf wrote:
>>>> On Tue, Nov 11, 2025 at 03:33:42PM -0800, Sultan Alsawaf wrote:
>>>>> On Tue, Nov 11, 2025 at 05:58:10PM +0800, Du, Bin wrote:
>>>>>>
>>>>>> On 11/11/2025 5:05 PM, Sultan Alsawaf wrote:
>>>>>>
>>>>>>> On Mon, Nov 10, 2025 at 05:46:28PM +0800, Du, Bin wrote:
>>>>>>>> Thank you, Sultan, for your time, big effort, and constant support.
>>>>>>>> Apologies for my delayed reply for being occupied a little with other
>>>>>>>> matters.
>>>>>>>>
>>>>>>>> On 11/10/2025 4:33 PM, Sultan Alsawaf wrote:
>>>>>>>>> Hi Bin,
>>>>>>>>>
>>>>>>>>> On Wed, Nov 05, 2025 at 01:25:58AM -0800, Sultan Alsawaf wrote:
>>>>>>>>>> Hi Bin,
>>>>>>>>>>
>>>>>>>>>> To expedite review, I've attached a patch containing a bunch of fixes I've made
>>>>>>>>>> on top of v5. Most of my changes should be self-explanatory; feel free to ask
>>>>>>>>>> further about specific changes if you have any questions.
>>>>>>>>>>
>>>>>>>>>> I haven't had a chance to review all of the v4 -> v5 changes yet, but I figured
>>>>>>>>>> I should send what I've got so far.
>>>>>>>>>>
>>>>>>>>>> FYI, there is a regression in isp4if_dequeue_buffer() where the bufq lock isn't
>>>>>>>>>> protecting the list_del() anymore. I also checked the compiler output when using
>>>>>>>>>> guard() versus scoped_guard() in that function and there is no difference:
>>>>>>>>>>
>>>>>>>>>>       sha1sum:
>>>>>>>>>>       5652a0306da22ea741d80a9e03a787d0f71758a8  isp4_interface.o // guard()
>>>>>>>>>>       5652a0306da22ea741d80a9e03a787d0f71758a8  isp4_interface.o // scoped_guard()
>>>>>>>>>>
>>>>>>>>>> So guard() should be used there again, which I've done in my patch.
>>>>>>>>>>
>>>>>>>>>> I also have a few questions:
>>>>>>>>>>
>>>>>>>>>> 1. Does ISP FW provide a register interface to disable the IRQ? If so, it is
>>>>>>>>>>        faster to use that than using disable_irq_nosync() to disable the IRQ from
>>>>>>>>>>        the CPU's side.
>>>>>>>>>>
>>>>>>>>>> 2. When the IRQ is re-enabled in isp4sd_fw_resp_func(), is there anything
>>>>>>>>>>        beforehand to mask all pending interrupts from the ISP so that there isn't a
>>>>>>>>>>        bunch of stale interrupts firing as soon the IRQ is re-enabled?
>>>>>>>>>>
>>>>>>>>>> 3. Is there any risk of deadlock due to the stream kthread racing with the ISP
>>>>>>>>>>        when the ISP posts a new response _after_ the kthread determines there are no
>>>>>>>>>>        more new responses but _before_ the enable_irq() in isp4sd_fw_resp_func()?
>>>>>>>>>>
>>>>>>>>>> 4. Why are some lines much longer than before? It seems inconsistent that now
>>>>>>>>>>        there is a mix of several lines wrapped to 80 cols and many lines going
>>>>>>>>>>        beyond 80 cols. And there are multiple places where code is wrapped before
>>>>>>>>>>        reaching 80 cols even with lots of room left, specifically for cases where it
>>>>>>>>>>        wouldn't hurt readability to put more characters onto each line.
>>>>>>>>>
>>>>>>>>> I've attached a new, complete patch of changes to apply on top of v5. You may
>>>>>>>>> ignore the incomplete patch from my previous email and use the new one instead.
>>>>>>>>>
>>>>>>>>> I made many changes and also answered questions 1-3 myself.
>>>>>>>>>
>>>>>>>>> Please apply this on top of v5 and let me know if you have any questions.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Sure, will review, apply and test your patch accordingly. Your contribution
>>>>>>>> is greatly appreciated, will let you know if there is any question or
>>>>>>>> problem.
>>>>>>>>
>>>>>>>>> BTW, I noticed a strange regression in v5 even without any of my changes: every
>>>>>>>>> time you use cheese after using it one time, the video will freeze after 30-60
>>>>>>>>> seconds with this message printed to dmesg:
>>>>>>>>>       [ 2032.716559] amd_isp_capture amd_isp_capture: -><- fail respid unknown respid(0x30002)
>>>>>>>>>
>>>>>>>>> And the video never unfreezes. I haven't found the cause for the regression yet;
>>>>>>>>> can you try to reproduce it?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Really weird, we don't see this issue either in dev or QA test. Is it 100%
>>>>>>>> reproducible and any other fail or err in the log?
>>>>>>>
>>>>>>> Yes, it's 100% reproducible. There's no other message in dmesg, only that one.
>>>>>>>
>>>>>>> Sometimes there is a stop stream error when I close cheese after it froze:
>>>>>>>
>>>>>>>      [  656.540307] amd_isp_capture amd_isp_capture: fail to disable stream
>>>>>>>      [  657.046633] amd_isp_capture amd_isp_capture: fail to stop steam
>>>>>>>      [  657.047224] amd_isp_capture amd_isp_capture: disabling streaming failed (-110)
>>>>>>>
>>>>>>> When I revert to v4 I cannot reproduce it at all. It seems to be something in
>>>>>>> v4 -> v5 that is not fixed by any of my changes.
>>>>>>>
>>>>>>
>>>>>> Hi Sultan, could you please try following modifications on top of v5 to see
>>>>>> if it helps?
>>>>>>
>>>>>> diff --git a/drivers/media/platform/amd/isp4/isp4_fw_cmd_resp.h
>>>>>> b/drivers/media/platform/amd/isp4/isp4_fw_cmd_resp.h
>>>>>> index 39c2265121f9..d571b3873edb 100644
>>>>>> --- a/drivers/media/platform/amd/isp4/isp4_fw_cmd_resp.h
>>>>>> +++ b/drivers/media/platform/amd/isp4/isp4_fw_cmd_resp.h
>>>>>> @@ -97,7 +97,7 @@
>>>>>>
>>>>>> #define ADDR_SPACE_TYPE_GPU_VA          4
>>>>>>
>>>>>> -#define FW_MEMORY_POOL_SIZE             (200 * 1024 * 1024)
>>>>>> +#define FW_MEMORY_POOL_SIZE             (100 * 1024 * 1024)
>>>>>>
>>>>>> /*
>>>>>>     * standard ISP mipicsi=>isp
>>>>>> diff --git a/drivers/media/platform/amd/isp4/isp4_subdev.c
>>>>>> b/drivers/media/platform/amd/isp4/isp4_subdev.c
>>>>>> index 248d10076ae8..acbc80aa709e 100644
>>>>>> --- a/drivers/media/platform/amd/isp4/isp4_subdev.c
>>>>>> +++ b/drivers/media/platform/amd/isp4/isp4_subdev.c
>>>>>> @@ -697,7 +697,7 @@ static int isp4sd_stop_resp_proc_threads(struct
>>>>>> isp4_subdev *isp_subdev)
>>>>>>           return 0;
>>>>>> }
>>>>>>
>>>>>> -static int isp4sd_pwroff_and_deinit(struct isp4_subdev *isp_subdev)
>>>>>> +static int isp4sd_pwroff_and_deinit(struct isp4_subdev *isp_subdev, bool
>>>>>> irq_enabled)
>>>>>> {
>>>>>>           struct isp4sd_sensor_info *sensor_info = &isp_subdev->sensor_info;
>>>>>>           unsigned int perf_state = ISP4SD_PERFORMANCE_STATE_LOW;
>>>>>> @@ -716,8 +716,9 @@ static int isp4sd_pwroff_and_deinit(struct isp4_subdev
>>>>>> *isp_subdev)
>>>>>>                   return 0;
>>>>>>           }
>>>>>>
>>>>>> -       for (int i = 0; i < ISP4SD_MAX_FW_RESP_STREAM_NUM; i++)
>>>>>> -               disable_irq(isp_subdev->irq[i]);
>>>>>> +       if (irq_enabled)
>>>>>> +               for (int i = 0; i < ISP4SD_MAX_FW_RESP_STREAM_NUM; i++)
>>>>>> +                       disable_irq(isp_subdev->irq[i]);
>>>>>>
>>>>>>           isp4sd_stop_resp_proc_threads(isp_subdev);
>>>>>>           dev_dbg(dev, "isp_subdev stop resp proc streads suc");
>>>>>> @@ -813,7 +814,7 @@ static int isp4sd_pwron_and_init(struct isp4_subdev
>>>>>> *isp_subdev)
>>>>>>
>>>>>>           return 0;
>>>>>> err_unlock_and_close:
>>>>>> -       isp4sd_pwroff_and_deinit(isp_subdev);
>>>>>> +       isp4sd_pwroff_and_deinit(isp_subdev, false);
>>>>>>           return -EINVAL;
>>>>>> }
>>>>>>
>>>>>> @@ -985,7 +986,7 @@ static int isp4sd_set_power(struct v4l2_subdev *sd, int
>>>>>> on)
>>>>>>           if (on)
>>>>>>                   return isp4sd_pwron_and_init(isp_subdev);
>>>>>>           else
>>>>>> -               return isp4sd_pwroff_and_deinit(isp_subdev);
>>>>>> +               return isp4sd_pwroff_and_deinit(isp_subdev, true);
>>>>>> }
>>>>>>
>>>>>> static const struct v4l2_subdev_core_ops isp4sd_core_ops = {
>>>>>
>>>>> No difference sadly; same symptoms as before. FYI, your email client broke the
>>>>> patch formatting so I couldn't apply it; it hard wrapped some lines to 80 cols,
>>>>> replaced tabs with spaces, and removed leading spaces on each context line, so I
>>>>> had to apply it manually. To confirm I applied it correctly, here is my diff:
>>>>>
>>>>> diff --git a/drivers/media/platform/amd/isp4/isp4_fw_cmd_resp.h b/drivers/media/platform/amd/isp4/isp4_fw_cmd_resp.h
>>>>> index 39c2265121f9..d571b3873edb 100644
>>>>> --- a/drivers/media/platform/amd/isp4/isp4_fw_cmd_resp.h
>>>>> +++ b/drivers/media/platform/amd/isp4/isp4_fw_cmd_resp.h
>>>>> @@ -97,7 +97,7 @@
>>>>>    #define ADDR_SPACE_TYPE_GPU_VA          4
>>>>> -#define FW_MEMORY_POOL_SIZE             (200 * 1024 * 1024)
>>>>> +#define FW_MEMORY_POOL_SIZE             (100 * 1024 * 1024)
>>>>>    /*
>>>>>     * standard ISP mipicsi=>isp
>>>>> diff --git a/drivers/media/platform/amd/isp4/isp4_subdev.c b/drivers/media/platform/amd/isp4/isp4_subdev.c
>>>>> index 4bd2ebf0f694..500ef0af8a41 100644
>>>>> --- a/drivers/media/platform/amd/isp4/isp4_subdev.c
>>>>> +++ b/drivers/media/platform/amd/isp4/isp4_subdev.c
>>>>> @@ -669,7 +669,7 @@ static int isp4sd_stop_resp_proc_threads(struct isp4_subdev *isp_subdev)
>>>>>    	return 0;
>>>>>    }
>>>>> -static int isp4sd_pwroff_and_deinit(struct isp4_subdev *isp_subdev)
>>>>> +static int isp4sd_pwroff_and_deinit(struct isp4_subdev *isp_subdev, bool irq_enabled)
>>>>>    {
>>>>>    	struct isp4sd_sensor_info *sensor_info = &isp_subdev->sensor_info;
>>>>>    	unsigned int perf_state = ISP4SD_PERFORMANCE_STATE_LOW;
>>>>> @@ -688,8 +688,9 @@ static int isp4sd_pwroff_and_deinit(struct isp4_subdev *isp_subdev)
>>>>>    		return 0;
>>>>>    	}
>>>>> -	for (int i = 0; i < ISP4SD_MAX_FW_RESP_STREAM_NUM; i++)
>>>>> -		disable_irq(isp_subdev->irq[i]);
>>>>> +	if (irq_enabled)
>>>>> +		for (int i = 0; i < ISP4SD_MAX_FW_RESP_STREAM_NUM; i++)
>>>>> +			disable_irq(isp_subdev->irq[i]);
>>>>>    	isp4sd_stop_resp_proc_threads(isp_subdev);
>>>>>    	dev_dbg(dev, "isp_subdev stop resp proc streads suc");
>>>>> @@ -785,7 +786,7 @@ static int isp4sd_pwron_and_init(struct isp4_subdev *isp_subdev)
>>>>>    	return 0;
>>>>>    err_unlock_and_close:
>>>>> -	isp4sd_pwroff_and_deinit(isp_subdev);
>>>>> +	isp4sd_pwroff_and_deinit(isp_subdev, false);
>>>>>    	return -EINVAL;
>>>>>    }
>>>>> @@ -957,7 +958,7 @@ static int isp4sd_set_power(struct v4l2_subdev *sd, int on)
>>>>>    	if (on)
>>>>>    		return isp4sd_pwron_and_init(isp_subdev);
>>>>>    	else
>>>>> -		return isp4sd_pwroff_and_deinit(isp_subdev);
>>>>> +		return isp4sd_pwroff_and_deinit(isp_subdev, true);
>>>>>    }
>>>>>    static const struct v4l2_subdev_core_ops isp4sd_core_ops = {
>>>>>
>>>>>> On the other hand, please also add the patch in amdgpu which sets *bo to
>>>>>> NULL in isp_kernel_buffer_alloc() which you mentioned in another thread.
>>>>>
>>>>> Yep, I have been doing all v5 testing with that patch applied.
>>>>>
>>>>> BTW, I have verified the IRQ changes are not the cause for the regression; I
>>>>> tested with IRQs kept enabled all the time and the issue still occurs.
>>>>>
>>>>> I did observe that ISP stops sending interrupts when the video stream freezes.
>>>>> And, if I replicate the bug enough times, it seems to permanently break FW until
>>>>> a full machine reboot. Reloading amd_capture with the v4 driver doesn't recover
>>>>> the ISP when this happens.
>>>>>
>>>>> As an improvement to the driver, can we do a hard reset of ISP on driver probe?
>>>>> I am assuming hardware doesn't need too long to settle after hard reset, maybe
>>>>> a few hundred milliseconds? This would ensure ISP FW is always in a working
>>>>> state when the driver is loaded.
>>>>>
>>>
>>> Actually, each time the camera is activated, the ISP driver powers on the
>>> ISP and initiates its firmware from the beginning; when the camera is
>>> closed, the ISP is powered off..
>>
>> Hmm, well I am able to put the ISP in a completely unusable state that doesn't
>> recover when I unload and reload amd_capture. Or maybe it is the sensor that is
>> stuck in a broken state?
> 
> Here is the log output when I try to start cheese after unloading and reloading
> amd_capture, where the ISP is in the broken state that requires rebooting the
> laptop (annotated with notes of what I saw/did at each point in time):
> 
> ==> opened cheese
> ==> cheese froze after a few seconds
> ==> closed cheese
>    [   34.570823] amd_isp_capture amd_isp_capture: fail to disable stream
>    [   35.077503] amd_isp_capture amd_isp_capture: fail to stop steam
>    [   35.077525] amd_isp_capture amd_isp_capture: disabling streaming failed (-110)
> ==> unloaded amd_capture
> ==> loaded amd_capture
> ==> opened cheese
>    [  308.039721] amd_isp_capture amd_isp_capture: ISP CCPU FW boot failed
>    [  308.043961] amd_isp_capture amd_isp_capture: fail to start isp_subdev interface
>    [  308.044188] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.044194] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.044196] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.044197] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.044198] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.044198] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.044199] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.044200] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.044200] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.044201] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.044202] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.065226] amd_isp_capture amd_isp_capture: power up isp fail -22
>    [  308.174814] amd_isp_capture amd_isp_capture: ISP CCPU FW boot failed
>    [  308.177807] amd_isp_capture amd_isp_capture: fail to start isp_subdev interface
>    [  308.178036] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.178043] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.178044] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.178045] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.178046] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.178047] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.178048] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.178048] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.178049] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.178050] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.178050] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.198776] amd_isp_capture amd_isp_capture: power up isp fail -22
>    [  308.306835] amd_isp_capture amd_isp_capture: ISP CCPU FW boot failed
>    [  308.311533] amd_isp_capture amd_isp_capture: fail to start isp_subdev interface
>    [  308.311723] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.311723] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.311724] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.311725] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.311725] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.311726] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.311726] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.311726] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.311727] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.311727] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.311727] amd_isp_capture amd_isp_capture: invalid mem_info
>    [  308.335281] amd_isp_capture amd_isp_capture: power up isp fail -22
> 

Thanks Sultan for the detailed info, I agree with you, tend to believe 
it may related to the sensor, I will follow up with the FW team to 
investigate further.

>>>>> Thanks,
>>>>> Sultan
>>>>
>>>> A small update: I reproduced the issue on v4, but it took several more cycles of
>>>> closing/opening cheese and waiting 30s compared to v5.
>>>>
>>>> Right now my best guess is that this is a timing issue with respect to FW that
>>>> was exposed by the v5 changes. v5 introduced slight changes in code timing, like
>>>> with the mutex locks getting replaced by spin locks.
>>>>
>>>> I'll try to insert mdelays to see if I can expose the issue that way on v4.
>>>>
>>>
>>> Could you kindly provide the FW used?
>>
>>    commit a89515d3ff79f12099f7a51b0f4932b881b7720e
>>    Author: Pratap Nirujogi <pratap.nirujogi@....com>
>>    Date:   Thu Aug 21 15:40:45 2025 -0400
>>
>>        amdgpu: Update ISP FW for isp v4.1.1
>>        
>>        From internal git commit:
>>        24557b7326e539183b3bc44cf8e1496c74d383d6
>>        
>>        Signed-off-by: Pratap Nirujogi <pratap.nirujogi@....com>
>>
>> Sultan
> 
> Sultan

-- 
Regards,
Bin


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ