lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aRPhMCwJjpMqAROG@sultan-box>
Date: Tue, 11 Nov 2025 17:21:52 -0800
From: Sultan Alsawaf <sultan@...neltoast.com>
To: "Du, Bin" <bin.du@....com>
Cc: mchehab@...nel.org, hverkuil@...all.nl,
	laurent.pinchart+renesas@...asonboard.com,
	bryan.odonoghue@...aro.org, sakari.ailus@...ux.intel.com,
	prabhakar.mahadev-lad.rj@...renesas.com,
	linux-media@...r.kernel.org, linux-kernel@...r.kernel.org,
	pratap.nirujogi@....com, benjamin.chan@....com, king.li@....com,
	gjorgji.rosikopulos@....com, Phil.Jawich@....com,
	Dominic.Antony@....com, mario.limonciello@....com,
	richard.gong@....com, anson.tsao@....com
Subject: Re: [PATCH v5 0/7] Add AMD ISP4 driver

On Tue, Nov 11, 2025 at 03:33:42PM -0800, Sultan Alsawaf wrote:
> On Tue, Nov 11, 2025 at 05:58:10PM +0800, Du, Bin wrote:
> > 
> > On 11/11/2025 5:05 PM, Sultan Alsawaf wrote:
> > 
> > > On Mon, Nov 10, 2025 at 05:46:28PM +0800, Du, Bin wrote:
> > > > Thank you, Sultan, for your time, big effort, and constant support.
> > > > Apologies for my delayed reply for being occupied a little with other
> > > > matters.
> > > > 
> > > > On 11/10/2025 4:33 PM, Sultan Alsawaf wrote:
> > > > > Hi Bin,
> > > > > 
> > > > > On Wed, Nov 05, 2025 at 01:25:58AM -0800, Sultan Alsawaf wrote:
> > > > > > Hi Bin,
> > > > > > 
> > > > > > To expedite review, I've attached a patch containing a bunch of fixes I've made
> > > > > > on top of v5. Most of my changes should be self-explanatory; feel free to ask
> > > > > > further about specific changes if you have any questions.
> > > > > > 
> > > > > > I haven't had a chance to review all of the v4 -> v5 changes yet, but I figured
> > > > > > I should send what I've got so far.
> > > > > > 
> > > > > > FYI, there is a regression in isp4if_dequeue_buffer() where the bufq lock isn't
> > > > > > protecting the list_del() anymore. I also checked the compiler output when using
> > > > > > guard() versus scoped_guard() in that function and there is no difference:
> > > > > > 
> > > > > >     sha1sum:
> > > > > >     5652a0306da22ea741d80a9e03a787d0f71758a8  isp4_interface.o // guard()
> > > > > >     5652a0306da22ea741d80a9e03a787d0f71758a8  isp4_interface.o // scoped_guard()
> > > > > > 
> > > > > > So guard() should be used there again, which I've done in my patch.
> > > > > > 
> > > > > > I also have a few questions:
> > > > > > 
> > > > > > 1. Does ISP FW provide a register interface to disable the IRQ? If so, it is
> > > > > >      faster to use that than using disable_irq_nosync() to disable the IRQ from
> > > > > >      the CPU's side.
> > > > > > 
> > > > > > 2. When the IRQ is re-enabled in isp4sd_fw_resp_func(), is there anything
> > > > > >      beforehand to mask all pending interrupts from the ISP so that there isn't a
> > > > > >      bunch of stale interrupts firing as soon the IRQ is re-enabled?
> > > > > > 
> > > > > > 3. Is there any risk of deadlock due to the stream kthread racing with the ISP
> > > > > >      when the ISP posts a new response _after_ the kthread determines there are no
> > > > > >      more new responses but _before_ the enable_irq() in isp4sd_fw_resp_func()?
> > > > > > 
> > > > > > 4. Why are some lines much longer than before? It seems inconsistent that now
> > > > > >      there is a mix of several lines wrapped to 80 cols and many lines going
> > > > > >      beyond 80 cols. And there are multiple places where code is wrapped before
> > > > > >      reaching 80 cols even with lots of room left, specifically for cases where it
> > > > > >      wouldn't hurt readability to put more characters onto each line.
> > > > > 
> > > > > I've attached a new, complete patch of changes to apply on top of v5. You may
> > > > > ignore the incomplete patch from my previous email and use the new one instead.
> > > > > 
> > > > > I made many changes and also answered questions 1-3 myself.
> > > > > 
> > > > > Please apply this on top of v5 and let me know if you have any questions.
> > > > > 
> > > > 
> > > > Sure, will review, apply and test your patch accordingly. Your contribution
> > > > is greatly appreciated, will let you know if there is any question or
> > > > problem.
> > > > 
> > > > > BTW, I noticed a strange regression in v5 even without any of my changes: every
> > > > > time you use cheese after using it one time, the video will freeze after 30-60
> > > > > seconds with this message printed to dmesg:
> > > > >     [ 2032.716559] amd_isp_capture amd_isp_capture: -><- fail respid unknown respid(0x30002)
> > > > > 
> > > > > And the video never unfreezes. I haven't found the cause for the regression yet;
> > > > > can you try to reproduce it?
> > > > > 
> > > > 
> > > > Really weird, we don't see this issue either in dev or QA test. Is it 100%
> > > > reproducible and any other fail or err in the log?
> > > 
> > > Yes, it's 100% reproducible. There's no other message in dmesg, only that one.
> > > 
> > > Sometimes there is a stop stream error when I close cheese after it froze:
> > > 
> > >    [  656.540307] amd_isp_capture amd_isp_capture: fail to disable stream
> > >    [  657.046633] amd_isp_capture amd_isp_capture: fail to stop steam
> > >    [  657.047224] amd_isp_capture amd_isp_capture: disabling streaming failed (-110)
> > > 
> > > When I revert to v4 I cannot reproduce it at all. It seems to be something in
> > > v4 -> v5 that is not fixed by any of my changes.
> > > 
> > 
> > Hi Sultan, could you please try following modifications on top of v5 to see
> > if it helps?
> > 
> > diff --git a/drivers/media/platform/amd/isp4/isp4_fw_cmd_resp.h
> > b/drivers/media/platform/amd/isp4/isp4_fw_cmd_resp.h
> > index 39c2265121f9..d571b3873edb 100644
> > --- a/drivers/media/platform/amd/isp4/isp4_fw_cmd_resp.h
> > +++ b/drivers/media/platform/amd/isp4/isp4_fw_cmd_resp.h
> > @@ -97,7 +97,7 @@
> > 
> > #define ADDR_SPACE_TYPE_GPU_VA          4
> > 
> > -#define FW_MEMORY_POOL_SIZE             (200 * 1024 * 1024)
> > +#define FW_MEMORY_POOL_SIZE             (100 * 1024 * 1024)
> > 
> > /*
> >   * standard ISP mipicsi=>isp
> > diff --git a/drivers/media/platform/amd/isp4/isp4_subdev.c
> > b/drivers/media/platform/amd/isp4/isp4_subdev.c
> > index 248d10076ae8..acbc80aa709e 100644
> > --- a/drivers/media/platform/amd/isp4/isp4_subdev.c
> > +++ b/drivers/media/platform/amd/isp4/isp4_subdev.c
> > @@ -697,7 +697,7 @@ static int isp4sd_stop_resp_proc_threads(struct
> > isp4_subdev *isp_subdev)
> >         return 0;
> > }
> > 
> > -static int isp4sd_pwroff_and_deinit(struct isp4_subdev *isp_subdev)
> > +static int isp4sd_pwroff_and_deinit(struct isp4_subdev *isp_subdev, bool
> > irq_enabled)
> > {
> >         struct isp4sd_sensor_info *sensor_info = &isp_subdev->sensor_info;
> >         unsigned int perf_state = ISP4SD_PERFORMANCE_STATE_LOW;
> > @@ -716,8 +716,9 @@ static int isp4sd_pwroff_and_deinit(struct isp4_subdev
> > *isp_subdev)
> >                 return 0;
> >         }
> > 
> > -       for (int i = 0; i < ISP4SD_MAX_FW_RESP_STREAM_NUM; i++)
> > -               disable_irq(isp_subdev->irq[i]);
> > +       if (irq_enabled)
> > +               for (int i = 0; i < ISP4SD_MAX_FW_RESP_STREAM_NUM; i++)
> > +                       disable_irq(isp_subdev->irq[i]);
> > 
> >         isp4sd_stop_resp_proc_threads(isp_subdev);
> >         dev_dbg(dev, "isp_subdev stop resp proc streads suc");
> > @@ -813,7 +814,7 @@ static int isp4sd_pwron_and_init(struct isp4_subdev
> > *isp_subdev)
> > 
> >         return 0;
> > err_unlock_and_close:
> > -       isp4sd_pwroff_and_deinit(isp_subdev);
> > +       isp4sd_pwroff_and_deinit(isp_subdev, false);
> >         return -EINVAL;
> > }
> > 
> > @@ -985,7 +986,7 @@ static int isp4sd_set_power(struct v4l2_subdev *sd, int
> > on)
> >         if (on)
> >                 return isp4sd_pwron_and_init(isp_subdev);
> >         else
> > -               return isp4sd_pwroff_and_deinit(isp_subdev);
> > +               return isp4sd_pwroff_and_deinit(isp_subdev, true);
> > }
> > 
> > static const struct v4l2_subdev_core_ops isp4sd_core_ops = {
> 
> No difference sadly; same symptoms as before. FYI, your email client broke the
> patch formatting so I couldn't apply it; it hard wrapped some lines to 80 cols,
> replaced tabs with spaces, and removed leading spaces on each context line, so I
> had to apply it manually. To confirm I applied it correctly, here is my diff:
> 
> diff --git a/drivers/media/platform/amd/isp4/isp4_fw_cmd_resp.h b/drivers/media/platform/amd/isp4/isp4_fw_cmd_resp.h
> index 39c2265121f9..d571b3873edb 100644
> --- a/drivers/media/platform/amd/isp4/isp4_fw_cmd_resp.h
> +++ b/drivers/media/platform/amd/isp4/isp4_fw_cmd_resp.h
> @@ -97,7 +97,7 @@
>  
>  #define ADDR_SPACE_TYPE_GPU_VA          4
>  
> -#define FW_MEMORY_POOL_SIZE             (200 * 1024 * 1024)
> +#define FW_MEMORY_POOL_SIZE             (100 * 1024 * 1024)
>  
>  /*
>   * standard ISP mipicsi=>isp
> diff --git a/drivers/media/platform/amd/isp4/isp4_subdev.c b/drivers/media/platform/amd/isp4/isp4_subdev.c
> index 4bd2ebf0f694..500ef0af8a41 100644
> --- a/drivers/media/platform/amd/isp4/isp4_subdev.c
> +++ b/drivers/media/platform/amd/isp4/isp4_subdev.c
> @@ -669,7 +669,7 @@ static int isp4sd_stop_resp_proc_threads(struct isp4_subdev *isp_subdev)
>  	return 0;
>  }
>  
> -static int isp4sd_pwroff_and_deinit(struct isp4_subdev *isp_subdev)
> +static int isp4sd_pwroff_and_deinit(struct isp4_subdev *isp_subdev, bool irq_enabled)
>  {
>  	struct isp4sd_sensor_info *sensor_info = &isp_subdev->sensor_info;
>  	unsigned int perf_state = ISP4SD_PERFORMANCE_STATE_LOW;
> @@ -688,8 +688,9 @@ static int isp4sd_pwroff_and_deinit(struct isp4_subdev *isp_subdev)
>  		return 0;
>  	}
>  
> -	for (int i = 0; i < ISP4SD_MAX_FW_RESP_STREAM_NUM; i++)
> -		disable_irq(isp_subdev->irq[i]);
> +	if (irq_enabled)
> +		for (int i = 0; i < ISP4SD_MAX_FW_RESP_STREAM_NUM; i++)
> +			disable_irq(isp_subdev->irq[i]);
>  
>  	isp4sd_stop_resp_proc_threads(isp_subdev);
>  	dev_dbg(dev, "isp_subdev stop resp proc streads suc");
> @@ -785,7 +786,7 @@ static int isp4sd_pwron_and_init(struct isp4_subdev *isp_subdev)
>  
>  	return 0;
>  err_unlock_and_close:
> -	isp4sd_pwroff_and_deinit(isp_subdev);
> +	isp4sd_pwroff_and_deinit(isp_subdev, false);
>  	return -EINVAL;
>  }
>  
> @@ -957,7 +958,7 @@ static int isp4sd_set_power(struct v4l2_subdev *sd, int on)
>  	if (on)
>  		return isp4sd_pwron_and_init(isp_subdev);
>  	else
> -		return isp4sd_pwroff_and_deinit(isp_subdev);
> +		return isp4sd_pwroff_and_deinit(isp_subdev, true);
>  }
>  
>  static const struct v4l2_subdev_core_ops isp4sd_core_ops = {
> 
> > On the other hand, please also add the patch in amdgpu which sets *bo to
> > NULL in isp_kernel_buffer_alloc() which you mentioned in another thread.
> 
> Yep, I have been doing all v5 testing with that patch applied. 
> 
> BTW, I have verified the IRQ changes are not the cause for the regression; I
> tested with IRQs kept enabled all the time and the issue still occurs.
> 
> I did observe that ISP stops sending interrupts when the video stream freezes.
> And, if I replicate the bug enough times, it seems to permanently break FW until
> a full machine reboot. Reloading amd_capture with the v4 driver doesn't recover
> the ISP when this happens.
> 
> As an improvement to the driver, can we do a hard reset of ISP on driver probe?
> I am assuming hardware doesn't need too long to settle after hard reset, maybe
> a few hundred milliseconds? This would ensure ISP FW is always in a working
> state when the driver is loaded.
> 
> Thanks,
> Sultan

A small update: I reproduced the issue on v4, but it took several more cycles of
closing/opening cheese and waiting 30s compared to v5.

Right now my best guess is that this is a timing issue with respect to FW that
was exposed by the v5 changes. v5 introduced slight changes in code timing, like
with the mutex locks getting replaced by spin locks.

I'll try to insert mdelays to see if I can expose the issue that way on v4.

Sultan

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ