lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID:
 <SE1P216MB130388BE5EE8CA13A1DCDF28EDAA2@SE1P216MB1303.KORP216.PROD.OUTLOOK.COM>
Date: Mon, 7 Apr 2025 00:37:32 +0000
From: jackson.lee <jackson.lee@...psnmedia.com>
To: Nicolas Dufresne <nicolas.dufresne@...labora.com>, "mchehab@...nel.org"
	<mchehab@...nel.org>, "hverkuil-cisco@...all.nl" <hverkuil-cisco@...all.nl>,
	"sebastian.fricke@...labora.com" <sebastian.fricke@...labora.com>,
	"bob.beckett@...labora.com" <bob.beckett@...labora.com>,
	"dafna.hirschfeld@...labora.com" <dafna.hirschfeld@...labora.com>
CC: "linux-media@...r.kernel.org" <linux-media@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, lafley.kim
	<lafley.kim@...psnmedia.com>, "b-brnich@...com" <b-brnich@...com>,
	"hverkuil@...all.nl" <hverkuil@...all.nl>, Nas Chung
	<nas.chung@...psnmedia.com>
Subject: RE: [RESEND PATCH v0 1/3] media: chips-media: wave5: Improve
 performance of decoder

Hi Nicolas

> -----Original Message-----
> From: Nicolas Dufresne <nicolas.dufresne@...labora.com>
> Sent: Saturday, April 5, 2025 1:00 AM
> To: jackson.lee <jackson.lee@...psnmedia.com>; mchehab@...nel.org;
> hverkuil-cisco@...all.nl; sebastian.fricke@...labora.com;
> bob.beckett@...labora.com; dafna.hirschfeld@...labora.com
> Cc: linux-media@...r.kernel.org; linux-kernel@...r.kernel.org; lafley.kim
> <lafley.kim@...psnmedia.com>; b-brnich@...com; hverkuil@...all.nl; Nas
> Chung <nas.chung@...psnmedia.com>
> Subject: Re: [RESEND PATCH v0 1/3] media: chips-media: wave5: Improve
> performance of decoder
> 
> Hi,
> 
> Le vendredi 04 avril 2025 à 04:16 +0000, jackson.lee a écrit :
> > Hi Nicolas
> >
> > Thanks for your review.
> > Can you check my comment for your review?
> >
> >
> > > -----Original Message-----
> > > From: Nicolas Dufresne <nicolas.dufresne@...labora.com>
> > > Sent: Thursday, April 3, 2025 3:03 AM
> > > To: jackson.lee <jackson.lee@...psnmedia.com>; mchehab@...nel.org;
> > > hverkuil-cisco@...all.nl; sebastian.fricke@...labora.com;
> > > bob.beckett@...labora.com; dafna.hirschfeld@...labora.com
> > > Cc: linux-media@...r.kernel.org; linux-kernel@...r.kernel.org;
> > > lafley.kim <lafley.kim@...psnmedia.com>; b-brnich@...com;
> > > hverkuil@...all.nl; Nas Chung <nas.chung@...psnmedia.com>
> > > Subject: Re: [RESEND PATCH v0 1/3] media: chips-media: wave5:
> > > Improve
> > > performance of decoder
> > >
> > > Le mercredi 19 mars 2025 à 12:50 +0900, Jackson.lee a écrit :
> > > > From: Jackson Lee <jackson.lee@...psnmedia.com>
> > > >
> > > > The existing way for decoding frames was to wait until each frame
> > > > was
> > >
> > > Suggestion:
> > >   The current decoding method was to...
> > >
> > > > decoded after feeding a bitstream. As a result, performance was
> > > > low and Wave5 could not achieve max pixel processing rate.
> > > >
> > > > Update driver to use an asynchronous approach for decoding and
> > > > feeding a
> > >   Update the driver ..
> > >
> > > > bitstream in order to achieve full capabilities of the device.
> > >
> > > This second part lacks the technical design choices needed for your
> > > reviewer ;-P I did read the patch once, and am under the impression
> > > that
> > > your approach is to not finish the job anymore.
> > >
> > > >
> > > > Signed-off-by: Jackson Lee <jackson.lee@...psnmedia.com>
> > > > Signed-off-by: Nas Chung <nas.chung@...psnmedia.com>
> > > > ---
> > > >  .../chips-media/wave5/wave5-vpu-dec.c         | 344
> > > > ++++++++++--------
> > > >  .../chips-media/wave5/wave5-vpu-enc.c         |   3 -
> > > >  .../platform/chips-media/wave5/wave5-vpuapi.c |  25 +-
> > > >  .../platform/chips-media/wave5/wave5-vpuapi.h |   5 +-
> > > >  4 files changed, 211 insertions(+), 166 deletions(-)
> > > >
> > > > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-
> > > > dec.c
> > > > b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > > > index fd71f0c43ac3..cc47da509703 100644
> > > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > > > @@ -230,12 +230,147 @@ static int start_decode(struct
> > > > vpu_instance *inst,
> > > u32 *fail_res)
> > > >  		switch_state(inst, VPU_INST_STATE_STOP);
> > > >
> > > >  		dev_dbg(inst->dev->dev, "%s: pic run failed /
> > > > finish job",
> > > __func__);
> > > > -		v4l2_m2m_job_finish(inst->v4l2_m2m_dev,
> > > > m2m_ctx);
> > > >  	}
> > > >
> > > >  	return ret;
> > > >  }
> > > >
> > > > +static int write_to_ringbuffer(struct vpu_instance *inst, void
> > > > *buffer,
> > > size_t buffer_size,
> > > > +			       struct vpu_buf *ring_buffer,
> > > > dma_addr_t wr_ptr)
> > > {
> > >
> > > It seems that from now on this needs to be called with the
> > > feed_lock, so
> > > add:
> > >
> > > 	 lockdep_assert_held(&inst->feed_lock);
> > >
> >
> >
> > The "feed_lock" mutex is for protecting "inst->avail_src_bufs" this
> > list, so the feed_lock should be used when "static int
> > fill_ringbuffer(struct vpu_instance *inst)" is called, not the
> > write_to_ringbuffer.
> > So I think the feed_lock should not be added in the
> > write_to_ringbuffer function.
> 
> This is called from "fill_ringbuffer()" and both calls to that seems to
> take that lock, my comment was just to make it explicit with lockdep.
> This is called from CMD_STOP and from device_run() which can be
> concurrent. I think the code effectively protecting the ring buffer
> state with.
> 
> 
> +	mutex_lock(&inst->feed_lock);
> +	ret = fill_ringbuffer(inst);
> +	mutex_unlock(&inst->feed_lock);
> 
> note that there is bunch of random locking too in the abstraction, this
> driver will need some cleanup in the long run imho.
> 
> >
> >
> > > > +	size_t size;
> > > > +	size_t offset = wr_ptr - ring_buffer->daddr;
> > > > +	int ret;
> > > > +
> > > > +	if (wr_ptr + buffer_size > ring_buffer->daddr +
> > > > ring_buffer->size)
> > > {
> > > > +		size = ring_buffer->daddr + ring_buffer->size -
> > > > wr_ptr;
> > > > +		ret = wave5_vdi_write_memory(inst->dev,
> > > > ring_buffer, offset,
> > > (u8 *)buffer, size);
> > > > +		if (ret < 0)
> > > > +			return ret;
> > > > +
> > > > +		ret = wave5_vdi_write_memory(inst->dev,
> > > > ring_buffer, 0, (u8
> > > *)buffer + size,
> > > > +					     buffer_size -
> > > > size);
> > > > +		if (ret < 0)
> > > > +			return ret;
> > > > +	} else {
> > > > +		ret = wave5_vdi_write_memory(inst->dev,
> > > > ring_buffer, offset,
> > > (u8 *)buffer,
> > > > +					     buffer_size);
> > > > +		if (ret < 0)
> > > > +			return ret;
> > > > +	}
> > > > +
> > > > +	return 0;
> > > > +}
> > >
> > > I have a pretty strong preference in seeing code being moved around
> > > in
> > > seperate patch documented as "non functional changes". This way I
> > > know I
> > > don't have to review again that code.
> >
> > I will do it.
> >
> >
> > >
> > > > +
> > > > +static int fill_ringbuffer(struct vpu_instance *inst) {
> > >
> > > Same, I don't want to have to do back and fourth to check if you
> > > changed
> > > anything. Also, after the refactoring, the lockdeup assert can be
> > > added in
> > > the new patch, which makes the introduction of the new lock very
> > > obvious.
> > >
> > > > +	struct v4l2_m2m_ctx *m2m_ctx = inst->v4l2_fh.m2m_ctx;
> > > > +	struct vpu_src_buffer *vpu_buf;
> > > > +	int ret = 0;
> > > > +
> > > > +	if (m2m_ctx->last_src_buf)  {
> > > > +		struct vpu_src_buffer *vpu_buf =
> > > > +wave5_to_vpu_src_buf(m2m_ctx->last_src_buf);
> > > > +
> > > > +		if (vpu_buf->consumed) {
> > > > +			dev_dbg(inst->dev->dev, "last src buffer
> > > > already
> > > written\n");
> > > > +			return 0;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	list_for_each_entry(vpu_buf, &inst->avail_src_bufs,
> > > > list) {
> > > > +		struct vb2_v4l2_buffer *vbuf = &vpu_buf-
> > > > >v4l2_m2m_buf.vb;
> > > > +		struct vpu_buf *ring_buffer = &inst-
> > > > >bitstream_vbuf;
> > > > +		size_t src_size = vb2_get_plane_payload(&vbuf-
> > > > >vb2_buf, 0);
> > > > +		void *src_buf = vb2_plane_vaddr(&vbuf->vb2_buf,
> > > > 0);
> > > > +		dma_addr_t rd_ptr = 0;
> > > > +		dma_addr_t wr_ptr = 0;
> > > > +		size_t remain_size = 0;
> > > > +
> > > > +		if (vpu_buf->consumed) {
> > > > +			dev_dbg(inst->dev->dev, "already copied
> > > > src buf (%u)
> > > to the ring buffer\n",
> > > > +				vbuf->vb2_buf.index);
> > > > +			continue;
> > > > +		}
> > > > +
> > > > +		if (!src_buf) {
> > > > +			dev_dbg(inst->dev->dev,
> > > > +				"%s: Acquiring kernel pointer to
> > > > src buf (%u),
> > > fail\n",
> > > > +				__func__, vbuf->vb2_buf.index);
> > > > +			break;
> > > > +		}
> > > > +
> > > > +		ret = wave5_vpu_dec_get_bitstream_buffer(inst,
> > > > &rd_ptr,
> > > &wr_ptr, &remain_size);
> > > > +		if (ret) {
> > > > +			/* Unable to acquire the mutex */
> > >
> > > Might want to fix that comment in a separate patch, perhaps even
> > > remove
> > > that ?
> > >
> >
> > Do you want to remove the comment in a separate patch ?
> 
> Yes please.
> 
> >
> >
> >
> > > > +			dev_err(inst->dev->dev, "Getting the
> > > > bitstream buffer,
> > > fail: %d\n",
> > > > +				ret);
> > > > +			return ret;
> > > > +		}
> > > > +
> > > > +		dev_dbg(inst->dev->dev, "%s: rd_ptr %pad wr_ptr
> > > > %pad",
> > > __func__,
> > > > +&rd_ptr, &wr_ptr);
> > > > +
> > > > +		if (remain_size < src_size) {
> > > > +			dev_dbg(inst->dev->dev,
> > > > +				"%s: remaining size: %zu <
> > > > source size: %zu for
> > > src buf (%u)\n",
> > > > +				__func__, remain_size, src_size,
> > > > vbuf-
> > > > vb2_buf.index);
> > > > +			break;
> > > > +		}
> > > > +
> > > > +		ret = write_to_ringbuffer(inst, src_buf,
> > > > src_size,
> > > ring_buffer, wr_ptr);
> > > > +		if (ret) {
> > > > +			dev_err(inst->dev->dev, "Write src buf
> > > > (%u) to ring
> > > buffer, fail: %d\n",
> > > > +				vbuf->vb2_buf.index, ret);
> > > > +			return ret;
> > > > +		}
> > > > +
> > > > +		ret =
> > > > wave5_vpu_dec_update_bitstream_buffer(inst, src_size);
> > > > +		if (ret) {
> > > > +			dev_dbg(inst->dev->dev,
> > > > +				"update_bitstream_buffer fail:
> > > > %d for src buf
> > > (%u)\n",
> > > > +				ret, vbuf->vb2_buf.index);
> > > > +			break;
> > > > +		}
> > > > +
> > > > +		vpu_buf->consumed = true;
> > > > +
> > > > +		/* Don't write buffers passed the last one while
> > > > draining.
> > > */
> > > > +		if (v4l2_m2m_is_last_draining_src_buf(m2m_ctx,
> > > > vbuf)) {
> > > > +			dev_dbg(inst->dev->dev, "last src buffer
> > > > written to
> > > the ring buffer\n");
> > > > +			break;
> > > > +		}
> > > > +
> > > > +		inst->queuing_num++;
> > > > +		list_del_init(&vpu_buf->list);
> > > > +		break;
> > > > +	}
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +
> > > > +static void wave5_vpu_dec_feed_remaining(struct vpu_instance
> > > > *inst) {
> > > > +	int ret = 0;
> > > > +	struct v4l2_m2m_ctx *m2m_ctx = inst->v4l2_fh.m2m_ctx;
> > > > +	u32 fail_res = 0;
> > > > +
> > > > +	mutex_lock(&inst->feed_lock);
> > > > +	ret = fill_ringbuffer(inst);
> > > > +	mutex_unlock(&inst->feed_lock);
> > > > +	if (ret) {
> > > > +		dev_warn(inst->dev->dev, "Filling ring buffer
> > > > failed\n");
> > > > +		return;
> > >
> > > I don't think the error handling is correct. What will happen with
> > > the
> > > job ?
> > >
> > > > +	}
> > > > +
> > > > +	ret = start_decode(inst, &fail_res);
> > >
> > > I'm a little worried seeing this called outside of device_run()
> > > context.
> > >
> > > > +	if (ret) {
> > > > +		dev_err(inst->dev->dev,
> > > > +			"Frame decoding on m2m context (%p),
> > > > fail: %d
> > > (result: %d)\n",
> > > > +			m2m_ctx, ret, fail_res);
> > > > +	}
> > > > +
> > > > +	v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
> > >
> > >
> > >
> > > > +}
> > > > +
> > > >  static void flag_last_buffer_done(struct vpu_instance *inst)
> > > >  {
> > > >  	struct v4l2_m2m_ctx *m2m_ctx = inst->v4l2_fh.m2m_ctx; @@
> > > > -347,7
> > > > +482,6 @@ static void wave5_vpu_dec_finish_decode(struct
> > > > vpu_instance
> > > *inst)
> > > >  	struct vb2_v4l2_buffer *dec_buf = NULL;
> > > >  	struct vb2_v4l2_buffer *disp_buf = NULL;
> > > >  	struct vb2_queue *dst_vq = v4l2_m2m_get_dst_vq(m2m_ctx);
> > > > -	struct queue_status_info q_status;
> > > >
> > > >  	dev_dbg(inst->dev->dev, "%s: Fetch output info from
> > > > firmware.",
> > > > __func__);
> > > >
> > > > @@ -441,20 +575,6 @@ static void
> > > > wave5_vpu_dec_finish_decode(struct
> > > vpu_instance *inst)
> > > >  		}
> > > >  		spin_unlock_irqrestore(&inst->state_spinlock,
> > > > flags);
> > > >  	}
> > > > -
> > > > -	/*
> > > > -	 * During a resolution change and while draining, the
> > > > firmware may
> > > flush
> > > > -	 * the reorder queue regardless of having a matching
> > > > decoding
> > > operation
> > > > -	 * pending. Only terminate the job if there are no more
> > > > IRQ coming.
> > > > -	 */
> > > > -	wave5_vpu_dec_give_command(inst, DEC_GET_QUEUE_STATUS,
> > > > &q_status);
> > > > -	if (q_status.report_queue_count == 0 &&
> > > > -	    (q_status.instance_queue_count == 0 ||
> > > dec_info.sequence_changed)) {
> > > > -		dev_dbg(inst->dev->dev, "%s: finishing job.\n",
> > > > __func__);
> > > > -		pm_runtime_mark_last_busy(inst->dev->dev);
> > > > -		pm_runtime_put_autosuspend(inst->dev->dev);
> > > > -		v4l2_m2m_job_finish(inst->v4l2_m2m_dev,
> > > > m2m_ctx);
> > > > -	}
> > >
> > > This removal is why I believe some seek stress testing are needed.
> > > I'll
> > > try and find some tests file I crafted last year.
> > >
> > > >  }
> > > >
> > > >  static int wave5_vpu_dec_querycap(struct file *file, void *fh,
> > > > struct
> > > > v4l2_capability *cap) @@ -794,11 +914,21 @@ static int
> > > wave5_vpu_dec_stop(struct vpu_instance *inst)
> > > >  	}
> > > >
> > > >  	if (inst->state != VPU_INST_STATE_NONE) {
> > > > +		struct vb2_v4l2_buffer *vbuf;
> > > > +		struct vpu_src_buffer *vpu_buf;
> > > > +
> > > >  		/*
> > > >  		 * Temporarily release the state_spinlock so
> > > > that subsequent
> > > >  		 * calls do not block on a mutex while inside
> > > > this spinlock.
> > > >  		 */
> > > >  		spin_unlock_irqrestore(&inst->state_spinlock,
> > > > flags);
> > > > +		vbuf = v4l2_m2m_last_src_buf(m2m_ctx);
> > > > +		if (vbuf) {
> > > > +			vpu_buf = wave5_to_vpu_src_buf(vbuf);
> > > > +			if (!vpu_buf->consumed)
> > > > +				wave5_vpu_dec_feed_remaining(ins
> > > > t);
> > > > +		}
> > > > +
> > >
> > > Do you really need to add all these checks ? Shouldn't you just
> > > "feed_remaining()" and move on ? I also don't think its a good idea
> > > to
> > > call v4l2_m2m_job_finish() in code that can run concurrently with
> > > the
> > > device_run() function. This should only happen if you are certain
> > > the VPU
> > > instance is stopped already.
> > >
> >
> > While testing fluster,  stop command is sometimes called  before the
> > last bitstream queued  is handled.
> > The stop command sets EOS before handling the last bitstream, so
> > sometimes mismatch happens.
> > I think the code should be add in the wave5_vpu_dec_stop.
> 
> This would be a regression, we took good care of handling before and
> after case correctly.


I don't think this is a regression issue, but I will check it.


> >
> >
> > >
> > > >  		ret = wave5_vpu_dec_set_eos_on_firmware(inst);
> > > >  		if (ret)
> > > >  			return ret;
> > > > @@ -1116,115 +1246,6 @@ static int wave5_prepare_fb(struct
> > > > vpu_instance
> > > *inst)
> > > >  	return 0;
> > > >  }
> > > >
> > > > -static int write_to_ringbuffer(struct vpu_instance *inst, void
> > > > *buffer,
> > > size_t buffer_size,
> > > > -			       struct vpu_buf *ring_buffer,
> > > > dma_addr_t wr_ptr)
> > > > -{
> > > > -	size_t size;
> > > > -	size_t offset = wr_ptr - ring_buffer->daddr;
> > > > -	int ret;
> > > > -
> > > > -	if (wr_ptr + buffer_size > ring_buffer->daddr +
> > > > ring_buffer->size)
> > > {
> > > > -		size = ring_buffer->daddr + ring_buffer->size -
> > > > wr_ptr;
> > > > -		ret = wave5_vdi_write_memory(inst->dev,
> > > > ring_buffer, offset,
> > > (u8 *)buffer, size);
> > > > -		if (ret < 0)
> > > > -			return ret;
> > > > -
> > > > -		ret = wave5_vdi_write_memory(inst->dev,
> > > > ring_buffer, 0, (u8
> > > *)buffer + size,
> > > > -					     buffer_size -
> > > > size);
> > > > -		if (ret < 0)
> > > > -			return ret;
> > > > -	} else {
> > > > -		ret = wave5_vdi_write_memory(inst->dev,
> > > > ring_buffer, offset,
> > > (u8 *)buffer,
> > > > -					     buffer_size);
> > > > -		if (ret < 0)
> > > > -			return ret;
> > > > -	}
> > > > -
> > > > -	return 0;
> > > > -}
> > > > -
> > > > -static int fill_ringbuffer(struct vpu_instance *inst) -{
> > > > -	struct v4l2_m2m_ctx *m2m_ctx = inst->v4l2_fh.m2m_ctx;
> > > > -	struct v4l2_m2m_buffer *buf, *n;
> > > > -	int ret;
> > > > -
> > > > -	if (m2m_ctx->last_src_buf)  {
> > > > -		struct vpu_src_buffer *vpu_buf =
> > > wave5_to_vpu_src_buf(m2m_ctx->last_src_buf);
> > > > -
> > > > -		if (vpu_buf->consumed) {
> > > > -			dev_dbg(inst->dev->dev, "last src buffer
> > > > already
> > > written\n");
> > > > -			return 0;
> > > > -		}
> > > > -	}
> > > > -
> > > > -	v4l2_m2m_for_each_src_buf_safe(m2m_ctx, buf, n) {
> > > > -		struct vb2_v4l2_buffer *vbuf = &buf->vb;
> > > > -		struct vpu_src_buffer *vpu_buf =
> > > > wave5_to_vpu_src_buf(vbuf);
> > > > -		struct vpu_buf *ring_buffer = &inst-
> > > > >bitstream_vbuf;
> > > > -		size_t src_size = vb2_get_plane_payload(&vbuf-
> > > > >vb2_buf, 0);
> > > > -		void *src_buf = vb2_plane_vaddr(&vbuf->vb2_buf,
> > > > 0);
> > > > -		dma_addr_t rd_ptr = 0;
> > > > -		dma_addr_t wr_ptr = 0;
> > > > -		size_t remain_size = 0;
> > > > -
> > > > -		if (vpu_buf->consumed) {
> > > > -			dev_dbg(inst->dev->dev, "already copied
> > > > src buf (%u)
> > > to the ring buffer\n",
> > > > -				vbuf->vb2_buf.index);
> > > > -			continue;
> > > > -		}
> > > > -
> > > > -		if (!src_buf) {
> > > > -			dev_dbg(inst->dev->dev,
> > > > -				"%s: Acquiring kernel pointer to
> > > > src buf (%u),
> > > fail\n",
> > > > -				__func__, vbuf->vb2_buf.index);
> > > > -			break;
> > > > -		}
> > > > -
> > > > -		ret = wave5_vpu_dec_get_bitstream_buffer(inst,
> > > > &rd_ptr,
> > > &wr_ptr, &remain_size);
> > > > -		if (ret) {
> > > > -			/* Unable to acquire the mutex */
> > > > -			dev_err(inst->dev->dev, "Getting the
> > > > bitstream buffer,
> > > fail: %d\n",
> > > > -				ret);
> > > > -			return ret;
> > > > -		}
> > > > -
> > > > -		dev_dbg(inst->dev->dev, "%s: rd_ptr %pad wr_ptr
> > > > %pad",
> > > __func__, &rd_ptr, &wr_ptr);
> > > > -
> > > > -		if (remain_size < src_size) {
> > > > -			dev_dbg(inst->dev->dev,
> > > > -				"%s: remaining size: %zu <
> > > > source size: %zu for
> > > src buf (%u)\n",
> > > > -				__func__, remain_size, src_size,
> > > > vbuf-
> > > > vb2_buf.index);
> > > > -			break;
> > > > -		}
> > > > -
> > > > -		ret = write_to_ringbuffer(inst, src_buf,
> > > > src_size,
> > > ring_buffer, wr_ptr);
> > > > -		if (ret) {
> > > > -			dev_err(inst->dev->dev, "Write src buf
> > > > (%u) to ring
> > > buffer, fail: %d\n",
> > > > -				vbuf->vb2_buf.index, ret);
> > > > -			return ret;
> > > > -		}
> > > > -
> > > > -		ret =
> > > > wave5_vpu_dec_update_bitstream_buffer(inst, src_size);
> > > > -		if (ret) {
> > > > -			dev_dbg(inst->dev->dev,
> > > > -				"update_bitstream_buffer fail:
> > > > %d for src buf
> > > (%u)\n",
> > > > -				ret, vbuf->vb2_buf.index);
> > > > -			break;
> > > > -		}
> > > > -
> > > > -		vpu_buf->consumed = true;
> > > > -
> > > > -		/* Don't write buffers passed the last one while
> > > > draining.
> > > */
> > > > -		if (v4l2_m2m_is_last_draining_src_buf(m2m_ctx,
> > > > vbuf)) {
> > > > -			dev_dbg(inst->dev->dev, "last src buffer
> > > > written to
> > > the ring buffer\n");
> > > > -			break;
> > > > -		}
> > > > -	}
> > > > -
> > > > -	return 0;
> > > > -}
> > > > -
> > > >  static void wave5_vpu_dec_buf_queue_src(struct vb2_buffer *vb)
> > > >  {
> > > >  	struct vpu_instance *inst = vb2_get_drv_priv(vb-
> > > > >vb2_queue); @@
> > > > -1236,6 +1257,11 @@ static void
> > > > wave5_vpu_dec_buf_queue_src(struct
> > > vb2_buffer *vb)
> > > >  	vbuf->sequence = inst->queued_src_buf_num++;
> > > >
> > > >  	v4l2_m2m_buf_queue(m2m_ctx, vbuf);
> > > > +
> > > > +	INIT_LIST_HEAD(&vpu_buf->list);
> > > > +	mutex_lock(&inst->feed_lock);
> > > > +	list_add_tail(&vpu_buf->list, &inst->avail_src_bufs);
> > > > +	mutex_unlock(&inst->feed_lock);
> > >
> > > This lack documentation, you didn't even mention the introduction
> > > of a new
> > > src buf list in the commit message.
> > >
> > > >  }
> > > >
> > > >  static void wave5_vpu_dec_buf_queue_dst(struct vb2_buffer *vb)
> > > > @@
> > > > -1362,7 +1388,6 @@ static int
> > > > wave5_vpu_dec_start_streaming(struct
> > > vb2_queue *q, unsigned int count
> > > >  				goto return_buffers;
> > > >  			}
> > > >  		}
> > > > -
> > > >  	}
> > > >  	pm_runtime_mark_last_busy(inst->dev->dev);
> > > >  	pm_runtime_put_autosuspend(inst->dev->dev);
> > > > @@ -1385,6 +1410,13 @@ static int streamoff_output(struct
> > > > vb2_queue *q)
> > > >  	dma_addr_t new_rd_ptr;
> > > >  	struct dec_output_info dec_info;
> > > >  	unsigned int i;
> > > > +	struct vpu_src_buffer *vpu_buf, *tmp;
> > > > +
> > > > +	inst->retry = false;
> > > > +	inst->queuing_num = 0;
> > > > +
> > > > +	list_for_each_entry_safe(vpu_buf, tmp, &inst-
> > > > >avail_src_bufs, list)
> > > > +		list_del_init(&vpu_buf->list);
> > > >
> > > >  	for (i = 0; i < v4l2_m2m_num_dst_bufs_ready(m2m_ctx);
> > > > i++) {
> > > >  		ret = wave5_vpu_dec_set_disp_flag(inst, i); @@ -
> > > > 1481,10
> > > +1513,8 @@
> > > > static void wave5_vpu_dec_stop_streaming(struct vb2_queue *q)
> > > >
> > > >  		wave5_vpu_dec_give_command(inst,
> > > > DEC_GET_QUEUE_STATUS,
> > > &q_status);
> > > >
> > > > -		if (q_status.report_queue_count == 0)
> > > > -			break;
> > > > -
> > > > -		if (wave5_vpu_wait_interrupt(inst,
> > > > VPU_DEC_TIMEOUT) < 0)
> > > > +		if ((inst->state == VPU_INST_STATE_STOP ||
> > > > +q_status.instance_queue_count == 0) &&
> > >
> > > State should be read with appropriate locking.
> > >
> > > > +		    q_status.report_queue_count == 0)
> > > >  			break;
> > >
> > > This loop have nothing to wait on anymore. It just actively pool on
> > > the HW.
> > > I noticed your third patch, but if this is what you address there,
> > > squash
> > > it somehow, this would break bisects.
> > >
> > > The goal of that code was to ensure that this code will never run
> > > concurrently to finish_decode() once out of that while loop. I
> > > don't think
> > > you break that, but that will cause possibly high CPU load.
> > >
> >
> >
> > The code is not related with the third patch, there was sometimes HW
> > interrupt misses since changing to asynchronous, so an infinite wait
> > happened when closed
> > So I removed it, but I think there is no high CPU load because it
> > runs very short time.
> 
> If the firmware get stuck, Linux can spin, this cannot be done that way
> I'm sorry. That its most of the time spinning only few times does not
> seems like valid justification. We need a limit, we need some
> preemption, etc.
> 
> >
> > > >
> > > >  		if (wave5_vpu_dec_get_output_info(inst,
> > > > &dec_output_info))
> > > @@
> > > > -1577,13 +1607,23 @@ static void wave5_vpu_dec_device_run(void
> > > > *priv)
> > > >  	struct queue_status_info q_status;
> > > >  	u32 fail_res = 0;
> > > >  	int ret = 0;
> > > > +	unsigned long flags;
> > > >
> > > >  	dev_dbg(inst->dev->dev, "%s: Fill the ring buffer with
> > > > new
> > > bitstream data", __func__);
> > > >  	pm_runtime_resume_and_get(inst->dev->dev);
> > > > -	ret = fill_ringbuffer(inst);
> > > > -	if (ret) {
> > > > -		dev_warn(inst->dev->dev, "Filling ring buffer
> > > > failed\n");
> > > > -		goto finish_job_and_return;
> > > > +	if (!inst->retry) {
> > > > +		mutex_lock(&inst->feed_lock);
> > > > +		ret = fill_ringbuffer(inst);
> > > > +		mutex_unlock(&inst->feed_lock);
> > > > +		if (ret < 0) {
> > > > +			dev_warn(inst->dev->dev, "Filling ring
> > > > buffer
> > > failed\n");
> > > > +			goto finish_job_and_return;
> > > > +		} else if (!inst->eos &&
> > > > +			   inst->queuing_num == 0 &&
> > > > +			   inst->state ==
> > > > VPU_INST_STATE_PIC_RUN) {
> > > > +			dev_dbg(inst->dev->dev, "%s: no
> > > > bitstream for feeding,
> > > so skip ", __func__);
> > > > +			goto finish_job_and_return;
> > > > +		}
> > >
> > > There is quite some overlap between that local src buffer queue and
> > > all
> > > the code that depends on v4l2_m2m_num_src_bufs_ready(). Are you
> > > sure you
> > > aren't re-implementing something ? Also, this code belong to the
> > > PIC_RUN
> > > case in the below switch.
> > >
> >
> >
> > After src buffer was queued, v4l2_m2m_num_src_bufs_ready() always
> > returns true until src buffer was removed in the finish_decode since
> > get_result.
> > So checking if there is buffer or not in the src bufs list was needed
> > in the device_run.
> 
> Ack.
> 
> >
> >
> >
> > > >  	}
> > > >
> > > >  	switch (inst->state) {
> > > > @@ -1619,7 +1659,9 @@ static void wave5_vpu_dec_device_run(void
> > > > *priv)
> > > >  		 * we had a chance to switch, which leads to an
> > > > invalid state
> > > >  		 * change.
> > > >  		 */
> > > > +		spin_lock_irqsave(&inst->state_spinlock, flags);
> > > >  		switch_state(inst, VPU_INST_STATE_PIC_RUN);
> > > > +		spin_unlock_irqrestore(&inst->state_spinlock,
> > > > flags);
> > >
> > > Should be its own patch, and fixed everywhere. Add missing lockdep
> > > assert
> > > (if there is a spinlock version of it).
> > >
> > > >
> > > >  		/*
> > > >  		 * During DRC, the picture decoding remains
> > > > pending, so just
> > > leave
> > > > the job @@ -1634,12 +1676,14 @@ static void
> > > wave5_vpu_dec_device_run(void *priv)
> > > >  		ret = wave5_prepare_fb(inst);
> > > >  		if (ret) {
> > > >  			dev_warn(inst->dev->dev, "Framebuffer
> > > > preparation,
> > > fail: %d\n",
> > > > ret);
> > > > +			spin_lock_irqsave(&inst->state_spinlock,
> > > > flags);
> > > >  			switch_state(inst, VPU_INST_STATE_STOP);
> > > > +			spin_unlock_irqrestore(&inst-
> > > > >state_spinlock, flags);
> > >
> > > Same, this is separate fix.
> > >
> > > >  			break;
> > > >  		}
> > > >
> > > >  		if (q_status.instance_queue_count) {
> > > > -			dev_dbg(inst->dev->dev, "%s: leave with
> > > > active job",
> > > __func__);
> > > > +			v4l2_m2m_job_finish(inst->v4l2_m2m_dev,
> > > > m2m_ctx);
> > > >  			return;
> > > >  		}
> > > >
> > > > @@ -1650,14 +1694,22 @@ static void wave5_vpu_dec_device_run(void
> > > > *priv)
> > > >  			dev_err(inst->dev->dev,
> > > >  				"Frame decoding on m2m context
> > > > (%p), fail: %d
> > > (result: %d)\n",
> > > >  				m2m_ctx, ret, fail_res);
> > > > -			break;
> > > > +			goto finish_job_and_return;
> > > > +		}
> > > > +
> > > > +		if (fail_res == WAVE5_SYSERR_QUEUEING_FAIL) {
> > > > +			inst->retry = true;
> > > > +		} else {
> > > > +			inst->retry = false;
> > > > +			if (!inst->eos)
> > > > +				inst->queuing_num--;
> > > >  		}
> > > > -		/* Return so that we leave this job active */
> > > > -		dev_dbg(inst->dev->dev, "%s: leave with active
> > > > job",
> > > __func__);
> > > > -		return;
> > > > -	default:
> > > > -		WARN(1, "Execution of a job in state %s
> > > > illegal.\n",
> > > state_to_str(inst->state));
> > > >  		break;
> > > > +	default:
> > > > +		if (!v4l2_m2m_has_stopped(m2m_ctx))
> > > > +			WARN(1, "Execution of a job in state %s
> > > > illegal.\n",
> > > > +			     state_to_str(inst->state));
> > > > +		return;
> > > >  	}
> > > >
> > > >  finish_job_and_return:
> > > > @@ -1676,10 +1728,7 @@ static void wave5_vpu_dec_job_abort(void
> > > > *priv)
> > > >  	if (ret)
> > > >  		return;
> > > >
> > > > -	ret = wave5_vpu_dec_set_eos_on_firmware(inst);
> > > > -	if (ret)
> > > > -		dev_warn(inst->dev->dev,
> > > > -			 "Setting EOS for the bitstream, fail:
> > > > %d\n", ret);
> > > > +	v4l2_m2m_job_finish(inst->v4l2_m2m_dev, inst-
> > > > >v4l2_fh.m2m_ctx);
> > > >  }
> > > >
> > > >  static int wave5_vpu_dec_job_ready(void *priv) @@ -1755,6
> > > > +1804,8 @@
> > > > static int wave5_vpu_open_dec(struct file *filp)
> > > >  	inst->ops = &wave5_vpu_dec_inst_ops;
> > > >
> > > >  	spin_lock_init(&inst->state_spinlock);
> > > > +	mutex_init(&inst->feed_lock);
> > > > +	INIT_LIST_HEAD(&inst->avail_src_bufs);
> > > >
> > > >  	inst->codec_info = kzalloc(sizeof(*inst->codec_info),
> > > > GFP_KERNEL);
> > > >  	if (!inst->codec_info)
> > > > @@ -1830,9 +1881,6 @@ static int wave5_vpu_open_dec(struct file
> > > > *filp)
> > > >  	if (ret)
> > > >  		goto cleanup_inst;
> > > >
> > > > -	if (list_empty(&dev->instances))
> > > > -		pm_runtime_use_autosuspend(inst->dev->dev);
> > > > -
> > > >  	list_add_tail(&inst->list, &dev->instances);
> > > >
> > > >  	mutex_unlock(&dev->dev_lock);
> > > > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-
> > > > enc.c
> > > > b/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> > > > index 1e5fc5f8b856..cf20f774ed1b 100644
> > > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> > > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> > > > @@ -1774,9 +1774,6 @@ static int wave5_vpu_open_enc(struct file
> > > > *filp)
> > > >  	if (ret)
> > > >  		goto cleanup_inst;
> > > >
> > > > -	if (list_empty(&dev->instances))
> > > > -		pm_runtime_use_autosuspend(inst->dev->dev);
> > > > -
> > > >  	list_add_tail(&inst->list, &dev->instances);
> > > >
> > > >  	mutex_unlock(&dev->dev_lock);
> > > > diff --git a/drivers/media/platform/chips-media/wave5/wave5-
> > > > vpuapi.c
> > > > b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> > > > index e5e879a13e8b..c1737fac6edd 100644
> > > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> > > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> > > > @@ -207,8 +207,6 @@ int wave5_vpu_dec_close(struct vpu_instance
> > > > *inst,
> > > u32 *fail_res)
> > > >  	int retry = 0;
> > > >  	struct vpu_device *vpu_dev = inst->dev;
> > > >  	int i;
> > > > -	int inst_count = 0;
> > > > -	struct vpu_instance *inst_elm;
> > > >
> > > >  	*fail_res = 0;
> > > >  	if (!inst->codec_info)
> > > > @@ -233,6 +231,16 @@ int wave5_vpu_dec_close(struct vpu_instance
> > > > *inst,
> > > u32 *fail_res)
> > > >  		    retry++ >= MAX_FIRMWARE_CALL_RETRY) {
> > > >  			ret = -ETIMEDOUT;
> > > >  			goto unlock_and_return;
> > > > +		} else if (*fail_res ==
> > > > WAVE5_SYSERR_VPU_STILL_RUNNING) {
> > > > +			struct dec_output_info dec_info;
> > > > +
> > > > +			mutex_unlock(&vpu_dev->hw_lock);
> > > > +			wave5_vpu_dec_get_output_info(inst,
> > > > &dec_info);
> > > > +			ret = mutex_lock_interruptible(&vpu_dev-
> > > > >hw_lock);
> > > > +			if (ret) {
> > > > +				pm_runtime_put_sync(inst->dev-
> > > > >dev);
> > > > +				return ret;
> > > > +			}
> > >
> > > Please try and simplify all the branches. Something like:
> > >
> > > 			if (ret == 0)
> > > 				break;
> > >
> > > 			if (*fail_res !=
> > > WAVE5_SYSERR_VPU_STILL_RUNNING) {
> > > 					dev_warn(inst->dev->dev,
> > > "dec_finish_seq
> > > timed out\n");
> > > 					goto unlock_and_return;
> > > 			}
> > >
> > > 			if (retry++ >= MAX_FIRMWARE_CALL_RETRY) {
> > > 					ret = -ETIMEDOUT;
> > > 					goto unlock_and_return;
> > > 			}
> > >
> > > 			mutex_unlock(&vpu_dev->hw_lock);
> > > 			wave5_vpu_dec_get_output_info(inst,
> > > &dec_info);
> > > 			ret = mutex_lock_interruptible(&vpu_dev-
> > > >hw_lock);
> > > 			if (ret) {
> > > 				pm_runtime_put_sync(inst->dev-
> > > >dev);
> > > 				return ret;
> > > 			}
> > >
> > > My impression on that change is that it is independent and could be
> > > introduced separately.
> > >
> > > >  		}
> > > >  	} while (ret != 0);
> > > >
> > > > @@ -249,11 +257,7 @@ int wave5_vpu_dec_close(struct vpu_instance
> > > > *inst,
> > > u32 *fail_res)
> > > >  	}
> > > >
> > > >  	wave5_vdi_free_dma_memory(vpu_dev, &p_dec_info-
> > > > >vb_task);
> > > > -
> > > > -	list_for_each_entry(inst_elm, &vpu_dev->instances, list)
> > > > -		inst_count++;
> > > > -	if (inst_count == 1)
> > > > -		pm_runtime_dont_use_autosuspend(vpu_dev->dev);
> > >
> > > This seems like some other fix, that should be in its own commit.
> > >
> > > > +	mutex_destroy(&inst->feed_lock);
> > > >
> > > >  unlock_and_return:
> > > >  	mutex_unlock(&vpu_dev->hw_lock);
> > > > @@ -720,8 +724,6 @@ int wave5_vpu_enc_close(struct vpu_instance
> > > > *inst,
> > > u32 *fail_res)
> > > >  	int ret;
> > > >  	int retry = 0;
> > > >  	struct vpu_device *vpu_dev = inst->dev;
> > > > -	int inst_count = 0;
> > > > -	struct vpu_instance *inst_elm;
> > > >
> > > >  	*fail_res = 0;
> > > >  	if (!inst->codec_info)
> > > > @@ -765,11 +767,6 @@ int wave5_vpu_enc_close(struct vpu_instance
> > > > *inst, u32 *fail_res)
> > > >
> > > >  	wave5_vdi_free_dma_memory(vpu_dev, &p_enc_info-
> > > > >vb_task);
> > > >
> > > > -	list_for_each_entry(inst_elm, &vpu_dev->instances, list)
> > > > -		inst_count++;
> > > > -	if (inst_count == 1)
> > > > -		pm_runtime_dont_use_autosuspend(vpu_dev->dev);
> > > > -
> > >
> > > Same.
> > >
> > > >  	mutex_unlock(&vpu_dev->hw_lock);
> > > >  	pm_runtime_put_sync(inst->dev->dev);
> > > >
> > > > diff --git a/drivers/media/platform/chips-media/wave5/wave5-
> > > > vpuapi.h
> > > > b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > > > index 45615c15beca..6ca1ddc67c64 100644
> > > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > > > @@ -163,7 +163,7 @@ enum set_param_option {
> > > >
> > > >  #define BUFFER_MARGIN				4096
> > > >
> > > > -#define MAX_FIRMWARE_CALL_RETRY			10
> > > > +#define MAX_FIRMWARE_CALL_RETRY			30
> > >
> > > I thought 10s was already very long, am I right this means it can
> > > get
> > > stuck for 30s now ?
> > >
> > > >
> > > >  #define VDI_LITTLE_ENDIAN	0x0
> > > >
> > > > @@ -812,6 +812,9 @@ struct vpu_instance {
> > > >  	bool cbcr_interleave;
> > > >  	bool nv21;
> > > >  	bool eos;
> > > > +	bool retry;
> > > > +	int queuing_num;
> > >
> > > Some comment would be helpful. What does "retry" stand for, and how
> > > is
> > > queing_num different from v4l2_m2m_num_src_bufs_ready() ?
> > >
> > > > +	struct mutex feed_lock; /* lock for feeding bitstream
> > > > buffers */
> > > >  	struct vpu_buf bitstream_vbuf;
> > > >  	dma_addr_t last_rd_ptr;
> > > >  	size_t remaining_consumed_bytes;
> > >
> > > --
> > > Nicolas Dufresne
> > > Principal Engineer at Collabora
> >
> > Thanks
> > Jackson
> 
> --
> Nicolas Dufresne
> Principal Engineer at Collabora

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ