[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <53dd18ec-9a65-4bf7-8490-ca3eb56ce2a5@quicinc.com>
Date: Tue, 22 Jul 2025 17:50:08 +0530
From: Jyothi Kumar Seerapu <quic_jseerapu@...cinc.com>
To: Dmitry Baryshkov <dmitry.baryshkov@....qualcomm.com>
CC: Vinod Koul <vkoul@...nel.org>,
Mukesh Kumar Savaliya
<quic_msavaliy@...cinc.com>,
Viken Dadhaniya <quic_vdadhani@...cinc.com>,
Andi Shyti <andi.shyti@...nel.org>,
Sumit Semwal <sumit.semwal@...aro.org>,
Christian König <christian.koenig@....com>,
<linux-arm-msm@...r.kernel.org>, <dmaengine@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <linux-i2c@...r.kernel.org>,
<linux-media@...r.kernel.org>, <dri-devel@...ts.freedesktop.org>,
<linaro-mm-sig@...ts.linaro.org>, <quic_vtanuku@...cinc.com>
Subject: Re: [PATCH v6 2/2] i2c: i2c-qcom-geni: Add Block event interrupt
support
On 7/19/2025 3:27 PM, Dmitry Baryshkov wrote:
> On Mon, Jul 07, 2025 at 09:58:30PM +0530, Jyothi Kumar Seerapu wrote:
>>
>>
>> On 7/4/2025 1:11 AM, Dmitry Baryshkov wrote:
>>> On Thu, 3 Jul 2025 at 15:51, Jyothi Kumar Seerapu
>>> <quic_jseerapu@...cinc.com> wrote:
>>>>
>>>>
>>>>
>>>> On 6/19/2025 9:46 PM, Jyothi Kumar Seerapu wrote:
>>>>>
>>>>>
>>>>> On 6/18/2025 1:02 AM, Dmitry Baryshkov wrote:
>>>>>> On Tue, 17 Jun 2025 at 17:11, Jyothi Kumar Seerapu
>>>>>> <quic_jseerapu@...cinc.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 5/30/2025 10:12 PM, Dmitry Baryshkov wrote:
>>>>>>>> On Fri, May 30, 2025 at 07:36:05PM +0530, Jyothi Kumar Seerapu wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 5/21/2025 6:15 PM, Dmitry Baryshkov wrote:
>>>>>>>>>> On Wed, May 21, 2025 at 03:58:48PM +0530, Jyothi Kumar Seerapu wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 5/9/2025 9:31 PM, Dmitry Baryshkov wrote:
>>>>>>>>>>>> On 09/05/2025 09:18, Jyothi Kumar Seerapu wrote:
>>>>>>>>>>>>> Hi Dimitry, Thanks for providing the review comments.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 5/6/2025 5:16 PM, Dmitry Baryshkov wrote:
>>>>>>>>>>>>>> On Tue, May 06, 2025 at 04:48:44PM +0530, Jyothi Kumar Seerapu
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> The I2C driver gets an interrupt upon transfer completion.
>>>>>>>>>>>>>>> When handling multiple messages in a single transfer, this
>>>>>>>>>>>>>>> results in N interrupts for N messages, leading to significant
>>>>>>>>>>>>>>> software interrupt latency.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> To mitigate this latency, utilize Block Event Interrupt (BEI)
>>>>>>>>>>>>>>> mechanism. Enabling BEI instructs the hardware to prevent
>>>>>>>>>>>>>>> interrupt
>>>>>>>>>>>>>>> generation and BEI is disabled when an interrupt is necessary.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Large I2C transfer can be divided into chunks of 8 messages
>>>>>>>>>>>>>>> internally.
>>>>>>>>>>>>>>> Interrupts are not expected for the first 7 message
>>>>>>>>>>>>>>> completions, only
>>>>>>>>>>>>>>> the last message triggers an interrupt, indicating the
>>>>>>>>>>>>>>> completion of
>>>>>>>>>>>>>>> 8 messages. This BEI mechanism enhances overall transfer
>>>>>>>>>>>>>>> efficiency.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Why do you need this complexity? Is it possible to set the
>>>>>>>>>>>>>> DMA_PREP_INTERRUPT flag on the last message in the transfer?
>>>>>>>>>>>>>
>>>>>>>>>>>>> If i undertsand correctly, the suggestion is to get the single
>>>>>>>>>>>>> intetrrupt for last i2c message only.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But With this approach, we can't handle large number of i2c
>>>>>>>>>>>>> messages
>>>>>>>>>>>>> in the transfer.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In GPI driver, number of max TREs support is harcoded to 64
>>>>>>>>>>>>> (#define
>>>>>>>>>>>>> CHAN_TRES 64) and for I2C message, we need Config TRE, GO TRE
>>>>>>>>>>>>> and
>>>>>>>>>>>>> DMA TREs. So, the avilable TREs are not sufficient to handle
>>>>>>>>>>>>> all the
>>>>>>>>>>>>> N messages.
>>>>>>>>>>>>
>>>>>>>>>>>> It sounds like a DMA driver issue. In other words, the DMA
>>>>>>>>>>>> driver can
>>>>>>>>>>>> know that it must issue an interrupt before exausting 64 TREs in
>>>>>>>>>>>> order
>>>>>>>>>>>> to
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here, the plan is to queue i2c messages (QCOM_I2C_GPI_MAX_NUM_MSGS
>>>>>>>>>>>>> or 'num' incase for less messsages), process and unmap/free
>>>>>>>>>>>>> upon the
>>>>>>>>>>>>> interrupt based on QCOM_I2C_GPI_NUM_MSGS_PER_IRQ.
>>>>>>>>>>>>
>>>>>>>>>>>> Why? This is some random value which has no connection with
>>>>>>>>>>>> CHAN_TREs.
>>>>>>>>>>>> Also, what if one of the platforms get a 'liter' GPI which
>>>>>>>>>>>> supports less
>>>>>>>>>>>> TREs in a single run? Or a super-premium platform which can use 256
>>>>>>>>>>>> TREs? Please don't workaround issues from one driver in another
>>>>>>>>>>>> one.
>>>>>>>>>>>
>>>>>>>>>>> We are trying to utilize the existing CHAN_TRES mentioned in the
>>>>>>>>>>> GPI driver.
>>>>>>>>>>> With the following approach, the GPI hardware can process N
>>>>>>>>>>> number of I2C
>>>>>>>>>>> messages, thereby improving throughput and transfer efficiency.
>>>>>>>>>>>
>>>>>>>>>>> The main design consideration for using the block event interrupt
>>>>>>>>>>> is as
>>>>>>>>>>> follows:
>>>>>>>>>>>
>>>>>>>>>>> Allow the hardware to process the TREs (I2C messages), while the
>>>>>>>>>>> software
>>>>>>>>>>> concurrently prepares the next set of TREs to be submitted to the
>>>>>>>>>>> hardware.
>>>>>>>>>>> Once the TREs are processed, they can be freed, enabling the
>>>>>>>>>>> software to
>>>>>>>>>>> queue new TREs. This approach enhances overall optimization.
>>>>>>>>>>>
>>>>>>>>>>> Please let me know if you have any questions, concerns, or
>>>>>>>>>>> suggestions.
>>>>>>>>>>
>>>>>>>>>> The question was why do you limit that to
>>>>>>>>>> QCOM_I2C_GPI_NUM_MSGS_PER_IRQ.
>>>>>>>>>> What is the reason for that limit, etc. If you think about it, The
>>>>>>>>>> GENI
>>>>>>>>>> / I2C doesn't impose any limit on the number of messages processed in
>>>>>>>>>> one go (if I understand it correctly). Instead the limit comes
>>>>>>>>>> from the
>>>>>>>>>> GPI DMA driver. As such, please don't add extra 'handling' to the I2C
>>>>>>>>>> driver. Make GPI DMA driver responsible for saying 'no more for now',
>>>>>>>>>> then I2C driver can setup add an interrupt flag and proceed with
>>>>>>>>>> submitting next messages, etc.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> For I2C messages, we need to prepare TREs for Config, Go and DMAs.
>>>>>>>>> However,
>>>>>>>>> if a large number of I2C messages are submitted then may may run
>>>>>>>>> out of
>>>>>>>>> memory for serving the TREs. The GPI channel supports a maximum of
>>>>>>>>> 64 TREs,
>>>>>>>>> which is insufficient to serve 32 or even 16 I2C messages
>>>>>>>>> concurrently,
>>>>>>>>> given the multiple TREs required per message.
>>>>>>>>>
>>>>>>>>> To address this limitation, a strategy has been implemented to
>>>>>>>>> manage how
>>>>>>>>> many messages can be queued and how memory is recycled. The constant
>>>>>>>>> QCOM_I2C_GPI_MAX_NUM_MSGS is set to 16, defining the upper limit of
>>>>>>>>> messages that can be queued at once. Additionally,
>>>>>>>>> QCOM_I2C_GPI_NUM_MSGS_PER_IRQ is set to 8, meaning that
>>>>>>>>> half of the queued messages are expected to be freed or deallocated
>>>>>>>>> per
>>>>>>>>> interrupt.
>>>>>>>>> This approach ensures that the driver can efficiently manage TRE
>>>>>>>>> resources
>>>>>>>>> and continue queuing new I2C messages without exhausting memory.
>>>>>>>>>> I really don't see a reason for additional complicated handling in
>>>>>>>>>> the
>>>>>>>>>> geni driver that you've implemented. Maybe I misunderstand
>>>>>>>>>> something. In
>>>>>>>>>> such a case it usually means that you have to explain the design
>>>>>>>>>> in the
>>>>>>>>>> commit message / in-code comments.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The I2C Geni driver is designed to prepare and submit descriptors
>>>>>>>>> to the GPI
>>>>>>>>> driver one message at a time.
>>>>>>>>> As a result, the GPI driver does not have visibility into the current
>>>>>>>>> message index or the total number of I2C messages in a transfer.
>>>>>>>>> This lack
>>>>>>>>> of context makes it challenging to determine when to set the block
>>>>>>>>> event
>>>>>>>>> interrupt, which is typically used to signal the completion of a
>>>>>>>>> batch of
>>>>>>>>> messages.
>>>>>>>>>
>>>>>>>>> So, the responsibility for deciding when to set the BEI should lie
>>>>>>>>> with the
>>>>>>>>> I2C driver.
>>>>>>>>>
>>>>>>>>> If this approach is acceptable, I will proceed with updating the
>>>>>>>>> relevant
>>>>>>>>> details in the commit message.
>>>>>>>>>
>>>>>>>>> Please let me know if you have any concerns or suggestions.
>>>>>>>>
>>>>>>> Hi Dmitry, Sorry for the delayed response, and thank you for the
>>>>>>> suggestions.
>>>>>>>
>>>>>>>> - Make gpi_prep_slave_sg() return NULL if flags don't have
>>>>>>>> DMA_PREP_INTERRUPT flag and there are no 3 empty TREs for the
>>>>>>>> interrupt-enabled transfer.
>>>>>>> "there are no 3 empty TREs for the interrupt-enabled transfer."
>>>>>>> Could you please help me understand this a bit better?
>>>>>>
>>>>>> In the GPI driver you know how many TREs are available. In
>>>>>> gpi_prep_slave_sg() you can check that and return an error if there
>>>>>> are not enough TREs available.
>>>>>>
>>>>>>>>
>>>>>>>> - If I2C driver gets NULL from dmaengine_prep_slave_single(), retry
>>>>>>>> again, adding DMA_PREP_INTERRUPT. Make sure that the last one
>>>>>>>> always
>>>>>>>> gets DMA_PREP_INTERRUPT.
>>>>>>> Does this mean we need to proceed to the next I2C message and ensure
>>>>>>> that the DMA_PREP_INTERRUPT flag is set for the last I2C message in each
>>>>>>> chunk? And then, should we submit the chunk of messages to the GSI
>>>>>>> hardware for processing?
>>>>>>
>>>>>> No. You don't have to peek at the next I2C message. This all concerns
>>>>>> the current I2C message. The only point where you have to worry is to
>>>>>> explicitly set the flag for the last message.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> - In geni_i2c_gpi_xfer() split the loop to submit messages until you
>>>>>>>> can, then call wait_for_completion_timeout() and then
>>>>>>>> geni_i2c_gpi_unmap() for submitted messages, then continue with
>>>>>>>> a new
>>>>>>>> portion of messages.
>>>>>>> Since the GPI channel supports a maximum of 64 TREs, should we consider
>>>>>>> submitting a smaller number of predefined messages — perhaps fewer than
>>>>>>> 32, such as 16?
>>>>>>
>>>>>> Why? Just submit messages until they fit, then flush the DMA async
>>>>>> channel.
>>>>>>
>>>>>>> This is because handling 32 messages would require one TRE for config
>>>>>>> and 64 TREs for the Go and DMA preparation steps, which exceeds the
>>>>>>> channel's TRE capacity of 64.
>>>>>>>
>>>>>>> We designed the approach to submit a portion of the messages — for
>>>>>>> example, 16 at a time. Once 8 messages are processed and freed, the
>>>>>>> hardware can continue processing the TREs, while the software
>>>>>>> simultaneously prepares the next set of TREs. This parallelism helps in
>>>>>>> efficiently utilizing the hardware and enhances overall system
>>>>>>> optimization.
>>>>>>
>>>>>>
>>>>>> And this overcomplicates the driver and introduces artificial
>>>>>> limitations which need explanation. Please fix it in a simple way
>>>>>> first. Then you can e.g. implement the watermark at the half of the
>>>>>> GPI channel depth and request DMA_PREP_INTERRUPT to be set in the
>>>>>> middle of the full sequence, allowing it to be used asynchronously in
>>>>>> the background.
>>>>>>
>>>>>
>>>>> Okay, will review it. Thanks.
>>>>>
>>>>>
>>>>
>>>> Hi Dmitry,
>>>>
>>>> Can you please check and confirm the approach to follow is something
>>>> like the pseudo code mentioned below:
>>>
>>> Yes, this is what I've had in mind.
>>
>> So, Apart from the changes related to "submitting I2C messages until they
>> fit" and "unmapping all processed I2C messages together", the rest of the
>> code looks remains the same as in the v6 patch ?
>> Also, in the GPI driver, we need to add logic to retrieve the number of
>> available TREs.
>>
>> I have a concern regarding throughput and achieving parallelism between
>> software and hardware processing with this new approach. Since we need to
>> unmap all processed messages together, the software cannot queue the next
>> set of TREs while the hardware is still processing the current ones.
>
> Does that warrant the over-complexity of the driver or close-coupling of
> I2C and GPE drivers?
>
> The I2C is a slow bus and it is not expected to be used for
> high-throughput data.
>
The block event interrupt and multi-descriptor handling are primarily
added to support a camera use case, where multiple registers are need to
be configured in a single I2C transfer with 200 or more i2c messages.
This enhancement is expected to improve throughput and meet performance
KPIs.
>>
>> As I mentioned earlier, the previous approach allowed partial unmapping
>> where half of the messages processed by the hardware could be
>> freed/unmapped. This enabled the hardware to continue processing the
>> remaining TREs while the software simultaneously prepared the next batch.
>> This parallelism helped in better hardware utilization and improved overall
>> system performance.
>
> Measurements / values / impact?
>
>>
>> Could you please confirm if can go with the similar approach of unmap the
>> processed TREs based on a fixed threshold or constant value, instead of
>> unmapping them all at once?
>
> I'd still say, that's a bad idea. Please stay within the boundaries of
> the DMA API.
>
I agree with the approach you suggested—it's the GPI's responsibility to
manage the available TREs.
However, I'm curious whether can we set a dynamic watermark value
perhaps half the available TREs) to trigger unmapping of processed TREs
? This would allow the software to prepare the next set of TREs while
the hardware continues processing the remaining ones, enabling better
parallelism and throughput.
>>>
>>>>
>>>> GPI driver:
>>>> In gpi_prep_slave_sg() function,
>>>>
>>>> if (!(flags & DMA_PREP_INTERRUPT) && !gpi_available_tres(chan))
>>>> return NULL;
>>>>
>>>>
>>>> I2C GENI driver:
>>>>
>>>> for (i = 0; i < num; i++)
>>>> {
>>>> /* Always set interrupt for the last message */
>>>> if (i == num_msgs - 1)
>>>> flags |= DMA_PREP_INTERRUPT;
>>>>
>>>>
>>>> desc = dmaengine_prep_slave_single(chan, dma_addr, len, dir, flags);
>>>> if (!desc && !(flags & DMA_PREP_INTERRUPT)) {
>>>> /* Retry with interrupt if not enough TREs */
>>>> flags |= DMA_PREP_INTERRUPT;
>>>> desc = dmaengine_prep_slave_single(chan, dma_addr, len, dir, flags);
>>>> }
>>>>
>>>>
>>>> if (!desc)
>>>> break;
>>>>
>>>>
>>>> dmaengine_submit(desc);
>>>> msg_idx++;
>>>> }
>>>>
>>>> dma_async_issue_pending(chan));
>>>>
>>>> time_left = wait_for_completion_timeout(&gi2c->done, XFER_TIMEOUT);
>>>> if (!time_left)
>>>> return -ETIMEDOUT;
>>>>
>>>> Now Invoke "geni_i2c_gpi_unmap" for unmapping all submitted I2C messages.
>>>>
>>>>
>>>> Thanks,
>>>> JyothiKumar
>>>>
>>>>
>>>>
>>>
>>>
>>
>
Powered by blists - more mailing lists