[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bd7imlksghy5iigc75ob3mghm5zqvn5wrotj5fez3hahq2zv6x@l2bx6mbnijgp>
Date: Thu, 27 Jun 2024 23:14:47 +0300
From: Dmitry Baryshkov <dmitry.baryshkov@...aro.org>
To: Fabrice Gasnier <fabrice.gasnier@...s.st.com>
Cc: Heikki Krogerus <heikki.krogerus@...ux.intel.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Maxime Coquelin <mcoquelin.stm32@...il.com>,
Alexandre Torgue <alexandre.torgue@...s.st.com>, Neil Armstrong <neil.armstrong@...aro.org>,
linux-usb@...r.kernel.org, linux-kernel@...r.kernel.org, Nikita Travkin <nikita@...n.ru>,
linux-stm32@...md-mailman.stormreply.com, linux-arm-kernel@...ts.infradead.org
Subject: Re: [Linux-stm32] [PATCH v2 6/7] usb: typec: ucsi: extract common
code for command handling
On Thu, Jun 27, 2024 at 06:49:02PM GMT, Fabrice Gasnier wrote:
> On 6/27/24 17:58, Dmitry Baryshkov wrote:
> > On Thu, 27 Jun 2024 at 18:51, Fabrice Gasnier
> > <fabrice.gasnier@...s.st.com> wrote:
> >>
> >> On 6/25/24 18:49, Dmitry Baryshkov wrote:
> >>> On Tue, Jun 25, 2024 at 05:24:54PM GMT, Fabrice Gasnier wrote:
> >>>> On 6/21/24 00:55, Dmitry Baryshkov wrote:
> >>>>> Extract common functions to handle command sending and to handle events
> >>>>> from UCSI. This ensures that all UCSI glue drivers handle the ACKs in
> >>>>> the same way.
> >>>>>
> >>>>> The CCG driver used DEV_CMD_PENDING both for internal
> >>>>> firmware-related commands and for UCSI control handling. Leave the
> >>>>> former use case intact.
> >>>>>
> >>>>> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@...aro.org>
> >>>>> ---
> >>>>> drivers/usb/typec/ucsi/ucsi.c | 43 +++++++++++++++++++++++++++
> >>>>> drivers/usb/typec/ucsi/ucsi.h | 7 +++++
> >>>>> drivers/usb/typec/ucsi/ucsi_acpi.c | 46 ++---------------------------
> >>>>> drivers/usb/typec/ucsi/ucsi_ccg.c | 21 ++-----------
> >>>>> drivers/usb/typec/ucsi/ucsi_glink.c | 47 ++---------------------------
> >>>>> drivers/usb/typec/ucsi/ucsi_stm32g0.c | 44 ++--------------------------
> >>>>> drivers/usb/typec/ucsi/ucsi_yoga_c630.c | 52 ++-------------------------------
> >>>>> 7 files changed, 62 insertions(+), 198 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c
> >>>>> index 4ba22323dbf9..691ee0c4ef87 100644
> >>>>> --- a/drivers/usb/typec/ucsi/ucsi.c
> >>>>> +++ b/drivers/usb/typec/ucsi/ucsi.c
> >>>>> @@ -36,6 +36,48 @@
> >>>>> */
> >>>>> #define UCSI_SWAP_TIMEOUT_MS 5000
> >>>>>
> >>>>> +void ucsi_notify_common(struct ucsi *ucsi, u32 cci)
> >>>>> +{
> >>>>> + if (UCSI_CCI_CONNECTOR(cci))
> >>>>> + ucsi_connector_change(ucsi, UCSI_CCI_CONNECTOR(cci));
> >>>>> +
> >>>>> + if (cci & UCSI_CCI_ACK_COMPLETE &&
> >>>>> + test_bit(ACK_PENDING, &ucsi->flags))
> >>>>> + complete(&ucsi->complete);
> >>>>> +
> >>>>> + if (cci & UCSI_CCI_COMMAND_COMPLETE &&
> >>>>> + test_bit(COMMAND_PENDING, &ucsi->flags))
> >>>>> + complete(&ucsi->complete);
> >>>>
> >>>> Hi Dmitry,
> >>>>
> >>>> I've recently faced some race with ucsi_stm32g0 driver, and have sent a
> >>>> fix for it [1], as you've noticed in the cover letter.
> >>>>
> >>>> To fix that, I've used test_and_clear_bit() in above two cases, instead
> >>>> of test_bit().
> >>>
> >>> Could you possible describe, why do you need test_and_clear_bit()
> >>> instead of just test_bit()? The bits are cleared at the end of the
> >>> .sync_write(), also there can be no other command (or ACK_CC) submission
> >>> before this one is fully processed.
> >>
> >> Hi Dmitry,
> >>
> >> It took me some time to reproduce this race I observed earlier.
> >> (I observe this during DR swap.)
> >>
> >> Once the ->async_control(UCSI_ACK_CC_CI) call bellow gets completed, and
> >> before the ACK_PENDING bit gets cleared, e.g. clear_bit(ACK_PENDING), I
> >> get an asynchronous interrupt.
> >>
> >> Basically, Then the above complete() gets called (due to
> >> UCSI_CCI_ACK_COMPLETE & ACK_PENDING).
> >>
> >> Subsequent UCSI_GET_CONNECTOR_STATUS command (from
> >> ucsi_handle_connector_change) will be unblocked immediately due to
> >> complete() call has already happen, without UCSI_CCI_COMMAND_COMPLETE
> >> cci flag, hence returning -EIO.
> >
> > But the ACK_CI is being sent as a response to a command. This means
> > that the ppm_lock should be locked. The UCSI_GET_CONNECTOR_STATUS
> > command should wait for ppm_lock to be freed and only then it can
> > proceed with sending the command. Maybe I'm misunderstanding the case
> > or maybe there is a loophole somewhere.
>
> There's probably also some miss-understanding at my end, I don't get how
> the ppm_lock would protext from non atomic async_control()/clear_bit().
>
> Let me share some trace here, I hope it can better show the difference
> at my end when I get the race.
> I've added some debug prints around these lines, to trace the set/clear
> of the COMMAND/ACK_PENDING, main commands and cci flags.
>
> normal case is:
> ---
> ucsi_send_command_common: UCSI_SET_UOR
> set:COMMAND_PENDING
> ucsi_stm32g0_irq_handler
> ucsi_notify_common: UCSI_CCI_COMMAND_COMPLETE
> clr:COMMAND_PENDING
> ucsi_acknowledge: UCSI_ACK_COMMAND_COMPLETE UCSI_ACK_CONNECTOR_CHANGE
> set:ACK_PENDING
> ucsi_stm32g0_irq_handler
> ucsi_notify_common: UCSI_CCI_ACK_COMPLETE
> clr:ACK_PENDING
> ucsi_stm32g0_irq_handler
> ucsi_notify_common: ucsi_connector_change
> ucsi_send_command_common: UCSI_GET_CONNECTOR_STATUS
> set:COMMAND_PENDING
> ucsi_stm32g0_irq_handler
> ucsi_notify_common: UCSI_CCI_COMMAND_COMPLETE
> clr:COMMAND_PENDING
> ucsi_acknowledge: UCSI_ACK_COMMAND_COMPLETE
> set:ACK_PENDING
> ucsi_stm32g0_irq_handler
> ucsi_notify_common: UCSI_CCI_ACK_COMPLETE
> clr:ACK_PENDING
>
> racy case:
> ---
> ucsi_send_command_common: UCSI_SET_UOR
> set:COMMAND_PENDING
> ucsi_stm32g0_irq_handler
> ucsi_notify_common: UCSI_CCI_COMMAND_COMPLETE
> clr:COMMAND_PENDING
> ucsi_acknowledge: UCSI_ACK_COMMAND_COMPLETE UCSI_ACK_CONNECTOR_CHANGE
> set:ACK_PENDING
> ucsi_stm32g0_irq_handler <-- up to here nothing supicious
> ucsi_notify_common: ucsi_connector_change
> ucsi_notify_common: UCSI_CCI_ACK_COMPLETE
> ucsi_stm32g0_irq_handler <-- 2nd IRQ before clr:ACK_PENDING
> ucsi_notify_common: ucsi_connector_change
> ucsi_notify_common: UCSI_CCI_ACK_COMPLETE
> clr:ACK_PENDING
> ucsi_send_command_common: UCSI_GET_CONNECTOR_STATUS
> set:COMMAND_PENDING <-- completion already done :-(
> clr:COMMAND_PENDING
> ucsi_handle_connector_change: GET_CONNECTOR_STATUS failed (-5)
>
> I notice the two conditions are met above: both ucsi_connector_change()
> and cci ack completed.
>
> This happens before clear_bit(ACK_PENDING), which lead to anticipate
> completion of the subsequent UCSI_GET_CONNECTOR_STATUS command.
>
> So the test_and_clear_bit() would address a robustness case?
Ah, I see. So the race is for the completion. I fear that your solution
also doesn't fully solve the problem. The event can arrive after setting
the bit and before sending the command to the hardware. I thought about
doing various tricks, like reinit_completion or something close, but the
issue is that test_and_clear_bit() just makes the race window smaller,
but doesn't eliminate it completely.
It seems the only practical way to handle that is by making sure that
ucsi_notify_common() can sleep and locks the ppm_lock while
processing the CCI.
>
> Please advise,
> Best Regards,
> Fabrice
>
> >
> >> This is where the test_and_clear_bit() atomic operation helps, to avoid
> >> non atomic operation:
> >>
> >> -> async_control(UCSI_ACK_CC_CI)
> >> new interrupt may occur here
> >> -> clear_bit(ACK_PENDING)
--
With best wishes
Dmitry
Powered by blists - more mailing lists