Message-ID: <d37e86bb-3ddd-2c61-bbbe-ebb5cc4801dc@arm.com>
Date: Mon, 23 Mar 2020 08:28:13 +0000
From: Cristian Marussi <cristian.marussi@....com>
To: Lukasz Luba <lukasz.luba@....com>, linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, sudeep.holla@....com,
james.quinlan@...adcom.com, Jonathan.Cameron@...wei.com
Subject: Re: [PATCH v4 07/13] firmware: arm_scmi: Add notification dispatch
and delivery
Hi
On 3/18/20 8:26 AM, Lukasz Luba wrote:
> Hi Cristian,
>
> On 3/16/20 2:46 PM, Cristian Marussi wrote:
>> On Thu, Mar 12, 2020 at 09:43:31PM +0000, Lukasz Luba wrote:
>>>
>>>
>>> On 3/12/20 6:34 PM, Cristian Marussi wrote:
>>>> On 12/03/2020 13:51, Lukasz Luba wrote:
>>>>> Hi Cristian,
>>>>>
>> Hi Lukasz
>>
>>>>> just one comment below...
>> [snip]
>>>>>> + eh.timestamp = ts;
>>>>>> + eh.evt_id = evt_id;
>>>>>> + eh.payld_sz = len;
>>>>>> + kfifo_in(&r_evt->proto->equeue.kfifo, &eh, sizeof(eh));
>>>>>> + kfifo_in(&r_evt->proto->equeue.kfifo, buf, len);
>>>>>> + queue_work(r_evt->proto->equeue.wq,
>>>>>> + &r_evt->proto->equeue.notify_work);
>>>>>
>>>>> Is it safe to ignore the return value from the queue_work here?
>>>>>
>>>>
[snip]
>> On the other hand, considering the impact of such a scenario, it is not simply
>> that delivery could be delayed: if the delayed event happens to be the last one
>> ever, it would remain undelivered forever. This is particularly worrying when
>> that last event is an important one: imagine a system shutdown where a final
>> system-power-off notification remains undelivered.
>
> Agree, another example could be a thermal notification for some critical
> trip point.
>
>>
>> As a consequence, I think this rare race condition should be addressed somehow.
>>
>> Looking at this scenario, it seems the classic situation in which you would
>> want to use some sort of completion to avoid missing event delivery, BUT in
>> our use case:
>>
>> - placing the workers loaned from cmwq into an unbounded wait_for_completion()
>>   once the queue is empty does not seem the best use of resources (and is
>>   probably frowned upon)...using a few dedicated kernel threads just to let
>>   them sit idle most of the time seems equally frowned upon (I could be wrong...)
>> - the needed complete() in the ISR would introduce a spinlock_irqsave into the
>>   interrupt path (there's already one inside queue_work in fact), so it is not
>>   desirable, at least not if used on a regular basis (for each event notified)
>>
>> So I was thinking to try to significantly reduce the above race window, rather
>> than eliminate it completely, by adding an "exiting" flag to be checked under
>> specific conditions, so as to retry the queue_work() a few times when the race
>> is hit, something like:
>>
>> ISR (core N)             |  WQ (core N+1)
>> -----------------------------------------------------------------------------------------------
>>                          |  atomic_set(&exiting, 0);
>>                          |
>>                          |  do {
>>                          |      ...
>>                          |      if (queue_is_empty)               - WORK_PENDING 0 events queued
>>                          +          atomic_set(&exiting, 1)       - WORK_PENDING 0 events queued
>> static int cnt=3         |          --> breakout of while         - WORK_PENDING 0 events queued
>> kfifo_in()               |      ....
>>                          |  } while (scmi_process_event_payload);
>> kfifo_in()               |
>> exiting = atomic_read()  |  ...cmwq backing out                   - WORK_PENDING 1 events queued
>> do {                     |  ...cmwq backing out                   - WORK_PENDING 1 events queued
>>     ret = queue_work()   |  ...cmwq backing out                   - WORK_PENDING 1 events queued
>>     if (ret || !exiting) |  ...cmwq backing out                   - WORK_PENDING 1 events queued
>>         break;           |  ...cmwq backing out                   - WORK_PENDING 1 events queued
>>     mdelay(5);           |  ...cmwq backing out                   - WORK_PENDING 1 events queued
>>     exiting =            |  ...cmwq backing out                   - WORK_PENDING 1 events queued
>>         atomic_read;     |  ...cmwq backing out                   - WORK_PENDING 1 events queued
>> } while (--cnt);         |  ...cmwq backing out                   - WORK_PENDING 1 events queued
>>                          |  ---- WORKER EXIT                      - !WORK_PENDING 0 events queued
>>
>> like down below between the scissors.
>>
>> Not tested or tried yet...I could be missing something...and the mdelay is
>> horrible (and probably not the cleanest thing you've ever seen :D)...I'll have
>> a chat with Sudeep too.
>
> Indeed it looks more complicated. If you like, I can join your offline
> discussion when Sudeep is back.
>
Yes, this is as of now my main remaining issue to address for v6.
I'll wait for Sudeep's general review/feedback and raise this point; for
reference, a rough sketch of the flag-plus-retry idea follows below.
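
Just to make its shape concrete, here is a minimal, untested sketch of the
dispatcher/ISR pairing I have in mind (all struct, field and function names
below are purely illustrative placeholders, not the code between the scissors
in my previous mail):

#include <linux/atomic.h>
#include <linux/delay.h>
#include <linux/kernel.h>
#include <linux/kfifo.h>
#include <linux/types.h>
#include <linux/workqueue.h>

/* Illustrative container only; the real field names live in the patch. */
struct scmi_events_queue {
	struct kfifo		kfifo;
	struct work_struct	notify_work;
	struct workqueue_struct	*wq;
	atomic_t		exiting;
};

/* Deferred side: drain the kfifo, flagging when we are about to bail out. */
static void scmi_events_dispatcher(struct work_struct *work)
{
	struct scmi_events_queue *eq =
		container_of(work, struct scmi_events_queue, notify_work);

	atomic_set(&eq->exiting, 0);
	smp_mb__after_atomic();

	do {
		u8 buf[128];
		unsigned int n;

		if (kfifo_is_empty(&eq->kfifo)) {
			/*
			 * Tell a racing ISR that its queue_work() may have
			 * been swallowed by a still-set WORK_PENDING while
			 * we were already on our way out.
			 */
			atomic_set(&eq->exiting, 1);
			smp_mb__after_atomic();
			break;
		}

		/*
		 * The real dispatcher parses an event header first and then
		 * its payload; raw bytes are drained here just to keep the
		 * sketch short.
		 */
		n = kfifo_out(&eq->kfifo, buf, sizeof(buf));
		if (!n)
			break;

		/* ...deliver the dequeued event to registered callbacks... */
	} while (1);
}

/* ISR side: retry queue_work() a few times only when the race is hit. */
static void scmi_notify_event(struct scmi_events_queue *eq,
			      const void *hdr, size_t hdr_sz,
			      const void *payld, size_t payld_sz)
{
	int cnt = 3;
	bool exiting;

	kfifo_in(&eq->kfifo, hdr, hdr_sz);
	kfifo_in(&eq->kfifo, payld, payld_sz);

	exiting = atomic_read(&eq->exiting);
	do {
		/*
		 * queue_work() returning false while the dispatcher has
		 * already passed its emptiness check is the lost-event
		 * window; retry only in that case.
		 */
		if (queue_work(eq->wq, &eq->notify_work) || !exiting)
			break;

		mdelay(5);	/* horrible, as said; it only narrows the window */
		exiting = atomic_read(&eq->exiting);
	} while (--cnt);
}

The mdelay()+retry path is only taken when queue_work() fails while the worker
has already flagged itself as exiting, so the common case remains a single
queue_work() call from the ISR.
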
Regards
Cristian
> Regards,
> Lukasz