linux-kernel - Re: [PATCH v2 6/7] mailbox: bcm-flexrm-mailbox: Set msg_queue

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAALAos-VXg9XcNOr6OyiefdnL=oajEi3JGjoFSf74HEJgBiWBA@mail.gmail.com>
Date:   Fri, 28 Jul 2017 14:19:48 +0530
From:   Anup Patel <anup.patel@...adcom.com>
To:     Jassi Brar <jassisinghbrar@...il.com>
Cc:     Rob Herring <robh+dt@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will.deacon@....com>,
        Florian Fainelli <f.fainelli@...il.com>,
        Scott Branden <sbranden@...adcom.com>,
        Ray Jui <rjui@...adcom.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        Devicetree List <devicetree@...r.kernel.org>,
        BCM Kernel Feedback <bcm-kernel-feedback-list@...adcom.com>
Subject: Re: [PATCH v2 6/7] mailbox: bcm-flexrm-mailbox: Set msg_queue_len for
 each channel

On Thu, Jul 27, 2017 at 5:23 PM, Jassi Brar <jassisinghbrar@...il.com> wrote:
> On Thu, Jul 27, 2017 at 11:20 AM, Anup Patel <anup.patel@...adcom.com> wrote:
>> On Thu, Jul 27, 2017 at 10:29 AM, Jassi Brar <jassisinghbrar@...il.com> wrote:
>
>>>>>>>> Sorry for the delayed response...
>>>>>>>>
>>>>>>>> On Fri, Jul 21, 2017 at 9:16 PM, Jassi Brar <jassisinghbrar@...il.com> wrote:
>>>>>>>>> Hi Anup,
>>>>>>>>>
>>>>>>>>> On Fri, Jul 21, 2017 at 12:25 PM, Anup Patel <anup.patel@...adcom.com> wrote:
>>>>>>>>>> The Broadcom FlexRM ring (i.e. mailbox channel) can handle
>>>>>>>>>> larger number of messages queued in one FlexRM ring hence
>>>>>>>>>> this patch sets msg_queue_len for each mailbox channel to
>>>>>>>>>> be same as RING_MAX_REQ_COUNT.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Anup Patel <anup.patel@...adcom.com>
>>>>>>>>>> Reviewed-by: Scott Branden <scott.branden@...adcom.com>
>>>>>>>>>> ---
>>>>>>>>>>  drivers/mailbox/bcm-flexrm-mailbox.c | 5 ++++-
>>>>>>>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/mailbox/bcm-flexrm-mailbox.c b/drivers/mailbox/bcm-flexrm-mailbox.c
>>>>>>>>>> index 9873818..20055a0 100644
>>>>>>>>>> --- a/drivers/mailbox/bcm-flexrm-mailbox.c
>>>>>>>>>> +++ b/drivers/mailbox/bcm-flexrm-mailbox.c
>>>>>>>>>> @@ -1683,8 +1683,11 @@ static int flexrm_mbox_probe(struct platform_device *pdev)
>>>>>>>>>>                 ret = -ENOMEM;
>>>>>>>>>>                 goto fail_free_debugfs_root;
>>>>>>>>>>         }
>>>>>>>>>> -       for (index = 0; index < mbox->num_rings; index++)
>>>>>>>>>> +       for (index = 0; index < mbox->num_rings; index++) {
>>>>>>>>>> +               mbox->controller.chans[index].msg_queue_len =
>>>>>>>>>> +                                               RING_MAX_REQ_COUNT;
>>>>>>>>>>                 mbox->controller.chans[index].con_priv = &mbox->rings[index];
>>>>>>>>>> +       }
>>>>>>>>>>
>>>>>>>>> While writing mailbox.c I wasn't unaware that there is the option to
>>>>>>>>> choose the queue length at runtime.
>>>>>>>>> The idea was to keep the code as simple as possible. I am open to
>>>>>>>>> making it a runtime thing, but first, please help me understand how
>>>>>>>>> that is useful here.
>>>>>>>>>
>>>>>>>>> I understand FlexRm has a ring buffer of RING_MAX_REQ_COUNT(1024)
>>>>>>>>> elements. Any message submitted to mailbox api can be immediately
>>>>>>>>> written onto the ringbuffer if there is some space.
>>>>>>>>> Is there any mechanism to report back to a client driver, if its
>>>>>>>>> message in ringbuffer failed "to be sent"?
>>>>>>>>> If there isn't any, then I think, in flexrm_last_tx_done() you should
>>>>>>>>> simply return true if there is some space left in the rung-buffer,
>>>>>>>>> false otherwise.
>>>>>>>>
>>>>>>>> Yes, we have error code in "struct brcm_message" to report back
>>>>>>>> errors from send_message. In our mailbox clients, we check
>>>>>>>> return value of mbox_send_message() and also the error code
>>>>>>>> in "struct brcm_message".
>>>>>>>>
>>>>>>> I meant after the message has been accepted in the ringbuffer but the
>>>>>>> remote failed to receive it.
>>>>>>
>>>>>> Yes, even this case is handled.
>>>>>>
>>>>>> In case of IO errors after message has been put in ring buffer, we get
>>>>>> completion message with error code and mailbox client drivers will
>>>>>> receive back "struct brcm_message" with error set.
>>>>>>
>>>>>> You can refer flexrm_process_completions() for more details.
>>>>>>
>>> It doesn't seem to be what I suggest. I see two issues in
>>> flexrm_process_completions()
>>> 1) It calls mbox_send_message(), which is a big NO for a controller
>>> driver. Why should you have one more message stored outside of
>>> ringbuffer?
>>
>> The "last_pending_msg" in each FlexRM ring was added to fit FlexRM
>> in Mailbox framework.
>>
>> We don't have any IRQ for TX done so "txdone_irq" out of the question for
>> FlexRM. We only have completions for both success or failures (IO errors).
>>
>> This means we have to use "txdone_poll" for FlexRM. For "txdone_poll",
>> we have to provide last_tx_done() callback. The last_tx_done() callback
>> is supposed to return true if last send_data() call succeeded.
>>
>> To implement last_tx_done() in FlexRM driver, we added "last_pending_msg".
>>
>> When "last_pending_msg" is NULL it means last call to send_data() succeeded
>> and when "last_pending_msg" is != NULL it means last call to send_data()
>> did not go through due to lack of space in FlexRM ring.
>>
> It could be simpler.
> Since flexrm_send_data() is essentially about putting the message in
> the ring-buffer (and not about _transmission_ failures), the
> last_tx_done() should simply return true if requests_ida has not all
> ids allocated. False otherwise.

It's not that simple because we have two cases in-which
send_data() will fail:
1. It run-out of IDs in requests_ida
2. There is no room in BD queue of FlexRM ring. This because each
brcm_message can be translated into variable number of descriptors.
In fact, using SPU2 crypto client we have one brcm_message translating
into 100's of descriptors. All-in-all few messages (< 1024) can also
fill-up the BD queue of FlexRM ring.

>
>>>
>>> 2) It calls mbox_chan_received_data()  which is for messages received
>>> from the remote. And not the way to report failed _transmission_, for
>>> which the api calls back mbox_client.tx_done() .  In your client
>>> driver please populate mbox_client.tx_done() and see which message is
>>> reported "sent fine" when.
>>>
>>>
>>>>>>> There seems no such provision. IIANW, then you should be able to
>>>>>>> consider every message as "sent successfully" once it is in the ring
>>>>>>> buffer i.e, immediately after mbox_send_message() returns 0.
>>>>>>> In that case I would think you don't need more than a couple of
>>>>>>> entries out of MBOX_TX_QUEUE_LEN ?
>>>>>>
>>>>>> What I am trying to suggest is that we can take upto 1024 messages
>>>>>> in a FlexRM ring but the MBOX_TX_QUEUE_LEN limits us queuing
>>>>>> more messages. This issue manifest easily when multiple CPUs
>>>>>> queues to same FlexRM ring (i.e. same mailbox channel).
>>>>>>
>>>>> OK then, I guess we have to make the queue length a runtime decision.
>>>>
>>>> Do you agree with approach taken by PATCH5 and PATCH6 to
>>>> make queue length runtime?
>>>>
>>> I agree that we may have to get the queue length from platform, if
>>> MBOX_TX_QUEUE_LEN is limiting performance. That will be easier on both
>>> of us. However I suspect the right fix for _this_ situation is in
>>> flexrm driver. See above.
>>
>> The current implementation is trying to model FlexRM using "txdone_poll"
>> method and that's why we have dependency on MBOX_TX_QUEUE_LEN
>>
>> I think what we really need is new method for "txdone" to model ring
>> manager HW (such as FlexRM). Let's call it "txdone_none".
>>
>> For "txdone_none", it means there is no "txdone" reporting in HW
>> and mbox_send_data() should simply return value returned by
>> send_data() callback. The last_tx_done() callback is not required
>> for "txdone_none" and MBOX_TX_QUEUE_LEN also has no
>> effect on "txdone_none". Both blocking and non-blocking clients
>> are treated same for "txdone_none".
>>
> That is already supported :)

If you are referring to "txdone_ack" then this cannot be used here
because for "txdone_ack" we have to call mbox_chan_txdon() API
after writing descriptors in send_data() callback which will cause
dead-lock in tx_tick() called by mbox_chan_txdone().

>
> In drivers/dma/bcm-sba-raid.c
>
> sba_send_mbox_request(...)
> {
>            ......
>         req->msg.error = 0;
>         ret = mbox_send_message(sba->mchans[mchans_idx], &req->msg);
>         if (ret < 0) {
>                 dev_err(sba->dev, "send message failed with error %d", ret);
>                 return ret;
>         }
>         ret = req->msg.error;
>         if (ret < 0) {
>                 dev_err(sba->dev, "message error %d", ret);
>                 return ret;
>         }
>           .....
> }
>
> Here you _do_ assume that as soon as the mbox_send_message() returns,
> the last_tx_done() is true. In other words, this is a case of client
> 'knows_txdone'.
>
> So ideally you should specify cl->knows_txdone = true during
> mbox_request_channel() and have ...
>
> sba_send_mbox_request(...)
> {
>         ret = mbox_send_message(sba->mchans[mchans_idx], &req->msg);
>         if (ret < 0) {
>                 dev_err(sba->dev, "send message failed with error %d", ret);
>                 return ret;
>         }
>
>         ret = req->msg.error;
>
>        /* Message successfully placed in the ringbuffer, i.e, done */
>        mbox_client_txdone(sba->mchans[mchans_idx], ret);
>
>        if (ret < 0) {
>                 dev_err(sba->dev, "message error %d", ret);
>                 return ret;
>         }
>
>         .....
> }
>

I think we need to improve mailbox.c so that
mbox_chan_txdone() can be called from
send_data() callback.

Regards,
Anup