linux-kernel - Re: [PATCH v8 2/3] CMDQ: Mediatek CMDQ driver

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5757F762.4020908@gmail.com>
Date:	Wed, 8 Jun 2016 12:45:54 +0200
From:	Matthias Brugger <matthias.bgg@...il.com>
To:	Horng-Shyang Liao <hs.liao@...iatek.com>
Cc:	Rob Herring <robh+dt@...nel.org>,
	Daniel Kurtz <djkurtz@...omium.org>,
	Sascha Hauer <s.hauer@...gutronix.de>,
	devicetree@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-arm-kernel@...ts.infradead.org,
	linux-mediatek@...ts.infradead.org, srv_heupstream@...iatek.com,
	Sascha Hauer <kernel@...gutronix.de>,
	Philipp Zabel <p.zabel@...gutronix.de>,
	Nicolas Boichat <drinkcat@...omium.org>,
	CK HU <ck.hu@...iatek.com>,
	cawa cheng <cawa.cheng@...iatek.com>,
	Bibby Hsieh <bibby.hsieh@...iatek.com>,
	YT Shen <yt.shen@...iatek.com>,
	Daoyuan Huang <daoyuan.huang@...iatek.com>,
	Damon Chu <damon.chu@...iatek.com>,
	Josh-YC Liu <josh-yc.liu@...iatek.com>,
	Glory Hung <glory.hung@...iatek.com>,
	Jiaguang Zhang <jiaguang.zhang@...iatek.com>,
	Dennis-YC Hsieh <dennis-yc.hsieh@...iatek.com>,
	Monica Wang <monica.wang@...iatek.com>,
	jassisinghbrar@...il.com, jaswinder.singh@...aro.org
Subject: Re: [PATCH v8 2/3] CMDQ: Mediatek CMDQ driver



On 08/06/16 07:40, Horng-Shyang Liao wrote:
> Hi Matthias,
>
> On Tue, 2016-06-07 at 18:59 +0200, Matthias Brugger wrote:
>>
>> On 03/06/16 15:11, Matthias Brugger wrote:
>>>
>>>
>> [...]
>>
>>>>>>>>>>>> +
>>>>>>>>>>>> +            smp_mb(); /* modify jump before enable thread */
>>>>>>>>>>>> +        }
>>>>>>>>>>>> +
>>>>>>>>>>>> +        cmdq_thread_writel(thread, task->pa_base +
>>>>>>>>>>>> task->command_size,
>>>>>>>>>>>> +                   CMDQ_THR_END_ADDR);
>>>>>>>>>>>> +        cmdq_thread_resume(thread);
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +    list_move_tail(&task->list_entry, &thread->task_busy_list);
>>>>>>>>>>>> +    spin_unlock_irqrestore(&cmdq->exec_lock, flags);
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static void cmdq_handle_error_done(struct cmdq *cmdq,
>>>>>>>>>>>> +                   struct cmdq_thread *thread, u32 irq_flag)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    struct cmdq_task *task, *tmp, *curr_task = NULL;
>>>>>>>>>>>> +    u32 curr_pa;
>>>>>>>>>>>> +    struct cmdq_cb_data cmdq_cb_data;
>>>>>>>>>>>> +    bool err;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    if (irq_flag & CMDQ_THR_IRQ_ERROR)
>>>>>>>>>>>> +        err = true;
>>>>>>>>>>>> +    else if (irq_flag & CMDQ_THR_IRQ_DONE)
>>>>>>>>>>>> +        err = false;
>>>>>>>>>>>> +    else
>>>>>>>>>>>> +        return;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    curr_pa = cmdq_thread_readl(thread, CMDQ_THR_CURR_ADDR);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    list_for_each_entry_safe(task, tmp, &thread->task_busy_list,
>>>>>>>>>>>> +                 list_entry) {
>>>>>>>>>>>> +        if (curr_pa >= task->pa_base &&
>>>>>>>>>>>> +            curr_pa < (task->pa_base + task->command_size))
>>>>>>>>>>>
>>>>>>>>>>> What are you checking here? It seems as if you make some implcit
>>>>>>>>>>> assumptions about pa_base and the order of execution of
>>>>>>>>>>> commands in the
>>>>>>>>>>> thread. Is it save to do so? Does dma_alloc_coherent give any
>>>>>>>>>>> guarantees
>>>>>>>>>>> about dma_handle?
>>>>>>>>>>
>>>>>>>>>> 1. Check what is the current running task in this GCE thread.
>>>>>>>>>> 2. Yes.
>>>>>>>>>> 3. Yes, CMDQ doesn't use iommu, so physical address is continuous.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes, physical addresses might be continous, but AFAIK there is no
>>>>>>>>> guarantee that the dma_handle address is steadily growing, when
>>>>>>>>> calling
>>>>>>>>> dma_alloc_coherent. And if I understand the code correctly, you
>>>>>>>>> use this
>>>>>>>>> assumption to decide if the task picked from task_busy_list is
>>>>>>>>> currently
>>>>>>>>> executing. So I think this mecanism is not working.
>>>>>>>>
>>>>>>>> I don't use dma_handle address, and just use physical addresses.
>>>>>>>>     From CPU's point of view, tasks are linked by the busy list.
>>>>>>>>     From GCE's point of view, tasks are linked by the JUMP command.
>>>>>>>>
>>>>>>>>> In which cases does the HW thread raise an interrupt.
>>>>>>>>> In case of error. When does CMDQ_THR_IRQ_DONE get raised?
>>>>>>>>
>>>>>>>> GCE will raise interrupt if any task is done or error.
>>>>>>>> However, GCE is fast, so CPU may get multiple done tasks
>>>>>>>> when it is running ISR.
>>>>>>>>
>>>>>>>> In case of error, that GCE thread will pause and raise interrupt.
>>>>>>>> So, CPU may get multiple done tasks and one error task.
>>>>>>>>
>>>>>>>
>>>>>>> I think we should reimplement the ISR mechanism. Can't we just read
>>>>>>> CURR_IRQ_STATUS and THR_IRQ_STATUS in the handler and leave
>>>>>>> cmdq_handle_error_done to the thread_fn? You will need to pass
>>>>>>> information from the handler to thread_fn, but that shouldn't be an
>>>>>>> issue. AFAIK interrupts are disabled in the handler, so we should stay
>>>>>>> there as short as possible. Traversing task_busy_list is expensive, so
>>>>>>> we need to do it in a thread context.
>>>>>>
>>>>>> Actually, our initial implementation is similar to your suggestion,
>>>>>> but display needs CMDQ to return callback function very precisely,
>>>>>> else display will drop frame.
>>>>>> For display, CMDQ interrupt will be raised every 16 ~ 17 ms,
>>>>>> and CMDQ needs to call callback function in ISR.
>>>>>> If we defer callback to workqueue, the time interval may be larger than
>>>>>> 32 ms.sometimes.
>>>>>>
>>>>>
>>>>> I think the problem is, that you implemented the workqueue as a ordered
>>>>> workqueue, so there is no parallel processing. I'm still not sure why
>>>>> you need the workqueue to be ordered. Can you please explain.
>>>>
>>>> The order should be kept.
>>>> Let me use mouse cursor as an example.
>>>> If task 1 means move mouse cursor to point A, task 2 means point B,
>>>> and task 3 means point C, our expected result is A -> B -> C.
>>>> If the order is not kept, the result could become A -> C -> B.
>>>>
>>>
>>> Got it, thanks for the clarification.
>>>
>>
>> I think a way to get rid of the workqueue is to use a timer, which gets
>> programmed to the time a timeout in the first task in the busy list
>> would happen. Everytime we update the busy list (e.g. because of task
>> got finished by the thread), we update the timer. When the timer
>> triggers, which hopefully won't happen too often, we return timeout on
>> the busy list elements, until the time is lower then the actual time.
>>
>> At least with this we can reduce the data structures in this driver and
>> make it more lightweight.
>
>  From my understanding, your proposed method can handle timeout case.
>
> However, the workqueue is also in charge of releasing tasks.
> Do you take releasing tasks into consideration by using the proposed
> timer method?
> Furthermore, I think the code will become more complex if we also use
> timer to implement releasing tasks.
>

Can't we call
         clk_disable_unprepare(cmdq->clock);
         cmdq_task_release(task);
after invoking the callback?
Regrading the clock, wouldn't it be easier to handle the clock 
enable/disable depending on the state of task_busy_list? I suppose we 
can't as we would need to check the task_busy_list of all threads, right?

Regards,
Matthias