Message-ID: <aba2b6c3-eac1-5a5c-5f75-de1d9deadab9@cn.fujitsu.com>
Date: Tue, 2 Aug 2016 09:22:35 +0800
From: Zhou Jie <zhoujie2011@...fujitsu.com>
To: Alex Williamson <alex.williamson@...hat.com>
CC: <fan.chen@...ystack.cn>, <linux-kernel@...r.kernel.org>,
<qemu-devel@...gnu.org>, Chen Fan <chen.fan.fnst@...fujitsu.com>,
<izumi.taku@...fujitsu.com>
Subject: Re: [Qemu-devel] [PATCH v2 2/2] vfio : add aer process
Hi, Alex
>>> Clearly this has only been tested for a single instance of an AER error
>>> event and resume per device. Are the things you're intending to block
>>> actually blocked for subsequent events? Note how complete_all() fills
>>> the done field to let all current and future waiters go through and
>>> nowhere is there a call to reinit_completion() to drain that path.
>>> Thanks,
>>>
>>> Alex
>>
>> Do you mean this condition?
>>
>> For device 1:
>> error1 occurs ---- error1 resumes
>> error2 occurs ---- error2 resumes
>> error3 occurs ---- error3 resumes
>>
>> In the current code, I call complete_all() when error1 resumes.
>> This unblocks the device while error2 and error3 are
>> still being processed.
>
> So walk me through how this works. On vfio_pci_open() we call
> init_completion(), which sets aer_error_completion.done equal to zero
> (BTW, a user can open the device file descriptor multiple times, so
> there's already a bug here).
I will call init_completion() in vfio_pci_probe() instead.
> Let's assume that an error occurs and the
> user stalls a single access on wait_for_completion_interruptible().
> The bulk of this function happens here:
>
> static inline long __sched
> do_wait_for_common(struct completion *x,
> 		   long (*action)(long), long timeout, int state)
> {
> 	if (!x->done) {
> 		DECLARE_WAITQUEUE(wait, current);
>
> 		__add_wait_queue_tail_exclusive(&x->wait, &wait);
> 		do {
> 			if (signal_pending_state(state, current)) {
> 				timeout = -ERESTARTSYS;
> 				break;
> 			}
> 			__set_current_state(state);
> 			spin_unlock_irq(&x->wait.lock);
> 			timeout = action(timeout);
> 			spin_lock_irq(&x->wait.lock);
> 		} while (!x->done && timeout);
> 		__remove_wait_queue(&x->wait, &wait);
> 		if (!x->done)
> 			return timeout;
> 	}
> 	x->done--;
> 	return timeout ?: 1;
> }
>
> So it waits within that do{}while loop for a completion, interruption,
> or timeout. Then we have:
>
> void complete_all(struct completion *x)
> {
> 	unsigned long flags;
>
> 	spin_lock_irqsave(&x->wait.lock, flags);
> 	x->done += UINT_MAX/2;
> 	__wake_up_locked(&x->wait, TASK_NORMAL, 0);
> 	spin_unlock_irqrestore(&x->wait.lock, flags);
> }
>
> So aer_error_completion.done gets incremented to let a couple billion
> completion waiters through... Show me how another call to
> wait_for_completion_interruptible() will ever block again within our
> lifetime when the actual wait of do_wait_for_common() is only entered
> when 'done' count is equal to zero. This seems to be why
> reinit_completion() exists, but it's not used here. Thanks,
>
> Alex
I will call reinit_completion() in vfio_pci_aer_err_detected() when
an AER error is detected.
Thank you very much.
Sincerely
ZhouJie