linux-kernel - Re: [Qemu-devel] [PATCH v2 2/2] vfio : add aer process

Message-ID: <aba2b6c3-eac1-5a5c-5f75-de1d9deadab9@cn.fujitsu.com>
Date:	Tue, 2 Aug 2016 09:22:35 +0800
From:	Zhou Jie <zhoujie2011@...fujitsu.com>
To:	Alex Williamson <alex.williamson@...hat.com>
CC:	<fan.chen@...ystack.cn>, <linux-kernel@...r.kernel.org>,
	<qemu-devel@...gnu.org>, Chen Fan <chen.fan.fnst@...fujitsu.com>,
	<izumi.taku@...fujitsu.com>
Subject: Re: [Qemu-devel] [PATCH v2 2/2] vfio : add aer process

Hi, Alex

>>> Clearly this has only been tested for a single instance of an AER error
>>> event and resume per device.  Are the things you're intending to block
>>> actually blocked for subsequent events?  Note how complete_all() fills
>>> the done field to let all current and future waiters go through and
>>> nowhere is there a call to reinit_completion() to drain that path.
>>> Thanks,
>>>
>>> Alex
>>
>> Do you mean this condition?
>>
>> For device 1:
>> error1 occurs ---- error1 resumes
>>      error2 occurs ---- error2 resumes
>>          error3 occurs ---- error3 resumes
>>
>> In current code, I do complete_all() when error1 resumes.
>> And this will unblock the device
>> while error2 and error3 are still being processed.
>
> So walk me through how this works.  On vfio_pci_open() we call
> init_completion(), which sets aer_error_completion.done equal to zero
> (BTW, a user can open the device file descriptor multiple times, so
> there's already a bug here).
I will call init_completion() in vfio_pci_probe() instead.

> Let's assume that an error occurs and the
> user stalls a single access on wait_for_completion_interruptible().
> The bulk of this function happens here:
>
> static inline long __sched
> do_wait_for_common(struct completion *x,
>                    long (*action)(long), long timeout, int state)
> {
>         if (!x->done) {
>                 DECLARE_WAITQUEUE(wait, current);
>
>                 __add_wait_queue_tail_exclusive(&x->wait, &wait);
>                 do {
>                         if (signal_pending_state(state, current)) {
>                                 timeout = -ERESTARTSYS;
>                                 break;
>                         }
>                         __set_current_state(state);
>                         spin_unlock_irq(&x->wait.lock);
>                         timeout = action(timeout);
>                         spin_lock_irq(&x->wait.lock);
>                 } while (!x->done && timeout);
>                 __remove_wait_queue(&x->wait, &wait);
>                 if (!x->done)
>                         return timeout;
>         }
>         x->done--;
>         return timeout ?: 1;
> }
>
> So it waits within that do{}while loop for a completion, interruption,
> or timeout.  Then we have:
>
> void complete_all(struct completion *x)
> {
>         unsigned long flags;
>
>         spin_lock_irqsave(&x->wait.lock, flags);
>         x->done += UINT_MAX/2;
>         __wake_up_locked(&x->wait, TASK_NORMAL, 0);
>         spin_unlock_irqrestore(&x->wait.lock, flags);
> }
>
> So aer_error_completion.done gets incremented to let a couple billion
> completion waiters through...  Show me how another call to
> wait_for_completion_interruptible() will ever block again within our
> lifetime when the actual wait of do_wait_for_common() is only entered
> when 'done' count is equal to zero.  This seems to be why
> reinit_completion() exists, but it's not used here.  Thanks,
>
> Alex

I will call reinit_completion() in vfio_pci_aer_err_detected() when
an AER error is detected.
Thank you very much.

Sincerely
ZhouJie