Date:	Thu, 1 Mar 2007 15:28:40 +0100
From:	"Dmitry Adamushko" <dmitry.adamushko@...il.com>
To:	eli@...lanox.co.il
Cc:	"Linux Kernel" <linux-kernel@...r.kernel.org>
Subject: Re: wait_for_completion_timeout problem ???

Hi,

>
> I have a problem with using this function. I am referring to
> drivers/infiniband/hw/mthca/mthca_cmd.c line 394. For convenience I
> quote from this code:
>
>         init_completion(&context->done);
>
>         err = mthca_cmd_post(dev, in_param,
>                              out_param ? *out_param : 0,
>                              in_modifier, op_modifier,
>                              op, context->token, 1);
>         if (err)
>                 goto out;
>
>         if (!wait_for_completion_timeout(&context->done, timeout)) {
>                 err = -EBUSY;
>                 goto out;
>         }
>
> timeout is 10 * HZ. Sometimes this function returns 0 which signifies
> timeout. However I can see that the interrupt handler called
> complete(&context->done)
> around 200 usec after calling wait_for_completion_timeout(). When the
> function returns I can see that context->done.done equals 1 which
> confirms that complete was indeed called.

The sequence of events can be as follows.

A caller blocks in wait_for_completion_timeout() on schedule_timeout(),
which means it will be unblocked (scheduled back) in one of two ways:

   i ) after "timeout" has expired;

   ii) by someone calling wake_up_*(&x->wait);

(wait_for_completion_timeout() has inserted our caller into the "x->wait"
wait queue)

In both cases schedule_timeout() will then do:

...
        schedule(); <------------------ here we get CPU back
        del_singleshot_timer_sync(&timer);
        timeout = expire - jiffies;

 out:
        return timeout < 0 ? 0 : timeout;

"expire" is the time (plus latency) at which we were expected to be
woken up by the timer, i.e. the timeout.

Now the point is that our waiter could have been woken up (become
"ready" from the point of view of the scheduler) earlier, but it
actually got the CPU back later than "expire". That's why the
return value is 0 (timeout < 0 ==> return 0).

IOW, schedule_timeout() indicates whether a process has been scheduled
back /earlier than timeout/ (so return value >0) or /later/ (0).
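
To make the arithmetic concrete, here is a minimal userspace sketch of
the return-value computation from the quoted tail of schedule_timeout().
It is a model, not kernel code: model_schedule_timeout() and its
parameters are hypothetical names, and "resumed_at" plays the role of
jiffies at the moment the waiter actually gets the CPU back.

```c
#include <assert.h>

/*
 * Userspace model only (hypothetical helper, not a kernel API).
 * start      - tick count when the caller went to sleep
 * timeout    - requested timeout in ticks
 * resumed_at - tick count when the caller actually runs again
 */
long model_schedule_timeout(long start, long timeout, long resumed_at)
{
        long expire;

        assert(timeout >= 0);           /* the model ignores negative timeouts */
        expire = start + timeout;       /* when the timer would fire */

        /* Same arithmetic as the quoted tail of schedule_timeout(). */
        timeout = expire - resumed_at;
        return timeout < 0 ? 0 : timeout;
}
```

With a 10000-tick timeout, a waiter that resumes at tick 2 gets 9998
back, while one that resumes at tick 10003 gets 0, whether or not it
was made runnable by a wake-up long before that.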

It doesn't indicate why the process has been woken up (i.e. (i) or
(ii) above).

In your case it became /runnable/ because of complete() but it got
scheduled later than /timeout/.

And wait_for_completion_timeout() takes that as a /timeout condition/.

So either all the users of wait_for_completion_timeout() should
additionally check x->done after they get scheduled back

or

wait_for_completion_timeout() should return something different that
encodes the fact /event happened/ and not just /event happened _and_ the
caller got scheduled back earlier than timeout/.
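
The first option could look roughly like the sketch below. It is a
userspace model under stated assumptions: struct model_completion and
interpret_wait_result() are hypothetical names, only the "done" counter
is modelled, and the locking a real kernel caller would need around
x->done is left out. The -1 return stands in for the -EBUSY used in the
driver code quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct completion; only the counter is modelled. */
struct model_completion {
        unsigned int done;
};

/*
 * Decide what a return value from the timed wait means.
 * remaining - value returned by the wait (0 means "looked like a timeout")
 * Returns 0 if the event happened, -1 on a genuine timeout.
 */
int interpret_wait_result(const struct model_completion *x,
                          unsigned long remaining)
{
        assert(x != NULL);

        if (remaining > 0)
                return 0;       /* scheduled back before the timeout expired */
        if (x->done)
                return 0;       /* complete() was called; we just ran late */
        return -1;              /* nobody completed: a real timeout */
}
```

In the situation described above (return value 0 but x->done == 1)
this check reports success instead of a timeout.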



>
> Thanks
> Eli

-- 
Best regards,
Dmitry Adamushko