lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=whZYCK2eNEcTvKWgBvoSL8YLT6G0dexVkFbDiVCLN3zBQ@mail.gmail.com>
Date:   Mon, 3 Aug 2020 16:49:03 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Jens Axboe <axboe@...nel.dk>
Cc:     io-uring <io-uring@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [GIT PULL] io_uring changes for 5.9-rc1

On Mon, Aug 3, 2020 at 4:31 PM Jens Axboe <axboe@...nel.dk> wrote:
>
> Updated to honor exclusive return value as well:

See my previous email, You're just adding code that makes no sense,
because your wait entry fundamentally isn't an exclusive one.

So all that code is a no-op and only makes it more confusing to read.

Your wakeup handler has _nothing_ to do with the generic
wake_page_function(). There is _zero_ overlap. Your wakeup handler
gets called only for the wait entries _you_ created.

Trying to use the wakeup logic from wake_page_function() makes no
sense, because the rules for wake_page_function() are entirely
different. Yes, they are called for the same thing (somebody unlocked
a page and is waking up waiters), but it's using a completely
different sleeping logic.

See? When wake_page_function() does that

        wait->flags |= WQ_FLAG_WOKEN;

and does something different (and returns different values) depending
on whether WQ_FLAG_EXCLUSIVE was set, that is all because
wait_on_page_bit_common() entry set yo that wait entry (on its stack)
with those exact rules in mind.

So the wakeup function is 1:1 tied to the code that registers the wait
entry. wait_on_page_bit_common() has one set of rules, that are then
honored by the wakeup function it uses. But those rules have _zero_
impact on your use. You can have - and you *do* have - different sets
of rules.

For example, none of your wakeups are ever exclusive. All you do is
make a work runnable - that doesn't mean that other people shouldn't
do other things when they get a "page was unlocked" wakeup
notification.

Also, for you "list_del_init()" is fine, because you never do the
unlocked "list_empty_careful()" on that wait entry.  All the waitqueue
operations run under the queue head lock.

So what I think you _should_ do is just something like this:

    diff --git a/fs/io_uring.c b/fs/io_uring.c
    index 2a3af95be4ca..1e243f99643b 100644
    --- a/fs/io_uring.c
    +++ b/fs/io_uring.c
    @@ -2965,10 +2965,10 @@ static int io_async_buf_func(struct
wait_queue_entry *wait, unsigned mode,
            if (!wake_page_match(wpq, key))
                    return 0;

    -       /* Stop waking things up if the page is locked again */
    -       if (test_bit(key->bit_nr, &key->page->flags))
    -              return -1;
    -
    +       /*
    +        * Somebody unlocked the page. Unqueue the wait entry
    +        * and run the task_work
    +        */
             list_del_init(&wait->entry);

             init_task_work(&req->task_work, io_req_task_submit);

because that matches what you're actually doing.

There's no reason to stop waking up others because the page is locked,
because you don't know what others want.

And there's never any reason for the exclusive thing, b3ecause none of
what you do guarantees that you take exclusive ownership of the page
lock. Running the work *may* end up doing a "lock_page()", but you
don't actually guarantee that.

                Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ