netdev - Re: net/dccp: warning in dccp_feat_clone_sp_val/__might

Message-ID: <1477833601.7065.297.camel@edumazet-glaptop3.roam.corp.google.com>
Date:   Sun, 30 Oct 2016 06:20:01 -0700
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     Andrey Konovalov <andreyknvl@...gle.com>,
        Peter Zijlstra <peterz@...radead.org>
Cc:     Cong Wang <xiyou.wangcong@...il.com>,
        Gerrit Renker <gerrit@....abdn.ac.uk>,
        "David S. Miller" <davem@...emloft.net>, dccp@...r.kernel.org,
        netdev <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Eric Dumazet <edumazet@...gle.com>
Subject: Re: net/dccp: warning in dccp_feat_clone_sp_val/__might_sleep

On Sun, 2016-10-30 at 05:41 +0100, Andrey Konovalov wrote:
> Sorry, the warning is still there.
> 
> I'm not sure adding sched_annotate_sleep() does anything, since it's
> defined as (in case CONFIG_DEBUG_ATOMIC_SLEEP is not set):
> # define sched_annotate_sleep() do { } while (0)

Thanks again for testing.

But you do have CONFIG_DEBUG_ATOMIC_SLEEP set, which triggers a check in
__might_sleep():

WARN_ONCE(current->state != TASK_RUNNING && current->task_state_change,

Relevant commit is 00845eb968ead28007338b2bb852b8beef816583
("sched: don't cause task state changes in nested sleep debugging")

Another relevant commit was 26cabd31259ba43f68026ce3f62b78094124333f
("sched, net: Clean up sk_wait_event() vs. might_sleep()") 

Back when release_sock() did not yet process the backlog in process
context, only lock_sock() could trigger the issue, so my fix at that time
was commit cb7cf8a33ff73cf638481d1edf883d8968f934f8 ("inet: Clean up
inet_csk_wait_for_connect() vs. might_sleep()")

I guess we need something else now, because the following:

static int dccp_wait_for_ccid(struct sock *sk, unsigned long delay)
{
        DEFINE_WAIT(wait);
        long remaining;

        prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
        sk->sk_write_pending++;
        release_sock(sk);
        ...

can now process the socket backlog in process context from
release_sock(), so all GFP_KERNEL allocations might barf because of
TASK_INTERRUPTIBLE being used at that point.
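
To make the ordering problem concrete, here is a rough userspace model of it. This is only a sketch, not kernel code: every model_* name and struct task_model are hypothetical, and the check is simplified (the real __might_sleep() test also looks at current->task_state_change and warns only once).

```c
#include <stdio.h>

/* Toy userspace model. Names echo the kernel's, but nothing here is kernel code. */
enum task_state { TASK_RUNNING = 0, TASK_INTERRUPTIBLE = 1 };

struct task_model {
        enum task_state state;
        int warnings;   /* how many times the sleep check fired */
};

/* Simplified stand-in for the __might_sleep() check: complain when a
 * possibly-sleeping function is entered while state != TASK_RUNNING. */
static void model_might_sleep(struct task_model *t, const char *where)
{
        if (t->state != TASK_RUNNING) {
                t->warnings++;
                fprintf(stderr, "WARN: blocking op in %s, state=%d\n",
                        where, (int)t->state);
        }
}

/* A GFP_KERNEL allocation may sleep, so it runs the check first. */
static void model_alloc_gfp_kernel(struct task_model *t)
{
        model_might_sleep(t, "model_alloc_gfp_kernel");
}

/* Assumption: each backlog packet leads to one GFP_KERNEL allocation,
 * all of them performed in process context by release_sock(). */
static void model_release_sock(struct task_model *t, int backlog_packets)
{
        for (int i = 0; i < backlog_packets; i++)
                model_alloc_gfp_kernel(t);
}

/* Same ordering as dccp_wait_for_ccid(): the task state is changed
 * before release_sock() runs. Returns the number of warnings seen. */
static int model_wait_for_ccid(int backlog_packets)
{
        struct task_model t = { TASK_RUNNING, 0 };

        t.state = TASK_INTERRUPTIBLE;            /* prepare_to_wait() */
        model_release_sock(&t, backlog_packets); /* backlog runs here */
        t.state = TASK_RUNNING;                  /* finish_wait() */
        return t.warnings;
}
```

In this model, model_wait_for_ccid(0) reports no warnings while model_wait_for_ccid(3) reports three, so the warning depends only on whether packets happen to be queued in the backlog when the socket is released.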

sk_wait_event() probably also needs a fix.

Peter, any idea how this can be done?

Thanks!