Message-ID: <20220217235042.GA20620@X58A-UD3R>
Date: Fri, 18 Feb 2022 08:50:42 +0900
From: Byungchul Park <byungchul.park@....com>
To: Matthew Wilcox <willy@...radead.org>
Cc: davem@...emloft.net, kuba@...nel.org, netdev@...r.kernel.org,
torvalds@...ux-foundation.org, mingo@...hat.com,
linux-kernel@...r.kernel.org, peterz@...radead.org,
will@...nel.org, tglx@...utronix.de, rostedt@...dmis.org,
joel@...lfernandes.org, sashal@...nel.org, daniel.vetter@...ll.ch,
chris@...is-wilson.co.uk, duyuyang@...il.com,
johannes.berg@...el.com, tj@...nel.org, tytso@....edu,
david@...morbit.com, amir73il@...il.com, bfields@...ldses.org,
gregkh@...uxfoundation.org, kernel-team@....com,
linux-mm@...ck.org, akpm@...ux-foundation.org, mhocko@...nel.org,
minchan@...nel.org, hannes@...xchg.org, vdavydov.dev@...il.com,
sj@...nel.org, jglisse@...hat.com, dennis@...nel.org, cl@...ux.com,
penberg@...nel.org, rientjes@...gle.com, vbabka@...e.cz,
ngupta@...are.org, linux-block@...r.kernel.org, axboe@...nel.dk,
paolo.valente@...aro.org, josef@...icpanda.com,
linux-fsdevel@...r.kernel.org, viro@...iv.linux.org.uk,
jack@...e.cz, jlayton@...nel.org, dan.j.williams@...el.com,
hch@...radead.org, djwong@...nel.org,
dri-devel@...ts.freedesktop.org, airlied@...ux.ie,
rodrigosiqueiramelo@...il.com, melissa.srw@...il.com,
hamohammed.sa@...il.com
Subject: Re: Report in unix_stream_read_generic()
On Wed, Feb 16, 2022 at 04:33:41AM +0000, Matthew Wilcox wrote:
> On Wed, Feb 16, 2022 at 01:17:03PM +0900, Byungchul Park wrote:
> > [ 7.013330] ===================================================
> > [ 7.013331] DEPT: Circular dependency has been detected.
> > [ 7.013332] 5.17.0-rc1-00014-gcf3441bb2012 #2 Tainted: G W
> > [ 7.013333] ---------------------------------------------------
> > [ 7.013334] summary
> > [ 7.013334] ---------------------------------------------------
> > [ 7.013335] *** DEADLOCK ***
> > [ 7.013335]
> > [ 7.013335] context A
> > [ 7.013336] [S] (unknown)(&(&ei->socket.wq.wait)->dmap:0)
> > [ 7.013337] [W] __mutex_lock_common(&u->iolock:0)
> > [ 7.013338] [E] event(&(&ei->socket.wq.wait)->dmap:0)
> > [ 7.013340]
> > [ 7.013340] context B
> > [ 7.013341] [S] __raw_spin_lock(&u->lock:0)
> > [ 7.013342] [W] wait(&(&ei->socket.wq.wait)->dmap:0)
> > [ 7.013343] [E] spin_unlock(&u->lock:0)
>
> This seems unlikely to be real. We're surely not actually waiting
> while holding a spinlock; existing debug checks would catch it.
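
Indeed, mutex_lock() begins with might_sleep(), so with
CONFIG_DEBUG_ATOMIC_SLEEP an actual sleep attempt under a held
spinlock splats immediately. A made-up snippet like this (not from
the report) would be caught right away:

	static DEFINE_SPINLOCK(s);
	static DEFINE_MUTEX(m);

	spin_lock(&s);    /* preemption disabled: atomic context */
	mutex_lock(&m);   /* might_sleep() fires: "BUG: sleeping
	                   * function called from invalid context"
	                   */
	mutex_unlock(&m);
	spin_unlock(&s);
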
>
> > [ 7.013407] ---------------------------------------------------
> > [ 7.013407] context B's detail
> > [ 7.013408] ---------------------------------------------------
> > [ 7.013408] context B
> > [ 7.013409] [S] __raw_spin_lock(&u->lock:0)
> > [ 7.013410] [W] wait(&(&ei->socket.wq.wait)->dmap:0)
> > [ 7.013411] [E] spin_unlock(&u->lock:0)
> > [ 7.013412]
> > [ 7.013412] [S] __raw_spin_lock(&u->lock:0):
> > [ 7.013413] [<ffffffff81aa451f>] unix_stream_read_generic+0x6bf/0xb60
> > [ 7.013416] stacktrace:
> > [ 7.013416] _raw_spin_lock+0x6e/0x90
> > [ 7.013418] unix_stream_read_generic+0x6bf/0xb60
>
> It would be helpful if you'd run this through scripts/decode_stacktrace.sh
> so we could see line numbers instead of hex offsets (which aren't much
> use without the binary kernel).

(Sorry for the late reply; it was due to an issue with my email client.)

That was a big help. Thank you very much.
>
> > [ 7.013420] unix_stream_recvmsg+0x40/0x50
> > [ 7.013422] sock_read_iter+0x85/0xd0
> > [ 7.013424] new_sync_read+0x162/0x180
> > [ 7.013426] vfs_read+0xf3/0x190
> > [ 7.013428] ksys_read+0xa6/0xc0
> > [ 7.013429] do_syscall_64+0x3a/0x90
> > [ 7.013431] entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [ 7.013433]
> > [ 7.013434] [W] wait(&(&ei->socket.wq.wait)->dmap:0):
> > [ 7.013434] [<ffffffff810bb017>] prepare_to_wait+0x47/0xd0
>
> ... this may be the source of confusion. Just because we prepare to
> wait doesn't mean we end up actually waiting. For example, look at
> unix_wait_for_peer():
>
> prepare_to_wait_exclusive(&u->peer_wait, &wait, TASK_INTERRUPTIBLE);
>
> sched = !sock_flag(other, SOCK_DEAD) &&
> !(other->sk_shutdown & RCV_SHUTDOWN) &&
> unix_recvq_full(other);
>
> unix_state_unlock(other);
>
> if (sched)
> timeo = schedule_timeout(timeo);
>
> finish_wait(&u->peer_wait, &wait);
>
> We *prepare* to wait, *then* drop the lock, then actually schedule.
That was a big help, too. I checked some samples of the usage, but
almost all of them were effectively "prepare == wait" :-(. Thanks a lot!
Thanks,
Byungchul