Date: Wed, 8 Aug 2018 14:51:52 -0700
From: Santosh Shilimkar <santosh.shilimkar@...cle.com>
To: Sowmini Varadhan <sowmini.varadhan@...cle.com>,
netdev@...r.kernel.org
Cc: davem@...emloft.net, rds-devel@....oracle.com
Subject: Re: [PATCH net-next] rds: avoid lock hierarchy violation between
m_rs_lock and rs_recv_lock
On 8/8/2018 1:57 PM, Sowmini Varadhan wrote:
> The following deadlock, reported by syzbot, can occur if CPU0 is in
> rds_send_remove_from_sock() while CPU1 is in rds_clear_recv_queue()
>
> CPU0                                    CPU1
> ----                                    ----
> lock(&(&rm->m_rs_lock)->rlock);
>                                         lock(&rs->rs_recv_lock);
>                                         lock(&(&rm->m_rs_lock)->rlock);
> lock(&rs->rs_recv_lock);
>
> The deadlock should be avoided by moving the messages from the
> rs_recv_queue into a tmp_list in rds_clear_recv_queue() under
> the rs_recv_lock, and then dropping the refcnt on the messages
> in the tmp_list (potentially resulting in rds_message_purge())
> after dropping the rs_recv_lock.
>
> The same lock hierarchy violation also exists in rds_still_queued()
> and should be avoided in a similar manner.
>
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@...cle.com>
> Reported-by: syzbot+52140d69ac6dc6b927a9@...kaller.appspotmail.com
> ---
This bug doesn't make sense, since two different transports (Loop and
rds_tcp) are using the same socket and running together.
For the same transport, such a race can't happen with the MSG_ON_SOCK flag.
CPU1 -> rds_loop_inc_free
CPU0 -> rds_tcp_cork ...
I need to understand this test better.
Regards,
Santosh