netdev - Re: [rds-devel] BUG: unable to handle kernel NULL pointer dereference in rds_send

Message-ID: <20180130222228.q23csjr5l666v3o5@gmail.com>
Date:   Tue, 30 Jan 2018 14:22:28 -0800
From:   Eric Biggers <ebiggers3@...il.com>
To:     Sowmini Varadhan <sowmini.varadhan@...cle.com>
Cc:     David Miller <davem@...emloft.net>, santosh.shilimkar@...cle.com,
        rds-devel@....oracle.com,
        bot+aaf54a8c644d559d34dedcf3126aac68a20c9e63@...kaller.appspotmail.com,
        linux-rdma@...r.kernel.org, netdev@...r.kernel.org,
        syzkaller-bugs@...glegroups.com, linux-kernel@...r.kernel.org
Subject: Re: [rds-devel] BUG: unable to handle kernel NULL pointer
 dereference in rds_send_xmit

On Mon, Dec 18, 2017 at 12:22:51PM -0500, Sowmini Varadhan wrote:
> > From: Santosh Shilimkar <santosh.shilimkar@...cle.com>
> > Date: Mon, 18 Dec 2017 08:28:05 -0800
>   :
> > > Looks like another one tripping on empty transport. Mostly below
> > > should address it but we will test it if it does.
> 
> that was my first thought, but it cannot be the case here: rds_sendmsg
> etc itself would have bombed if that were the case, and the packet
> would never have gotten queued.
> 
> This is unlike f3069c6d33, where an application skips the transport
> binding (either misses the explicit bind, or gets the wrong transport
> due to an implicit bind) before it triggers the setsockopt.
> 
> I suspect that the problem is that the conn (and thus c_trans)
> have gotten destroyed, but the cp_send_w work got incorrectly
> re-queued. For example, rds_cong_queue_updates() (because the
> peer sent a congestion update) can happen in softirq context,
> and would end up requeuing work in the middle of rds_conn_destroy,
> after we have assumed that everything is quiesced.
> 
> On (12/18/17 12:12), David Miller wrote:
> > 
> > We're seeming to accumulate a lot of checks like this, maybe there
> > is a more general way to deal with this problem?
> 
> Yeah, I was thinking about this..  let me try to reproduce this in-house
> and get back with a patchset.
> 
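
The race suspected in the quoted text above (send work being re-queued while
the connection is being torn down) can be illustrated with a small user-space
C model. This is only a sketch, not RDS kernel code: struct conn_model,
model_queue_send_work() and model_conn_destroy() are hypothetical names made
up for illustration, and the pending_work counter merely stands in for a
work queue.

```c
#include <stdbool.h>

/* Hypothetical user-space model; these are NOT real RDS kernel symbols. */
struct conn_model {
    bool destroying;   /* set once teardown has started */
    int  pending_work; /* stands in for queued send work */
};

/*
 * Queue send work unless teardown has already begun.
 * Returns true if the work was queued, false if it was refused.
 */
static bool model_queue_send_work(struct conn_model *c)
{
    if (c->destroying)
        return false;  /* late requeue (e.g. from a congestion update) is dropped */
    c->pending_work++;
    return true;
}

/* Begin teardown: block new work first, then discard what is pending. */
static void model_conn_destroy(struct conn_model *c)
{
    c->destroying = true;
    c->pending_work = 0; /* stands in for cancelling/flushing queued work */
}
```

Without the destroying check, a call to model_queue_send_work() arriving
after model_conn_destroy() would leave pending work pointing at a dead
connection, which is the situation described above. This single-threaded
model ignores locking; in a real kernel the flag test and the queueing would
have to be atomic with respect to teardown.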

I assume you weren't able to reproduce this?  This crash hasn't been seen again,
and it was reported while KASAN was accidentally disabled in the syzbot kconfig
due to a change to the kconfig menus in linux-next.  So this crash was possibly
caused by slab corruption elsewhere.

I am invalidating the bug for syzbot so it will report the same crash signature
again if it occurs, but if you think there is a real bug feel free to keep
looking into it.

#syz invalid

Thanks,

Eric