Date:   Mon, 29 Jun 2020 21:41:27 +0200
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Jason Gunthorpe <jgg@...pe.ca>
Cc:     syzkaller <syzkaller@...glegroups.com>,
        Hillf Danton <hdanton@...a.com>,
        syzbot <syzbot+a929647172775e335941@...kaller.appspotmail.com>,
        chuck.lever@...cle.com, Doug Ledford <dledford@...hat.com>,
        Leon Romanovsky <leon@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        RDMA mailing list <linux-rdma@...r.kernel.org>,
        parav@...lanox.com, Markus Elfring <Markus.Elfring@....de>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>
Subject: Re: KASAN: use-after-free Read in addr_handler (2)

On Mon, Jun 29, 2020 at 9:22 PM Jason Gunthorpe <jgg@...pe.ca> wrote:
> > > > On Sat, Jun 27, 2020 at 09:02:05PM +0800, Hillf Danton wrote:
> > > > > > So, to hit this syzkaller report, one of these must have happened:
> > > > > >  1) rdma_addr_cancel() didn't work and the process_one_work() is still
> > > > > >     runnable/running
> > > > >
> > > > > What syzbot reported indicates that the kworker survived not only
> > > > > the cancel of the work but also the handler_mutex, even though it is
> > > > > a sync cancel that waits for the work to complete.
> > > >
> > > > The syzbot report doesn't confirm that the work cancel was actually
> > > > called.
> > > >
> > > > The most likely situation is that it was skipped because of the state
> > > > mangling that the patch fixes.
> > > >
> > > > > >  2) The state changed away from RDMA_CM_ADDR_QUERY without doing
> > > > > >     rdma_addr_cancel()
> > > > >
> > > > > The cancel does cover the query state in the reported case, and I
> > > > > have a difficult time working out what in the patch below prevents
> > > > > the work from crossing the line the sync cancel draws. That's a
> > > > > question we can revisit once a reproducer is available.
> > > >
> > > > rdma-cm never seems to get reproducers from syzkaller
> > >
> > > +syzkaller mailing list
> > >
> > > Hi Jason,
> > >
> > > Wonder if there is some systematic issue. Let me double check.
> >
> > By scanning bugs at:
> > https://syzkaller.appspot.com/upstream
> > https://syzkaller.appspot.com/upstream/fixed
> >
> > I found a significant number of bugs that I would qualify as "rdma-cm"
> > and that have reproducers. Here is an incomplete list (I did not get
> > to the end):
> >
> > https://syzkaller.appspot.com/bug?id=b8febdb3c7c8c1f1b606fb903cee66b21b2fd02f
> > https://syzkaller.appspot.com/bug?id=d5222b3e1659e0aea19df562c79f216515740daa
> > https://syzkaller.appspot.com/bug?id=c600e111223ce0a20e5f2fb4e9a4ebdff54d7fa6
> > https://syzkaller.appspot.com/bug?id=a9796acbdecc1b2ba927578917755899c63c48af
> > https://syzkaller.appspot.com/bug?id=95f89b8fb9fdc42e28ad586e657fea074e4e719b
> > https://syzkaller.appspot.com/bug?id=8dc0bcd9dd6ec915ba10b3354740eb420884acaa
> > https://syzkaller.appspot.com/bug?id=805ad726feb6910e35088ae7bbe61f4125e573b7
> > https://syzkaller.appspot.com/bug?id=56b60fb3340c5995373fe5b8eae9e8722a012fc4
> > https://syzkaller.appspot.com/bug?id=38d36d1b26b4299bf964d50af4d79688d39ab960
> > https://syzkaller.appspot.com/bug?id=25e00dd59f31783f233185cb60064b0ab645310f
> > https://syzkaller.appspot.com/bug?id=2f38d7e5312fdd0acc979c5e26ef2ef8f3370996
> >
> > Do you mean some specific subset of bugs by "rdma-cm"? If yes, what is
> > that subset?
>
> The race condition bugs never seem to get reproducers; I checked a few
> of the above and those are much more deterministic things.
>
> I think the recurrence rate for the races is probably too low?

Yes, it may well depend on probability. There is usually a significant
correlation with the number of crashes. This bug happened only once,
which usually means either a very hard to trigger race condition or a
previously induced memory corruption. For harder to trigger race
conditions, KCSAN (the data race detector) may help in the future.
However, the kernel has too many data races to report them all to
mailing lists:
https://syzkaller.appspot.com/upstream?manager=ci2-upstream-kcsan-gce
That said, some race conditions are feasible to trigger, and I think we
have hundreds of race conditions with reproducers on the dashboard.
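
For illustration only, here is a minimal userspace sketch of the pattern
in case 2 of the quoted analysis: the state moves away from the query
state without the cancel being done, so the still-pending work later
runs its handler on freed memory. The names (fake_id, fake_addr_work,
FAKE_ADDR_QUERY) are made up for the sketch and are not the actual
drivers/infiniband code; it only mimics the shape of the bug. Built with
gcc -Wall -pthread -fsanitize=address, ASan reports a heap-use-after-free
in the worker, analogous to the KASAN report in addr_handler():

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

enum fake_state { FAKE_ADDR_QUERY, FAKE_ADDR_BOUND };

struct fake_id {
        enum fake_state state;
        pthread_mutex_t lock;
};

/* Stands in for the queued address-resolution work / its handler. */
static void *fake_addr_work(void *arg)
{
        struct fake_id *id = arg;

        usleep(100 * 1000);              /* the work runs "later" */
        pthread_mutex_lock(&id->lock);   /* use-after-free if id was freed */
        printf("handler sees state %d\n", id->state);
        pthread_mutex_unlock(&id->lock);
        return NULL;
}

int main(void)
{
        struct fake_id *id = calloc(1, sizeof(*id));
        pthread_t work;

        pthread_mutex_init(&id->lock, NULL);
        id->state = FAKE_ADDR_QUERY;

        /* 1. Work is queued while the id is in the query state. */
        pthread_create(&work, NULL, fake_addr_work, id);

        /* 2. Something moves the state away from the query state
         *    without cancelling the pending work. */
        pthread_mutex_lock(&id->lock);
        id->state = FAKE_ADDR_BOUND;
        pthread_mutex_unlock(&id->lock);

        /* 3. The destroy path only cancels when the state is still
         *    "query", so here the sync cancel is skipped and the id
         *    is freed with the work still pending. */
        if (id->state == FAKE_ADDR_QUERY) {
                /* would synchronously cancel and wait for the work */
        }
        free(id);

        pthread_join(work, NULL);        /* the work fires after the free */
        return 0;
}

The point of the sketch is that a cancel gated on the current state
gives no guarantee once another path has already moved the state on;
the pending work then outlives the object it was queued against.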
