Message-ID: <20210916160224.GP3544071@ziepe.ca>
Date: Thu, 16 Sep 2021 13:02:24 -0300
From: Jason Gunthorpe <jgg@...pe.ca>
To: Dmitry Vyukov <dvyukov@...gle.com>
Cc: syzbot <syzbot+dc3dfba010d7671e05f5@...kaller.appspotmail.com>,
dledford@...hat.com, leon@...nel.org, linux-kernel@...r.kernel.org,
linux-rdma@...r.kernel.org, syzkaller-bugs@...glegroups.com,
Aleksandr Nogikh <nogikh@...gle.com>
Subject: Re: [syzbot] KASAN: use-after-free Read in addr_handler (4)

On Thu, Sep 16, 2021 at 04:45:27PM +0200, Dmitry Vyukov wrote:
> It looks like a very hard-to-trigger race (few crashes, no reproducer,
> but the KASAN reports look sensible). That's probably the reason syzkaller
> can't create a reproducer.
> From the log it looks like it was triggered by one of these programs
> below. But I tried to reproduce manually and had no success.
> We are currently making some improvements to the race-triggering code in
> syzkaller, and may try to use this as a litmus test to see if
> syzkaller will do any better:
> https://github.com/google/syzkaller/issues/612#issuecomment-920961538

I would suggest looking at this:

https://patchwork.kernel.org/project/linux-rdma/patch/0-v1-9fbb33f5e201+2a-cma_listen_jgg@nvidia.com/

Which I think should be completely deterministic (just do the RDMA_CM
ops in the right order), but syzbot didn't find a reproducer.
The "healer" fork did however:
https://lore.kernel.org/all/CACkBjsY-CNzO74XGo0uJrcaZTubC+Yw9Sg1bNNi+evUOGaZTCg@mail.gmail.com/#r
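
For the addr_handler report in the Subject, the likely racing pair is
an in-flight rdma_resolve_addr() against rdma_destroy_id(). A minimal
userspace sketch of that ordering, assuming librdmacm (illustrative
only -- the syzkaller programs drive the kernel directly, and
rdma_destroy_id() is supposed to synchronize against the pending
handler, which is exactly the step in question):

	#include <arpa/inet.h>
	#include <netinet/in.h>
	#include <pthread.h>
	#include <stdlib.h>
	#include <rdma/rdma_cma.h>

	static struct rdma_cm_id *id;

	static void *resolve_thread(void *unused)
	{
		struct sockaddr_in dst = {
			.sin_family = AF_INET,
			.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
		};

		/* Kicks off async resolution; the kernel's addr_handler
		 * runs later from the resolver workqueue. */
		rdma_resolve_addr(id, NULL, (struct sockaddr *)&dst, 2000);
		return NULL;
	}

	int main(void)
	{
		struct rdma_event_channel *ch = rdma_create_event_channel();
		pthread_t t;

		if (!ch || rdma_create_id(ch, &id, NULL, RDMA_PS_TCP))
			exit(1);
		pthread_create(&t, NULL, resolve_thread, NULL);
		rdma_destroy_id(id);	/* races with the pending resolve */
		pthread_join(t, NULL);
		rdma_destroy_event_channel(ch);
		return 0;
	}
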
> Answering your question re what was running concurrently with what:
> each of the syscalls in these programs can run up to 2 times, and
> ultimately any of these calls can race with any other. Potentially syzkaller
> can predict values the kernel will return (e.g. id's) before the kernel
> actually returns them. I guess this does not restrict the search area for
> the bug a lot...

Well, it does help if it is only those system calls.

And I think I can discount the workqueue as a problem, as I'd expect a
KASAN hit on the 'req' allocation if the workqueue were malfunctioning -
thus I must conclude we are not calling work cancellation for some
reason.
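
To make that concrete, here is a minimal sketch of the pattern I mean
(assuming the usual delayed-work idiom; the names are stand-ins, not
the actual drivers/infiniband/core code, where the cancel side is
rdma_addr_cancel()):

	#include <linux/slab.h>
	#include <linux/workqueue.h>

	struct addr_req {		/* stand-in for the resolver's request */
		struct delayed_work work;
		void (*callback)(int status, void *context);
		void *context;		/* the id_priv */
	};

	/* resolve side: queue work that will call back into the cm_id */
	static void start_resolve(struct workqueue_struct *addr_wq,
				  struct addr_req *req, void *id_priv)
	{
		req->context = id_priv;
		queue_delayed_work(addr_wq, &req->work, 0);
	}

	/* destroy side, racing with the above: if this cancellation is
	 * skipped (or loses the race), the worker later dereferences the
	 * freed id_priv -- the UAF in the report.  If the workqueue itself
	 * were broken, the UAF would instead be on 'req', which the worker
	 * frees after the callback -- and that is not what KASAN shows. */
	static void destroy_id(struct addr_req *req, void *id_priv)
	{
		cancel_delayed_work_sync(&req->work); /* the step that seems missed */
		kfree(id_priv);
	}
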
Jason