Message-ID: <b148ec48-0338-3549-7250-2b40d865d517@ozlabs.ru>
Date:   Fri, 7 Aug 2020 11:43:57 +1000
From:   Alexey Kardashevskiy <aik@...abs.ru>
To:     Dominique Martinet <asmadeus@...ewreck.org>
Cc:     v9fs-developer@...ts.sourceforge.net, Greg Kurz <groug@...d.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: 9p/trans_fd lockup



On 06/08/2020 22:38, Dominique Martinet wrote:
> Alexey Kardashevskiy wrote on Thu, Aug 06, 2020:
>> I am seeing another bug in 9p under syzkaller, the reprocase is:
>>
>> r0 = open$dir(&(0x7f0000000040)='./file0\x00', 0x88142, 0x182)
>>
>> r1 = openat$null(0xffffffffffffff9c, &(0x7f0000000640)='/dev/null\x00',
>> 0x0, 0x0)
>> mount$9p_fd(0x0, &(0x7f0000000000)='./file0\x00',
>> &(0x7f00000000c0)='9p\x00', 0x0, &(0x7f0000000100)={'trans=fd,',
>> {'rfdno', 0x3d, r1}, 0x2c, {'wfdno', 0x3d, r0}})
>>
>>
>>
>> The default behaviour of syzkaller is to call syscalls concurrently (I
>> think), at least it forks by default and executes the same sequence in
>> both threads.
>>
>> In this example both threads make it to:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/9p/client.c?h=v5.8#n757
>>
>> and sit there; the only difference is that thread #1 goes via
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/9p/client.c?h=v5.8#n767
>>
>> I am pretty sure things should not have gone that far but I cannot
>> clearly see what needs fixing. Ideas? Thanks,
> 
> Unkillable threads there happen with the current p9_client_rpc when
> there is no real server (or server bug etc); the code is really stupid.
> 
> Basically what happens is that when you send a first signal (^C or
> whatever), the function catches the signal, sends a flush, and
> waits indefinitely for the flush reply to come back.
> If you send another signal, no further flush is sent and it just goes
> back to waiting. It uses wait_event_killable, but that's a lie: it's not
> really killable, it just loops on that wait until the flush reply
> finally arrives, which will never happen in your case.
> (The rpc that was pending, by the way, is probably version, or whatever
> is done first on mount.)
> 
> 
> Dmitry reported that to me ages ago and I have a fix which is just to
> stop waiting for the flush -- just make it asynchronous, send and
> forget. That removes the whole signal handling logic and it won't hang
> there anymore.
> 
> I sent the patches to the list last year, but didn't get much feedback
> and didn't have time to run all the tests I wanted to run on it.
> 
> 
> I have some free time at the end of the month so I was planning to
> finish it for 5.10 (i.e. I won't send it for 5.9, but once the 5.9
> merge window has passed I'll leave it in -next for a couple of months
> and push it for 5.10), so your timing is pretty good :)
> An extra pair of eyes would be more than appreciated.
> 
> You can find the original mails there:
> https://lore.kernel.org/lkml/1544532108-21689-3-git-send-email-asmadeus@codewreck.org/
> 
> They're also in my 9p-test branch on git://github.com/martinetd/linux

Thanks for the patches, they do fix my case and I'll continue with
them. Let's see what else syzkaller finds :)
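
For reference, the hang explained in the quoted text above can be
modelled in user space roughly as below. This is only a sketch under
stated assumptions, not the kernel code and not the actual patch: every
name in it (model_wait, rpc_sync_flush_returns, rpc_async_flush_returns)
is hypothetical, a "wait" is reduced to a function call, and a server
that never answers (rfdno pointing at /dev/null) is modelled by
server_replies == false.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum wait_result { WAIT_DONE, WAIT_INTERRUPTED, WAIT_STUCK };

/*
 * Models one killable wait. It completes if the server replies;
 * otherwise it is interrupted while signals remain, and once there
 * are none left it would sleep forever (WAIT_STUCK).
 */
static enum wait_result model_wait(bool server_replies, int *signals_left)
{
    if (server_replies)
        return WAIT_DONE;
    if (*signals_left > 0) {
        (*signals_left)--;
        return WAIT_INTERRUPTED;
    }
    return WAIT_STUCK;
}

/*
 * Synchronous flush, as described in the quoted explanation: the first
 * signal triggers a flush, and every later signal is swallowed while
 * the caller loops waiting for the flush reply.
 * Returns true if the call ever gets back to its caller.
 */
bool rpc_sync_flush_returns(bool server_replies, int signals)
{
    enum wait_result r = model_wait(server_replies, &signals);

    if (r != WAIT_INTERRUPTED)
        return r == WAIT_DONE;

    /* Flush sent here; now wait for its reply, ignoring signals. */
    for (;;) {
        r = model_wait(server_replies, &signals);
        if (r == WAIT_DONE)
            return true;
        if (r == WAIT_STUCK)
            return false;   /* no reply, no exit: the task hangs */
        /* WAIT_INTERRUPTED: drop the signal and wait again */
    }
}

/*
 * Asynchronous flush ("send and forget"): on the first signal the
 * flush is queued and the call returns without waiting for its reply.
 */
bool rpc_async_flush_returns(bool server_replies, int signals)
{
    enum wait_result r = model_wait(server_replies, &signals);

    return r != WAIT_STUCK;
}
```

With a silent server and a few signals, rpc_sync_flush_returns(false, 3)
reports that the call never returns, while rpc_async_flush_returns(false, 3)
reports that it does. With no signal at all both variants keep waiting,
which is expected since nothing asked them to stop.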


> 
> 
> Cheers & thanks for the attention,
> 

-- 
Alexey