linux-kernel - Re: 9pfs hangs since 4.7

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20170104220447.74f2265d@duuni>
Date:   Wed, 4 Jan 2017 22:04:47 +0200
From:   Tuomas Tynkkynen <tuomas@...era.com>
To:     Al Viro <viro@...IV.linux.org.uk>
CC:     <v9fs-developer@...ts.sourceforge.net>,
        <linux-fsdevel@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: 9pfs hangs since 4.7

On Wed, 4 Jan 2017 01:47:53 +0000
Al Viro <viro@...IV.linux.org.uk> wrote:

> On Wed, Jan 04, 2017 at 01:34:50AM +0200, Tuomas Tynkkynen wrote:
> 
> > I got these logs from the server & client with 9p tracepoints enabled:
> > 
> > https://gist.githubusercontent.com/dezgeg/02447100b3182167403099fe7de4d941/raw/3772e408ddf586fb662ac9148fc10943529a6b99/dmesg%2520with%25209p%2520trace
> > https://gist.githubusercontent.com/dezgeg/e1e0c7f354042e1d9bdf7e9135934a65/raw/3a0e3b4f7a5229fd0be032c6839b578d47a21ce4/qemu.log
> >   
> 
> Lovely...
> 
> 27349:out 8 0001 		TSTATFS, tag 1
> 27350:out 12 0001		TLOPEN, tag 1
> 27351:complete 9 0001		RSTATFS, tag 1
> 27352:complete 13 0001		RLOPEN, tag 1
> 
> 27677:out 8 0001		TSTATFS, tag 1
> 27678:out 12 0001		TLOPEN, tag 1
> 27679:complete 9 0001		RSTATFS, tag 1
> 27680:complete 13 0001		RLOPEN, tag 1
> 
> 29667:out 8 0001		TSTATFS, tag 1
> 29668:out 110 0001		TWALK, tag 1
> 29669:complete 9 0001		RSTATFS, tag 1
> 29670:complete 7 0001		RLERROR, tag 1
> 
> 42317:out 110 0001		TWALK, tag 1
> 42318:out 8 0001		TSTATFS, tag 1
> 42319:complete 9 0001		RSTATFS, tag 1
> 42320:complete 7 0001		RERROR, tag 1
> 
> Those are outright protocol violations: tag can be reused only after either
> a reply bearing the same tag has arrived *or* TFLUSH for that tag had been
> issued and successful (type == RFLUSH) reply bearing the tag of that TFLUSH
> has arrived.  Your log doesn't contain any TFLUSH (108) at all, so it should've
> been plain and simple "no reuse until server sends a reply with matching tag".
> 

Argh, I had completely forgotten there is another 9p mount involved, so the log
is mixed between the two mounts. Apologies, there is now a new trace attached
with the V9fsState pointer dumped in the mix as well.

> Otherwise the the dump looks sane.  In the end all issued requests had been
> served, so it's not as if the client had been waiting for a free tag or for
> a response to be produced by the server.
> 
> Interesting...  dmesg doesn't seem to contain the beginning of the 9P trace -
> only 89858 out of 108818 in the dump, even though it claims to have lost
> only 4445 events.  [...]

Here's logs that should be complete this time:

https://gist.githubusercontent.com/dezgeg/08629d4c8ca79da794bc087e5951e518/raw/a1a82b9bc24e5282c82beb43a9dc91974ffcf75a/9p.qemu.log
https://gist.githubusercontent.com/dezgeg/1d5f1cc0647e336c59989fc62780eb2e/raw/d7623755a895b0441c608ddba366d20bbf47ab0b/9p.dmesg.log

> 
> Interesting...  Which version of qemu/what arguments are you using there?

This is QEMU 2.7.0, with these on the server side:
    -virtfs local,path=/nix/store,security_model=none,mount_tag=store
    -virtfs local,path=$TMPDIR/xchg,security_model=none,mount_tag=xchg

and on the client side:

store on /nix/store type 9p (rw,relatime,dirsync,trans=virtio,version=9p2000.L,cache=loose)
xchg on /tmp/xchg type 9p (rw,relatime,dirsync,trans=virtio,version=9p2000.L,cache=loose)

'store' is the one receiving the hammering, 'xchg' I think is mostly sitting idle.