lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170104014753.GE1555@ZenIV.linux.org.uk>
Date:   Wed, 4 Jan 2017 01:47:53 +0000
From:   Al Viro <viro@...IV.linux.org.uk>
To:     Tuomas Tynkkynen <tuomas@...era.com>
Cc:     v9fs-developer@...ts.sourceforge.net,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: 9pfs hangs since 4.7

On Wed, Jan 04, 2017 at 01:34:50AM +0200, Tuomas Tynkkynen wrote:

> I got these logs from the server & client with 9p tracepoints enabled:
> 
> https://gist.githubusercontent.com/dezgeg/02447100b3182167403099fe7de4d941/raw/3772e408ddf586fb662ac9148fc10943529a6b99/dmesg%2520with%25209p%2520trace
> https://gist.githubusercontent.com/dezgeg/e1e0c7f354042e1d9bdf7e9135934a65/raw/3a0e3b4f7a5229fd0be032c6839b578d47a21ce4/qemu.log
> 

Lovely...

27349:out 8 0001 		TSTATFS, tag 1
27350:out 12 0001		TLOPEN, tag 1
27351:complete 9 0001		RSTATFS, tag 1
27352:complete 13 0001		RLOPEN, tag 1

27677:out 8 0001		TSTATFS, tag 1
27678:out 12 0001		TLOPEN, tag 1
27679:complete 9 0001		RSTATFS, tag 1
27680:complete 13 0001		RLOPEN, tag 1

29667:out 8 0001		TSTATFS, tag 1
29668:out 110 0001		TWALK, tag 1
29669:complete 9 0001		RSTATFS, tag 1
29670:complete 7 0001		RLERROR, tag 1

42317:out 110 0001		TWALK, tag 1
42318:out 8 0001		TSTATFS, tag 1
42319:complete 9 0001		RSTATFS, tag 1
42320:complete 7 0001		RERROR, tag 1

Those are outright protocol violations: tag can be reused only after either
a reply bearing the same tag has arrived *or* TFLUSH for that tag had been
issued and successful (type == RFLUSH) reply bearing the tag of that TFLUSH
has arrived.  Your log doesn't contain any TFLUSH (108) at all, so it should've
been plain and simple "no reuse until server sends a reply with matching tag".

Otherwise the the dump looks sane.  In the end all issued requests had been
served, so it's not as if the client had been waiting for a free tag or for
a response to be produced by the server.

Interesting...  dmesg doesn't seem to contain the beginning of the 9P trace -
only 89858 out of 108818 in the dump, even though it claims to have lost
only 4445 events.  The last exchanges in the trace are
> P9_TLOPEN tag  90 err 0
> P9_TLOPEN tag  25 err 0
> P9_TLOPEN tag  15 err 0
> P9_TLOPEN tag  104 err 0
> P9_TLOPEN tag  102 err 0
> P9_TLOPEN tag  91 err 0
> P9_TLOPEN tag  12 err 0
< P9_TREADLINK tag  12

which should've been
complete 13 005a
complete 13 0019
complete 13 000f
complete 13 0068
complete 13 0066
complete 13 005b
out 22 000c

and the only plausible match for that is

108784:complete 13 005a
108798:complete 13 000f
108799:complete 13 0019
108800:complete 13 0068
108801:complete 13 0066
108802:complete 13 005b

but there's a bunch of replies in between that hadn't been logged *and* such
a TREADLINK request isn't seen after
108341:out 22 000c

... and sure enough, there's a whole lot of processes stuck in readlink.

Interesting...  Which version of qemu/what arguments are you using there?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ