Date:	Sat, 14 Jun 2008 18:07:03 -0700 (PDT)
From:	David Miller <davem@...emloft.net>
To:	torvalds@...ux-foundation.org
Cc:	rjw@...k.pl, linux-kernel@...r.kernel.org, bunk@...nel.org,
	akpm@...ux-foundation.org, protasnb@...il.com
Subject: Re: 2.6.26-rc6-git2: Reported regressions from 2.6.25

From: Linus Torvalds <torvalds@...ux-foundation.org>
Date: Sat, 14 Jun 2008 17:41:24 -0700 (PDT)

> IOW, I'm pretty damn sure that the bug entry above is very much a result
> of the tcp_defer_accept_check() thing, and that commit ec0a196626 fixed
> it by reverting it. 

I agree with the gist of your analysis.

And it seems that Apache does try to use the deferred accept socket
option.  So we may indeed have a hit on this IA64 bug.

The wording in the report about versions is a little confusing:

	With kernel 2.6.26-rc5 and a git kernel just between rc4
	and rc5, my kernel panic...

Does this mean that the problem appeared between rc4 and rc5?  Or
that all 2.6.26-rcX releases have the problem?  That's an important
fact because the change in question showed up in 2.6.26-rc1, as it
came in the initial networking merge for the 2.6.26 merge window.

> > The behavior of that bug would not usually be a crash, but
> > rather stuck connections, and I severely doubt anything in
> > that specweb test setup is using the deferred-accept option
> > which is a requirement for hitting those problems.
> 
> Hey, I might be wrong. But see above. I don't think I am. I think the 
> deferred-accept was just even buggier than you believed.

Because of the requirements to trigger the new code, the SSH case is
not likely to match the revert.  SSH absolutely does not use the
deferred accept socket option.

Let's look at the change in question.

Every single code path touched in the data paths is guarded
by "tp->defer_tcp_accept.request", which will be NULL unless
1) the defer-accept socket option is enabled and 2) a new connection
got queued up there.
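
For reference, condition 1) is something a server has to ask for
explicitly on its listening socket.  A minimal userspace sketch of how
that request is typically made on Linux follows.  It is illustrative
only and not taken from Apache or distcc; the helper names, the
loopback address, the backlog and the timeout value are all
assumptions.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to hold back accept() wakeups on
 * a TCP socket until data arrives or roughly timeout_secs elapse.
 * Returns 0 on success, -1 on error (errno is set by setsockopt).
 */
static int enable_defer_accept(int fd, int timeout_secs)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
			  &timeout_secs, sizeof(timeout_secs));
}

/*
 * Hypothetical helper: create a loopback listener on an ephemeral port
 * with defer-accept enabled.  Returns the socket fd, or -1 on failure.
 */
static int make_deferred_listener(int timeout_secs)
{
	struct sockaddr_in addr = { 0 };
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    enable_defer_accept(fd, timeout_secs) < 0 ||
	    listen(fd, 16) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A daemon that never makes this setsockopt() call on its listening
socket never satisfies condition 1), and a program that only makes
outgoing connect() calls has no listening socket to set it on in the
first place.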

Nothing about the normal accept queue handling got modified by those
changes which were reverted.

And note that this means the behavior change only hits listening
sockets.  So if we have a report that client outgoing SSH
connections hang with the current kernel, that report cannot
reasonably match this revert.

I also anticipate that if this change could trigger problems for
non-deferred-accept cases, we'd see a ton more reports than we have.

And we did some research, and the only major servers that use
this obscure defer-accept feature are distcc and Apache.  It is this
element of Ingo's bug report (that he uses distcc heavily and it was a
distcc socket which hung) that helped us narrow things down.

The SSH report clearly states "With kernel 2.6.26-rc5, ssh connections
to _remote_ servers randomly hang".  So this is a report about SSH
client connections under 2.6.26-rc5, not SSH server connections and
therefore not listening sockets.

So right now I'd say that the IA64 case could definitely be a match
but the SSH case very much is not.