Message-ID: <20090604170820.GA9823@prithivi.gnumonks.org>
Date:	Thu, 4 Jun 2009 19:08:20 +0200
From:	Harald Welte <HaraldWelte@...tech.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Duane Griffin <duaneg@...da.com>,
	"Michael S. Zick" <lkml@...ethan.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

Dear Linus and others,

On Thu, Jun 04, 2009 at 09:13:15AM -0700, Linus Torvalds wrote:

> > There have been reports of hangs on various VIA C7 machines going back
> > a year now. The version of the kernel doesn't seem to matter, but the
> > version of glibc does. Unfortunately there hasn't been much progress
> > in getting to the bottom of it.
> > 
> > See here (and other linked reports):
> > http://bugs.gentoo.org/show_bug.cgi?id=228263
> 
> Hmm. That looks like a CPU problem, but hey, it might be that the glibc 
> version thing is just coincidence, and just changes timings or whatever, 
> and the problem is in the chipsets.
> 
> So at least from that particular report it smells very much 
> non-kernel-related.
> 
> That said, even if it isn't kernel-related, it might be fixable with some 
> kernel patch that changes the setup of the CPU/chipset. But we'd need VIA 
> to help with anything like that.

So far, no issue/bug involving such hangs/lock-ups is known inside VIA at
all.

I have seen a number (probably between 5 and 10) of sporadic reports from a
number of people on a variety of systems.  Some from actual commercial vendors
of VIA+Linux based appliances, and some from the wider community of end users.
So far, to the best of my knowledge, none of those issues has been narrowed
down to a sufficiently easy to reproduce test case.  Also, none of the bug
reporters has so far been able to reproduce the problem on a genuine VIA
mainboard, i.e. it could be issues introduced by the actual board hardware or
how the specific BIOS initializes the low-level hardware.

Especially when SMI/SMM based debugging no longer works (i.e. something that
appears to be a bus lockup), the actual bug needs to be reproduced on a
reference board that can be hooked up to a logic/protocol analyzer.

On the other hand, VIA's CPU division (CentaurLabs) is performing extensive
testing on their CPUs with a large codebase of x86 code, AFAIK based on more
than 40 operating systems.  Also, there are large quantities of VIA CPU+chipset
systems that run without any problem, especially in 24/7 embedded x86 workloads
on Linux...

I'm more than determined to help resolve those sporadic Linux lock-up
problems. It feels like there is some problem out there, given the fact that
there are a number of independent reporters who talk about some kind of hard
system hang without oops that even prevents the NMI watchdog from kicking in.

However, we cannot make progress unless we can somehow narrow down at least one
of those reports into something that is easier to reproduce, and which can
actually be reproduced on a VIA board.  Triggering in 1-4 hours is already very
good; I have reports where 1 of 30 systems exposes a lock-up once within 5 days
of continuous full application workload.

Sure, third party BIOS/board vendors selling products that randomly produce
locks are obviously also not a particularly great advertisement for VIA...
but debugging on such a board is much more difficult due to the lack of access
to BIOS sources, schematics and hardware debugging interfaces.

In any case, if somebody can ship me a system that exposes one of those
lock-ups, together with a pre-installed test case that exposes the problem
within let's say less than one day, plus the full kernel sources used in
that particular system:  I'm happy to spend time to investigate the issue,
try to run the same test case on a VIA board, etc.

Any additional help is much appreciated.

Regards,
-- 
- Harald Welte <HaraldWelte@...tech.com>	    http://linux.via.com.tw/
============================================================================
VIA Free and Open Source Software Liaison