linux-kernel - Re: Linux 2.6.30-rc8 [also: VIA Support]


Message-Id: <200906041218.44683.lkml@morethan.org>
Date:	Thu, 4 Jun 2009 12:18:42 -0500
From:	"Michael S. Zick" <lkml@...ethan.org>
To:	Harald Welte <HaraldWelte@...tech.com>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Duane Griffin <duaneg@...da.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu June 4 2009, Harald Welte wrote:
> Dear Linus and others,
> 
> On Thu, Jun 04, 2009 at 09:13:15AM -0700, Linus Torvalds wrote:
> 
> > > There have been reports of hangs on various VIA C7 machines going back
> > > a year now. The version of the kernel doesn't seem to matter, but the
> > > version of glibc does. Unfortunately there hasn't been much progress
> > > in getting to the bottom of it.
> > > 
> > > See here (and other linked reports):
> > > http://bugs.gentoo.org/show_bug.cgi?id=228263
> > 
> > Hmm. That looks like a CPU problem, but hey, it might be that the glibc 
> > version thing is just coincidence, and just changes timings or whatever, 
> > and the problem is in the chipsets.
> > 
> > So at least from that particular report it smells very much 
> > non-kernel-related.
> > 
> > That said, even if it isn't kernel-related, it might be fixable with some 
> > kernel patch that changes the setup of the CPU/chipset. But we'd need VIA 
> > to help with anything like that.
> 
> So far, inside VIA there is no well-known issue/bug about such hangs / locks at
> all.
> 
> I have seen a number (probably between 5 and 10) of sporadic reports from a
> number of people on a variety of systems.  Some from actual commercial vendors
> of VIA+Linux based appliances, and some from the wider community of end users.
> So far, to the best of my knowledge, none of those issues has been narrowed
> down to a sufficiently easy-to-reproduce test case.  Also, none of the bug
> reporters has so far been able to reproduce the problem on a genuine VIA
> mainboard, i.e. it could be issues introduced by the actual board hardware or
> how the specific BIOS initializes the low-level hardware.
> 
> Especially when SMI/SMM based debugging no longer works (i.e. something that
> appears to be a bus lockup), the actual bug needs to be reproduced on a
> reference board that can be hooked up to a logic/protocol analyzer.
> 
> On the other hand, VIA's CPU division (CentaurLabs) is performing extensive
> testing on their CPUs with a large codebase of x86 code, AFAIK based on more
> than 40 operating systems.  Also, there are large quantities of VIA CPU+chipset
> systems that run without any problem, especially in 24/7 embedded x86 workloads
> on Linux...
> 
> I'm more than determined to help resolve those sporadic Linux lock-up
> problems. It feels like there is some problem out there, given the fact that
> there are a number of independent reporters who talk about some kind of hard
> system hang without oops that even prevents the NMI watchdog from kicking in.
> 
> However, little can be done unless we can somehow narrow down at least one of
> those reports into something that is easier to reproduce, and which can
> actually be reproduced on a VIA board.  Triggering in 1-4 hours is already
> very good; I have reports where 1 of 30 systems exposes a lock once within
> 5 days of continuous full application workload.
> 
> Sure, third party BIOS/board vendors selling products that randomly produce
> locks are obviously also not a particularly great advertisement for VIA...
> but debugging on such a board is much more difficult due to the lack of access
> to BIOS sources, schematics and hardware debugging interfaces.
> 
> In any case, if somebody can ship me a system that exposes one of those
> lock-ups, together with a pre-installed test case that exposes the problem
> within let's say less than one day, plus the full kernel sources used in
> that particular system:  I'm happy to spend time to investigate the issue,
> try to run the same test case on a VIA board, etc.
> 

I am about at my wits' end with this Everex product -

Give me a couple more weeks at the problem, and if I haven't solved it,
I'll give you this machine if you promise to update LKML with any fix.

Mike
> Any additional help is much appreciated.
> 
> Regards,
