linux-kernel - Re: Serial related oops

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20070222170354.GB633@flint.arm.linux.org.uk>
Date:	Thu, 22 Feb 2007 17:03:54 +0000
From:	Russell King <rmk+lkml@....linux.org.uk>
To:	Jose Goncalves <jose.goncalves@...v.pt>
Cc:	Frederik Deweerdt <deweerdt@...e.fr>, akpm@...ux-foundation.org,
	linux-kernel@...r.kernel.org
Subject: Re: Serial related oops

On Thu, Feb 22, 2007 at 03:02:46PM +0000, Jose Goncalves wrote:
> It could be a silly question (tamper with me as I'm not familiar with
> such low level programming), but couldn't it be possible for a interrupt
> to hit in the middle of the serial_in() calls and mess with %ebx?

I'm no expert on x86, but if an interrupt was messing with %ebx, you'd
have random crashes verywhere - userspace, kernel space in unpredicatable
ways.

> What I find real hard to understand is why a hardware fault happens
> always in the same software instruction! I would expect a hardware fault
> to hit randomly...

Well, compared with your previous report, your latest report is different.
Your first report had  both EIP and %ebx being zero (because they got
corrupted when returning from serial_in).  This time only %ebx was
corrupted.

Consequently, this time we oopsed in the subsequent serial_in() rather
than trying to return to serial8250_startup() as last time.

> I left my application running this night, with a 2.6.16.41 kernel
> unpatched  on the serial driver (my last Oops report was with Frederik
> patch to remove the insertion made in 2.6.12) and it crashed again on
> exactly the same point!

>From that I take it that you removed the test in serial8250_startup which
sets UART_BUG_TXEN, and the problem persisted.  That tends to suggest
that it's not the culpret.

> > For all we know, it could be a one-off fault on the hardware you
> > happen to have - other identical units may not behave the same (can
> > you check?)
> 
> Yes I have other units that I can test it. I'll do that to see if it's
> really a one-off fault on the hardware.

Would be nice to know.

> If it continues to crash with other units I will then test with the
> msleep(10) before the "And clear the interrupt registers again for
> luck.", as you suggested earlier.
> 
> > If it is a one off case, you are welcome to patch that test out in
> > your kernel build to remove the problem, and if it's an isolated case
> > I encourage you to do this.  This is one of the great advantages of
> > open source - if you hit such a problem rather than throwing the
> > hardware away you can work around such issues.
> 
> I didn't understand what you mean by "you are welcome to patch that test
> out in your kernel build to remove the problem". Which test are you
> talking about?

The one which sets UART_BUG_TXEN.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/