Message-ID: <AANLkTinFlyz2d4Y6Viw0aWfe6mb6F7Ev4UCvkkQVFADd@mail.gmail.com>
Date:	Mon, 17 May 2010 09:44:47 -0400
From:	Donald Allen <donaldcallen@...il.com>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	john stultz <johnstul@...ibm.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Arjan van de Ven <arjan@...radead.org>
Subject: Re: PROBLEM: tickless scheduling

On Sun, May 16, 2010 at 7:36 PM, Thomas Gleixner <tglx@...utronix.de> wrote:
> Donald,
>
> On Sat, 15 May 2010, Donald Allen wrote:
>> Attached. This is from the 2.6.30 kernel on the Arch Linux install cd.
>>
>> Here's another bit of data. As I've said previously, the problems I'm
>> reporting were observed on a Toshiba NB310-305 netbook with a
>> single-core Atom 450 processor. I just built myself a mini-ITX system
>> using the Intel D510MO motherboard, which provides a dual-core D510
>> Atom processor. The other hardware on the board is similar to the
>> Toshiba. I installed the same Slackware snapshot I used on the
>> Toshiba, and did the home directory transfer without any problem at
>> all with the default tickless kernel. The hardware isn't identical,
>> and while I don't know the internals of the Linux kernel at all, my
>> gut, backed up by many years of OS development work in scheduling and
>> memory management, is telling me that the key difference is dual- vs.
>> single-core. Just a guess.
>
> I fear you are wrong.

Please don't be afraid.

>
> The key difference is almost certainly that the BIOS of your netbook
> tries to be overly clever vs. power management and is not aware of the
> fact that the Linux kernel uses timer hardware in a very different way
> than the other OS which comes preinstalled on that machine.
>
> The overly clever BIOS power management which works nicely with the
> vendor provided "drivers" for the other OS is just interfering with
> the kernel's way of dealing with the problem.
>
> Can you please boot with "hpet=disable" on the kernel command line ?

I did, and it made no difference.

To be specific, the test I am doing involves booting with the Arch
Linux 2009.08 install/live cd. I then run

fsck.ext2 -f -r /dev/sda3

to do a read-only check of my root filesystem. I watch the
disk-activity light: it reliably goes out, and the check then stalls
for a long time unless I intervene. Tickling the touchpad gets things
moving again. This happens reliably with or without the boot-time
option you requested above.
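
For anyone repeating this test, here is a minimal sketch of one way to
confirm that the option actually reached the kernel before drawing
conclusions. The helper name has_boot_param is hypothetical and not
part of the original report; it only assumes that /proc/cmdline is
readable, and it prints the active clocksource if sysfs exposes it.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): succeed if the
# given parameter appears as a whole word on the kernel command line.
has_boot_param() {
    param=$1
    cmdline_file=${2:-/proc/cmdline}
    [ -r "$cmdline_file" ] || return 1
    # Put each parameter on its own line, then require an exact match.
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}

if has_boot_param hpet=disable; then
    echo "hpet=disable is on the kernel command line"
else
    echo "hpet=disable is NOT on the kernel command line"
fi

# Report the clocksource currently in use, if the file is available.
cs=/sys/devices/system/clocksource/clocksource0/current_clocksource
if [ -r "$cs" ]; then
    echo "current clocksource: $(cat "$cs")"
fi
```

The exact-match test is deliberate, so that a partial string such as
"hpet" does not count as "hpet=disable".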

I just noticed something else, however, that may lend credence to the
opinion expressed by Arjan van de Ven that this has nothing to do with
tickless. I originally noticed this problem on the Toshiba netbook
when I installed the Slackware 13.1 x86_64 beta on this machine, which
comes with a tickless 2.6.33.3 kernel. The first symptom appeared when
I tried to rsync my home directory from another machine to this new
install: as previously described, I had to help things along by
touching the touchpad or pressing the ctrl key (any kind of external
stimulus that would generate an interrupt seemed to work).
Anyway, after some discussion with Patrick Volkerding, I decided to
build a custom kernel for the netbook and disabled tickless in that
kernel. After getting that kernel working, I re-did the tests that
failed with the tickless kernel and they all worked fine, so I thought
we had our culprit. But just now, after doing the test you requested
above, I rebooted the system from its installed kernel (the tickful
kernel I built), and it hung during booting. At first I thought it was
taking a while to do the dance with the dhcp server, but after waiting
longer than I thought this should take, I touched the touchpad and
forward progress began again immediately (disk light came on, boot
time chatter proceeded, etc.). So, my current guess, for what it's
worth, is that there's a race here that causes the system to miss the
fact that it has a runnable process, and the probability of hitting it
is reduced, but not to zero, by using tickful scheduling.

I will do the experiment suggested by Arjan van de Ven and report the
results of that separately.

/Don


>
> Thanks,
>
>        tglx
>