Message-Id: <20230725160453.40605-1-falcon@tinylab.org>
Date:   Wed, 26 Jul 2023 00:04:53 +0800
From:   Zhangjin Wu <falcon@...ylab.org>
To:     w@....eu
Cc:     arnd@...db.de, falcon@...ylab.org, linux-kernel@...r.kernel.org,
        linux-kselftest@...r.kernel.org, thomas@...ch.de
Subject: Re: [PATCH v2 14/14] selftests/nolibc: tinyconfig: add support for 32/64-bit powerpc

Hi, Willy

> On Wed, Jul 19, 2023 at 09:32:46PM +0800, Zhangjin Wu wrote:
> > Firstly, add extra config files for powerpc, powerpc64le and powerpc64.
> > 
> > Second, QEMU_TIMEOUT is configured as 60 seconds for powerpc to allow
> > quit qemu-system-ppc even if poweroff fails. In normal host machine, ~20
> > seconds may be enough for boot+test+poweroff, but 60 seconds is used
> > here to guarantee it at least finishes even in a very slow host machine or
> > the host machine is too busy. Both powerpc64le and powerpc64 can
> > poweroff normally, no need to configure QEMU_TIMEOUT for them.
> 
> Hmmm call me annoying, but this started with tinyconfig "in order to
> save build time" and now it's enforcing a 1-minute timeout on a single
> test. When I run the tests, they hardly last more than a few seconds
> and sometimes even just about one second. If some tests last too long
> doing nothing, we should adjust their config (e.g. useless probe of a
> driver). If they can't power off due to a config option we need to fix
> that option. If it can't power off due to the architecture, we can also
> try the reboot (qemu is started with --no-reboot to stop instead of
> rebooting), and as a last resort we should rely on the timeout in case
> everything else fails. But then this timeout should be quite short
> because we'll then have guaranteed from the choice of config options
> that it boots and executes fast by default.
>

As I just explained in this reply [1], our current timeout logic first
watches for the 'power off' string, so the 1 minute is only the worst
case, reached when qemu does not even print a 'power off' string, which
would be a bug. Normally, once the 'power off' string is detected, qemu
quits as expected. The 1 minute is configured here only as a last
watchdog to detect a real hang (maybe BIOS related or kernel related) ;-)

So the 60 seconds will not be reached even when poweroff fails; still,
a smaller value may be ok. What about 30 seconds?

[1]: https://lore.kernel.org/lkml/20230725145955.37685-1-falcon@tinylab.org/

> Finally, if we need to implement a timeout enforcement for at least
> one arch because we do not control every failure case, then there's no
> reason for considering that certain archs are safe against this and
> others not. This means that we can (should?) implement the timeout by
> default for every arch,

Agreed. So, what would you suggest for the default timeout? ;-)

10 or 15 seconds may not be enough, especially when running on a very
slow host machine. For example, my host becomes very slow when the
battery is not charging ;-(

Also, architectures like PowerPC that use the very slow SLOF boot very
slowly: sometimes 20 seconds is not enough, and it may take 30+ seconds
on a very slow machine.

> and make sure that the timeout is never hit by
> default

Yeah, that is the current behavior.

> , unless there's really absolutely no way to fix the arch that
> cannot power down nor reboot,

Even when the kernel does not support poweroff, the 'power off' string
is printed after our 'reboot' syscall; our current timeout logic detects
it and lets qemu quit. We even plan to detect the 'Leaving init with
final status' line.

So it is not necessary to spend too much time finding and enabling
kernel power-off support for every architecture. Some architectures may
simply not support power-off, and others require too many 'heavy'
options to make power-off work, which may increase the tinyconfig build
time a lot. For example, ACPI+PCI support is required for power-off on
x86.

> in which case the timeout should remain
> short enough.
> 
> What's your opinion ?
>

In summary, with the current timeout logic, a big timeout is only hit when
a real hang happens. Even when the kernel does not support power-off, we
detect the power-off string and qemu is killed with pkill.

So, how about a not-so-big default timeout for every architecture, while
still allowing an architecture to configure a bigger one?

    QEMU_TIMEOUT_powerpc     = 35
    QEMU_TIMEOUT             = $(or $(QEMU_TIMEOUT_$(XARCH)),30)

I will retest them carefully. I'm still worried that a too-small timeout
may kill qemu during the tests or even before they start, when it would have
run the tests and powered off normally had we not killed it.

Going even further, I'm thinking about detecting a boot hang as early
as we can. For example, these lines are useful for us:

    // first line to detect bios hang, maybe 5 seconds?
    Linux version 6.4.0+ ...

    // second line to detect kernel boot hang, maybe 10 or 15 seconds?
    Run /init as init process

    // third line to detect test hang, ...
    Leaving init with final status

    // fourth line to detect power-off
    reboot: Power down

So, even if we configure a big timeout, we can use smaller default
hang-detection settings for BIOS hang, kernel hang and test hang. That kills
qemu as early as we can when a hang happens, with no need to wait for the
configured timeout value.
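
To make the staged idea concrete, here is a minimal offline sketch in
Python. It is not the nolibc Makefile logic, only an illustration: it
replays a list of (timestamp, console line) pairs and reports which stage
would have been declared hung. The function name find_hang, the STAGES
table and the 10 and 5 second budgets for the last two stages are
assumptions; only the marker strings and the 5 and 15 second guesses come
from the list above.

```python
# Each stage: (console marker, seconds allowed since the previous stage).
# Budgets for the last two stages are hypothetical placeholders.
STAGES = [
    ("Linux version", 5.0),                    # BIOS/firmware hang
    ("Run /init as init process", 15.0),       # kernel boot hang
    ("Leaving init with final status", 10.0),  # test hang (assumed budget)
    ("reboot: Power down", 5.0),               # power-off (assumed budget)
]


def find_hang(events, stages=STAGES):
    """Return the marker of the stage that hung, or None if all stages passed.

    events is an ordered iterable of (seconds_since_start, console_line).
    A stage hangs when its marker has not shown up within its budget,
    measured from the moment the previous marker was seen.
    """
    stage = 0
    stage_start = 0.0
    for timestamp, line in events:
        if stage == len(stages):
            break  # every marker was already seen
        marker, budget = stages[stage]
        if timestamp - stage_start > budget:
            return marker  # output arrived after this stage's budget ran out
        if marker in line:
            stage += 1
            stage_start = timestamp
    # Output ended before the remaining markers appeared: treat as a hang.
    if stage < len(stages):
        return stages[stage][0]
    return None
```

A live watchdog would need a real timer rather than waiting for the next
console line, but the per-stage budget bookkeeping would stay the same.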

Best regards,
Zhangjin

> Willy
