Date:	Fri, 20 May 2016 11:07:49 +0200
From:	Giovanni Gherdovich <ggherdovich@...e.com>
To:	Peter Zijlstra <peterz@...radead.org>,
	Davidlohr Bueso <dave@...olabs.net>
Cc:	manfred@...orfullife.com, Waiman.Long@....com, mingo@...nel.org,
	torvalds@...ux-foundation.org, mgorman@...hsingularity.net,
	linux-kernel@...r.kernel.org
Subject: Re: sem_lock() vs qspinlocks

On Fri, 2016-05-20 at 10:18 +0200, Peter Zijlstra wrote:
> On Fri, May 20, 2016 at 10:13:15AM +0200, Peter Zijlstra wrote:
> > On Thu, May 19, 2016 at 10:39:26PM -0700, Davidlohr Bueso wrote:
> > 
> > > [1] https://hg.java.net/hg/libmicro~hg-repo
> > 
> > So far I've managed to install mercurial and clone this thing, but it
> > doesn't actually build :/
> > 
> > I'll try harder..
> 
> The stuff needs this..
> 
> ---
> diff -r 7dd95b416c3c Makefile.com
> --- a/Makefile.com	Thu Jul 26 12:56:00 2012 -0700
> +++ b/Makefile.com	Fri May 20 10:18:08 2016 +0200
> @@ -107,7 +107,7 @@
>  	echo "char compiler_version[] = \""`$(COMPILER_VERSION_CMD)`"\";" > tattle.h
>  	echo "char CC[] = \""$(CC)"\";" >> tattle.h
>  	echo "char extra_compiler_flags[] = \""$(extra_CFLAGS)"\";" >> tattle.h
> -	$(CC) -o tattle $(CFLAGS) -I. ../tattle.c libmicro.a -lrt -lm
> +	$(CC) -o tattle $(CFLAGS) -I. ../tattle.c libmicro.a -lrt -lm -lpthread
>  
>  $(ELIDED_BENCHMARKS):	../elided.c
>  	$(CC) -o $(@) ../elided.c
> 

Hello Peter,


right, we forgot to mention that the libmicro Makefile is broken;
sorry for the hassle. At the bottom of this message you'll find the
script I use to reproduce the problem; you might have to modify the
variable $CASCADE_PATH.

The script needs an argument, which is the offending benchmark to run,
like

$ ./run_cascade.sh c_flock_200

or

$ ./run_cascade.sh c_cond_10

This runs the benchmark 10 times and kills it if it runs for too long.
I get around 3 hangs per invocation, and on the affected kernels (4.2
or later) I get around one panic per invocation of this reproducer.
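
To compare hang counts across kernels it can help to save the
reproducer's output and tally the per-run "hangs" lines afterwards. The
helper below is a minimal sketch, not part of the original script: the
name count_hangs and the log file names are made up for illustration,
and it assumes the "Run #N: <testcase> hangs" line format printed by
run_cascade.sh.

```shell
#!/bin/bash
# Hypothetical helper (not part of run_cascade.sh): count how many runs
# were reported as hung. Reads the named log files, or stdin if none.
count_hangs() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # the count is 0, so "|| true" keeps the function's status clean.
    cat -- "$@" | grep -c ' hangs$' || true
}
```

Assumed usage, with hypothetical log names:

    $ ./run_cascade.sh c_flock_200 2>&1 | tee c_flock_200-$(uname -r).log
    $ count_hangs c_flock_200-*.log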

The .config file with which you build the kernel seems to affect that,
too; I attach 2 config files:

- config.no-bug
- config.with-bug

The results I report (hangs & panics) happen if I compile with
config.with-bug, but disappear with config.no-bug. Taking
config.no-bug as the reference, config.with-bug introduces

    CONFIG_MFD_SYSCON=y
    CONFIG_NO_HZ_IDLE=y
    CONFIG_QUEUED_SPINLOCK=y
    CONFIG_REGMAP=y
    CONFIG_REGMAP_MMIO=y
    CONFIG_TICK_CPU_ACCOUNTING=y

and removes

    CONFIG_BLK_DEV_DM=m
    CONFIG_BLK_DEV_DM_BUILTIN=y
    CONFIG_CONTEXT_TRACKING=y
    CONFIG_DM_UEVENT=y
    CONFIG_NO_HZ_FULL=y
    CONFIG_PAGE_EXTENSION=y
    CONFIG_PAGE_OWNER=y
    CONFIG_PARAVIRT_SPINLOCKS=y
    CONFIG_PERSISTENT_KEYRINGS=y
    CONFIG_RCU_NOCB_CPU=y
    CONFIG_RCU_NOCB_CPU_NONE=y
    CONFIG_RCU_USER_QS=y
    CONFIG_STAGING=y
    CONFIG_UNINLINE_SPIN_UNLOCK=y
    CONFIG_VIRT_CPU_ACCOUNTING=y
    CONFIG_VIRT_CPU_ACCOUNTING_GEN=y

Most of those options are probably irrelevant, but at least one of them
must trigger the problem. Both configs were taken from /proc/config.gz
on a running system. FWIW my test machine has 48 Haswell cores and
64GB of RAM.
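
For anyone who wants to recompute the two option lists from the
attached files, here is a rough sketch. The function name
config_only_in is made up for illustration; it only looks at lines
starting with CONFIG_, so "# CONFIG_FOO is not set" comments are
ignored, and an option whose value differs (say =m vs =y) will show up
in both directions.

```shell
#!/bin/bash
# Hypothetical helper: print the CONFIG_* lines present in the first
# kernel config file but not (verbatim) in the second one.
config_only_in() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file.
    # "|| true" because grep exits non-zero when nothing is printed.
    grep '^CONFIG_' "$1" | grep -Fxv -f "$2" || true
}
```

Assumed usage:

    $ config_only_in config.with-bug config.no-bug   # introduced
    $ config_only_in config.no-bug config.with-bug   # removed

The kernel tree also ships scripts/diffconfig, which should give a
similar comparison.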


Giovanni
SUSE Labs


----------- run_cascade.sh -------------------------------------
#!/bin/bash

TESTCASE=$1
CASCADE_PATH="libmicro-1-installed/bin-x86_64"

case $TESTCASE in
    c_flock_200)
        BINNAME="cascade_flock"
        COMMAND="$CASCADE_PATH/cascade_flock -E -D 60000 -L -S -W \
                                             -N c_flock_200 \
                                             -P 200 -I 5000000"
        # c_flock_200 is supposed to last 60 seconds.
        SLEEPTIME=70
        ;;
    c_cond_10)
        BINNAME="cascade_cond"
        COMMAND="$CASCADE_PATH/cascade_cond -E -C 2000 -L -S -W \
                                            -N c_cond_10 \
                                            -T 10 -I 3000"
        # c_cond_10 terminates in less than 1 second.
        SLEEPTIME=5
        ;;
    *)
        echo "Unknown test case" >&2
        exit 1
        ;;
esac

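ERRORS=0
uname -a
for i in {1..10} ; do
    # Start the benchmark in the background and discard its output.
    {
        eval "$COMMAND" &
    } >/dev/null 2>&1
    sleep "$SLEEPTIME"
    # Anything still running after $SLEEPTIME seconds counts as a hang.
    if pidof "$BINNAME" >/dev/null ; then
        echo "Run #$i: $TESTCASE hangs"
        # Summarize where the stuck processes are blocked in the kernel
        # (reading /proc/PID/stack usually requires root).
        for PID in $(pidof "$BINNAME") ; do
            head -n 1 "/proc/$PID/stack"
        done | sort | uniq -c
        ERRORS=$((ERRORS+1))
        killall "$BINNAME"
    else
        echo "Run #$i: $TESTCASE exits successfully"
    fi
done
echo "$TESTCASE hung $ERRORS times."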
----------------------------------------------------------------
View attachment "config.with-bug" of type "text/x-mpsub" (101963 bytes)

View attachment "config.no-bug" of type "text/x-mpsub" (103351 bytes)
