Date:   Tue, 8 Aug 2023 11:20:54 -0700
From:   John Stultz <jstultz@...gle.com>
To:     "Jason A. Donenfeld" <Jason@...c4.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
        Waiman Long <longman@...hat.com>,
        Boqun Feng <boqun.feng@...il.com>,
        "Paul E . McKenney" <paulmck@...nel.org>,
        Joel Fernandes <joelaf@...gle.com>,
        Li Zhijian <zhijianx.li@...el.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        kernel-team@...roid.com
Subject: Re: [RFC][PATCH 1/3] test-ww_mutex: Use prng instead of rng to avoid
 hangs at bootup

On Tue, Aug 8, 2023 at 7:05 AM Jason A. Donenfeld <Jason@...c4.com> wrote:
>
> Hi Peter, John,
>
> On Tue, Aug 8, 2023 at 12:36 PM Peter Zijlstra <peterz@...radead.org> wrote:
> >
> > On Tue, Aug 08, 2023 at 06:26:41AM +0000, John Stultz wrote:
> > > Booting w/ qemu without kvm, I noticed we'd sometimes seem to get
> > > stuck in get_random_u32_below(). This seems potentially to be
> > > entropy exhaustion (with the test module linked statically, it
> > > runs pretty early in the bootup).
> > >
> > > I'm not 100% sure on this, but this patch switches to use the
> > > prng instead since we don't need true randomness, just mixed up
> > > orders for testing ww_mutex lock acquisitions.
> > >
> > > With this patch, I no longer see hangs in get_random_u32_below()
> > >
> > > Feedback would be appreciated!
> >
> > Jason, I thought part of the 'recent' random rework was avoiding the
> > exhaustion problem, could you please give an opinion on the below?
>
> Thanks for looping me in. I actually can't reproduce this. I'm using a
> minimal config and using QEMU without KVM. The RNG doesn't initialize
> until much later on in the boot process, expectedly, yet
> get_random_u32_below() does _not_ hang in my trials. And indeed it's
> designed to never hang, since that would create boot deadlocks. So I'm
> not sure why you're seeing a hang.

Ok. My hesitancy here was in part due to my understanding that the
entropy exhaustion hangs weren't supposed to be an issue anymore
(thanks, btw).

> It is worth noting that in those early boot test-case scenarios,
> before the RNG initializes, get_random_u32_below() will be somewhat
> slower than it normally is, and also slower than prandom_u32_state().
> (But only in this early boot scenario edge case; this isn't a general
> statement about speed.) It's possible that in your QEMU machine,
> things are slow enough that you're simply noticing the difference. On
> my system, however, I replaced `get_random_u32_below()` with `static
> u32 x; return ++x % ceil;` and I didn't see any difference running it
> under TCG -- it took about 7 seconds either way.
>
> So, from my perspective, you shouldn't see any hang. That function
> never blocks. I'm happy to look more into what's happening on your end
> though. Maybe share your .config and qemu command line and I'll see if
> I can repro?

Yeah, it may just be that the real RNG is slow enough that I'm hitting
the hung task watchdog?
(I'm running with 64 CPUs, so the test is trying to use 128 threads,
all hitting get_random_u32_below() over and over to create their own
random order of 16 locks.)
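
For illustration, the per-thread work described above (building a random
order of N locks) can be sketched in userspace C. This is only a sketch under
stated assumptions, not the test-ww_mutex code: the names sketch_rng,
sketch_rng_seed, sketch_below and sketch_random_order are hypothetical, and a
small xorshift32 generator stands in for a per-thread seeded prng. The point
is that a shuffle like this needs only "mixed up" values, so each thread can
draw from its own cheap state instead of a shared RNG.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread PRNG state (not the kernel's struct rnd_state). */
struct sketch_rng {
	uint32_t s;
};

/* Seed the state; xorshift32 gets stuck at zero, so map a zero seed to 1. */
static void sketch_rng_seed(struct sketch_rng *rng, uint32_t seed)
{
	rng->s = seed ? seed : 1;
}

/* One xorshift32 step (shift constants 13, 17, 5). */
static uint32_t sketch_next(struct sketch_rng *rng)
{
	uint32_t x = rng->s;

	assert(x != 0);
	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	rng->s = x;
	return x;
}

/* Value in [0, ceil). Modulo bias is tolerable for shuffling test orders. */
static uint32_t sketch_below(struct sketch_rng *rng, uint32_t ceil)
{
	assert(ceil > 0);
	return sketch_next(rng) % ceil;
}

/*
 * Return a heap-allocated random permutation of 0..count-1 built with a
 * Fisher-Yates shuffle, or NULL on bad input or allocation failure.
 * The caller frees the result.
 */
static int *sketch_random_order(struct sketch_rng *rng, int count)
{
	int *order;
	int n;

	if (count <= 0)
		return NULL;

	order = malloc((size_t)count * sizeof(*order));
	if (!order)
		return NULL;

	for (n = 0; n < count; n++)
		order[n] = n;

	for (n = count - 1; n > 0; n--) {
		int r = (int)sketch_below(rng, (uint32_t)n + 1);
		int tmp = order[n];

		order[n] = order[r];
		order[r] = tmp;
	}
	return order;
}
```

A caller would seed one sketch_rng per thread (for example from a thread
index), call sketch_random_order(&rng, 16) for each iteration, and free the
returned array afterwards.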

When the softlockup watchdog hits, the trace looks like:
[   64.268881]  _get_random_bytes+0x3b/0x140
[   64.268881]  ? __kmem_cache_alloc_node+0x17a/0x310
[   64.268881]  ? get_random_order+0x28/0xa0
[   64.268881]  get_random_u32+0x1c2/0x1e0
[   64.268881]  __get_random_u32_below+0xd/0x60
[   64.268881]  get_random_order+0x60/0xa0
[   64.268881]  stress_inorder_work+0x26/0x390
[   64.268881]  ? lock_acquire+0xd4/0x290
[   64.268881]  process_one_work+0x267/0x520
[   64.268881]  worker_thread+0x4a/0x390
[   64.268881]  ? __pfx_worker_thread+0x10/0x10
[   64.268881]  kthread+0xf5/0x130
[   64.268881]  ? __pfx_kthread+0x10/0x10
[   64.268881]  ret_from_fork+0x2b/0x40
[   64.268881]  ? __pfx_kthread+0x10/0x10
[   64.268881]  ret_from_fork_asm+0x1b/0x30


Attached is the defconfig I just used.

My script to run is below.

thanks
-john

#base boot args
kcmd="root=/dev/vda2 console=ttyS0 earlycon "
#debug helper
kcmd+="ftrace_dump_on_oops sysrq_always_enabled debug_boot_weak_hash "
#locktorture bits
kcmd+="torture.random_shuffle=1 locktorture.writer_fifo=1 \
locktorture.torture_type=mutex_lock locktorture.nested_locks=8 \
locktorture.rt_boost=1 locktorture.rt_boost_factor=50 \
locktorture.stutter=0 "

cpu_count=64
mem_size="6G"
#use_kvm="-enable-kvm"
device_config="-smp ${cpu_count} -m ${mem_size} ${use_kvm}      \
                -drive if=virtio,format=qcow2,file=ubuntu.img   \
                -device virtio-scsi-pci,id=scsi0                \
                -object rng-random,filename=/dev/urandom,id=rng0\
                -device virtio-rng-pci,rng=rng0                 \
                -device virtio-net-pci,netdev=net0              \
                -netdev user,id=net0,hostfwd=tcp::8022-:22      \
                -chardev stdio,id=char0,mux=on,logfile=serial.log,signal=off \
                -serial chardev:char0 -mon chardev=char0        \
                -nographic                                      \
                "
kbin="bzImage"

qemu-system-x86_64                                      \
                -kernel "${kbin}" -initrd initrd.img -append "${kcmd}"  \
                ${device_config}

Download attachment "defconfig" of type "application/octet-stream" (27023 bytes)
