Date:   Tue, 17 May 2022 13:33:43 -0500
From:   Larry Finger <Larry.Finger@...inger.net>
To:     Vadim Galitsin <vadim.galitsyn@...cle.com>,
        "larry.finger@...il.com" <larry.finger@...il.com>,
        "Jason@...c4.com" <Jason@...c4.com>
Cc:     LKML <linux-kernel@...r.kernel.org>
Subject: Re: Changes in kernel 5.18-rc1 leads to crashes in VirtualBox Virtual
 Machines

On 5/17/22 12:27, Vadim Galitsin wrote:
> Hi Larry and Jason,
> 
> I am from VirtualBox team. I noticed your conversation here:
> 
> https://lore.kernel.org/lkml/Ym8uPcuQpq1xBS6d@zx2c4.com/T/#mea7aa731b5524a05ac3b3e8588c0c42235bb33d6
> 
> Please let me add my 5c. I agree with Larry: the issue starts happening after 
> 6e8ec2552c7d. I did not do a complete bisection, but rather tried this revision 
> and the one before it (dcd03ba15947cbad1a34cfed370c4feb41058469, with which I do 
> not see the issue).
> 
> For me this issue is quite reproducible with an Ubuntu 20.04 Linux guest (other 
> guests are also affected). It happens even if no VBox Guest Additions are 
> installed in the guest. The guest kernel version does not play much of a role. 
> Running kernel 5.18-rc1+ on the host side is essential.
> 
> The first way for me to reproduce it is to run the stress-ng(1) tool inside the 
> guest and perform random mouse cursor movements (basically, generating mouse or 
> keyboard interrupts is somehow essential here). The tool will report the 
> following error:
> 
> root@...t-VirtualBox:~# stress-ng --vm 4 -t 10
> stress-ng: info:  [5463] dispatching hogs: 4 vm
> stress-ng: fail:  [5464] stress-ng-vm: detected 194065152 bit errors while 
> stressing memory
> stress-ng: error: [5463] process 5464 (stress-ng-vm) terminated with an error, 
> exit status=1 (stress-ng core failure)
> stress-ng: info:  [5463] unsuccessful run completed in 10.06s
> 
> This approach does not work in 100% of cases, but it triggers the issue quite 
> frequently.
> 
> The second approach is much more reliable for me. I basically start compiling 
> a kernel inside the guest (say, with make -j4) and start moving the mouse (or 
> generate keyboard interrupts by pressing keys randomly). In this case, gcc 
> processes will randomly receive a SEGFAULT.
> 
> Important note: if I do not touch the mouse or keyboard in either case above, 
> everything works normally.
> 
> My initial guess was that this might have something to do with kstack 
> randomization, but booting the host kernel with randomize_kstack_offset=0 does 
> not seem to change anything in this regard.
> 
> I am currently running out of ideas about what exactly might trigger such 
> behavior. Hopefully, this additional info sheds some light.
> 
> Best regards,
> Vadim
> 

Vadim,

I had an extended email exchange with Jason Donenfeld over this issue. Sorry 
that most of it was private; some large files needed to be transmitted that 
were not appropriate for LKML. LKML is added back in to this reply.

My test for the fault was to start a VM running Windows 10 and use Edge to load 
the VirtualBox web page. Usually within a few seconds, Edge or Windows would 
crash. In the latter case, the log for the VM might show an unhandled exception 
while in kernel mode. I thought the browser was hitting the random number 
generator hard, but there is mouse activity, of course.

Jason has created a patch entitled "random: do not use input pool from hard 
IRQs" that fixes the problem for me. It can be found at 
https://lore.kernel.org/lkml/20220510140025.81168-1-Jason@zx2c4.com/. I had 
expected this patch to be merged into the mainline kernel by now. Jason should 
be able to shed light on any delays.

The bottom line and good news for Oracle/VirtualBox and those of us who package 
VB for distros is that this is a kernel regression, which is a conclusion I 
hesitated to make earlier. It is not a problem with VirtualBox; VB just exposes 
the kernel problem.

I certainly hope that this problem is fixed before 5.18 is released. If not, I 
will need to campaign to prevent openSUSE Tumbleweed from switching to 5.18. 
That would normally happen with the release of 5.18.1!

Larry
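
For anyone who wants to repeat the stress-ng reproduction quoted above in a loop, a minimal sketch for scanning the captured output follows. It assumes the failure line keeps the same wording as in Vadim's log ("detected N bit errors"); the names `BIT_ERRORS`, `bit_error_reports`, and `SAMPLE` are made up for illustration and are not part of stress-ng or VirtualBox.

```python
import re

# Pattern for the failure line shown in the quoted report. The exact wording
# is taken from that log and is an assumption about other stress-ng versions.
BIT_ERRORS = re.compile(
    r"stress-ng: fail:\s+\[(\d+)\] stress-ng-vm: detected (\d+) bit errors"
)


def bit_error_reports(output: str) -> list[tuple[int, int]]:
    """Return (pid, bit_error_count) for each failure line in the output."""
    # Mail clients and terminals may wrap long lines, so collapse all
    # whitespace runs to single spaces before matching.
    flat = " ".join(output.split())
    return [(int(pid), int(count)) for pid, count in BIT_ERRORS.findall(flat)]


# Output copied from the quoted report, with the quote markers removed.
SAMPLE = """\
stress-ng: info:  [5463] dispatching hogs: 4 vm
stress-ng: fail:  [5464] stress-ng-vm: detected 194065152 bit errors while
stressing memory
stress-ng: error: [5463] process 5464 (stress-ng-vm) terminated with an error,
exit status=1 (stress-ng core failure)
stress-ng: info:  [5463] unsuccessful run completed in 10.06s
"""

if __name__ == "__main__":
    print(bit_error_reports(SAMPLE))  # [(5464, 194065152)]
```

An empty list from a run means no memory corruption was reported in that pass, which, per the quoted report, is what to expect when the mouse and keyboard are left alone.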
