Message-ID: <ZOUI9yDzjxuFP68E@fisica.ufpr.br>
Date:   Tue, 22 Aug 2023 16:13:59 -0300
From:   Carlos Carvalho <carlos@...ica.ufpr.br>
To:     Song Liu <song@...nel.org>
Cc:     Bagas Sanjaya <bagasdotme@...il.com>,
        Christoph Hellwig <hch@....de>, AceLan Kao <acelan@...il.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux Regressions <regressions@...ts.linux.dev>,
        Linux RAID <linux-raid@...r.kernel.org>
Subject: Re: Infiniate systemd loop when power off the machine with multiple
 MD RAIDs

Song Liu (song@...nel.org) wrote on Tue, Aug 22, 2023 at 03:56:04PM -03:
> From systemd code, i.e. function delete_md(), this error:
> 
> [ 205.957004] systemd-shutdown[1]: Stopping MD /dev/md124p1 (259:6).
> [ 205.964177] systemd-shutdown[1]: Could not stop MD /dev/md124p1: Device or resource busy
> 
> is most likely triggered by ioctl(STOP_ARRAY).
> 
> And based on the code, I think the ioctl fails here:
> 
>         if (cmd == STOP_ARRAY || cmd == STOP_ARRAY_RO) {
>                 /* Need to flush page cache, and ensure no-one else opens
>                  * and writes
>                  */
>                 mutex_lock(&mddev->open_mutex);
>                 if (mddev->pers && atomic_read(&mddev->openers) > 1) {
>                         mutex_unlock(&mddev->open_mutex);
>                         err = -EBUSY;
>                         goto out;        ////////////////////// HERE
>                 }
>                 if (test_and_set_bit(MD_CLOSING, &mddev->flags)) {
>                         mutex_unlock(&mddev->open_mutex);
>                         err = -EBUSY;
>                         goto out;
>                 }
>                 did_set_md_closing = true;
>                 mutex_unlock(&mddev->open_mutex);
>                 sync_blockdev(bdev);
>         }

Probably. The real question is why it doesn't manage to flush the page cache. I
find it strange that the problem appears only when trying to stop the array; I
also get it when trying to umount the filesystem, which hangs for the same
reason. A kworker thread runs continuously, using 100% of a single CPU core.

These are all symptoms of the same underlying problem I reported days ago: the
system doesn't manage to write to the disks, which stay nearly idle. If you
wait long enough without issuing new writes, which can take several hours, it
eventually flushes the page cache and proceeds to a "normal" umount or reboot.

The bug depends on the write rate and on uptime: it rarely appears soon after
boot, and it becomes more likely as time passes.
