Message-ID: <CAPhsuW5Od9tczboEBxC8gn+2XLkEbirfCUm7WuJBey5MKQjwDA@mail.gmail.com>
Date:   Tue, 22 Aug 2023 11:56:04 -0700
From:   Song Liu <song@...nel.org>
To:     Bagas Sanjaya <bagasdotme@...il.com>
Cc:     Christoph Hellwig <hch@....de>, AceLan Kao <acelan@...il.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux Regressions <regressions@...ts.linux.dev>,
        Linux RAID <linux-raid@...r.kernel.org>
Subject: Re: Infiniate systemd loop when power off the machine with multiple
 MD RAIDs

On Wed, Aug 16, 2023 at 2:37 AM Bagas Sanjaya <bagasdotme@...il.com> wrote:
>
> Hi,
>
> I notice a regression report on Bugzilla [1]. Quoting from it:
>
> > It needs to build at least 2 different RAIDs(eg. RAID0 and RAID10, RAID5 and RAID10) and then you will see below error repeatly(need to use serial console to see it)
> >
> > [ 205.360738] systemd-shutdown[1]: Stopping MD devices.
> > [ 205.366384] systemd-shutdown[1]: sd-device-enumerator: Scan all dirs
> > [ 205.373327] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/bus
> > [ 205.380427] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/class
> > [ 205.388257] systemd-shutdown[1]: Stopping MD /dev/md127 (9:127).
> > [ 205.394880] systemd-shutdown[1]: Failed to sync MD block device /dev/md127, ignoring: Input/output error
> > [ 205.404975] md: md127 stopped.
> > [ 205.470491] systemd-shutdown[1]: Stopping MD /dev/md126 (9:126).
> > [ 205.770179] md: md126: resync interrupted.
> > [ 205.776258] md126: detected capacity change from 1900396544 to 0
> > [ 205.783349] md: md126 stopped.
> > [ 205.862258] systemd-shutdown[1]: Stopping MD /dev/md125 (9:125).
> > [ 205.862435] md: md126 stopped.
> > [ 205.868376] systemd-shutdown[1]: Failed to sync MD block device /dev/md125, ignoring: Input/output error
> > [ 205.872845] block device autoloading is deprecated and will be removed.
> > [ 205.880955] md: md125 stopped.
> > [ 205.934349] systemd-shutdown[1]: Stopping MD /dev/md124p2 (259:7).
> > [ 205.947707] systemd-shutdown[1]: Could not stop MD /dev/md124p2: Device or resource busy
> > [ 205.957004] systemd-shutdown[1]: Stopping MD /dev/md124p1 (259:6).
> > [ 205.964177] systemd-shutdown[1]: Could not stop MD /dev/md124p1: Device or resource busy
> > [ 205.973155] systemd-shutdown[1]: Stopping MD /dev/md124 (9:124).
> > [ 205.979789] systemd-shutdown[1]: Could not stop MD /dev/md124: Device or resource busy
> > [ 205.988475] systemd-shutdown[1]: Not all MD devices stopped, 4 left.

From the systemd code, i.e. the function delete_md(), this error:

[ 205.957004] systemd-shutdown[1]: Stopping MD /dev/md124p1 (259:6).
[ 205.964177] systemd-shutdown[1]: Could not stop MD /dev/md124p1: Device or resource busy

is most likely triggered by ioctl(STOP_ARRAY).

And based on the code, I think the ioctl fails here:

        if (cmd == STOP_ARRAY || cmd == STOP_ARRAY_RO) {
                /* Need to flush page cache, and ensure no-one else opens
                 * and writes
                 */
                mutex_lock(&mddev->open_mutex);
                if (mddev->pers && atomic_read(&mddev->openers) > 1) {
                        mutex_unlock(&mddev->open_mutex);
                        err = -EBUSY;
                        goto out;        ////////////////////// HERE
                }
                if (test_and_set_bit(MD_CLOSING, &mddev->flags)) {
                        mutex_unlock(&mddev->open_mutex);
                        err = -EBUSY;
                        goto out;
                }
                did_set_md_closing = true;
                mutex_unlock(&mddev->open_mutex);
                sync_blockdev(bdev);
        }

>
> See Bugzilla for the full thread and attached full journalctl log.
>
> Anyway, I'm adding this regression to be tracked by regzbot:
>
> #regzbot introduced: 12a6caf273240a https://bugzilla.kernel.org/show_bug.cgi?id=217798
> #regzbot title: systemd shutdown hang on machine with different RAID levels

But the observation above doesn't seem to match the bisect result
and it doesn't seem to be related to different RAID levels.

Thanks,
Song

>
> Thanks.
>
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217798
>
> --
> An old man doll... just what I always wanted! - Clara
