Message-ID: <Yh/nbZYmYD6SpZV9@sakura.greysector.net>
Date:   Wed, 2 Mar 2022 22:53:49 +0100
From:   Dominik Mierzejewski <dominik@...ysector.net>
To:     Song Liu <song@...nel.org>
Cc:     Thorsten Leemhuis <regressions@...mhuis.info>,
        linux-raid <linux-raid@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "regressions@...ts.linux.dev" <regressions@...ts.linux.dev>,
        Jens Axboe <axboe@...nel.dk>
Subject: Re: Bug 215605 - [5.14 regression] BUG: unable to handle page fault
 while running badblocks (fsck.ext4 -c) on a raid5 md array

Hello!

On Tuesday, 01 March 2022 at 01:24, Song Liu wrote:
> On Mon, Feb 28, 2022 at 1:43 AM Thorsten Leemhuis
> <regressions@...mhuis.info> wrote:
> >
> > [CCing Jens]
> >
> > Hi, this is your Linux kernel regression tracker. Top-posting for once,
> > to make this easily accessible to everyone.
> >
> > What's up here? Below regression was reported two weeks ago and I
> > forwarded it nearly a week ago, nevertheless the reporter afaics didn't
> > get a single reply. Is the issue discussed somewhere else and I just
> > missed it? Is the report not accurate for some reason or missing
> > something important? Or did the report fall through the cracks?
> 
> Sorry for the late reply. I was on vacation last week.

No problem, thanks for responding.

[...]
> > On 22.02.22 09:59, Thorsten Leemhuis wrote:
> > > Hi, this is your Linux kernel regression tracker.
> > >
> > > I noticed a regression report in bugzilla.kernel.org that afaics nobody
> > > acted upon since it was reported about a week ago, that's why I decided
> > > to forward it to the lists and add a few relevant people to the CC. To
> > > quote from https://bugzilla.kernel.org/show_bug.cgi?id=215605
> > >
> > >>  Dominik Mierzejewski 2022-02-14 10:36:36 UTC
> > >>
> > >> Created attachment 300450 [details]
> > >> kernel-5.16.8 dmesg with crash
> > >>
> > >> I'm experiencing a kernel crash when running badblocks (fsck.ext4 -c) on a raid5 md array in my Intel Atom-based NAS box (Thecus N5550):
> > >> [  720.911993] kernel: BUG: unable to handle page fault for address: ffffdbc681023bc8
> > >> [  720.912073] kernel: #PF: supervisor read access in kernel mode
> > >> [  720.912120] kernel: #PF: error_code(0x0000) - not-present page
> > >> [  720.912166] kernel: PGD 11ffc6067 P4D 11ffc6067 PUD 0
> > >> [  720.912213] kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
> > >> [  720.912256] kernel: CPU: 1 PID: 1406 Comm: badblocks Not tainted 5.16.8-200.fc35.x86_64 #1
> > >> [  720.912321] kernel: Hardware name: Intel Corporation Milstead Platform/Granite Well, BIOS CDV W Series 05 08/27/2015
> > >> [  720.912400] kernel: RIP: 0010:kfree+0x58/0x3e0
> > >> [  720.912449] kernel: Code: 80 4c 01 e5 0f 82 84 03 00 00 48 c7 c0 00 00 00 80 48 2b 05 4a 96 3b 01 48 01 c5 48 c1 ed 0c 48 c1 e5 06 48 03 2d 28 96 3b 01 <48> 8b 45 08 48 8d 50 ff a8 01 48 0f 45 ea 48 8b 55 08 48 8d 42 ff
> > >> [  720.912598] kernel: RSP: 0018:ffff9db4008efaf8 EFLAGS: 00010286
> > >> [  720.912648] kernel: RAX: 00006d7bc0000000 RBX: ffff9284c5214800 RCX: ffff9284c3758ff8
> > >> [  720.912708] kernel: RDX: ffff9283c1102740 RSI: ffffffffc07af091 RDI: ffff9db4008efd58
> > >> [  720.912767] kernel: RBP: ffffdbc681023bc0 R08: ffff9db4008efb88 R09: ffff9284c3759000
> > >> [  720.912826] kernel: R10: 0000000000000028 R11: ffff9284c213db48 R12: ffff9db4008efd58
> > >> [  720.912885] kernel: R13: ffff9284c213da00 R14: ffff9284c375f000 R15: ffff9db4008efd58
> > >> [  720.912945] kernel: FS:  00007f73e6669740(0000) GS:ffff9284dbc80000(0000) knlGS:0000000000000000
> > >> [  720.913012] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >> [  720.913062] kernel: CR2: ffffdbc681023bc8 CR3: 000000005c2cc000 CR4: 00000000000006e0
> > >> [  720.913122] kernel: Call Trace:
> > >> [  720.913150] kernel:  <TASK>
> > >> [  720.913179] kernel:  raid5_make_request+0xb31/0xb90 [raid456]
> > >> [  720.913247] kernel:  ? do_wait_intr_irq+0xa0/0xa0
> > >> [  720.913292] kernel:  ? __blk_queue_split+0x30a/0x470
> > >> [  720.913339] kernel:  md_handle_request+0x119/0x180
> > >> [  720.913386] kernel:  md_submit_bio+0x67/0xa0
> > >> [  720.913425] kernel:  __submit_bio_fops+0x91/0x160
> > >> [  720.913468] kernel:  submit_bio_noacct+0xd7/0x2c0
> > >> [  720.913510] kernel:  __blkdev_direct_IO_simple+0x198/0x290
> > >> [  720.913576] kernel:  ? __fpu_restore_sig+0x193/0x570
> > >> [  720.913623] kernel:  ? sysvec_apic_timer_interrupt+0xaf/0xd0
> > >> [  720.913676] kernel:  ? __blkdev_direct_IO_simple+0x290/0x290
> > >> [  720.913728] kernel:  generic_file_read_iter+0x9b/0x160
> > >> [  720.913775] kernel:  new_sync_read+0x105/0x180
> > >> [  720.913820] kernel:  vfs_read+0xf1/0x190
> > >> [  720.913858] kernel:  ksys_read+0x4f/0xc0
> > >> [  720.913896] kernel:  do_syscall_64+0x38/0x90
> > >> [  720.913936] kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > >> [  720.913985] kernel: RIP: 0033:0x7f73e676d772
> > >> [  720.914024] kernel: Code: c0 e9 b2 fe ff ff 50 48 8d 3d da 2e 0c 00 e8 b5 f9 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
> > >> [  720.914166] kernel: RSP: 002b:00007fff1b8fcbb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> > >> [  720.914231] kernel: RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 00007f73e676d772
> > >> [  720.917149] kernel: RDX: 0000000000040000 RSI: 00007f73e65d3000 RDI: 0000000000000004
> > >> [  720.920078] kernel: RBP: 0000000000001000 R08: 00000000015105c0 R09: 0000000000000080
> > >> [  720.922980] kernel: R10: 00007fff1b8fca00 R11: 0000000000000246 R12: 00000015105c0000
> > >> [  720.925875] kernel: R13: 0000000000000004 R14: 00007f73e65d3000 R15: 0000000000040000
> > >> [  720.928795] kernel:  </TASK>
> > >> [  720.931704] kernel: Modules linked in: sctp ip6_udp_tunnel udp_tunnel rpcrdma rdma_cm iw_cm ib_cm ib_core sit tunnel4 ip_tunnel rfkill ipt_REJECT nf_reject_ipv4 iptable_filter xt_nat iptable_nat nf_nat iptable_mangle nf_conntrack_pptp xt_CT iptable_raw xt_multiport xt_set ip6t_REJECT nf_reject_ipv6 xt_LOG nf_log_syslog xt_limit xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter ip6_tables ip_set_hash_netport ip_set_hash_net ip_set drivetemp it87 nfnetlink hwmon_vid vfat fat iTCO_wdt intel_pmc_bxt iTCO_vendor_support at24 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx intel_powerclamp raid1 coretemp snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_usb_audio i2c_i801 gma500_gfx i2c_smbus snd_usbmidi_lib joydev snd_hwdep snd_rawmidi snd_seq_device mc snd_pcm lpc_ich i2c_algo_bit snd_timer drm_kms_helper snd cec soundcore nfsd auth_rpcgss
> > >> [  720.931885] kernel:  nfs_acl lockd grace drm fuse sunrpc zram ip_tables hid_logitech_hidpp serio_raw r8152 sata_sil24 video mii hid_jabra e1000e hid_logitech_dj
> > >> [  720.952122] kernel: CR2: ffffdbc681023bc8
> > >> [  720.955651] kernel: ---[ end trace de2c3d5b971ae71d ]---
> > >> [  720.959186] kernel: RIP: 0010:kfree+0x58/0x3e0
> > >> [  720.962723] kernel: Code: 80 4c 01 e5 0f 82 84 03 00 00 48 c7 c0 00 00 00 80 48 2b 05 4a 96 3b 01 48 01 c5 48 c1 ed 0c 48 c1 e5 06 48 03 2d 28 96 3b 01 <48> 8b 45 08 48 8d 50 ff a8 01 48 0f 45 ea 48 8b 55 08 48 8d 42 ff
> > >> [  720.966472] kernel: RSP: 0018:ffff9db4008efaf8 EFLAGS: 00010286
> > >> [  720.970238] kernel: RAX: 00006d7bc0000000 RBX: ffff9284c5214800 RCX: ffff9284c3758ff8
> > >> [  720.973993] kernel: RDX: ffff9283c1102740 RSI: ffffffffc07af091 RDI: ffff9db4008efd58
> > >> [  720.977723] kernel: RBP: ffffdbc681023bc0 R08: ffff9db4008efb88 R09: ffff9284c3759000
> > >> [  720.981464] kernel: R10: 0000000000000028 R11: ffff9284c213db48 R12: ffff9db4008efd58
> > >> [  720.985228] kernel: R13: ffff9284c213da00 R14: ffff9284c375f000 R15: ffff9db4008efd58
> > >> [  720.988995] kernel: FS:  00007f73e6669740(0000) GS:ffff9284dbc80000(0000) knlGS:0000000000000000
> > >> [  720.992774] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >> [  720.996535] kernel: CR2: ffffdbc681023bc8 CR3: 000000005c2cc000 CR4: 00000000000006e0
> 
> I tried a few things (injecting badblocks, etc.) but still could not
> reproduce this issue.
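
In case it helps with reproduction: as far as I understand, md exposes a
writable bad_blocks file per member device in sysfs (documented in
Documentation/admin-guide/md.rst), so injecting an entry should look
roughly like this (an untested sketch, using the array and member names
from my box):

# echo "2000000 8" > /sys/block/md126/md/dev-sdb1/bad_blocks
# cat /sys/block/md126/md/dev-sdb1/bad_blocks

The write marks 8 sectors starting at sector 2000000 as bad; reads that
hit that range should then exercise the bad-block handling in
raid5_make_request.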
> 
> Hi Dominik,
> 
> Could you please share more information about the array?

Please note that I repaired the array by stopping it and reassembling it
without the bad-block lists:
# mdadm --assemble -U force-no-bbl /dev/md126 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
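
(For reference: --update=force-no-bbl drops each member's bad-block list
even when it is non-empty, which plain --update=no-bbl refuses to do.
Anyone wanting to inspect the recorded entries before clearing them could
try something like the following; a sketch, assuming the members are
/dev/sd[b-f]1 as on my box:

# for d in /dev/sd[b-f]1; do mdadm --examine-badblocks $d; done
)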

I then ran fsck on it and repaired the multiply-claimed blocks it found.
The array has been running smoothly since, even under kernel 5.16.9, so
I'm not sure the following information will still be useful.

>    mdadm --detail /dev/mdXXX

# mdadm --detail /dev/md126
/dev/md126:
           Version : 1.2
     Creation Time : Sat Oct 19 22:39:14 2019
        Raid Level : raid5
        Array Size : 11720531968 (10.92 TiB 12.00 TB)
     Used Dev Size : 2930132992 (2.73 TiB 3.00 TB)
      Raid Devices : 5
     Total Devices : 5
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Wed Mar  2 22:29:09 2022
             State : clean 
    Active Devices : 5
   Working Devices : 5
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

              Name : any:nas
              UUID : 4fef0683:8f39a12d:eb69f742:b501ef53
            Events : 71011

    Number   Major   Minor   RaidDevice State
       6       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       7       8       49        2      active sync   /dev/sdd1
       3       8       65        3      active sync   /dev/sde1
       5       8       81        4      active sync   /dev/sdf1

>    mdadm --examine /dev/<devices>

# mdadm --examine /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 4fef0683:8f39a12d:eb69f742:b501ef53
           Name : any:nas
  Creation Time : Sat Oct 19 22:39:14 2019
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 5860266895 sectors (2.73 TiB 3.00 TB)
     Array Size : 11720531968 KiB (10.92 TiB 12.00 TB)
  Used Dev Size : 5860265984 sectors (2.73 TiB 3.00 TB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=911 sectors
          State : clean
    Device UUID : 506215d3:7893b583:1b956e6a:532d7751

Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Mar  2 22:29:09 2022
       Checksum : b8ce0df8 - correct
         Events : 71011

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)
# mdadm --examine /dev/sdc1
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 4fef0683:8f39a12d:eb69f742:b501ef53
           Name : any:nas
  Creation Time : Sat Oct 19 22:39:14 2019
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 5860266895 sectors (2.73 TiB 3.00 TB)
     Array Size : 11720531968 KiB (10.92 TiB 12.00 TB)
  Used Dev Size : 5860265984 sectors (2.73 TiB 3.00 TB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=911 sectors
          State : clean
    Device UUID : 9ab16a47:b23435f3:9231fd6b:e7d709d4

Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Mar  2 22:29:09 2022
       Checksum : 20c43ebb - correct
         Events : 71011

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)
# mdadm --examine /dev/sdd1
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 4fef0683:8f39a12d:eb69f742:b501ef53
           Name : any:nas
  Creation Time : Sat Oct 19 22:39:14 2019
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 5860266895 sectors (2.73 TiB 3.00 TB)
     Array Size : 11720531968 KiB (10.92 TiB 12.00 TB)
  Used Dev Size : 5860265984 sectors (2.73 TiB 3.00 TB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=911 sectors
          State : clean
    Device UUID : 44265974:c83d899d:9bab17f4:59a0786e

Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Mar  2 22:29:09 2022
       Checksum : 1a8ffefc - correct
         Events : 71011

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)
# mdadm --examine /dev/sde1
/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 4fef0683:8f39a12d:eb69f742:b501ef53
           Name : any:nas
  Creation Time : Sat Oct 19 22:39:14 2019
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 5860266895 sectors (2.73 TiB 3.00 TB)
     Array Size : 11720531968 KiB (10.92 TiB 12.00 TB)
  Used Dev Size : 5860265984 sectors (2.73 TiB 3.00 TB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=911 sectors
          State : clean
    Device UUID : 722420f5:3d9c05e0:71bad4ca:add63b08

Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Mar  2 22:29:09 2022
       Checksum : 4e53a0c5 - correct
         Events : 71011

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)
# mdadm --examine /dev/sdf1
/dev/sdf1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 4fef0683:8f39a12d:eb69f742:b501ef53
           Name : any:nas
  Creation Time : Sat Oct 19 22:39:14 2019
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 5860266895 sectors (2.73 TiB 3.00 TB)
     Array Size : 11720531968 KiB (10.92 TiB 12.00 TB)
  Used Dev Size : 5860265984 sectors (2.73 TiB 3.00 TB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=911 sectors
          State : clean
    Device UUID : 6c2badc9:f7803c8b:9ee398c0:b4e7497d

Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Mar  2 22:29:09 2022
       Checksum : 38e9c6af - correct
         Events : 71011

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 4
   Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)
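
(Note that "Feature Map : 0x1" above should mean only the internal-bitmap
feature is set; if I read md_p.h correctly, a recorded bad-block log would
additionally set the 0x8 feature bit, so the lists do appear to be gone.
As a quick cross-check, assuming the sysfs layout described in
Documentation/admin-guide/md.rst:

# grep . /sys/block/md126/md/dev-*/bad_blocks

should print nothing if all the per-device lists are empty.)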


Regards,
Dominik

-- 
Fedora   https://getfedora.org  |  RPM Fusion  http://rpmfusion.org
There should be a science of discontent. People need hard times and
oppression to develop psychic muscles.
        -- from "Collected Sayings of Muad'Dib" by the Princess Irulan
