linux-ext4 - [PATCH 0/1] Kernel oopses on ext2 filesystems after 782b76d7abdf

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-Id: <20210713165821.8a268e2c1db4fd5cf452acd2@urjc.es>
Date:   Tue, 13 Jul 2021 16:58:21 +0200
From:   Javier Pello <javier.pello@...c.es>
To:     Jan Kara <jack@...e.com>, linux-ext4@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: [PATCH 0/1] Kernel oopses on ext2 filesystems after 782b76d7abdf

From: Javier Pello <javier.pello@...c.es>

The current mainline kernel oopses when handling ext2 filesystems,
and the filesystem is not usable, sometimes leading to a panic.

I recently tried to upgrade the kernel on my system from 5.10.7 to
5.13.1 but, when I booted the new kernel, I started getting oopses
consistently during the boot process as the init script tried to
mount an ext2 filesystem, and other weird behaviour (like tasks
stuck in uninterruptible sleep) if I accessed the filesystem later
(which I did not do for long for fear of data loss). I managed to
set up a QEMU virtual machine with as small a reproducer as I could
get (kernel stripped of as much stuff as possible) and this is the
oops message that I got (I set the kernel to panic on oops to that
I could get the first report; otherwise, the kernel tries to go on
but ends up attempting to kill init):

BUG: kernel NULL pointer dereference, address: 00000004
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
*pde = 00000000
Oops: 0000 [#1]
CPU: 0 PID: 41 Comm: init.boot Not tainted 5.12.0-rc3+ #23
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
EIP: ext2_get_page.isra.0+0xe2/0x2b0
Code: 8b 45 dc 89 5d cc 8b 80 cc 01 00 00 8b 40 34 8b 00 89 45 d4 8b 45 e0 f7 d8 89 45 e0 8b 45 e8 83 e8 0c 89 45 d0 8b 45 f0 01 f8 <0f> b7 48 04 89 ca 66 83 f9 0b 76 4a 83 e2 03 0f 85 09 01 00 00 0f
EAX: 00000000 EBX: f73fa700 ECX: 00000000 EDX: 00000001
ESI: 00000000 EDI: 00000000 EBP: c2509ce8 ESP: c2509cb0
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 EFLAGS: 00010246
CR0: 80050033 CR2: 00000004 CR3: 024dd000 CR4: 00150e90
Call Trace:
 ext2_find_entry+0x79/0x240
 ext2_inode_by_name+0x16/0x70
 ext2_lookup+0x27/0x70
 __lookup_slow+0x4f/0xe0
 walk_component+0xf7/0x160
 link_path_walk.part.0+0x24d/0x350
 ? terminate_walk+0x7d/0xf0
 path_lookupat+0x39/0x180
 filename_lookup+0x78/0x130
 ? kmem_cache_alloc+0x21/0x130
 ? getname_flags+0x1f/0x160
 ? getname_flags+0x36/0x160
 user_path_at_empty+0x25/0x30
 vfs_statx+0x53/0xd0
 __do_sys_stat64+0x27/0x50
 __ia32_sys_stat64+0xd/0x10
 __do_fast_syscall_32+0x40/0x70
 do_fast_syscall_32+0x28/0x60
 do_SYSENTER_32+0x15/0x20
 entry_SYSENTER_32+0x98/0xe6
EIP: 0xb7ee8549
Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
EAX: ffffffda EBX: 0a0dd360 ECX: bf993cc0 EDX: b7e64000
ESI: 0a0dd360 EDI: 0a0dd000 EBP: 0a0dd340 ESP: bf993c98
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000296
CR2: 0000000000000004
---[ end trace 865a3cf144c3889e ]---

I am running a 32-bit kernel on an x86 machine, in case this is
relevant. Also, I noticed that the high memory support config
option must be set to 4GB to trigger the bug; I could not
reproduce the bug if I set high memory to off.

As the text says, the oops is triggered within ext2_get_page.
I managed to track it down to line 130 in fs/ext2/dir.c, within
ext2_check_page,
    rec_len = ext2_rec_len_from_disk(p->rec_len);
Adding a printk in the function I confirmed that, while variable
page has a non-null value on function entry, the assignment
    char *kaddr = page_address(page);
in line 114 sets kaddr to a null value indeed, and this triggers
the bug when dereferenced later.

I bisected the problem to commit

782b76d7abdf02b12c46ed6f1e9bf715569027f7
fs/ext2: Replace kmap() with kmap_local_page()

The oops triggers consistently on this commit but its parent
commit works fine. Analysing the commit, I think that it may be
incomplete, as ext2_check_page and ext2_delete_entry are still
using page_address to get at the page address, but this no longer
works because those pages are now mapped with kmap_local_page,
not kmap. It seems to me that ext2_check_page and ext2_delete_entry
should be provided with the page address that their callers already
have for the page, as with all other functions in fs/ext2/dir.c
now. The proposed patch does precisely this.

Javier Pello (1):
  fs/ext2: Avoid page_address on pages returned by ext2_get_page

 fs/ext2/dir.c   | 20 ++++++++++----------
 fs/ext2/ext2.h  |  3 ++-
 fs/ext2/namei.c |  4 ++--
 3 files changed, 14 insertions(+), 13 deletions(-)

-- 
2.30.1