linux-ext4 - Re: Null pointer dereference of s_chksum_driver in ext4

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20150503213700.GL10014@thunk.org>
Date:	Sun, 3 May 2015 17:37:00 -0400
From:	Theodore Ts'o <tytso@....edu>
To:	Nikhilesh Reddy <reddyn@...eaurora.org>
Cc:	linux-ext4@...r.kernel.org, darrick.wong@...cle.com
Subject: Re: Null pointer dereference of s_chksum_driver  in ext4_chksum

On Sun, May 03, 2015 at 01:11:58PM -0700, Nikhilesh Reddy wrote:
> Please find additional details below:
> 
> This occurred in while de-referencing the sbi->s_chksum_driver member of
> the superblock info.
> 
> This occurs during a bootup mount
> 
>  10.216919:   <6> EXT4-fs (mmcblk0p22): mounted filesystem with ordered
> data mode. Opts: barrier=1,discard
>     10.225032:   <6> SELinux: initialized (dev mmcblk0p22, type ext4),
> uses xattr
>     10.235901:   <6> EXT4-fs (mmcblk0p29): Ignoring removed
> nomblk_io_submit option
>     10.341141:   <6> Unable to handle kernel NULL pointer dereference
> at virtual address 00000000

I'd have to actually see the full file system to understand what is
going on, but what I suspect is happening is that the file system has
been corrupted in at least two different ways.  The first is that
there the journal inode is corrupted; this is what's causing the call
to ext4_error_inode() from a call to jbd2_journal_bmap().

The *second* thing which is going on is that before we noticed the
corrupted journal inode, the journal contained a copy of the
superblock which we replayed that _set_ the metadata checksum feature
flag.  Since it wasn't set originally when file system was initially
mounted, s_chksum_driver wasn't initialized, and this cuases the NULL
pointer deference.

Avoiding the kernel crash was fixed by accident in 3.18 with the
following commit: 9aa5d32ba269 ("Replace open coded mdata csum feature
to helper function"), since instead of actually checking to see if the
metadata checksum field is set, it uses as its primary mechanism
checking to see if s_chksum_driver is non-NULL.  There is a
WARN_ON_ONCE that will trip in the situation where the feature flag is
set and s_chksum_driver is NULL, but that really is a "should never
happen" situation.  The only scenario I can think of where this might
have happened is the one I described above, where it was enabled by a
journal replay.

This should be sufficient to avoid the crash, but I haven't had the
chance to try creating a file system corrupted the way I conjecture it
was corrupted, and see whether it we correctly fail the mount (which
is clearly what should happen if we discover a corrupted journal inode
while replaying the journal during the mount.)

      		    	    	       - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html