linux-ext4 - Re: Data races in ext4

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5346BB86.50001@mpi-sws.org>
Date:	Thu, 10 Apr 2014 17:40:54 +0200
From:	Pedro Fonseca <pfonseca@...-sws.org>
To:	Theodore Ts'o <tytso@....edu>
CC:	linux-ext4@...r.kernel.org, adilger.kernel@...ger.ca
Subject: Re: Data races in ext4

On 04/09/2014 11:33 PM, Theodore Ts'o wrote:
> The ones relating to ext4_do_update_inode() and generic_fillattr()
> look fine to me.  Basically the first may happen if there are two
> CPU's simultaneously trying to update the on-disk data structure for
> an inode from the in-memory data structure, and that's not a problem
> since the in-memory data structure is always authoratative.  The
> second happens if someone tries to stat(2) an inode while some inode
> field is getting updated.  Most of the inode field updates will be for
> things like mtime and atime updates, or a chown or chmod, where a
> single field is getting updated, and that's not a problem since there
> is no guarantee which version of the inode information you'll get when
> you call stat(2) while the inode is getting modified out from under
> you.  It is possible that the userspace might see the i_blocks and
> i_size be out of sync, but I really can't bring myself to care about
> that.

Regarding the generic_fillattr() function, apart from the inconsistency 
between i_blocks/i_size, it looks like it can also cause wrong values to 
be returned because several of those fields are 64 bits. This means that 
each assignment gets converted into multiple instructions (in 32-bit 
architectures), allowing other instructions to interleave in between. 
For example, this is the assembly code I get when compiling 
generic_fillattr():
>     stat->atime = inode->i_atime;
> c10aa017:   8b 48 3c                mov    0x3c(%eax),%ecx
> c10aa01a:   8b 58 40                mov    0x40(%eax),%ebx
> c10aa01d:   89 4a 28                mov    %ecx,0x28(%edx)
> c10aa020:   89 5a 2c                mov    %ebx,0x2c(%edx)
>     stat->mtime = inode->i_mtime;
> c10aa023:   8b 48 44                mov    0x44(%eax),%ecx
> c10aa026:   8b 58 48                mov    0x48(%eax),%ebx
> c10aa029:   89 4a 30                mov    %ecx,0x30(%edx)
> c10aa02c:   89 5a 34                mov    %ebx,0x34(%edx)
>     stat->ctime = inode->i_ctime;
> c10aa02f:   8b 48 4c                mov    0x4c(%eax),%ecx
> c10aa032:   8b 58 50                mov    0x50(%eax),%ebx
> c10aa035:   89 4a 38                mov    %ecx,0x38(%edx)
> c10aa038:   89 5a 3c                mov    %ebx,0x3c(%edx)
...
>     stat->blocks = inode->i_blocks;
> c10aa048:   8b 58 60                mov    0x60(%eax),%ebx
> c10aa04b:   8b 48 5c                mov    0x5c(%eax),%ecx
> c10aa04e:   89 5a 48                mov    %ebx,0x48(%edx)
> c10aa051:   89 4a 44                mov    %ecx,0x44(%edx)


I still don't understand why the races in ext4_do_update_inode() won't 
cause problems. Similarly to generic_fillattr(), in 
ext4_do_update_inode() wrong values can end up being written to the 
inode fields. Is there some mechanism to recover/ignore these values 
that I'm missing?


> As for the rest, some are obviously false positives.  For example, if
> you take a look at:
>
> Variable: journal->j_running_transaction	Addresses: c1138ca1 c1136b02 c1136ca9
> c1136ca9 jbd2_get_transaction /linux-3.13.5/fs/jbd2/transaction.c:103
> c1138ca1 jbd2_journal_commit_transaction /linux-3.13.5/fs/jbd2/commit.c:539
> c1136b02 start_this_handle /linux-3.13.5/fs/jbd2/transaction.c:280
>
> It's obvious from the code path that start_this_handle() is extremely
> careful to always revalidate j_running_transaction after either (a)
> taking a read lock on the j_state_lock for the first time, or (b)
> after dropping the read_lock and grabbing a write_lock on
> j_state_lock, and if things have changed, it loops and tries again.
You're right, start_this_handle() revalidates the value read so I also 
don't see harm in this race.


> There are a couple that require some closer examination; for some of
> them, for example the two reported cases of ext4_isize_set() vs
> ext4_isize(), having the stack trace would be really helpful.

Those two pairs of functions were actually also called from the 
ext4_do_update_inode() function:
> 4338     if (ei->i_disksize != ext4_isize(raw_inode)) {
> 4339         ext4_isize_set(raw_inode, ei->i_disksize);
> 4340         need_datasync = 1;
> 4341     }

So the top part of the stack trace is:
   ext4_mark_inode_dirty() -> ext4_mark_iloc_dirty() -> 
ext4_do_update_inode() -> ext4_isize()
   ext4_mark_inode_dirty() -> ext4_mark_iloc_dirty() -> 
ext4_do_update_inode() -> ext4_isize_set()




--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html