linux-kernel - the "read" syscall sees partial effects of the "write" syscall


Message-ID: <alpine.LRH.2.02.2009180509370.19302@file01.intranet.prod.int.rdu2.redhat.com>
Date:   Fri, 18 Sep 2020 08:25:28 -0400 (EDT)
From:   Mikulas Patocka <mpatocka@...hat.com>
To:     Dan Williams <dan.j.williams@...el.com>
cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Matthew Wilcox <willy@...radead.org>, Jan Kara <jack@...e.cz>,
        Eric Sandeen <esandeen@...hat.com>,
        Dave Chinner <dchinner@...hat.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: the "read" syscall sees partial effects of the "write" syscall

Hi

I'd like to ask about this problem: when we write to a file, the kernel 
takes the inode lock for writing. When we read from a file, no lock is 
taken, so the read syscall can return data that a concurrent write syscall 
has only partially modified.

POSIX specifies that the effects of the write syscall are atomic - see 
this:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_07

> 2.9.7 Thread Interactions with Regular File Operations
> 
> All of the following functions shall be atomic with respect to each 
> other in the effects specified in POSIX.1-2017 when they operate on 
> regular files or symbolic links:
> 
> chmod()    fchownat()  lseek()      readv()     unlink()
> chown()    fcntl()     lstat()      pwrite()    unlinkat()
> close()    fstat()     open()       rename()    utime()
> creat()    fstatat()   openat()     renameat()  utimensat()
> dup2()     ftruncate() pread()      stat()      utimes()
> fchmod()   lchown()    read()       symlink()   write()
> fchmodat() link()      readlink()   symlinkat() writev()
> fchown()   linkat()    readlinkat() truncate()
> 
> If two threads each call one of these functions, each call shall either 
> see all of the specified effects of the other call, or none of them. The 
> requirement on the close() function shall also apply whenever a file 
> descriptor is successfully closed, however caused (for example, as a 
> consequence of calling close(), calling dup2(), or of process 
> termination).

Should the read call take the inode lock for reading to make it atomic 
w.r.t. the write syscall? (I know - taking the read lock causes a big 
performance hit due to cache line bouncing.)

I've created this program to test it - it has two threads, one writing and 
the other reading and verifying. When I run it on OpenBSD or FreeBSD, it 
passes; on Linux it fails with "we read modified bytes".
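
The verification step amounts to checking that the buffer is uniform. As a sketch, it can be isolated into a helper that also reports where the first mismatch occurs. The name first_torn_offset is hypothetical and is not part of the test program.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the offset of the first byte that differs
 * from buf[0], or -1 if the whole buffer is uniform (no tearing seen).
 */
static long first_torn_offset(const char *buf, size_t len)
{
	assert(len > 0);
	for (size_t i = 1; i < len; i++)
		if (buf[i] != buf[0])
			return (long)i;
	return -1;
}
```

Printing the returned offset on failure could help diagnosis: a mismatch at a multiple of the page size, for example, might suggest that data is copied one page at a time, though that is an assumption to verify rather than something the check itself proves.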

Mikulas



#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <pthread.h>

#define L	65536	/* bytes per read/write: 64 KiB */

static int h;
static pthread_barrier_t barrier;
static pthread_t thr;

static char rpattern[L];
static char wpattern[L];

static void *reader(__attribute__((unused)) void *ptr)
{
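	while (1) {
		int r;
		size_t i;
		/* Rendezvous with the writer so pread and pwrite run concurrently. */
		r = pthread_barrier_wait(&barrier);
		/* Errors are positive errno values; PTHREAD_BARRIER_SERIAL_THREAD (-1 on glibc) is success. */
		if (r > 0) fprintf(stderr, "pthread_barrier_wait: %s\n", strerror(r)), exit(1);
		r = pread(h, rpattern, L, 0);
		if (r != L) perror("pread"), exit(1);
		/* An atomic write leaves the file uniform: all bytes old or all bytes new. */
		for (i = 0; i < L; i++) {
			if (rpattern[i] != rpattern[0])
				fprintf(stderr, "we read modified bytes\n"), exit(1);
		}
	}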
	return NULL;
}

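int main(int argc, char *argv[])
{
	int r;
	/* Require the test file path; argv[1] would otherwise be NULL. */
	if (argc != 2) fprintf(stderr, "usage: %s <file>\n", argv[0]), exit(1);
	h = open(argv[1], O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (h < 0) perror("open"), exit(1);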
	r = pwrite(h, wpattern, L, 0);
	if (r != L) perror("pwrite"), exit(1);
	r = pthread_barrier_init(&barrier, NULL, 2);
	if (r) fprintf(stderr, "pthread_barrier_init: %s\n", strerror(r)), exit(1);
	r = pthread_create(&thr, NULL, reader, NULL);
	if (r) fprintf(stderr, "pthread_create: %s\n", strerror(r)), exit(1);
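	while (1) {
		size_t i;
		/* Advance every byte by one: the new pattern is uniform but differs from the old. */
		for (i = 0; i < L; i++)
			wpattern[i]++;
		/* Release the reader, then overwrite the whole file in a single pwrite. */
		r = pthread_barrier_wait(&barrier);
		if (r > 0) fprintf(stderr, "pthread_barrier_wait: %s\n", strerror(r)), exit(1);
		r = pwrite(h, wpattern, L, 0);
		if (r != L) perror("pwrite"), exit(1);
	}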
}