lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <CAGudoHFxuVQPLgrsWMoCA1NMFJxVJ7Hm+ymp_S1WM_0+iz7XPQ@mail.gmail.com>
Date: Sat, 10 Aug 2024 12:16:57 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: "Theodore Ts'o" <tytso@....edu>, adilger.kernel@...ger.ca
Cc: linux-fsdevel <linux-fsdevel@...r.kernel.org>, linux-ext4@...r.kernel.org
Subject: ext4 avoidably stalls waiting on writeback when unlinking a truncated file

I'm messing around with an old microbenchmark which I massaged into
state pluggable into will-it-scale [pasted at the end]

I verified that with btrfs and xfs the bench stays on cpu the entire time.

In contrast, running in on top of ext4 gives me about 50% idle.

According to offcputime-bpfcc -K this is why:
   finish_task_switch.isra.0
    __schedule
    schedule
    io_schedule
    folio_wait_bit_common
    folio_wait_writeback
    truncate_inode_partial_folio
    truncate_inode_pages_range
    ext4_evict_inode
    evict
    do_unlinkat
    __x64_sys_unlink
    do_syscall_64
    entry_SYSCALL_64_after_hwframe
    -                vfsmix2_process (22793)
        3913285

The code reopens the file with O_TRUNC. Whacking the flag gets rid of
the off cpu time.

I have no interest in digging into it. I suspect this is an easy fix
for someone familiar with the fs.

git clone https://github.com/antonblanchard/will-it-scale.git

plug the code below into tests/vfsmix2.c && gmake && ./vfsmix2_processes

#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <assert.h>

/*
 * Repurposed code stolen from Ingo Molnar, see:
 * https://lkml.org/lkml/2015/5/19/1009
 */

char *testcase_description = "vfsmix";

void testcase(unsigned long long *iterations, unsigned long nr)
{
        char tmpfile[] = "/tmp/willitscale.XXXXXX";
        int fd = mkstemp(tmpfile);
        assert(fd >= 0);
        close(fd);
        unlink(tmpfile);

        while (1) {
                fd = open(tmpfile, O_RDWR | O_CREAT | O_EXCL, 0600);
                assert(fd >= 0);

                int ret;

                ret = lseek(fd, 4095, SEEK_SET);
                assert(ret == 4095);

                close(fd);

                fd = open(tmpfile, O_RDWR|O_CREAT|O_TRUNC);
                assert(fd >= 0);

                {
                        char c = 1;

                        ret = write(fd, &c, 1);
                        assert(ret == 1);
                }

                {
                        char *mmap_buf = (char *)mmap(0, 4096,
PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

                        assert(mmap_buf != (void *)-1L);

                        mmap_buf[0] = 1;

                        ret = munmap(mmap_buf, 4096);
                        assert(ret == 0);
                }

                close(fd);

                ret = unlink(tmpfile);
                assert(ret == 0);

                (*iterations)++;
        }
}

-- 
Mateusz Guzik <mjguzik gmail.com>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ