Message-ID: <20170702233056.oommhqsip46fruuc@thunk.org>
Date:   Sun, 2 Jul 2017 19:30:56 -0400
From:   Theodore Ts'o <tytso@....edu>
To:     Благодаренко Артём 
        <artem.blagodarenko@...il.com>
Cc:     linux-ext4@...r.kernel.org
Subject: Re: [PATCH v3] Add largedir feature

Some more information about the failure that I'm seeing.

It reproduces *extremely* reliably using:

   gce-xfstests -c lustre_mds generic/027

I'm testing on the ext4 dev branch, and it only shows up with the
largedir setup.  The test in question is creating lots of 1k files in
separate directories to hit ENOSPC.  So I'm guessing it's some kind of
problem in the error handling path.

From looking at the console logs it looks like things are coming to a
dead halt due to a blocked wait_on_buffer() in jbd2_write_superblock()
in the commit thread.  Everything else ends up waiting for the commit
to finish, and it's all she wrote.

The generic/027 test passes on the 4k and 1k configurations.  It also
passes when run under kvm-xfstests with the same parameters, so it's
likely there is some kind of timing component as well.

I started doing some more digging, and it looks like it has nothing to
do with largedir.  Instead it seems to be something weird with
lazy_itable initialization.   This works fine:

/sbin/mkfs.ext4 -F -b 4096 /dev/mapper/xt-vdc 65536
mount /dev/mapper/xt-vdc /xt-vdc
sleep 1 ; df ; sleep 1
umount /xt-vdc

Replace the first mkfs command with:

/sbin/mkfs.ext4 -F -I 2048 -b 4096 /dev/mapper/xt-vdc 65536

and the system locks up in the same way as generic/027 when run using
the lustre_mds configuration.

Replace the first mkfs with:

/sbin/mkfs.ext4 -F -I 2048 -b 4096 -E lazy_itable_init=0 /dev/mapper/xt-vdc 65536

and there are no problems.  So, it looks like it's some combination of
using a 2048 inode size and lazy itable initialization.
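
The three cases above can be collected into one script.  This is only a
sketch: the device, mount point and mkfs options are the ones quoted
above, while DRY_RUN, run and run_cycle are names made up for this
illustration.  It defaults to printing the commands rather than running
them, since mkfs destroys whatever is on the device.

```shell
#!/bin/sh
# Sketch: run the mkfs/mount/df/umount cycle for each of the three
# mkfs.ext4 option sets from the message.
# DRY_RUN, run and run_cycle are hypothetical helper names.
DEV=${DEV:-/dev/mapper/xt-vdc}
MNT=${MNT:-/xt-vdc}
DRY_RUN=${DRY_RUN:-1}    # 1 = only print the commands (default)

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

run_cycle() {
    # $1: mkfs.ext4 options; left unquoted on purpose so they word-split
    run /sbin/mkfs.ext4 $1 "$DEV" 65536
    run mount "$DEV" "$MNT"
    run sleep 1
    run df
    run sleep 1
    run umount "$MNT"
}

run_cycle "-F -b 4096"                                  # reported to work
run_cycle "-F -I 2048 -b 4096"                          # reported to lock up
run_cycle "-F -I 2048 -b 4096 -E lazy_itable_init=0"    # reported to work
```

Setting DRY_RUN=0 (as root, on a scratch device) would actually execute
the cycles; the second one is the case expected to hang on an affected
setup.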

I haven't figured out if this is a recent regression, or whether this
is something that we're only seeing recently.  It also seems to be
related to some SCSI tag aborts that we aren't seeing elsewhere, so it
may have to do with how we are issuing discards.  Whether this is a
GCE issue or something which doesn't show up because the KVM setup I am
using handles discards differently is another unknown.  But I thought
I would at least ease your mind that this doesn't seem to be
specifically a largedir issue.

Cheers,

						- Ted

Download attachment "console.out.gz" of type "application/gzip" (20510 bytes)
