Message-ID: <CANubcdULKcXmc0mQa4E=giG4BvErS4cPnk8gq5FO-AkdhhCgqw@mail.gmail.com>
Date: Wed, 18 Dec 2024 09:31:04 +0800
From: Stephen Zhang <starzhangzsd@...il.com>
To: Dave Chinner <david@...morbit.com>
Cc: djwong@...nel.org, dchinner@...hat.com, leo.lilong@...wei.com, 
	wozizhi@...wei.com, osandov@...com, xiang@...nel.org, 
	zhangjiachen.jaycee@...edance.com, linux-xfs@...r.kernel.org, 
	linux-kernel@...r.kernel.org, zhangshida@...inos.cn, allexjlzheng@...cent.com, 
	flyingpeng@...cent.com, txpeng@...cent.com
Subject: Re: [PATCH 0/5] *** Introduce new space allocation algorithm ***

Dave Chinner <david@...morbit.com> wrote on Tue, Nov 26, 2024, at 08:51:
>
> is simply restating what you said in the previous email that I
> explicitly told you didn't answer the question I was asking you.
>
> Please listen to what I'm asking you to do. You don't need to
> explain anything to me, I just want you to run an experiment and
> report the results.
>
> This isn't a hard thing to do: the inode32 filesystem should fill to
> roughly 50% before it really starts to spill to the lower AGs.
> Record and paste the 'xfs_spaceman -c "freesp -a X"' histograms for
> each AG when the filesystem is a little over half full.
>
> That's it. I don't need you to explain anything to me, I simply want
> to know if the inode32 allocation policy does, in fact, work the way
> it is expected to under your problematic workload.
>
> -Dave.
> --
> Dave Chinner
> david@...morbit.com

Hi, sorry for the delay.
Others (now added to the Cc list) have also run into this issue:

https://lore.kernel.org/all/20241216130551.811305-1-txpeng@tencent.com/

For reference, here are the results we have so far:
+---------------+--------+--------+--------+
| Space Used (%)| Normal | inode32|   AF   |
+---------------+--------+--------+--------+
|            30 |  35.11 |  35.25 |  35.11 |
|            41 |  57.35 |  57.58 |  55.96 |
|            46 |  71.48 |  71.74 |  54.04 |
|            51 |  88.40 |  88.68 |  49.48 |
|            56 | 100.00 | 100.00 |  43.91 |
|            62 |        |        |  37.00 |
|            67 |        |        |  28.12 |
|            72 |        |        |  16.32 |
|            77 |        |        |  19.51 |
+---------------+--------+--------+--------+

The raw data is attached at the end of this mail.

The first column is the percentage of space used.
The remaining three columns show the free space fragmentation for each
allocation policy, measured as the percentage of free extents in the
[1,1] bucket of the output of "xfs_db -c 'freesp' $test_dev".
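For clarity, that percentage is just the "pct" column of the [1,1] row
in the freesp output. A minimal sketch of how we read it (the helper
name single_block_pct is ours, and it assumes the five-column layout
shown in the attachments below):

```shell
#!/bin/sh
# Print the "pct" column of the [1,1] row from freesp-style output.
single_block_pct() {
    awk '$1 == 1 && $2 == 1 { print $5 }'
}

# Real usage against an unmounted device would be:
#   xfs_db -r -c 'freesp' "$test_dev" | single_block_pct

# Demo on a sample taken from Attachment 1:
printf '%s\n' \
    '   from      to extents  blocks    pct' \
    '      1       1   63630   63630  35.11' \
    '   2048    4095       1    2923   1.61' |
    single_block_pct    # prints 35.11
```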


How to test Normal vs AF yourself:
Apply the patches and follow the commands in:
https://lore.kernel.org/linux-xfs/20241104014439.3786609-1-zhangshida@kylinos.cn/

How to test inode32 yourself:
1. First, apply a small hack to the kernel:
============
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 09dc44480d16..69fa9f8867df 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -253,9 +253,10 @@ xfs_set_inode_alloc_perag(
        }

        set_bit(XFS_AGSTATE_ALLOWS_INODES, &pag->pag_opstate);
-       if (pag->pag_agno < max_metadata)
+       if (pag->pag_agno < max_metadata) {
+               pr_info("%s===agno:%d\n", __func__, pag->pag_agno);
                set_bit(XFS_AGSTATE_PREFERS_METADATA, &pag->pag_opstate);
-       else
+       } else
                clear_bit(XFS_AGSTATE_PREFERS_METADATA, &pag->pag_opstate);
        return true;
 }
@@ -312,7 +313,7 @@ xfs_set_inode_alloc(
         * sufficiently large, set XFS_OPSTATE_INODE32 if we must alter
         * the allocator to accommodate the request.
         */
-       if (xfs_has_small_inums(mp) && ino > XFS_MAXINUMBER_32)
+       if (xfs_has_small_inums(mp))
                xfs_set_inode32(mp);
        else
                xfs_clear_inode32(mp);
==========
so that inode32 can be exercised on a small disk image and observed in
a controlled way.
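With the pr_info in place, one hypothetical way (not part of our
scripts) to sanity-check which AGs got marked metadata-preferred after
a "mount -o inode32" is to grep the kernel log:

```shell
#!/bin/sh
# The pr_info added above prints lines of the form
# "xfs_set_inode_alloc_perag===agno:N", one per AG below max_metadata.
# After mounting with -o inode32, a quick check would be:
#   dmesg | grep -o 'xfs_set_inode_alloc_perag===agno:[0-9]*'

# The same pattern, demonstrated on a canned log line:
echo 'xfs_set_inode_alloc_perag===agno:0' |
    grep -o 'agno:[0-9]*'    # prints agno:0
```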

2. Run the same test we used for Normal vs AF, with a small change.

2.1. Create a 1 GiB image file and format it as XFS:
  dd if=/dev/zero of=test.img bs=1M count=1024
  mkfs.xfs -f test.img
  sync
2.2. Make a mount directory:
  mkdir mnt
2.3. Run the auto_frag.sh script, which calls the other scripts.
  To enable inode32, change the mount option in frag.sh:
==========
-       mount -o af1=1 $test_dev $test_mnt
+       mount -o inode32 $test_dev $test_mnt
==========
  run:
    ./auto_frag.sh 1
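As a rough idea of what the alternate-punch workload does (a sketch
only; the function name punch_offsets and the 4k block size here are
illustrative, and the real logic lives in auto_frag.sh/frag.sh at the
lore link above):

```shell
#!/bin/sh
# Emit the byte offset of every other block of a file, so the caller
# can fpunch them and leave behind only single-block free extents.
punch_offsets() {   # $1 = file size in bytes, $2 = block size in bytes
    off=0
    while [ "$off" -lt "$1" ]; do
        echo "$off"
        off=$((off + 2 * $2))
    done
}

# Real usage on an XFS mount would look something like:
#   xfs_io -f -c "pwrite 0 500m" "$file"
#   punch_offsets $((500 * 1024 * 1024)) 4096 | while read -r off; do
#       xfs_io -c "fpunch $off 4096" "$file"
#   done
#   sync

punch_offsets 16384 4096    # prints 0 and 8192 for a 16k file
```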

We are still hesitant about whether we should report these results,
since:
1. they were produced under the assumption that the hack we made for
   inode32 has no impact on the metadata preference estimation;
2. they were produced by an alternate-punch script rather than a real
   MySQL workload.

And I am afraid that Dave will blame us for not doing exactly what you
told us to test. Sorry. :p
Maybe we should port the algorithm to a release kernel, spend a few
months testing with some users or database people, covering inode32 and
the new algorithm as a whole, and reply back then.

And Tianxiang, would you mind working with us on this problem? Working
as a team should be quite efficient, and we'll do our best to find a
way for everyone to play an important role in this work.

Cheers,
Shida

===============Attachment 1: Normal=====================
test_dev:test.img test_mnt:mnt/ fize_size:512000KB
mount test.img mnt/
file:mnt//frag size:500MB
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop0     xfs   960M  285M  676M  30% /data/proj/frag_test/mnt
umount test.img
   from      to extents  blocks    pct
      1       1   63630   63630  35.11
   2048    4095       1    2923   1.61
  32768   65536       2  114672  63.28
test_dev:test.img test_mnt:mnt/ fize_size:204800KB
mount test.img mnt/
file:mnt//frag2 size:200MB
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop0     xfs   960M  386M  575M  41% /data/proj/frag_test/mnt
umount test.img
   from      to extents  blocks    pct
      1       1   89127   89127  57.35
   2048    4095       1    2923   1.88
   8192   16383       1   14226   9.15
  32768   65536       1   49144  31.62
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount test.img mnt/
file:mnt//frag3 size:100MB
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop0     xfs   960M  436M  525M  46% /data/proj/frag_test/mnt
umount test.img
   from      to extents  blocks    pct
      1       1  101877  101877  71.48
   2048    4095       1    2923   2.05
   8192   16383       1   14226   9.98
  16384   32767       1   23492  16.48
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount test.img mnt/
file:mnt//frag4 size:100MB
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop0     xfs   960M  486M  475M  51% /data/proj/frag_test/mnt
umount test.img
   from      to extents  blocks    pct
      1       1  114579  114579  88.40
    512    1023       1     811   0.63
   8192   16383       1   14226  10.98
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount test.img mnt/
file:mnt//frag5 size:100MB
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop0     xfs   960M  537M  424M  56% /data/proj/frag_test/mnt
umount test.img
   from      to extents  blocks    pct
      1       1  116730  116730 100.00
===============Attachment 2: inode32=====================
 test_dev:test.img test_mnt:mnt/ fize_size:512000KB
mount -o af1=1 test.img mnt/
file:mnt//frag size:500MB
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop0     xfs   960M  285M  676M  30% /data/proj/frag_test/mnt
umount test.img
   from      to extents  blocks    pct
      1       1   63887   63887  35.25
   2048    4095       1    2931   1.62
  32768   65536       2  114407  63.13
test_dev:test.img test_mnt:mnt/ fize_size:204800KB
mount -o af1=1 test.img mnt/
file:mnt//frag2 size:200MB
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop0     xfs   960M  386M  575M  41% /data/proj/frag_test/mnt
umount test.img
   from      to extents  blocks    pct
      1       1   89487   89487  57.58
   2048    4095       1    2931   1.89
   8192   16383       1   13858   8.92
  32768   65536       1   49144  31.62
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount -o af1=1 test.img mnt/
file:mnt//frag3 size:100MB
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop0     xfs   960M  435M  526M  46% /data/proj/frag_test/mnt
umount test.img
   from      to extents  blocks    pct
      1       1  102235  102235  71.74
   2048    4095       1    2931   2.06
   8192   16383       1   13858   9.72
  16384   32767       1   23492  16.48
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount -o af1=1 test.img mnt/
file:mnt//frag4 size:100MB
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop0     xfs   960M  486M  475M  51% /data/proj/frag_test/mnt
umount test.img
   from      to extents  blocks    pct
      1       1  114937  114937  88.68
    512    1023       1     819   0.63
   8192   16383       1   13858  10.69
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount -o af1=1 test.img mnt/
file:mnt//frag5 size:100MB
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop0     xfs   960M  537M  424M  56% /data/proj/frag_test/mnt
umount test.img
   from      to extents  blocks    pct
      1       1  116730  116730 100.00
===============Attachment 3: AF=====================
test_dev:test.img test_mnt:mnt/ fize_size:512000KB
mount -o af1=1 test.img mnt/
file:mnt//frag size:500MB
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop0     xfs   960M  285M  676M  30% /data/proj/frag_test/mnt
umount test.img
   from      to extents  blocks    pct
      1       1   63630   63630  35.11
   2048    4095       1    2923   1.61
  32768   65536       2  114672  63.28
test_dev:test.img test_mnt:mnt/ fize_size:204800KB
mount -o af1=1 test.img mnt/
file:mnt//frag2 size:200MB
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop0     xfs   960M  385M  576M  41% /data/proj/frag_test/mnt
umount test.img
   from      to extents  blocks    pct
      1       1   86974   86974  55.96
   2048    4095       1    2923   1.88
  32768   65536       1   65528  42.16
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount -o af1=1 test.img mnt/
file:mnt//frag3 size:100MB
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop0     xfs   960M  436M  525M  46% /data/proj/frag_test/mnt
umount test.img
   from      to extents  blocks    pct
      1       1   77038   77038  54.04
  32768   65536       1   65528  45.96
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount -o af1=1 test.img mnt/
file:mnt//frag4 size:100MB
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop0     xfs   960M  486M  475M  51% /data/proj/frag_test/mnt
umount test.img
   from      to extents  blocks    pct
      1       1   64186   64186  49.48
  32768   65536       1   65528  50.52
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount -o af1=1 test.img mnt/
file:mnt//frag5 size:100MB
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop0     xfs   960M  536M  425M  56% /data/proj/frag_test/mnt
umount test.img
   from      to extents  blocks    pct
      1       1   51312   51312  43.91
      2       3      11      22   0.02
  32768   65536       1   65528  56.07
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount -o af1=1 test.img mnt/
file:mnt//frag6 size:100MB
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop0     xfs   960M  586M  375M  62% /data/proj/frag_test/mnt
umount test.img
   from      to extents  blocks    pct
      1       1   38486   38486  37.00
  32768   65536       1   65528  63.00
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount -o af1=1 test.img mnt/
file:mnt//frag7 size:100MB
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop0     xfs   960M  637M  324M  67% /data/proj/frag_test/mnt
umount test.img
   from      to extents  blocks    pct
      1       1   25633   25633  28.12
  32768   65536       1   65528  71.88
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount -o af1=1 test.img mnt/
file:mnt//frag8 size:100MB
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop0     xfs   960M  687M  274M  72% /data/proj/frag_test/mnt
umount test.img
   from      to extents  blocks    pct
      1       1   12783   12783  16.32
  32768   65536       1   65528  83.68
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount -o af1=1 test.img mnt/
file:mnt//frag9 size:100MB
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop0     xfs   960M  737M  224M  77% /data/proj/frag_test/mnt
umount test.img
   from      to extents  blocks    pct
      1       1   12768   12768  19.51
   8192   16383       1   16370  25.02
  32768   65536       1   36295  55.47
