Message-ID: <be26c9d6-ff51-4399-b47d-8a0d4413ce0d@gmail.com>
Date: Thu, 28 Nov 2024 12:04:34 +0800
From: abushwang <abushwangs@...il.com>
To: miklos@...redi.hu
Cc: linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: [performance] fuse: No Significant Performance Improvement with
Passthrough Enabled?
I recently learned that FUSE has introduced passthrough support, which
appears to significantly enhance performance, as discussed in this
article: [LWN.net](https://lwn.net/Articles/832430/).
I plan to develop some upper-layer applications based on this feature.
However, during my testing, I found that reading small files with passthrough
seems to be slower than without it.
Below are the details of my test cases:
https://github.com/wswsmao/fuse-performance/blob/main/file_access_test.c
I generated 1M, 500M, and 1000M files for the test case above to read, using
this script:
https://github.com/wswsmao/fuse-performance/blob/main/generate_large_files.sh
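In essence, the sequential part of the test does the following (a simplified
sketch, not the actual source; the real file_access_test.c also performs
random-offset reads and iterates over several buffer sizes):
```c
/* Simplified sketch of the sequential read benchmark:
 * time the read() loop for one file and one buffer size. */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <file> <buffer_size>\n", argv[0]);
        return 1;
    }

    size_t bufsize = (size_t)atol(argv[2]);
    char *buf = malloc(bufsize);
    int fd = open(argv[1], O_RDONLY);
    if (!buf || fd < 0) {
        perror("setup");
        return 1;
    }

    struct timespec start, end;
    long calls = 0;
    ssize_t n;

    clock_gettime(CLOCK_MONOTONIC, &start);
    while ((n = read(fd, buf, bufsize)) > 0)
        calls++;                 /* count read() syscalls */
    clock_gettime(CLOCK_MONOTONIC, &end);

    double ms = (end.tv_sec - start.tv_sec) * 1000.0 +
                (end.tv_nsec - start.tv_nsec) / 1e6;
    printf("sequential: %.2f ms, %ld read calls\n", ms, calls);

    close(fd);
    free(buf);
    return 0;
}
```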
### Test Environment Information:
```
$ uname -r
6.11.5-200.fc40.x86_64
```
```
$ mount
/dev/vda1 on / type ext4 (rw,noatime)
...
```
### Testing Steps:
I cloned the latest libfuse code from upstream and built it to obtain
passthrough_hp.
The latest passthrough_hp enables passthrough by default, so for the
passthrough case I used the following command:
```
ls -lh source_dir/
total 1.5G
-rw-r--r-- 1 root root 1.0M Nov 28 02:45 sequential_file_1
-rw-r--r-- 1 root root 500M Nov 28 02:45 sequential_file_2
-rw-r--r-- 1 root root 1000M Nov 28 02:45 sequential_file_3
./lattest_passthrough_hp source_dir/ mount_point/
```
For testing without passthrough, I used the following command:
```
./lattest_passthrough_hp source_dir/ mount_point/ --nopassthrough
```
Then, I ran the test program against the files in mount_point/.
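For context, my understanding of how a passthrough_hp-style daemon opts an
opened file into kernel passthrough is roughly the following (a minimal sketch
only; the fuse_passthrough_open() helper and the backing_id field are my
assumptions about the current libfuse low-level API, so please check them
against the fuse_lowlevel.h you built, and the real passthrough_hp does proper
path lookup and error handling):
```c
#define FUSE_USE_VERSION 312    /* passthrough needs a recent libfuse; adjust */
#include <fuse_lowlevel.h>
#include <errno.h>
#include <fcntl.h>

static void sketch_open(fuse_req_t req, fuse_ino_t ino,
                        struct fuse_file_info *fi)
{
    (void)ino;
    /* Open the backing file in source_dir (real code resolves ino to a path). */
    int fd = open("/path/in/source_dir", O_RDONLY);
    if (fd < 0) {
        fuse_reply_err(req, errno);
        return;
    }

    /*
     * Register the backing fd with the kernel.  A positive backing id stored
     * in fi->backing_id (assumed field name) should make the kernel service
     * read/write and readahead directly against the backing ext4 inode, so
     * the daemon no longer sees FUSE_READ requests for this file.
     */
    int backing_id = fuse_passthrough_open(req, fd);
    if (backing_id > 0)
        fi->backing_id = backing_id;

    fi->fh = fd;
    fuse_reply_open(req, fi);
}
```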
While debugging the 1M-file case with a 4K buffer, I added print statements in
the FUSE daemon's read handler. Without passthrough, the handler was invoked
11 times, with a maximum read size of 131072, and I captured 11 fuse_readahead
events with ftrace. In passthrough mode, however, even after increasing the
ext4 read-ahead size with `blockdev --setra $num /dev/vda1`, the performance
improvement was not significant.
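One experiment I have considered but not yet run is hinting sequential access
from the reading side; whether such advice actually reaches the backing ext4
inode while passthrough is active is only an assumption on my part:
```c
#include <fcntl.h>

/* Hypothetical tweak to the read test: ask the kernel to enlarge the
 * readahead window for this descriptor before the sequential loop.
 * Unverified whether this has any effect on a passthrough-backed fd. */
static int hint_sequential(int fd)
{
    return posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
}
```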
I would like to understand why, in this case, passthrough performance seems to
be worse than without passthrough.
Thank you for your assistance.
Best regards,
Abushwang
My test report is included below for your reference.
## without passthrough
### Size = 1.0M
| Mode         | Buffer Size (bytes) | Time (ms)   | Read Calls   |
| ------------ | ------------- | ----------- | ------------ |
| sequential | 4096 | 7.99 | 256 |
| sequential | 131072 | 6.46 | 8 |
| sequential | 262144 | 7.52 | 4 |
| random | 4096 | 51.40 | 256 |
| random | 131072 | 10.62 | 8 |
| random | 262144 | 8.69 | 4 |
### Size = 500M
| Mode         | Buffer Size (bytes) | Time (ms)   | Read Calls   |
| ------------ | ------------- | ----------- | ------------ |
| sequential | 4096 | 3662.68 | 128000 |
| sequential | 131072 | 3399.55 | 4000 |
| sequential | 262144 | 3565.99 | 2000 |
| random | 4096 | 28444.48 | 128000 |
| random | 131072 | 5012.85 | 4000 |
| random | 262144 | 3636.87 | 2000 |
### Size = 1000M
| Mode         | Buffer Size (bytes) | Time (ms)   | Read Calls   |
| ------------ | ------------- | ----------- | ------------ |
| sequential | 4096 | 8164.34 | 256000 |
| sequential | 131072 | 7704.75 | 8000 |
| sequential | 262144 | 7970.08 | 4000 |
| random | 4096 | 57275.82 | 256000 |
| random | 131072 | 10311.90 | 8000 |
| random | 262144 | 7839.20 | 4000 |
## with passthrough
### Size = 1.0M
| Mode         | Buffer Size (bytes) | Time (ms)   | Read Calls   |
| ------------ | ------------- | ----------- | ------------ |
| sequential | 4096 | 8.50 | 256 |
| sequential | 131072 | 7.54 | 8 |
| sequential | 262144 | 8.71 | 4 |
| random | 4096 | 52.16 | 256 |
| random | 131072 | 9.10 | 8 |
| random | 262144 | 9.54 | 4 |
### Size = 500M
| Mode         | Buffer Size (bytes) | Time (ms)   | Read Calls   |
| ------------ | ------------- | ----------- | ------------ |
| sequential | 4096 | 3320.70 | 128000 |
| sequential | 131072 | 3234.08 | 4000 |
| sequential | 262144 | 2881.98 | 2000 |
| random | 4096 | 28457.52 | 128000 |
| random | 131072 | 4558.78 | 4000 |
| random | 262144 | 3476.05 | 2000 |
### Size = 1000M
| Mode         | Buffer Size (bytes) | Time (ms)   | Read Calls   |
| ------------ | ------------- | ----------- | ------------ |
| sequential | 4096 | 6842.04 | 256000 |
| sequential | 131072 | 6677.01 | 8000 |
| sequential | 262144 | 6268.29 | 4000 |
| random | 4096 | 58478.65 | 256000 |
| random | 131072 | 9435.85 | 8000 |
| random | 262144 | 7031.16 | 4000 |