lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8a19b81a-10ed-4d65-9fbd-433af11e822f@kernel.dk>
Date:   Wed, 16 Aug 2023 13:27:24 -0600
From:   Jens Axboe <axboe@...nel.dk>
To:     "Yin, Fengwei" <fengwei.yin@...el.com>,
        kernel test robot <oliver.sang@...el.com>
Cc:     oe-lkp@...ts.linux.dev, lkp@...el.com,
        linux-kernel@...r.kernel.org, io-uring@...r.kernel.org,
        ying.huang@...el.com, feng.tang@...el.com
Subject: Re: [linus:master] [io_uring] caec5ebe77:
 stress-ng.io-uring.ops_per_sec -33.1% regression

On 8/16/23 9:32 AM, Yin, Fengwei wrote:
> Hi Jens,
> 
> On 7/6/2023 3:29 PM, kernel test robot wrote:
>>
>>
>> Hello,
>>
>> kernel test robot noticed a -33.1% regression of stress-ng.io-uring.ops_per_sec on:
>>
>>
>> commit: caec5ebe77f97d948dcf46f07d622bda7f1f6dfd ("io_uring: rely solely on FMODE_NOWAIT")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>
>>
>> NOTE:
>> one thing we want to mention is we reported
>> "[linux-next:master] [io_uring]  caec5ebe77: stress-ng.io-uring.ops_per_sec 32.5% improvement"
>> on
>> https://lore.kernel.org/all/202306061247.510feb23-oliver.sang@intel.com/
>> however, by further checking, we realized the test machine ran in abnormal
>> status at that time due to BIOS issue, which we finally fixed recently.
>> please just ignore that report upon linus-next/master.
>>
>>
>> testcase: stress-ng
>> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
>> parameters:
>>
>> 	nr_threads: 100%
>> 	testtime: 60s
>> 	class: pts
>> 	test: io-uring
>> 	cpufreq_governor: performance
>>
>>
>> In addition to that, the commit also has significant impact on the following tests:
>>
>> +------------------+-------------------------------------------------------------------------------------------------+
>> | testcase: change | stress-ng: stress-ng.io-uring.ops_per_sec -1.3% regression                                      |
>> | test machine     | 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory |
>> | test parameters  | class=pts                                                                                       |
>> |                  | cpufreq_governor=performance                                                                    |
>> |                  | nr_threads=100%                                                                                 |
>> |                  | test=io-uring                                                                                   |
>> |                  | testtime=60s                                                                                    |
>> +------------------+-------------------------------------------------------------------------------------------------+
>>
>>
>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>> the same patch/commit), kindly add following tags
>> | Reported-by: kernel test robot <oliver.sang@...el.com>
>> | Closes: https://lore.kernel.org/oe-lkp/202307060958.3594f22f-oliver.sang@intel.com
>>
>>
>> Details are as below:
>> -------------------------------------------------------------------------------------------------->
>>
>>
>> To reproduce:
>>
>>         git clone https://github.com/intel/lkp-tests.git
>>         cd lkp-tests
>>         sudo bin/lkp install job.yaml           # job file is attached in this email
>>         bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
>>         sudo bin/lkp run generated-yaml-file
>>
>>         # if come across any failure that blocks the test,
>>         # please remove ~/.lkp and /lkp dir to run from a clean state.
>>
>> =========================================================================================
>> class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>>   pts/gcc-12/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp7/io-uring/stress-ng/60s
>>
>> commit: 
>>   e9833d8701 ("block: mark bdev files as FMODE_NOWAIT if underlying device supports it")
>>   caec5ebe77 ("io_uring: rely solely on FMODE_NOWAIT")
> About this regression, some findings in my side:
> - LKP use initrd to do stress-ng testing. That means the stress-ng is using the
>   file in initrd as test file.
> - The commit caec5ebe77 makes io_uring rely on FMODE_NOWAIT. But the tmpfs and
>   the file on initrd doesn't has that bit set. I suppose this is why we can see
>   this regression with LKP which is using initrd.
> 
>   I tried to let stress-ng.io_uring use the file on tmpfs and also can see
>   signifcient performance change with this commit.
> 
> - If apply following change based on commit caec5ebe77:
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 7c426584e35a..e755697c773f 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -1765,6 +1765,17 @@ static void io_iopoll_req_issued(struct io_kiocb *req, unsigned int issue_flags)
>   */
>  static bool __io_file_supports_nowait(struct file *file, umode_t mode)
>  {
> +       if (S_ISREG(mode)) {
> +               struct block_device *bdev = file->f_inode->i_sb->s_bdev;
> +
> +               if (IS_ENABLED(CONFIG_BLOCK) &&
> +                               (!bdev || bdev_nowait(bdev)) &&
> +                               !io_is_uring_fops(file))
> +                       return true;
> +
> +               return false;
> +       }
> +
>         /* any ->read/write should understand O_NONBLOCK */
>         if (file->f_flags & O_NONBLOCK)
>                 return true;
> 
> The regression is gone with LKP.
> 
> - If apply following change based on commit caec5ebe77:
> diff --git a/mm/shmem.c b/mm/shmem.c
> index e40a08c5c6d7..e9df664f0063 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3962,9 +3962,16 @@ const struct address_space_operations shmem_aops = {
>  };
>  EXPORT_SYMBOL(shmem_aops);
> 
> +static int shmem_file_open(struct inode *inode, struct file *filp)
> +{
> +       filp->f_mode |= FMODE_NOWAIT;
> +
> +       return generic_file_open(inode, filp);
> +}
> +
>  static const struct file_operations shmem_file_operations = {
>         .mmap           = shmem_mmap,
> -       .open           = generic_file_open,
> +       .open           = shmem_file_open,
>         .get_unmapped_area = shmem_get_unmapped_area,
>  #ifdef CONFIG_TMPFS
>         .llseek         = shmem_file_llseek,
> 
> The performance change when running stress-ng.io_uring with testing file
> in tmpfs is gone.
> 
> This is just the information FYI. I may miss something obviously here. Thanks.

This actually highlighted a problem with the old nowait logic, in that
it assumed !bdev would mean that nowait was fine. Looking at shmem, we
definitely need IOCB_NOWAIT handling in there to make that safe. So the
old code was buggy, and conversely, we can't also just add the
FMODE_NOWAIT without first making those improvements to shmem first.

Basically you'd want to ensure that the read_iter and write_iter paths
honor IOCB_NOWAIT. Once that's done, then FMODE_NOWAIT can indeed be set
in the open helper.

I might take a stab at this, but I'm out sick right now so don't think
it'd be cohesive if I did it right now.

-- 
Jens Axboe

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ