linux-kernel - Re: CFQ idling kills I/O performance on ext4 with blkio cgroup controller

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <0d6e3c02-1952-2177-02d7-10ebeb133940@csail.mit.edu>
Date:   Thu, 30 May 2019 01:29:23 -0700
From:   "Srivatsa S. Bhat" <srivatsa@...il.mit.edu>
To:     Paolo Valente <paolo.valente@...aro.org>
Cc:     linux-fsdevel@...r.kernel.org,
        linux-block <linux-block@...r.kernel.org>,
        linux-ext4@...r.kernel.org, cgroups@...r.kernel.org,
        kernel list <linux-kernel@...r.kernel.org>,
        Jens Axboe <axboe@...nel.dk>, Jan Kara <jack@...e.cz>,
        jmoyer@...hat.com, Theodore Ts'o <tytso@....edu>,
        amakhalov@...are.com, anishs@...are.com, srivatsab@...are.com
Subject: Re: CFQ idling kills I/O performance on ext4 with blkio cgroup
 controller

On 5/29/19 12:41 AM, Paolo Valente wrote:
> 
> 
>> Il giorno 29 mag 2019, alle ore 03:09, Srivatsa S. Bhat <srivatsa@...il.mit.edu> ha scritto:
>>
>> On 5/23/19 11:51 PM, Paolo Valente wrote:
>>>
>>>> Il giorno 24 mag 2019, alle ore 01:43, Srivatsa S. Bhat <srivatsa@...il.mit.edu> ha scritto:
>>>>
>>>> When trying to run multiple dd tasks simultaneously, I get the kernel
>>>> panic shown below (mainline is fine, without these patches).
>>>>
>>>
>>> Could you please provide me somehow with a list *(bfq_serv_to_charge+0x21) ?
>>>
>>
>> Hi Paolo,
>>
>> Sorry for the delay! Here you go:
>>
>> (gdb) list *(bfq_serv_to_charge+0x21)
>> 0xffffffff814bad91 is in bfq_serv_to_charge (./include/linux/blkdev.h:919).
>> 914
>> 915	extern unsigned int blk_rq_err_bytes(const struct request *rq);
>> 916
>> 917	static inline unsigned int blk_rq_sectors(const struct request *rq)
>> 918	{
>> 919		return blk_rq_bytes(rq) >> SECTOR_SHIFT;
>> 920	}
>> 921
>> 922	static inline unsigned int blk_rq_cur_sectors(const struct request *rq)
>> 923	{
>> (gdb)
>>
>>
>> For some reason, I've not been able to reproduce this issue after
>> reporting it here. (Perhaps I got lucky when I hit the kernel panic
>> a bunch of times last week).
>>
>> I'll test with your fix applied and see how it goes.
>>
> 
> Great!  the offending line above gives me hope that my fix is correct.
> If no more failures occur, then I'm eager (and a little worried ...)
> to see how it goes with throughput :)
> 

Your fix held up well under my testing :)

As for throughput, with low_latency = 1, I get around 1.4 MB/s with
bfq (vs 1.6 MB/s with mq-deadline). This is a huge improvement
compared to what it was before (70 KB/s).

With tracing on, the throughput is a bit lower (as expected I guess),
about 1 MB/s, and the corresponding trace file
(trace-waker-detection-1MBps) is available at:

https://www.dropbox.com/s/3roycp1zwk372zo/bfq-traces.tar.gz?dl=0

Thank you so much for your tireless efforts in fixing this issue!

Regards,
Srivatsa
VMware Photon OS