Message-ID: <c93e529d-b688-9910-50c4-779c2f85fbc3@huawei.com>
Date: Tue, 16 Aug 2022 21:44:33 +0100
From: John Garry <john.garry@...wei.com>
To: Damien Le Moal <damien.lemoal@...nsource.wdc.com>,
Oliver Sang <oliver.sang@...el.com>
CC: Christoph Hellwig <hch@....de>,
"Martin K. Petersen" <martin.petersen@...cle.com>,
LKML <linux-kernel@...r.kernel.org>,
"Linux Memory Management List" <linux-mm@...ck.org>,
<linux-ide@...r.kernel.org>, <lkp@...ts.01.org>, <lkp@...el.com>,
<ying.huang@...el.com>, <feng.tang@...el.com>,
<zhengjun.xing@...ux.intel.com>, <fengwei.yin@...el.com>
Subject: Re: [ata] 0568e61225: stress-ng.copy-file.ops_per_sec -15.0%
regression
On 16/08/2022 21:02, Damien Le Moal wrote:
>> Could you confirm this? Thanks!
>>
>> On this basis, it appears that max_hw_sectors_kb is getting capped from
>> the SCSI default of 1024 sectors by commit 0568e61225. If it were getting
>> capped by the swiotlb mapping limit then that would give us 512 sectors -
>> that value is fixed.
>>
>> So for my SHT change proposal I am just trying to revert to the previous
>> behaviour in 5.19 - make max_hw_sectors_kb crazy big again.
> I reread the entire thing and I think I had things backwards here. The perf
> regression happens with the 512/512 settings, while the original 1280/32767
> before your patches was OK.
Right, that's as I read it. It would be useful for Oliver to confirm the
results.
> So it is your patch defining the optimal mapping size that causes
> the reduction to 512/512.
The optimal mapping size specifically affects only SAS controllers, so I
think we can ignore that one for now. The reduction to 512/512 comes
from the change in ata_scsi_dev_config().
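For reference, the host-side cap I mean comes from
scsi_add_host_with_dma(); roughly, quoting from memory (so treat this
as a sketch, not the exact source):

	/*
	 * The host DMA device's max mapping size caps shost->max_sectors.
	 * With qemu's swiotlb the mapping limit is 256 KiB, so
	 * 256 KiB >> SECTOR_SHIFT gives the fixed 512 sectors mentioned
	 * above.
	 */
	if (dma_dev->dma_mask) {
		shost->max_sectors = min_t(unsigned long, shost->max_sectors,
				dma_max_mapping_size(dma_dev) >> SECTOR_SHIFT);
	}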
> It would mean that for ATA, we need a sane
> default mapping instead of the SCSI default of 1024 sectors.
Right
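For reference, the sort of SHT change I have in mind would look roughly
like this, e.g. for ahci (a sketch of the idea only, not the exact
patch):

	static struct scsi_host_template ahci_sht = {
		/* ... */
		/*
		 * Make the SHT default large again so scsi_host_alloc()
		 * does not fall back to the SCSI default of 1024 sectors;
		 * ata_dev_configure() still sets the real per-device limit.
		 */
		.max_sectors	= ATA_MAX_SECTORS_LBA48,	/* 65535 */
	};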
> Now I understand your
> proposed change using ATA_MAX_SECTORS_LBA48.
>
> However, that would be correct only for LBA48-capable drives.
> ata_dev_configure() already sets dev->max_sectors correctly according to the
> drive type, capabilities and any quirks. So the problem comes from the
> libata-scsi change:
>
> dev->max_sectors = min(dev->max_sectors, sdev->host->max_sectors);
>
> when sdev->host->max_sectors is 0 (not initialized).
That cannot happen. If sht->max_sectors is 0, then we set
shost->max_sectors to the SCSI default of 1024 sectors in scsi_host_alloc().
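From scsi_host_alloc(), approximately (paraphrasing, but this is the
gist):

	/*
	 * A zero sht->max_sectors never reaches the sdev: it is replaced
	 * by the SCSI default, so shost->max_sectors cannot be 0 here.
	 */
	if (sht->max_sectors)
		shost->max_sectors = sht->max_sectors;
	else
		shost->max_sectors = SCSI_DEFAULT_MAX_SECTORS;	/* 1024 */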
For my proposed change, dev->max_sectors would still be initialized in
ata_dev_configure() according to drive type, etc., and it should be <=
the LBA48 limit of 65535 sectors.
So then in ata_scsi_dev_config():

	dev->max_sectors = min(dev->max_sectors, sdev->host->max_sectors)

only has an impact for ahci controllers whose sdev->host->max_sectors
was capped according to the host DMA device's max mapping size.
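To make that concrete, a worked example for an ahci device behind qemu
swiotlb (representative numbers, assuming the defaults above):

	/*
	 *   dev->max_sectors from ata_dev_configure() = 65535 (LBA48)
	 *   sdev->host->max_sectors                   = 512   (swiotlb cap)
	 *   min(65535, 512)                           = 512
	 *
	 * For a host with no DMA mapping limit, sdev->host->max_sectors
	 * keeps the SHT value and the min() changes nothing.
	 */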
I will admit that this is not ideal. An alternative approach is to change
ata_scsi_dev_config() to cap dev->max_sectors only according to the shost
DMA device's mapping limit (similar to scsi_add_host_with_dma()), but that
would not work for a controller like ipr, which does have a genuine
max_sectors limit (which we should respect).
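Something like this, for illustration (untested sketch; relying on
shost->dma_dev being usable like this is an assumption on my part):

	struct device *dma_dev = sdev->host->dma_dev;

	/*
	 * Cap only by the DMA mapping limit, ignoring any (genuine)
	 * SHT max_sectors limit - which is exactly why this would be
	 * wrong for ipr.
	 */
	if (dma_dev && dma_dev->dma_mask)
		dev->max_sectors = min_t(unsigned int, dev->max_sectors,
				dma_max_mapping_size(dma_dev) >> SECTOR_SHIFT);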
Thanks,
John
> So maybe simply changing
> this line to:
>
> dev->max_sectors = min_not_zero(dev->max_sectors, sdev->host->max_sectors);
>
> would do the trick? Any particular adapter driver that needs a mapping cap on
> the adapter's max mapping size can still set sdev->host->max_sectors as needed, and
> we keep the same defaults as before when it is not set. Thoughts ? Or am I
> missing something else ?
>
>
>>> The regression may come not from commands becoming tiny, but from the fact that
>>> after the patch, max_sectors_kb is too large,
>> I don't think it is, but need confirmation.
>>
>>> causing a lot of overhead with
>>> qemu swiotlb mapping and slowing down IO processing.
>>> Above, it can be seen that we end up with max_sectors_kb being 1280, which is the
>>> default for most scsi disks (including ATA drives). That is normal. But before
>>> that, it was 512, which likely better fits qemu swiotlb and does not generate
>> Again, I don't think this is the case. Need confirmation.
>>
>>> overhead. So the above fix will not change anything I think...