lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Sat, 3 Feb 2024 02:36:55 +0800
From: Shengyu Qu <wiagn233@...look.com>
To: Adrian Hunter <adrian.hunter@...el.com>, ulf.hansson@...aro.org,
 linux-mmc@...r.kernel.org, linux-kernel@...r.kernel.org
Cc: wiagn233@...look.com
Subject: Re: [PATCH v1] mmc: sdhci: fix max req size based on spec

Hello Adrian,

Sorry for my late reply. I was quite busy this week.


在 2024/1/30 14:54, Adrian Hunter 写道:
> On 28/01/24 12:01, Shengyu Qu wrote:
>> For almost 2 decades, the max allowed requests were limited to 512KB
>> because of SDMA's max 512KiB boundary limit.
> It is not limited by SDMA.  It is limited by choice.
>
>> ADMA2 and ADMA3 do not have such limit and were effectively made so any
>> kind of block count would not impose interrupt and managing stress to the
>> host.
> The main benefit of ADMA is that it provides scatter/gather and so does
> not need a bounce buffer.
>
>> By limiting that to 512KiB, it effectively downgrades these DMA modes to
>> SDMA.
> Not really.
So maybe I could just delete this part of the commit message?
>> Fix that by actually following the spec:
>> When ADMA is selected tuning mode is advised. On lesser modes, 4MiB
>> transfer is selected as max, so re-tuning if timer trigger or if requested
>> by host interrupt, can be done in time. Otherwise, the only limit is the
>> variable size of types used. In this implementation, 16MiB is used as
>> maximum since tests showed that after that point, there are diminishing
>> returns.
>>
>> Also 16MiB in worst case scenarios, when card is eMMC and its max speed is
>> a generous 350MiB/s, will generate interrupts every 45ms on huge data
>> transfers.
>>
>> A new `adma_get_req_limit` sdhci host function was also introduced, to let
>> vendors override imposed limits by the generic implementation if needed.
> Not in this patch?
Yes, this part is removed from my version of patch, original version has a
"adma_get_req_limit" host function and implements it to tegra. And there's
another patch adding a new device tree item to tegra chips to set the limit
from device tree config.
>> For example, on local tests with rigorous CPU/GPU burn-in tests and abrupt
>> cut-offs to generate huge temperature changes (upwards/downwards) to the
>> card, tested host was fine up to 128MB/s transfers on slow cards that used
>> SDR104 bus timing without re-tuning.
>> In that case the 4MiB limit was overridden with a more than safe 8MiB
>> value.
>>
>> In all testing cases and boards, that change brought the following:
> "all testing cases and boards" doesn't mean much to anyone else. You
> need to be more explicit.
>
>> Depending on bus timing and eMMC/SD specs:
>> * Max Read throughput increased by 2-20%
>> * Max Write throughput increased by 50-200%
>> Depending on CPU frequency and transfer sizes:
>> * Reduced mmcqd cpu core usage by 4-50%
> The main issue with increasing the request size is that it introduces much
> more latency for synchronous reads e.g. a synchronous read may have to wait
> for a large write operation.  Generally, that is probably a show-stopper
> for unconditionally increasing the maximum request size.
>> Above commit message comes from original author whose id is CTCaer, with
>> SoB email addressctcaer@...il.com. I tried to contact with the author 1
>> month ago to ask for sending it to mainline or get the authority to submit
>> by myself, but I didn't get any reply, so I decided to send this patch by
>> myself. Original commit is here[1].
> Ok, so it is not your patch and the original author is out of touch.
>
> Is there a particular reason you wanted this patch?
>
>> The author also has a patch[2] applied after this patch, which overrides
>> adma size on tegra device from device tree property. But I don't have a
>> tegra device to actually test that, so it is not sent, and
>> adma_get_req_limit part is not included in this version of patch.
> Does that mean you haven't tested this patch yourself at all?
Yes, I have tested that on my Raspberry Pi 5b and Starfive Visionfive 2. On
pi5B, It brings about 23% speed increase in sequential write. So there's a
huge improvement in improving write performance that I think that's enough
to send this patch to upstream. On Visionfive 2, performance is mainly
limited by low mmc clock and cannot be increased, so I can't see obvious
improvement on that.

Best regards,
Shengyu

>> [1]:https://github.com/CTCaer/switch-l4t-kernel-4.9/commit/fa86ebbd56d30b3b6af26e1d1c3f9c411a47e98e
>> [2]:https://github.com/CTCaer/switch-l4t-kernel-4.9/commit/385f9335b9a60ce471ac3291f202b1326212be3e
>>
>> Signed-off-by: Shengyu Qu<wiagn233@...look.com>
>> ---
>>   drivers/mmc/host/sdhci.c | 17 ++++++++++++-----
>>   drivers/mmc/host/sdhci.h |  4 ++--
>>   2 files changed, 14 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
>> index c79f73459915..f546b675c7b9 100644
>> --- a/drivers/mmc/host/sdhci.c
>> +++ b/drivers/mmc/host/sdhci.c
>> @@ -1081,7 +1081,7 @@ static void sdhci_initialize_data(struct sdhci_host *host,
>>   	WARN_ON(host->data);
>>   
>>   	/* Sanity checks */
>> -	BUG_ON(data->blksz * data->blocks > 524288);
>> +	BUG_ON(data->blksz * data->blocks > host->mmc->max_req_size);
>>   	BUG_ON(data->blksz > host->mmc->max_blk_size);
>>   	BUG_ON(data->blocks > 65535);
>>   
>> @@ -4690,11 +4690,18 @@ int sdhci_setup_host(struct sdhci_host *host)
>>   
>>   	/*
>>   	 * Maximum number of sectors in one transfer. Limited by SDMA boundary
>> -	 * size (512KiB). Note some tuning modes impose a 4MiB limit, but this
>> -	 * is less anyway.
>> +	 * size and by tuning modes on ADMA. On tuning mode 3 16MiB is more than
>> +	 * enough to cover big data transfers.
>>   	 */
>> -	mmc->max_req_size = 524288;
>> -
>> +	if (host->flags & SDHCI_USE_ADMA) {
>> +		if (host->tuning_mode != SDHCI_TUNING_MODE_3)
>> +			mmc->max_req_size = SZ_4M;
>> +		else
>> +			mmc->max_req_size = SZ_16M;
>> +	} else {
>> +		/* On PIO/SDMA use SDMA boundary size (512KiB). */
>> +		mmc->max_req_size = SZ_512K;
>> +	}
>>   	/*
>>   	 * Maximum number of segments. Depends on if the hardware
>>   	 * can do scatter/gather or not.
>> diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h
>> index a20864fc0641..98252c427feb 100644
>> --- a/drivers/mmc/host/sdhci.h
>> +++ b/drivers/mmc/host/sdhci.h
>> @@ -346,11 +346,11 @@ struct sdhci_adma2_64_desc {
>>   #define ADMA2_END		0x2
>>   
>>   /*
>> - * Maximum segments assuming a 512KiB maximum requisition size and a minimum
>> + * Maximum segments assuming a 16MiB maximum requisition size and a minimum
>>    * 4KiB page size. Note this also allows enough for multiple descriptors in
>>    * case of PAGE_SIZE >= 64KiB.
>>    */
>> -#define SDHCI_MAX_SEGS		128
>> +#define SDHCI_MAX_SEGS		4096
>>   
>>   /* Allow for a command request and a data request at the same time */
>>   #define SDHCI_MAX_MRQS		2

Content of type "text/html" skipped

Download attachment "OpenPGP_0xE3520CC91929C8E7.asc" of type "application/pgp-keys" (6869 bytes)

Download attachment "OpenPGP_signature.asc" of type "application/pgp-signature" (841 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ