Message-ID: <CAPLW+4mu3K38_sPnTDj-gkvdsnfN3OKXwfDSBUg_jUj+f122cA@mail.gmail.com>
Date: Tue, 2 Apr 2024 17:43:31 -0500
From: Sam Protsenko <semen.protsenko@...aro.org>
To: Arnd Bergmann <arnd@...db.de>
Cc: Jaehoon Chung <jh80.chung@...sung.com>, Ulf Hansson <ulf.hansson@...aro.org>, 
	Christoph Hellwig <hch@....de>, Chris Ball <cjb@...top.org>, Will Newton <will.newton@...il.com>, 
	Matt Fleming <matt@...sole-pimps.org>, Christian Brauner <brauner@...nel.org>, 
	Jens Axboe <axboe@...nel.dk>, Sumit Semwal <sumit.semwal@...aro.org>, 
	Dan Carpenter <dan.carpenter@...aro.org>, Anders Roxell <anders.roxell@...aro.org>, 
	Naresh Kamboju <naresh.kamboju@...aro.org>, 
	"linux-mmc @ vger . kernel . org" <linux-mmc@...r.kernel.org>, linux-block <linux-block@...r.kernel.org>, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mmc: dw_mmc: Fix IDMAC operation with pages bigger than 4K

On Thu, Mar 7, 2024 at 1:52 AM Arnd Bergmann <arnd@...db.de> wrote:
>
> On Thu, Mar 7, 2024, at 00:20, Sam Protsenko wrote:
> > Commit 616f87661792 ("mmc: pass queue_limits to blk_mq_alloc_disk") [1]
> > revealed a long-standing issue in the dw_mmc.c driver, present since
> > it was first introduced in commit f95f3850f7a9 ("mmc: dw_mmc: Add
> > Synopsys DesignWare mmc host driver."). It breaks kernel boot on
> > platforms that use the dw_mmc driver with 16K or 64K pages enabled,
> > with this message in dmesg:
> >
> >     mmcblk: probe of mmc0:0001 failed with error -22
> >
> > That happens because mmc_blk_probe() fails when blk_validate_limits()
> > is subsequently called: it returns an error due to the failed
> > max_segment_size check in this code:
> >
> >     /*
> >      * The maximum segment size has an odd historic 64k default that
> >      * drivers probably should override.  Just like the I/O size we
> >      * require drivers to at least handle a full page per segment.
> >      */
> >     ...
> >     if (WARN_ON_ONCE(lim->max_segment_size < PAGE_SIZE))
> >         return -EINVAL;
> >
> > When IDMAC (Internal DMA Controller) is used, dw_mmc.c always sets
> > .max_seg_size to 4 KiB:
> >
> >     mmc->max_seg_size = 0x1000;
> >
> > The comment in the code above explains why that's incorrect: with 16K
> > pages, for example, PAGE_SIZE is 16384, so a 4 KiB .max_seg_size trips
> > the WARN_ON_ONCE() check. Arnd suggested setting .max_seg_size to
> > .max_req_size to fix it, which is also what some other drivers do:
> >
> >    $ grep -rl 'max_seg_size.*=.*max_req_size' drivers/mmc/host/ | \
> >      wc -l
> >    18
>
> Nice summary!
>
> > This change not only fixes booting with 16K/64K pages, but also leads
> > to better MMC performance. Linear write performance was tested on the
> > E850-96 board (eMMC only), on a tree just before commit [1] (where it
> > is still possible to boot with 16K/64K pages without this fix, so that
> > a comparison can be made), using this command:
> >
> >     # dd if=/dev/zero of=somefile bs=1M count=500 oflag=sync
> >
> > Test results are as follows:
> >
> >   - 4K pages,  .max_seg_size = 4 KiB:                   94.2 MB/s
> >   - 4K pages,  .max_seg_size = .max_req_size = 512 KiB: 96.9 MB/s
> >   - 16K pages, .max_seg_size = 4 KiB:                   126 MB/s
> >   - 16K pages, .max_seg_size = .max_req_size = 2 MiB:   128 MB/s
> >   - 64K pages, .max_seg_size = 4 KiB:                   138 MB/s
> >   - 64K pages, .max_seg_size = .max_req_size = 8 MiB:   138 MB/s
>
> Thanks for sharing these results. From what I can see here, the
> performance changes significantly with the page size, but barely
> with the max_seg_size, so this does not have the effect I was
> hoping for. On a more positive note, this likely means that we
> don't have to urgently backport your fix.
>
> This could mean that either there is not much coalescing across
> pages after all, or that the bottleneck is somewhere else.
>
> > diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
> > index 8e2d676b9239..cccd5633ff40 100644
> > --- a/drivers/mmc/host/dw_mmc.c
> > +++ b/drivers/mmc/host/dw_mmc.c
> > @@ -2951,8 +2951,8 @@ static int dw_mci_init_slot(struct dw_mci *host)
> >       if (host->use_dma == TRANS_MODE_IDMAC) {
> >               mmc->max_segs = host->ring_size;
> >               mmc->max_blk_size = 65535;
> > -             mmc->max_seg_size = 0x1000;
> > -             mmc->max_req_size = mmc->max_seg_size * host->ring_size;
> > +             mmc->max_req_size = DW_MCI_DESC_DATA_LENGTH * host->ring_size;
> > +             mmc->max_seg_size = mmc->max_req_size;
>
> The change looks good to me.
>
> I see that the host->ring_size depends on PAGE_SIZE as well:
>
> #define DESC_RING_BUF_SZ        PAGE_SIZE
> host->ring_size = DESC_RING_BUF_SZ / sizeof(struct idmac_desc_64addr);
> host->sg_cpu = dmam_alloc_coherent(host->dev,
>                DESC_RING_BUF_SZ, &host->sg_dma, GFP_KERNEL);
>
> I don't see any reason for the ring buffer size to be tied to
> PAGE_SIZE at all; it was probably picked as a reasonable default
> in the initial driver but isn't necessarily ideal.
>
> From what I can see, the number of 4KB elements in the
> ring can be as small as 128 (4KB pages, 64-bit addresses)
> or as big as 4096 (64KB pages, 32-bit addresses), which is
> quite a difference. If you are still motivated to drill
> down into this, could you try changing DESC_RING_BUF_SZ
> to a fixed size of either 4KB or 64KB and test again
> with the opposite page size, to see if that changes the
> throughput?
>

Hi Arnd,

Sorry for the late reply. I'm a bit busy with something else right now
(trying to enable this same driver for Exynos850 in U-Boot, hehe), but
I'll try to carve out some time later and tinker with DESC_RING_BUF_SZ.
For now, though, can we just apply this patch as is? As I understand it,
this fixes quite a major issue (at least from what I've heard), so it
would be nice to have it in -next and -stable. Does that sound
reasonable?
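
For reference, here is my understanding of where the .max_req_size
values in the table above come from (just a sketch of the driver math,
assuming the 32-byte 64-bit IDMAC descriptors that E850-96 uses; worth
double-checking):

    /*
     * DESC_RING_BUF_SZ == PAGE_SIZE, sizeof(struct idmac_desc_64addr)
     * == 32, and each descriptor covers DW_MCI_DESC_DATA_LENGTH ==
     * 0x1000 (4 KiB), so with this patch:
     *
     *     ring_size    = PAGE_SIZE / 32
     *     max_req_size = 0x1000 * ring_size
     *
     *     4K  pages: 4096  / 32 = 128  descs -> 128  * 4 KiB = 512 KiB
     *     16K pages: 16384 / 32 = 512  descs -> 512  * 4 KiB = 2 MiB
     *     64K pages: 65536 / 32 = 2048 descs -> 2048 * 4 KiB = 8 MiB
     */

That also matches your 128..4096 bounds, once the 16-byte 32-bit
descriptors are taken into account (65536 / 16 = 4096).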

Thanks!

> If a larger ring buffer gives us significantly better
> throughput, we may want to always use a higher number
> independent of page size. On the other hand, if the
> 64KB number (the 138MB/s) does not change with a smaller
> ring, we may as well reduce that in order to limit the
> maximum latency that is caused by a single I/O operation.
>
>      Arnd
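
Just so we're on the same page about the experiment: I assume it's
roughly the below (untested sketch; SZ_64K from <linux/sizes.h> stands
in for whichever of the two fixed sizes is being tried):

    /* Untested: decouple the descriptor ring from PAGE_SIZE by fixing
     * it at 64 KiB, then re-run the dd test with the opposite page size.
     */
    #define DESC_RING_BUF_SZ	SZ_64K

    host->ring_size = DESC_RING_BUF_SZ / sizeof(struct idmac_desc_64addr);
    host->sg_cpu = dmam_alloc_coherent(host->dev, DESC_RING_BUF_SZ,
                                       &host->sg_dma, GFP_KERNEL);

If that matches what you had in mind, that's what I'll run when I get
to it.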
