Message-ID: <F7CD281DE3E379468C6D07993EA72F84D1881C54@RTITMBSVM04.realtek.com.tw>
Date: Tue, 30 Jul 2019 03:11:39 +0000
From: Tony Chuang <yhchuang@...ltek.com>
To: Jian-Hong Pan <jian-hong@...lessm.com>,
David Laight <David.Laight@...lab.com>
CC: Kalle Valo <kvalo@...eaurora.org>,
"David S . Miller" <davem@...emloft.net>,
"linux-wireless@...r.kernel.org" <linux-wireless@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux@...lessm.com" <linux@...lessm.com>,
"stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: RE: [PATCH] rtw88: pci: Use general byte arrays as the elements of RX ring
> > > > While allocating all 512 buffers in one block (just over 4MB)
> > > > is probably not a good idea, you may need to allocate (and dma map)
> > > > them in groups.
> > >
> > > Thanks for reviewing. But I have some questions to double-check the
> > > idea.
> > > According to original code, it allocates 512 skbs for RX ring and dma
> > > mapping one by one. So, the new code allocates memory buffer 512
> > > times to get 512 buffer arrays. Will the 512 buffers arrays be in one
> > > block? Do you mean aggregate the buffers as a scatterlist and use
> > > dma_map_sg?
> >
> > If you malloc a buffer of size (8192+32) the allocator will either
> > round it up to a whole number of (often 4k) pages or to a power of
> > 2 of pages - so either 12k or 16k.
> > I think the Linux allocator does the latter.
> > Some of the allocators also 'steal' a bit from the front of the buffer
> > for 'red tape'.
> >
> > OTOH malloc the space for 15 buffers and the allocator will round the
> > 15*(8192 + 32) up to 32*4k - and you waste under 8k across all the
> > buffers.
> >
> > You then dma_map the large buffer and split into the actual rx buffers.
> > Repeat until you've filled the entire ring.
> > The only complication is remembering the base address (and size) for
> > the dma_unmap and free.
> > Although there is plenty of padding to extend the buffer structure
> > significantly without using more memory.
> > Allocate in 15's and you (probably) have 512 bytes per buffer.
> > Allocate in 31's and you have 256 bytes.
> >
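The rounding arithmetic quoted above can be checked with a small userspace sketch. It assumes 4 KiB pages and an allocator that rounds every request up to a power-of-two number of pages; the helper names (round_up_pow2_pages, group_waste, spare_per_buffer) are made up for illustration and are not kernel or rtw88 functions.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES  4096u           /* assumption: 4 KiB pages */
#define RX_PAYLOAD  8192u           /* payload size quoted in the thread */
#define RX_BUF_SIZE (RX_PAYLOAD + 32u)

/* Round a request up to a power-of-two number of pages (assumed policy). */
static inline size_t round_up_pow2_pages(size_t bytes)
{
    size_t pages = (bytes + PAGE_BYTES - 1) / PAGE_BYTES;
    size_t pow2 = 1;

    while (pow2 < pages)
        pow2 <<= 1;
    return pow2 * PAGE_BYTES;
}

/* Bytes lost to rounding when `count` buffers share one allocation. */
static inline size_t group_waste(size_t count)
{
    assert(count > 0);
    return round_up_pow2_pages(count * RX_BUF_SIZE) - count * RX_BUF_SIZE;
}

/* Bytes available per buffer beyond the 8192-byte payload. */
static inline size_t spare_per_buffer(size_t count)
{
    assert(count > 0);
    return round_up_pow2_pages(count * RX_BUF_SIZE) / count - RX_PAYLOAD;
}
```

Under these assumptions a single buffer occupies 16384 bytes, a group of 15 occupies 32 pages with 7712 bytes wasted in total, and a group of 31 occupies 64 pages with 7200 bytes wasted. The per-buffer room beyond the payload is 546 bytes for groups of 15 and 264 bytes for groups of 31, which matches the "512" and "256" figures once rounded down to a power of two.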
> > The problem is that larger allocates are more likely to fail
> > (especially if the system has been running for some time).
> > So you almost certainly want to be able to fall back to smaller
> > allocates even though they use more memory.
> >
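The grouped allocation with fallback described above might look roughly like the following userspace sketch. It uses malloc/free in place of the kernel allocator and DMA mapping calls, and every name in it (rx_group, fill_ring, limited_alloc, groups_needed) is hypothetical; it only illustrates halving the group size on failure and remembering each base address and size for the later unmap and free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RX_BUF_SIZE (8192u + 32u)   /* per-buffer size quoted in the thread */
#define RING_LEN    512u            /* ring length quoted in the thread */

/* One backing allocation; base and size are kept for the later unmap/free. */
struct rx_group {
    void *base;
    size_t size;
};

typedef void *(*alloc_fn)(size_t);

/* Hypothetical allocator that refuses anything above 4 buffers,
 * standing in for a fragmented system. */
static inline void *limited_alloc(size_t bytes)
{
    return bytes > 4 * RX_BUF_SIZE ? NULL : malloc(bytes);
}

static inline void release_groups(struct rx_group *groups, size_t ngroups)
{
    for (size_t i = 0; i < ngroups; i++)
        free(groups[i].base);
}

/*
 * Fill ring[0..ring_len) from grouped allocations, halving the group size
 * whenever an allocation fails. Returns the number of groups used, or 0 if
 * even a single-buffer allocation failed. groups[] needs ring_len entries.
 */
static inline size_t fill_ring(void **ring, size_t ring_len,
                               struct rx_group *groups, size_t max_group,
                               alloc_fn alloc)
{
    size_t filled = 0, ngroups = 0, group = max_group;

    assert(max_group > 0);
    while (filled < ring_len) {
        size_t left = ring_len - filled;
        size_t want = group < left ? group : left;
        char *base = alloc(want * RX_BUF_SIZE);

        if (!base) {
            if (want == 1) {            /* nothing smaller to try */
                release_groups(groups, ngroups);
                return 0;
            }
            group = want / 2;           /* fall back to a smaller group */
            continue;
        }
        groups[ngroups].base = base;
        groups[ngroups].size = want * RX_BUF_SIZE;
        ngroups++;
        for (size_t i = 0; i < want; i++)   /* split into rx buffers */
            ring[filled++] = base + i * RX_BUF_SIZE;
    }
    return ngroups;
}

/* Convenience wrapper: fill a RING_LEN ring, report the group count, free. */
static inline size_t groups_needed(size_t max_group, alloc_fn alloc)
{
    static void *ring[RING_LEN];
    static struct rx_group groups[RING_LEN];
    size_t n = fill_ring(ring, RING_LEN, groups, max_group, alloc);

    release_groups(groups, n);
    return n;
}
```

With plain malloc and groups of 15, groups_needed(15, malloc) fills the ring from 35 allocations (34 full groups plus one of 2). With limited_alloc the group size drops from 15 to 7 to 3, and the ring takes 171 allocations. In a driver the malloc call would be replaced by the real allocator plus a DMA mapping of each group, and the smaller group size trades the saved padding for a better chance of success.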
> > I also wonder if you actually need 512 8k rx buffers to cover
> > interrupt latency?
> > I've not done any measurements for 20 years!
>
> Thanks for the explanation.
> I am not sure about the combination of 512 8k RX buffers. Maybe Realtek
> folks can give us some idea.
> Tony Chuang any comment?
>
> Jian-Hong Pan
>
I don't think 512 RX buffers are necessary, but I haven't had a chance to
test whether reducing the number of RX SKBs affects latency.
I can run some throughput tests and then decide the minimum number of
buffers the RX ring requires. Or you can try it if you like.
Thanks.
Yan-Hsuan