Message-ID: <a1683a2f-1940-a787-645e-fbf63b50e8a8@rock-chips.com>
Date: Wed, 13 Sep 2017 12:04:39 +0800
From: Shawn Lin <shawn.lin@...k-chips.com>
To: Linus Walleij <linus.walleij@...aro.org>
Cc: shawn.lin@...k-chips.com, Ulf Hansson <ulf.hansson@...aro.org>,
Adrian Hunter <adrian.hunter@...el.com>,
Pavel Machek <pavel@....cz>,
"linux-mmc@...r.kernel.org" <linux-mmc@...r.kernel.org>,
kernel list <linux-kernel@...r.kernel.org>,
Seraphime Kirkovski <kirkseraph@...il.com>
Subject: Re: 4.13 on thinkpad x220: oops when writing to SD card
On 2017/9/12 17:42, Linus Walleij wrote:
> On Fri, Sep 8, 2017 at 4:51 AM, Shawn Lin <shawn.lin@...k-chips.com> wrote:
>> On 2017/9/8 4:02, Linus Walleij wrote:
>>> On Thu, Sep 7, 2017 at 9:18 AM, Ulf Hansson <ulf.hansson@...aro.org>
>>> wrote:
>>>
>>>> Even if this fixes the problem it seems like we are papering over the
>>>> real issue, which earlier fixes also did during the release cycle for
>>>> v4.13.
>>>
>>>
>>> I think this is the real solution to the issue.
>>>
>>>>> Another unrelated issue with mmc_init_request() is that
>>>>> mmc_exit_request()
>>>>> is not called if mmc_init_request() fails, which means
>>>>> mmc_init_request()
>>>>> must free anything it allocates when it fails.
>>>>
>>>>
>>>> Yes, the situation is just too fragile. We need to fix the behavior
>>>> properly, although I haven't myself been able to investigate exactly
>>>> how yet.
>>>>
>>>> Adding Linus, perhaps he has some ideas.
>>>
>>>
>>> Maybe we should simply bite the bullet and do what was suggested
>>> by another contributor when I refactored the bounce buffer handling:
>>> simply delete the bounce buffer code and let any remaining (few?)
>>> legacy devices suffer a bit (performancewise) at the gain of way
>>> simpler code?
>>
>> Are you on the same page with what Adrian pointed out?
>>
>> IIUC, the issue is:
>> init_rq_fn will be called each time a new request is created from the
>> pre-allocated request_list memory pool, and exit_rq_fn is called in
>> the corresponding routine that frees requests back to the request_list
>> (blk_free_request) when finished. But if alloc_request_size fails, it
>> won't call exit_rq_fn, so you need to prevent a memory leak on your
>> own error path in init_rq_fn.
>
> Yes and that is what the current patch fixes, is it not?
Nope. It fixed the *out-of-bounds* memory usage: the request queue
holds 4 pre-allocated requests by default and only falls back to these
four under memory pressure, but the code didn't set the correct
bouncesz, so the 4 pre-allocated requests didn't get valid memory pages
when allocated. At runtime, though, the request queue normally gets new
requests from the request list by allocating them dynamically, so your
init_rq_fn is called each time, and mmc_init_request didn't handle the
error path properly.
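
To make it concrete, the error path has to unwind whatever was already
allocated before returning, since the block layer never calls
exit_rq_fn for a request whose init_rq_fn failed. A rough sketch for
the bounce-buffer case (not the actual patch; names taken from 4.13's
drivers/mmc/core/queue.c as I read it, and the non-bounce branch is
left out for brevity):

static int mmc_init_request(struct request_queue *q, struct request *req,
			    gfp_t gfp)
{
	struct mmc_queue_req *mq_rq = req_to_mmc_queue_req(req);
	struct mmc_queue *mq = q->queuedata;
	struct mmc_card *card = mq->card;

	mq_rq->bounce_buf = kmalloc(card->bouncesz, gfp);
	if (!mq_rq->bounce_buf)
		return -ENOMEM;

	mq_rq->sg = mmc_alloc_sg(1, gfp);
	if (!mq_rq->sg)
		goto free_bounce_buf;	/* don't leak bounce_buf */

	mq_rq->bounce_sg = mmc_alloc_sg(card->bouncesz / 512, gfp);
	if (!mq_rq->bounce_sg)
		goto free_sg;		/* don't leak sg either */

	return 0;

free_sg:
	kfree(mq_rq->sg);
	mq_rq->sg = NULL;
free_bounce_buf:
	kfree(mq_rq->bounce_buf);
	mq_rq->bounce_buf = NULL;
	return -ENOMEM;
}

With goto-based unwinding like this, a failed allocation can't leak the
earlier ones, no matter which step fails.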
I simply added SDHCI_QUIRK_BROKEN_ADMA to my SDHCI host to force it to
use SDMA, so that host->max_segs is 1. Then I added a hack to simulate
random failures in mmc_alloc_sg, and after some iozone stress testing
the memory was finally exhausted:
+static u64 skip = 0;
 static struct scatterlist *mmc_alloc_sg(int sg_len, gfp_t gfp)
 {
 	struct scatterlist *sg;
+	u32 random = 0;
+
+	/* Simply skip the bootup stage */
+	if (skip++ >= 0x800)
+		random = get_random_int();
+
+	/* Fail ~2 times in 16: (random & 0xf) > 0xd hits 0xe and 0xf */
+	if (unlikely((random & 0xf) > 0xd)) {
+		printk("mmc_alloc_sg: make fake failure, random = 0x%x\n",
+		       random);
+		return NULL;
+	}
[ 540.507195] iozone invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=(null), order=0, oom_score_adj=0
[ 540.508302] iozone cpuset=/ mems_allowed=0
[ 540.508727] CPU: 2 PID: 3039 Comm: iozone Not tainted 4.13.0-next-20170912-00003-g01cc0dd5-dirty #110
[ 540.509537] Hardware name: Firefly-RK3399 Board (DT)
[ 540.509977] Call trace:
[ 540.510221] [<ffff20000808b3d0>] dump_backtrace+0x0/0x480
[ 540.510707] [<ffff20000808b864>] show_stack+0x14/0x20
[ 540.511164] [<ffff200008dfbabc>] dump_stack+0xa4/0xc8
[ 540.511621] [<ffff2000081fc744>] dump_header+0xc4/0x2e8
[ 540.512091] [<ffff2000081fb888>] oom_kill_process+0x3a8/0x728
[ 540.512605] [<ffff2000081fc090>] out_of_memory+0x1a8/0x6d8
[ 540.513099] [<ffff200008203ea8>] __alloc_pages_nodemask+0xef8/0xf80
[ 540.513660] [<ffff200008275d6c>] alloc_pages_current+0x9c/0x128
[ 540.514191] [<ffff200008281f60>] new_slab+0x488/0x5d8
[ 540.514645] [<ffff200008284198>] ___slab_alloc+0x378/0x5d0
[ 540.515138] [<ffff200008284414>] __slab_alloc.isra.23+0x24/0x38
[ 540.515668] [<ffff200008284d44>] kmem_cache_alloc+0x1ec/0x218
[ 540.516183] [<ffff20000830dc30>] __blockdev_direct_IO+0x220/0x4a00
>
>> But you seem to be talking about removing the bounce buffer and so
>> finally getting rid of registering init_rq_fn/exit_rq_fn?
>
> That is in direct response to Ulf's statement
> "the situation is just too fragile", and what can be done about
> that is not to make the code even more complex but to make it
> simpler and easier to follow.
>
> One way to achieve that in the long term, is to delete bounce
> buffer handling since it adds overhead and complexity.
>
>> That is another thing,
>> and what we need to do right now is fix the potential memory leak.
>> It's quite a simple action, right? :)
>
> It is another thing.
>
> This patch fixes a memory leak, but Ulf is asking how we may
> avoid fragile behaviour.
>
>>> I am a bit hesitant about that because Pierre Ossman said it was
>>> actually a big win on the SDHC hosts that made use of it at one
>>> point.
>>
>> You removed packed cmd support to simplify the code, so I think
>> this is another trade-off where you need to ask yourself what you
>> want, and stay consistent with the direction you insisted on.
>
> Packed command could be removed because it was not used by
> any in-tree code. See
> commit 03d640ae1f9b24b1d2a11f747143a1ecc0745019
>
Ok, I was surprised that no in-tree host enabled it, but if some host
had, the situation would have been similar to this one.
> Bounce buffers, on the other hand, are very much in use, as this
> patch shows. So if they e.g. improve throughput on these systems
> (mainly laptops I think?) they should definitely be kept around.
>
> It might be a good idea to make a patch to remove the bounce
> buffers and ask people to do before/after throughput tests,
> because I do not have this hardware myself. If it doesn't provide
> any throughput improvements to any system in use, it should
> be deleted. But I don't know that yet.
Yeah, sounds good, but it's up to Ulf.
>
> Yours,
> Linus Walleij