From 21b34b4574f8619907eb751b37c1831bfa3f2440 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Wed, 3 Feb 2016 13:54:25 -0500 Subject: workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup commit d6e022f1d207a161cd88e08ef0371554680ffc46 upstream. When looking up the pool_workqueue to use for an unbound workqueue, workqueue assumes that the target CPU is always bound to a valid NUMA node. However, currently, when a CPU goes offline, the mapping is destroyed and cpu_to_node() returns NUMA_NO_NODE. This has always been broken but hasn't triggered often enough before 874bbfe600a6 ("workqueue: make sure delayed work run in local cpu"). After the commit, workqueue forcifully assigns the local CPU for delayed work items without explicit target CPU to fix a different issue. This widens the window where CPU can go offline while a delayed work item is pending causing delayed work items dispatched with target CPU set to an already offlined CPU. The resulting NUMA_NO_NODE mapping makes workqueue try to queue the work item on a NULL pool_workqueue and thus crash. While 874bbfe600a6 has been reverted for a different reason making the bug less visible again, it can still happen. Fix it by mapping NUMA_NO_NODE to the default pool_workqueue from unbound_pwq_by_node(). This is a temporary workaround. The long term solution is keeping CPU -> NODE mapping stable across CPU off/online cycles which is being worked on. Signed-off-by: Tejun Heo Reported-by: Mike Galbraith Cc: Tang Chen Cc: Rafael J. Wysocki Cc: Len Brown Link: http://lkml.kernel.org/g/1454424264.11183.46.camel@gmail.com Link: http://lkml.kernel.org/g/1453702100-2597-1-git-send-email-tangchen@cn.fujitsu.com Signed-off-by: Greg Kroah-Hartman --- kernel/workqueue.c | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'kernel/workqueue.c') diff --git a/kernel/workqueue.c b/kernel/workqueue.c index c579dbab2e36..ad98ea967323 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -568,6 +568,16 @@ static struct pool_workqueue *unbound_pwq_by_node(struct workqueue_struct *wq, int node) { assert_rcu_or_wq_mutex_or_pool_mutex(wq); + + /* + * XXX: @node can be NUMA_NO_NODE if CPU goes offline while a + * delayed item is pending. The plan is to keep CPU -> NODE + * mapping valid and stable across CPU on/offlines. Once that + * happens, this workaround can be removed. + */ + if (unlikely(node == NUMA_NO_NODE)) + return wq->dfl_pwq; + return rcu_dereference_raw(wq->numa_pwq_tbl[node]); } -- cgit v1.2.3 From 6684710434d07dfd7e512e5ecc77eefb5a30151e Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Tue, 9 Feb 2016 16:11:26 -0500 Subject: Revert "workqueue: make sure delayed work run in local cpu" commit 041bd12e272c53a35c54c13875839bcb98c999ce upstream. This reverts commit 874bbfe600a660cba9c776b3957b1ce393151b76. Workqueue used to implicity guarantee that work items queued without explicit CPU specified are put on the local CPU. Recent changes in timer broke the guarantee and led to vmstat breakage which was fixed by 176bed1de5bf ("vmstat: explicitly schedule per-cpu work on the CPU we need it to run on"). vmstat is the most likely to expose the issue and it's quite possible that there are other similar problems which are a lot more difficult to trigger. As a preventive measure, 874bbfe600a6 ("workqueue: make sure delayed work run in local cpu") was applied to restore the local CPU guarnatee. Unfortunately, the change exposed a bug in timer code which got fixed by 22b886dd1018 ("timers: Use proper base migration in add_timer_on()"). Due to code restructuring, the commit couldn't be backported beyond certain point and stable kernels which only had 874bbfe600a6 started crashing. The local CPU guarantee was accidental more than anything else and we want to get rid of it anyway. As, with the vmstat case fixed, 874bbfe600a6 is causing more problems than it's fixing, it has been decided to take the chance and officially break the guarantee by reverting the commit. A debug feature will be added to force foreign CPU assignment to expose cases relying on the guarantee and fixes for the individual cases will be backported to stable as necessary. Signed-off-by: Tejun Heo Fixes: 874bbfe600a6 ("workqueue: make sure delayed work run in local cpu") Link: http://lkml.kernel.org/g/20160120211926.GJ10810@quack.suse.cz Cc: Mike Galbraith Cc: Henrique de Moraes Holschuh Cc: Daniel Bilik Cc: Jan Kara Cc: Shaohua Li Cc: Sasha Levin Cc: Ben Hutchings Cc: Thomas Gleixner Cc: Daniel Bilik Cc: Jiri Slaby Cc: Michal Hocko Signed-off-by: Greg Kroah-Hartman --- kernel/workqueue.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'kernel/workqueue.c') diff --git a/kernel/workqueue.c b/kernel/workqueue.c index ad98ea967323..450c21fd0e6e 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -1468,13 +1468,13 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq, timer_stats_timer_set_start_info(&dwork->timer); dwork->wq = wq; - /* timer isn't guaranteed to run in this cpu, record earlier */ - if (cpu == WORK_CPU_UNBOUND) - cpu = raw_smp_processor_id(); dwork->cpu = cpu; timer->expires = jiffies + delay; - add_timer_on(timer, cpu); + if (unlikely(cpu != WORK_CPU_UNBOUND)) + add_timer_on(timer, cpu); + else + add_timer(timer); } /** -- cgit v1.2.3