path: root/kernel (follow)
Commit message    Author    Age
...
| * | sched/fair: ignore backup CPU when not valid    Patrick Bellasi    2017-10-27
| | |
| | | find_best_target() can sometimes fail to return a valid backup CPU, either
| | | because it cannot find one or just because it returns prev_cpu as the
| | | backup. In these cases we should skip the energy_diff evaluation for the
| | | backup CPU.
| | |
| | | Change-Id: I3787dbdfe74122348dd7a7485b88c4679051bd32
| | | Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
| | | Signed-off-by: Chris Redpath <chris.redpath@arm.com>
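The check described above can be sketched as follows. This is a minimal illustration, not the code from the patch: the helper name `backup_cpu_is_usable()` and the convention that "no backup found" is encoded as a negative CPU id are assumptions made for the example.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether a backup CPU candidate is worth an
 * energy_diff evaluation. Assumes "no candidate" is encoded as a negative id.
 */
static bool backup_cpu_is_usable(int backup_cpu, int prev_cpu)
{
	if (backup_cpu < 0)		/* no backup candidate was found */
		return false;
	if (backup_cpu == prev_cpu)	/* "moving" to prev_cpu changes nothing */
		return false;
	return true;
}
```

A caller would evaluate the energy difference for the backup CPU only when this returns true, and otherwise fall through to the decision already made for the primary target.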
| * | sched/fair: trace energy_diff for non boosted tasks    Patrick Bellasi    2017-10-27
| | |
| | | In systems where SchedTune is enabled, we do not report the energy diff
| | | for non boosted tasks. Let's fix this by always generating an energy_diff
| | | event, where however:
| | |
| | |    nrg.delta = 0, since we skip energy normalization
| | |    payoff = nrg.diff, since the payoff is defined just by the energy difference
| | |
| | | Change-Id: I9a11ec19b6f56da04147f5ae5b47daf1dd180445
| | | Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
| | | Signed-off-by: Chris Redpath <chris.redpath@arm.com>
| * | UPSTREAM: sched/fair: Sync task util before slow-path wakeup    Brendan Jackman    2017-10-27
| | |
| | | We use task_util() in find_idlest_group() via capacity_spare_wake(). This
| | | task_util() is updated in wake_cap(). However wake_cap() is not the only
| | | reason for ending up in find_idlest_group() - we could have been sent
| | | there by wake_wide(). So explicitly sync the task util with prev_cpu when
| | | we are about to head to find_idlest_group().
| | |
| | | We could simply do this at the beginning of select_task_rq_fair() (i.e.
| | | irrespective of whether we're heading to select_idle_sibling() or
| | | find_idlest_group() & co), but I didn't want to slow down the
| | | select_idle_sibling() path more than necessary.
| | |
| | | Don't do this during fork balancing, we won't need the task_util and we'd
| | | just clobber the last_update_time, which is supposed to be 0.
| | |
| | | Change-Id: I935f4bfdfec3e8b914457aac3387ce264d5fd484
| | | Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
| | | Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| | | Cc: Andres Oportus <andresoportus@google.com>
| | | Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
| | | Cc: Joel Fernandes <joelaf@google.com>
| | | Cc: Josef Bacik <josef@toxicpanda.com>
| | | Cc: Linus Torvalds <torvalds@linux-foundation.org>
| | | Cc: Mike Galbraith <efault@gmx.de>
| | | Cc: Morten Rasmussen <morten.rasmussen@arm.com>
| | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | Cc: Vincent Guittot <vincent.guittot@linaro.org>
| | | Link: http://lkml.kernel.org/r/20170808095519.10077-1-brendan.jackman@arm.com
| | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | (cherry-picked-from: commit ea16f0ea6c3d tip:sched/core)
| | | Signed-off-by: Chris Redpath <chris.redpath@arm.com>
| * | UPSTREAM: sched/fair: Fix usage of find_idlest_group() when the local group is idlest    Brendan Jackman    2017-10-27
| | |
| | | find_idlest_group() returns NULL when the local group is idlest. The
| | | caller then continues the find_idlest_group() search at a lower level of
| | | the current CPU's sched_domain hierarchy. find_idlest_group_cpu() is not
| | | consulted and, crucially, @new_cpu is not updated. This means the search
| | | is pointless and we return @prev_cpu from select_task_rq_fair().
| | |
| | | This is fixed by initialising @new_cpu to @cpu instead of @prev_cpu.
| | |
| | | Change-Id: Ie531f5bb29775952bdc4c148b6e974b2f5f32b7a
| | | Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
| | | Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| | | Reviewed-by: Josef Bacik <jbacik@fb.com>
| | | Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
| | | Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
| | | Cc: Josef Bacik <josef@toxicpanda.com>
| | | Cc: Linus Torvalds <torvalds@linux-foundation.org>
| | | Cc: Mike Galbraith <efault@gmx.de>
| | | Cc: Morten Rasmussen <morten.rasmussen@arm.com>
| | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | Link: http://lkml.kernel.org/r/20171005114516.18617-6-brendan.jackman@arm.com
| | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | (cherry-picked-from: commit 93f50f90247e tip:sched/core)
| | | Signed-off-by: Chris Redpath <chris.redpath@arm.com>
| * | UPSTREAM: sched/fair: Fix usage of find_idlest_group() when no groups are allowed    Brendan Jackman    2017-10-27
| | |
| | | When 'p' is not allowed on any of the CPUs in the sched_domain, we
| | | currently return NULL from find_idlest_group(), and pointlessly continue
| | | the search on lower sched_domain levels (where 'p' is also not allowed)
| | | before returning prev_cpu regardless (as we have not updated new_cpu).
| | |
| | | Add an explicit check for this case, and add a comment to
| | | find_idlest_group(). Now when find_idlest_group() returns NULL, it always
| | | means that the local group is allowed and idlest.
| | |
| | | Change-Id: I5f2648d2f7fb0465677961ecb7473df3d06f0057
| | | Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
| | | Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| | | Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
| | | Reviewed-by: Josef Bacik <jbacik@fb.com>
| | | Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
| | | Cc: Josef Bacik <josef@toxicpanda.com>
| | | Cc: Linus Torvalds <torvalds@linux-foundation.org>
| | | Cc: Mike Galbraith <efault@gmx.de>
| | | Cc: Morten Rasmussen <morten.rasmussen@arm.com>
| | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | Link: http://lkml.kernel.org/r/20171005114516.18617-5-brendan.jackman@arm.com
| | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | (cherry-picked-from: commit 6fee85ccbc76 tip:sched/core)
| | | Signed-off-by: Chris Redpath <chris.redpath@arm.com>
| * | BACKPORT: sched/fair: Fix find_idlest_group when local group is not allowed    Brendan Jackman    2017-10-27
| | |
| | | When the local group is not allowed we do not modify this_*_load from
| | | their initial value of 0. That means that the load checks at the end of
| | | find_idlest_group cause us to incorrectly return NULL. Fixing the initial
| | | values to ULONG_MAX means we will instead return the idlest remote group
| | | in that case.
| | |
| | | BACKPORT: Note 4.4 is missing commit 6b94780e45c1 "sched/core: Use
| | | load_avg for selecting idlest group", so we only have to fix this_load
| | | instead of this_runnable_load and this_avg_load.
| | |
| | | Change-Id: I41f775b0e7c8f5e675c2780f955bb130a563cba7
| | | Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
| | | Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
| | | Reviewed-by: Josef Bacik <jbacik@fb.com>
| | | Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
| | | Cc: Vincent Guittot <vincent.guittot@linaro.org>
| | | Cc: Josef Bacik <josef@toxicpanda.com>
| | | Cc: Ingo Molnar <mingo@redhat.com>
| | | Cc: Morten Rasmussen <morten.rasmussen@arm.com>
| | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | Link: http://lkml.kernel.org/r/20171005114516.18617-4-brendan.jackman@arm.com
| | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | (cherry-picked-from: commit 0d10ab952e99 tip:sched/core)
| | | (backport changes described above)
| | | Signed-off-by: Chris Redpath <chris.redpath@arm.com>
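The effect of the initial value can be seen in a small standalone model. This is a simplified sketch, not the kernel function: `struct group`, `pick_idlest()` and the plain `<=` comparison are stand-ins invented for the example (the real check also applies an imbalance percentage).

```c
#include <limits.h>
#include <stddef.h>

/* Toy stand-in for a sched_group: just a load and two flags. */
struct group {
	unsigned long load;
	int is_local;	/* group contains the current CPU */
	int allowed;	/* task may run on at least one CPU of the group */
};

/*
 * Returns the index of the idlest remote group, or -1 for "stay local"
 * (the equivalent of returning NULL). this_load_init is the value this_load
 * keeps when the local group is skipped because it is not allowed.
 */
static int pick_idlest(const struct group *g, size_t n,
		       unsigned long this_load_init)
{
	unsigned long this_load = this_load_init;
	unsigned long min_load = ULONG_MAX;
	int idlest = -1;

	for (size_t i = 0; i < n; i++) {
		if (!g[i].allowed)
			continue;	/* local group may be skipped here */
		if (g[i].is_local) {
			this_load = g[i].load;
		} else if (g[i].load < min_load) {
			min_load = g[i].load;
			idlest = (int)i;
		}
	}

	/* With this_load stuck at 0 this test always favours "local". */
	if (idlest < 0 || this_load <= min_load)
		return -1;
	return idlest;
}
```

With a disallowed local group and one allowed remote group, an initial value of 0 yields -1 (the bug), while ULONG_MAX lets the remote group be returned.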
| * | UPSTREAM: sched/fair: Remove unnecessary comparison with -1    Brendan Jackman    2017-10-27
| | |
| | | Since commit:
| | |
| | |   83a0a96a5f26 ("sched/fair: Leverage the idle state info when choosing the "idlest" cpu")
| | |
| | | find_idlest_group_cpu() (formerly find_idlest_cpu) no longer returns -1,
| | | so we can simplify the checking of the return value in find_idlest_cpu().
| | |
| | | Change-Id: I98f4b9f178cd93a30408e024e608d36771764c7b
| | | Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
| | | Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| | | Reviewed-by: Josef Bacik <jbacik@fb.com>
| | | Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
| | | Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
| | | Cc: Josef Bacik <josef@toxicpanda.com>
| | | Cc: Linus Torvalds <torvalds@linux-foundation.org>
| | | Cc: Mike Galbraith <efault@gmx.de>
| | | Cc: Morten Rasmussen <morten.rasmussen@arm.com>
| | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | Link: http://lkml.kernel.org/r/20171005114516.18617-3-brendan.jackman@arm.com
| | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | (cherry-picked-from commit e90381eaecf6 in tip:sched/core)
| | | Signed-off-by: Chris Redpath <chris.redpath@arm.com>
| * | BACKPORT: sched/fair: Move select_task_rq_fair slow-path into its own function    Brendan Jackman    2017-10-27
| | |
| | | In preparation for changes that would otherwise require adding a new
| | | level of indentation to the while(sd) loop, create a new function
| | | find_idlest_cpu() which contains this loop, and rename the existing
| | | find_idlest_cpu() to find_idlest_group_cpu().
| | |
| | | Code inside the while(sd) loop is unchanged. @new_cpu is added as a
| | | variable in the new function, with the same initial value as the @new_cpu
| | | in select_task_rq_fair().
| | |
| | | Change-Id: I9842308cab00dc9cd6c513fc38c609089a1aaaaf
| | | Suggested-by: Peter Zijlstra <peterz@infradead.org>
| | | Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
| | | Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| | | Reviewed-by: Josef Bacik <jbacik@fb.com>
| | | Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
| | | Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
| | | Cc: Josef Bacik <josef@toxicpanda.com>
| | | Cc: Linus Torvalds <torvalds@linux-foundation.org>
| | | Cc: Mike Galbraith <efault@gmx.de>
| | | Cc: Morten Rasmussen <morten.rasmussen@arm.com>
| | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | Link: http://lkml.kernel.org/r/20171005114516.18617-2-brendan.jackman@arm.com
| | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | (reworked for eas/cas schedstats added in Android)
| | | (cherry-picked commit 18bd1b4bd53a from tip:sched/core)
| | | Signed-off-by: Chris Redpath <chris.redpath@arm.com>
| * | UPSTREAM: sched/fair: Force balancing on nohz balance if local group has capacity    Brendan Jackman    2017-10-27
| | |
| | | The "goto force_balance" here is intended to mitigate the fact that
| | | avg_load calculations can result in bad placement decisions when priority
| | | is asymmetrical.
| | |
| | | The original commit that adds it:
| | |
| | |   fab476228ba3 ("sched: Force balancing on newidle balance if local group has capacity")
| | |
| | | explains:
| | |
| | |   Under certain situations, such as a niced down task (i.e. nice = -15)
| | |   in the presence of nr_cpus NICE0 tasks, the niced task lands on a
| | |   sched group and kicks away other tasks because of its large weight.
| | |   This leads to sub-optimal utilization of the machine. Even though the
| | |   sched group has capacity, it does not pull tasks because
| | |   sds.this_load >> sds.max_load, and f_b_g() returns NULL.
| | |
| | | A similar but inverted issue also affects ARM big.LITTLE (asymmetrical
| | | CPU capacity) systems - consider 8 always-running, same-priority tasks on
| | | a system with 4 "big" and 4 "little" CPUs. Suppose that 5 of them end up
| | | on the "big" CPUs (which will be represented by one sched_group in the
| | | DIE sched_domain) and 3 on the "little" (the other sched_group in DIE),
| | | leaving one CPU unused. Because the "big" group has a higher
| | | group_capacity its avg_load may not present an imbalance that would cause
| | | migrating a task to the idle "little".
| | |
| | | The force_balance case here solves the problem but currently only for
| | | CPU_NEWLY_IDLE balances, which in theory might never happen on the unused
| | | CPU. Including CPU_IDLE in the force_balance case means there's an upper
| | | bound on the time before we can attempt to solve the underutilization:
| | | after DIE's sd->balance_interval has passed the next nohz balance kick
| | | will help us out.
| | |
| | | Change-Id: I807ba5cba0ef1b8bbec02cbcd4755fd32af10135
| | | Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
| | | Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| | | Cc: Linus Torvalds <torvalds@linux-foundation.org>
| | | Cc: Mike Galbraith <efault@gmx.de>
| | | Cc: Morten Rasmussen <morten.rasmussen@arm.com>
| | | Cc: Paul Turner <pjt@google.com>
| | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | Link: http://lkml.kernel.org/r/20170807163900.25180-1-brendan.jackman@arm.com
| | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | (cherry-picked-from: commit 583ffd99d765 tip:sched/core)
| | | Signed-off-by: Chris Redpath <chris.redpath@arm.com>
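The big.LITTLE case in this message can be made concrete with a little arithmetic. The sketch below assumes that a group's avg_load is its summed load scaled by SCHED_CAPACITY_SCALE and divided by the group capacity, that every always-running task contributes a load of 1024, and that a "big" CPU has capacity 1024 and a "little" CPU 446; the capacity figures and the helper name are hypothetical and not taken from the commit.

```c
#define SCHED_CAPACITY_SCALE 1024UL

/* Load of a group normalised by its compute capacity. */
static unsigned long group_avg_load(unsigned long group_load,
				    unsigned long group_capacity)
{
	return group_load * SCHED_CAPACITY_SCALE / group_capacity;
}

/*
 * Worked example (assumed numbers):
 *   big group:    5 tasks * 1024 load, 4 CPUs * 1024 capacity -> 1280
 *   little group: 3 tasks * 1024 load, 4 CPUs *  446 capacity -> 1763
 */
```

Under these assumptions the "little" group, which holds the idle CPU, reports the higher avg_load (1763 against 1280), so a balance pass run from the idle CPU sees no imbalance to pull from the "big" group even though one "big" CPU is running two tasks. That is the situation the force_balance path is meant to catch.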
| * | UPSTREAM: sched/core: Add missing update_rq_clock() call in set_user_nice()    Peter Zijlstra    2017-10-27
| | |
| | | Address this rq-clock update bug:
| | |
| | |   WARNING: CPU: 30 PID: 195 at ../kernel/sched/sched.h:797 set_next_entity()
| | |   rq->clock_update_flags < RQCF_ACT_SKIP
| | |
| | |   Call Trace:
| | |     dump_stack()
| | |     __warn()
| | |     warn_slowpath_fmt()
| | |     set_next_entity()
| | |     ? _raw_spin_lock()
| | |     set_curr_task_fair()
| | |     set_user_nice.part.85()
| | |     set_user_nice()
| | |     create_worker()
| | |     worker_thread()
| | |     kthread()
| | |     ret_from_fork()
| | |
| | | Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| | | Cc: Linus Torvalds <torvalds@linux-foundation.org>
| | | Cc: Mike Galbraith <efault@gmx.de>
| | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | Cc: linux-kernel@vger.kernel.org
| | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | (cherry picked from commit 2fb8d36787affe26f3536c3d8ec094995a48037d)
| | | Change-Id: I53ba056e72820c7fadb3f022e4ee3b821c0de17d
| | | Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
| | | Signed-off-by: Chris Redpath <chris.redpath@arm.com>
| * | UPSTREAM: sched/core: Add missing update_rq_clock() call for task_hot()    Peter Zijlstra    2017-10-27
| | |
| | | Add the update_rq_clock() call at the top of the callstack instead of at
| | | the bottom where we find it missing, this to aid later effort to minimize
| | | the number of update_rq_clock() calls.
| | |
| | |   WARNING: CPU: 30 PID: 194 at ../kernel/sched/sched.h:797 assert_clock_updated()
| | |   rq->clock_update_flags < RQCF_ACT_SKIP
| | |
| | |   Call Trace:
| | |     dump_stack()
| | |     __warn()
| | |     warn_slowpath_fmt()
| | |     assert_clock_updated.isra.63.part.64()
| | |     can_migrate_task()
| | |     load_balance()
| | |     pick_next_task_fair()
| | |     __schedule()
| | |     schedule()
| | |     worker_thread()
| | |     kthread()
| | |
| | | Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| | | Cc: Linus Torvalds <torvalds@linux-foundation.org>
| | | Cc: Mike Galbraith <efault@gmx.de>
| | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | Cc: linux-kernel@vger.kernel.org
| | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | (cherry picked from commit 3bed5e2166a5e433bf62162f3cd3c5174d335934)
| | | Change-Id: Ief5070dcce486535334dcb739ee16b989ea9df42
| | | Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
| | | Signed-off-by: Chris Redpath <chris.redpath@arm.com>
| * | UPSTREAM: sched/core: Add missing update_rq_clock() in detach_task_cfs_rq()    Peter Zijlstra    2017-10-27
| | |
| | | Instead of adding the update_rq_clock() all the way at the bottom of the
| | | callstack, add one at the top, this to aid later effort to minimize
| | | update_rq_clock() calls.
| | |
| | |   WARNING: CPU: 0 PID: 1 at ../kernel/sched/sched.h:797 detach_task_cfs_rq()
| | |   rq->clock_update_flags < RQCF_ACT_SKIP
| | |
| | |   Call Trace:
| | |     dump_stack()
| | |     __warn()
| | |     warn_slowpath_fmt()
| | |     detach_task_cfs_rq()
| | |     switched_from_fair()
| | |     __sched_setscheduler()
| | |     _sched_setscheduler()
| | |     sched_set_stop_task()
| | |     cpu_stop_create()
| | |     __smpboot_create_thread.part.2()
| | |     smpboot_register_percpu_thread_cpumask()
| | |     cpu_stop_init()
| | |     do_one_initcall()
| | |     ? print_cpu_info()
| | |     kernel_init_freeable()
| | |     ? rest_init()
| | |     kernel_init()
| | |     ret_from_fork()
| | |
| | | Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| | | Cc: Linus Torvalds <torvalds@linux-foundation.org>
| | | Cc: Mike Galbraith <efault@gmx.de>
| | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | Cc: linux-kernel@vger.kernel.org
| | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | (cherry picked from commit 80f5c1b84baa8180c3c27b7e227429712cd967b6)
| | | Change-Id: Ibffde077d18eabec4c2984158bd9d6d73bd0fb96
| | | Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
| | | Signed-off-by: Chris Redpath <chris.redpath@arm.com>
| * | UPSTREAM: sched/core: Add missing update_rq_clock() in post_init_entity_util_avg()    Peter Zijlstra    2017-10-27
| | |
| | | Address this rq-clock update bug:
| | |
| | |   WARNING: CPU: 0 PID: 0 at ../kernel/sched/sched.h:797 post_init_entity_util_avg()
| | |   rq->clock_update_flags < RQCF_ACT_SKIP
| | |
| | |   Call Trace:
| | |     __warn()
| | |     post_init_entity_util_avg()
| | |     wake_up_new_task()
| | |     _do_fork()
| | |     kernel_thread()
| | |     rest_init()
| | |     start_kernel()
| | |
| | | Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| | | Cc: Linus Torvalds <torvalds@linux-foundation.org>
| | | Cc: Mike Galbraith <efault@gmx.de>
| | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | Cc: linux-kernel@vger.kernel.org
| | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | (cherry picked from commit 4126bad6717336abe5d666440ae15555563ca53f)
| | | Change-Id: Ibe9a73386896377f96483d195e433259218755a5
| | | Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
| | | Signed-off-by: Chris Redpath <chris.redpath@arm.com>
| * | UPSTREAM: sched/core: Fix find_idlest_group() for fork    Vincent Guittot    2017-10-27
| | |
| | | During fork, the utilization of a task is initialized once the rq has
| | | been selected, because the current utilization level of the rq is used to
| | | set the utilization of the forked task. As the task's utilization is
| | | still 0 at this step of the fork sequence, it doesn't make sense to look
| | | for some spare capacity that can fit the task's utilization.
| | | Furthermore, I can see perf regressions for the test:
| | |
| | |    hackbench -P -g 1
| | |
| | | because the least loaded policy is always bypassed and tasks are not
| | | spread during fork.
| | |
| | | With this patch and the fix below, we are back to the same performance as
| | | for v4.8. The fix below is only a temporary one used for the test until a
| | | smarter solution is found, because we can't simply remove the test, which
| | | is useful for other benchmarks.
| | |
| | | | @@ -5708,13 +5708,6 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
| | | |
| | | | 	avg_cost = this_sd->avg_scan_cost;
| | | |
| | | | -	/*
| | | | -	 * Due to large variance we need a large fuzz factor; hackbench in
| | | | -	 * particularly is sensitive here.
| | | | -	 */
| | | | -	if ((avg_idle / 512) < avg_cost)
| | | | -		return -1;
| | | | -
| | | | 	time = local_clock();
| | | |
| | | | 	for_each_cpu_wrap(cpu, sched_domain_span(sd), target, wrap) {
| | |
| | | Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
| | | Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
| | | Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| | | Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
| | | Acked-by: Morten Rasmussen <morten.rasmussen@arm.com>
| | | Cc: Linus Torvalds <torvalds@linux-foundation.org>
| | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | Cc: dietmar.eggemann@arm.com
| | | Cc: kernellwp@gmail.com
| | | Cc: umgwanakikbuti@gmail.com
| | | Cc: yuyang.du@intel.com
| | | Link: http://lkml.kernel.org/r/1481216215-24651-2-git-send-email-vincent.guittot@linaro.org
| | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | (cherry picked from commit f519a3f1c6b7a990e5aed37a8f853c6ecfdee945)
| | | Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
| | | Signed-off-by: Chris Redpath <chris.redpath@arm.com>
| | | Change-Id: I86cc2ad81af3467c0b2f82b995111f428248baa4
| * | BACKPORT: sched/fair: Fix PELT integrity for new tasks    Peter Zijlstra    2017-10-27
| | |
| | | Vincent and Yuyang found another few scenarios in which entity tracking
| | | goes wobbly.
| | |
| | | The scenarios are basically due to the fact that new tasks are not
| | | immediately attached and thereby differ from the normal situation -- a
| | | task is always attached to a cfs_rq load average (such that it includes
| | | its blocked contribution) and is explicitly detached/attached on
| | | migration to another cfs_rq.
| | |
| | | Scenario 1: switch to fair class
| | |
| | |   p->sched_class = fair_class;
| | |   if (queued)
| | |     enqueue_task(p);
| | |       ...
| | |         enqueue_entity()
| | |           enqueue_entity_load_avg()
| | |             migrated = !sa->last_update_time (true)
| | |             if (migrated)
| | |               attach_entity_load_avg()
| | |   check_class_changed()
| | |     switched_from() (!fair)
| | |     switched_to()   (fair)
| | |       switched_to_fair()
| | |         attach_entity_load_avg()
| | |
| | | If @p is a new task that hasn't been fair before, it will have
| | | !last_update_time and, per the above, end up in
| | | attach_entity_load_avg() _twice_.
| | |
| | | Scenario 2: change between cgroups
| | |
| | |   sched_move_group(p)
| | |     if (queued)
| | |       dequeue_task()
| | |     task_move_group_fair()
| | |       detach_task_cfs_rq()
| | |         detach_entity_load_avg()
| | |       set_task_rq()
| | |       attach_task_cfs_rq()
| | |         attach_entity_load_avg()
| | |     if (queued)
| | |       enqueue_task();
| | |         ...
| | |           enqueue_entity()
| | |             enqueue_entity_load_avg()
| | |               migrated = !sa->last_update_time (true)
| | |               if (migrated)
| | |                 attach_entity_load_avg()
| | |
| | | As with scenario 1, if @p is a new task, it will have !last_update_time
| | | and we'll end up in attach_entity_load_avg() _twice_.
| | |
| | | Furthermore, notice how we do a detach_entity_load_avg() on something
| | | that wasn't attached to begin with.
| | |
| | | As stated above, the problem is that the new task isn't yet attached to
| | | the load tracking and thereby violates the invariant assumption.
| | |
| | | This patch remedies this by ensuring a new task is indeed properly
| | | attached to the load tracking on creation, through
| | | post_init_entity_util_avg().
| | |
| | | Of course, this isn't entirely as straightforward as one might think,
| | | since the task is hashed before we call wake_up_new_task() and thus can
| | | be poked at. We avoid this by adding TASK_NEW and teaching
| | | cpu_cgroup_can_attach() to refuse such tasks.
| | |
| | | .:: BACKPORT
| | |
| | | Complicated by the fact that much of the lines changed by the original of
| | | this commit were then changed by:
| | |
| | |   df217913e72e sched/fair: Factorize attach/detach entity <Vincent Guittot>
| | |
| | | and then
| | |
| | |   d31b1a66cbe0 sched/fair: Factorize PELT update <Vincent Guittot>
| | |
| | | , which have both already been backported here.
| | |
| | | Reported-by: Yuyang Du <yuyang.du@intel.com>
| | | Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
| | | Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| | | Cc: Linus Torvalds <torvalds@linux-foundation.org>
| | | Cc: Mike Galbraith <efault@gmx.de>
| | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | Cc: linux-kernel@vger.kernel.org
| | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | (cherry picked from commit 7dc603c9028ea5d4354e0e317e8481df99b06d7e)
| | | Change-Id: Ibc59eb52310a62709d49a744bd5a24e8b97c4ae8
| | | Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
| | | Signed-off-by: Chris Redpath <chris.redpath@arm.com>
| * | BACKPORT: sched/cgroup: Fix cpu_cgroup_fork() handling    Vincent Guittot    2017-10-27
| | |
| | | A new fair task is detached and attached from/to task_group with:
| | |
| | |   cgroup_post_fork()
| | |     ss->fork(child) := cpu_cgroup_fork()
| | |       sched_move_task()
| | |         task_move_group_fair()
| | |
| | | Which is wrong, because at this point in fork() the task isn't fully
| | | initialized and it cannot 'move' to another group, because it's not
| | | attached to any group as yet.
| | |
| | | In fact, cpu_cgroup_fork() needs a small part of sched_move_task() so we
| | | can just call this small part directly instead of sched_move_task(). And
| | | the task doesn't really migrate because it is not yet attached, so we
| | | need the following sequence:
| | |
| | |   do_fork()
| | |     sched_fork()
| | |       __set_task_cpu()
| | |
| | |     cgroup_post_fork()
| | |       set_task_rq() # set task group and runqueue
| | |
| | |     wake_up_new_task()
| | |       select_task_rq() can select a new cpu
| | |       __set_task_cpu
| | |       post_init_entity_util_avg
| | |         attach_task_cfs_rq()
| | |       activate_task
| | |         enqueue_task
| | |
| | | This patch makes that happen.
| | |
| | | BACKPORT: Difference from original commit:
| | |
| | | - Removed use of DEQUEUE_MOVE (which isn't defined in 4.4) in
| | |   dequeue_task flags
| | | - Replaced "struct rq_flags rf" with "unsigned long flags".
| | |
| | | Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
| | | [ Added TASK_SET_GROUP to set depth properly. ]
| | | Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| | | Cc: Linus Torvalds <torvalds@linux-foundation.org>
| | | Cc: Mike Galbraith <efault@gmx.de>
| | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | Cc: linux-kernel@vger.kernel.org
| | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | (cherry picked from commit ea86cb4b7621e1298a37197005bf0abcc86348d4)
| | | Change-Id: I8126fd923288acf961218431ffd29d6bf6fd8d72
| | | Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
| | | Signed-off-by: Chris Redpath <chris.redpath@arm.com>
| * | UPSTREAM: sched/fair: Fix and optimize the fork() path    Peter Zijlstra    2017-10-27
| | |
| | | The task_fork_fair() callback already calls __set_task_cpu() and takes
| | | rq->lock.
| | |
| | | If we move the sched_class::task_fork callback in sched_fork() under the
| | | existing p->pi_lock, right after its set_task_cpu() call, we can avoid
| | | doing two such calls and omit the IRQ disabling on the rq->lock.
| | |
| | | Change to __set_task_cpu() to skip the migration bits, this is a new
| | | task, not a migration. Similarly, make wake_up_new_task() use
| | | __set_task_cpu() for the same reason, the task hasn't actually migrated
| | | as it has never run.
| | |
| | | This cures the problem of calling migrate_task_rq_fair(), which does
| | | remove_entity_from_load_avg() on tasks that have never been added to the
| | | load avg to begin with.
| | |
| | | This bug would result in transiently messed up load_avg values, averaged
| | | out after a few dozen milliseconds. This is probably the reason why this
| | | bug was not found for such a long time.
| | |
| | | Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
| | | Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| | | Cc: Linus Torvalds <torvalds@linux-foundation.org>
| | | Cc: Mike Galbraith <efault@gmx.de>
| | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | Cc: linux-kernel@vger.kernel.org
| | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | (cherry picked from commit e210bffd39d01b649c94b820c28ff112673266dd)
| | | Change-Id: Icbddbaa6e8c1071859673d8685bc3f38955cf144
| | | Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
| | | Signed-off-by: Chris Redpath <chris.redpath@arm.com>
| * | BACKPORT: sched/fair: Make it possible to account fair load avg consistently    Chris Redpath    2017-10-27
| | |
| | | While set_task_rq_fair() is introduced in mainline by commit ad936d8658fd
| | | ("sched/fair: Make it possible to account fair load avg consistently"),
| | | the function ended up being introduced here by the backport of commit
| | | 09a43ace1f98 ("sched/fair: Propagate load during synchronous
| | | attach/detach"). The problem (apart from the confusion introduced by the
| | | backport) is actually that set_task_rq_fair() is currently not called at
| | | all.
| | |
| | | Fix the problem by backporting again commit ad936d8658fd ("sched/fair:
| | | Make it possible to account fair load avg consistently").
| | |
| | | Original change log:
| | |
| | | The current code accounts for the time a task was absent from the fair
| | | class (per ATTACH_AGE_LOAD). However it does not work correctly when a
| | | task got migrated or moved to another cgroup while outside of the fair
| | | class.
| | |
| | | This patch tries to address that by aging on migration. We locklessly
| | | read the 'last_update_time' stamp from both the old and new cfs_rq, age
| | | the load up to the old time, and set it to the new time.
| | |
| | | These timestamps should in general not be more than 1 tick apart from one
| | | another, so there is a definite bound on things.
| | |
| | | Signed-off-by: Byungchul Park <byungchul.park@lge.com>
| | | [ Changelog, a few edits and !SMP build fix ]
| | | Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| | | Cc: Linus Torvalds <torvalds@linux-foundation.org>
| | | Cc: Mike Galbraith <efault@gmx.de>
| | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | Link: http://lkml.kernel.org/r/1445616981-29904-2-git-send-email-byungchul.park@lge.com
| | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | (cherry-picked from ad936d8658fd348338cb7d42c577dac77892b074)
| | | Signed-off-by: Juri Lelli <juri.lelli@arm.com>
| | | Signed-off-by: Chris Redpath <chris.redpath@arm.com>
| | | Change-Id: I17294ab0ada3901d35895014715fd60952949358
| | | Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
| * | cpufreq/sched: Consider max cpu capacity when choosing frequencies    Chris Redpath    2017-10-27
| | |
| | | When using schedfreq on cpus with max capacity significantly smaller than
| | | 1024, the tick update uses non-normalised capacities - this leads to
| | | selecting an incorrect OPP as we were scaling the frequency as if the max
| | | capacity achievable was 1024 rather than the max for that particular cpu
| | | or group. This could result in a cpu being stuck at the lowest OPP and
| | | unable to generate enough utilisation to climb out if the max capacity is
| | | significantly smaller than 1024.
| | |
| | | Instead, normalize the capacity to be in the range 0-1024 in the tick so
| | | that when we later select a frequency, we get the correct one.
| | |
| | | Comments are also updated to be clearer about what is needed.
| | |
| | | Change-Id: Id84391c7ac015311002ada21813a353ee13bee60
| | | Signed-off-by: Chris Redpath <chris.redpath@arm.com>
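The normalisation step can be illustrated with a short helper. This is a sketch under assumptions, not the patched code: the function name `normalise_capacity()` and the example capacity of 446 for a small CPU are hypothetical, and the cap at 1024 is simply the upper end of the 0-1024 range the message describes.

```c
#define SCHED_CAPACITY_SCALE 1024UL

/*
 * Rescale a capacity request expressed relative to this CPU's own maximum
 * capacity into the 0..1024 range used when picking a frequency.
 */
static unsigned long normalise_capacity(unsigned long req_cap,
					unsigned long cpu_max_cap)
{
	unsigned long norm = req_cap * SCHED_CAPACITY_SCALE / cpu_max_cap;

	return norm < SCHED_CAPACITY_SCALE ? norm : SCHED_CAPACITY_SCALE;
}
```

For a CPU whose maximum capacity is 446, a request of 223 is half of what that CPU can deliver, so it normalises to 512 and maps to roughly half of the CPU's frequency range. Without the rescaling, 223 out of 1024 would be treated as about 22% and would select a much lower OPP.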
| * | ANDROID: sched/fair: Select correct capacity state for energy_diff    Chris Redpath    2017-10-25
| | |
| | | The util returned from group_max_util is not capped at the max util
| | | present in the group, so it can be larger than the capacity stored in the
| | | array. Ensure that when this happens, we always use the last entry in the
| | | array to fetch energy from.
| | |
| | | Tested with synthetics on Juno board.
| | |
| | | Bug: 38159576
| | | Change-Id: I89fb52fb7e68fa3e682e308acc232596672d03f7
| | | Signed-off-by: Chris Redpath <chris.redpath@arm.com>
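A lookup with that fallback might look like the sketch below. The `struct cap_state` layout and the function `find_cap_state()` are simplified stand-ins written for this example, and the table is assumed to be non-empty and sorted by ascending capacity.

```c
#include <stddef.h>

/* Simplified capacity state: compute capacity and its power cost. */
struct cap_state {
	unsigned long cap;
	unsigned long power;
};

/*
 * Return the index of the first state whose capacity covers util. If util
 * exceeds every entry, fall back to the last (highest) state instead of
 * running off the end of the table. Assumes n >= 1 and ascending order.
 */
static size_t find_cap_state(const struct cap_state *states, size_t n,
			     unsigned long util)
{
	for (size_t i = 0; i < n; i++) {
		if (states[i].cap >= util)
			return i;
	}
	return n - 1;
}
```

With states of capacity 200, 400 and 600, a util of 900 selects index 2, so the energy is read from the highest state rather than from an out-of-range or stale index.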
| * | cpufreq: schedutil: clamp util to CPU maximum capacity (Leo Yan, 2017-10-24)
| | | The code computes the CPU util by accumulating the utilization of the different scheduling classes, and when the total util value is larger than the CPU capacity it clamps util to the CPU maximum capacity. So we get a correct util value when using the PELT signal, but with the WALT signal the util value is not clamped.
| | | WALT does not accumulate the utilization of different classes, but it does apply a boost margin to the WALT signal, so the CPU util value can be larger than the CPU capacity. This patch therefore always clamps util to the CPU maximum capacity.
| | | Change-Id: I05481ddbf20246bb9be15b6bd21b6ec039015ea8
| | | Signed-off-by: Leo Yan <leo.yan@linaro.org>
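As a rough illustration of why the clamp is needed once a boost margin is added, the sketch below uses a hypothetical helper, `boosted_util_clamped`. How the margin is actually computed is not stated in the message, so it is simply passed in as a parameter here.

```c
#include <assert.h>

/*
 * Hypothetical helper: add a boost margin to a utilization signal and
 * clamp the result to the CPU's maximum capacity, so the value handed
 * to frequency selection never exceeds what the CPU can deliver.
 */
static unsigned long boosted_util_clamped(unsigned long util,
                                          unsigned long margin,
                                          unsigned long max_cap)
{
    assert(max_cap > 0);
    unsigned long boosted = util + margin;
    return boosted > max_cap ? max_cap : boosted;
}
```

A util of 900 with a margin of 200 would be 1100 unclamped; with a maximum capacity of 1024 the helper returns 1024.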
| * | cpufreq/sched: Use cpu max freq rather than policy max (Chris Redpath, 2017-10-23)
| | | When we convert capacity into frequency, we used policy->max to get the max freq of the cpu. Since this can be changed by userspace policy or thermal events, we are potentially asking for a lower frequency than the utilization demands.
| | | Change over to using cpuinfo.max, which is the max freq supported by that cpu rather than the currently-chosen max. The frequency granted still honours the max policy.
| | | Tested by setting a userspace policy and observing the relevant vars in a trace. In this instance, we ask for around 1 GHz instead of 620 MHz.
| | | freq_new=1013512 unfixed_freq_new=624487 capacity=546 cpuinfo_max=1900800 policy_max=1171200
| | | Change-Id: I8c5694db42243c6fb78bb9be9046b06ac81295e7
| | | Signed-off-by: Chris Redpath <chris.redpath@arm.com>
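The trace values in this message can be reproduced with plain integer arithmetic. The sketch below assumes the conversion is `capacity * max_freq / 1024` with truncating division and that the trace frequencies are in kHz; both are inferences from the numbers quoted above, not confirmed details of the patch. The function names are hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical helper: scale a 0..1024 capacity request into a
 * frequency, relative to the given maximum frequency.
 */
static unsigned long capacity_to_freq(unsigned long capacity,
                                      unsigned long max_freq)
{
    assert(capacity <= 1024);
    /* Widen before multiplying so the product cannot overflow 32 bits. */
    return (unsigned long)((unsigned long long)capacity * max_freq / 1024);
}

/* The granted frequency still honours the policy maximum. */
static unsigned long granted_freq(unsigned long requested,
                                  unsigned long policy_max)
{
    return requested > policy_max ? policy_max : requested;
}
```

With capacity=546, scaling against cpuinfo_max=1900800 gives 1013512 (the trace's freq_new), while scaling against policy_max=1171200 gives 624487 (the trace's unfixed_freq_new). The request of 1013512 is below the policy maximum, so it is granted unchanged.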
* | | sched: restore discarded ifdef CONFIG_SCHED_WALT code (Blagovest Kolenichev, 2017-11-06)
| | | Code enclosed in ifdef CONFIG_SCHED_WALT blocks is not used in msm-4.4 builds, so in order to stay as close to upstream as possible and to have fewer merge conflicts in the future, let's restore this code.
| | | Restore the below CONFIG_SCHED_WALT changes in file [1]:
| | | be832f6 sched: walt: Leverage existing
| | | ^^^^^^^ Discarded in dbad9b8.
| | | efb86bd sched: Introduce Window Assisted Load Tracking (WALT)
| | | ^^^^^^^ Restore only the block, which is modified by be832f6. Discarded in efbe378.
| | | dbad9b8 Merge android-4.4@89074de (v4.4.94) into msm-4.4
| | | efbe378 Merge branch 'v4.4-16.09-android-tmp' into lsk-v4.4-16.09-android
| | | [1] kernel/sched/sched.h
| | | Change-Id: Ifd7e230b3b47dde61abf2472f092ff78d80b7427
| | | Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
* | | Merge android-4.4@89074de (v4.4.94) into msm-4.4 (Blagovest Kolenichev, 2017-10-27)
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * refs/heads/tmp-89074de Linux 4.4.94 Revert "tty: goldfish: Fix a parameter of a call to free_irq" cpufreq: CPPC: add ACPI_PROCESSOR dependency nfsd/callback: Cleanup callback cred on shutdown target/iscsi: Fix unsolicited data seq_end_offset calculation uapi: fix linux/mroute6.h userspace compilation errors uapi: fix linux/rds.h userspace compilation errors ceph: clean up unsafe d_parent accesses in build_dentry_path i2c: at91: ensure state is restored after suspending net: mvpp2: release reference to txq_cpu[] entry after unmapping scsi: scsi_dh_emc: return success in clariion_std_inquiry() slub: do not merge cache if slub_debug contains a never-merge flag ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock crypto: xts - Add ECB dependency net/mlx4_core: Fix VF overwrite of module param which disables DMFS on new probed PFs sparc64: Migrate hvcons irq to panicked cpu md/linear: shutup lockdep warnning f2fs: do not wait for writeback in write_begin Btrfs: send, fix failure to rename top level inode due to name collision iio: adc: xilinx: Fix error handling netfilter: nf_ct_expect: Change __nf_ct_expect_check() return value. 
net/mlx4_en: fix overflow in mlx4_en_init_timestamp() mac80211: fix power saving clients handling in iwlwifi mac80211_hwsim: check HWSIM_ATTR_RADIO_NAME length irqchip/crossbar: Fix incorrect type of local variables watchdog: kempld: fix gcc-4.3 build locking/lockdep: Add nest_lock integrity test Revert "bsg-lib: don't free job in bsg_prepare_job" tipc: use only positive error codes in messages net: Set sk_prot_creator when cloning sockets to the right proto packet: only test po->has_vnet_hdr once in packet_snd packet: in packet_do_bind, test fanout with bind_lock held tun: bail out from tun_get_user() if the skb is empty l2tp: fix race condition in l2tp_tunnel_delete l2tp: Avoid schedule while atomic in exit_net vti: fix use after free in vti_tunnel_xmit/vti6_tnl_xmit isdn/i4l: fetch the ppp_write buffer in one shot bpf: one perf event close won't free bpf program attached by another perf event packet: hold bind lock when rebinding to fanout hook net: emac: Fix napi poll list corruption ip6_gre: skb_push ipv6hdr before packing the header in ip6gre_header udpv6: Fix the checksum computation when HW checksum does not apply bpf/verifier: reject BPF_ALU64|BPF_END sctp: potential read out of bounds in sctp_ulpevent_type_enabled() MIPS: Fix minimum alignment requirement of IRQ stack drm/dp/mst: save vcpi with payloads percpu: make this_cpu_generic_read() atomic w.r.t. interrupts trace: sched: Fix util_avg_walt in sched_load_avg_cpu trace sched/fair: remove erroneous RCU_LOCKDEP_WARN from start_cpu() sched: EAS/WALT: finish accounting prior to task_tick cpufreq: sched: update capacity request upon tick always sched/fair: prevent meaningless active migration sched: walt: Leverage existing helper APIs to apply invariance Conflicts: kernel/sched/core.c kernel/sched/fair.c kernel/sched/sched.h Change-Id: I0effac90fb6a4db559479bfa2fefa31c41200ce9 Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
| * | Merge 4.4.94 into android-4.4 (Greg Kroah-Hartman, 2017-10-22)
| |\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.94 percpu: make this_cpu_generic_read() atomic w.r.t. interrupts drm/dp/mst: save vcpi with payloads MIPS: Fix minimum alignment requirement of IRQ stack sctp: potential read out of bounds in sctp_ulpevent_type_enabled() bpf/verifier: reject BPF_ALU64|BPF_END udpv6: Fix the checksum computation when HW checksum does not apply ip6_gre: skb_push ipv6hdr before packing the header in ip6gre_header net: emac: Fix napi poll list corruption packet: hold bind lock when rebinding to fanout hook bpf: one perf event close won't free bpf program attached by another perf event isdn/i4l: fetch the ppp_write buffer in one shot vti: fix use after free in vti_tunnel_xmit/vti6_tnl_xmit l2tp: Avoid schedule while atomic in exit_net l2tp: fix race condition in l2tp_tunnel_delete tun: bail out from tun_get_user() if the skb is empty packet: in packet_do_bind, test fanout with bind_lock held packet: only test po->has_vnet_hdr once in packet_snd net: Set sk_prot_creator when cloning sockets to the right proto tipc: use only positive error codes in messages Revert "bsg-lib: don't free job in bsg_prepare_job" locking/lockdep: Add nest_lock integrity test watchdog: kempld: fix gcc-4.3 build irqchip/crossbar: Fix incorrect type of local variables mac80211_hwsim: check HWSIM_ATTR_RADIO_NAME length mac80211: fix power saving clients handling in iwlwifi net/mlx4_en: fix overflow in mlx4_en_init_timestamp() netfilter: nf_ct_expect: Change __nf_ct_expect_check() return value. 
iio: adc: xilinx: Fix error handling Btrfs: send, fix failure to rename top level inode due to name collision f2fs: do not wait for writeback in write_begin md/linear: shutup lockdep warnning sparc64: Migrate hvcons irq to panicked cpu net/mlx4_core: Fix VF overwrite of module param which disables DMFS on new probed PFs crypto: xts - Add ECB dependency ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock slub: do not merge cache if slub_debug contains a never-merge flag scsi: scsi_dh_emc: return success in clariion_std_inquiry() net: mvpp2: release reference to txq_cpu[] entry after unmapping i2c: at91: ensure state is restored after suspending ceph: clean up unsafe d_parent accesses in build_dentry_path uapi: fix linux/rds.h userspace compilation errors uapi: fix linux/mroute6.h userspace compilation errors target/iscsi: Fix unsolicited data seq_end_offset calculation nfsd/callback: Cleanup callback cred on shutdown cpufreq: CPPC: add ACPI_PROCESSOR dependency Revert "tty: goldfish: Fix a parameter of a call to free_irq" Linux 4.4.94 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| | * locking/lockdep: Add nest_lock integrity test (Peter Zijlstra, 2017-10-21)
| | | [ Upstream commit 7fb4a2cea6b18dab56d609530d077f168169ed6b ]
| | | Boqun reported that hlock->references can overflow. Add a debug test for that to generate a clear error when this happens. Without this, lockdep is likely to report a mysterious failure on unlock.
| | | Reported-by: Boqun Feng <boqun.feng@gmail.com>
| | | Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| | | Cc: Andrew Morton <akpm@linux-foundation.org>
| | | Cc: Chris Wilson <chris@chris-wilson.co.uk>
| | | Cc: Linus Torvalds <torvalds@linux-foundation.org>
| | | Cc: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
| | | Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | Cc: linux-kernel@vger.kernel.org
| | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
| | | Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| | * bpf: one perf event close won't free bpf program attached by another perf event (Yonghong Song, 2017-10-21)
| | | [ Upstream commit ec9dd352d591f0c90402ec67a317c1ed4fb2e638 ]
| | | This patch fixes a bug exhibited by the following scenario:
| | | 1. fd1 = perf_event_open with attr.config = ID1
| | | 2. attach bpf program prog1 to fd1
| | | 3. fd2 = perf_event_open with attr.config = ID1 <this will be successful>
| | | 4. user program closes fd2 and prog1 is detached from the tracepoint.
| | | 5. user program with fd1 does not work properly as the tracepoint produces no output any more.
| | | The issue happens at step 4. Multiple perf_event_open calls can succeed, but there is only one bpf prog pointer in the tp_event. In the current logic, any fd release for the same tp_event will free the tp_event->prog.
| | | The fix is to free tp_event->prog only when the closing fd corresponds to the one which registered the program.
| | | Signed-off-by: Yonghong Song <yhs@fb.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
| | | Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
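The ownership rule in this fix can be modelled in a few lines of userspace C. This is a simplified sketch under stated assumptions, not the kernel's perf code: all struct and function names below (`tp_event_sketch`, `perf_event_sketch`, `attach_prog`, `release_event`) are hypothetical, and each event simply remembers whether it was the one that registered the program.

```c
#include <assert.h>
#include <stddef.h>

struct prog_sketch { int id; };

/* Shared per-tracepoint state: a single program slot for all events. */
struct tp_event_sketch { struct prog_sketch *prog; };

/* Per-fd state: remembers the program this event itself attached. */
struct perf_event_sketch {
    struct tp_event_sketch *tp;
    struct prog_sketch *owned_prog;
};

/* Attach a program through this event; fails if the slot is taken. */
static int attach_prog(struct perf_event_sketch *ev, struct prog_sketch *p)
{
    assert(ev && ev->tp && p);
    if (ev->tp->prog)
        return -1;
    ev->tp->prog = p;
    ev->owned_prog = p;
    return 0;
}

/* On close, detach only if this event is the one that attached. */
static void release_event(struct perf_event_sketch *ev)
{
    assert(ev && ev->tp);
    if (!ev->owned_prog)
        return;  /* another event owns the program: leave it alone */
    ev->tp->prog = NULL;
    ev->owned_prog = NULL;
}
```

Replaying the scenario: attaching through the first event and then releasing the second leaves the tracepoint's program in place; only releasing the first event clears it.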
| | * bpf/verifier: reject BPF_ALU64|BPF_END (Edward Cree, 2017-10-21)
| | | [ Upstream commit e67b8a685c7c984e834e3181ef4619cd7025a136 ]
| | | Neither ___bpf_prog_run nor the JITs accept it. Also adds a new test case.
| | | Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
| | | Signed-off-by: Edward Cree <ecree@solarflare.com>
| | | Acked-by: Alexei Starovoitov <ast@kernel.org>
| | | Acked-by: Daniel Borkmann <daniel@iogearbox.net>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
| | | Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | sched/fair: remove erroneous RCU_LOCKDEP_WARN from start_cpu() (Dietmar Eggemann, 2017-10-20)
| | | Fixes: https://bugs.linaro.org/show_bug.cgi?id=3075
| | | Change-Id: I62d714fc4b9366a9b2535649aa92d1edc840cf94
| | | Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
| | | Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
| | | Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
| | | Signed-off-by: Chris Redpath <chris.redpath@arm.com>
| * | sched: EAS/WALT: finish accounting prior to task_tick (Joonwoo Park, 2017-10-19)
| | | In order to set rq->misfit_task in time, call update_task_ravg() prior to task_tick. This reduces upmigration delay by 1 scheduler window.
| | | Change-Id: I7cc80badd423f2e7684125fbfd853b0a3610f0e8
| | | Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
| | | Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
| * | cpufreq: sched: update capacity request upon tick always (Joonwoo Park, 2017-10-19)
| | | At present, sched_freq_tick() skips the capacity update when the current frequency is fmax. This can cause an incorrect frequency drop when a CPU bound task goes to sleep, for example:
| | | 1) A task (A) enqueues onto CPU 0 and executes for a long time.
| | | 2) A new task (B) which has low task demand enqueues onto CPU 1 and executes for long, so it becomes a CPU bound task.
| | | 3) Both CPU 0 and 1 get the scheduler tick but skip sched_freq_tick() since the current frequency is fmax.
| | | 4) Task (A) sleeps and lowers CPU 0's capacity request.
| | | 5) Because task (B) voted CPU capacity at step 2 with low demand and skipped further requests afterwards, the cluster frequency for both CPU 0 and 1 drops to match the capacity voted by CPU 1 at step 2, even though task (B) on CPU 1 requires max capacity.
| | | Fix this by not skipping CPU capacity voting in the tick path.
| | | Change-Id: Ieb46af1ac96ffce7a5532c58c7f07bf1ada06b86
| | | Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
| | | Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
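The five steps above can be replayed with a toy model in which the cluster frequency follows the largest per-CPU capacity vote. That aggregation rule is an assumption made for illustration, and `cluster_capacity_request` is a hypothetical name; the point is only to show how a stale vote from CPU 1 takes over once CPU 0 lowers its own.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical aggregation: the cluster serves the largest capacity
 * currently voted by any of its CPUs.
 */
static unsigned long cluster_capacity_request(const unsigned long *votes,
                                              size_t nr_cpus)
{
    assert(nr_cpus > 0);
    unsigned long max = votes[0];
    for (size_t i = 1; i < nr_cpus; i++) {
        if (votes[i] > max)
            max = votes[i];
    }
    return max;
}
```

If CPU 0 votes 1024 and CPU 1 still holds an old vote of 100, the cluster request is 1024. When task (A) sleeps and CPU 0's vote drops to 0, the request falls to the stale 100 even though task (B) is now CPU bound. Refreshing CPU 1's vote on every tick, as the fix does, brings the request back to 1024.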
| * | sched/fair: prevent meaningless active migration (Joonwoo Park, 2017-10-19)
| | | At present need_active_balance() determines whether an active upmigration is needed by using capacity_of(). A CPU's capacity may be reduced by RT pressure, and therefore distinguishing capability differences with capacity_of() may lead to suboptimal active migrations to less capable CPUs.
| | | Use capacity_orig_of() in addition to capacity_of() to distinguish differently capable CPUs, thus avoiding placing tasks on less capable CPUs due to instantaneous RT pressure.
| | | Change-Id: I3e1435246a8edc3ad618ef98a34866cfbd8c16a5
| | | Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
| | | [markivx: Reworked the commit text a bit]
| | | Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
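A sketch of the combined check, assuming the upmigration test requires the destination to be better on both the original capacity and the current (pressure-reduced) capacity. The exact condition in need_active_balance() is not quoted in the message, so the struct and function below (`cpu_cap_sketch`, `dst_is_more_capable`) are hypothetical illustrations of the idea only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU capacities. */
struct cpu_cap_sketch {
    unsigned long capacity_orig;  /* what the CPU can deliver at best */
    unsigned long capacity;       /* what is left after RT pressure */
};

/*
 * Treat dst as a valid upmigration target only if it is more capable
 * by design and also currently has more capacity available.
 */
static bool dst_is_more_capable(const struct cpu_cap_sketch *src,
                                const struct cpu_cap_sketch *dst)
{
    assert(src && dst);
    return src->capacity_orig < dst->capacity_orig &&
           src->capacity < dst->capacity;
}
```

A big CPU under heavy RT pressure (capacity_orig 1024, capacity 300) compared with an idle little CPU (446, 446) would look less capable on current capacity alone; the additional capacity_orig comparison rejects that migration.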
| * | sched: walt: Leverage existing helper APIs to apply invariance (Vikram Mulukutla, 2017-10-19)
| | | There's no need for a separate hierarchy of notifiers, APIs and variables in walt.c for the purpose of applying frequency and IPC invariance. Let's just use capacity_curr_of and get rid of a lot of the infrastructure relating to capacity, load_scale_factor etc.
| | | Change-Id: Ia220e2c896373fa535db05bff60f9aa33aefc978
| | | Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
* | | Merge android-4.4@d6fbbe5 (v4.4.93) into msm-4.4 (Blagovest Kolenichev, 2017-10-20)
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * refs/heads/tmp-d6fbbe5 Linux 4.4.93 x86/alternatives: Fix alt_max_short macro to really be a max() USB: serial: console: fix use-after-free after failed setup USB: serial: qcserial: add Dell DW5818, DW5819 USB: serial: option: add support for TP-Link LTE module USB: serial: cp210x: add support for ELV TFD500 USB: serial: ftdi_sio: add id for Cypress WICED dev board fix unbalanced page refcounting in bio_map_user_iov direct-io: Prevent NULL pointer access in submit_page_section usb: gadget: composite: Fix use-after-free in usb_composite_overwrite_options ALSA: line6: Fix leftover URB at error-path during probe ALSA: caiaq: Fix stray URB at probe error path ALSA: seq: Fix copy_from_user() call inside lock ALSA: seq: Fix use-after-free at creating a port ALSA: usb-audio: Kill stray URB at exiting iommu/amd: Finish TLB flush in amd_iommu_unmap() usb: renesas_usbhs: Fix DMAC sequence for receiving zero-length packet KVM: nVMX: fix guest CR4 loading when emulating L2 to L1 exit crypto: shash - Fix zero-length shash ahash digest crash HID: usbhid: fix out-of-bounds bug dmaengine: edma: Align the memcpy acnt array size with the transfer MIPS: math-emu: Remove pr_err() calls from fpu_emu() USB: dummy-hcd: Fix deadlock caused by disconnect detection rcu: Allow for page faults in NMI handlers iwlwifi: mvm: use IWL_HCMD_NOCOPY for MCAST_FILTER_CMD nl80211: Define policy for packet pattern attributes CIFS: Reconnect expired SMB sessions ext4: in ext4_seek_{hole,data}, return -ENXIO for negative offsets 
brcmfmac: add length check in brcmf_cfg80211_escan_handler() ANDROID: HACK: arm64: use -mno-implicit-float instead of -mgeneral-regs-only sched: Update task->on_rq when tasks are moving between runqueues FROMLIST: f2fs: expose some sectors to user in inline data or dentry case crypto: Work around deallocated stack frame reference gcc bug on sparc. UPSTREAM: f2fs: fix potential panic during fstrim ANDROID: fscrypt: remove unnecessary fscrypto.h ANDROID: binder: fix node sched policy calculation ANDROID: Kbuild, LLVMLinux: allow overriding clang target triple CHROMIUM: arm64: Disable asm-operand-width warning for clang CHROMIUM: kbuild: clang: Disable the 'duplicate-decl-specifier' warning UPSTREAM: x86/build: Use cc-option to validate stack alignment parameter UPSTREAM: x86/build: Fix stack alignment for CLang UPSTREAM: efi/libstub/arm64: Set -fpie when building the EFI stub BACKPORT: efi/libstub/arm64: Force 'hidden' visibility for section markers UPSTREAM: compiler, clang: always inline when CONFIG_OPTIMIZE_INLINING is disabled UPSTREAM: x86/boot: #undef memcpy() et al in string.c UPSTREAM: crypto: arm64/sha - avoid non-standard inline asm tricks UPSTREAM: kbuild: clang: Disable 'address-of-packed-member' warning UPSTREAM: x86/build: Specify stack alignment for clang UPSTREAM: x86/build: Use __cc-option for boot code compiler options BACKPORT: kbuild: Add __cc-option macro UPSTREAM: x86/hweight: Don't clobber %rdi BACKPORT: x86/hweight: Get rid of the special calling convention BACKPORT: x86/mm/kaslr: Use the _ASM_MUL macro for multiplication to work around Clang incompatibility UPSTREAM: crypto, x86: aesni - fix token pasting for clang UPSTREAM: x86/kbuild: Use cc-option to enable -falign-{jumps/loops} UPSTREAM: compiler, clang: properly override 'inline' for clang UPSTREAM: compiler, clang: suppress warning for unused static inline functions UPSTREAM: Kbuild: provide a __UNIQUE_ID for clang UPSTREAM: modules: mark __inittest/__exittest as __maybe_unused 
BACKPORT: kbuild: Add support to generate LLVM assembly files UPSTREAM: kbuild: use -Oz instead of -Os when using clang BACKPORT: kbuild, LLVMLinux: Add -Werror to cc-option to support clang UPSTREAM: kbuild: drop -Wno-unknown-warning-option from clang options UPSTREAM: kbuild: fix asm-offset generation to work with clang UPSTREAM: kbuild: consolidate redundant sed script ASM offset generation UPSTREAM: kbuild: Consolidate header generation from ASM offset information UPSTREAM: kbuild: clang: add -no-integrated-as to KBUILD_[AC]FLAGS UPSTREAM: kbuild: Add better clang cross build support Conflicts: arch/x86/lib/Makefile net/wireless/nl80211.c Change-Id: I76032e8d1206903bc948b9ed918e7ddee7e746c7 Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
| * | Merge 4.4.93 into android-4.4 (Dmitry Shmidt, 2017-10-19)
| |\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.93 brcmfmac: add length check in brcmf_cfg80211_escan_handler() ext4: in ext4_seek_{hole,data}, return -ENXIO for negative offsets CIFS: Reconnect expired SMB sessions nl80211: Define policy for packet pattern attributes iwlwifi: mvm: use IWL_HCMD_NOCOPY for MCAST_FILTER_CMD rcu: Allow for page faults in NMI handlers USB: dummy-hcd: Fix deadlock caused by disconnect detection MIPS: math-emu: Remove pr_err() calls from fpu_emu() dmaengine: edma: Align the memcpy acnt array size with the transfer HID: usbhid: fix out-of-bounds bug crypto: shash - Fix zero-length shash ahash digest crash KVM: nVMX: fix guest CR4 loading when emulating L2 to L1 exit usb: renesas_usbhs: Fix DMAC sequence for receiving zero-length packet iommu/amd: Finish TLB flush in amd_iommu_unmap() ALSA: usb-audio: Kill stray URB at exiting ALSA: seq: Fix use-after-free at creating a port ALSA: seq: Fix copy_from_user() call inside lock ALSA: caiaq: Fix stray URB at probe error path ALSA: line6: Fix leftover URB at error-path during probe usb: gadget: composite: Fix use-after-free in usb_composite_overwrite_options direct-io: Prevent NULL pointer access in submit_page_section fix unbalanced page refcounting in bio_map_user_iov USB: serial: ftdi_sio: add id for Cypress WICED dev board USB: serial: cp210x: add support for ELV TFD500 USB: serial: option: add support for TP-Link LTE module USB: serial: qcserial: add Dell DW5818, DW5819 USB: serial: console: fix use-after-free after failed setup x86/alternatives: Fix alt_max_short macro to really be a max() Linux 4.4.93 Change-Id: I731bf1eef5aca9728dddd23bfbe407f0c6ff2d84 Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
| | * rcu: Allow for page faults in NMI handlers (Paul E. McKenney, 2017-10-18)
| | | commit 28585a832602747cbfa88ad8934013177a3aae38 upstream.
| | | A number of architectures invoke rcu_irq_enter() on exception entry in order to allow RCU read-side critical sections in the exception handler when the exception is from an idle or nohz_full CPU. This works, at least unless the exception happens in an NMI handler. In that case, rcu_nmi_enter() would already have exited the extended quiescent state, which would mean that rcu_irq_enter() would (incorrectly) cause RCU to think that it is again in an extended quiescent state. This will in turn result in lockdep splats in response to later RCU read-side critical sections.
| | | This commit therefore causes rcu_irq_enter() and rcu_irq_exit() to take no action if there is an rcu_nmi_enter() in effect, thus avoiding the unscheduled return to RCU quiescent state. This in turn should make the kernel safe for on-demand RCU voyeurism.
| | | Link: http://lkml.kernel.org/r/20170922211022.GA18084@linux.vnet.ibm.com
| | | Cc: stable@vger.kernel.org
| | | Fixes: 0be964be0 ("module: Sanitize RCU usage and locking")
| | | Reported-by: Steven Rostedt <rostedt@goodmis.org>
| | | Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| | | Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| | | Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
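The interaction described in this message can be modelled with a toy state machine. This is a heavily simplified sketch, not the kernel's RCU implementation: it assumes a single counter whose parity says whether RCU is watching the CPU (even means extended quiescent state), and every name in it is hypothetical. The early returns in the irq helpers correspond to the "take no action if there is an rcu_nmi_enter() in effect" rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-CPU state for an idle CPU; all fields are illustrative. */
struct rcu_cpu_sketch {
    unsigned long dynticks;  /* even: quiescent, odd: RCU watching */
    long irq_nesting;
    int nmi_nesting;
    bool nmi_left_eqs;       /* did the NMI path leave the quiescent state? */
};

static bool rcu_watching(const struct rcu_cpu_sketch *s)
{
    return s->dynticks & 1;
}

static void sketch_nmi_enter(struct rcu_cpu_sketch *s)
{
    if (s->nmi_nesting == 0 && !rcu_watching(s)) {
        s->dynticks++;           /* leave the quiescent state */
        s->nmi_left_eqs = true;
    }
    s->nmi_nesting++;
}

static void sketch_nmi_exit(struct rcu_cpu_sketch *s)
{
    assert(s->nmi_nesting > 0);
    if (--s->nmi_nesting == 0 && s->nmi_left_eqs) {
        s->dynticks++;           /* return to the quiescent state */
        s->nmi_left_eqs = false;
    }
}

static void sketch_irq_enter(struct rcu_cpu_sketch *s)
{
    if (s->nmi_nesting > 0)
        return;                  /* the fix: the NMI path already did this */
    if (s->irq_nesting++ == 0)
        s->dynticks++;
}

static void sketch_irq_exit(struct rcu_cpu_sketch *s)
{
    if (s->nmi_nesting > 0)
        return;                  /* mirror of the early return above */
    assert(s->irq_nesting > 0);
    if (--s->irq_nesting == 0)
        s->dynticks++;
}
```

In this model, without the early returns an exception taken inside an NMI on an idle CPU would bump the counter a second time, flipping it back to even so that RCU would appear to be in a quiescent state while the handler is still running.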
| * | sched: Update task->on_rq when tasks are moving between runqueues (Olav Haugan, 2017-10-16)
| | | Task->on_rq has three states:
| | | 0 - Task is not on runqueue (rq)
| | | 1 (TASK_ON_RQ_QUEUED) - Task is on rq
| | | 2 (TASK_ON_RQ_MIGRATING) - Task is on rq but in the process of being migrated to another rq
| | | When a task is moving between rqs, the task->on_rq state should be TASK_ON_RQ_MIGRATING in order for WALT to account the rq's cumulative runnable average correctly. Without such state marking for all the classes, WALT's update_history() would try to fix up a task's demand which was never contributed to any of the CPUs during migration.
| | | Change-Id: Iced3428f3924fe8ab5d0075698273ead04f12d5b
| | | Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
| | | [joonwoop: Reinforced changelog to explain why this is needed by WALT. Fixed conflicts in deadline.c]
| | | Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
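The accounting problem can be sketched with a toy runqueue that tracks a cumulative demand sum. This is an illustrative model, not WALT's code: the state values 0, 1 and 2 come from the message above, while the structs and helpers (`rq_sketch`, `task_sketch`, `begin_migration`, `update_demand` and so on) are hypothetical. It assumes a demand update adjusts a runqueue's sum only while the task is in the queued state.

```c
#include <assert.h>
#include <stddef.h>

enum on_rq_state {
    ON_RQ_NONE = 0,       /* not on a runqueue */
    ON_RQ_QUEUED = 1,     /* on a runqueue */
    ON_RQ_MIGRATING = 2   /* in transit between runqueues */
};

struct rq_sketch { unsigned long cumulative_runnable_avg; };

struct task_sketch {
    enum on_rq_state on_rq;
    unsigned long demand;
    struct rq_sketch *rq;
};

static void enqueue_task_sketch(struct rq_sketch *rq, struct task_sketch *t)
{
    t->rq = rq;
    rq->cumulative_runnable_avg += t->demand;
    t->on_rq = ON_RQ_QUEUED;
}

/* Mark the task as migrating before removing its contribution. */
static void begin_migration(struct task_sketch *t)
{
    assert(t->on_rq == ON_RQ_QUEUED);
    t->on_rq = ON_RQ_MIGRATING;
    t->rq->cumulative_runnable_avg -= t->demand;
    t->rq = NULL;
}

static void finish_migration(struct task_sketch *t, struct rq_sketch *dst)
{
    assert(t->on_rq == ON_RQ_MIGRATING);
    enqueue_task_sketch(dst, t);
}

/* Fix up the runqueue sum only if the task currently contributes to it. */
static void update_demand(struct task_sketch *t, unsigned long new_demand)
{
    if (t->on_rq == ON_RQ_QUEUED) {
        t->rq->cumulative_runnable_avg -= t->demand;
        t->rq->cumulative_runnable_avg += new_demand;
    }
    t->demand = new_demand;
}
```

If the task were left in the queued state during the move, a demand update arriving mid-migration would adjust a sum the task no longer contributes to; with the migrating state set, the update only changes the task's own demand, and the destination picks up the new value on enqueue.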
* | | Merge android-4.4@73a2b70 (v4.4.92) into msm-4.4 (Blagovest Kolenichev, 2017-10-20)
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * refs/heads/tmp-73a2b70 Linux 4.4.92 ext4: don't allow encrypted operations without keys ext4: Don't clear SGID when inheriting ACLs ext4: fix data corruption for mmap writes sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs nvme: protect against simultaneous shutdown invocations drm/i915/bios: ignore HDMI on port A brcmfmac: setup passive scan if requested by user-space uwb: ensure that endpoint is interrupt uwb: properly check kthread_run return value iio: adc: mcp320x: Fix oops on module unload iio: adc: mcp320x: Fix readout of negative voltages iio: ad7793: Fix the serial interface reset iio: core: Return error for failed read_reg staging: iio: ad7192: Fix - use the dedicated reset function avoiding dma from stack. 
iio: ad_sigma_delta: Implement a dedicated reset function iio: adc: twl4030: Disable the vusb3v1 rugulator in the error handling path of 'twl4030_madc_probe()' iio: adc: twl4030: Fix an error handling path in 'twl4030_madc_probe()' xhci: fix finding correct bus_state structure for USB 3.1 hosts USB: fix out-of-bounds in usb_set_configuration usb: Increase quirk delay for USB devices USB: core: harden cdc_parse_cdc_header USB: uas: fix bug in handling of alternate settings scsi: sd: Do not override max_sectors_kb sysfs setting iwlwifi: add workaround to disable wide channels in 5GHz HID: i2c-hid: allocate hid buffers for real worst case ftrace: Fix kmemleak in unregister_ftrace_graph stm class: Fix a use-after-free Drivers: hv: fcopy: restore correct transfer length driver core: platform: Don't read past the end of "driver_override" buffer ALSA: usx2y: Suppress kernel warning at page allocation failures ALSA: compress: Remove unused variable lsm: fix smack_inode_removexattr and xattr_getsecurity memleak USB: g_mass_storage: Fix deadlock when driver is unbound usb: gadget: mass_storage: set msg_registered after msg registered USB: devio: Don't corrupt user memory USB: dummy-hcd: Fix erroneous synchronization change USB: dummy-hcd: fix infinite-loop resubmission bug USB: dummy-hcd: fix connection failures (wrong speed) usb: pci-quirks.c: Corrected timeout values used in handshake ALSA: usb-audio: Check out-of-bounds access by corrupted buffer descriptor usb: renesas_usbhs: fix usbhsf_fifo_clear() for RX direction usb: renesas_usbhs: fix the BCLR setting condition for non-DCP pipe usb-storage: unusual_devs entry to fix write-access regression for Seagate external drives usb: gadget: udc: atmel: set vbus irqflags explicitly USB: gadgetfs: fix copy_to_user while holding spinlock USB: gadgetfs: Fix crash caused by inadequate synchronization usb: gadget: inode.c: fix unbalanced spin_lock in ep0_write ANDROID: binder: init desired_prio.sched_policy before use it BACKPORT: 
net: xfrm: support setting an output mark. UPSTREAM: xfrm: Only add l3mdev oif to dst lookups UPSTREAM: net: l3mdev: Add master device lookup by index Linux 4.4.91 ttpci: address stringop overflow warning ALSA: au88x0: avoid theoretical uninitialized access ARM: remove duplicate 'const' annotations' IB/qib: fix false-postive maybe-uninitialized warning drivers: firmware: psci: drop duplicate const from psci_of_match libata: transport: Remove circular dependency at free time xfs: remove kmem_zalloc_greedy i2c: meson: fix wrong variable usage in meson_i2c_put_data md/raid10: submit bio directly to replacement disk rds: ib: add error handle iommu/io-pgtable-arm: Check for leaf entry before dereferencing it parisc: perf: Fix potential NULL pointer dereference netfilter: nfnl_cthelper: fix incorrect helper->expect_class_max exynos-gsc: Do not swap cb/cr for semi planar formats MIPS: IRQ Stack: Unwind IRQ stack onto task stack netfilter: invoke synchronize_rcu after set the _hook_ to NULL bridge: netlink: register netdevice before executing changelink mmc: sdio: fix alignment issue in struct sdio_func usb: plusb: Add support for PL-27A1 team: fix memory leaks net/packet: check length in getsockopt() called with PACKET_HDRLEN net: core: Prevent from dereferencing null pointer when releasing SKB MIPS: Lantiq: Fix another request_mem_region() return code check ASoC: dapm: fix some pointer error handling usb: chipidea: vbus event may exist before starting gadget audit: log 32-bit socketcalls ASoC: dapm: handle probe deferrals partitions/efi: Fix integer overflow in GPT size calculation USB: serial: mos7840: fix control-message error handling USB: serial: mos7720: fix control-message error handling drm/amdkfd: fix improper return value on error IB/ipoib: Replace list_del of the neigh->list with list_del_init IB/ipoib: rtnl_unlock can not come after free_netdev IB/ipoib: Fix deadlock over vlan_mutex tty: goldfish: Fix a parameter of a call to free_irq ARM: 8635/1: nommu: allow 
enabling REMAP_VECTORS_TO_RAM iio: adc: hx711: Add DT binding for avia,hx711 iio: adc: axp288: Drop bogus AXP288_ADC_TS_PIN_CTRL register modifications hwmon: (gl520sm) Fix overflows and crash seen when writing into limit attributes sh_eth: use correct name for ECMR_MPDE bit extcon: axp288: Use vbus-valid instead of -present to determine cable presence igb: re-assign hw address pointer on reset after PCI error MIPS: ralink: Fix incorrect assignment on ralink_soc MIPS: Ensure bss section ends on a long-aligned address ARM: dts: r8a7790: Use R-Car Gen 2 fallback binding for msiof nodes RDS: RDMA: Fix the composite message user notification GFS2: Fix reference to ERR_PTR in gfs2_glock_iter_next drm: bridge: add DT bindings for TI ths8135 drm_fourcc: Fix DRM_FORMAT_MOD_LINEAR #define FROMLIST: tracing: Add support for preempt and irq enable/disable events FROMLIST: tracing: Prepare to add preempt and irq trace events ANDROID: binder: fix transaction leak. ANDROID: binder: Add tracing for binder priority inheritance. 
Linux 4.4.90 fix xen_swiotlb_dma_mmap prototype swiotlb-xen: implement xen_swiotlb_dma_mmap callback video: fbdev: aty: do not leak uninitialized padding in clk to userspace KVM: VMX: use cmpxchg64 ARM: pxa: fix the number of DMA requestor lines ARM: pxa: add the number of DMA requestor lines dmaengine: mmp-pdma: add number of requestors cxl: Fix driver use count KVM: VMX: remove WARN_ON_ONCE in kvm_vcpu_trigger_posted_interrupt KVM: VMX: do not change SN bit in vmx_update_pi_irte() timer/sysclt: Restrict timer migration sysctl values to 0 and 1 gfs2: Fix debugfs glocks dump x86/fpu: Don't let userspace set bogus xcomp_bv btrfs: prevent to set invalid default subvolid btrfs: propagate error to btrfs_cmp_data_prepare caller btrfs: fix NULL pointer dereference from free_reloc_roots() PCI: Fix race condition with driver_override kvm: nVMX: Don't allow L2 to access the hardware CR8 KVM: VMX: Do not BUG() on out-of-bounds guest IRQ arm64: fault: Route pte translation faults via do_translation_fault arm64: Make sure SPsel is always set seccomp: fix the usage of get/put_seccomp_filter() in seccomp_get_filter() bsg-lib: don't free job in bsg_prepare_job nl80211: check for the required netlink attributes presence vfs: Return -ENXIO for negative SEEK_HOLE / SEEK_DATA offsets SMB3: Don't ignore O_SYNC/O_DSYNC and O_DIRECT flags SMB: Validate negotiate (to protect against downgrade) even if signing off Fix SMB3.1.1 guest authentication to Samba powerpc/pseries: Fix parent_dn reference leak in add_dt_node() KEYS: prevent KEYCTL_READ on negative key KEYS: prevent creating a different user's keyrings KEYS: fix writing past end of user-supplied buffer in keyring_read() crypto: talitos - fix sha224 crypto: talitos - Don't provide setkey for non hmac hashing algs. 
scsi: scsi_transport_iscsi: fix the issue that iscsi_if_rx doesn't parse nlmsg properly md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list md/raid5: fix a race condition in stripe batch tracing: Erase irqsoff trace with empty write tracing: Fix trace_pipe behavior for instance traces KVM: PPC: Book3S: Fix race and leak in kvm_vm_ioctl_create_spapr_tce() mac80211: flush hw_roc_start work before cancelling the ROC cifs: release auth_key.response for reconnect. f2fs: catch up to v4.14-rc1 UPSTREAM: cpufreq: schedutil: use now as reference when aggregating shared policy requests ANDROID: add script to fetch android kernel config fragments f2fs: reorganize stat information f2fs: clean up flush/discard command namings f2fs: check in-memory sit version bitmap f2fs: check in-memory nat version bitmap f2fs: check in-memory block bitmap f2fs: introduce FI_ATOMIC_COMMIT f2fs: clean up with list_{first, last}_entry f2fs: return fs_trim if there is no candidate f2fs: avoid needless checkpoint in f2fs_trim_fs f2fs: relax async discard commands more f2fs: drop exist_data for inline_data when truncated to 0 f2fs: don't allow encrypted operations without keys f2fs: show the max number of atomic operations f2fs: get io size bit from mount option f2fs: support IO alignment for DATA and NODE writes f2fs: add submit_bio tracepoint f2fs: reassign new segment for mode=lfs f2fs: fix a missing discard prefree segments f2fs: use rb_entry_safe f2fs: add a case of no need to read a page in write begin f2fs: fix a problem of using memory after free f2fs: remove unneeded condition f2fs: don't cache nat entry if out of memory f2fs: remove unused values in recover_fsync_data f2fs: support async discard based on v4.9 f2fs: resolve op and op_flags confilcts f2fs: remove wrong backported codes FROMLIST: binder: fix use-after-free in binder_transaction() UPSTREAM: ipv6: fib: Unlink replaced routes from their nodes Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org> 
Conflicts: fs/f2fs/crypto_key.c fs/f2fs/f2fs_crypto.h net/wireless/nl80211.c sound/usb/card.c Change-Id: I742aeaec84c7892165976b7bea3e07bdd6881d93 Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
| * | Merge 4.4.92 into android-4.4 (Greg Kroah-Hartman, 2017-10-12)
| |\|
      Changes in 4.4.92
        usb: gadget: inode.c: fix unbalanced spin_lock in ep0_write
        USB: gadgetfs: Fix crash caused by inadequate synchronization
        USB: gadgetfs: fix copy_to_user while holding spinlock
        usb: gadget: udc: atmel: set vbus irqflags explicitly
        usb-storage: unusual_devs entry to fix write-access regression for Seagate external drives
        usb: renesas_usbhs: fix the BCLR setting condition for non-DCP pipe
        usb: renesas_usbhs: fix usbhsf_fifo_clear() for RX direction
        ALSA: usb-audio: Check out-of-bounds access by corrupted buffer descriptor
        usb: pci-quirks.c: Corrected timeout values used in handshake
        USB: dummy-hcd: fix connection failures (wrong speed)
        USB: dummy-hcd: fix infinite-loop resubmission bug
        USB: dummy-hcd: Fix erroneous synchronization change
        USB: devio: Don't corrupt user memory
        usb: gadget: mass_storage: set msg_registered after msg registered
        USB: g_mass_storage: Fix deadlock when driver is unbound
        lsm: fix smack_inode_removexattr and xattr_getsecurity memleak
        ALSA: compress: Remove unused variable
        ALSA: usx2y: Suppress kernel warning at page allocation failures
        driver core: platform: Don't read past the end of "driver_override" buffer
        Drivers: hv: fcopy: restore correct transfer length
        stm class: Fix a use-after-free
        ftrace: Fix kmemleak in unregister_ftrace_graph
        HID: i2c-hid: allocate hid buffers for real worst case
        iwlwifi: add workaround to disable wide channels in 5GHz
        scsi: sd: Do not override max_sectors_kb sysfs setting
        USB: uas: fix bug in handling of alternate settings
        USB: core: harden cdc_parse_cdc_header
        usb: Increase quirk delay for USB devices
        USB: fix out-of-bounds in usb_set_configuration
        xhci: fix finding correct bus_state structure for USB 3.1 hosts
        iio: adc: twl4030: Fix an error handling path in 'twl4030_madc_probe()'
        iio: adc: twl4030: Disable the vusb3v1 rugulator in the error handling path of 'twl4030_madc_probe()'
        iio: ad_sigma_delta: Implement a dedicated reset function
        staging: iio: ad7192: Fix - use the dedicated reset function avoiding dma from stack.
        iio: core: Return error for failed read_reg
        iio: ad7793: Fix the serial interface reset
        iio: adc: mcp320x: Fix readout of negative voltages
        iio: adc: mcp320x: Fix oops on module unload
        uwb: properly check kthread_run return value
        uwb: ensure that endpoint is interrupt
        brcmfmac: setup passive scan if requested by user-space
        drm/i915/bios: ignore HDMI on port A
        nvme: protect against simultaneous shutdown invocations
        sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs
        ext4: fix data corruption for mmap writes
        ext4: Don't clear SGID when inheriting ACLs
        ext4: don't allow encrypted operations without keys
        Linux 4.4.92

      Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| | * sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs (Peter Zijlstra, 2017-10-12)
      commit 50e76632339d4655859523a39249dd95ee5e93e7 upstream.

      Cpusets vs. suspend-resume is _completely_ broken. And it got noticed because it now resulted in non-cpuset usage breaking too.

      On suspend, cpuset_cpu_inactive() doesn't call into cpuset_update_active_cpus() because it doesn't want to move tasks about; there is no need, since all tasks are frozen and won't run again until after we've resumed everything.

      But this means that when we finally do call into cpuset_update_active_cpus() after resuming the last frozen CPU in cpuset_cpu_active(), the top_cpuset will not differ from the cpu_active_mask, and thus it will not in fact do _anything_.

      So the cpuset configuration will not be restored. This was largely hidden because we would unconditionally create identity domains and mobile users would not in fact use cpusets much. And servers that do use cpusets tend to not suspend-resume much.

      An additional problem is that we'd not in fact wait for the cpuset work to finish before resuming the tasks, allowing spurious migrations outside of the specified domains.

      Fix the rebuild by introducing cpuset_force_rebuild() and fix the ordering with cpuset_wait_for_hotplug().

      Reported-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: deb7aa308ea2 ("cpuset: reorganize CPU / memory hotplug handling")
      Link: http://lkml.kernel.org/r/20170907091338.orwxrqkbfkki3c24@hirez.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
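The cpuset commit message above explains why no rebuild happens after resume: the hotplug update only acts when the CPU mask has changed. A minimal userspace sketch of that decision follows. It is not the kernel code: `struct cpuset_model`, `hotplug_update()` and `rebuilds_after_resume()` are hypothetical names for illustration, and the `force_rebuild` field only models the role the message gives to cpuset_force_rebuild().

```c
#include <stdbool.h>

/* Hypothetical model of the state the hotplug update looks at. */
struct cpuset_model {
    unsigned long top_mask;  /* CPUs recorded in the top cpuset */
    bool force_rebuild;      /* set by the modeled "force rebuild" request */
    int rebuilds;            /* number of domain rebuilds performed */
};

/* Rebuild only when the mask changed or a rebuild was explicitly forced. */
void hotplug_update(struct cpuset_model *cs, unsigned long active_mask)
{
    bool cpus_updated = (cs->top_mask != active_mask);

    cs->top_mask = active_mask;
    if (cpus_updated || cs->force_rebuild) {
        cs->force_rebuild = false;
        cs->rebuilds++;
    }
}

/* After resume the masks already match, so only the forced path rebuilds. */
int rebuilds_after_resume(bool force)
{
    struct cpuset_model cs = { 0xfUL, force, 0 };

    hotplug_update(&cs, 0xfUL);
    return cs.rebuilds;
}
```

Calling `rebuilds_after_resume(false)` models the broken behaviour (no rebuild because nothing appears to have changed), while `rebuilds_after_resume(true)` models the forced rebuild.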
| | * ftrace: Fix kmemleak in unregister_ftrace_graph (Shu Wang, 2017-10-12)
      commit 2b0b8499ae75df91455bbeb7491d45affc384fb0 upstream.

      The trampoline allocated by the function tracer was overwritten by the function_graph tracer, which caused a memory leak. The save_global_trampoline should have saved the previous trampoline in register_ftrace_graph() and restored it in unregister_ftrace_graph(). But as it is implemented, save_global_trampoline was only used in unregister_ftrace_graph() with its default value 0, and it overwrote the previous trampoline's value, causing the previously allocated trampoline to be lost.

      kmemleak backtrace:
        kmemleak_vmalloc+0x77/0xc0
        __vmalloc_node_range+0x1b5/0x2c0
        module_alloc+0x7c/0xd0
        arch_ftrace_update_trampoline+0xb5/0x290
        ftrace_startup+0x78/0x210
        register_ftrace_function+0x8b/0xd0
        function_trace_init+0x4f/0x80
        tracing_set_tracer+0xe6/0x170
        tracing_set_trace_write+0x90/0xd0
        __vfs_write+0x37/0x170
        vfs_write+0xb2/0x1b0
        SyS_write+0x55/0xc0
        do_syscall_64+0x67/0x180
        return_from_SYSCALL_64+0x0/0x6a

      [ Looking further into this, I found that this was left over from when the function and function graph tracers shared the same ftrace_ops. But in commit 5f151b2401 ("ftrace: Fix function_profiler and function tracer together"), the two were separated, and the save_global_trampoline no longer was necessary (and it may have been broken back then too). -- Steven Rostedt ]

      Link: http://lkml.kernel.org/r/20170912021454.5976-1-shuwang@redhat.com
      Fixes: 5f151b2401 ("ftrace: Fix function_profiler and function tracer together")
      Signed-off-by: Shu Wang <shuwang@redhat.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
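The leak pattern in the ftrace message above (a "restore" from a variable that was never saved into) can be illustrated in plain userspace C. This is a sketch under stated assumptions, not the ftrace implementation: `struct ops_model`, `saved_trampoline`, `unregister_buggy()` and `unregister_fixed()` are hypothetical names, and `malloc()` stands in for the trampoline allocation.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an ops structure that owns a trampoline. */
struct ops_model {
    void *trampoline;  /* allocation owned by this ops */
};

/* Never assigned anywhere, so it keeps its default value of NULL (0). */
static void *saved_trampoline;

/* Buggy pattern: restore from a variable that never held a saved value. */
void unregister_buggy(struct ops_model *ops)
{
    ops->trampoline = saved_trampoline;  /* previous pointer is lost */
}

/* Fixed pattern in this model: leave the owner's pointer untouched. */
void unregister_fixed(struct ops_model *ops)
{
    (void)ops;
}

/* Returns 1 if the pointer survives unregister, 0 if lost, -1 on OOM. */
int trampoline_survives(void (*unregister)(struct ops_model *))
{
    struct ops_model ops;
    void *tramp = malloc(64);
    int survived;

    if (!tramp)
        return -1;
    ops.trampoline = tramp;
    unregister(&ops);
    survived = (ops.trampoline == tramp);
    free(tramp);  /* the model keeps its own copy, so nothing leaks here */
    return survived;
}
```

In the buggy variant the only reference to the allocation held by the ops is replaced with NULL, which is the situation a leak detector would report.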
| * | FROMLIST: tracing: Add support for preempt and irq enable/disable events (Joel Fernandes, 2017-10-06)
      Preempt and irq trace events can be used for tracing the start and end of an atomic section. A trace viewer like systrace can use them to display the start and end of an atomic section graphically and correlate them with latencies and scheduling issues.

      This also serves as a prelude to using synthetic events or probes to rewrite the preempt and irqsoff tracers, along with numerous benefits of using trace events features for these events.

      Change-Id: I718d40f7c3c48579adf9d7121b21495a669c89bd
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kernel-team@android.com
      Link: https://patchwork.kernel.org/patch/9988157/
      Signed-off-by: Joel Fernandes <joelaf@google.com>
| * | FROMLIST: tracing: Prepare to add preempt and irq trace events (Joel Fernandes, 2017-10-06)
      In preparation for adding irqsoff and preemptsoff enable and disable trace events, move the required functions and code to make it easier to add these events in a later patch. This patch is just code movement, with no functional change.

      Change-Id: I587d411da5efbc4959bcccd7a05c7a66c231e1e0
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kernel-team@android.com
      Link: https://patchwork.kernel.org/patch/9988159/
      Signed-off-by: Joel Fernandes <joelaf@google.com>
| * | Merge 4.4.90 into android-4.4 (Greg Kroah-Hartman, 2017-10-05)
| |\|
      Changes in 4.4.90
        cifs: release auth_key.response for reconnect.
        mac80211: flush hw_roc_start work before cancelling the ROC
        KVM: PPC: Book3S: Fix race and leak in kvm_vm_ioctl_create_spapr_tce()
        tracing: Fix trace_pipe behavior for instance traces
        tracing: Erase irqsoff trace with empty write
        md/raid5: fix a race condition in stripe batch
        md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list
        scsi: scsi_transport_iscsi: fix the issue that iscsi_if_rx doesn't parse nlmsg properly
        crypto: talitos - Don't provide setkey for non hmac hashing algs.
        crypto: talitos - fix sha224
        KEYS: fix writing past end of user-supplied buffer in keyring_read()
        KEYS: prevent creating a different user's keyrings
        KEYS: prevent KEYCTL_READ on negative key
        powerpc/pseries: Fix parent_dn reference leak in add_dt_node()
        Fix SMB3.1.1 guest authentication to Samba
        SMB: Validate negotiate (to protect against downgrade) even if signing off
        SMB3: Don't ignore O_SYNC/O_DSYNC and O_DIRECT flags
        vfs: Return -ENXIO for negative SEEK_HOLE / SEEK_DATA offsets
        nl80211: check for the required netlink attributes presence
        bsg-lib: don't free job in bsg_prepare_job
        seccomp: fix the usage of get/put_seccomp_filter() in seccomp_get_filter()
        arm64: Make sure SPsel is always set
        arm64: fault: Route pte translation faults via do_translation_fault
        KVM: VMX: Do not BUG() on out-of-bounds guest IRQ
        kvm: nVMX: Don't allow L2 to access the hardware CR8
        PCI: Fix race condition with driver_override
        btrfs: fix NULL pointer dereference from free_reloc_roots()
        btrfs: propagate error to btrfs_cmp_data_prepare caller
        btrfs: prevent to set invalid default subvolid
        x86/fpu: Don't let userspace set bogus xcomp_bv
        gfs2: Fix debugfs glocks dump
        timer/sysclt: Restrict timer migration sysctl values to 0 and 1
        KVM: VMX: do not change SN bit in vmx_update_pi_irte()
        KVM: VMX: remove WARN_ON_ONCE in kvm_vcpu_trigger_posted_interrupt
        cxl: Fix driver use count
        dmaengine: mmp-pdma: add number of requestors
        ARM: pxa: add the number of DMA requestor lines
        ARM: pxa: fix the number of DMA requestor lines
        KVM: VMX: use cmpxchg64
        video: fbdev: aty: do not leak uninitialized padding in clk to userspace
        swiotlb-xen: implement xen_swiotlb_dma_mmap callback
        fix xen_swiotlb_dma_mmap prototype
        Linux 4.4.90

      Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| | * timer/sysclt: Restrict timer migration sysctl values to 0 and 1 (Myungho Jung, 2017-10-05)
      commit b94bf594cf8ed67cdd0439e70fa939783471597a upstream.

      timer_migration sysctl acts as a boolean switch, so the allowed values should be restricted to 0 and 1.

      Add the necessary extra fields to the sysctl table entry to enforce that.

      [ tglx: Rewrote changelog ]

      Signed-off-by: Myungho Jung <mhjungk@gmail.com>
      Link: http://lkml.kernel.org/r/1492640690-3550-1-git-send-email-mhjungk@gmail.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Kazuhiro Hayashi <kazuhiro3.hayashi@toshiba.co.jp>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
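The idea of attaching bounds to a table entry, as the timer_migration message above describes, can be sketched in userspace C. This is an illustration only: `struct int_knob`, `knob_write()` and `timer_migration_model` are hypothetical names, and the `min`/`max` pointers merely play the role that the message assigns to the "extra fields" of the sysctl table entry.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical table entry: a value plus optional lower and upper bounds. */
struct int_knob {
    int *data;       /* storage for the tunable */
    const int *min;  /* lowest accepted value, or NULL for no limit */
    const int *max;  /* highest accepted value, or NULL for no limit */
};

/* Store val only if it lies within the entry's bounds. */
int knob_write(const struct int_knob *k, int val)
{
    if (k->min && val < *k->min)
        return -EINVAL;
    if (k->max && val > *k->max)
        return -EINVAL;
    *k->data = val;
    return 0;
}

/* A boolean-like knob: only 0 and 1 are accepted. */
static int timer_migration_model = 1;
static const int zero = 0;
static const int one = 1;
static const struct int_knob timer_migration_knob = {
    &timer_migration_model, &zero, &one
};
```

With this entry, `knob_write(&timer_migration_knob, 2)` is rejected with `-EINVAL` and the stored value is left unchanged, which is the behaviour the commit asks for.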
| | * seccomp: fix the usage of get/put_seccomp_filter() in seccomp_get_filter() (Oleg Nesterov, 2017-10-05)
      commit 66a733ea6b611aecf0119514d2dddab5f9d6c01e upstream.

      As Chris explains, get_seccomp_filter() and put_seccomp_filter() can end up using different filters. Once we drop ->siglock it is possible for task->seccomp.filter to have been replaced by SECCOMP_FILTER_FLAG_TSYNC.

      Fixes: f8e529ed941b ("seccomp, ptrace: add support for dumping seccomp filters")
      Reported-by: Chris Salls <chrissalls5@gmail.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      [tycho: add __get_seccomp_filter vs. open coding refcount_inc()]
      Signed-off-by: Tycho Andersen <tycho@docker.com>
      [kees: tweak commit log]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
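The get/put mismatch described in the seccomp message above can be shown with a small reference-counting model. This is a sketch, not the seccomp code: `struct filter_model`, `struct task_model`, `dump_buggy()` and `dump_fixed()` are hypothetical names, and the concurrent filter replacement is simulated by a plain assignment between the get and the put.

```c
/* Hypothetical refcounted filter and the task that points at it. */
struct filter_model { int refs; };
struct task_model { struct filter_model *filter; };

void filter_get(struct filter_model *f) { if (f) f->refs++; }
void filter_put(struct filter_model *f) { if (f) f->refs--; }

/* Buggy pattern: get and put both re-read the task's filter pointer. */
void dump_buggy(struct task_model *t, struct filter_model *replacement)
{
    filter_get(t->filter);
    t->filter = replacement;  /* simulated replacement after the lock is dropped */
    filter_put(t->filter);    /* drops a reference on a different filter */
}

/* Fixed pattern: snapshot the pointer once and get/put that same object. */
void dump_fixed(struct task_model *t, struct filter_model *replacement)
{
    struct filter_model *f = t->filter;

    filter_get(f);
    t->filter = replacement;  /* the same simulated replacement */
    filter_put(f);
}

/* Returns 1 if both filters end with their initial reference count of 1. */
int refs_balanced(void (*dump)(struct task_model *, struct filter_model *))
{
    struct filter_model old_filter = { 1 };
    struct filter_model new_filter = { 1 };
    struct task_model task = { &old_filter };

    dump(&task, &new_filter);
    return old_filter.refs == 1 && new_filter.refs == 1;
}
```

In the buggy variant the old filter keeps an extra reference and the new one loses a reference it never gave out, so the counts are unbalanced on both objects.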
| | * tracing: Erase irqsoff trace with empty write (Bo Yan, 2017-10-05)
      commit 8dd33bcb7050dd6f8c1432732f930932c9d3a33e upstream.

      One convenient way to erase the trace is "echo > trace". However, this is currently broken if the current tracer is the irqsoff tracer. This is because the irqsoff tracer uses max_buffer as the default trace buffer.

      Set the max_buffer as the one to be cleared when it's the trace buffer currently in use.

      Link: http://lkml.kernel.org/r/1505754215-29411-1-git-send-email-byan@nvidia.com
      Cc: <mingo@redhat.com>
      Fixes: 4acd4d00f ("tracing: give easy way to clear trace buffer")
      Signed-off-by: Bo Yan <byan@nvidia.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
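The buffer selection described in the irqsoff message above can be modeled with two buffers and a flag that says which one the current tracer displays. This is only a sketch: `struct trace_array_model`, `tracer_uses_max`, `erase_buggy()` and `erase_fixed()` are hypothetical names chosen for illustration, not the tracing core's data structures.

```c
/* Hypothetical model: two buffers, one of which the current tracer shows. */
struct buf_model { int entries; };

struct trace_array_model {
    struct buf_model trace_buffer;  /* regular buffer */
    struct buf_model max_buffer;    /* buffer shown by a latency-style tracer */
    int tracer_uses_max;            /* nonzero if the tracer shows max_buffer */
};

/* The buffer a reader of the trace would actually see. */
struct buf_model *buffer_in_use(struct trace_array_model *tr)
{
    return tr->tracer_uses_max ? &tr->max_buffer : &tr->trace_buffer;
}

/* Buggy pattern: always clears the regular buffer. */
void erase_buggy(struct trace_array_model *tr)
{
    tr->trace_buffer.entries = 0;
}

/* Fixed pattern: clears whichever buffer is currently in use. */
void erase_fixed(struct trace_array_model *tr)
{
    buffer_in_use(tr)->entries = 0;
}

/* Number of entries still visible to a reader after an erase. */
int visible_entries_after_erase(void (*erase)(struct trace_array_model *),
                                int tracer_uses_max)
{
    struct trace_array_model tr = { { 3 }, { 5 }, tracer_uses_max };

    erase(&tr);
    return buffer_in_use(&tr)->entries;
}
```

With `tracer_uses_max` set, the buggy erase leaves the visible buffer untouched, which matches the symptom that an empty write did not clear the trace.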
| | * tracing: Fix trace_pipe behavior for instance traces (Tahsin Erdogan, 2017-10-05)
      commit 75df6e688ccd517e339a7c422ef7ad73045b18a2 upstream.

      When reading data from trace_pipe, tracing_wait_pipe() performs a check to see if tracing has been turned off after some data was read. Currently, this check always looks at the global trace state, but it should be checking the trace instance where trace_pipe is located.

      Because of this bug, cat instances/i1/trace_pipe in the following script will immediately exit instead of waiting for data:

        cd /sys/kernel/debug/tracing
        echo 0 > tracing_on
        mkdir -p instances/i1
        echo 1 > instances/i1/tracing_on
        echo 1 > instances/i1/events/sched/sched_process_exec/enable
        cat instances/i1/trace_pipe

      Link: http://lkml.kernel.org/r/20170917102348.1615-1-tahsin@google.com
      Fixes: 10246fa35d4f ("tracing: give easy way to clear trace buffer")
      Signed-off-by: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | UPSTREAM: cpufreq: schedutil: use now as reference when aggregating shared policy requests (Juri Lelli, 2017-10-03)
      Currently, sugov_next_freq_shared() uses last_freq_update_time as a reference to decide when to start considering CPU contributions as stale.

      However, since last_freq_update_time is set by the last CPU that issued a frequency transition, this might cause problems in certain cases. In practice, the detection of stale utilization values fails whenever the CPU with such values was the last to update the policy. For example (and please note again that the SCHED_CPUFREQ_RT flag is not the problem here, but only the detection of after how much time that flag has to be considered stale), suppose a policy with 2 CPUs:

        CPU0                          | CPU1
                                      |
                                      | RT task scheduled
                                      | SCHED_CPUFREQ_RT is set
                                      | CPU1->last_update = now
                                      | freq transition to max
                                      | last_freq_update_time = now
                                      |
                    more than TICK_NSEC nsecs
                                      |
        a small CFS wakes up          |
        CPU0->last_update = now1      |
        delta_ns(CPU0) < TICK_NSEC*   |
        CPU0's util is considered     |
        delta_ns(CPU1) =              |
         last_freq_update_time -      |
         CPU1->last_update = 0        |
         < TICK_NSEC                  |
        CPU1 is still considered      |
        CPU1->SCHED_CPUFREQ_RT is set |
        we stay at max (until CPU1    |
        exits from idle)              |

      * delta_ns is actually negative as now1 > last_freq_update_time

      While last_freq_update_time is a sensible reference for rate limiting, it doesn't seem to be useful for working around stale CPU states.

      Fix the problem by always considering now (time) as the reference for deciding when CPUs have stale contributions.

      Signed-off-by: Juri Lelli <juri.lelli@arm.com>
      Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      (cherry picked from commit d86ab9cff8b936aadde444d0e263a8db5ff0349b)
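The two-CPU example in the schedutil message above comes down to one subtraction with two different reference times. The sketch below works through it in userspace C. It is an illustration under assumptions: `contribution_is_stale()` and the helper names are hypothetical, `TICK_NSEC_MODEL` assumes a 4 ms tick, and the timestamps are made-up values chosen so that CPU0 wakes up two ticks after CPU1's last update.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed tick length for this model: 4 ms in nanoseconds. */
#define TICK_NSEC_MODEL 4000000LL

/* A CPU's contribution is stale if it is older than one tick. */
bool contribution_is_stale(int64_t reference_ns, int64_t cpu_last_update_ns)
{
    int64_t delta_ns = reference_ns - cpu_last_update_ns;

    return delta_ns > TICK_NSEC_MODEL;
}

/*
 * Old reference: last_freq_update_time, which CPU1 itself set when it
 * requested the transition to max. delta_ns(CPU1) is then 0.
 */
bool cpu1_stale_with_old_reference(void)
{
    int64_t cpu1_last_update = 1000000000LL;
    int64_t last_freq_update_time = cpu1_last_update;

    return contribution_is_stale(last_freq_update_time, cpu1_last_update);
}

/* New reference: the current time when CPU0's small CFS task wakes up. */
bool cpu1_stale_with_now_reference(void)
{
    int64_t cpu1_last_update = 1000000000LL;
    int64_t now1 = cpu1_last_update + 2 * TICK_NSEC_MODEL;

    return contribution_is_stale(now1, cpu1_last_update);
}
```

With the old reference CPU1 is never seen as stale, so its RT request keeps the policy at max. With `now1` as the reference, the delta is two ticks and CPU1's contribution is dropped.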
* | | Merge "Merge android-4.4@d68ba9f (v4.4.89) into msm-4.4" (Linux Build Service Account, 2017-10-17)
|\ \ \