summaryrefslogtreecommitdiff
path: root/kernel/sched/sched.h (follow)
Commit message (Collapse)AuthorAge
...
* | | sched/hmp: Use GFP_KERNEL for top task memory allocationsSyed Rameez Mustafa2016-11-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Task load structure allocations can consume a lot of memory as the number of tasks begin to increase. Also they might exhaust the atomic memory pool pretty quickly if a workload starts spawning lots of threads in a short amount of time thus increasing the possibility of failed allocations. Move the call to init_new_task_load() outside atomic context and start using GFP_KERNEL for allocations. There is no need for this allocation to be in atomic context. Change-Id: I357772e10bf8958804d9cd0c78eda27139054b21 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: Fix compilation issue with reset_hmp_statsOlav Haugan2016-10-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | reset_hmp_stats was moved to another file and when CONFIG_CFS_BANDWIDTH is enabled there is code still referencing this in the original file causing compilation error. Change-Id: Iab7fc8551b628c443ce751026b06c5ff4ebba39a Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
* | | sched: Add multiple load reporting policies for cpu frequencySyed Rameez Mustafa2016-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous patches in this series introduce the mechanics of CPU load tracking without fixups for intra cluster migration and top task load tracking. Add a tunable that dictates what of the above needs to be considered when reporting load to the governor. The default policy is to take the maximum of the CPU load and top task load. Change-Id: Ie585a11ed774b929910d04c41471db3a2a102ec5 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: Optimize the next top task search logic upon task migrationSyed Rameez Mustafa2016-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | find_next_top_index() is responsible for finding the second top task on a CPU when the top task migrates away from that CPU. This operation is expensive as we need to iterate the entire array of top tasks to find the second top task. Optimize this by introducing bitmaps for tracking top task indices. There are two bitmaps; one for the previous window and one for the current window. Each bit in a bitmap tracks whether the corresponding bucket in the top task hashmap has a non zero refcount. The bit is set when the refcount becomes non zero and is cleared when it becomes zero. Finding the second top task upon migration is then simply a matter of finding the highest set bit in the bitmap. Change-Id: Ibafaf66eed756b0328704dfaa89c17ab0d84e359 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: Add the mechanics of top task tracking for frequency guidanceSyed Rameez Mustafa2016-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous patches in this rewrite of scheduler guided frequency selection reintroduces the part-picture problem that we addressed in our initial implementation. In that, when tasks migrate across CPUs within a cluster, we end up losing the complete picture of the sequential nature of the workload. This patch aims to solve that problem slightly differently. We track the top task on every CPU within a window. Top task is defined as the task that runs the most in a given window. This enhances our ability to detect the sequential nature of workloads. A single migrating task executing for an entire window will cause 100% load to be reported for frequency guidance instead of the maximum footprint left on any individual CPU in the task's trail. There are cases, that this new approach does not address. Namely, cases where the sum of two or more tasks accurately reflects the true sequential nature of the workload. Future optimizations might aim to tackle that problem. To track top tasks, we first realize that there is no strict need to maintain the task struct itself as long as we know the load exerted by the top task. We also realize that to maintain top tasks on every CPU we have to track the execution of every single task that runs during the window. The load associated with a task needs to be migrated when the task migrates from one CPU to another. When the top task migrates away, we need to locate the second top task and so on. Given the above realizations, we use hashmaps to track top task load both for the current and the previous window. This hashmap is implemented as an array of fixed size. The key of the hashmap is given by task_execution_time_in_a_window / array_size. The size of the array (number of buckets in the hashmap) dictate the load granularity of each bucket. The value stored in each bucket is a refcount of all the tasks that executed long enough to be in that bucket. This approach has a few benefits. Firstly, any top task stats update now take O(1) time. While task migration is also O(1), it does still involve going through up to the size of the array to find the second top task. Further patches will aim to optimize this behavior. Secondly, and more importantly, not having to store the task struct itself saves a lot of memory usage in that 1) there is no need to retrieve task structs later causing cache misses and 2) we don't have to unnecessarily hold up task memory for up to 2 full windows by calling get_task_struct() after a task exits. Change-Id: I004dba474f41590db7d3f40d9deafe86e71359ac Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: Enhance the scheduler migration load fixup featureSyed Rameez Mustafa2016-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the current frequency guidance implementation the scheduler migrates task load from the source CPU to the destination CPU when a task migrates. The underlying assumption is that a task will stay on the destination CPU following the migration. Hence a CPU's load should reflect the sum of all tasks that last ran on that CPU prior to window expiration even if these tasks executed on some other CPU in that window prior to being migrated. However, given the ubiquitous nature of migrations the above assumption is flawed causing the scheduler to often add up load on a single CPU that in reality ran concurrently on multiple CPUs and will continue to run concurrently in subsequent windows. This leads to load over reporting on a single CPU which in turn causes CPU frequency to be higher than necessary. This is the first patch in a series of patches that attempts to change how load fixups are done upon migration to prevent load over reporting. In this patch, we stop doing migration fixups for intra-cluster migrations. Inter-cluster migration fixups are still retained. In order to achieve the above, we make use the per CPU footprint of each task introduced in the previous patch. Upon inter cluster migration, we go through every CPU in the source cluster to subtract the migrating task's contribution to the busy time on each one of those CPUs. The sum of the contributions is then added to the destination CPU allowing it to ramp up to the appropriate frequency for that task. Subtracting load from each of the source CPUs is not trivial, however, as it would require all runqueue locks to held. To get around this we introduce a deferred load subtraction mechanism whereby subtracting load from each of the source CPUs in deferred until an opportune moment. This opportune moment is when the governor comes asking the scheduler for load. At that time, all necessary runqueue locks are already held. There are a few cases to consider when doing deferred subtraction. Since we are not holding all runqueue locks other CPUs in the source cluster can be in a different window than the source CPU where the task is migrating from. Case 1: Other CPU in the source cluster is in the same window No special consideration Case 2: Other CPU in the source cluster is ahead by 1 window In this case, we will be doing redundant updates to subtraction load for the prev window. There is no way to avoid this redundant update though, without holding the rq lock. Case 3: Other CPU in the source cluster is trailing by 1 window In this case, we might end up overwriting old data for that CPU. But this is not a problem as when the other CPU calls update_task_ravg() it will move to the same window. This relies on maintaining synchronized windows between CPUs, which is true today. Finally, we must deal with frequency aggregation. When frequency aggregation is in effect, there is little point in dealing with per CPU footprint since the load of all related tasks have to be reported on a single CPU. Therefore when a task enters a related group we clear out all per CPU contributions and add it to the task CPU's cpu_time struct. From that point onwards we stop managing per CPU contributions upon inter cluster migrations since that work is redundant. Finally when a task exits a related group we must walk every CPU in reset all CPU contributions. We then set the task CPU contribution to the respective curr/prev sum values and add that sum to the task CPU rq runnable sum. Change-Id: I1f8d596e6c930f3f6f00e24109ddbe8b121f8d6b Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: Add per CPU load tracking for each taskSyed Rameez Mustafa2016-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Keeping a track of the load footprint of each task on every CPU that it executed on gives the scheduler much more flexibility in terms of the number of frequency guidance policies. These new fields will be used in subsequent patches as we alter the load fixup mechanism upon task migration. We still need to maintain the curr/prev_window sums as they will also be required in subsequent patches as we start to track top tasks based on cumulative load. Also, we need to call init_new_task_load() for the idle task. This is an existing harmless bug as load tracking for the idle task is irrelevant. However, in this patch we are adding pointers to the ravg structure. These pointers have to be initialized even for the idle task. Finally move init_new_task_load() to sched_fork(). This was always the more appropriate place, however, following the introduction of new pointers in the ravg struct, this is necessary to avoid races with functions such as reset_all_task_stats(). Change-Id: Ib584372eb539706da4319973314e54dae04e5934 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | Merge "sched/cgroup: Fix/cleanup cgroup teardown/init"Linux Build Service Account2016-10-13
|\ \ \
| * | | sched/cgroup: Fix cgroup entity load tracking tear-downPeter Zijlstra2016-10-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a cgroup's CPU runqueue is destroyed, it should remove its remaining load accounting from its parent cgroup. The current site for doing so it unsuited because its far too late and unordered against other cgroup removal (->css_free() will be, but we're also in an RCU callback). Put it in the ->css_offline() callback, which is the start of cgroup destruction, right after the group has been made unavailable to userspace. The ->css_offline() callbacks are called in hierarchical order after the following v4.4 commit: aa226ff4a1ce ("cgroup: make sure a parent css isn't offlined before its children") Change-Id: Ice7cbd71d9e545da84d61686aa46c7213607bb9d Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Li Zefan <lizefan@huawei.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160121212416.GL6357@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Git-commit: 6fe1f348b3dd1f700f9630562b7d38afd6949568 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
* | | | Merge "sched: Add a device tree property to specify the sched boost type"Linux Build Service Account2016-10-05
|\ \ \ \ | |/ / / |/| | |
| * | | sched: Add a device tree property to specify the sched boost typePavankumar Kondeti2016-10-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The HMP scheduler has two types of task placement boost policies. (1) boost-on-big policy make use of all big CPUs up to their full capacity before using the little CPUs. This improves performance on true b.L systems where the big CPUs have higher efficiency compared to the little CPUs. (2) boost-on-all policy place the tasks on the CPU having the highest spare capacity. This policy is optimal for SMP like systems. The scheduler sets the boost policy to boost-on-big on systems which has CPUs of different efficiencies. However it is possible that CPUs of the same micro architecture to have slight difference in efficiency due to other factors like cache size. Selecting the boost-on-big policy based on relative difference in efficiency is not optimal on such systems. The boost-policy device tree property is introduced to specify the required boost type and it overrides the default selection of boost type in the scheduler. The possible values for this property are "boost-on-big" and "boost-on-all". Change-Id: Iac19183fa7d4bfd9e5746b02a02b2b19cf64b78d Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* | | | Merge "sched: Add a stub function for init_clusters()"Linux Build Service Account2016-10-03
|\| | |
| * | | sched: Add a stub function for init_clusters()Pavankumar Kondeti2016-10-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a stub function for init_cluster() and remove a ifdefry for SCHED_HMP in sched_init() Change-Id: I6745485152d735436d8398818f7fb5e70ce5ee65 Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* | | | Merge "sched: don't assume higher capacity means higher power in lb"Linux Build Service Account2016-09-30
|\ \ \ \ | |/ / / |/| | |
| * | | sched: don't assume higher capacity means higher power in lbPavankumar Kondeti2016-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The load balancer restrictions are in place to control the tasks migration from the lower capacity cluster to higher capacity cluster to save power. The assumption here is that higher capacity cluster will have higher power cost which may not be necessarily true for all platforms. Use power cost based checks instead of capacity based checks while applying the inter cluster migration restrictions. Change-Id: Id9519eb8f7b183a2e9fca87a23cf95e951aa4005 Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* | | | sched: add cpu isolation supportOlav Haugan2016-09-24
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds cpu isolation APIs to the scheduler to isolate and unisolate CPUs. Isolating and unisolating a CPU can be used in place of hotplug. Isolating and unisolating a CPU is faster than hotplug and can thus be used to optimize the performance and power of multi-core CPUs. Isolating works by migrating non-pinned IRQs and tasks to other CPUS and marking the CPU as not available to the scheduler and load balancer. Pinned tasks and IRQs are still allowed to run but it is expected that this would be minimal. Unisolation works by just marking the CPU available for scheduler and load balancer. Change-Id: I0bbddb56238c2958c5987877c5bfc3e79afa67cc Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
* | | sched: Move data structures under CONFIG_SCHED_HMPSyed Rameez Mustafa2016-09-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | Frequency-demand conversion data structures are only used under CONFIG_SCHED_HMP. Move them out of sched.h into hmp.c to where they actually belong after the recent refactor. Change-Id: I3c3eebca86062f11b80af93ba3716695eb787376 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: handle frequency alert notifications betterPavankumar Kondeti2016-08-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The load reporting during frequency alert notifications is broken under load aggregation. When aggregation is enabled, the total group busy time is accounted towards the maximum busy CPU of a frequency domain. If this CPU has a notification pending, it's group busy time alone is accounted and other CPU's group busy time is completely ignored. Similarly if any CPU other than maximum busy CPU has a pending notification, its group busy time is accounted twice. Maintain the frequency alert notification flag per frequency domain. When the notification is pending, don't clip the load to 100% @ fur for any of the CPUs in the frequency domain. Change-Id: Iebc7d74d6fafa20430fa1c7d80f34a6ab198832d Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* | | sched: inherit the group id from the group leaderPavankumar Kondeti2016-08-22
| | | | | | | | | | | | | | | | | | | | | | | | When sysctl_sched_enable_thread_grouping is set to 1, any new tasks created are put in the same group as their group leader. Change-Id: If1837dd7c8120c8b097cfffa1dc52eb4781f1641 Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* | | sched: Move notify_migration() under CONFIG_SCHED_HMPSyed Rameez Mustafa2016-08-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | notify_migration() is a HMP specific function that relies on all of its contents to be stubbed out for !CONFIG_SCHED_HMP. However, it still maintains calls to rcu_read_lock/unlock(). In the !HMP case these calls are simply redundant. Move the function under CONFIG_SCHED_HMP and add a stub when the config is not defined so that there is no overhead. Change-Id: Iad914f31b629e81e403b0e89796b2b0f1d081695 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: Move most HMP specific code to a separate file.Syed Rameez Mustafa2016-08-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most code pertaining to CONFIG_SCHED_HMP has been moved to a separate file "hmp.c" in order to facilitate kernel upgrades. Fewer changes in the original scheduler files means fewer conflicts. Some parts of code, however, could not be moved to the separate file either because of dependencies with other non-HMP code or because the changes are specific only to the scheduling classes where the code resides. Change-Id: Ib067ac75e5a494008dcb3c67586b622c1b3962ce Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: Fix compile issues for !CONFIG_SCHED_HMPSyed Rameez Mustafa2016-08-22
| | | | | | | | | | | | | | | | | | | | | | | | Fix compile issues observed when CONFIG_SCHED_HMP is not turned on. There are still targets that may want that config option turned off. Change-Id: I29e69356da8d003d13d8cd3927a0b166cc1ef95e Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: Remove all existence of CONFIG_SCHED_FREQ_INPUTSyed Rameez Mustafa2016-08-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CONFIG_SCHED_FREQ_INPUT was created to keep parts of the scheduler dealing with frequency separate from other parts of the scheduler that deal with task placement. However, overtime the two features have become intricately linked whereby SCHED_FREQ_INPUT cannot be turned on without having SCHED_HMP turned on as well. Given this complex inter-dependency and the fact that all old, existing and future targets use both config options, remove this unnecessary feature separation. It will aid in making kernel upgrades a lot simpler and faster. Change-Id: Ia20e40d8a088d50909cc28f5be758fa3e9a4af6f Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: Move CPU cstate tracking under CONFIG_SCHED_HMPSyed Rameez Mustafa2016-08-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While tracking C-states makes sense under CONFIG_SMP as well, cstate information is currently unused under CONFIG_SMP. Move it under CONFIG_SCHED_HMP for now since that is the only place it is relevant at the moment. Change-Id: Ifc5812cfe14ebf2b4d447100dcd87f02ab29ff7a Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: Remove unused PELT extensions for HMP schedulingSyed Rameez Mustafa2016-08-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PELT extensions for HMP have never been used since the early days of the HMP scheduler. Furthermore, changes to PELT itself in newer kernel versions render some of the code redundant or incorrect. These extensions have not been tested for a long time and are practically dead code. Remove it so that future upgrades become easier. Change-Id: I029f327406ca00b2370c93134158b61dda3b81e3 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | sched: Remove unused migration notifier code.Syed Rameez Mustafa2016-08-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Migration notifiers were created to aid the CPU-boost driver manage CPU frequencies when tasks migrate from one CPU to another. Over time with the evolution of scheduler guided frequency, the scheduler now directly manages load when tasks migrate. Consequently the CPU-boost driver no longer makes use of this information. Remove unused code pertaining to this feature. Change-Id: I3529e4356e15e342a5fcfbcf3654396752a1d7cd Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | | Merge remote-tracking branch 'msm-4.4/tmp-2bf7955' into msm-4.4Trilok Soni2016-07-22
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * msm-4.4/tmp-2bf7955: Linux 4.4.8 Revert "usb: hub: do not clear BOS field during reset device" usbvision: fix crash on detecting device with invalid configuration staging: android: ion: Set the length of the DMA sg entries in buffer Revert "PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()" Revert "PCI: Add helpers to manage pci_dev->irq and pci_dev->irq_managed" Revert "x86/PCI: Don't alloc pcibios-irq when MSI is enabled" HID: usbhid: fix inconsistent reset/resume/reset-resume behavior HID: wacom: fix Bamboo ONE oops ALSA: usb-audio: Skip volume controls triggers hangup on Dell USB Dock ALSA: usb-audio: Add a quirk for Plantronics BT300 ALSA: usb-audio: Add a sample rate quirk for Phoenix Audio TMX320 ALSA: hda/realtek - Enable the ALC292 dock fixup on the Thinkpad T460s ALSA: hda - fix front mic problem for a HP desktop ALSA: hda - Fix headset support and noise on HP EliteBook 755 G2 ALSA: hda - Fixup speaker pass-through control for nid 0x14 on ALC225 mmc: sdhci-pci: Add support and PCI IDs for more Broxton host controllers perf: Cure event->pending_disable race perf: Do not double free arm64: replace read_lock to rcu lock in call_step_hook Btrfs: fix file/data loss caused by fsync after rename and new inode iommu: Don't overwrite domain pointer when there is no default_domain ext4: ignore quota mount options if the quota feature is enabled ext4: add lockdep annotations for i_data_sem btrfs: fix crash/invalid memory access on fsync when using overlayfs nfs: use file_dentry() fs: add file_dentry() sd: Fix excessive capacity printing on devices with blocks bigger than 512 bytes iio: gyro: bmg160: fix endianness when reading axes iio: gyro: bmg160: fix buffer read values iio: accel: bmc150: fix endianness when reading axes iio: st_magn: always define ST_MAGN_TRIGGER_SET_STATE usb: renesas_usbhs: fix to avoid using a disabled ep in usbhsg_queue_done() usb: renesas_usbhs: disable TX IRQ before starting TX DMAC transfer usb: renesas_usbhs: avoid NULL pointer derefernce in usbhsf_pkt_handler() mac80211: fix txq queue related crashes mac80211: fix unnecessary frame drops in mesh fwding mac80211: fix ibss scan parameters mac80211: avoid excessive stack usage in sta_info mac80211: properly deal with station hashtable insert errors virtio: virtio 1.0 cs04 spec compliance for reset rbd: use GFP_NOIO consistently for request allocations pcmcia: db1xxx_ss: fix last irq_to_gpio user v4l: vsp1: Set the SRU CTRL0 register when starting the stream coda: fix error path in case of missing pdata on non-DT platform au0828: Fix dev_state handling au0828: fix au0828_v4l2_close() dev_state race condition pinctrl: freescale: imx: fix bogus check of of_iomap() return value pinctrl: nomadik: fix pull debug print inversion pinctrl: sunxi: Fix A33 external interrupts not working pinctrl: sh-pfc: only use dummy states for non-DT platforms pinctrl: pistachio: fix mfio84-89 function description and pinmux. MIPS: Fix MSA ld unaligned failure cases KVM: x86: reduce default value of halt_poll_ns parameter KVM: x86: Inject pending interrupt even if pending nmi exist cdc-acm: fix NULL pointer reference USB: uas: Add a new NO_REPORT_LUNS quirk USB: uas: Limit qdepth at the scsi-host level mpls: find_outdev: check for err ptr in addition to NULL check ipv6: Count in extension headers in skb->network_header ip6_tunnel: set rtnl_link_ops before calling register_netdevice ipv6: l2tp: fix a potential issue in l2tp_ip6_recv ipv4: l2tp: fix a potential issue in l2tp_ip_recv tuntap: restore default qdisc tun, bpf: fix suspicious RCU usage in tun_{attach, detach}_filter rtnl: fix msg size calculation in if_nlmsg_size() bridge: Allow set bridge ageing time when switchdev disabled ipv6: udp: fix UDP_MIB_IGNOREDMULTI updates qmi_wwan: add "D-Link DWM-221 B1" device id xfrm: Fix crash observed during device unregistration and decryption ppp: take reference on channels netns ipv4: initialize flowi4_flags before calling fib_lookup() ipv4: fix broadcast packets reception bonding: fix bond_get_stats() net: bcmgenet: fix dma api length mismatch qlge: Fix receive packets drop. tcp/dccp: remove obsolete WARN_ON() in icmp handlers ppp: ensure file->private_data can't be overridden ath9k: fix buffer overrun for ar9287 farsync: fix off-by-one bug in fst_add_one mlx4: add missing braces in verify_qp_parameters net: Fix use after free in the recvmmsg exit path ipv4: Don't do expensive useless work during inetdev destroy. bridge: allow zero ageing time rocker: set FDB cleanup timer according to lowest ageing time mlxsw: spectrum: Check requested ageing time is valid macvtap: always pass ethernet header in linear qlcnic: Fix mailbox completion handling during spurious interrupt qlcnic: Remove unnecessary usage of atomic_t sh_eth: advance 'rxdesc' later in sh_eth_ring_format() sh_eth: fix NULL pointer dereference in sh_eth_ring_format() bpf: avoid copying junk bytes in bpf_get_current_comm() packet: validate variable length ll headers ax25: add link layer header validation function net: validate variable length ll headers ppp: release rtnl mutex when interface creation fails tcp: fix tcpi_segs_in after connection establishment udp6: fix UDP/IPv6 encap resubmit path usbnet: cleanup after bind() in probe() cdc_ncm: toggle altsetting to force reset before setup vxlan: fix missing options_len update on RX with collect metadata ipv6: re-enable fragment header matching in ipv6_find_hdr qmi_wwan: add Sierra Wireless EM74xx device ID tipc: Revert "tipc: use existing sk_write_queue for outgoing packet chain" mld, igmp: Fix reserved tailroom calculation sctp: lack the check for ports in sctp_v6_cmp_addr net: fix bridge multicast packet checksum validation net: qca_spi: clear IFF_TX_SKB_SHARING net: qca_spi: Don't clear IFF_BROADCAST net: vrf: Remove direct access to skb->data net: jme: fix suspend/resume on JMC260 ipv4: only create late gso-skb if skb is already set up with CHECKSUM_PARTIAL tunnel: Clear IPCB(skb)->opt before dst_link_failure called tcp: convert cached rtt from usec to jiffies when feeding initial rto xen/events: Mask a moving irq drm/amdgpu/gmc: use proper register for vram type on Fiji drm/amdgpu/gmc: move vram type fetching into sw_init drm/radeon: add a dpm quirk for all R7 370 parts drm/radeon: add another R7 370 quirk drm/radeon: add a dpm quirk for sapphire Dual-X R7 370 2G D5 drm/udl: Use unlocked gem unreferencing drm/dp: move hw_mutex up the call stack arm64: opcodes.h: Add arm big-endian config options before including arm header compiler-gcc: disable -ftracer for __noclone functions libnvdimm, pfn: fix uuid validation libnvdimm: fix smart data retrieval powerpc/mm: Fixup preempt underflow with huge pages mm: fix invalid node in alloc_migrate_target() ALSA: hda - Apply fix for white noise on Asus N550JV, too ALSA: hda - Fix white noise on Asus N750JV headphone ALSA: hda - Asus N750JV external subwoofer fixup ALSA: timer: Use mod_timer() for rearming the system timer parisc: Unbreak handling exceptions from kernel modules parisc: Fix kernel crash with reversed copy_from_user() parisc: Avoid function pointers for kernel exception routines PKCS#7: pkcs7_validate_trust(): initialize the _trusted output argument hwmon: (max1111) Return -ENODEV from max1111_read_channel if not instantiated Linux 4.4.7 perf/x86/intel: Fix PEBS data source interpretation on Nehalem/Westmere perf/x86/intel: Use PAGE_SIZE for PEBS buffer size on Core2 perf/x86/intel: Fix PEBS warning by only restoring active PMU in pmi perf/x86/pebs: Add workaround for broken OVFL status on HSW+ sched/cputime: Fix steal time accounting vs. CPU hotplug scsi_common: do not clobber fixed sense information PM / sleep: Clear pm_suspend_global_flags upon hibernate intel_idle: prevent SKL-H boot failure when C8+C9+C10 enabled mtd: onenand: fix deadlock in onenand_block_markbad mm/page_alloc: prevent merging between isolated and other pageblocks ocfs2/dlm: fix BUG in dlm_move_lockres_to_recovery_list ocfs2/dlm: fix race between convert and recovery Input: ati_remote2 - fix crashes on detecting device with invalid descriptor Input: ims-pcu - sanity check against missing interfaces Input: synaptics - handle spurious release of trackstick buttons, again writeback, cgroup: fix use of the wrong bdi_writeback which mismatches the inode writeback, cgroup: fix premature wb_put() in locked_inode_to_wb_and_lock_list() ACPI / PM: Runtime resume devices when waking from hibernate ARM: dts: at91: sama5d4 Xplained: don't disable hsmci regulator ARM: dts: at91: sama5d3 Xplained: don't disable hsmci regulator nfsd: fix deadlock secinfo+readdir compound nfsd4: fix bad bounds checking iser-target: Rework connection termination iser-target: Separate flows for np listeners and connections cma events iser-target: Add new state ISER_CONN_BOUND to isert_conn iser-target: Fix identification of login rx descriptor type target: Fix target_release_cmd_kref shutdown comp leak clk: bcm2835: Fix setting of PLL divider clock rates clk: rockchip: add hclk_cpubus to the list of rk3188 critical clocks clk: rockchip: rk3368: fix hdmi_cec gate-register clk: rockchip: rk3368: fix parents of video encoder/decoder clk: rockchip: rk3368: fix cpuclk core dividers clk: rockchip: rk3368: fix cpuclk mux bit of big cpu-cluster mmc: sdhci: Fix override of timeout clk wrt max_busy_timeout mmc: sdhci: fix data timeout (part 2) mmc: sdhci: fix data timeout (part 1) mmc: mmc_spi: Add Card Detect comments and fix CD GPIO case mmc: block: fix ABI regression of mmc_blk_ioctl ideapad-laptop: Add ideapad Y700 (15) to the no_hw_rfkill DMI list MAINTAINERS: Update mailing list and web page for hwmon subsystem kbuild/mkspec: fix grub2 installkernel issue scripts/kconfig: allow building with make 3.80 again scripts/coccinelle: modernize & bitops: Do not default to __clear_bit() for __clear_bit_unlock() tracing: Fix trace_printk() to print when not using bprintk() tracing: Fix crash from reading trace_pipe with sendfile tracing: Have preempt(irqs)off trace preempt disabled functions IB/ipoib: fix for rare multicast join race condition drm/amdgpu: include the right version of gmc header files for iceland drm/amdgpu: disable runtime pm on PX laptops without dGPU power control drm/radeon: Don't drop DP 2.7 Ghz link setup on some cards. drm/radeon: disable runtime pm on PX laptops without dGPU power control iwlwifi: mvm: Fix paging memory leak ipr: Fix regression when loading firmware ipr: Fix out-of-bounds null overwrite rapidio/rionet: fix deadlock on SMP fs/coredump: prevent fsuid=0 dumps into user-controlled directories fuse: Add reference counting for fuse_io_priv fuse: do not use iocb after it may have been freed md: multipath: don't hardcopy bio in .make_request path md/raid5: preserve STRIPE_PREREAD_ACTIVE in break_stripe_batch_list raid10: include bio_end_io_list in nr_queued to prevent freeze_array hang RAID5: revert e9e4c377e2f563 to fix a livelock RAID5: check_reshape() shouldn't call mddev_suspend md/raid5: Compare apples to apples (or sectors to sectors) raid1: include bio_end_io_list in nr_queued to prevent freeze_array hang xfs: fix two memory leaks in xfs_attr_list.c error paths quota: Fix possible GPF due to uninitialised pointers ARC: bitops: Remove non relevant comments ARC: [BE] readl()/writel() to work in Big Endian CPU configuration xtensa: clear all DBREAKC registers on start xtensa: fix preemption in {clear,copy}_user_highpage xtensa: ISS: don't hang if stdin EOF is reached splice: handle zero nr_pages in splice_to_pipe() vfs: show_vfsstat: do not ignore errors from show_devname method of: alloc anywhere from memblock if range not specified net: mvneta: enable change MAC address when interface is up cgroup: ignore css_sets associated with dead cgroups during migration Bluetooth: Fix potential buffer overflow with Add Advertising Bluetooth: Add new AR3012 ID 0489:e095 watchdog: rc32434_wdt: fix ioctl error handling watchdog: don't run proc_watchdog_update if new value is same as old ia64: define ioremap_uc() mm: memcontrol: reclaim and OOM kill when shrinking memory.max below usage mm: memcontrol: reclaim when shrinking memory.high below usage bcache: fix cache_set_flush() NULL pointer dereference on OOM bcache: fix race of writeback thread starting before complete initialization bcache: cleaned up error handling around register_cache() IB/srpt: Simplify srpt_handle_tsk_mgmt() brd: Fix discard request processing jbd2: fix FS corruption possibility in jbd2_journal_destroy() on umount path tools/hv: Use include/uapi with __EXPORTED_HEADERS__ ALSA: hda - Fix unconditional GPIO toggle via automute ALSA: hda - fix the mic mute button and led problem for a Lenovo AIO ALSA: hda - Don't handle ELD notify from invalid port ALSA: intel8x0: Add clock quirk entry for AD1981B on IBM ThinkPad X41. ALSA: pcm: Avoid "BUG:" string for warnings again ALSA: hda - Apply reboot D3 fix for CX20724 codec, too mtip32xx: Cleanup queued requests after surprise removal mtip32xx: Implement timeout handler mtip32xx: Handle FTL rebuild failure state during device initialization mtip32xx: Handle safe removal during IO mtip32xx: Fix for rmmod crash when drive is in FTL rebuild mtip32xx: Print exact time when an internal command is interrupted mtip32xx: Remove unwanted code from taskfile error handler mtip32xx: Fix broken service thread handling mtip32xx: Avoid issuing standby immediate cmd during FTL rebuild media: v4l2-compat-ioctl32: fix missing length copy in put_v4l2_buffer32 coda: fix first encoded frame payload bttv: Width must be a multiple of 16 when capturing planar formats adv7511: TX_EDID_PRESENT is still 1 after a disconnect saa7134: Fix bytesperline not being set correctly for planar formats 8250: use callbacks to access UART_DLL/UART_DLM net: irda: Fix use-after-free in irtty_open() tty: Fix GPF in flush_to_ldisc(), part 2 staging: comedi: ni_mio_common: fix the ni_write[blw]() functions staging: android: ion_test: fix check of platform_device_register_simple() error code staging: comedi: ni_tiocmd: change mistaken use of start_src for start_arg HID: fix hid_ignore_special_drivers module parameter HID: multitouch: force retrieving of Win8 signature blob HID: i2c-hid: fix OOB write in i2c_hid_set_or_send_report() HID: logitech: fix Dual Action gamepad support tpm: fix the cleanup of struct tpm_chip tpm_eventlog.c: fix binary_bios_measurements tpm_crb: tpm2_shutdown() must be called before tpm_chip_unregister() tpm: fix the rollback in tpm_chip_register() mei: bus: check if the device is enabled before data transfer X.509: Fix leap year handling again crypto: marvell/cesa - forward devm_ioremap_resource() error code crypto: ux500 - fix checks of error code returned by devm_ioremap_resource() crypto: atmel - fix checks of error code returned by devm_ioremap_resource() crypto: keywrap - memzero the correct memory crypto: ccp - memset request context to zero during import crypto: ccp - Don't assume export/import areas are aligned crypto: ccp - Limit the amount of information exported crypto: ccp - Add hash state import and export support Bluetooth: btusb: Add a new AR3012 ID 13d3:3472 Bluetooth: btusb: Add a new AR3012 ID 04ca:3014 Bluetooth: btusb: Add new AR3012 ID 13d3:3395 ALSA: usb-audio: Fix double-free in error paths after snd_usb_add_audio_stream() call ALSA: usb-audio: Minor code cleanup in create_fixed_stream_quirk() ALSA: usb-audio: add Microsoft HD-5001 to quirks ALSA: usb-audio: Add sanity checks for endpoint accesses ALSA: usb-audio: Fix NULL dereference in create_fixed_stream_quirk() Input: powermate - fix oops with malicious USB descriptors pwc: Add USB id for Philips Spc880nc webcam USB: option: add "D-Link DWM-221 B1" device id USB: serial: ftdi_sio: Add support for ICP DAS I-756xU devices USB: serial: cp210x: Adding GE Healthcare Device ID USB: cypress_m8: add endpoint sanity check USB: digi_acceleport: do sanity checking for the number of ports USB: mct_u232: add sanity checking in probe USB: usb_driver_claim_interface: add sanity checking USB: iowarrior: fix oops with malicious USB descriptors USB: cdc-acm: more sanity checking USB: uas: Reduce can_queue to MAX_CMNDS usb: hub: fix a typo in hub_port_init() leading to wrong logic usb: retry reset if a device times out dm: fix rq_end_stats() NULL pointer in dm_requeue_original_request() dm cache: make sure every metadata function checks fail_io dm thin metadata: don't issue prefetches if a transaction abort has failed dm: fix excessive dm-mq context switching dm snapshot: disallow the COW and origin devices from being identical libnvdimm: Fix security issue with DSM IOCTL. aic7xxx: Fix queue depth handling be2iscsi: set the boot_kset pointer to NULL in case of failure scsi: storvsc: fix SRB_STATUS_ABORTED handling sd: Fix discard granularity when LBPRZ=1 aacraid: Set correct msix count for EEH recovery aacraid: Fix memory leak in aac_fib_map_free aacraid: Fix RRQ overload sg: fix dxferp in from_to case x86/mm: TLB_REMOTE_SEND_IPI should count pages x86/iopl: Fix iopl capability check on Xen PV x86/iopl/64: Properly context-switch IOPL on Xen PV x86/apic: Fix suspicious RCU usage in smp_trace_call_function_interrupt() x86/irq: Cure live lock in fixup_irqs() PCI: ACPI: IA64: fix IO port generic range check PCI: Disable IO/MEM decoding for devices with non-compliant BARs pinctrl-bcm2835: Fix cut-and-paste error in "pull" parsing s390/pci: enforce fmb page boundary rule s390/cpumf: add missing lpp magic initialization s390: fix floating pointer register corruption (again) EDAC, amd64_edac: Shift wrapping issue in f1x_get_norm_dct_addr() EDAC/sb_edac: Fix computation of channel address sched/preempt, sh: kmap_coherent relies on disabled preemption sched/cputime: Fix steal_account_process_tick() to always return jiffies Thermal: Ignore invalid trip points perf tools: Fix python extension build perf tools: Fix checking asprintf return value perf tools: Dont stop PMU parsing on alias parse error perf/core: Fix perf_sched_count derailment KVM: VMX: fix nested vpid for old KVM guests KVM: VMX: avoid guest hang on invalid invvpid instruction KVM: VMX: avoid guest hang on invalid invept instruction KVM: fix spin_lock_init order on x86 KVM: i8254: change PIT discard tick policy KVM: x86: fix missed hardware breakpoints x86/PCI: Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs perf/x86/intel: Add definition for PT PMI bit x86/entry/compat: Keep TS_COMPAT set during signal delivery x86/microcode: Untangle from BLK_DEV_INITRD x86/microcode/intel: Make early loader look for builtin microcode too mmc: sh_mmcif: Correct TX DMA channel allocation mmc: sh_mmcif: rework dma channel handling ASoC: samsung: pass DMA channels as pointers regulator: core: Fix nested locking of supplies regulator: core: avoid unused variable warning s390/cpumf: Fix lpp detection cpufreq: dt: No need to allocate resources anymore cpufreq: dt: No need to fetch voltage-tolerance cpufreq: dt: Use dev_pm_opp_set_rate() to switch frequency cpufreq: dt: Reuse dev_pm_opp_get_max_transition_latency() cpufreq: dt: Unsupported OPPs are already disabled cpufreq: dt: Pass regulator name to the OPP core cpufreq: dt: OPP layers handles clock-latency for V1 bindings as well cpufreq: dt: Rename 'need_update' to 'opp_v1' cpufreq: dt: Convert few pr_debug/err() calls to dev_dbg/err() cpufreq-dt: fix handling regulator_get_voltage() result cpufreq-dt: Supply power coefficient when registering cooling devices PM / OPP: Rename structures for clarity PM / OPP: Fix incorrect comments PM / OPP: Initialize regulator pointer to an error value PM / OPP: Initialize u_volt_min/max to a valid value PM / OPP: Fix NULL pointer dereference crash when disabling OPPs PM / OPP: Add dev_pm_opp_set_rate() PM / OPP: Manage device clk PM / OPP: Parse clock-latency and voltage-tolerance for v1 bindings PM / OPP: Introduce dev_pm_opp_get_max_transition_latency() PM / OPP: Introduce dev_pm_opp_get_max_volt_latency() PM / OPP: Disable OPPs that aren't supported by the regulator PM / OPP: get/put regulators from OPP core cpufreq: cpufreq-dt: avoid uninitialized variable warnings: PM / OPP: Use snprintf() instead of sprintf() PM / OPP: Set cpu_dev->id in cpumask first PM / OPP: Fix parsing of opp-microvolt and opp-microamp properties PM / OPP: Parse 'opp-<prop>-<name>' bindings PM / OPP: Parse 'opp-supported-hw' binding PM / OPP: Add missing doc comments PM / OPP: Rename OPP nodes as opp@<opp-hz> PM / OPP: Remove 'operating-points-names' binding PM / OPP: Add {opp-microvolt|opp-microamp}-<name> binding PM / OPP: Add "opp-supported-hw" binding PM / OPP: Add debugfs support arm64: vdso: Mark vDSO code as read-only Conflicts: drivers/staging/android/ion/ion.c mm/page_alloc.c CRs-Fixed: 1010239 Change-Id: Id59539cad642885e1e41340cebae4159ba1f7eaf Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
| * | sched/cputime: Fix steal time accounting vs. CPU hotplugThomas Gleixner2016-04-12
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e9532e69b8d1d1284e8ecf8d2586de34aec61244 upstream. On CPU hotplug the steal time accounting can keep a stale rq->prev_steal_time value over CPU down and up. So after the CPU comes up again the delta calculation in steal_account_process_tick() wreckages itself due to the unsigned math: u64 steal = paravirt_steal_clock(smp_processor_id()); steal -= this_rq()->prev_steal_time; So if steal is smaller than rq->prev_steal_time we end up with an insane large value which then gets added to rq->prev_steal_time, resulting in a permanent wreckage of the accounting. As a consequence the per CPU stats in /proc/stat become stale. Nice trick to tell the world how idle the system is (100%) while the CPU is 100% busy running tasks. Though we prefer realistic numbers. None of the accounting values which use a previous value to account for fractions is reset at CPU hotplug time. update_rq_clock_task() has a sanity check for prev_irq_time and prev_steal_time_rq, but that sanity check solely deals with clock warps and limits the /proc/stat visible wreckage. The prev_time values are still wrong. Solution is simple: Reset rq->prev_*_time when the CPU is plugged in again. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: commit 095c0aa83e52 "sched: adjust scheduler cpu power for stolen time" Fixes: commit aa483808516c "sched: Remove irq time from available CPU power" Fixes: commit e6e6685accfa "KVM guest: Steal time accounting" Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603041539490.3686@nanos Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | sched: kill unnecessary divisions on fast pathJoonwoo Park2016-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | | The max_possible_efficiency and CPU's efficiency are fixed values which are determined at cluster allocation time. Avoid division on the fast by using precomputed scale factor. Also update_cpu_busy_time() doesn't need to know how many full windows have elapsed. Thus replace unneeded division with simple comparison. Change-Id: I2be1aad3fb9b895e4f0917d05bd8eade985bbccf Suggested-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | sched: remove unused parameter cpu from cpu_cycles_to_freq()Joonwoo Park2016-06-21
| | | | | | | | | | | | | | | | The function parameter cpu isn't used anymore by cpu_cycles_to_freq(). So remove it. Change-Id: Ide19321206dacb88fedca97e1b689d740f872866 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | sched: fix potential deflated frequency estimation during IRQ handlingJoonwoo Park2016-06-09
| | | | | | | | | | | | | | | | | | | | | | | | Time between mark_start of idle task and IRQ handler entry time is CPU cycle counter stall period. Therefore it's inappropriate to include such duration as part of sample period when we do frequency estimation. Fix such suboptimality by replenishing idle task's CPU cycle counter upon IRQ entry and using irqtime as time delta. Change-Id: I274d5047a50565cfaaa2fb821ece21c8cf4c991d Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | sched: preserve CPU cycle counter in rqJoonwoo Park2016-06-09
| | | | | | | | | | | | | | | | Preserve cycle counter in rq in preparation for wait time accounting while CPU idle fix. Change-Id: I469263c90e12f39bb36bde5ed26298b7c1c77597 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | sched: Remove the sched heavy task frequency guidance featureJoonwoo Park2016-06-03
| | | | | | | | | | | | | | | | | | | | | | | | This has always been unused feature given its limitation of adding phantom load to the system. Since there are no immediate plans of using this and the fact that it adds unnecessary complications to the new load fixup mechanism, remove this feature for now. It can be revisited later in light of the new mechanism. Change-Id: Ie9501a898d0f423338293a8dde6bc56f493f1e75 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | sched: Aggregate for frequencySrivatsa Vaddagiri2016-05-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Related threads in a group could execute on different CPUs and hence present a split-demand picture to cpufreq governor. IOW the governor fails to see the net cpu demand of all related threads in a given window if the threads's execution were to be split across CPUs. That could result in sub-optimal frequency chosen in comparison to the ideal frequency at which the aggregate work (taken up by related threads) needs to be run. This patch aggregates cpu execution stats in a window for all related threads in a group. This helps present cpu busy time to governor as if all related threads were part of the same thread and thus help select the right frequency required by related threads. This aggregation is done per-cluster. Change-Id: I71e6047620066323721c6d542034ddd4b2950e7f Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [joonwoop@codeaurora.org: Fixed notify_migration() to hold rcu read lock as this version of Linux doesn't hold p->pi_lock when the function gets called while keeping use of rcu_access_pointer() since we never dereference return value.] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | sched: simplify CPU frequency estimation and cycle counter APIJoonwoo Park2016-05-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most of CPUs increase cycle counter by one every cycle which makes frequency = cycles / time_delta is correct. Therefore it's reasonable to get rid of current cpu_cycle_max_scale_factor and ask cycle counter read callback function to return scaled counter value when it's needed in such a case that cycle counter doesn't increase every cycle. Thus multiply NSEC_PER_SEC / HZ_PER_KHZ to CPU cycle counter delta as we calculate frequency in khz and remove cpu_cycle_max_scale_factor. This allows us to simplify frequency estimation and cycle counter API. Change-Id: Ie7a628d4bc77c9b6c769f6099ce8d75740262a14 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | sched: take into account of limited CPU min and max frequenciesJoonwoo Park2016-04-27
| | | | | | | | | | | | | | | | | | | | | | | | Actual CPU's min and max frequencies can be limited by hardware components while governor's not aware of. Provide an API for them to notify for scheduler to be able to notice accurate currently operating frequency boundaries which helps better task placement decision. CRs-fixed: 1006303 Change-Id: I608f5fa8b0baff8d9e998731dcddec59c9073d20 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | sched: add support for CPU frequency estimation with cycle counterJoonwoo Park2016-04-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At present scheduler calculates task's demand with the task's execution time weighted over CPU frequency. The CPU frequency is given by governor's CPU frequency transition notification. Such notification may not be available. Provide an API for CPU clock driver to register callback functions so in order for scheduler to access CPU's cycle counter to estimate CPU's frequency without notification. At time point scheduler assumes the cycle counter increases always even when cluster is idle which might not be true. This will be fixed by subsequent change for more accurate I/O wait time accounting. CRs-fixed: 1006303 Change-Id: I93b187efd7bc225db80da0184683694f5ab99738 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | sched/core: Add protection against null-pointer dereferenceOlav Haugan2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | p->grp is being accessed outside of lock which can cause null-pointer dereference. Fix this and also add rcu critical section around access of this data structure. CRs-fixed: 985379 Change-Id: Ic82de6ae2821845d704f0ec18046cc6a24f98e39 Signed-off-by: Olav Haugan <ohaugan@codeaurora.org> [joonwoop@codeaurora.org: fixed conflict in init_new_task_load().] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | sched: Add separate load tracking histogram to predict loadsPavankumar Kondeti2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current window based load tracking only saves history for five windows. A historically heavy task's heavy load will be completely forgotten after five windows of light load. Even before the five window expires, a heavy task wakes up on same CPU it used to run won't trigger any frequency change until end of the window. It would starve for the entire window. It also adds one "small" load window to history because it's accumulating load at a low frequency, further reducing the tracked load for this heavy task. Ideally, scheduler should be able to identify such tasks and notify governor to increase frequency immediately after it wakes up. Add a histogram for each task to track a much longer load history. A prediction will be made based on runtime of previous or current window, histogram data and load tracked in recent windows. Prediction of all tasks that is currently running or runnable on a CPU is aggregated and reported to CPUFreq governor in sched_get_cpus_busy(). sched_get_cpus_busy() now returns predicted busy time in addition to previous window busy time and new task busy time, scaled to the CPU maximum possible frequency. Tunables: - /proc/sys/kernel/sched_gov_alert_freq (KHz) This tunable can be used to further filter the notifications. Frequency alert notification is sent only when the predicted load exceeds previous window load by sched_gov_alert_freq converted to load. Change-Id: If29098cd2c5499163ceaff18668639db76ee8504 Suggested-by: Saravana Kannan <skannan@codeaurora.org> Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org> Signed-off-by: Junjie Wu <junjiew@codeaurora.org> [joonwoop@codeaurora.org: fixed merge conflicts around __migrate_task() and removed changes for CONFIG_SCHED_QHMP.]
* | sched: Provide a wake up API without sending freq notificationsJunjie Wu2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Each time a task wakes up, scheduler evaluates its load and notifies governor if the resulting frequency of destination CPU is larger than a threshold. However, some governor wakes up a separate task that handles frequency change, which again calls wake_up_process(). This is dangerous because if the task being woken up meets the threshold and ends up being moved around, there is a potential for endless recursive notifications. Introduce a new API for waking up a task without triggering frequency notification. Change-Id: I24261af81b7dc410c7fb01eaa90920b8d66fbd2a Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
* | sched: Provide a facility to restrict RT tasks to lower power clusterPavankumar Kondeti2016-03-23
| | | | | | | | | | | | | | | | | | | | The current CPU selection algorithm for RT tasks looks for the least loaded CPU in all clusters. Stop the search at the lowest possible power cluster based on "sched_restrict_cluster_spill" sysctl tunable. Change-Id: I34fdaefea56e0d1b7e7178d800f1bb86aa0ec01c Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* | sched: Take cluster's minimum power into account for optimizing sbc()Pavankumar Kondeti2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The select_best_cpu() algorithm iterates over all the clusters and selects the most power efficient CPU that satisfies the task needs. During the search, skip the next cluster if its minimum power cost is higher than the power cost of an eligible CPU found in the previous cluster. In a b.L system, if the BIG cluster minimum power cost is higher than the maximum power cost of the little cluster, this optimization avoids searching the BIG cluster if an eligible CPU is found in the little cluster. Change-Id: I5e3755f107edb6c72180edbec2a658be931c276d Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* | sched: Revise the inter cluster load balance restrictionsPavankumar Kondeti2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The frequency based inter cluster load balance restrictions are not reliable as frequency does not provide a good estimate of the CPU's current load. Replace them with the spill_load and spill_nr_run based checks. The higher capacity cluster is restricted from pulling the tasks from the lower capacity cluster unless all of the lower capacity CPUs are above spill. This behavior can be controlled by a sysctl tunable and it is disabled by default (i.e. no load balance restrictions). Change-Id: I45c09c8adcb61a8a7d4e08beadf2f97f1805fb42 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org> Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org> [joonwoop@codeaurora.org: fixed merge conflicts due to omitted changes for CONFIG_SCHED_QHMP.]
* | sched: colocate related threadsSrivatsa Vaddagiri2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide userspace interface for tasks to be grouped together as "related" threads. For example, all threads involved in updating display buffer could be tagged as related. Scheduler will attempt to provide special treatment for group of related threads such as: 1) Colocation of related threads in same "preferred" cluster 2) Aggregation of demand towards determination of cluster frequency This patch extends scheduler to provide best-effort colocation support for a group of related threads. Change-Id: Ic2cd769faf5da4d03a8f3cb0ada6224d0101a5f5 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> [joonwoop@codeaurora.org: fixed minor merge conflicts. removed ifdefry for CONFIG_SCHED_QHMP.] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | sched: Update fair and rt placement logic to use scheduler clustersSrivatsa Vaddagiri2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Make use of clusters in the fair and rt scheduling classes. This is needed as the freq domain mask can no longer be used to do correct task placement. The freq domain mask was being used to demarcate clusters. Change-Id: I57f74147c7006f22d6760256926c10fd0bf50cbd Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [joonwoop@codeaurora.org: fixed merge conflicts due to omitted changes for CONFIG_SCHED_QHMP.] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | sched: Introduce the concept CPU clusters in the schedulerSrivatsa Vaddagiri2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A cluster is set of CPUs sharing some power controls and an L2 cache. This patch buids a list of clusters at bootup which are sorted by their max_power_cost. Many cluster-shared attributes like cur_freq, max_freq etc are needlessly maintained in per-cpu 'struct rq' currently. Consolidate them in a cluster structure. Change-Id: I0567672ad5fb67d211d9336181ceb53b9f6023af Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org> [joonwoop@codeaurora.org: fixed minor conflict in arch/arm64/kernel/topology.c. fixed conflict due to ommited changes for CONFIG_SCHED_QHMP.] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* | sched: fix compile failure where !CONFIG_SCHED_HMPJoonwoo Park2016-03-23
| | | | | | | | | | | | | | Fix compile failure when HMP scheduler isn't selected. Change-Id: I411fa3501a4c4ac280c037a1698aa3b7278d440f Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | sched: Optimize scheduler trace events to reduce trace buffer usageSyed Rameez Mustafa2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Scheduler ftrace events currently generate a lot of data when turned on. The excessive log messages often end up overflowing trace buffers for long use cases or crowding out other events. Optimize scheduler events so that the log spew is less and more manageable. To that end change the variable type for some event fields; introduce variants of sched_cpu_load that can be turned on/off for separate code paths and remove unused fields from various events. Change-Id: I2b313542b39ad5e09a01ad1303b5dfe2c4883b8a Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [joonwoop@codeaurora.org: fixed conflict in rt.c due to CONFIG_SCHED_QHMP.] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | sched: Notify cpufreq governor early about potential big tasksSyed Rameez Mustafa2016-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tasks that are on the runqueue continuously for a certain amount of time have the potential to be big tasks at the end of the window in which they are runnable. In such scenarios ramping the CPU frequency early can boost performance rather than waiting till the end of a window for the governor to query load. Notify the governor early at every tick when a task has been observed to execute beyond some percentage of the tick period. The threshold beyond which a task is eligible for early detection can be changed via the tunable sched_early_detection_duration. The feature itself is enabled only when scheduler boost is in effect. Change-Id: I528b72bbc79a55b4593d1b8ab45450411c6d70f3 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [joonwoop@codeaurora.org: fixed conflict in scheduler_tick() in kernel/sched/core.c. fixed minor conflicts in include/linux/sched.h, include/linux/sched/sysctl.h and kernel/sysctl.c due to CONFIG_SCHED_QHMP.] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* | sched: avoid unnecessary multiplication and divisionJoonwoo Park2016-03-23
| | | | | | | | | | | | | | | | Avoid unnecessary multiplication and division when load scaling factor is 1024. Change-Id: If3cb63a77feaf49cc69ddec7f41cc3c1cabbfc5a Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>