path: root/kernel/sched
Commit message | Author | Age
*   Merge remote-tracking branch 'msm8998/lineage-20' (master) (Raghuram Subramani, 2024-10-13)
|\
| * sched/fair: Remove duplicate walt_cpu_high_irqload call (Alexander Grund, 2024-03-06)
| |   This is already done a few lines up.
| |   Change-Id: Ifeda223d728bfc4e107407418b11303f7819e277
| * sched/fair: Remove leftover sched_clock_cpu call (Alexander Grund, 2024-03-06)
| |   The usage of `now` in __refill_cfs_bandwidth_runtime was removed in b933e4d37bc023d27c7394626669bae0a201da52 ("sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices"). Remove the variable and the call setting it.
| |   Change-Id: I1e98ccd2c2298d269b9b447a6e5c79d61518c66e
| * Revert "sched: tune: Unconditionally allow attach"Bruno Martins2024-01-10
| | | | | | | | | | | | | | | | This reverts commit f39d0b496aa4e13cbcf23bdf3c22d356bd783b9e. It is no longer applicable, allow_attach cgroup was ditched upstream. Change-Id: Ib370171e93a5cb0c9993065b98de2073c905229c
| * sched_getaffinity: don't assume 'cpumask_size()' is fully initialized (Linus Torvalds, 2024-01-03)
| |   [ Upstream commit 6015b1aca1a233379625385feb01dd014aca60b5 ]
| |   The getaffinity() system call uses 'cpumask_size()' to decide how big the CPU mask is - so far so good. It is indeed the allocation size of a cpumask.
| |   But the code also assumes that the whole allocation is initialized without actually doing so itself. That's wrong, because we might have fixed-size allocations (making copying and clearing more efficient), but not all of it is then necessarily used if 'nr_cpu_ids' is smaller.
| |   Having checked other users of 'cpumask_size()', they all seem to be ok, either using it purely for the allocation size, or explicitly zeroing the cpumask before using the size in bytes to copy it.
| |   See for example the ublk_ctrl_get_queue_affinity() function that uses the proper 'zalloc_cpumask_var()' to make sure that the whole mask is cleared, whether the storage is on the stack or if it was an external allocation.
| |   Fix this by just zeroing the allocation before using it. Do the same for the compat version of sched_getaffinity(), which had the same logic.
| |   Also, for consistency, make sched_getaffinity() use 'cpumask_bits()' to access the bits. For a cpumask_var_t, it ends up being a pointer to the same data either way, but it's just a good idea to treat it like you would a 'cpumask_t'. The compat case already did that.
| |   Reported-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/lkml/7d026744-6bd6-6827-0471-b5e8eae0be3f@arm.com/ Cc: Yury Norov <yury.norov@gmail.com> Change-Id: I60139451bd9e9a4b687f0f2097ac1b2813758c45 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Ulrich Hecht <uli+cip@fpond.eu>
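    For illustration, the fix described in this commit amounts to zeroing the cpumask allocation up front and copying through cpumask_bits(); the sketch below follows the commit message and is not the exact backported diff:

        SYSCALL_DEFINE3(sched_getaffinity, pid_t, pid, unsigned int, len,
                        unsigned long __user *, user_mask_ptr)
        {
                cpumask_var_t mask;
                long ret;

                if ((len * BITS_PER_BYTE) < nr_cpu_ids)
                        return -EINVAL;
                if (len & (sizeof(unsigned long)-1))
                        return -EINVAL;

                /* zalloc_cpumask_var() clears the whole allocation, so the
                 * bytes past nr_cpu_ids are no longer uninitialized. */
                if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
                        return -ENOMEM;

                ret = sched_getaffinity(pid, mask);
                if (ret == 0) {
                        unsigned int retlen = min(len, cpumask_size());

                        /* access the bits through cpumask_bits(), as the
                         * compat variant already did */
                        if (copy_to_user(user_mask_ptr, cpumask_bits(mask), retlen))
                                ret = -EFAULT;
                        else
                                ret = retlen;
                }
                free_cpumask_var(mask);

                return ret;
        }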
| * sched: deadline: Add missing WALT code (Josh Choo, 2023-11-20)
| |   CAF did not use WALT on msm-4.4 kernels and left out important WALT bits.
| |   Change-Id: If5de7100e010f299bd7b2c62720ff309a98c569d
| * sched: Reinstantiate EAS check_for_migration() implementation (Alexander Grund, 2023-11-12)
| |   Commit 6d5adb184946 ("sched: Restore previous implementation of check_for_migration()") reverted parts of an upstream commit including the "EAS scheduler implementation" of check_for_migration() in favor of the HMP implementation as the former breaks the latter. However without HMP we do want the former. Hence add both and select based on CONFIG_SCHED_HMP.
| |   Note that CONFIG_SMP is a precondition for CONFIG_SCHED_HMP, so the guard in the header uses the former.
| |   Change-Id: Iac0b462a38b35d1670d56ba58fee532a957c60b3
| * sched: Remove left-over CPU-query from __migrate_task (Alexander Grund, 2023-11-12)
| |   The code using this was moved to notify_migration but the query and the variable were not removed.
| |   Fixes 375d7195fc257 ("sched: move out migration notification out of spinlock")
| |   Change-Id: I75569443db2a55510c8279993e94b3e1a0ed562c
| * sched/walt: Add missing WALT call to `dequeue_task_fair` (Alexander Grund, 2023-11-02)
| |   Similar to `dec_cfs_rq_hmp_stats` vs `walt_dec_cfs_cumulative_runnable_avg` we need to call `walt_dec_cumulative_runnable_avg` where `dec_rq_hmp_stats` is called. Corresponds to the `walt_inc_cfs_cumulative_runnable_avg` call in `enqueue_task_fair`.
| |   Based on 4e29a6c5f98f9694d5ad01a4e7899aad157f8d49 ("sched: Add missing WALT code")
| |   Fixes c0fa7577022c4169e1aaaf1bd9e04f63d285beb2 ("sched/walt: Re-add code to allow WALT to function")
| |   Change-Id: If2b291e1e509ba300d7f4b698afe73a72b273604
| * BACKPORT: cpufreq: schedutil: Avoid using invalid next_freq (Rafael J. Wysocki, 2023-07-27)
| |   If the next_freq field of struct sugov_policy is set to UINT_MAX, it shouldn't be used for updating the CPU frequency (this is a special "invalid" value), but after commit b7eaf1aab9f8 (cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely) it may be passed as the new frequency to sugov_update_commit() in sugov_update_single().
| |   Fix that by adding an extra check for the special UINT_MAX value of next_freq to sugov_update_single().
| |   Fixes: b7eaf1aab9f8 (cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely) Reported-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 4.12+ <stable@vger.kernel.org> # 4.12+ Change-Id: Idf4ebe9e912f983598255167d2065e47562ab83d Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> (cherry picked from commit 97739501f207efe33145b918817f305b822987f8) Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
| * cpufreq: schedutil: Don't set next_freq to UINT_MAX (Viresh Kumar, 2023-07-27)
| |   The schedutil driver sets sg_policy->next_freq to UINT_MAX on certain occasions to discard the cached value of next freq:
| |   - In sugov_start(), when the schedutil governor is started for a group of CPUs.
| |   - And whenever we need to force a freq update before rate-limit duration, which happens when:
| |     - there is an update in cpufreq policy limits.
| |     - Or when the utilization of DL scheduling class increases.
| |   In return, get_next_freq() doesn't return a cached next_freq value but recalculates the next frequency instead.
| |   But having special meaning for a particular value of frequency makes the code less readable and error prone. We recently fixed a bug where the UINT_MAX value was considered as valid frequency in sugov_update_single().
| |   All we need is a flag which can be used to discard the value of sg_policy->next_freq and we already have need_freq_update for that. Let's reuse it instead of setting next_freq to UINT_MAX.
| |   Change-Id: Ia37ef416d5ecac11fe0c6a2be7e21fdbca708a1a Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Yaroslav Furman <yaro330@gmail.com> - backported to 4.4
| * schedutil: Allow cpufreq requests to be made even when kthread kicked (Joel Fernandes (Google), 2023-07-27)
| |   Currently there is a chance of a schedutil cpufreq update request to be dropped if there is a pending update request. This pending request can be delayed if there is a scheduling delay of the irq_work and the wake up of the schedutil governor kthread.
| |   A very bad scenario is when a schedutil request was already just made, such as to reduce the CPU frequency, then a newer request to increase CPU frequency (even sched deadline urgent frequency increase requests) can be dropped, even though the rate limits suggest that it's OK to process a request. This is because of the way the work_in_progress flag is used.
| |   This patch improves the situation by allowing new requests to happen even though the old one is still being processed. Note that in this approach, if an irq_work was already issued, we just update next_freq and don't bother to queue another request so there's no extra work being done to make this happen.
| |   Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Change-Id: I7b6e19971b2ce3bd8e8336a5a4bc1acb920493b5 Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Yaroslav Furman <yaro330@gmail.com> - backport to 4.4
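    The gist of the change is in the slow (kthread) path of the commit step: instead of dropping a request while work_in_progress is set, the new next_freq is recorded and the irq_work is only queued if one is not already pending. A minimal sketch, with the helper name chosen here purely for illustration:

        /* Slow path only; the fast-switch path is unaffected. */
        static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
                                          unsigned int next_freq)
        {
                sg_policy->next_freq = next_freq;
                sg_policy->last_freq_update_time = time;

                /*
                 * If the kthread was already kicked, just publish the newer
                 * next_freq; the pending work item will use the latest value.
                 */
                if (!sg_policy->work_in_progress) {
                        sg_policy->work_in_progress = true;
                        irq_work_queue(&sg_policy->irq_work);
                }
        }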
| * cpufreq: schedutil: Fix iowait boost reset (Patrick Bellasi, 2023-07-27)
| |   A more energy efficient update of the IO wait boosting mechanism has been introduced in commit a5a0809 ("cpufreq: schedutil: Make iowait boost more energy efficient") where the boost value is expected to be:
| |   - doubled at each successive wakeup from IO, starting from the minimum frequency supported by a CPU
| |   - reset when a CPU is not updated for more than one tick, by either disabling the IO wait boost or resetting its value to the minimum frequency if this new update requires an IO boost.
| |   This approach is supposed to "ignore" boosting for sporadic wakeups from IO, while still getting the frequency boosted to the maximum to benefit long sequences of wakeups from IO operations.
| |   However, these assumptions are not always satisfied. For example, when an IO boosted CPU enters idle for more than one tick and then wakes up after an IO wait, since in sugov_set_iowait_boost() we first check the IOWAIT flag, we keep doubling the iowait boost instead of restarting from the minimum frequency value. This misbehavior could happen mainly on non-shared frequency domains, thus defeating the energy efficiency optimization, but it can also happen on shared frequency domain systems.
| |   Let's fix this issue in sugov_set_iowait_boost() by:
| |   - first checking the IO wait boost reset conditions to eventually reset the boost value
| |   - then applying the correct IO boost value if required by the caller
| |   Fixes: a5a0809 (cpufreq: schedutil: Make iowait boost more energy efficient) Reported-by: Viresh Kumar <viresh.kumar@linaro.org> Change-Id: I196b5c464cd43164807c12b2dbc2e5d814bf1d33 Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Yaroslav Furman <yaro330@gmail.com> - backport to 4.4
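    A sketch of the re-ordered logic inside sugov_set_iowait_boost() as described above; the field names (iowait_boost_max, last_update) and the use of policy->min as the minimum follow the 4.4-era schedutil code and are assumptions about this backport:

        /* 1) Check the reset condition first: CPU not updated for > 1 tick. */
        if (sg_cpu->iowait_boost) {
                s64 delta_ns = time - sg_cpu->last_update;

                if (delta_ns > TICK_NSEC) {
                        if (flags & SCHED_CPUFREQ_IOWAIT)
                                /* restart boosting from the minimum frequency */
                                sg_cpu->iowait_boost = sg_cpu->sg_policy->policy->min;
                        else
                                sg_cpu->iowait_boost = 0;
                        return;
                }
        }

        /* 2) Only then apply/double the boost for this IO wakeup. */
        if (flags & SCHED_CPUFREQ_IOWAIT) {
                if (sg_cpu->iowait_boost) {
                        sg_cpu->iowait_boost <<= 1;
                        if (sg_cpu->iowait_boost > sg_cpu->iowait_boost_max)
                                sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
                } else {
                        sg_cpu->iowait_boost = sg_cpu->sg_policy->policy->min;
                }
        }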
| * FROMLIST: sched/fair: Don't move tasks to lower capacity cpus unless necessary (Chris Redpath, 2023-07-27)
| |   When lower capacity CPUs are load balancing and considering to pull something from a higher capacity group, we should not pull tasks from a cpu with only one task running as this is guaranteed to impede progress for that task. If there is more than one task running, load balance in the higher capacity group would have already made any possible moves to resolve imbalance and we should make better use of system compute capacity by moving a task if we still have more than one running.
| |   cc: Ingo Molnar <mingo@redhat.com> cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> [from https://lore.kernel.org/lkml/1530699470-29808-11-git-send-email-morten.rasmussen@arm.com/] Signed-off-by: Chris Redpath <chris.redpath@arm.com> Change-Id: Ib86570abdd453a51be885b086c8d80be2773a6f2 Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
| * sched: Enable min capacity capping (Georg Veichtlbauer, 2023-07-27)
| |   Change-Id: Icd8f2cde6cac1b7bb07e54b8e6989c65eea4e4a5
| * softirq, sched: reduce softirq conflicts with RT (John Dias, 2023-07-27)
| |   joshuous: Adapted to work with CAF's "softirq: defer softirq processing to ksoftirqd if CPU is busy with RT" commit.
| |   ajaivasudeve: adapted for the commit "softirq: Don't defer all softirq during RT task"
| |   We're finding audio glitches caused by audio-producing RT tasks that are either interrupted to handle softirq's or that are scheduled onto cpu's that are handling softirq's. In a previous patch, we attempted to catch many cases of the latter problem, but it's clear that we are still losing significant numbers of races in some apps.
| |   This patch attempts to address both problems:
| |   1. It prohibits handling softirq's when interrupting an RT task, by delaying the softirq to the ksoftirqd thread.
| |   2. It attempts to reduce the most common windows in which we lose the race between scheduling an RT task on a remote core and starting to handle softirq's on that core.
| |   We still lose some races, but we lose significantly fewer. (And we don't want to introduce any heavyweight forms of synchronization on these paths.)
| |   Bug: 64912585 Change-Id: Ida89a903be0f1965552dd0e84e67ef1d3158c7d8 Signed-off-by: John Dias <joaodias@google.com> Signed-off-by: joshuous <joshuous@gmail.com> Signed-off-by: ajaivasudeve <ajaivasudeve@gmail.com>
| * ANDROID: sched/rt: rt cpu selection integration with EAS. (Srinath Sridharan, 2023-07-27)
| |   For effective interplay between RT and fair tasks. Enables sched_fifo for UI and Render tasks. Critical for improving user experience.
| |   bug: 24503801 bug: 30377696 Change-Id: I2a210c567c3f5c7edbdd7674244822f848e6d0cf Signed-off-by: Srinath Sridharan <srinathsr@google.com> (cherry picked from commit dfe0f16b6fd3a694173c5c62cf825643eef184a3)
| * cpufreq: schedutil: Fix sugov_start versus sugov_update_shared race (Vikram Mulukutla, 2023-07-26)
| |   With a shared policy in place, when one of the CPUs in the policy is hotplugged out and then brought back online, sugov_stop and sugov_start are called in order.
| |   sugov_stop removes utilization hooks for each CPU in the policy and does nothing else in the for_each_cpu loop. sugov_start on the other hand iterates through the CPUs in the policy and re-initializes the per-cpu structure _and_ adds the utilization hook. This implies that the scheduler is allowed to invoke a CPU's utilization update hook when the rest of the per-cpu structures have yet to be re-inited.
| |   Apart from some strange values in tracepoints this doesn't cause a problem, but if we do end up accessing a pointer from the per-cpu sugov_cpu structure somewhere in the sugov_update_shared path, we will likely see crashes since the memset for another CPU in the policy is free to race with sugov_update_shared from the CPU that is ready to go.
| |   So let's fix this now to first init all per-cpu structures, and then add the per-cpu utilization update hooks all at once.
| |   Change-Id: I399e0e159b3db3ae3258843c9231f92312fe18ef Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
| * BACKPORT: cpufreq: schedutil: Cache tunables on governor exit (Rohit Gupta, 2023-07-26)
| |   Currently when all the related CPUs from a policy go offline or the governor is switched, cpufreq framework calls sugov_exit() that frees the governor tunables. When any of the related CPUs comes back online or governor is switched back to schedutil sugov_init() gets called which allocates a fresh set of tunables that are set to default values.
| |   This can cause the userspace settings to those tunables to be lost across governor switches or when an entire cluster is hotplugged out. To prevent this, save the tunable values on governor exit. Restore these values to the newly allocated tunables on governor init.
| |   Change-Id: I671d4d0e1a4e63e948bfddb0005367df33c0c249 Signed-off-by: Rohit Gupta <rohgup@codeaurora.org> [Caching and restoring different tunables.] Signed-off-by: joshuous <joshuous@gmail.com> Change-Id: I852ae2d23f10c9337e7057a47adcc46fe0623c6a Signed-off-by: joshuous <joshuous@gmail.com>
| * sched: energy: handle memory allocation failure (Joonwoo Park, 2023-07-26)
| |   Return immediately upon memory allocation failure.
| |   Change-Id: I30947d55f0f4abd55c51e42912a0762df57cbc1d Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
| * BACKPORT: ANDROID: Add hold functionality to schedtune CPU boost (Chris Redpath, 2023-07-26)
| |   * Render - added back missing header
| |   When tasks come and go from a runqueue quickly, this can lead to boost being applied and removed quickly which sometimes means we cannot raise the CPU frequency again when we need to (due to the rate limit on frequency updates). This has proved to be a particular issue for RT tasks and alternative methods have been used in the past to work around it.
| |   This is an attempt to solve the issue for all task classes and cpufreq governors by introducing a generic mechanism in schedtune to retain the max boost level from task enqueue for a minimum period - defined here as 50ms. This timeout was determined experimentally and is not configurable.
| |   A sched_feat guards the application of this to tasks - in the default configuration, task boosting is only applied to tasks which have RT policy. Change SCHEDTUNE_BOOST_HOLD_ALL to true to apply it to all tasks regardless of class.
| |   It works like so: Every task enqueue (in an allowed class) stores a cpu-local timestamp. If the task is not a member of an allowed class (all or RT depending upon feature selection), the timestamp is not updated. The boost group will stay active regardless of tasks present until 50ms beyond the last timestamp stored.
| |   We also store the timestamp of the active boost group to avoid unnecessarily revisiting the boost groups when checking CPU boost level. If the timestamp is more than 50ms in the past when we check boost then we re-evaluate the boost groups for that CPU, taking into account the timestamps associated with each group.
| |   Idea based on rt-boost-retention patches from Joel.
| |   Change-Id: I52cc2d2e82d1c5aa03550378c8836764f41630c1 Suggested-by: Joel Fernandes <joelaf@google.com> Reviewed-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: RenderBroken <zkennedy87@gmail.com>
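    A rough sketch of the hold mechanism described above; the structure and member names here are illustrative, only the 50ms value and the enqueue-timestamp idea come from the commit message:

        #define SCHEDTUNE_BOOST_HOLD_NS 50000000ULL     /* 50ms, not configurable */

        /* On enqueue of a task in an allowed class (RT by default, or all
         * tasks with SCHEDTUNE_BOOST_HOLD_ALL), remember a cpu-local timestamp. */
        static void schedtune_boost_hold_start(struct boost_groups *bg, int idx, u64 now)
        {
                bg->group[idx].ts = now;
        }

        /* A boost group keeps contributing to the CPU boost until 50ms past
         * the last stored timestamp, even when it no longer has runnable tasks. */
        static bool schedtune_boost_timeout(u64 now, u64 ts)
        {
                return (now - ts) > SCHEDTUNE_BOOST_HOLD_NS;
        }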
| * sched/fair: select the most energy-efficient CPU candidate on wake-up (Quentin Perret, 2023-07-26)
| |   The current implementation of the energy-aware wake-up path relies on find_best_target() to select an ordered list of CPU candidates for task placement. The first candidate of the list saving energy is then chosen, disregarding all the others to avoid the overhead of an expensive energy_diff.
| |   With the recent refactoring of select_energy_cpu_idx(), the cost of exploring multiple CPUs has been reduced, hence offering the opportunity to select the most energy-efficient candidate at a lower cost. This commit seizes this opportunity by allowing select_energy_cpu_idx()'s behaviour to be changed to ignore the order of CPUs returned by find_best_target() and to pick the best candidate energy-wise.
| |   As this functionality is still considered experimental, it is hidden behind a sched_feature named FBT_STRICT_ORDER (like the equivalent feature in Android 4.14) which defaults to true, hence keeping the current behaviour by default.
| |   Change-Id: I0cb833bfec1a4a053eddaff1652c0b6cad554f97 Suggested-by: Patrick Bellasi <patrick.bellasi@arm.com> Suggested-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>
| * sched/fair: fix array out of bounds access in select_energy_cpu_idx() (Pavankumar Kondeti, 2023-07-26)
| |   We are using an incorrect index while initializing the nrg_delta for the previous CPU in select_energy_cpu_idx(). This initialization itself is not needed as the nrg_delta for the previous CPU is already initialized to 0 while preparing the energy_env struct.
| |   Change-Id: Iee4e2c62f904050d2680a0a1df646d4d515c62cc Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
| * ANDROID: sched/walt: make walt_ktime_suspended __read_mostly (Todd Poynor, 2023-07-26)
| |   Most walt variables in hot code paths are __read_mostly and grouped together in the .data section, including these variables that show up as frequently accessed in gem5 simulation: walt_ravg_window, walt_disabled, walt_account_wait_time, and walt_freq_account_wait_time.
| |   The exception is walt_ktime_suspended, which is also accessed in many of the same hot paths. It is also almost entirely accessed by reads. Move it to __read_mostly in hopes of keeping it in the same cache line as the other hot data.
| |   Change-Id: I8c9e4ee84e5a0328b943752ee9ed47d4e006e7de Signed-off-by: Todd Poynor <toddpoynor@google.com>
| * cpufreq: schedutil: Use >= when aggregating CPU loads in a policy (Saravana Kannan, 2023-07-26)
| |   When the normal utilization value for all the CPUs in a policy is 0, the current aggregation scheme of using a > check will result in the aggregated utilization being 0 and the max value being 1. This is not a problem for upstream code. However, since we also use other statistics provided by WALT to update the aggregated utilization value across all CPUs in a policy, we can end up with a non-zero aggregated utilization while the max remains as 1.
| |   Then when get_next_freq() tries to compute the frequency using: max-frequency * 1.25 * (util / max) it ends up with a frequency that is greater than max-frequency. So the policy frequency jumps to max-frequency.
| |   By changing the aggregation check to >=, we make sure that the max value is updated with something reported by the scheduler for a CPU in that policy. With the correct max, we can continue using the WALT specific statistics without spurious jumps to max-frequency.
| |   Change-Id: I14996cd796191192ea112f949dc42450782105f7 Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
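    A sketch of the aggregation loop with the described change; the wrapper function name is invented for illustration, while the per-cpu sugov_cpu data and the '>' to '>=' change follow the commit message:

        static void sugov_aggregate_util(struct cpufreq_policy *policy,
                                         unsigned long *util, unsigned long *max)
        {
                unsigned int j;

                for_each_cpu(j, policy->cpus) {
                        struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
                        unsigned long j_util = j_sg_cpu->util;
                        unsigned long j_max = j_sg_cpu->max;

                        /* '>=' (was '>') so that max always comes from a CPU
                         * in this policy, even when every util is 0. */
                        if (j_util * *max >= j_max * *util) {
                                *util = j_util;
                                *max = j_max;
                        }
                }
        }

    get_next_freq() then computes roughly next_freq = 1.25 * max_freq * util / max, so keeping util and max consistent with each other is what prevents the spurious jumps to max-frequency.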
| * sched/tune: fix tracepoint location (Patrick Bellasi, 2023-07-26)
| |   Change-Id: Ibbcb281c2f048e2af0ded0b1cbbbedcc49b29e45 Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
| * sched/tune: allow negative cpu boosts (Connor O'Brien, 2023-07-26)
| |   schedtune sets 0 as the floor for calculating CPU max boost, so negative schedtune.boost values do not affect CPU frequency decisions. Remove this restriction to allow userspace to apply negative boosts when appropriate. Also change treatment of the root boost group to match the other groups, so it only affects the CPU boost if it has runnable tasks on the CPU.
| |   Test: set all boosts negative; sched_boost_cpu trace events show negative CPU margins.
| |   Change-Id: I89f3470299aef96a18797c105f02ebc8f367b5e1 Signed-off-by: Connor O'Brien <connoro@google.com>
| * sched: tune: Unconditionally allow attach (Andres Oportus, 2023-07-26)
| |   In commit ac087abe1358 ("Merge android-msm-8998-4.4-common into android-msm-muskie-4.4"), .allow_attach = subsys_cgroup_allow_attach, was dropped in the merge. This patch brings back allow_attach, but with the marlin-3.18 behavior of allowing all cgroup changes rather than the subsys_cgroup_allow_attach behavior of requiring SYS_CAP_NICE.
| |   Bug: 36592053 Change-Id: Iaa51597b49a955fd5709ca504e968ea19a9ca8f5 Signed-off-by: Andres Oportus <andresoportus@google.com> Signed-off-by: Andrew Chant <achant@google.com>
| * sched/fair: use min capacity when evaluating active cpus (Ionela Voinescu, 2023-07-26)
| |   When we are calculating what the impact of placing a task on a specific cpu is, we should include the information that there might be a minimum capacity imposed upon that cpu which could change the performance and/or energy cost decisions.
| |   When choosing an active target CPU, favour CPUs that won't end up running at a high OPP due to a min capacity cap imposed by external actors.
| |   Change-Id: Ibc3302304345b63107f172b1fc3ffdabc19aa9d4 Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com>
| * sched/fair: use min capacity when evaluating idle backup cpus (Ionela Voinescu, 2023-07-26)
| |   When we are calculating what the impact of placing a task on a specific cpu is, we should include the information that there might be a minimum capacity imposed upon that cpu which could change the performance and/or energy cost decisions.
| |   When choosing an idle backup CPU, favour CPUs that won't end up running at a high OPP due to a min capacity cap imposed by external actors.
| |   Change-Id: I566623ffb3a7c5b61a23242dcce1cb4147ef8a4a Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com>
| * sched/fair: use min capacity when evaluating placement energy costs (Ionela Voinescu, 2023-07-26)
| |   Add the ability to track minimum capacity forced onto a sched_group by some external actor.
| |   group_max_util returns the highest utilisation inside a sched_group and is used when we are trying to calculate an energy cost estimate for a specific scheduling scenario. Minimum capacities imposed from elsewhere will influence this energy cost so we should reflect it here.
| |   Change-Id: Ibd537a6dbe6d67b11cc9e9be18f40fcb2c0f13de Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com>
| * sched/fair: introduce minimum capacity capping sched feature (Ionela Voinescu, 2023-07-26)
| |   We have the ability to track minimum capacity forced onto a CPU by userspace or external actors. This is provided through a minimum frequency scale factor exposed by arch_scale_min_freq_capacity.
| |   The use of this information is enabled through the MIN_CAPACITY_CAPPING feature. If not enabled, the minimum frequency scale factor will remain 0 and it will not impact energy estimation or scheduling decisions.
| |   Change-Id: Ibc61f2bf4fddf186695b72b262e602a6e8bfde37 Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com>
| * sched: add arch_scale_min_freq_capacity to track minimum capacity caps (Ionela Voinescu, 2023-07-26)
| |   If the minimum capacity of a group is capped by userspace or internal dependencies which are not otherwise visible to the scheduler, we need a way to see these and integrate this information into the energy calculations and task placement decisions we make.
| |   Add arch_scale_min_freq_capacity to determine the lowest capacity which a specific cpu can provide under the current set of known constraints.
| |   Change-Id: Ied4a1dc0982bbf42cb5ea2f27201d4363db59705 Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
| * sched/fair: introduce an arch scaling function for max frequency capping (Dietmar Eggemann, 2023-07-26)
| |   The max frequency scaling factor is defined as:
| |     max_freq_scale = policy_max_freq / cpuinfo_max_freq
| |   To be able to scale the cpu capacity by this factor, introduce a call to the new arch scaling function arch_scale_max_freq_capacity() in update_cpu_capacity() and provide a default implementation which returns SCHED_CAPACITY_SCALE.
| |   Another subsystem (e.g. cpufreq) can overwrite this default implementation, exactly as for frequency and cpu invariance. It has to be enabled by the arch by defining arch_scale_max_freq_capacity to the actual implementation.
| |   Change-Id: I266cd1f4c1c82f54b80063c36aa5f7662599dd28 Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com>
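    A sketch of the default hook and of how update_cpu_capacity() would apply it, following the commit message; the exact form of the call site is an assumption:

        /* Default: no cap. An arch (or e.g. cpufreq) overrides this by
         * defining arch_scale_max_freq_capacity to its own implementation. */
        #ifndef arch_scale_max_freq_capacity
        static __always_inline
        unsigned long arch_scale_max_freq_capacity(struct sched_domain *sd, int cpu)
        {
                return SCHED_CAPACITY_SCALE;
        }
        #endif

        /* In update_cpu_capacity(), the capacity is additionally scaled by
         * max_freq_scale = policy_max_freq / cpuinfo_max_freq, expressed
         * against SCHED_CAPACITY_SCALE: */
        capacity = (capacity * arch_scale_max_freq_capacity(sd, cpu))
                        >> SCHED_CAPACITY_SHIFT;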
| * sched/fair: reduce rounding errors in energy computations (Patrick Bellasi, 2023-07-26)
| |   The SG's energy is obtained by adding busy and idle contributions which are computed by considering a proper fraction of the SCHED_CAPACITY_SCALE defined by the SG's utilizations.
| |   By scaling each and every contribution computed we risk accumulating rounding errors which can result in a non-null energy_delta also in cases when the same total accumulated utilization is differently distributed among different CPUs.
| |   To reduce rounding errors, this patch accumulates non-scaled busy/idle energy contributions for each visited SG, and scales each of them just one time at the end.
| |   Change-Id: Idf8367fee0ac11938c6436096f0c1b2d630210d2 Suggested-by: Joonwoo Park <joonwoop@codeaurora.org> Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>
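    A small worked example of the truncation effect being avoided (the numbers are invented for illustration): with two sched groups each contributing 3 raw energy units and a utilization fraction of 900/1024, integer arithmetic gives

        scale each SG, then sum:  floor(3 * 900 / 1024) + floor(3 * 900 / 1024) = 2 + 2 = 4
        sum all SGs, scale once:  floor((3 + 3) * 900 / 1024)                   = 5

    Per-SG scaling loses a unit here; accumulated over many groups such losses can turn an actually-zero energy_delta into a non-zero one, which is what scaling once at the end avoids.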
| * sched/fair: re-factor energy_diff to use a single (extensible) energy_env (Patrick Bellasi, 2023-07-26)
| |   The energy_env data structure is used to cache values required by multiple different functions involved in energy_diff computation. Some of these functions require additional parameters which can be easily embedded into the energy_env itself.
| |   The current implementation of energy_diff hardcodes the usage of two different energy_env structures to estimate and compare the energy consumption related to a "before" and an "after" CPU. Moreover, it does this energy estimation by walking multiple times the SDs/SGs data structures. A better design can be envisioned by better using the energy_env structure to support a more efficient and concurrent evaluation of multiple schedule candidates.
| |   To this purpose, this patch provides a complete re-factoring of the energy_diff implementation to:
| |   1. use a single energy_env structure for the evaluation of all the candidate CPUs
| |   2. walk just one time the SDs/SGs, thus improving the overall performance to compute the energy estimation for each CPU candidate specified by the single used energy_env
| |   3. simplify the code (at least if you look at the new version and not at this re-factoring patch) thus providing a more clean code to maintain and extend for additional features
| |   This patch updated all the clients of energy_env to use only the data provided by this structure and an index for one of its CPUs candidates. Embedding everything within the energy env will make it simple to add tracepoints for this new version, which can easily provide a holistic view on how energy_diff evaluated the proposed CPU candidates.
| |   The new proposed structure, for both "struct energy_env" and the functions using it, is designed in such a way to easily accommodate additional further extensions (e.g. SchedTune filtering) without requiring an additional big re-factoring of these core functions. Finally, the usage of a CPUs candidate array, embedded into the energy_diff structure, also allows seamlessly extending the exploration of multiple candidate CPUs, for example to support the comparison of a spread-vs-packing strategy.
| |   Change-Id: Ic04ffb6848b2c763cf1788767f22c6872eb12bee Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> [reworked find_new_capacity() and enforced the respect of find_best_target() selection order] Signed-off-by: Quentin Perret <quentin.perret@arm.com> [@0ctobot: Adapted for kernel.lnx.4.4.r35-rel] Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
| * sched/fair: cleanup select_energy_cpu_brute to be more consistent (Patrick Bellasi, 2023-07-26)
| |   The current definition of select_energy_cpu_brute is a bit confusing in the definition of the value for the target_cpu to be returned as wakeup CPU for the specified task. This cleans up the code by ensuring that we always set target_cpu right before returning it.
| |   rcu_read_lock and the check on *sd != NULL are also moved around to be exactly where they are required.
| |   Change-Id: I70a4b558b3624a13395da1a87ddc0776fd1d6641 Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>
| * sched/fair: remove capacity tracking from energy_diff (Patrick Bellasi, 2023-07-26)
| |   In preparation for the energy_diff refactoring, let's remove all the SchedTune specific bits which are used to keep track of the capacity variations required by the PESpace filtering.
| |   This also removes the energy_normalization function and the wrapper of energy_diff which is used to trigger a PESpace filtering by schedtune_accept_deltas(). The remaining code is the "original" energy_diff function which looks just at the energy variations to compare prev_cpu vs next_cpu.
| |   Change-Id: I4fb1d1c5ba45a364e6db9ab8044969349aba0307 Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>
| * sched/fair: remove energy_diff tracepoint in preparation to re-factoring (Patrick Bellasi, 2023-07-26)
| |   The format of the energy_diff tracepoint is going to be changed by the following energy_diff refactoring patches. Let's remove it now to start from a clean slate.
| |   Change-Id: Id4f537ed60d90a7ddcca0a29a49944bfacb85c8c Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>
| * sched/fair: use *p to reference task_structs (Chris Redpath, 2023-07-26)
| |   This is a simple renaming patch which just aligns with the most common code convention used in fair.c: task_struct pointers are usually named *p.
| |   Change-Id: Id0769e52b6a271014d89353fdb4be9bb721b5b2f Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>
* | Merge lineage-20 of git@github.com:LineageOS/android_kernel_qcom_msm8998.git into lineage-20 (Davide Garberi, 2023-08-06)
|\|
| |   7d11b1a7a11c Revert "sched: cpufreq: Use sched_clock instead of rq_clock when updating schedutil"
| |   daaa5da96a74 sched: Take irq_sparse lock during the isolation
| |   217ab2d0ef91 rcu: Speed up calling of RCU tasks callbacks
| |   997b726bc092 kernel: power: Workaround for sensor ipc message causing high power consume
| |   b933e4d37bc0 sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices
| |   82d3f23d6dc5 sched/fair: Fix bandwidth timer clock drift condition
| |   629bfed360f9 kernel: power: qos: remove check for core isolation while cluster LPMs
| |   891a63210e1d sched/fair: Fix issue where frequency update not skipped
| |   b775cb29f663 ANDROID: Move schedtune en/dequeue before schedutil update triggers
| |   ebdb82f7b34a sched/fair: Skip frequency updates if CPU about to idle
| |   ff383d94478a FROMLIST: sched: Make iowait_boost optional in schedutil
| |   9539942cb065 FROMLIST: cpufreq: Make iowait boost a policy option
| |   b65c91c9aa14 ARM: dts: msm: add HW CPU's busy-cost-data for additional freqs
| |   72f13941085b ARM: dts: msm: fix CPU's idle-cost-data
| |   ab88411382f7 ARM: dts: msm: fix EM to be monotonically increasing
| |   83dcbae14782 ARM: dts: msm: Fix EAS idle-cost-data property length
| |   33d3b17bfdfb ARM: dts: msm: Add msm8998 energy model
| |   c0fa7577022c sched/walt: Re-add code to allow WALT to function
| |   d5cd35f38616 FROMGIT: binder: use EINTR for interrupted wait for work
| |   db74739c86de sched: Don't fail isolation request for an already isolated CPU
| |   aee7a16e347b sched: WALT: increase WALT minimum window size to 20ms
| |   4dbe44554792 sched: cpufreq: Use per_cpu_ptr instead of this_cpu_ptr when reporting load
| |   ef3fb04c7df4 sched: cpufreq: Use sched_clock instead of rq_clock when updating schedutil
| |   c7128748614a sched/cpupri: Exclude isolated CPUs from the lowest_mask
| |   6adb092856e8 sched: cpufreq: Limit governor updates to WALT changes alone
| |   0fa652ee00f5 sched: walt: Correct WALT window size initialization
| |   41cbb7bc59fb sched: walt: fix window misalignment when HZ=300
| |   43cbf9d6153d sched/tune: Increase the cgroup limit to 6
| |   c71b8fffe6b3 drivers: cpuidle: lpm-levels: Fix KW issues with idle state idx < 0
| |   938e42ca699f drivers: cpuidle: lpm-levels: Correctly check for list empty
| |   8d8a48aecde5 sched/fair: Fix load_balance() affinity redo path
| |   eccc8acbe705 sched/fair: Avoid unnecessary active load balance
| |   0ffdb886996b BACKPORT: sched/core: Fix rules for running on online && !active CPUs
| |   c9999f04236e sched/core: Allow kthreads to fall back to online && !active cpus
| |   b9b6bc6ea3c0 sched: Allow migrating kthreads into online but inactive CPUs
| |   a9314f9d8ad4 sched/fair: Allow load bigger task load balance when nr_running is 2
| |   c0b317c27d44 pinctrl: qcom: Clear status bit on irq_unmask
| |   45df1516d04a UPSTREAM: mm: fix misplaced unlock_page in do_wp_page()
| |   899def5edcd4 UPSTREAM: mm/ksm: Remove reuse_ksm_page()
| |   46c6fbdd185a BACKPORT: mm: do_wp_page() simplification
| |   90dccbae4c04 UPSTREAM: mm: reuse only-pte-mapped KSM page in do_wp_page()
| |   ebf270d24640 sched/fair: vruntime should normalize when switching from fair
| |   cbe0b37059c9 mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
| |   12d40f1995b4 msm: mdss: Fix indentation
| |   620df03a7229 msm: mdss: Treat polling_en as the bool that it is
| |   12af218146a6 msm: mdss: add idle state node
| |   13e661759656 cpuset: Restore tasks affinity while moving across cpusets
| |   602bf4096dab genirq: Honour IRQ's affinity hint during migration
| |   9209b5556f6a power: qos: Use effective affinity mask
| |   f31078b5825f genirq: Introduce effective affinity mask
| |   58c453484f7e sched/cputime: Mitigate performance regression in times()/clock_gettime()
| |   400383059868 kernel: time: Add delay after cpu_relax() in tight loops
| |   1daa7ea39076 pinctrl: qcom: Update irq handle for GPIO pins
| |   07f7c9961c7c power: smb-lib: Fix mutex acquisition deadlock on PD hard reset
| |   094b738f46c8 power: qpnp-smb2: Implement battery charging_enabled node
| |   d6038d6da57f ASoC: msm-pcm-q6-v2: Add dsp buf check
| |   0d7a6c301af8 qcacld-3.0: Fix OOB in wma_scan_roam.c
| |   Change-Id: Ia2e189e37daad6e99bdb359d1204d9133a7916f4
| * Revert "sched: cpufreq: Use sched_clock instead of rq_clock when updating ↵Georg Veichtlbauer2023-07-26
| | | | | | | | | | | | | | | | | | | | | | | | schedutil" That commit should have changed rq_clock to sched_clock, instead of sched_ktime_clock, which kept schedutil from making correct decisions. This reverts commit ef3fb04c7df43dfa1793e33f764a2581cda96310. Change-Id: Id4118894388c33bf2b2d3d5ee27eb35e82dc4a96
| * sched: Take irq_sparse lock during the isolation (Prasad Sodagudi, 2023-07-16)
| |   irq_migrate_all_off_this_cpu() is used to migrate IRQs and this function checks for all active irq in the allocated_irqs mask. irq_migrate_all_off_this_cpu() expects the caller to take irq_sparse lock to avoid race conditions while accessing allocated_irqs mask variable.
| |   Prevent a race between irq alloc/free and irq migration by adding irq_sparse lock across CPU isolation.
| |   Change-Id: I9edece1ecea45297c8f6529952d88b3133046467 Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
| * sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices (Dave Chiluk, 2023-07-16)
| |   commit de53fd7aedb100f03e5d2231cfce0e4993282425 upstream.
| |   It has been observed that highly-threaded, non-cpu-bound applications running under cpu.cfs_quota_us constraints can hit a high percentage of periods throttled while simultaneously not consuming the allocated amount of quota. This use case is typical of user-interactive non-cpu bound applications, such as those running in kubernetes or mesos when run on multiple cpu cores.
| |   This has been root caused to cpu-local run queues being allocated per-cpu bandwidth slices, and then not fully using that slice within the period. At which point the slice and quota expires. This expiration of unused slice results in applications not being able to utilize the quota for which they are allocated.
| |   The non-expiration of per-cpu slices was recently fixed by 'commit 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift condition")'. Prior to that it appears that this had been broken since at least 'commit 51f2176d74ac ("sched/fair: Fix unlocked reads of some cfs_b->quota/period")' which was introduced in v3.16-rc1 in 2014. That added the following conditional which resulted in slices never being expired.
| |     if (cfs_rq->runtime_expires != cfs_b->runtime_expires) {
| |             /* extend local deadline, drift is bounded above by 2 ticks */
| |             cfs_rq->runtime_expires += TICK_NSEC;
| |   Because this was broken for nearly 5 years, and has recently been fixed and is now being noticed by many users running kubernetes (https://github.com/kubernetes/kubernetes/issues/67577) it is my opinion that the mechanisms around expiring runtime should be removed altogether.
| |   This allows quota already allocated to per-cpu run-queues to live longer than the period boundary. This allows threads on runqueues that do not use much CPU to continue to use their remaining slice over a longer period of time than cpu.cfs_period_us. However, this helps prevent the above condition of hitting throttling while also not fully utilizing your cpu quota.
| |   This theoretically allows a machine to use slightly more than its allotted quota in some periods. This overflow would be bounded by the remaining quota left on each per-cpu runqueue. This is typically no more than min_cfs_rq_runtime=1ms per cpu. For CPU bound tasks this will change nothing, as they should theoretically fully utilize all of their quota in each period. For user-interactive tasks as described above this provides a much better user/application experience as their cpu utilization will more closely match the amount they requested when they hit throttling. This means that cpu limits no longer strictly apply per period for non-cpu bound applications, but that they are still accurate over longer timeframes.
| |   This greatly improves performance of high-thread-count, non-cpu bound applications with low cfs_quota_us allocation on high-core-count machines. In the case of an artificial testcase (10ms/100ms of quota on 80 CPU machine), this commit resulted in almost 30x performance improvement, while still maintaining correct cpu quota restrictions. That testcase is available at https://github.com/indeedeng/fibtest.
| |   Fixes: 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift condition") Change-Id: I7d7a39fb554ec0c31f9381f492165f43c70b3924 Signed-off-by: Dave Chiluk <chiluk+linux@indeed.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Phil Auld <pauld@redhat.com> Reviewed-by: Ben Segall <bsegall@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: John Hammond <jhammond@indeed.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kyle Anderson <kwa@yelp.com> Cc: Gabriel Munos <gmunoz@netflix.com> Cc: Peter Oskolkov <posk@posk.io> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Brendan Gregg <bgregg@netflix.com> Link: https://lkml.kernel.org/r/1563900266-19734-2-git-send-email-chiluk+linux@indeed.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
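    As a rough upper bound implied by the figures in the message above: each runqueue can retain at most about min_cfs_rq_runtime of unused slice across a period boundary, so on the 80-CPU test machine

        worst-case carry-over per period  ~  80 CPUs * 1 ms  =  80 ms

    which is why quota remains accurate over longer timeframes even though it no longer strictly applies within a single period.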
| * sched/fair: Fix bandwidth timer clock drift condition (Xunlei Pang, 2023-07-16)
| |   commit 512ac999d2755d2b7109e996a76b6fb8b888631d upstream.
| |   I noticed that cgroup task groups constantly get throttled even if they have low CPU usage, this causes some jitters on the response time to some of our business containers when enabling CPU quotas.
| |   It's very simple to reproduce:
| |     mkdir /sys/fs/cgroup/cpu/test
| |     cd /sys/fs/cgroup/cpu/test
| |     echo 100000 > cpu.cfs_quota_us
| |     echo $$ > tasks
| |   then repeat:
| |     cat cpu.stat | grep nr_throttled  # nr_throttled will increase steadily
| |   After some analysis, we found that cfs_rq::runtime_remaining will be cleared by expire_cfs_rq_runtime() due to two equal but stale "cfs_{b|q}->runtime_expires" after period timer is re-armed.
| |   The current condition to judge clock drift in expire_cfs_rq_runtime() is wrong, the two runtime_expires are actually the same when clock drift happens, so this condition can never hit. The original design was correctly done by this commit: a9cf55b28610 ("sched: Expire invalid runtime") ... but was changed to be the current implementation due to its locking bug.
| |   This patch introduces another way, it adds a new field in both structures cfs_rq and cfs_bandwidth to record the expiration update sequence, and uses them to figure out if clock drift happens (true if they are equal).
| |   Change-Id: I8168fe3b45785643536f289ea823d1a62d9d8ab2 Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [alakeshh: backport: Fixed merge conflicts: - sched.h: Fix the indentation and order in which the variables are declared to match with coding style of the existing code in 4.14. Struct members of same type were declared in separate lines in upstream patch which has been changed back to having multiple members of same type in the same line. e.g. int a; int b; -> int a, b;] Signed-off-by: Alakesh Haloi <alakeshh@amazon.com> Reviewed-by: Ben Segall <bsegall@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> # 4.14.x Fixes: 51f2176d74ac ("sched/fair: Fix unlocked reads of some cfs_b->quota/period") Link: http://lkml.kernel.org/r/20180620101834.24455-1-xlpang@linux.alibaba.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
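    A sketch of the corrected drift check in expire_cfs_rq_runtime(); the expires_seq field name follows the upstream patch and the backport may differ slightly:

        if (cfs_rq->expires_seq == cfs_b->expires_seq) {
                /* same period: the mismatch is sched_clock drift, so extend
                 * the local deadline (drift is bounded above by 2 ticks) */
                cfs_rq->runtime_expires += TICK_NSEC;
        } else {
                /* the global period advanced: the local runtime really expired */
                cfs_rq->runtime_remaining = 0;
        }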
| * sched/fair: Fix issue where frequency update not skipped (Joel Fernandes, 2023-07-16)
| |   This patch fixes one of the infrequent conditions in commit 54b6baeca500 ("sched/fair: Skip frequency updates if CPU about to idle") where we could have skipped a frequency update. The fix is to use the correct flag which skips freq updates.
| |   Note that this is a rare issue (can show up only during CFS throttling) and even then we just do an additional frequency update which we were doing anyway before the above patch.
| |   Bug: 64689959 Change-Id: I0117442f395cea932ad56617065151bdeb9a3b53 Signed-off-by: Joel Fernandes <joelaf@google.com>
| * ANDROID: Move schedtune en/dequeue before schedutil update triggers (Chris Redpath, 2023-07-16)
| |   CPU rq util updates happen when rq signals are updated as part of enqueue and dequeue operations. Doing these updates triggers a call to the registered util update handler, which takes schedtune boosting into account. Enqueueing the task in the correct schedtune group after this happens means that we will potentially not see the boost for an entire throttle period.
| |   Move the enqueue/dequeue operations for schedtune before the signal updates which can trigger OPP changes.
| |   Change-Id: I4236e6b194bc5daad32ff33067d4be1987996780 Signed-off-by: Chris Redpath <chris.redpath@arm.com>
| * sched/fair: Skip frequency updates if CPU about to idle (Joel Fernandes, 2023-07-16)
| |   If a CPU is about to idle, prevent a frequency update. With this, the number of schedutil governor wake ups is reduced by more than half on a test playing bluetooth audio.
| |   Test: sugov wake ups drop by more than half when playing music with screen off (476 / 1092)
| |   Bug: 64689959 Change-Id: I400026557b4134c0ac77f51c79610a96eb985b4a Signed-off-by: Joel Fernandes <joelaf@google.com>
| * FROMLIST: sched: Make iowait_boost optional in schedutil (Joel Fernandes, 2023-07-16)
| |   We should apply the iowait boost only if cpufreq policy has iowait boost enabled. Also make it a schedutil configuration from sysfs so it can be turned on/off if needed (by default initialize it to the policy value).
| |   For systems that don't need/want it enabled, such as those on arm64 based mobile devices that are battery operated, it saves energy when the cpufreq driver policy doesn't have it enabled (details below).
| |   Here are some results for energy measurements collected running a YouTube video for 30 seconds:
| |     Before: 8.042533 mWh
| |     After:  7.948377 mWh
| |   Energy savings is ~1.2%
| |   Bug: 38010527 Link: https://lkml.org/lkml/2017/5/19/42 Change-Id: If124076ad0c16ade369253840dedfbf870aff927 Signed-off-by: Joel Fernandes <joelaf@google.com>
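    The guard this adds is essentially a per-policy tunable checked at the top of the boost path; the tunable name below is an assumption based on the posted patch, not confirmed from this tree:

        /* In sugov_set_iowait_boost(): do nothing when boosting is disabled. */
        if (!sg_cpu->sg_policy->tunables->iowait_boost_enable)
                return;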
| * sched/walt: Re-add code to allow WALT to function (Ethan Chen, 2023-07-16)
| |   Change-Id: Ieb1067c5e276f872ed4c722b7d1fabecbdad87e7