Commit message log
This is already done a few lines up.
Change-Id: Ifeda223d728bfc4e107407418b11303f7819e277
The usage of `now` in __refill_cfs_bandwidth_runtime() was removed in
b933e4d37bc023d27c7394626669bae0a201da52 ("sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices").
Remove the variable and the call that sets it.
Change-Id: I1e98ccd2c2298d269b9b447a6e5c79d61518c66e
The implementation is utterly broken, resulting in all processes being
allowed to move tasks between sets (as long as they have access to the
"tasks" attribute), and upstream is heading towards checking only the
capability anyway, so let's get rid of this code.
BUG=b:31790445,chromium:647994
TEST=Boot android container, examine logcat
Change-Id: I2f780a5992c34e52a8f2d0b3557fc9d490da2779
Signed-off-by: Dmitry Torokhov <dtor@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/394967
Reviewed-by: Ricky Zhou <rickyz@chromium.org>
Reviewed-by: John Stultz <john.stultz@linaro.org>
This reverts commit f39d0b496aa4e13cbcf23bdf3c22d356bd783b9e.
It is no longer applicable; the cgroup allow_attach hook was ditched upstream.
Change-Id: Ib370171e93a5cb0c9993065b98de2073c905229c
Currently, eBPF only understands BPF_JGT (>), BPF_JGE (>=),
BPF_JSGT (s>), BPF_JSGE (s>=) instructions, this means that
particularly *JLT/*JLE counterparts involving immediates need
to be rewritten from e.g. X < [IMM] by swapping arguments into
[IMM] > X, meaning the immediate first needs to be loaded
into a register Y := [IMM], so that we can then compare with
Y > X. Note that the destination operand is always required to
be a register.
This has the downside of unnecessarily increased register
pressure, meaning complex programs would need to spill other
registers temporarily to the stack in order to obtain an unused
register for the [IMM]. Loading to registers will thus also
affect state pruning since we need to account for that register
use and potentially those registers that had to be spilled/filled
again. As a consequence slightly more stack space might have
been used due to spilling, and BPF programs are a bit longer
due to extra code involving the register load and potentially
required spill/fills.
Thus, add BPF_JLT (<), BPF_JLE (<=), BPF_JSLT (s<), BPF_JSLE (s<=)
counterparts to the eBPF instruction set. Modifying LLVM to
remove the NegateCC() workaround in a PoC patch at [1] and
allowing it to also emit the new instructions resulted in
cilium's BPF programs that are injected into the fast-path to
have a reduced program length in the range of 2-3% (e.g.
accumulated main and tail call sections from one of the object
file reduced from 4864 to 4729 insns), reduced complexity in
the range of 10-30% (e.g. accumulated sections reduced in one
of the cases from 116432 to 88428 insns), and reduced stack
usage in the range of 1-5% (e.g. accumulated sections from one
of the object files reduced from 824 to 784b).
The modification for LLVM will be incorporated in a backwards
compatible way. Plan is for LLVM to have i) a target specific
option to offer a possibility to explicitly enable the extension
by the user (as we have with -m target specific extensions today
for various CPU insns), and ii) have the kernel checked for
presence of the extensions and enable them transparently when
the user is selecting more aggressive options such as -march=native
in a bpf target context. (Other frontends generating BPF byte
code, e.g. ply can probe the kernel directly for its code
generation.)
[1] https://github.com/borkmann/llvm/tree/bpf-insns
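As a purely illustrative aside (not part of this patch), the effect can be
sketched with the kernel's BPF instruction macros; the register and immediate
choices below are arbitrary:
  #include <linux/filter.h>

  static const struct bpf_insn jlt_demo[] = {
          /* Without BPF_JLT, "r1 < 42" must be rewritten as "42 > r1",
           * which costs a scratch register for the immediate:
           */
          BPF_MOV64_IMM(BPF_REG_9, 42),                  /* r9 = 42            */
          BPF_JMP_REG(BPF_JGT, BPF_REG_9, BPF_REG_1, 2), /* if r9 > r1 goto +2 */

          /* With BPF_JLT the immediate can be compared against directly: */
          BPF_JMP_IMM(BPF_JLT, BPF_REG_1, 42, 2),        /* if r1 < 42 goto +2 */
  };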
Change-Id: Ic56500aaeaf5f3ebdfda094ad6ef4666c82e18c5
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Free up the BPF_JMP | BPF_CALL | BPF_X opcode to be used by an actual
indirect call by register and use a kernel-internal opcode to
mark the call instruction into the bpf_tail_call() helper.
Change-Id: I1a45b8e3c13848c9689ce288d4862935ede97fa7
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the dummy bpf_jit_compile() stubs for eBPF JITs and make
that a single __weak function in the core that can be overridden
similarly to the eBPF one. Also remove stale pr_err() mentions
of bpf_jit_compile.
Change-Id: Iac221c09e9ae0879acdd7064d710c4f7cb8f478d
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit ef9188bcc6ca1d8a2ad83e826b548e6820721061 ]
To prepare for supporting an asynchronous tracer_init_tracefs initcall,
avoid calling create_trace_option_files before __update_tracer_options.
Otherwise, create_trace_option_files will show a warning because
some tracers in the trace_types list are already in tr->topts.
For example, hwlat_tracer calls register_tracer in a late_initcall,
and since global_trace.dir is already created in tracing_init_dentry,
hwlat_tracer will be put into tr->topts.
Then, if __update_tracer_options is executed after hwlat_tracer is
registered, create_trace_option_files finds that hwlat_tracer is
already in tr->topts.
Link: https://lkml.kernel.org/r/20220426122407.17042-2-mark-pk.tsai@mediatek.com
Link: https://lore.kernel.org/lkml/20220322133339.GA32582@xsang-OptiPlex-9020/
Reported-by: kernel test robot <oliver.sang@intel.com>
Change-Id: Icaa5bcb097b421d2459b082644f84031e89ac88b
Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6015b1aca1a233379625385feb01dd014aca60b5 ]
The getaffinity() system call uses 'cpumask_size()' to decide how big
the CPU mask is - so far so good. It is indeed the allocation size of a
cpumask.
But the code also assumes that the whole allocation is initialized
without actually doing so itself. That's wrong, because we might have
fixed-size allocations (making copying and clearing more efficient), but
not all of it is then necessarily used if 'nr_cpu_ids' is smaller.
Having checked other users of 'cpumask_size()', they all seem to be ok,
either using it purely for the allocation size, or explicitly zeroing
the cpumask before using the size in bytes to copy it.
See for example the ublk_ctrl_get_queue_affinity() function that uses
the proper 'zalloc_cpumask_var()' to make sure that the whole mask is
cleared, whether the storage is on the stack or if it was an external
allocation.
Fix this by just zeroing the allocation before using it. Do the same
for the compat version of sched_getaffinity(), which had the same logic.
Also, for consistency, make sched_getaffinity() use 'cpumask_bits()' to
access the bits. For a cpumask_var_t, it ends up being a pointer to the
same data either way, but it's just a good idea to treat it like you
would a 'cpumask_t'. The compat case already did that.
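The pattern described above can be sketched as follows (a hedged illustration,
not the exact upstream hunk; the wrapper name is made up):
  #include <linux/cpumask.h>
  #include <linux/gfp.h>
  #include <linux/sched.h>
  #include <linux/uaccess.h>

  /* Zero the whole cpumask allocation up front, fill in at most nr_cpu_ids
   * bits, then copy the full cpumask_size() bytes out to userspace.
   */
  static long example_getaffinity(struct task_struct *p,
                                  unsigned long __user *user_mask_ptr)
  {
          cpumask_var_t mask;
          long ret = 0;

          if (!zalloc_cpumask_var(&mask, GFP_KERNEL))   /* allocate and clear */
                  return -ENOMEM;

          cpumask_copy(mask, &p->cpus_allowed);         /* sets <= nr_cpu_ids bits */

          if (copy_to_user(user_mask_ptr, cpumask_bits(mask), cpumask_size()))
                  ret = -EFAULT;

          free_cpumask_var(mask);
          return ret;
  }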
Reported-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/lkml/7d026744-6bd6-6827-0471-b5e8eae0be3f@arm.com/
Cc: Yury Norov <yury.norov@gmail.com>
Change-Id: I60139451bd9e9a4b687f0f2097ac1b2813758c45
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Ulrich Hecht <uli+cip@fpond.eu>
CAF did not use WALT on msm-4.4 kernels and left out important WALT
bits.
Change-Id: If5de7100e010f299bd7b2c62720ff309a98c569d
Commit 6d5adb184946 ("sched: Restore previous implementation of check_for_migration()")
reverted parts of an upstream commit including the
"EAS scheduler implementation" of check_for_migration()
in favor of the HMP implementation, as the former breaks the latter.
However, without HMP we do want the former.
Hence add both and select between them based on CONFIG_SCHED_HMP.
Note that CONFIG_SMP is a precondition for CONFIG_SCHED_HMP, so the
guard in the header uses the former.
Change-Id: Iac0b462a38b35d1670d56ba58fee532a957c60b3
The code using this was moved to notify_migration but the query and the
variable were not removed.
Fixes: 375d7195fc257 ("sched: move out migration notification out of spinlock")
Change-Id: I75569443db2a55510c8279993e94b3e1a0ed562c
commit c1ac03af6ed45d05786c219d102f37eb44880f28 upstream.
print_trace_line may overflow the seq_file buffer. If the event is not
consumed, the while loop keeps peeking at this event, causing an infinite loop.
Link: https://lkml.kernel.org/r/20221129113009.182425-1-yangjihong1@huawei.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 088b1e427dbba ("ftrace: pipe fixes")
Change-Id: Ic7f41111b7c7944e9f64b010a3c1ff095c757b9d
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ulrich Hecht <uli+cip@fpond.eu>
[ Upstream commit eecb91b9f98d6427d4af5fdb8f108f52572a39e7 ]
Kmemleak report a leak in graph_trace_open():
unreferenced object 0xffff0040b95f4a00 (size 128):
comm "cat", pid 204981, jiffies 4301155872 (age 99771.964s)
hex dump (first 32 bytes):
e0 05 e7 b4 ab 7d 00 00 0b 00 01 00 00 00 00 00 .....}..........
f4 00 01 10 00 a0 ff ff 00 00 00 00 65 00 10 00 ............e...
backtrace:
[<000000005db27c8b>] kmem_cache_alloc_trace+0x348/0x5f0
[<000000007df90faa>] graph_trace_open+0xb0/0x344
[<00000000737524cd>] __tracing_open+0x450/0xb10
[<0000000098043327>] tracing_open+0x1a0/0x2a0
[<00000000291c3876>] do_dentry_open+0x3c0/0xdc0
[<000000004015bcd6>] vfs_open+0x98/0xd0
[<000000002b5f60c9>] do_open+0x520/0x8d0
[<00000000376c7820>] path_openat+0x1c0/0x3e0
[<00000000336a54b5>] do_filp_open+0x14c/0x324
[<000000002802df13>] do_sys_openat2+0x2c4/0x530
[<0000000094eea458>] __arm64_sys_openat+0x130/0x1c4
[<00000000a71d7881>] el0_svc_common.constprop.0+0xfc/0x394
[<00000000313647bf>] do_el0_svc+0xac/0xec
[<000000002ef1c651>] el0_svc+0x20/0x30
[<000000002fd4692a>] el0_sync_handler+0xb0/0xb4
[<000000000c309c35>] el0_sync+0x160/0x180
The root cause is described as follows:
__tracing_open() { // 1. File 'trace' is being opened;
...
*iter->trace = *tr->current_trace; // 2. Tracer 'function_graph' is
// currently set;
...
iter->trace->open(iter); // 3. Call graph_trace_open() here,
// and memory are allocated in it;
...
}
s_start() { // 4. The opened file is being read;
...
*iter->trace = *tr->current_trace; // 5. If tracer is switched to
// 'nop' or others, then memory
// in step 3 are leaked!!!
...
}
To fix it, in s_start(), close the tracer before switching, then reopen the
new tracer after switching. Also, some tracers like 'wakeup' may not update
'iter->private' in some cases when reopened, so it should be cleared
to avoid being mistakenly closed again.
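A hedged sketch of the switch-over in s_start() (field and callback names
follow kernel/trace/trace.c; the 'wakeup' iter->private clearing mentioned
above is omitted):
  if (unlikely(tr->current_trace &&
               iter->trace->name != tr->current_trace->name)) {
          /* Close iter->trace before switching to the new current tracer */
          if (iter->trace->close)
                  iter->trace->close(iter);
          *iter->trace = *tr->current_trace;
          /* Reopen the new current tracer */
          if (iter->trace->open)
                  iter->trace->open(iter);
  }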
Link: https://lore.kernel.org/linux-trace-kernel/20230817125539.1646321-1-zhengyejian1@huawei.com
Fixes: d7350c3f4569 ("tracing/core: make the read callbacks reentrants")
Change-Id: I1265ac4538ca529fb02b4cce76a4aa835f90b128
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Ulrich Hecht <uli@kernel.org>
[ Upstream commit 7acf3a127bb7c65ff39099afd78960e77b2ca5de ]
Booting the kernel with 'trace_buf_size=1' gives a warning at
boot during the ftrace selftests:
[ 0.892809] Running postponed tracer tests:
[ 0.892893] Testing tracer function:
[ 0.901899] Callback from call_rcu_tasks_trace() invoked.
[ 0.983829] Callback from call_rcu_tasks_rude() invoked.
[ 1.072003] .. bad ring buffer .. corrupted trace buffer ..
[ 1.091944] Callback from call_rcu_tasks() invoked.
[ 1.097695] PASSED
[ 1.097701] Testing dynamic ftrace: .. filter failed count=0 ..FAILED!
[ 1.353474] ------------[ cut here ]------------
[ 1.353478] WARNING: CPU: 0 PID: 1 at kernel/trace/trace.c:1951 run_tracer_selftest+0x13c/0x1b0
Therefore enforce a minimum of 4096 bytes to make the selftest pass.
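A hedged sketch of the clamp described above; the parameter-parsing details
are illustrative rather than the exact upstream diff:
  #include <linux/init.h>
  #include <linux/kernel.h>

  static unsigned long trace_buf_size;

  static int __init set_buf_size(char *str)
  {
          unsigned long buf_size;

          if (kstrtoul(str, 0, &buf_size))
                  return 0;

          /* Enforce the 4096-byte floor so the selftests get a usable buffer. */
          trace_buf_size = max(4096UL, buf_size);
          return 1;
  }
  __setup("trace_buf_size=", set_buf_size);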
Link: https://lkml.kernel.org/r/20220214134456.1751749-1-svens@linux.ibm.com
Change-Id: I796709bd9ccaefb463da0aca49f53a475b5f57a0
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3203ce39ac0b2a57a84382ec184c7d4a0bede175 ]
The kernel parameter "tp_printk_stop_on_boot" starts with "tp_printk" which is
the same as another kernel parameter "tp_printk". If "tp_printk" setup is
called before the "tp_printk_stop_on_boot", it will override the latter
and keep it from being set.
This is similar to other kernel parameter issues, such as:
Commit 745a600cf1a6 ("um: console: Ignore console= option")
or init/do_mounts.c:45 (setup function of "ro" kernel param)
Fix it by checking for a "_" right after the "tp_printk" and, if one
exists, not processing the parameter.
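A minimal sketch of the check described above, assuming a __setup("tp_printk", ...)
handler that is handed the remainder of the matched parameter string:
  #include <linux/init.h>
  #include <linux/string.h>

  static int tracepoint_printk;

  static int __init set_tracepoint_printk(char *str)
  {
          /* "tp_printk_stop_on_boot" also matches the "tp_printk" prefix;
           * ignore it here so its own setup handler still runs.
           */
          if (*str == '_')
                  return 0;

          if (strcmp(str, "=1") == 0 || strcmp(str, "") == 0)
                  tracepoint_printk = 1;
          return 1;
  }
  __setup("tp_printk", set_tracepoint_printk);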
Link: https://lkml.kernel.org/r/20220208195421.969326-1-jsyoo5b@gmail.com
Change-Id: I997201f383889388231894b1881ead904e322b14
Signed-off-by: JaeSang Yoo <jsyoo5b@gmail.com>
[ Fixed up change log and added space after if condition ]
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f596da3efaf4130ff61cd029558845808df9bf99 ]
When the blk_classic option is enabled, non-blktrace events must be
filtered out. Otherwise, events of other types are output in the blktrace
classic format, which is unexpected.
The problem can be triggered in the following ways:
# echo 1 > /sys/kernel/debug/tracing/options/blk_classic
# echo 1 > /sys/kernel/debug/tracing/events/enable
# echo blk > /sys/kernel/debug/tracing/current_tracer
# cat /sys/kernel/debug/tracing/trace_pipe
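A hedged sketch of the filtering described above, using the symbol names from
kernel/trace/blktrace.c (not necessarily the exact upstream hunk):
  static enum print_line_t blk_tracer_print_line(struct trace_iterator *iter)
  {
          /* Only real blktrace events may be rendered in the classic format;
           * anything else falls back to the default output path.
           */
          if (iter->ent->type != TRACE_BLK ||
              !(blk_tracer_flags.val & TRACE_BLK_OPT_CLASSIC))
                  return TRACE_TYPE_UNHANDLED;

          return print_one_line(iter, true);
  }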
Fixes: c71a89615411 ("blktrace: add ftrace plugin")
Change-Id: Ia8eda6f8123933ae2b8219fc781d417be8711647
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20221122040410.85113-1-yangjihong1@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Ulrich Hecht <uli+cip@fpond.eu>
Similar to `dec_cfs_rq_hmp_stats` vs `walt_dec_cfs_cumulative_runnable_avg`,
we need to call `walt_dec_cumulative_runnable_avg` where `dec_rq_hmp_stats`
is called.
This corresponds to the `walt_inc_cfs_cumulative_runnable_avg` call in `enqueue_task_fair`.
Based on 4e29a6c5f98f9694d5ad01a4e7899aad157f8d49 ("sched: Add missing WALT code")
Fixes: c0fa7577022c4169e1aaaf1bd9e04f63d285beb2 ("sched/walt: Re-add code to allow WALT to function")
Change-Id: If2b291e1e509ba300d7f4b698afe73a72b273604
This is trivial to do:
- add flags argument to simple_rename()
- check that flags contains nothing other than RENAME_NOREPLACE
- assign simple_rename() to .rename2 instead of .rename
Filesystems converted:
hugetlbfs, ramfs, bpf.
Debugfs uses simple_rename() to implement debugfs_rename(), which is for
debugfs instances to rename files internally, not for userspace filesystem
access. For this case pass zero flags to simple_rename().
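A hedged sketch of the shape of this conversion; the inode_operations instance
below is hypothetical and the helper body is elided:
  #include <linux/fs.h>

  int simple_rename(struct inode *old_dir, struct dentry *old_dentry,
                    struct inode *new_dir, struct dentry *new_dentry,
                    unsigned int flags)
  {
          /* The generic helper only knows how to handle RENAME_NOREPLACE. */
          if (flags & ~RENAME_NOREPLACE)
                  return -EINVAL;

          /* ... unchanged simple_rename() body ... */
          return 0;
  }

  /* Callers move the helper from .rename to .rename2 (hypothetical example): */
  static const struct inode_operations example_dir_inode_operations = {
          .rename2        = simple_rename,
  };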
Change-Id: I1a46ece3b40b05c9f18fd13b98062d2a959b76a0
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
If the next_freq field of struct sugov_policy is set to UINT_MAX,
it shouldn't be used for updating the CPU frequency (this is a
special "invalid" value), but after commit b7eaf1aab9f8 (cpufreq:
schedutil: Avoid reducing frequency of busy CPUs prematurely) it
may be passed as the new frequency to sugov_update_commit() in
sugov_update_single().
Fix that by adding an extra check for the special UINT_MAX value
of next_freq to sugov_update_single().
Fixes: b7eaf1aab9f8 (cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely)
Reported-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.12+ <stable@vger.kernel.org> # 4.12+
Change-Id: Idf4ebe9e912f983598255167d2065e47562ab83d
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 97739501f207efe33145b918817f305b822987f8)
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
The schedutil driver sets sg_policy->next_freq to UINT_MAX on certain
occasions to discard the cached value of next freq:
- In sugov_start(), when the schedutil governor is started for a group
of CPUs.
- And whenever we need to force a freq update before rate-limit
duration, which happens when:
- there is an update in cpufreq policy limits.
- Or when the utilization of DL scheduling class increases.
In return, get_next_freq() doesn't return a cached next_freq value but
recalculates the next frequency instead.
But having a special meaning for a particular value of frequency makes the
code less readable and error prone. We recently fixed a bug where the
UINT_MAX value was considered as a valid frequency in
sugov_update_single().
All we need is a flag which can be used to discard the value of
sg_policy->next_freq, and we already have need_freq_update for that. Let's
reuse it instead of setting next_freq to UINT_MAX.
Change-Id: Ia37ef416d5ecac11fe0c6a2be7e21fdbca708a1a
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com> - backported to 4.4
Currently there is a chance of a schedutil cpufreq update request being
dropped if there is a pending update request. This pending request can
be delayed if there is a scheduling delay of the irq_work and the wake
up of the schedutil governor kthread.
A very bad scenario is when a schedutil request has just been made,
such as one to reduce the CPU frequency; then a newer request to increase
the CPU frequency (even an urgent sched deadline frequency increase request)
can be dropped, even though the rate limits suggest that it's OK to
process a request. This is because of the way the work_in_progress flag
is used.
This patch improves the situation by allowing new requests to happen
even though the old one is still being processed. Note that in this
approach, if an irq_work was already issued, we just update next_freq
and don't bother to queue another request so there's no extra work being
done to make this happen.
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Change-Id: I7b6e19971b2ce3bd8e8336a5a4bc1acb920493b5
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com> - backport to 4.4
A more energy efficient update of the IO wait boosting mechanism has
been introduced in:
commit a5a0809 ("cpufreq: schedutil: Make iowait boost more energy
efficient")
where the boost value is expected to be:
- doubled at each successive wakeup from IO
starting from the minimum frequency supported by a CPU
- reset when a CPU is not updated for more than one tick
by either disabling the IO wait boost or resetting its value to the
minimum frequency if this new update requires an IO boost.
This approach is supposed to "ignore" boosting for sporadic wakeups from
IO, while still getting the frequency boosted to the maximum to benefit
long sequences of wakeups from IO operations.
However, these assumptions are not always satisfied.
For example, when an IO boosted CPU enters idle for more than one tick
and then wakes up after an IO wait, since in sugov_set_iowait_boost() we
first check the IOWAIT flag, we keep doubling the iowait boost instead
of restarting from the minimum frequency value.
This misbehavior could happen mainly on non-shared frequency domains,
thus defeating the energy efficiency optimization, but it can also
happen on shared frequency domain systems.
Let's fix this issue in sugov_set_iowait_boost() by:
- first checking the IO wait boost reset conditions
to reset the boost value if necessary
- then applying the correct IO boost value
if required by the caller
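A simplified, hedged sketch of the re-ordered helper; field names
(iowait_boost, iowait_boost_max, last_update) follow the mainline schedutil
code and are assumptions for this backport, and details such as the pending
flag are omitted:
  static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
                                     unsigned int flags)
  {
          /* 1) Check the reset condition first: a CPU that has not been
           *    updated for more than one tick has its boost cleared.
           */
          if (sg_cpu->iowait_boost &&
              time - sg_cpu->last_update > TICK_NSEC)
                  sg_cpu->iowait_boost = 0;

          /* 2) Only then apply the boost requested by the caller. */
          if (flags & SCHED_CPUFREQ_IOWAIT) {
                  if (sg_cpu->iowait_boost)
                          sg_cpu->iowait_boost <<= 1;   /* successive IO wakeups */
                  else
                          sg_cpu->iowait_boost = sg_cpu->sg_policy->policy->min;

                  sg_cpu->iowait_boost = min(sg_cpu->iowait_boost,
                                             sg_cpu->iowait_boost_max);
          }
  }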
Fixes: a5a0809 (cpufreq: schedutil: Make iowait boost more energy
efficient)
Reported-by: Viresh Kumar <viresh.kumar@linaro.org>
Change-Id: I196b5c464cd43164807c12b2dbc2e5d814bf1d33
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com> - backport to 4.4
When lower capacity CPUs are load balancing and considering to pull
something from a higher capacity group, we should not pull tasks from a
cpu with only one task running as this is guaranteed to impede progress
for that task. If there is more than one task running, load balance in
the higher capacity group would have already made any possible moves to
resolve imbalance and we should make better use of system compute
capacity by moving a task if we still have more than one running.
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
[from https://lore.kernel.org/lkml/1530699470-29808-11-git-send-email-morten.rasmussen@arm.com/]
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: Ib86570abdd453a51be885b086c8d80be2773a6f2
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
Change-Id: Icd8f2cde6cac1b7bb07e54b8e6989c65eea4e4a5
joshuous: Adapted to work with CAF's "softirq: defer softirq processing
to ksoftirqd if CPU is busy with RT" commit.
ajaivasudeve: adapted for the commit "softirq: Don't defer all softirq during RT task"
We're finding audio glitches caused by audio-producing RT tasks
that are either interrupted to handle softirqs or that are
scheduled onto CPUs that are handling softirqs.
In a previous patch, we attempted to catch many cases of the
latter problem, but it's clear that we are still losing
significant numbers of races in some apps.
This patch attempts to address both problems:
1. It prohibits handling softirqs when interrupting
an RT task, by delaying the softirq to the ksoftirqd
thread.
2. It attempts to reduce the most common windows in which
we lose the race between scheduling an RT task on a remote
core and starting to handle softirqs on that core.
We still lose some races, but we lose significantly fewer.
(And we don't want to introduce any heavyweight forms
of synchronization on these paths.)
Bug: 64912585
Change-Id: Ida89a903be0f1965552dd0e84e67ef1d3158c7d8
Signed-off-by: John Dias <joaodias@google.com>
Signed-off-by: joshuous <joshuous@gmail.com>
Signed-off-by: ajaivasudeve <ajaivasudeve@gmail.com>
For effective interplay between RT and fair tasks. Enables sched_fifo
for UI and Render tasks. Critical for improving user experience.
bug: 24503801
bug: 30377696
Change-Id: I2a210c567c3f5c7edbdd7674244822f848e6d0cf
Signed-off-by: Srinath Sridharan <srinathsr@google.com>
(cherry picked from commit dfe0f16b6fd3a694173c5c62cf825643eef184a3)
With a shared policy in place, when one of the CPUs in the policy is
hotplugged out and then brought back online, sugov_stop and
sugov_start are called in order.
sugov_stop removes utilization hooks for each CPU in the policy and
does nothing else in the for_each_cpu loop. sugov_start on the other
hand iterates through the CPUs in the policy and re-initializes the
per-cpu structure _and_ adds the utilization hook. This implies that
the scheduler is allowed to invoke a CPU's utilization update hook
when the rest of the per-cpu structures have yet to be re-inited.
Apart from some strange values in tracepoints this doesn't cause a
problem, but if we do end up accessing a pointer from the per-cpu
sugov_cpu structure somewhere in the sugov_update_shared path,
we will likely see crashes since the memset for another CPU in the
policy is free to race with sugov_update_shared from the CPU that is
ready to go. So let's fix this now to first init all per-cpu
structures, and then add the per-cpu utilization update hooks all at
once.
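A hedged sketch of the two-pass start path described above (shaped after
mainline sugov_start(); the wrapper name is made up, while the hook and
helper names are the real schedutil/cpufreq ones):
  static void example_sugov_start_two_pass(struct cpufreq_policy *policy,
                                           struct sugov_policy *sg_policy)
  {
          unsigned int cpu;

          /* Pass 1: fully (re)initialize every per-CPU structure first. */
          for_each_cpu(cpu, policy->cpus) {
                  struct sugov_cpu *sg_cpu = &per_cpu(sugov_cpu, cpu);

                  memset(sg_cpu, 0, sizeof(*sg_cpu));
                  sg_cpu->sg_policy = sg_policy;
          }

          /* Pass 2: only now let the scheduler invoke the update hooks. */
          for_each_cpu(cpu, policy->cpus) {
                  struct sugov_cpu *sg_cpu = &per_cpu(sugov_cpu, cpu);

                  cpufreq_add_update_util_hook(cpu, &sg_cpu->update_util,
                                               policy_is_shared(policy) ?
                                                          sugov_update_shared :
                                                          sugov_update_single);
          }
  }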
Change-Id: I399e0e159b3db3ae3258843c9231f92312fe18ef
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
Currently when all the related CPUs from a policy go offline or the
governor is switched, cpufreq framework calls sugov_exit() that
frees the governor tunables. When any of the related CPUs comes back
online or governor is switched back to schedutil sugov_init() gets
called which allocates a fresh set of tunables that are set to
default values. This can cause the userspace settings to those
tunables to be lost across governor switches or when an entire
cluster is hotplugged out.
To prevent this, save the tunable values on governor exit. Restore
these values to the newly allocated tunables on governor init.
Change-Id: I671d4d0e1a4e63e948bfddb0005367df33c0c249
Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
[Caching and restoring different tunables.]
Signed-off-by: joshuous <joshuous@gmail.com>
Change-Id: I852ae2d23f10c9337e7057a47adcc46fe0623c6a
Signed-off-by: joshuous <joshuous@gmail.com>
Return immediately upon memory allocation failure.
Change-Id: I30947d55f0f4abd55c51e42912a0762df57cbc1d
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
* Render - added back missing header
When tasks come and go from a runqueue quickly, this can lead to boost
being applied and removed quickly which sometimes means we cannot raise
the CPU frequency again when we need to (due to the rate limit on
frequency updates). This has proved to be a particular issue for RT tasks
and alternative methods have been used in the past to work around it.
This is an attempt to solve the issue for all task classes and cpufreq
governors by introducing a generic mechanism in schedtune to retain
the max boost level from task enqueue for a minimum period - defined
here as 50ms. This timeout was determined experimentally and is not
configurable.
A sched_feat guards the application of this to tasks - in the default
configuration, boost holding is only applied to tasks which have the RT
policy. Change SCHEDTUNE_BOOST_HOLD_ALL to true to apply it to all
tasks regardless of class.
It works like so:
Every task enqueue (in an allowed class) stores a cpu-local timestamp.
If the task is not a member of an allowed class (all or RT depending
upon feature selection), the timestamp is not updated.
The boost group will stay active regardless of tasks present until
50ms beyond the last timestamp stored. We also store the timestamp
of the active boost group to avoid unnecessarily revisiting the boost
groups when checking the CPU boost level.
If the timestamp is more than 50ms in the past when we check boost then
we re-evaluate the boost groups for that CPU, taking into account the
timestamps associated with each group.
Idea based on rt-boost-retention patches from Joel.
Change-Id: I52cc2d2e82d1c5aa03550378c8836764f41630c1
Suggested-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: RenderBroken <zkennedy87@gmail.com>
The current implementation of the energy-aware wake-up path relies on
find_best_target() to select an ordered list of CPU candidates for task
placement. The first candidate of the list saving energy is then chosen,
disregarding all the others to avoid the overhead of an expensive
energy_diff.
With the recent refactoring of select_energy_cpu_idx(), the cost of
exploring multiple CPUs has been reduced, hence offering the opportunity
to select the most energy-efficient candidate at a lower cost. This commit
seizes this opportunity by allowing select_energy_cpu_idx()'s behaviour
to be changed so that it ignores the order of CPUs returned by find_best_target()
and picks the best candidate energy-wise.
As this functionality is still considered experimental, it is hidden
behind a sched_feature named FBT_STRICT_ORDER (like the equivalent
feature in Android 4.14) which defaults to true, hence keeping the
current behaviour by default.
Change-Id: I0cb833bfec1a4a053eddaff1652c0b6cad554f97
Suggested-by: Patrick Bellasi <patrick.bellasi@arm.com>
Suggested-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
We are using an incorrect index while initializing the nrg_delta for
the previous CPU in select_energy_cpu_idx(). This initialization
itself is not needed, as the nrg_delta for the previous CPU is already
initialized to 0 while preparing the energy_env struct.
Change-Id: Iee4e2c62f904050d2680a0a1df646d4d515c62cc
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Most walt variables in hot code paths are __read_mostly and grouped
together in the .data section, including these variables that show
up as frequently accessed in gem5 simulation: walt_ravg_window,
walt_disabled, walt_account_wait_time, and
walt_freq_account_wait_time.
The exception is walt_ktime_suspended, which is also accessed in many
of the same hot paths. It is also almost entirely accessed by reads.
Move it to __read_mostly in hopes of keeping it in the same cache line
as the other hot data.
Change-Id: I8c9e4ee84e5a0328b943752ee9ed47d4e006e7de
Signed-off-by: Todd Poynor <toddpoynor@google.com>
When the normal utilization value for all the CPUs in a policy is 0, the
current aggregation scheme of using a > check will result in the aggregated
utilization being 0 and the max value being 1.
This is not a problem for upstream code. However, since we also use other
statistics provided by WALT to update the aggregated utilization value
across all CPUs in a policy, we can end up with a non-zero aggregated
utilization while the max remains as 1.
Then when get_next_freq() tries to compute the frequency using:
max-frequency * 1.25 * (util / max)
it ends up with a frequency that is greater than max-frequency. So the
policy frequency jumps to max-frequency.
By changing the aggregation check to >=, we make sure that the max value is
updated with something reported by the scheduler for a CPU in that policy.
With the correct max, we can continue using the WALT specific statistics
without spurious jumps to max-frequency.
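A hedged sketch of the aggregation loop with the >= check, shaped after the
mainline sugov_next_freq_shared(); the exact code in this tree may differ:
  unsigned long util = 0, max = 1;
  unsigned int j;

  for_each_cpu(j, policy->cpus) {
          struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
          unsigned long j_util = j_sg_cpu->util;
          unsigned long j_max = j_sg_cpu->max;

          /* ">=" (previously ">"): even when every j_util is 0, max is
           * taken from a CPU in this policy rather than staying at the
           * initial placeholder of 1, so util/max cannot exceed 1.
           */
          if (j_util * max >= j_max * util) {
                  util = j_util;
                  max = j_max;
          }
  }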
Change-Id: I14996cd796191192ea112f949dc42450782105f7
Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
Change-Id: Ibbcb281c2f048e2af0ded0b1cbbbedcc49b29e45
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
schedtune sets 0 as the floor for calculating CPU max boost, so
negative schedtune.boost values do not affect CPU frequency
decisions. Remove this restriction to allow userspace to apply
negative boosts when appropriate.
Also change treatment of the root boost group to match the other
groups, so it only affects the CPU boost if it has runnable tasks on
the CPU.
Test: set all boosts negative; sched_boost_cpu trace events show
negative CPU margins.
Change-Id: I89f3470299aef96a18797c105f02ebc8f367b5e1
Signed-off-by: Connor O'Brien <connoro@google.com>
In commit ac087abe1358 ("Merge android-msm-8998-4.4-common into
android-msm-muskie-4.4"),
.allow_attach = subsys_cgroup_allow_attach,
was dropped in the merge.
This patch brings back allow_attach,
but with the marlin-3.18 behavior of allowing all cgroup changes
rather than the subsys_cgroup_allow_attach behavior of
requiring CAP_SYS_NICE.
Bug: 36592053
Change-Id: Iaa51597b49a955fd5709ca504e968ea19a9ca8f5
Signed-off-by: Andres Oportus <andresoportus@google.com>
Signed-off-by: Andrew Chant <achant@google.com>
When we are calculating what the impact of placing a task on a specific
cpu is, we should include the information that there might be a minimum
capacity imposed upon that cpu which could change the performance and/or
energy cost decisions.
When choosing an active target CPU, favour CPUs that won't end up
running at a high OPP due to a min capacity cap imposed by external
actors.
Change-Id: Ibc3302304345b63107f172b1fc3ffdabc19aa9d4
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
When we are calculating what the impact of placing a task on a specific
cpu is, we should include the information that there might be a minimum
capacity imposed upon that cpu which could change the performance and/or
energy cost decisions.
When choosing an idle backup CPU, favour CPUs that won't end up
running at a high OPP due to a min capacity cap imposed by external
actors.
Change-Id: I566623ffb3a7c5b61a23242dcce1cb4147ef8a4a
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Add the ability to track the minimum capacity forced onto a sched_group
by some external actor.
group_max_util returns the highest utilisation inside a sched_group
and is used when we are trying to calculate an energy cost estimate
for a specific scheduling scenario. Minimum capacities imposed from
elsewhere will influence this energy cost so we should reflect it
here.
Change-Id: Ibd537a6dbe6d67b11cc9e9be18f40fcb2c0f13de
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
We have the ability to track minimum capacity forced onto a CPU by
userspace or external actors. This is provided through a minimum
frequency scale factor exposed by arch_scale_min_freq_capacity.
The use of this information is enabled through the MIN_CAPACITY_CAPPING
feature. If not enabled, the minimum frequency scale factor will
remain 0 and it will not impact energy estimation or scheduling
decisions.
Change-Id: Ibc61f2bf4fddf186695b72b262e602a6e8bfde37
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
If the minimum capacity of a group is capped by userspace or internal
dependencies which are not otherwise visible to the scheduler, we need
a way to see these and integrate this information into the energy
calculations and task placement decisions we make.
Add arch_scale_min_freq_capacity to determine the lowest capacity which
a specific cpu can provide under the current set of known constraints.
Change-Id: Ied4a1dc0982bbf42cb5ea2f27201d4363db59705
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
The max frequency scaling factor is defined as:
max_freq_scale = policy_max_freq / cpuinfo_max_freq
To be able to scale the cpu capacity by this factor introduce a call to
the new arch scaling function arch_scale_max_freq_capacity() in
update_cpu_capacity() and provide a default implementation which returns
SCHED_CAPACITY_SCALE.
Another subsystem (e.g. cpufreq) can override this default implementation,
exactly as for frequency and cpu invariance. It has to be enabled by the
arch by defining arch_scale_max_freq_capacity to the actual
implementation.
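A hedged sketch of the default hook and of the scale factor described above;
the signature is an assumption mirroring arch_scale_freq_capacity():
  #ifndef arch_scale_max_freq_capacity
  static inline unsigned long
  arch_scale_max_freq_capacity(struct sched_domain *sd, int cpu)
  {
          /* Default: no policy cap, i.e. max_freq_scale == 1.0 */
          return SCHED_CAPACITY_SCALE;
  }
  #endif

  /* A cpufreq-side override would maintain, per CPU, something like:
   *   max_freq_scale = (policy->max << SCHED_CAPACITY_SHIFT)
   *                    / policy->cpuinfo.max_freq;
   */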
Change-Id: I266cd1f4c1c82f54b80063c36aa5f7662599dd28
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
The SG's energy is obtained by adding busy and idle contributions which
are computed by considering a proper fraction of the
SCHED_CAPACITY_SCALE defined by the SG's utilizations.
By scaling each and every contribution computed we risk accumulating
rounding errors which can result in a non-null energy_delta even in
cases when the same total accumulated utilization is differently
distributed among different CPUs.
To reduce rounding errors, this patch accumulates non-scaled busy/idle
energy contributions for each visited SG, and scales each of them just
once at the end.
Change-Id: Idf8367fee0ac11938c6436096f0c1b2d630210d2
Suggested-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
The energy_env data structure is used to cache values required by multiple
different functions involved in energy_diff computation. Some of these
functions require additional parameters which can be easily embedded
into the energy_env itself. The current implementation of energy_diff
hardcodes the usage of two different energy_env structures to estimate and
compare the energy consumption related to a "before" and an "after" CPU.
Moreover, it does this energy estimation by walking multiple times the
SDs/SGs data structures.
A better design can be envisioned by better using the energy_env structure
to support a more efficient and concurrent evaluation of multiple schedule
candidates. To this purpose, this patch provides a complete re-factoring
of the energy_diff implementation to:
1. use a single energy_env structure for the evaluation of all the
candidate CPUs
2. walk just one time the SDs/SGs, thus improving the overall performance
to compute the energy estimation for each CPU candidate specified by
the single used energy_env
3. simplify the code (at least if you look at the new version and not at
this re-factoring patch), thus providing cleaner code to maintain
and extend for additional features
This patch updates all the clients of energy_env to use only the data
provided by this structure and an index for one of its CPUs candidates.
Embedding everything within the energy env will make it simple to add
tracepoints for this new version, which can easily provide a holistic
view of how energy_diff evaluated the proposed CPU candidates.
The new proposed structure, for both "struct energy_env" and the functions
using it, is designed in such a way to easily accommodate additional
further extensions (e.g. SchedTune filtering) without requiring an
additional big re-factoring of these core functions.
Finally, the usage of a CPU candidates array, embedded into the
energy_env structure, also allows seamlessly extending the exploration of
multiple candidate CPUs, for example to support the comparison of a
spread-vs-packing strategy.
Change-Id: Ic04ffb6848b2c763cf1788767f22c6872eb12bee
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
[reworked find_new_capacity() and enforced the respect of
find_best_target() selection order]
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
[@0ctobot: Adapted for kernel.lnx.4.4.r35-rel]
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
The current definition of select_energy_cpu_brute is a bit confusing in
the definition of the value for the target_cpu to be returned as wakeup
CPU for the specified task.
This cleans up the code by ensuring that we always set target_cpu right
before returning it. The rcu_read_lock and the check on *sd != NULL are also
moved around to be exactly where they are required.
Change-Id: I70a4b558b3624a13395da1a87ddc0776fd1d6641
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
In preparation for the energy_diff refactoring, let's remove all the
SchedTune specific bits which are used to keep track of the capacity
variations required by the PESpace filtering.
This also removes the energy_normalization function and the wrapper of
energy_diff which is used to trigger the PESpace filtering via
schedtune_accept_deltas().
The remaining code is the "original" energy_diff function which
looks just at the energy variations to compare prev_cpu vs next_cpu.
Change-Id: I4fb1d1c5ba45a364e6db9ab8044969349aba0307
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
The format of the energy_diff tracepoint is going to be changed by the
following energy_diff refactoring patches. Let's remove it now to start from
a clean slate.
Change-Id: Id4f537ed60d90a7ddcca0a29a49944bfacb85c8c
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>