summaryrefslogtreecommitdiff
path: root/kernel (follow)
Commit message (Collapse)AuthorAge
* BACKPORT: exit_thread: accept a task parameter to be exitedJiri Slaby2018-02-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to call exit_thread from copy_process in a fail path. So make it accept task_struct as a parameter. [v2] * s390: exit_thread_runtime_instr doesn't make sense to be called for non-current tasks. * arm: fix the comment in vfp_thread_copy * change 'me' to 'tsk' for task_struct * now we can change only archs that actually have exit_thread [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: "David S. Miller" <davem@davemloft.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Liqin <liqin.linux@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Howells <dhowells@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Jonas Bonn <jonas@southpole.se> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Ley Foon Tan <lftan@altera.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mikael Starvik <starvik@axis.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@arm.linux.org.uk> Cc: Steven Miao <realmz6@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit e64646946ed32902fd597fa6e514b1da84642de3) Conflicts: arch/s390/kernel/process.c Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
* Merge 4.4.115 into android-4.4Greg Kroah-Hartman2018-02-03
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.115 loop: fix concurrent lo_open/lo_release bpf: fix branch pruning logic x86: bpf_jit: small optimization in emit_bpf_tail_call() bpf: fix bpf_tail_call() x64 JIT bpf: introduce BPF_JIT_ALWAYS_ON config bpf: arsh is not supported in 32 bit alu thus reject it bpf: avoid false sharing of map refcount with max_entries bpf: fix divides by zero bpf: fix 32-bit divide by zero bpf: reject stores into ctx via st and xadd x86/pti: Make unpoison of pgd for trusted boot work for real kaiser: fix intel_bts perf crashes ALSA: seq: Make ioctls race-free crypto: aesni - handle zero length dst buffer crypto: af_alg - whitelist mask and type power: reset: zx-reboot: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE gpio: iop: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE gpio: ath79: add missing MODULE_DESCRIPTION/LICENSE mtd: nand: denali_pci: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE igb: Free IRQs when device is hotplugged KVM: x86: emulator: Return to user-mode on L1 CPL=0 emulation failure KVM: x86: Don't re-execute instruction when not passing CR2 value KVM: X86: Fix operand/address-size during instruction decoding KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered KVM: x86: ioapic: Preserve read-only values in the redirection table ACPI / bus: Leave modalias empty for devices which are not present cpufreq: Add Loongson machine dependencies bcache: check return value of register_shrinker drm/amdgpu: Fix SDMA load/unload sequence on HWS disabled mode drm/amdkfd: Fix SDMA ring buffer size calculation drm/amdkfd: Fix SDMA oversubsription handling openvswitch: fix the incorrect flow action alloc size mac80211: fix the update of path metric for RANN frame btrfs: fix deadlock when writing out space cache KVM: VMX: Fix rflags cache during vCPU reset xen-netfront: remove warning when unloading module nfsd: CLOSE SHOULD return the invalid special stateid for NFSv4.x (x>0) nfsd: Ensure we check stateid validity in the seqid operation checks grace: replace BUG_ON by WARN_ONCE in exit_net hook nfsd: check for use of the closed special stateid lockd: fix "list_add double add" caused by legacy signal interface hwmon: (pmbus) Use 64bit math for DIRECT format values net: ethernet: xilinx: Mark XILINX_LL_TEMAC broken on 64-bit quota: Check for register_shrinker() failure. SUNRPC: Allow connect to return EHOSTUNREACH kmemleak: add scheduling point to kmemleak_scan() drm/omap: Fix error handling path in 'omap_dmm_probe()' xfs: ubsan fixes scsi: aacraid: Prevent crash in case of free interrupt during scsi EH path scsi: ufs: ufshcd: fix potential NULL pointer dereference in ufshcd_config_vreg media: usbtv: add a new usbid usb: gadget: don't dereference g until after it has been null checked staging: rtl8188eu: Fix incorrect response to SIOCGIWESSID usb: option: Add support for FS040U modem USB: serial: pl2303: new device id for Chilitag USB: cdc-acm: Do not log urb submission errors on disconnect CDC-ACM: apply quirk for card reader USB: serial: io_edgeport: fix possible sleep-in-atomic usbip: prevent bind loops on devices attached to vhci_hcd usbip: list: don't list devices attached to vhci_hcd USB: serial: simple: add Motorola Tetra driver usb: f_fs: Prevent gadget unbind if it is already unbound usb: uas: unconditionally bring back host after reset selinux: general protection fault in sock_has_perm serial: imx: Only wakeup via RTSDEN bit if the system has RTS/CTS spi: imx: do not access registers while clocks disabled Linux 4.4.115 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * bpf: reject stores into ctx via st and xaddDaniel Borkmann2018-02-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ upstream commit f37a8cb84cce18762e8f86a70bd6a49a66ab964c ] Alexei found that verifier does not reject stores into context via BPF_ST instead of BPF_STX. And while looking at it, we also should not allow XADD variant of BPF_STX. The context rewriter is only assuming either BPF_LDX_MEM- or BPF_STX_MEM-type operations, thus reject anything other than that so that assumptions in the rewriter properly hold. Add test cases as well for BPF selftests. Fixes: d691f9e8d440 ("bpf: allow programs to write to certain skb fields") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * bpf: fix 32-bit divide by zeroAlexei Starovoitov2018-02-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ upstream commit 68fda450a7df51cff9e5a4d4a4d9d0d5f2589153 ] due to some JITs doing if (src_reg == 0) check in 64-bit mode for div/mod operations mask upper 32-bits of src register before doing the check Fixes: 622582786c9e ("net: filter: x86: internal BPF JIT") Fixes: 7a12b5031c6b ("sparc64: Add eBPF JIT.") Reported-by: syzbot+48340bb518e88849e2e3@syzkaller.appspotmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * bpf: fix divides by zeroEric Dumazet2018-02-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ upstream commit c366287ebd698ef5e3de300d90cd62ee9ee7373e ] Divides by zero are not nice, lets avoid them if possible. Also do_div() seems not needed when dealing with 32bit operands, but this seems a minor detail. Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * bpf: arsh is not supported in 32 bit alu thus reject itDaniel Borkmann2018-02-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ upstream commit 7891a87efc7116590eaba57acc3c422487802c6f ] The following snippet was throwing an 'unknown opcode cc' warning in BPF interpreter: 0: (18) r0 = 0x0 2: (7b) *(u64 *)(r10 -16) = r0 3: (cc) (u32) r0 s>>= (u32) r0 4: (95) exit Although a number of JITs do support BPF_ALU | BPF_ARSH | BPF_{K,X} generation, not all of them do and interpreter does neither. We can leave existing ones and implement it later in bpf-next for the remaining ones, but reject this properly in verifier for the time being. Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)") Reported-by: syzbot+93c4904c5c70348a6890@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * bpf: introduce BPF_JIT_ALWAYS_ON configAlexei Starovoitov2018-02-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ upstream commit 290af86629b25ffd1ed6232c4e9107da031705cb ] The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715. A quote from goolge project zero blog: "At this point, it would normally be necessary to locate gadgets in the host kernel code that can be used to actually leak data by reading from an attacker-controlled location, shifting and masking the result appropriately and then using the result of that as offset to an attacker-controlled address for a load. But piecing gadgets together and figuring out which ones work in a speculation context seems annoying. So instead, we decided to use the eBPF interpreter, which is built into the host kernel - while there is no legitimate way to invoke it from inside a VM, the presence of the code in the host kernel's text section is sufficient to make it usable for the attack, just like with ordinary ROP gadgets." To make attacker job harder introduce BPF_JIT_ALWAYS_ON config option that removes interpreter from the kernel in favor of JIT-only mode. So far eBPF JIT is supported by: x64, arm64, arm32, sparc64, s390, powerpc64, mips64 The start of JITed program is randomized and code page is marked as read-only. In addition "constant blinding" can be turned on with net.core.bpf_jit_harden v2->v3: - move __bpf_prog_ret0 under ifdef (Daniel) v1->v2: - fix init order, test_bpf and cBPF (Daniel's feedback) - fix offloaded bpf (Jakub's feedback) - add 'return 0' dummy in case something can invoke prog->bpf_func - retarget bpf tree. For bpf-next the patch would need one extra hunk. It will be sent when the trees are merged back to net-next Considered doing: int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT; but it seems better to land the patch as-is and in bpf-next remove bpf_jit_enable global variable from all JITs, consolidate in one place and remove this jit_init() function. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * bpf: fix bpf_tail_call() x64 JITAlexei Starovoitov2018-02-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ upstream commit 90caccdd8cc0215705f18b92771b449b01e2474a ] - bpf prog_array just like all other types of bpf array accepts 32-bit index. Clarify that in the comment. - fix x64 JIT of bpf_tail_call which was incorrectly loading 8 instead of 4 bytes - tighten corresponding check in the interpreter to stay consistent The JIT bug can be triggered after introduction of BPF_F_NUMA_NODE flag in commit 96eabe7a40aa in 4.14. Before that the map_flags would stay zero and though JIT code is wrong it will check bounds correctly. Hence two fixes tags. All other JITs don't have this problem. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Fixes: 96eabe7a40aa ("bpf: Allow selecting numa node during map creation") Fixes: b52f00e6a715 ("x86: bpf_jit: implement bpf_tail_call() helper") Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * bpf: fix branch pruning logicAlexei Starovoitov2018-02-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit c131187db2d3fa2f8bf32fdf4e9a4ef805168467 ] when the verifier detects that register contains a runtime constant and it's compared with another constant it will prune exploration of the branch that is guaranteed not to be taken at runtime. This is all correct, but malicious program may be constructed in such a way that it always has a constant comparison and the other branch is never taken under any conditions. In this case such path through the program will not be explored by the verifier. It won't be taken at run-time either, but since all instructions are JITed the malicious program may cause JITs to complain about using reserved fields, etc. To fix the issue we have to track the instructions explored by the verifier and sanitize instructions that are dead at run time with NOPs. We cannot reject such dead code, since llvm generates it for valid C code, since it doesn't do as much data flow analysis as the verifier does. Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | ANDROID: sched/rt: schedtune: Add boost retention to RTJoel Fernandes2018-02-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Boosted RT tasks can be deboosted quickly, this makes boost usless for RT tasks and causes lots of glitching. Use timers to prevent de-boost too soon and wait for long enough such that next enqueue happens after a threshold. While this can be solved in the governor, there are following advantages: - The approach used is governor-independent - Reduces boost group lock contention for frequently sleepers/wakers Note: Fixed build breakage due to schedfreq dependency which isn't used for RT anymore. Bug: 30210506 Change-Id: I428a2695cac06cc3458cdde0dea72315e4e66c00 Signed-off-by: Joel Fernandes <joelaf@google.com>
* | Merge 4.4.114 into android-4.4Greg Kroah-Hartman2018-01-31
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.114 x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels usbip: prevent vhci_hcd driver from leaking a socket pointer address usbip: Fix implicit fallthrough warning usbip: Fix potential format overflow in userspace tools x86/microcode/intel: Fix BDW late-loading revision check x86/cpu/intel: Introduce macros for Intel family numbers x86/retpoline: Fill RSB on context switch for affected CPUs sched/deadline: Use the revised wakeup rule for suspending constrained dl tasks can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once PM / sleep: declare __tracedata symbols as char[] rather than char time: Avoid undefined behaviour in ktime_add_safe() timers: Plug locking race vs. timer migration Prevent timer value 0 for MWAITX drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled drivers: base: cacheinfo: fix boot error message when acpi is enabled PCI: layerscape: Add "fsl,ls2085a-pcie" compatible ID PCI: layerscape: Fix MSG TLP drop setting mmc: sdhci-of-esdhc: add/remove some quirks according to vendor version fs/select: add vmalloc fallback for select(2) mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack hwpoison, memcg: forcibly uncharge LRU pages cma: fix calculation of aligned offset mm, page_alloc: fix potential false positive in __zone_watermark_ok ipc: msg, make msgrcv work with LONG_MIN x86/ioapic: Fix incorrect pointers in ioapic_setup_resources() ACPI / processor: Avoid reserving IO regions too early ACPI / scan: Prefer devices without _HID/_CID for _ADR matching ACPICA: Namespace: fix operand cache leak netfilter: x_tables: speed up jump target validation netfilter: arp_tables: fix invoking 32bit "iptable -P INPUT ACCEPT" failed in 64bit kernel netfilter: nf_dup_ipv6: set again FLOWI_FLAG_KNOWN_NH at flowi6_flags netfilter: nf_ct_expect: remove the redundant slash when policy name is empty netfilter: nfnetlink_queue: reject verdict request from different portid netfilter: restart search if moved to other chain netfilter: nf_conntrack_sip: extend request line validation netfilter: use fwmark_reflect in nf_send_reset netfilter: fix IS_ERR_VALUE usage netfilter: nfnetlink_cthelper: Add missing permission checks netfilter: xt_osf: Add missing permission checks ext2: Don't clear SGID when inheriting ACLs reiserfs: fix race in prealloc discard reiserfs: don't preallocate blocks for extended attributes reiserfs: Don't clear SGID when inheriting ACLs fs/fcntl: f_setown, avoid undefined behaviour scsi: libiscsi: fix shifting of DID_REQUEUE host byte Revert "module: Add retpoline tag to VERMAGIC" Input: trackpoint - force 3 buttons if 0 button is reported usb: usbip: Fix possible deadlocks reported by lockdep usbip: fix stub_rx: get_pipe() to validate endpoint number usbip: fix stub_rx: harden CMD_SUBMIT path to handle malicious input usbip: prevent leaking socket pointer address in messages um: link vmlinux with -no-pie vsyscall: Fix permissions for emulate mode with KAISER/PTI eventpoll.h: add missing epoll event masks x86/microcode/intel: Extend BDW late-loading further with LLC size check hrtimer: Reset hrtimer cpu base proper on CPU hotplug dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL ipv6: fix udpv6 sendmsg crash caused by too small MTU ipv6: ip6_make_skb() needs to clear cork.base.dst lan78xx: Fix failure in USB Full Speed net: igmp: fix source address check for IGMPv3 reports tcp: __tcp_hdrlen() helper net: qdisc_pkt_len_init() should be more robust pppoe: take ->needed_headroom of lower device into account on xmit r8169: fix memory corruption on retrieval of hardware statistics. sctp: do not allow the v4 socket to bind a v4mapped v6 address sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf vmxnet3: repair memory leak net: Allow neigh contructor functions ability to modify the primary_key ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY flow_dissector: properly cap thoff field net: tcp: close sock if net namespace is exiting nfsd: auth: Fix gid sorting when rootsquash enabled Linux 4.4.114 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * hrtimer: Reset hrtimer cpu base proper on CPU hotplugThomas Gleixner2018-01-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit d5421ea43d30701e03cadc56a38854c36a8b4433 upstream. The hrtimer interrupt code contains a hang detection and mitigation mechanism, which prevents that a long delayed hrtimer interrupt causes a continous retriggering of interrupts which prevent the system from making progress. If a hang is detected then the timer hardware is programmed with a certain delay into the future and a flag is set in the hrtimer cpu base which prevents newly enqueued timers from reprogramming the timer hardware prior to the chosen delay. The subsequent hrtimer interrupt after the delay clears the flag and resumes normal operation. If such a hang happens in the last hrtimer interrupt before a CPU is unplugged then the hang_detected flag is set and stays that way when the CPU is plugged in again. At that point the timer hardware is not armed and it cannot be armed because the hang_detected flag is still active, so nothing clears that flag. As a consequence the CPU does not receive hrtimer interrupts and no timers expire on that CPU which results in RCU stalls and other malfunctions. Clear the flag along with some other less critical members of the hrtimer cpu base to ensure starting from a clean state when a CPU is plugged in. Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the root cause of that hard to reproduce heisenbug. Once understood it's trivial and certainly justifies a brown paperbag. Fixes: 41d2e4949377 ("hrtimer: Tune hrtimer_interrupt hang logic") Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Sewior <bigeasy@linutronix.de> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * timers: Plug locking race vs. timer migrationThomas Gleixner2018-01-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit b831275a3553c32091222ac619cfddd73a5553fb upstream. Linus noticed that lock_timer_base() lacks a READ_ONCE() for accessing the timer flags. As a consequence the compiler is allowed to reload the flags between the initial check for TIMER_MIGRATION and the following timer base computation and the spin lock of the base. While this has not been observed (yet), we need to make sure that it never happens. Fixes: 0eeda71bc30d ("timer: Replace timer base by a cpu index") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610241711220.4983@nanos Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Mike Galbraith <mgalbraith@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * time: Avoid undefined behaviour in ktime_add_safe()Vegard Nossum2018-01-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 979515c5645830465739254abc1b1648ada41518 upstream. I ran into this: ================================================================================ UBSAN: Undefined behaviour in kernel/time/hrtimer.c:310:16 signed integer overflow: 9223372036854775807 + 50000 cannot be represented in type 'long long int' CPU: 2 PID: 4798 Comm: trinity-c2 Not tainted 4.8.0-rc1+ #91 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014 0000000000000000 ffff88010ce6fb88 ffffffff82344740 0000000041b58ab3 ffffffff84f97a20 ffffffff82344694 ffff88010ce6fbb0 ffff88010ce6fb60 000000000000c350 ffff88010ce6f968 dffffc0000000000 ffffffff857bc320 Call Trace: [<ffffffff82344740>] dump_stack+0xac/0xfc [<ffffffff82344694>] ? _atomic_dec_and_lock+0xc4/0xc4 [<ffffffff8242df78>] ubsan_epilogue+0xd/0x8a [<ffffffff8242e6b4>] handle_overflow+0x202/0x23d [<ffffffff8242e4b2>] ? val_to_string.constprop.6+0x11e/0x11e [<ffffffff8236df71>] ? timerqueue_add+0x151/0x410 [<ffffffff81485c48>] ? hrtimer_start_range_ns+0x3b8/0x1380 [<ffffffff81795631>] ? memset+0x31/0x40 [<ffffffff8242e6fd>] __ubsan_handle_add_overflow+0xe/0x10 [<ffffffff81488ac9>] hrtimer_nanosleep+0x5d9/0x790 [<ffffffff814884f0>] ? hrtimer_init_sleeper+0x80/0x80 [<ffffffff813a9ffb>] ? __might_sleep+0x5b/0x260 [<ffffffff8148be10>] common_nsleep+0x20/0x30 [<ffffffff814906c7>] SyS_clock_nanosleep+0x197/0x210 [<ffffffff81490530>] ? SyS_clock_getres+0x150/0x150 [<ffffffff823c7113>] ? __this_cpu_preempt_check+0x13/0x20 [<ffffffff8162ef60>] ? __context_tracking_exit.part.3+0x30/0x1b0 [<ffffffff81490530>] ? SyS_clock_getres+0x150/0x150 [<ffffffff81007bd3>] do_syscall_64+0x1b3/0x4b0 [<ffffffff845f85aa>] entry_SYSCALL64_slow_path+0x25/0x25 ================================================================================ Add a new ktime_add_unsafe() helper which doesn't check for overflow, but doesn't throw a UBSAN warning when it does overflow either. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * sched/deadline: Use the revised wakeup rule for suspending constrained dl tasksDaniel Bristot de Oliveira2018-01-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3effcb4247e74a51f5d8b775a1ee4abf87cc089a upstream. We have been facing some problems with self-suspending constrained deadline tasks. The main reason is that the original CBS was not designed for such sort of tasks. One problem reported by Xunlei Pang takes place when a task suspends, and then is awakened before the deadline, but so close to the deadline that its remaining runtime can cause the task to have an absolute density higher than allowed. In such situation, the original CBS assumes that the task is facing an early activation, and so it replenishes the task and set another deadline, one deadline in the future. This rule works fine for implicit deadline tasks. Moreover, it allows the system to adapt the period of a task in which the external event source suffered from a clock drift. However, this opens the window for bandwidth leakage for constrained deadline tasks. For instance, a task with the following parameters: runtime = 5 ms deadline = 7 ms [density] = 5 / 7 = 0.71 period = 1000 ms If the task runs for 1 ms, and then suspends for another 1ms, it will be awakened with the following parameters: remaining runtime = 4 laxity = 5 presenting a absolute density of 4 / 5 = 0.80. In this case, the original CBS would assume the task had an early wakeup. Then, CBS will reset the runtime, and the absolute deadline will be postponed by one relative deadline, allowing the task to run. The problem is that, if the task runs this pattern forever, it will keep receiving bandwidth, being able to run 1ms every 2ms. Following this behavior, the task would be able to run 500 ms in 1 sec. Thus running more than the 5 ms / 1 sec the admission control allowed it to run. Trying to address the self-suspending case, Luca Abeni, Giuseppe Lipari, and Juri Lelli [1] revisited the CBS in order to deal with self-suspending tasks. In the new approach, rather than replenishing/postponing the absolute deadline, the revised wakeup rule adjusts the remaining runtime, reducing it to fit into the allowed density. A revised version of the idea is: At a given time t, the maximum absolute density of a task cannot be higher than its relative density, that is: runtime / (deadline - t) <= dl_runtime / dl_deadline Knowing the laxity of a task (deadline - t), it is possible to move it to the other side of the equality, thus enabling to define max remaining runtime a task can use within the absolute deadline, without over-running the allowed density: runtime = (dl_runtime / dl_deadline) * (deadline - t) For instance, in our previous example, the task could still run: runtime = ( 5 / 7 ) * 5 runtime = 3.57 ms Without causing damage for other deadline tasks. It is note worthy that the laxity cannot be negative because that would cause a negative runtime. Thus, this patch depends on the patch: df8eac8cafce ("sched/deadline: Throttle a constrained deadline task activated after the deadline") Which throttles a constrained deadline task activated after the deadline. Finally, it is also possible to use the revised wakeup rule for all other tasks, but that would require some more discussions about pros and cons. [The main difference from the original commit is that the BW_SHIFT define was not present yet. As BW_SHIFT was introduced in a new feature, I just used the value (20), likewise we used to use before the #define. Other changes were required because of comments. - bistrot] Reported-by: Xunlei Pang <xpang@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> [peterz: replaced dl_is_constrained with dl_is_implicit] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luca Abeni <luca.abeni@santannapisa.it> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Romulo Silva de Oliveira <romulo.deoliveira@ufsc.br> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it> Link: http://lkml.kernel.org/r/5c800ab3a74a168a84ee5f3f84d12a02e11383be.1495803804.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | ANDROID: sched: EAS: check energy_aware() before calling ↵Ke Wang2018-01-29
| | | | | | | | | | | | | | | | | | | | | | select_energy_cpu_brute() in up-migrate path In up-migrate path, select_energy_cpu_brute() was called directly without checking energy_aware(). This will make select_energy_cpu_brute() always worked even disabling energy_aware() on the asymmetric cpu capacity system. Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>
* | Merge 4.4.113 into android-4.4Greg Kroah-Hartman2018-01-23
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.113 gcov: disable for COMPILE_TEST x86/cpu/AMD: Make LFENCE a serializing instruction x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier x86/asm: Use register variable to get stack pointer value x86/kbuild: enable modversions for symbols exported from asm x86/asm: Make asm/alternative.h safe from assembly EXPORT_SYMBOL() for asm kconfig.h: use __is_defined() to check if MODULE is defined x86/retpoline: Add initial retpoline support x86/spectre: Add boot time option to select Spectre v2 mitigation x86/retpoline/crypto: Convert crypto assembler indirect jumps x86/retpoline/entry: Convert entry assembler indirect jumps x86/retpoline/ftrace: Convert ftrace assembler indirect jumps x86/retpoline/hyperv: Convert assembler indirect jumps x86/retpoline/xen: Convert Xen hypercall indirect jumps x86/retpoline/checksum32: Convert assembler indirect jumps x86/retpoline/irq32: Convert assembler indirect jumps x86/retpoline: Fill return stack buffer on vmexit x86/retpoline: Remove compile time warning scsi: sg: disable SET_FORCE_LOW_DMA futex: Prevent overflow by strengthen input validation ALSA: pcm: Remove yet superfluous WARN_ON() ALSA: hda - Apply headphone noise quirk for another Dell XPS 13 variant ALSA: hda - Apply the existing quirk to iMac 14,1 af_key: fix buffer overread in verify_address_len() af_key: fix buffer overread in parse_exthdrs() scsi: hpsa: fix volume offline state sched/deadline: Zero out positive runtime after throttling constrained tasks x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros module: Add retpoline tag to VERMAGIC pipe: avoid round_pipe_size() nr_pages overflow on 32-bit x86/apic/vector: Fix off by one in error path Input: 88pm860x-ts - fix child-node lookup Input: twl6040-vibra - fix DT node memory management Input: twl6040-vibra - fix child-node lookup Input: twl4030-vibra - fix sibling-node lookup tracing: Fix converting enum's from the map in trace_event_eval_update() phy: work around 'phys' references to usb-nop-xceiv devices ARM: dts: kirkwood: fix pin-muxing of MPP7 on OpenBlocks A7 can: peak: fix potential bug in packet fragmentation libata: apply MAX_SEC_1024 to all LITEON EP1 series devices dm btree: fix serious bug in btree_split_beneath() dm thin metadata: THIN_MAX_CONCURRENT_LOCKS should be 6 arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls x86/cpu, x86/pti: Do not enable PTI on AMD processors kbuild: modversions for EXPORT_SYMBOL() for asm x86/mce: Make machine check speculation protected retpoline: Introduce start/end markers of indirect thunk kprobes/x86: Blacklist indirect thunk functions for kprobes kprobes/x86: Disable optimizing on the function jumps to indirect thunk x86/pti: Document fix wrong index x86/retpoline: Optimize inline assembler for vmexit_fill_RSB MIPS: AR7: ensure the port type's FCR value is used Linux 4.4.113 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * tracing: Fix converting enum's from the map in trace_event_eval_update()Steven Rostedt (VMware)2018-01-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 1ebe1eaf2f02784921759992ae1fde1a9bec8fd0 upstream. Since enums do not get converted by the TRACE_EVENT macro into their values, the event format displaces the enum name and not the value. This breaks tools like perf and trace-cmd that need to interpret the raw binary data. To solve this, an enum map was created to convert these enums into their actual numbers on boot up. This is done by TRACE_EVENTS() adding a TRACE_DEFINE_ENUM() macro. Some enums were not being converted. This was caused by an optization that had a bug in it. All calls get checked against this enum map to see if it should be converted or not, and it compares the call's system to the system that the enum map was created under. If they match, then they call is processed. To cut down on the number of iterations needed to find the maps with a matching system, since calls and maps are grouped by system, when a match is made, the index into the map array is saved, so that the next call, if it belongs to the same system as the previous call, could start right at that array index and not have to scan all the previous arrays. The problem was, the saved index was used as the variable to know if this is a call in a new system or not. If the index was zero, it was assumed that the call is in a new system and would keep incrementing the saved index until it found a matching system. The issue arises when the first matching system was at index zero. The next map, if it belonged to the same system, would then think it was the first match and increment the index to one. If the next call belong to the same system, it would begin its search of the maps off by one, and miss the first enum that should be converted. This left a single enum not converted properly. Also add a comment to describe exactly what that index was for. It took me a bit too long to figure out what I was thinking when debugging this issue. Link: http://lkml.kernel.org/r/717BE572-2070-4C1E-9902-9F2E0FEDA4F8@oracle.com Fixes: 0c564a538aa93 ("tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values") Reported-by: Chuck Lever <chuck.lever@oracle.com> Teste-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * sched/deadline: Zero out positive runtime after throttling constrained tasksXunlei Pang2018-01-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit ae83b56a56f8d9643dedbee86b457fa1c5d42f59 upstream. When a contrained task is throttled by dl_check_constrained_dl(), it may carry the remaining positive runtime, as a result when dl_task_timer() fires and calls replenish_dl_entity(), it will not be replenished correctly due to the positive dl_se->runtime. This patch assigns its runtime to 0 if positive after throttling. Signed-off-by: Xunlei Pang <xlpang@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luca Abeni <luca.abeni@santannapisa.it> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: df8eac8cafce ("sched/deadline: Throttle a constrained deadline task activated after the deadline) Link: http://lkml.kernel.org/r/1494421417-27550-1-git-send-email-xlpang@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * futex: Prevent overflow by strengthen input validationLi Jinyue2018-01-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit fbe0e839d1e22d88810f3ee3e2f1479be4c0aa4a upstream. UBSAN reports signed integer overflow in kernel/futex.c: UBSAN: Undefined behaviour in kernel/futex.c:2041:18 signed integer overflow: 0 - -2147483648 cannot be represented in type 'int' Add a sanity check to catch negative values of nr_wake and nr_requeue. Signed-off-by: Li Jinyue <lijinyue@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: peterz@infradead.org Cc: dvhart@infradead.org Link: https://lkml.kernel.org/r/1513242294-31786-1-git-send-email-lijinyue@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * gcov: disable for COMPILE_TESTArnd Bergmann2018-01-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit cc622420798c4bcf093785d872525087a7798db9 upstream. Enabling gcov is counterproductive to compile testing: it significantly increases the kernel image size, compile time, and it produces lots of false positive "may be used uninitialized" warnings as the result of missed optimizations. This is in line with how UBSAN_SANITIZE_ALL and PROFILE_ALL_BRANCHES work, both of which have similar problems. With an ARM allmodconfig kernel, I see the build time drop from 283 minutes CPU time to 225 minutes, and the vmlinux size drops from 43MB to 26MB. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Michal Marek <mmarek@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | sched: EAS: Initialize push_task as NULL to avoid direct reference on ↵Ke Wang2018-01-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | out_unlock path After applying up-migrate patches(dc626b2 sched: avoid pushing tasks to an offline CPU, 2da014c sched: Extend active balance to accept 'push_task' argument), leaving EAS disabled and doing a stability test which includes some random cpu plugin/plugout. There are two types crashes happened as below: TYPE 1: [ 2072.653091] c1 ------------[ cut here ]------------ [ 2072.653133] c1 WARNING: CPU: 1 PID: 13 at kernel/fork.c:252 __put_task_struct+0x30/0x124() [ 2072.653173] c1 CPU: 1 PID: 13 Comm: migration/1 Tainted: G W O 4.4.83-01066-g04c5403-dirty #17 [ 2072.653215] c1 [<c011141c>] (unwind_backtrace) from [<c010ced8>] (show_stack+0x20/0x24) [ 2072.653235] c1 [<c010ced8>] (show_stack) from [<c043d7f8>] (dump_stack+0xa8/0xe0) [ 2072.653255] c1 [<c043d7f8>] (dump_stack) from [<c012be04>] (warn_slowpath_common+0x98/0xc4) [ 2072.653273] c1 [<c012be04>] (warn_slowpath_common) from [<c012beec>] (warn_slowpath_null+0x2c/0x34) [ 2072.653291] c1 [<c012beec>] (warn_slowpath_null) from [<c01293b4>] (__put_task_struct+0x30/0x124) [ 2072.653310] c1 [<c01293b4>] (__put_task_struct) from [<c0166964>] (active_load_balance_cpu_stop+0x22c/0x314) [ 2072.653331] c1 [<c0166964>] (active_load_balance_cpu_stop) from [<c01c2604>] (cpu_stopper_thread+0x90/0x144) [ 2072.653352] c1 [<c01c2604>] (cpu_stopper_thread) from [<c014d80c>] (smpboot_thread_fn+0x258/0x270) [ 2072.653370] c1 [<c014d80c>] (smpboot_thread_fn) from [<c0149ee4>] (kthread+0x118/0x12c) [ 2072.653388] c1 [<c0149ee4>] (kthread) from [<c0108310>] (ret_from_fork+0x14/0x24) [ 2072.653400] c1 ---[ end trace 49c3d154890763fc ]--- [ 2072.653418] c1 Unable to handle kernel NULL pointer dereference at virtual address 00000000 ... [ 2072.832804] c1 [<c01ba00c>] (put_css_set) from [<c01be870>] (cgroup_free+0x6c/0x78) [ 2072.832823] c1 [<c01be870>] (cgroup_free) from [<c01293f8>] (__put_task_struct+0x74/0x124) [ 2072.832844] c1 [<c01293f8>] (__put_task_struct) from [<c0166964>] (active_load_balance_cpu_stop+0x22c/0x314) [ 2072.832860] c1 [<c0166964>] (active_load_balance_cpu_stop) from [<c01c2604>] (cpu_stopper_thread+0x90/0x144) [ 2072.832879] c1 [<c01c2604>] (cpu_stopper_thread) from [<c014d80c>] (smpboot_thread_fn+0x258/0x270) [ 2072.832896] c1 [<c014d80c>] (smpboot_thread_fn) from [<c0149ee4>] (kthread+0x118/0x12c) [ 2072.832914] c1 [<c0149ee4>] (kthread) from [<c0108310>] (ret_from_fork+0x14/0x24) [ 2072.832930] c1 Code: f57ff05b f590f000 e3e02000 e3a03001 (e1941f9f) [ 2072.839208] c1 ---[ end trace 49c3d154890763fd ]--- TYPE 2: [ 214.742695] c1 ------------[ cut here ]------------ [ 214.742709] c1 kernel BUG at kernel/smpboot.c:136! [ 214.742718] c1 Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM [ 214.748785] c1 CPU: 1 PID: 18 Comm: migration/2 Tainted: G W O 4.4.83-00912-g370f62c #1 [ 214.748805] c1 task: ef2d9680 task.stack: ee862000 [ 214.748821] c1 PC is at smpboot_thread_fn+0x168/0x270 [ 214.748832] c1 LR is at smpboot_thread_fn+0xe4/0x270 ... [ 214.821339] c1 [<c014d71c>] (smpboot_thread_fn) from [<c0149ee4>] (kthread+0x118/0x12c) [ 214.821363] c1 [<c0149ee4>] (kthread) from [<c0108310>] (ret_from_fork+0x14/0x24) [ 214.821378] c1 Code: e5950000 e5943010 e1500003 0a000000 (e7f001f2) [ 214.827676] c1 ---[ end trace da87539f59bab8de ]--- For the first type crash, the root cause is the push_task pointer will be used without initialization on the out_lock path. And maybe cpu hotplug in/out make this happen more easily. For the second type crash, it hits 'BUG_ON(td->cpu != smp_processor_id());' in smpboot_thread_fn(). It seems that OOPS was caused by migration/2 which actually running on cpu1. And I haven't found what actually happened. However, after this fix, the second type crash seems gone too. Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>
* | Merge 4.4.112 into android-4.4Greg Kroah-Hartman2018-01-17
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.112 dm bufio: fix shrinker scans when (nr_to_scan < retain_target) KVM: Fix stack-out-of-bounds read in write_mmio can: gs_usb: fix return value of the "set_bittiming" callback IB/srpt: Disable RDMA access by the initiator MIPS: Validate PR_SET_FP_MODE prctl(2) requests against the ABI of the task MIPS: Factor out NT_PRFPREG regset access helpers MIPS: Guard against any partial write attempt with PTRACE_SETREGSET MIPS: Consistently handle buffer counter with PTRACE_SETREGSET MIPS: Fix an FCSR access API regression with NT_PRFPREG and MSA MIPS: Also verify sizeof `elf_fpreg_t' with PTRACE_SETREGSET MIPS: Disallow outsized PTRACE_SETREGSET NT_PRFPREG regset accesses net/mac80211/debugfs.c: prevent build failure with CONFIG_UBSAN=y kvm: vmx: Scrub hardware GPRs at VM-exit x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n x86/acpi: Handle SCI interrupts above legacy space gracefully iommu/arm-smmu-v3: Don't free page table ops twice ALSA: pcm: Remove incorrect snd_BUG_ON() usages ALSA: pcm: Add missing error checks in OSS emulation plugin builder ALSA: pcm: Abort properly at pending signal in OSS read/write loops ALSA: pcm: Allow aborting mutex lock at OSS read/write loops ALSA: aloop: Release cable upon open error path ALSA: aloop: Fix inconsistent format due to incomplete rule ALSA: aloop: Fix racy hw constraints adjustment x86/acpi: Reduce code duplication in mp_override_legacy_irq() mm/compaction: fix invalid free_pfn and compact_cached_free_pfn mm/compaction: pass only pageblock aligned range to pageblock_pfn_to_page mm/page-writeback: fix dirty_ratelimit calculation mm/zswap: use workqueue to destroy pool zswap: don't param_set_charp while holding spinlock locks: don't check for race with close when setting OFD lock futex: Replace barrier() in unqueue_me() with READ_ONCE() locking/mutex: Allow next waiter lockless wakeup usbvision fix overflow of interfaces array usb: musb: ux500: Fix NULL pointer dereference at system PM r8152: fix the wake event r8152: use test_and_clear_bit r8152: adjust ALDPS function lan78xx: use skb_cow_head() to deal with cloned skbs sr9700: use skb_cow_head() to deal with cloned skbs smsc75xx: use skb_cow_head() to deal with cloned skbs cx82310_eth: use skb_cow_head() to deal with cloned skbs x86/mm/pat, /dev/mem: Remove superfluous error message hwrng: core - sleep interruptible in read sysrq: Fix warning in sysrq generated crash. xhci: Fix ring leak in failure path of xhci_alloc_virt_device() Revert "userfaultfd: selftest: vm: allow to build in vm/ directory" x86/pti/efi: broken conversion from efi to kernel page table 8021q: fix a memory leak for VLAN 0 device ip6_tunnel: disable dst caching if tunnel is dual-stack net: core: fix module type in sock_diag_bind RDS: Heap OOB write in rds_message_alloc_sgs() RDS: null pointer dereference in rds_atomic_free_op sh_eth: fix TSU resource handling sh_eth: fix SH7757 GEther initialization net: stmmac: enable EEE in MII, GMII or RGMII only ipv6: fix possible mem leaks in ipv6_make_skb() crypto: algapi - fix NULL dereference in crypto_remove_spawns() rbd: set max_segments to USHRT_MAX x86/microcode/intel: Extend BDW late-loading with a revision check KVM: x86: Add memory barrier on vmcs field lookup drm/vmwgfx: Potential off by one in vmw_view_add() kaiser: Set _PAGE_NX only if supported bpf: add bpf_patch_insn_single helper bpf: don't (ab)use instructions to store state bpf: move fixup_bpf_calls() function bpf: refactor fixup_bpf_calls() bpf: adjust insn_aux_data when patching insns bpf: prevent out-of-bounds speculation bpf, array: fix overflow in max_entries and undefined behavior in index_mask iscsi-target: Make TASK_REASSIGN use proper se_cmd->cmd_kref target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK USB: serial: cp210x: add IDs for LifeScan OneTouch Verio IQ USB: serial: cp210x: add new device ID ELV ALC 8xxx usb: misc: usb3503: make sure reset is low for at least 100us USB: fix usbmon BUG trigger usbip: remove kernel addresses from usb device and urb debug msgs staging: android: ashmem: fix a race condition in ASHMEM_SET_SIZE ioctl Bluetooth: Prevent stack info leak from the EFS element. uas: ignore UAS for Norelsys NS1068(X) chips e1000e: Fix e1000_check_for_copper_link_ich8lan return value. x86/Documentation: Add PTI description x86/cpu: Factor out application of forced CPU caps x86/cpufeatures: Make CPU bugs sticky x86/cpufeatures: Add X86_BUG_CPU_INSECURE x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN x86/cpufeatures: Add X86_BUG_SPECTRE_V[12] x86/cpu: Merge bugs.c and bugs_64.c sysfs/cpu: Add vulnerability folder x86/cpu: Implement CPU vulnerabilites sysfs functions sysfs/cpu: Fix typos in vulnerability documentation x86/alternatives: Fix optimize_nops() checking x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm selftests/x86: Add test_vsyscall Linux 4.4.112 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * bpf, array: fix overflow in max_entries and undefined behavior in index_maskDaniel Borkmann2018-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit bbeb6e4323dad9b5e0ee9f60c223dd532e2403b1 upstream. syzkaller tried to alloc a map with 0xfffffffd entries out of a userns, and thus unprivileged. With the recently added logic in b2157399cc98 ("bpf: prevent out-of-bounds speculation") we round this up to the next power of two value for max_entries for unprivileged such that we can apply proper masking into potentially zeroed out map slots. However, this will generate an index_mask of 0xffffffff, and therefore a + 1 will let this overflow into new max_entries of 0. This will pass allocation, etc, and later on map access we still enforce on the original attr->max_entries value which was 0xfffffffd, therefore triggering GPF all over the place. Thus bail out on overflow in such case. Moreover, on 32 bit archs roundup_pow_of_two() can also not be used, since fls_long(max_entries - 1) can result in 32 and 1UL << 32 in 32 bit space is undefined. Therefore, do this by hand in a 64 bit variable. This fixes all the issues triggered by syzkaller's reproducers. Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation") Reported-by: syzbot+b0efb8e572d01bce1ae0@syzkaller.appspotmail.com Reported-by: syzbot+6c15e9744f75f2364773@syzkaller.appspotmail.com Reported-by: syzbot+d2f5524fb46fd3b312ee@syzkaller.appspotmail.com Reported-by: syzbot+61d23c95395cc90dbc2b@syzkaller.appspotmail.com Reported-by: syzbot+0d363c942452cca68c01@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * bpf: prevent out-of-bounds speculationAlexei Starovoitov2018-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit b2157399cc9898260d6031c5bfe45fe137c1fbe7 upstream. Under speculation, CPUs may mis-predict branches in bounds checks. Thus, memory accesses under a bounds check may be speculated even if the bounds check fails, providing a primitive for building a side channel. To avoid leaking kernel data round up array-based maps and mask the index after bounds check, so speculated load with out of bounds index will load either valid value from the array or zero from the padded area. Unconditionally mask index for all array types even when max_entries are not rounded to power of 2 for root user. When map is created by unpriv user generate a sequence of bpf insns that includes AND operation to make sure that JITed code includes the same 'index & index_mask' operation. If prog_array map is created by unpriv user replace bpf_tail_call(ctx, map, index); with if (index >= max_entries) { index &= map->index_mask; bpf_tail_call(ctx, map, index); } (along with roundup to power 2) to prevent out-of-bounds speculation. There is secondary redundant 'if (index >= max_entries)' in the interpreter and in all JITs, but they can be optimized later if necessary. Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array) cannot be used by unpriv, so no changes there. That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on all architectures with and without JIT. v2->v3: Daniel noticed that attack potentially can be crafted via syscall commands without loading the program, so add masking to those paths as well. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * bpf: adjust insn_aux_data when patching insnsAlexei Starovoitov2018-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 8041902dae5299c1f194ba42d14383f734631009 upstream. convert_ctx_accesses() replaces single bpf instruction with a set of instructions. Adjust corresponding insn_aux_data while patching. It's needed to make sure subsequent 'for(all insn)' loops have matching insn and insn_aux_data. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * bpf: refactor fixup_bpf_calls()Alexei Starovoitov2018-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 79741b3bdec01a8628368fbcfccc7d189ed606cb upstream. reduce indent and make it iterate over instructions similar to convert_ctx_accesses(). Also convert hard BUG_ON into soft verifier error. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * bpf: move fixup_bpf_calls() functionAlexei Starovoitov2018-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e245c5c6a5656e4d61aa7bb08e9694fd6e5b2b9d upstream. no functional change. move fixup_bpf_calls() to verifier.c it's being refactored in the next patch Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * bpf: don't (ab)use instructions to store stateJakub Kicinski2018-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3df126f35f88dc76eea33769f85a3c3bb8ce6c6b upstream. Storing state in reserved fields of instructions makes it impossible to run verifier on programs already marked as read-only. Allocate and use an array of per-instruction state instead. While touching the error path rename and move existing jump target. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * bpf: add bpf_patch_insn_single helperDaniel Borkmann2018-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit c237ee5eb33bf19fe0591c04ff8db19da7323a83 upstream. Move the functionality to patch instructions out of the verifier code and into the core as the new bpf_patch_insn_single() helper will be needed later on for blinding as well. No changes in functionality. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * locking/mutex: Allow next waiter lockless wakeupDavidlohr Bueso2018-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 1329ce6fbbe4536592dfcfc8d64d61bfeb598fe6 upstream. Make use of wake-queues and enable the wakeup to occur after releasing the wait_lock. This is similar to what we do with rtmutex top waiter, slightly shortening the critical region and allow other waiters to acquire the wait_lock sooner. In low contention cases it can also help the recently woken waiter to find the wait_lock available (fastpath) when it continues execution. Reviewed-by: Waiman Long <Waiman.Long@hpe.com> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ding Tianhong <dingtianhong@huawei.com> Cc: Jason Low <jason.low2@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Waiman Long <waiman.long@hpe.com> Cc: Will Deacon <Will.Deacon@arm.com> Link: http://lkml.kernel.org/r/20160125022343.GA3322@linux-uzut.site Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * futex: Replace barrier() in unqueue_me() with READ_ONCE()Jianyu Zhan2018-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 29b75eb2d56a714190a93d7be4525e617591077a upstream. Commit e91467ecd1ef ("bug in futex unqueue_me") introduced a barrier() in unqueue_me() to prevent the compiler from rereading the lock pointer which might change after a check for NULL. Replace the barrier() with a READ_ONCE() for the following reasons: 1) READ_ONCE() is a weaker form of barrier() that affects only the specific load operation, while barrier() is a general compiler level memory barrier. READ_ONCE() was not available at the time when the barrier was added. 2) Aside of that READ_ONCE() is descriptive and self explainatory while a barrier without comment is not clear to the casual reader. No functional change. [ tglx: Massaged changelog ] Signed-off-by: Jianyu Zhan <nasa4836@gmail.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Darren Hart <dvhart@linux.intel.com> Cc: dave@stgolabs.net Cc: peterz@infradead.org Cc: linux@rasmusvillemoes.dk Cc: akpm@linux-foundation.org Cc: fengguang.wu@intel.com Cc: bigeasy@linutronix.de Link: http://lkml.kernel.org/r/1457314344-5685-1-git-send-email-nasa4836@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge 4.4.111 into android-4.4Greg Kroah-Hartman2018-01-10
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.111 x86/kasan: Write protect kasan zero shadow kernel/acct.c: fix the acct->needcheck check in check_free_space() crypto: n2 - cure use after free crypto: chacha20poly1305 - validate the digest size crypto: pcrypt - fix freeing pcrypt instances sunxi-rsb: Include OF based modalias in device uevent fscache: Fix the default for fscache_maybe_release_page() kernel: make groups_sort calling a responsibility group_info allocators kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal() ARC: uaccess: dont use "l" gcc inline asm constraint modifier Input: elantech - add new icbody type 15 x86/microcode/AMD: Add support for fam17h microcode loading parisc: Fix alignment of pa_tlb_lock in assembly on 32-bit SMP kernel x86/tlb: Drop the _GPL from the cpu_tlbstate export genksyms: Handle string literals with spaces in reference files module: keep percpu symbols in module's symtab module: Issue warnings when tainting kernel proc: much faster /proc/vmstat Map the vsyscall page with _PAGE_USER Fix build error in vma.c Linux 4.4.111 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * module: Issue warnings when tainting kernelLibor Pechacek2018-01-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3205c36cf7d96024626f92d65f560035df1abcb2 upstream. While most of the locations where a kernel taint bit is set are accompanied with a warning message, there are two which set their bits silently. If the tainting module gets unloaded later on, it is almost impossible to tell what was the reason for setting the flag. Signed-off-by: Libor Pechacek <lpechacek@suse.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * module: keep percpu symbols in module's symtabMiroslav Benes2018-01-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e0224418516b4d8a6c2160574bac18447c354ef0 upstream. Currently, percpu symbols from .data..percpu ELF section of a module are not copied over and stored in final symtab array of struct module. Consequently such symbol cannot be returned via kallsyms API (for example kallsyms_lookup_name). This can be especially confusing when the percpu symbol is exported. Only its __ksymtab et al. are present in its symtab. The culprit is in layout_and_allocate() function where SHF_ALLOC flag is dropped for .data..percpu section. There is in fact no need to copy the section to final struct module, because kernel module loader allocates extra percpu section by itself. Unfortunately only symbols from SHF_ALLOC sections are copied due to a check in is_core_symbol(). The patch changes is_core_symbol() function to copy over also percpu symbols (their st_shndx points to .data..percpu ELF section). We do it only if CONFIG_KALLSYMS_ALL is set to be consistent with the rest of the function (ELF section is SHF_ALLOC but !SHF_EXECINSTR). Finally elf_type() returns type 'a' for a percpu symbol because its address is absolute. Signed-off-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in ↵Oleg Nesterov2018-01-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | complete_signal() commit 426915796ccaf9c2bd9bb06dc5702225957bc2e5 upstream. complete_signal() checks SIGNAL_UNKILLABLE before it starts to destroy the thread group, today this is wrong in many ways. If nothing else, fatal_signal_pending() should always imply that the whole thread group (except ->group_exit_task if it is not NULL) is killed, this check breaks the rule. After the previous changes we can rely on sig_task_ignored(); sig_fatal(sig) && SIGNAL_UNKILLABLE can only be true if we actually want to kill this task and sig == SIGKILL OR it is traced and debugger can intercept the signal. This should hopefully fix the problem reported by Dmitry. This test-case static int init(void *arg) { for (;;) pause(); } int main(void) { char stack[16 * 1024]; for (;;) { int pid = clone(init, stack + sizeof(stack)/2, CLONE_NEWPID | SIGCHLD, NULL); assert(pid > 0); assert(ptrace(PTRACE_ATTACH, pid, 0, 0) == 0); assert(waitpid(-1, NULL, WSTOPPED) == pid); assert(ptrace(PTRACE_DETACH, pid, 0, SIGSTOP) == 0); assert(syscall(__NR_tkill, pid, SIGKILL) == 0); assert(pid == wait(NULL)); } } triggers the WARN_ON_ONCE(!(task->jobctl & JOBCTL_STOP_PENDING)) in task_participate_group_stop(). do_signal_stop()->signal_group_exit() checks SIGNAL_GROUP_EXIT and return false, but task_set_jobctl_pending() checks fatal_signal_pending() and does not set JOBCTL_STOP_PENDING. And his should fix the minor security problem reported by Kyle, SECCOMP_RET_TRACE can miss fatal_signal_pending() the same way if the task is the root of a pid namespace. Link: http://lkml.kernel.org/r/20171103184246.GD21036@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Kyle Huey <me@kylehuey.com> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kyle Huey <me@kylehuey.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() ↵Oleg Nesterov2018-01-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | signals commit ac25385089f673560867eb5179228a44ade0cfc1 upstream. Change sig_task_ignored() to drop the SIG_DFL && !sig_kernel_only() signals even if force == T. This simplifies the next change and this matches the same check in get_signal() which will drop these signals anyway. Link: http://lkml.kernel.org/r/20171103184227.GC21036@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Tested-by: Kyle Huey <me@kylehuey.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILLOleg Nesterov2018-01-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 628c1bcba204052d19b686b5bac149a644cdb72e upstream. The comment in sig_ignored() says "Tracers may want to know about even ignored signals" but SIGKILL can not be reported to debugger and it is just wrong to return 0 in this case: SIGKILL should only kill the SIGNAL_UNKILLABLE task if it comes from the parent ns. Change sig_ignored() to ignore ->ptrace if sig == SIGKILL and rely on sig_task_ignored(). SISGTOP coming from within the namespace is not really right too but at least debugger can intercept it, and we can't drop it here because this will break "gdb -p 1": ptrace_attach() won't work. Perhaps we will add another ->ptrace check later, we will see. Link: http://lkml.kernel.org/r/20171103184206.GB21036@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Tested-by: Kyle Huey <me@kylehuey.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * kernel: make groups_sort calling a responsibility group_info allocatorsThiago Rafael Becker2018-01-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit bdcf0a423ea1c40bbb40e7ee483b50fc8aa3d758 upstream. In testing, we found that nfsd threads may call set_groups in parallel for the same entry cached in auth.unix.gid, racing in the call of groups_sort, corrupting the groups for that entry and leading to permission denials for the client. This patch: - Make groups_sort globally visible. - Move the call to groups_sort to the modifiers of group_info - Remove the call to groups_sort from set_groups Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com Signed-off-by: Thiago Rafael Becker <thiago.becker@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: NeilBrown <neilb@suse.com> Acked-by: "J. Bruce Fields" <bfields@fieldses.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * kernel/acct.c: fix the acct->needcheck check in check_free_space()Oleg Nesterov2018-01-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 4d9570158b6260f449e317a5f9ed030c2504a615 upstream. As Tsukada explains, the time_is_before_jiffies(acct->needcheck) check is very wrong, we need time_is_after_jiffies() to make sys_acct() work. Ignoring the overflows, the code should "goto out" if needcheck > jiffies, while currently it checks "needcheck < jiffies" and thus in the likely case check_free_space() does nothing until jiffies overflow. In particular this means that sys_acct() is simply broken, acct_on() sets acct->needcheck = jiffies and expects that check_free_space() should set acct->active = 1 after the free-space check, but this won't happen if jiffies increments in between. This was broken by commit 32dc73086015 ("get rid of timer in kern/acct.c") in 2011, then another (correct) commit 795a2f22a8ea ("acct() should honour the limits from the very beginning") made the problem more visible. Link: http://lkml.kernel.org/r/20171213133940.GA6554@redhat.com Fixes: 32dc73086015 ("get rid of timer in kern/acct.c") Reported-by: TSUKADA Koutaro <tsukada@ascade.co.jp> Suggested-by: TSUKADA Koutaro <tsukada@ascade.co.jp> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge 4.4.110 into android-4.4Greg Kroah-Hartman2018-01-06
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.110 x86/boot: Add early cmdline parsing for options with arguments KAISER: Kernel Address Isolation kaiser: merged update kaiser: do not set _PAGE_NX on pgd_none kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE kaiser: fix build and FIXME in alloc_ldt_struct() kaiser: KAISER depends on SMP kaiser: fix regs to do_nmi() ifndef CONFIG_KAISER kaiser: fix perf crashes kaiser: ENOMEM if kaiser_pagetable_walk() NULL kaiser: tidied up asm/kaiser.h somewhat kaiser: tidied up kaiser_add/remove_mapping slightly kaiser: kaiser_remove_mapping() move along the pgd kaiser: cleanups while trying for gold link kaiser: name that 0x1000 KAISER_SHADOW_PGD_OFFSET kaiser: delete KAISER_REAL_SWITCH option kaiser: vmstat show NR_KAISERTABLE as nr_overhead kaiser: enhanced by kernel and user PCIDs kaiser: load_new_mm_cr3() let SWITCH_USER_CR3 flush user kaiser: PCID 0 for kernel and 128 for user kaiser: x86_cr3_pcid_noflush and x86_cr3_pcid_user kaiser: paranoid_entry pass cr3 need to paranoid_exit kaiser: _pgd_alloc() without __GFP_REPEAT to avoid stalls kaiser: fix unlikely error in alloc_ldt_struct() kaiser: add "nokaiser" boot option, using ALTERNATIVE x86/kaiser: Rename and simplify X86_FEATURE_KAISER handling x86/kaiser: Check boottime cmdline params kaiser: use ALTERNATIVE instead of x86_cr3_pcid_noflush kaiser: drop is_atomic arg to kaiser_pagetable_walk() kaiser: asm/tlbflush.h handle noPGE at lower level kaiser: kaiser_flush_tlb_on_return_to_user() check PCID x86/paravirt: Dont patch flush_tlb_single x86/kaiser: Reenable PARAVIRT kaiser: disabled on Xen PV x86/kaiser: Move feature detection up KPTI: Rename to PAGE_TABLE_ISOLATION KPTI: Report when enabled x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap x86/kasan: Clear kasan_zero_page after TLB flush kaiser: Set _PAGE_NX only if supported Linux 4.4.110 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZEHugh Dickins2018-01-05
| | | | | | | | | | | | | | | | | | | | | | | | | | Kaiser only needs to map one page of the stack; and kernel/fork.c did not build on powerpc (no __PAGE_KERNEL). It's all cleaner if linux/kaiser.h provides kaiser_map_thread_stack() and kaiser_unmap_thread_stack() wrappers around asm/kaiser.h's kaiser_add_mapping() and kaiser_remove_mapping(). And use linux/kaiser.h in init/main.c to avoid the #ifdefs there. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * kaiser: merged updateDave Hansen2018-01-05
| | | | | | | | | | | | | | | | | | Merged fixes and cleanups, rebased to 4.4.89 tree (no 5-level paging). Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * KAISER: Kernel Address IsolationRichard Fellner2018-01-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces our implementation of KAISER (Kernel Address Isolation to have Side-channels Efficiently Removed), a kernel isolation technique to close hardware side channels on kernel address information. More information about the patch can be found on: https://github.com/IAIK/KAISER From: Richard Fellner <richard.fellner@student.tugraz.at> From: Daniel Gruss <daniel.gruss@iaik.tugraz.at> X-Subject: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode Date: Thu, 4 May 2017 14:26:50 +0200 Link: http://marc.info/?l=linux-kernel&m=149390087310405&w=2 Kaiser-4.10-SHA1: c4b1831d44c6144d3762ccc72f0c4e71a0c713e5 To: <linux-kernel@vger.kernel.org> To: <kernel-hardening@lists.openwall.com> Cc: <clementine.maurice@iaik.tugraz.at> Cc: <moritz.lipp@iaik.tugraz.at> Cc: Michael Schwarz <michael.schwarz@iaik.tugraz.at> Cc: Richard Fellner <richard.fellner@student.tugraz.at> Cc: Ingo Molnar <mingo@kernel.org> Cc: <kirill.shutemov@linux.intel.com> Cc: <anders.fogh@gdata-adan.de> After several recent works [1,2,3] KASLR on x86_64 was basically considered dead by many researchers. We have been working on an efficient but effective fix for this problem and found that not mapping the kernel space when running in user mode is the solution to this problem [4] (the corresponding paper [5] will be presented at ESSoS17). With this RFC patch we allow anybody to configure their kernel with the flag CONFIG_KAISER to add our defense mechanism. If there are any questions we would love to answer them. We also appreciate any comments! Cheers, Daniel (+ the KAISER team from Graz University of Technology) [1] http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf [2] https://www.blackhat.com/docs/us-16/materials/us-16-Fogh-Using-Undocumented-CPU-Behaviour-To-See-Into-Kernel-Mode-And-Break-KASLR-In-The-Process.pdf [3] https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX.pdf [4] https://github.com/IAIK/KAISER [5] https://gruss.cc/files/kaiser.pdf [patch based also on https://raw.githubusercontent.com/IAIK/KAISER/master/KAISER/0001-KAISER-Kernel-Address-Isolation.patch] Signed-off-by: Richard Fellner <richard.fellner@student.tugraz.at> Signed-off-by: Moritz Lipp <moritz.lipp@iaik.tugraz.at> Signed-off-by: Daniel Gruss <daniel.gruss@iaik.tugraz.at> Signed-off-by: Michael Schwarz <michael.schwarz@iaik.tugraz.at> Acked-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge 4.4.109 into android-4.4Greg Kroah-Hartman2018-01-02
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.109 ACPI: APEI / ERST: Fix missing error handling in erst_reader() crypto: mcryptd - protect the per-CPU queue with a lock mfd: cros ec: spi: Don't send first message too soon mfd: twl4030-audio: Fix sibling-node lookup mfd: twl6040: Fix child-node lookup ALSA: rawmidi: Avoid racy info ioctl via ctl device ALSA: usb-audio: Fix the missing ctl name suffix at parsing SU PCI / PM: Force devices to D0 in pci_pm_thaw_noirq() parisc: Hide Diva-built-in serial aux and graphics card spi: xilinx: Detect stall with Unknown commands KVM: X86: Fix load RFLAGS w/o the fixed bit kvm: x86: fix RSM when PCID is non-zero powerpc/perf: Dereference BHRB entries safely net: mvneta: clear interface link status on port disable tracing: Remove extra zeroing out of the ring buffer page tracing: Fix possible double free on failure of allocating trace buffer tracing: Fix crash when it fails to alloc ring buffer ring-buffer: Mask out the info bits when returning buffer page length iw_cxgb4: Only validate the MSN for successful completions ASoC: fsl_ssi: AC'97 ops need regmap, clock and cleaning up on failure ASoC: twl4030: fix child-node lookup ALSA: hda: Drop useless WARN_ON() ALSA: hda - fix headset mic detection issue on a Dell machine x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly() x86/mm: Remove flush_tlb() and flush_tlb_current_task() x86/mm: Make flush_tlb_mm_range() more predictable x86/mm: Reimplement flush_tlb_page() using flush_tlb_mm_range() x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code x86/mm: Disable PCID on 32-bit kernels x86/mm: Add the 'nopcid' boot option to turn off PCID x86/mm: Enable CR4.PCIDE on supported systems x86/mm/64: Fix reboot interaction with CR4.PCIDE kbuild: add '-fno-stack-check' to kernel build options ipv4: igmp: guard against silly MTU values ipv6: mcast: better catch silly mtu values net: igmp: Use correct source address on IGMPv3 reports netlink: Add netns check on taps net: qmi_wwan: add Sierra EM7565 1199:9091 net: reevalulate autoflowlabel setting after sysctl setting tcp md5sig: Use skb's saddr when replying to an incoming segment tg3: Fix rx hang on MTU change with 5717/5719 net: ipv4: fix for a race condition in raw_sendmsg net: mvmdio: disable/unprepare clocks in EPROBE_DEFER case sctp: Replace use of sockets_allocated with specified macro. ipv4: Fix use-after-free when flushing FIB tables net: bridge: fix early call to br_stp_change_bridge_id and plug newlink leaks net: Fix double free and memory corruption in get_net_ns_by_id() net: phy: micrel: ksz9031: reconfigure autoneg after phy autoneg workaround sock: free skb in skb_complete_tx_timestamp on error usbip: fix usbip bind writing random string after command in match_busid usbip: stub: stop printing kernel pointer addresses in messages usbip: vhci: stop printing kernel pointer addresses in messages USB: serial: ftdi_sio: add id for Airbus DS P8GR USB: serial: qcserial: add Sierra Wireless EM7565 USB: serial: option: add support for Telit ME910 PID 0x1101 USB: serial: option: adding support for YUGA CLM920-NC5 usb: Add device quirk for Logitech HD Pro Webcam C925e usb: add RESET_RESUME for ELSA MicroLink 56K USB: Fix off by one in type-specific length check of BOS SSP capability usb: xhci: Add XHCI_TRUST_TX_LENGTH for Renesas uPD720201 nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick() x86/smpboot: Remove stale TLB flush invocations n_tty: fix EXTPROC vs ICANON interaction with TIOCINQ (aka FIONREAD) mm/vmstat: Make NR_TLB_REMOTE_FLUSH_RECEIVED available even on UP Linux 4.4.109 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()Thomas Gleixner2018-01-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5d62c183f9e9df1deeea0906d099a94e8a43047a upstream. The conditions in irq_exit() to invoke tick_nohz_irq_exit() which subsequently invokes tick_nohz_stop_sched_tick() are: if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) If need_resched() is not set, but a timer softirq is pending then this is an indication that the softirq code punted and delegated the execution to softirqd. need_resched() is not true because the current interrupted task takes precedence over softirqd. Invoking tick_nohz_irq_exit() in this case can cause an endless loop of timer interrupts because the timer wheel contains an expired timer, but softirqs are not yet executed. So it returns an immediate expiry request, which causes the timer to fire immediately again. Lather, rinse and repeat.... Prevent that by adding a check for a pending timer soft interrupt to the conditions in tick_nohz_stop_sched_tick() which avoid calling get_next_timer_interrupt(). That keeps the tick sched timer on the tick and prevents a repetitive programming of an already expired timer. Reported-by: Sebastian Siewior <bigeasy@linutronix.d> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Sebastian Siewior <bigeasy@linutronix.de> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272156050.2431@nanos Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * ring-buffer: Mask out the info bits when returning buffer page lengthSteven Rostedt (VMware)2018-01-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 45d8b80c2ac5d21cd1e2954431fb676bc2b1e099 upstream. Two info bits were added to the "commit" part of the ring buffer data page when returned to be consumed. This was to inform the user space readers that events have been missed, and that the count may be stored at the end of the page. What wasn't handled, was the splice code that actually called a function to return the length of the data in order to zero out the rest of the page before sending it up to user space. These data bits were returned with the length making the value negative, and that negative value was not checked. It was compared to PAGE_SIZE, and only used if the size was less than PAGE_SIZE. Luckily PAGE_SIZE is unsigned long which made the compare an unsigned compare, meaning the negative size value did not end up causing a large portion of memory to be randomly zeroed out. Fixes: 66a8cb95ed040 ("ring-buffer: Add place holder recording of dropped events") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * tracing: Fix crash when it fails to alloc ring bufferJing Xia2018-01-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 24f2aaf952ee0b59f31c3a18b8b36c9e3d3c2cf5 upstream. Double free of the ring buffer happens when it fails to alloc new ring buffer instance for max_buffer if TRACER_MAX_TRACE is configured. The root cause is that the pointer is not set to NULL after the buffer is freed in allocate_trace_buffers(), and the freeing of the ring buffer is invoked again later if the pointer is not equal to Null, as: instance_mkdir() |-allocate_trace_buffers() |-allocate_trace_buffer(tr, &tr->trace_buffer...) |-allocate_trace_buffer(tr, &tr->max_buffer...) // allocate fail(-ENOMEM),first free // and the buffer pointer is not set to null |-ring_buffer_free(tr->trace_buffer.buffer) // out_free_tr |-free_trace_buffers() |-free_trace_buffer(&tr->trace_buffer); //if trace_buffer is not null, free again |-ring_buffer_free(buf->buffer) |-rb_free_cpu_buffer(buffer->buffers[cpu]) // ring_buffer_per_cpu is null, and // crash in ring_buffer_per_cpu->pages Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code") Signed-off-by: Jing Xia <jing.xia@spreadtrum.com> Signed-off-by: Chunyan Zhang <chunyan.zhang@spreadtrum.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * tracing: Fix possible double free on failure of allocating trace bufferSteven Rostedt (VMware)2018-01-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 4397f04575c44e1440ec2e49b6302785c95fd2f8 upstream. Jing Xia and Chunyan Zhang reported that on failing to allocate part of the tracing buffer, memory is freed, but the pointers that point to them are not initialized back to NULL, and later paths may try to free the freed memory again. Jing and Chunyan fixed one of the locations that does this, but missed a spot. Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code") Reported-by: Jing Xia <jing.xia@spreadtrum.com> Reported-by: Chunyan Zhang <chunyan.zhang@spreadtrum.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * tracing: Remove extra zeroing out of the ring buffer pageSteven Rostedt (VMware)2018-01-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 6b7e633fe9c24682df550e5311f47fb524701586 upstream. The ring_buffer_read_page() takes care of zeroing out any extra data in the page that it returns. There's no need to zero it out again from the consumer. It was removed from one consumer of this function, but read_buffers_splice_read() did not remove it, and worse, it contained a nasty bug because of it. Fixes: 2711ca237a084 ("ring-buffer: Move zeroing out excess in page to ring buffer code") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>