summaryrefslogtreecommitdiff
path: root/include (follow)
Commit message (Collapse)AuthorAge
...
* bpf: add generic bpf_csum_diff helperDaniel Borkmann2022-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For L4 checksums, we currently have bpf_l4_csum_replace() helper. It's currently limited to handle 2 and 4 byte changes in a header and feeds the from/to into inet_proto_csum_replace{2,4}() helpers of the kernel. When working with IPv6, for example, this makes it rather cumbersome to deal with, similarly when editing larger parts of a header. Instead, extend the API in a more generic way: For bpf_l4_csum_replace(), add a case for header field mask of 0 to change the checksum at a given offset through inet_proto_csum_replace_by_diff(), and provide a helper bpf_csum_diff() that can generically calculate a from/to diff for arbitrary amounts of data. This can be used in multiple ways: for the bpf_l4_csum_replace() only part, this even provides us with the option to insert precalculated diffs from user space f.e. from a map, or from bpf_csum_diff() during runtime. bpf_csum_diff() has a optional from/to stack buffer input, so we can calculate a diff by using a scratchbuffer for scenarios where we're inserting (from is NULL), removing (to is NULL) or diffing (from/to buffers don't need to be of equal size) data. Also, bpf_csum_diff() allows to feed a previous csum into csum_partial(), so the function can also be cascaded. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* tun: use socket locks for sk_{attach,detatch}_filterHannes Frederic Sowa2022-04-19
| | | | | | | | | | | | | | | This reverts commit 5a5abb1fa3b05dd ("tun, bpf: fix suspicious RCU usage in tun_{attach, detach}_filter") and replaces it to use lock_sock around sk_{attach,detach}_filter. The checks inside filter.c are updated with lockdep_sock_is_held to check for proper socket locks. It keeps the code cleaner by ensuring that only one lock governs the socket filter instead of two independent locks. Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* bpf, verifier: add ARG_PTR_TO_RAW_STACK typeDaniel Borkmann2022-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When passing buffers from eBPF stack space into a helper function, we have ARG_PTR_TO_STACK argument type for helpers available. The verifier makes sure that such buffers are initialized, within boundaries, etc. However, the downside with this is that we have a couple of helper functions such as bpf_skb_load_bytes() that fill out the passed buffer in the expected success case anyway, so zero initializing them prior to the helper call is unneeded/wasted instructions in the eBPF program that can be avoided. Therefore, add a new helper function argument type called ARG_PTR_TO_RAW_STACK. The idea is to skip the STACK_MISC check in check_stack_boundary() and color the related stack slots as STACK_MISC after we checked all call arguments. Helper functions using ARG_PTR_TO_RAW_STACK must make sure that every path of the helper function will fill the provided buffer area, so that we cannot leak any uninitialized stack memory. This f.e. means that error paths need to memset() the buffers, but the expected fast-path doesn't have to do this anymore. Since there's no such helper needing more than at most one ARG_PTR_TO_RAW_STACK argument, we can keep it simple and don't need to check for multiple areas. Should in future such a use-case really appear, we have check_raw_mode() that will make sure we implement support for it first. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* bpf: sanitize bpf tracepoint accessAlexei Starovoitov2022-04-19
| | | | | | | | | | | | | during bpf program loading remember the last byte of ctx access and at the time of attaching the program to tracepoint check that the program doesn't access bytes beyond defined in tracepoint fields This also disallows access to __dynamic_array fields, but can be relaxed in the future. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* bpf: support bpf_get_stackid() and bpf_perf_event_output() in tracepoint ↵Alexei Starovoitov2022-04-19
| | | | | | | | | | | | programs needs two wrapper functions to fetch 'struct pt_regs *' to convert tracepoint bpf context into kprobe bpf context to reuse existing helper functions Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* perf, bpf: allow bpf programs attach to tracepointsAlexei Starovoitov2022-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | introduce BPF_PROG_TYPE_TRACEPOINT program type and allow it to be attached to the perf tracepoint handler, which will copy the arguments into the per-cpu buffer and pass it to the bpf program as its first argument. The layout of the fields can be discovered by doing 'cat /sys/kernel/debug/tracing/events/sched/sched_switch/format' prior to the compilation of the program with exception that first 8 bytes are reserved and not accessible to the program. This area is used to store the pointer to 'struct pt_regs' which some of the bpf helpers will use: +---------+ | 8 bytes | hidden 'struct pt_regs *' (inaccessible to bpf program) +---------+ | N bytes | static tracepoint fields defined in tracepoint/format (bpf readonly) +---------+ | dynamic | __dynamic_array bytes of tracepoint (inaccessible to bpf yet) +---------+ Not that all of the fields are already dumped to user space via perf ring buffer and broken application access it directly without consulting tracepoint/format. Same rule applies here: static tracepoint fields should only be accessed in a format defined in tracepoint/format. The order of fields and field sizes are not an ABI. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* perf: split perf_trace_buf_prepare into alloc and update partsAlexei Starovoitov2022-04-19
| | | | | | | | | | | | | split allows to move expensive update of 'struct trace_entry' to later phase. Repurpose unused 1st argument of perf_tp_event() to indicate event type. While splitting use temp variable 'rctx' instead of '*rctx' to avoid unnecessary loads done by the compiler due to -fno-strict-aliasing Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* perf: remove unused __addr variableAlexei Starovoitov2022-04-19
| | | | | | | | | | | now all calls to perf_trace_buf_submit() pass 0 as 4th argument which will be repurposed in the next patch which will change the meaning of 1st arg of perf_tp_event() to event_type Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* bpf: convert stackmap to pre-allocationAlexei Starovoitov2022-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It was observed that calling bpf_get_stackid() from a kprobe inside slub or from spin_unlock causes similar deadlock as with hashmap, therefore convert stackmap to use pre-allocated memory. The call_rcu is no longer feasible mechanism, since delayed freeing causes bpf_get_stackid() to fail unpredictably when number of actual stacks is significantly less than user requested max_entries. Since elements are no longer freed into slub, we can push elements into freelist immediately and let them be recycled. However the very unlikley race between user space map_lookup() and program-side recycling is possible: cpu0 cpu1 ---- ---- user does lookup(stackidX) starts copying ips into buffer delete(stackidX) calls bpf_get_stackid() which recyles the element and overwrites with new stack trace To avoid user space seeing a partial stack trace consisting of two merged stack traces, do bucket = xchg(, NULL); copy; xchg(,bucket); to preserve consistent stack trace delivery to user space. Now we can move memset(,0) of left-over element value from critical path of bpf_get_stackid() into slow-path of user space lookup. Also disallow lookup() from bpf program, since it's useless and program shouldn't be messing with collected stack trace. Note that similar race between user space lookup and kernel side updates is also present in hashmap, but it's not a new race. bpf programs were always allowed to modify hash and array map elements while user space is copying them. Fixes: d5a3b1f69186 ("bpf: introduce BPF_MAP_TYPE_STACK_TRACE") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* bpf: pre-allocate hash map elementsAlexei Starovoitov2022-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If kprobe is placed on spin_unlock then calling kmalloc/kfree from bpf programs is not safe, since the following dead lock is possible: kfree->spin_lock(kmem_cache_node->lock)...spin_unlock->kprobe-> bpf_prog->map_update->kmalloc->spin_lock(of the same kmem_cache_node->lock) and deadlocks. The following solutions were considered and some implemented, but eventually discarded - kmem_cache_create for every map - add recursion check to slow-path of slub - use reserved memory in bpf_map_update for in_irq or in preempt_disabled - kmalloc via irq_work At the end pre-allocation of all map elements turned out to be the simplest solution and since the user is charged upfront for all the memory, such pre-allocation doesn't affect the user space visible behavior. Since it's impossible to tell whether kprobe is triggered in a safe location from kmalloc point of view, use pre-allocation by default and introduce new BPF_F_NO_PREALLOC flag. While testing of per-cpu hash maps it was discovered that alloc_percpu(GFP_ATOMIC) has odd corner cases and often fails to allocate memory even when 90% of it is free. The pre-allocation of per-cpu hash elements solves this problem as well. Turned out that bpf_map_update() quickly followed by bpf_map_lookup()+bpf_map_delete() is very common pattern used in many of iovisor/bcc/tools, so there is additional benefit of pre-allocation, since such use cases are must faster. Since all hash map elements are now pre-allocated we can remove atomic increment of htab->count and save few more cycles. Also add bpf_map_precharge_memlock() to check rlimit_memlock early to avoid large malloc/free done by users who don't have sufficient limits. Pre-allocation is done with vmalloc and alloc/free is done via percpu_freelist. Here are performance numbers for different pre-allocation algorithms that were implemented, but discarded in favor of percpu_freelist: 1 cpu: pcpu_ida 2.1M pcpu_ida nolock 2.3M bt 2.4M kmalloc 1.8M hlist+spinlock 2.3M pcpu_freelist 2.6M 4 cpu: pcpu_ida 1.5M pcpu_ida nolock 1.8M bt w/smp_align 1.7M bt no/smp_align 1.1M kmalloc 0.7M hlist+spinlock 0.2M pcpu_freelist 2.0M 8 cpu: pcpu_ida 0.7M bt w/smp_align 0.8M kmalloc 0.4M pcpu_freelist 1.5M 32 cpu: kmalloc 0.13M pcpu_freelist 0.49M pcpu_ida nolock is a modified percpu_ida algorithm without percpu_ida_cpu locks and without cross-cpu tag stealing. It's faster than existing percpu_ida, but not as fast as pcpu_freelist. bt is a variant of block/blk-mq-tag.c simlified and customized for bpf use case. bt w/smp_align is using cache line for every 'long' (similar to blk-mq-tag). bt no/smp_align allocates 'long' bitmasks continuously to save memory. It's comparable to percpu_ida and in some cases faster, but slower than percpu_freelist hlist+spinlock is the simplest free list with single spinlock. As expeceted it has very bad scaling in SMP. kmalloc is existing implementation which is still available via BPF_F_NO_PREALLOC flag. It's significantly slower in single cpu and in 8 cpu setup it's 3 times slower than pre-allocation with pcpu_freelist, but saves memory, so in cases where map->max_entries can be large and number of map update/delete per second is low, it may make sense to use it. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* bpf: prevent kprobe+bpf deadlocksAlexei Starovoitov2022-04-19
| | | | | | | | | | | | | | | | | if kprobe is placed within update or delete hash map helpers that hold bucket spin lock and triggered bpf program is trying to grab the spinlock for the same bucket on the same cpu, it will deadlock. Fix it by extending existing recursion prevention mechanism. Note, map_lookup and other tracing helpers don't have this problem, since they don't hold any locks and don't modify global data. bpf_trace_printk has its own recursive check and ok as well. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* bpf: add new arg_type that allows for 0 sized stack bufferDaniel Borkmann2022-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, when we pass a buffer from the eBPF stack into a helper function, the function proto indicates argument types as ARG_PTR_TO_STACK and ARG_CONST_STACK_SIZE pair. If R<X> contains the former, then R<X+1> must be of the latter type. Then, verifier checks whether the buffer points into eBPF stack, is initialized, etc. The verifier also guarantees that the constant value passed in R<X+1> is greater than 0, so helper functions don't need to test for it and can always assume a non-NULL initialized buffer as well as non-0 buffer size. This patch adds a new argument types ARG_CONST_STACK_SIZE_OR_ZERO that allows to also pass NULL as R<X> and 0 as R<X+1> into the helper function. Such helper functions, of course, need to be able to handle these cases internally then. Verifier guarantees that either R<X> == NULL && R<X+1> == 0 or R<X> != NULL && R<X+1> != 0 (like the case of ARG_CONST_STACK_SIZE), any other combinations are not possible to load. I went through various options of extending the verifier, and introducing the type ARG_CONST_STACK_SIZE_OR_ZERO seems to have most minimal changes needed to the verifier. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* bpf: introduce BPF_MAP_TYPE_STACK_TRACEAlexei Starovoitov2022-04-19
| | | | | | | | | | | | | | | | | | | | | | | | add new map type to store stack traces and corresponding helper bpf_get_stackid(ctx, map, flags) - walk user or kernel stack and return id @ctx: struct pt_regs* @map: pointer to stack_trace map @flags: bits 0-7 - numer of stack frames to skip bit 8 - collect user stack instead of kernel bit 9 - compare stacks by hash only bit 10 - if two different stacks hash into the same stackid discard old other bits - reserved Return: >= 0 stackid on success or negative error stackid is a 32-bit integer handle that can be further combined with other data (including other stackid) and used as a key into maps. Userspace will access stackmap using standard lookup/delete syscall commands to retrieve full stack trace for given stackid. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* bpf: support ipv6 for bpf_skb_{set,get}_tunnel_keyDaniel Borkmann2022-04-19
| | | | | | | | | | | | | | | | | | | After IPv6 support has recently been added to metadata dst and related encaps, add support for populating/reading it from an eBPF program. Commit d3aa45ce6b ("bpf: add helpers to access tunnel metadata") started with initial IPv4-only support back then (due to IPv6 metadata support not being available yet). To stay compatible with older programs, we need to test for the passed structure size. Also TOS and TTL support from the ip_tunnel_info key has been added. Tested with vxlan devs in collect meta data mode with IPv4, IPv6 and in compat mode over different network namespaces. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* bpf: export helper function flags and reject invalid onesDaniel Borkmann2022-04-19
| | | | | | | | | | | | | Export flags used by eBPF helper functions through UAPI, so they can be used by programs (instead of them redefining all flags each time or just using the hard-coded values). It also gives a better overview what flags are used where and we can further get rid of the extra macros defined in filter.c. Moreover, reject invalid flags. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPFCraig Gallek2022-04-19
| | | | | | | | | | | | | | | | Expose socket options for setting a classic or extended BPF program for use when selecting sockets in an SO_REUSEPORT group. These options can be used on the first socket to belong to a group before bind or on any socket in the group after bind. This change includes refactoring of the existing sk_filter code to allow reuse of the existing BPF filter validation checks. Signed-off-by: Craig Gallek <kraig@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chatur27 <jasonbright2709@gmail.com> Change-Id: I4433dfd5d865bb69e8d3cab55cc68325829e8b49
* soreuseport: fast reuseport UDP socket selectionCraig Gallek2022-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Include a struct sock_reuseport instance when a UDP socket binds to a specific address for the first time with the reuseport flag set. When selecting a socket for an incoming UDP packet, use the information available in sock_reuseport if present. This required adding an additional field to the UDP source address equality function to differentiate between exact and wildcard matches. The original use case allowed wildcard matches when checking for existing port uses during bind. The new use case of adding a socket to a reuseport group requires exact address matching. Performance test (using a machine with 2 CPU sockets and a total of 48 cores): Create reuseport groups of varying size. Use one socket from this group per user thread (pinning each thread to a different core) calling recvmmsg in a tight loop. Record number of messages received per second while saturating a 10G link. 10 sockets: 18% increase (~2.8M -> 3.3M pkts/s) 20 sockets: 14% increase (~2.9M -> 3.3M pkts/s) 40 sockets: 13% increase (~3.0M -> 3.4M pkts/s) This work is based off a similar implementation written by Ying Cai <ycai@google.com> for implementing policy-based reuseport selection. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* soreuseport: define reuseport groupsCraig Gallek2022-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | struct sock_reuseport is an optional shared structure referenced by each socket belonging to a reuseport group. When a socket is bound to an address/port not yet in use and the reuseport flag has been set, the structure will be allocated and attached to the newly bound socket. When subsequent calls to bind are made for the same address/port, the shared structure will be updated to include the new socket and the newly bound socket will reference the group structure. Usually, when an incoming packet was destined for a reuseport group, all sockets in the same group needed to be considered before a dispatching decision was made. With this structure, an appropriate socket can be found after looking up just one socket in the group. This shared structure will also allow for more complicated decisions to be made when selecting a socket (eg a BPF filter). This work is based off a similar implementation written by Ying Cai <ycai@google.com> for implementing policy-based reuseport selection. Signed-off-by: Craig Gallek <kraig@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* bpf: add bpf_skb_load_bytes helperDaniel Borkmann2022-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When hacking tc programs with eBPF, one of the issues that come up from time to time is to load addresses from headers. In eBPF as in classic BPF, we have BPF_LD | BPF_ABS | BPF_{B,H,W} instructions that extract a byte, half-word or word out of the skb data though helpers such as bpf_load_pointer() (interpreter case). F.e. extracting a whole IPv6 address could possibly look like ... union v6addr { struct { __u32 p1; __u32 p2; __u32 p3; __u32 p4; }; __u8 addr[16]; }; [...] a.p1 = htonl(load_word(skb, off)); a.p2 = htonl(load_word(skb, off + 4)); a.p3 = htonl(load_word(skb, off + 8)); a.p4 = htonl(load_word(skb, off + 12)); [...] /* access to a.addr[...] */ This work adds a complementary helper bpf_skb_load_bytes() (we also have bpf_skb_store_bytes()) as an alternative where the same call would look like from an eBPF program: ret = bpf_skb_load_bytes(skb, off, addr, sizeof(addr)); Same verifier restrictions apply as in ffeedafbf023 ("bpf: introduce current->pid, tgid, uid, gid, comm accessors") case, where stack memory access needs to be statically verified and thus guaranteed to be initialized in first use (otherwise verifier cannot tell whether a subsequent access to it is valid or not as it's runtime dependent). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* bpf: add lookup/update support for per-cpu hash and array mapsAlexei Starovoitov2022-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | The functions bpf_map_lookup_elem(map, key, value) and bpf_map_update_elem(map, key, value, flags) need to get/set values from all-cpus for per-cpu hash and array maps, so that user space can aggregate/update them as necessary. Example of single counter aggregation in user space: unsigned int nr_cpus = sysconf(_SC_NPROCESSORS_CONF); long values[nr_cpus]; long value = 0; bpf_lookup_elem(fd, key, values); for (i = 0; i < nr_cpus; i++) value += values[i]; The user space must provide round_up(value_size, 8) * nr_cpus array to get/set values, since kernel will use 'long' copy of per-cpu values to try to copy good counters atomically. It's a best-effort, since bpf programs and user space are racing to access the same memory. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY mapAlexei Starovoitov2022-04-19
| | | | | | | | | | | | | Primary use case is a histogram array of latency where bpf program computes the latency of block requests or other events and stores histogram of latency into array of 64 elements. All cpus are constantly running, so normal increment is not accurate, bpf_xadd causes cache ping-pong and this per-cpu approach allows fastest collision-free counters. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* bpf: introduce BPF_MAP_TYPE_PERCPU_HASH mapAlexei Starovoitov2022-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce BPF_MAP_TYPE_PERCPU_HASH map type which is used to do accurate counters without need to use BPF_XADD instruction which turned out to be too costly for high-performance network monitoring. In the typical use case the 'key' is the flow tuple or other long living object that sees a lot of events per second. bpf_map_lookup_elem() returns per-cpu area. Example: struct { u32 packets; u32 bytes; } * ptr = bpf_map_lookup_elem(&map, &key); /* ptr points to this_cpu area of the value, so the following * increments will not collide with other cpus */ ptr->packets ++; ptr->bytes += skb->len; bpf_update_elem() atomically creates a new element where all per-cpu values are zero initialized and this_cpu value is populated with given 'value'. Note that non-per-cpu hash map always allocates new element and then deletes old after rcu grace period to maintain atomicity of update. Per-cpu hash map updates element values in-place. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* perf/bpf: Convert perf_event_array to use struct fileAlexei Starovoitov2022-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | Robustify refcounting. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Wang Nan <wangnan0@huawei.com> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160126045947.GA40151@ast-mbp.thefacebook.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* bpf: fix clearing on persistent program array mapsDaniel Borkmann2022-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, when having map file descriptors pointing to program arrays, there's still the issue that we unconditionally flush program array contents via bpf_fd_array_map_clear() in bpf_map_release(). This happens when such a file descriptor is released and is independent of the map's refcount. Having this flush independent of the refcount is for a reason: there can be arbitrary complex dependency chains among tail calls, also circular ones (direct or indirect, nesting limit determined during runtime), and we need to make sure that the map drops all references to eBPF programs it holds, so that the map's refcount can eventually drop to zero and initiate its freeing. Btw, a walk of the whole dependency graph would not be possible for various reasons, one being complexity and another one inconsistency, i.e. new programs can be added to parts of the graph at any time, so there's no guaranteed consistent state for the time of such a walk. Now, the program array pinning itself works, but the issue is that each derived file descriptor on close would nevertheless call unconditionally into bpf_fd_array_map_clear(). Instead, keep track of users and postpone this flush until the last reference to a user is dropped. As this only concerns a subset of references (f.e. a prog array could hold a program that itself has reference on the prog array holding it, etc), we need to track them separately. Short analysis on the refcounting: on map creation time usercnt will be one, so there's no change in behaviour for bpf_map_release(), if unpinned. If we already fail in map_create(), we are immediately freed, and no file descriptor has been made public yet. In bpf_obj_pin_user(), we need to probe for a possible map in bpf_fd_probe_obj() already with a usercnt reference, so before we drop the reference on the fd with fdput(). Therefore, if actual pinning fails, we need to drop that reference again in bpf_any_put(), otherwise we keep holding it. When last reference drops on the inode, the bpf_any_put() in bpf_evict_inode() will take care of dropping the usercnt again. In the bpf_obj_get_user() case, the bpf_any_get() will grab a reference on the usercnt, still at a time when we have the reference on the path. Should we later on fail to grab a new file descriptor, bpf_any_put() will drop it, otherwise we hold it until bpf_map_release() time. Joint work with Alexei. Fixes: b2197755b263 ("bpf: add support for persistent maps/progs") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* Revert "bpf: fix clearing on persistent program array maps"Anay Wadhera2022-04-19
| | | | | | This reverts commit c9da161c6517ba12154059d3b965c2cbaf16f90f. Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* Revert "bpf: fix refcnt overflow"Anay Wadhera2022-04-19
| | | | | | This reverts commit 3899251bdb9c2b31fc73d4cc132f52d3710101de. Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* Revert "bpf: add bpf_patch_insn_single helper"Anay Wadhera2022-04-19
| | | | | | This reverts commit 087a92287dbae61b4ee1e76d7c20c81710109422. Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* Revert "bpf: prevent out-of-bounds speculation"Anay Wadhera2022-04-19
| | | | | | This reverts commit 9a7fad4c0e215fb1c256fee27c45f9f8bc4364c5. Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* Revert "bpf: avoid false sharing of map refcount with max_entries"Anay Wadhera2022-04-19
| | | | | | This reverts commit 96d9b2338bed553c37f759127d8d18c857449ceb. Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* Revert "bpf: generally move prog destruction to RCU deferral"Anay Wadhera2022-04-19
| | | | | | This reverts commit e25dc63aa366fd0f61d1d9ba67b66f5d75fc4372. Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
* Merge remote-tracking branch 'google/common/android-4.4-p' into ↵Michael Bestas2022-02-04
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lineage-18.1-caf-msm8998 * google/common/android-4.4-p: Linux 4.4.302 Input: i8042 - Fix misplaced backport of "add ASUS Zenbook Flip to noselftest list" KVM: x86: Fix misplaced backport of "work around leak of uninitialized stack contents" Revert "tc358743: fix register i2c_rd/wr function fix" Revert "drm/radeon/ci: disable mclk switching for high refresh rates (v2)" Bluetooth: MGMT: Fix misplaced BT_HS check ipv4: tcp: send zero IPID in SYNACK messages ipv4: raw: lock the socket in raw_bind() hwmon: (lm90) Reduce maximum conversion rate for G781 drm/msm: Fix wrong size calculation net-procfs: show net devices bound packet types ipv4: avoid using shared IP generator for connected sockets net: fix information leakage in /proc/net/ptype ipv6_tunnel: Rate limit warning messages scsi: bnx2fc: Flush destroy_work queue before calling bnx2fc_interface_put() USB: core: Fix hang in usb_kill_urb by adding memory barriers usb-storage: Add unusual-devs entry for VL817 USB-SATA bridge tty: Add support for Brainboxes UC cards. tty: n_gsm: fix SW flow control encoding/handling serial: stm32: fix software flow control transfer PM: wakeup: simplify the output logic of pm_show_wakelocks() udf: Fix NULL ptr deref when converting from inline format udf: Restore i_lenAlloc when inode expansion fails scsi: zfcp: Fix failed recovery on gone remote port with non-NPIV FCP devices s390/hypfs: include z/VM guests with access control group set Bluetooth: refactor malicious adv data check can: bcm: fix UAF of bcm op Linux 4.4.301 drm/i915: Flush TLBs before releasing backing store Linux 4.4.300 lib82596: Fix IRQ check in sni_82596_probe bcmgenet: add WOL IRQ check net_sched: restore "mpu xxx" handling dmaengine: at_xdmac: Fix at_xdmac_lld struct definition dmaengine: at_xdmac: Fix lld view setting dmaengine: at_xdmac: Print debug message after realeasing the lock dmaengine: at_xdmac: Don't start transactions at tx_submit level netns: add schedule point in ops_exit_list() net: axienet: fix number of TX ring slots for available check net: axienet: Wait for PhyRstCmplt after core reset af_unix: annote lockless accesses to unix_tot_inflight & gc_in_progress parisc: pdc_stable: Fix memory leak in pdcs_register_pathentries net/fsl: xgmac_mdio: Fix incorrect iounmap when removing module powerpc/fsl/dts: Enable WA for erratum A-009885 on fman3l MDIO buses ext4: don't use the orphan list when migrating an inode ext4: Fix BUG_ON in ext4_bread when write quota data ext4: set csum seed in tmp inode while migrating to extents ubifs: Error path in ubifs_remount_rw() seems to wrongly free write buffers power: bq25890: Enable continuous conversion for ADC at charging scsi: sr: Don't use GFP_DMA MIPS: Octeon: Fix build errors using clang i2c: designware-pci: Fix to change data types of hcnt and lcnt parameters ALSA: seq: Set upper limit of processed events w1: Misuse of get_user()/put_user() reported by sparse i2c: mpc: Correct I2C reset procedure powerpc/smp: Move setup_profiling_timer() under CONFIG_PROFILING i2c: i801: Don't silently correct invalid transfer size powerpc/btext: add missing of_node_put powerpc/cell: add missing of_node_put powerpc/powernv: add missing of_node_put powerpc/6xx: add missing of_node_put parisc: Avoid calling faulthandler_disabled() twice serial: core: Keep mctrl register state and cached copy in sync serial: pl010: Drop CR register reset on set_termios dm space map common: add bounds check to sm_ll_lookup_bitmap() dm btree: add a defensive bounds check to insert_at() net: mdio: Demote probed message to debug print btrfs: remove BUG_ON(!eie) in find_parent_nodes btrfs: remove BUG_ON() in find_parent_nodes() ACPICA: Executer: Fix the REFCLASS_REFOF case in acpi_ex_opcode_1A_0T_1R() ACPICA: Utilities: Avoid deleting the same object twice in a row um: registers: Rename function names to avoid conflicts and build problems ath9k: Fix out-of-bound memcpy in ath9k_hif_usb_rx_stream usb: hub: Add delay for SuperSpeed hub resume to let links transit to U0 media: saa7146: hexium_gemini: Fix a NULL pointer dereference in hexium_attach() media: igorplugusb: receiver overflow should be reported net: bonding: debug: avoid printing debug logs when bond is not notifying peers iwlwifi: mvm: synchronize with FW after multicast commands media: m920x: don't use stack on USB reads media: saa7146: hexium_orion: Fix a NULL pointer dereference in hexium_attach() floppy: Add max size check for user space request mwifiex: Fix skb_over_panic in mwifiex_usb_recv() HSI: core: Fix return freed object in hsi_new_client media: b2c2: Add missing check in flexcop_pci_isr: usb: gadget: f_fs: Use stream_open() for endpoint files ar5523: Fix null-ptr-deref with unexpected WDCMSG_TARGET_START reply fs: dlm: filter user dlm messages for kernel locks Bluetooth: Fix debugfs entry leak in hci_register_dev() RDMA/cxgb4: Set queue pair state when being queried mips: bcm63xx: add support for clk_set_parent() mips: lantiq: add support for clk_set_parent() misc: lattice-ecp3-config: Fix task hung when firmware load failed ASoC: samsung: idma: Check of ioremap return value dmaengine: pxa/mmp: stop referencing config->slave_id RDMA/core: Let ib_find_gid() continue search even after empty entry char/mwave: Adjust io port register size ALSA: oss: fix compile error when OSS_DEBUG is enabled powerpc/prom_init: Fix improper check of prom_getprop() ALSA: hda: Add missing rwsem around snd_ctl_remove() calls ALSA: PCM: Add missing rwsem around snd_ctl_remove() calls ALSA: jack: Add missing rwsem around snd_ctl_remove() calls ext4: avoid trim error on fs with small groups net: mcs7830: handle usb read errors properly pcmcia: fix setting of kthread task states can: xilinx_can: xcan_probe(): check for error irq can: softing: softing_startstop(): fix set but not used variable warning spi: spi-meson-spifc: Add missing pm_runtime_disable() in meson_spifc_probe ppp: ensure minimum packet size in ppp_write() pcmcia: rsrc_nonstatic: Fix a NULL pointer dereference in nonstatic_find_mem_region() pcmcia: rsrc_nonstatic: Fix a NULL pointer dereference in __nonstatic_find_io_region() usb: ftdi-elan: fix memory leak on device disconnect media: msi001: fix possible null-ptr-deref in msi001_probe() media: saa7146: mxb: Fix a NULL pointer dereference in mxb_attach() media: dib8000: Fix a memleak in dib8000_init() floppy: Fix hang in watchdog when disk is ejected serial: amba-pl011: do not request memory region twice drm/amdgpu: Fix a NULL pointer dereference in amdgpu_connector_lcd_native_mode() arm64: dts: qcom: msm8916: fix MMC controller aliases netfilter: bridge: add support for pppoe filtering tty: serial: atmel: Call dma_async_issue_pending() tty: serial: atmel: Check return code of dmaengine_submit() crypto: qce - fix uaf on qce_ahash_register_one Bluetooth: stop proccessing malicious adv data Bluetooth: cmtp: fix possible panic when cmtp_init_sockets() fails PCI: Add function 1 DMA alias quirk for Marvell 88SE9125 SATA controller can: softing_cs: softingcs_probe(): fix memleak on registration failure media: stk1160: fix control-message timeouts media: pvrusb2: fix control-message timeouts media: dib0700: fix undefined behavior in tuner shutdown media: em28xx: fix control-message timeouts media: mceusb: fix control-message timeouts rtc: cmos: take rtc_lock while reading from CMOS nfc: llcp: fix NULL error pointer dereference on sendmsg() after failed bind() HID: uhid: Fix worker destroying device without any protection rtlwifi: rtl8192cu: Fix WARNING when calling local_irq_restore() with interrupts enabled media: uvcvideo: fix division by zero at stream start drm/i915: Avoid bitwise vs logical OR warning in snb_wm_latency_quirk() can: gs_usb: gs_can_start_xmit(): zero-initialize hf->{flags,reserved} can: gs_usb: fix use of uninitialized variable, detach device on reception of invalid USB data mfd: intel-lpss: Fix too early PM enablement in the ACPI ->probe() USB: Fix "slab-out-of-bounds Write" bug in usb_hcd_poll_rh_status USB: core: Fix bug in resuming hub's handling of wakeup requests Bluetooth: bfusb: fix division by zero in send path Linux 4.4.299 power: reset: ltc2952: Fix use of floating point literals mISDN: change function names to avoid conflicts net: udp: fix alignment problem in udp4_seq_show() ip6_vti: initialize __ip6_tnl_parm struct in vti6_siocdevprivate scsi: libiscsi: Fix UAF in iscsi_conn_get_param()/iscsi_conn_teardown() phonet: refcount leak in pep_sock_accep rndis_host: support Hytera digital radios xfs: map unwritten blocks in XFS_IOC_{ALLOC,FREE}SP just like fallocate sch_qfq: prevent shift-out-of-bounds in qfq_init_qdisc i40e: Fix incorrect netdev's real number of RX/TX queues mac80211: initialize variable have_higher_than_11mbit ieee802154: atusb: fix uninit value in atusb_set_extended_addr Bluetooth: btusb: Apply QCA Rome patches for some ATH3012 models bpf, test: fix ld_abs + vlan push/pop stress test Linux 4.4.298 net: fix use-after-free in tw_timer_handler Input: spaceball - fix parsing of movement data packets Input: appletouch - initialize work before device registration scsi: vmw_pvscsi: Set residual data length conditionally usb: gadget: f_fs: Clear ffs_eventfd in ffs_data_clear. xhci: Fresco FL1100 controller should not have BROKEN_MSI quirk set. uapi: fix linux/nfc.h userspace compilation errors nfc: uapi: use kernel size_t to fix user-space builds selinux: initialize proto variable in selinux_ip_postroute_compat() recordmcount.pl: fix typo in s390 mcount regex platform/x86: apple-gmux: use resource_size() with res Linux 4.4.297 phonet/pep: refuse to enable an unbound pipe hamradio: improve the incomplete fix to avoid NPD hamradio: defer ax25 kfree after unregister_netdev ax25: NPD bug when detaching AX25 device xen/blkfront: fix bug in backported patch ARM: 9169/1: entry: fix Thumb2 bug in iWMMXt exception handling ALSA: drivers: opl3: Fix incorrect use of vp->state ALSA: jack: Check the return value of kstrdup() hwmon: (lm90) Fix usage of CONFIG2 register in detect function drivers: net: smc911x: Check for error irq bonding: fix ad_actor_system option setting to default qlcnic: potential dereference null pointer of rx_queue->page_ring IB/qib: Fix memory leak in qib_user_sdma_queue_pkts() HID: holtek: fix mouse probing can: kvaser_usb: get CAN clock frequency from device net: usb: lan78xx: add Allied Telesis AT29M2-AF Conflicts: drivers/usb/gadget/function/f_fs.c Change-Id: Iabc390c3c9160c7a2864ffe1125d73412ffdb31d
| * Merge 4.4.302 into android-4.4-pGreg Kroah-Hartman2022-02-03
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.302 can: bcm: fix UAF of bcm op Bluetooth: refactor malicious adv data check s390/hypfs: include z/VM guests with access control group set scsi: zfcp: Fix failed recovery on gone remote port with non-NPIV FCP devices udf: Restore i_lenAlloc when inode expansion fails udf: Fix NULL ptr deref when converting from inline format PM: wakeup: simplify the output logic of pm_show_wakelocks() serial: stm32: fix software flow control transfer tty: n_gsm: fix SW flow control encoding/handling tty: Add support for Brainboxes UC cards. usb-storage: Add unusual-devs entry for VL817 USB-SATA bridge USB: core: Fix hang in usb_kill_urb by adding memory barriers scsi: bnx2fc: Flush destroy_work queue before calling bnx2fc_interface_put() ipv6_tunnel: Rate limit warning messages net: fix information leakage in /proc/net/ptype ipv4: avoid using shared IP generator for connected sockets net-procfs: show net devices bound packet types drm/msm: Fix wrong size calculation hwmon: (lm90) Reduce maximum conversion rate for G781 ipv4: raw: lock the socket in raw_bind() ipv4: tcp: send zero IPID in SYNACK messages Bluetooth: MGMT: Fix misplaced BT_HS check Revert "drm/radeon/ci: disable mclk switching for high refresh rates (v2)" Revert "tc358743: fix register i2c_rd/wr function fix" KVM: x86: Fix misplaced backport of "work around leak of uninitialized stack contents" Input: i8042 - Fix misplaced backport of "add ASUS Zenbook Flip to noselftest list" Linux 4.4.302 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I5191d3cb4df0fa8de60170d2fedf4a3c51380fdf
| | * ipv4: avoid using shared IP generator for connected socketsEric Dumazet2022-02-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 23f57406b82de51809d5812afd96f210f8b627f3 upstream. ip_select_ident_segs() has been very conservative about using the connected socket private generator only for packets with IP_DF set, claiming it was needed for some VJ compression implementations. As mentioned in this referenced document, this can be abused. (Ref: Off-Path TCP Exploits of the Mixed IPID Assignment) Before switching to pure random IPID generation and possibly hurt some workloads, lets use the private inet socket generator. Not only this will remove one vulnerability, this will also improve performance of TCP flows using pmtudisc==IP_PMTUDISC_DONT Fixes: 73f156a6e8c1 ("inetpeer: get rid of ip_id_count") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reported-by: Ray Che <xijiache@gmail.com> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| | * net: fix information leakage in /proc/net/ptypeCongyu Liu2022-02-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 47934e06b65637c88a762d9c98329ae6e3238888 upstream. In one net namespace, after creating a packet socket without binding it to a device, users in other net namespaces can observe the new `packet_type` added by this packet socket by reading `/proc/net/ptype` file. This is minor information leakage as packet socket is namespace aware. Add a net pointer in `packet_type` to keep the net namespace of of corresponding packet socket. In `ptype_seq_show`, this net pointer must be checked when it is not NULL. Fixes: 2feb27dbe00c ("[NETNS]: Minor information leak via /proc/net/ptype file.") Signed-off-by: Congyu Liu <liu3101@purdue.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
| * | Merge 4.4.300 into android-4.4-pGreg Kroah-Hartman2022-01-27
| |\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.300 Bluetooth: bfusb: fix division by zero in send path USB: core: Fix bug in resuming hub's handling of wakeup requests USB: Fix "slab-out-of-bounds Write" bug in usb_hcd_poll_rh_status mfd: intel-lpss: Fix too early PM enablement in the ACPI ->probe() can: gs_usb: fix use of uninitialized variable, detach device on reception of invalid USB data can: gs_usb: gs_can_start_xmit(): zero-initialize hf->{flags,reserved} drm/i915: Avoid bitwise vs logical OR warning in snb_wm_latency_quirk() media: uvcvideo: fix division by zero at stream start rtlwifi: rtl8192cu: Fix WARNING when calling local_irq_restore() with interrupts enabled HID: uhid: Fix worker destroying device without any protection nfc: llcp: fix NULL error pointer dereference on sendmsg() after failed bind() rtc: cmos: take rtc_lock while reading from CMOS media: mceusb: fix control-message timeouts media: em28xx: fix control-message timeouts media: dib0700: fix undefined behavior in tuner shutdown media: pvrusb2: fix control-message timeouts media: stk1160: fix control-message timeouts can: softing_cs: softingcs_probe(): fix memleak on registration failure PCI: Add function 1 DMA alias quirk for Marvell 88SE9125 SATA controller Bluetooth: cmtp: fix possible panic when cmtp_init_sockets() fails Bluetooth: stop proccessing malicious adv data crypto: qce - fix uaf on qce_ahash_register_one tty: serial: atmel: Check return code of dmaengine_submit() tty: serial: atmel: Call dma_async_issue_pending() netfilter: bridge: add support for pppoe filtering arm64: dts: qcom: msm8916: fix MMC controller aliases drm/amdgpu: Fix a NULL pointer dereference in amdgpu_connector_lcd_native_mode() serial: amba-pl011: do not request memory region twice floppy: Fix hang in watchdog when disk is ejected media: dib8000: Fix a memleak in dib8000_init() media: saa7146: mxb: Fix a NULL pointer dereference in mxb_attach() media: msi001: fix possible null-ptr-deref in msi001_probe() usb: ftdi-elan: fix memory leak on device disconnect pcmcia: rsrc_nonstatic: Fix a NULL pointer dereference in __nonstatic_find_io_region() pcmcia: rsrc_nonstatic: Fix a NULL pointer dereference in nonstatic_find_mem_region() ppp: ensure minimum packet size in ppp_write() spi: spi-meson-spifc: Add missing pm_runtime_disable() in meson_spifc_probe can: softing: softing_startstop(): fix set but not used variable warning can: xilinx_can: xcan_probe(): check for error irq pcmcia: fix setting of kthread task states net: mcs7830: handle usb read errors properly ext4: avoid trim error on fs with small groups ALSA: jack: Add missing rwsem around snd_ctl_remove() calls ALSA: PCM: Add missing rwsem around snd_ctl_remove() calls ALSA: hda: Add missing rwsem around snd_ctl_remove() calls powerpc/prom_init: Fix improper check of prom_getprop() ALSA: oss: fix compile error when OSS_DEBUG is enabled char/mwave: Adjust io port register size RDMA/core: Let ib_find_gid() continue search even after empty entry dmaengine: pxa/mmp: stop referencing config->slave_id ASoC: samsung: idma: Check of ioremap return value misc: lattice-ecp3-config: Fix task hung when firmware load failed mips: lantiq: add support for clk_set_parent() mips: bcm63xx: add support for clk_set_parent() RDMA/cxgb4: Set queue pair state when being queried Bluetooth: Fix debugfs entry leak in hci_register_dev() fs: dlm: filter user dlm messages for kernel locks ar5523: Fix null-ptr-deref with unexpected WDCMSG_TARGET_START reply usb: gadget: f_fs: Use stream_open() for endpoint files media: b2c2: Add missing check in flexcop_pci_isr: HSI: core: Fix return freed object in hsi_new_client mwifiex: Fix skb_over_panic in mwifiex_usb_recv() floppy: Add max size check for user space request media: saa7146: hexium_orion: Fix a NULL pointer dereference in hexium_attach() media: m920x: don't use stack on USB reads iwlwifi: mvm: synchronize with FW after multicast commands net: bonding: debug: avoid printing debug logs when bond is not notifying peers media: igorplugusb: receiver overflow should be reported media: saa7146: hexium_gemini: Fix a NULL pointer dereference in hexium_attach() usb: hub: Add delay for SuperSpeed hub resume to let links transit to U0 ath9k: Fix out-of-bound memcpy in ath9k_hif_usb_rx_stream um: registers: Rename function names to avoid conflicts and build problems ACPICA: Utilities: Avoid deleting the same object twice in a row ACPICA: Executer: Fix the REFCLASS_REFOF case in acpi_ex_opcode_1A_0T_1R() btrfs: remove BUG_ON() in find_parent_nodes() btrfs: remove BUG_ON(!eie) in find_parent_nodes net: mdio: Demote probed message to debug print dm btree: add a defensive bounds check to insert_at() dm space map common: add bounds check to sm_ll_lookup_bitmap() serial: pl010: Drop CR register reset on set_termios serial: core: Keep mctrl register state and cached copy in sync parisc: Avoid calling faulthandler_disabled() twice powerpc/6xx: add missing of_node_put powerpc/powernv: add missing of_node_put powerpc/cell: add missing of_node_put powerpc/btext: add missing of_node_put i2c: i801: Don't silently correct invalid transfer size powerpc/smp: Move setup_profiling_timer() under CONFIG_PROFILING i2c: mpc: Correct I2C reset procedure w1: Misuse of get_user()/put_user() reported by sparse ALSA: seq: Set upper limit of processed events i2c: designware-pci: Fix to change data types of hcnt and lcnt parameters MIPS: Octeon: Fix build errors using clang scsi: sr: Don't use GFP_DMA power: bq25890: Enable continuous conversion for ADC at charging ubifs: Error path in ubifs_remount_rw() seems to wrongly free write buffers ext4: set csum seed in tmp inode while migrating to extents ext4: Fix BUG_ON in ext4_bread when write quota data ext4: don't use the orphan list when migrating an inode powerpc/fsl/dts: Enable WA for erratum A-009885 on fman3l MDIO buses net/fsl: xgmac_mdio: Fix incorrect iounmap when removing module parisc: pdc_stable: Fix memory leak in pdcs_register_pathentries af_unix: annote lockless accesses to unix_tot_inflight & gc_in_progress net: axienet: Wait for PhyRstCmplt after core reset net: axienet: fix number of TX ring slots for available check netns: add schedule point in ops_exit_list() dmaengine: at_xdmac: Don't start transactions at tx_submit level dmaengine: at_xdmac: Print debug message after realeasing the lock dmaengine: at_xdmac: Fix lld view setting dmaengine: at_xdmac: Fix at_xdmac_lld struct definition net_sched: restore "mpu xxx" handling bcmgenet: add WOL IRQ check lib82596: Fix IRQ check in sni_82596_probe Linux 4.4.300 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ic6c59dd0f4ed703fff49584b3774d39e4548af4a
| | * net_sched: restore "mpu xxx" handlingKevin Bracey2022-01-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit fb80445c438c78b40b547d12b8d56596ce4ccfeb upstream. commit 56b765b79e9a ("htb: improved accuracy at high rates") broke "overhead X", "linklayer atm" and "mpu X" attributes. "overhead X" and "linklayer atm" have already been fixed. This restores the "mpu X" handling, as might be used by DOCSIS or Ethernet shaping: tc class add ... htb rate X overhead 4 mpu 64 The code being fixed is used by htb, tbf and act_police. Cake has its own mpu handling. qdisc_calculate_pkt_len still uses the size table containing values adjusted for mpu by user space. iproute2 tc has always passed mpu into the kernel via a tc_ratespec structure, but the kernel never directly acted on it, merely stored it so that it could be read back by `tc class show`. Rather, tc would generate length-to-time tables that included the mpu (and linklayer) in their construction, and the kernel used those tables. Since v3.7, the tables were no longer used. Along with "mpu", this also broke "overhead" and "linklayer" which were fixed in 01cb71d2d47b ("net_sched: restore "overhead xxx" handling", v3.10) and 8a8e3d84b171 ("net_sched: restore "linklayer atm" handling", v3.11). "overhead" was fixed by simply restoring use of tc_ratespec::overhead - this had originally been used by the kernel but was initially omitted from the new non-table-based calculations. "linklayer" had been handled in the table like "mpu", but the mode was not originally passed in tc_ratespec. The new implementation was made to handle it by getting new versions of tc to pass the mode in an extended tc_ratespec, and for older versions of tc the table contents were analysed at load time to deduce linklayer. As "mpu" has always been given to the kernel in tc_ratespec, accompanying the mpu-based table, we can restore system functionality with no userspace change by making the kernel act on the tc_ratespec value. Fixes: 56b765b79e9a ("htb: improved accuracy at high rates") Signed-off-by: Kevin Bracey <kevin@bracey.fi> Cc: Eric Dumazet <edumazet@google.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Vimalkumar <j.vimal@gmail.com> Link: https://lore.kernel.org/r/20220112170210.1014351-1-kevin@bracey.fi Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | Merge 4.4.298 into android-4.4-pGreg Kroah-Hartman2022-01-05
| |\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.298 platform/x86: apple-gmux: use resource_size() with res recordmcount.pl: fix typo in s390 mcount regex selinux: initialize proto variable in selinux_ip_postroute_compat() nfc: uapi: use kernel size_t to fix user-space builds uapi: fix linux/nfc.h userspace compilation errors xhci: Fresco FL1100 controller should not have BROKEN_MSI quirk set. usb: gadget: f_fs: Clear ffs_eventfd in ffs_data_clear. scsi: vmw_pvscsi: Set residual data length conditionally Input: appletouch - initialize work before device registration Input: spaceball - fix parsing of movement data packets net: fix use-after-free in tw_timer_handler Linux 4.4.298 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: If5baf377c3d2fb89e244b005a47ddaccaff9e9f9
| | * uapi: fix linux/nfc.h userspace compilation errorsDmitry V. Levin2022-01-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 7175f02c4e5f5a9430113ab9ca0fd0ce98b28a51 upstream. Replace sa_family_t with __kernel_sa_family_t to fix the following linux/nfc.h userspace compilation errors: /usr/include/linux/nfc.h:266:2: error: unknown type name 'sa_family_t' sa_family_t sa_family; /usr/include/linux/nfc.h:274:2: error: unknown type name 'sa_family_t' sa_family_t sa_family; Fixes: 23b7869c0fd0 ("NFC: add the NFC socket raw protocol") Fixes: d646960f7986 ("NFC: Initial LLCP support") Cc: <stable@vger.kernel.org> Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| | * nfc: uapi: use kernel size_t to fix user-space buildsKrzysztof Kozlowski2022-01-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 79b69a83705e621b258ac6d8ae6d3bfdb4b930aa upstream. Fix user-space builds if it includes /usr/include/linux/nfc.h before some of other headers: /usr/include/linux/nfc.h:281:9: error: unknown type name ‘size_t’ 281 | size_t service_name_len; | ^~~~~~ Fixes: d646960f7986 ("NFC: Initial LLCP support") Cc: <stable@vger.kernel.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | Merge tag 'LA.UM.9.2.r1-03700-SDMxx0.0' of ↵Michael Bestas2021-12-27
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://source.codeaurora.org/quic/la/kernel/msm-4.4 into lineage-18.1-caf-msm8998 "LA.UM.9.2.r1-03700-SDMxx0.0" * tag 'LA.UM.9.2.r1-03700-SDMxx0.0' of https://source.codeaurora.org/quic/la/kernel/msm-4.4: msm: kgsl: Fix out of bound write in adreno_profile_submit_time uapi: Add UAPI headers for slatecom_interface driver soc: qcom: Add check to handle out of bound access msm: adsprpc: Handle UAF in process shell memory Change-Id: I7dcf42763390a7a156c41a9c08a9a3d653b7f0f2
| * | | uapi: Add UAPI headers for slatecom_interface driverKiran Gunda2021-09-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add slatecom_interface header file in gen_headers to make it accessible from userspace modules who uses Android.bp files for compilation. Change-Id: Ie298ec28983c16999d941ee8667e2dd7b5c3db22 Signed-off-by: Kiran Gunda <kgunda@codeaurora.org>
* | | | Merge remote-tracking branch 'common/android-4.4-p' into ↵Michael Bestas2021-12-27
|\ \ \ \ | | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lineage-18.1-caf-msm8998 * common/android-4.4-p: Linux 4.4.296 xen/netback: don't queue unlimited number of packages xen/console: harden hvc_xen against event channel storms xen/netfront: harden netfront against event channel storms xen/blkfront: harden blkfront against event channel storms Input: touchscreen - avoid bitwise vs logical OR warning ARM: 8805/2: remove unneeded naked function usage net: lan78xx: Avoid unnecessary self assignment net: systemport: Add global locking for descriptor lifecycle timekeeping: Really make sure wall_to_monotonic isn't positive USB: serial: option: add Telit FN990 compositions PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error USB: gadget: bRequestType is a bitfield, not a enum igbvf: fix double free in `igbvf_probe` soc/tegra: fuse: Fix bitwise vs. logical OR warning nfsd: fix use-after-free due to delegation race dm btree remove: fix use after free in rebalance_children() recordmcount.pl: look for jgnop instruction as well as bcrl on s390 mac80211: send ADDBA requests using the tid/queue of the aggregation session hwmon: (dell-smm) Fix warning on /proc/i8k creation error net: netlink: af_netlink: Prevent empty skb by adding a check on len. i2c: rk3x: Handle a spurious start completion interrupt flag parisc/agp: Annotate parisc agp init functions with __init nfc: fix segfault in nfc_genl_dump_devices_done FROMGIT: USB: gadget: bRequestType is a bitfield, not a enum Linux 4.4.295 irqchip: nvic: Fix offset for Interrupt Priority Offsets irqchip/irq-gic-v3-its.c: Force synchronisation when issuing INVALL iio: accel: kxcjk-1013: Fix possible memory leak in probe and remove iio: itg3200: Call iio_trigger_notify_done() on error iio: ltr501: Don't return error code in trigger handler iio: mma8452: Fix trigger reference couting iio: stk3310: Don't return error code in interrupt handler usb: core: config: fix validation of wMaxPacketValue entries USB: gadget: zero allocate endpoint 0 buffers USB: gadget: detect too-big endpoint 0 requests net/qla3xxx: fix an error code in ql_adapter_up() net, neigh: clear whole pneigh_entry at alloc time net: fec: only clear interrupt of handling queue in fec_enet_rx_queue() net: altera: set a couple error code in probe() net: cdc_ncm: Allow for dwNtbOutMaxSize to be unset or zero block: fix ioprio_get(IOPRIO_WHO_PGRP) vs setuid(2) tracefs: Set all files to the same group ownership as the mount option signalfd: use wake_up_pollfree() binder: use wake_up_pollfree() wait: add wake_up_pollfree() libata: add horkage for ASMedia 1092 can: pch_can: pch_can_rx_normal: fix use after free tracefs: Have new files inherit the ownership of their parent ALSA: pcm: oss: Handle missing errors in snd_pcm_oss_change_params*() ALSA: pcm: oss: Limit the period size to 16MB ALSA: pcm: oss: Fix negative period/buffer sizes ALSA: ctl: Fix copy of updated id with element read/write mm: bdi: initialize bdi_min_ratio when bdi is unregistered nfc: fix potential NULL pointer deref in nfc_genl_dump_ses_done can: sja1000: fix use after free in ems_pcmcia_add_card() HID: check for valid USB device for many HID drivers HID: wacom: fix problems when device is not a valid USB device HID: add USB_HID dependancy on some USB HID drivers HID: add USB_HID dependancy to hid-chicony HID: add USB_HID dependancy to hid-prodikeys HID: add hid_is_usb() function to make it simpler for USB detection HID: introduce hid_is_using_ll_driver UPSTREAM: USB: gadget: zero allocate endpoint 0 buffers UPSTREAM: USB: gadget: detect too-big endpoint 0 requests Linux 4.4.294 serial: pl011: Add ACPI SBSA UART match id tty: serial: msm_serial: Deactivate RX DMA for polling support vgacon: Propagate console boot parameters before calling `vc_resize' parisc: Fix "make install" on newer debian releases siphash: use _unaligned version by default net: qlogic: qlcnic: Fix a NULL pointer dereference in qlcnic_83xx_add_rings() natsemi: xtensa: fix section mismatch warnings fget: check that the fd still exists after getting a ref to it fs: add fget_many() and fput_many() sata_fsl: fix warning in remove_proc_entry when rmmod sata_fsl sata_fsl: fix UAF in sata_fsl_port_stop when rmmod sata_fsl kprobes: Limit max data_size of the kretprobe instances net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock() net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be out of bound scsi: iscsi: Unblock session then wake up error handler s390/setup: avoid using memblock_enforce_memory_limit platform/x86: thinkpad_acpi: Fix WWAN device disabled issue after S3 deep net: return correct error code hugetlb: take PMD sharing into account when flushing tlb/caches tty: hvc: replace BUG_ON() with negative return value xen/netfront: don't trust the backend response data blindly xen/netfront: disentangle tx_skb_freelist xen/netfront: don't read data from request on the ring page xen/netfront: read response from backend only once xen/blkfront: don't trust the backend response data blindly xen/blkfront: don't take local copy of a request from the ring page xen/blkfront: read response from backend only once xen: sync include/xen/interface/io/ring.h with Xen's newest version shm: extend forced shm destroy to support objects from several IPC nses fuse: release pipe buf after last use fuse: fix page stealing NFC: add NCI_UNREG flag to eliminate the race proc/vmcore: fix clearing user buffer by properly using clear_user() hugetlbfs: flush TLBs correctly after huge_pmd_unshare tracing: Check pid filtering when creating events tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows scsi: mpt3sas: Fix kernel panic during drive powercycle test ARM: socfpga: Fix crash with CONFIG_FORTIRY_SOURCE NFSv42: Don't fail clone() unless the OP_CLONE operation failed net: ieee802154: handle iftypes as u32 ASoC: topology: Add missing rwsem around snd_ctl_remove() calls ARM: dts: BCM5301X: Add interrupt properties to GPIO node xen: detect uninitialized xenbus in xenbus_init xen: don't continue xenstore initialization in case of errors staging: rtl8192e: Fix use after free in _rtl92e_pci_disconnect() ALSA: ctxfi: Fix out-of-range access binder: fix test regression due to sender_euid change usb: hub: Fix locking issues with address0_mutex usb: hub: Fix usb enumeration issue due to address0 race USB: serial: option: add Fibocom FM101-GL variants USB: serial: option: add Telit LE910S1 0x9200 composition staging: ion: Prevent incorrect reference counting behavour Change-Id: Iadf9f213915d2a02b27ceb3b2144eac827ade329
| * | | Merge 4.4.295 into android-4.4-pGreg Kroah-Hartman2021-12-14
| |\ \ \ | | | |/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.295 HID: introduce hid_is_using_ll_driver HID: add hid_is_usb() function to make it simpler for USB detection HID: add USB_HID dependancy to hid-prodikeys HID: add USB_HID dependancy to hid-chicony HID: add USB_HID dependancy on some USB HID drivers HID: wacom: fix problems when device is not a valid USB device HID: check for valid USB device for many HID drivers can: sja1000: fix use after free in ems_pcmcia_add_card() nfc: fix potential NULL pointer deref in nfc_genl_dump_ses_done mm: bdi: initialize bdi_min_ratio when bdi is unregistered ALSA: ctl: Fix copy of updated id with element read/write ALSA: pcm: oss: Fix negative period/buffer sizes ALSA: pcm: oss: Limit the period size to 16MB ALSA: pcm: oss: Handle missing errors in snd_pcm_oss_change_params*() tracefs: Have new files inherit the ownership of their parent can: pch_can: pch_can_rx_normal: fix use after free libata: add horkage for ASMedia 1092 wait: add wake_up_pollfree() binder: use wake_up_pollfree() signalfd: use wake_up_pollfree() tracefs: Set all files to the same group ownership as the mount option block: fix ioprio_get(IOPRIO_WHO_PGRP) vs setuid(2) net: cdc_ncm: Allow for dwNtbOutMaxSize to be unset or zero net: altera: set a couple error code in probe() net: fec: only clear interrupt of handling queue in fec_enet_rx_queue() net, neigh: clear whole pneigh_entry at alloc time net/qla3xxx: fix an error code in ql_adapter_up() USB: gadget: detect too-big endpoint 0 requests USB: gadget: zero allocate endpoint 0 buffers usb: core: config: fix validation of wMaxPacketValue entries iio: stk3310: Don't return error code in interrupt handler iio: mma8452: Fix trigger reference couting iio: ltr501: Don't return error code in trigger handler iio: itg3200: Call iio_trigger_notify_done() on error iio: accel: kxcjk-1013: Fix possible memory leak in probe and remove irqchip/irq-gic-v3-its.c: Force synchronisation when issuing INVALL irqchip: nvic: Fix offset for Interrupt Priority Offsets Linux 4.4.295 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I4810f7e2ebd538739fb0d62f9eade0921a9badfb
| | * | wait: add wake_up_pollfree()Eric Biggers2021-12-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 42288cb44c4b5fff7653bc392b583a2b8bd6a8c0 upstream. Several ->poll() implementations are special in that they use a waitqueue whose lifetime is the current task, rather than the struct file as is normally the case. This is okay for blocking polls, since a blocking poll occurs within one task; however, non-blocking polls require another solution. This solution is for the queue to be cleared before it is freed, using 'wake_up_poll(wq, EPOLLHUP | POLLFREE);'. However, that has a bug: wake_up_poll() calls __wake_up() with nr_exclusive=1. Therefore, if there are multiple "exclusive" waiters, and the wakeup function for the first one returns a positive value, only that one will be called. That's *not* what's needed for POLLFREE; POLLFREE is special in that it really needs to wake up everyone. Considering the three non-blocking poll systems: - io_uring poll doesn't handle POLLFREE at all, so it is broken anyway. - aio poll is unaffected, since it doesn't support exclusive waits. However, that's fragile, as someone could add this feature later. - epoll doesn't appear to be broken by this, since its wakeup function returns 0 when it sees POLLFREE. But this is fragile. Although there is a workaround (see epoll), it's better to define a function which always sends POLLFREE to all waiters. Add such a function. Also make it verify that the queue really becomes empty after all waiters have been woken up. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211209010455.42744-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| | * | HID: add hid_is_usb() function to make it simpler for USB detectionGreg Kroah-Hartman2021-12-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit f83baa0cb6cfc92ebaf7f9d3a99d7e34f2e77a8a upstream. A number of HID drivers already call hid_is_using_ll_driver() but only for the detection of if this is a USB device or not. Make this more obvious by creating hid_is_usb() and calling the function that way. Also converts the existing hid_is_using_ll_driver() functions to use the new call. Cc: Jiri Kosina <jikos@kernel.org> Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com> Cc: linux-input@vger.kernel.org Cc: stable@vger.kernel.org Tested-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20211201183503.2373082-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| | * | HID: introduce hid_is_using_ll_driverJason Gerecke2021-12-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit fc2237a724a9e448599076d7d23497f51e2f7441 upstream. Although HID itself is transport-agnostic, occasionally a driver may want to interact with the low-level transport that a device is connected through. To do this, we need to know what kind of bus is in use. The first guess may be to look at the 'bus' field of the 'struct hid_device', but this field may be emulated in some cases (e.g. uhid). More ideally, we can check which ll_driver a device is using. This function introduces a 'hid_is_using_ll_driver' function and makes the 'struct hid_ll_driver' of the four most common transports accessible through hid.h. Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com> Acked-By: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | Merge 4.4.294 into android-4.4-pGreg Kroah-Hartman2021-12-08
| |\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.294 staging: ion: Prevent incorrect reference counting behavour USB: serial: option: add Telit LE910S1 0x9200 composition USB: serial: option: add Fibocom FM101-GL variants usb: hub: Fix usb enumeration issue due to address0 race usb: hub: Fix locking issues with address0_mutex binder: fix test regression due to sender_euid change ALSA: ctxfi: Fix out-of-range access staging: rtl8192e: Fix use after free in _rtl92e_pci_disconnect() xen: don't continue xenstore initialization in case of errors xen: detect uninitialized xenbus in xenbus_init ARM: dts: BCM5301X: Add interrupt properties to GPIO node ASoC: topology: Add missing rwsem around snd_ctl_remove() calls net: ieee802154: handle iftypes as u32 NFSv42: Don't fail clone() unless the OP_CLONE operation failed ARM: socfpga: Fix crash with CONFIG_FORTIRY_SOURCE scsi: mpt3sas: Fix kernel panic during drive powercycle test tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows tracing: Check pid filtering when creating events hugetlbfs: flush TLBs correctly after huge_pmd_unshare proc/vmcore: fix clearing user buffer by properly using clear_user() NFC: add NCI_UNREG flag to eliminate the race fuse: fix page stealing fuse: release pipe buf after last use shm: extend forced shm destroy to support objects from several IPC nses xen: sync include/xen/interface/io/ring.h with Xen's newest version xen/blkfront: read response from backend only once xen/blkfront: don't take local copy of a request from the ring page xen/blkfront: don't trust the backend response data blindly xen/netfront: read response from backend only once xen/netfront: don't read data from request on the ring page xen/netfront: disentangle tx_skb_freelist xen/netfront: don't trust the backend response data blindly tty: hvc: replace BUG_ON() with negative return value hugetlb: take PMD sharing into account when flushing tlb/caches net: return correct error code platform/x86: thinkpad_acpi: Fix WWAN device disabled issue after S3 deep s390/setup: avoid using memblock_enforce_memory_limit scsi: iscsi: Unblock session then wake up error handler net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be out of bound net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock() kprobes: Limit max data_size of the kretprobe instances sata_fsl: fix UAF in sata_fsl_port_stop when rmmod sata_fsl sata_fsl: fix warning in remove_proc_entry when rmmod sata_fsl fs: add fget_many() and fput_many() fget: check that the fd still exists after getting a ref to it natsemi: xtensa: fix section mismatch warnings net: qlogic: qlcnic: Fix a NULL pointer dereference in qlcnic_83xx_add_rings() siphash: use _unaligned version by default parisc: Fix "make install" on newer debian releases vgacon: Propagate console boot parameters before calling `vc_resize' tty: serial: msm_serial: Deactivate RX DMA for polling support serial: pl011: Add ACPI SBSA UART match id Linux 4.4.294 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Id3cafc33da957a0501bcf61d000025167d552797
| | * | siphash: use _unaligned version by defaultArnd Bergmann2021-12-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit f7e5b9bfa6c8820407b64eabc1f29c9a87e8993d upstream. On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS because the ordinary load/store instructions (ldr, ldrh, ldrb) can tolerate any misalignment of the memory address. However, load/store double and load/store multiple instructions (ldrd, ldm) may still only be used on memory addresses that are 32-bit aligned, and so we have to use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we may end up with a severe performance hit due to alignment traps that require fixups by the kernel. Testing shows that this currently happens with clang-13 but not gcc-11. In theory, any compiler version can produce this bug or other problems, as we are dealing with undefined behavior in C99 even on architectures that support this in hardware, see also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363. Fortunately, the get_unaligned() accessors do the right thing: when building for ARMv6 or later, the compiler will emit unaligned accesses using the ordinary load/store instructions (but avoid the ones that require 32-bit alignment). When building for older ARM, those accessors will emit the appropriate sequence of ldrb/mov/orr instructions. And on architectures that can truly tolerate any kind of misalignment, the get_unaligned() accessors resolve to the leXX_to_cpup accessors that operate on aligned addresses. Since the compiler will in fact emit ldrd or ldm instructions when building this code for ARM v6 or later, the solution is to use the unaligned accessors unconditionally on architectures where this is known to be fast. The _aligned version of the hash function is however still needed to get the best performance on architectures that cannot do any unaligned access in hardware. This new version avoids the undefined behavior and should produce the fastest hash on all architectures we support. Link: https://lore.kernel.org/linux-arm-kernel/20181008211554.5355-4-ard.biesheuvel@linaro.org/ Link: https://lore.kernel.org/linux-crypto/CAK8P3a2KfmmGDbVHULWevB0hv71P2oi2ZCHEAqT=8dQfa0=cqQ@mail.gmail.com/ Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Fixes: 2c956a60778c ("siphash: add cryptographically secure PRF") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| | * | fs: add fget_many() and fput_many()Jens Axboe2021-12-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 091141a42e15fe47ada737f3996b317072afcefb upstream. Some uses cases repeatedly get and put references to the same file, but the only exposed interface is doing these one at the time. As each of these entail an atomic inc or dec on a shared structure, that cost can add up. Add fget_many(), which works just like fget(), except it takes an argument for how many references to get on the file. Ditto fput_many(), which can drop an arbitrary number of references to a file. Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| | * | kprobes: Limit max data_size of the kretprobe instancesMasami Hiramatsu2021-12-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 6bbfa44116689469267f1a6e3d233b52114139d2 upstream. The 'kprobe::data_size' is unsigned, thus it can not be negative. But if user sets it enough big number (e.g. (size_t)-8), the result of 'data_size + sizeof(struct kretprobe_instance)' becomes smaller than sizeof(struct kretprobe_instance) or zero. In result, the kretprobe_instance are allocated without enough memory, and kretprobe accesses outside of allocated memory. To avoid this issue, introduce a max limitation of the kretprobe::data_size. 4KB per instance should be OK. Link: https://lkml.kernel.org/r/163836995040.432120.10322772773821182925.stgit@devnote2 Cc: stable@vger.kernel.org Fixes: f47cd9b553aa ("kprobes: kretprobe user entry-handler") Reported-by: zhangyue <zhangyue1@kylinos.cn> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>