summaryrefslogtreecommitdiff
path: root/net/ipv4/syncookies.c (follow)
Commit message (Collapse)AuthorAge
* Merge remote-tracking branch 'msm8998/lineage-20' into lineage-20Raghuram Subramani2024-10-17
| | | | Change-Id: I126075a330f305c85f8fe1b8c9d408f368be95d1
* Merge 4.4.244 into android-4.4-pGreg Kroah-Hartman2020-11-18
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.244 ring-buffer: Fix recursion protection transitions between interrupt context gfs2: Wake up when sd_glock_disposal becomes zero mm: mempolicy: fix potential pte_unmap_unlock pte error time: Prevent undefined behaviour in timespec64_to_ns() btrfs: reschedule when cloning lots of extents net: xfrm: fix a race condition during allocing spi perf tools: Add missing swap for ino_generation ALSA: hda: prevent undefined shift in snd_hdac_ext_bus_get_link() can: dev: can_get_echo_skb(): prevent call to kfree_skb() in hard IRQ context can: dev: __can_get_echo_skb(): fix real payload length return value for RTR frames can: can_create_echo_skb(): fix echo skb generation: always use skb_clone() can: peak_usb: add range checking in decode operations can: peak_usb: peak_usb_get_ts_time(): fix timestamp wrapping Btrfs: fix missing error return if writeback for extent buffer never started pinctrl: devicetree: Avoid taking direct reference to device name string i40e: Wrong truncation from u16 to u8 i40e: Fix of memory leak and integer truncation in i40e_virtchnl.c geneve: add transport ports in route lookup for geneve ath9k_htc: Use appropriate rs_datalen type usb: gadget: goku_udc: fix potential crashes in probe gfs2: Free rd_bits later in gfs2_clear_rgrpd to fix use-after-free gfs2: check for live vs. read-only file system in gfs2_fitrim drm/amdgpu: perform srbm soft reset always on SDMA resume mac80211: fix use of skb payload instead of header cfg80211: regulatory: Fix inconsistent format argument iommu/amd: Increase interrupt remapping table limit to 512 entries xfs: fix a missing unlock on error in xfs_fs_map_blocks of/address: Fix of_node memory leak in of_dma_is_coherent cosa: Add missing kfree in error path of cosa_write perf: Fix get_recursion_context() ext4: correctly report "not supported" for {usr,grp}jquota when !CONFIG_QUOTA ext4: unlock xattr_sem properly in ext4_inline_data_truncate() usb: cdc-acm: Add DISABLE_ECHO for Renesas USB Download mode mei: protect mei_cl_mtu from null dereference ocfs2: initialize ip_next_orphan don't dump the threads that had been already exiting when zapped. drm/gma500: Fix out-of-bounds access to struct drm_device.vblank[] pinctrl: amd: use higher precision for 512 RtcClk pinctrl: amd: fix incorrect way to disable debounce filter swiotlb: fix "x86: Don't panic if can not alloc buffer for swiotlb" IPv6: Set SIT tunnel hard_header_len to zero net/af_iucv: fix null pointer dereference on shutdown net/x25: Fix null-ptr-deref in x25_connect net: Update window_clamp if SOCK_RCVBUF is set random32: make prandom_u32() output unpredictable x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP xen/events: avoid removing an event channel while handling it xen/events: add a proper barrier to 2-level uevent unmasking xen/events: fix race in evtchn_fifo_unmask() xen/events: add a new "late EOI" evtchn framework xen/blkback: use lateeoi irq binding xen/netback: use lateeoi irq binding xen/scsiback: use lateeoi irq binding xen/pciback: use lateeoi irq binding xen/events: switch user event channels to lateeoi model xen/events: use a common cpu hotplug hook for event channels xen/events: defer eoi in case of excessive number of events xen/events: block rogue events for some time perf/core: Fix race in the perf_mmap_close() function Revert "kernel/reboot.c: convert simple_strtoul to kstrtoint" reboot: fix overflow parsing reboot cpu number ext4: fix leaking sysfs kobject after failed mount Convert trailing spaces and periods in path components Linux 4.4.244 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I70bf4c5ac9248a8ca3383b9b0c4871729606e75e
| * net: Update window_clamp if SOCK_RCVBUF is setMao Wenan2020-11-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 909172a149749242990a6e64cb55d55460d4e417 ] When net.ipv4.tcp_syncookies=1 and syn flood is happened, cookie_v4_check or cookie_v6_check tries to redo what tcp_v4_send_synack or tcp_v6_send_synack did, rsk_window_clamp will be changed if SOCK_RCVBUF is set, which will make rcv_wscale is different, the client still operates with initial window scale and can overshot granted window, the client use the initial scale but local server use new scale to advertise window value, and session work abnormally. Fixes: e88c64f0a425 ("tcp: allow effective reduction of TCP's rcv-buffer via setsockopt") Signed-off-by: Mao Wenan <wenan.mao@linux.alibaba.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/1604967391-123737-1-git-send-email-wenan.mao@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge 4.4.177 into android-4.4-pGreg Kroah-Hartman2019-03-23
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.177 ceph: avoid repeatedly adding inode to mdsc->snap_flush_list numa: change get_mempolicy() to use nr_node_ids instead of MAX_NUMNODES KEYS: allow reaching the keys quotas exactly mfd: ti_am335x_tscadc: Use PLATFORM_DEVID_AUTO while registering mfd cells mfd: twl-core: Fix section annotations on {,un}protect_pm_master mfd: db8500-prcmu: Fix some section annotations mfd: ab8500-core: Return zero in get_register_interruptible() mfd: qcom_rpm: write fw_version to CTRL_REG mfd: wm5110: Add missing ASRC rate register mfd: mc13xxx: Fix a missing check of a register-read failure net: hns: Fix use after free identified by SLUB debug MIPS: ath79: Enable OF serial ports in the default config scsi: qla4xxx: check return code of qla4xxx_copy_from_fwddb_param scsi: isci: initialize shost fully before calling scsi_add_host() MIPS: jazz: fix 64bit build isdn: i4l: isdn_tty: Fix some concurrency double-free bugs atm: he: fix sign-extension overflow on large shift leds: lp5523: fix a missing check of return value of lp55xx_read isdn: avm: Fix string plus integer warning from Clang RDMA/srp: Rework SCSI device reset handling KEYS: user: Align the payload buffer KEYS: always initialize keyring_index_key::desc_len batman-adv: fix uninit-value in batadv_interface_tx() net/packet: fix 4gb buffer limit due to overflow check team: avoid complex list operations in team_nl_cmd_options_set() sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach() net/mlx4_en: Force CHECKSUM_NONE for short ethernet frames ARCv2: Enable unaligned access in early ASM code Revert "bridge: do not add port to router list when receives query with source 0.0.0.0" libceph: handle an empty authorize reply scsi: libsas: Fix rphy phy_identifier for PHYs with end devices attached drm/msm: Unblock writer if reader closes file ASoC: Intel: Haswell/Broadwell: fix setting for .dynamic field ALSA: compress: prevent potential divide by zero bugs thermal: int340x_thermal: Fix a NULL vs IS_ERR() check usb: dwc3: gadget: Fix the uninitialized link_state when udc starts usb: gadget: Potential NULL dereference on allocation error ASoC: dapm: change snprintf to scnprintf for possible overflow ASoC: imx-audmux: change snprintf to scnprintf for possible overflow ARC: fix __ffs return value to avoid build warnings mac80211: fix miscounting of ttl-dropped frames serial: fsl_lpuart: fix maximum acceptable baud rate with over-sampling scsi: csiostor: fix NULL pointer dereference in csio_vport_set_state() net: altera_tse: fix connect_local_phy error path ibmveth: Do not process frames after calling napi_reschedule mac80211: don't initiate TDLS connection if station is not associated to AP cfg80211: extend range deviation for DMG KVM: nSVM: clear events pending from svm_complete_interrupts() when exiting to L1 arm/arm64: KVM: Feed initialized memory to MMIO accesses KVM: arm/arm64: Fix MMIO emulation data handling powerpc: Always initialize input array when calling epapr_hypercall() mmc: spi: Fix card detection during probe mm: enforce min addr even if capable() in expand_downwards() x86/uaccess: Don't leak the AC flag into __put_user() value evaluation USB: serial: option: add Telit ME910 ECM composition USB: serial: cp210x: add ID for Ingenico 3070 USB: serial: ftdi_sio: add ID for Hjelmslund Electronics USB485 cpufreq: Use struct kobj_attribute instead of struct global_attr sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names ncpfs: fix build warning of strncpy isdn: isdn_tty: fix build warning of strncpy staging: lustre: fix buffer overflow of string buffer net-sysfs: Fix mem leak in netdev_register_kobject sky2: Disable MSI on Dell Inspiron 1545 and Gateway P-79 team: Free BPF filter when unregistering netdev bnxt_en: Drop oversize TX packets to prevent errors. net: nfc: Fix NULL dereference on nfc_llcp_build_tlv fails xen-netback: fix occasional leak of grant ref mappings under memory pressure net: Add __icmp_send helper. net: avoid use IPCB in cipso_v4_error net: phy: Micrel KSZ8061: link failure after cable connect x86/CPU/AMD: Set the CPB bit unconditionally on F17h applicom: Fix potential Spectre v1 vulnerabilities MIPS: irq: Allocate accurate order pages for irq stack hugetlbfs: fix races and page leaks during migration netlabel: fix out-of-bounds memory accesses net: dsa: mv88e6xxx: Fix u64 statistics ip6mr: Do not call __IP6_INC_STATS() from preemptible context media: uvcvideo: Fix 'type' check leading to overflow vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel perf tools: Handle TOPOLOGY headers with no CPU IB/{hfi1, qib}: Fix WC.byte_len calculation for UD_SEND_WITH_IMM ipvs: Fix signed integer overflow when setsockopt timeout iommu/amd: Fix IOMMU page flush when detach device from a domain xtensa: SMP: fix ccount_timer_shutdown xtensa: SMP: fix secondary CPU initialization xtensa: smp_lx200_defconfig: fix vectors clash xtensa: SMP: mark each possible CPU as present xtensa: SMP: limit number of possible CPUs by NR_CPUS net: altera_tse: fix msgdma_tx_completion on non-zero fill_level case net: hns: Fix wrong read accesses via Clause 45 MDIO protocol net: stmmac: dwmac-rk: fix error handling in rk_gmac_powerup() gpio: vf610: Mask all GPIO interrupts nfs: Fix NULL pointer dereference of dev_name scsi: libfc: free skb when receiving invalid flogi resp platform/x86: Fix unmet dependency warning for SAMSUNG_Q10 cifs: fix computation for MAX_SMB2_HDR_SIZE x86/kexec: Don't setup EFI info if EFI runtime is not enabled x86_64: increase stack size for KASAN_EXTRA mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone mm, memory_hotplug: test_pages_in_a_zone do not pass the end of zone fs/drop_caches.c: avoid softlockups in drop_pagecache_sb() autofs: drop dentry reference only when it is never used autofs: fix error return in autofs_fill_super() ARM: pxa: ssp: unneeded to free devm_ allocated data irqchip/mmp: Only touch the PJ4 IRQ & FIQ bits on enable/disable dmaengine: at_xdmac: Fix wrongfull report of a channel as in use dmaengine: dmatest: Abort test in case of mapping error s390/qeth: fix use-after-free in error path perf symbols: Filter out hidden symbols from labels MIPS: Remove function size check in get_frame_info() Input: wacom_serial4 - add support for Wacom ArtPad II tablet Input: elan_i2c - add id for touchpad found in Lenovo s21e-20 iscsi_ibft: Fix missing break in switch statement futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock() ARM: dts: exynos: Add minimal clkout parameters to Exynos3250 PMU Revert "x86/platform/UV: Use efi_runtime_lock to serialise BIOS calls" ARM: dts: exynos: Do not ignore real-world fuse values for thermal zone 0 on Exynos5420 udplite: call proper backlog handlers netfilter: x_tables: enforce nul-terminated table name from getsockopt GET_ENTRIES netfilter: nfnetlink_log: just returns error for unknown command netfilter: nfnetlink_acct: validate NFACCT_FILTER parameters netfilter: nf_conntrack_tcp: Fix stack out of bounds when parsing TCP options KEYS: restrict /proc/keys by credentials at open time l2tp: fix infoleak in l2tp_ip6_recvmsg() net: hsr: fix memory leak in hsr_dev_finalize() net: sit: fix UBSAN Undefined behaviour in check_6rd net/x25: fix use-after-free in x25_device_event() net/x25: reset state in x25_connect() pptp: dst_release sk_dst_cache in pptp_sock_destruct ravb: Decrease TxFIFO depth of Q3 and Q2 to one route: set the deleted fnhe fnhe_daddr to 0 in ip_del_fnhe to fix a race tcp: handle inet_csk_reqsk_queue_add() failures net/mlx4_core: Fix reset flow when in command polling mode net/mlx4_core: Fix qp mtt size calculation net/x25: fix a race in x25_bind() mdio_bus: Fix use-after-free on device_register fails net: Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255 missing barriers in some of unix_sock ->addr and ->path accesses ipvlan: disallow userns cap_net_admin to change global mode/flags vxlan: test dev->flags & IFF_UP before calling gro_cells_receive() vxlan: Fix GRO cells race condition between receive and link delete net/hsr: fix possible crash in add_timer() gro_cells: make sure device is up in gro_cells_receive() tcp/dccp: remove reqsk_put() from inet_child_forget() ALSA: bebob: use more identical mod_alias for Saffire Pro 10 I/O against Liquid Saffire 56 fs/9p: use fscache mutex rather than spinlock It's wrong to add len to sector_nr in raid10 reshape twice media: videobuf2-v4l2: drop WARN_ON in vb2_warn_zero_bytesused() 9p: use inode->i_lock to protect i_size_write() under 32-bit 9p/net: fix memory leak in p9_client_create ASoC: fsl_esai: fix register setting issue in RIGHT_J mode stm class: Fix an endless loop in channel allocation crypto: caam - fixed handling of sg list crypto: ahash - fix another early termination in hash walk gpu: ipu-v3: Fix i.MX51 CSI control registers offset gpu: ipu-v3: Fix CSI offsets for imx53 s390/dasd: fix using offset into zero size array error ARM: OMAP2+: Variable "reg" in function omap4_dsi_mux_pads() could be uninitialized Input: matrix_keypad - use flush_delayed_work() i2c: cadence: Fix the hold bit setting Input: st-keyscan - fix potential zalloc NULL dereference ARM: 8824/1: fix a migrating irq bug when hotplug cpu assoc_array: Fix shortcut creation scsi: libiscsi: Fix race between iscsi_xmit_task and iscsi_complete_task net: systemport: Fix reception of BPDUs pinctrl: meson: meson8b: fix the sdxc_a data 1..3 pins net: mv643xx_eth: disable clk on error path in mv643xx_eth_shared_probe() ASoC: topology: free created components in tplg load error arm64: Relax GIC version check during early boot tmpfs: fix link accounting when a tmpfile is linked in ARC: uacces: remove lp_start, lp_end from clobber list phonet: fix building with clang mac80211_hwsim: propagate genlmsg_reply return code net: set static variable an initial value in atl2_probe() tmpfs: fix uninitialized return value in shmem_link stm class: Prevent division by zero crypto: arm64/aes-ccm - fix logical bug in AAD MAC handling CIFS: Fix read after write for files with read caching tracing: Do not free iter->trace in fail path of tracing_open_pipe() ACPI / device_sysfs: Avoid OF modalias creation for removed device regulator: s2mps11: Fix steps for buck7, buck8 and LDO35 regulator: s2mpa01: Fix step values for some LDOs clocksource/drivers/exynos_mct: Move one-shot check from tick clear to ISR clocksource/drivers/exynos_mct: Clear timer interrupt when shutdown s390/virtio: handle find on invalid queue gracefully scsi: virtio_scsi: don't send sc payload with tmfs scsi: target/iscsi: Avoid iscsit_release_commands_from_conn() deadlock m68k: Add -ffreestanding to CFLAGS btrfs: ensure that a DUP or RAID1 block group has exactly two stripes Btrfs: fix corruption reading shared and compressed extents after hole punching crypto: pcbc - remove bogus memcpy()s with src == dest cpufreq: tegra124: add missing of_node_put() cpufreq: pxa2xx: remove incorrect __init annotation ext4: fix crash during online resizing ext2: Fix underflow in ext2_max_size() clk: ingenic: Fix round_rate misbehaving with non-integer dividers dmaengine: usb-dmac: Make DMAC system sleep callbacks explicit mm/vmalloc: fix size check for remap_vmalloc_range_partial() kernel/sysctl.c: add missing range check in do_proc_dointvec_minmax_conv intel_th: Don't reference unassigned outputs parport_pc: fix find_superio io compare code, should use equal test. i2c: tegra: fix maximum transfer size perf bench: Copy kernel files needed to build mem{cpy,set} x86_64 benchmarks serial: 8250_pci: Fix number of ports for ACCES serial cards serial: 8250_pci: Have ACCES cards that use the four port Pericom PI7C9X7954 chip use the pci_pericom_setup() jbd2: clear dirty flag when revoking a buffer from an older transaction jbd2: fix compile warning when using JBUFFER_TRACE powerpc/32: Clear on-stack exception marker upon exception return powerpc/wii: properly disable use of BATs when requested. powerpc/powernv: Make opal log only readable by root powerpc/83xx: Also save/restore SPRG4-7 during suspend ARM: s3c24xx: Fix boolean expressions in osiris_dvs_notify dm: fix to_sector() for 32bit NFS41: pop some layoutget errors to application perf intel-pt: Fix CYC timestamp calculation after OVF perf auxtrace: Define auxtrace record alignment perf intel-pt: Fix overlap calculation for padding md: Fix failed allocation of md_register_thread NFS: Fix an I/O request leakage in nfs_do_recoalesce NFS: Don't recoalesce on error in nfs_pageio_complete_mirror() nfsd: fix memory corruption caused by readdir nfsd: fix wrong check in write_v4_end_grace() PM / wakeup: Rework wakeup source timer cancellation rcu: Do RCU GP kthread self-wakeup from softirq and interrupt media: uvcvideo: Avoid NULL pointer dereference at the end of streaming drm/radeon/evergreen_cs: fix missing break in switch statement KVM: nVMX: Sign extend displacements of VMX instr's mem operands KVM: nVMX: Ignore limit checks on VMX instructions using flat segments KVM: X86: Fix residual mmio emulation request to userspace Linux 4.4.177 Change-Id: Ia33b88c9634e04612874d79ce4cc166e8aa8096a Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * tcp: handle inet_csk_reqsk_queue_add() failuresGuillaume Nault2019-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 9d3e1368bb45893a75a5dfb7cd21fdebfa6b47af ] Commit 7716682cc58e ("tcp/dccp: fix another race at listener dismantle") let inet_csk_reqsk_queue_add() fail, and adjusted {tcp,dccp}_check_req() accordingly. However, TFO and syncookies weren't modified, thus leaking allocated resources on error. Contrary to tcp_check_req(), in both syncookies and TFO cases, we need to drop the request socket. Also, since the child socket is created with inet_csk_clone_lock(), we have to unlock it and drop an extra reference (->sk_refcount is initially set to 2 and inet_csk_reqsk_queue_add() drops only one ref). For TFO, we also need to revert the work done by tcp_try_fastopen() (with reqsk_fastopen_remove()). Fixes: 7716682cc58e ("tcp/dccp: fix another race at listener dismantle") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge 4.4.99 into android-4.4Greg Kroah-Hartman2017-11-18
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.99 mac80211: accept key reinstall without changing anything mac80211: use constant time comparison with keys mac80211: don't compare TKIP TX MIC key in reinstall prevention usb: usbtest: fix NULL pointer dereference Input: ims-psu - check if CDC union descriptor is sane ALSA: seq: Cancel pending autoload work at unbinding device tun/tap: sanitize TUNSETSNDBUF input tcp: fix tcp_mtu_probe() vs highest_sack l2tp: check ps->sock before running pppol2tp_session_ioctl() tun: call dev_get_valid_name() before register_netdevice() sctp: add the missing sock_owned_by_user check in sctp_icmp_redirect packet: avoid panic in packet_getsockopt() ipv6: flowlabel: do not leave opt->tot_len with garbage net/unix: don't show information about sockets from other namespaces ip6_gre: only increase err_count for some certain type icmpv6 in ip6gre_err tun: allow positive return values on dev_get_valid_name() call sctp: reset owner sk for data chunks on out queues when migrating a sock ppp: fix race in ppp device destruction ipip: only increase err_count for some certain type icmp in ipip_err tcp/dccp: fix ireq->opt races tcp/dccp: fix lockdep splat in inet_csk_route_req() tcp/dccp: fix other lockdep splats accessing ireq_opt security/keys: add CONFIG_KEYS_COMPAT to Kconfig tipc: fix link attribute propagation bug brcmfmac: remove setting IBSS mode when stopping AP target/iscsi: Fix iSCSI task reassignment handling target: Fix node_acl demo-mode + uncached dynamic shutdown regression misc: panel: properly restore atomic counter on error path Linux 4.4.99 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * tcp/dccp: fix ireq->opt racesEric Dumazet2017-11-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit c92e8c02fe664155ac4234516e32544bec0f113d ] syzkaller found another bug in DCCP/TCP stacks [1] For the reasons explained in commit ce1050089c96 ("tcp/dccp: fix ireq->pktopts race"), we need to make sure we do not access ireq->opt unless we own the request sock. Note the opt field is renamed to ireq_opt to ease grep games. [1] BUG: KASAN: use-after-free in ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474 Read of size 1 at addr ffff8801c951039c by task syz-executor5/3295 CPU: 1 PID: 3295 Comm: syz-executor5 Not tainted 4.14.0-rc4+ #80 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 print_address_description+0x73/0x250 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x25b/0x340 mm/kasan/report.c:409 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427 ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474 tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1135 tcp_send_ack.part.37+0x3bb/0x650 net/ipv4/tcp_output.c:3587 tcp_send_ack+0x49/0x60 net/ipv4/tcp_output.c:3557 __tcp_ack_snd_check+0x2c6/0x4b0 net/ipv4/tcp_input.c:5072 tcp_ack_snd_check net/ipv4/tcp_input.c:5085 [inline] tcp_rcv_state_process+0x2eff/0x4850 net/ipv4/tcp_input.c:6071 tcp_child_process+0x342/0x990 net/ipv4/tcp_minisocks.c:816 tcp_v4_rcv+0x1827/0x2f80 net/ipv4/tcp_ipv4.c:1682 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216 NF_HOOK include/linux/netfilter.h:249 [inline] ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257 dst_input include/net/dst.h:464 [inline] ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397 NF_HOOK include/linux/netfilter.h:249 [inline] ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587 netif_receive_skb+0xae/0x390 net/core/dev.c:4611 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792 call_write_iter include/linux/fs.h:1770 [inline] new_sync_write fs/read_write.c:468 [inline] __vfs_write+0x68a/0x970 fs/read_write.c:481 vfs_write+0x18f/0x510 fs/read_write.c:543 SYSC_write fs/read_write.c:588 [inline] SyS_write+0xef/0x220 fs/read_write.c:580 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x40c341 RSP: 002b:00007f469523ec10 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 000000000040c341 RDX: 0000000000000037 RSI: 0000000020004000 RDI: 0000000000000015 RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000 R10: 00000000000f4240 R11: 0000000000000293 R12: 00000000004b7fd1 R13: 00000000ffffffff R14: 0000000020000000 R15: 0000000000025000 Allocated by task 3295: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551 __do_kmalloc mm/slab.c:3725 [inline] __kmalloc+0x162/0x760 mm/slab.c:3734 kmalloc include/linux/slab.h:498 [inline] tcp_v4_save_options include/net/tcp.h:1962 [inline] tcp_v4_init_req+0x2d3/0x3e0 net/ipv4/tcp_ipv4.c:1271 tcp_conn_request+0xf6d/0x3410 net/ipv4/tcp_input.c:6283 tcp_v4_conn_request+0x157/0x210 net/ipv4/tcp_ipv4.c:1313 tcp_rcv_state_process+0x8ea/0x4850 net/ipv4/tcp_input.c:5857 tcp_v4_do_rcv+0x55c/0x7d0 net/ipv4/tcp_ipv4.c:1482 tcp_v4_rcv+0x2d10/0x2f80 net/ipv4/tcp_ipv4.c:1711 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216 NF_HOOK include/linux/netfilter.h:249 [inline] ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257 dst_input include/net/dst.h:464 [inline] ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397 NF_HOOK include/linux/netfilter.h:249 [inline] ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587 netif_receive_skb+0xae/0x390 net/core/dev.c:4611 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792 call_write_iter include/linux/fs.h:1770 [inline] new_sync_write fs/read_write.c:468 [inline] __vfs_write+0x68a/0x970 fs/read_write.c:481 vfs_write+0x18f/0x510 fs/read_write.c:543 SYSC_write fs/read_write.c:588 [inline] SyS_write+0xef/0x220 fs/read_write.c:580 entry_SYSCALL_64_fastpath+0x1f/0xbe Freed by task 3306: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524 __cache_free mm/slab.c:3503 [inline] kfree+0xca/0x250 mm/slab.c:3820 inet_sock_destruct+0x59d/0x950 net/ipv4/af_inet.c:157 __sk_destruct+0xfd/0x910 net/core/sock.c:1560 sk_destruct+0x47/0x80 net/core/sock.c:1595 __sk_free+0x57/0x230 net/core/sock.c:1603 sk_free+0x2a/0x40 net/core/sock.c:1614 sock_put include/net/sock.h:1652 [inline] inet_csk_complete_hashdance+0xd5/0xf0 net/ipv4/inet_connection_sock.c:959 tcp_check_req+0xf4d/0x1620 net/ipv4/tcp_minisocks.c:765 tcp_v4_rcv+0x17f6/0x2f80 net/ipv4/tcp_ipv4.c:1675 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216 NF_HOOK include/linux/netfilter.h:249 [inline] ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257 dst_input include/net/dst.h:464 [inline] ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397 NF_HOOK include/linux/netfilter.h:249 [inline] ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587 netif_receive_skb+0xae/0x390 net/core/dev.c:4611 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792 call_write_iter include/linux/fs.h:1770 [inline] new_sync_write fs/read_write.c:468 [inline] __vfs_write+0x68a/0x970 fs/read_write.c:481 vfs_write+0x18f/0x510 fs/read_write.c:543 SYSC_write fs/read_write.c:588 [inline] SyS_write+0xef/0x220 fs/read_write.c:580 entry_SYSCALL_64_fastpath+0x1f/0xbe Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets") Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge 4.4.81 into android-4.4Greg Kroah-Hartman2017-08-11
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.81 libata: array underflow in ata_find_dev() workqueue: restore WQ_UNBOUND/max_active==1 to be ordered ALSA: hda - Fix speaker output from VAIO VPCL14M1R ASoC: do not close shared backend dailink KVM: async_pf: make rcu irq exit if not triggered from idle task mm/page_alloc: Remove kernel address exposure in free_reserved_area() ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize ext4: fix overflow caused by missing cast in ext4_resize_fs() ARM: dts: armada-38x: Fix irq type for pca955 media: platform: davinci: return -EINVAL for VPFE_CMD_S_CCDC_RAW_PARAMS ioctl target: Avoid mappedlun symlink creation during lun shutdown iscsi-target: Always wait for kthread_should_stop() before kthread exit iscsi-target: Fix early sk_data_ready LOGIN_FLAGS_READY race iscsi-target: Fix initial login PDU asynchronous socket close OOPs iscsi-target: Fix delayed logout processing greater than SECONDS_FOR_LOGOUT_COMP iser-target: Avoid isert_conn->cm_id dereference in isert_login_recv_done mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries media: lirc: LIRC_GET_REC_RESOLUTION should return microseconds f2fs: sanity check checkpoint segno and blkoff drm: rcar-du: fix backport bug saa7164: fix double fetch PCIe access condition ipv4: ipv6: initialize treq->txhash in cookie_v[46]_check() net: Zero terminate ifr_name in dev_ifname(). ipv6: avoid overflow of offset in ip6_find_1stfragopt ipv4: initialize fib_trie prior to register_netdev_notifier call. rtnetlink: allocate more memory for dev_set_mac_address() mcs7780: Fix initialization when CONFIG_VMAP_STACK is enabled openvswitch: fix potential out of bound access in parse_ct packet: fix use-after-free in prb_retire_rx_blk_timer_expired() ipv6: Don't increase IPSTATS_MIB_FRAGFAILS twice in ip6_fragment() net: ethernet: nb8800: Handle all 4 RGMII modes identically dccp: fix a memleak that dccp_ipv6 doesn't put reqsk properly dccp: fix a memleak that dccp_ipv4 doesn't put reqsk properly dccp: fix a memleak for dccp_feat_init err process sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}() sctp: fix the check for _sctp_walk_params and _sctp_walk_errors net/mlx5: Fix command bad flow on command entry allocation failure net: phy: Correctly process PHY_HALTED in phy_stop_machine() net: phy: Fix PHY unbind crash xen-netback: correctly schedule rate-limited queues sparc64: Measure receiver forward progress to avoid send mondo timeout wext: handle NULL extra data in iwe_stream_add_point better sh_eth: R8A7740 supports packet shecksumming net: phy: dp83867: fix irq generation tg3: Fix race condition in tg3_get_stats64(). x86/boot: Add missing declaration of string functions phy state machine: failsafe leave invalid RUNNING state scsi: qla2xxx: Get mutex lock before checking optrom_state drm/virtio: fix framebuffer sparse warning virtio_blk: fix panic in initialization error path ARM: 8632/1: ftrace: fix syscall name matching mm, slab: make sure that KMALLOC_MAX_SIZE will fit into MAX_ORDER lib/Kconfig.debug: fix frv build failure signal: protect SIGNAL_UNKILLABLE from unintentional clearing. mm: don't dereference struct page fields of invalid pages ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output net: account for current skb length when deciding about UFO workqueue: implicit ordered attribute should be overridable Linux 4.4.81 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * ipv4: ipv6: initialize treq->txhash in cookie_v[46]_check()Alexander Potapenko2017-08-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 18bcf2907df935981266532e1e0d052aff2e6fae ] KMSAN reported use of uninitialized memory in skb_set_hash_from_sk(), which originated from the TCP request socket created in cookie_v6_check(): ================================================================== BUG: KMSAN: use of uninitialized memory in tcp_transmit_skb+0xf77/0x3ec0 CPU: 1 PID: 2949 Comm: syz-execprog Not tainted 4.11.0-rc5+ #2931 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 TCP: request_sock_TCPv6: Possible SYN flooding on port 20028. Sending cookies. Check SNMP counters. Call Trace: <IRQ> __dump_stack lib/dump_stack.c:16 dump_stack+0x172/0x1c0 lib/dump_stack.c:52 kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927 __msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469 skb_set_hash_from_sk ./include/net/sock.h:2011 tcp_transmit_skb+0xf77/0x3ec0 net/ipv4/tcp_output.c:983 tcp_send_ack+0x75b/0x830 net/ipv4/tcp_output.c:3493 tcp_delack_timer_handler+0x9a6/0xb90 net/ipv4/tcp_timer.c:284 tcp_delack_timer+0x1b0/0x310 net/ipv4/tcp_timer.c:309 call_timer_fn+0x240/0x520 kernel/time/timer.c:1268 expire_timers kernel/time/timer.c:1307 __run_timers+0xc13/0xf10 kernel/time/timer.c:1601 run_timer_softirq+0x36/0xa0 kernel/time/timer.c:1614 __do_softirq+0x485/0x942 kernel/softirq.c:284 invoke_softirq kernel/softirq.c:364 irq_exit+0x1fa/0x230 kernel/softirq.c:405 exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:657 smp_apic_timer_interrupt+0x5a/0x80 arch/x86/kernel/apic/apic.c:966 apic_timer_interrupt+0x86/0x90 arch/x86/entry/entry_64.S:489 RIP: 0010:native_restore_fl ./arch/x86/include/asm/irqflags.h:36 RIP: 0010:arch_local_irq_restore ./arch/x86/include/asm/irqflags.h:77 RIP: 0010:__msan_poison_alloca+0xed/0x120 mm/kmsan/kmsan_instr.c:440 RSP: 0018:ffff880024917cd8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10 RAX: 0000000000000246 RBX: ffff8800224c0000 RCX: 0000000000000005 RDX: 0000000000000004 RSI: ffff880000000000 RDI: ffffea0000b6d770 RBP: ffff880024917d58 R08: 0000000000000dd8 R09: 0000000000000004 R10: 0000160000000000 R11: 0000000000000000 R12: ffffffff85abf810 R13: ffff880024917dd8 R14: 0000000000000010 R15: ffffffff81cabde4 </IRQ> poll_select_copy_remaining+0xac/0x6b0 fs/select.c:293 SYSC_select+0x4b4/0x4e0 fs/select.c:653 SyS_select+0x76/0xa0 fs/select.c:634 entry_SYSCALL_64_fastpath+0x13/0x94 arch/x86/entry/entry_64.S:204 RIP: 0033:0x4597e7 RSP: 002b:000000c420037ee0 EFLAGS: 00000246 ORIG_RAX: 0000000000000017 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004597e7 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 000000c420037ef0 R08: 000000c420037ee0 R09: 0000000000000059 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000042dc20 R13: 00000000000000f3 R14: 0000000000000030 R15: 0000000000000003 chained origin: save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302 kmsan_save_stack mm/kmsan/kmsan.c:317 kmsan_internal_chain_origin+0x12a/0x1f0 mm/kmsan/kmsan.c:547 __msan_store_shadow_origin_4+0xac/0x110 mm/kmsan/kmsan_instr.c:259 tcp_create_openreq_child+0x709/0x1ae0 net/ipv4/tcp_minisocks.c:472 tcp_v6_syn_recv_sock+0x7eb/0x2a30 net/ipv6/tcp_ipv6.c:1103 tcp_get_cookie_sock+0x136/0x5f0 net/ipv4/syncookies.c:212 cookie_v6_check+0x17a9/0x1b50 net/ipv6/syncookies.c:245 tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989 tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298 tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487 ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279 NF_HOOK ./include/linux/netfilter.h:257 ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322 dst_input ./include/net/dst.h:492 ip6_rcv_finish net/ipv6/ip6_input.c:69 NF_HOOK ./include/linux/netfilter.h:257 ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203 __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208 __netif_receive_skb net/core/dev.c:4246 process_backlog+0x667/0xba0 net/core/dev.c:4866 napi_poll net/core/dev.c:5268 net_rx_action+0xc95/0x1590 net/core/dev.c:5333 __do_softirq+0x485/0x942 kernel/softirq.c:284 origin: save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302 kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198 kmsan_kmalloc+0x7f/0xe0 mm/kmsan/kmsan.c:337 kmem_cache_alloc+0x1c2/0x1e0 mm/slub.c:2766 reqsk_alloc ./include/net/request_sock.h:87 inet_reqsk_alloc+0xa4/0x5b0 net/ipv4/tcp_input.c:6200 cookie_v6_check+0x4f4/0x1b50 net/ipv6/syncookies.c:169 tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989 tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298 tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487 ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279 NF_HOOK ./include/linux/netfilter.h:257 ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322 dst_input ./include/net/dst.h:492 ip6_rcv_finish net/ipv6/ip6_input.c:69 NF_HOOK ./include/linux/netfilter.h:257 ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203 __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208 __netif_receive_skb net/core/dev.c:4246 process_backlog+0x667/0xba0 net/core/dev.c:4866 napi_poll net/core/dev.c:5268 net_rx_action+0xc95/0x1590 net/core/dev.c:5333 __do_softirq+0x485/0x942 kernel/softirq.c:284 ================================================================== Similar error is reported for cookie_v4_check(). Fixes: 58d607d3e52f ("tcp: provide skb->hash to synack packets") Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | net: inet: Support UID-based routing in IP protocols.Lorenzo Colitti2016-12-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Use the UID in routing lookups made by protocol connect() and sendmsg() functions. - Make sure that routing lookups triggered by incoming packets (e.g., Path MTU discovery) take the UID of the socket into account. - For packets not associated with a userspace socket, (e.g., ping replies) use UID 0 inside the user namespace corresponding to the network namespace the socket belongs to. This allows all namespaces to apply routing and iptables rules to kernel-originated traffic in that namespaces by matching UID 0. This is better than using the UID of the kernel socket that is sending the traffic, because the UID of kernel sockets created at namespace creation time (e.g., the per-processor ICMP and TCP sockets) is the UID of the user that created the socket, which might not be mapped in the namespace. Bug: 16355602 Change-Id: I910504b508948057912bc188fd1e8aca28294de3 Tested: compiles allnoconfig, allyesconfig, allmodconfig Tested: https://android-review.googlesource.com/253302 Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Revert "net: core: Support UID-based routing."Lorenzo Colitti2016-12-20
| | | | | | | | | | | | | | This reverts commit fd2cf795f3ab193752781be7372949ac1780d0ed. Bug: 16355602 Change-Id: I1ec2d1eb3d53f4186b60c6ca5d6a20fcca46d442
* | net: core: Support UID-based routing.Lorenzo Colitti2016-02-16
|/ | | | | | | | | | | | | | | | This contains the following commits: 1. cc2f522 net: core: Add a UID range to fib rules. 2. d7ed2bd net: core: Use the socket UID in routing lookups. 3. 2f9306a net: core: Add a RTA_UID attribute to routes. This is so that userspace can do per-UID route lookups. 4. 8e46efb net: ipv6: Use the UID in IPv6 PMTUD IPv4 PMTUD already does this because ipv4_sk_update_pmtu uses __build_flow_key, which includes the UID. Bug: 15413527 Change-Id: Iae3d4ca3979d252b6cec989bdc1a6875f811f03a Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
* tcp/dccp: fix hashdance race for passive sessionsEric Dumazet2015-10-23
| | | | | | | | | | | | | | | | | | | | | | Multiple cpus can process duplicates of incoming ACK messages matching a SYN_RECV request socket. This is a rare event under normal operations, but definitely can happen. Only one must win the race, otherwise corruption would occur. To fix this without adding new atomic ops, we use logic in inet_ehash_nolisten() to detect the request was present in the same ehash bucket where we try to insert the new child. If request socket was not found, we have to undo the child creation. This actually removes a spin_lock()/spin_unlock() pair in reqsk_queue_unlink() for the fast path. Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets") Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: shrink struct sock and request_sock by 8 bytesEric Dumazet2015-10-12
| | | | | | | | One 32bit hole is following skc_refcnt, use it. skc_incoming_cpu can also be an union for request_sock rcv_wnd. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: fix RFS vs lockless listenersEric Dumazet2015-10-11
| | | | | | | | | | | | | | | | | | | | Before recent TCP listener patches, we were updating listener sk->sk_rxhash before the cloning of master socket. children sk_rxhash was therefore correct after the normal 3WHS. But with lockless listener, we no longer dirty/change listener sk_rxhash as it would be racy. We need to correctly update the child sk_rxhash, otherwise first data packet wont hit correct cpu if RFS is used. Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Willem de Bruijn <willemb@google.com> Cc: Tom Herbert <tom@herbertland.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: avoid two atomic ops for syncookiesEric Dumazet2015-10-05
| | | | | | | | | | | | inet_reqsk_alloc() is used to allocate a temporary request in order to generate a SYNACK with a cookie. Then later, syncookie validation also uses a temporary request. These paths already took a reference on listener refcount, we can avoid a couple of atomic operations. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp/dccp: install syn_recv requests into ehash tableEric Dumazet2015-10-03
| | | | | | | | | | | | | | | | | | | | | | | | In this patch, we insert request sockets into TCP/DCCP regular ehash table (where ESTABLISHED and TIMEWAIT sockets are) instead of using the per listener hash table. ACK packets find SYN_RECV pseudo sockets without having to find and lock the listener. In nominal conditions, this halves pressure on listener lock. Note that this will allow for SO_REUSEPORT refinements, so that we can select a listener using cpu/numa affinities instead of the prior 'consistent hash', since only SYN packets will apply this selection logic. We will shrink listen_sock in the following patch to ease code review. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ying Cai <ycai@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: cookie_init_sequence() cleanupsEric Dumazet2015-09-29
| | | | | | | | Some common IPv4/IPv6 code can be factorized. Also constify cookie_init_sequence() socket argument. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: usec resolution SYN/ACK RTTYuchung Cheng2015-09-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently SYN/ACK RTT is measured in jiffies. For LAN the SYN/ACK RTT is often measured as 0ms or sometimes 1ms, which would affect RTT estimation and min RTT samping used by some congestion control. This patch improves SYN/ACK RTT to be usec resolution if platform supports it. While the timestamping of SYN/ACK is done in request sock, the RTT measurement is carefully arranged to avoid storing another u64 timestamp in tcp_sock. For regular handshake w/o SYNACK retransmission, the RTT is sampled right after the child socket is created and right before the request sock is released (tcp_check_req() in tcp_minisocks.c) For Fast Open the child socket is already created when SYN/ACK was sent, the RTT is sampled in tcp_rcv_state_process() after processing the final ACK an right before the request socket is released. If the SYN/ACK was retransmistted or SYN-cookie was used, we rely on TCP timestamps to measure the RTT. The sample is taken at the same place in tcp_rcv_state_process() after the timestamp values are validated in tcp_validate_incoming(). Note that we do not store TS echo value in request_sock for SYN-cookies, because the value is already stored in tp->rx_opt used by tcp_ack_update_rtt(). One side benefit is that the RTT measurement now happens before initializing congestion control (of the passive side). Therefore the congestion control can use the SYN/ACK RTT. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: get_cookie_sock() consolidationEric Dumazet2015-06-07
| | | | | | | | | | IPv4 and IPv6 share same implementation of get_cookie_sock(), and there is no point inlining it. We add tcp_ prefix to the common helper name and export it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: fix ipv4 mapped request socksEric Dumazet2015-03-25
| | | | | | | | | | | | | | | | | | | ss should display ipv4 mapped request sockets like this : tcp SYN-RECV 0 0 ::ffff:192.168.0.1:8080 ::ffff:192.0.2.1:35261 and not like this : tcp SYN-RECV 0 0 192.168.0.1:8080 192.0.2.1:35261 We should init ireq->ireq_family based on listener sk_family, not the actual protocol carried by SYN packet. This means we can set ireq_family in inet_reqsk_alloc() Fixes: 3f66b083a5b7 ("inet: introduce ireq_family") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: get rid of central tcp/dccp listener timerEric Dumazet2015-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One of the major issue for TCP is the SYNACK rtx handling, done by inet_csk_reqsk_queue_prune(), fired by the keepalive timer of a TCP_LISTEN socket. This function runs for awful long times, with socket lock held, meaning that other cpus needing this lock have to spin for hundred of ms. SYNACK are sent in huge bursts, likely to cause severe drops anyway. This model was OK 15 years ago when memory was very tight. We now can afford to have a timer per request sock. Timer invocations no longer need to lock the listener, and can be run from all cpus in parallel. With following patch increasing somaxconn width to 32 bits, I tested a listener with more than 4 million active request sockets, and a steady SYNFLOOD of ~200,000 SYN per second. Host was sending ~830,000 SYNACK per second. This is ~100 times more what we could achieve before this patch. Later, we will get rid of the listener hash and use ehash instead. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: request sock should init IPv6/IPv4 addressesEric Dumazet2015-03-18
| | | | | | | | In order to be able to use sk_ehashfn() for request socks, we need to initialize their IPv6/IPv4 addresses. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: fix request sock refcountingEric Dumazet2015-03-17
| | | | | | | | | | | | | | | | | While testing last patch series, I found req sock refcounting was wrong. We must set skc_refcnt to 1 for all request socks added in hashes, but also on request sockets created by FastOpen or syncookies. It is tricky because we need to defer this initialization so that future RCU lookups do not try to take a refcount on a not yet fully initialized request socket. Also get rid of ireq_refcnt alias. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 13854e5a6046 ("inet: add proper refcounting to request sock") Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: rename struct tcp_request_sock listenerEric Dumazet2015-03-17
| | | | | | | | | The listener field in struct tcp_request_sock is a pointer back to the listener. We now have req->rsk_listener, so TCP only needs one boolean and not a full pointer. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: add sk_listener argument to inet_reqsk_alloc()Eric Dumazet2015-03-17
| | | | | | | | | | | listener socket can be used to set net pointer, and will be later used to hold a reference on listener. Add a const qualifier to first argument (struct request_sock_ops *), and factorize all write_pnet(&ireq->ireq_net, sock_net(sk)); Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: add proper refcounting to request sockEric Dumazet2015-03-16
| | | | | | | | | | | | | | | | reqsk_put() is the generic function that should be used to release a refcount (and automatically call reqsk_free()) reqsk_free() might be called if refcount is known to be 0 or undefined. refcnt is set to one in inet_csk_reqsk_queue_add() As request socks are not yet in global ehash table, I added temporary debugging checks in reqsk_put() and reqsk_free() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: fill request sock ir_iif for IPv4Eric Dumazet2015-03-14
| | | | | | | | | | | | | | | Once request socks will be in ehash table, they will need to have a valid ir_iff field. This is currently true only for IPv6. This patch extends support for IPv4 as well. This means inet_diag_fill_req() can now properly use ir_iif, which is better for IPv6 link locals anyway, as request sockets and established sockets will propagate consistent netlink idiag_if. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: introduce ireq_familyEric Dumazet2015-03-12
| | | | | | | | Before inserting request socks into general hash table, fill their socket family. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: fix CONFIG_NET_NS=n compilationEric Dumazet2015-03-11
| | | | | | | | | I forgot to use write_pnet() in three locations. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 33cf7c90fe2f9 ("net: add real socket cookies") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: add real socket cookiesEric Dumazet2015-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | A long standing problem in netlink socket dumps is the use of kernel socket addresses as cookies. 1) It is a security concern. 2) Sockets can be reused quite quickly, so there is no guarantee a cookie is used once and identify a flow. 3) request sock, establish sock, and timewait socks for a given flow have different cookies. Part of our effort to bring better TCP statistics requires to switch to a different allocator. In this patch, I chose to use a per network namespace 64bit generator, and to use it only in the case a socket needs to be dumped to netlink. (This might be refined later if needed) Note that I tried to carry cookies from request sock, to establish sock, then timewait sockets. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Eric Salo <salo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: allow setting ecn via routing tableFlorian Westphal2014-11-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows to set ECN on a per-route basis in case the sysctl tcp_ecn is not set to 1. In other words, when ECN is set for specific routes, it provides a tcp_ecn=1 behaviour for that route while the rest of the stack acts according to the global settings. One can use 'ip route change dev $dev $net features ecn' to toggle this. Having a more fine-grained per-route setting can be beneficial for various reasons, for example, 1) within data centers, or 2) local ISPs may deploy ECN support for their own video/streaming services [1], etc. There was a recent measurement study/paper [2] which scanned the Alexa's publicly available top million websites list from a vantage point in US, Europe and Asia: Half of the Alexa list will now happily use ECN (tcp_ecn=2, most likely blamed to commit 255cac91c3 ("tcp: extend ECN sysctl to allow server-side only ECN") ;)); the break in connectivity on-path was found is about 1 in 10,000 cases. Timeouts rather than receiving back RSTs were much more common in the negotiation phase (and mostly seen in the Alexa middle band, ranks around 50k-150k): from 12-thousand hosts on which there _may_ be ECN-linked connection failures, only 79 failed with RST when _not_ failing with RST when ECN is not requested. It's unclear though, how much equipment in the wild actually marks CE when buffers start to fill up. We thought about a fallback to non-ECN for retransmitted SYNs as another global option (which could perhaps one day be made default), but as Eric points out, there's much more work needed to detect broken middleboxes. Two examples Eric mentioned are buggy firewalls that accept only a single SYN per flow, and middleboxes that successfully let an ECN flow establish, but later mark CE for all packets (so cwnd converges to 1). [1] http://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf, p.15 [2] http://ecn.ethz.ch/ Joint work with Daniel Borkmann. Reference: http://thread.gmane.org/gmane.linux.network/335797 Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* syncookies: split cookie_check_timestamp() into two functionsFlorian Westphal2014-11-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function cookie_check_timestamp(), both called from IPv4/6 context, is being used to decode the echoed timestamp from the SYN/ACK into TCP options used for follow-up communication with the peer. We can remove ECN handling from that function, split it into a separate one, and simply rename the original function into cookie_decode_options(). cookie_decode_options() just fills in tcp_option struct based on the echoed timestamp received from the peer. Anything that fails in this function will actually discard the request socket. While this is the natural place for decoding options such as ECN which commit 172d69e63c7f ("syncookies: add support for ECN") added, we argue that in particular for ECN handling, it can be checked at a later point in time as the request sock would actually not need to be dropped from this, but just ECN support turned off. Therefore, we split this functionality into cookie_ecn_ok(), which tells us if the timestamp indicates ECN support AND the tcp_ecn sysctl is enabled. This prepares for per-route ECN support: just looking at the tcp_ecn sysctl won't be enough anymore at that point; if the timestamp indicates ECN and sysctl tcp_ecn == 0, we will also need to check the ECN dst metric. This would mean adding a route lookup to cookie_check_timestamp(), which we definitely want to avoid. As we already do a route lookup at a later point in cookie_{v4,v6}_check(), we can simply make use of that as well for the new cookie_ecn_ok() function w/o any additional cost. Joint work with Daniel Borkmann. Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* syncookies: avoid magic values and document which-bit-is-what-optionFlorian Westphal2014-11-04
| | | | | | | | | | | | Was a bit more difficult to read than needed due to magic shifts; add defines and document the used encoding scheme. Joint work with Daniel Borkmann. Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* syncookies: only increment SYNCOOKIESFAILED on validation errorFlorian Westphal2014-10-30
| | | | | | | | Only count packets that failed cookie-authentication. We can get SYNCOOKIESFAILED > 0 while we never even sent a single cookie. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2014-10-18
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) Include fixes for netrom and dsa (Fabian Frederick and Florian Fainelli) 2) Fix FIXED_PHY support in stmmac, from Giuseppe CAVALLARO. 3) Several SKB use after free fixes (vxlan, openvswitch, vxlan, ip_tunnel, fou), from Li ROngQing. 4) fec driver PTP support fixes from Luwei Zhou and Nimrod Andy. 5) Use after free in virtio_net, from Michael S Tsirkin. 6) Fix flow mask handling for megaflows in openvswitch, from Pravin B Shelar. 7) ISDN gigaset and capi bug fixes from Tilman Schmidt. 8) Fix route leak in ip_send_unicast_reply(), from Vasily Averin. 9) Fix two eBPF JIT bugs on x86, from Alexei Starovoitov. 10) TCP_SKB_CB() reorganization caused a few regressions, fixed by Cong Wang and Eric Dumazet. 11) Don't overwrite end of SKB when parsing malformed sctp ASCONF chunks, from Daniel Borkmann. 12) Don't call sock_kfree_s() with NULL pointers, this function also has the side effect of adjusting the socket memory usage. From Cong Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits) bna: fix skb->truesize underestimation net: dsa: add includes for ethtool and phy_fixed definitions openvswitch: Set flow-key members. netrom: use linux/uaccess.h dsa: Fix conversion from host device to mii bus tipc: fix bug in bundled buffer reception ipv6: introduce tcp_v6_iif() sfc: add support for skb->xmit_more r8152: return -EBUSY for runtime suspend ipv4: fix a potential use after free in fou.c ipv4: fix a potential use after free in ip_tunnel_core.c hyperv: Add handling of IP header with option field in netvsc_set_hash() openvswitch: Create right mask with disabled megaflows vxlan: fix a free after use openvswitch: fix a use after free ipv4: dst_entry leak in ip_send_unicast_reply() ipv4: clean up cookie_v4_check() ipv4: share tcp_v4_save_options() with cookie_v4_check() ipv4: call __ip_options_echo() in cookie_v4_check() atm: simplify lanai.c by using module_pci_driver ...
| * ipv4: clean up cookie_v4_check()Cong Wang2014-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | We can retrieve opt from skb, no need to pass it as a parameter. And opt should always be non-NULL, no need to check. Cc: Krzysztof Kolasa <kkolasa@winsoft.pl> Cc: Eric Dumazet <edumazet@google.com> Tested-by: Krzysztof Kolasa <kkolasa@winsoft.pl> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv4: share tcp_v4_save_options() with cookie_v4_check()Cong Wang2014-10-17
| | | | | | | | | | | | | | | | | | | | | | | | cookie_v4_check() allocates ip_options_rcu in the same way with tcp_v4_save_options(), we can just make it a helper function. Cc: Krzysztof Kolasa <kkolasa@winsoft.pl> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv4: call __ip_options_echo() in cookie_v4_check()Cong Wang2014-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 971f10eca186cab238c49da ("tcp: better TCP_SKB_CB layout to reduce cache line misses") missed that cookie_v4_check() still calls ip_options_echo() which uses IPCB(). It should use TCPCB() at TCP layer, so call __ip_options_echo() instead. Fixes: commit 971f10eca186cab238c49da ("tcp: better TCP_SKB_CB layout to reduce cache line misses") Cc: Krzysztof Kolasa <kkolasa@winsoft.pl> Cc: Eric Dumazet <edumazet@google.com> Reported-by: Krzysztof Kolasa <kkolasa@winsoft.pl> Tested-by: Krzysztof Kolasa <kkolasa@winsoft.pl> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'for-3.18-consistent-ops' of ↵Linus Torvalds2014-10-15
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu consistent-ops changes from Tejun Heo: "Way back, before the current percpu allocator was implemented, static and dynamic percpu memory areas were allocated and handled separately and had their own accessors. The distinction has been gone for many years now; however, the now duplicate two sets of accessors remained with the pointer based ones - this_cpu_*() - evolving various other operations over time. During the process, we also accumulated other inconsistent operations. This pull request contains Christoph's patches to clean up the duplicate accessor situation. __get_cpu_var() uses are replaced with with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr(). Unfortunately, the former sometimes is tricky thanks to C being a bit messy with the distinction between lvalues and pointers, which led to a rather ugly solution for cpumask_var_t involving the introduction of this_cpu_cpumask_var_ptr(). This converts most of the uses but not all. Christoph will follow up with the remaining conversions in this merge window and hopefully remove the obsolete accessors" * 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits) irqchip: Properly fetch the per cpu offset percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write. percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t Revert "powerpc: Replace __get_cpu_var uses" percpu: Remove __this_cpu_ptr clocksource: Replace __this_cpu_ptr with raw_cpu_ptr sparc: Replace __get_cpu_var uses avr32: Replace __get_cpu_var with __this_cpu_write blackfin: Replace __get_cpu_var uses tile: Use this_cpu_ptr() for hardware counters tile: Replace __get_cpu_var uses powerpc: Replace __get_cpu_var uses alpha: Replace __get_cpu_var ia64: Replace __get_cpu_var uses s390: cio driver &__get_cpu_var replacements s390: Replace __get_cpu_var uses mips: Replace __get_cpu_var uses MIPS: Replace __get_cpu_var uses in FPU emulator. arm: Replace __this_cpu_ptr with raw_cpu_ptr ...
| * net: Replace get_cpu_var through this_cpu_ptrChristoph Lameter2014-08-26
| | | | | | | | | | | | | | | | | | | | Replace uses of get_cpu_var for address calculation through this_cpu_ptr. Cc: netdev@vger.kernel.org Cc: Eric Dumazet <edumazet@google.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* | tcp: syncookies: mark cookie_secret read_mostlyFlorian Westphal2014-08-27
|/ | | | | | | only written once. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: cookie_v4_init_sequence: skb should be constOctavian Purdila2014-06-27
| | | | | Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: support marking accepting TCP socketsLorenzo Colitti2014-05-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using mark-based routing, sockets returned from accept() may need to be marked differently depending on the incoming connection request. This is the case, for example, if different socket marks identify different networks: a listening socket may want to accept connections from all networks, but each connection should be marked with the network that the request came in on, so that subsequent packets are sent on the correct network. This patch adds a sysctl to mark TCP sockets based on the fwmark of the incoming SYN packet. If enabled, and an unmarked socket receives a SYN, then the SYN packet's fwmark is written to the connection's inet_request_sock, and later written back to the accepted socket when the connection is established. If the socket already has a nonzero mark, then the behaviour is the same as it is today, i.e., the listening socket's fwmark is used. Black-box tested using user-mode linux: - IPv4/IPv6 SYN+ACK, FIN, etc. packets are routed based on the mark of the incoming SYN packet. - The socket returned by accept() is marked with the mark of the incoming SYN packet. - Tested with syncookies=1 and syncookies=2. Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: fix checkpatch error "space prohibited"Weilong Chen2013-12-26
| | | | | Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: split syncookie keys for ipv4 and ipv6 and initialize with ↵Hannes Frederic Sowa2013-10-19
| | | | | | | | | | | | | | | net_get_random_once This patch splits the secret key for syncookies for ipv4 and ipv6 and initializes them with net_get_random_once. This change was the reason I did this series. I think the initialization of the syncookie_secret is way to early. Cc: Florian Westphal <fw@strlen.de> Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: rename ir_loc_port to ir_numEric Dumazet2013-10-10
| | | | | | | | | | | | | | | | | In commit 634fb979e8f ("inet: includes a sock_common in request_sock") I forgot that the two ports in sock_common do not have same byte order : skc_dport is __be16 (network order), but skc_num is __u16 (host order) So sparse complains because ir_loc_port (mapped into skc_num) is considered as __u16 while it should be __be16 Let rename ir_loc_port to ireq->ir_num (analogy with inet->inet_num), and perform appropriate htons/ntohs conversions. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: includes a sock_common in request_sockEric Dumazet2013-10-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | TCP listener refactoring, part 5 : We want to be able to insert request sockets (SYN_RECV) into main ehash table instead of the per listener hash table to allow RCU lookups and remove listener lock contention. This patch includes the needed struct sock_common in front of struct request_sock This means there is no more inet6_request_sock IPv6 specific structure. Following inet_request_sock fields were renamed as they became macros to reference fields from struct sock_common. Prefix ir_ was chosen to avoid name collisions. loc_port -> ir_loc_port loc_addr -> ir_loc_addr rmt_addr -> ir_rmt_addr rmt_port -> ir_rmt_port iif -> ir_iif Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: syncookies: reduce mss table to four valuesFlorian Westphal2013-09-24
| | | | | | | | | | | | | Halve mss table size to make blind cookie guessing more difficult. This is sad since the tables were already small, but there is little alternative except perhaps adding more precise mss information in the tcp timestamp. Timestamps are unfortunately not ubiquitous. Guessing all possible cookie values still has 8-in 2**32 chance. Reported-by: Jakob Lell <jakob@jakoblell.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: syncookies: reduce cookie lifetime to 128 secondsFlorian Westphal2013-09-24
| | | | | | | | | | | | | | | | | | | | We currently accept cookies that were created less than 4 minutes ago (ie, cookies with counter delta 0-3). Combined with the 8 mss table values, this yields 32 possible values (out of 2**32) that will be valid. Reducing the lifetime to < 2 minutes halves the guessing chance while still providing a large enough period. While at it, get rid of jiffies value -- they overflow too quickly on 32 bit platforms. getnstimeofday is used to create a counter that increments every 64s. perf shows getnstimeofday cost is negible compared to sha_transform; normal tcp initial sequence number generation uses getnstimeofday, too. Reported-by: Jakob Lell <jakob@jakoblell.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>