Tuning FreeBSD For Routing And Firewalling


AsiaBSDCon 2018: Tuning FreeBSD for routing and firewalling
Olivier Cochard-Labbé

whoami(1)
- olivier.cochard@
- olivier@

Benchmarking a router
- Router job: forward packets between its interfaces at maximum rate
- Reference value: the packet forwarding rate, in packets-per-second (pps), NOT a bandwidth (in bits-per-second)
- RFC 2544: Benchmarking Methodology for Network Interconnect Devices

Some line-rate references
- Gigabit line-rate: 1.48M frames-per-second
- 10 Gigabit line-rate: 14.8M frames-per-second
- Small packets: 1 frame = 1 packet
- Gigabit Ethernet is a full-duplex medium: a line-rate Gigabit router MUST be able to receive AND transmit at the same time, and therefore forward at 3 Mpps
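The 1.48M and 14.8M figures come from the smallest Ethernet frame (64 bytes, FCS included) plus the 8-byte preamble/SFD and the 12-byte minimum inter-frame gap: 84 bytes, or 672 bits, on the wire per frame. A minimal Python sketch of that arithmetic, also doubling the rate for a router forwarding line-rate traffic in both directions:

```python
# Per-frame cost on the wire for Ethernet, in bytes.
PREAMBLE_SFD = 8   # preamble + start-of-frame delimiter
IFG = 12           # minimum inter-frame gap
MIN_FRAME = 64     # smallest Ethernet frame, FCS included

def line_rate_pps(link_bps, frame_bytes=MIN_FRAME):
    """Frames per second that one direction of a link can carry."""
    wire_bits = (frame_bytes + PREAMBLE_SFD + IFG) * 8
    return link_bps / wire_bits

def router_forwarding_pps(link_bps, frame_bytes=MIN_FRAME):
    """Total forwarding rate when line-rate traffic flows in both directions."""
    return 2 * line_rate_pps(link_bps, frame_bytes)

if __name__ == "__main__":
    for name, speed in (("1Gb/s", 1e9), ("10Gb/s", 10e9)):
        print(f"{name}: {line_rate_pps(speed) / 1e6:.2f} Mpps per direction, "
              f"{router_forwarding_pps(speed) / 1e6:.2f} Mpps full-duplex")
```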

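I want bandwidth values!
- Bandwidth = packets-per-second × packet size
- Estimated using the Simple Internet Mix (IMIX) trimodal reference distribution of packet sizes (7 × 40 bytes, 4 × 576 bytes, 1 × 1500 bytes)
- IPv4 layer, in bits-per-second:

$$\text{bps}_{\text{IPv4}} = \text{PPS} \times \frac{7 \times 40 + 4 \times 576 + 1 \times 1500}{12} \times 8$$

- Ethernet layer (as seen by switch counters), add 14 bytes per packet:

$$\text{bps}_{\text{Ethernet}} = \text{PPS} \times \frac{7 \times 54 + 4 \times 590 + 1 \times 1514}{12} \times 8$$

- Since about 2004, the Internet packet size distribution is bimodal (in 2006: 44% of packets smaller than 100 bytes and 37% larger than 1400 bytes)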
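Inverting the Ethernet-layer formula gives the packet rate needed to fill a link with Simple IMIX traffic, which is where the 350 Kpps (1Gb/s) and 3.5 Mpps (10Gb/s) figures come from. A Python sketch that, like the slide's formula, only adds the 14-byte Ethernet header per packet (no preamble, inter-frame gap or FCS):

```python
# Simple IMIX: (packet count, IPv4 packet size in bytes), 12 packets in total.
SIMPLE_IMIX = [(7, 40), (4, 576), (1, 1500)]
ETHERNET_HEADER = 14  # bytes added at the Ethernet layer (switch counters)

def imix_average_bytes(overhead=0):
    """Average Simple IMIX packet size, plus a per-packet overhead."""
    packets = sum(count for count, _ in SIMPLE_IMIX)
    total = sum(count * (size + overhead) for count, size in SIMPLE_IMIX)
    return total / packets

def imix_bps(pps, overhead=0):
    """Bandwidth carried by `pps` IMIX packets per second."""
    return pps * imix_average_bytes(overhead) * 8

def imix_pps_for_link(link_bps, overhead=ETHERNET_HEADER):
    """Packets per second needed to fill `link_bps` with IMIX traffic."""
    return link_bps / (imix_average_bytes(overhead) * 8)

if __name__ == "__main__":
    for name, speed in (("1Gb/s", 1e9), ("10Gb/s", 10e9)):
        pps = imix_pps_for_link(speed)
        print(f"{name}: {pps / 1e3:.0f} Kpps, full-duplex router {2 * pps / 1e3:.0f} Kpps")
```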

Link speed | Minimum line-rate | Full-duplex line-rate router | Minimum rate using IMIX distribution for reaching link speed | Full-duplex minimum IMIX router
--- | --- | --- | --- | ---
1Gb/s | 1.48 Mpps | 3 Mpps | 350 Kpps | 700 Kpps
10Gb/s | 14.8 Mpps | 30 Mpps | 3.5 Mpps | 7 Mpps

Simple benchmark lab
- As a telco, we measure the worst case (Denial-of-Service):
  - smallest packet size
  - maximum link rate
- Lab components: netmap's pkt-gen (traffic generator and measure point), the Device Under Test, an optional switch whose counters are used to validate the pkt-gen measurements, and a manager running the scripted benches

Hardware details

Servers | CPU | Cores | GHz | Network cards (driver name)
--- | --- | --- | --- | ---
Dell PowerEdge R630 | Intel E5-2650 v4 | 2x12x2 | 2.2 | 10G Intel 82599ES (ixgbe), 10G Chelsio T520-CR (cxgbe), 10G Mellanox ConnectX-3 Pro (mlx4en), 10-50G Mellanox ConnectX-4 LX (mlx5en)
HP ProLiant DL360p Gen8 | Intel E5-2650 v2 | 8x2 | 2.6 | 10G Chelsio T540-CR (cxgbe), 10G Emulex OneConnect be3 (oce)
SuperMicro 5018A-FTN4 | Intel Atom C2758 | 8 | 2.4 | 10G Chelsio T540-CR (cxgbe)
SuperMicro 5018A-FTN4 | Intel Atom C2758 | 8 | 2.4 | 10G Intel 82599 (ixgbe)
Netgate RCC-VE 4860 | Intel Atom C2558 | 4 | 2.4 | Gigabit Intel i350 (igb)
PC Engines APU2 | AMD GX-412TC | 4 | 1 | Gigabit Intel i210AT (igb)

- No CPU with 16 cores in one socket
- Same DAC for all 10G links: QFX-SFP-DAC-3M

Multi-queue NIC & RSS
1) NIC drivers create one queue per detected core (maximum values are driver dependent).
2) A Toeplitz hash is used for balancing received packets across the queues, computed over:
   - SRC IP / DST IP / SRC PORT / DST PORT (4-tuple)
   - SRC IP / DST IP (2-tuple)
The hash of each input packet's tuple selects the MSI queue, and therefore the CPU, that processes it.
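The Toeplitz hash itself is short: for every set bit of the input (the tuple bytes in network order), XOR into the result the 32-bit window of a secret key that starts at that bit position. A minimal Python sketch; the 40-byte key is the sample key from Microsoft's RSS documentation (an assumption here: drivers may program a different or random key), and the final `% nqueues` is a simplification of the NIC's indirection table:

```python
import ipaddress
import struct

# Sample 40-byte RSS key published in Microsoft's RSS verification documentation.
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0d0ca2bcb"
    "ae7b30b477cb2da38030f20c6a42b73bbeac01fa")

def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash of `data` with `key`."""
    key_bits = len(key) * 8
    if key_bits < len(data) * 8 + 32:
        raise ValueError("key too short for this input")
    key_int = int.from_bytes(key, "big")
    result = 0
    for byte_index, byte in enumerate(data):
        for bit in range(8):
            if byte & (0x80 >> bit):
                pos = byte_index * 8 + bit
                # 32-bit window of the key starting at bit `pos` (MSB first).
                result ^= (key_int >> (key_bits - 32 - pos)) & 0xFFFFFFFF
    return result

def rss_input(src, dst, sport=None, dport=None) -> bytes:
    """2-tuple (IPs only) or 4-tuple (IPs + ports) hash input, network order."""
    data = ipaddress.IPv4Address(src).packed + ipaddress.IPv4Address(dst).packed
    if sport is not None and dport is not None:
        data += struct.pack("!HH", sport, dport)
    return data

def select_queue(hash_value: int, nqueues: int) -> int:
    # Simplification: real NICs index an indirection table with the low bits.
    return hash_value % nqueues

if __name__ == "__main__":
    h = toeplitz_hash(RSS_KEY, rss_input("66.9.149.187", "161.142.100.80", 2794, 1766))
    print(f"hash=0x{h:08x} queue={select_queue(h, 8)}")
```

Every packet of a given flow hashes to the same value, hence the same queue: that is why a single tunnel flow cannot be spread over several CPUs.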

Multi-queue NIC & RSS!
1) Needs multiple flows: a local tunnel (IPsec, GRE, …) presents only one flow. Example: the performance problem with a 1G home fiber ISP using PPPoE.
2) Needs multiple CPUs: what is the benefit of physical cores vs logical cores (Hyper-Threading) vs multiple sockets?

Monitoring queues usage
- Python script from melifaro@ parsing sysctl NIC stats (RX queues mainly)
- Supports: bxe, cxl, ix, ixl, igb, mce, mlxen

    # nic-queue-usage cxl0
    [Q0 856K/s] [Q1 862K/s] [Q2 846K/s] …
    [Q0 864K/s] [Q1 871K/s] [Q2 853K/s] …
    [Q0 843K/s] [Q1 851K/s] [Q2 834K/s] …
    [Q0 844K/s] [Q1 846K/s] [Q2 826K/s] …
    [Q0 832K/s] [Q1 847K/s] [Q2 828K/s] …
    [Q0 867K/s] [Q1 874K/s] [Q2 855K/s] …
    [Q0 826K/s] [Q1 831K/s] …

Each line continues with the remaining queues (up to Q7), a summary of all queues, and the global NIC RX and TX counters.
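The principle behind such a script is simple: sample each per-queue packet counter twice and divide the delta by the interval. A minimal Python sketch of that logic (not melifaro@'s script); the counter name pattern `RXQ_PATTERN` is a hypothetical example and must be adapted to the counters your driver actually exposes under `sysctl dev.<driver>.<unit>`:

```python
import re

def parse_queue_counters(sysctl_text, pattern):
    """Extract {queue_index: counter} from "name: value" lines of sysctl(8) output.

    `pattern` is a regular expression whose first group captures the queue index.
    """
    regex = re.compile(pattern)
    counters = {}
    for line in sysctl_text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        match = regex.search(name.strip())
        if match:
            counters[int(match.group(1))] = int(value.strip())
    return counters

def queue_rates(before, after, interval_s):
    """Packets per second per queue between two counter snapshots."""
    return {q: (after[q] - before[q]) / interval_s
            for q in sorted(after) if q in before}

def format_rates(rates):
    """Render rates in the [Qn xxxK/s] style of the nic-queue-usage output."""
    return " ".join(f"[Q{q} {int(r // 1000)}K/s]" for q, r in rates.items())

# Hypothetical counter name pattern: adapt it to your driver's sysctl tree.
RXQ_PATTERN = r"\.rxq\.(\d+)\.rxpkts$"
```

On a live system, the two snapshots would come from running `sysctl dev.cxl.0` one second apart and feeding both outputs to `parse_queue_counters`.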

Hyper-threading & cxgbe

    CPU: Intel Xeon CPU E5-2650 v2 @ 2.60GHz (2593.81-MHz K8-class CPU)
    (…)
    FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
    FreeBSD/SMP: 1 package(s) x 8 core(s) x 2 hardware threads
    (…)
    cxl0: port 0 numa-domain 0 on t5nex0
    cxl0: Ethernet address: 00:07:43:2e:e4:70
    cxl0: 16 txq, 8 rxq (NIC); 8 txq, 2 rxq (TOE)
    cxl1: port 1 numa-domain 0 on t5nex0
    cxl1: Ethernet address: 00:07:43:2e:e4:78
    cxl1: 16 txq, 8 rxq (NIC); 8 txq, 2 rxq (TOE)

cxgbe doesn't use all CPUs by default if ncpu > 8: here 16 CPUs are detected but only 8 RX queues are created.

Hyper-threading & cxgbe
- Config 1: default (8 rx queues)
- Config 2: 16 rx queues to use ALL 16 CPUs, in /boot/loader.conf:
  - hw.cxgbe.nrxq10g="16"
- Config 3: disabling HT (8 rx queues), in /boot/loader.conf:
  - machdep.hyperthreading_allowed="0"

FreeBSD 11.1-RELEASE amd64

Disabling Hyper-Threading
ministat(1) is my friend:

    x Xeon E5-2650v2 & cxgbe, HT-enabled & 8rxq (default): inet4 packets-per-second
    + Xeon E5-2650v2 & cxgbe, HT-enabled & 16rxq: inet4 packets-per-second
    * Xeon E5-2650v2 & cxgbe, HT-disabled & 8rxq: inet4 packets-per-second
    + vs x: Difference at 95.0% confidence: 440068 +/- 144126 (9.46731% +/- 3.23827%), Student's t, pooled s = …
    * vs x: Difference at 95.0% confidence: 1.13671e+06 +/- 98524.2 (24.4544% +/- 2.62824%), Student's t, pooled s = 67554.4

Target: 10Gb/s full-duplex IMIX router = 7 Mpps
Tip 1: Disable Hyper-Threading
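ministat(1) compares two samples with a Student's t test on the pooled standard deviation, and only reports a difference when it exceeds the confidence interval. A simplified Python sketch of that comparison at 95% confidence, with a small hard-coded table of two-sided t critical values; it approximates what ministat prints and is not its code:

```python
import math
from statistics import mean, variance

# Two-sided 95% Student's t critical values, indexed by degrees of freedom.
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
       6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def ministat_compare(baseline, candidate):
    """Return (diff, err, diff %, err %, pooled s), or None when no
    difference is proven at 95.0% confidence."""
    n1, n2 = len(baseline), len(candidate)
    df = n1 + n2 - 2
    if df not in T95:
        raise ValueError(f"extend T95 for {df} degrees of freedom")
    # Pooled standard deviation of both samples (sample variances).
    pooled_s = math.sqrt(((n1 - 1) * variance(baseline)
                          + (n2 - 1) * variance(candidate)) / df)
    diff = mean(candidate) - mean(baseline)
    err = T95[df] * pooled_s * math.sqrt(1 / n1 + 1 / n2)
    if abs(diff) <= err:
        return None
    base = mean(baseline)
    return diff, err, diff * 100 / base, err * 100 / base, pooled_s
```

With 5 runs per configuration, as in these benches, there are 8 degrees of freedom (t = 2.306).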

Queues/cores impact
Locking problem?

Analysing bottleneck

    kldload hwpmc
    pmcstat -S CPU_CLK_UNHALTED_CORE -l 20 -O data.out
    stackcollapse-pmc.pl data.out > data.stack
    flamegraph.pl data.stack > data.svg

Flame graph highlights:
- rlock on arpresolve(): rw_rlock called by arpresolve(), from ether_output() after ip_tryforward()
- rlock on ip_findroute(): rw_rlock called by fib4_lookup_nh_basic(), from ip_findroute()
- The rest of the samples are in the NIC driver & Ethernet path: ithread_loop, intr_event_execute_handlers, t4_intr, service_iq, t4_eth_rx, ether_input, netisr_dispatch_src, ether_nh_input, ether_demux, ip_input, ip_tryforward, ether_output, cxgbe_transmit, mp_ring_enqueue, drain_ring, eth_tx, plus random_harvest_queue
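Flame graphs are drawn from "folded" stacks: one line per unique call chain, frames separated by `;`, then a space and the sample count. Ranking functions by the share of samples in which they appear (their inclusive share) is a quick way to spot lock hot spots such as rw_rlock. A small Python sketch working on such lines; the sample stacks below are made up for illustration:

```python
from collections import Counter

def inclusive_share(folded_lines):
    """Fraction of samples in which each function appears on the stack."""
    total = 0
    seen = Counter()
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        stack, _, count = line.rpartition(" ")
        samples = int(count)
        total += samples
        # Count a function once per stack, even if it appears several times.
        for frame in set(stack.split(";")):
            seen[frame] += samples
    return {frame: n / total for frame, n in seen.items()} if total else {}

def top(shares, n=5):
    """The `n` functions with the largest inclusive share."""
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:n]

if __name__ == "__main__":
    # Made-up folded stacks and counts, for illustration only.
    sample = [
        "t4_intr;ether_input;ip_input;ip_tryforward;ether_output;arpresolve;rw_rlock 40",
        "t4_intr;ether_input;ip_input;ip_tryforward;ip_findroute;rw_rlock 35",
        "t4_intr;ether_input;ip_input;ip_tryforward;ether_output;cxgbe_transmit 25",
    ]
    for name, share in top(inclusive_share(sample)):
        print(f"{share:6.1%} {name}")
```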

Random harvest sources

    # sysctl kern.random.harvest
    kern.random.harvest.mask_symbolic: [UMA],[FS_ATIME],SWI,INTERRUPT,NET_NG,NET_ETHER,NET_…
    kern.random.harvest.mask_bin: 00111111111
    kern.random.harvest.mask: 511

- Config 1: default
- Config 2: use neither INTERRUPT nor NET_ETHER as entropy sources: harvest_mask="351"
- Warning: this has a security impact on the random generator!
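The value 351 is simply 511 with two bits cleared: 511 - 351 = 160 = 2^7 + 2^5. A Python sketch of that bit arithmetic; the bit positions (NET_ETHER = bit 5, INTERRUPT = bit 7) are an assumption based on the order of FreeBSD 11's `enum random_entropy_source` in `sys/sys/random.h`, so check them against `kern.random.harvest.mask_symbolic` on your own system:

```python
# Assumed bit order of the first harvest sources (FreeBSD 11 sys/sys/random.h).
SOURCES = ["CACHED", "ATTACH", "KEYBOARD", "MOUSE", "NET_TUN",
           "NET_ETHER", "NET_NG", "INTERRUPT", "SWI"]

def clear_sources(mask, *names):
    """Return `mask` with the bits of the named entropy sources cleared."""
    for name in names:
        mask &= ~(1 << SOURCES.index(name))
    return mask

def enabled_sources(mask):
    """Names of the known sources whose bit is set in `mask`."""
    return [name for bit, name in enumerate(SOURCES) if mask & (1 << bit)]

if __name__ == "__main__":
    new_mask = clear_sources(511, "INTERRUPT", "NET_ETHER")
    print(f'harvest_mask="{new_mask}"', enabled_sources(new_mask))
```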

kern.random.harvest.mask

CPU (cores) & driver | Hardware | 511 (default), median of 5 | 351, median of 5 | ministat
--- | --- | --- | --- | ---
E5-2650v4 (2x12) & ixgbe | Xeon & Intel 82599ES | 3.74 Mpps | 3.78 Mpps | No diff. proven at 95.0% confidence
E5-2650v4 (2x12) & cxgbe | Xeon & Chelsio T520 | 4.82 Mpps | 4.87 Mpps | No diff. proven at 95.0% confidence
E5-2650v4 (2x12) & mlx4en | Xeon & Mellanox ConnectX-3 Pro | 3.49 Mpps | 3.92 Mpps | 11.66% +/- 8.15%
E5-2650v4 (2x12) & mlx5en | Xeon & Mellanox ConnectX-4 Lx | 0 Mpps | 0 Mpps | System overloaded
E5-2650v2 (8) & cxgbe | Xeon & Chelsio T540 | 5.76 Mpps | 5.79 Mpps | No diff. proven at 95.0% confidence
E5-2650v2 (8) & oce | Xeon & Emulex be3 | 1.33 Mpps | 1.33 Mpps | No diff. proven at 95.0% confidence
C2758 (8) & cxgbe | Atom & Chelsio T540 | 2.83 Mpps | 3.17 Mpps | 12.52% +/- 1.82%
C2758 (8) & ixgbe | Atom & Intel 82599ES | 2.3 Mpps | 2.43 Mpps | 6.14% +/- 1.84%
C2558 (4) & igb | Atom & Intel I354 | 951 Kpps | 1 Mpps | 4.75% +/- 1.08%
GX412 (4) & igb | AMD & Intel I210 | 726 Kpps | 749 Kpps | 3.14% +/- 0.70%

Targets: 10Gb/s full-duplex IMIX = 7 Mpps; 1Gb/s full-duplex IMIX = 700 Kpps
Tip 2: harvest_mask="351"

arpresolve & ip_findroute
- Yandex contributions (melifaro@ & ae@)
- Published January 2016: RoutingProposal
- Patches refreshed for FreeBSD 12-head:
  - https://people.freebsd.org/~ae/afdata.dif
  - https://people.freebsd.org/~ae/radix.dif
- Patches backported to FreeBSD 11.1:
  - https://people.freebsd.org/~olivier/fbsd11.1.ae.afdata-radix.patch

Yandex's patches

CPU (cores) & driver | Hardware | 11.1 | 11.1-Yandex | ministat
--- | --- | --- | --- | ---
E5-2650v4 (2x12) & ixgbe | Xeon & Intel 82599ES | 3.78 Mpps | 6.46 Mpps | 73.58% +/- 7.3%
E5-2650v4 (2x12) & cxgbe | Xeon & Chelsio T520 | 4.87 Mpps | 9.60 Mpps | 95.36% +/- 3.8%
E5-2650v4 (2x12) & mlx4en | Xeon & Mellanox ConnectX-3 Pro | 3.92 Mpps | 8.01 Mpps | 100.5% +/- 15.6%
E5-2650v4 (2x12) & mlx5en | Xeon & Mellanox ConnectX-4 Lx | 0 Mpps | 14.64 Mpps | N/A
E5-2650v2 (8) & cxgbe | Xeon & Chelsio T540 | 5.75 Mpps | 10.9 Mpps | 90.56% +/- 1.24%
E5-2650v2 (8) & oce | Xeon & Emulex be3 | 1.33 Mpps | 1.33 Mpps | No diff. proven at 95.0% confidence
C2758 (8) & cxgbe | Atom & Chelsio T540 | 3.15 Mpps | 4.2 Mpps | 34.4% +/- 2.9%
C2758 (8) & ixgbe | Atom & Intel 82599ES | 2.43 Mpps | 3.08 Mpps | 26% +/- 1.18%
C2558 (4) & igb | Atom & Intel I354 | 1 Mpps | 1.2 Mpps | 20.17% +/- 2.56%
GX412 (4) & igb | AMD & Intel I210 | 747 Kpps | 729 Kpps | -2.37% +/- 0.58%

Targets: 10Gb/s full-duplex IMIX = 7 Mpps; 1Gb/s full-duplex IMIX = 700 Kpps
Tip 3: Use steroid patches from Russia

Avoid some NICs
10G Emulex OneConnect (be3):
- No configurable number of rx/tx queues (4)
- No configurable Ethernet flow control
- 1.33 Mpps is not even a Gigabit line-rate (1.48 Mpps)

Tip 4: Use good NICs (Mellanox, Chelsio, Intel)

Linear performance? (single socket)
Notice the linear improvement with a power-of-2 number of queues.

Queue/IRQ pins to CPU?

    # grep -R bus_bind_intr src/sys/dev/*

Matching drivers:
- bxe: QLogic NetXtreme II Ethernet 10Gb PCIe
- cxgbe: Chelsio T4-, T5-, and T6-based (inside #ifdef RSS)
- e1000 (igb, em, lem): Intel Gigabit
- ixgbe: Intel 10 Gigabit
- ixl: Intel XL710 Ethernet 40Gb
- qlnxe: Cavium 25/40/100 Gigabit Ethernet
- sfxge: Solarflare 10Gb
- vxge: Neterion X3100 10Gb

Can be useful on cxgbe.

Queue/IRQ pins to CPU
- Config 1: default
- Config 2: queue/IRQ pinning, with chelsio_affinity_enable="YES"

    # service chelsio_affinity start
    Bind t5nex0:0a IRQ 284 to CPU 0
    Bind t5nex0:0a IRQ 285 to CPU 1
    Bind t5nex0:0a IRQ 286 to CPU 2
    Bind t5nex0:0a IRQ 287 to CPU 3
    Bind t5nex0:0a IRQ 288 to CPU 4
    Bind t5nex0:0a IRQ 289 to CPU 5
    Bind t5nex0:0a IRQ 290 to CPU 6
    Bind t5nex0:0a IRQ 291 to CPU 7
    (…)
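Conceptually, this pinning is a round-robin assignment of the NIC queue IRQs over a list of CPUs, each applied with `cpuset -l <cpu> -x <irq>`. The Python sketch below only generates such commands; it is a hypothetical illustration, not the `chelsio_affinity` rc script itself, and on a real system the IRQ numbers would come from `vmstat -i`:

```python
def affinity_plan(irqs, cpus):
    """Pin each IRQ to a CPU, round-robin over `cpus`."""
    cpus = list(cpus)
    if not cpus:
        raise ValueError("need at least one CPU")
    return [(irq, cpus[i % len(cpus)]) for i, irq in enumerate(irqs)]

def cpuset_commands(plan):
    """cpuset(1) commands binding each IRQ to its CPU."""
    return [f"cpuset -l {cpu} -x {irq}" for irq, cpu in plan]

if __name__ == "__main__":
    # IRQs 284-291 pinned to CPUs 0-7, as in the service output.
    for cmd in cpuset_commands(affinity_plan(range(284, 292), range(8))):
        print(cmd)
```

Passing a CPU list such as `range(12, 24)` instead would keep all queues on the second NUMA domain of a 2x12-core server.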

Queue/IRQ pins to CPU

    x Xeon E5-2650v2 & cxgbe, default: inet4 packets-per-second
    + Xeon E5-2650v2 & cxgbe, IRQ pinned to CPU: inet4 packets-per-second
    Difference at 95.0% confidence: 194810 +/- 17742.8 (1.77878% +/- 0.163429%), Student's t, pooled s = 12165.6

Small benefit, and only if pps > 10 Mpps.

    x Atom C2750 & cxgbe, default: inet4 packets-per-second
    + Atom C2750 & cxgbe, IRQ pinned to CPU: inet4 packets-per-second
    No difference proven at 95.0% confidence

Increasing RX queues number

E5-2650v4 (2x12 cores) | 8 queues (default for ixgbe & cxgbe) | 24 queues (default for mlx5en) | ministat
--- | --- | --- | ---
ixgbe (Intel 82599ES) | 6.72 Mpps | 8.07 Mpps | 21.34% +/- 4.96%
cxgbe (Chelsio T520) | 9.59 Mpps | 12.40 Mpps | 29.45% +/- 0.37%
mlx5en (Mellanox ConnectX-4 Lx) | 7.26 Mpps | 14.64 Mpps |

The mlx4en driver didn't allow changing the number of queues (16 here).
Targets: 10Gb/s full-duplex IMIX = 7 Mpps; 1Gb/s full-duplex IMIX = 700 Kpps
Tip 5: Check the default maximum number of queues and increase it if ncpu > 8

NUMA affinity
- numa-domain 0: CPU 0-11
- numa-domain 1: CPU 12-23
(Source: Intel Xeon Processor E5-2600 v4 Product Family: Platform Brief)

    t5nex0: Chelsio T520-CR mem 000-0xc9685fff irq 50 at device 0.4 numa-domain 1 on pci14

Default: NO NUMA affinity
Default CPU load with 12 RX queues: the scheduler or the drivers are not NUMA aware.

    last pid: 1080;  load averages: 7.13, 3.04, 1.30
    273 processes: 35 running, 125 sleeping, 113 waiting
    CPU 0:  0.0% user, 0.0% nice, 0.0% system,  0.4% interrupt
    CPU 1:  0.0% user, 0.0% nice, 0.0% system,  0.4% interrupt
    CPU 2:  0.0% user, 0.0% nice, 0.0% system,  0.0% interrupt
    CPU 3:  0.0% user, 0.0% nice, 0.0% system,  0.0% interrupt
    CPU 4:  0.0% user, 0.0% nice, 0.0% system, 89.8% interrupt
    CPU 5:  0.0% user, 0.0% nice, 0.0% system,  100% interrupt
    CPU 6:  0.0% user, 0.0% nice, 0.0% system, 94.9% interrupt
    CPU 7:  0.0% user, 0.0% nice, 0.0% system, 89.8% interrupt
    CPU 8:  0.0% user, 0.0% nice, 0.0% system, 84.6% interrupt
    CPU 9:  0.0% user, 0.0% nice, 0.0% system, 92.1% interrupt
    CPU 10: 0.0% user, 0.0% nice, 0.0% system, 84.6% interrupt
    CPU 11: 0.0% user, 0.0% nice, 0.0% system, 83.9% interrupt
    CPU 12: 0.0% user, 0.0% nice, 0.0% system, 85.8% interrupt
    CPU 13: 0.0% user, 0.0% nice, 0.0% system, 92.1% interrupt
    CPU 14: 0.0% user, 0.0% nice, 0.0% system, 85.0% interrupt
    CPU 15: 0.0% user, 0.0% nice, 0.0% system, 78.0% interrupt
    CPU 16: 0.0% user, 0.0% nice, 0.4% system,  0.0% interrupt
    CPU 17: 0.0% user, 0.0% nice, 0.0% system,  0.0% interrupt
    CPU 18: 0.0% user, 0.0% nice, 0.0% system,  0.0% interrupt
    CPU 19: 0.0% user, 0.0% nice, 0.0% system,  0.0% interrupt
    CPU 20: 0.0% user, 0.0% nice, 0.0% system,  0.0% interrupt
    CPU 21: 0.0% user, 0.0% nice, 0.0% system,  0.0% interrupt
    CPU 22: 0.0% user, 0.0% nice, 0.0% system,  0.0% interrupt
    CPU 23: 0.0% user, 0.0% nice, 0.0% system,  0.0% interrupt
    Mem: 13M Active, 13M Inact, 1170M Wired, 6393K Buf, 248G …

The 12 busy CPUs (4 to 15) straddle numa-domain 0 (CPUs 0-11) and numa-domain 1 (CPUs 12-23).

NUMA affinity
cxgbe configured with 12 RX queues, plugged into a PCI-E slot belonging to numa-domain 1 (cores 12-23):
- Config 1: no affinity (default)
- Config 2: cxgbe queues pinned to cores 0-11
  - chelsio_affinity_enable="YES"
- Config 3: cxgbe queues pinned to cores 12-23
  - chelsio_affinity_enable="YES"
  - chelsio_affinity_firstcpu="12"

NUMA affinity

    x Xeon 2xE5-2650v4 & cxgbe, default: inet4 packets-per-second
    + Xeon 2xE5-2650v4 & cxgbe, affinity-numa0: inet4 packets-per-second
    * Xeon 2xE5-2650v4 & cxgbe, affinity-numa1: inet4 packets-per-second
    + vs x: No difference proven at 95.0% confidence
    * vs x: Difference at 95.0% confidence: 1.11851e+06 +/- 108191 (11.7604% +/- 1.25701%), Student's t, pooled s = 74182.7

Tip 6: Take care of NUMA affinity with queue-to-CPU pinning

Linear performance? (NUMA)
- Notice that mlx5en didn't require a power-of-2 number of queues
- cxgbe reaches line-rate with only 16 queues

NIC hardware acceleration features
- Checksum offload: rxcsum, txcsum
- VLAN offload: vlanmtu, vlanhwtag, vlanhwfilter, vlanhwcsum
- TSO: TCP Segmentation Offload
  - The NIC splits large segments into MTU-sized packets
  - MUST be disabled on a router (and incompatible with ipfw nat)
- LRO: Large Receive Offload
  - Breaks the end-to-end principle on a router: MUST be disabled
- Hardware resources reservation

Disabling LRO & TSO

CPU (cores) & driver | Hardware | Enabled (default) | Disabled | ministat
--- | --- | --- | --- | ---
E5-2650v4 (2x12) & ixgbe | Xeon & Intel 82599ES | 7.97 Mpps | 8.07 Mpps | No diff. proven at 95.0% confidence
E5-2650v4 (2x12) & cxgbe | Xeon & Chelsio T520 | 12.40 Mpps | 12.40 Mpps | No diff. proven at 95.0% confidence
E5-2650v4 (2x12) & mlx4en | Xeon & Mellanox ConnectX-3 Pro | 8.05 Mpps | 7.85 Mpps | No diff. proven at 95.0% confidence
E5-2650v4 (2x12) & mlx5en | Xeon & Mellanox ConnectX-4 Lx | 14.65 Mpps | 14.83 Mpps | 1.3% +/- 0.1%
E5-2650v2 (8) & cxgbe | Xeon & Chelsio T540 | 10.84 Mpps | 10.92 Mpps | 0.74% +/- 0.26%
C2758 (8) & cxgbe | Atom & Chelsio T540 | 4.20 Mpps | 4.18 Mpps | No diff. proven at 95.0% confidence
C2758 (8) & ixgbe | Atom & Intel 82599ES | 3.06 Mpps | 3.06 Mpps | No diff. proven at 95.0% confidence
C2558 (4) & igb | Atom & Intel I354 | 1.2 Mpps | 1.2 Mpps | No diff. proven at 95.0% confidence
GX412 (4) & igb | AMD & Intel I210 | 729 Kpps | 727 Kpps | No diff. proven at 95.0% confidence

Tip 6: You can disable LRO & TSO on your router/firewall
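ifconfig(8) lists the enabled capabilities on its `options=` line, so checking which router-unfriendly ones are still on is easy to automate. A Python sketch that turns that line into the matching disable flags (the same `-tso4 -tso6 -lro -vlanhwtso` set used later in this talk's rc.conf examples); the sample interface output is made up:

```python
import re

# Capability name as printed on ifconfig's "options=" line -> ifconfig(8) flag.
ROUTER_UNSAFE_CAPS = {
    "TSO4": "tso4",
    "TSO6": "tso6",
    "LRO": "lro",
    "VLAN_HWTSO": "vlanhwtso",
}

def enabled_capabilities(ifconfig_output):
    """Capabilities listed between <> on the options= line."""
    m = re.search(r"options=[0-9a-fA-F]+<([^>]*)>", ifconfig_output)
    return set(m.group(1).split(",")) if m else set()

def disable_flags(ifconfig_output):
    """ifconfig arguments turning off the TSO/LRO capabilities still enabled."""
    caps = enabled_capabilities(ifconfig_output)
    return " ".join("-" + flag for cap, flag in ROUTER_UNSAFE_CAPS.items()
                    if cap in caps)

if __name__ == "__main__":
    # Made-up ifconfig output, for illustration only.
    sample = ("cxl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500\n"
              "\toptions=ec07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,"
              "VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE>\n")
    print(f"ifconfig cxl0 up {disable_flags(sample)}")
```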

hw.igb|ix.rx_process_limit

CPU (cores) & driver | Hardware | Default: 100 (igb), 256 (ix), median | -1 (disabled), median | ministat
--- | --- | --- | --- | ---
E5-2650v4 (2x12) & ixgbe | Xeon & Intel 82599ES | 8.04 Mpps | 8.34 Mpps | 3.75% +/- 0.73%
C2758 (8) & ixgbe | Atom & Intel 82599ES | 3.12 Mpps | 3.85 Mpps | 22.66% +/- 2.14%
C2558 (4) & igb | Atom & Intel I354 | 1.10 Mpps | 1.13 Mpps | 1.65% +/- 0.9%
GX412 (4) & igb | AMD & Intel I210 | 730 Kpps | 735 Kpps | No diff. proven at 95.0% confidence

Tip 6: Disable rx_process_limit with igb & ixgbe

Disabling unused features
"Disallowing capabilities provides a hint to the driver and firmware to not reserve hardware resources for that feature"

/boot/loader.conf:

    hw.cxgbe.toecaps_allowed="0"
    hw.cxgbe.rdmacaps_allowed="0"
    hw.cxgbe.iscsicaps_allowed="0"
    hw.cxgbe.fcoecaps_allowed="0"

Disabling unused features

    x Xeon 2xE5-2650v4 & cxgbe, default caps enabled: inet4 packets-per-second
    + Xeon 2xE5-2650v4 & cxgbe, caps disabled: inet4 packets-per-second
    Difference at 95.0% confidence: 2.38634e+06 +/- 2422.83 (19.2256% +/- 0.0201158%), Student's t, pooled s = 1661.24

Tip 7: Disable unused caps with cxgbe

Forwarding tuning summary
- Yandex's patches: AFDATA and RADIX locks
- Increase Intel & Chelsio NIC queues if ncpu > 8, but keep a power-of-two number

/boot/loader.conf:

    machdep.hyperthreading_allowed="0"
    # Intel drivers
    hw.igb.rx_process_limit="-1"
    hw.em.rx_process_limit="-1"
    hw.ix.rx_process_limit="-1"
    # Chelsio drivers (useful starting at 10 Mpps, so with Yandex's patches)
    hw.cxgbe.toecaps_allowed="0"
    hw.cxgbe.rdmacaps_allowed="0"
    hw.cxgbe.iscsicaps_allowed="0"
    hw.cxgbe.fcoecaps_allowed="0"

/etc/rc.conf:

    harvest_mask="351"
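As a recap, the /boot/loader.conf part of this summary can be generated from a few facts about the box. A hypothetical Python helper: the queue rule (largest power of two not exceeding the number of physical cores, only when above the driver default of 8) is one reading of the queue advice, only the Chelsio queue tunable is covered, and `hw.cxgbe.nrxq10g` is the FreeBSD 11.1 tunable name used in the Hyper-Threading experiment:

```python
def largest_power_of_two_at_most(n):
    """Largest power of two <= n (n >= 1)."""
    return 1 << (n.bit_length() - 1)

def loader_conf(physical_cores, intel=True, chelsio=True, default_queues=8):
    """Return /boot/loader.conf lines following the tuning summary."""
    lines = ['machdep.hyperthreading_allowed="0"']
    if intel:
        lines += [f'hw.{drv}.rx_process_limit="-1"' for drv in ("igb", "em", "ix")]
    if chelsio:
        if physical_cores > default_queues:
            queues = largest_power_of_two_at_most(physical_cores)
            lines.append(f'hw.cxgbe.nrxq10g="{queues}"')
        lines += [f'hw.cxgbe.{cap}caps_allowed="0"'
                  for cap in ("toe", "rdma", "iscsi", "fcoe")]
    return "\n".join(lines)

if __name__ == "__main__":
    print(loader_conf(24))  # e.g. 2 x 12-core E5-2650 v4 with HT disabled
```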

Before vs after tuning (IPv4)

CPU (cores) & driver | Hardware | Generic 11.1 | Yandex-patched & tuned 11.1 | ministat
--- | --- | --- | --- | ---
E5-2650v4 (2x12) & ixgbe | Xeon & Intel 82599ES | 3.74 Mpps | 8.61 Mpps | 127.93% +/- 8.44%
E5-2650v4 (2x12) & cxgbe | Xeon & Chelsio T520 | 4.83 Mpps | 14.8 Mpps | 204.3% +/- 4.80%
E5-2650v4 (2x12) & mlx4en | Xeon & Mellanox ConnectX-3 Pro | 3.92 Mpps | 8.06 Mpps | 126.9% +/- 7.77%
E5-2650v4 (2x12) & mlx5en | Xeon & Mellanox ConnectX-4 Lx | 0 Mpps | 14.64 Mpps | N/A
E5-2650v2 (8) & cxgbe | Xeon & Chelsio T540 | 5.75 Mpps | 11.15 Mpps | 139.8% +/- 5.0%
E5-2650v2 (8) & oce | Xeon & Emulex be3 | 1.33 Mpps | 1.33 Mpps | No diff. proven at 95.0% confidence
C2758 (8) & cxgbe | Atom & Chelsio T540 | 2.83 Mpps | 4.19 Mpps | 50.49% +/- 5.33%
C2758 (8) & ixgbe | Atom & Intel 82599ES | 2.29 Mpps | 3.85 Mpps | 66.97% +/- 2.7%
C2558 (4) & igb | Atom & Intel I354 | 951 Kpps | 1.13 Mpps | 18.58% +/- 1.17%
GX412 (4) & igb | AMD & Intel I210 | 726 Kpps | 735 Kpps | 1.03% +/- 0.56%

IPv4 vs IPv6 performance

CPU (cores) & driver | Hardware | inet4 | inet6 | ministat
--- | --- | --- | --- | ---
E5-2650v4 (2x12) & ixgbe | Xeon & Intel 82599ES | 8.35 Mpps | 8.12 Mpps | -3.25% +/- 1.7%
E5-2650v4 (2x12) & cxgbe | Xeon & Chelsio T520 | 14.8 Mpps | 14.47 Mpps | -2.18% +/- 0.02%
E5-2650v4 (2x12) & mlx4en | Xeon & Mellanox ConnectX-3 Pro | 8.06 Mpps | 7.71 Mpps | -3.35% +/- 3.26%
E5-2650v4 (2x12) & mlx5en | Xeon & Mellanox ConnectX-4 Lx | 14.84 Mpps | 14.29 Mpps | -3.70% +/- 0.02%
E5-2650v2 (8) & cxgbe | Xeon & Chelsio T540 | 10.94 Mpps | 9.18 Mpps | -16.12% +/- 0.19%
C2758 (8) & cxgbe | Atom & Chelsio T540 | 4.29 Mpps | 3.43 Mpps | -19.08% +/- 1.61%
C2758 (8) & ixgbe | Atom & Intel 82599ES | 3.81 Mpps | 3.43 Mpps | -9.84% +/- 1.3%
C2558 (4) & igb | Atom & Intel I354 | 1.23 Mpps | 1.08 Mpps | -11.79% +/- 0.5%
GX412 (4) & igb | AMD & Intel I210 | 734 Kpps | 709 Kpps | -3.6% +/- 0.70%

Notice the difference between the Chelsio and Intel NICs on the C2758: both reach the same inet6 rate, so the bottleneck is no longer in the drivers but in the kernel.

Configuration impact
- VLAN tagging
- VIMAGE & VNET jail
- Bridge

VLAN tagging
- Config 1: no VLAN

    ifconfig_cxl0="inet 198.18.0.10/24"
    ifconfig_cxl1="inet 198.19.0.10/24"

- Config 2: VLAN tagging

    vlans_cxl0="2"
    ifconfig_cxl0="up"
    ifconfig_cxl0_2="inet 198.18.0.10/24"
    vlans_cxl1="4"
    ifconfig_cxl1="up"
    ifconfig_cxl1_4="inet 198.19.0.10/24"

VLAN tagging

    x Xeon E5-2650v2 & cxgbe, no VLAN tagging: inet4 packets-per-second
    + Xeon E5-2650v2 & cxgbe, VLAN tagging: inet4 packets-per-second
    Difference at 95.0% confidence: -1.87118e+06 +/- 31966.4 (-17.0935% +/- 0.267353%), Student's t, pooled s = 21918.2

- A 17% drop with tagging: known problem
- Yet another patch from Yandex:
  - ixgbe: https://reviews.freebsd.org/D12040
  - mlx5en: https://reviews.freebsd.org/D12041

Adding VIMAGE support (options VIMAGE)

E5-2650v2 & cxgbe (Xeon & Chelsio T540) | GENERIC (median) | VIMAGE (median) | ministat
--- | --- | --- | ---
inet4 forwarding | 10.9 Mpps | 10.2 Mpps | -6.25% +/- 0.29%
inet6 forwarding | 9.18 Mpps | 9.39 Mpps | 2.24% +/- 0.33%

Multi-tenant router: VNET jail (traffic from netmap's pkt-gen)

Host /etc/rc.conf:

    ifconfig_cxl0="up -tso4 -tso6 -lro -vlanhwtso"
    ifconfig_cxl1="up -tso4 -tso6 -lro -vlanhwtso"
    jail_enable="YES"
    jail_list="jrouter"

Jail jrouter /etc/rc.conf:

    gateway_enable="YES"
    ipv6_gateway_enable="YES"
    ifconfig_cxl0="inet 198.18.0.10/24"
    ifconfig_cxl1="inet 198.19.0.10/24"
    static_routes="generator receiver"
    route_generator="-net 198.18.0.0/16 198.18.0.108"
    route_receiver="-net 198.19.0.0/16 198.19.0.108"

VNET jail: impact on PPS

E5-2650v2 & cxgbe (Xeon & Chelsio T540) | No jail | VNET jail | ministat
--- | --- | --- | ---
inet4 forwarding | 10.8 Mpps | 11.0 Mpps | No diff. proven at 95.0% confidence
inet6 forwarding | 10.0 Mpps | 10.0 Mpps | No diff. proven at 95.0% confidence

VNET-jail rocks!

if_bridge
- Config 1: no bridge (pkt-gen connected to cxl0 and cxl1)

    ifconfig_cxl0="inet 198.18.0.10/24"
    ifconfig_cxl1="inet 198.19.0.10/24"

- Config 2: dummy bridge (cxl0 is a member of bridge0, which holds the IP address)

    cloned_interfaces="bridge0"
    ifconfig_bridge0="inet 198.18.0.8/24 addm cxl0 up"
    ifconfig_cxl0="up"
    ifconfig_cxl1="inet 198.19.0.10/24"

if_bridge

    x Xeon E5-2650v2 & cxgbe, NO bridge: inet4 packets-per-second
    + Xeon E5-2650v2 & cxgbe, bridge: inet4 packets-per-second
    Difference at 95.0% confidence: -6.97098e+06 +/- 121051 (-62.5212% +/- 1.05729%), Student's t, pooled s = 83000.5

A 62% drop as soon as a bridge interface is involved: bridge_input() takes lots of LOCKs.

Firewalls: Disclaimer!
None of the following benches can conclude that one firewall is better than another.
A firewall can't be reduced to its forwarding performance impact alone.

Firewalls
How do these impact throughput (PPS)?
- Enabling ipfw / pf / ipf with inet4 & inet6
- Number of rules
- Table size
- Number of UDP flows

Firewalls impact on throughput
Warning: do not conclude a firewall is better than another with this result!


Stateless: rules impact
(Chart: ipfw, pf and ipf throughput vs number of rules.)
Keep the MINIMUM number of rules with ipfw/ipf.

Stateless: table size impact
Use tables.
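Why do tables scale better than long rule lists? With one rule per network, a packet that matches late (or not at all) is compared against every rule in turn; with a single rule pointing to a table, the lookup cost no longer grows with the number of entries. A simplified Python model of that difference: it uses one hash per prefix length for the table, which is not how ipfw or pf implement their tables (radix trees), and it assumes non-overlapping IPv4 prefixes so that first-match and longest-prefix-match agree:

```python
import ipaddress

def linear_match(rules, addr, default="deny"):
    """First-match over a rule list; returns (action, rules inspected)."""
    ip = ipaddress.ip_address(addr)
    for inspected, (net, action) in enumerate(rules, start=1):
        if ip in net:
            return action, inspected
    return default, len(rules)

def build_table(rules):
    """Group networks by prefix length: {prefixlen: {network_int: action}}."""
    table = {}
    for net, action in rules:
        table.setdefault(net.prefixlen, {})[int(net.network_address)] = action
    return table

def table_match(table, addr, default="deny"):
    """Longest-prefix match: one hash lookup per distinct prefix length."""
    ip = int(ipaddress.IPv4Address(addr))
    lookups = 0
    for plen in sorted(table, reverse=True):
        lookups += 1
        mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
        action = table[plen].get(ip & mask)
        if action is not None:
            return action, lookups
    return default, lookups

if __name__ == "__main__":
    rules = [(ipaddress.ip_network(f"10.0.{i}.0/24"), "allow") for i in range(200)]
    table = build_table(rules)
    print(linear_match(rules, "10.0.199.5"), table_match(table, "10.0.199.5"))
```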

Stateful ipfw: number of states
One UDP flow creates 1 state (dynamic rule):

    ipfw add check-state
    ipfw add allow ip from any to any keep-state

Keys | Default value | Increased value
--- | --- | ---
Dynamic rules (net.inet.ip.fw.dyn_max) | 16 384 | 5 000 000
Hash table size, [max dyn / 64?], power of 2 (net.inet.ip.fw.dyn_buckets) | 256 | 65 536 (max)
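The bucket sizing rule of thumb on this slide (about max dyn / 64, as a power of two, capped at 65 536) fits in a few lines. A Python sketch; the /64 ratio is the slide's own tentative guess, and rounding up to the next power of two is an assumption that happens to reproduce both columns of the table:

```python
IPFW_MAX_BUCKETS = 65536  # maximum value quoted on the slide

def next_power_of_two(n):
    """Smallest power of two >= n (n >= 1)."""
    return 1 << (n - 1).bit_length()

def ipfw_dyn_buckets(dyn_max, ratio=64):
    """Suggested net.inet.ip.fw.dyn_buckets for a given net.inet.ip.fw.dyn_max."""
    return min(next_power_of_two(max(1, dyn_max // ratio)), IPFW_MAX_BUCKETS)

if __name__ == "__main__":
    for dyn_max in (16384, 5_000_000):
        print(f"net.inet.ip.fw.dyn_max={dyn_max} "
              f"net.inet.ip.fw.dyn_buckets={ipfw_dyn_buckets(dyn_max)}")
```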

Stateful pf: number of states
- One UDP flow consumes 2 pf states
- Linear relationship between the maximum number of states and the hash table size

Keys | Default value | Increased value
--- | --- | ---
States limit (pf.conf: set limit { states X }) | 10 000 | 10 000 000
Hash table size, states x 3, power of 2 (net.pf.states_hashsize) | 32 768 | 33 554 432 (max with 8GB RAM)
RAM consumed (hashsize x 80 bytes) | 2.5 MB | 2.5 GB

Check the RAM consumed with: vmstat -m | grep pf_hash
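The pf numbers follow from the rules on this slide: about three hash buckets per allowed state, as a power of two, and about 80 bytes of RAM per bucket. A Python sketch of that arithmetic; rounding up to the next power of two, and dividing the state limit by the slide's 2 states per UDP flow to estimate flow capacity, are assumptions here:

```python
BUCKETS_PER_STATE = 3
BYTES_PER_BUCKET = 80
STATES_PER_UDP_FLOW = 2

def next_power_of_two(n):
    """Smallest power of two >= n (n >= 1)."""
    return 1 << (n - 1).bit_length()

def pf_states_hashsize(state_limit):
    """Suggested net.pf.states_hashsize for pf.conf "set limit { states N }"."""
    return next_power_of_two(state_limit * BUCKETS_PER_STATE)

def pf_hash_ram_bytes(state_limit):
    """Approximate RAM used by the state hash table."""
    return pf_states_hashsize(state_limit) * BYTES_PER_BUCKET

def max_udp_flows(state_limit):
    """UDP flows that fit in the state table."""
    return state_limit // STATES_PER_UDP_FLOW

if __name__ == "__main__":
    for limit in (10_000, 10_000_000):
        print(f"states {limit}: hashsize {pf_states_hashsize(limit)}, "
              f"~{pf_hash_ram_bytes(limit) / 2**20:.1f} MiB, "
              f"~{max_udp_flows(limit)} UDP flows")
```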

Stateful: number of states
Note: for a stateful firewall with more than 100K states, use pf on FreeBSD 11.1.

ipfw stateful lockless
- Andrey V. Elsukov (ae@)'s reaction to the previous bench:
  - "Rework ipfw dynamic states implementation to be lockless on fast path"
  - Brings lots of performance improvements
  - Uses ConcurrencyKit
  - Committed on head as r328988

ipfw stateful lockless
For a fast stateful firewall, try IPFW on -head.

Resources
- Bench scripts, configurations, raw benches
- BSD Router Project (nanoBSD based on FreeBSD): https://bsdrp.net

Questions?

Thanks!

