Slab Allocators In The Linux Kernel: SLAB, SLOB, SLUB


Slab allocators in the Linux Kernel: SLAB, SLOB, SLUB
Christoph Lameter, LinuxCon/Düsseldorf 2014 (Revision Oct 3, 2014)

The Role of the Slab Allocator in Linux
- PAGE_SIZE (4k) is the basic allocation unit provided by the page allocator.
- The slab allocator allows fractional allocation. This is frequently needed for small objects that the kernel allocates, e.g. for network descriptors.
- Slab allocation is very performance sensitive.
- Caching.
- All other subsystems need the services of the slab allocators.
- Terminology: SLAB is one of the slab allocators. A SLAB could be a page frame or a slab cache as a whole. It's confusing. Yes.

System Components around Slab Allocators
[Figure: Device Drivers, File Systems, Memory Management and User space code around the Slab allocator, which sits on top of the Page Allocator. Figure labels: small objects, page frames.]
API calls shown in the figure:
- kmalloc(size, flags)
- kzalloc(size, flags)
- kfree(object)
- kmem_cache_alloc(cache, flags)
- kmem_cache_free(object)
- kmalloc_node(size, flags, node)
- kmem_cache_alloc_node(cache, flags, node)

Slab allocators available
- SLOB: K&R allocator (1991-1999)
- SLAB: Solaris type allocator (1999-2008)
- SLUB: Unqueued allocator (2008-today)
Design philosophies:
- SLOB: As compact as possible.
- SLAB: As cache friendly as possible. Benchmark friendly.
- SLUB: Simple, and instruction cost counts. Superior debugging. Defragmentation. Execution time friendly.

Time line: Slab subsystem development
- 1991 Initial K&R allocator
- 1996 SLAB allocator
- 2003 SLOB allocator
- 2004 NUMA SLAB
- 2007 SLUB allocator
- 2008 SLOB multilist
- 2011 SLUB fastpath rework
- 2013 Common slab code
- 2014 SLUBification of SLAB

Maintainers
- Manfred Spraul: SLAB (retired)
- Matt Mackall: SLOB (retired)
- Pekka Enberg
- Christoph Lameter: SLUB, SLAB NUMA
- David Rientjes
- Joonsoo Kim

Contributors
- Alokk N Kataria: SLAB NUMA code
- Shobhit Dayal: SLAB NUMA architecture
- Glauber Costa: Cgroups support
- Nick Piggin: SLOB NUMA support and performance optimizations. Multiple alternative out-of-tree implementations for SLUB.

Basic structures of SLOB
- K&R allocator: simply manages a list of free objects within the space of the free objects.
- Allocation requires traversing the list to find an object of sufficient size. If nothing is found, the page allocator is used to increase the size of the heap.
- Rapid fragmentation of memory.
- Optimization: multiple lists of free objects according to size, reducing fragmentation.
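The list traversal described above can be sketched in user space. This is a minimal first-fit sketch under stated assumptions, not the code in mm/slob.c: all names (toy_init, toy_alloc, toy_free, HEAP_UNITS) are hypothetical, the heap is a fixed static array instead of pages from the page allocator, and there is a single list rather than one list per size class.

```c
#include <stddef.h>

/* Hypothetical user-space sketch of a K&R style free list. */
#define HEAP_UNITS 512               /* toy heap size, in units */

typedef struct block {
    size_t units;                    /* size of this block, in units */
    struct block *next;              /* next free block on the list */
} block_t;

/* One unit is sizeof(block_t), so a free block can hold its own header:
 * the free list lives inside the free space itself. */
static block_t heap[HEAP_UNITS];
static block_t *free_list;

void toy_init(void)
{
    heap[0].units = HEAP_UNITS;
    heap[0].next = NULL;
    free_list = &heap[0];
}

/* First fit: walk the list until a block with enough units is found. */
void *toy_alloc(size_t bytes)
{
    /* Round up to whole units, plus one unit for the size header. */
    size_t need = (bytes + sizeof(block_t) - 1) / sizeof(block_t) + 1;
    block_t **prev = &free_list;

    for (block_t *b = free_list; b != NULL; prev = &b->next, b = b->next) {
        if (b->units < need)
            continue;
        if (b->units == need) {
            *prev = b->next;         /* exact fit: unlink the block */
        } else {
            b->units -= need;        /* split: carve from the tail */
            b += b->units;
            b->units = need;
        }
        return (void *)(b + 1);      /* payload starts after the header */
    }
    return NULL;  /* a real allocator would grow the heap here */
}

/* Free: push the block on the list head. No coalescing is done, which
 * is one way the fragmentation mentioned above builds up. */
void toy_free(void *p)
{
    block_t *b = (block_t *)p - 1;

    b->next = free_list;
    free_list = b;
}
```

Every allocation pays for a list walk, and freed blocks of odd sizes accumulate on the list. Splitting the list by size class, as in the multilist optimization, shortens the walk and keeps small and large blocks apart.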

SLOB data structures
[Figure: SLOB data structures]
- Global descriptor: small, medium and large lists
- Page frame descriptor (struct page): s_mem, lru, slob_lock, slob_free, flags, units, freelist
- Page frame content: size/offset headers (S/Offs) interleaved with objects and free blocks
- Object format: size, payload (object_size), padding
- Free block format: size, offset

SLAB memory management
- Queues to track cache hotness.
- Queues per cpu and per node.
- Queues for each remote node (alien caches).
- Complex data structures that are described in the following two slides.
- Object based memory policies and interleaving.
- Exponential growth of caches: nodes * nr_cpus. Large systems have huge amounts of memory trapped in caches.
- Cold object expiration: every processor has to scan its queues of every slab cache every 2 seconds.

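SLAB per frame freelist management
[Figure: page frame holding the freelist indices (FI), followed by objects, free slots and padding; page->active refers to a position in the freelist.]
- FI: index of a free object in the frame. Two types: short or char.
- There is one FI entry for each object in the frame.
- Multiple requests for free objects can be satisfied from the same cacheline without touching the object contents.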
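A minimal user-space sketch of such an index freelist follows. It assumes that the active counter doubles as the position of the next free index, which is one plausible reading of the figure. The struct layout, the sizes and every toy_ name are hypothetical.

```c
#include <stddef.h>

#define OBJS_PER_SLAB 8   /* hypothetical number of objects per page frame */
#define OBJ_SIZE      64  /* hypothetical object size in bytes */

/* "char" variant of the index type; a "short" would allow more objects. */
typedef unsigned char freelist_idx_t;

struct toy_slab {
    unsigned int active;                     /* objects currently in use */
    freelist_idx_t freelist[OBJS_PER_SLAB];  /* entries at [active..] are free */
    unsigned char objects[OBJS_PER_SLAB][OBJ_SIZE];
};

void toy_slab_init(struct toy_slab *s)
{
    s->active = 0;
    for (unsigned int i = 0; i < OBJS_PER_SLAB; i++)
        s->freelist[i] = (freelist_idx_t)i;
}

/* Allocation reads only the small index array, never the object itself. */
void *toy_slab_alloc(struct toy_slab *s)
{
    if (s->active == OBJS_PER_SLAB)
        return NULL;                         /* slab is full */
    return s->objects[s->freelist[s->active++]];
}

/* Free: turn the address back into an index and store it at the new top. */
void toy_slab_free(struct toy_slab *s, void *obj)
{
    size_t idx = (size_t)((unsigned char *)obj - &s->objects[0][0]) / OBJ_SIZE;

    s->freelist[--s->active] = (freelist_idx_t)idx;
}
```

With one-byte indices, eight consecutive allocations here read from the same few bytes of the index array, which is the cacheline effect the slide points out.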

SLAB data structures
[Figure: SLAB data structures]
- Cache descriptor (kmem_cache): node, array, colour_off, size, object_size, flags
- Per cpu queue (array_cache): avail, limit, batchcount, touched, entry[0], entry[1], entry[2]. An entry can point to an object in another page.
- Per node data (kmem_cache_node): partial list, full list, empty list, shared, alien, list_lock, reaping
- Page frame descriptor (struct page): s_mem, lru, active, slab_cache, freelist
- Page frame content: coloring, freelist, padding, then free slots and objects, padding
- Object format: payload (object_size), poisoning, redzone, last caller, padding (size covers the whole object)
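The avail, limit and batchcount fields suggest how such a queue behaves. The following user-space sketch assumes a LIFO queue that refills from and flushes to a node-level pool in batches. It ignores locking, the shared and alien queues, and the slab lists, and all names and sizes are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define AC_LIMIT  4   /* hypothetical queue capacity */
#define AC_BATCH  2   /* hypothetical refill/flush batch size */
#define NODE_OBJS 16  /* hypothetical number of objects owned by the node */

/* Stand-in for the per node slab lists: a simple pool of free objects. */
struct toy_node {
    void *objs[NODE_OBJS];
    unsigned int nr;
};

struct toy_array_cache {
    unsigned int avail;       /* objects currently queued */
    unsigned int limit;       /* capacity of entry[] */
    unsigned int batchcount;  /* objects moved per refill or flush */
    void *entry[AC_LIMIT];
};

void toy_ac_init(struct toy_array_cache *ac)
{
    ac->avail = 0;
    ac->limit = AC_LIMIT;
    ac->batchcount = AC_BATCH;
}

/* Allocation pops the most recently freed (cache hot) object. */
void *toy_ac_alloc(struct toy_array_cache *ac, struct toy_node *n)
{
    if (ac->avail == 0) {
        /* Queue empty: refill up to batchcount objects from the node. */
        while (ac->avail < ac->batchcount && n->nr > 0)
            ac->entry[ac->avail++] = n->objs[--n->nr];
        if (ac->avail == 0)
            return NULL;
    }
    return ac->entry[--ac->avail];
}

/* Free pushes onto the queue. Assumes obj originally came from n, so the
 * node pool can never overflow. */
void toy_ac_free(struct toy_array_cache *ac, struct toy_node *n, void *obj)
{
    if (ac->avail == ac->limit) {
        /* Queue full: flush the oldest batchcount entries to the node. */
        unsigned int batch = ac->batchcount;

        for (unsigned int i = 0; i < batch; i++)
            n->objs[n->nr++] = ac->entry[i];
        memmove(ac->entry, ac->entry + batch,
                (ac->avail - batch) * sizeof(void *));
        ac->avail -= batch;
    }
    ac->entry[ac->avail++] = obj;
}
```

Objects parked in a queue are unavailable to other processors until they are flushed. With one such queue per cpu, per node and per cache, that is the trapped memory the previous slide warns about.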

SLUB memory layout
- Enough of the queueing.
- "Queue" for a single slab page. Pages are associated with a cpu. Increased locality.
- Per cpu partials.
- Fast paths using this_cpu ops and per cpu data.
- Page based policies and interleave.
- Defragmentation functionality on multiple levels.
- Current default slab allocator.

SLUB data structures
[Figure: SLUB data structures]
- Cache descriptor (kmem_cache): flags, offset, size, object_size, node, cpu_slab
- Per node data (kmem_cache_node): partial list, list_lock
- Per cpu data (kmem_cache_cpu): freelist
- Page frame descriptor (struct page): frozen, pagelock
- Page frame content: objects and free slots, each free slot holding a free pointer (FP), padding
- Object format: payload (object_size), debugging data, padding, FP
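The free pointer (FP) stored inside each free object can be illustrated with a small user-space sketch. It assumes a single page, no per cpu freelist and no locking. The offset field mirrors the one named in the cache descriptor, and every other name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define SLAB_BYTES 256  /* hypothetical size of one slab page */

struct toy_cache {
    size_t size;    /* object size including padding */
    size_t offset;  /* where the free pointer lives inside a free object */
};

struct toy_page {
    void *freelist;       /* first free object, or NULL */
    unsigned int inuse;   /* objects handed out */
    unsigned char mem[SLAB_BYTES];
};

/* memcpy avoids alignment assumptions about offset. */
static void *get_fp(const struct toy_cache *c, void *obj)
{
    void *next;

    memcpy(&next, (unsigned char *)obj + c->offset, sizeof(next));
    return next;
}

static void set_fp(const struct toy_cache *c, void *obj, void *next)
{
    memcpy((unsigned char *)obj + c->offset, &next, sizeof(next));
}

/* Thread all objects onto the freelist, last to first, so the list
 * starts at the first object in the page. */
void toy_page_init(const struct toy_cache *c, struct toy_page *p)
{
    p->freelist = NULL;
    p->inuse = 0;
    for (size_t i = SLAB_BYTES / c->size; i-- > 0; ) {
        void *obj = p->mem + i * c->size;

        set_fp(c, obj, p->freelist);
        p->freelist = obj;
    }
}

void *toy_slub_alloc(const struct toy_cache *c, struct toy_page *p)
{
    void *obj = p->freelist;

    if (obj != NULL) {
        p->freelist = get_fp(c, obj);  /* follow FP to the next free object */
        p->inuse++;
    }
    return obj;
}

void toy_slub_free(const struct toy_cache *c, struct toy_page *p, void *obj)
{
    set_fp(c, obj, p->freelist);       /* freed object becomes the list head */
    p->freelist = obj;
    p->inuse--;
}
```

No separate queue or index array is needed, because the free objects themselves carry the list. The cost is that allocation and free both touch the object's memory.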

SLUB slabinfo tool
- Query status of slabs and objects
- Control anti-defrag and object reclaim
- Run verification passes over slab caches
- Tune slab caches
- Modify slab caches on the fly

Slabinfo Examples
- Usually must be compiled from the kernel source tree: gcc -o slabinfo linux/tools/vm/slabinfo.c
- slabinfo
- slabinfo -T
- slabinfo -s
- slabinfo -v

slabinfo basic output
[Table: sample slabinfo listing. The extraction interleaved the columns. The surviving column headers are Name, Objects, Objsize, O/S, O, %Fr, %Ef and Flg; only the first three columns can be recovered per row.]

Name               Objects  Objsize
:at-0000040          41635       40
:t-0000024               7       24
:t-0000032            3121       32
:t-0002048             564     2048
:t-0002112             384     2112
:t-0004096             412     4096
Acpi-State              51       80
anon_vma              8423       56
bdev_cache              34      816
blkdev_queue            27     1896
blkdev_requests        168      376
dentry              191961      192
ext4_inode_cache    163882     1048
vm_area_struct       20680

Totals: slabinfo -T

Slabcache Totals
Slabcaches : 112      Aliases  : 189->84   Active: 66
Memory used: 267.1M   # Loss   : 8.5M      MRatio: 3%
# Objects  : 708.5K   # PartObj: 10.2K     ORatio: 1%

[Per Cache table: layout not recoverable. Surviving totals: 708.5K objects, 10.2K partial objects (1%), 267.1M memory, 258.6M used, 8.5M loss.]

Aliasing: slabinfo -a
:at-0000040 - ext4 extent status btrfs delayed extent op
:at-0000104 - buffer head sda2 ext4 prealloc space
:at-0000144 - btrfs extent map btrfs path
:at-0000160 - btrfs delayed ref head btrfs trans handle
:t-0000016 - dm mpath io kmalloc-16 ecryptfs file cache
:t-0000024 - scsi data buffer numa policy
:t-0000032 - kmalloc-32 dnotify struct sd ext cdb ecryptfs dentry info cache pte list desc
:t-0000040 - khugepaged mm slot Acpi-Namespace dm io ext4 system zone
:t-0000048 - ip fib alias Acpi-Parse ksm mm slot jbd2 inode nsproxy ksm stable node ftrace event field shared policy node fasync cache
:t-0000056 - uhci urb priv fanotify event info ip fib trie
:t-0000064 - dmaengine-unmap-2 secpath cache kmalloc-64 io ksm rmap item fanotify perm event info fs cache tcp bind bucket ecryptfs key sig cache ecryptfs global auth tok cache fib6 nodes iommu iova anon vma chain iommu devinfo
:t-0000256 - skbuff head cache sgpool-8 pool workqueue nf conntrack expect request sock TCPv6 request sock TCP bio-0 filp biovec-16 kmalloc-256
:t-0000320 - mnt cache bio-1
:t-0000384 - scsi cmd cache ip6 dst cache i915 gem object
:t-0000416 - fuse request dm rq target io
:t-0000512 - kmalloc-512 skbuff fclone cache sgpool-16
:t-0000640 - kioctx dio files cache
:t-0000832 - ecryptfs auth tok list item task xstate
:t-0000896 - ecryptfs sb cache mm struct UNIX RAW PING
:t-0001024 - kmalloc-1024 sgpool-32 biovec-64
:t-0001088 - signal cache dmaengine-unmap-128 PINGv6 RAWv6
:t-0002048 - sgpool-64 kmalloc-2048 biovec-128
:t-0002112 - idr layer cache dmaengine-unmap-256
:t-0004096 - ecryptfs xattr cache biovec-256 names cache kmalloc-4096 sgpool-128 ecryptfs headers

Enabling of runtime Debugging
- Debugging support is compiled in by default. A distro kernel has the ability to go into debug mode where meaningful information about memory corruption can be obtained.
- Activation via the slub_debug kernel parameter or via the slabinfo tool.
- slub_debug can take some parameters:

Letter  Purpose
F       Enable sanity checks that may impact performance.
P       Poisoning. Unused bytes and freed objects are overwritten with poisoning values. References to these areas will show specific bit patterns.
U       User tracking. Record stack traces on allocate and free.
T       Trace. Log all activity on a certain slab cache.
Z       Redzoning. Extra zones around objects that make it possible to detect writes beyond object boundaries.
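How poisoning (P) and redzoning (Z) catch corruption can be shown with a small user-space sketch. The byte values are the ones I recall from include/linux/poison.h and should be treated as an assumption. The object layout, the sizes and all toy_ names are hypothetical and much simpler than what SLUB does.

```c
#include <stddef.h>
#include <string.h>

/* Byte patterns: assumed to match include/linux/poison.h. */
#define POISON_FREE 0x6b  /* fill for freed object memory */
#define POISON_END  0xa5  /* last byte of a poisoned object */
#define RED_ACTIVE  0xcc  /* redzone fill while the object is in use */

#define OBJ_BYTES 32      /* hypothetical payload size */
#define RZ_BYTES  8       /* hypothetical redzone size */

struct toy_obj {
    unsigned char payload[OBJ_BYTES];
    unsigned char redzone[RZ_BYTES];  /* guard area right after the payload */
};

/* On free: overwrite the payload with the poison pattern. */
void toy_poison(struct toy_obj *o)
{
    memset(o->payload, POISON_FREE, OBJ_BYTES - 1);
    o->payload[OBJ_BYTES - 1] = POISON_END;
}

/* On the next allocation: a changed byte means a write after free. */
int toy_poison_intact(const struct toy_obj *o)
{
    for (size_t i = 0; i < OBJ_BYTES - 1; i++)
        if (o->payload[i] != POISON_FREE)
            return 0;
    return o->payload[OBJ_BYTES - 1] == POISON_END;
}

/* On allocation: arm the redzone. */
void toy_arm_redzone(struct toy_obj *o)
{
    memset(o->redzone, RED_ACTIVE, RZ_BYTES);
}

/* On free: a changed byte means a write beyond the object boundary. */
int toy_redzone_intact(const struct toy_obj *o)
{
    for (size_t i = 0; i < RZ_BYTES; i++)
        if (o->redzone[i] != RED_ACTIVE)
            return 0;
    return 1;
}
```

Both checks only compare bytes against a known pattern, so they report that corruption happened, not who caused it. That gap is what the U option (stack traces on allocate and free) is there to fill.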

Comparing memory use
- SLOB is most compact (unless frequent freeing and allocation occurs).
- SLAB queueing can get intensive memory use going. Grows exponentially by NUMA node.
- SLUB aliasing of slabs.
- SLUB cache footprint optimizations.
- Kvm instance memory use of allocators.

Memory use after bootup of a desktop Linux system:

Allocator  Reclaimable  Unreclaimable
SLOB*      300KB
SLUB       29852 kB     32628 kB
SLAB       29028 kB     36532 kB

*SLOB does not support the slab statistics counters. 300KB is the difference of "MemAvailable" after boot between SLUB and SLOB.

Comparing performance
- SLOB is slow (atomics in fastpath, global locking).
- SLAB is fast for benchmarking.
- SLUB is fast in terms of cycles used for the fastpath but may have issues with caching.
- SLUB is compensating for caching issues with an optimized fastpath that does not require interrupt disabling etc.
- Cache footprints are a main factor for performance these days. Benchmarking reserves most of the cache available for the slab operations, which may be misleading.

Fastpath
[Table: fastpath cycle counts. The column layout is not recoverable; the extracted digit run is 18317317230083037.]
Times in cycles on a Haswell 8 core desktop processor. The lowest cycle count is taken from the test.

Hackbench comparison
Seconds. 15 groups, 50 filedesc, 2000 messages, 512 bytes.

SLAB  4.92  4.87  4.85  4.98  4.85
SLUB  4.84  4.75  4.85  4.9   4.8
SLOB  N/A

Remote freeing

Cycles  Alloc all, free on one  Alloc one, free all
SLAB    650                     761
SLUB    595                     498
SLOB    2650                    2013

Remote freeing is the freeing of an object that was allocated on a different processor. It is cache cold and may have to be reused on the other processor. Remote freeing is a performance critical element and the reason that "alien" caches exist in SLAB. SLAB's alien caches exist for every node and every processor.

Future Roadmap
- Common slab framework (mm/slab_common.c).
- Move toward per object logic for defragmentation and maybe to provide an infrastructure for generally movable objects (patchset done 2007-2009, maybe redo it).
- SLAB fastpath relying on this_cpu operations.
- SLUB fastpath cleanup. Remove preempt enable/disable for better CONFIG_PREEMPT performance.

Slab Defragmentation
- Freeing of slab objects creates sparsely populated slab pages. Memory is lost there.
- Defragmentation frees pages with only a few objects and ideally moves those objects to slab pages that have only a few objects free.

Fragmentation and partial lists
[Figure: a partial list linking five page descriptors.]

Defragmentation by sorting partial list
- Pages with only a few free objects can be removed from the partial list if they are used before pages with more free objects.
- Pages that have only a few objects can be removed if those objects are freed, so it is advantageous to keep them at the end of the partial list. There are more chances of the objects being freed, which would allow the page to be freed.
- So sort the partial lists by number of free objects. The ones with the fewest objects available need to come first.
- Occurs during kmem_cache_shrink() or manual intervention using the "slabinfo" tool.
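The ordering described above can be sketched with a plain array and qsort(). This is not how kmem_cache_shrink() is implemented; it only illustrates the sort key. The descriptor fields objects and inuse and all toy_ names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a page descriptor on the partial list. */
struct toy_partial {
    int id;                /* label for the example */
    unsigned int objects;  /* total objects the page can hold */
    unsigned int inuse;    /* objects currently allocated */
};

/* Order by number of free objects, fewest first. */
static int by_free_objects(const void *a, const void *b)
{
    const struct toy_partial *pa = a;
    const struct toy_partial *pb = b;
    unsigned int fa = pa->objects - pa->inuse;
    unsigned int fb = pb->objects - pb->inuse;

    return (fa > fb) - (fa < fb);
}

void toy_sort_partial(struct toy_partial *list, size_t n)
{
    qsort(list, n, sizeof(*list), by_free_objects);
}
```

Allocations take from the front, so nearly full pages fill up and leave the list, while nearly empty pages at the end get time to drain completely and be returned to the page allocator.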

Defragmented partial list
[Figure: the same partial list of five page descriptors after sorting.]

Defragmentation by off node allocation
- The remote node defrag ratio determines the chance of the allocator to go off node for objects with default allocation policies.
- This gradually drains the remote partial lists if they are not in use and makes empty slots in slabs available.
- Tradeoff of node locality vs. defragmentation.
- Works best in cooperation with the sorting of the partial lists.
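One plausible way to turn such a ratio into a per-allocation decision is sketched below. The helper name, the 1024 scale and the use of a caller-supplied tick value as a cheap pseudo-random source are all assumptions made for illustration, not a statement of what the kernel does.

```c
/* Hypothetical helper: may this allocation search other nodes' partial
 * lists? `ratio` plays the role of the remote node defrag ratio, where
 * 0 disables off node allocation. `tick` stands in for a cheap
 * pseudo-random source such as a cycle counter. The 1024 scale is an
 * assumption. */
int toy_may_go_off_node(unsigned int ratio, unsigned long tick)
{
    if (ratio == 0)
        return 0;
    return (tick % 1024) <= ratio;
}
```

A higher ratio sends more allocations to remote partial slabs, which favors defragmentation. A lower ratio keeps allocations node local at the price of allocating new pages more often.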

Defragmentation by eviction
- Rejected patchset for slab defragmentation in 2009.
- Callbacks to evict objects:
  - Get: Establish reliable reference to object.
  - Kick: Throw object out.
- Opportunistic: the callback can refuse to free an object because it is in use.
- The slab allocator can "isolate" a slab page by freezing and locking it. Such a slab cannot be allocated from. Free operations can be locked out by running the "get" method on individual objects.
- Objects can then be inspected by the subsystem and evicted.
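The two-callback protocol can be sketched in user space as follows. This is not the interface of the rejected patchset: the ops structure, the object fields and every toy_ name are hypothetical, and freezing and locking of the slab page are left out.

```c
#include <stddef.h>

/* Hypothetical object with just enough state to show the protocol. */
struct toy_object {
    int refs;     /* references currently held */
    int in_use;   /* nonzero: the owning subsystem still needs it */
    int evicted;  /* nonzero: object has been thrown out */
    int pinned;   /* nonzero: get() succeeded in the current pass */
};

/* Callbacks a subsystem would register for its cache. */
struct toy_evict_ops {
    int (*get)(struct toy_object *obj);   /* 1 if a reference was taken */
    int (*kick)(struct toy_object *obj);  /* 1 if the object was evicted */
};

/* Get: establish a reliable reference so the object cannot go away. */
int toy_get(struct toy_object *obj)
{
    if (obj->evicted)
        return 0;
    obj->refs++;
    return 1;
}

/* Kick: drop the reference and throw the object out, or refuse. */
int toy_kick(struct toy_object *obj)
{
    obj->refs--;
    if (obj->in_use || obj->refs > 0)
        return 0;  /* opportunistic: still in use, leave it alone */
    obj->evicted = 1;
    return 1;
}

/* Run get() on every object of an isolated slab first, then kick() the
 * ones that could be pinned. Returns the number of evicted objects. */
size_t toy_evict_slab(const struct toy_evict_ops *ops,
                      struct toy_object *objs, size_t n)
{
    size_t freed = 0;

    for (size_t i = 0; i < n; i++)
        objs[i].pinned = ops->get(&objs[i]);
    for (size_t i = 0; i < n; i++) {
        if (!objs[i].pinned)
            continue;
        if (ops->kick(&objs[i]))
            freed++;
        objs[i].pinned = 0;
    }
    return freed;
}
```

Splitting the work into two passes means every object is stabilized before any of them is inspected, so kick() never races with a free of a neighboring object in the same page.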

Eviction Processing
[Figure: eviction steps for one slab page]
- Page descriptor: take the page lock.
- Get(): take a reference (stabilize object).
- Kick(): determine object references and remove the object, or fail.

Movable objects
- Required for defragmentation. Fixed object addresses cause fragmentation and make large physical allocations difficult.
- Subsystems need the capability to remove or relocate their metadata.
- This is already partially there on bootup/shutdown, both of the system and/or cpu onlining and offlining.
- Pages can already be migrated. The largest chunk of unmigratable memory is now the slab caches.

Conclusion
- Questions
- Suggestions
- New ideas
