SRC: A Multicore NPU-based TCP Stream Reassembly Card for Deep Packet Inspection


SECURITY AND COMMUNICATION NETWORKS
Security Comm. Networks 2014; 7:265–278
Published online 1 March 2013 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/sec.727

RESEARCH ARTICLE

SRC: a multicore NPU-based TCP stream reassembly card for deep packet inspection

Shuhui Chen1*, Rongxing Lu2 and Xuemin (Sherman) Shen2
1 College of Computer Science, National University of Defense Technology, Changsha 410073, China
2 Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada

ABSTRACT

Stream reassembly is a prerequisite of deep packet inspection, which is regarded as the core function of network intrusion detection systems and network forensic systems. Because packet disorder makes it necessary to move packet payloads from one block of memory to another, throughput is a vital concern in stream reassembly design. In this paper, a stream reassembly card (SRC) is designed to improve stream reassembly throughput. The SRC adjusts the sequence of packets on a multicore network processing unit (NPU) by managing and reassembling streams through an additional level of buffering. Specifically, three optimization techniques, namely stream table dispatching, no-locking timeout, and multichannel virtual queue, are introduced to further improve the throughput. To address the critical role of memory size in the SRC, the relationship between system throughput and memory size is analyzed. Extensive experiments demonstrate that the proposed SRC achieves more than 3 Gbps reassembly and submission throughput, tripling the performance of the traditional server-based architecture at a lower cost.
Copyright 2013 John Wiley & Sons, Ltd.

KEYWORDS
network security; deep packet inspection; multicore NPU; stream reassembly

*Correspondence
Shuhui Chen, College of Computer Science, National University of Defense Technology, Changsha 410073, China.
E-mail: shchen@nudt.edu.cn

1. INTRODUCTION

Deep packet inspection (DPI) is considered a crucial technique for network intrusion detection systems (NIDS) [1] and network forensic systems (NFS) [2], where packet payloads need to be matched against predefined patterns to identify viruses, attacks, criminal evidence, and so on. In the case of a low-speed network, a server-based approach can generally satisfy the throughput requirement. However, with the exponential increase of bandwidth, multi-gigabit-per-second links are widely deployed in campus networks, and the traditional server-based approach (even on a high-performance server) gradually fails to meet the performance requirement. Therefore, plentiful research efforts have been devoted to improving the overall throughput through efficient content matching.

To minimize the time cost of content matching, rule matching algorithms using field-programmable gate arrays [1,3], graphic processing units [4–7], or ternary content-addressable memory [8] have been proposed. However, decreasing only the content-matching time is not sufficient to achieve the desired system performance, because stream reassembly, as an important preprocessing plug-in, takes a major part of the whole workload. Experiments conducted in [9] demonstrate that reassembly takes 80% of the load of an NIDS when the matching time decreases to 1% of the overall load.
In addition, when we utilized a Dell server with two Xeon 5405 CPUs, 2 GB of DRAM, and an Intel 82599 network interface card (NIC) to run Snort tests, we obtained 2.5 Gbps throughput without dropping any packet when Stream5 (the stream reassembly component in the current 2.9.2 version of Snort) was turned off, but observed that the throughput abruptly decreases to 1.2 Gbps once Stream5 is turned on.

In general, when packets are transmitted through a network, they might be dropped for various reasons, such as the limited processing ability of routers, or arrive out of order because of multipath load balancing, among others. Therefore, to trace the information stream between the two ends, an NIDS or NFS strives to fetch each packet belonging to a stream (also

known as a session or connection) and reconstruct what has been sent from the transport layer of the end systems. Nevertheless, many attacks cannot be detected if stream reassembly is not conducted completely, as signatures may cross packet boundaries or stick attacks could not be identified [10,11]. Although many studies focusing on flow measurement and analysis have appeared in the past years, most of them only investigate stream attributes rather than stream reassembly [12,13]. In addition, although there are some open-source software packages fulfilling the stream reassembly task, for example, Stream5 (Stream4) and Libnids, there still exists a big gap between their throughput and the network link bandwidth.

Memory access time is often considered one of the major throughput constraints of network security devices. Generally, the inherent fluidity of network packets results in a very low cache hit ratio. Because the number of memory accesses is predefined and cannot be changed subsequently, the crux of enhancing system performance is to improve the efficiency of each memory access in terms of access time.

Traditional network security devices such as NIDS and NFS adopt high-performance CPU platforms such as x86 and MIPS. These typical CPUs focus on improving calculation performance, so they strive to be as effective as possible on cache coherence, branch prediction, out-of-order execution, multicore parallelism, and so on. Their improvements on memory access concentrate on increasing the cache hit rate; therefore, stream reassembly on legacy CPUs cannot achieve very high performance.

Currently, advanced progress has been made in network processor components. For example, Raza Microelectronics has developed the XLR, XLS, and XLP network processing units (NPUs), whereas Cavium has launched the OCTEON series of NPUs.
The emergence of these multicore NPUs can largely improve the processing ability of network devices and network security devices. Moreover, multicore NPUs have many hardware improvements for multithreaded operation, which decrease the thread-switching overhead, hide memory access latencies, and use the memory access cycles efficiently. As a result, their memory access performance can be improved remarkably through these techniques. However, several issues need to be tackled when using an NPU to implement stream reassembly; for example, how to distribute packets to different NPU cores, how to accelerate stream timeout processing, and how much memory should be provided for the NPU.

To address the aforementioned issues, in this paper, we specifically exploit the stream reassembly issue and study how the new multicore NPUs can be used to improve stream reassembly performance. We present a new stream reassembly card (SRC) design, which manages and reassembles streams by adding a level of buffering to adjust the sequence of packets on the multicore NPU. Specifically, the contributions of this paper are threefold.

First, a multicore NPU-based stream reassembly architecture is introduced. To the best of our knowledge, this is the first work on employing multicore NPU-based stream reassembly technology specifically for NIDS and NFS.

Second, several improvements are introduced to increase the throughput of the stream reassembly architecture. It has been found that the implementation of the stream reassembly architecture may be restricted by the DRAM size, so the relationship among memory requirement, timeout limit, and link bandwidth is systematically studied.

Finally, an SRC is implemented and evaluated to demonstrate that it can achieve more than 3 Gbps in terms of capturing and processing, triply outperforming the traditional server-based architecture.

The remainder of this paper is organized as follows.
The related work is provided in Section 2, and the motivations for selecting a multicore NPU are presented in Section 3. Section 4 depicts the system architecture and framework. Three improvements are introduced in Section 5. The relationship among memory size, timeout policy, and link bandwidth is analyzed in Section 6, followed by the experimentation and result analysis in Section 7. Finally, we conclude our work in Section 8.

2. RELATED WORK

There are two open-source programs, Libnids [14] and Tcpflow [15], that fulfill transmission control protocol (TCP) stream reassembly. Libnids is an application programming interface (API) component of NIDS. It offers Internet protocol (IP) defragmentation, TCP stream reassembly, and TCP port scan detection. It can reliably obtain the data carried by TCP streams and is widely used in NIDS and forensic systems. On the other hand, Tcpflow is a program that captures data transmitted as part of TCP connections (flows) and stores the data in two files that are convenient for protocol analysis and debugging. Tcpflow understands sequence numbers and correctly reconstructs data streams regardless of retransmissions [16] or out-of-order delivery. However, it cannot process IP fragments properly, nor is its performance suitable for network links with more than 1 Gbps bandwidth.

Previous research related to TCP streams often focuses on network measurement. For example, in [13], the authors used data recorded from two different operational networks and studied the flows in size, duration, rate, and burst, to examine how these characteristics are correlated. In [17], the authors considered the problem of counting the distinct flows on a high-speed network link. They proposed a new timestamp vector algorithm that retains the fast estimation and small memory requirement of the bitmap-based algorithms, while

reducing the possibility of underestimating the number of active flows.

In [18], a TCP reassembly model and a stream verification methodology were introduced for deriving and computing reassembly errors. In [19], an algorithm was presented to solve the TCP stream reassembly and matching performance problem for NFS and NIDS. Instead of caching the complete fragments, their method stores each fragment as a two-element tuple, a constant-size data structure; thus, the memory required for caching fragments is largely reduced. Dharmapurikar et al. [20] introduced a hardware-based reassembly system that addresses both efficiency and robustness in the face of adversaries trying to subvert it. They characterized the behavior of out-of-sequence packets seen in benign TCP traffic and designed a system that addresses the most commonly observed packet-reordering cases, in which connections have at most a single sequence hole in only one direction of the stream.

3. THE NECESSITY OF A MULTICORE NETWORK PROCESSING UNIT

An NIDS or NFS takes advantage of a "promiscuous mode" component, or "sniffer", to obtain copies of packets directly from the network media, regardless of their destinations. Raw packets captured by the NIDS or NFS form a confused and disordered mess, whereas DPI needs these packets to be assembled into an integrated block according to their affiliated TCP stream before they are sent to the matching engine.

Figure 1 shows an example in which a stream composed of six packets is obtained at a monitoring point, where packets 2 and 3 are in disorder and packet 4 is repeated. The stream reassembly process needs to swap the disordered packets 2 and 3 and delete the second, unwanted copy of packet 4. This process incurs three packet movements: packet 2 moving ahead, packet 3 moving backwards, and packet 5 moving ahead as well. This is just an example of a single stream.
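To make the cost of this operation concrete, the following Python sketch (our illustration, not the paper's implementation; packet contents and names are invented) reassembles the six-packet example above by dropping the duplicate and sorting by sequence number, and counts how many buffered packets end up at a different position than they arrived at:

```python
# Arrival order mirrors the Figure 1 example: packets 2 and 3 are swapped
# and packet 4 arrives twice. Packets are (seq, payload) tuples.
arrived = [(1, b'a'), (3, b'c'), (2, b'b'), (4, b'd'), (4, b'd'), (5, b'e')]

def reassemble(packets):
    """Drop duplicate sequence numbers, sort by sequence number, and
    count packets whose final index differs from their arrival index
    (i.e., packets that had to be moved in the buffer)."""
    seen, deduped, arrival_index = set(), [], {}
    for i, (seq, payload) in enumerate(packets):
        if seq not in seen:              # second copy of packet 4 is dropped
            seen.add(seq)
            arrival_index[seq] = i
            deduped.append((seq, payload))
    ordered = sorted(deduped, key=lambda p: p[0])
    moved = sum(1 for final, (seq, _) in enumerate(ordered)
                if arrival_index[seq] != final)
    return ordered, moved

stream, moves = reassemble(arrived)
print([seq for seq, _ in stream])  # [1, 2, 3, 4, 5]
print(moves)                       # 3 (packets 2, 3, and 5 moved)
```

The count of three movements matches the example: packet 2 moves ahead, packet 3 moves backwards, and packet 5 moves ahead after the duplicate is deleted.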
However, in a real network environment, one backbone link may contain a large number of streams; in other words, there may be a large number of packet movements in the reassembling process. In addition, modern servers use dynamic random-access memory (DRAM), e.g., DDR2 or DDR3, as their main memory, and one memory access may take a number of cycles to return a result because DRAM has a relatively long startup time.

Using a multicore NPU, however, can improve the throughput of the system for the following reasons:

(1) There are many hardware threads in one core and many cores in one NPU, so the total number of threads in an NPU can exceed a dozen. Threads of this kind are hardware based instead of software based, so the switching overhead is very low. The large number of hardware contexts enables software to more effectively leverage the inherent parallelism exhibited by packet assembling applications. When one hardware thread is waiting for the result of a memory access, another hardware thread can switch in and issue another memory access request without much overhead. If many threads take advantage of this pipeline mechanism, the latency of the DRAM is hidden, and the effective bandwidth of DRAM access increases.

(2) A multicore NPU often has low electric power consumption, so it can easily be manufactured as a card. By utilizing such a card, a server can also conduct the tasks of DPI, attack warning, and so on. When an NPU-based card is used, an extra buffer is introduced into the flow processing, so the packets can be sorted as they are being transferred from the memory of the card to the memory of the host (server). It is a form of trading space for performance.
In this way, when the packets have been received into the memory of the card, they are stored in the memory in their arrival order, but their logical order is maintained by the software running on the NPU.

(3) The architecture of an NPU often has favorable input/output (I/O) features, so packets can be imported from the interface to the memory with high throughput. Moreover, its dispatching mechanism (distributing packets to different threads or different cores) is well designed, so that the dispatching component can pipeline with the corresponding processing threads (or cores). As the dispatching component generally dispatches packets according to selected bits from the packet head, preserving stream affinity is not a considerable problem. As many studies [22,23] focus on how to accelerate packet capture performance, an approach that jointly considers packet capture and stream reassembly is cost effective.

(4) An NPU often has a well-designed message-passing mechanism among different threads, which employs a crossbar structure or fast shared static random-access memory (SRAM) as its transfer medium and makes cooperation and synchronization between threads easy.

Figure 1. An example of stream reassembly.

4. SYSTEM ARCHITECTURE

In our proposed scheme, the NPU is used as a cooperating stream-managing component, which captures raw packets and submits ordered as well as nonrepeated packets to

the host to conduct further inspection. Instead of moving packets in its memory, the NPU just keeps track of the order of the packets and submits them to the CPU according to that order, which eliminates the cost of packet moving; the performance of the system then mostly depends on the consecutive copying of packets from the NPU to the CPU. As depicted in Figure 2, the packets' arrival sequence is the same as in the example of Figure 1, while the NPU removes the redundant packet and keeps the order in its memory. In the end, the NPU submits the TCP control block (TCB) and the packets according to the original order at the appropriate time.

The key concepts related to stream management are discussed in the following subsections.

4.1. Streams at the transmission control protocol transport level

The TCP specification defines several "states" that any given connection can be in. The states observable by an NIDS or NFS (those involving the actual exchange of data between two hosts) are not the same as the TCP connection states; only two states ("CLOSED" and "ESTABLISHED") are taken into account.

Transmission control protocol is a reliable, sequenced stream protocol that employs IP addresses and ports to determine whether a packet belongs within a stream. To reassemble the information flowing through a TCP connection, NIDS and NFS must figure out what sequence numbers are being used. A TCB is a data structure used by NIDS and NFS to describe a stream that is in the "ESTABLISHED" state. An NIDS or NFS should have a mechanism by which TCBs can be created for newly detected connections and destroyed for connections that are no longer alive.

In the following discussion, we focus on three critical actions that the NIDS may perform during the processing of a connection.
Figure 2. Stream reassembly using network processing unit.

They are TCB creation (the action by which an NIDS decides to instantiate a new TCB for a detected connection), packet reordering (the process an NIDS uses to reconstruct a stream associated with an open TCB), and TCB termination (the action by which the NIDS decides to close a TCB). These actions are discussed briefly as follows:

(1) TCB creation. The NIDS can employ different approaches to decide when to create a TCB. It may attempt to determine the sequence numbers being used simply by looking at the sequence numbers appearing in TCP data packets (referred to as synchronization on data), or it may rely entirely on the three-way handshake (3WH). In our proposed method, we use synchronization on data as the signal for TCB creation, for the purpose of simplicity.

(2) Packet reordering. When a packet that belongs to an existing stream is received, the NIDS needs to decide its position in the designated stream. If the NIDS does not use sequence numbers (i.e., simply inserts data into the "reassembled" stream in the order it is received), it will not work properly, because an attacker can blind such a system simply by adjusting the order of the sent packets, so that the actual data received by the application level of the destination differ from the data obtained by the NIDS.

(3) TCB termination. TCB termination is crucial because the maintenance of connections in the NPU is resource consuming; once a connection is terminated, resources no longer need to be assigned to it. There are two kinds of approaches, which respectively use RST (a flag in the TCP header used to reset a connection) or FIN (a flag in the TCP header used to normally close a connection) to terminate a connection. Note that a connection can stay alive indefinitely without any data exchange; thus, FIN and RST alone are inadequate for managing the per-connection resource problem, because TCP connections do not implicitly time out. The lack of a method to determine whether a connection is idle or closed forces the NIDS and NFS to terminate a connection and delete its TCB when no packet appears in the connection for a long time. The problem with TCB termination is that an NIDS can be tricked into closing a connection that is still active, which forces the system to lose state and increases the probability of detection evasion. On the other hand, an NIDS that fails to terminate the TCB of a really closed connection is also vulnerable, because its memory will be exhausted rapidly. In our proposed method, both FIN and RST messages are utilized as the basic indication of connection termination. In addition, we also rely on timeouts as an

auxiliary instrument. When a connection has been terminated, its remaining data in both directions are submitted to the host, and then its corresponding TCB is deleted.

Each TCP connection can be expressed as a four-element tuple comprising source IP, source port, destination IP, and destination port. Once a packet is captured, its corresponding stream needs to be located, and the TCB data structure needs to be updated. The minimum information in a TCB should comprise the aforementioned four-element tuple, the client's expected sequence number, the server's expected sequence number, a pointer to the next TCB for resolving hash collisions, the time at which the last packet was received, and a pointer to the buffered packets. Generally, TCBs are attached to a hash table indexed by a hash algorithm that uses some bits from the four-element tuple as parameters. Collisions lead to several TCBs being attached to one table entry.

4.2. Framework of the stream reassembly card

The framework of the SRC is depicted in Figure 3. In the SRC, packets are captured from the interface into memory; to maintain the TCP connection data, a hash table known as the stream table is used. The hash is calculated on the tuple (SrcIP, SrcPort, DesIP, DesPort). When the packets enter the memory, their locations are stored in packet descriptions. Besides the pointers to the packets, packet descriptions also contain the packet length and the fields used to dispatch the packets to the threads.

Figure 3. Framework of the stream reassembly card.

Threads running on the NPU process a received packet and then wait for another packet, cyclically. Once data need to be submitted, each thread itself is responsible for the task of submitting the packets from the memory of the NPU to the memory of the host. The software running on the NPU and that running on the CPU share a small memory space in the double data rate synchronous dynamic random-access memory (DDR SDRAM, abbreviated as DDR) of the NPU for message communication, which is utilized by the NPU to obtain the address for direct memory access (DMA), the timeout limit set by the host, the BlockSize, and the consuming states. The CPU can also employ this memory space to obtain the running states of the NPU. As the packets are DMAed to the host memory, the transfer is conducted one packet after another, because the packets are not stored consecutively in the memory of the NPU, whereas we need them to be consecutive when they reach the memory of the CPU.

The software running on the NPU mainly executes the three actions mentioned in Section 4.1: TCB creation, packet reordering, and TCB termination. When a packet reaches one core, the corresponding thread looks up the stream table to determine whether a corresponding TCB exists. If not, the TCB is created, and the packet is appended to it. Otherwise, the packet is appended to the existing TCB, and its link position is determined; meanwhile, a judgment is made on whether the total packet size of the stream equals or exceeds BlockSize (the submitting block size). If so, all the packets are submitted to the host according to their sequence. In fact, there are three situations that trigger data to be submitted to the software running on the host:

(1) When the size of the received packets reaches or exceeds the submitting block size (called BlockSize), the data block made up of these packets has to be submitted. We need to submit the data when the buffer occupied by one stream is too large, as memory is limited. The larger the BlockSize, the larger the whole DRAM space required. But if we set the BlockSize very small, the data blocks that the host obtains will be small as well, which may degrade the performance of the NIDS and NFS. Therefore, the selection of the BlockSize is a tradeoff between memory space and overall performance.

(2) When a packet with a finishing tag (RST or FIN set in the TCP header) has been received, it indicates that the corresponding stream will be terminated by the server or the client. In this case, the data block also needs to be submitted to the host, and the corresponding TCB needs to be deleted.

(3) When no packet for a certain stream has been received for a very long time (referred to as a stream timeout), data submission is also obligatory, because either the communication on the stream was terminated accidentally, or the stream is idle. The TCBs of such streams and their corresponding packets cannot be maintained forever, because of the memory limitation. Obviously, the memory space depends on the stream timeout: the larger the stream timeout is set, the larger the required memory space; on the other hand, the shorter the stream timeout, the lower the accuracy of stream reassembly.

The total connection records are maintained in a hash table called the stream table for efficient access. Note that the hash needs to be independent of the permutation of the source and destination pairs, which can be achieved by comparing the source IP address with the destination IP address and always making the smaller one the first parameter, or by using a hash algorithm that is insensitive to the order of its parameters. By using the hash values as indexes into the stream table, the corresponding connection can be located. Hash collisions can be resolved by chaining the colliding TCBs in a linked list.

A data block submitted to the host consists of a TCB and several subblocks, each of which represents a data transmission at the TCP/IP transport level from one peer to the other. Adjoining subblocks are from the two directions alternately: one from client to server and the next from server to client. Some subblocks consist of only one packet, whereas others may have several packets, depending on the application-level protocol and the amount of data to be transferred. For example, according to the POP3 protocol, before the mail body is transferred, there are several interactions to let the server authenticate the client and let the client check whether it has mail on the server.
After stream reassembly, the data block will likely be in the form shown in Figure 4. Except for subblock 7, all the subblocks consist of only one packet, as they are very short and need not be divided into multiple packets. Subblock 7 consists of a "+OK" reply and the mail body, so even if the mail body is not very large, subblock 7 contains at least two packets.

Figure 4. Data structure example after reassembly.

The data submission procedure handled by the packet-processing threads needs to work cooperatively with the programs running on the host CPU. A consecutive memory chunk is allocated by the CPU to store the uploaded packets, and for the convenience of packet organization, the chunk is divided into fixed-size buffers organized as a ring. Consumer software (the NIDS or NFS) running on the host continually processes the data blocks received. When the speed of the consumer software is lower than that of the threads running on the NPU, the ring buffers become full; the NPU then cannot upload data and can only check whether there is any vacant buffer. Once a vacant buffer emerges, submission continues. While the ring buffers are full, packets keep arriving but no thread can process them, as all the threads are checking the state of the memory, so packet dropping cannot be avoided. Therefore, the processing speed of the consumer software running on the host must match the data submission speed.

Different security applications running on the host perform different operations on the submitted data blocks. For example, if we want to run an application on the host to collect evidence for forensics by recovering an e-mail body, it needs to scan the data at the stream level: after the subblocks of "USER", "PASS", "LIST", and "RETR", the subblock from the POP3 server to the POP3 client is the content of the mail.

4.3. The procedures of stream reassembly

The two significant data structures in stream reassembly are the stream table and the TCB.
The stream table consists of many entries, each of which points to a list of TCBs with the same hash value.

Two types of threads are used to fulfill stream reassembly: the packet-processing thread and the timeout thread. The packet-processing threads are responsible for packet receiving, stream reconstruction, and data submission; stream reconstruction is further divided into TCB creation, packet reordering, and TCB termination, as depicted in Algorithm 4.3. The timeout thread is a simple cyclic procedure: it accesses the TCBs one by one ceaselessly, comparing the current time with the time of the last arriving packet in every stream. If the gap between the two times is larger than the appointed value, the timeout thread deems that the corresponding stream may be idle

or closed, so it submits the remaining data and deletes the TCB to leave space for other streams.

The main function of ReorderPacket is to sort the packets affiliated with one stream according to their TCP sequence numbers and to drop repeated packets that have the same sequence number. Instead of being processed after a batch of packets belonging to a stream has been received, the packets are put in order upon reception. The reasons for ordering upon reception are as follows: (i) batch processing could lead to computing bursts, which are detrimental to smooth processing, and (ii) out-of-order packets are actually rare, because most arriving packets are ordinal and consecutive. As a result, processing packets one by one saves computational resources.

When the data are submitted to the host, all the packets must be ensured to be ordinal and consecutive. We use ordering to express the sequence of the packets and continuity to denote whether any packet that should have arrived has not yet arrived. When the packets arrive, the
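The per-packet behavior of ReorderPacket can be sketched as follows (our illustrative Python, not the NPU code; payloads here are one byte per sequence number for brevity): each arriving packet is inserted into the stream's buffer at its sequence-number position, a duplicate is dropped at arrival time, and continuity holds when the buffered sequence numbers leave no hole after the expected sequence number:

```python
import bisect

def reorder_packet(buffered, seq, payload):
    """Insert (seq, payload) into the sorted per-stream buffer,
    dropping a packet whose sequence number is already present.
    Returns True if the packet was kept."""
    pos = bisect.bisect_left(buffered, (seq,))
    if pos < len(buffered) and buffered[pos][0] == seq:
        return False                      # duplicate: drop on arrival
    buffered.insert(pos, (seq, payload))  # ordering maintained upon reception
    return True

def is_continuous(buffered, expected_seq):
    """Continuity check: no packet that should have arrived is missing
    between expected_seq and the end of the buffered segments."""
    for seq, payload in buffered:
        if seq != expected_seq:
            return False                  # hole: an expected packet is absent
        expected_seq += len(payload)
    return True

# Arrival order from the Figure 1 example: 2 and 3 swapped, 4 duplicated.
buf = []
for seq, data in [(1, b'a'), (3, b'c'), (2, b'b'), (4, b'd'), (4, b'd'), (5, b'e')]:
    reorder_packet(buf, seq, data)

print([s for s, _ in buf])    # [1, 2, 3, 4, 5]
print(is_continuous(buf, 1))  # True
```

Because most packets arrive in order, the binary-search insertion almost always appends at the tail, which is consistent with the observation above that one-by-one processing is cheap in the common case.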

