Big Data Working Group
Expanded Top Ten Big Data Security and Privacy Challenges
April 2013

CLOUD SECURITY ALLIANCE Expanded Top Ten Big Data Security and Privacy Challenges, April 2013

Abstract – Security and privacy issues are magnified by the velocity, volume, and variety of Big Data, as reflected in large-scale cloud infrastructures, the diversity of data sources and formats, the streaming nature of data acquisition, and high-volume inter-cloud migration. Therefore, traditional security mechanisms, which are tailored to securing small-scale, static (as opposed to streaming) data, are inadequate. In this paper, we highlight the top ten Big Data security and privacy challenges. Highlighting the challenges will motivate increased focus on fortifying Big Data infrastructures.

Keywords: Big Data; top ten; challenges; security; privacy

© 2013 Cloud Security Alliance – All Rights Reserved

All rights reserved. You may download, store, display on your computer, view, print, and link to the Top Ten Big Data Security and Privacy Challenges at a/, subject to the following: (a) the Document may be used solely for your personal, informational, non-commercial use; (b) the Document may not be modified or altered in any way; (c) the Document may not be redistributed; and (d) the trademark, copyright or other notices may not be removed. You may quote portions of the paper as permitted by the Fair Use provisions of the United States Copyright Act, provided that you attribute the portions to Top Ten Big Data Security and Privacy Challenges (2013).

Contents
Acknowledgments
Introduction
1.0 Secure Computations in Distributed Programming Frameworks
2.0 Security Best Practices for Non-Relational Data Stores
3.0 Secure Data Storage and Transactions Logs
4.0 End-Point Input Validation/Filtering
5.0 Real-Time Security Monitoring
6.0 Scalable and Composable Privacy-Preserving Data Mining and Analytics
7.0 Cryptographically Enforced Data-Centric Security
8.0 Granular Access Control
9.0 Granular Audits
10.0 Data Provenance
Conclusion
References

Acknowledgments
CSA Big Data Working Group Co-Chairs
Lead: Sreeranga Rajan, Fujitsu
Co-Chair: Wilco van Ginkel, Verizon
Co-Chair: Neel Sundaresan, eBay
Contributors
Anant Bardhan, CTS
Yu Chen, SUNY Binghamton
Adam Fuchs, Sqrrl
Aditya Kapre
Adrian Lane, Securosis
Rongxing Lu, University of Waterloo
Pratyusa Manadhata, HP Labs
Jesus Molina, Fujitsu
Alvaro Cardenas Mora, University of Texas Dallas
Praveen Murthy, Fujitsu
Arnab Roy, Fujitsu
Shiju Sathyadevan, Amrita University
Nrupak Shah, Dimension Data
CSA Global Staff
Alex Ginsburg, Copyeditor
Luciano JR Santos, Global Research Director
Evan Scoboria, Webmaster
Kendall Scoboria, Graphic Designer
John Yeoh, Research Analyst

Introduction
The term “Big Data” refers to the massive amounts of digital information companies and governments collect about human beings and our environment. The amount of data generated is expected to double every two years, from 2,500 exabytes in 2012 to 40,000 exabytes in 2020 [56]. Security and privacy issues are magnified by the volume, variety, and velocity of Big Data. Large-scale cloud infrastructures, the diversity of data sources and formats, the streaming nature of data acquisition, and high-volume inter-cloud migration all create unique security vulnerabilities.

It is not merely the existence of large amounts of data that is creating new security challenges. Big Data has been collected and utilized by many organizations for several decades. The current use of Big Data is novel because organizations of all sizes now have access to Big Data and the means to employ it. In the past, Big Data was limited to very large organizations, such as governments and large enterprises, that could afford to create and own the infrastructure necessary for hosting and mining large amounts of data. These infrastructures were typically proprietary and isolated from general networks. Today, Big Data is cheaply and easily accessible to organizations large and small through public cloud infrastructure. Software infrastructures such as Hadoop enable developers to easily leverage thousands of computing nodes to perform data-parallel computing. Combined with the ability to buy computing power on demand from public cloud providers, such developments greatly accelerate the adoption of Big Data mining methodologies.
As a result, new security challenges have arisen from the coupling of Big Data with public cloud environments characterized by heterogeneous compositions of commodity hardware, commodity operating systems, and commodity software infrastructures for storing and computing on data.

As Big Data expands through streaming cloud technology, traditional security mechanisms tailored to securing small-scale, static data on firewalled and semi-isolated networks are inadequate. For example, traditional anomaly-detection analytics applied at this scale would generate too many outliers. Similarly, it is unclear how to retrofit provenance in existing cloud infrastructures. Streaming data demands ultra-fast response times from security and privacy solutions.

The purpose of this paper is to highlight the top ten Big Data security and privacy challenges according to practitioners. To do so, the working group utilized a three-step process to arrive at the top challenges in Big Data:
1. The working group interviewed Cloud Security Alliance (CSA) members and surveyed security-practitioner-oriented trade journals to draft an initial list of high-priority security and privacy problems.
2. The working group studied published solutions.
3. The working group characterized a problem as a challenge if the proposed solution did not cover the problem scenarios.
Based on this three-step process, the working group compiled the top ten challenges to Big Data security and privacy:
1. Secure computations in distributed programming frameworks
2. Security best practices for non-relational data stores
3. Secure data storage and transactions logs

4. End-point input validation/filtering
5. Real-time security monitoring
6. Scalable and composable privacy-preserving data mining and analytics
7. Cryptographically enforced data-centric security
8. Granular access control
9. Granular audits
10. Data provenance
Figure 1 depicts the top ten challenges in the Big Data ecosystem.
Figure 1: Top Ten Security and Privacy Challenges in the Big Data Ecosystem
The challenges may be organized into four aspects of the Big Data ecosystem, as depicted in Figure 2:
1. Infrastructure Security
2. Data Privacy
3. Data Management
4. Integrity and Reactive Security

Figure 2: Classification of the Top 10 Challenges
Infrastructure Security: Secure Computations in Distributed Programming Frameworks; Security Best Practices for Non-Relational Data Stores
Data Privacy: Privacy-Preserving Data Mining and Analytics; Cryptographically Enforced Data-Centric Security; Granular Access Control
Data Management: Secure Data Storage and Transaction Logs; Granular Audits; Data Provenance
Integrity and Reactive Security: End-Point Validation and Filtering; Real-Time Security Monitoring

In order to secure the infrastructure of Big Data systems, the distributed computations and data stores must be secured. To secure the data itself, information dissemination must be privacy-preserving, and sensitive data must be protected through the use of cryptography and granular access control. Managing the enormous volume of data necessitates scalable and distributed solutions for both securing data stores and enabling efficient audits and data provenance. Finally, the streaming data emerging from diverse end-points must be checked for integrity and can be used to perform real-time analytics for security incidents to ensure the health of the infrastructure.
Solving security and privacy challenges typically requires addressing three distinct issues:
1. Modeling: formalizing a threat model that covers most of the cyber-attack or data-leakage scenarios
2. Analysis: finding tractable solutions based on the threat model
3. Implementation: implementing the solution in existing infrastructures
In this paper, we provide a brief description of each challenge, review usage of Big Data that may be vulnerable, and summarize existing knowledge according to the modeling, analysis, and implementation for each challenge.

1.0 Secure Computations in Distributed Programming Frameworks
Distributed programming frameworks utilize parallel computation and storage to process massive amounts of data. For example, the MapReduce framework splits an input file into multiple chunks. In the first phase of MapReduce, a Mapper for each chunk reads the data, performs some computation, and outputs a list of key/value pairs. In the next phase, a Reducer combines the values belonging to each distinct key and outputs the result. There are two major attack prevention measures: securing the mappers and securing the data in the presence of an untrusted mapper.
1.1 Use Case
Untrusted mappers can be altered to snoop on requests, alter MapReduce scripts, or alter results. The most difficult problem is to detect mappers returning incorrect results, which will, in turn, generate incorrect aggregate outputs. With large data sets, it is nearly impossible to identify malicious mappers that may create significant damage, especially for scientific and financial computations.
Retailer consumer data is often analyzed by marketing agencies for targeted advertising or customer segmentation. These tasks involve highly parallel computations over large data sets and are particularly suited to MapReduce frameworks such as Hadoop. However, the data mappers may contain intentional or unintentional leakages. For example, a mapper may emit a unique value by analyzing a private record, undermining users’ privacy.
1.2 Modeling
The threat model for mappers has three major scenarios:
1. Malfunctioning Compute Worker Nodes – Workers assigned to mappers in a distributed computation could malfunction due to incorrect configuration or a faulty node. A malfunctioning Worker could return incorrect output from the mapper, which may compromise the integrity of the aggregate result. Such a Worker may also be modified to leak users’ confidential data or to profile users’ behaviors or preferences for privacy mining.
2. Infrastructure Attacks – Compromised Worker nodes may tap the communication among other Workers and the Master with the objective of mounting replay, Man-in-the-Middle, and DoS attacks against the MapReduce computations.
3. Rogue Data Nodes – Rogue data nodes can be added to a cluster and subsequently receive replicated data or deliver altered MapReduce code. Creating snapshots of legitimate nodes and re-introducing altered copies is a straightforward attack in cloud and virtual environments and is difficult to detect.
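To make the MapReduce phases and the first threat scenario concrete, the following is a minimal, single-process Python sketch of a word count. It is not Hadoop code, and every function name in it is hypothetical. It splits the input into chunks, runs one mapper per chunk, and reduces by key. Swapping in a mapper that keeps the expected output format but inflates its values shows why incorrect mapper results are hard to spot: the job still completes and returns plausible-looking aggregates.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

KV = Tuple[str, int]
Mapper = Callable[[List[str]], List[KV]]


def split_input(text: str, n_chunks: int) -> List[List[str]]:
    """Split the input file into at most n_chunks chunks of whole lines."""
    lines = text.splitlines()
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def honest_mapper(chunk: List[str]) -> List[KV]:
    """Map phase: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def faulty_mapper(chunk: List[str]) -> List[KV]:
    """A malfunctioning or tampered mapper: correct format, inflated values."""
    return [(word, count * 10) for word, count in honest_mapper(chunk)]


def reducer(pairs: Iterable[KV]) -> Dict[str, int]:
    """Reduce phase: combine the values belonging to each distinct key."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(text: str, mappers: List[Mapper]) -> Dict[str, int]:
    """Assign one mapper per chunk, gather intermediate pairs, then reduce."""
    chunks = split_input(text, len(mappers))
    intermediate: List[KV] = []
    for mapper, chunk in zip(mappers, chunks):
        intermediate.extend(mapper(chunk))
    return reducer(intermediate)
```

On the three-line input `"buy milk\nbuy bread\nsell milk"`, `run_job(text, [honest_mapper] * 3)` returns counts of 2, 2, 1 and 1 for `buy`, `milk`, `bread` and `sell`. Replacing the third mapper with `faulty_mapper` silently changes `milk` to 11 and `sell` to 10, and no error is raised anywhere in the pipeline.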

1.3 Analysis
Based on the threat model outlined above, there are two dimensions of analysis: ensuring the trustworthiness of mappers and securing the data despite untrusted mappers.
For ensuring the trustworthiness of mappers, there are two techniques: trust establishment and Mandatory Access Control (MAC) [1].
1. Trust establishment has two steps: initial trust establishment followed by periodic trust update. When a Worker sends a connection request to the Master, the Master authenticates the Worker. Only authenticated Workers with expected properties will be assigned a mapper task. Following the initial authentication, the security properties of each Worker are checked periodically for conformance with predefined security policies.
2. MAC ensures that files can be accessed only as authorized by a predefined security policy. MAC ensures the integrity of inputs to the mappers, but does not prevent data leakage from the mapper outputs.
In order to prevent information leakage from mapper outputs, data de-identification techniques are required to prevent violation of privacy through the output of aggregate computations. A mathematically rigorous definition for data de-identification is the notion of differential privacy, which is achieved by adding random noise to the output of a computation. However, it is difficult to prove that a particular technique is privacy-preserving.
1.4 Implementation
MAC is implemented in Airavat [1] by modifying the MapReduce framework, the distributed file system, and the Java virtual machine, with SELinux as the underlying operating system. MAC in SELinux ensures that untrusted code does not leak information via system resources. However, it cannot guarantee privacy for computations based on output keys produced by untrusted mappers.
To prevent information leakage through the outputs, it relies on a recently developed de-identification framework of differential privacy based on function sensitivity. In the context of mappers, function sensitivity is the degree of influence that an input can have on the mapper output. Estimating the sensitivity of arbitrary untrusted code is difficult.
Two problems must be tackled before the solutions outlined above can see widespread practical adoption:
1. Performance penalties due to imposing MAC
2. Limitations of differential privacy in providing guarantees
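The de-identification step can be illustrated with the standard Laplace mechanism for differential privacy. The Python sketch below is an illustration under stated assumptions, not Airavat's implementation: it supposes that each record yields one numeric mapper output, that the job computes a SUM, and that the job owner declares an output range `[lo, hi]` (all names are hypothetical). Clamping every untrusted output into that range bounds the function sensitivity by declaration rather than by analyzing the mapper's code, which the text notes is difficult; Laplace noise with scale sensitivity / epsilon then masks the influence of any single record.

```python
import random
from typing import Iterable, Optional


def clamp(value: float, lo: float, hi: float) -> float:
    """Force an untrusted mapper output into its declared range [lo, hi]."""
    return min(max(value, lo), hi)


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_sum(
    mapper_outputs: Iterable[float],
    lo: float,
    hi: float,
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Release a SUM of per-record mapper outputs with epsilon-differential privacy.

    Clamping bounds how far adding or removing one record can move the sum
    (its sensitivity); Laplace noise with scale sensitivity / epsilon hides that move.
    """
    if epsilon <= 0 or lo > hi:
        raise ValueError("need epsilon > 0 and lo <= hi")
    rng = rng if rng is not None else random.Random()
    sensitivity = max(abs(lo), abs(hi))  # add/remove-one-record adjacency
    true_sum = sum(clamp(v, lo, hi) for v in mapper_outputs)
    if sensitivity == 0:
        return true_sum  # every output is clamped to 0; nothing to hide
    return true_sum + laplace_noise(sensitivity / epsilon, rng)
```

With outputs `[120.0, 80.0, 1_000_000.0]`, `lo=0` and `hi=500`, the rogue value is clamped and the exact sum becomes 700; with `epsilon=0.5` the released value is 700 plus Laplace noise of scale 1000. Noise that large relative to a small sum shows that choosing epsilon and the declared range is a trade-off between privacy and utility.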

2.0 Security Best Practices for Non-Relational Data Stores
The security infrastructures of non-relational data stores popularized by NoSQL databases are still evolving [2]. For instance, robust solutions to NoSQL injection are still not mature. Each NoSQL database was built to tackle different challenges posed by the analytics world, and security was never addressed during the design stage. Developers using NoSQL databases usually embed security in the middleware. NoSQL databases do not provide any support for explicitly enforcing security in the database. However, clustering aspects of NoSQL databases pose additional challenges to the robustness of such security practices.
2.1 Use Case
Companies dealing with large unstructured data sets may benefit from migrating from a traditional relational database (RDB) to a NoSQL database. NoSQL databases accommodate and process huge volumes of static and streaming data for predictive analytics or historical analysis. Threat trees derived from detailed threat analysis using threat-modeling techniques on widely used NoSQL databases demonstrate that NoSQL databases have only a very thin security layer compared to traditional RDBs. In general, the security philosophy of NoSQL databases relies on external enforcement mechanisms. To reduce security incidents, the company must review security policies for the middleware and, at the same time, harden the NoSQL database itself to match the security of RDBs without compromising its operational features. The capability of NoSQL databases to perform analytics over unstructured and structured data with ease is in no way comparable to an RDB’s ability to handle OLTP and OLAP (to a large extent with the latest RDBMS versions).
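To illustrate the NoSQL injection problem mentioned at the start of this section, consider a MongoDB-style store whose query filters are JSON documents and whose operators begin with `$` (for example `$ne`). If middleware passes a client’s JSON body straight into a filter, a body such as `{"username": {"$ne": null}}` matches every user that has a username. The Python sketch below shows one defensive pattern for the middleware layer, assuming lookups should be equality-only: it rejects operator-like keys and non-string values before any filter is built. The function and exception names are hypothetical, and no database driver is called.

```python
import json
from typing import Any, Dict


class UnsafeInputError(ValueError):
    """Raised when client-supplied data could be interpreted as query syntax."""


def reject_operators(value: Any, path: str = "body") -> None:
    """Recursively refuse keys a MongoDB-style engine would treat as operators or dotted paths."""
    if isinstance(value, dict):
        for key, inner in value.items():
            if key.startswith("$") or "." in key:
                raise UnsafeInputError(f"disallowed key {key!r} at {path}")
            reject_operators(inner, f"{path}.{key}")
    elif isinstance(value, list):
        for index, inner in enumerate(value):
            reject_operators(inner, f"{path}[{index}]")


def build_user_lookup(raw_body: str) -> Dict[str, str]:
    """Turn an untrusted JSON request body into an equality-only filter on `username`."""
    payload = json.loads(raw_body)
    if not isinstance(payload, dict):
        raise UnsafeInputError("request body must be a JSON object")
    reject_operators(payload)
    username = payload.get("username")
    # Allow-list the shape: a plain string, never a sub-document such as {"$ne": null}.
    if not isinstance(username, str):
        raise UnsafeInputError("username must be a string")
    return {"username": username}
```

`build_user_lookup('{"username": "alice"}')` returns `{"username": "alice"}`, while the `$ne` payload raises `UnsafeInputError`. The final type check acts as an allow-list of shapes (plain strings only), so it still holds if the operator filter misses a case.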
However, it is important that security loopholes within NoSQL databases are plugged without compromising their outstanding analytical capabilities.
Cloud-based solutions, in which traditional service providers offer Analytics as a Service (AaaS), are based on analytics frameworks built using a combination of tools capable of handling both streaming and static data, with NoSQL databases used for intermediate data handling. In such scenarios, several users share the framework, feeding both streaming and static data with appropriate connectors through the framework for analytics. These data sets need to be held in a NoSQL database for intermediate processing before the results are pushed to the respective users. With current NoSQL security mechanisms, it is virtually impossible to segregate sensitive data pertaining to different cloud users sharing the framework’s internal NoSQL database.
2.2 Modeling
The same architectural flexibility that enables the two notable features of NoSQL, performance and scalability, poses the greatest security risk [3]. NoSQL was designed with the vision of tackling large data sets, with limited emphasis on security [4]. This has caused many critical security flaws in NoSQL, only a few of which are addressed in this paper. The lack of security standards has caused vendors to develop bottom-up NoSQL solutions and address security issues on an ad hoc basis. The threat model of NoSQL databases has six major scenarios:

1. Transactional Integrity – One of the most visible drawbacks of NoSQL is its soft approach towards ensuring transactional integrity. Introducing complex integrity constraints into its architecture would defeat NoSQL’s primary objective of attaining better performance and scalability. Techniques like the Architecture Tradeoff Analysis Method (ATAM) specifically deal with the trade-offs in quality requirements in architectural decisions (for example, performance vs. security). This analytical method can be utilized to evaluate the level of integrity constraints that may be infused into a core architectural kernel without significantly affecting performance.
2. Lax Authentication Mechanisms – Across the board, NoSQL uses weak authentication techniques and weak passwo
