
Upload by : Abby Duckworth

BIG DATA
TERABYTES, PETABYTES, EXABYTES, AND ZETTABYTES – OH MY !!
COMPILED BY HOWIE BAUM

Big Data is a phrase that gets talked about a lot in the media, the board room, and everywhere in between.
It’s been used, overused, and used incorrectly so many times that it’s become difficult to know what it really means.
Is it a tool? Is it a technology? Is it just a buzzword used by data scientists to scare us? Is it really going to change the world? Or ruin it?
First, let’s just say that Big Data is getting bigger every day, and fast.
SO FAST THAT 90% OF THE WORLD’S DIGITAL DATA WAS CREATED IN THE LAST TWO YEARS !!

WHAT IS BIG DATA ?
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. (from Wikipedia)
The first ideas about its value for businesses started in 2006.
The challenges include:
Data capture
Storage
Search
Sharing
Transfer
Analysis
Some have defined big data as an amount of data that exceeds a petabyte (one million gigabytes)!

WHY HAS BIG DATA BECOME SO POPULAR NOW?
Only in the last few years have companies realized that the availability of lower-priced and faster computer power (big computing) is the real change that has opened the door to big opportunity.
Big computing at small prices allows companies to look at, and deal with, data in ways not possible before. It's this computational capacity that has the real potential to transform data from a compliance burden into a business asset.
Organizations have always collected data, but until recently, large-scale cluster computing and analytic algorithms that could perform at scale were cost-prohibitive. That's no longer the case, and many organizations are now experimenting with big data.

THE 3 TO 8 “V”s ARE USED TO DESCRIBE BIG DATA
What’s important to keep in mind is that Big Data isn’t just about the amount of data we’re generating; it’s also about all the different types of data:
Text
Video
Search logs
Sensor logs
Customer transactions
In most big data circles, its description is based on the four V’s: volume, variety, velocity, and veracity. (You might consider a fifth V, value.)
The following pages show comparisons of 4, 7, and 8 V’s that are part of Big Data.

REVIEWING THE 7 “V”s OF BIG DATA
Volume: With the dramatic growth of the internet, mobile devices, social media, and Internet of Things (IoT) technology, the amount of data generated by all these sources has grown accordingly.
Velocity: In addition to getting bigger, the generation of data and organizations’ ability to process it are accelerating.
Variety: In earlier times, most data types could be neatly captured in rows on a structured table. In the Big Data world, data often comes in unstructured formats like social media posts, server log data, latitude and longitude geo-coordinates, photos, audio, video, and free text.

Variability: The meaning of words in unstructured data can change based on context.
Veracity: With many different data types and data sources, data quality issues invariably pop up in Big Data sets. Veracity deals with exploring a data set for data quality and systematically cleansing that data to be useful for analysis.
Visualization: Once data has been analyzed, it needs to be presented in a visualization for end users to understand and act upon.
Value: Data must be combined with rigorous processing and analysis to be useful.

A GREAT 5 MINUTE VIDEO ABOUT IT (GO TO 4:02 MINUTES)
https://www.youtube.com/watch?v=bAyrObl7TYE

WHAT ARE THE UNITS IN BIG DATA ?

THE AMOUNT OF MEMORY IN COMPUTERS
The smallest amount of memory is a bit, which represents a 0 or a 1.
A memory location, or byte, is made of 8 bits and usually stores one character, such as a letter, a number, or a symbol.
Therefore, a computer with 8 megabytes of memory can store approximately 8 million characters.
One megabyte can hold approximately 768 pages of text information.
1 Byte = 8 bits = 1 letter, number, or symbol
Kilobyte (KB) = 1 thousand bytes
Megabyte (MB) = 1 million bytes
Gigabyte (GB) = 1 billion bytes
Terabyte (TB) = 1 trillion bytes
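The unit ladder above can be sketched in a few lines of Python. This is only an illustration, assuming the decimal (powers of 1,000) definitions given on this slide; the names `UNITS` and `human_readable` are invented for the example.

```python
# Decimal storage units as listed on the slide: each step is 1,000x the previous one.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def human_readable(num_bytes: float) -> str:
    """Express a byte count in the largest unit that keeps the value at 1 or more."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1,000 (or at the last unit).
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000

print(human_readable(8 * 10**6))      # a computer with 8 megabytes of memory
print(human_readable(2.5e9 * 10**9))  # 2.5 billion gigabytes, expressed in bytes
```

The second call shows why the larger units matter: 2.5 billion gigabytes is easier to read once it is expressed in exabytes.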


MEGABYTE = 1 MILLION BYTES
1 MEGABYTE IS ENOUGH TEXT FOR A 400 PAGE BOOK
1 MEGABYTE = 768 PAGES OF TYPED TEXT
2 MEGABYTES FOR AN AVERAGE PHOTO
5 MEGABYTES = ONE 4 MINUTE SONG
700 MEGABYTES = 1 CD WITH 80 MINUTES OF MUSIC

GIGABYTE = 1,000 MEGABYTES
1 GIGABYTE = 10 YARDS OF BOOKS ON A SHELF
1 GIGABYTE = DATA OF THE MUSIC OF BEETHOVEN’S 5TH SYMPHONY
1 GIGABYTE = STACK OF TYPED PAGES 262 FEET HIGH
4.7 GIGABYTES = 1 DVD WITH MOVIES ON IT
7 GIGABYTES = HOW MUCH DATA YOU USE PER HOUR STREAMING A NETFLIX HIGH DEFINITION VIDEO
2.5 BILLION GIGABYTES OF DATA ARE PRODUCED EVERY DAY !!

TERABYTE = 1,000 GIGABYTES
1 TERABYTE OF PRINTED PAGES IS 51 MILES HIGH
1 TERABYTE IS THE DATA ON ALL OF THE X-RAY IMAGES IN A LARGE HOSPITAL, PER YEAR
1 TERABYTE IS 200,000 FIVE MINUTE SONGS OR 310,000 PICTURES
10 TERABYTES IS THE INFORMATION IN THE PRINTED COLLECTION IN THE LIBRARY OF CONGRESS
24 TERABYTES OF VIDEOS ARE UPLOADED TO YOUTUBE EVERY DAY

PETABYTE = 1,000 TERABYTES
1 PETABYTE = A STACK OF 500 BILLION PAGES OF TYPED TEXT THAT IS 52,000 MILES HIGH, WHICH IS ¼ THE DISTANCE BETWEEN THE EARTH AND THE MOON
1.5 PETABYTES = 10 BILLION PHOTOS ON FACEBOOK
2 PETABYTES OF PRINTED INFORMATION IS IN ALL THE UNITED STATES ACADEMIC LIBRARIES
20 PETABYTES OF DATA IS PROCESSED BY GOOGLE EVERY DAY

EXABYTE = 1,000 PETABYTES
1 EXABYTE OF TYPED PAGES IS 52 MILLION MILES HIGH, WHICH IS TWICE THE DISTANCE BETWEEN THE EARTH AND THE PLANET VENUS
2 EXABYTES = ALL OF THE WORLD’S INFORMATION IN A YEAR
5 EXABYTES = ALL OF THE WORDS EVER SPOKEN BY HUMANS
1 EXABYTE = 11 MILLION 4K VIDEOS
15 EXABYTES = AN ESTIMATE OF ALL OF THE INFORMATION HELD BY GOOGLE
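The page-stack comparisons on these slides can be cross-checked with simple arithmetic. The sketch below starts from the slides' own figure of 262 feet of typed pages per gigabyte and scales it up. It tries both multiples of 1,000 (the definition used on the slides) and multiples of 1,024 (an assumption, since the slides do not say how the heights were computed). The function name `stack_height_miles` is made up for the example.

```python
FEET_PER_GB = 262      # from the slides: a 1 GB stack of typed pages is 262 feet high
FEET_PER_MILE = 5280

def stack_height_miles(gigabytes: float) -> float:
    """Height in miles of a stack of typed pages holding `gigabytes` of text."""
    return gigabytes * FEET_PER_GB / FEET_PER_MILE

for name, power in [("terabyte", 1), ("petabyte", 2), ("exabyte", 3)]:
    decimal = stack_height_miles(1000 ** power)   # units as multiples of 1,000
    binary = stack_height_miles(1024 ** power)    # units as multiples of 1,024
    print(f"1 {name}: {decimal:,.0f} miles (x1,000) or {binary:,.0f} miles (x1,024)")
```

Either way the results land in the same range as the heights quoted on the slides, so the comparisons are at least consistent with one another.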

Backblaze is a data storage provider. It offers two products:
1) B2 Cloud Storage - An object storage service similar to Amazon's S3.
2) Computer Backup - An online backup tool that allows Windows and macOS users to back up their data to offsite data centers.
The service is designed for businesses and end-users, providing unlimited storage space and supporting unlimited file sizes.

AN EXPLOSION OF DATA – WHERE DOES IT ALL COME FROM ?
The arrival of the internet, social media, and the digitization of everything around the world have led to massive amounts of data being generated every second.
Retail sales and inventory databases
Logistics – truck and train movement of goods
Extracting meaningful information from still images, video, and audio that people see and listen to
Smart objects (sensors) and the Internet of Things
Social media
Personnel files
Financial services
Location data and online activities
Healthcare
Machine generated data
Computer and network logs

DATA FROM THE INTERNET OF THINGS

USES OF BIG DATA

Benefits of Big Data
Using big data, Netflix saves $1 billion per year on customer retention (TechJury)
$1 trillion – the amount businesses will save from IoT by 2020 (Grazziti)
$40 billion – the projected financial impact of AI by 2025 (Tractica)
8–10% – profit increase for businesses that use big data (Entrepreneur)
Data wrapped in stories are 22x more memorable than bare facts (Chicago Analytics Group)
70% of businesses believe that data warehouse optimization is critical to their success (Forbes)
Data analytics top 4 benefits (Chicago Analytics Group):
25% – faster innovation cycles
17% – improved business efficiencies/higher productivity
13% – more effective R&D
12% – product/service

USES OF BIG DATA AND WHERE IT IS USED
1) Health Care, 2) Detect Fraud, 3) Social Media Analysis, 4) Weather, and the 5) Public sector
1) Contribution of Big Data in Health Care
It has grown a lot. With medical advances, there was a need to store large amounts of data about each patient’s health history.
This data can be used to analyze the patient’s health condition and to prevent health failures in the future.
Google famously showed that they could predict flu outbreaks based upon when and where people were searching for flu-related terms.

GENERAL ELECTRIC HEALTH INFOSCOPE
When you get a sore throat, do you also end up getting an ear infection?
Health Infoscope is a compilation of 72 million electronic records and shows the connection of one disease with another.
It also shows the strength of the connection and the likelihood of catching one disease due to the other.
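One way to picture a "connection" between two conditions and its strength is to count how often they appear in the same patient's record. The sketch below is only an illustration on a handful of made-up records; it is not how Health Infoscope itself works, and the names `records` and `likelihood` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical patient records: each set lists the conditions one patient has had.
records = [
    {"sore throat", "ear infection"},
    {"sore throat"},
    {"sore throat", "ear infection", "sinusitis"},
    {"ear infection"},
    {"sinusitis"},
]

single = Counter()  # number of patients with each condition
pair = Counter()    # number of patients with both conditions of a pair
for conditions in records:
    single.update(conditions)
    pair.update(frozenset(p) for p in combinations(sorted(conditions), 2))

def likelihood(given: str, other: str) -> float:
    """Share of patients with `given` who also had `other` (a conditional frequency)."""
    return pair[frozenset((given, other))] / single[given]

print(likelihood("sore throat", "ear infection"))
```

A conditional frequency like this shows that two conditions tend to occur together; on its own it does not show that one causes the other.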

2) DETECTING FRAUD
Fraud detection and prevention is one of the many uses of Big Data today.
Credit card companies face a lot of fraud, and big data technologies are used to detect and prevent it.
Earlier, credit card companies would keep track of all the transactions, and if any suspicious transaction was found, they would call the buyer and confirm whether that transaction was made.
Now buying patterns are observed and fraud-affected areas are analyzed using Big Data analytics.
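The idea of observing buying patterns can be illustrated with a very small check: compare a new purchase with the customer's usual spending and flag it if it is far outside that range. This is a simplified sketch with made-up numbers, assuming a plain z-score rule. Real fraud systems use many more signals, and the function name `flag_suspicious` and its `threshold` parameter are invented for the example.

```python
from statistics import mean, stdev

def flag_suspicious(history: list[float], new_amount: float, threshold: float = 3.0) -> bool:
    """Flag a purchase whose amount is far from this customer's usual spending."""
    if len(history) < 2:
        return False  # not enough history to define a "usual" pattern
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return new_amount != mu  # every past purchase was identical
    # z-score: how many standard deviations the new amount is from the average
    return abs(new_amount - mu) / sigma > threshold

past = [42.0, 38.5, 51.0, 47.25, 40.0]  # made-up purchase amounts
print(flag_suspicious(past, 45.0))      # close to the usual range
print(flag_suspicious(past, 900.0))     # far outside the usual range
```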

3) SOCIAL MEDIA ANALYSIS
The best use case of big data is the data that keeps flowing on social media networks like Facebook, Twitter, etc.
The data is collected and observed in the form of comments, images, social statuses, etc.
Companies use big data techniques to understand their customers’ requirements and check what they say on social media.
This helps companies to analyze and come up with strategies that will be beneficial for the company’s growth.

4) WEATHER
Big Data technologies are used to forecast the weather.
Large amounts of data about the climate from ground sensors and satellites are fed into computers, and an average is taken to predict the weather.
This can be useful to predict natural calamities such as floods.

5) PUBLIC SECTOR
Big Data is used in a lot of government applications as well as in public sectors.
It provides helpful information to a lot of facilities such as electric and natural gas power generation, water utilities, investigation, economic promotion, etc.
It is used in many other cases such as the Education sector, Insurance services, Transportation, Security Intelligence, etc.
Big data has become an important part of analysis and is needed in order to understand the growth of businesses and build strategies to help them grow further.

WHY DO WE WANT TO COLLECT BIG DATA ?
WHAT ARE THE BENEFITS ?

WHO COLLECTS BIG DATA ?

THE ‘SCARY’ SEVEN: BIG DATA CHALLENGES AND WAYS TO SOLVE THEM
1) INSUFFICIENT UNDERSTANDING AND ACCEPTANCE OF BIG DATA
A. Some companies fail to know even the basics: what big data actually is, what its benefits are, what infrastructure is needed, etc. If they don’t set it up properly, it is doomed to failure, and they may waste a lot of time and resources.
B. Big Data can be a huge change for a company, so it needs to be accepted by top management first and then down the ladder, but it shouldn’t be overdone or it will have an adverse effect on those involved in implementing it.
C. To ensure big data understanding and acceptance at all levels, IT departments need to organize numerous training sessions and workshops.

2) CONFUSING VARIETY OF BIG DATA TECHNOLOGIES
It can be easy to get lost in the variety of big data technologies now available on the market.
Do you need Spark, or would the speeds of Hadoop MapReduce be enough?
Is it better to store data in Cassandra or HBase?
Finding the answers can be tricky. And it’s even easier to choose poorly if you are exploring the ocean of technological opportunities without a clear view of what you need.
Solution - Use the skills in your IT Department or seek professional help by hiring an expert.

Big Data requires a set of tools and techniques for analysis to gain insights from it.
Hadoop helps in storing and processing large data sets.
Spark helps with in-memory calculation.
Storm helps in faster processing of unbounded data.
Apache Cassandra provides high availability and scalability of a database.
MongoDB provides cross-platform capabilities.

Big Data Analysis is now commonly used by many companies to predict market trends, personalize customers’ experiences, speed up the company’s workflow, etc.
Big Data can be processed using different tools such as MapReduce, Spark, Hadoop, Pig, Hive, Cassandra, and Kafka. Each of these tools has its advantages and disadvantages, which determine how companies might decide to use them.
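To give a feel for what a tool like MapReduce does, here is a minimal single-machine sketch of the map, shuffle, and reduce steps applied to counting words. It only illustrates the programming model in plain Python; it does not use Hadoop's actual API, and the function names are invented for the example.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data is big", "data is everywhere"]
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

In a real cluster the map and reduce steps run in parallel on many machines, each holding part of the data. That parallelism is what lets the same simple pattern scale to very large data sets.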

THREE DIFFERENT WAYS OF FORMATTING DATA COMMONLY USED
Unstructured: unorganized data (e.g. videos).
Semi-structured: the data is organized, but not in a fixed format (e.g. JSON).
Structured: the data is stored in a structured format (e.g. an RDBMS).
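The three formats can be shown side by side in a short Python sketch. All values below are made up for illustration. The in-memory SQLite database stands in for an RDBMS, and the raw bytes stand in for media such as video.

```python
import json
import sqlite3

# Unstructured: raw bytes with no schema (a stand-in for video or audio data).
unstructured = b"\x00\x01\x02 raw media bytes"

# Semi-structured: self-describing JSON; records need not share the same fields.
semi_structured = json.loads(
    '[{"user": "ann", "likes": 3}, {"user": "bob", "tags": ["sale"]}]'
)

# Structured: rows in a relational table with a fixed schema (in-memory SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("ann", 19.99), ("bob", 5.0)])
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

# The second JSON record has no "likes" field, so .get() returns None for it.
print(len(unstructured), semi_structured[1].get("likes"), total)
```

The structured table can be queried directly with SQL. The JSON records have to be checked field by field, and the raw bytes need dedicated processing before any analysis is possible.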

3) PAYING LOADS OF MONEY
Big Data adoption projects entail lots of expenses.
If you opt for an on-premises solution, you’ll have the costs of new hardware, new hires, electricity, and the need to pay for the development, setup, configuration, and maintenance of new software.
If you decide on a cloud-based big data solution, you’ll still need to hire staff and pay for cloud services, big data solution development, as well as setup and maintenance of the needed frameworks.

4) COMPLEXITY OF MANAGING DATA QUALITY
Sooner or later, you’ll run into the problem of data integration, since the data you need to analyze comes from diverse sources in a variety of different formats.
For instance, ecommerce companies need to analyze data from website logs, call centers, competitors’ website ‘scans’, and social media.
Unreliable data
There is a whole bunch of techniques dedicated to cleansing data. But first things first. Your big data needs to have a proper model. Only after creating that can you go ahead and analyze it.
But keep in mind that big data is never 100% accurate. You have to know it and deal with it.

5) DANGEROUS BIG DATA SECURITY HOLES
Quite often, big data adoption projects put security off till later stages, but this is not a smart move.
IT staff hope that security will be handled at the application level, but what can happen is that big data security gets cast aside.
Solution:
The precaution against your possible big data security challenges is putting security first.
It is particularly important at the stage of designing your solution’s architecture.
If you don’t get along with big data security from the very start, it’ll bite you when you least expect it.

6) TRICKY PROCESS OF CONVERTING BIG DATA INTO VALUABLE INSIGHTS
On Instagram, a certain soccer player posts his new look, and the two characteristic things he’s wearing are white Nike sneakers and a beige cap.
He looks good in them, and people who see that want to look this way too. Thus, they rush to buy a similar pair of sneakers and a similar cap.
But in your store, you have only the sneakers. As a result, you lose revenue and maybe some loyal customers.
Solution:
The reason that you failed to have the needed items in stock is that your big data tool doesn’t analyze data from social networks or competitors’ web stores.
Your rival’s big data tool, among other things, does note trends in social media in near-real time. And their shop has both items and even offers a 15% discount if you buy both.
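The sneakers-and-cap scenario can be reduced to a toy check: count how often tracked items are mentioned in recent posts and compare that with what is in stock. The sketch below uses made-up posts and inventory numbers. A real tool would read from social network feeds in near-real time, which is not shown here.

```python
from collections import Counter

# Hypothetical inputs: recent social media posts and the store's current stock.
posts = [
    "love those white sneakers",
    "where can I buy that beige cap?",
    "white sneakers and a beige cap, great look",
]
tracked_items = ["white sneakers", "beige cap"]
in_stock = {"white sneakers": 120, "beige cap": 0}

# Count how many posts mention each tracked item.
mentions = Counter(
    item for post in posts for item in tracked_items if item in post.lower()
)

# Items that people are talking about but that cannot be sold right now.
missed = [item for item, n in mentions.items() if n > 0 and in_stock.get(item, 0) == 0]
print(mentions, missed)
```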

7) TROUBLES OF UPSCALING
The most typical feature of big data is its dramatic ability to grow. And one of the most serious challenges of big data is associated exactly with this.
Your solution’s design may be thought through and adjusted to upscaling with no extra effort. But the real problem isn’t the actual process of introducing new processing and storage capacity. It lies in the complexity of scaling up so that your system’s performance doesn’t decline and you stay within budget.
Solution:
The first and foremost precaution for challenges like this is a decent architecture for your big data solution.
As long as your big data solution can boast such a thing, fewer problems are likely to occur later.
Another highly important thing to do is to design your big data algorithms while keeping future upscaling in mind.

INTERESTING EXAMPLES OF USING BIG DATA

Mt. Sinai Hospital in NYC created a computer-based project they call Deep Patient.
They fed in the medical records of 700,000 people with 500 data points per patient and let the machine iterate on the data.
The machine was given no information about how the human body works or how diseases affect us.
It found correlations that let it predict the onset of some diseases more accurately than ever, and some diseases, such as schizophrenia, for the first time at all.
It does this by creating a vast network of weighted connections that is just too complex for us to understand.

MOST POPULAR WEBSITES 1996 - 2019
https://www.youtube.com/watch?v=2Uj1A9AguFs

https://www.youtube.com/watch?v=a3w8I8boc I

MOST POPULAR MUSIC STYLES 1910 - 2019
https://www.youtube.com/watch?v=eP88FUL7d 8

WINDYTY’S GLOBAL WEATHER VISUALIZATION
Extremely simple and elegant, Windyty animates wind, temperature, clouds/rain, waves, snow, and air pressure patterns across the globe, drawing on data from the Global Forecast System’s weather model.
Users can drag and zoom to their location, and can play an animated projection of forecasted weather for two weeks.
A snippet showing a two-day period is shown.

HANS ROSLING’S WEALTH AND HEALTH OF NATIONS

THE END
