Operations And Big Data: Hadoop, Hive And Scribe - O'Reilly


Operations and Big Data: Hadoop, Hive and Scribe
Zheng Shao (Weibo: @邵铮9)
12/7/2011, Velocity China 2011

Agenda
1. Operations: Challenges and Opportunities
2. Big Data Overview
3. Operations with Big Data
4. Big Data Details: Hadoop, Hive, Scribe
5. Conclusion

Operations: Challenges and Opportunities

The operations cycle (diagram): Measure → Understand → Improve → Monitor

Challenges
- Huge amount of data: sampling may not be good enough
- Distributed environment: log collection is hard
- Hardware failures are normal
- Distributed failures are hard to understand

Example 1: Cache miss and performance (Web → Memcache → MySQL)
- The Memcache layer has a bug that halves the cache hit rate
- The MySQL layer gets hit hard and MySQL performance degrades
- Web performance degrades as a result
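The cascade in this example can be made concrete with a back-of-the-envelope calculation. The numbers below are illustrative only (the talk gives no figures); the point is that halving the hit rate multiplies database load far more than 2x.

```python
# Illustrative model of the cache-miss cascade. The traffic and
# hit-rate numbers are made up for illustration.

def mysql_load(total_qps, cache_hit_rate):
    """Queries per second that fall through the cache to MySQL."""
    return total_qps * (1.0 - cache_hit_rate)

normal = mysql_load(100_000, 0.95)   # healthy cache: 5,000 qps on MySQL
buggy  = mysql_load(100_000, 0.475)  # hit rate halved: 52,500 qps on MySQL

print(buggy / normal)  # MySQL load grows 10.5x, not 2x
```

This is why a small cache bug shows up first as a MySQL problem: the miss traffic, not the hit rate, is what the database sees.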

Example 2: Map-Reduce Retries (one map task, four attempts)
- Attempt 1 hits a transient distributed file system issue and fails
- Attempt 2 hits a real hardware issue and fails
- Attempt 3 hits a transient application logic issue and fails
- Attempt 4, by chance, succeeds
- The whole process slows down
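The retry behavior in this example can be sketched as a simple loop. This is a toy model, not Hadoop's actual scheduler (which also blacklists nodes with repeated hardware failures); it just shows how each extra attempt adds latency to the whole job.

```python
# Toy model of map-task retries: attempts keep failing for different
# reasons until one succeeds, and every retry adds latency.

class TransientError(Exception):
    pass

def run_with_retries(task, max_attempts=4):
    """Retry a task up to max_attempts times; return which attempt
    finally succeeded along with its result."""
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt, task(attempt)
        except TransientError:
            continue  # try again, ideally on a different node
    raise RuntimeError("task failed after %d attempts" % max_attempts)

def flaky_task(attempt):
    # Mirrors the slide: attempts 1-3 fail, attempt 4 succeeds by chance.
    if attempt < 4:
        raise TransientError("attempt %d failed" % attempt)
    return "ok"

print(run_with_retries(flaky_task))  # (4, 'ok')
```

Because the scheduler cannot tell a transient failure from a real one, it pays the full retry cost either way, which is exactly why these mixed-cause failures are hard to understand from the outside.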

Example 3: RPC Hierarchy (RPC 0 calls RPC 1, 2, 3; RPC 1 fans out to 1A/1B, RPC 3 to 3A/3B)
- RPC 3A failed
- The whole RPC 0 failed because of that
- The blame fell on the owner of service 3, because the log in service 0 shows that

Example 4: Inconsistent results in RPC
- RPC 0 got results from both RPC 1 and RPC 2
- Both RPC 1 and RPC 2 succeeded
- But RPC 0 detects that the results are inconsistent, and fails
- We may not have logged any trace information for RPC 1 and RPC 2 to continue debugging

Opportunities
- Big Data technologies: distributed logging systems, distributed storage systems, distributed computing systems
- Deeper analysis: data mining and outlier detection, time-series analysis
(diagram: storage, model, and computing layers)

Big Data Overview: An example from Facebook

Big Data
- What is Big Data? Volume big enough that it is hard to manage with traditional technologies; value big enough that it should not be sampled or dropped
- Where is Big Data used? Product analysis, user behavior analysis, business intelligence
- Why use Big Data for operations? Reuse existing infrastructure.

Overall Architecture (diagram): PHP and Java Scribe clients feed Scribe-HDFS. Near-realtime processing (3 GB/sec): Scribe Policy → PTail → Puma → HBase. Batch processing: Copy/Load into Central HDFS, queried with Hive (6 GB/sec and 9 GB/sec are also labeled in the figure).

Operations with Big Data

logview
Features:
- PHP fatal stack traces
- Group stack traces by similarity, order by counts
- Integrated with SVN/Task/Oncall tools
- Low priority: Scribe can drop logview data
(pipeline: PHP Scribe client → Scribe mid-tier → LogView → HTTP)
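Grouping stack traces "by similarity" can be approximated by normalizing away the variable parts of each trace (line numbers, addresses) before counting. The slides do not describe logview's actual grouping algorithm, so this is a minimal sketch of the idea:

```python
import re
from collections import Counter

def normalize(trace):
    """Strip line numbers and hex addresses so the 'same' fatal from
    different requests collapses into one group."""
    trace = re.sub(r":\d+", ":N", trace)            # file.php:123 -> file.php:N
    trace = re.sub(r"0x[0-9a-f]+", "0xADDR", trace)  # pointers -> 0xADDR
    return trace

def group_traces(traces):
    """Return (normalized_trace, count) pairs, biggest groups first."""
    return Counter(normalize(t) for t in traces).most_common()

traces = [
    "Fatal at render.php:10 in widget()",
    "Fatal at render.php:99 in widget()",
    "Fatal at db.php:7 in query()",
]
print(group_traces(traces))
# [('Fatal at render.php:N in widget()', 2), ('Fatal at db.php:N in query()', 1)]
```

Ordering the groups by count, as logview does, immediately surfaces the most common fatal rather than the most recent one.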

logmonitor
Rules:
- Regular-expression based, e.g. ".*Missing Block.*"
- Rules have levels: WARN, ERROR, etc.
- Dynamic rules: rules can be modified and are propagated to clients
(diagram: logmonitor clients apply rules against PTail/local logs and report (rule name, count) to a stats server backed by rules storage; top rules are shown on the web)
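The regex rules with levels might be modeled as below. The rule shapes and names here are assumptions for illustration; the slide only shows the ".*Missing Block.*" pattern and the WARN/ERROR levels.

```python
import re
from collections import Counter

# Each rule: (name, level, compiled regex). Appending to this list at
# runtime is the "dynamic rules" idea from the slide.
RULES = [
    ("missing-block", "ERROR", re.compile(r".*Missing Block.*")),
    ("slow-rpc",      "WARN",  re.compile(r".*rpc took \d{4,}ms.*")),
]

def apply_rules(lines):
    """Count rule hits over a stream of log lines, as a stats server
    aggregating per-rule counts might."""
    hits = Counter()
    for line in lines:
        for name, level, rx in RULES:
            if rx.match(line):
                hits[(name, level)] += 1
    return hits

log = [
    "2011-12-07 ERROR Missing Block blk_123",
    "2011-12-07 INFO rpc took 5000ms",
]
print(apply_rules(log))
```

Keeping only (rule, count) pairs instead of raw lines is what lets a central stats server keep up with the full log volume.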

Self Monitoring
Goal:
- Set KPIs for SOA
- Isolate issues in distributed systems
- Make it easy for service owners to monitor
Approach:
- Log4J integration with Scribe
- JMX/Thrift/Fb303 counters
- Client-side logging
- Server-side counter query by the service owner

Global Debugging with PTail
- Logging instruction: logging levels, logging destination (log name)
- Additional fields: Request ID
(diagram: services 1-3 pass RPC logging instructions downstream; each service's log data is collected by Scribe and read by PTail)
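Propagating a request ID through RPCs so a PTail-style reader can stitch one request's logs together across services might look like the sketch below. All names here are hypothetical; the actual RPC plumbing is Facebook-internal.

```python
import uuid

LOG = []  # stand-in for a Scribe log category

def log(request_id, service, message):
    # Every line carries the request ID, the key idea of the slide.
    LOG.append({"request_id": request_id, "service": service, "msg": message})

def handle_request():
    rid = str(uuid.uuid4())
    log(rid, "service1", "start")
    call_service2(rid)          # the ID travels with the downstream RPC
    log(rid, "service1", "done")
    return rid

def call_service2(rid):
    log(rid, "service2", "queried db")  # downstream logs reuse the same ID

def ptail_filter(rid):
    """What a PTail-style grep for one request would return."""
    return [e for e in LOG if e["request_id"] == rid]

rid = handle_request()
print([e["msg"] for e in ptail_filter(rid)])  # ['start', 'queried db', 'done']
```

Without the shared ID, the service2 line is indistinguishable from thousands of others, which is exactly the debugging dead-end of Example 4 above.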

Hive Pipelines
- Daily and historical data analysis: What is the trend of a metric? When did this bug first happen?
Examples:
SELECT percentile(latency, "50,75,90,99") FROM latency_log;
SELECT request_id, GROUP_CONCAT(log_line) AS total_log
FROM trace
GROUP BY request_id
HAVING total_log LIKE "%FATAL%";

Big Data Details: Hadoop, Hive, Scribe

Key Requirements
- Ease of use: smooth learning curve, easy integration, structured/unstructured data, schema evolution
- Latency: real-time data, historical data
- Scalability: spiky traffic and QoS, raw data / drill-down support
- Reliability: low data loss, consistent computation

Overall Architecture (diagram, repeated): Scribe clients (PHP, Java) → Scribe-HDFS; near-realtime processing via PTail → Puma → HBase; batch processing via Copy/Load into Central HDFS with Hive.

Distributed Logging System - Scribe https://github.com/facebook/scribe

Distributed Logging System - Scribe (diagram): clients send LogData (category, message) to Scribe servers over Thrift RPC.

Scribe Improvements
- Network efficiency: per-RPC compression (using quicklz)
- Operation interface: category-based blacklisting and sampling
- Adaptive logging: use BufferStore and NullStore to drop messages as needed
- QoS: use separate hardware for now
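Category-based blacklisting and sampling amounts to a per-category keep-probability. The configuration shape below is an assumption for illustration; the slide does not show Scribe's actual config format.

```python
import random

# Assumed config: keep probability per category. 1.0 = log everything,
# 0.0 = blacklisted, in between = sampled. Unknown categories default
# to keeping everything.
SAMPLING = {"click_log": 0.01, "fatal": 1.0, "debug_spam": 0.0}

def should_log(category, rng=random.random):
    """Decide per message whether to forward or drop it."""
    return rng() < SAMPLING.get(category, 1.0)

# Deterministic edge cases: a blacklisted category never logs,
# a rate-1.0 category always logs.
assert not should_log("debug_spam")
assert should_log("fatal")
```

Making this decision at the client keeps the dropped volume off the network entirely, which is the point of putting sampling in the operation interface.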

Distributed Storage Systems - Scribe-HDFS
Architecture: Scribe clients → Calligraphus mid-tier → Calligraphus writers → HDFS, coordinated via ZooKeeper
Features:
- Scalability: 9 GB/sec
- No single point of failure (except the NameNode)
- Not open-sourced yet

Distributed Storage Systems - HDFS
Architecture:
- NameNode: namespace, block locations
- DataNodes: data blocks, replicated 3 times
Features:
- 3000 nodes, PBs of space
- Highly reliable
- No random writes
https://github.com/facebook/hadoop-20

HDFS Improvements
Efficiency:
- Random read keep-alive: HDFS-941
- Faster checksum: HDFS-2080
- Use fadvise: HADOOP-7714
Credits: -presentationslides-hadoop-and-performance

Distributed Storage Systems - HBase
Architecture:
- (row, column family, column, value) data model
- Write-ahead log; records are sorted in memory/files
- Master and RegionServers
Features:
- 100 nodes; random read/write; great write performance

Distributed Computing Systems – MR
Architecture: JobTracker, TaskTrackers, MR client
Features:
- Pushes computation to the data
- Reliable: automatic retry
- Not easy to use

MR Improvements
Efficiency:
- Faster compareBytes: HADOOP-7761
- MR sort cache locality: MAPREDUCE-3235
- Shuffle: MAPREDUCE-64, MAPREDUCE-318
Credits: -presentationslides-hadoop-and-performance

Distributed Computing Systems – Hive
Architecture: MetaStore, Compiler, Execution (Hive command line → Compiler → MR client → Map-Reduce on TaskTrackers)
Features:
- SQL on Map-Reduce: SELECT, GROUP BY, JOIN
- UDF, UDAF, UDTF, scripts

Useful Features in Hive
- Complex column types: Array, Struct, Map, Union
  CREATE TABLE t (a STRUCT<c1:MAP<STRING,STRING>, c2:ARRAY<STRING>>);
- UDFs: UDF, UDAF, UDTF
- Efficient joins: bucketed map join (HIVE-917)

Distributed Computing Systems – Puma
Architecture: HDFS → PTail → Puma → HBase
Features:
- StreamSQL: SELECT, GROUP BY, JOIN; UDF, UDAF
- Reliable: no data loss or duplication
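A streaming GROUP BY of the kind Puma runs over PTail output can be sketched as below. This is a toy version under assumed semantics; Puma's StreamSQL engine itself is not public.

```python
from collections import defaultdict

# Toy stream aggregator: consume (key, value) events and keep per-key
# sums, flushing a window the way Puma periodically writes to HBase.

class StreamGroupBy:
    def __init__(self):
        self.counts = defaultdict(int)

    def consume(self, key, value=1):
        self.counts[key] += value

    def flush(self):
        """Emit the current window's aggregates and start a new window."""
        window, self.counts = dict(self.counts), defaultdict(int)
        return window

agg = StreamGroupBy()
for url in ["/home", "/home", "/profile"]:
    agg.consume(url)
print(agg.flush())  # {'/home': 2, '/profile': 1}
```

The "no data loss/duplicate" guarantee on the slide is the hard part the toy skips: it requires checkpointing the PTail read position together with the flushed aggregates.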

Conclusion: Big Data can help operations

Big Data can help Operations. 5 steps to make it effective:
1. Make Big Data easy to use
2. Log more data and keep more samples whenever needed
3. Build debugging infrastructure on top of Big Data
4. Do both real-time and historical analysis
5. Continue to improve Big Data

(c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved. 1.0

