November 2013 ALSO INSIDE - Telliant


Dr. Dobb’s Journal
November 2013

Really Big Data
Understanding What Big Data Can Deliver

ALSO INSIDE
Do All Roads Lead Back to SQL?
Applying the Lambda Architecture
From the Vault: Easy Real-Time Big Data Analysis Using Storm

www.drdobbs.com

Dr. Dobb’s Journal
CONTENTS
November 2013

COVER ARTICLE

8 Understanding What Big Data Can Deliver
By Aaron Kimball
It’s easy to err by pushing data to fit a projected model. Insights come, however, from accepting the data’s ability to depict what is going on, without imposing an a priori bias.

GUEST EDITORIAL

3 Do All Roads Lead Back to SQL?
By Seth Proctor
After distancing themselves from SQL, NoSQL products are moving towards transactional models as “NewSQL” gains popularity. What happened?

FEATURES

15 Applying the Big Data Lambda Architecture
By Michael Hausenblas
A look inside a Hadoop-based project that matches connections in social media by leveraging the highly scalable lambda architecture.

23 From the Vault: Easy Real-Time Big Data Analysis Using Storm
By Shruthi Kumar and Siddharth Patankar
If you're looking to handle big data and don't want to traverse the Hadoop universe, you might well find that using Storm is a simple and elegant solution.

6 News Briefs
By Adrian Bridgwater
Recent news on tools, platforms, frameworks, and the state of the software development world.

7 Open-Source Dashboard
A compilation of trending open-source projects.

34 Links
Snapshots of interesting items on drdobbs.com including a look at the first steps to implementing Continuous Delivery and developing Android apps with Scala and Scaloid.

More on DrDobbs.com

Jolt Awards: The Best Books
Five notable books every serious programmer should read.
http://www.drdobbs.com/240162065

A Massively Parallel Stack for Data Allocation
Dynamic parallelism is an important evolutionary step in the CUDA software development platform. With it, developers can perform variable amounts of work based on divide-and-conquer algorithms and in-memory data structures such as trees and graphs, entirely on the GPU without host …

…duction to Programming with Lists
What it’s like to program with immutable lists.
http://www.drdobbs.com/240162440

Who Are Software Developers?
Ten years of surveys show an influx of younger developers, more women, and personality profiles at odds with traditional stereotypes.
http://www.drdobbs.com/240162014

Java and IoT In Motion
Eric Bruno was involved in the construction of the Internet of Things (IoT) concept project called “IoT In Motion.” He helped build some of the back-end components including a RESTful service written in Java with some database queries, and helped a bit with the front-end as well.
http://www.drdobbs.com/240162189

[GUEST EDITORIAL]

IN THIS ISSUE: Guest Editorial, News, Open-Source Dashboard, What Big Data Can Deliver, Lambda, Storm, Links, Table of Contents

Do All Roads Lead Back to SQL?

After distancing themselves from SQL, NoSQL products are moving towards transactional models as “NewSQL” gains popularity. What happened?

By Seth Proctor

Much has been made in the past several years about SQL versus NoSQL and which model is better suited to modern, scale-out deployments. Lost in many of these arguments is the raison d’être for SQL and the difference between model and implementation. As new architectures emerge, the question is why SQL endures and why there is such a renewed interest in it today.

Background

In 1970, Edgar Codd captured his thoughts on relational logic in a paper that laid out rules for structuring and querying data (http://is.gd/upAlYi). A decade later, the Structured Query Language (SQL) began to emerge. While not entirely faithful to Codd’s original rules, it provided relational capabilities through a mostly declarative language and helped solve the problem of how to manage growing quantities of data.

Over the next 30 years, SQL evolved into the canonical data-management language, thanks largely to the clarity and power of its underlying model and transactional guarantees. For much of that time, deployments were dominated by scale-up or “vertical” architectures, in which increased capacity comes from upgrading to bigger, individual systems. Unsurprisingly, this is also the design path that most SQL implementations followed.

The term “NoSQL” was coined in 1998 by a database that provided relational logic but eschewed SQL (http://is.gd/sxH0qy). It wasn’t until 2009 that this term took on its current, non-ACID meaning. By then, typical deployments had already shifted to scale-out or “horizontal” models. The perception was that SQL could not provide scale-out capability, and so new non-SQL programming models gained popularity.

Fast-forward to 2013 and, after a period of decline, SQL is regaining popularity in the form of NewSQL (http://is.gd/x0c5uu) implementations. Arguably, SQL never really lost popularity (the market is estimated at $30 billion and growing); it just went out of style. Either way, this new generation of systems is stepping back to examine the last 40 years and what they tell us about future design, applying the power of relational logic to the requirements of scale-out deployments.

Why SQL?

SQL evolved as a language because it solved concrete problems. The relational model was built on capturing the flow of real-world data. If a purchase is made, it relates to some customer and product. If a song is played, it relates to an artist, an album, a genre, and so on. By defining these relations, programmers know how to work with data, and the system knows how to optimize queries. Once these relations are defined, other uses of the data (audit, governance, etc.) are much easier.

Layered on top of this model are transactions. Transactions are boundaries guaranteeing the programmer a consistent view of the database, independent execution relative to other transactions, and clear behavior when two transactions try to make conflicting changes. That’s the A (atomicity), C (consistency), and I (isolation) in ACID. To say a transaction has committed means that these rules were met, and that any changes were made durable (the D in ACID). Either everything succeeds or nothing is changed.

Transactions were introduced as a simplification. They free developers from having to think about concurrent access, locking, or whether their changes are recorded. In this model, a multithreaded service can be programmed as if there were only a single thread. Such programming simplification is extremely useful on a single server. When scaling across a distributed environment, it becomes critical.

With these features in place, developers building on SQL were able to be more productive and focus on their applications. Of particular importance is consistency. Many NoSQL systems sacrifice consistency for scalability, putting the burden back on application developers. This tradeoff makes it easier to build a scale-out database, but typically leaves developers choosing between scale and transactional consistency.

Why Not SQL?

It’s natural to ask why SQL is seen as a mismatch for scale-out architectures, and there are a few key answers. The first is that traditional SQL implementations have trouble scaling horizontally. This has led to approaches like sharding, passive replication, and shared-disk clustering. The limitations (http://is.gd/SaoHcL) are functions of designing around direct disk interaction and limited main memory, however, and are not inherent in SQL.

A second issue is structure. Many NoSQL systems tout the benefit of having no (or a limited) schema. In practice, developers still need some contract with their data to be effective. It’s flexibility that’s needed: an easy and efficient way to change structure and types as an application evolves. The common perception is that SQL cannot provide this flexibility, but again, this is a function of implementation. When table structure is tied to on-disk representation, making changes to that structure is very expensive, whereas nothing in Codd’s logic makes adding or renaming a column expensive.

Finally, some argue that SQL itself is too complicated a language for today’s programmers. The arguments on both sides are somewhat subjective, but the reality is that SQL is a widely used language with a large community of programmers and a deep base of tools for tasks like authoring, backup, or analysis. Many NewSQL systems are layering simpler languages on top of full SQL support to help bridge the gap between NoSQL and SQL systems. Both have their utility and their uses in modern environments. To many developers, however, being able to reuse tools and experience in the context of a scale-out database means not having to compromise on scale versus consistency.

Where Are We Heading?

The last few years have seen renewed excitement around SQL. NewSQL systems have emerged that support transactional SQL, built on original architectures that address scale-out requirements. These systems are demonstrating that transactions and SQL can scale when built on the right design. Google, for instance, developed F1 (http://is.gd/Z3UDRU) because it viewed SQL as the right way to address concurrency, consistency, and durability requirements. F1 is specific to the Google infrastructure but is proof that SQL can scale and that the programming model still solves critical problems in today’s data centers.

Increasingly, NewSQL systems are showing scale, schema flexibility, and ease of use. Interestingly, many NoSQL and analytic systems are now putting limited transactional support or richer query languages into their roadmaps in a move to fill in the gaps around ACID and declarative programming. What that means for the evolution of these systems is yet to be seen, but clearly, the appeal of Codd’s model is as strong as ever 43 years later.

Seth Proctor serves as Chief Technology Officer of NuoDB Inc. and has more than 15 years of experience in the research, design, and implementation of scalable systems. His previous work includes contributions to the Java security framework, the Solaris operating system, and several open-source projects.
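The editorial's claim that a committed transaction means "either everything succeeds or nothing is changed" can be made concrete with a small sketch. The following uses Python's built-in sqlite3 module purely as an illustration; the accounts table, the transfer helper, and the sample balances are hypothetical and are not taken from the editorial or from any NewSQL product.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` as one atomic unit (hypothetical helper)."""
    # Used as a context manager, a sqlite3 connection commits if the
    # block finishes normally and rolls back if the block raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
# The CHECK constraint is the "contract with the data": no negative balances.
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)]
    )

transfer(conn, "alice", "bob", 30)
after_ok = balances(conn)

try:
    # The credit to bob succeeds, but the debit would leave alice negative.
    transfer(conn, "alice", "bob", 500)
except sqlite3.IntegrityError:
    pass
after_failed = balances(conn)

print(after_ok)
print(after_failed)
```

In the failed transfer, the first UPDATE has already run when the second one violates the constraint, yet the rollback leaves both balances exactly as they were. The application code never has to undo the partial change itself, which is the simplification the editorial describes.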

[NEWS]

News Briefs

By Adrian Bridgwater

Progress Pacific PaaS Is A Wider Developer’s PaaS

Progress has used its Progress Exchange 2013 exhibition and developer conference to announce new features in the Progress Pacific platform-as-a-service (PaaS) that allow more time and energy to be spent solving business problems with data-driven applications and less time worrying about technology and writing code. This is a case of cloud-centric, data-driven software application development supporting workflows that are engineered for Real Time Data (RTD) from disparate sources, other SaaS entities, sensors, and points within the Internet of Things. For developers, these workflows must be functional for mobile, on-premise, and hybrid apps where minimal coding is required, such that the programmer is isolated to a degree from the complexity of middleware, APIs, and drivers.
http://www.drdobbs.com/240162366

HBase Apps And The 20 Millisecond Factor

MapR Technologies has updated its M7 edition to improve HBase application performance with throughput that is 4-10x faster while eliminating latency spikes. HBase applications can now benefit from MapR’s platform to address one of the major issues for online applications: consistent read latencies in the “less than 20 millisecond” range, as they exist across varying workloads. Differentiated features here include architecture that persists table structure at the filesystem layer; no compactions (I/O storms) for HBase applications; workload-aware splits for HBase applications; direct writes to disk (vs. writing to an external filesystem); disk and network compression; and C implementation that does not suffer from garbage collection problems seen with Java applications.
http://www.drdobbs.com/240162218

Sauce Labs and Microsoft Whip Up BrowserSwarm

Sauce Labs and Microsoft have partnered to announce BrowserSwarm, a project to streamline JavaScript testing of Web and mobile apps and decrease the amount of time developers spend on debugging application errors. BrowserSwarm is a tool that automates testing of JavaScript across browsers and mobile devices. It connects directly to a development team’s code repository on GitHub. When the code gets updated, BrowserSwarm automatically executes a suite of tests using common unit testing frameworks against a wide array of browser and OS combinations. BrowserSwarm is powered on the backend by Sauce Labs and allows developers and QA engineers to automatically test Web and mobile apps across 150 browser/OS combinations, including iOS, Android, and Mac OS X.
http://www.drdobbs.com/240162298

New Java Module In SOASTA CloudTest

SOASTA has announced the latest release of CloudTest with a new Java module to enable developers and testers of Java applications to test any Java component as they work to “easily scale” it. Direct-to-database testing here supports Oracle, Microsoft SQL Server, and PostgreSQL databases, and this is important for end-to-end testing for enterprise developers. Also, additional in-memory processing enhancements make dashboard loading faster for in-test analytics. New CloudTest capabilities include Direct-to-Database testing. CloudTest users can now directly test the scalability of the most popular enterprise and open-source SQL databases from Oracle, Microsoft SQL Server, and …

[OPEN-SOURCE DASHBOARD]

TOP OPEN-SOURCE PROJECTS

Trending this month on GitHub:

jlukic/Semantic-UI
…ating a shared vocabulary for UI.

HubSpot/pace CSS
https://github.com/HubSpot/pace
Automatic Web page progress bar.

maroslaw/rainyday.js
…imulating raindrops falling on a window.

peachananr/onepage-scroll
Create an Apple-like one page scroller website (iPhone 5S website) with One Page Scroll plugin.

twbs/bootstrap JavaScript
https://github.com/twbs/bootstrap
Sleek, intuitive, and powerful front-end framework for faster and easier Web development.

mozilla/togetherjs JavaScript
https://github.com/mozilla/togetherjs
A service for your website that makes it surprisingly easy to collaborate in real-time.

daviferreira/medium-editor
Medium.com WYSIWYG editor clone.

alvarotrigo/fullPage.js
fullPage plugin by Alvaro Trigo. Create full-screen pages fast and simple.

angular/angular.js
…end HTML vocabulary for your applications.

Trending this month on SourceForge:

Notepad Plugin …
The plugin list for Notepad Plugin Manager with code for the plugin manager.

MinGW: Minimalist GNU for Windows
http://sourceforge.net/projects/mingw/
A native Windows port of the GNU Compiler Collection (GCC).

Apache …
An open-source office productivity software suite containing word processor, spreadsheet, presentation, graphics, formula editor, and database management applications.

YTD Android
http://sourceforge.net/projects/rahul/
Files Downloader is a free powerful utility that will help you to download your favorite videos from youtube. The application is …

…forge.net/projects/portableapps/
Popular portable software solution.

Media Player Classic: Home Cinema
http://sourceforge.net/projects/mpc-hc/
This project is based on the original Guliverkli project, and contains additional features and bug fixes (see complete list on the project’s website).

Anti-Spam SMTP Proxy Server
http://sourceforge.net/projects/assp/
The Anti-Spam SMTP Proxy (ASSP) Server project aims to create an open-source platform-independent SMTP Proxy server.

Ubuntuzilla: Mozilla Software …
An APT repository hosting the Mozilla builds of the latest official releases of Firefox, Thunderbird, and Seamonkey.

[WHAT BIG DATA CAN DELIVER]

Understanding What Big Data Can Deliver

It’s easy to err by pushing data to fit a projected model. Insights come, however, from accepting the data’s ability to depict what is going on, without imposing an a priori bias.

By Aaron Kimball

With all the hype and anti-hype surrounding Big Data, the data management practitioner is, in an ironic turn of events, inundated with information about Big Data. It is easy to get lost trying to figure out whether you have Big Data problems and, if so, how to solve them. It turns out the secret to taming your Big Data problems is in the detail data. This article explains how focusing on the details is the most important part of a successful Big Data project.

Big Data is not a new idea. Gartner coined the term a decade ago, describing Big Data as data that exhibits three attributes: Volume, Velocity, and Variety. Industry pundits have been trying to figure out what that means ever since. Some have even added more “Vs” to try to better explain why Big Data is something new and different from all the other data that came before it.

The cadence of commentary on Big Data has quickened to the extent that if you set up a Google News alert for “Big Data,” you will spend more of your day reading about Big Data than implementing a Big Data solution. What the analysts gloss over and the vendors attempt to simplify is that Big Data is primarily a function of digging into the details of the data you already have.

Gartner might have coined the term “Big Data,” but they did not invent the concept. Big Data was just rarer then than it is today. Many companies have been managing Big Data for ten years or more. These companies may not have had the efficiencies of scale that we benefit from currently, yet they were certainly paying attention to the details of their data and storing as much of it as they could afford.

A Brief History of Data Management

Data management has always been a balancing act between the volume of data and our capacity to store, process, and understand it.

The biggest achievement of the Online Analytical Processing (OLAP) era was to give users interactive access to data, which was summarized across multiple dimensions. OLAP systems spent a significant amount of time up front to pre-calculate a wide variety of aggregations over a data set that could not otherwise be queried interactively. The output was called a “cube” and was typically stored in memory, giving end users the ability to ask any question that had a pre-computed answer and get results in less than a second.
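The pre-computation idea behind an OLAP cube can be sketched in a few lines. This is only a toy illustration of the principle, not how any particular OLAP product is implemented; the build_cube and query functions, the dimension names, and the sample sales rows are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Pre-compute SUM(measure) for every combination of dimensions."""
    dimensions = sorted(dimensions)
    # Each subset of the dimensions is one "group by" the cube can answer,
    # including the empty subset, which holds the grand total.
    groupings = [
        combo
        for size in range(len(dimensions) + 1)
        for combo in combinations(dimensions, size)
    ]
    cube = defaultdict(int)
    for row in rows:
        for dims in groupings:
            key = (dims, tuple(row[d] for d in dims))
            cube[key] += row[measure]
    return dict(cube)


def query(cube, **filters):
    """Answer a question by lookup only; no scan of the detail rows."""
    dims = tuple(sorted(filters))
    key = (dims, tuple(filters[d] for d in dims))
    return cube.get(key, 0)


rows = [
    {"region": "east", "product": "widget", "year": 2012, "sales": 10},
    {"region": "east", "product": "gadget", "year": 2013, "sales": 5},
    {"region": "west", "product": "widget", "year": 2013, "sales": 7},
    {"region": "west", "product": "widget", "year": 2012, "sales": 3},
]
cube = build_cube(rows, ["region", "product", "year"], "sales")

total = query(cube)
east = query(cube, region="east")
widgets_2013 = query(cube, product="widget", year=2013)
print(total, east, widgets_2013)
```

All the work happens in build_cube, which touches every row once per grouping; with n dimensions there are 2^n groupings, which is why the up-front cost was significant. Each query is then a single dictionary lookup, but only questions that were anticipated when the cube was built have an answer.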

Big Data is exploding as we enter the era of plenty: high bandwidth, greater storage capacity, and many processor cores. New software, written after these systems became available
