IBM’s InfoSphere BigInsights: Smart Analytics For Big Data


IBM’s InfoSphere BigInsights: Smart Analytics for Big Data
Claus Samuelsen
csa@dk.ibm.com
November 7, 2011
© 2011 IBM Corporation

IBM Disclaimer
Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Agenda
The “Big Data” challenge: smarter analytics for a smarter planet
IBM’s approach
– The big picture
– Details on BigInsights
– How BigInsights fits in your software stack (with data warehouses, DBMSs, streams, etc.)
How IBM can help you get off to a quick start

The “Big Data” Challenge

Information is at the Center of a New Wave of Opportunity
44x as much data and content over the coming decade: from 800,000 petabytes in 2009 to 35 zettabytes in 2020
80% of the world’s data is unstructured
And organizations need deeper insights:
– 1 in 3 business leaders frequently make decisions based on information they don’t trust, or don’t have
– 1 in 2 business leaders say they don’t have access to the information they need to do their jobs
– 83% of CIOs cited “business intelligence and analytics” as part of their visionary plans to enhance competitiveness
– 60% of CEOs need to do a better job capturing and understanding information rapidly in order to make swift business decisions

Example: The Perception Gap Surrounding Social Media
IBM 2010 CEO Study: 88 percent of CEOs said “getting closer to customers” was the top priority over the next 5 years and viewed social media as a core part of that strategy
However, a March 2011 IBM study identified that companies fail to understand what customers want from social advertising and outreach
Social media and social networking will increase customer advocacy?
Sources:
– “Capitalizing on Complexity: Insights from the Global Chief Executive Officer Study,” IBM Institute for Business Value, 2010
– “What Customers Want,” first in a two-part series, IBM Institute for Business Value, published March 2011

Big Data Presents Big Opportunities
Extract insight from a high volume, variety and velocity of data in a timely and cost-effective manner
– Variety: Manage and benefit from diverse data types and data structures
– Velocity: Analyze streaming data and large volumes of persistent data
– Volume: Scale from terabytes to zettabytes

Information Management
Vestas (European Energy Company)
Business challenge
– Analyze large volumes of public and private weather data for alternative energy business
– Existing high-performance computing hardware, limited staff
Project objectives
– Leverage large volume (2 PB) of weather data to optimize placement of turbines
– Reduce modeling time from weeks to hours
– Optimize ongoing operations
Solution components: IBM InfoSphere BigInsights Enterprise Edition
– Scalability (data volumes)
– Jaql (query support and extensibility)
– IBM-provided file system (support existing hardware & apps)
– Strong runtime performance
The benefits
– Reliability, security, scalability, and integration needs fulfilled
– Standard enterprise software support
– IBM xSeries hardware
– Single-vendor solution for software, hardware, storage, nsights: videos/interviews

Information Management
Global Technology Firm
Business challenge
– Analyze & correlate log records across to improve service
– Detect & predict failure patterns; initiate automated or manual preventive actions
Project objectives
– Process variety of logs generated by multiple systems, devices in distinct formats (XML, text, )
– Accommodate large data volumes growing at 1 TB/day
– Parse logs, identify/extract entities of interest, index as needed, cluster data by sessions, detect & visualize patterns through GUI
– Report on Top X, Bottom X patterns; support exploratory queries
Solution components
– IBM InfoSphere BigInsights Enterprise Edition including: spreadsheet data discovery and visualization; text analytics runtime and tooling; flexible query support; scalability
– IBM InfoSphere Streams
The benefits
– IBM analytics and tooling simplify development and speed time-to-value
– “You have done in 2 weeks what I have been trying to generalize for the past 6 months.” -- Customer project leader

Information Management
Global Media Firm
Business challenge
– Identify unauthorized content streaming (piracy)
– Quantify annual revenue loss, analyze trends
– Monitor social media sites (e.g., Twitter, Facebook) to identify dissemination of pirated content. Time sensitive!
Project objectives
– Analyze high variety of data. Volumes unclear. Start with social media data for 1 year.
– Use text analytics to qualify & classify info of interest (complex, custom set of rules) and to search for URLs with live streaming of target data, sentiment, .
– Future potential for video analysis
Solution components: IBM InfoSphere BigInsights Enterprise Edition including
– Text analytics runtime and tooling
– Custom text annotators
– Flexible query support
– Scalability
The benefits
– Improved understanding of business exposures through advanced analytics
– Improved decision-making process
– Scalable, flexible infrastructure for handling future analytic needs

Customer Engagements
Use patterns
– Customer sentiment analysis (cross-sell, up-sell, campaign management)
– Integrated retail and web customer behavior modeling
– Predictive modeling (credit card fraud)
– System log analytics (reduce operational risk)
Common requirements
– Extract business insight from large volumes of raw data (often outside operational systems)
– Integrate with other existing software
– Ready for enterprise use
[Diagram labels: Consumer Insight; Text, Blog, Weblog; Click streams; Log & transactions; Biological Sequences; Operational system & streams data sources; Multi-channel sales; Next Gen Fraud Models; New Business Development; Text Analytics; Statistical Model Building]

IBM’s approach

Big Data: an integral part of an enterprise data platform
– Manage Big Data from the instant it enters the enterprise
– High fidelity: no changes to original format
– Available for new uses, analyses, and l
[Diagram labels: Data Store; Big Data Applications; Warehouse; Big Data Platform; IBM Big Data Solutions; Client and Partner Solutions; Big Data User Environment (Developers, End Users, Admin.); Traditional data sources (ERP, CRM, databases, etc.); Source data (Web, sensors, logs, media, etc.)]

IBM’s Platform Addresses Key Requirements
1. Platform for V3 (Variety, Velocity, Volume)
– Variety: manage data & content “As Is”
– Handle any velocity: low-latency streams and large volume batch
– Volume: huge volumes of at-rest or streaming data
2. Analytics for V3
– Analyze sources in their native format: text, data, rich content
– Analyze all of the data, not just a subset
– Dynamic analytics: automatic adjustments and actions
3. Ease of Use for Developers and Users
– Developer UIs, common languages & automatic optimization
– End-user UIs & visualization
4. Enterprise Class
– Failure tolerance, security and privacy
– Scale economically
5. Extensive Integration Capabilities
– Integrate wide variety of sources
– Leverage enterprise integration technologies

Platform Vision
[Diagram labels: IBM Big Data Solutions; Client and Partner Solutions; Big Data Accelerators (Text, Geospatial, Time Series, Acoustic, Mathematical, Blue Prints); Big Data Enterprise Engines (InfoSphere Streams, InfoSphere BigInsights); Productivity Tools & Optimization (Workload Management, Activity Monitor, Identity & Access Mgmt, Data Protection); Integration (Information Server); Rules / BPM (iLog & Lombardi); Data Warehouse (InfoSphere Warehouse, Warehouse Appliances, IBM & non-IBM); Master Data Mgmt (InfoSphere MDM); Database (DB2 & non-IBM); Content Analytics; ECM; Business Analytics (Cognos & SPSS); Marketing (Unica); Data Growth Management (InfoSphere Optim)]

BigInsights Summary
BigInsights: analytical platform for persistent “Big Data”
– Based on open source & IBM technologies
– Managed like a start-up . . . . Emphasis on deep customer engagements, product plan flexibility
Distinguishing characteristics
– Built-in analytics . . . . Enhances business knowledge
– Enterprise software integration . . . . Complements and extends existing capabilities
– Production-ready platform . . . . Speeds time-to-value; simplifies development and maintenance
IBM advantage
– Combination of software, hardware, services and advanced research

InfoSphere BigInsights
Platform for volume, variety, velocity -- V3
– Enhanced Hadoop foundation
Analytics for V3
– Text analytics & tooling
Usability
– Web console
– Integrated install
– Spreadsheet-style tool
– Ready-made “apps”
Enterprise Class
– Storage, security, cluster management
Integration
– Connectivity to DB2, Netezza
[Diagram: editions plotted by breadth of capabilities and enterprise class, from Apache Hadoop through Basic Edition (free download) to Enterprise Edition (licensed). Capability labels: business process accelerators (“Apps”); text analytics; spreadsheet-style analysis tool; RDBMS, warehouse connectivity; integrated Web-based console; flexible job scheduler; performance enhancements; Eclipse-based tooling; integrated install; LDAP authentication; online InfoCenter; BigData Univ.]

BigInsights Content
Function | Version | Basic Edition | Enterprise Edition
Hadoop (including common utilities, HDFS, MapReduce framework) | 0.20.2 | Inc | Inc
Jaql (programming / query language) | 0.5.2 | Inc | Inc
Pig (programming / query language) | 0.7 | Inc | Inc
Flume (data collection/aggregation) | 0.9.1 | Inc | Inc
Hive (data summarization/querying) | 0.5 | Inc | Inc
Lucene (text search)* | 3.1.0 | Inc | Inc
Zookeeper (process coordination) | 3.2.2 | Inc | Inc
Avro (data serialization)* | 1.5.1 | Inc | Inc
HBase (real time read/write) | 0.20.6 | Inc | Inc
Oozie (workflow / job orchestration) | 2.2.2 | Inc | Inc
Online documentation | | Inc | Inc
Capability to integrate with JDBC sources through general-purpose Jaql module* | | Inc | Inc
Capability to integrate with DB2, InfoSphere Warehouse (DB2 UDF samples to submit jobs, and read results from BigInsights) | | Inc | Inc
*New or upgraded in 1.2

BigInsights Content (continued)
Function | Basic Edition | Enterprise Edition
Capability to integrate with R (Jaql module to invoke R statistical capabilities from BigInsights) | n/a | Inc
Capability to integrate with Netezza, DB2 LUW with DPF from Jaql | n/a | Inc
LDAP authentication | n/a | Inc
Integrated Web Console* | n/a | Inc
Integrated workflow capabilities | n/a | Inc
Integrated flexible scheduler | n/a | Inc
Platform performance enhancements (Adaptive MapReduce, efficient processing of compressed files)* | n/a | Inc
Text analytics capability | n/a | Inc
Eclipse support for text analytic development, Jaql, Hive, Java* | n/a | Inc
Spreadsheet-like analytical tool (BigSheets)* | n/a | Inc
IBM Optim Development Studio V2.2.1.0 | n/a | Inc
*New or upgraded

Announcing BigInsights V1.3
Enhanced Web Console
– Administration tools: view cluster health; manage cluster access; manage/install cluster instances
– Tools for big data. Web tools to: run big data applications; view progress; graph results; integrate with BigSheets; manage and schedule workflows, jobs, tasks, and files
Greater Efficiency
– Adaptive MapReduce: improve performance for small jobs (without altering how jobs are created)
– Compression: decrease disk space & storage infrastructure requirements
Better Manageability
– Development tools for: text analytics; Java MapReduce development; cluster file browsing; job submission; Jaql and Hive development; developing and publishing applications to the web console
– Web: secure online REST access to cluster to automatically leverage applications and access data
– Web applications for: securely importing and exporting data with relational databases; importing and exporting files to the cluster; importing data from web crawlers and social media

BigInsights: Value Beyond Open Source
Technical differentiators
– Built-in analytics: text processing engine, annotators, Eclipse tooling; interface to project R (statistical platform)
– Enterprise software integration (DBMS, warehouse)
– Simplified programming / query interface (Jaql)
– Integrated installation of supported open source and IBM components
– Web-based management console
– Platform enrichment: additional security, job scheduling options, performance features, . . .
– Standard IBM licensing agreement and world-class support
– More to come in future releases!
Business benefits
– Quicker time-to-value due to IBM technology and support
– Reduced operational risk
– Enhanced business knowledge with flexible analytical platform
– Leverages and complements existing software assets

BigInsights and the data warehouse
[Diagram labels: Big Data analytic applications; Traditional analytic tools; BigInsights; Data Warehouse]
Query-ready archive for “cold” warehouse data

Growing Ecosystem of Solutions
IBM BigInsights Solutions
– Cognos Consumer Insights: social media analytics solution that uses BigInsights. Available now.
– IBM Content Analytics: unlock valuable business insight from unstructured data. Proof of technology completed. Production offering due soon.
Partner Solutions
. . . with more to come
[Diagram labels: IBM Big Data User Environments; IBM Big Data Platform]

A Closer Look at BigInsights . . . .

About the BigInsights Platform
Flexible, enterprise-class support for processing large volumes of data
– Based on Google’s MapReduce technology
– Inspired by Apache Hadoop; compatible with its ecosystem and distribution
– Well-suited to batch-oriented, read-intensive applications
– Supports wide variety of data
Enables applications to work with thousands of nodes and petabytes of data in a highly parallel, cost-effective manner
– CPU + disks = “node”
– Nodes can be combined into clusters
– New nodes can be added as needed without changing data formats, how data is loaded, or how jobs are written

The MapReduce Programming Model
"Map" step:
– Input is split into pieces
– Worker nodes process individual pieces in parallel (under global control of the Job Tracker node)
– Each worker node stores its result in its local file system, where a reducer is able to access it
"Reduce" step:
– Data is aggregated (“reduced” from the map steps) by worker nodes (under control of the Job Tracker)
– Multiple reduce tasks can parallelize the aggregation
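The two steps above can be sketched in a few lines of single-process Python. This is only an illustration of the programming model, not Hadoop or BigInsights code: `run_map_reduce`, the byte-sum partitioner, and the station/temperature data are all hypothetical names and values chosen for the example, and keys are assumed to be strings.

```python
from collections import defaultdict

def run_map_reduce(splits, mapper, reducer, num_reducers=2):
    # Map step: each input split is processed independently. On a real
    # cluster each split would go to a different worker node.
    map_outputs = [list(mapper(split)) for split in splits]

    # Shuffle: route every intermediate (key, value) pair to one reduce
    # task, so that all values for a key end up in the same place.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for output in map_outputs:
        for key, value in output:
            index = sum(key.encode("utf-8")) % num_reducers  # toy partitioner
            partitions[index][key].append(value)

    # Reduce step: each reduce task aggregates the keys it owns.
    results = {}
    for partition in partitions:
        for key, values in partition.items():
            results[key] = reducer(key, values)
    return results

def max_reading_mapper(split):
    # Hypothetical input lines of the form "station,reading".
    for line in split:
        station, reading = line.split(",")
        yield station, int(reading)

def max_reading_reducer(station, readings):
    return max(readings)

splits = [["a,10", "b,7"], ["a,12", "b,3"]]
print(run_map_reduce(splits, max_reading_mapper, max_reading_reducer))
```

Because the mapper only sees one split and the reducer only sees one key, both can run in parallel without coordinating with each other; the shuffle is the only place where data crosses task boundaries.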

Logical MapReduce Example: Word Count

map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");

reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));

Content of input documents:
    Hello World Bye World
    Hello IBM
Map 1 emits: (Hello, 1) (World, 1) (Bye, 1) (World, 1)
Map 2 emits: (Hello, 1) (IBM, 1)
Reduce (final output): (Bye, 1) (IBM, 1) (Hello, 2) (World, 2)

MapReduce Processing
Input documents:
    Hello World Bye World
    Hello IBM
Map 1 emits: (Hello, 1) (World, 1) (Bye, 1) (World, 1)
Map 2 emits: (Hello, 1) (IBM, 1)
Reduce (final output): (Bye, 1) (IBM, 1) (Hello, 2) (World, 2)
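The word-count flow can be reproduced with a short, runnable Python sketch. It runs in one process and only mirrors the logic of the map and reduce functions; the document names `doc1` and `doc2` are made up for the example.

```python
from collections import defaultdict

def map_words(document_name, contents):
    # Emit (word, 1) for every word in the document.
    for word in contents.split():
        yield word, 1

def reduce_counts(word, counts):
    # Sum all the counts emitted for one word.
    return word, sum(counts)

def word_count(documents):
    grouped = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_words(name, contents):
            grouped[word].append(count)  # group intermediate values by key
    return dict(reduce_counts(word, counts)
                for word, counts in sorted(grouped.items()))

documents = {"doc1": "Hello World Bye World", "doc2": "Hello IBM"}
print(word_count(documents))
# {'Bye': 1, 'Hello': 2, 'IBM': 1, 'World': 2}
```

The totals match the reduce output on the slide; only the ordering differs, because this sketch sorts keys alphabetically.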

So What Does This Result In?
– Easy to scale
– Fault tolerant and self-healing
– Data agnostic
– Extremely flexible

Web-based Installation, Management Consoles
Integrated installation
– Seamless process for single node and cluster environments
– Post-install validation of IBM and open source components
Integrated management console
– System health management
– Add / drop nodes
– Start / stop services
– Run / monitor jobs (applications)
– Explore / modify file system

BigInsights and Text Analytics
Distill structured info from unstructured data
– Sentiment analysis
– Consumer behavior
– Illegal or suspicious activities
Pre-built library of text annotators for common business entities
Rich language and tooling to build custom annotators
Support for Western languages (English, Dutch/Flemish, French, German, Italian, Portuguese, or Spanish) plus select Asian languages (Japanese,
Annotator names shown on the slide: son" "PhoneNumber" "StateOrProvince" "URL" "ZipCode"
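To make the idea of an annotator concrete, here is a minimal Python sketch that pulls a few structured entities out of free text with regular expressions. It is not the BigInsights text analytics engine or its annotator language; the patterns, the `annotate` function, and the sample sentence are all assumptions made for illustration, and the phone and ZIP patterns only cover simple US-style formats.

```python
import re

# Hypothetical, deliberately simple patterns keyed by annotator name.
ANNOTATORS = {
    "PhoneNumber": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "URL": re.compile(r"https?://\S+"),
    "ZipCode": re.compile(r"\b\d{5}(?:-\d{4})?\b"),
}

def annotate(text):
    # Return one record per match, ordered by position in the text.
    found = []
    for name, pattern in ANNOTATORS.items():
        for match in pattern.finditer(text):
            found.append({"type": name, "text": match.group(), "start": match.start()})
    return sorted(found, key=lambda item: item["start"])

sample = "Call 555-123-4567 or visit http://example.com/info near 95141"
for annotation in annotate(sample):
    print(annotation["type"], annotation["text"])
```

A production annotator library would also need to handle overlapping matches, context (for example, telling a ZIP code from any other five-digit number), and multiple languages, which is where dedicated tooling earns its keep.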

BigInsights Text Analytics Development

Example Analysis: Extraction from Twitter messages
Extract intent, interests, life events and micro segmentation attributes
– Monetizable intent
While accounting for less relevant messages
– Subtle spam, advertising
– Sarcasm, wishful thinking
[Figure: sample tweets annotated by category. Legible fragments include “moving to miami in 3 months.”, “Before I die, I want a versace purple diamond tiara.” and “wrap around porch . . wading river on the long island sound, haha i wish!”]

Spreadsheet-like Analysis Tool
Web-based analysis and visualization tool
Spreadsheet-like interface
– Define and manage long running data collection jobs
– Analyze content of the text on the pages that have been retrieved

Business Process Accelerators (“apps”)
Reusable software assets based on customer engagements
– Useful as a starting point for various applications
– Can be customized by BigInsights application developers as needed
– Accessible through Eclipse, Web console
Available assets
– Data import/export (from relational DBMS, files)
– Web crawler
– Boardreader.com support (Web forum search engine)

Performance enhancements
Flexible job scheduler option
– Optimize response time for small jobs
– Available in addition to FAIR, FIFO scheduling
Adaptive MapReduce
– Speeds up a class of jobs (e.g., jobs that process small files)
– Accomplished by changing how certain MapReduce tasks are executed: mappers can decide at runtime to take on more work (until it doesn’t make sense anymore). Communication via ZooKeeper.
– Enabled through Jaql option, MapReduce job property setting
Efficient processing of compressed text data
– Use multiple Map tasks (vs Hadoop default of 1) for processing compressed text files
– Enabled through BigInsights LZO-based compression technology
– Automatic with Jaql; programming option with Java MapReduce
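One way to picture the Adaptive MapReduce idea is a mapper that keeps claiming small splits from a shared pool instead of being started once per split. The Python sketch below is a single-process simulation under loud assumptions: the `deque` stands in for whatever shared coordination the product uses (the slide mentions ZooKeeper), and the per-mapper budget `max_splits_per_mapper` is a made-up stopping rule. It is not IBM's implementation.

```python
from collections import deque

def adaptive_mapper(shared_splits, process, max_splits_per_mapper):
    # One mapper task: keep claiming splits while work remains and the
    # (hypothetical) budget says it still makes sense to continue.
    results = []
    taken = 0
    while shared_splits and taken < max_splits_per_mapper:
        split = shared_splits.popleft()  # claim the next split from the pool
        results.extend(process(split))
        taken += 1
    return results, taken

def run_job(splits, process, max_splits_per_mapper=4):
    # Launch mappers one after another until the pool is empty and report
    # how many splits each mapper handled.
    shared = deque(splits)
    splits_per_mapper = []
    while shared:
        _, taken = adaptive_mapper(shared, process, max_splits_per_mapper)
        splits_per_mapper.append(taken)
    return splits_per_mapper

def tokenize(split):
    return [word for line in split for word in line.split()]

small_files = [["a b"], ["c"], ["d e f"], ["g"], ["h i"],
               ["j"], ["k"], ["l m"], ["n"], ["o"]]
print(run_job(small_files, tokenize))  # [4, 4, 2]
```

In this toy run, ten small splits are handled by three mapper launches rather than ten, which is the kind of per-task startup saving that would matter for jobs made of many small files.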

BigInsights Connectivity to DBMS / Warehouse
– BigInsights drives RDBMS work: Jaql read/write
– DB2 drives BigInsights work: sample UDFs to submit BigInsights jobs, consume results

InfoSphere BigInsights Roadmap (as of October 2011)
V1.1 – May 2011
– Integrated install
– Apache Hadoop and associated ecosystem components
– DB2 integration w/ Jaql & SQL
– Netezza connector
– Integrated Text Analytics engine
– LDAP authentication
– Web Console for administration
– Job scheduler
– Jaql query language
– R integration for statistical computing
– Optim Development Studio
V1.1.0.1 – June 2011 and V1.1.2 – July 2011
– IBM BigSheets for data exploration and analysis without MapReduce programming
– Text analytics tools for improved usability and accelerated time to value
V1.2 – August 2011
– Further enhancements for text analytics tools
– Generic JDBC connector for Jaql
– Installer enhancements
V1.3 – Nov 2011
– Dev tools for Java, Hive and Jaql
– Web admin tools for cluster mgmt
– Integration of BigSheets with web console
– Adaptive MapReduce and compression for greater speed and efficiency
– Tools for data import & export
Future
– Additional analytical toolkits including predictive analytics and machine learning
– Enhancements to developer and admin interface
– Further integration with Information Management portfolio
– Performance and reliability enhancements
– Innovation through Research partnerships

Trends and directions
Enterprise software integration
– Data warehouses, RDBMSs (IBM and non-IBM)
– ETL platforms (e.g., DataStage)
– Business intelligence tools (e.g., Cognos, SPSS)
– Applications (e.g., Coremetrics, IBM partners)
Diverse range of analytics
– Text
– Image / video (e.g., content-based user profiling)
– Predictive modeling (e.g., ranking and classification based on machine learning)
Sophisticated, scalable infrastructure and tooling for processing massive data volumes
– High-performance file system with POSIX compliance, granular security
– Fully recoverable and restartable workflows
– Parallel, distributed indexing for text (“BigIndex”)
– Tooling for administrators, programmers, analysts
– Pre-built business process accelerators (“apps”)

About Big Data and BigInsights . . .
Big Data is a strategic initiative for IBM
– Significant investments across software, hardware and services
InfoSphere BigInsights
– Enables firms to exploit growing variety, velocity, and volume of data
– Delivers diverse range of analytics
– Leverages and extends open source
– Provides enterprise-class features and supporting services
– Complements existing software investments and commercial offerings
– Available in basic (free) and enterprise editions
IBM advantage
– Full solution spanning software, hardware & services
– Rapid technology advances through partnerships with IBM Research
– Global reach

Getting Off to a Fast Start with IBM

BigInsights – Try Before You Buy
In the Cloud
– Via RightScale, or directly on Amazon, Rackspace, IBM Smart Enterprise Cloud, or on private clouds
– Pay only for the resources used
In the Virtual Classroom
– Free Hadoop Fundamentals training course @ www.bigdatauniversity.com
On Your Cluster
– Download Basic Edition from ibm.com
In the Classroom
– Enroll in the InfoSphere BigInsights Essentials course

Visit the BigInsights technical portal . . . .
Free links to papers, demos, discussion forum, and more


Supplemental

Sampling of MapReduce Use Cases
Extracted from public Web sites . . . .
– AOL: advanced algorithms for doing behavioral analysis and targeting
– Detikcom (Indonesian portal): analyze search log, generate Most Viewed News
– eBay: search optimization, research
– Facebook: store copies of internal log and dimension data sources and use it as a source for reporting/analytics and machine learning
– Financial institutions: determine credit worthiness for loan applicants, review changes in buying behaviors, etc.
– LinkedIn: determine “People you may know”
– Tennessee Valley Authority: analyze electrical power sensor data to better predict power failures
– Web advertisers: analyze historical click stream data, determine better ad choices

Sample Scenarios for Internet-Scale Analytics
Financial Services
– Improved risk decisions
– Customer sentiment analysis
– AML
Transportation
– Weather and traffic impact on logistics and fuel consumption
Call Centers
– Voice-to-text mining for customer behavior understanding
Telecommunications
– Operations and failure analysis from device, sensor, and GPS inputs
Utilities
– Weather impact analysis on power generation
– Smart meter data analysis
IT
– Transition log analysis for multiple transactional systems
E-Commerce
– Analyze internet behavior and buying patterns
– Digital asset piracy
Multi-channel Integration
– Integrated customer behavior modeling

What is Hadoop?
Apache Hadoop: free, open source framework for data-intensive applications
– Inspired by Google technologies (MapReduce, GFS)
– Well-suited to batch-oriented, read-intensive applications
– Originally built to address scalability problems of Nutch, an open source Web search technology
Enables applications to work with thousands of nodes and petabytes of data in a highly parallel, cost-effective manner
– CPU + disks of commodity box = Hadoop “node”
– Boxes can be combined into clusters
– New nodes can be added as needed without changing data formats, how data is loaded, or how jobs are written

Two Key Aspects of Hadoop
MapReduce framework
– How Hadoop understands and assigns work to the nodes (machines)
Hadoop Distributed File System (HDFS)
– Where Hadoop stores data
– A file system that spans all the nodes in a Hadoop cluster
– It links together the file systems on many local nodes to make them into one big file system

What is the Hadoop Distributed File System?
– HDFS stores data across multiple nodes
– HDFS assumes nodes will fail, so it achieves reliability by replicating data across multiple nodes
– The file system is built from a cluster of data nodes, each of which serves up blocks of data over the network using a block protocol specific to HDFS
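The effect of replication can be illustrated with a small Python simulation. This is a sketch under stated assumptions: blocks are placed round-robin for simplicity, which is not HDFS's actual (rack-aware) placement policy, and the node names, block count, and function names are hypothetical.

```python
def place_blocks(num_blocks, nodes, replication=3):
    # Assign each block to `replication` distinct nodes, round-robin.
    # Assumes len(nodes) >= replication.
    placement = {}
    for block in range(num_blocks):
        placement[block] = [nodes[(block + i) % len(nodes)]
                            for i in range(replication)]
    return placement

def readable_blocks(placement, failed_nodes):
    # A block stays readable while at least one replica is on a live node.
    return {block for block, replicas in placement.items()
            if any(node not in failed_nodes for node in replicas)}

nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(6, nodes)

# With three replicas on distinct nodes, any two failures are survivable.
print(len(readable_blocks(placement, {"node2", "node4"})))  # 6
```

Losing three nodes at once can still make a block unreadable if all of its replicas happened to live on those nodes, which is why real clusters re-replicate blocks as soon as a failure is detected.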

How To Create MapReduce Jobs
MapReduce development in Java
Pig
– Open source language / Apache sub-project
Hive
– Open source language / Apache sub-project
– Provides a SQL-like interface to Hadoop
Jaql
– Query language invented by IBM Research
– Very useful for loosely structured data
. . .

Limitations with Apache Hadoop (examples)
Need to “roll your own” or “deal with multiple suppliers”
– Iteratively install, configure, and test Hadoop and complementary projects
– Verify software pre-requisites and project versions for compatibility
– Add-your-own analytics
Pig/Hive (languages)
– Limited support for nested objects, multi-level hierarchies
– No built-in connectivity to commercial DBMSs
Storage: Hadoop Distributed File System (HDFS)
– NameNode single point of failure
– Limited POSIX compliance. Cannot run other applications on node.
– Poor security at the file system
– Poor performance with random reads and writes
Technical support via open source community (or your own experts)

IBM: Building with the Open Source Community
Big Data Platform: leveraging open source innovation and giving back
[Diagram labels: Contributing; jaql; PIG; ZooKeeper]

Streams and BigInsights - Integrated Analytics on Data in Motion & Data at Rest
1. Data Ingest
2. Bootstrap/Enrich
3. Adaptive Analytics Model
[Diagram labels: InfoSphere Streams (data ingest, preparation, online analysis, model); Database & Warehouse; data integration, data mining, machine learning, statistical modeling; visualization of real-time and historical insights]

IBM Watson
IBM Watson is a breakthrough in analytic innovation, but it is only successful because of the quality of the information from which it is working.

Big Data and Watson
Big Data technology is used to build Watson’s knowledge base
– Watson uses the Apache Hadoop open framework to distribute the workload for loading information into memory
– Approx. 200M pages of text (to compete on Jeopardy!)
Watson technology offers great potential for advanced business analytics
[Diagram labels: CRM data; social media; POS data; InfoSphere BigInsights; Watson’s memory; advanced search and analysis; distilled insight (spending habits, social relationships, buying trends)]

Sample hardware configuration
Hardware requirements vary by customer workload
Reference hardware configuration for storage-dense or data-intensive workloads
– IBM System x3630
– Two 6-core processors with 24 TB local attached storage, 24 GB RAM, Gigabit network
– Replication factor of 3 for fault tolerance and distributed processing
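The reference figures above imply a simple sizing rule: with a replication factor of 3, each stored byte occupies three bytes of raw disk. The Python sketch below works through that arithmetic. It is a rough estimate only; it ignores space for the operating system, temporary and intermediate data, and compression, and the cluster sizes used in the example are hypothetical.

```python
import math

def usable_capacity_tb(nodes, raw_tb_per_node=24, replication=3):
    # Raw disk across the cluster divided by the number of copies kept.
    return nodes * raw_tb_per_node / replication

def nodes_needed(data_tb, raw_tb_per_node=24, replication=3):
    # Smallest node count whose raw disk holds every replica of the data.
    return math.ceil(data_tb * replication / raw_tb_per_node)

print(usable_capacity_tb(1))   # 8.0
print(usable_capacity_tb(20))  # 160.0
print(nodes_needed(100))       # 13
```

So one reference node contributes roughly 8 TB of replicated capacity, and a hypothetical 100 TB data set would need at least 13 such nodes before allowing for any working space.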

BigInsights and the Cloud
– IBM SmartCloud Enterprise
– Amazon, Rackspace clouds through RightScale.com
– Low hourly charges
IM Cloud Computing Center of Competence: IMcloud@ca.ibm.com
