Text Analytics 101 - SAS


WEBCAST SUMMARY

Text Analytics 101: Improve Decision-Making by Incorporating Unstructured Data – Words and Images – into Analytic Processes

Insights from a webinar in the SAS Applying Business Analytics Series
Originally broadcast in April 2010

Featuring:
Fiona McNeill, Global Product Marketing Manager, SAS
Kathy Lange, Senior Business Director, SAS Business Analytics Practice

TEXT ANALYTICS 101

Organizations are awash in data – gigabytes and terabytes and petabytes of it, churned out daily by operational/transactional systems, imported from purchased databases and propagated through analysis and reporting. Exabytes, zettabytes and yottabytes are on the horizon.

But that’s only the tip of the data iceberg. By some estimates, this structured (numerical) data represents only about 25 percent of the information in an organization. A minimum of 70 percent of data is actually unstructured data – freeform text, images, audio and video captured from online and offline sources. This data volume is expected to expand by 35 to 45 percent in the next two to three years. Unstructured data comes from Web documents, correspondence, contact center records, social media, blogs, claims, customer complaints and any number of other sources.

The remaining 5 percent or so of an organization’s information is considered semistructured – a hybrid of freeform and structured data – such as an e-mail, which has structured data in the header and unstructured text in the body.

What could your organization do if you could harness the insights hidden within that vast sea of words and images? Imagine how much more insight could be gained by drawing on three or four times as much information. Picture how it would improve knowledge sharing and decision making if useful content were easy to find and use – and automatically included in analytical processes.

That is where text analytics comes in. Text analytics extracts relevant information, then interprets, mines and structures that information to reveal patterns, sentiments and relationships within and among documents. The SAS Text Analytics framework includes four key components:

Automated content categorization makes information searches far faster and more effective than manual methods.

Ontology management links text repositories together through consistently and systematically defined relationships.

Sentiment analysis automatically locates and extracts sentiment from online materials, such as social networking sites, comments and blogs on the Internet, as well as internal electronic documents.

Text mining provides powerful ways to explore unstructured data to discover previously unknown concepts and patterns.

How can a machine interpret the nuances of human language and other freeform information and use it for meaningful structured analysis? That was the topic of a SAS webinar in the Applying Business Analytics Series. Fiona McNeill, Global Product Marketing Manager at SAS, described text analytics and presented examples of how companies in various industries have applied this technology. Kathy Lange, Senior Business Director in the SAS Business Analytics Practice, explained how natural language processing works and demonstrated some key text analytics capabilities.

Top Five Big Ideas in Text Analytics

1. If it’s true that 70 percent of available information is unstructured data – plus some combination of structured and unstructured data – shouldn’t your organization be tapping into that data to manage content more effectively and better understand the business?

2. Text analytics is the use of computer software to:
   Annotate and extract information from unstructured data sources.
   Identify entities, concepts, facts, attributes and attitudes in that material.
   Discover new topics and patterns from documents.
   Categorize, classify and associate documents to improve relevance in search and retrieval.

3. Text analytics can either be discovery-driven, where you explore the data and let it tell you what it knows, or domain-driven, where you start with knowledge and see where it appears in the documents.

4. Text analytics brings together multiple disciplines:
   Natural language processing – a discipline from the field of artificial intelligence – combines computer science and linguistics to identify meaningful concepts, attributes and opinions in the spoken or written word. Advanced linguistic rules are applied to documents using the same rigorous analysis process (model creation, testing, validation, deployment and assessment) typically associated with structured data.
   Text mining involves techniques from several areas, including the fields of computational linguistics and information retrieval, to structure text into a numeric representation for use in traditional data mining and predictive analysis.

5. Text analytics tools enable the discovery and formalization of linguistic rules to consistently and automatically decipher documents – providing better answers to investigative questions.

Text Analytics in Action

Text analytics can be approached from two different directions, McNeill explained:

Discovery-driven. When you don’t know where to start, a discovery-driven approach helps identify key patterns and attributes in the unstructured data at hand. This exploration reveals new insights, which are then used to define the structure, such as the categories and concepts you will use.

Domain-driven. If there is already an understanding of the data or some domain knowledge regarding which terms and phrases are meaningful, you can start with this knowledge and find where it exists in the materials.

“Both approaches are valid, and more importantly, they complement each other,” said McNeill. “Discovery of concepts can be used to define a structure or taxonomy for the data. On the other hand, content that doesn’t fit into a predefined structure can be further explored using discovery to find previously unknown information.”

Organizations in a variety of industries – in the public and private sectors, from manufacturing to finance to health care – are using these approaches in inventive ways, said McNeill, who presented success stories from a dozen organizations.

“With text analytics, the days of manually reading through stored documents to find information are gone. What’s more, by examining entire collections of materials, people are discovering patterns that would never emerge from reading each document in isolation.”
Fiona McNeill, Product Marketing Manager, SAS

Manufacturing

A manufacturer of high-end kitchen appliances uses text analytics to automatically analyze warranty claim data, looking for patterns among claims that could indicate possible issues. This information is proactively referred to product engineers for investigation. This early-warning detection enables the company to resolve emerging issues before other customers experience a problem.

A leading IT manufacturer uses text analytics to automatically group and classify information from millions of sources. In the call center alone, the company has more than 300,000 records, plus a growing volume of e-mails, customer surveys, claims inquiries and feedback. The information volumes were simply too great to analyze manually; the task was like reading 500 copies of War and Peace – impossible. With automated content categorization, tasks that formerly required hours are now done in minutes, with 95 percent accuracy.

Government and Research

The efficiency unit in an Asian government agency uses text analytics to analyze more than 2.5 million calls and nearly 100,000 e-mails, inquiries and complaints received each year. The unit can now proactively identify social or public health issues that could affect the government departments it supports and improve service delivery. The unit recently won a Best Public Service Information and Communications Technology award for its application.

A national security R&D team in a federal agency uses text analytics to improve search and retrieval of information from its dynamic and growing collection of research and scientific papers. Fast, accurate access to needed documents has greatly extended the value of the department’s knowledge base.

Health and Life Sciences

A health insurance company uses text analytics to analyze claims information to better understand health and safety concerns for various occupations and reduce workplace injuries and accidents. Text analysis has led to more effective preventative measures – and identified 600 cases that would not have been found through historical methods of code matching or any other structured analysis.

A major pharmaceutical manufacturer is automatically categorizing content in its digital information stores, reducing the time required to keep materials up to date and making it easy to find important reference information related to health conditions, therapies and ongoing research.

Media and Publishing

A newspaper publisher that serves more than 20 major markets uses text analytics to keep content and topics up to date from newsfeeds and user-generated content. By fully automating electronic document organization within its network of websites, the company has boosted search engine placement, even when critical search terms do not appear in the headlines.

A global publisher is using text analytics to serve more than 30 million users worldwide, improving access to 20 different document databases in dozens of languages, each containing millions of records. Text analytics helps the company improve product line management, keep track of information and support comprehensive search activity.

Finance

A European financial services firm uses text analytics to augment the information found in traditional credit bureau scores to better align the objectives of lenders and borrowers in its online social lending network. The firm has found that data from social network profiles improves its calculations of an applicant’s creditworthiness.

A worldwide bank uses text analytics to organize millions of documents in dozens of languages in support of funding initiatives. By automating support for requests, the productivity of search and retrieval improved from three documents per hour to 50,000 per hour.

Text analytics reveals insights from electronic text materials, associates them so they go to the right person and place, and provides intelligence to know what you need to do next – whether it is answering complex search-and-retrieval questions, presenting relevant content to internal or external Web users, or predicting which phrase will best affect sentiments.

E-Business

Staff members at an online job search organization no longer spend hours trying to manually sort through recruitment posts and pair them with résumés. Text analytics automatically matches résumés to job postings with more than 95 percent accuracy, even when documents are written in different styles and saved in different formats: Word, PDF, HTML, etc. As a result, the company is much faster to contact desired candidates with relevant notification triggers.

A meta-search company, which gleans the best search results from a number of other search engines, uses text analytics to crawl, index, match, correct spelling errors and categorize information. Text analytics has improved the speed and relevance of search returns, which in turn has helped boost the company’s value to advertisers.

A High-Level Look at Natural Language Processing

Natural language processing (NLP) combines computer science and linguistics to identify meaningful concepts and attributes in the spoken or written word. In the context of text analytics, this analysis most often applies to electronic documents. In its simplest form, natural language processing is based upon rules of various types:

Style conventions indicate the start and end of a word or sentence. Rules are unique for each language. For example, the beginning of a sentence in English is marked by a capital letter, but in German, every noun is capitalized. In English, words are separated by spaces and/or punctuation, but not in Chinese.

The structure of the sentence determines the part of speech – noun, adjective, adverb, etc. – such as the difference between “the team can service eight cars per hour” (where “service” is a verb) and “the team has enhanced its service protocol” (where it is a noun).

The context of the sentence aids with disambiguation, such as differentiating between car repair services and religious services, or between Amazon the river and Amazon the e-commerce giant.

Built-in lists store knowledge associated with: entities (recognizable people, places, organizations, currency, etc.); word stems (such as plural, past and future tense variants of a word); synonyms (different words that have the same meaning); spellings (which may or may not be autocorrected, depending on the need); and filtering (overlooking extraneous words, such as conjunctions).

Some of the earliest forms of natural language processing used machine learning algorithms, such as decision trees that produced hard if-then rules from structured fields of categorized data. Likelihood statistics and even probabilities were then used to assign weights to categories.

Modern-day text analytics goes beyond simply counting and comparing words that have been structured into relational fields. It now understands context at a deeper level by assessing words in relation to each other from the freeform unstructured text. For example, text analytics can:

Understand abstract concepts, such as “blue screen of death,” even though the phrase is not about death.

Extract facts, differentiating between “Driver A was hit by Driver B” and “Driver A hit Driver B.”

Discern attitude polarity, whether the content expresses a positive, negative, mixed or neutral sentiment.

These capabilities, coupled with the higher processing power now available, have made text analytics with huge data volumes practical for real-world business problems.

Text analytics is the use of computer software to annotate and extract information from electronic text sources – finding the key concepts, patterns and facts – and analyzing that information for business purposes.

A Guided Tour of Text Analytics

Lange provided a walkthrough of some SAS Text Analytics capabilities. You can see this introduction to key aspects of the solution in the Text Analytics 101 webinar, which is available to watch on demand at www.sas.com/reg/web/corp/907006. Here are some of the capabilities shown in Lange’s demonstration.

Identify trends in word usage. Exploratory data analysis on text is much like exploratory data analysis on numerical data, Lange explained. The process follows the same systematic steps: model creation, testing, validation, deployment and assessment. To start, you might examine the number of times a term appears in the document(s) at hand. The display shows the most frequently used terms. For any term marked with a plus symbol, you can expand the view to see stems (synonyms or variants of the term).
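As a rough illustration of this kind of term exploration, the sketch below counts how often each term occurs and how many documents contain it, and derives a simple weight. It is not the SAS implementation: the sample documents, the stop list and the inverse-document-frequency weighting are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Hypothetical sample documents (not taken from the webinar).
documents = [
    "The service technician arrived late.",
    "Great service and friendly support staff.",
    "Support never returned my call about the late delivery.",
]

# Tiny illustrative stop list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "and", "my", "about", "a", "an", "of"}


def tokenize(text):
    """Lowercase the text, split it into word tokens and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def term_table(docs):
    """Return {term: (frequency, document_count, weight)} for a list of documents."""
    freq = Counter()       # total occurrences of each term
    doc_count = Counter()  # number of documents containing each term
    for doc in docs:
        tokens = tokenize(doc)
        freq.update(tokens)
        doc_count.update(set(tokens))
    n_docs = len(docs)
    table = {}
    for term, count in freq.items():
        # Assumed weighting: frequency scaled by inverse document frequency.
        idf = math.log(n_docs / doc_count[term]) + 1.0
        table[term] = (count, doc_count[term], round(count * idf, 3))
    return table


table = term_table(documents)
# Show the five most frequent terms.
for term, (count, n, weight) in sorted(table.items(), key=lambda kv: -kv[1][0])[:5]:
    print(f"{term:12s} freq={count} docs={n} weight={weight}")
```

A production tool would also record each term's part of speech and group stems and variants under a parent term; those steps are left out here to keep the sketch short.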

For each term, you see how often it appears, the number of documents in which the term was found, the weighted importance of the term, its role (part of speech), attribute (alpha character or not), and whether or not the term should even be kept in the analysis.

Some of the same information can be displayed graphically, where the size and colors of the words indicate their frequency and importance, and the proximity of the words represents how closely they are associated.

“You could also go the opposite way, and look at a graphic representation of outliers, if you’re trying to find very rare occurrences of words,” said Lange. “This technique is used primarily to understand word usage on websites, but it is increasingly being used to analyze political speeches as well, perhaps to understand hidden meanings.”

Prepare data for analysis. Just as with traditional data analysis, good data preparation can be invaluable for improving the quality of results. And as with structured data analysis, analysts can spend up to 80 percent of their time in this data prep stage, said Lange.

A common problem in text analytics is the high rate of misspellings. The tool might find, for instance, that “service” has been misspelled 29 different ways in the documents being analyzed. You can see the number of documents containing misspellings or those that have no errors, the role each word played and more.

“You might want to resolve this inconsistency before you start any type of analysis,” said Lange. “Maybe you want to automatically fix the misspellings, or maybe not. Depending on the analysis, fixing the misspellings could cause you to lose valuable information. Maybe you want to treat all of these versions of ‘service’ – spelled correctly or not – as synonyms, so we won’t miss any occurrences of these words in further analysis.”

Link concepts that should be considered as equivalents. “We can link concepts together to be treated as the same in further analysis,” said Lange. “For example, if I’m trying to identify customer support and service issues, I can tell the application to treat ‘service’ and ‘support’ as synonyms.

“Similarly, you could transform abbreviations into a consistent set of terms. I might tell the system that UPR means the same as ‘updated patient records.’ LVM means the same as ‘left voice mail.’ If you were analyzing customer contact center records, you could decree that LVM, ‘left voice mail,’ ‘no answer,’ ‘wrong number,’ and ‘no longer in service’ are all grouped under the concept of ‘unsuccessful attempts to contact the customer.’”
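The linking of synonyms and abbreviations to a shared concept can be sketched as a simple lookup. The terms below come from Lange’s examples; the concept labels, the function name and the longest-match-first strategy are assumptions made for illustration and do not reflect how the SAS tool stores or applies such links.

```python
import re

# Surface forms mapped to a shared concept. Terms are from the examples in
# the text; the concept labels on the right are illustrative names only.
CONCEPTS = {
    "lvm": "unsuccessful contact attempt",
    "left voice mail": "unsuccessful contact attempt",
    "no answer": "unsuccessful contact attempt",
    "wrong number": "unsuccessful contact attempt",
    "no longer in service": "unsuccessful contact attempt",
    "upr": "updated patient records",
    "updated patient records": "updated patient records",
    "service": "service/support",
    "support": "service/support",
}

# Try longer phrases first so that "no longer in service" is matched as a
# whole phrase and its trailing "service" is not counted separately.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(CONCEPTS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def find_concepts(note):
    """Return the concepts mentioned in a free-text note, in order of appearance."""
    return [CONCEPTS[m.group(1).lower()] for m in _PATTERN.finditer(note)]


print(find_concepts("Called twice, LVM. Number no longer in service."))
print(find_concepts("Customer praised support; UPR done."))
```

Matching the longest phrase first matters here: without it, a disconnected-number note would also be tagged as a service/support issue.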

See how terms are being used in context. If we focus only on the term “service,” we would miss important contextual clues that help us understand how the word is being used and what that means for our model rules, said Lange. “I search for all the concepts related to ‘service’ – not just the word ‘service,’ but the characters in front of ‘service’ and the terms I have associated with ‘service.’”

You can display snippets of text that provide context surrounding the search term, so you can see how the term is being used. This information helps refine rules. In automated analysis, you certainly want to be able to differentiate between “good service,” “bad service,” “not bad service,” “very bad service,” and “oh, I’m so bad, service tech came on time but I wasn’t home.” You can search, filter and get context to determine which are relevant uses of the term for your analysis, and which are not.

Develop and validate models. At the next step, we start to build statistical models that work with business rules to successfully categorize the terms and concepts under study. The rules can have Boolean logic – such as “if,” “and,” “or,” and “not” – to help understand words in context. Here’s where a domain expert and subject matter expert can really improve the model, identifying linguistic rules and industry-specific ways of speaking.

The model is trained by being applied to a sample set of documents. You then test the model against a similar set of documents to validate the accuracy and precision of the model. Did it deliver consistent results when applied against separate but similar sets of documents?

“Model creation, testing and validation is an iterative process,” Lange noted. “If you test a model and you find that documents are not being categorized the way you would like, you go back and generate additional specifications, so the model will predict better.”

Discover trends. Text analytics adds a richness to traditional numeric analyses, revealing trends that might otherwise have gone unnoticed. For instance, the display that follows shows a growing number of customer complaints about sudden acceleration in a vehicle brand not previously associated with that defect.

“Hindsight reporting of customer service calls can discover patterns, such as the top 10 defects being reported, but through predictive modeling and text mining, we could discover emerging issues before they become top 10,” said Lange.
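To make the idea concrete, here is a minimal sketch that applies a Boolean rule (“sudden” AND “acceleration” AND NOT a negated mention) to free-text complaints and then checks whether matching complaints are growing year over year. The complaint records, the rule and the growth threshold are all hypothetical; the webinar does not describe the actual rules or models used.

```python
from collections import Counter

# Hypothetical complaint records: (year received, free-text summary).
complaints = [
    (2008, "Brakes squeal when stopping."),
    (2008, "Sudden acceleration in reverse, causing accident."),
    (2009, "Accident caused by sudden acceleration."),
    (2009, "Vehicle showed no sudden acceleration after the repair."),
    (2009, "Sudden acceleration while parking."),
    (2010, "Sudden acceleration on highway."),
    (2010, "Unintended sudden acceleration, minor injury."),
    (2010, "Sudden acceleration when shifting into drive."),
    (2010, "Radio stopped working."),
]


def matches_rule(text):
    """Boolean rule: 'sudden' AND 'acceleration' AND NOT the phrase 'no sudden'."""
    t = text.lower()
    return "sudden" in t and "acceleration" in t and "no sudden" not in t


def yearly_counts(records):
    """Count the complaints per year that satisfy the rule."""
    return Counter(year for year, text in records if matches_rule(text))


def is_emerging(counts, min_growth=1.5):
    """Flag an issue whose count grows by at least min_growth every year."""
    years = sorted(counts)
    return len(years) >= 2 and all(
        counts[later] >= min_growth * counts[earlier]
        for earlier, later in zip(years, years[1:])
    )


counts = yearly_counts(complaints)
print(dict(counts))
print("emerging issue:", is_emerging(counts))
```

The NOT clause is what keeps the 2009 record about a successful repair out of the count; a keyword search alone would have included it.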

MANUFACTURER NAME | VEHICLE/EQUIP MAKE | VEHICLE/EQUIP MODEL | WAS VEHICLE INVOLVED IN A FIRE Y OR N | NUMBER OF FATALITIES | SPECIFIC COMPONENT DESCRIPTION | NUMBER OF OCCURRENCES | SUMMARY | MODEL YEAR
GENERAL MOTORS CORP. | CHEVROLET TRUCK | ASTRO | N | 0 | FUEL: THROTTLE LINKAGES AND CONTROL | 1 | ACCIDENT CAUSED BY SUDDEN ACCELERATION. *LDG | 1990
GENERAL MOTORS CORP. | CHEVROLET TRUCK | BLAZER | N | 0 | FUEL: THROTTLE LINKAGES AND CONTROL | 3 | ACCIDENT/INJURY CAUSED BY SUDDEN ACCELERATION. *LDG | 1991
FORD MOTOR COMPANY | FORD TRUCK | AEROSTAR | N | 0 | FUEL: THROTTLE LINKAGES AND CONTROL | 1 | SUDDEN ACCELERATION IN REVERSE, CAUSING ACCIDENT. *TW | 1989
FORD MOTOR COMPANY | FORD TRUCK | AEROSTAR | N | 0 | FUEL: THROTTLE LINKAGES AND CONTROL | 1 | SUDDEN ACCELERATION IN REVERSE, CAUSING ACCIDENT. *AJ | 1991
FORD MOTOR COMPANY | FORD TRUCK | AEROSTAR | N | 0 | FUEL: THROTTLE LINKAGES AND CONTROL | 1 | SUDDEN ACCELERATION, CAUSING ACCIDENT. *LDG | 1991
FORD MOTOR COMPANY | FORD TRUCK | F250 | N | 1 | FUEL: THROTTLE LINKAGES AND CONTROL | 1 | SUDDEN ACCELERATION, SURVEY BY THE ATTORNEY FOR ESTATE VVS BOX 1545., AK | 1990

Similarly, you might find that certain terms a customer uses in a survey or a conversation with a service rep are associated with a 50 percent higher rate of attrition – the context of the numbers. If this knowledge is embedded into predictive models, you could quickly identify other customers who are more likely to leave, and take appropriate action.

“If you incorporate all kinds of data – both text and numeric data – you build a more accurate model,” said Lange. “You’re learning information from the model, then going back and taking that information into other types of models and refining them, exploring again and bringing back the information. This is really an iterative process, testing and learning and updating your models, checking results and then refining the models all over again.”

Automatically classify content. For many organizations, text analytics isn’t just about pulling more information into business decision making; for some, information is the core business. Search engine sites, media companies, R&D groups, organizations that maintain large websites – all of them need to be able to keep track of content, find it fast and deliver it to users on request – without human intervention.

Text analytics is the technology at work behind the scenes when a website offers you personalized content, hyperlinks to related topics and a constantly changing list of the most popular search topics du jour. In this application, the models are looking for concepts in order to tag content by topics of interest. Stories from news feeds, volumes of research reports, results of Web searches, millions of posts on blogs and social media sites, billions of tweets – all of this potentially useful content can be processed at a rate of thousands of documents per second.

Closing Thoughts

“I see text analytics as the combination of two worlds coming together,” said Lange. “Text and numbers come together in converged analysis. Business rules and domain expertise come together in statistical models. Human knowledge and computer technology come together to uncover ideas that neither one could find alone – at speeds that humans could never achieve.”

SAS Text Analytics provides a rich suite of tools for discovering and extracting knowledge from text documents. With this solution, you can integrate text-based information with structured data and predictive analytics for better answers to complex questions. Using a combination of advanced statistical modeling, natural language processing and advanced linguistic analysis, SAS quickly and automatically deciphers large volumes of multilingual content to discover trends, patterns and sentiments locked away in textual content.

For more information:
SAS Text Analytics – www.sas.com/text-analytics
The SAS text mining blog, The Text Frontier – blogs.sas.com/text-mining
To view the Text Analytics 101 webinar: www.sas.com/reg/web/corp/907006

About the Presenters

Fiona McNeill, Global Product Marketing Manager, SAS, oversees product marketing for SAS Text Analytics. During her more than 12 years at SAS, she has defined product strategy, forged corporate partnerships, and helped organizations derive tangible benefits from their strategic use of SAS technology. She has received multiple innovation awards for her work. McNeill has an MA in quantitative behavioral geography from McMaster University and graduated cum laude with a BSc in biophysical systems from the University of Toronto.

Kathy Lange, Senior Business Director, SAS Business Analytics Practice, has more than 25 years of experience selling and implementing analytics solutions. The SAS Business Analytics practice helps customers define their business problems and craft strategies for solving those problems with integrated SAS solutions, including business intelligence, data integration and advanced analytics. Lange holds a BS in mathematics from the University of Delaware and an MS in operations research from Union College.

SAS Institute Inc. World Headquarters 1 919 677 8000
To contact your local SAS office, please visit: www.sas.com/offices

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. Copyright 2011, SAS Institute Inc. All rights reserved. 104993 S64776.0211
