A Toolkit For Detecting Technical Surprise


SANDIA REPORT
SAND2010-7392
Unlimited Release
Printed October 2010

A Toolkit for Detecting Technical Surprise

Michael W. Trahan, Mark C. Foehse

Prepared by
Sandia National Laboratories
Albuquerque, New Mexico 87185 and Livermore, California 94550

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

Approved for public release; further dissemination unlimited.

Issued by Sandia National Laboratories, operated for the United States Department of Energy by Sandia Corporation.

NOTICE: This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government, nor any agency thereof, nor any of their employees, nor any of their contractors, subcontractors, or their employees, make any warranty, express or implied, or assume any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represent that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government, any agency thereof, or any of their contractors or subcontractors. The views and opinions expressed herein do not necessarily state or reflect those of the United States Government, any agency thereof, or any of their contractors.

Printed in the United States of America. This report has been reproduced directly from the best available copy.

Available to DOE and DOE contractors from
U.S. Department of Energy
Office of Scientific and Technical Information
P.O. Box 62
Oak Ridge, TN 37831
Telephone: (865) 576-8401
Facsimile: (865)
Online ordering: /bridge

Available to the public from
U.S. Department of Commerce
National Technical Information Service
5285 Port Royal Rd.
Springfield, VA 22161
Telephone: (800) 553-6847
Facsimile: (703)
Online order: v/help/ordermethods.asp?loc 7-4-0#online

SAND2010-7392
Unlimited Release
Printed October 2010

A Toolkit for Detecting Technical Surprise

Michael W. Trahan
Emergent Threats Department

Mark C. Foehse
Proliferation Sciences Department

Sandia National Laboratories
P.O. Box 5800
Albuquerque, New Mexico 87185-MS1207

Abstract

The detection of a scientific or technological surprise within a secretive country or institute is very difficult. The ability to detect such surprises would allow analysts to identify the capabilities that could be a military or economic threat to national security. Sandia’s current approach utilizing ThreatView has been successful in revealing potential technological surprises. However, as data sets become larger, it becomes critical to use algorithms as filters along with the visualization environments.

Our two-year LDRD had two primary goals. First, we developed a tool, a Self-Organizing Map (SOM), to extend ThreatView and improve our understanding of the issues involved in working with textual data sets. Second, we developed a toolkit for detecting indicators of technical surprise in textual data sets. Our toolkit has been successfully used to perform technology assessments for the Science & Technology Intelligence (S&TI) program.

ACKNOWLEDGMENTS

This work was supported by the Laboratory Directed Research and Development program at Sandia National Laboratories. Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under Contract DE-AC04-94AL85000.

CONTENTS

1. Introduction
2. Building a Tool: Self-Organizing Maps
2.1. Data Pre-Processing
2.2. Training
2.3. Metrics
2.4. Visualization
3. Building a Toolkit
3.1. Sandia-Developed Tools
3.1.1 Stanley-Based Tools
3.1.2 Titan-Based Tools
3.2. Oak Ridge-Developed Tools
3.3. COTS (Commercial Off The Shelf) Tools
3.3.1 COTS Analysis and/or Visualization Tools
3.3.2 COTS Support Tools
3.4. Open Source Tools
3.4.1 Gephi
3.4.2 KNIME
3.4.3 ORA
4. Future Work
5. Conclusions
6. References
Distribution

EQUATIONS

Equation 1. Calculate the Distance Between an Input Vector and a Node’s Weight Vector.
Equation 2. Calculate the BMU’s Neighborhood Size.
Equation 3. Adjust the Weights of the BMU and Its Neighbors.
Equation 4. Update the Learning Rate.
Equation 5. Calculate the SOM’s Average Quantization Error.
Equation 6. Calculate the SOM’s Average Topology Preservation Error.
Equation 7. Log-Entropy.
Equation 8. Cosine Similarity.
Equation 9. Term Frequency.
Equation 10. Term Frequency-Inverse Document Frequency.
Equation 11. Term Frequency-Inverse Corpus Frequency.

FIGURES

Figure 1. CSV2SOM – main window.
Figure 2. CSV2SOM – raw data window.
Figure 3. CSV2SOM – pre-processed data window.
Figure 4. CSV2SOM – define data set window.
Figure 5. SOM_PAK – typical commands for training a basic SOM.
Figure 6. SOM_PAK – typical commands for training an optimized SOM.
Figure 7. SOM_PAK – Umat plot.
Figure 8. SOM_PAK – typical commands for visualizing a SOM.
Figure 9. Data Trace Tool – main window.
Figure 10. LDRDView – main window.
Figure 11. P2 – the main window.
Figure 12. P2 – the Document Text view.
Figure 13. P2 – the Document Clusters view.
Figure 14. P2 – the Corpus Map window (tree-ring layout).
Figure 15. P2 – the Corpus Map window (“force-directed” graph layout).
Figure 16. P2 – the Entities view.
Figure 17. P2 – the Hotlist view.
Figure 18. P2 – the Hotlist Map view of entity-to-document relations.
Figure 19. ThreatView – main window.
Figure 20. Piranha – plot of clustered documents.
Figure 21. dtSearch – start-up window.
Figure 22. dtSearch – creating an index.
Figure 23. dtSearch – a simple search.
Figure 24. dtSearch – search terms highlighted in context.
Figure 25. dtSearch – a complex search.
Figure 26. dtSearch – results of a complex query.
Figure 27. Analyst's Notebook – a graph showing relationships between Osama Bin Laden and the 9/11 attackers.
Figure 28. Analyst's Notebook – a theme line showing events ordered by time.
Figure 29. TextChart – text document window.
Figure 30. Google Trends – “metamaterials.”
Figure 31. Google Insights for Search – "metamaterials."
Figure 32. Beyond Compare – home view.
Figure 33. Beyond Compare – comparing folder contents.
Figure 34. Beyond Compare – text file comparison.
Figure 35. Beyond Compare – synchronizing folders.
Figure 36. Beyond Compare – comparing binary files (the data is displayed in hexadecimal format).
Figure 37. Beyond Compare – comparing data files.
Figure 38. Beyond Compare – comparing image files.
Figure 39. Camtasia Studio – edit window.
Figure 40. MindManager – main window.
Figure 41. MindView – main window.
Figure 42. SnagIt – main window.
Figure 43. SnagIt – editor window.
Figure 44. ORA – main window.
Figure 45. ORA – a network visualization.
Figure 46. ORA – results of Newman's community finding algorithm.

NOMENCLATURE

API      Application Programming Interface
BMU      Best Matching Unit
COTS     Commercial Off The Shelf
CSV      Comma Separated Value
DOE      Department of Energy
DTT      Data Trace Tool
GUI      Graphical User Interface
HPC      High Performance Computing
LDRD     Laboratory Directed Research and Development
LSA      Latent Semantic Analysis
NER      Named Entity Recognition
NLTK     Natural Language Toolkit
S&TI     Science & Technology Intelligence
SNA      Social Network Analysis
SNL      Sandia National Laboratories
SOM      Self-Organizing Map
STANLEY  Sandia Text AnaLysis Extensible LibrarY
TF-ICF   Term Frequency-Inverse Corpus Frequency
TF-IDF   Term Frequency-Inverse Document Frequency
VTK      Visualization ToolKit
WWW      World Wide Web

1. INTRODUCTION

The detection of a scientific or technological surprise within a secretive country or institute is very difficult. The ability to detect such surprises would allow analysts to identify the capabilities that could be a military or economic threat to our national security. Sandia’s current approach utilizing ThreatView has been successful in revealing potential technological surprises. However, ThreatView has limitations.

ThreatView presents data visually, which allows analysts to identify trends, patterns, and relationships that otherwise are very difficult to detect. However, this detection is dependent upon the analyst: some analysts see the patterns; some analysts miss the patterns (false negatives); and still other analysts see patterns that are not real (false positives). In addition, ThreatView uses a single algorithm (LSA) to cluster the data set. There is no way to compare its results to an alternative clustering or to measure the quality of the clustering. We have addressed these limitations by developing a data mining toolkit, which can be used independently or as an extension to ThreatView.

As data sets become larger, it becomes critical to use algorithms as filters along with the visualization environments. Our toolkit provides a suite of algorithms to filter the data so that analysts are presented with less, but more relevant, data, increasing the chance of detecting a scientific or technological surprise.

2. BUILDING A TOOL: SELF-ORGANIZING MAPS

Our first effort was to build a tool to extend ThreatView and improve our understanding of the issues involved in working with textual data sets. We chose to implement a Self-Organizing Map (SOM).

The self-organizing map (SOM) is a type of artificial neural network first described by Professor Teuvo Kohonen of the Helsinki University of Technology, Laboratory of Computer and Information Science, Neural Networks Research Centre, in the early 1980s. The SOM provides a way of representing multidimensional data in a two-dimensional space, while maintaining the data's topological relationships. SOMs are frequently used as visualization aids. They can make it easy for us to see relationships between vast amounts of multidimensional data. SOMs have been successfully used in many applications, including: speech recognition (Kohonen’s original area of research); bibliographic classification; image browsing systems; medical diagnosis; seismic data interpretation; data compression; and environmental modeling.

SOMs have many advantages. They are easy to understand (especially compared to most other neural network architectures). They work very well on a large number of problem classes, and they are adaptive: they cannot be over-trained.

There are, however, some disadvantages to SOMs. It can be hard to get the “right” data: you must have a value for every dimension of every input vector. Every SOM is different and finds different similarities in the data. In the final map, every vector is surrounded by similar vectors; however, similar vectors are not always near each other. And, especially during training, SOMs are computationally expensive.

2.1. Data Pre-Processing

The data for this application is records of scientific and technical articles. The data is provided as a Microsoft Excel CSV-format file. Most of the fields consist of natural language text. This text must be pre-processed into a numeric form that is usable by the self-organizing map (SOM).

The pre-processor, called CSV2SOM, was written in the Python scripting language. It reads the records from the CSV-format data file and allows the user to generate a set of training and testing data for the SOM. The graphical user interface (GUI) is built with the wxPython toolkit (wxPython is a wrapper for the wxWidgets cross-platform GUI API, which is written in C++). The natural language text is processed using the Natural Language Toolkit (NLTK). See Figure 1, Figure 2, Figure 3, and Figure 4 for screen shots of the pre-processor.

NLTK provides the user with information about the data set. For each field, the NLTK parses the data to determine the number of empty records, the number of tokens, the number of unique words, the diversity score, the number of common words, and the number of unusual words.
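The per-field statistics listed above can be sketched in a few lines of Python. This is only an illustration, not the CSV2SOM implementation: the regular-expression tokenizer stands in for NLTK, `COMMON_WORDS` and the `vocabulary` argument are hypothetical word lists, and the diversity score is assumed to be the ratio of unique words to tokens, which the report does not define.

```python
import re

# Hypothetical short list of "common" words; a real run would use a larger list.
COMMON_WORDS = {"the", "of", "and", "a", "in", "to", "is", "for"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def field_statistics(records, vocabulary):
    """Summarize one text field across all records.

    records    -- list of strings, one per record (may contain empty strings)
    vocabulary -- set of known words; anything outside it counts as "unusual"
                  (an assumed definition, not taken from the report)
    """
    empty = sum(1 for record in records if not record.strip())
    tokens = [token for record in records for token in tokenize(record)]
    unique = set(tokens)
    return {
        "empty_records": empty,
        "tokens": len(tokens),
        "unique_words": len(unique),
        # Assumed definition: unique words divided by total tokens.
        "diversity": len(unique) / len(tokens) if tokens else 0.0,
        "common_words": len(unique & COMMON_WORDS),
        "unusual_words": len(unique - vocabulary),
    }
```

Calling `field_statistics` once per CSV column gives the analyst a quick view of which fields are sparse or dominated by rare terms before any vectors are built.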

Figure 1. CSV2SOM – main window.

Figure 2. CSV2SOM – raw data window.

Figure 3. CSV2SOM – pre-processed data window.

Figure 4. CSV2SOM – define data set window.

To convert the data to a form usable by the SOM, the NLTK allows the user to: reduce the words to their head words; reduce the words to their stems (using the Lancaster or Porter stemmers); remove stop words (of, the, etc.); remove common words; and/or remove unusual words. In addition, the user can choose to remove high-frequency words and/or low-frequency words.

Finally, the user can specify the percentage of the data set to be used for training the SOM (empty records are ignored); the remainder of the data set is automatically reserved for testing the SOM. In the training and testing data sets, each record is represented as a vector of dimension n, where each component represents the Term Frequency-Inverse Document Frequency (TF-IDF) of the associated word.

2.2. Training

For this application, we used the public-domain SOM_PAK software package. SOM_PAK is written in C and is provided by the Helsinki University of Technology, Laboratory of Computer and Information Science, N
