California Institute Of Technology NASA/JPL: Methodology .


Big Data and Data Science at NASA/JPL: Methodology Transfer From Space Science to Biomedicine
Jet Propulsion Laboratory, California Institute of Technology
Daniel Crichton: Leader, Center for Data Science and Technology; Proj. Manager, Planetary Data System Engineering; Prog. Manager, Data Science Office
Dr. Richard Doyle: Prog. Manager, Information and Data Science; Proj. Manager, High Performance Spaceflight Computing
"leaving the safe harbor to explore uncharted waters"
March 6, 2017
2017 California Institute of Technology. Government sponsorship acknowledged.

Context
JPL is involved in the research and development of technologies and methodologies in science, mission operations, engineering, and other non-NASA applications.
– Includes onboard computing to scalable archives to analytics
JPL and Caltech formed a joint initiative in Data Science and Technology to support fundamental research all the way to operational systems.
– Methodology transfer across applications is a major goal.

Terms: Big Data and Data Science
Big Data: When needs for data collection, processing, management and analysis go beyond the capacity and capability of available methods and software systems
Data Science: Scalable architectural approaches, techniques, software and algorithms which alter the paradigm by which data is collected, managed and analyzed

U.S. National Research Council Report: Frontiers in the Analysis of Massive Data (2013)
Chartered in 2010 by the U.S. National Research Council, National Academies
Chaired by Michael Jordan, Berkeley, AMP Lab (Algorithms, Machines, People)
NASA/JPL served on the committee covering systems architecture for big data management and analysis
Importance of more systematic approaches for analysis of data
Need for end-to-end data lifecycle: from point of capture to analysis
Integration of multiple discipline experts
Application of novel statistical and machine learning approaches for data discovery

NASA Science and Big Data Today
How do these connect?
[Diagram: a communications network and EOSDIS DAACs (focus on generating, capturing, managing big data) on one side, and a big data infrastructure of data, algorithms, and machines (focus on using/analyzing big data) on the other.]

JPL Data Science Working Group
Established in 2014 to explore big data use cases and challenges in science and to make a recommendation to JPL senior management.
– Launched internal investments: planetary science (onboard agile science), earth science (distributed data analytics), and astronomy (machine learning and data collection methods).
– Engaged cross-disciplinary expertise (science, computer science – systems and machine learning, statistics, program management)
– Partnered with Caltech to bring in research perspectives.
In November 2016, a chartered Data Science WG was established, reporting to JPL's Leadership Management Council (LMC), chaired by Deputy Director Larry James, and covering all aspects of the Lab operations.

JPL Data Science Strategy
Guiding Principles
Agile Science – Onboard Analysis
– Challenge: Too much data, too fast; cannot transport data efficiently enough
– Future Solutions: Onboard computation and data science
Extreme Data Volumes – Data Triage
– Challenge: Data collection capacity at the instrument outstrips data transport and data storage capacity
– Future Solutions: Dynamic architectures to scale data processing and triage exascale data streams
Distributed Data Analytics
– Challenge: Data distributed in massive archives; many different types of measurements
– Future Solutions: Distributed data analytics; uncertainty quantification
Data Architecture: Data Lifecycle
– Data Generation: Perform original processing at the sensor / instrument
– Data Triage: Make choices at the collection point about which data to keep
– Data Transport: Improve resource efficiencies to enable moving the most data
– Data Curation: Anticipate the need to work across multiple data sources
– Data Processing: Increase computing availability at the data to generate products
– Data Archiving: Increase the scale and integration of distributed archives
– Data Visualization: Apply visualization techniques to enable data understanding
– Data Mining: Apply machine learning and statistics to enable data understanding
– Data Analytics: Create analytics services effective across massive, distributed data
[Diagram: a cross-cutting data ecosystem moving from data stewardship (today) to data analytics (future). Science teams, research, applications, and decision support draw on on-demand algorithms, data-driven analytics, other data systems (in situ, models, etc.), and a massive data science infrastructure (compute, storage, data, software).]

Data Lifecycle Model for NASA Space Missions
Observational Platforms / Flight Computing
– (1) Too much data, too fast; cannot transport data efficiently enough to store
– Emerging Solutions: Onboard Data Products; Onboard Data Prioritization; Flight Computing
Ground-based Mission Systems
– (2) Data collection capacity at the instrument continually outstrips data transport (downlink) capacity
– Emerging Solutions: Low-Power Digital Signal Processing; Data Triage; Exa-scale Computing
Massive Data Archives and Big Data Analytics
– (3) Data distributed in massive archives; many different types of measurements and observations
– Emerging Solutions: Distributed Data Analytics; Advanced Data Science Methods; Scalable Computation and Storage
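
The data triage idea in challenge (2), where collection outstrips downlink capacity, can be illustrated with a small sketch. This is not a JPL flight algorithm; the slides do not specify one. It assumes each candidate data product carries a hypothetical size and a hypothetical onboard-assigned science value, and it uses a simple greedy rule (highest value per megabyte first) to decide what fits into a fixed downlink budget. All names (`DataProduct`, `triage`, the example products and numbers) are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    name: str
    size_mb: float        # hypothetical product size, assumed > 0
    science_value: float  # hypothetical priority score assigned onboard


def triage(products, downlink_budget_mb):
    """Greedy triage: keep the products with the best value per megabyte
    that still fit in the remaining downlink budget."""
    ranked = sorted(
        products,
        key=lambda p: p.science_value / p.size_mb,
        reverse=True,
    )
    kept, used = [], 0.0
    for product in ranked:
        if used + product.size_mb <= downlink_budget_mb:
            kept.append(product)
            used += product.size_mb
    return kept, used


# Illustrative inputs only; none of these values come from a real mission.
products = [
    DataProduct("raw_image", 400.0, 8.0),
    DataProduct("thumbnail", 5.0, 2.0),
    DataProduct("spectrum_summary", 20.0, 6.0),
    DataProduct("full_spectrum", 250.0, 7.0),
]
kept, used = triage(products, downlink_budget_mb=300.0)
for product in kept:
    print(product.name, product.size_mb)
print("downlink used (MB):", used)
```

A greedy ratio rule is only one possible policy; it is cheap enough for constrained flight computing but does not guarantee the best total value, which would require solving a knapsack problem.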

Cross-Cutting Capabilities
Great Opportunities for Methodology Transfer and Collaboration
[Diagram: capabilities shared across planetary science, biology, and healthcare:
– Visualization Techniques
– Intelligent Data Algorithms (Machine Learning, Deep Learning)
– Analytical Data Pipelines
– Common Data Elements & Information Models (discipline and common)
– Big Data Infrastructures (from open source to cloud computing and scalable compute infrastructures)
– International Data Archive and Sharing Architectures
The illustrated example is a TMA image pipeline running on a data infrastructure (data storage, computation, Apache OODT, Apache Hadoop, Apache SOLR): detect nuclei on a whole spot; classify single nuclei into tumor, non-tumor and stained, not-stained; estimate the staining on a whole spot. Results are searched, presented/visualized, and distributed to bioinformatics clients and tools, with links to external data and services (other databases, repositories, services, etc.).]

Future of Data Science at NASA
Enabling a Big Data Research Environment
[Diagram: instrument data systems, a communications network, and other data systems (in situ, other agency, etc.) feed a big data infrastructure that supports data analysis across science areas (Water, Ocean, CO2, Extreme Events, Mars, etc.).]
Reducing Data Wrangling: “There is a major need for the development of software components that link high-level data analysis-specifications with low-level distributed systems architectures.” Frontiers in the Analysis of Massive Data, National Research Council, 2013.

Opportunities and Use Cases Across the Ground Environment
Intelligent Ground Stations
– Emerging Solutions: Anomaly Detection; Combining DSN & Mission Data; Attention Focusing; Controlling False Positives
Intelligent Archives and Knowledge-bases
– Emerging Solutions: Automated Machine Learning - Feature Extraction; Intelligent Search; Learning over time; Integration of disparate data
Intelligent MOS-GDS
– Emerging Solutions: Anomaly Interpretation; Dashboard for Time Series Data; Time-Scalable Decision Support; Operator Training
Data Analytics and Decision Support
– Emerging Solutions: Interactive Data Analytics; Cost Analysis of Computation; Uncertainty Quantification; Error Detection in Data Collection
Technologies: Machine Learning, Deep Learning, Intelligent Search, Data Integration, Interactive Visualization and Analytics

2015-2016 AIST Big Data Study
Study led by JPL for the NASA Advanced Information Systems Technology Program (under Mike Little)
Mapped technology and data needs against the mission-science data lifecycle
Focuses on expansion from data stewardship to data use across the vast data ecosystem (satellite, airborne, in situ)
Basis for 2015 IEEE Big Data workshop on Data and Computational Challenges in Earth Science Research
Key input for 2016 ROSES AIST call (per Mike Little, NASA PM for AIST)

AIST Big Data Study: 10 Year Capability Needs in Big Data
Derived from AIST Big Data Study & NASA Office of the Chief Technologist TA-11 Roadmap (2015)

Planetary Data System
Purpose: To collect, archive and make accessible digital data and documentation produced from NASA’s exploration of the solar system from the 1960s to the present.
Infrastructure: A highly distributed infrastructure with planetary science data repositories implemented at major government labs and academic institutions
– System driven by a well defined planetary science information model
– Over 1 PB of data
Movement towards international interoperability
– Distributed federation of US nodes and international archives
– Being realized through PDS4

(Some) Big Data Challenges in Planetary Science
Variety of planetary science disciplines, moving targets, and data
Volume of data returned from missions including provenance
Federation of disciplines and international interoperability
These factors can affect choices in:
– Data Consistency
– Data Storage
– Computation
– Movement of Data
– Data Discovery
– Data Distribution
Ultimately, having a planetary science information architectural strategy that can scale to support the size, distribution, and heterogeneity of the data is critical
A well formed model that drives the software is something that many groups have struggled with!

PDS4: International Adoption of an Open Planetary Approach
Missions: LADEE (NASA), InSight (NASA), MAVEN (NASA), BepiColombo (ESA/JAXA), OSIRIS-REx (NASA), ExoMars (ESA/Russia), JUICE (ESA), Mars 2020 (NASA); also Hayabusa-2, Chandrayaan-2
Endorsed by the International Planetary Data Alliance in July 2012

Lunar Mapping and Modeling Portal as Data Analytics and Visualization
[Diagram: a layered architecture built on PDS4. PDS & other data systems connect through a standard I/F protocol and standard data model to data management & access; data storage, data processing, and data generation; workflow, web portal, and knowledge base; and tools / services for visualization and distribution/movement, alongside DS tools.]
E. Law, S. Malhotra, G. Chang

Western States Water Mission – Understanding Water Availability
[Figure (notional): estimates with uncertainties of, e.g., CA total usable freshwater (million acre-feet), derived from observations and coupled and validated computer models, plotted against lead times of 1 week, 1 month, 1 season, and 1 year. Regions: Pacific Northwest, Great Basin, Upper Colorado, California, Lower Colorado. Stakeholders and customers: Colorado River Basin (prospective customers).]
12 January 2016

Western States Water Mission (WSWM): A Science/Data Science Collaboration
[Diagram: a scalable data processing system for hydrological science. Inputs: input forcing (e.g., GPM) and data for assimilation (e.g., MODSCAG). Quantities: snow-water equivalent, surface water, ground water. A data science infrastructure (tools, services, methods for massive data analysis) serves research, applications, and decision support through a web-based interface offering standard reports, ad hoc queries and custom reports, single-month estimates, and short and long-term trends.]

WaterTrek
Fusing in-situ, air-borne, space-borne and model generated data using visualization and a big data analytics engine
[Figure labels: user-defined polygon; GPS; in-situ stream gage sensors; river network; SAR derived subsidence; model output; soil moisture.]

Methodology Transfer in Data Science from Planetary & Earth to Biomedicine

NASA/JPL Informatics Center: Crossing Disciplines to Support Scientific Research
Development of an advanced Knowledge System to capture, share and support reproducible analysis for biomarker research
Genomics, Proteomics, Imaging, etc. types of data
NASA-NCI partnership, leveraging informatics and data science technologies from planetary and Earth science
Reproducible, Big Data Systems for exploring the universe
Software and data science methodology transfer
Presented informatics collaboration at a congressional briefing in October 2015

Early Detection Research Network: Finding Cancer Biomarkers
A comprehensive infrastructure to support biomarker data management across EDRN’s distributed cancer centers
– A national data sharing architecture
– Data Integration
– Information model for cancer biomarkers following the PDS4 approach
– Development of data analytic pipelines
– Shared open source software capabilities
Integration of data across the EDRN (biomarkers, specimens, protocols, biomarker data, publications) including:
– Data from over 100 research labs; multiple organs
– 800 data elements
– 900 biomarkers captured
– 200 protocols of study
– 1500 publications
– Multiple terabytes of data from biomarker studies
http://cancer.gov/edrn

Example of Data Science Capabilities in Cancer Research from NASA
Overall Architecture
[Diagram: 
– Local Laboratories (CBRG funded labs: EDRN, MCL, ...): specimens, instrument operations
– Science Data Processing: automated pipelines; complex workflows; scaleable algorithms; computational -omics; auto feature detection; auto curation
– Curation of data from studies, other science data, etc. (collaborators)
– Analysis Team: local algorithmic processing
– Big Data Biology infrastructure: scaleable computation; cloud, HPC, etc.
– Big Data Bioinformatics Tools: OODT; Tika; Hadoop; Solr; on-demand algorithms; data fusion methods; machine learning
– Outputs: published data sets and scientific results for the science community and the bioinformatics community
Partners: JPL, Dartmouth, Caltech]

Description: Detecting objects from astronomical measurements by evaluating light measurements in pixels using intelligent software algorithms.
Image Credit: Catalina Sky Survey (CSS), of the Lunar and Planetary Laboratory, University of Arizona, and Catalina Realtime Transient Survey (CRTS), Center for Data-Driven Discovery, Caltech.
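
As a rough illustration of what "evaluating light measurements in pixels" can mean, the sketch below thresholds an image against a robust background estimate and groups bright neighboring pixels into detected objects. It is a deliberately simple stand-in, not the CSS/CRTS or EDRN pipeline, which the slides do not describe. The function name `detect_sources`, the threshold factor `k`, and the tiny test image are all assumptions made for this example.

```python
from statistics import median


def detect_sources(image, k=5.0):
    """Find groups of connected pixels brighter than background + k * sigma.

    Returns (row_centroid, col_centroid, flux) tuples in row-major scan order.
    Background is the median pixel value; sigma is estimated from the median
    absolute deviation (MAD), scaled by 1.4826 as for Gaussian noise.
    """
    pixels = [value for row in image for value in row]
    background = median(pixels)
    sigma = 1.4826 * median(abs(value - background) for value in pixels)
    threshold = background + k * sigma

    rows, cols = len(image), len(image[0])
    seen, sources = set(), []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or image[r][c] <= threshold:
                continue
            # Flood fill over 4-connected neighbors above the threshold.
            stack, members = [(r, c)], []
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                members.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and (ny, nx) not in seen and image[ny][nx] > threshold:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            # Background-subtracted flux and flux-weighted centroid.
            flux = sum(image[y][x] - background for y, x in members)
            cy = sum(y * (image[y][x] - background) for y, x in members) / flux
            cx = sum(x * (image[y][x] - background) for y, x in members) / flux
            sources.append((cy, cx, flux))
    return sources


# Synthetic 5x5 image: noisy background near 10, one 3-pixel blob, one point source.
image = [
    [9, 11, 10, 9, 11],
    [10, 50, 52, 11, 9],
    [11, 51, 9, 10, 10],
    [9, 10, 11, 10, 90],
    [10, 9, 11, 11, 9],
]
sources = detect_sources(image)
for cy, cx, flux in sources:
    print(f"centroid=({cy:.2f}, {cx:.2f}) flux={flux}")
```

The same threshold-and-group pattern applies whether the pixels come from a sky survey or a pathology slide, which is the kind of methodology transfer the talk describes; production systems would add calibration, point-spread-function modeling, and learned classifiers on top.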

Description: Detecting objects from oncology images using intelligent software algorithms transferred to and from space science.
Image Credit: EDRN Lung Specimen Pathology image example, University of Colorado

Sep 22, 2016

Other Partnerships
DOE/ESGF, L. Cinquini, JPL
DARPA/Memex, C. Mattmann, JPL
SPAWAR/Data Science for C4CSI, L. Deforrest, JPL
NSF/DIBBS, A Talukder, UNC, G. Djorgovski, Caltech, D. Crichton, JPL

Driving Forward

Caltech-JPL Partnership in Data Science
Center for Data-Driven Discovery on campus / Center for Data Science and Technology at JPL
From basic research to deployed systems
10 collaborations
– Leveraged funding from JPL to Caltech; from Caltech to JPL
Virtual Summer School (2014) has seen over 25,000 students

Example University Partnerships

Recommendations
Use the Mission-Science Data Lifecycle to organize Big Data at NASA.
– From flight computing to data analytics.
Enable use and data analytics for the community.
– Promote data ecosystems for sharing data.
– Support international partnerships.
Explore opportunities for methodology transfer.
– Across SMD
– With other agencies
– Focused around open source
Establish multi-disciplinary teams between science/discipline experts, computer science/data science.
[Cartoon captions: "What do we do with all this data?" and "This is looking like a black hole – but wait, there’s light at the end of the tunnel!"]

References
Frontiers in Massive Data Analysis, NRC, 2013
NASA OCT Technology Roadmap, NASA, 2015
NASA AIST Big Data Study, NASA/JPL, 2016
IEEE Big Data Conference, Data and Computational Science Big Data Challenges for Earth Science Research, IEEE, 2015
IEEE Big Data Conference, Data and Computational Science Big Data Challenges for Earth and Planetary Science Research, IEEE, 2016
Planetary Science Informatics and Data Analytics Conference, April 2018

Questions?

The Role of Open Source in Big Data Infrastructures
Open source is an excellent vehicle for collaborations in big data across the science community
– Great opportunities for sharing software frameworks and tools
JPL has been involved in the Apache Software Foundation for several years and helped launch Apache in Science.
– JPLers are committers on several Apache projects

Common Big Data Challenges
Defining the data lifecycle for different domains in science, engineering, business
Capturing well-architected and curated data repositories
Enabling access and integration of highly distributed, heterogeneous data
Developing novel statistical approaches for data preparation, integration and fusion
Supporting analysis and computation across highly distributed data environments and silos
Developing mechanisms for identifying and extracting interesting features and patterns
Developing methodologies for validating and comparing predictive models vs. measurements
Methods for visualizing massive data
SPACE TECHNOLOGY RESEARCH GRANTS PROGRAM, Feb 2017
