ERDC/ITL SR-19-2

ERDC Technology Transfer and Infusion/Knowledge Management

Google Search Appliance End of Life and Replacement Recommendations

Information Technology Laboratory

Byron M. Garton, Jonathan S. Broderick, and Brandon K. Randle

Approved for public release; distribution is unlimited.

June 2019

ERDC Technology Transfer and Infusion/Knowledge Management
ERDC/ITL SR-19-2
June 2019

Google Search Appliance End of Life and Replacement Recommendations

Byron M. Garton, Jonathan S. Broderick, and Brandon K. Randle
Information Technology Laboratory
U.S. Army Engineer Research and Development Center
3909 Halls Ferry Road
Vicksburg, MS 39180

Final Report

Approved for public release; distribution is unlimited.

Prepared for Headquarters, U.S. Army Corps of Engineers
Washington, DC 20314-1000

Abstract

Knowledge management at the Engineer Research and Development Center (ERDC) relies on a search technology provided by Google. Known as the Google Search Appliance (GSA), it indexes and searches ERDC’s accumulation of knowledge stored on various web-connected systems. The GSA provides a familiar and simple-to-use interface for quickly locating and retrieving ERDC knowledge stored on these systems, which are located on ERDC’s internal and extranet websites. In 2016, Google announced the discontinuation of the GSA product at the end of March 2019.

This document details the investigation into potential GSA replacement options and the recommended actions to be taken to minimize the impact of the March 2019 sunset. Emphasis is placed on equivalent or enhanced feature sets, ease of installation and migration, and the costs associated with installation, migration, and maintenance.

DISCLAIMER: The contents of this report are not to be used for advertising, publication, or promotional purposes. Citation of trade names does not constitute an official endorsement or approval of the use of such commercial products. All product names and trademarks cited are the property of their respective owners. The findings of this report are not to be construed as an official Department of the Army position unless so designated by other authorized documents.

DESTROY THIS REPORT WHEN NO LONGER NEEDED. DO NOT RETURN IT TO THE ORIGINATOR.

Contents

Abstract
Figures and Tables
Preface
Acronyms and Abbreviations
1 Introduction
  1.1 Background
  1.2 Objectives
  1.3 Approach
  1.4 Scope
2 Replacement Options
  2.1 Mindbreeze
  2.2 Yippy
  2.3 Lucidworks
3 Requirements Analysis
4 Cost Analysis
5 Conclusions and Recommendations
References
Report Documentation Page

Figures and Tables

Figures

Figure 1. The Discover ERDC landing page search box performs searches on the GSA.
Figure 2. Accessing the GSA directly is also a method of performing searches.
Figure 3. Search results are presented in the familiar Google search style.
Figure 4. Mindbreeze InSpire search appliance.
Figure 5. Yippy search appliance.
Figure 6. Lucidworks fusion server.

Tables

Table 1. Level of satisfaction of requirements.
Table 2. Initial and recurring costs analysis.

Preface

This research was conducted for the Engineer Research and Development Center (ERDC) Office of Research and Technology Transfer (ORTT) utilizing 219 CDR funding for “ERDC Technology Transfer and Infusion/Knowledge Management.” The technical monitor was Ms. Antisa Webb.

The work was performed by the Scientific Software Branch (SSB) of the Computational Science and Engineering Division (CSED), U.S. Army Engineer Research and Development Center – Information Technology Laboratory (ERDC-ITL). At the time of publication, Mr. Timothy Dunaway was Chief, CEERD-SSB; and Dr. Jerrell R. Ballard was Chief, CEERD-IES. The Technical Director for Engineered Resilient Systems was Dr. Robert M. Wallace. The Deputy Director of ERDC-ITL was Ms. Patti S. Duett, and the Director was Dr. David A. Horner.

COL Ivan P. Beckman was the Commander of ERDC, and Dr. David W. Pittman was the Director.

Acronyms and Abbreviations

Term    Definition
API     Application Programming Interface
CSED    Computational Science and Engineering Division
DoD     Department of Defense
DISA    Defense Information Systems Agency
eMASS   Enterprise Mission Assurance Support Service
ERDC    Engineer Research and Development Center
FTE     Full Time Equivalent
HPC-MP  High Performance Computing Modernization Program
ITL     Information Technology Laboratory
ORTT    Office of Research and Technology Transfer
PKI     Public Key Infrastructure
RDE     Research and Development Environment
RMF     Risk Management Framework
SAML    Security Assertion Markup Language
SSO     Single Sign On
STIG    Security Technical Implementation Guide
YSA     Yippy Search Appliance

1 Introduction

1.1 Background

The Engineer Research and Development Center (ERDC) has embarked on several knowledge management initiatives over the years in an effort to make the accumulation of knowledge easier to catalog, locate, and share. Knowledge cataloging and locating initiatives have focused on a combination of technologies, including enterprise search. Historically, the industry leader in enterprise search has been Google. In 2002, Google introduced a rack-mounted computing device containing its prized search technology to the enterprise search market. Named the Google Search Appliance (GSA), the device was intended to put the power of Google’s indexing and search technology in the hands of enterprises around the world.

When a customer purchases a GSA, Google delivers a physical rack-mounted server for installation within the customer’s enterprise network. Sales are conducted through licensing, which typically includes a maximum number of stored indexes and begins with a two-year contract for maintenance, technical support, and software updates. Three-year contracts are also available. Out of the box, the search appliance contains Google’s ubiquitous user-facing search interface and highly acclaimed indexing software. The search interface can be customized to better match the enterprise’s look and feel by changing colors, logos, etc.

Since its release, the GSA has been adopted by many private and public organizations, including ERDC. The appeal of the GSA to ERDC was the ability to index and search organizational data such as website content, documents, photos, and videos within ERDC’s internal Research and Development Environment (RDE) network. The GSA’s ability to perform these functions using Google’s world-renowned web crawling and search algorithms and its simple integration into existing knowledge management applications are its most appealing attributes.

The GSA has been indexing and serving search results to ERDC intranet users since 2010, when it was installed and integrated into the RDE network. Thirty servers on the ERDC RDE network, including intranet, extranet, and knowledge management websites (e.g., internal and external wikis), are currently indexed by the GSA. Users access the GSA by typing a search string into the search box at the top of the Discover ERDC landing page or by directly accessing the search appliance home page. Figures 1 through 3 show these methods of accessing the GSA from a knowledge consumer’s perspective.

The GSA has served ERDC knowledge management efforts well since its inception. Unfortunately, Google announced the discontinuation of the GSA in February 2016. End of life is scheduled for March 2019 (Google Search Appliance End of Life n.d.), and no new licenses have been issued since 2016. Google’s new approach to enterprise search leans heavily on cloud computing, and the company has advised current GSA customers to adopt the new cloud technology. Indexing search results in a cloud environment introduces a host of security and regulatory issues outlined by numerous Security Technical Implementation Guides (STIGs) written and disseminated to Department of Defense (DoD) entities by the Defense Information Systems Agency (DISA).

Figure 1. The Discover ERDC landing page search box performs searches on the GSA.

Figure 2. Accessing the GSA directly is also a method of performing searches.

Figure 3. Search results are presented in the familiar Google search style.

1.2 Objectives

Several private industry companies have recognized the security and regulatory issues that governments and their agencies face when preparing to replace their GSAs and are offering competing products to solve those issues. This document explores these products with emphasis placed on equivalent or enhanced feature sets, ease of installation and migration, and costs associated with migration and maintenance.

1.3 Approach

Research into GSA replacement search technologies was performed online using standard web search methods. When a potential candidate was located, information was gathered from the candidate’s website, and in some instances the candidate’s technical support was contacted to further clarify the information.

Following the information gathering phase, each potential candidate’s information was analyzed to determine whether the technology met specified requirements in the areas of functionality and cost. Each technology was given a satisfaction rating based on how completely the specified requirements were satisfied. A recommended course of action was selected following the requirements and cost analyses.

1.4 Scope

The purpose of this study was to determine the best course of action to take to replace the GSA prior to its end of life. Online research revealed that a plethora of search technologies exists on the private market, but the research was focused on technologies that most closely replicate the existing functionality of the GSA. From a larger list of potential candidate technologies, a condensed list of three technologies was selected that could most likely meet all the specified requirements.

2 Replacement Options

Now that the GSA is nearing end of life, several companies are hoping to provide solutions to entities where security and regulatory concerns are prominent and cloud search technologies are not viable. For example, Google’s cloud search service would require the ERDC-RDE to allow access to internal content from the Internet, which is a violation of security protocols.

Candidate systems must meet certain baseline requirements in the areas of functionality, ease of installation, long-term viability, initial and recurring costs, and maintenance. A GSA replacement system must satisfy the following baseline requirements:

1. Hardware must meet or exceed current GSA hardware capabilities.
2. Installation of the replacement system must be easily achieved by existing ERDC personnel or contracted labor from the product vendor.
3. The system must be able to index internal and extranet content on ERDC servers from within the RDE network.
4. The system must support existing RDE authentication methods for knowledge consumers and provide a method for authentication while indexing content on servers that require authentication prior to access.
5. The system must provide a customizable user interface for searching the indexed data and must output results in an easy-to-use and understandable format.
6. The system must not limit the size or number of indexes that can be made, or it must provide simple methods to increase the number of allowable indexes in order to scale and satisfy long-term viability requirements.
7. Software must allow customizability in order to meet unforeseen future requirements through the use of open source software or timely customization from the product vendor.
8. The system must be able to acquire security certifications for installation on the RDE network prior to GSA end of life.
9. The vendor must provide responsive and timely technical support in case problems arise.

Online research was conducted to identify companies and products that could potentially meet these baseline requirements. Each company identified was contacted and asked to provide information related to the baseline requirements. The following sections detail the collected information for each company.

2.1 Mindbreeze

Mindbreeze GmbH is an enterprise search technology company based in Linz, Austria. Founded in 2005, Mindbreeze has positioned itself to become a global leader in knowledge management and machine learning and is placed in the Leaders Quadrant of Gartner’s 2018 Magic Quadrant for Insight Engines (Ulrike 2018). In 2018, KMWorld included Mindbreeze in its list of 100 Companies that Matter in Knowledge Management (Ulrike 2018). Mindbreeze has earned this honor nine consecutive times.

Figure 4. Mindbreeze InSpire search appliance.

The Mindbreeze InSpire search appliance is a server-based search technology that is very similar to the GSA. It is a physical device that is rack-mounted and connected to the network, and installation of the appliance can be performed by ERDC personnel or contracted labor from the product vendor.

InSpire has been developed and marketed as a direct replacement for the GSA, and much effort has been put into making migration from the GSA to InSpire as simple as possible. All of the current GSA functionality has been replicated in the software that runs on InSpire, and additional connectors and authentication protocols beyond those provided by the GSA are included. Mindbreeze also provides a GSA migration tool that copies the configuration settings from a GSA, which reduces the amount of setup and configuration work. Search results are displayed in a format very reminiscent of Google search results, and the results template can be customized by the customer to better integrate into their environment.

InSpire is coded to work with authentication methods currently deployed on the ERDC-RDE network, including Public Key Infrastructure (PKI) and Single Sign On (SSO) utilizing Security Assertion Markup Language (SAML). These authentication methods can be used to access the search functionality or during indexing of data stores that are access restricted.

Over 450 types of data connectors are included in the software, which allows the device to connect to nearly every type of data source.

The InSpire search appliance software is sold under a subscription model very similar to the GSA model. Different levels of subscription service are defined based on the number of indexes the customer wishes to store, with the price increasing as the number of indexes increases. Packages start at 500,000 indexes and go up to 10,000,000 indexes; above 10,000,000 indexes, additional hardware must be purchased. Appliances can be connected in a daisy-chain configuration to increase processing and indexing capabilities.

Basic technical support comes standard with the purchase of an InSpire appliance. Additional technical support can be purchased, including a 24x7x365 remote access option for instant support over the Internet. Mindbreeze also provides on-site installation and training courses as additional support options. These options come with additional costs.

2.2 Yippy

Yippy Inc. is an enterprise search technology company based in Marco Island, FL. Founded in 2009, Yippy has evolved from its educational roots into a leading provider of search and eDiscovery technologies for all types of consumers of data. Yippy’s search technology was born out of Carnegie Mellon and was purchased by the company in 2010. Yippy includes search software known as Velocity by Vivisimo, which was acquired by IBM in 2012 and renamed IBM Watson Explorer (Granville n.d.).

Yippy has developed a server-based search appliance called the Yippy Search Appliance (YSA) that is very similar to the Mindbreeze InSpire and the GSA. It is also a physical device that is rack-mounted and connected to the network. Installation of the appliance can be performed by ERDC personnel or contracted labor from the product vendor.

Figure 5. Yippy search appliance.

As a relative newcomer to the market, Yippy has also developed and marketed the YSA as a direct GSA replacement. The YSA has all the same features as the GSA and InSpire and runs on a similar subscription model. The required authentication methods are integrated, and the search interface is customizable. A multitude of data connectors are included to connect to all required data sources.

The main difference between the YSA and both the GSA and InSpire is that the YSA utilizes IBM’s Watson artificial intelligence to perform natural language searches. With IBM as Yippy’s third largest shareholder, IBM’s vast amount of continuing research into artificial intelligence provides the ability to quickly combine data with natural language search.

Yippy also provides technical support in a manner very similar to Mindbreeze, with basic support supplied with appliance purchase and additional support, including on-site installation and training, available at additional cost.

2.3 Lucidworks

Lucidworks is an enterprise search technology company based in San Francisco, CA. Originally founded in 2007 as Lucid Imagination, Lucidworks offers an application development platform, including a machine learning and signal processing engine built on top of the open source Apache Solr search platform, that promises to allow corporations to “translate massive pools of data into actionable insights faster than ever” (Carney 2014) (Figure 6).

Figure 6. Lucidworks fusion server.

Lucidworks has taken a different approach to tackling enterprise search and knowledge management. Rather than develop an appliance-based search solution, the company has leveraged open source software from Apache and built enterprise-level capabilities on top of it. Lucidworks Fusion Server is the company’s enterprise search software that performs data indexing and provides an Application Programming Interface (API) for access to those indexes. Open source software typically does not come with a price or subscription, and this is the case with Lucidworks Fusion Server. The software is freely downloadable from the company’s website.

Since the solution is software only, servers must be provided in-house to run the application. Installation and configuration of the software on in-house hardware will also require in-house labor, as no on-site installation services from Lucidworks could be located. This is expected since open source applications typically do not come with warranties or support.

In addition to hardware requirements, Lucidworks Fusion Server requires custom programming to function as required. Once the application is installed and configured, it can begin indexing data sources within the RDE. Those indexes are stored within the application’s database and are accessed through the application’s API. The API simply provides an entry point into the data, not a fully functioning search page for end users. A custom search page will have to be developed in-house to connect to the API entry points, retrieve the indexed items from the database, and present them to the user performing the search.
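The custom search page described above would need, at a minimum, to build a query request, parse the response, and format the hits for display. The following Python sketch illustrates those three steps under stated assumptions: it targets the standard /select request handler and JSON response layout of Apache Solr (the platform on which Fusion Server is built) rather than any Fusion-specific endpoint, and the host name, collection name, and document fields are hypothetical. A canned response is used so the sketch runs without a server.

```python
import json
from urllib.parse import urlencode

# Hypothetical host and collection names; neither comes from the report.
SOLR_BASE = "http://search.example.org:8983/solr"
COLLECTION = "erdc_knowledge"


def build_select_url(query, rows=10, start=0):
    """Build a URL for Solr's standard /select request handler."""
    params = {"q": query, "rows": rows, "start": start, "wt": "json"}
    return f"{SOLR_BASE}/{COLLECTION}/select?{urlencode(params)}"


def extract_hits(response_text):
    """Pull the total hit count and the documents from a Solr JSON response."""
    body = json.loads(response_text)["response"]
    hits = [
        {"id": doc.get("id"), "title": doc.get("title", "(untitled)")}
        for doc in body["docs"]
    ]
    return body["numFound"], hits


def render_results(total, hits):
    """Format the hits as plain text lines for a simple results page."""
    lines = [f"{total} result(s)"]
    lines += [f"- {hit['title']} [{hit['id']}]" for hit in hits]
    return "\n".join(lines)


# Canned response in Solr's JSON layout (assumed field names "id" and "title").
SAMPLE_RESPONSE = json.dumps({
    "responseHeader": {"status": 0},
    "response": {
        "numFound": 2,
        "start": 0,
        "docs": [
            {"id": "doc-1", "title": "Levee inspection guide"},
            {"id": "doc-2"},
        ],
    },
})

if __name__ == "__main__":
    print(build_select_url("levee inspection"))
    total, hits = extract_hits(SAMPLE_RESPONSE)
    print(render_results(total, hits))
```

A production search page would also have to handle authentication, paging, and access-restricted results, which is part of the in-house development effort noted above.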

Lucidworks Fusion Server is coded to support PKI and SSO authentication protocols. Since it is an open source application, it can be highly customized to work with any protocol the RDE may support in the future. There should be no issues with accessing authentication-protected content during data indexing. A wide array of data connectors is also preloaded to allow connections to numerous types of data sources. The number of indexes that can be stored is essentially unlimited. The only limitation is the amount of physical storage space on the host server or its network-attached storage.

3 Requirements Analysis

Each of the three potential enterprise search solutions listed previously was analyzed independently to identify its level of satisfaction of ERDC’s knowledge management requirements. Table 1 illustrates the level of satisfaction of requirements for each solution.

Table 1. Level of satisfaction of requirements.

Columns: Requirement | Mindbreeze InSpire | Yippy Search Appliance | Lucidworks Fusion Server

Requirements rated:
- Meets or exceeds current GSA functionality
- Installation achieved by ERDC personnel or vendor
- Indexes internal and extranet content on RDE network
- Supports RDE authentication protocols
- Customizable end user search interface
- Allowable number of indexes is scalable
- Customizable to meet future requirements
- Responsive and timely technical support available
- Security certification achievable prior to GSA end of life

Rating levels: High Satisfaction, Moderate Satisfaction, Low Satisfaction

All three solutions meet almost all of the defined requirements. Since Lucidworks Fusion Server does not provide a search page for end users by default, it only partially satisfies the requirement of meeting or exceeding current GSA functionality. Additionally, Lucidworks Fusion Server is open source software and does not satisfy the requirement of providing responsive and timely technical support.

The major point of interest during the requirements analysis is that a solution must be able to acquire the necessary security certifications to operate on the ERDC-RDE network prior to the GSA end of life in March 2019. This requirement is essential to the successful replacement of the GSA and the continuity of enterprise knowledge management across ERDC. The RDE operates under the umbrella of U.S. Army network security requirements. The current method of determining the net worthiness of an information system operating on an Army network is successful completion of the Risk Management Framework (RMF) process. This process is facilitated by the Enterprise Mission Assurance Support Service (eMASS), which is administered by DISA. The eMASS manages and facilitates the acquisition of an RMF certification for an information system by identifying various security controls and requiring those controls to be fully satisfied by the information system owner prior to certification. Typically, this process takes up to twelve months to complete.

None of the potential solutions currently has an RMF certification, and since end of life for the GSA is less than twelve months away, a newly started RMF process could not be completed in time to meet the RMF requirement. Fortunately, the RMF process for the Mindbreeze InSpire appliance was begun in February 2018 by the High Performance Computing Modernization Program (HPC-MP) as part of an independent knowledge management initiative within that program. When received, that certification can be used to certify a Mindbreeze appliance on the RDE network, thereby satisfying the RMF requirement. While it is possible that the Yippy Search Appliance and Lucidworks Fusion Server are currently being RMF certified under other programs, no evidence supporting that could be found.
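The timing argument above can be checked with simple date arithmetic. The sketch below is illustrative only: it assumes the worst case of a full twelve-month RMF process, treats the March 2019 end of life as the first day of that month, and uses a hypothetical June 2018 start date to represent a process begun at the time of this study.

```python
from datetime import date

GSA_END_OF_LIFE = date(2019, 3, 1)  # March 2019 per the report; day of month assumed
RMF_DURATION_MONTHS = 12            # "up to twelve months", taken as the worst case


def add_months(start, months):
    """Return the first day of the month that falls `months` after `start`."""
    index = start.year * 12 + (start.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def finishes_before_eol(rmf_start):
    """True if a worst-case RMF process ends no later than the GSA end of life."""
    return add_months(rmf_start, RMF_DURATION_MONTHS) <= GSA_END_OF_LIFE


if __name__ == "__main__":
    # HPC-MP began the Mindbreeze InSpire process in February 2018.
    print(finishes_before_eol(date(2018, 2, 1)))  # True
    # Hypothetical process started in June 2018.
    print(finishes_before_eol(date(2018, 6, 1)))  # False
```

Under these assumptions, only a process started by March 2018 would finish in time, which is why the existing HPC-MP effort matters to the recommendation.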

4 Cost Analysis

When selecting an enterprise search solution, it is essential to analyze the costs associated with successful implementation. Costs come in the form of acquisition, subscription, installation, configuration, certification, training, support, and maintenance. Table 2 enumerates these costs for each of the three potential search solutions.

Table 2. Initial and recurring costs analysis.

Cost | Mindbreeze InSpire | Yippy Search Appliance | Lucidworks Fusion Server
Acquisition | $68K | $35K | $0
Subscription | Included in acquisition (3 years) | $35K (annually after year 1) | $0
Installation | *$4K | $5K | *$4K
Configuration | Included in acquisition | $5K | *$50K
Certification | *$200K | *$300K | *$300K
Training | $4K | $5K | Not available
Support | Included in acquisition | Included in acquisition | Not available
Maintenance | Included in acquisition | Included in acquisition | *$200K (annual)
Initial Total Cost | $276K | $350K | $554K
Recurring Costs | $68K subscription renewal every 3 years | $35K subscription renewal annually | *$200K annual maintenance

* Estimated in-house labor is one Full Time Equivalent (FTE) at DB4 level.

Despite an initial and subscription cost of zero dollars, Lucidworks Fusion Server has very high initial and recurring costs due to the extensive amount of development, customization, and maintenance that is required to make and keep it functional. Because the RMF process for Mindbreeze InSpire has already begun within the HPC-MP, the certification costs for that system are significantly lower than those of the other solutions.

Initial, subscription, and recurring costs for the Mindbreeze InSpire and Yippy Search Appliance are very similar. They are competing products, so this is expected. The main difference between the two is that installation of the Mindbreeze InSpire appliance is not provided by the vendor and must be conducted with in-house labor. Yippy provides on-site installation by a vendor-provided technician at a slightly higher cost. Both vendors provide training for appliance administrators at a very competitive additional cost.
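As a cross-check of Table 2, the short Python sketch below sums the itemized initial costs for each product and spreads each recurring cost evenly over its renewal interval. Figures are in thousands of dollars as listed in the table. Treating "included in acquisition" and "not available" items as zero, and annualizing the three-year Mindbreeze renewal, are simplifying assumptions made here for comparison, not part of the original analysis.

```python
# Itemized initial costs from Table 2, in thousands of dollars.
# Items listed as "included in acquisition" or "not available" are omitted (zero).
INITIAL_COSTS = {
    "Mindbreeze InSpire": {
        "acquisition": 68, "installation": 4, "certification": 200, "training": 4,
    },
    "Yippy Search Appliance": {
        "acquisition": 35, "installation": 5, "configuration": 5,
        "certification": 300, "training": 5,
    },
    "Lucidworks Fusion Server": {
        "installation": 4, "configuration": 50, "certification": 300,
        "maintenance": 200,
    },
}

# Recurring cost (thousands of dollars) and renewal interval (years) from Table 2.
RECURRING = {
    "Mindbreeze InSpire": (68, 3),
    "Yippy Search Appliance": (35, 1),
    "Lucidworks Fusion Server": (200, 1),
}


def initial_total(product):
    """Sum the itemized initial costs for one product."""
    return sum(INITIAL_COSTS[product].values())


def annualized_recurring(product):
    """Spread the recurring cost evenly over its renewal interval."""
    cost, interval_years = RECURRING[product]
    return cost / interval_years


if __name__ == "__main__":
    for product in INITIAL_COSTS:
        print(f"{product}: initial {initial_total(product)}K, "
              f"recurring about {annualized_recurring(product):.1f}K per year")
```

The sums reproduce the Initial Total Cost row (276K, 350K, and 554K), and the annualized figures of roughly 22.7K, 35K, and 200K per year show how the recurring costs compare on an equal footing.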

5 Conclusions and Recommendations

The enterprise search market has many vendors and many different approaches to solving the same problem. Multiple companies and products other than the three enumerated in this report were researched and considered. After a requirements analysis of the various search solutions, these three were chosen because they most closely aligned with ERDC knowledge management requirements.

Each of the three selected potential search solutions meets nearly all the requirements, but the security certification acquisition requirement was a key factor in selecting a recommendation. Mindbreeze InSpire has a clear advantage because an RMF process has already been started in an effort to have it certified on the HPC network, and that certification can be leveraged to certify the net worthiness of the appliance on the RDE network. Additionally, there is already some Mindbreeze expertise within HPC that RDE can leverage to reduce the configuration effort. Recurring costs associated with Mindbreeze are slightly lower than those of the other solutions, which is a great benefit to ERDC.

It is recommended that ERDC purchase and install two Mindbreeze InSpire search appliances to successfully replace the GSA prior to its end of life. Two appliances should be purchased so that one can serve as a hot backup to fail over to in case the other encounters issues that take it offline. The cost enumerated in the previous section reflects the cost of the two appliances. Collaborating with Mindbreeze technical support during requirements gathering and configuration is also recommended to help facilitate the successful implementation of the system on the RDE network.

ERDC knowledge management is essential to maintaining successful execution of programs, and this recommendation should be carried out as soon as possible by the ERDC Office of Research and Technology Transfer. Carrying out this recomme

