UvA-DARE (Digital Academic Repository) SMTP: Stedelijk .


UvA-DARE (Digital Academic Repository)

SMTP: Stedelijk Museum Text Mining Project
Smeets, J.; Scholtes, J.C.; Rasterhoff, C.; Schavemaker, M.

Publication date: 2016
Document Version: Final published version
Published in: Digital Humanities 2016

Citation for published version (APA):
Smeets, J., Scholtes, J. C., Rasterhoff, C., & Schavemaker, M. (2016). SMTP: Stedelijk Museum Text Mining Project. In W. Eder, & J. Rybicki (Eds.), Digital Humanities 2016: Conference abstracts: Jagiellonian University & Pedagogical University, Kraków, 11-16 July 2016 (pp. 683-685). European Association for Digital Humanities [etc.]. http://dh2016.adho.org/abstracts/270

General rights
It is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an open content license (like Creative Commons).

Disclaimer/Complaints regulations
If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library: https://uba.uva.nl/en/contact, or a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible.

UvA-DARE is a service provided by the library of the University of Amsterdam (https://dare.uva.nl)
Download date: 15 Jun 2021

The European Association for Digital Humanities (EADH)
Association for Computers and the Humanities (ACH)
Canadian Society for Digital Humanities / Société canadienne des humanités numériques (CSDH/SCHN)
centerNet
Australasian Association for Digital Humanities (aaDH)
Japanese Association for Digital Humanities (JADH)

Digital Humanities 2016
Conference Abstracts
Jagiellonian University & Pedagogical University
Kraków, 11–16 July 2016

Kraków 2016

SMTP: Stedelijk Museum Text Mining Project

Jeroen Smeets
smeetsjeroen@hotmail.com
Maastricht University, Netherlands, The

Johannes C. Scholtes
Maastricht University, Netherlands, The

Claartje Rasterhoff
C.Rasterhoff@uva.nl
CREATE, University of Amsterdam, Netherlands, The

Margriet Schavemaker
M.Schavemaker@stedelijk.nl
Stedelijk Museum Amsterdam, Netherlands, The

Introduction

This paper addresses how text-mining, machine learning and information retrieval algorithms from the field of artificial intelligence can be used to analyze Art Research archives and conduct (art-) historical research. To gain quick insight into the archive, two aspects are focused on: relations between groups of people using community detection, and global content changes over time using topic modeling. For such archives, pre-tagged ground-truth collections are generally not available, and the archives are often too large, too geographically distributed, and too rarely available in digital formats to build such a ground truth at reasonable cost. To develop and test the validity and relevance of existing tools, close collaboration was established between the AI researchers, museum staff, and researchers in CREATE, a digital humanities project that investigates the development of cultural industries in Amsterdam over the course of the last five centuries.

Data

The research draws on two datasets. The principal dataset is the digitized archive of the Stedelijk Museum Amsterdam, a renowned international museum dedicated to modern and contemporary art and design. The archive contains documents from the period 1930-1980. The corpus is a static collection of approximately 160,000 text documents that were digitized using OCR. The second dataset is drawn from Delpher, developed by the Koninklijke Bibliotheek Nederland (2015). Delpher provides a collection of digitized newspapers, books and magazines that is available for research. A selection of newspapers was made that is used as an additional dataset for this project. Only articles from 1930-1980 that resulted from the query "Stedelijk Museum" AND "Amsterdam" were used, forming a set of 18,290 articles.

Methodology

The following methodology uses two approaches to obtain a quick and detailed overview of the content of a digitized archive that contains unstructured information. The first one focuses on the relations between named entities and aims at finding communities in the relation network. The second approach uses time based topic modeling to get an overview of content changes over time. Finally, a name extraction method is presented that is able to handle multiple causes of name variations.

Relation networks and community detection

In its most basic form, a relation between two named entities can be said to exist when they occur together in the same document. The strength of a relation can be characterized by the number of documents in which both named entities occur. When all the co-occurrences are found, a relation network can be constructed.

In addition, sentiment analysis can be done to further characterize a relation. A sentiment score is assigned to each document, indicating the sentiment content of the document. No distinction is made between positive and negative sentiment polarity. The hypothesis is that relations between individuals with a high sentiment are more interesting than relations with a low sentiment. This is because sentiments around trigger events are often higher than around everyday events. A lexicon based approach is used with lists of language specific sentiment words. The sentiment score of a document is then given by the sigmoid of the count of the sentiment words in the document, normalized by the number of words in the document.

Finally, community detection algorithms can be applied to the relation network. These types of algorithms aim at finding clusters of entities that have dense connections between members of the same cluster and sparse connections with members of other clusters (Fortunato, 2010).
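The co-occurrence strength and the document sentiment score described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names, the placeholder entity labels and the tiny lexicon are hypothetical, and applying the sigmoid to the length-normalized count is only one possible reading of the description.

```python
import math
from collections import Counter
from itertools import combinations


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def sentiment_score(tokens, lexicon):
    """Sigmoid of the sentiment-word count divided by the document length.

    Assumption: the normalization happens inside the sigmoid. Polarity is
    ignored, so positive and negative lexicon words count alike.
    """
    if not tokens:
        return sigmoid(0.0)
    hits = sum(1 for token in tokens if token.lower() in lexicon)
    return sigmoid(hits / len(tokens))


def cooccurrence_strengths(doc_entities):
    """Count, per unordered entity pair, the documents that mention both."""
    strengths = Counter()
    for entities in doc_entities:
        # set() ensures a pair is counted at most once per document.
        for a, b in combinations(sorted(set(entities)), 2):
            strengths[(a, b)] += 1
    return strengths


# Hypothetical toy input: one list of extracted names per document.
docs = [
    ["Artist A", "Artist B"],
    ["Artist A", "Artist B", "Artist C"],
    ["Artist C"],
]
strengths = cooccurrence_strengths(docs)
score = sentiment_score(["a", "great", "show"], {"great"})
```

Each key of `strengths` is an edge of the relation network and its value is the relation strength; a graph tool can then take these edges as input for community detection.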
The relation weight measure that is used to calculate the communities is taken as the product of the strength of the relation, i.e. the number of documents in which both entities occur, and the average sentiment score of the documents of a relation. It was found that combining these two measures resulted in more meaningful communities.

Time based Topic Modeling

In the next approach, topic modeling algorithms are applied to analyze the information content and its evolution over time. Topic modeling tries to discover the underlying thematic structure in a collection of documents. Non-Negative Matrix Factorization (NMF) is used as a tool for topic modeling (Arora et al., 2012). NMF is an unsupervised method in which a matrix is approximated by the product of two low-rank non-negative matrices. The extracted semantic feature vectors have only non-negative values and are sparse, so they are easily interpretable. Furthermore, NMF has been shown to generate more consistent results over multiple runs (Choo et al., 2013) than other tools used for topic modeling such as LDA (Blei et al., 2003). The approach suggested in (Vaca et al., 2014), which is used in this project, is a time-based collective matrix factorization based on NMF. It extends NMF by introducing a topic transition matrix that allows topics to be tracked as they emerge, evolve and fade over time.

Name Extraction

The following method was used to extract named entities from a collection of documents in order to build the relation network. It handles different causes of name variations such as OCR induced errors commonly found in digitized document collections, spelling mistakes, name abbreviations and first and last name combinations.

The method makes use of lists of name variations. Starting from a set of names extracted from a name database, such as RKDArtists& (RKD, 2015), the document collection is searched for possible name variations. These variations are found by searching for the last name using a fuzzy search. The similarity between the group of tokens around the found last name and the original name is then calculated as a similarity score. The similarity score calculation is based on the idea described in (Song and Chen, 2007), which uses an n-gram set matching technique. The lists of name variations can then be evaluated manually, or a threshold on the similarity score can be used to identify name variations that correspond to the original name. The method using a threshold of 0.9 on the similarity score was tested on 50 randomly chosen names. The average precision was found to be 81 percent.

Results

A relation network was constructed for the document collection of the archive of the Stedelijk Museum Amsterdam. Only artists with the graphic artist qualification in the RKDArtists& database were used. The methods were implemented using available open source software libraries such as the Apache Lucene text search engine library (The Apache Software Foundation, 2015) and the Gephi platform (Bastian et al., 2009). The standard community detection feature in Gephi was used, which is based on the Louvain method (Blondel et al., 2008). The result is shown in Figure 1. The color of the relation between the nodes indicates the average sentiment score of the relation, ranging from blue (neutral) to red (high sentiment content). Communities such as group exhibitions, art movements or a group of artists closely related to the museum director could be identified with the help of a museum expert.

Figure 1: Found communities for graphic artists in the archive of the Stedelijk Museum

The time based topic modeling algorithm suggested in (Vaca et al., 2014) was implemented in MATLAB and Java. The algorithm was applied to both the archive of the Stedelijk Museum Amsterdam and newspaper articles from the Delpher database. The results are visualized over time in the form of stacked topic rivers (Wei et al., 2010), shown in Figure 2 and Figure 3. Several exhibitions and events could be identified and are annotated on the chart.

Figure 2: Time based topic modeling for the archive of the Stedelijk Museum Amsterdam

Figure 3: Time based topic modeling for Delpher newspaper articles

Conclusion

This paper discusses two approaches to gain insight into a digitized archive. Relation networks of persons with community detection are considered, relying on a robust name extraction method. Furthermore, the evolution of content over time can be explored using time based topic modeling.

For the humanities researchers in this project, the main aim was to assess the research potential of computational analysis of digitized art archives in general, and of the Stedelijk Museum archive in particular. Two types of preliminary research questions were developed to do so. The first type had to do with identifying patterns of change and continuity across time and place. These include for instance tracing the position of the Stedelijk Museum as an intermediary in Dutch design industries, or the development of the Stedelijk Museum as an increasingly international player. The second type of question is less concerned with general historical patterns, and more with specific art-historical research questions, regarding for instance (networks of) particular artists, artworks or exhibitions.

But before we could start asking such questions of digitized art-historical archives, the quality and accessibility of the texts needed to be established. Secondly, specific methods needed to be explored and adapted in order to clean, identify, retrieve, extract, and structure the texts. The first results presented in this paper demonstrate that even though they may not be clean at the first try or capture all historical nuance, they do help archives to open up and show unexpected relationships and patterns, to answer specific questions, and to get connected with other relevant sources, such as RKDartists and Delpher.
The community detection in relation with sentiment mining, the topic modeling and name extraction method developed in this project therefore provide a solid basis for the next step in assessing the research potential of art-historical archives: developing in-depth case studies, again in close collaboration with art-historians and historians, allowing the archive to speak up in unprecedented ways, offering access to hidden story lines that subvert and augment prevailing historical narratives.

Bibliography

Arora, S., Ge, R. and Moitra, A. (2012). Learning topic models - going beyond SVD. Foundations of Computer Science (FOCS), 2012 IEEE 53rd Annual Symposium on. IEEE, pp. 1–10.

Bastian, M., Heymann, S. and Jacomy, M. (2009). Gephi: an open source software for exploring and manipulating networks. ICWSM, 8: 361–62.

Blei, D. M., Ng, A. Y. and Jordan, M. I. (2003). Latent dirichlet allocation. The Journal of Machine Learning Research, 3: 993–1022.

Blondel, V. D., Guillaume, J.-L., Lambiotte, R. and Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10): P10008.

Choo, J., Lee, C., Reddy, C. K. and Park, H. (2013). Utopian: User-driven topic modeling based on interactive nonnegative matrix factorization. Visualization and Computer Graphics, IEEE Transactions on, 19(12): 1992–2001.

Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3): 75–174.

Koninklijke Bibliotheek Nederland (2015). Delpher - Boeken Kranten Tijdschriften http://www.delpher.nl/ (accessed 1 November 2015).

RKD (2015). Netherlands Institute for Art History https://rkd.nl/en/ (accessed 1 November 2015).

Song, S. and Chen, L. (2007). Similarity joins of text with incomplete information formats. Advances in Databases: Concepts, Systems and Applications. Springer, pp. 313–24.

The Apache Software Foundation (2015). Apache Lucene - Welcome to Apache Lucene http://lucene.apache.org/ (accessed 1 November 2015).

Vaca, C. K., Mantrach, A., Jaimes, A. and Saerens, M. (2014). A time-based collective factorization for topic discovery and monitoring in news. Proceedings of the 23rd International Conference on World Wide Web. ACM, pp. 527–38.

Wei, F., Liu, S., Song, Y., Pan, S., Zhou, M. X., Qian, W., Shi, L., Tan, L. and Zhang, Q. (2010). Tiara: a visual exploratory text analytic system. Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 153–62.
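As a supplementary illustration of the thresholded name matching described in the Name Extraction section, the sketch below scores a candidate token group against a database name using the Jaccard overlap of character n-gram sets. This is a stand-in chosen for illustration, not the Song and Chen (2007) technique itself; the function names, the bigram size and the example names are hypothetical, and only the 0.9 threshold comes from the text.

```python
def char_ngrams(text, n=2):
    """Set of character n-grams of the lower-cased text, padded with '#'."""
    pad = "#" * (n - 1)
    padded = f"{pad}{text.lower()}{pad}"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=2):
    """Jaccard similarity between the n-gram sets of two strings."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 1.0
    return len(grams_a & grams_b) / len(union)


def matching_variations(name, candidates, threshold=0.9, n=2):
    """Keep the candidates whose similarity to the name meets the threshold."""
    return [c for c in candidates if ngram_similarity(name, c, n) >= threshold]


# Hypothetical example: a database name and two token groups found near a
# fuzzy last-name hit in the document collection.
kept = matching_variations("Jan Jansen", ["jan jansen", "Piet Pietersen"])
```

A set-based score of this kind tolerates single-character OCR errors in long names better than exact matching, but short names lose a large share of their n-grams to one wrong character, so the threshold would need tuning per collection.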
