Collocation Extraction Using Square Mutual Information


International Journal of Knowledge and Language Processing
KLP International, ISSN 2191-2734
Volume 2, Number 1, January 2011, pp. 53-58
www.ijklp.org

Collocation Extraction Using Square Mutual Information Approaches

Huarui Zhang (1), Yongwei Zhang (2) and Jingsong Yu (3)

(1) Institute of Computational Linguistics, Peking University, Beijing, China
hrzhang@pku.edu.cn

(2, 3) School of Software and Microelectronics, Peking University, Beijing, China
zhangywibb@gmail.com, yjs@ss.pku.edu.cn

Received December 2010; revised January 2011

ABSTRACT. Mutual Information (MI) was proposed as a collocation measure long ago and is still widely applied in various fields, but it has the disadvantage of heavily favoring rarely occurring items. We propose a new, improved Square Mutual Information approach to address this problem. Experimental results show that the precision of the new method is better than that of MI and of modified approaches such as the combination of external and internal measures. A further advantage of the new approach is that it remains language independent.

Keywords: collocation, association measure, square mutual information, improved square mutual information

1. Introduction. The statistical approach to collocation extraction has been a dominant trend for years, from [4, 9, 6] to [5, 7, 1]. Mutual Information (MI) is one of the earliest and most widely used measures, referred to by the majority of research papers on collocation extraction.

In [8], a total of 82 association measures are empirically tested, 6 of which are mutual information and derived measures. However, the new approach proposed in this paper is not found in that list.

Our main interest lies in improving mutual-information-related measures. One intuitive motivation is that mutual information originates from information theory, and many information-theoretic approaches have been quite successful in NLP. Another motivation, from the opposite direction, is that mutual information is sometimes considered a poor measure for collocation extraction. Despite its disadvantage of heavily favoring rarely occurring items, we think that MI can be improved to achieve better performance. We first review one such attempt to modify MI [2, 3].
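Before doing so, a toy numeric illustration of the rare-item bias may help (this example is ours, with purely hypothetical counts, not taken from the paper). Pointwise MI is PMI(x, y) = \log [ p(xy) / (p(x) p(y)) ], so a pair observed only once, whose parts also occur only once, can outscore a frequent and strongly associated pair:

```python
from math import log2

def pmi(f_xy, f_x, f_y, n):
    """Pointwise mutual information from raw counts in a corpus of
    n bigram tokens: log2( p(xy) / (p(x) p(y)) )."""
    return log2((f_xy / n) / ((f_x / n) * (f_y / n)))

N = 1_000_000  # hypothetical corpus size
print(pmi(1, 1, 1, N))          # hapax pair: ~19.9
print(pmi(800, 1000, 1000, N))  # frequent, strongly associated pair: ~9.6
```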

2. Unithood: Chen's approach. Chen [2, 3] calculates a unithood measure by combining an external measure and an internal measure.

The external measure is based on two rates, the left dependent rate (LD) and the right dependent rate (RD):

    LD(w_1 \dots w_n) = \max_{a \in A} f(a\,w_1 \dots w_n) / f(w_1 \dots w_n)

    RD(w_1 \dots w_n) = \max_{b \in B} f(w_1 \dots w_n\,b) / f(w_1 \dots w_n)

where w = w_1 w_2 \dots w_n; f(w) is the frequency of a string w; A is the set of all left neighbor elements of w and a is any element of A; B is the set of all right neighbor elements of w and b is any element of B.

The external measure, denoted IDR (independent rate), is given by

    IDR(w_1 \dots w_n) = (1 - 1/f(w_1 \dots w_n)) (1 - LD(w_1 \dots w_n)) (1 - RD(w_1 \dots w_n))    (3)

The internal measure is based on ConnectRate(w_i w_{i+1}), given by

    ConnectRate(w_i w_{i+1}) = (p(w_i w_{i+1}) - p(w_i) p(w_{i+1})) / p(w_i w_{i+1})

The minimum of ConnectRate(w_i w_{i+1}) over all adjacent pairs, denoted MinConnectRate(w_1 \dots w_n), is the internal measure:

    MinConnectRate(w_1 \dots w_n) = \min_{1 \le i \le n-1} ConnectRate(w_i w_{i+1})

The final unithood measure, denoted UnitRate(w_1 \dots w_n), is the product of the external measure IDR and the internal measure MinConnectRate:

    UnitRate(w_1 \dots w_n) = IDR(w_1 \dots w_n) \cdot MinConnectRate(w_1 \dots w_n)

ConnectRate(w_i w_{i+1}) is a transformation of MI that can be derived from MI directly. This suggests that Chen's approach also belongs to the family of MI, with which we will compare the results of our new method.
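The sketch below shows how UnitRate could be computed once the corpus statistics are gathered. It is a minimal illustration under our own naming, not Chen's code; in particular, ConnectRate follows the reconstruction above (note that (p(xy) - p(x)p(y)) / p(xy) = 1 - e^{-PMI} for natural-log PMI, which is why it counts as a transformation of MI).

```python
def connect_rate(p_xy, p_x, p_y):
    # ConnectRate as reconstructed above: (p(xy) - p(x)p(y)) / p(xy).
    # Equals 1 - exp(-PMI) for natural-log PMI, i.e. a monotone
    # transformation of mutual information.
    return (p_xy - p_x * p_y) / p_xy


def unit_rate(f_w, max_left_ext, max_right_ext, pair_probs):
    """Chen-style unithood: UnitRate = IDR * MinConnectRate.

    f_w:           corpus frequency f(w) of the candidate n-gram w
    max_left_ext:  max over left neighbors a of f(a w)
    max_right_ext: max over right neighbors b of f(w b)
    pair_probs:    one (p(wi wi+1), p(wi), p(wi+1)) triple per
                   adjacent pair inside w
    """
    ld = max_left_ext / f_w                      # left dependent rate
    rd = max_right_ext / f_w                     # right dependent rate
    idr = (1 - 1 / f_w) * (1 - ld) * (1 - rd)    # external measure (3)
    min_cr = min(connect_rate(*t) for t in pair_probs)  # internal measure
    return idr * min_cr


# Hypothetical bigram: seen 200 times, strongest left/right extensions
# seen 20 and 10 times, pair probabilities estimated from the corpus.
print(unit_rate(200, 20, 10, [(2e-4, 1e-3, 8e-4)]))  # ~0.85
```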

3. Improved square mutual information: New approach. We add a new term to square MI, which increases the influence of high-frequency combinations on a logarithmic scale. The bigram version is given by

    SquareMI(x, y) = \log \frac{f(xy)^2 \cdot \log(1 + f(xy))}{f(x)\,f(y)}

where x and y are the adjacent parts of the combination xy, f(x) and f(y) are the frequencies of the parts x and y, and f(xy) is the frequency of the combination xy. The n-gram version is

    SquareMI(w_1, \dots, w_n) = \log \frac{f(w_1 \dots w_n)^n \cdot \log(1 + f(w_1 \dots w_n))}{\prod_{i=1}^{n} f(w_i)}

where w = w_1 w_2 \dots w_n, f(w_i) is the frequency of the part w_i, and f(w_1 \dots w_n) is the frequency of the combination w.
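As a minimal sketch with purely hypothetical counts (ours, not from the paper), the n-gram formula translates directly into code; the added \log(1 + f(w)) term is what boosts frequent combinations and damps the hapax pairs that plain MI favors:

```python
from math import log, prod

def square_mi(f_parts, f_whole):
    """Improved square MI for an n-gram, following the formula above:
    log( f(w)^n * log(1 + f(w)) / (f(w1) * ... * f(wn)) ).

    f_parts: frequencies f(w1), ..., f(wn) of the component words
    f_whole: frequency f(w1...wn) of the whole combination
    """
    n = len(f_parts)
    return log(f_whole ** n * log(1 + f_whole) / prod(f_parts))


# A frequent, strongly associated pair now outscores a hapax pair,
# unlike under plain pointwise MI (all counts hypothetical).
print(square_mi([1000, 1000], 800))  # ~1.45
print(square_mi([1, 1], 1))          # ~-0.37
```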

4. Results and Discussion. The first part of the evaluation data is the People's Daily Corpus (January 1998), segmented and annotated by the Institute of Computational Linguistics, Peking University. The second part is text from the Financial Times Chinese site (http://www.ftchinese.com/), mainly Chinese text translated from original English text.

The evaluation is based on the following assumption: the connection between collocations and words is similar to that between words and Chinese characters. If a method is suitable for extracting words from Chinese character combinations, then it is suitable for extracting collocations from word combinations.

The top 21296 terms are selected for evaluation, in parallel with Chen's approach (denoted UnitRate hereafter) for better comparability, as shown in Table 1.

TABLE 1. Comparison of precisions

Number of collocations | Mutual Information (%) | UnitRate (%)
Top 100    | 68.00 | 86.00
Top 500    | 69.60 | 87.58
Top 1000   | 66.70 | 81.60
Top 5000   | 63.02 | 67.34
Top 10000  | 58.46 | 58.75
Top 15000  | 53.29 | 53.55
Top 21296  | 57.32 | 50.26

Precision changes with the number of collocations selected. In Figures 1, 2 and 3, the horizontal axis is the number of collocations (in units of 100) and the vertical axis is precision.

From Figure 1 we can see that our improved square mutual information approach is better than both Chen's method and the pointwise mutual information method.

FIGURE 1. Comparison with MI and UnitRate.

In [2], Chen's method achieved higher precision than we obtained by repeating his method. One conjecture is that preprocessing and/or postprocessing were applied before/after the extraction. After we remove extracted words that contain Chinese characters from a stop list, the precision curves become those of Figure 2.

FIGURE 2. Comparison with UnitRate after filtering.

From Figure 2 we can see that, after the removal of words containing stop-list Chinese characters, Chen's method comes much closer to our improved square mutual information method.

Figure 3 shows the precision curve of our improved square mutual information method before and after the removal of words containing stop-list Chinese characters. The minor change in the curve suggests that our method does well even without filtering, which means our method is more effective and can remain language independent.

FIGURE 3. Improved Square MI (before and after filtering).
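Precision at a cutoff is simply the share of approved items among the top N candidates of a score-ranked list; the helper below (our illustration; parameter names and cutoffs are assumptions) is all that is needed to tabulate results such as Table 1 once a gold-standard set is fixed.

```python
def precision_at(ranked, gold, cutoffs=(100, 500, 1000, 5000, 10000)):
    """Precision of a score-ranked candidate list against a gold set.

    ranked: candidates sorted by association score, best first
    gold:   set of items judged correct (words, under the paper's
            word-extraction proxy evaluation, or collocations)
    """
    return {n: sum(c in gold for c in ranked[:n]) / n
            for n in cutoffs if n <= len(ranked)}
```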

Expert Evaluation: a randomly chosen sample of the results was manually checked by human experts; the approved percentages are shown in Table 2.

TABLE 2. Comparison of expert evaluation

Number of collocations | UnitRate (%) | Square MI (%)
Top 100   | 82 | 84
Top 500   | 72 | 78
Top 1000  | 58 | 63
Top 3000  | 53 | 56
Top 5000  | 40 | 43
Top 10000 | 38 | 38

From these comparisons, we find that our improved square mutual information approach obtains better precision in collocation extraction.

5. Conclusions. The new improved square mutual information approach clearly outperforms the pointwise mutual information method. Although simpler than Chen's approach, ours is still more effective than Chen's when no filter is applied. Human evaluation on a sampled subset also confirms the advantage of the new approach.

Acknowledgment. This work is partially based on the segmented and annotated Chinese corpus developed by the Institute of Computational Linguistics at Peking University under the leadership of Professor Shiwen YU.

REFERENCES

[1] I. A. Bolshakov, E. I. Bolshakova, A. P. Kotlyarov and A. Gelbukh, Various Criteria of Collocation Cohesion in Internet: Comparison of Resolving Power, Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science, vol. 4919, pp. 64-72, 2008.
[2] Yirong Chen, The Research on Automatic Chinese Term Extraction Integrated with Unithood and Domain Feature, Master's thesis, Peking University, Beijing, 2005.
[3] Yirong Chen, Qin Lu, Wenjie Li, Zhifang Sui and Luning Ji, A Study on Terminology Extraction Based on Classified Corpora, Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC'06), pp. 2383-2386, 2006.
[4] K. Church and P. Hanks, Word association norms, mutual information and lexicography, Computational Linguistics, vol. 16, no. 1, pp. 22-29, 1990.
[5] S. Evert, The Statistics of Word Cooccurrences: Word Pairs and Collocations, PhD dissertation, IMS, University of Stuttgart, 2004.
[6] C. Manning and H. Schütze, Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, MA, 1999.
[7] B. T. McInnes, Extending the Log Likelihood Measure to Improve Collocation Identification, M.S. thesis, Department of Computer Science, University of Minnesota, Duluth, 2004.
[8] P. Pecina, Lexical association measures and collocation extraction, Language Resources & Evaluation, vol. 44, pp. 137-158, 2010.
[9] J. Pustejovsky, P. Anick and S. Bergler, Lexical semantic techniques for corpus analysis, Computational Linguistics, vol. 19, no. 2, pp. 331-358, 1993.

