THE MATCHBOX ALGORITHM FOR - University Of Regina


THE MATCHBOX ALGORITHM FOR CLEANING CONTACT LISTS THAT INCLUDE NICKNAMES AND SPOUSES

A Thesis submitted to the Faculty of Graduate Studies and Research in Partial Fulfillment of the Requirements for the degree of Master of Science in Computer Science, University of Regina

By Yashu Bither

Regina, Saskatchewan
September, 2005

Copyright 2005: Yashu Bither

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

Library and Archives Canada, Published Heritage Branch, 395 Wellington Street, Ottawa ON K1A 0N4, Canada
Bibliothèque et Archives Canada, Direction du Patrimoine de l'édition, 395, rue Wellington, Ottawa ON K1A 0N4, Canada
ISBN: 978-0-494-18878-1

NOTICE: The author has granted a non-exclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or non-commercial purposes, in microform, paper, electronic and/or any other formats.

The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis. While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.

UNIVERSITY OF REGINA
FACULTY OF GRADUATE STUDIES AND RESEARCH
SUPERVISORY AND EXAMINING COMMITTEE

Yashu Bither, candidate for the degree of Master of Science, has presented a thesis titled, The Matchbox Algorithm for Cleaning Contact Lists that Include Nicknames and Spouses, in an oral examination held on September 9, 2005. The following committee members have found the thesis acceptable in form and content, and that the candidate demonstrated satisfactory knowledge of the subject material.

External Examiner: Dr. Allan East, Department of Chemistry and Biochemistry
Supervisor: Dr. Howard Hamilton, Department of Computer Science
Committee Member: Dr. Philip Fong, Department of Computer Science
Committee Member: Dr. Dominik Slezak, Department of Computer Science
Chair of Defense: Dr. Chris Fisher, Department of Mathematics and Statistics

Abstract

Data cleaning is a two-step process of detecting and correcting errors in a data set. Typically, data cleaning occurs before data is placed in a data warehouse. The purpose of the data warehouse is to provide access to historical information derived from operational databases and to combine all information obtained from disparate, often external, sources of data. One type of data commonly stored in a data warehouse is a contact list, which is a file containing uniquely identifiable records, each of which describes how to make contact with a person or group of people via telephone, mail, email, etc. The best known type of contact list is a mailing list.

A major difficulty faced by many organizations is the synonymous record problem, where multiple records in a contact list represent the same real world entity in different forms. Whenever this problem exists, it is crucial that it be addressed by data cleaning. The most common version of the synonymous record problem occurs when the same person is represented in a contact list with slightly varying names or addresses. In practice, the synonymous record problem often results after multiple contact lists are merged.

A new data cleaning algorithm, called the Matchbox algorithm, is proposed for a version of the synonymous record problem where one of the contact lists, which is called the primary list, is of much higher quality than the other, which is called the secondary list. As well, in these contact lists, nicknames are often used instead of standard names, and the name field often holds the names of groups of people instead of a single name. For simplicity, the scope of this study was limited to a group of two people, such as husband and wife.

Matchbox is based on conditional rules (or C-rules) for matching records based on combinations of attributes such as names, addresses, name aliases, and spousal units. As indicated in other studies, the probability of cleaning 100% of the errors from data is very slim. Matchbox can be used as an automatic method that can efficiently find a high percentage of potentially synonymous records. In order to attain higher accuracy, some human interaction is also allowed. Matchbox was evaluated by a series of experiments performed on a combination of synthetic and real data. When compared to the Multi Pass Sorted Neighborhood Method (MP-SNM), Matchbox was found to provide higher accuracy and precision in less time.
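To make the idea of conditional rules concrete, the following is a minimal sketch, not the thesis's actual procedure. It assumes that a C-rule can be modelled as a tuple of attribute names that must all agree, and that rules are tried in a fixed order. The rule contents, the field names (`first_name`, `last_name`, `address`, `postal_code`), and the helper names are all hypothetical; only the terms "match list" and "novel list" come from the thesis's list of tables.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]

# Hypothetical C-rules: each rule names the attributes that must all agree.
# These are illustrative assumptions, not the rules defined in the thesis.
C_RULES: List[Tuple[str, ...]] = [
    ("first_name", "last_name", "address"),
    ("last_name", "address", "postal_code"),
]


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(value.lower().split())


def rule_matches(rule: Tuple[str, ...], a: Record, b: Record) -> bool:
    """A rule fires only if every listed attribute is present and equal in both records."""
    for attr in rule:
        left = normalize(a.get(attr, ""))
        right = normalize(b.get(attr, ""))
        if not left or left != right:
            return False
    return True


def first_match(primary: List[Record], s: Record) -> Optional[Tuple[int, int]]:
    """Try the rules in order; return (primary index, rule index) for the first hit."""
    for rule_idx, rule in enumerate(C_RULES):
        for p_idx, p in enumerate(primary):
            if rule_matches(rule, p, s):
                return p_idx, rule_idx
    return None


def split_secondary(
    primary: List[Record], secondary: List[Record]
) -> Tuple[List[Tuple[int, int, int]], List[int]]:
    """Split secondary records into a match list and a novel list."""
    match_list: List[Tuple[int, int, int]] = []  # (secondary idx, primary idx, rule idx)
    novel_list: List[int] = []  # secondary indexes with no counterpart in the primary list
    for s_idx, s in enumerate(secondary):
        hit = first_match(primary, s)
        if hit is None:
            novel_list.append(s_idx)
        else:
            match_list.append((s_idx, hit[0], hit[1]))
    return match_list, novel_list
```

Trying the stricter rule first means a record is attributed to the most specific rule that fires. This sketch compares every secondary record with every primary record, which is quadratic and makes no claim about how Matchbox achieves its reported running time.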

Acknowledgements

I would like to thank all those people who aided me in the successful completion of my research and the preparation of this thesis. First of all, I would like to thank the Department of Computer Science and the Faculty of Graduate Studies and Research for giving me the opportunity to pursue my Master's Degree in Computer Science at the University of Regina.

My sincere appreciation goes to my supervisor, Dr. Howard J. Hamilton, without whose encouragement, guidance, and suggestions this work would not have been completed. Dr. Hamilton's constructive criticisms really helped me in completing this thesis. I am really lucky to have him as my supervisor, and his guidance has led me to a variety of new knowledge throughout my research. Financial assistance from the Natural Sciences and Engineering Research Council of Canada, via research grants provided to Dr. Hamilton, is gratefully acknowledged.

In addition, I would like to thank the other members of my supervisory committee, Dr. Philip W. L. Fong and Dr. Dominik Slezak, for their time and efforts. Finally, I am grateful to my previous employer Arcas Group Inc. and my current employer SaskTel who provided me with the flexibility to pursue an M.Sc. at the University of Regina as a part time student.

Post Defense Acknowledgement

I would like to thank the external examiner, Dr. Allan East, and the committee chair, Dr. Chris Fisher, for their time.

Dedication

Completing the M.Sc. program at the University of Regina is one of the important achievements of my life. My family has always encouraged me to study and work hard from my school days and has helped create my interest in Computer Science. I am really thankful to my school and college teachers in India who helped give me a firm foundation of knowledge and who also inspired me to continue on to higher studies in Computer Science. I would like to sincerely thank my wife, Zinnia Bither, and my two children, Yashica and Aniket, for all their patience, support, and understanding. Last but not least, I dedicate all my work to my mother Smt. Sushma Rani Bither and my father Sh. Vinod Bhushan Bither without whose blessings I could not have achieved this goal.

Table of Contents

Abstract
Acknowledgements
Post Defense Acknowledgement
Dedication
Table of Contents
List of Tables
List of Figures
List of Abbreviations and Acronyms
CHAPTER 1 INTRODUCTION
1.1 Introduction to the thesis
1.2 Structure of the thesis
CHAPTER 2 PROBLEM AND BACKGROUND
2.1 Problem statement
2.2 Survey of previous research
CHAPTER 3 THE MATCHBOX ALGORITHM
3.1 Approach
3.2 The Matchbox algorithm
3.3 Updated and derived attributes
3.4 Conditional rules
3.5 Matching and merging of records
3.6 The Multi Pass Sorted Neighborhood Method
CHAPTER 4 EXPERIMENTAL RESULTS
4.1 Introduction
4.2 Independent application of C-rules
4.3 Application of a sequence of C-rules
4.4 Varying the order of the sequence of C-rules
4.5 Confusion matrix for HAS and LAS
4.6 Spousal ordering and canonical first names
4.7 Increasing the size of the repaired secondary list
4.8 Comparison of Matchbox and MP-SNM
CHAPTER 5 CONCLUSIONS AND FUTURE WORK
5.1 Summary of contributions
5.2 Suggestions for future research
References

List of Tables

Table 1.1 Two records that are missing crucial information
Table 2.1 Primary list
Table 2.2 Secondary list
Table 2.3 Repaired secondary list
Table 2.4 Match list
Table 2.5 Novel list
Table 2.6 Partial list of nicknames for first names
Table 2.7 Individual and spousal name variations
Table 2.8 Summary of methods for synonymous records
Table 3.1 Attributes used in C-rules in this thesis
Table 3.2 Example set for SNM step 1
Table 3.3 Example set for SNM step 2
Table 4.1 Performance for conditional rules (No Sequence)
Table 4.2 High Accuracy Sequence (HAS): Sequence and performance
Table 4.3 Summary of accuracy comparison for 20 sequences
Table 4.4 Low Accuracy Sequence (LAS): Sequence and performance
Table 4.5 Confusion matrices for HAS and LAS
Table 4.6 Effectiveness of SO and CFN (for step 1 only)
Table 4.7 Effectiveness of SO and CFN (for both step 1 and step 2)
Table 4.8 Comparison of time, precision, and accuracy
Table 4.9 Comparison of results between MP-SNM and Matchbox
Table 4.10 Feature comparison

List of Figures

Figure 2.1 Window scan during the merge phase
Figure 3.1 Matchbox algorithm - Step 1
Figure 3.2 Matchbox algorithm - Steps 2 and 3
Figure 3.3 Canonical first name query
Figure 3.4 The Soundex Algorithm
Figure 4.1 Effect of increasing the size of the repaired secondary list on elapsed time
Figure 4.2 Comparison of precision
Figure 4.3 Comparison of accuracy
Figure 4.4 Comparison of elapsed time

List of Abbreviations and Acronyms

AC: Accuracy.
CA: Compare All.
CF: Certainty Factor.
CFN: Canonical First Name.
C-rule: Conditional Rule.
DB-P: Primary Contact List.
DB-S: Repaired Secondary Contact List.
DE-SNM: Duplicate Elimination Sorted Neighborhood Method.
FP: False Positive.
HAS: Highest Accuracy Sequence.
LAS: Lowest Accuracy Sequence.
LLLK algorithm: Lee, Lu, Ling, and Ko algorithm.
MP-SNM: Multi Pass Sorted Neighborhood Method.
P: Precision.
RFM: Recursive Field Matching.
SNM: Sorted Neighborhood Method.
SO: Spousal Ordering.
SQL: Structured Query Language.
TH: Threshold.
TP: True Positive.

CHAPTER 1
INTRODUCTION

This chapter provides an introduction to the thesis in Section 1.1. An outline of the remainder of the thesis is given in Section 1.2.

1.1 Introduction to the thesis

The general problem addressed in this thesis is ensuring that data inserted into a data warehouse is of high quality. According to the original definition, a data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision making process [15]. Currently, the term data warehouse is used to refer to a relational database that contains historical data collected from several data sources [1]. A data warehouse is a relational database that is designed for query and analysis purposes [15]. Ensuring that data in a data warehouse is of high quality simplifies the update and other maintenance operations such as development of new services that utilize the data warehouse and facilitate the provision of excellent customer service. Users may view the same data from a data warehouse in different forms. In this thesis, we focus on data viewed as contact lists.

A contact list is a file or a database containing uniquely identifiable records, each of which describes how to make contact with a person or group of people via telephone, mail, email, etc. The best known type of contact list is a mailing list. Contact information can be maintained in a data warehouse to allow organization wide access to this valuable resource. A particular contact list can be extracted from the data warehouse using criteria based on its purpose or the current direct marketing strategy. For example, when a phone contact list is extracted from the data warehouse, the required fields may be the

salutation, the last name, and the telephone number. On the other hand, when a mailing list is extracted, the required fields may be the salutation, the first name, the last name, the street address, the city name, and the postal code.

Lee et al. emphasize the need for setting up a data warehouse [16]. According to Han and Kamber, data should be pre-processed before it is stored in a data warehouse [8]. Preprocessing can improve the quality of data and thus the quality of the results obtained from all subsequent analyses based on the data in the data warehouse.

According to one definition [7], data cleaning is the task of detecting and correcting errors in a data set. Typically, data cleaning occurs before the data is placed in the data warehouse [8]. Thus, data cleaning can be viewed as a kind of preprocessing applied before data are loaded into a data warehouse. The process of data cleaning is crucial because of the "garbage in, garbage out" principle [11].

Some types of data cleaning typically applied to contact lists are as follows: updating the addresses for those customers who recently moved (based on change of address information filed with the post office), verifying the accuracy of an address (based on lists of valid street addresses, city names, and postal codes maintained by the post office or other organization), and removing extra records for the same person or group.

Data cleaning permits better customer service, because cleaning increases the accuracy and completeness of the data. Some typical benefits that an organization can obtain from data cleaning are as follows (adapted from [13]):
1. Reduced mailing costs from identifying unique customers and removing duplicated information from the database.

2. More efficient data maintenance and consequently increased operational savings through ongoing data quality controls and data integrity. For example, in a credit check application, high quality data will improve the decision-making concerning the customer's credit status and thus lower the overall operational cost.
3. Increased fraction of responses from contact campaigns run by the organization. Clean data helps the organization target the right customers at the right time.
4. Enhanced customer satisfaction and loyalty via consolidated customer profiling. Clean data helps an organization understand its customers' likes and dislikes.

A major difficulty faced by direct marketing, non-profit, political, and charitable organizations is the synonymous record problem, where multiple records in a contact list represent the same real world entity in different forms. Whenever this problem exists, it is crucial that it be addressed by data cleaning. The most common version of the synonymous record problem occurs when the same person is represented in a contact list with slightly varying names or addresses.

The major reasons for synonymous records and other problems in data quality are as follows (adapted from [8] and [16]):
1. Typographical errors in data entry and data recording: for example, the first name may be recorded as "Hohn" instead of "John".
2. Inconsistent data entry formats or naming conventions: for example, the apartment number may be recorded as a separate field in one list, but it may be combined with address line 1 in another list.

3. Inconsistent updates: for example, two lists may both include product codes to categorize items, but the set of product codes may have been changed, resulting in two different codes for the same product in the list.
4. Poor database design: for example, a field may not have been provided to record middle names.
5. Faulty data collection instrument: for example, due to poor software design the first name may always be stored as the last name and the last name as the first name.
6. Errors in data transmission: for example, a credit card processing mechanism with a limited buffer size may drop significant information, such as the last letters from names longer than 20 characters.
7. Missing updates: for example, a woman who changed her name when she got married may not have made an explicit request that her last name be updated.
8. Misleading information deliberately provided by customer: for example, a person may have given his nickname instead of his legal first name in an effort to be treated as two different individuals.
9. An incomplete or missing data value: for example, a customer may not have provided his middle name.

Organizations that do not address the synonymous record problem waste significant amounts of time, money, and other resources. For example, extra mailing costs are incurred for every letter sent to a duplicated or incorrect address. These costs are repeated for every direct mail campaign. Updates to the contact list are also complicated by the synonymous record problem. Sincere efforts to respond to change of address requests, spelling corrections, death notices, last name changes due to marriage, and

requests for removal from the mailing list may be ineffective, because only one of the synonymous records is adjusted. For example, consider a contact list that contains two records that represent the same person. If the organization learns that the person has died, but it updates only one of the records, it may subsequently try to contact the dead person. This attempted contact wastes effort and money and may offend potential customers.

The synonymous record problem often results after two contact lists are merged. A specific version of the problem with three notable features is investigated in this thesis. First, one of the contact lists is of much higher quality and value than the other. Secondly, nicknames are often used instead of standard names. Thirdly, the name field of a record often holds the names of a group of people.
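The nickname and group-name features can be illustrated with a small sketch of name normalization. It is not taken from the thesis. The nickname table is a hypothetical three-entry sample, the function names are invented for illustration, and sorting the two names alphabetically is an assumption made here so that the order in which a couple is written does not matter.

```python
import re
from typing import Dict, Tuple

# Hypothetical nickname table (illustrative sample only): nickname -> standard first name.
NICKNAMES: Dict[str, str] = {
    "bill": "william",
    "bob": "robert",
    "liz": "elizabeth",
}


def canonical_first_name(name: str) -> str:
    """Map a nickname to its standard form; unknown names are only lowercased."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def split_name_field(field: str) -> Tuple[str, ...]:
    """Split a name field such as 'Bob and Liz' or 'Bob & Liz' into its members."""
    parts = re.split(r"\s*(?:&|\band\b)\s*", field.strip(), flags=re.IGNORECASE)
    return tuple(p for p in parts if p)


def name_key(field: str) -> Tuple[str, ...]:
    """Build an order-independent key of canonical first names (assumed convention)."""
    return tuple(sorted(canonical_first_name(p) for p in split_name_field(field)))
```

Under these assumptions, `name_key("Bob and Liz")` and `name_key("Elizabeth & Robert")` produce the same key, so two records that differ only in nickname use and name order would be flagged as candidate synonyms.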
