Mid-frequency Readers - Lancaster University


Mid-frequency readers

Paul Nation
LALS, Victoria University of Wellington, Wellington, New Zealand

Laurence Anthony
Waseda University, Tokyo, Japan

Journal of Extensive Reading, 2013, Volume 1. ISSN: 2187-5065

This article describes a new free extensive reading resource for learning the mid-frequency words of English and for reading well known texts with minor vocabulary adaptation. A gap exists between the end of graded readers at around 3,000 word families and the vocabulary size needed to read unsimplified texts at around 8,000 word families. Mid-frequency readers are designed to fill this gap. They consist of texts from Project Gutenberg adapted for learners with a vocabulary size of 4,000 word families, 6,000 word families and 8,000 word families. Each text is available at these three different levels. The goal is to have at least fifty such texts at each of the three different levels freely available. The adaptation is done using the BNC/COCA word family lists and the AntWordProfiler program. The article also discusses research that needs to be done on learning mid-frequency vocabulary and on creating and using mid-frequency readers.

The vocabulary demands of reading

Research on vocabulary comprehension has shown that a learner of English needs to understand around 98% of the running words (tokens) in a text for unassisted comprehension (Hu & Nation, 2000; Schmitt, Jiang, & Grabe, 2011). Using corpora from various genres, Nation (2006) showed that this value equates to around 8,000 word families (see Table 1), which is an ambitious goal for most learners and would require a lot of deliberate and incidental learning of vocabulary.

Table 1: Vocabulary sizes needed to get 98% coverage (including proper nouns) of various kinds of texts (Nation, 2006)

Texts | 98% coverage
Novels | 9,000 word families
Newspapers | 8,000 word families
Spoken English | 7,000 word families
Children’s movies | 6,000 word families

To reach these high vocabulary sizes, extensive reading should play a large role in any vocabulary learning program, both in helping the learning of vocabulary and in improving its use (see Pigada & Schmitt, 2006; Waring & Takaki, 2003, for reviews). Unfortunately, most graded reading schemes end at around the 3,000 word-family level. This means that if learners with a vocabulary size of 3,000 word families or more want to continue doing extensive reading which is at the right level for them, there is no suitable material. The 5,000-6,000 word-family gap between the end of graded readers and the requirements for unassisted comprehension is simply too large. Also, it is possible that even if a learner has the vocabulary knowledge required for unassisted reading, some of the vocabulary will not be accessed quickly enough for fluent extensive reading. Thus, the need to bridge the gap between graded readers and authentic texts is even more important.

In the past, a series to bridge this gap, appropriately called the Bridge series, was published by Longman, Green, and Co. The Bridge series contained 32 titles including fiction works, such as Animal Farm, Lucky Jim, Persuasion, The Red Badge of Courage, and Great Expectations, and non-fiction works, including The Mysterious Universe, Changing Horizons, and Mankind against the Killers. Although the series is now out of print, the number of printings for some of the books shows that they, at least, sold well. The following is a note describing the series that appeared in the introduction to Animal Farm.

The Bridge Series is intended for students of English as a second or foreign language who have progressed beyond the elementary graded readers and the Longman Simplified English Series but are not yet sufficiently advanced to read works of literature in their original form. The books in the Bridge Series are moderately simplified in vocabulary and often slightly reduced in length, but with little change in syntax. The purpose of the texts is to give practice in understanding fairly advanced sentence patterns and to help in the appreciation of English style. We hope that they will prove enjoyable to read for their own sake and that they will at the same time help students to reach the final objective of reading original works of literature in English with full understanding and appreciation.

Technical Note: In the Bridge Series words outside the commonest 7000 (in Thorndike and Lorge: A Teacher’s Handbook of 30,000 Words. Columbia University, 1944) have usually been replaced by commoner and more generally useful words. Words used which are outside the first 3,000 of the list are explained in a glossary and are so distributed throughout the book that they do not occur at a greater density than 25 per running 1000 words.

(from the introduction to the Bridge Series edition of Animal Farm, 1945)

The Bridge series involves a reasonable amount of glossing (the glossary is in the form of a list with definitions at the back of the book) and a small amount of adaptation. For example, Animal Farm contains a glossary of around 880 words which cover approximately 3.3% of the running words in the text. The number of glossed words for Animal Farm is high because no words were replaced in the text. Other glossaries range from 120 to 600 words. Although having an extensive glossary at the back of the book could interrupt the flow of reading, glossed words in the Bridge Series are not bolded or marked in any way in the text. Learners are supposed to look up words only when they need to.

With the growth of personal computers and the development of word family lists and computer programs that use them, the study of the vocabulary load of text has become increasingly detailed. For example, Nation (2009) looked in detail at the number of changes that would need to be made to adapt texts for learners at various vocabulary size levels.
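
The density limit in the Bridge Series technical note (no more than 25 glossed words per 1000 running words) can be illustrated with a short sketch. It is only one possible reading of the rule: it assumes the limit is checked over consecutive, non-overlapping 1000-token windows, and the function names and toy data are hypothetical rather than anything used by the series editors.

```python
def glossed_per_window(tokens, glossed_words, window=1000):
    """Count glossed tokens in each consecutive window of running words.

    Assumption: windows are non-overlapping; the last one may be shorter.
    """
    glossed = {w.lower() for w in glossed_words}
    counts = []
    for start in range(0, len(tokens), window):
        chunk = tokens[start:start + window]
        counts.append(sum(1 for t in chunk if t.lower() in glossed))
    return counts


def windows_over_limit(counts, limit=25):
    """Return the indexes of windows whose glossed-word count exceeds the limit."""
    return [i for i, c in enumerate(counts) if c > limit]


# Toy text: 2,000 tokens, with 30 glossed tokens packed into the first 1,000.
tokens = ["the"] * 970 + ["banal"] * 30 + ["the"] * 1000
counts = glossed_per_window(tokens, {"banal"})
print(counts, windows_over_limit(counts))
```

A sliding window would be a stricter reading of "per running 1000 words"; the fixed windows here are chosen only to keep the sketch short.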
In Table 2, we can see that to adapt the Project Gutenberg version of the novel Lord Jim by Joseph Conrad for a learner who knew 4,000 word families, 5% of the word families would need to be glossed and 0.75% of the word families would need to be replaced. In column 3, the “target word families to gloss” is arbitrarily set at a maximum of 5%. If this percentage is lowered, then the percentage in column 4 needs to be increased. Several unknown words will be easy to guess from context, and words which are easy to guess should not be chosen for replacement. The lowest frequency level words are replaced unless they are repeated within the text or they are easy to guess.

Table 2: Percentage of target word families to support and word families to replace in Lord Jim at various levels of previous knowledge

Assumed known word families | % coverage of known word families | % target word families to gloss | % of word families to replace | Total %
2,000 | [missing] | [missing] | 6.68 | 100
3,000 | [missing] | [missing] | 2.84 | 100
4,000 | [missing] | 5.0 | 0.75 | 100
5,000 | [missing] | 4.35 | 0 | 100
6,000 | [missing] | 3.44 | 0 | 100
7,000 | [missing] | 2.80 | 0 | 100
8,000 | [missing] | 2.26 | 0 | 100
9,000 | [missing] | 1.85 | 0 | 100

Table 2 shows that as learners’ vocabulary size increases, the percentages of changes that need to be made become small. However, the weakness of this method of calculating changes is that a small percentage can still be a large number of word families. Lord Jim is 132,413 tokens long, so 5% of the tokens equals 6,621 tokens. This is well over 2,500 word families, which is far too heavy an unknown vocabulary load for a reader. What this means is that the number of words replaced needs to be greater so that only a small percentage of the running words (well under 2%) are unknown words. The critical figure is the raw number of unknown word families that need to be dealt with by the reader, not the percentage coverage of text by unknown words.

High-, mid-, and low-frequency vocabulary

It is useful to distinguish three broad frequency levels of vocabulary: high-frequency vocabulary, mid-frequency vocabulary, and low-frequency vocabulary. The idea of high-frequency words has a long history, and Michael West's (1953) A General Service List of English Words containing around 2,000 word families is the most well-developed and well-known example.

Following the lead of Schmitt and Schmitt (2012), here we consider the high-frequency vocabulary to include the most frequent and wide ranging 3,000 word families of English (see Table 3). The arguments in favour of including the first 3,000 word families in the high-frequency level are that the 3,000 word-family level is needed to gain 95% coverage of the running words in most texts (when the coverage of proper nouns and marginal words is included), and that most graded readers end at around the 3,000 word-family level. Note that this figure differs from Nation (2001), who considered this level to contain only the first 2,000 word families.

In Table 3, the mid-frequency vocabulary consists of around 6,000 word families, which when added to high-frequency vocabulary adds up to 9,000 word families. The reason for making the arbitrary cut-off point between mid-frequency and low-frequency vocabulary after the 9th 1000 word-family level is that 9,000 word families provide 98% coverage of most texts, when the coverage of proper nouns and other marginal words is also included.
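
The Lord Jim arithmetic discussed earlier in this section can be checked in a few lines. The sketch below only restates the article's own figures (132,413 tokens, a 5% share, and 2% as the upper bound the article wants to stay well under); the function name is hypothetical.

```python
LORD_JIM_TOKENS = 132_413  # token count given in the article


def tokens_for_share(total_tokens, share):
    """Convert a share of the running words into a raw token count."""
    return round(total_tokens * share)


five_percent = tokens_for_share(LORD_JIM_TOKENS, 0.05)
two_percent = tokens_for_share(LORD_JIM_TOKENS, 0.02)
print(five_percent, two_percent)
```

Even the 2% ceiling still corresponds to a few thousand running words in a novel of this length, which is why the raw count of unknown word families, not the percentage, is treated as the critical figure.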

Table 3: High-frequency, mid-frequency, and low-frequency vocabulary

Vocabulary level | Word family levels (and total) | Nature of the vocabulary
High-frequency | 1st 1000-3rd 1000 (3,000) | Wide range, very high-frequency, essential, general purpose vocabulary
Mid-frequency | 4th 1000-9th 1000 (6,000) | Wide range, moderate frequency, general purpose vocabulary
Low-frequency | 10th 1000 on | Narrower range, low-frequency, some technical vocabulary unique to a particular discipline

In order to create the word family lists reported in the Nation (2009) study, an untagged version of the British National Corpus (BNC) was used. This was divided along genre divisions into 10 roughly equally sized sections each 10,000,000 word tokens long. At around the 9,000 word family level, the range figures for the most frequent words changed from a value of 10 to a value of 9. That is, at around the 9,000 word family level, the word families did not occur in all 10 sections of the BNC, but in only 9 of them. This can be seen as marking a change from generally useful vocabulary to more narrowly focused vocabulary.

Table 4 shows examples of word families from a revised list of mid-frequency word family level lists that were developed for this study on the basis of frequency information from the BNC combined with that from the Corpus of Contemporary American English (COCA) kindly supplied by Mark Davies (Nation, 2012). The word families in Table 4 are taken from the lists beginning at the letter b and are shown here so that readers of this article can get a feel for the kinds of words in the mid-frequency vocabulary.

Table 4: Example word families from the six 1000 mid-frequency word-family levels using the BNC/COCA lists

Word family frequency level | Example word families
4th 1000 | ballet, balloon, ballot, bankrupt, barn, barrel, baseball
5th 1000 | badge, bail, bait, balcony, bald, banner, Baptist
6th 1000 | babe, bachelor, baffle, bandage, banish, banquet, barb
7th 1000 | badger, bale, ballad, bamboo, baptism, baptize, barbarian
8th 1000 | babble, backfire, baggy, ballistic, banal, bandit, barber
9th 1000 | backlog, bailiff, bandwagon, banister, banter, barbaric, bard
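
The cut-off points in Table 3 amount to a simple mapping from a 1000 word-family level to a frequency band. The sketch below is illustrative only: the function name is hypothetical, and the small dictionary just copies one example family per level from Table 4 rather than loading the real BNC/COCA lists.

```python
def frequency_band(level):
    """Map a 1000 word-family level (1 = 1st 1000) to the band used in Table 3."""
    if level < 1:
        raise ValueError("levels start at 1 (the 1st 1000 word families)")
    if level <= 3:
        return "high-frequency"
    if level <= 9:
        return "mid-frequency"
    return "low-frequency"


# One example family per mid-frequency level, taken from Table 4.
example_levels = {
    "ballet": 4, "badge": 5, "babe": 6,
    "badger": 7, "babble": 8, "backlog": 9,
}

for family, level in example_levels.items():
    print(family, level, frequency_band(level))
```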

Mid-frequency words are commonly known by adult native speakers of the language, and we would expect native-speaking children beginning secondary school to know many of these words to some degree. Note that the related words Baptist, baptize, and baptism in Table 4 are separate word families. This is because the stem form of these words is a bound form, not a free-form. That is, there is no word Bapt which stands as a free word. Note also that compound words, such as backfire and bandwagon, are included. This is because these are not transparent compounds where the meaning of the word can be explained directly from the word parts. The test for transparent compounds is to see if it is possible to state the meaning of the compound using the parts with few if any further content words needed. For example, your birthday is the day of your birth.

The low-frequency words of the language are a very large group. The BNC/COCA lists go up to the 25th 1000 word-family level, but the low-frequency words stretch far beyond this. It is not easy to say how many low-frequency word families there are in English, but various estimates put the number at somewhere around 100,000 word families (Nation, in press). The current BNC/COCA word family lists going up to and including the 25th 1000 plus the four lists of proper nouns, marginal words, transparent compounds and abbreviations provide over 99% coverage of the tokens in most texts and corpora. At least half of the words outside the lists turn out to be proper nouns, and a large number of the remainder are transparent low-frequency members of word families already in the existing lists but which have not yet been added to the families (Nation, in press).

Table 5 shows the typical coverage of high-frequency, mid-frequency and low-frequency word families.
The high-frequency words, mid-frequency words, and proper nouns, exclamations, transparent compounds and abbreviations add up to over 98% of the running words in the text. The high-frequency words, proper nouns, exclamations, transparent compounds and abbreviations add up to around 95% of the running words.

Table 5: Coverage of the British National Corpus (BNC) by high, mid- and low-frequency word families

Type of vocabulary | % coverage
High-frequency (3,000 word families) | 90%
Mid-frequency (6,000 word families) | 5%
Low-frequency (10th 1000 word-family level on) | 1-2%
Other (Proper nouns, exclamations, transparent compounds, abbreviations) | 3-4%
Total | 100%

Table 6: Distribution of high, mid- and low-frequency word families in a variety of genres

[Table body not recoverable. Surviving column labels: Spoken, TV/Movies, Children’s. Surviving row labels: High-frequency - 3,000; Mid-frequency; Proper nouns etc; High-frequency plus proper nouns. Surviving values, not attributable to cells: 2.99%, 97.83%, 96.47%, 93.72%, 93.20%, 2.13%.]
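
Coverage figures of the kind reported in Table 5 come from counting what share of the running words falls at or below a given word-family level. The following is a minimal sketch of that calculation, not the procedure used for the BNC figures: the function name is hypothetical, the toy dictionary maps word forms (not full families) to levels, and any word missing from the dictionary is simply counted as unknown, whereas the article counts proper nouns and similar items as covered. Only the levels for barn (4th 1000) and banister (9th 1000) come from Table 4; the others are assumed for illustration.

```python
def coverage_at_level(tokens, level_of, known_level):
    """Percentage of running words whose level is at or below known_level."""
    if not tokens:
        return 0.0
    known = sum(
        1 for t in tokens
        if level_of.get(t.lower(), float("inf")) <= known_level
    )
    return 100 * known / len(tokens)


# Toy level list: function words assumed to be in the 1st 1000.
level_of = {"the": 1, "cat": 1, "sat": 1, "on": 1, "near": 1,
            "barn": 4, "banister": 9}
tokens = "the cat sat on the barn near the banister".split()

for size in (3, 4, 9):
    print(size * 1000, round(coverage_at_level(tokens, level_of, size), 2))
```

With real word-family lists and a full text, the same count shows whether a reader at a given vocabulary size reaches the 98% coverage discussed in the article.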

Table 6 shows the range of coverages in a variety of million-token corpora that were created for this study. The spoken corpus consists of one million tokens from the spoken demographic section of the BNC and represents informal conversation. The children’s reading material is from the New Zealand School Journal. The range of coverage by the mid-frequency word families (1.68-4.67%) varies in line with the coverage of the high-frequency word families. Generally, the higher the coverage by high-frequency word families, the lower the coverage by mid-frequency word families. Low-frequency word families follow a similar pattern.

Mid-frequency readers

In order to fill the gap left by the Bridge series and to enable learners to more easily master vocabulary up to the 9,000 word-family level, one of the authors (Nation) has begun developing a set of mid-frequency readers. Mid-frequency readers are books within a controlled vocabulary for advanced learners of English as a foreign or second language. They are adapted from the original texts using the profiling and simplification tools of AntWordProfiler (Anthony, 2012) and are designed to provide interesting, comprehensible reading to fill the gap of 6,000 word families between the end of graded readers and the demands of unsimplified text.

Mid-frequency readers can be used in extensive reading programs or for individual study and enjoyment. Also, they can help learners learn mid-frequency vocabulary and read texts that would otherwise be too difficult. Each book is available at three levels. There is one level for learners who know 4,000 word families, another for learners who know 6,000 word families, and another for those who know 8,000 word families.
The first ones to be made available on Paul Nation’s web site are The Art of War, More William, Glimpses of Unfamiliar Japan, Alice’s Adventures in Wonderland, and Metamorphosis. Note that two of these books are adaptations of translations, so they could be called friendly translations rather than simplifications.

The mid-frequency readers are available free, and can be used for any purpose without permission, as long as they are not offered for sale or offered on the web where payment must be made for access to the site. The adaptation is done in the same spirit that was behind the setting up of the tremendous resource, Project Gutenberg. It has been done without payment and without the wish for financial profit.

Table 7: Corpus sizes needed to gain an average of at least ten repetitions at each of the six mid-frequency 1000 word-family levels using a corpus of novels

1000 word-family list level | Corpus size to get an average of at least 10 repetitions at this 1000 word-family level (repetitions) | Number of words appearing once/twice (out of 1000) | Number of families met | Number of novels
4th 1000 families | 534,697 (12.6) | 93/73 | 812 of 4th 1000 | 6
5th 1000 families | 1,061,382 (13.7) | 101/79 | 807 of 5th 1000 | 9
6th 1000 families | 1,450,068 (13.1) | 89/82 | 795 of 6th 1000 | 13
7th 1000 families | 2,035,809 (13.7) | 92/63 | 766 of 7th 1000 | 16
8th 1000 families | 2,427,807 (14.1) | 96/70 | 755 of 8th 1000 | 20
9th 1000 families | 2,956,908 (12.0) | 88/78 | 805 of 9th 1000 | 25
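
As a rough worked check on the first row of Table 7, the sketch below converts the 534,697-token corpus into reading time. The reading speed of 200 words per minute and the schedule of five days a week for forty weeks are the figures the article itself uses when discussing this table; the function name and the even spread of reading across days are assumptions.

```python
def reading_schedule(total_words, words_per_minute=200, weeks=40, days_per_week=5):
    """Return (total minutes, minutes per week, minutes per reading day)."""
    total_minutes = total_words / words_per_minute
    per_week = total_minutes / weeks
    per_day = per_week / days_per_week
    return total_minutes, per_week, per_day


total_minutes, per_week, per_day = reading_schedule(534_697)
print(f"{total_minutes / 60:.1f} hours in total")
print(f"{per_week:.1f} minutes a week, {per_day:.1f} minutes a day")
```

The result is a little over an hour a week, or roughly 13 minutes on each reading day, in line with the rounded figures given in the article.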

At present only a few mid-frequency readers are available from Paul Nation’s web site, but the goal is to have at least fifty, each at three different levels so that the mid-frequency vocabulary is well covered with plenty of repetitions.

Table 7 is adapted from Nation (in press), and shows how much reading would have to be done to meet most of the words at each of the mid-frequency 1000 word-family levels. Column 2 of Table 7 shows that 534,697 words need to be met before 10 repeats of the 4th 1000 word families are encountered. Although this may seem a lot of reading, it only represents six average length novels, and at a reading speed of around 200 words per minute (a moderate reading speed), it would require only 1 hour 5 minutes a week of reading for forty weeks, or 13 minutes a day, five days a week for forty weeks. Such an amount of reading is possible if the material is at the right level for the learners.

Making mid-frequency readers

Each mid-frequency reader is in a controlled vocabulary and is adapted from the Project Gutenberg version of the text. Most of the words beyond the specified 1000 word-family levels have been removed, largely through replacement by higher frequency words, but very occasionally adjectives and adverbs and the occasional short sentence have simply been omitted where this does not affect the story line. This is rarely done. Almost all substitutions are single word substitutions. Where mis-spelling is used to represent accented speech (orl for all, nuff for enough), this is changed to regular spelling. No other changes are made to the books.

The aim of the changes is to make the books more accessible for non-native-speaking learners of English by removing the large number of low-frequency words which are way beyond their vocabulary level. The goal is to adapt the book so that in each book only a relatively small number of word families which make up much less than 2% of the running words (tokens) are unfamiliar words.
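
The first step in this kind of adaptation, finding the words that lie beyond the target level, can be sketched as follows. This is not AntWordProfiler or its interface; it is a hypothetical stand-in that assumes a plain dictionary from word form to 1000 word-family level, treats any word missing from the dictionary as beyond the level, and uses a made-up sentence. The levels for barn, ballad and bard follow Table 4; the rest are assumed.

```python
import re


def words_beyond_level(text, level_of, target_level):
    """Count each word whose assumed level is above target_level or unlisted."""
    counts = {}
    for word in re.findall(r"[a-z']+", text.lower()):
        if level_of.get(word, float("inf")) > target_level:
            counts[word] = counts.get(word, 0) + 1
    return counts


level_of = {"the": 1, "sat": 1, "by": 1, "a": 1, "sang": 1,
            "barn": 4, "ballad": 7, "bard": 9}
text = "The bard sat by the barn. The bard sang a ballad."

flagged = words_beyond_level(text, level_of, target_level=4)
print(flagged)
```

Keeping the counts alongside the flagged words matters because, as described earlier in the article, low-frequency words that are repeated within a text or are easy to guess may be left in place rather than replaced. That decision stays with the human adapter.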
The number of word families considered to be unknown which are left in the text differs according to the length of the text, but should only be a few hundred words. These unknown words can be dealt with by guessing from context, or looking them up in a dictionary. The number of words beyond the adapted word-family level is given at the beginning of each book. Here is an example from the 4,000 word-family level version of Alice’s Adventures in Wonderland adapted by Sonia Millett.

This book, Alice in Wonderland, is a Mid-Frequency Reader and has been adapted to suit readers with a vocabulary of 4,000 words. It is about 27,500 words in length. It is available in three versions of different difficulty. This version is adapte
