DOCUMENT RESUMEED 351 872FL 020 748AUTHORTITLELeung, Kei Wai; Maciejewski, Anthony A.Technical Specifications of the Nihongo TutorialINSTITUTIONPurdue Univ., West Lafayette, IN. School ofElectrical Engineering.National Science Foundation, Washington, D.C.TR-EE-90-27Apr 90NSF-INT-881803969p.; For related document, see FL 020 747.Descriptive (141)ReportsSystem.SPONS AGENCYREPORT NOPUB DATECONTRACTNOTEPUB TYPEEDRS PRICEDESCRIPTORSIDENTIFIERSMF01/PC03 Plus Postage.*Computer Assisted Instruction; *Japanese; *Languagesfor Special Purposes; *Programing; ReadingComprehension; Second Language Instruction; SentenceStructure; Technical Writing; Uncommonly TaughtLanguages; *Vocabulary Development*Nihongo Tutorial SystemABSTRACTThe Nihongo tutorial system is an intelligenttutorial system designed to use a computer to assist scientists andengineers in developing reading competence in technical Japanese. Itconsists of three applications: the Nihongo Tutor, which providesuseful information about an article (translation, syntax,pronunciation) to help understand the text of a personalized lesson,and records student performance; the Parse Tree Editor, used toprepare technical articles for the tutorial system; and theAdministrator, which uses student data to assign an article that istechnically and pedagogically appropriate. The program integrates thethree functions. This report details the technical specifications forsystem implementation, including both the data structures andalgorithms used. The section on the Administrator looks at theattributes of the system files, technical (discipline) fieldsincluded, user records, user article selection, and user database.Information on the Nihongo Tutor covers text data structures, studentdatabase management, and electronic dictionaries. An explanation ofthe Parse Tree Editor offers specifics on the four stages of parsing:segmentation, syntax, semantics, and mapping. Some screens are usedfor illustration. **************************Reproductions supplied by EDRS are the best that can be madefrom the original ******************************
Technical Specifications of theNihongo Tutorial SystemKei Wai LeungAnthony A. MaciejewskiU.S. DEPARTMENT OF EDUCATIONOR(ce of Educational Research and improvementEDUCATIONAL RESOURCES INFORMATIONCENTER (ERIC)cighis document has been reproduced asWaived from the person or organizationoriginalingO Wry Or changes have been made to Improvereproduction qualityPants of view or OpuOns Stated In dns dot Ixmeat do not necessarily represent ott(c(aiOERI positoon or ooaCyTR-EE 90-27April 1990"PERMISSION TO REPRODUCE THISMATERIAL HAS BEEN GRANTED BYTO THE EDUCATIONAL RESOURCESINFORMATION CENTER (ERIC)."School of Electrical EngineeringPurdue UniversityWest Lafayette, Indiana 47907BEST COPY AVAILABLE2
TECHNICAL SPECIFICATIONS OF THENIHONGO TUTORIAL SYSTEMKei Wai LeungAnthony A. MaciejewskiSchool of Electrical EngineeringPurdue UniversityWest Lafayette, IN 47907TR-EE 90-27April 1990This material is based upon work supported by the National Science Foundationunder Grant No. INT-8818039. The Government has certain rights in this material.3
TABLE OF CONTENTSPageCHAPTER ICHAPTER IIINTRODUCTION1THE ADMINISTRATOR22.1 System Files32.2 Technical Fields52.3 User Record92.4 User Selection132.5 User Database15CHAPTER IIITHE NIHONGO TUTOR173.1 Text Data Structures183.2 Student Database Management373.3 Electronic Dictionaries46CHAPTER IVTHE PARSE TREE EDITOR514.1 Segmentation Stage524.2 Syntactic Stage554.3 Semantic Stage604.4 Mapping Stage61ar
1CHAPTER I - INTRODUCTIONThe Nihongo tutorial system presented here is an intelligenttutorial system designed to use a computer to assist scientists andengineers in developing reading competence in technical Japanese. Itis comprised of three applications: the Nihongo Tutor, the Parse TreeEditor and the Administrator. The Nihongo Tutor provides usefulinformation about an article such as an English translation, syntax,and pronunciation to help the student to comprehend the text of apersonalized lesson. It also records the student's performance intohis personal database which reflects his current Japan -se languageproficiency. The Administrator uses the personal data of a student toassign an article that is related to his technical interests and at hiscurrent level of Japanese competence. The Parse Tree Editor is usedto prepare technical articles for the tutorial system by incorporatingsemantic, syntactic, phonetic, and morphological information into theparse tree of a sentence in each article. This report presents thetechnical detailsof the implementation of the Nihongo TutorialSystem including both the data structures and algorithms employed.
2CHAPTER II - THE ADMINISTRATORThe function of the Administrator is to perform databasemanagement for the tutorial system. It receives information from thetutor about the performance of the student and selects anappropriate article for his next lesson based on his current Japanesereading competence. The student's personal database keeps a recordof the Japanese words, readings, and grammatical structures that thestudent has currently mastered. By comparing this information withthe content of a potential instructional article the Administrator canassign an article that has a desirable amount of new material. Notonly does this preclude unnecessary repetition of already masteredmaterials but it also prohibits the introduction of a large amount ofmaterial that the student, at his current level, cannot handle at onetime. The Administrator also attempts to provide articles whichconform to the student's technical area of interest.In addition to handling the student personal database, theAdministrator also assists in system management and development.It groups all currently available articles into categories so that thestudent can be assigned course materials directly related to hisprofession and interest. It also saves important information aboutthe student, such as his area of interest, his last access time of theNihongo Tutor, etc, which aids the Administrator in selecting articlesfor him.
32.1 System Filesimportant files that the Administratormaintains for system management. These files are called:There are several(a) Technical Fields(b) User Record(c) User Selection(d) User DatabaseThe first file records all the currently available articlescategorized into different technical areas. The second one keepspersonal information for each individual student, such as his area ofinterest, the article name of his last lesson, etc. The third one savesthe article that should be assigned to a student as the next lesson.The last one stores information about the student's level of Japaneselanguage proficiency. The attributes of the records stored in thesefiles are presented in table 1.
4Table 1. The Attributes of the Database System FilesRecord,AttributestypeFile titleStart/EndChildParentindexnode electionname*File databasename** the key of the recordWell-known UnfamiliarAreaAreaLast access Last accesstimefile
52.2 Technical FieldsThe Administrator attempts to assign articles that match thetechnical area of interest of the student. Since there is a great varietyof different interests among the students, the list of articles mustinclude topics like physical sciences, engineering, computer sciences,etc. These topics can further be divided into sub-topics if there is ademand for more specific technical areas. This suggests the usage ofa generalized tree structure to maintain all the technical fieldsavailable. Figure 1 shows the tree configuration of the technicalfields.For each topic (or sub-topic), there will be a number of articlesacces:ible to the student for instruction. Thus, each one of them isassigned a generic file name together with a unique file index. Thefirst and last indices reflect how many articles are available in aspecific area. New topics are added to the article list by insertingthem into the tree. Any article can be appended to or removed froma technical area by updating the file indices of that area.The records in the "Technical Fields" file are stored in a stringlist format (see figure 2). All the records are kept in sequence so thatthey can be identified by their position in the file. The child node listof a topic includes the sub-topics under it. The list is terminated by asentinel value of " -1 ". If there is only the sentinel value, the childnode list is empty and the topic is then one of the leaf nodes in thetree. The parent of a sub-topic represents a more general area thatthis sub-topic can be categorized under. If an area is in the top levelof the tree with no parent topic, its parent node identificationnumber will be set to zero. In figure 2, the first record physics is atop level area with two sub-topics: kinetics and kinematics whichhave two and three articles available respectively. If there is onemore article to be introduced into the area physics, its ending fileindex has to be changed from 1 to 2. Initially, only those topics withno parentsare displayedwhen a studentstartssearchingthrough
6Technical InformationManagementtRt9Figure 1. A Tree Representation of the Technical FieldsIU
7Physics; Physics; 1 1;2 3 -1; 0" 1Kinematics; Kinematics; 1 2; -1; 1" 2Kinetics; Kinetics; 1 3;-1 ;1" -100 e 0Legends:C)Technical0 File TitleArea0 Starting and Ending File Indices Child Node Liste Parent Node StringNumberFigure 2. The Format of the Technical Fields File
8the tree. If the sub-topics of an area are requested, only those in itschild node list are shown. On the other hand, when a pareht topic isrequested, only one area (the parent area) is displayed.
92.3 User RecordThe Administrator must keep personal information of thestudent in the "User Record" file to select articles that are related tohis profession and technical area of interest. If some articles areavailable in this area, an appropriate one will be assigned to himbased on his level of Japanese proficiency level. Normally, theAdministrator automatically searches through the database to findarticles that match the student's technical area. If there are no sucharticles available, however, the student can manually choose anarticle area that is closest to his technical area.The records of the "User Record" file are stored in a list format(see figure 3). Each record can be retrieved by using the MungerFunction (available from the Macintosh operating system) to searchfor a particular key which is the name of the student. Once the nameis found, the entire record is retrieved and update operations areperformed with it. Any changes are then saved into the "UserRecord" file. For example, in figure 3, if the student finishes readingthe article "Kinetics!" in the last lesson and proceeds to study thenext article "Kinetics2", the last field of the record will be changed to"Kinetics2" and the last access time is modified.Since the Administrator keeps the technical area and the articlearea as separated fields, the demand of the student for new technicalareas can be reflected to the supervisor. This provides somefeedback from the student as to what he wants to be included intothe system. The last access time reveals to the student when he lastused the tutor to study an article. The last file access shows theprevious lesson that the student may want to review if he does nothave access to the tutor for a long period of time.Each time the Administrator attempts to assign a new articlefor a student the information in his user record is displayed. Then hispersonal database is retrieved and comparcd with the contents of the1
10Nelson Leung" Kinetics" Kinetics" -1 5791 9021 6" Kinetics1"00Legends: User Name2 Technical Area0 ArticleArea Last Access Time Last Access FileFigure 3. The Format of the User Record File
11next lesson. In particular, the lexemes (i.e. all leaf nodes in the parsetrees of the sentences) stored in the file containing the translationinformation of the article are compared with the student database inorder to give a rough estimation of how much new material will bepresented to him in this article (see figure 4).The task of article selection for a student is carried outautomatically by the Administrator based on his current Japanesecompetence. However, if the student is not satisfied with the articleassigned to him, he can manually override the selection and invokecomparisons of his database with other articles in the area until hefinds one that is most appropriate. Once he accepts an assignment,his user record will be updated with the new access file name andnew access time. In case the student wants to study an article that isnot in his technical area of interest, he will be assigned with onestarting with the first file index.
12Nelson LeungUser Information:Technical AreaArticle AreaLast File of AccessLast Time of AccessKineticsKineticsKineticslArticle FileKnown areaReview areaNew areaKinetics1( Cancel1990* 1 fl 21 8 88x8 2:31 PM57%3012:.eerf: :.%ee(Up Level)OkFigure 4. A Comparison of a Student's Database with the Lexemes ofan ArticleG
132.4 User SelectionAfter the Administrator selects an article for the student, itsaves the assignment in the "User Selection" file under the student'sname (see figure 5). The records, stored in a list format, are retrievedby using the Munger Function. The tutor knows about thisassignment by searching through the selection file for the file name.Changesto the selection file can onlybe performed by theAdministrator. This insures that if a student does not finish a lessonin a session, he can return to the same article in the next session.When the student finishes a lesson, he should use the Administratorto select the next article for him. What he learns from the last lessonis updated and saved in his personal database by the tutor asdiscussed in the next chapter. Thus the selection file serves as acommunication link between the Administrator and the NihongoTutor.7
14Nelson Leung' Knetics1"0Legends:O User NameO File NameFigure 5. The Format of the User Selection File
152.5 User DatabaseEach student using the tutorial system has a personal databasefile that provides a measure of his Japanese language competence.The filename is defined as the student's login followed by a ".db"extension. This file consists of two areas: a well-known area and anunfamiliar area. The well-known area keeps all the material that thestudent has currently mastered whereas the unfamiliar area savesinformation about material that needs to be reviewed. Each time hestudies a lesson, his database records are updated by the tutor toreflect his current level of competence. The Administrator's task ofselecting an appropriate article for the student is closely related tothe information on the student's proficiency of Japanese. TheAdministrator is designed to maximize comprehension throughcontext and minimize rote memorization and dictionary use. Thus, itdoes not assign an article to the student that is too difficult for him tostudy. In other words, a certain percentage of the lexemes in thearticle should fall into the well-known area.Figure 6 shows the contents of the two areas stored in a "UserDatabase" file. The records are loaded into the system and stored intwo hash tables, one for each area. When the Administrator iscomparing the contents of an article with a student's personaldatabase, the lexemes in the translation file will be hashed into thetables to check for a match with the student database. A match issaid to have occurred if the two keys are identical. There is a chancethat a lexeme in the article is not matched in neither the well-knownarea nor the unfamiliar area. This lexeme is then totally new to thestudent. The Administrator does not change any records of thestudent's database during the process of comparison.
164*-);* 5 ty: 1T;;" 2:(1';" 5cn (1;7 6; ;'7C\;L.5Legends: Well-known Areae Unfamiliar Area ; "9O Keyword Reading Frequency Count0 String;b ;r4t117t;ZNumber ; 3L 2'DC1;11-1tc.;4: 1;5;;" 2t;5;;" 3i )j;5; 3 h, e 3 4,;4;;" 50;5;;" 6-c Cs;" 775v;5;;" 8itgt,;5; e;" -1OOFigure 6. The Format of a User Database File7
17CHAPTER III - THE NIHONGO TUTORThe underlying task of the Nihongo Tutor is to improve astudent's proficiency in comprehending technical Japanese byproviding him with Japanese articles to study. The article assigned iscarefully selected by the Administrator according to the student'scurrent Japanese competence, the level of difficulty of the article andif the student does notknow the meaning or pronunciation of some portion of the text hehis technical area of interest. Presumably,will request semantic, syntactical, or phonetic information from thetutor. Consequently, there must be a fast and efficient interactionbetween the student and the internal data structures that store all ofthis information. In addition, the tutor keeps a record of what thestudent has learned from the current lesson and what is stillunfamiliar. This information is stored into his database to be used bythe Administrator in assigning an appropriate article as the nextlesson.The tutor also furnishes severaldictionaries for cross-referencing purposes.electronic on-line
183.1 Text Data StructuresEach sentence in an article is represented by a parse tree basedon the syntax of the language. The semantics of the sentence isincluded in the parse tree as additional information stored in eachnode of the tree. Figure 7 shows how a Japanese sentence isrepresented as a parse tree. Note that each node in the parse treemay have an arbitrary number of chiid nodes.A generalized tree is a suitable data structure to represent theparse tree. In the tutorial system, this generalized tree isimplemented as a doubly-linked list. The linked list can effectivelyconnect all nodes together for fast searching and retrieval. Movingupward or downward along different levels of the parse tree can beaccomplished readily due to the double link between the nodes. Onone hand, moving downward along a parse tree shows how asentence or a phrase is broken down into its constituents. On theother hand, movingupwarddepicts how several clauses arecombined together to form a sentence. The double link thus allowsthe student to easily investigate Japanese sentence structure. (Sincethere are no word delimiters in Japanese text, it is sometimesdifficult to understand Japanese sentence structure since the firststep of breaking a sentence down into phrases is not trivial.) Thistree data structure is further augmented to include all of thetranslation information for the entire article. In particular, there is asentence list consisting of nodes that keep the translationinformation for every sentence in the augmented parse tree.Figure 8 illustrates how translation information is stored in thesentence list. With the help of this data structure, translationinformation can be retrieved by the scheme shown in figure 9.Whenever the student has difficultyinunderstanding some of thetext in the article, he highlights this text to notify the tutor.Examining the internal edit record of the text window, the tutoridentifies the selected text based on its relative position in thearticle. To retrieve the actual highlighted characters, however, the
19Original Japanese sentence:it1ri.L0:50)(litLftlgTN.ttZt0)-e6,,Figure 7. A Parse Tree Representation of a Japanese Sentence73
20Sentence ListVSentenceSentence12ParseTree 1ParseTree 2Figure 8. The Sentence List of a Japanese Article
21The student highlights sometext in the text window114111111II1111N11111111111111111Show translationin the translation -4CwindowLocate thesentence and findthe parse treeLocate the nodeand retrieve theinformationFigure 9. Interaction between the User and the Nihongo Tutor
22Macintosh operating system (see Inside Macintosh, Vol I, p. 384)provides a routine called TEGetText that converts the entire textdocument into a character array. If this had to occur each time thestudent highlighted a region of text, it would be very inefficient. As aresult, each node in the sentence list and each parse tree stores thelocation of the text that it encloses. To check whether the selectedtext is included in a node, only a comparison of its text locaiicri withthe selection range is required. To get the translation information,the Nihongo Tutor locates the highlighted text in a sentence andreturns the associated parse tree. It then finds a node in the lowestpossible level of the parse tree enclosing all of the selected text. Allthe translation information based on this node is retrieved and thendisplayed. More about this searching algorithm will be describedlater.Time Complexity AnalysisThe retrieval time depends on two levels of searching: throughthe sentence list, and through the parse tree. The time complexity ofthe first one is 0(k) for the worst case, where k is the number ofsentences in the article. More often than not, the student will bereading the article sequentially from one sentence to the next oneinstead of randomly selecting a sentence to study. The tutor takesadvantage of this locality of reference by putting a marker besidethe sentence which has f!urrently been comprehended. Assumingthat most sentence selections do not change drastically from sentenceto sentence, it is highly probable that the next sentence will be thetarget sentence to be searched. In this case, the search time will be0(1). Thus, searching from the marker can significantly reduce theamount of time required to look for the target sentence.The retrieval time through the parse tree is more complicatedsince nodes in the parse tree have a variable number of child nodes.Suppose that m is the total number of nodes in a parse tree and each
23node bears n children on average. The expected number of levels inthis parse tree will be lognm, and in eacl- selected node within a nonleaf level, half of its children are expected to be examined before thenext searching path can be determined. Thus, the average case is0 ([ nlognm11/2). The overall expected time complexity is therefore0(1-1[nlognm]/2).AttributesThe contents of a node must include all semantic, phonetic, andmorphological information as well as any supplemental data forinstruction but at the same time remove as much redundancy aspossible. The attributes should include the lexical boundary of thestored text, its best English equivalent contextual meaning, Japanesereadings, possible syntactical breakdown, and so on. Moreover, thenode must be structured in such a way that it can be used torepresent the sentence list. The attributes associated with a node arepresented in figure 10 and are defined as follows:(1) Identification number:(2) Text location:(3) Child node list(4) Parent link(5) Actual text(6) Character index::::identification within the parse treestarting and ending positions of the textin the articlea linked list of child nodesa pointer to the parent nodestrings of Japanese text (with readings)and English translationindices to the actual text strings forretrieving Japanese text and EnglishequivalentThe actual text attribute pertains only to a sentence list nodewhereas the character index attribute applies only to a parse treenode. The purpose of this special implementation is to removeredundant storage of text including Japanese text, readings, andEnglishtranslations, within a node.Theroot of the parse tree
24Start locationJapanese textEnd locationEnglish textParent node(Japanese wordsChildnode list)English wordsReadings Only used in a parse tree node--- Only used in a sentence nodeFigure 10. The Attributes Associated with a Noder.--).0(.)
25normally contains most of the text contents that its descendantsneed. Consequently, instead of redundantly storing the textinformation in each node, it is kept only in the sentence node.Whenever translation information is requested, the tutor uses thecharacter index attribute as an index into the actual text strings andretrieves the appropriate words. Table 2 shows the text contentsstored in a sentence node and what text characters will be retrievedaccording to the word indices in the parse tree nodes.The actual text string can be considered as a single string ofinfinite length used to store a large amount of text information. Allattempts of text information retrieval should access this text string.The actual implementation, however, divides it into substrings ofequal string length and connects them together in a linked list (seefigure 11). This kind of data structure is frequently used in thetutorial system and will be referred to as a "string list".The character index is a record consisting of two fields: thestarting index (text begin) of the text and the length of the words(text length). The desired text is retrieved according to the followingformulas: (text begin) mod (string length) Start Index (text length)String number (text begin) div (string length)Start indexEnd index(1 )(2)(3)String number indicates the location of the substring in thestring list where the text can be found. To give an example of how topractically retrieve some Japanese text, its reading, nd theassociated translation, figure 12 shows the contents of the textstrings and table 3 summarizes the results of the retrieval for somespecific values of character indices.To compare the storage savings, the three records of thecharacter index account for six bytes of space in each node. If thetext or readings to be stored inanode requires more than this
26Table 2. Example of Text Information Stored in NodesIn the sentence node:Japanese text and readings:f Li]g"-C'N.t).5 t)GOT*5.,itig.LC)-30a3,1g-3I'Dtsga)M. eEnglish translation:velocity is a thing represented by speed and directionthe thing called velocitycan be representedperiodStarting text location: 0Ending text location: 48In the parse tree:NodeChar. Japanese' Char. English Char. ReadingngLocationindex01231445961011413text Indexif 1 t*18textvelocity*-365Oa *55the9thingL (,)6calledIndex656 0000* two bytes are required to represent a Japanese character30
27Text string listString1String 21StringlengthInor Stringlength --- 1Figure 11. The String List Representation of the Text String
2gJapanese text string3011151131111111-.1 le111111IEnglish text string1IsiI1I v 1 El L I ol cli 1 T i Yi1i1Figure 12. The Japanese and English Actual Text Strings32
29Table 3. Example of Text Information RetrievalIndexTextBeginJapanese lation*String length 255TextStringLengthNumberTextit a velocity
30amount, the character index data structure will reduce the storagespace requirement. For instance, the sentence in figure 7 has 20nodes with 120 bytes used for the character indices. Assuming thatactual text characters are stored instead, it would require 185 bytesfor the Japanese text, 268 bytes for the English text, and 22 bytes forthe readings, thus requiring a total of 475 bytes. The saving in thiscase is a ratio of 3.96 to 1. Thus the space requirements for thecharacter index method are four times smaller than that required bykeeping the actual text. This figure will, of course, vary based on theamount of redundancy in the node information which is in turnrelated to the depth of the parse tree. Counting all the sentences inan article, the reduction in space complexity is overwhelming. Theonly drawback of this scheme is the slight increase in timecomplexity i.e. the overhead of locating the correct substring in thestring list.The child node list is the linkage that connects all the siblingsof a node together. Its linked list implementation can accommodate avariable number of nodes. Figure 13 shows how the different kindsof links join the nodes together within a parse tree. Each node in thechild node list is a record consisting of the current child and a pointerto the next child. The sibling nodes are sorted according to their textlocation in the article, and each one bears its own child node list.Using these links, the algorithm shown in figure 14 can be usedto search the parse treefor the highlighted text. The search isdifficult since there are no delimiters used to denote the boundary ofa word, a phrase, or even a clause in Japanese. When some text ishighlighted, it is important to know which node is being selected.Since each node encloses a certain amount of text, the relativestarting and ending position of this text can be used to compare withthe selection range. In this way, the tutor can identify the selectednode by the unique text enclosure range stored in the nodes.
31Child linkParent linkSibling linkParent nodeV-- Child node 2Child node 1Child node 3IFigure 13. Connection of Nodes within a Parse Tree
32Current node RootInclude allhighlightedext7Current node Next siblingyesVSearch node First child yesnoInclude allhighlightedext9Current nodeSearch node Search nodenoIs Currentnode a leafode Next childIs Searchyesyesnode empty/noHighlight allCurrent node textFigure 14. Algorithm for Searching through the Parse Tree
33The algorithm starts searching from the root node of the parsetree. The "current node" is used to examine sibling nodes whereasthe "search node" is for inspecting child nodes. For each level withina parse tree, the selection range is compared with the text location ofthe "current node." If the latter completely includes the first one, itproceeds to examine its child nodes. Otherwise, the "current node"will be set to be its next sibling for continual searching. For each"current node", however, it is necessary to check whether any one ofits child nodes entirely encompasses the se
the Parse Tree Editor offers specifics on the four stages of parsing: segmentation, syntax, semantics, and mapping. Some screens are used for illustration. (MSE) ***** Reproductions supplied by EDRS are the best that can be made. from the original document.