Towards Generating Math Word Problems from Equations and Topics


Qingyu Zhou (Harbin Institute of Technology, qyzhou@hit.edu.cn) and Danqing Huang (Microsoft). Equal contribution.
Proceedings of The 12th International Conference on Natural Language Generation, pages 494-503, Tokyo, Japan, 28 Oct - 1 Nov 2019. © 2019 Association for Computational Linguistics.

Abstract

A math word problem is a narrative with a specific topic that provides clues to the correct equation with numerical quantities and variables therein. In this paper, we focus on the task of generating math word problems. Previous works are mainly template-based with pre-defined rules. We propose a novel neural network model to generate math word problems from the given equations and topics. First, we design a fusion mechanism to incorporate the information of both equations and topics. Second, an entity-enforced loss is introduced to ensure the relevance between the generated math problem and the equation. Automatic evaluation results show that the proposed model significantly outperforms the baseline models. In human evaluations, the math word problems generated by our model are rated as more relevant (in terms of solvability of the given equations and relevance to topics) and natural (i.e., grammaticality, fluency) than those of the baseline models.

1 Introduction

A math word problem is a narrative which describes a story under a specific topic. Moreover, it provides clues to the correct equation interpreting the mathematical relations of numerical quantities and variables. The two example problems in Table 1 belong to two different topics (ticket selling, land purchase) respectively. Meanwhile, they share the same equation template interpreting the underlying mathematical relations between numbers and variables. To generate a math word problem, a system needs to produce a topic-specific story while maintaining the underlying equation.

There is a surge of interest in automatic math word problem generation (K. and Elliot, 2002; Deane and Sheehan, 2013; Polozov et al., 2015; Koncel-Kedziorski et al., 2016). Previous attempts are mainly based on templates. Polozov et al. (2015) consider several components (e.g., event graph construction, surface text realization), each with manually defined templates and rules. Koncel-Kedziorski et al. (2016) generate math problems by revising existing problems into a new topic. They use a problem as the verbal template and simply replace nouns and verbs with suitable words from the new topic. The generation of template-based systems is based directly on existing items with high coherence. However, they have clear limitations. As templates are fixed, the possible outputs are limited to the template patterns, with few grammatical and lexical options. Additionally, they require manual effort to construct domain-specific templates.

Recently, neural network approaches to the automatic generation of questions (Du et al., 2017; Zhou et al., 2017) and stories (Fan et al., 2018) have shown promising results. Despite their success, they cannot be directly applied to math word problem generation, since the generation of math word problems needs to maintain the underlying mathematical operations between quantities and variables, while at the same time ensuring the relevance of the output problem to a given topic.

In this paper, we propose a novel neural network model for Math word Problem Generation from Equations and Topics (MAGNET). The proposed model consists of three main components: an equation encoder, a topic encoder and a math problem decoder. The equation encoder is implemented with a bidirectional recurrent neural network (RNN) which takes the equation tokens as input and produces a sequence of hidden vectors. The topic encoder maps the given topic words into continuous word representations. The decoder is a single directional RNN with a dual-attention mechanism, which can dynamically extract information from equations and topic words. To leverage both the equation and topic information, we design an equation-topic fusion mechanism to enable the decoder to choose which information to use. Furthermore, to ensure that the generated math word problem is highly related to the given equations, we introduce a novel entity-enforced loss function which considers the correspondence between variables in the given equations and entities in the output problem.

Large-scale annotated math problem datasets play a crucial role in developing neural math problem generation systems. We propose to adapt Dolphin18K (Huang et al., 2016) as the training, development and test sets, since it is one of the current largest math problem datasets with diverse problem types. It contains 18,460 elementary math problems from Yahoo! Answers (https://answers.yahoo.com/), with annotation of equations and answers.

Extensive experiments are conducted on the Dolphin18K dataset. We first propose three baseline methods: 1) a retrieval-based model that finds the closest math problems in the training set; 2) a sequence-to-sequence model which takes only the equation as input (Equ2Math); 3) a neural decoder model conditioned on topic words (Topic2Math). We use three commonly used automatic evaluation metrics from recent text generation works, i.e., BLEU (Papineni et al., 2002), ROUGE (Lin, 2004) and METEOR (Denkowski and Lavie, 2014). Evaluation results on all three metrics show that our MAGNET model outperforms the baseline methods. To further examine the quality of generated math word problems, we also conduct human evaluations. Human evaluation results show that our MAGNET model performs better than the baseline systems on three aspects, i.e., 1) solvability of the given equation; 2) relevance to the given topic; and 3) grammaticality and fluency of language.

Our contributions are threefold:
1. We propose a novel end-to-end neural network model, MAGNET, to generate math word problems based on given equations and topics.
2. We introduce an Equation-Topic Fusion mechanism which helps the decoder incorporate both the information from the equation and the topic.
3. We design an entity-enforced loss function to improve the relevance between the generated math word problem and the given equations.

Equation Template: x + y = [num0]; [num1] x + [num2] y = [num3]

Problem 1: Tickets to a local movie were sold at $4.00 for adults and $2.50 for students. If 267 tickets were sold for a total of $1042.50, how many adult tickets were sold?
Topic: ticket selling
Equation: x + y = 267; 4 x + 2.5 y = 1042.5

Problem 2: A farmer bought 100 acres of land, part at $300 an acre and part at $450, paying for the whole $42,200. How much land was there in each part?
Topic: land purchase
Equation: x + y = 100; 370 x + 450 y = 42200

Table 1: Math word problems of the same equation template but with different topics.
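As a quick sanity check on Table 1, the first problem's equation system can be solved directly. The snippet below uses sympy to confirm that the narrative indeed encodes a solvable system; the choice of sympy and exact rationals is purely illustrative and not something the paper prescribes.

```python
# Solve the equation system of Problem 1 in Table 1: x + y = 267, 4x + 2.5y = 1042.5.
from sympy import Eq, Rational, solve, symbols

x, y = symbols("x y")  # x: adult tickets, y: student tickets
system = [Eq(x + y, 267),
          Eq(4 * x + Rational(5, 2) * y, Rational(2085, 2))]  # 2.5 and 1042.5 as exact rationals
print(solve(system, [x, y]))  # {x: 250, y: 17} -> 250 adult tickets were sold
```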

2 Related Work

Automatic question generation from text aims to generate questions taking text as input, which has potential value for educational purposes (Heilman, 2011). Previous question generation works focus on generating natural language questions from a given piece of text. Heilman (2011) employs a syntactic parser to parse the input text into a tree and extract answer candidates. Then a rule-based system transforms the tree into the corresponding question. Recently, generative neural network methods have also been applied to this area, since large-scale manually annotated passage-question pairs have become available. Du et al. (2017) and Zhou et al. (2017) propose to use the SQuAD (Rajpurkar et al., 2016) question answering dataset as the training data for question generation. In the SQuAD dataset, the given passage is a piece of text from Wikipedia and the answer is a sub-span of it. Du et al. (2017) use a sequence-to-sequence model on the passage-question pair to generate questions. Their model takes the passage text as input to generate a question from it. Different from Du et al. (2017), Zhou et al. (2017) add the answer position to the model input as BIO tagging features. However, these methods cannot be directly applied to math word problem generation.

There are previous approaches specifically targeting math problem generation. Most of them are template based, such as natural language schemas (K. and Elliot, 2002) and semantic frames of conceptual structures (Deane and Sheehan, 2013). Polozov et al. (2015) propose a pipeline including equation generation, plot generation and surface text realization, which requires a manually defined ontology and templates. These approaches are guaranteed to maintain a highly coherent story, but at the manual cost of template construction, which is difficult to extend to more domains. Recently, Koncel-Kedziorski et al. (2016) propose a rewrite-based approach. They generate new problems by simply replacing noun phrases and verbs in existing math problems with words from the target topic. However, they do not consider global optimization of the whole problem, which results in semantic incoherence.

Math problem solving, which can be formulated as learning the mapping from a math problem to equations, is also related to our work. In this paper, we adapt the math problem dataset Dolphin18K for development. Dolphin18K (Huang et al., 2016) is constructed from Yahoo! Answers and contains over 18,000 math problems. Prior to that, there were several datasets with fewer than 2,000 problems, such as VERB-375 (Hosseini et al., 2014), ALG514 (Kushman et al., 2014) and Dolphin1878 (Shi et al., 2015).

3 Problem Statement

Given an equation template and a target topic, our goal is to generate a math word problem in natural language. In this section, we first define the equation template and the topic, and then give the formal definition of our task.

3.1 Equation Template

The equation template, introduced in Kushman et al. (2014), is a unique form of an equation system. For example, given an equation system as follows:

x + y = 20; x = 4 y

we replace the numbers with tokens and generalize the equations into the following template:

x + y = [num0]; x = [num1] y

An equation is a solution for a specific math problem, while an equation template can correspond to several math problems. Therefore, an equation template can be seen as an abstraction of a set of equations.

3.2 Topic

As pointed out in Koncel-Kedziorski et al. (2016), math problems are coherent stories with different topics (e.g., ticket selling or land purchase). In one math problem, there are words that act as topic indicators. For the problems in Table 1, the corresponding topic indicators are:

Problem 1: {tickets, movie, adults, students, sold}
Problem 2: {farmer, bought, dollar, land, pay}

Therefore, we extract the keywords of a math problem as its topic words for representing the topic. The details of topic word extraction are described in Section 3.4.

3.3 Math Problem Generation

Now we can formally define the task of math word problem generation. Given an equation template E and a set of topic words T as input, the goal is to generate a math word problem P, satisfying:
(1) P is a piece of natural language text whose topic is T;
(2) P maintains the mathematical operations between numerical quantities and variables in the equation template E.

3.4 Dataset Creation

We create the math word problem generation dataset based on the Dolphin18K (Huang et al., 2016) dataset. Specifically, we construct (E, T, P) triples where E is an equation, T is a set of topic words, and P is the corresponding math word problem. In the Dolphin18K dataset, the equation E and the math word problem P are given. Therefore, we need to extract the topic words from the text of P.

There are previous studies on the task of topic word extraction, such as simple counting of word frequency and the LDA topic model (Blei et al., 2003). We practically observe that the TF-IDF method is effective and satisfies our needs. We calculate the scores of the words as follows:

tf_{ij} = n_{ij} / \sum_{k} n_{kj}    (1)
idf_i = \log( |P| / ( |\{ j : t_i \in P_j \}| + 1 ) )    (2)
score_{ij} = tf_{ij} \cdot idf_i    (3)

where tf_{ij} is the term frequency of word i in problem P_j, and idf_i is the inverse document frequency of word i. We sort the score of each word i in P_j, and keep the top n_tp words as the problem's topic words.
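The two preprocessing steps above, number abstraction (Section 3.1) and TF-IDF topic-word extraction (Section 3.4, Equations 1-3), are straightforward to implement. Below is a minimal Python sketch of both; the function names, the regex used to find numbers, and the exact placement of the +1 smoothing term are assumptions made for illustration rather than details taken from the authors' code.

```python
# Minimal sketch of the preprocessing in Sections 3.1 and 3.4 (assumed details noted above).
import math
import re
from collections import Counter

NUM = re.compile(r"\d+(?:\.\d+)?")

def to_template(equations: str) -> str:
    """Replace numeric constants left-to-right with [num0], [num1], ... (Sec. 3.1)."""
    counter = 0
    def repl(_match):
        nonlocal counter
        token = f"[num{counter}]"
        counter += 1
        return token
    return NUM.sub(repl, equations)

def topic_words(problems, n_tp=10):
    """Top-n_tp TF-IDF words per problem (Eqs. 1-3); `problems` is a list of token lists."""
    df = Counter()                                   # document frequency of each word
    for words in problems:
        df.update(set(words))
    topics = []
    for words in problems:
        counts = Counter(words)
        total = sum(counts.values())                 # sum_k n_kj
        scores = {w: (n / total) *                   # tf_ij, Eq. (1)
                     math.log(len(problems) / (df[w] + 1))    # idf_i, Eq. (2)
                  for w, n in counts.items()}        # score_ij = tf * idf, Eq. (3)
        topics.append(sorted(scores, key=scores.get, reverse=True)[:n_tp])
    return topics

print(to_template("x + y = 267, 4 x + 2.5 y = 1042.5"))
# -> x + y = [num0], [num1] x + [num2] y = [num3]
```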

4 MAGNET

As shown in Figure 1, our MAGNET model consists of three main parts, namely, the topic encoder, the equation encoder and the math word problem decoder. The topic encoder and the equation encoder map the topic words and equations to continuous vectors. The decoder is a single directional recurrent neural network equipped with a dual-attention mechanism, which is leveraged by the equation-topic fusion mechanism.

Figure 1: The overview diagram of MAGNET. For simplicity, we omit some units and connections. The figure shows the decoder generating the third word by fusing the information from both the input equation and topic words.

4.1 Topic Encoder

The input topic T contains a set of keywords t_1, t_2, ..., t_{n_tp}. Considering the fact that these topic words do not have sequential or temporal relationships, we represent them as a set of word embeddings tp_1, tp_2, ..., tp_{n_tp}, as shown in the upper-left part of Figure 1. Specifically, the topic encoder is a lookup table which maps input topic words to the corresponding real-valued vectors.

4.2 Equation Encoder

The encoder is implemented as a single-layer bidirectional GRU (Cho et al., 2014) (BiGRU). We concatenate all the equations together with a special delimiter "," (indicating the end of an equation). The BiGRU reads the input equation tokens one by one, producing a sequence of hidden states h_i = [\overrightarrow{h}_i; \overleftarrow{h}_i] with:

\overrightarrow{h}_i = GRU(x_i, \overrightarrow{h}_{i-1})    (4)
\overleftarrow{h}_i = GRU(x_i, \overleftarrow{h}_{i+1})    (5)

The initial states of the BiGRU are set to zero vectors, i.e., \overrightarrow{h}_0 = 0 and \overleftarrow{h}_{n+1} = 0.

4.3 Math Word Problem Decoder

At each time step t, the decoder GRU holds its previous hidden state s_{t-1}, the embedding of the previous output word y_{t-1} and the previous context vector c_{t-1}. With these previous states, the decoder GRU updates its state as given by Equation 6. To initialize the GRU hidden state, we use a linear layer with the last backward encoder hidden state \overleftarrow{h}_1 of the equation as input:

s_t = GRU(w_{t-1}, c_{t-1}, s_{t-1})    (6)
s_0 = \tanh(W_d \overleftarrow{h}_1 + b)    (7)

Then the decoder first generates a readout state r_t and passes it through a maxout hidden layer (Goodfellow et al., 2013) to predict the next word with a softmax layer over the output vocabulary:

r_t = W_r w_{t-1} + U_r c_t + V_r s_t    (8)
r'_t = [\max\{ r_{t,2j-1}, r_{t,2j} \}]_{j=1,...,d}    (9)
p(y_t | y_{<t}) = softmax(W_o r'_t)    (10)

where W_r, U_r, V_r and W_o are weight matrices. w_{t-1} is the word embedding of the previously generated word y_{t-1}. The readout state r_t is a 2d-dimensional vector, and the maxout layer (Equation 9) picks the maximum of every two numbers in r_t, producing a d-dimensional maxout vector r'_t. We then apply a linear transformation on r'_t to obtain a vector of target-vocabulary size and predict the next word y_t with the softmax operation.
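To make the architecture in Sections 4.1-4.3 concrete, here is a compact PyTorch sketch of the two encoders and a single decoder step. The module names, the choice of d/2 hidden units per BiGRU direction (so that h_i and the topic embeddings share the same size), and the way c_{t-1} is concatenated onto the GRU input are assumptions made so the shapes line up; this is a sketch, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class MagnetSketch(nn.Module):
    def __init__(self, vocab_size: int, d: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)          # shared lookup table; also the topic encoder (Sec. 4.1)
        # BiGRU equation encoder, d//2 per direction so h_i = [fwd; bwd] has size d (Eqs. 4-5)
        self.eq_encoder = nn.GRU(d, d // 2, bidirectional=True, batch_first=True)
        self.init_proj = nn.Linear(d // 2, d)             # s_0 = tanh(W_d * backward h_1 + b), Eq. (7)
        self.decoder_cell = nn.GRUCell(2 * d, d)          # input is [w_{t-1}; c_{t-1}], Eq. (6)
        self.W_r = nn.Linear(d, 2 * d, bias=False)        # readout, Eq. (8)
        self.U_r = nn.Linear(d, 2 * d, bias=False)
        self.V_r = nn.Linear(d, 2 * d, bias=False)
        self.W_o = nn.Linear(d, vocab_size)               # output projection, Eq. (10)

    def encode(self, eq_tokens, topic_tokens):
        """Encode equation tokens with the BiGRU; topic words are plain embeddings (Secs. 4.1-4.2)."""
        eq_hidden, last = self.eq_encoder(self.embed(eq_tokens))   # eq_hidden: (B, n, d)
        s0 = torch.tanh(self.init_proj(last[1]))                   # last[1] = final backward state
        topic_repr = self.embed(topic_tokens)                      # (B, n_tp, d), no recurrence
        return eq_hidden, topic_repr, s0

    def decode_step(self, prev_word, prev_context, prev_state, attend):
        """One decoder step; `attend` maps the new state s_t to the fused context c_t (Sec. 4.4)."""
        w_prev = self.embed(prev_word)                                         # (B, d)
        s_t = self.decoder_cell(torch.cat([w_prev, prev_context], dim=-1),
                                prev_state)                                    # Eq. (6)
        c_t = attend(s_t)
        r_t = self.W_r(w_prev) + self.U_r(c_t) + self.V_r(s_t)                 # Eq. (8)
        r_t = r_t.view(r_t.size(0), -1, 2).max(dim=-1).values                  # maxout, Eq. (9)
        return torch.log_softmax(self.W_o(r_t), dim=-1), s_t, c_t              # Eq. (10)
```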

4.4 Equation-Topic Fusion

To incorporate the information of both the equation and the topic, we propose the Equation-Topic Fusion mechanism. Intuitively, this mechanism enables the decoder to pay different portions of attention to the equation templates and topic words. For instance, when the decoder is generating descriptive words about the story, it should pay more attention to the topic words. Vice versa, the decoder should pay more attention to the equation if it is generating numbers or variables in the equation. In detail, the context vector c_t in Equations 6 and 8 is a fused vector of equation and topic. We employ two attention modules to produce the corresponding context vectors of equation and topic:

e_{t,i} = v_a^\top \tanh(W_a s_t + U_a \, repr(\cdot)_i)    (11)
\alpha_{t,i} = \exp(e_{t,i}) / \sum_{i'=1}^{n} \exp(e_{t,i'})    (12)
c(\cdot)_t = \sum_{i=1}^{n} \alpha_{t,i} \, repr(\cdot)_i    (13)

where repr(\cdot)_i represents the vector of an encoded equation token or topic word, which can be h_i or tp_i. The v_a, W_a and U_a are learnable parameters. Since the equation and topic information are of different types, we use two sets of these parameters for the equation and topic attention modules.

We denote the equation context vector c(equation)_t and the topic context vector c(topic)_t as EC_t and TC_t respectively. To fuse EC_t and TC_t together, we predict a fusion coefficient g_t using an MLP:

g_t = sigmoid(W_f s_t + b)    (14)
c_t = g_t \cdot EC_t + (1 - g_t) \cdot TC_t    (15)

where g_t is the fusion gate. Therefore, the context vector c_t is a combination of the equation template and the topic, determined by the current decoding state s_t.

4.5 Entity-Enforced Loss

As mentioned before, the generated math problems should be highly relevant to the given equation template. The entities in the generated math problem should correspond to the variables in the equations (e.g., m, n, [num0]). To ensure high relevance between the equation template and the generated math problem, we propose an entity-enforced loss:

acc_e = \sum_{t=1}^{L} \alpha_{t,e}    (16)
L_e = \sum_{e \in \text{equation}} ReLU(1 - acc_e)    (17)

where L is the length of the output problem, and ReLU is the rectifier function defined as:

ReLU(x) = \max(0, x)    (18)

The intuition behind the entity-enforced loss is that the model needs to attend to the entities in the given equations. In Equation 16, we accumulate the attention scores of a variable in the equation over all decoding time steps. Then a ReLU function is applied to (1 - acc_e) to ensure that the entity e is attended to at least once during decoding.

4.6 Objective Function

Given the training dataset of (equation, topic, question) triples D = {(E^(1), T^(1), P^(1)), ..., (E^(n), T^(n), P^(n))}, the training objective is to minimize the negative log-likelihood loss L with respect to the model parameters θ:

L = - \sum_{i=1}^{n} \log p(P^{(i)} | E^{(i)}, T^{(i)}; \theta) + \lambda L_e    (19)

where λ is a hyper-parameter that controls the contribution of the entity-enforced loss.
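The fusion gate and the entity-enforced loss translate fairly directly into code. The sketch below implements Equations 11-18 on top of standard additive attention; the parameter names, tensor layouts, and the `entity_positions` mask (marking which equation tokens are variables or [num] tokens) are my own assumptions for illustration.

```python
import torch
import torch.nn as nn

class Attention(nn.Module):
    """Additive attention over equation tokens or topic embeddings (Eqs. 11-13)."""
    def __init__(self, d: int):
        super().__init__()
        self.W_a = nn.Linear(d, d, bias=False)
        self.U_a = nn.Linear(d, d, bias=False)
        self.v_a = nn.Linear(d, 1, bias=False)

    def forward(self, s_t, repr_seq):
        # s_t: (B, d) decoder state; repr_seq: (B, n, d) encoded equation tokens or topic embeddings
        scores = self.v_a(torch.tanh(self.W_a(s_t).unsqueeze(1) + self.U_a(repr_seq)))  # Eq. (11)
        alpha = torch.softmax(scores.squeeze(-1), dim=-1)                               # Eq. (12)
        context = torch.bmm(alpha.unsqueeze(1), repr_seq).squeeze(1)                    # Eq. (13)
        return context, alpha

class EquationTopicFusion(nn.Module):
    """Gate the equation and topic contexts into a single context vector (Eqs. 14-15)."""
    def __init__(self, d: int):
        super().__init__()
        self.eq_attn = Attention(d)       # separate parameter sets for equation and topic, as in the paper
        self.topic_attn = Attention(d)
        self.gate = nn.Linear(d, 1)

    def forward(self, s_t, eq_hidden, topic_repr):
        ec_t, eq_alpha = self.eq_attn(s_t, eq_hidden)
        tc_t, _ = self.topic_attn(s_t, topic_repr)
        g_t = torch.sigmoid(self.gate(s_t))                   # fusion gate g_t, Eq. (14)
        return g_t * ec_t + (1.0 - g_t) * tc_t, eq_alpha      # c_t, Eq. (15)

def entity_enforced_loss(eq_alphas, entity_positions):
    """Eqs. (16)-(18): each variable/number token should receive total attention >= 1
    over the L decoding steps.
    eq_alphas: (B, L, n) equation-attention weights from all decoding steps;
    entity_positions: (B, n) boolean mask over equation tokens."""
    acc = eq_alphas.sum(dim=1)                                # acc_e, Eq. (16)
    return (torch.relu(1.0 - acc) * entity_positions.float()).sum()   # Eqs. (17)-(18)
```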

5 Experiment

In this section, we evaluate our model with both automatic and human evaluations.

5.1 Datasets

We conduct our experiments on the Dolphin18K dataset. Since we need equation templates as input to generate math problems, we use its subset with equation annotation, which amounts to 10,644 problems with 5,738 equation templates. The average number of sentences in a problem is 2.70, and the average number of words in a problem is 32.72. We train on 8,515 examples and evaluate on 2,129 test examples, following the split setting in Huang et al. (2016).

As pre-processing, we obtain the equation template and topic words for each problem as input. We extract at most n_tp = 10 words with the highest TF-IDF scores as the topic words in our experiments. According to the statistics, the average numbers of extracted topic words are 7.7 and 7.5 in the training and testing datasets respectively.

5.2 Baselines

We provide three baselines for math word problem generation, considering the input of the equation template and topic words respectively.

KNN finds the closest problem in the training set given the input topic words. It first narrows down training proble

5.3 Implementation Details

The dimensions of the encoder/decoder hidden states and embeddings are set to 512. The hyper-parameter λ in Equation 19 is 0.7. The dropout rate is set to 0.6. All model parameters are initialized using a Gaussian distribution with the Xavier scheme (Glorot and Bengio, 2010). We use the Adam (Kingma and Ba, 2015) optimizer with its hyper-parameters set as: learning rate α = 0.001, momentum parameters β_1 = 0.9 and β_2 = 0.999, and ε = 10^{-8}. We also apply gradient clipping (Pascanu et al., 2013) with range [-5, 5]. The beam size is set to 3 in the decoding stage. We release the source code at an anonymous URL for blind review.

5.4 Automatic Evaluation

Though automatic evaluation methods have their limitations in natural language generation evaluation, we use them as important evaluation methods since they are easily reproducible. Furthermore, in the task of math word problem generation, retaining key information such as the quantities and entities can be well measured by automatic evaluation methods.

5.4.1 Evaluation Metrics

We evaluate the performance of our model using three evaluation metrics, following recent text generation works (Du et al., 2017; Zhou et al., 2017; Fan et al., 2018): BLEU, ROUGE and METEOR.
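The optimizer settings listed in Section 5.3 can be wired up in a few lines. The sketch below assumes the MagnetSketch model and entity_enforced_loss from the earlier sketches are in scope, uses an illustrative vocabulary size, and reads the stated range [-5, 5] as per-value gradient clipping; all of these are assumptions, not statements about the authors' code.

```python
import torch

model = MagnetSketch(vocab_size=30000)            # hidden/embedding size 512 by default (Sec. 5.3); vocab size is illustrative
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                             betas=(0.9, 0.999), eps=1e-8)   # Adam settings from Sec. 5.3
lam = 0.7                                         # lambda, the entity-enforced loss weight (Eq. 19)

def training_step(nll_loss, ent_loss):
    """One optimization step on L = NLL + lambda * L_e with gradient clipping."""
    loss = nll_loss + lam * ent_loss              # Eq. (19)
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=5.0)   # range [-5, 5], Sec. 5.3
    optimizer.step()
    return loss.item()
```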

