The Usage Of Decision Tree In Medical Field

3y ago
45 Views
3 Downloads
644.48 KB
6 Pages
Last View : 15d ago
Last Download : 10m ago
Upload by : Camryn Boren
Transcription

The Usage of Decision Tree in Medical FieldDicky Adrian - 135160501Program Studi Teknik InformatikaSekolah Teknik Elektro dan InformatikaInstitut Teknologi Bandung, Jl. Ganesha 10 Bandung 40132, Indonesia113516050@std.stei.itb.ac.idAbstract— Many instances in the world now have some kind of bigdata. With big data, we also need to know how to process them. Oneof the way to process big data is constructing a tree. Some instanceswith a big data is hospital, or to be more generalized medicalinstances. In medical instances there will be time when we need tomake a quick decision. A decision tree can make a quick decisionusing some algorithm and with decision trees, risk can be reduced.One example is the caesarian procedure for pregnant women,although in these years the caesarian procedure has become moresafe, but it is still a procedure that have some risk. With the databaseand the decision tree, we can generalize what is the best course ofaction depending on the situation, so the risk will be reduced. And it’sproven with the success rate over 80%.Keywords— Decision Tree, Medical, Treeengineers it’s really important to know how to implement treesin the most effective way and the most efficient way.II. TREEA. DefinitionTree in data structure means an undirected graph that containsno circuit. Tree is a collection of data (Node) which is organizedin hierarchical structure and this is a recursive definition [7].There are two types of trees, first is free tree. Free tree is a treewith no terminologies such as root, leaf, etc. There is also arooted tree, which contain root, child, parent, etc. All of theseterminologies will be explained later. When there are two ormore trees in one case, it can be called a “forest”.B. TerminologyI. INTRODUCTIONIn the recent years, databases have become more and moreadvanced than ten years ago. That makes some trouble inhandling cases with big database. With that much knowledge,we need to know how to determine a decision that is optimal forevery case that may occur. Decision tree is one way to decidewhat is the best possible decision to make in each case.Decision Tree can be used in many ways, from the simplestcase to advanced cases. For example, decision tree can be usedto determine whether or not to go outside based the humidity,raining, and wind parameters, or where is the best place to go inthis time of the year for a holiday.Other field which benefit from decision trees are the one withbig database. For example, medicine, doctors need to know fasthow to handle patients with some symptoms, and sometimes,there are not enough doctors in the hospital. That’s when amachine is needed to take the job. Other than that, a decisiontree can also determine what drugs must be taken based on thesymptoms.One of the procedure that needs attention is caesarian sectionfor pregnant women. The rate of mortality of pregnant womenis not stable, it’s always fluctuating. One way to stabilize themortality rate is to have a system that can always determine thebest decision for each patient. But, of course the procedure mustalways be accompanied with a certified doctor, since we can’tdepend fully on the machine itself.To make a decision tree, there are some algorithms around theworld. The easy one is effective enough, but can’t handle sometypes of data. But, there are also other algorithms which canhandle most data types but comes with a higher “price”. So, asMakalah IF2120 Matematika Diskrit – Sem. I Tahun a structures algorithms/tree data structure.htmThere are a lot of terminology in tree data structure. Theterminologies are: RootThe node at the top of the tree, in the image the root ofthe tree is ‘A’. Child and ParentA child is a node which pointed by the parent node.For example, ‘D’ is the child of ‘B’ and ‘B’ is the parentof ‘D’ PathA path is how one node can go to another node with thehelp of the edges. For example, the path from ‘A’ to ‘H’is A, B, D, H, and the length of the path is 3. Descendant and AncestorIf there is a path between ‘x’ and ‘y’ nodes, then ‘x’ isthe ancestor of ‘y’ and ‘y’ is the descendant of ‘x’. For

example, in the image, ‘H’ is the descendant of ‘B’ and‘B’ is the ancestor of ‘H’.SiblingIf one parent has 2 or more nodes, then if ‘X’ and ‘Y’are the children from ‘Z’, then ‘X’ and ‘Y’ are siblings.LevelLevel represents the generation of the nodes, from theimage it’s clear that the node ‘B’ is level 1, and so on.HeightHeight represents the max level of the tree,LeafA leaf in tree data structure is a node that doesn’t haveany child for example the node ‘H’, ‘I’, ‘J’, ‘F’ and ‘G’.Internal NodesInternal nodes unlike leaf, is the node in the tree that atleast have one child. For example, ‘A’, ‘B’, ‘C’, ‘D’ and‘E’.Sub treeA sub tree represents the descendants of a node.Source: https://i.stack.imgur.com/8QFVk.gifThe evaluation of the tree above is ((2*3)/(2-1)) (5*(41)). Decision TreeA decision tree is a tree that contains a solution on eachleaf. The root and each internal node are labeled withquestion. A decision tree doesn’t always come in abinary tree form, it can also be N-ary tree with N greaterthan two.C. Types of Trees Balanced TreeBalanced tree is a tree where there are no leaf on theright side which is farther than the leaf on the left side.Source: https://www.cise.ufl.edu/ ddd/cap6635/Fall97/Short-papers/Image3.gif Source:https://www.ocf.berkeley.edu/ shidi/cs61a/w/images/8/88/Balanced vs unbalanced BST.pngN-Ary TreeN-Ary tree is a rooted tree and every node on that treecan’t have more than N children. If every node exceptthe leaf on the tree have approximately N children, thenthe tree is called full N-ary tree.Binary TreeA binary tree is a tree and every node on that tree can’thave more than 2 children, in other words, a binary treeis a 2-ary tree.Ordered TreeOrdered tree is a tree where the children of every parentis sorted in some way.D. Application of Binary TreeBinary tree is a very important data structure in computerscience. It’s a powerful data structure that can do many things. Expression TreeAn expression tree is a tree that contains a mathematicalexpression such as subtraction, addition, multiplication,etc. Nodes can contain both operand and operators.Nodes with operators will operate the children of thatnode. For example, given a tree belowMakalah IF2120 Matematika Diskrit – Sem. I Tahun 2017/2018 Prefix codePrefix code is a set of codes, which every member of theset is not a prefix of other member. For example:o {000, 001, 01, 10, 11} is a prefix codeo {1, 00, 01, 000, 0001} is not a prefix code because’00’ is the prefix fore ‘000’ and ‘0001’ Huffman CodeHuffman code is one algorithm to shorten the length ofbitwise string based on how frequent the letter shows up.More often the letter shows up, the algorithm will makethe bitwise size smaller. This kind of algorithm is calledcompression algorithm. The steps to get Huffman codeare:1. Pick two symbols with the least probability, thenmake the two of them into two nodes for one parentwhich is the concatenation of the two.2. The concatenation of the two previous symbol is anew symbol and the two symbol picked before isnow “erased”.3. Repeat step 1 until all the symbol “erased”.

B. Algorithm Source: hapter-2/figs/huffman.gifBinary Search TreeA binary search tree is one of the most important tree indata structure. It can handle the fundamentals such assearching, inserting elements, deleting elements. Thework of binary search tree is determined by the key ofevery node. Left subtree’s key is less than the root’s keyand the right subtree’s key is greater than the root’s key.This occur on every node.Some of the algorithm are to construct a decision tree are:1. ID3 AlgorithmThis Algorithm first introduced by J.R. Quinlan and it isone of the simplest algorithm to construct a decision tree.This algorithm process the database top-down until itcreate a tree and no backtracking. ID3 Algorithm usesentropy and information gain to construct a decision tree.But, ID3 Algorithm doesn’t do any pruning, which resultin not very effective decision tree.As stated above, we need to know the entropy of thedatabase we have. Entropy calculates the homogeneity ofa sample. If the sample is homogenous then the entropywill be 0, and that sample is a leaf node. The steps are:o Calculating entropyConstruct a frequency table of one attribute, forexample ‘play tennis’ isPlay TennisYesNo95The entropy is calculated by the equation:𝑐𝐸(𝑆) 𝑝𝑖 𝑙𝑜𝑔2 𝑝𝑖𝑖 1On the example above we will get:𝐸(𝑝𝑙𝑎𝑦 𝑡𝑒𝑛𝑛𝑖𝑠) 𝐸(5,9)𝐸(𝑝𝑙𝑎𝑦 𝑡𝑒𝑛𝑛𝑖𝑠) 𝐸(0.36, 0.64)𝐸(𝑝𝑙𝑎𝑦𝑡𝑒𝑛𝑛𝑖𝑠 ) (0.36 𝑙𝑜𝑔2 0.36) (0.64 𝑙𝑜𝑔2 0.64) commons/thumb/d/da/Binary search tree.svg/1200px-Binary search tree.svg.pngIf there are more than one attribute then we calculateit as𝐸(𝑇, 𝑋) 𝑃(𝑐)𝐸(𝑐)III. CONSTRUCTING DECISION TREEA. StagesThere are three main phase to construct a decision tree:1. Construction PhaseIn this phase, the tree is constructed with the entiredatabase. It is constructed by doing recursion techniqueuntil a stopping criteria is met.2. Pruning PhaseAfter constructing the tree, there’s a chance that the treeis not in the best version due to over-fitting. In this phase,we make the previously built tree more effective byremoving some of the lower branches and nodes toimprove the performance of the tree.3. Processing the pruned treeThis stage is an optional stage where we improve the treeto make the tree more understandable.Makalah IF2120 Matematika Diskrit – Sem. I Tahun 2017/2018𝑐 𝑋For example,OutlookSunnyOvercastRainyPlay 𝑛𝑛𝑖𝑠, 𝑂𝑢𝑡𝑙𝑜𝑜𝑘) 𝑃(𝑆𝑢𝑛𝑛𝑦) 𝐸(3,2) 𝑃(𝑂𝑣𝑒𝑟𝑐𝑎𝑠𝑡) 𝐸(4,0) 𝑃(𝑅𝑎𝑖𝑛𝑦) 𝐸(2,3)o Calculating GainGain is obtained by calculating the deviation on eachpredictor’s entropy and other predictor’s entropy.After that, first, we decide which node is the target of thedecision tree, then we calculate the gain on each

predictor. Choose the one with the most gain as adecision node, then divide the dataset by its branches,repeat the same process on every branch.However, as stated above, this algorithm has somedisadvantages, such as:o ID3 doesn’t produce the most optimal solutionbecause the algorithm doesn’t do any backtrackingon each branch.o ID3 can overfit to the training data.o ID3 can only make trees from nominal data, notcontinuous data.2. C4.5 AlgorithmC4.5 algorithm is also introduced by J.R. Quinlan as theevolution of ID3 Algorithm. As we know before, the ID3algorithm has some disadvantages, in the C4.5 algorithm,they are fixed. The C4.5 algorithm uses gain ratio assplitting criteria and uses error-based pruning after thegrowing phase.The most important updates from ID3 Algorithm to C4.5algorithm are:o C4.5 algorithm uses pruning procedure, so somebranches that don’t contribute to accuracy arereplaced by leaf nodes.o C4.5 allow some values to be missingo C4.5 can handle not only nominal data, but alsocontinuous data.There is another update to the C4.5 algorithm which isthe C5.0. C5.0 algorithm claims to have more efficiencyboth in memory and computation time.6.QUEST which stands for The Quick, Unbiased, EfficientStatistical Tree algorithm is an algorithm that can handleunivariate and linear combination splits. This algorithmuses several test, such as ANOVA F-test or Levene’s testor Pearson’s chi-square test. First the algorithm willcalculate ANOVA F-statistic for each attribute, and theone with the largest value will be selected as a splittingnode. Otherwise, Levene’s test will be performed and ifthe largest value of the test is greater than a certainthreshold, the node is selected as a splitting node.This algorithm also uses Quadratic DiscriminantAnalysis to find the optimal splitting point for each inputattribute, and uses ten-fold cross-validation to prune thetree.Other AlgorithmsThere are a lot of other algorithms to create Decisiontrees, some are CAL5, FACT, LMDT, T1, PUBLIC, andMARS. Each of them has their own uses, depending onthe type of data. They also have their own weaknessesand strength. The CAL5 is designed specifically fornumerical-valued attributes, the FACT algorithm is anearlier version of QUEST, the LMDT algorithmconstruct a decision tree by using multivariate tests, theT1 algorithm a one level decision tree that classifiesinstances using one attribute, the PUBLIC uses MDLcost to prune the decision tree in order to reducecomputational complexity, the MARS uses multipleregression to create the decision tree.D. Tree Representation in Data Structure3.4.5.CART AlgorithmCART or Classification and Regression Trees wascreated by Breiman et al.(1984). In this algorithm, thetree is a binary tree which has exactly two outgoingedges. The splits are selected by Twoing Criteria and italso uses pruning, with Cost-Complexity Pruning. As thename suggest, CART Algorithm can also makeregression trees. In the regression trees, the leaves predicta real number and not a class.CHAIDCHAID which stands for Chi-squared-AutomaticInteractive-Detection is another algorithm to construct adecision tree. This algorithm will find the leastsignificantly difference attribute from the target attribute.The attributes are measured by some tests, an F test isused is the target attribute is continuous, a Pearson chisquared test if it is nominal and a likelihood ration test ifit is ordinal.For each input attribute, the algorithm will check if thevalue is greater than a certain merge threshold. If it ishigher, then the attributes are merged. The method ofsplitting is by making a group of homogenous values thechildren of a node. This procedure stops if a maximumtree depth is reached, maximum number of cases in anode for being a parent is reached and a minimumnumber of cases in a node for being a child node isreached.QUESTMakalah IF2120 Matematika Diskrit – Sem. I Tahun 2017/2018Tree can be represented in many ways, two of them are listrepresentation and Left-child Right-sibling representation. Inlist representation, it uses two types of nodes, one list containthe data of the node, and one contain only references. Each nodeis connected to each other by looking at the pointer in thereference nodes.The second one is more understandable, every element in thelist has two pointers, one to the left child, and the other to theright sibling. If there is no pointer to the left child or the rightsibling, the pointer will point to NULL. This occur on each nodeon the tree.E. Advantages and Disadvantages of Decision treesAs a data structure, trees also have their advantages anddisadvantages. Some of the advantages of decision trees are:1. Decision trees are easy to understand, because they areself-explanatory.2. Decision tree can handle nominal and numeric inputattributes3. Decision tree representation is enough to representdiscrete value4. Decision trees can handle missing cases5. Decision trees can handle error cases6. Decision trees are considered to be a non-parametricmethod because they do not include any assumptionsabout the space distribution and on the classifierstructure.

7.If the classification cost is high, trees can be used becausethey ask only for the values of the features along a singlepath.Some disadvantages of decision trees are:1. With some algorithm, the result will always be discretevalues2. They tend to only perform well if a few highly relevantattributes exist.3. Some fragmentation problem causes partitioning datainto smaller fragments4. To handle missing cases, the algorithm must employspecial mechanism to process the missing value5. Decision trees are very unstable, a minor change willchange everything from the root to the leavesIV. DATA AND CALCULATIONAs stated in chapter I, there are usage of decision trees inmedical field such as determining whether a pregnant womenneed a caesarian procedure or not. Caesarian procedure itself hassome risk, such as, the baby’s immune system is not as strong atfirst so the death rate of the baby is higher than normalprocedure. But in some case, a pregnant women need caesarianprocedure to save both the mother and the baby. In this data,from a paper written by Farhad S. G., Peyman Mohammadi, andParvin Hakimi, they choose five attributes to construct thedecision tree. They are age, number of pregnancy, delivery time,blood pressure and heart status, and to construct the tree they usethe C4.5 algorithm so it’s more o18201LatecomerHighaptYesMakalah IF2120 Matematika Diskrit – Sem. I Tahun TimelyLowaptYes

er6/pxc3881613.pdfFrom the data, they collected, as stated before, they uses theC4.5 algorithm to construct the decision tree due to itscapabilities and the efficiency of the algorithm. In the algorithmthey construct a tree with the size of 31 and it has 21 leaves node.The algorithm is a top-down, greedy search procedure. Afterconstructing the tree it can be seen that most women with ineptheart status didn’t have a natural delivery and over 65% of thoseinept heart case savors an abnormal pressure of bloodAfter they construct the tree, they evaluate the data once againthis time with the tree, and the result is pretty good, the tree canguess most cases correctly. Out of 80 cases, the tree guess 69cases correctly, or approximately 86.25% of the time correctly.This means that with the decision tree, we can make medicalguesses efficiently and correctly under normal circumstances.Even though, it’s not 100% accurate, but there can be someimprovements in the future that make the decision trees moreaccurate. Of course, if there are some abnormalities in thepatient’s body, it’s best to ask a doctor rather than trusting amachine to decide a surgery, because until now, machine stillcan’t handle the abnormalities in data.V. CONCLUSIONMedical field has one of the biggest database in the world.How to treat each patient is determined with each patient’sunique condition. In order to make things more efficient we cansimplify the problem with a decision tree. With the tree, we canunderstand what is the most important thing in deciding the bestcourse of action. For example, in caesarian procedure fordelivering a baby, the mother’s heart status is the most importantstatus. With this information, we can make decision faster andaccurately.Makalah IF2120 Matematika Diskrit – Sem. I Tahun 2017/2018There are still other uses for decision tree in many field, andmedicine is one of them. Other field that benefit from decisiontree are business, economy, chemical (drug), and many more.Everything that needs classification can be done with a decisiontree.VII. ACKNOWLEDGMENTThis paper was supported by Institut Teknologi Bandung. Ithank all of my references’ writers who provided insight andexpertise that greatly assisted the research. I also hope that Ididn’t step any boundaries while writing this paper.I would also like to express my special thanks to all thelecturers that made this paper possible, Dra. Harlili, M.Sc. andalso with the help of all the friends that helped me during theprocess of writing this paper. I would also like to express mythanks to my parents for the financial and moral support.REFERENCES[1][2][3][4][5][6][7][8][9]Dr. Rinaldi Munir, “Diktat Kuliah Matematika Diskrit” 4th edhttps://xlinux.nist.gov/dads/HTML/tree.html accessed at 2nd December2017, 22.02https://www.tutorialspoint.com/data structures algorithms/tree data structure.htm accessed at 2nd December 2017, 2/10/23.pdf accessed at 2ndDecember 2017, ata%20Mining%20with%20Decision%20Trees 0%5BRokach%20%26%20Maimon%202014-1023%5D.pdf accessed at 3rd December 2017, r6/pxc3881613.pdfaccessed at 3rd December 2017, 00.18http://btechsmartclass.com/DS/U3 T1.html accessed at 3rd December2017, 09.42http://btechsmartclass.com/DS/U3 T2.html accessed at 3rd December2017, 10.02http://www.saedsayad.com/decision tree.htm accessed at 3rd December2017, 00.42PERNYATAANDengan ini saya menyatakan bahwa makalah yang saya tulis iniadalah tulisan saya sendiri, bukan saduran, atau terjemahan darimakalah orang lain, dan bukan plagiasi.Bandung, 3 Desember 2017Dicky Adrian – 13516050

With the database and the decision tree, we can generalize what is the best course of . 88/Balanced_vs_unbalanced_BST.png N-Ary Tree N-Ary tree is a rooted tree and every node on that tree . The concatenation of the two previous symbol is a new symbol and the two symbol picked before is now “erased”.

Related Documents:

May 02, 2018 · D. Program Evaluation ͟The organization has provided a description of the framework for how each program will be evaluated. The framework should include all the elements below: ͟The evaluation methods are cost-effective for the organization ͟Quantitative and qualitative data is being collected (at Basics tier, data collection must have begun)

Silat is a combative art of self-defense and survival rooted from Matay archipelago. It was traced at thé early of Langkasuka Kingdom (2nd century CE) till thé reign of Melaka (Malaysia) Sultanate era (13th century). Silat has now evolved to become part of social culture and tradition with thé appearance of a fine physical and spiritual .

On an exceptional basis, Member States may request UNESCO to provide thé candidates with access to thé platform so they can complète thé form by themselves. Thèse requests must be addressed to esd rize unesco. or by 15 A ril 2021 UNESCO will provide thé nomineewith accessto thé platform via their émail address.

̶The leading indicator of employee engagement is based on the quality of the relationship between employee and supervisor Empower your managers! ̶Help them understand the impact on the organization ̶Share important changes, plan options, tasks, and deadlines ̶Provide key messages and talking points ̶Prepare them to answer employee questions

Dr. Sunita Bharatwal** Dr. Pawan Garga*** Abstract Customer satisfaction is derived from thè functionalities and values, a product or Service can provide. The current study aims to segregate thè dimensions of ordine Service quality and gather insights on its impact on web shopping. The trends of purchases have

Civic Style - Marker Symbols Ü Star 4 û Street Light 1 ú Street Light 2 ý Tag g Taxi Æb Train Station Þ Tree 1 òñðTree 2 õôóTree 3 Ý Tree 4 d Truck ëWreck Tree, Columnar Tree, Columnar Trunk Tree, Columnar Crown @ Tree, Vase-Shaped A Tree, Vase-Shaped Trunk B Tree, Vase-Shaped Crown C Tree, Conical D Tree, Conical Trunk E Tree, Conical Crown F Tree, Globe-Shaped G Tree, Globe .

Chính Văn.- Còn đức Thế tôn thì tuệ giác cực kỳ trong sạch 8: hiện hành bất nhị 9, đạt đến vô tướng 10, đứng vào chỗ đứng của các đức Thế tôn 11, thể hiện tính bình đẳng của các Ngài, đến chỗ không còn chướng ngại 12, giáo pháp không thể khuynh đảo, tâm thức không bị cản trở, cái được

Le genou de Lucy. Odile Jacob. 1999. Coppens Y. Pré-textes. L’homme préhistorique en morceaux. Eds Odile Jacob. 2011. Costentin J., Delaveau P. Café, thé, chocolat, les bons effets sur le cerveau et pour le corps. Editions Odile Jacob. 2010. Crawford M., Marsh D. The driving force : food in human evolution and the future.