A Guide To Machine Learning For Biologists


Joe G. Greener, Shaun M. Kandathil, Lewis Moffat and David T. Jones
Department of Computer Science, University College London, London, UK. Joe G. Greener and Shaun M. Kandathil contributed equally.
Nature Reviews Molecular Cell Biology

Abstract | The expanding scale and inherent complexity of biological data have encouraged a growing use of machine learning in biology to build informative and predictive models of the underlying biological processes. All machine learning techniques fit models to data; however, the specific methods are quite varied and can at first glance seem bewildering. In this Review, we aim to provide readers with a gentle introduction to a few key machine learning techniques, including the most recently developed and widely used techniques involving deep neural networks. We describe how different techniques may be suited to specific types of biological data, and also discuss some best practices and points to consider when one is embarking on experiments involving machine learning. Some emerging directions in machine learning methodology are also discussed.

Deep learning
Machine learning methods based on neural networks. The adjective 'deep' refers to the use of many hidden layers in the network, two hidden layers as a minimum but usually many more than that. Deep learning is a subset of machine learning, and hence of artificial intelligence more broadly.

Artificial neural networks
A collection of connected nodes loosely representing neuron connectivity in a biological brain. Each node is part of a layer and represents a number calculated from the previous layer. The connections, or edges, allow a signal to flow from the input layer to the output layer via hidden layers.

Humans make sense of the world around them by observing it, and learning to predict what might happen next.
Consider a child learning to catch a ball: the child (usually) knows nothing about the physical laws that govern the motion of a thrown ball; however, by a process of observation, trial and error, the child adjusts his or her understanding of the ball's motion, and how to move his or her body, until he or she is able to catch it reliably. In other words, the child has learned how to catch the ball by building a sufficiently accurate and useful 'model' of the process, by repeatedly testing this model against the data and by making corrections to the model to make it better.

'Machine learning' refers broadly to the process of fitting predictive models to data or of identifying informative groupings within data. The field of machine learning essentially attempts to approximate or imitate humans' ability to recognize patterns, albeit in an objective manner, using computation. Machine learning is particularly useful when the dataset one wishes to analyse is too large (many individual data points) or too complex (contains a large number of features) for human analysis and/or when it is desired to automate the process of data analysis to establish a reproducible and time-efficient pipeline. Data from biological experiments frequently possess these properties; biological datasets have grown enormously in both size and complexity in the past few decades, and it is becoming increasingly important not only to have some practical means of making sense of this data abundance but also to have a sound understanding of the techniques that are used. Machine learning has been used in biology for a number of decades, but it has steadily grown in importance to the point where it is used in nearly every field of biology.
However, only in the past few years has the field taken a more critical look at the available strategies and begun to assess which methods are most appropriate in different scenarios, or even whether they are appropriate at all.

This Review aims to inform biologists on how they can start to understand and use machine learning techniques. We do not intend to present a thorough literature review of articles using machine learning for biological problems1, or to describe the detailed mathematics of various machine learning methods2,3. Instead, we focus on linking particular techniques to different types of biological data (similar reviews are available for specific biological disciplines; see, for example, refs 4–11). We also attempt to distil some best practices of how to practically go about the process of training and improving a model. The complexity of biological data presents pitfalls as well as opportunities for their analysis using machine learning techniques. To address these, we discuss the widespread issues that affect the validity of studies, with guidance on how to avoid them. The bulk of the Review is devoted to the description of a number of machine learning techniques, and in each case we provide examples of the appropriate application of the method and how to interpret the results. The methods discussed include traditional machine learning methods, as these are still the best choices in many cases, and deep learning with artificial neural networks, which are emerging as the most effective methods for many tasks. We finish by describing what the future holds for incorporating machine learning in data analysis pipelines in biology.

There are two goals when one is using machine learning in biology. The first is to make accurate predictions where experimental data are lacking, and use these predictions to guide future research efforts. However, as scientists we seek to understand the world, and so the second goal is to use machine learning to further our understanding of biology. Throughout this guide we discuss how these two goals often come into conflict in machine learning, and how to extract understanding from models that are often treated as 'black boxes' because their inner workings are difficult to understand12.

Ground truth
The true value that the output of a machine learning model is compared with to train the model and test performance. These data usually come from experimental data (for example, accessibility of a region of DNA to transcription factors) or expert human annotation (for example, a healthy or pathological medical image).

Encoding
Any scheme for numerically representing (often categorical) data in a form suitable for use in a machine learning model. An encoding can be a fixed numerical representation (for example, one-hot or continuous encoding) or can be defined using parameters that are trained along with the rest of a model.

One-hot encoding
An encoding scheme that represents a fixed set of n categorical inputs using n unique n-dimensional vectors, each with one element set to 1 and the rest set to 0. For example, the set of three letters (A,B,C) could be represented by the three vectors [1,0,0], [0,1,0] and [0,0,1], respectively.

Mean squared error
A loss function that calculates the average squared difference between the predicted values and the ground truth. This function heavily penalizes outliers because it increases rapidly as the difference between a predicted value and the ground truth grows.

Binary cross entropy
The most common loss function for training a binary classifier; that is, for tasks aimed at answering a question with only two choices (such as cancer versus non-cancer); sometimes called 'log loss'.

Key concepts
We first introduce a number of key concepts in machine learning.
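As a concrete illustration of the one-hot encoding defined in the glossary above, the scheme can be written in a few lines. This is a minimal sketch (NumPy is assumed to be available; the function itself is ours, not code from the Review):

```python
import numpy as np

def one_hot_encode(sequence, alphabet):
    """Represent each of the n categories in `alphabet` as a unique
    n-dimensional vector with one element set to 1 and the rest set to 0."""
    index = {item: i for i, item in enumerate(alphabet)}
    encoded = np.zeros((len(sequence), len(alphabet)))
    for row, item in enumerate(sequence):
        encoded[row, index[item]] = 1.0
    return encoded

# The glossary's example: the letters (A, B, C) map to
# [1,0,0], [0,1,0] and [0,0,1], respectively
print(one_hot_encode("ABC", "ABC"))
```

The same function would encode, say, an amino acid sequence over a 20-letter alphabet as a length-by-20 matrix.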
Where possible, we illustrate these concepts with examples taken from the biological literature.

General terms. A dataset comprises a number of data points or instances, each of which can be thought of as a single observation from an experiment. Each data point is described by a (usually fixed) number of features. Examples of such features include length, time, concentration and gene expression level. A machine learning task is an objective specification for what we want a machine learning model to accomplish. For example, for an experiment investigating the expression of genes over time, we might want to predict the rate of conversion of a specific metabolite into another species. In this case, the features 'gene expression level' and 'time' could be termed input features or simply inputs for the model, and 'conversion rate' would be the desired output of the model; that is, the quantity we are interested in predicting. A model can have any number of input and output features. Features can be either continuous (taking continuous numerical values) or categorical (taking only discrete values). Quite often, categorical features are simply binary and are either true (1) or false (0).

Supervised and unsupervised learning. 'Supervised machine learning' refers to the fitting of a model to data (or a subset of data) that have been labelled; that is, where there exists some ground truth property, which is usually experimentally measured or assigned by humans. Examples include protein secondary structure prediction13 and prediction of genome accessibility to genome-regulatory factors14. In both cases, the ground truth is derived ultimately from laboratory observations, but often these raw data are preprocessed in some way.
In the case of secondary structure, for example, the ground truth data are derived from analysing protein crystal structure data in the Protein Data Bank, and in the latter case, the ground truth comes from data derived from DNA-sequencing experiments. By contrast, unsupervised learning methods are able to identify patterns in unlabelled data, without the need to provide the system with the ground truth information in the form of predetermined labels, such as finding subsets of patients with similar expression levels in a gene expression study15 or predicting mutation effects from gene sequence co-variation16. Sometimes the two approaches are combined in semi-supervised learning, where small amounts of labelled data are combined with large amounts of unlabelled data. This can improve performance in cases where labelled data are costly to obtain.

Classification, regression and clustering problems. When a problem involves assigning data points to a set of discrete categories (for example, 'cancerous' or 'not cancerous'), the problem is called a 'classification problem', and any algorithm that performs such classification can be said to be a classifier. By contrast, regression models output a continuous set of values, such as predicting the free energy change of folding after mutating a residue in a protein17. Continuous values can be thresholded or otherwise discretized, meaning that it is often possible to reformulate regression problems as classification problems. For example, the free energy change mentioned above can be binned into ranges of values that are favourable or unfavourable for protein stability. Clustering methods are used to predict groupings of similar data points in a dataset, and are usually based on some measure of similarity between data points. They are unsupervised methods that do not require that the examples in a dataset have labels. For example, in a gene expression study, clustering could find subsets of patients with similar gene expression.

Classes and labels.
The discrete set of values returned by a classifier can be made to be mutually exclusive, in which case they are called 'classes'. Where these values need not be mutually exclusive, they are termed 'labels'. For example, a residue in a protein structure can be in only one of multiple secondary structure classes, but could simultaneously be assigned the non-exclusive labels of being α-helical and transmembrane. Classes and labels are usually represented by an encoding (for example, a one-hot encoding).

Loss or cost functions. The output or outputs of a machine learning model are never ideal and will diverge from the ground truth. The mathematical functions that measure this deviation, or in more general terms that measure the amount of 'disagreement' between the obtained and ideal outputs, are referred to as 'loss functions' or 'cost functions'. In supervised learning settings, the loss function would be a measure of deviation of the output relative to the ground truth output. Examples include mean squared error loss for regression problems and binary cross entropy for classification problems.

Parameters and hyperparameters. Models are essentially mathematical functions that operate on some set of input features and produce one or more output values or features. To be able to learn on training data, models contain adjustable parameters whose values can be changed over the training process to achieve the best performance of the model (see later). In a simple regression model, for example, each feature has a parameter that is multiplied by the feature value, and these are added together to make the prediction. Hyperparameters are adjustable values that are not considered part of the model itself in that they are not updated during training, but which still have an impact on the training of the model and its performance. A common example of a hyperparameter is the learning rate, which controls the rate or speed with which the model's parameters are altered during training.

Training, validation and testing. Before being used to make predictions, models require training, which involves automatically adjusting the parameters of a model to improve its performance. In a supervised learning setting, this involves modifying the parameters so the model performs well on a training dataset, by minimizing the average value of the loss or cost function (described earlier). Usually, a separate validation dataset is used to monitor but not influence the training process so as to detect potential overfitting (see the next section). In unsupervised settings, a cost function is still minimized, although it does not operate on ground truth outputs. Once a model is trained, it can be tested on data not used for training. See Box 1 for a guide to the overall process of training and how to split the data appropriately between training and testing sets. A flowchart to help with the overall process is shown in Fig. 1, and some of the concepts in model training are shown in Fig. 2.

Box 1 | Doing machine learning
Here we outline the steps that should be taken when one is training a machine learning model. There is surprisingly little guidance available on the model selection and training process146,147, with descriptions of the stepping stones and failed models rarely making it into published research articles. The first step, before touching any machine learning code, should be to fully understand the data (inputs) and prediction task (outputs) at hand.
This means a biological understanding of the question, such as knowing the origin of the data and the sources of noise, and having an idea of how the output could theoretically be predicted from the input using biological principles. For example, it can be reasoned that different amino acids might have preferences for particular secondary structures in proteins, so it makes sense to predict secondary structure from amino acid frequencies at each position in a protein sequence. It is also important to know how the inputs and outputs are stored computationally. Are they normalized to prevent one feature having an unduly large influence on prediction? Are they encoded as binary variables or continuously? Are there duplicate entries? Are there missing data elements?

Next, the data should be split to allow training, validation and testing. There are a number of ways to do this, two of which are shown in Fig. 2a. The training set is used to directly update the parameters of the model being trained. The validation set, usually around 10% of the available data, is used to monitor training, select hyperparameters and prevent the model overfitting to the training data. Often k-fold cross-validation is used: the training set is split into k evenly sized partitions (for example, five or ten) to form k different training and validation sets, and the performance is compared across each partition to select the best hyperparameters. The test set, sometimes called the 'hold-out set', typically also around 10% of the available data, is used to assess the performance of the model on data not used for training or validation (that is, to estimate its expected real-world performance). The test set should be used only once, at the very end of the study, or as infrequently as possible27,38 to avoid tuning the model to fit the test set.
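The splitting scheme just described can be sketched with scikit-learn (which this Review cites for traditional machine learning); the dataset below is synthetic and simply stands in for real data:

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

# Synthetic stand-in for a real dataset: 100 data points, 5 features each
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.normal(size=100)

# Hold out ~10% as the test set, to be used only once at the end of the study
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.1,
                                                  random_state=0)

# 5-fold cross-validation on the remainder: each partition serves once as the
# validation set while the other four partitions are used for training
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, val_idx in kfold.split(X_rest):
    X_train, X_val = X_rest[train_idx], X_rest[val_idx]
    y_train, y_val = y_rest[train_idx], y_rest[val_idx]
    # ...fit a model on (X_train, y_train), score it on (X_val, y_val),
    # and compare scores across folds to select hyperparameters...
```

The fixed `random_state` values make the split reproducible, which matters when comparing hyperparameters fairly across runs.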
See the section Data leakage for issues to consider when making a fair test set.

The next step is model selection, which depends on the nature of the data and the prediction task, and is summarized in Fig. 1. The training set is used to train the model following best practices of the software framework being used. Most methods have a handful of hyperparameters that need to be tuned to achieve the best performance. This can be done using random search or grid search, and can be combined with k-fold cross-validation as outlined above27. Model ensembling should be considered, where the outputs of a number of similar models are simply averaged to give a relatively reliable way to boost overall accuracy of the modelling task. Finally, the accuracy of the model on the test set (see above) should be assessed.

Overfitting and underfitting. The purpose of fitting a model to training data is to capture the 'true' relationship between the variables in the data, such that the model has predictive power on unseen (non-training) data. Models that are either overfitted or underfitted will produce poor predictions on data not in the training set (Fig. 2d). An overfitted model will produce excellent results on data in the training set (usually as a result of having too many parameters), but will produce poor results on unseen data. The overfitted model in Fig. 2d passes exactly through every training point, and so its prediction error on the training set will be zero. However, it is evident that this model has 'memorized' the training data and is unlikely to produce good results on unseen data. By contrast, an underfitted model fails to adequately capture the relationships between the variables in the data. This could be due to an incorrect choice of model type, incomplete or incorrect assumptions about the data, too few parameters in the model and/or an incomplete training process. The underfitted model depicted in Fig. 2d is inadequate for the data it is trying to fit; in this case it is evident that the variables have a non-linear relationship, which cannot be adequately described with a simple linear model, and so a non-linear model would be more appropriate.

Inductive bias and the bias–variance trade-off. The 'inductive bias' of a model refers to the set of assumptions made in the learning algorithm that leads it to favour a particular solution to a learning problem over others; in other words, the model's preference for a particular type of solution. This preference is often programmed into the model using its specific mathematical form and/or by using a particular loss function. For example, the inductive bias of recurrent neural networks (RNNs; discussed later) is that there are sequential dependencies in the input data, such as the concentration of a metabolite over time. This dependence is explicitly accounted for in the mathematical form of an RNN. Different inductive biases in different model types make them more suitable, and usually better performing, for specific types of data. Another important concept is the trade-off between bias and variance. A model with high bias can be said to have stronger constraints on the trained model, whereas a model with low bias makes fewer assumptions about the property being modelled and can, in theory, model a wide variety of function types. The variance of a model describes how much the trained model changes in response to training it on different training datasets. In general, we desire models with low bias and low variance, although these objectives are often in conflict, as a model with low bias will often learn different signals on different training sets. Controlling the bias–variance trade-off is key to avoiding overfitting or underfitting.

Traditional machine learning
We now discuss several key machine learning methods, with an emphasis on their particular strengths and weaknesses.
A comparison of different machine learning approaches is shown in Table 1. We begin with a discussion of methods not based on neural networks, sometimes called 'traditional machine learning'.

Linear regression
A model that assumes that the output can be calculated from a linear combination of inputs; that is, each input feature is multiplied by a single parameter and these values are added. It is easy to interpret how these models make their predictions.

Kernel functions
Transformations applied to each data point to map the original points into a space in which they become separable with respect to their class.

Non-linear regression
A model where the output is calculated from a non-linear combination of inputs; that is, the input features can be combined during prediction using operations such as multiplication. These models can describe more complex phenomena than linear regression.

[Fig. 1 | Choosing and training a machine learning method. The flowchart itself is garbled in this transcription; its recoverable decision points are: sufficient data? (if not, get more data); labelled data? (if not, clustering or just visualizing); connections between entities? (graph convolutional network); spatial or image data? (2D/3D convolutional neural network); sequential data? (recurrent neural network/1D convolutional neural network); small, fixed number of features (multilayer perceptron, or support vector machine/random forest/gradient boosting); with train, tune and test steps along the top.]
Fig. 1 caption: The overall procedure for training a machine learning method is shown along the top. A decision tree to assist researchers in selecting a model is given below. This flowchart is intended to be used as a visual guide linking the concepts outlined in this Review. However, a simple overview such as this cannot cover every case. For example, the number of data points required for machine learning to become applicable depends on the number of features available for each data point, with more features requiring more data points, and also depends on the model being used. There are also deep learning models that work on unlabelled data.
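The broad strokes of the Fig. 1 decision tree can be paraphrased as a function. This is a loose, illustrative reading only: the helper name and boolean flags are inventions here, and, as the figure caption itself warns, such an overview cannot cover every case.

```python
def suggest_first_model(labelled, enough_data, graph=False,
                        spatial=False, sequential=False):
    """A loose, simplified paraphrase of the Fig. 1 decision tree.
    Returns a model family to try first, not a definitive choice."""
    if not enough_data:
        return "get more data"
    if not labelled:
        return "clustering / dimensionality reduction for visualization"
    if graph:
        return "graph convolutional network"
    if spatial:
        return "2D/3D convolutional neural network"
    if sequential:
        return "recurrent neural network or 1D convolutional neural network"
    return "support vector machine / random forest / gradient boosting"

# A labelled dataset of sequences with plenty of data points
print(suggest_first_model(labelled=True, enough_data=True, sequential=True))
```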
Figure 3 shows some of the methods of traditional machine learning. Various software packages can be used to train such models, including scikit-learn in Python18, caret in R19 and MLJ in Julia20.

When one is developing machine learning methods for use with biological data, traditional machine learning should generally be seen as the first area to explore in finding the most appropriate method for a given task. Deep learning can be a powerful tool, and is undeniably trendy currently. However, it is still limited in the application areas in which it excels: when large amounts of data are available (for example, millions of data points); when each data point has many features; and when the features are highly structured (the features have clear relationships with one another, such as adjacent pixels in images)21. Data such as DNA, RNA and protein sequences22,23 and microscopy images24,25 are examples of biological data where these requirements can be met and deep learning has been successfully applied. However, the requirement for large amounts of data can make deep learning a poor choice even when the other two requirements are met.

Traditional methods, in comparison to deep learning, are much faster to develop and test on a given problem. Developing the architecture of a deep neural network and then training it can be a time-consuming and computationally expensive task to undertake26 compared with traditional models such as support vector machines (SVMs) and random forests27.
Although some approaches exist, with deep neural networks it is still not trivial to estimate feature importance28 (that is, how important each feature is for contributing to the prediction) or the confidence of predictions of the model1,28,29, both of which are often essential in biological settings. Even if deep learning appears technically feasible for a particular biological prediction task, it is often still prudent to train a traditional method to compare it against a neural network-based model, if possible30.

Traditional methods typically expect that each example in the dataset has the same number of features, so this is not always possible. An obvious biological example of this is when protein, RNA or DNA sequences are being used and each example has a different length. To use traditional methods with these data, the data can be altered so they are all the same size using simple techniques such as padding and windowing. 'Padding' means taking each example and adding additional values containing zero until it is the same size as the largest example in the dataset. By contrast, windowing shortens individual examples to a given size (for example, using only the first 100 residues of each protein in a dataset of protein sequences with lengths ranging from 100 upwards).

Use of classification and regression models. For regression problems such as those shown in Fig. 3a, ridge regression (linear regression with a regularization term) is often a good starting point for developing a model, as it can provide a fast and well-understood benchmark for a given task. Other variants of linear regression such as LASSO regression31 and elastic net regression32 are also worth considering when there is a desire for a model to rely on a minimal number of features within the available data. Unfortunately, the relationships between features in the data are often non-linear, and so use of a model such as an SVM is often a more appropriate choice for these cases33.
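As a sketch of this starting point, ridge and LASSO regression can be compared on synthetic data in which only two of ten features are informative. scikit-learn is assumed; the data, `alpha` values and the sparsity count are illustrative choices, not prescriptions from the Review:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic regression data: only the first 2 of 10 features carry signal
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)   # fast, well-understood benchmark
lasso = Lasso(alpha=0.1).fit(X, y)   # encourages reliance on few features

# LASSO tends to drive the coefficients of uninformative features to exactly
# zero, so the fitted model relies on a minimal number of features
print("ridge non-zero coefficients:",
      int(np.count_nonzero(np.abs(ridge.coef_) > 1e-6)))
print("lasso non-zero coefficients:",
      int(np.count_nonzero(np.abs(lasso.coef_) > 1e-6)))
```

Comparing the two coefficient vectors makes the regularization behaviour visible before any validation-set scoring is done.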
SVMs are a powerful type of regression and classification model that uses kernel functions to transform a non-separable problem into a separable problem that is easier to solve. SVMs can be used to perform both linear regression and non-linear regression depending on the kernel function used34–37. A good approach to developing a model is to train a linear SVM and an SVM with a radial basis function kernel (a general-purpose non-linear type of SVM) to quantify what gain, if any, can be had from a non-linear model. Non-linear approaches can provide more powerful models but at the cost of easy interpretation of which features are influencing the model, a trade-off mentioned in the introduction.

Many of the models that are commonly used in regression are also used for classification. Training a linear SVM and an SVM with a radial basis function kernel is also a good default starting point for a classification task. An additional method that can be tried is k nearest neighbours classification38. Being one of the simplest classification methods, k nearest neighbours classification provides a useful baseline performance marker against which other more complex models, such as SVMs, can be compared. Another class of robust non-linear methods is ensemble-based models such as random forests39 and XGBoost40,41. Both methods are powerful non-linear models that have the added benefits of providing feature importance estimates and often requiring minimal hyperparameter tuning. Due to the assignment of feature importance values and the decision tree structure, these models are a good choice if understanding which features contributed the most to a prediction is essential for biological understanding.

For both classification and regression, the many available models tend to have a bewildering variety of flavours and variants. Trying to predict how well suited a particular method will be to a particular problem a priori can be deceptive, and instead taking an empirical, trial-and-error approach to finding the best model is generally the most prudent approach. With modern machine learning suites such as scikit-learn18, changing between these model variants often requires changing just one line of code, so a good overall strategy for selecting the best method is to train and optimize a variety of the aforementioned methods and choose the one with the best performance on the validation set before finally comparing their performance on a separate test set.

Use of clustering models. The use of clustering algorithms (Fig. 3e) is pervasive within biology42,43. k-means is a strong general-purpose approach to clustering that, like many other clustering algorithms, requires the number of clusters to be set as a hyperparameter44. DBSCAN is an alternative method that does not require the number of clusters to be predefined, but has the trade-off that other hyperparameters have to be set45. Dimensionality reduction can also be performed before clustering to improve performance for datasets with a large number of features.

k nearest neighbours
A classification approach where a data point is classified on the basis of the known (ground truth) classes of the k most similar points in the training set using a majority voting rule. k is a parameter that can be tuned. Can also be used for regression by averaging the property value over the k nearest neighbours.

Dimensionality reduction
Dimensionality reduction techniques are used to transform data with a large number of attributes (or dimensions) into a lower-dimensional form while preserving the different relationships between the data points as much as possible.

[Fig. 2 | Training machine learning methods. The panels are garbled in this transcription; the recoverable content is: splitting the available data into sets used to train and to assess the model, including k-fold cross-validation (panel a, caption truncated in the source: 'Available data ar'); one-hot and continuous encodings of categories such as secondary structure (helix, sheet, coil) and of pixel RGB values; the effect of the learning rate (too low, good, too high) on the loss over training time; early stopping judged on the validation set; and underfit, good fit and overfit models.]
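The recipe recommended above, training a linear SVM and an SVM with a radial basis function kernel and comparing them on held-out data to quantify the gain from non-linearity, can be sketched as follows. The concentric-circles dataset is synthetic and purely illustrative, standing in for real biological features:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, non-linearly separable data: two concentric rings of points
X, y = make_circles(n_samples=200, noise=0.1, factor=0.4, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Train a linear SVM and an RBF-kernel SVM, then compare validation accuracy
# to quantify what gain, if any, the non-linear model provides
for kernel in ("linear", "rbf"):
    accuracy = SVC(kernel=kernel).fit(X_train, y_train).score(X_val, y_val)
    print(kernel, accuracy)
```

On data like this, the RBF kernel should score clearly higher than the linear kernel, which is exactly the signal that a non-linear model is worth its reduced interpretability.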

