Convolutional Neural Networks: An Overview And Application .


Insights into Imaging (2018) 9:611–629
REVIEW

Convolutional neural networks: an overview and application in radiology

Rikiya Yamashita 1,2 & Mizuho Nishio 1,3 & Richard Kinh Gian Do 2 & Kaori Togashi 1

Received: 3 March 2018 / Revised: 24 April 2018 / Accepted: 28 May 2018 / Published online: 22 June 2018
© The Author(s) 2018

Abstract
Convolutional neural network (CNN), a class of artificial neural networks that has become dominant in various computer vision tasks, is attracting interest across a variety of domains, including radiology. CNN is designed to automatically and adaptively learn spatial hierarchies of features through backpropagation by using multiple building blocks, such as convolution layers, pooling layers, and fully connected layers. This review article offers a perspective on the basic concepts of CNN and its application to various radiological tasks, and discusses its challenges and future directions in the field of radiology. Two challenges in applying CNN to radiological tasks, small dataset and overfitting, will also be covered in this article, as well as techniques to minimize them. Being familiar with the concepts and advantages, as well as limitations, of CNN is essential to leverage its potential in diagnostic radiology, with the goal of augmenting the performance of radiologists and improving patient care.

Key Points
• Convolutional neural network is a class of deep learning methods which has become dominant in various computer vision tasks and is attracting interest across a variety of domains, including radiology.
• Convolutional neural network is composed of multiple building blocks, such as convolution layers, pooling layers, and fully connected layers, and is designed to automatically and adaptively learn spatial hierarchies of features through a backpropagation algorithm.
• Familiarity with the concepts and advantages, as well as limitations, of convolutional neural network is essential to leverage its potential to improve radiologist performance and, eventually, patient care.

Keywords Machine learning . Deep learning . Convolutional neural network . Medical imaging

* Rikiya Yamashita
rickdom2610@gmail.com
1 Department of Diagnostic Imaging and Nuclear Medicine, Kyoto University Graduate School of Medicine, 54 Kawahara-cho, Shogoin, Sakyo-ku, Kyoto 606-8507, Japan
2 Department of Radiology, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA
3 Preemptive Medicine and Lifestyle Disease Research Center, Kyoto University Hospital, 53 Kawahara-cho, Shogoin, Sakyo-ku, Kyoto 606-8507, Japan

Abbreviations
3D: Three-dimensional
CAD: Computer-aided diagnosis
CADe: Computer-aided detection
CAM: Class activation map
CNN: Convolutional neural network
CT: Computed tomography
FBP: Filtered backprojection
GAN: Generative adversarial network
GPU: Graphical processing unit
IEEE: The Institute of Electrical and Electronics Engineers
ILSVRC: ImageNet Large Scale Visual Recognition Competition
ISBI: IEEE International Symposium on Biomedical Imaging
LIDC-IDRI: Lung Image Database Consortium and Image Database Resource Initiative
MRI: Magnetic resonance imaging
PET: Positron emission tomography
ReLU: Rectified linear unit
RI: Radio isotope
RGB: Red, green, and blue
SGD: Stochastic gradient descent

Introduction
A tremendous interest in deep learning has emerged in recent years [1]. The most established algorithm among various deep learning models is convolutional neural network (CNN), a class of artificial neural networks that has been a dominant method in computer vision tasks since the astonishing results were shared on the object recognition competition known as the ImageNet Large Scale Visual Recognition Competition (ILSVRC) in 2012 [2, 3]. Medical research is no exception, as CNN has achieved expert-level performances in various fields. Gulshan et al. [4], Esteva et al. [5], and Ehteshami Bejnordi et al. [6] demonstrated the potential of deep learning for diabetic retinopathy screening, skin lesion classification, and lymph node metastasis detection, respectively. Needless to say, there has been a surge of interest in the potential of CNN among radiology researchers, and several studies have already been published in areas such as lesion detection [7], classification [8], segmentation [9], image reconstruction [10, 11], and natural language processing [12]. Familiarity with this state-of-the-art methodology would help not only researchers who apply CNN to their tasks in radiology and medical imaging, but also clinical radiologists, as deep learning may influence their practice in the near future. This article focuses on the basic concepts of CNN and their application to various radiology tasks, and discusses its challenges and future directions. Other deep learning models, such as recurrent neural networks for sequence models, are beyond the scope of this article.

Terminology
The following terms are consistently employed throughout this article so as to avoid confusion. A "parameter" in this article stands for a variable that is automatically learned during the training process.
A "hyperparameter" refers to a variable that needs to be set before the training process starts. A "kernel" refers to the sets of learnable parameters applied in convolution operations. A "weight" is generally used interchangeably with "parameter"; however, we tried to employ this term when referring to a parameter outside of convolution layers, i.e., a parameter other than a kernel, for example in fully connected layers.

What is CNN: the big picture (Fig. 1)
CNN is a type of deep learning model for processing data that has a grid pattern, such as images, which is inspired by the organization of animal visual cortex [13, 14] and designed to automatically and adaptively learn spatial hierarchies of features, from low- to high-level patterns. CNN is a mathematical construct that is typically composed of three types of layers (or building blocks): convolution, pooling, and fully connected layers. The first two, convolution and pooling layers, perform feature extraction, whereas the third, a fully connected layer, maps the extracted features into final output, such as classification. A convolution layer plays a key role in CNN, which is composed of a stack of mathematical operations, such as convolution, a specialized type of linear operation. In digital images, pixel values are stored in a two-dimensional (2D) grid, i.e., an array of numbers (Fig. 2), and a small grid of parameters called a kernel, an optimizable feature extractor, is applied at each image position, which makes CNNs highly efficient for image processing, since a feature may occur anywhere in the image. As one layer feeds its output into the next layer, extracted features can hierarchically and progressively become more complex.
The process of optimizing parameters such as kernels is called training, which is performed so as to minimize the difference between outputs and ground truth labels through an optimization algorithm called backpropagation and gradient descent, among others.

How is CNN different from other methods employed in radiomics?
Most recent radiomics studies use hand-crafted feature extraction techniques, such as texture analysis, followed by conventional machine learning classifiers, such as random forests and support vector machines [15, 16]. There are several differences to note between such methods and CNN. First, CNN does not require hand-crafted feature extraction. Second, CNN architectures do not necessarily require segmentation of tumors or organs by human experts. Third, CNN is far more data hungry because it has millions of learnable parameters to estimate; it is therefore more computationally expensive, and model training requires graphical processing units (GPUs).

Building blocks of CNN architecture
The CNN architecture includes several building blocks, such as convolution layers, pooling layers, and fully connected layers. A typical architecture consists of repetitions of a stack of several convolution layers and a pooling layer, followed by

one or more fully connected layers. The step where input data are transformed into output through these layers is called forward propagation (Fig. 1). Although convolution and pooling operations described in this section are for 2D-CNN, similar operations can also be performed for three-dimensional (3D)-CNN.

Fig. 1 An overview of a convolutional neural network (CNN) architecture and the training process. A CNN is composed of a stacking of several building blocks: convolution layers, pooling layers (e.g., max pooling), and fully connected (FC) layers. A model's performance under particular kernels and weights is calculated with a loss function through forward propagation on a training dataset, and learnable parameters, i.e., kernels and weights, are updated according to the loss value through backpropagation with gradient descent optimization algorithm. ReLU, rectified linear unit

Convolution layer
A convolution layer is a fundamental component of the CNN architecture that performs feature extraction, which typically consists of a combination of linear and nonlinear operations, i.e., convolution operation and activation function.

Fig. 2 A computer sees an image as an array of numbers. The matrix on the right contains numbers between 0 and 255, each of which corresponds to the pixel brightness in the left image. Both are overlaid in the middle image. The source image was downloaded via http://yann.lecun.com/exdb/mnist

Convolution
Convolution is a specialized type of linear operation used for feature extraction, where a small array of numbers, called a kernel, is applied across the input, which is an array of numbers, called a tensor. An element-wise product between each element of the kernel and the input tensor is calculated at each location of the tensor and summed to obtain the output value in the corresponding position of the output tensor, called a

feature map (Fig. 3a–c). This procedure is repeated applying multiple kernels to form an arbitrary number of feature maps, which represent different characteristics of the input tensors; different kernels can, thus, be considered as different feature extractors (Fig. 3d). Two key hyperparameters that define the convolution operation are size and number of kernels. The former is typically 3 × 3, but sometimes 5 × 5 or 7 × 7. The latter is arbitrary, and determines the depth of output feature maps.

The convolution operation described above does not allow the center of each kernel to overlap the outermost element of the input tensor, and reduces the height and width of the output feature map compared to the input tensor. Padding, typically zero padding, is a technique to address this issue, where rows and columns of zeros are added on each side of the input tensor, so as to fit the center of a kernel on the outermost element and keep the same in-plane dimension through the convolution operation (Fig. 4). Modern CNN architectures usually employ zero padding to retain in-plane dimensions in order to apply more layers. Without zero padding, each successive feature map would get smaller after the convolution operation.

The distance between two successive kernel positions is called a stride, which also defines the convolution operation. The common choice of a stride is 1; however, a stride larger than 1 is sometimes used in order to achieve downsampling of the feature maps. An alternative technique to perform downsampling is a pooling operation, as described below.

The key feature of a convolution operation is weight sharing: kernels are shared across all the image positions.
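The convolution operation, zero padding, and stride described above can be sketched in a few lines of NumPy. This is only an illustrative sketch and not code from the article: the function name `conv2d`, the 5 × 5 toy input, and the vertical edge kernel are assumptions chosen for the example, and the kernel is applied without flipping, i.e., as the element-wise product and sum described in the text.

```python
import numpy as np

def conv2d(image, kernel, padding=0, stride=1):
    """Apply one 2D kernel to a 2D input and return the feature map.

    Hypothetical helper for illustration: element-wise product of the kernel
    and each input patch, summed to one output value per kernel position.
    """
    if padding > 0:
        # Zero padding: add rows and columns of zeros on each side of the input.
        image = np.pad(image, padding, mode="constant", constant_values=0)
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The stride is the distance between two successive kernel positions.
            top, left = i * stride, j * stride
            patch = image[top:top + kh, left:left + kw]
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# Toy 5 x 5 input (assumed values) and a 3 x 3 vertical edge detector.
image = np.arange(25, dtype=float).reshape(5, 5)
vertical_edge = np.array([[1.0, 0.0, -1.0],
                          [1.0, 0.0, -1.0],
                          [1.0, 0.0, -1.0]])

no_pad = conv2d(image, vertical_edge)                        # output shrinks
same = conv2d(image, vertical_edge, padding=1)               # in-plane size kept
strided = conv2d(image, vertical_edge, padding=1, stride=2)  # downsampled

print(no_pad.shape, same.shape, strided.shape)
```

With a 3 × 3 kernel and no padding, the 5 × 5 input shrinks to 3 × 3; one ring of zero padding keeps it at 5 × 5, and a stride of 2 downsamples it again. The same kernel values are reused at every position, which is the weight sharing referred to in the text.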
Weight sharing creates the following characteristics of convolution operations: (1) making the local feature patterns extracted by kernels translation invariant, as kernels travel across all the image positions and detect learned local patterns, (2) learning spatial hierarchies of feature patterns by downsampling in conjunction with a pooling operation, resulting in capturing an increasingly larger field of view, and (3) increasing model efficiency by reducing the number of parameters to learn in comparison with fully connected neural networks.

As described later, the process of training a CNN model with regard to the convolution layer is to identify the kernels that work best for a given task based on a given training dataset. Kernels are the only parameters automatically learned during the training process in the convolution layer; on the other hand, the size of the kernels, number of kernels, padding, and stride are hyperparameters that need to be set before the training process starts (Table 1).

Fig. 3 a–c An example of convolution operation with a kernel size of 3 × 3, no padding, and a stride of 1. A kernel is applied across the input tensor, and an element-wise product between each element of the kernel and the input tensor is calculated at each location and summed to obtain the output value in the corresponding position of the output tensor, called a feature map. d Examples of how kernels in convolution layers extract features from an input tensor are shown. Multiple kernels work as different feature extractors, such as a horizontal edge detector (top), a vertical edge detector (middle), and an outline detector (bottom). Note that the left image is an input, those in the middle are kernels, and those on the right are output feature maps

Nonlinear activation function
The outputs of a linear operation such as convolution are then passed through a nonlinear activation function. Although smooth nonlinear functions, such as sigmoid or hyperbolic

tangent (tanh) function, were used previously because they are mathematical representations of a biological neuron behavior, the most common nonlinear activation function used presently is the rectified linear unit (ReLU), which simply computes the function: f(x) = max(0, x) (Fig. 5) [1, 3, 17–19].

Pooling layer
A pooling layer provides a typical downsampling operation which reduces the in-plane dimensionality of the feature maps in order to introduce a translation invariance to small shifts and distortions, and decrease the number of subsequent learnable parameters. It is of note that there is no learnable parameter in any of the pooling layers, whereas filter size, stride, and padding are hyperparameters in pooling operations, similar to convolution operations.

Max pooling
The most popular form of pooling operation is max pooling, which extracts patches from the input feature maps, outputs the maximum value in each patch, and discards all the other

values (Fig. 6). A max pooling with a filter of size 2 × 2 with a stride of 2 is commonly used in practice. This downsamples the in-plane dimension of feature maps by a factor of 2. Unlike height and width, the depth dimension of feature maps remains unchanged.

Fig. 4 A convolution operation with zero padding so as to retain in-plane dimensions. Note that an input dimension of 5 × 5 is kept in the output feature map. In this example, a kernel size and a stride are set as 3 × 3 and 1, respectively

Global average pooling
Another pooling operation worth noting is a global average pooling [20]. A global average pooling performs an extreme type of downsampling, where a feature map with size of height × width is downsampled into a 1 × 1 array by simply taking the average of all the elements in each feature map, whereas the depth of feature maps is retained. This operation is typically applied only once before the fully connected layers. The advantages of applying global average pooling are as follows: (1) reduces the number of learnable parameters and (2) enables the CNN to accept inputs of variable size.

Fully connected layer
The output feature maps of the final convolution or pooling layer are typically flattened, i.e., transformed into a one-dimensional (1D) array of numbers (or vector), and connected to one or more fully connected layers, also known as dense layers, in which every input is connected to every output by a learnable weight. Once the features extracted by the convolution layers and downsampled by the pooling layers are created, they are mapped by a subset of fully connected layers to the final outputs of the network, such as the probabilities for each class in classification tasks. The final fully connected layer typically has the same number of output nodes as the number of classes. Each fully connected layer is followed by a nonlinear function, such as ReLU, as described above.

Table 1 A list of parameters and hyperparameters in a convolutional neural network (CNN)
Layer | Parameters | Hyperparameters
Convolution layer | Kernels | Kernel size, number of kernels, stride, padding, activation function
Pooling layer | None | Pooling method, filter size, stride, padding
Fully connected layer | Weights | Number of weights, activation function
Others | | Model architecture, optimizer, learning rate, loss function, mini-batch size, epochs, regularization, weight initialization, dataset splitting
Note that a parameter is a variable that is automatically optimized during the training process and a hyperparameter is a variable that needs to be set beforehand

Last layer activation function
The activation function applied to the last fully connected layer is usually different from the others. An appropriate activation function needs to be selected according to each task. An activation function applied to the multiclass classification task is a softmax function which normalizes output real values from the last fully connected layer to target class probabilities, where each value ranges between 0 and 1 and all values sum to 1. Typical choices of the last

activation function for various types of tasks are summarized in Table 2.

Fig. 5 Activation functions commonly applied to neural networks: a rectified linear unit (ReLU), b sigmoid, and c hyperbolic tangent (tanh)

Training a network
Training a network is a process of finding kernels in convolution layers and weights in fully connected layers which minimize differences between output predictions and given ground truth labels on a training dataset. Backpropagation algorithm is the method commonly used for training neural networks where loss function and gradient descent optimization algorithm play essential roles. A model performance under particular kernels and weights is calculated by a loss function through forward propagation on a training dataset, and learnable parameters, namely kernels and weights, are updated according to the loss value through an optimization algorithm called backpropagation and gradient descent, among others (Fig. 1).

Loss function
A loss function, also referred to as a cost function, measures the compatibility between output predictions of the network through forward propagation and given ground truth labels. Commonly used loss function for multiclass classification is cross entropy, whereas mean squared error is typically applied to regression to continuous values. A type of loss function is one of the hyperparameters and needs to be determined according to the given tasks.

Gradient descent
Gradient descent is commonly used as an optimization algorithm that iteratively updates the learnable parameters, i.e., kernels and weights, of the network so as to minimize the loss. The gradient of the loss function provides us the direction in which the function has the steepest rate of increase, and each learnable parameter is updated in the negative direction of the gradient with an arbitrary step size determined based on a hyperparameter called learning rate (Fig. 7). The gradient is, mathematically, a partial derivative of the loss with respect to each learnable parameter, and a single update of a parameter is formulated as follows:

$$ w := w - \alpha \cdot \frac{\partial L}{\partial w} $$

where w stands for each learnable parameter, α stands for a learning rate, and L stands for a loss function. It is of note that, in practice, a learning rate is one of the most important hyperparameters to be set before the training starts. In practice, for reasons such as memory limitations, the gradients of the loss function with regard to the parameters are computed by using a subset of the training dataset called mini-batch, and applied to the parameter updates. This method is called mini-batch gradient descent, also frequently referred to as stochastic gradient descent (SGD), and a mini-batch size is also a hyperparameter. In addition, many improvements on the gradient descent algorithm have been proposed and widely used, such as SGD with momentum, RMSprop, and Adam [21–23], though the details of these algorithms are beyond the scope of this article.

Data and ground truth labels
Data and ground truth labels are the most important components in research applying deep learning or other machine learning methods. As a famous proverb originating in computer science notes: "Garbage in, garbage out." Careful collection of data and ground truth labels with which to train and test a model is mandatory for a successful deep learning project, but obtaining high-quality labeled data can be costly and time-consuming. While there may be multiple medical image datasets open to the public [24, 25], special attention should be paid in these cases to the quality of the ground truth labels.

Available data are typically split into three sets: a training, a validation, and a test set (Fig. 8), though there are some variants, such as cross validation.
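As a rough illustration of how a three-way data split, a mean squared error loss, and mini-batch gradient descent fit together, the following NumPy sketch fits a single weight and bias to synthetic data. It is not taken from the article: the synthetic data, the 70/15/15 split, the learning rate, the batch size, and all variable names are assumptions made for the example, and a linear model stands in for a CNN to keep the gradients easy to read.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (assumed for illustration): y = 3x + 2 plus noise.
x = rng.uniform(-1.0, 1.0, size=200)
y = 3.0 * x + 2.0 + rng.normal(0.0, 0.05, size=200)

# Shuffle once, then split into training, validation, and test sets.
order = rng.permutation(len(x))
train_idx, val_idx, test_idx = order[:140], order[140:170], order[170:]

def mse(w, b, idx):
    """Mean squared error of the linear model on the samples in `idx`."""
    pred = w * x[idx] + b
    return float(np.mean((pred - y[idx]) ** 2))

w, b = 0.0, 0.0        # learnable parameters
learning_rate = 0.1    # hyperparameter (alpha)
batch_size = 20        # hyperparameter (mini-batch size)
initial_val_loss = mse(w, b, val_idx)

for epoch in range(200):
    shuffled = rng.permutation(train_idx)
    for start in range(0, len(shuffled), batch_size):
        batch = shuffled[start:start + batch_size]
        error = w * x[batch] + b - y[batch]
        # Partial derivatives of the mini-batch loss with respect to w and b.
        grad_w = 2.0 * np.mean(error * x[batch])
        grad_b = 2.0 * np.mean(error)
        # Update each parameter in the negative direction of its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

final_val_loss = mse(w, b, val_idx)  # used while tuning hyperparameters
test_loss = mse(w, b, test_idx)      # ideally evaluated once, at the very end
print(w, b, final_val_loss, test_loss)
```

Only the training indices are used to compute gradients; the validation loss is what one would watch while adjusting `learning_rate` or `batch_size`, and the test loss is left for a final check.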
A training set is used to train a network, where loss values are calculated via forward propagation and learnable parameters are updated via backpropagation. A validation set is used to evaluate the model during the training

process, fine-tune hyperparameters, and perform model selection. A test set is ideally used only once at the very end of the project in order to evaluate the performance of the final model that was fine-tuned and selected on the training process with training and validation sets.

Fig. 6 a An example of max pooling operation with a filter size of 2 × 2, no padding, and a stride of 2, which extracts 2 × 2 patches from the input tensors, outputs the maximum value in each patch, and discards all the other values, resulting in downsampling the in-plane dimension of an input tensor by a factor of 2. b Examples of the max pooling operation on the same images in Fig. 3b. Note that images in the upper row are downsampled by a factor of 2, from 26 × 26 to 13 × 13

Table 2 A list of commonly applied last layer activation functions for various tasks
Task | Last layer activation function
Binary classification | Sigmoid
Multiclass single-class classification | Softmax
Multiclass multiclass classification | Sigmoid
Regression to continuous values | Identity

Separate validation and test sets are needed because training a model always involves fine-tuning its hyperparameters and performing model selection. As this process is performed based on the performance on the validation set, some information about this validation set leaks into the model itself, i.e.,

overfitting to the validation set, even though the model is never directly trained on it for the learnable parameters. For that reason, it is guaranteed that the model with fine-tuned hyperparameters on the validation set will perform well on this same validation set. Therefore, a completely unseen dataset, i.e., a separate test set, is necessary for the appropriate evaluation of the model performance, as what we care about is the model performance on never-before-seen data, i.e., generalizability.

It is worthy of [...] models during the training process. On the other hand, in medicine, "validation" usually stands for the process of verifying the performance of a prediction model, which is analogous to the term "test" in machine learning. In order to avoid this confusion, the word "development set" is sometimes used as a substitute for "validation set".

Fig. 7 Gradient descent is an optimization algorithm that iteratively updates the learnable parameters so as to minimize the loss, which measures the distance between an output prediction and a ground truth label. The gradient of the loss function provides the direction in which the function has the steepest rate of increase, and all parameters are updated in the negative direction of the gradient with a step size determined based on a learning rate

Overfitting

