Insights into Imaging (2018) 9:611–629

REVIEW

Convolutional neural networks: an overview and application in radiology

Rikiya Yamashita 1,2 & Mizuho Nishio 1,3 & Richard Kinh Gian Do 2 & Kaori Togashi 1

Received: 3 March 2018 / Revised: 24 April 2018 / Accepted: 28 May 2018 / Published online: 22 June 2018
© The Author(s) 2018

Abstract
Convolutional neural network (CNN), a class of artificial neural networks that has become dominant in various computer vision tasks, is attracting interest across a variety of domains, including radiology. CNN is designed to automatically and adaptively learn spatial hierarchies of features through backpropagation by using multiple building blocks, such as convolution layers, pooling layers, and fully connected layers. This review article offers a perspective on the basic concepts of CNN and its application to various radiological tasks, and discusses its challenges and future directions in the field of radiology. Two challenges in applying CNN to radiological tasks, small dataset and overfitting, will also be covered in this article, as well as techniques to minimize them. Being familiar with the concepts and advantages, as well as limitations, of CNN is essential to leverage its potential in diagnostic radiology, with the goal of augmenting the performance of radiologists and improving patient care.

Key Points
• Convolutional neural network is a class of deep learning methods which has become dominant in various computer vision tasks and is attracting interest across a variety of domains, including radiology.
• Convolutional neural network is composed of multiple building blocks, such as convolution layers, pooling layers, and fully connected layers, and is designed to automatically and adaptively learn spatial hierarchies of features through a backpropagation algorithm.
• Familiarity with the concepts and advantages, as well as limitations, of convolutional neural network is essential to leverage its potential to improve radiologist performance and, eventually, patient care.

Keywords Machine learning · Deep learning · Convolutional neural network · Medical imaging

* Rikiya Yamashita
rickdom2610@gmail.com

1 Department of Diagnostic Imaging and Nuclear Medicine, Kyoto University Graduate School of Medicine, 54 Kawahara-cho, Shogoin, Sakyo-ku, Kyoto 606-8507, Japan
2 Department of Radiology, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA
3 Preemptive Medicine and Lifestyle Disease Research Center, Kyoto University Hospital, 53 Kawahara-cho, Shogoin, Sakyo-ku, Kyoto 606-8507, Japan

Abbreviations
3D Three-dimensional
CAD Computer-aided diagnosis
CADe Computer-aided detection
CAM Class activation map
CNN Convolutional neural network
CT Computed tomography
FBP Filtered backprojection
GAN Generative adversarial network
GPU Graphical processing unit
IEEE The Institute of Electrical and Electronics Engineers
ILSVRC ImageNet Large Scale Visual Recognition Competition
ISBI IEEE International Symposium on Biomedical Imaging
LIDC-IDRI Lung Image Database Consortium and Image Database Resource Initiative

MRI Magnetic resonance imaging
PET Positron emission tomography
ReLU Rectified linear unit
RI Radio isotope
RGB Red, green, and blue
SGD Stochastic gradient descent

Introduction
A tremendous interest in deep learning has emerged in recent years [1]. The most established algorithm among various deep learning models is convolutional neural network (CNN), a class of artificial neural networks that has been a dominant method in computer vision tasks since the astonishing results were shared on the object recognition competition known as the ImageNet Large Scale Visual Recognition Competition (ILSVRC) in 2012 [2, 3]. Medical research is no exception, as CNN has achieved expert-level performances in various fields. Gulshan et al. [4], Esteva et al. [5], and Ehteshami Bejnordi et al. [6] demonstrated the potential of deep learning for diabetic retinopathy screening, skin lesion classification, and lymph node metastasis detection, respectively. Needless to say, there has been a surge of interest in the potential of CNN among radiology researchers, and several studies have already been published in areas such as lesion detection [7], classification [8], segmentation [9], image reconstruction [10, 11], and natural language processing [12]. Familiarity with this state-of-the-art methodology would help not only researchers who apply CNN to their tasks in radiology and medical imaging, but also clinical radiologists, as deep learning may influence their practice in the near future. This article focuses on the basic concepts of CNN and their application to various radiology tasks, and discusses its challenges and future directions. Other deep learning models, such as recurrent neural networks for sequence models, are beyond the scope of this article.

Terminology
The following terms are consistently employed throughout this article so as to avoid confusion. A "parameter" in this article stands for a variable that is automatically learned during the training process.
A "hyperparameter" refers to a variable that needs to be set before the training process starts. A "kernel" refers to the sets of learnable parameters applied in convolution operations. A "weight" is generally used interchangeably with "parameter"; however, we tried to employ this term when referring to a parameter outside of convolution layers, i.e., a parameter other than a kernel, for example in fully connected layers.

What is CNN: the big picture (Fig. 1)
CNN is a type of deep learning model for processing data that has a grid pattern, such as images, which is inspired by the organization of animal visual cortex [13, 14] and designed to automatically and adaptively learn spatial hierarchies of features, from low- to high-level patterns. CNN is a mathematical construct that is typically composed of three types of layers (or building blocks): convolution, pooling, and fully connected layers. The first two, convolution and pooling layers, perform feature extraction, whereas the third, a fully connected layer, maps the extracted features into final output, such as classification. A convolution layer plays a key role in CNN; it is composed of a stack of mathematical operations, such as convolution, a specialized type of linear operation. In digital images, pixel values are stored in a two-dimensional (2D) grid, i.e., an array of numbers (Fig. 2), and a small grid of parameters called kernel, an optimizable feature extractor, is applied at each image position, which makes CNNs highly efficient for image processing, since a feature may occur anywhere in the image. As one layer feeds its output into the next layer, extracted features can hierarchically and progressively become more complex.
The process of optimizing parameters such as kernels is called training, which is performed so as to minimize the difference between outputs and ground truth labels through an optimization algorithm called backpropagation and gradient descent, among others.

How is CNN different from other methods employed in radiomics?
Most recent radiomics studies use hand-crafted feature extraction techniques, such as texture analysis, followed by conventional machine learning classifiers, such as random forests and support vector machines [15, 16]. There are several differences to note between such methods and CNN. First, CNN does not require hand-crafted feature extraction. Second, CNN architectures do not necessarily require segmentation of tumors or organs by human experts. Third, CNN is far more data hungry because it has millions of learnable parameters to estimate and is, thus, more computationally expensive, so that graphical processing units (GPUs) are required for model training.

Building blocks of CNN architecture
The CNN architecture includes several building blocks, such as convolution layers, pooling layers, and fully connected layers. A typical architecture consists of repetitions of a stack of several convolution layers and a pooling layer, followed by
one or more fully connected layers. The step where input data are transformed into output through these layers is called forward propagation (Fig. 1). Although convolution and pooling operations described in this section are for 2D-CNN, similar operations can also be performed for three-dimensional (3D)-CNN.

Fig. 1 An overview of a convolutional neural network (CNN) architecture and the training process. A CNN is composed of a stacking of several building blocks: convolution layers, pooling layers (e.g., max pooling), and fully connected (FC) layers. A model's performance under particular kernels and weights is calculated with a loss function through forward propagation on a training dataset, and learnable parameters, i.e., kernels and weights, are updated according to the loss value through backpropagation with gradient descent optimization algorithm. ReLU, rectified linear unit

Fig. 2 A computer sees an image as an array of numbers. The matrix on the right contains numbers between 0 and 255, each of which corresponds to the pixel brightness in the left image. Both are overlaid in the middle image. The source image was downloaded via http://yann.lecun.com/exdb/mnist

Convolution layer
A convolution layer is a fundamental component of the CNN architecture that performs feature extraction, which typically consists of a combination of linear and nonlinear operations, i.e., convolution operation and activation function.

Convolution
Convolution is a specialized type of linear operation used for feature extraction, where a small array of numbers, called a kernel, is applied across the input, which is an array of numbers, called a tensor. An element-wise product between each element of the kernel and the input tensor is calculated at each location of the tensor and summed to obtain the output value in the corresponding position of the output tensor, called a
feature map (Fig. 3a–c). This procedure is repeated applying multiple kernels to form an arbitrary number of feature maps, which represent different characteristics of the input tensors; different kernels can, thus, be considered as different feature extractors (Fig. 3d). Two key hyperparameters that define the convolution operation are size and number of kernels. The former is typically 3 × 3, but sometimes 5 × 5 or 7 × 7. The latter is arbitrary, and determines the depth of output feature maps.

The convolution operation described above does not allow the center of each kernel to overlap the outermost element of the input tensor, and reduces the height and width of the output feature map compared to the input tensor. Padding, typically zero padding, is a technique to address this issue, where rows and columns of zeros are added on each side of the input tensor, so as to fit the center of a kernel on the outermost element and keep the same in-plane dimension through the convolution operation (Fig. 4). Modern CNN architectures usually employ zero padding to retain in-plane dimensions in order to apply more layers. Without zero padding, each successive feature map would get smaller after the convolution operation.

The distance between two successive kernel positions is called a stride, which also defines the convolution operation. The common choice of a stride is 1; however, a stride larger than 1 is sometimes used in order to achieve downsampling of the feature maps. An alternative technique to perform downsampling is a pooling operation, as described below.

The key feature of a convolution operation is weight sharing: kernels are shared across all the image positions.
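
The convolution operation, zero padding, and stride described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not how deep learning libraries implement convolution: the function name conv2d, the toy 5 × 5 input, and the hand-picked 3 × 3 kernel values are all hypothetical choices made for this example, and a real CNN would learn its kernel values during training. As in the description above, the kernel is applied without flipping.

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """Slide one kernel over a 2D input and return one feature map.

    Hypothetical helper for illustration only.
    """
    if padding > 0:
        # Zero padding: add rows and columns of zeros on each side.
        image = np.pad(image, padding, mode="constant", constant_values=0)
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[top:top + kh, left:left + kw]
            # Element-wise product with the kernel, then sum.
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# Toy 5 x 5 input whose values increase from left to right and top to bottom.
image = np.arange(25, dtype=float).reshape(5, 5)

# Hand-picked 3 x 3 kernel that responds to horizontal intensity changes.
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])

no_pad = conv2d(image, kernel)                        # output shrinks to 3 x 3
same = conv2d(image, kernel, padding=1)               # zero padding keeps 5 x 5
strided = conv2d(image, kernel, stride=2, padding=1)  # stride 2 downsamples

print(no_pad.shape, same.shape, strided.shape)
```

The same kernel values are reused at every position of the loop, which is the weight sharing discussed in this section; only the patch of the input changes from one output element to the next.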
Weight sharing creates the following characteristics of convolution operations: (1) letting the local feature patterns extracted by kernels be translation invariant, as kernels travel across all the image positions and detect learned local patterns, (2) learning spatial hierarchies of feature patterns by downsampling in conjunction with a pooling operation, resulting in capturing an increasingly larger field of view, and (3) increasing model efficiency by reducing the number of parameters to learn in comparison with fully connected neural networks.

As described later, the process of training a CNN model with regard to the convolution layer is to identify the kernels that work best for a given task based on a given training dataset. Kernels are the only parameters automatically learned during the training process in the convolution layer; on the other hand, the size of the kernels, number of kernels, padding, and stride are hyperparameters that need to be set before the training process starts (Table 1).

Fig. 3 a–c An example of convolution operation with a kernel size of 3 × 3, no padding, and a stride of 1. A kernel is applied across the input tensor, and an element-wise product between each element of the kernel and the input tensor is calculated at each location and summed to obtain the output value in the corresponding position of the output tensor, called a feature map. d Examples of how kernels in convolution layers extract features from an input tensor are shown. Multiple kernels work as different feature extractors, such as a horizontal edge detector (top), a vertical edge detector (middle), and an outline detector (bottom). Note that the left image is an input, those in the middle are kernels, and those in the right are output feature maps

Nonlinear activation function
The outputs of a linear operation such as convolution are then passed through a nonlinear activation function. Although smooth nonlinear functions, such as sigmoid or hyperbolic tangent (tanh) function, were used previously because they are mathematical representations of a biological neuron behavior, the most common nonlinear activation function used presently is the rectified linear unit (ReLU), which simply computes the function: f(x) = max(0, x) (Fig. 5) [1, 3, 17–19].

Pooling layer
A pooling layer provides a typical downsampling operation which reduces the in-plane dimensionality of the feature maps in order to introduce a translation invariance to small shifts and distortions, and decrease the number of subsequent learnable parameters. It is of note that there is no learnable parameter in any of the pooling layers, whereas filter size, stride, and padding are hyperparameters in pooling operations, similar to convolution operations.

Max pooling
The most popular form of pooling operation is max pooling, which extracts patches from the input feature maps, outputs the maximum value in each patch, and discards all the other
values (Fig. 6). A max pooling with a filter of size 2 × 2 with a stride of 2 is commonly used in practice. This downsamples the in-plane dimension of feature maps by a factor of 2. Unlike height and width, the depth dimension of feature maps remains unchanged.

Fig. 4 A convolution operation with zero padding so as to retain in-plane dimensions. Note that an input dimension of 5 × 5 is kept in the output feature map. In this example, a kernel size and a stride are set as 3 × 3 and 1, respectively

Global average pooling
Another pooling operation worth noting is a global average pooling [20]. A global average pooling performs an extreme type of downsampling, where a feature map with size of height × width is downsampled into a 1 × 1 array by simply taking the average of all the elements in each feature map, whereas the depth of feature maps is retained. This operation is typically applied only once before the fully connected layers. The advantages of applying global average pooling are as follows: (1) reduces the number of learnable parameters and (2) enables the CNN to accept inputs of variable size.

Fully connected layer
The output feature maps of the final convolution or pooling layer are typically flattened, i.e., transformed into a one-dimensional (1D) array of numbers (or vector), and connected to one or more fully connected layers, also known as dense layers, in which every input is connected to every output by a learnable weight. Once the features extracted by the convolution layers and downsampled by the pooling layers are created, they are mapped by a subset of fully connected layers to the final outputs of the network, such as the probabilities for each class in classification tasks. The final fully connected layer typically has the same number of output nodes as the number of classes. Each fully connected layer is followed by a nonlinear function, such as ReLU, as described above.

Table 1 A list of parameters and hyperparameters in a convolutional neural network (CNN)
Layer | Parameters | Hyperparameters
Convolution layer | Kernels | Kernel size, number of kernels, stride, padding, activation function
Pooling layer | None | Pooling method, filter size, stride, padding
Fully connected layer | Weights | Number of weights, activation function
Others | | Model architecture, optimizer, learning rate, loss function, mini-batch size, epochs, regularization, weight initialization, dataset splitting
Note that a parameter is a variable that is automatically optimized during the training process and a hyperparameter is a variable that needs to be set beforehand

Last layer activation function
The activation function applied to the last fully connected layer is usually different from the others. An appropriate activation function needs to be selected according to each task. An activation function applied to the multiclass classification task is a softmax function which normalizes output real values from the last fully connected layer to target class probabilities, where each value ranges between 0 and 1 and all values sum to 1. Typical choices of the last
activation function for various types of tasks are summarized in Table 2.

Fig. 5 Activation functions commonly applied to neural networks: a rectified linear unit (ReLU), b sigmoid, and c hyperbolic tangent (tanh)

Training a network
Training a network is a process of finding kernels in convolution layers and weights in fully connected layers which minimize differences between output predictions and given ground truth labels on a training dataset. Backpropagation algorithm is the method commonly used for training neural networks where loss function and gradient descent optimization algorithm play essential roles. A model performance under particular kernels and weights is calculated by a loss function through forward propagation on a training dataset, and learnable parameters, namely kernels and weights, are updated according to the loss value through an optimization algorithm called backpropagation and gradient descent, among others (Fig. 1).

Loss function
A loss function, also referred to as a cost function, measures the compatibility between output predictions of the network through forward propagation and given ground truth labels. Commonly used loss function for multiclass classification is cross entropy, whereas mean squared error is typically applied to regression to continuous values. A type of loss function is one of the hyperparameters and needs to be determined according to the given tasks.

Gradient descent
Gradient descent is commonly used as an optimization algorithm that iteratively updates the learnable parameters, i.e., kernels and weights, of the network so as to minimize the loss. The gradient of the loss function provides us the direction in which the function has the steepest rate of increase, and each learnable parameter is updated in the negative direction of the gradient with an arbitrary step size determined based on a hyperparameter called learning rate (Fig. 7). The gradient is, mathematically, a partial derivative of the loss with respect to each learnable parameter, and a single update of a parameter is formulated as follows:

$$w := w - \alpha \cdot \frac{\partial L}{\partial w}$$

where w stands for each learnable parameter, α stands for a learning rate, and L stands for a loss function. It is of note that a learning rate is one of the most important hyperparameters to be set before the training starts. In practice, for reasons such as memory limitations, the gradients of the loss function with regard to the parameters are computed by using a subset of the training dataset called mini-batch, and applied to the parameter updates. This method is called mini-batch gradient descent, also frequently referred to as stochastic gradient descent (SGD), and a mini-batch size is also a hyperparameter. In addition, many improvements on the gradient descent algorithm have been proposed and widely used, such as SGD with momentum, RMSprop, and Adam [21–23], though the details of these algorithms are beyond the scope of this article.

Data and ground truth labels
Data and ground truth labels are the most important components in research applying deep learning or other machine learning methods. As a famous proverb originating in computer science notes: "Garbage in, garbage out." Careful collection of data and ground truth labels with which to train and test a model is mandatory for a successful deep learning project, but obtaining high-quality labeled data can be costly and time-consuming. While there may be multiple medical image datasets open to the public [24, 25], special attention should be paid in these cases to the quality of the ground truth labels.

Available data are typically split into three sets: a training, a validation, and a test set (Fig. 8), though there are some variants, such as cross validation.
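
The three-way split and the mini-batch gradient descent update described above can be sketched on a toy problem. This is only an illustration under stated assumptions: the synthetic data, the 70/15/15 split ratio, the one-variable linear model standing in for a network, and the values of the learning rate and mini-batch size are all hypothetical choices made here, not settings taken from this article. Each parameter is moved against its gradient of a mean squared error loss by a step scaled by the learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (hypothetical): y = 3x + 1 plus small noise.
x = rng.uniform(-1.0, 1.0, size=200)
y = 3.0 * x + 1.0 + rng.normal(0.0, 0.05, size=200)

# Shuffle once, then split 70/15/15 into training, validation, and test sets.
order = rng.permutation(len(x))
train_idx, val_idx, test_idx = order[:140], order[140:170], order[170:]

def mse(w, b, idx):
    """Mean squared error loss on the samples selected by idx."""
    return float(np.mean((w * x[idx] + b - y[idx]) ** 2))

w, b = 0.0, 0.0      # learnable parameters
alpha = 0.1          # learning rate (hyperparameter)
batch_size = 20      # mini-batch size (hyperparameter)
initial_val_loss = mse(w, b, val_idx)

for epoch in range(100):
    shuffled = rng.permutation(train_idx)
    for start in range(0, len(shuffled), batch_size):
        batch = shuffled[start:start + batch_size]
        error = w * x[batch] + b - y[batch]
        grad_w = 2.0 * np.mean(error * x[batch])  # partial derivative dL/dw
        grad_b = 2.0 * np.mean(error)             # partial derivative dL/db
        # Step in the negative gradient direction, scaled by the learning rate.
        w -= alpha * grad_w
        b -= alpha * grad_b

# The validation loss would guide hyperparameter choices during development;
# the test loss is computed once, at the very end.
final_val_loss = mse(w, b, val_idx)
test_loss = mse(w, b, test_idx)
print(w, b, final_val_loss, test_loss)
```

In a CNN the gradients are obtained by backpropagation through all layers rather than by the closed-form expressions used here, but the update applied to each kernel element or weight has the same form.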
A training set is used to train a network, where loss values are calculated via forward propagation and learnable parameters are updated via backpropagation. A validation set is used to evaluate the model during the training
process, fine-tune hyperparameters, and perform model selection. A test set is ideally used only once at the very end of the project in order to evaluate the performance of the final model that was fine-tuned and selected during the training process with training and validation sets.

Separate validation and test sets are needed because training a model always involves fine-tuning its hyperparameters and performing model selection. As this process is performed based on the performance on the validation set, some information about this validation set leaks into the model itself, i.e., overfitting to the validation set, even though the model is never directly trained on it for the learnable parameters. For that reason, it is guaranteed that the model with fine-tuned hyperparameters on the validation set will perform well on this same validation set. Therefore, a completely unseen dataset, i.e., a separate test set, is necessary for the appropriate evaluation of the model performance, as what we care about is the model performance on never-before-seen data, i.e., generalizability.

It is worthy of … models during the training process. On the other hand, in medicine, "validation" usually stands for the process of verifying the performance of a prediction model, which is analogous to the term "test" in machine learning. In order to avoid this confusion, the word "development set" is sometimes used as a substitute for "validation set".

Fig. 6 a An example of max pooling operation with a filter size of 2 × 2, no padding, and a stride of 2, which extracts 2 × 2 patches from the input tensors, outputs the maximum value in each patch, and discards all the other values, resulting in downsampling the in-plane dimension of an input tensor by a factor of 2. b Examples of the max pooling operation on the same images in Fig. 3b. Note that images in the upper row are downsampled by a factor of 2, from 26 × 26 to 13 × 13

Table 2 A list of commonly applied last layer activation functions for various tasks
Task | Last layer activation function
Binary classification | Sigmoid
Multiclass single-class classification | Softmax
Multiclass multiclass classification | Sigmoid
Regression to continuous values | Identity

Fig. 7 Gradient descent is an optimization algorithm that iteratively updates the learnable parameters so as to minimize the loss, which measures the distance between an output prediction and a ground truth label. The gradient of the loss function provides the direction in which the function has the steepest rate of increase, and all parameters are updated in the negative direction of the gradient with a step size determined based on a learning rate

Overfitting

operations described in this section are for 2D-CNN, similar operations can also be performed for three-dimensional (3D)-CNN. Convolution layer A convolution layer is a fundamental component of the CNN architecture that performs feature extraction, which typically Convolution Convolution is a specialized type of linear operation used for
