Nonlinear Activation Functions in CNN Based on Fluid Dynamics and Its Applications

Kazuhiko Kakuda(1,*), Tomoyuki Enomoto(1) and Shinichiro Miura(2)

CMES, vol. 118, no. 1, pp. 1-14, 2019. Copyright 2019 Tech Science Press.

Abstract: Nonlinear activation functions for the deep CNN (Convolutional Neural Network) based on fluid dynamics are presented. We propose two types of activation functions by applying the so-called parametric softsign to the negative region. We use the well-known TensorFlow as the deep learning framework. The CNN architecture consists of three convolutional layers with max-pooling and one fully-connected softmax layer. The CNN approaches are applied to three benchmark datasets, namely MNIST, CIFAR-10, and CIFAR-100. Numerical results demonstrate the workability and the validity of the present approach through comparison with other numerical performances.

Keywords: Deep learning, CNN, activation function, fluid dynamics, MNIST, CIFAR-10, CIFAR-100.

(1) Department of Mathematical Information Engineering, College of Industrial Technology, Nihon University, Chiba 275-8575, Japan.
(2) Department of Liberal Arts and Basic Sciences, College of Industrial Technology, Nihon University, Chiba 275-8576, Japan.
(*) Corresponding Author: Kazuhiko Kakuda. Email: kakuda.kazuhiko@nihon-u.ac.jp.

1 Introduction

The state of the art in deep learning is nowadays indispensable in engineering and science fields such as robotics, automotive engineering, web-informatics, bio-informatics, and so on. Several neural network families are available in the deep learning framework [LeCun, Bengio and Hinton (2015)], i.e., CNNs (Convolutional Neural Networks) to recognize object images [Fukushima and Miyake (1982); LeCun, Bottou, Bengio et al. (1998); Krizhevsky, Sutskever and Hinton (2012)], RNNs (Recurrent Neural Networks) to process time-series data [Rumelhart, Hinton and Williams (1986)], and so forth.

The appropriate choice of activation function for a neural network is a key factor in deep learning simulations. Various activation functions have heretofore been proposed in the CNN/RNN frameworks. The standard activation function is the rectified linear unit (ReLU), introduced first by Hahnloser et al. [Hahnloser, Sarpeshkar, Mahowald et al. (2000)] in the theory of symmetric networks with rectification (it was also called a rectification nonlinearity or ramp function [Cho and Saul (2009)]). Nair et al. [Nair and Hinton (2010)] successfully applied ReLU activation functions based on restricted Boltzmann machines to deep neural networks. The ReLU activation function has been widely used for visual recognition tasks [Glorot, Bordes and Bengio (2011); Krizhevsky, Sutskever and Hinton (2012); Srivastava, Hinton, Krizhevsky et al. (2014); LeCun, Bengio and Hinton (2015); Kuo (2016); Agarap (2018)]. It leads to better recognition performance than the conventional sigmoid/tanh units, which suffer from the vanishing gradient problem, and it is parameter-free; however, it has zero gradient in the negative part.

In order to provide meaningful negative values, several activation functions have been presented, such as the leaky rectified linear unit (LReLU) [Maas, Hannun and Ng (2013)], the parametric rectified linear unit (PReLU) [He, Zhang, Ren et al. (2015)], the exponential linear unit (ELU) [Clevert, Unterthiner and Hochreiter (2016)], and so forth. The LReLU slightly improves on ReLU by replacing the negative part with a linear function of small constant gradient. The PReLU generalizes the LReLU by adaptively learning the parameters introduced in its negative part; it significantly improved learning performance on the large image dataset called ImageNet. Clevert et al. [Clevert, Unterthiner and Hochreiter (2016)] proposed the ELU activation function and showed its applicability and validity on various benchmark datasets. As another approach, Goodfellow et al. [Goodfellow, Warde-Farley, Mirza et al. (2013)] proposed an activation function called maxout that has benefits both for optimization and for model averaging with dropout [Hinton, Srivastava, Krizhevsky et al. (2012)].

In our previous work, we presented a characteristic function (i.e., activation function) derived as an optimum function for the advection-diffusion system in the fluid dynamics framework [Kakuda (2002)]. The purpose of this paper is to propose activation functions based on this fluid dynamics concept. We present two types of activation functions by applying the so-called parametric softsign [Glorot and Bengio (2010)] to the negative part of ReLU. Using the well-known TensorFlow [Abadi, Agarwal, Barham et al. (2015)] as the deep learning framework, we adopt a CNN architecture that consists of three convolutional layers with max-pooling and one fully-connected softmax layer. The workability and the validity of the present approach are demonstrated on three benchmark datasets, namely MNIST [LeCun, Bottou, Bengio et al. (1998)], CIFAR-10 and CIFAR-100 [Krizhevsky and Hinton (2009)], through comparison with other numerical performances.

2 Construction of nonlinear activation functions

2.1 Neural network model

In the field of neural networks, the input-output (I/O) relationship used in back-propagation is represented by the input $U_j$, the output $V_j$ and the characteristic function h (i.e., activation function) as follows:

$V_j = h(U_j)$  (1)

$U_j = \sum_{i=1}^{n} S_{ij} w_{ij} + I_j - T_j$  (2)

where $S_{ij}$ are the input values to the j-th unit as shown in Fig. 1, $w_{ij}$ are the connection weights, $I_j$ is the bias value, and $T_j$ denotes the threshold.
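As a concrete illustration of Eqs. (1)-(2), the following minimal NumPy sketch evaluates a single unit. The helper name, the array shapes and the example values are illustrative assumptions and are not part of the original formulation.

```python
import numpy as np

def neuron_output(S_j, w_j, I_j, T_j, h):
    """Eq. (2): U_j = sum_i S_ij * w_ij + I_j - T_j, followed by Eq. (1): V_j = h(U_j)."""
    U_j = np.dot(S_j, w_j) + I_j - T_j
    return h(U_j)

# Example: one unit with three inputs and tanh as a placeholder characteristic function h
S_j = np.array([0.2, 0.5, -0.1])   # input values S_ij feeding unit j
w_j = np.array([0.7, -0.3, 0.9])   # connection weights w_ij
V_j = neuron_output(S_j, w_j, I_j=0.1, T_j=0.0, h=np.tanh)
print(V_j)
```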

The sigmoid function (see Fig. 2(a)) has mainly been used as the following continuous function:

$h(v) = \frac{1}{2}\left\{ 1 + \tanh\left( \frac{v}{2k} \right) \right\}$  (3)

where k is an ad hoc parameter.

Figure 1: Neuron model

2.2 Nonlinear activation functions based on fluid dynamics

Heretofore, we have presented the following activation function as an optimum function derived from the steady advection-diffusion system in the fluid dynamics framework [Kakuda (2002)] (see Eq. (25) in Subsection 2.3):

$h(v) = \frac{1}{2}\left\{ 1 + \hat{g}(v) \right\}$  (4)

$\hat{g}(v) = \coth\gamma - \frac{1}{\gamma}$  (5)

where $\gamma = v/2k \neq 0$.

Mizukami [Mizukami (1985)] presented the following approximation function in place of Eq. (5), which involves a singularity at $\gamma = 0$:

$\tilde{g}(v) = 1 - \frac{1}{1 + |\gamma|}$  (6)

Therefore, we obtain the following functions by substituting Eq. (6) into Eq. (4) and considering the sign of v (see Fig. 2(b)):

$h(v) = \begin{cases} \dfrac{1}{2}\left( 2 - \dfrac{1}{1 + |\gamma|} \right) & (\gamma \ge 0) \\ \dfrac{1}{2}\left( \dfrac{1}{1 + |\gamma|} \right) & (\gamma < 0) \end{cases}$  (7)

At this stage, we adjust the functions h(v) so that $g(0) = 0$:

$g(v) = \hat{\sigma}\left\{ h(v) - h(0) \right\}, \qquad h(0) = \frac{1}{2}$  (8)

As a result, we obtain the following form by taking into account that $\hat{\sigma} = 2\kappa$:

$g(v) = \frac{\kappa v}{\kappa + |v|}$  (9)

in which $\kappa = 2k$. Eq. (9) represents the softsign function when $\kappa = 1$ [Glorot and Bengio (2010)]. The so-called parametric softsign is equivalent to the ReLU [Nair and Hinton (2010)] in the limits $\kappa \to \infty$ for $v \ge 0$ and $\kappa \to 0$ for $v < 0$.

Figure 2: Characteristic functions. (a) Sigmoid functions with k; (b) Characteristic functions of Eq. (7)

In order to avoid zero gradients in the negative part of v, we apply Eq. (9) to the negative region and propose two types of activation function involving a parameter a, as follows (see Fig. 3):

Rational-type activation function and its derivatives:

$g(v) = \begin{cases} v & (v \ge 0) \\ \dfrac{a v}{a + |v|} & (v < 0) \end{cases}$  (10)

$\frac{\partial g(v)}{\partial v} = \begin{cases} 1 & (v \ge 0) \\ \dfrac{a^2}{(a + |v|)^2} & (v < 0) \end{cases}, \qquad \frac{\partial g(v)}{\partial a} = \begin{cases} 0 & (v \ge 0) \\ \dfrac{v |v|}{(a + |v|)^2} & (v < 0) \end{cases}$  (11)

Exponential-type activation function and its derivatives:

$g(v) = \begin{cases} v & (v \ge 0) \\ \dfrac{e^{a} v}{e^{a} + |v|} & (v < 0) \end{cases}$  (12)

$\frac{\partial g(v)}{\partial v} = \begin{cases} 1 & (v \ge 0) \\ \dfrac{(e^{a})^2}{(e^{a} + |v|)^2} & (v < 0) \end{cases}, \qquad \frac{\partial g(v)}{\partial a} = \begin{cases} 0 & (v \ge 0) \\ \dfrac{e^{a} v |v|}{(e^{a} + |v|)^2} & (v < 0) \end{cases}$  (13)

The corresponding derivatives of the activation functions are shown in Fig. 4.
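As an illustration, the following is a minimal TensorFlow sketch of the rational-type unit of Eq. (10) and the exponential-type unit of Eq. (12). The function names and test values are illustrative assumptions, and the derivatives of Eqs. (11) and (13) are left to automatic differentiation rather than coded by hand.

```python
import tensorflow as tf

def rational_activation(v, a):
    # Eq. (10): identity for v >= 0, parametric softsign a*v/(a + |v|) for v < 0
    return tf.where(v >= 0.0, v, a * v / (a + tf.abs(v)))

def exponential_activation(v, a):
    # Eq. (12): identity for v >= 0, e^a * v / (e^a + |v|) for v < 0
    ea = tf.exp(a)
    return tf.where(v >= 0.0, v, ea * v / (ea + tf.abs(v)))

# Gradients corresponding to Eqs. (11) and (13) via automatic differentiation
v = tf.linspace(-5.0, 5.0, 11)
a = tf.constant(0.25)
with tf.GradientTape() as tape:
    tape.watch([v, a])
    y = rational_activation(v, a)
dy_dv, dy_da = tape.gradient(y, [v, a])
print(dy_dv.numpy(), dy_da.numpy())
print(exponential_activation(v, a).numpy())
```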

Figure 3: Nonlinear activation functions. (a) Rational-type activation function; (b) Exponential-type activation function

Figure 4: Derivatives of activation functions. (a) Rational-type activation function; (b) Exponential-type activation function

2.3 Steady advection-diffusion equation

2.3.1 Problem statement

Let us briefly consider the one-dimensional advection-diffusion equation in the spatial coordinate x, given by

$f_{,x} = k \varphi_{,xx}$  (14)

with adequate boundary conditions, where $f = u\varphi$, and u and k are the given velocity and diffusivity, respectively.

2.3.2 Finite element formulation

In order to solve for the flux $f = u\varphi$ in a stable manner, we adopt the Petrov-Galerkin finite element formulation using an exponential weighting function [Kakuda and Tosaka (1992)]. On the other hand, the conventional Galerkin finite element formulation can be applied to solve Eq. (14) numerically.

First of all, we start with the following weighted integral expression in a subdomain $\Omega_i = [x_{i-1}, x_i]$ with respect to the weighting function $\tilde{w}$:

$\int_{\Omega_i} (f - u\varphi)\, \tilde{w}\, dx = 0$  (15)

The weighting function $\tilde{w}$ can be chosen as a general solution which satisfies

$u \tilde{w} - \Delta x_i\, \sigma(u)\, \tilde{w}_{,x} = 0$  (16)

where $\Delta x_i = x_i - x_{i-1}$, and $\sigma(u)$ denotes a function described by Yee et al. [Yee, Warming and Harten (1985)], which is sometimes referred to as the coefficient of numerical viscosity. The solution of Eq. (16) is as follows:

$\tilde{w} = A e^{\hat{a} x}$  (17)

where A is a constant and $\hat{a} = u / (\Delta x_i\, \sigma(u))$.

By applying piecewise linear functions to the flux f and Ο†, we obtain the following integral form

$\int_{\Omega_i} w_\alpha N_\beta\, dx\, f_\beta - u \int_{\Omega_i} w_\alpha N_\beta\, dx\, \varphi_\beta = 0$  (18)

in which

$w_\alpha = e^{\hat{a}(x - x_\alpha)} \qquad (\alpha = 1, 2)$  (19)

Here, applying an element-wise mass lumping to the first term of the left-hand side of Eq. (18), and carrying out those integrals in Eq. (18) exactly, we obtain the following numerical fluxes $f_{i+1/2}$ and $f_{i-1/2}$ in the subdomains $\Omega_{i+1}$ and $\Omega_i$, respectively:

$f_{i+1/2} = f_i + \frac{u}{2}\left[ 1 - \left\{ \mathrm{sgn}(\gamma) \coth|\gamma| - \frac{1}{\gamma} \right\} \right] (\varphi_{i+1} - \varphi_i)$  (20)

$f_{i-1/2} = f_i - \frac{u}{2}\left[ 1 + \left\{ \mathrm{sgn}(\gamma) \coth|\gamma| - \frac{1}{\gamma} \right\} \right] (\varphi_i - \varphi_{i-1})$  (21)

where $\gamma = u / 2\sigma(u)$, and $\mathrm{sgn}(\gamma)$ denotes the signum function.

Let us next derive the Galerkin finite element model for Eq. (14). The weighted residual equation in $\Omega_i$ is given as follows:

$\int_{\Omega_i} (f_{,x} - k \varphi_{,xx})\, w\, dx = 0$  (22)

At this stage, we assume a uniform mesh $\Delta x_i = \Delta x$ for simplicity of the formulation. Taking into consideration the continuity of $\varphi_{,x}$ at nodal point i, we obtain the following discrete form

$f_{i+1/2} - f_{i-1/2} - \frac{k}{\Delta x}(\varphi_{i+1} - 2\varphi_i + \varphi_{i-1}) = 0$  (23)

Substituting Eq. (20) and Eq. (21) into Eq. (23) and after some manipulation, we obtain the following finite difference form

$\frac{u}{2\Delta x}(\varphi_{i+1} - \varphi_{i-1}) = (k + \tilde{k})\, \frac{\varphi_{i+1} - 2\varphi_i + \varphi_{i-1}}{\Delta x^2}$  (24)

where, for any velocity u,

$\tilde{k} = \frac{|u| \Delta x}{2}\left\{ \coth|\gamma| - \frac{1}{|\gamma|} \right\}$  (25)

Using the element Peclet number $P_e\ (= u\Delta x / 2k)$ as Ξ³, we reduce Eq. (24) to the following form

$\left\{ \mathrm{sgn}(P_e) - \coth|P_e| \right\} \varphi_{i+1} + 2 \coth|P_e|\, \varphi_i - \left\{ \mathrm{sgn}(P_e) + \coth|P_e| \right\} \varphi_{i-1} = 0$  (26)

This equation has the same structure as the SUPG scheme developed by Brooks et al. [Brooks and Hughes (1982)], and it leads to nodally exact solutions for all values of $P_e$ [Christie, Griffiths, Mitchell et al. (1976)].

3 CNN architecture

We adopt a similar approach to the PReLU [He, Zhang, Ren et al. (2015)], which can be trained using back-propagation and optimized simultaneously with the other layers. For the variables v and a in Eq. (10) through Eq. (13), we define $v_j$ and $a_j$ as the input and the coefficient, respectively, on the j-th channel. The momentum approach for updating $a_j$ is given as follows:

$\Delta a_j := \mu \Delta a_j + \varepsilon \frac{\partial E}{\partial a_j}$  (27)

$\frac{\partial E}{\partial a_j} = \sum_{v_j} \frac{\partial E}{\partial g(v_j)} \frac{\partial g(v_j)}{\partial a_j}$  (28)

where E represents the objective function, ΞΌ is the momentum to accelerate learning, and Ξ΅ is the learning rate. The parameters $a_j$ are obtained by back-propagation analysis.

Fig. 5 shows the CNN architecture consisting of three convolutional (conv) layers with max-pooling and one fully-connected (fc) softmax layer.

Figure 5: CNN architecture

4 Numerical experiments

In this section, we use the well-known TensorFlow [Abadi, Agarwal, Barham et al. (2015)] as the deep learning framework and present numerical performances obtained by applying the above-mentioned CNN approach to three typical datasets, namely MNIST [LeCun, Bottou, Bengio et al. (1998)], CIFAR-10 and CIFAR-100 [Krizhevsky and Hinton (2009)]. We utilize Adam [Kingma and Ba (2015)] as the learning algorithm for stochastic gradient-based optimization.
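To make the preceding two sections concrete, the following is a minimal TensorFlow/Keras sketch of such a network: a custom layer holding one trainable coefficient a_j per channel (the rational-type unit of Eq. (10), learned by back-propagation together with the other layers, as in Section 3) inside a three-conv-plus-softmax model compiled with Adam. The filter counts, kernel sizes, pooling placement and the initial value of a are not specified in the text and are assumptions made only for illustration.

```python
import tensorflow as tf

class RationalActivation(tf.keras.layers.Layer):
    """Rational-type unit of Eq. (10) with one trainable coefficient a_j per channel."""
    def __init__(self, a_init=0.25, **kwargs):
        super().__init__(**kwargs)
        self.a_init = a_init  # initial value of a is an assumption, not taken from the paper

    def build(self, input_shape):
        self.a = self.add_weight(name="a",
                                 shape=(input_shape[-1],),
                                 initializer=tf.keras.initializers.Constant(self.a_init),
                                 trainable=True)

    def call(self, v):
        # identity for v >= 0, parametric softsign a*v/(a + |v|) for v < 0
        return tf.where(v >= 0.0, v, self.a * v / (self.a + tf.abs(v)))

# Sketch of the architecture of Fig. 5: three conv layers with max-pooling and one
# fully-connected softmax layer; layer widths and kernel sizes are illustrative only.
def build_cnn(input_shape=(28, 28, 1), num_classes=10):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, 5, padding="same"), RationalActivation(),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 5, padding="same"), RationalActivation(),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(128, 5, padding="same"), RationalActivation(),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    # Adam optimizer with the learning rate used in the experiments (epsilon = 1e-3)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cnn()
model.summary()
```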

The model is trained for a fixed number of epochs on mini-batches of size 100 with the learning rate $\varepsilon = 10^{-3}$ and the momentum $\mu = 0$. The specification of the CPU and the GPU (using CUDA) is summarized in Tab. 1.

Table 1: A summary of the specification of CPU and GPU
CPU: Intel Core i7-8700K
  Cores: 12
  Base Clock: 4.2 GHz
  Cache Memory: 12 MB
GPU: NVIDIA GeForce GTX 1080
  Global Memory: 8 GB
  CUDA cores: 2560
  GPU Max Clock rate: 1607 MHz
  GPU Boost Clock rate: 1733 MHz
  Memory Clock rate: 10 Gbps
  Memory Bandwidth: 320 GB/s
  CUDA Driver: Version 9.0

4.1 MNIST

Let us first consider the MNIST dataset, which consists of 28 Γ— 28 pixel gray-scale handwritten digit images, with 50,000 images for training and 10,000 for testing.

Fig. 6 shows the behaviors of the training accuracy and loss (i.e., cross-entropy) obtained by using the various activation functions for the MNIST. The corresponding validation accuracy and loss behaviors are shown in Fig. 7. We can see from Fig. 6 and Fig. 7 that our approaches perform similarly to the ones using the other activation functions. Tab. 2 summarizes the transitions of the learned parameter a at each layer of the CNN architecture (see Fig. 5) for the MNIST. The validation accuracy rate and loss for the MNIST are given in Tab. 3. In this case, the quantitative agreement between our results and the other ones also appears satisfactory.

Table 2: Transitions of the learned parameter, a, for the MNIST

Figure 6: Training accuracy and loss behaviors for the MNIST. (a) Training accuracy; (b) Training loss

Figure 7: Validation accuracy and loss behaviors for the MNIST. (a) Validation accuracy; (b) Validation loss

Table 3: Accuracy rate and loss for the MNIST

4.2 CIFAR-10

As the second benchmark dataset, we consider the CIFAR-10, which consists of 32 Γ— 32 color images drawn from 10 classes, with 50,000 images for training and 10,000 for testing.

Fig. 8 shows the behaviors of the training accuracy and loss (i.e., cross-entropy) obtained by using the various activation functions for the CIFAR-10. The corresponding validation accuracy and loss behaviors are shown in Fig. 9. We can see from Fig. 8 and Fig. 9 that our approaches entirely outperform the ones using the other activation functions. Tab. 4 summarizes the transitions of the learned parameter a at each layer of the CNN architecture (see Fig. 5) for the CIFAR-10. The validation accuracy rate and loss for the CIFAR-10 are given in Tab. 5. For the accuracy rate on the CIFAR-10, we obtain the best result of 80.76% using the exponential-type activation function. On the other hand, our approaches outperform the other ones for the loss.

Table 4: Transitions of the learned parameter, a, for the CIFAR-10

Figure 8: Training accuracy and loss behaviors for the CIFAR-10. (a) Training accuracy; (b) Training loss

Figure 9: Validation accuracy and loss behaviors for the CIFAR-10. (a) Validation accuracy; (b) Validation loss

Table 5: Accuracy rate and loss for the CIFAR-10

4.3 CIFAR-100

As the third benchmark dataset, the CIFAR-100 has the same size and format as the CIFAR-10, but contains 100 classes grouped into 20 super-classes of five classes each.

Table 6: Transitions of the learned parameter, a, for the CIFAR-100

Figure 10: Training accuracy and loss behaviors for the CIFAR-100. (a) Training accuracy; (b) Training loss

Table 7: Accuracy rate and loss for the CIFAR-100

Figure 11: Validation accuracy and loss behaviors for the CIFAR-100. (a) Validation accuracy; (b) Validation loss

Fig. 10 shows the behaviors of the training accuracy and loss obtained by using the various activation functions for the CIFAR-100. The corresponding validation accuracy and loss behaviors are shown in Fig. 11. We can see from Fig. 10 and Fig. 11 that our approaches outperform the ones using the other activation functions. Tab. 6 summarizes the transitions of the learned parameter a at each layer of the CNN architecture for the CIFAR-100. The validation accuracy rate and loss for the CIFAR-100 are given in Tab. 7. For the accuracy rate on the CIFAR-100, we obtain the best result of 56.91% using the rational-type activation function. On the other hand, our approaches also outperform the other ones for the loss.

5 Conclusions

We have proposed new activation functions based on the steady advection-diffusion system in the fluid dynamics framework. In our formulation, two types of activation functions have been presented by applying the so-called parametric softsign to the negative part of ReLU. Using TensorFlow as the deep learning framework, we have utilized a CNN architecture that consists of three convolutional layers with max-pooling and one fully-connected softmax layer.

The performances of our approaches were evaluated on three benchmark datasets, namely MNIST, CIFAR-10 and CIFAR-100, through comparison with the ones using other activation functions. The learning performances demonstrated that our approaches recognize the object images somewhat more accurately, and with less loss (i.e., cross-entropy), than the other ones.

References

Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z. et al. (2015): TensorFlow: large-scale machine learning on heterogeneous systems. https://www.tensorflow.org/.

Agarap, A. F. M. (2018): Deep learning using rectified linear units (ReLU). arXiv:1803.08375v1.

Brooks, A.; Hughes, T. J. R. (1982): Streamline upwind/Petrov-Galerkin formulations for convection dominated flows with particular emphasis on the incompressible Navier-Stokes equations. Computer Methods in Applied Mechanics and Engineering, vol. 32, pp. 199-259.

Cho, Y.; Saul, L. K. (2009): Kernel methods for deep learning. Advances in Neural Information Processing Systems, vol. 22, pp. 342-350.

Christie, I.; Griffiths, D. F.; Mitchell, A. R.; Zienkiewicz, O. C. (1976): Finite element methods for second order differential equations with significant first derivatives. International Journal for Numerical Methods in Engineering, vol. 10, pp. 1389-1396.

Clevert, D. A.; Unterthiner, T.; Hochreiter, S. (2016): Fast and accurate deep network learning by exponential linear units (ELUs). International Conference on Learning Representations (ICLR), arXiv:1511.07289v5.

Fukushima, K.; Miyake, S. (1982): A new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recognition, vol. 15, pp. 455-469.

Glorot, X.; Bengio, Y. (2010): Understanding the difficulty of training deep feedforward neural networks. 13th International Conference on Artificial Intelligence and Statistics, pp. 249-256.

Glorot, X.; Bordes, A.; Bengio, Y. (2011): Deep sparse rectifier neural networks. 14th International Conference on Artificial Intelligence and Statistics, pp. 315-323.

Goodfellow, I. J.; Warde-Farley, D.; Mirza, M.; Courville, A.; Bengio, Y. (2013): Maxout networks. 30th International Conference on Machine Learning, pp. 1319-1327.

Hahnloser, R. H. R.; Sarpeshkar, R.; Mahowald, M. A.; Douglas, R. J.; Seung, H. S. (2000): Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature, vol. 405, pp. 947-951.

He, K.; Zhang, X.; Ren, S.; Sun, J. (2015): Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. IEEE International Conference on Computer Vision, arXiv:1502.01852v1.

Hinton, G. E.; Srivastava, N.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. R. (2012): Improving neural networks by preventing co-adaptation of feature detectors. Technical Report, arXiv:1207.0580v1.

Kakuda, K. (2002): Applications of fluid dynamic approach to neural network. 15th Computational Mechanics Conference, JSME, pp. 529-530. (In Japanese)

Kakuda, K.; Tosaka, N. (1992): Finite element approach for high Reynolds number flows. Theoretical and Applied Mechanics, vol. 41, pp. 223-232.

Kingma, D. P.; Ba, J. L. (2015): Adam: a method for stochastic optimization. International Conference on Learning Representations, arXiv:1412.6980v8.

Krizhevsky, A.; Hinton, G. E. (2009): Learning multiple layers of features from tiny images. Technical Report, University of Toronto, Canada.

Krizhevsky, A.; Sutskever, I.; Hinton, G. E. (2012): ImageNet classification with deep convolutional neural networks. 25th International Conference on Neural Information Processing Systems, pp. 1097-1105.

Kuo, C. C. J. (2016): Understanding convolutional neural networks with a mathematical model. arXiv:1609.04112v2.

LeCun, Y.; Bengio, Y.; Hinton, G. E. (2015): Deep learning. Nature, vol. 521, pp. 436-444.

LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. (1998): Gradient-based learning applied to document recognition. Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324.

Maas, A. L.; Hannun, A. Y.; Ng, A. Y. (2013): Rectifier nonlinearities improve neural network acoustic models. 30th International Conference on Machine Learning.

Mizukami, A. (1985): An implementation of the streamline-upwind/Petrov-Galerkin method for linear triangular elements. Computer Methods in Applied Mechanics and Engineering, vol. 49, pp. 357-364.

Nair, V.; Hinton, G. E. (2010): Rectified linear units improve restricted Boltzmann machines. 27th International Conference on Machine Learning, pp. 807-814.

Rumelhart, D. E.; Hinton, G. E.; Williams, R. J. (1986): Learning representations by back-propagating errors. Nature, vol. 323, pp. 533-536.

Srivastava, N.; Hinton, G. E.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. R. (2014): Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, vol. 15, pp. 1929-1958.

Yee, H. C.; Warming, R. F.; Harten, A. (1985): Implicit total variation diminishing (TVD) schemes for steady-state calculations. Journal of Computational Physics, vol. 57, pp. 327-360.
