Scaling RBMs To High Dimensional Data With Invertible Neural Networks

Will Grathwohl* (1,2,3), Xuechen Li* (3), Kevin Swersky (3), Milad Hashemi (3), Jörn-Henrik Jacobsen (1,2), Mohammad Norouzi (3), Geoffrey Hinton (3)

*Equal contribution. (1) Department of Computer Science, University of Toronto. (2) Vector Institute. (3) Google AI Research. Correspondence to: Will Grathwohl <wgrathwohl@cs.toronto.edu>, Xuechen Li <lxuechen@google.com>.

Second workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models (ICML 2020), Virtual Conference.

Abstract

We combine invertible neural networks with RBMs to create a more tractable energy-based model which retains the power of recent scalable EBM variants while allowing for more efficient sampling and likelihood evaluation. We further find that replacing the Gaussian base distributions typically used in normalizing flows with an RBM leads to improved likelihood compared to a flow with a similar architecture, possibly providing a pathway to more efficient, but still tractable, generative models. We demonstrate the performance of our approach on small image datasets and compare to recent normalizing flows and EBMs.

1. Introduction

Restricted Boltzmann Machines (RBMs) have had a long and rich history in the generative modeling community (Smolensky et al., 1986; Hinton, 2002; Hinton & Salakhutdinov, 2006). As a generative model they have many desirable properties, including compositional structure (Hinton, 2002; Du & Mordatch, 2019) and the ability to be trained on unlabeled data or data with missing values. Although they are unnormalized energy-based models, the structure of RBMs admits a tractable blocked Gibbs sampler, which enables relatively fast sampling and training compared to other classes of Energy-Based Models (EBMs). While standard RBMs have been successful at modeling simple distributions, to successfully model more complicated data such as images, RBMs typically need to be stacked on top of each other to create a Deep Belief Network (Hinton et al., 2006; Salakhutdinov & Hinton, 2009). This greatly increases the model's expressive power, but sampling must now be done sequentially, reducing the efficiency and increasing the bias of the training objective.

In recent years alternative classes of generative models have become more popular, such as Normalizing Flows (Rezende & Mohamed, 2015; Deco & Brauer, 1995) and Variational Autoencoders (Kingma & Welling, 2013; Rezende et al., 2014). These models allow for more efficient sampling and likelihood computation (or estimation) at the cost of expressiveness. Despite this, considerable progress has been made in these more tractable models, causing RBMs to fall out of favor.

In this work we propose a simple method to increase the scalability of RBMs without having to rely on sequential sampling. We train an RBM on top of a learned embedding given by an invertible neural network similar to those used to define Normalizing Flows (Kingma & Dhariwal, 2018; Dinh et al., 2016). The entire model is trained end-to-end to approximately maximize likelihood. We find that our EBM-flow hybrid models (which we refer to as EB-and-Flow) achieve better likelihoods than normalizing flows and RBMs while being easier to sample from and evaluate than recent EBM approaches.

2. Background
2.1. Energy-Based Models

An Energy-Based Model (EBM) is a model which represents a probability distribution as

p(x) = \frac{e^{-E(x)}}{Z},    (1)

where E is known as the energy function, which maps the data to a scalar value, and Z is the normalizing constant. The normalizing constant is implicitly defined by the energy function as Z = \int e^{-E(x)} \, dx, so it is not modeled. This makes training and sampling challenging but gives great flexibility to the model.

2.2. Restricted Boltzmann Machines

An RBM is an EBM which defines a distribution over visible units v and hidden units h as

p(v) = \sum_h p(v, h) = \sum_h \frac{e^{-E(v, h)}}{Z}.    (2)
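As a toy illustration of Equation 1 (not from the paper; the 1-D energy and all names below are made up for this example), the Python sketch defines an unnormalized density through an energy function and approximates Z by brute-force numerical integration. The same integral becomes intractable in high dimensions, which is why Z is left implicit.

```python
import numpy as np

def energy(x):
    # A toy 1-D energy function; any scalar-valued function of x would do.
    return 0.5 * x ** 2 + np.sin(3.0 * x)

def unnormalized_density(x):
    # p(x) is proportional to exp(-E(x)); the constant Z is not modeled.
    return np.exp(-energy(x))

# In 1-D we can approximate Z = integral of exp(-E(x)) dx on a grid,
# but this brute-force integration scales exponentially with dimension.
grid = np.linspace(-10.0, 10.0, 100_001)
Z = np.trapz(unnormalized_density(grid), grid)

x = 0.7
log_p = -energy(x) - np.log(Z)
print(f"approximate log p({x}) = {log_p:.4f}")
```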

The visible or hidden units can be discrete or continuous. In this work we focus on Gaussian-Bernoulli RBMs (Cho et al., 2013), which have continuous v and discrete h. The energy function for this model is defined as

E(v, h) = \frac{\|v - b_v\|^2}{2\sigma_v^2} - h^\top b_h - h^\top W \frac{v}{\sigma_v},    (3)

with parameters \{W, b_v, \sigma_v, b_h\}. While the joint distribution p(v, h) is unnormalized, the conditional distributions are not:

p(v \mid h) = \mathcal{N}\left(v \mid b_v + \sigma_v W^\top h, \, \sigma_v^2\right),    (4)

p(h \mid v) = \text{Bernoulli}\left(h \mid \text{sigmoid}\left(W \frac{v}{\sigma_v} + b_h\right)\right),    (5)

allowing for an efficient blocked Gibbs sampler to be used to draw samples from p(v, h).

We can also analytically sum out the hidden variables to produce an EBM for the marginal distribution of visible units with energy:

E(v) = \frac{\|v - b_v\|^2}{2\sigma^2} - \text{softplus}\left(W v / \sigma + b_h\right)^\top \mathbf{1}.

RBMs are trained with gradient descent by estimating

\nabla_\theta \log p(v) = -\nabla_\theta E(v) + \mathbb{E}_{p(v)}\left[\nabla_\theta E(v)\right],    (6)

where \nabla_\theta E(v) can be easily computed. The samples in the expectation come from a Gibbs chain by repeated sampling from Equations 4 and 5. The chain can be seeded from data samples, giving the Contrastive Divergence (CD) algorithm (Hinton, 2002). Recent work has also proposed starting the chain from random noise (Nijkamp et al., 2019b). A persistent chain can be used (Tieleman, 2008; Du & Mordatch, 2019; Grathwohl et al., 2019) for a lower-bias estimate, which we use in this work.

2.3. Normalizing Flows

A Normalizing Flow (NF) is a generative model for data which works by drawing a sample z ~ p(z), where p(z) is an easily sampled distribution with a closed-form density, referred to as the base distribution. This sample z is then passed through a function f^{-1} to give us our data x = f^{-1}(z). When f is bijective, we can compute log p(x) as

\log p(x) = \log p(f(x)) + \log \left|\det \frac{\partial f(x)}{\partial x}\right|.    (7)

Most progress in NF research focuses on designing maximally expressive invertible architectures with efficient Jacobian log-determinant computation (Rezende & Mohamed, 2015; Dinh et al., 2014; 2016; Kingma & Dhariwal, 2018; Grathwohl et al., 2018; Behrmann et al., 2018; Chen et al., 2019).

2.4. Issues with EBMs

While EBMs have shown many advantages over tractable likelihood models and are becoming one of the premier approaches to generative modeling (Du & Mordatch, 2019; Grathwohl et al., 2019; Nijkamp et al., 2019b; Song & Ermon, 2019), they have limitations that make them challenging to work with. The energy-based parameterization is very flexible, but sampling and likelihood evaluation require Markov Chain Monte Carlo (MCMC) techniques. Given the unconstrained nature of the energy functions in these recent models, the gradient-based samplers typically used have difficulty mixing (Nijkamp et al., 2019a), making likelihood evaluation a futile task (Du & Mordatch, 2019). This is particularly problematic because samples and likelihood are the current standard for evaluating generative models, making it unclear how these recent EBMs compare with other classes of generative models. For this reason, it would be desirable to train a model which retains the flexibility of EBMs while enabling a tractable way to accomplish both of these tasks.

3. Related Work

Base Distributions for Flows. Typically a Gaussian base density is used for NF models, which imposes topological constraints on the data distributions that can be modeled (Falorsi et al., 2018). Recently, some alternatives have been explored. Izmailov et al. (2019) train NF models with a Gaussian Mixture base distribution for semi-supervised learning. This leads to strong performance at this task, but the parameters of the base distribution could not be learned online with the flow model. Autoregressive base distributions have also been shown to improve sample quality as well as likelihoods (Mahajan et al., 2020) at the cost of slower sampling and added model complexity. On MNIST, our approach provides a larger benefit.

Unstructured EBMs. EBMs impose very few constraints on model architecture. Recently, energy functions based on unstructured neural networks have achieved impressive performance in terms of sample quality as well as downstream discriminative tasks (Du & Mordatch, 2019; Grathwohl et al., 2019; Song & Ou, 2018). Training these models requires sampling using MCMC from the model distribution, which is notoriously difficult for unstructured energy functions, leading to costly and unstable optimization. These difficulties can partially be side-stepped by training with alternative objectives such as Score Matching (Song & Ermon, 2019; Li et al., 2019) or objectives based on Stein Discrepancies (Grathwohl et al., 2020), with the latter also providing a compelling method for model evaluation. However, despite this progress, reliably training large-scale EBMs and evaluating their likelihoods is still an open problem.
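To make the blocked Gibbs sampler of Equations 4-5, the marginal energy E(v), and the gradient estimator of Equation 6 from Section 2.2 concrete, here is a minimal PyTorch sketch of a Gaussian-Bernoulli RBM. This is not the authors' implementation; the class and function names are illustrative, and the update shown is a plain CD step seeded at the data rather than the persistent variant used in the paper.

```python
import torch
import torch.nn.functional as F


class GaussianBernoulliRBM(torch.nn.Module):
    """Minimal Gaussian-Bernoulli RBM: continuous visible units v, binary hidden units h."""

    def __init__(self, num_visible, num_hidden, sigma=1.0):
        super().__init__()
        self.W = torch.nn.Parameter(0.01 * torch.randn(num_hidden, num_visible))
        self.b_v = torch.nn.Parameter(torch.zeros(num_visible))
        self.b_h = torch.nn.Parameter(torch.zeros(num_hidden))
        self.sigma = sigma

    def free_energy(self, v):
        # Marginal energy with h summed out:
        # E(v) = ||v - b_v||^2 / (2 sigma^2) - softplus(W v / sigma + b_h) . 1
        quadratic = ((v - self.b_v) ** 2).sum(dim=1) / (2.0 * self.sigma ** 2)
        hidden = F.softplus((v / self.sigma) @ self.W.t() + self.b_h).sum(dim=1)
        return quadratic - hidden

    def gibbs_step(self, v):
        # Blocked Gibbs: sample h | v (Eq. 5), then v | h (Eq. 4).
        p_h = torch.sigmoid((v / self.sigma) @ self.W.t() + self.b_h)
        h = torch.bernoulli(p_h)
        mean_v = self.b_v + self.sigma * (h @ self.W)
        return mean_v + self.sigma * torch.randn_like(mean_v)


def cd_step(rbm, v_data, n_steps=25, lr=1e-3):
    """One contrastive-divergence update (Eq. 6), seeding the chain at the data."""
    with torch.no_grad():
        v_model = v_data.clone()
        for _ in range(n_steps):
            v_model = rbm.gibbs_step(v_model)
    # grad log p(v) = -grad E(v_data) + E[grad E(v_model)], so descend on the difference.
    loss = rbm.free_energy(v_data).mean() - rbm.free_energy(v_model).mean()
    loss.backward()
    with torch.no_grad():
        for p in rbm.parameters():
            p -= lr * p.grad
            p.grad = None
```

The persistent (PCD) variant used in the paper simply keeps the negative samples between calls instead of re-initializing the chain from the data batch.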

4. The Best of Both Worlds

We define a new model for data x. We first sample v from a Gaussian-Bernoulli RBM p(v) = \sum_h p(v, h). We then pass v through an invertible neural network with tractable Jacobian log-determinant, f_\theta^{-1}, defining a model:

p(x) = \sum_h p(x, h) = \sum_h p(v, h) \left|\det \frac{\partial f_\theta(x)}{\partial x}\right|,    (8)

where v = f_\theta(x). Overall, this gives

\log p(x) = -\frac{\|v - b_v\|^2}{2\sigma^2} + \text{softplus}\left(W v / \sigma + b_h\right)^\top \mathbf{1} + \log \left|\det \frac{\partial f_\theta(x)}{\partial x}\right| - \log Z.

With respect to \theta, we can easily optimize \log p(x) since the gradients with respect to \theta do not depend on \log Z. With respect to the RBM parameters \phi = \{W, b_h, b_v\}, we estimate \nabla_\phi \log p(x) using Persistent Contrastive Divergence (PCD) (Hinton, 2002; Tieleman, 2008) with a replay buffer (Du & Mordatch, 2019; Grathwohl et al., 2019). Each step of PCD learning requires only 2 matrix multiplies. Assuming f_\theta is a large neural network, PCD with 20 sampling steps per training iteration adds negligible computational overhead compared to training a NF model with a Gaussian base distribution. Pseudocode for our training procedure can be found in Algorithm 1.

Algorithm 1 EB-and-Flow Training
Require: Invertible net f_\theta, RBM p_\phi(v, h), replay buffer B, number of MCMC steps n
for x in training data do
    Compute v = f_\theta(x)
    Compute g = \nabla_v \log p_\phi(v)
    Update \theta with \nabla_\theta \log p(x) = g^\top \nabla_\theta f_\theta(x) + \nabla_\theta \log \left|\det \partial_x f_\theta(x)\right|
    Sample v_0 from B, remove from B
    \hat{v} <- MCMC sample for n steps from v_0
    Update \phi with -\nabla_\phi E(v) + \mathbb{E}_{\hat{v}}\left[\nabla_\phi E(\hat{v})\right]
    Add \hat{v} to B
end for

5. Model Structure

When training normalizing flows, the invertible model is a mapping f : \mathbb{R}^d \to \mathbb{R}^d. Traditionally, variables are "factored out" as transformations are added. This means we apply one invertible mapping f_1 to get v_1' = f_1(x). We then split the features of v_1' into two groups v_1, \bar{v}_1. We then apply our second transformation to \bar{v}_1, giving v_2', which is split into v_2, \bar{v}_2, and \bar{v}_2 is passed to the next transformation, and so on. This gives us L separate outputs v_1, \ldots, v_L, where L is the number of times the variables are factored out.

When the data dimension is small, we can simply concatenate all v_i together into one vector v and use an RBM on top to define p(v, h). Alternatively, we can give each v_i its own independent RBM, which defines the overall product distribution p(v_1, h_1) \cdots p(v_L, h_L). Under this model each v_i is independent of all the others.

If v_i is spatially structured (as it would be using a convolutional model), we can define p(v_i, h_i) with a convolutional RBM (Lee et al., 2009). We use traditional RBMs for our MNIST and Fashion MNIST experiments and convolutional RBMs for our CIFAR10 experiments. Specifically, we use convolutional RBMs which share a spatially structured hidden state. Full details of this RBM architecture are found in Appendix B.

5.1. Conditional Versions

To incorporate side information, we may include additional visible units in the Gaussian-Bernoulli RBM component of the EB-and-Flow model, following Larochelle & Bengio (2008). For instance, given an image x and one-hot label y, the log-likelihood can be defined as

\log p(x, y) = -\frac{\|v - b_v\|^2}{2\sigma^2} + y^\top b_y + \text{softplus}\left(W v / \sigma + V y + b_h\right)^\top \mathbf{1} + \log \left|\det \frac{\partial f_\theta(x)}{\partial x}\right| - \log Z,

where V is a weight matrix and b_y is the bias for the labels. The contrastive divergence learning algorithm can still be used to approximately maximize this log-likelihood, since efficient blocked Gibbs sampling is available (Larochelle & Bengio, 2008).

Given both a labeled set D_l and an unlabeled set D_u, we may optimize the following joint objective:

\sum_{(x, y) \in D_l} \log p(x, y) + \lambda \sum_{x \in D_u} \log \sum_y p(x, y).    (9)

We leave the investigation of this model as future work.
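To spell out Algorithm 1, the sketch below implements one training pass in PyTorch under stated assumptions: it reuses the hypothetical GaussianBernoulliRBM sketched earlier, and it assumes a flow object exposing forward_and_log_det(x) that returns v = f_\theta(x) together with log|det \partial f_\theta(x)/\partial x| (a common flow interface, not the authors' API). Backpropagating -E(v) + log|det| through v = f_\theta(x) reproduces the \theta update of Algorithm 1, since autodiff composes g = \nabla_v \log p_\phi(v) with \nabla_\theta f_\theta(x).

```python
import random
import torch


def eb_and_flow_step(flow, rbm, x, buffer, flow_opt, rbm_opt, n_mcmc_steps=25):
    """One training step of EB-and-Flow (a sketch of Algorithm 1)."""
    # Flow update: log p(x) = -E(f(x)) + log|det df/dx| - log Z, and log Z does not
    # depend on the flow parameters, so we can simply descend on E(v) - log|det|.
    v, log_det = flow.forward_and_log_det(x)   # assumed interface, not the paper's API
    flow_loss = (rbm.free_energy(v) - log_det).mean()
    flow_opt.zero_grad()
    flow_loss.backward()
    flow_opt.step()

    # RBM update: PCD with a replay buffer of persistent negative samples.
    with torch.no_grad():
        v_neg = buffer.pop(random.randrange(len(buffer)))
        for _ in range(n_mcmc_steps):
            v_neg = rbm.gibbs_step(v_neg)
    rbm_loss = rbm.free_energy(v.detach()).mean() - rbm.free_energy(v_neg).mean()
    rbm_opt.zero_grad()
    rbm_loss.backward()
    rbm_opt.step()
    buffer.append(v_neg)

    return flow_loss.item(), rbm_loss.item()
```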

6. Experiments

We run a number of experiments to demonstrate the performance of our approach. We train EB-and-Flow models using the invertible network architectures from NICE (Dinh et al., 2014) and Glow (Kingma & Dhariwal, 2018). We first explore how our approach compares with other EBMs on sampling and likelihood evaluation. Next we explore likelihood computed on held-out test data and compare our approach with standard RBMs and NF models. On each of these tasks we find our model performs favorably. Full details of model architectures, baselines, and hyperparameters can be found in Appendix A.

6.1. Likelihood Evaluation

We evaluate our models by finding an upper bound on likelihood using Annealed Importance Sampling (AIS) (Neal, 2001) and a lower bound using RAISE (Burda et al., 2015). These methods run MCMC chains which slowly anneal from a tractable distribution to our model's distribution. For recent EBM models, many chains must be run (over 300,000 were used in Du & Mordatch (2019), and still the bounds are very loose). We find that EB-and-Flow models are much easier to evaluate. As seen in Table 1, we can arrive at bounds within .001 bit/dim (bit/dim is typically reported up to 2 decimal places) using 1000 steps, meaning that EB-and-Flow models can be reliably compared with tractable likelihood models like NFs and VAEs.

Table 1. Likelihood evaluation results (AIS/RAISE bounds; # iterations: 300k for the EBM baseline vs. 1k for EB-and-Flow).

Figure 1. Consecutive Gibbs samples from a Markov chain that has been burned in for 1000 iterations. Top: Standard RBM. Bottom: EB-and-Flow. Top left: MNIST. Bottom left: Fashion MNIST. Right: CIFAR10.

6.2. Sampling

We compare the ease of sampling from our model with a standard RBM trained on MNIST, Fashion MNIST, and CIFAR10. As can be seen in Figure 1, we observe much faster mixing, higher quality, and more diverse samples. Our chain quickly mixes between the various modes of the data distribution, producing a varied set of samples.

6.3. Likelihood Evaluation

We compare EB-and-Flow models with standard NFs, RBMs, and flows with autoregressive base distributions (Mahajan et al., 2020) on the MNIST, Fashion MNIST (Xiao et al., 2017), and CIFAR10 (Krizhevsky et al., 2009) datasets. As seen in Table 2, across various architectures on MNIST and Fashion MNIST we find that EB-and-Flow outperforms the baselines in terms of test-set log-likelihood. We obtain competitive results on CIFAR10 but do not outperform the state-of-the-art or flows with standard Gaussian base distributions. We believe this has to do with difficulties training our latent RBM on higher-dimensional data. Izmailov et al. (2019) also experienced these difficulties, leading them to fix the base distribution throughout training and refine it post-hoc. We expect this performance gap could be closed in this way or by using improved RBM training techniques (Qiu et al., 2020). We leave this for further work.

Table 2. Unconditional Density Estimation. All CIFAR10 models use the Glow architecture.

7. Conclusion

In this work we have presented a new type of EBM that retains the flexibility and desirable properties of previous EBM approaches while addressing several key issues of these prior methods, namely sampling and evaluation. By leveraging invertible neural networks we have created a more tractable EBM which outperforms a flow-based baseline with the same architecture, indicating that the additional flexibility provided by the RBM improves modeling performance. We are excited about the potential of our approach to make invertible models more efficient by allowing them to use smaller networks, and by the potential of scaling our approach up to tackle more challenging datasets.

References

Behrmann, J., Grathwohl, W., Chen, R. T., Duvenaud, D., and Jacobsen, J.-H. Invertible residual networks. arXiv preprint arXiv:1811.00995, 2018.
Burda, Y., Grosse, R., and Salakhutdinov, R. Accurate and conservative estimates of MRF log-likelihood using reverse annealing. In Artificial Intelligence and Statistics, pp. 102-110, 2015.
Chen, T. Q., Behrmann, J., Duvenaud, D. K., and Jacobsen, J.-H. Residual flows for invertible generative modeling. In Advances in Neural Information Processing Systems, pp. 9913-9923, 2019.
Cho, K. H., Raiko, T., and Ilin, A. Gaussian-Bernoulli deep Boltzmann machine. In The 2013 International Joint Conference on Neural Networks (IJCNN), pp. 1-7. IEEE, 2013.
Deco, G. and Brauer, W. Nonlinear higher-order statistical decorrelation by volume-conserving neural architectures. Neural Networks, 8(4):525-535, 1995.
Dinh, L., Krueger, D., and Bengio, Y. NICE: Non-linear independent components estimation. arXiv preprint arXiv:1410.8516, 2014.
Dinh, L., Sohl-Dickstein, J., and Bengio, S. Density estimation using Real NVP. arXiv preprint arXiv:1605.08803, 2016.
Du, Y. and Mordatch, I. Implicit generation and generalization in energy-based models. arXiv preprint arXiv:1903.08689, 2019.
Falorsi, L., de Haan, P., Davidson, T. R., De Cao, N., Weiler, M., Forré, P., and Cohen, T. S. Explorations in homeomorphic variational auto-encoding. arXiv preprint arXiv:1807.04689, 2018.
Grathwohl, W., Chen, R. T., Bettencourt, J., Sutskever, I., and Duvenaud, D. FFJORD: Free-form continuous dynamics for scalable reversible generative models. arXiv preprint arXiv:1810.01367, 2018.
Grathwohl, W., Wang, K.-C., Jacobsen, J.-H., Duvenaud, D., Norouzi, M., and Swersky, K. Your classifier is secretly an energy based model and you should treat it like one. arXiv preprint arXiv:1912.03263, 2019.
Grathwohl, W., Wang, K.-C., Jacobsen, J.-H., Duvenaud, D., and Zemel, R. Cutting out the middle-man: Training and evaluating energy-based models without sampling. arXiv preprint arXiv:2002.05616, 2020.
Hinton, G. E. Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771-1800, 2002.
Hinton, G. E. and Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507, 2006.
Hinton, G. E., Osindero, S., and Teh, Y.-W. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527-1554, 2006.
Izmailov, P., Kirichenko, P., Finzi, M., and Wilson, A. G. Semi-supervised learning with normalizing flows. arXiv preprint arXiv:1912.13025, 2019.
Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
Kingma, D. P. and Dhariwal, P. Glow: Generative flow with invertible 1x1 convolutions. In Advances in Neural Information Processing Systems, pp. 10215-10224, 2018.
Kingma, D. P. and Welling, M. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
Krizhevsky, A., Hinton, G., et al. Learning multiple layers of features from tiny images. 2009.
Larochelle, H. and Bengio, Y. Classification using discriminative restricted Boltzmann machines. In Proceedings of the 25th International Conference on Machine Learning, pp. 536-543, 2008.
Lee, H., Grosse, R., Ranganath, R., and Ng, A. Y. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th Annual International Conference on Machine Learning, pp. 609-616, 2009.
Li, Z., Chen, Y., and Sommer, F. T. Annealed denoising score matching: Learning energy-based models in high-dimensional spaces. arXiv preprint arXiv:1910.07762, 2019.
Mahajan, S., Bhattacharyya, A., Fritz, M., Schiele, B., and Roth, S. Normalizing flows with multi-scale autoregressive priors. arXiv preprint arXiv:2004.03891, 2020.
Neal, R. M. Annealed importance sampling. Statistics and Computing, 11(2):125-139, 2001.
Nijkamp, E., Hill, M., Han, T., Zhu, S.-C., and Wu, Y. N. On the anatomy of MCMC-based maximum likelihood learning of energy-based models. arXiv preprint arXiv:1903.12370, 2019a.
Nijkamp, E., Zhu, S.-C., and Wu, Y. N. On learning non-convergent short-run MCMC toward energy-based model. arXiv preprint arXiv:1904.09770, 2019b.

Qiu, Y., Zhang, L., and Wang, X. Unbiased contrastive divergence algorithm for training energy-based latent variable models. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=r1eyceSYPr.
Rezende, D. J. and Mohamed, S. Variational inference with normalizing flows. arXiv preprint arXiv:1505.05770, 2015.
Rezende, D. J., Mohamed, S., and Wierstra, D. Stochastic backpropagation and approximate inference in deep generative models. arXiv preprint arXiv:1401.4082, 2014.
Salakhutdinov, R. and Hinton, G. Deep Boltzmann machines. In Artificial Intelligence and Statistics, pp. 448-455, 2009.
Smolensky, P., McClelland, J., and Rumelhart, D. Parallel distributed processing: Explorations in the microstructure of cognition. MIT Press, 1986.
Song, Y. and Ermon, S. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems, pp. 11895-11907, 2019.
Song, Y. and Ou, Z. Learning neural random fields with inclusive auxiliary generators. arXiv preprint arXiv:1806.00271, 2018.
Tieleman, T. Training restricted Boltzmann machines using approximations to the likelihood gradient. In Proceedings of the 25th International Conference on Machine Learning, pp. 1064-1071, 2008.
Xiao, H., Rasul, K., and Vollgraf, R. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.

A. Experimental Details

A.1. Architectures

The NICE architectures used were exactly as in Dinh et al. (2014). For MNIST and Fashion MNIST, the Glow models we used have two levels of features. Each level consists of 8 affine-coupling blocks with 1x1 convolutions, and the hidden dimension of the coupling blocks was 512. For CIFAR10 we use 3 levels of features and 16 blocks per level. Our CIFAR10 models used convolutional RBMs with 100 hidden channels. We detail this RBM structure in Appendix B.

For MNIST and Fashion MNIST we use a single fully connected RBM whose input is a concatenation of the features from the invertible network. These RBMs had 512 hidden units. The baseline RBMs also have 512 hidden units.

A.2. Training

MNIST and Fashion MNIST models and baselines were trained for 250 epochs using a batch size of 128. CIFAR10 models were trained for 500 epochs with the same batch size. We use the Adam (Kingma & Ba, 2014) optimizer with default hyperparameters and learning rate .001.

We use PCD with a replay buffer to train our RBMs. We use a replay buffer of size 10000 and use 25 steps of Gibbs sampling to update the particles at each training iteration.

B. Shared Convolutional RBMs

For high dimensional data, the invertible networks used to specify normalizing flows will typically "factor out" groups of features as they apply more invertible transformations. This leaves us with f(x) = (v_1, \ldots, v_L). When convolutional models are used, each v_i is spatially structured with its own height, width, and depth. We could unwrap these into vectors, concatenate them together into a vector v, and build a fully connected RBM on top of them. This gives the following joint energy function:

E(v, h) = \frac{\|v - b_v\|^2}{2\sigma_v^2} - h^\top b_h - h^\top W \frac{v}{\sigma_v}.    (10)

Alternatively, we can treat them all separately and give each of them their own weight matrix, leading to the identical energy function:

E(v, h) = \frac{\|v - b_v\|^2}{2\sigma_v^2} - h^\top b_h - \sum_{i=1}^{L} h^\top W_i \frac{v_i}{\sigma_v}.    (11)

For fully connected RBMs this interpretation is pointless, but it is not when dealing with convolutional RBMs.
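The following NumPy check (illustrative only; the shapes are arbitrary) verifies the claim behind Equations 10 and 11: with W formed by concatenating the per-block matrices W_i column-wise and v the concatenation of the v_i, the hidden-visible interaction term is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
H = 8                          # number of hidden units
sizes = [6, 4, 2]              # dimensions of v_1, v_2, v_3

v_blocks = [rng.normal(size=d) for d in sizes]
W_blocks = [rng.normal(size=(H, d)) for d in sizes]
h = rng.integers(0, 2, size=H).astype(float)

# Eq. 10 view: one weight matrix acting on the concatenated visible vector.
v = np.concatenate(v_blocks)
W = np.concatenate(W_blocks, axis=1)
term_concat = h @ W @ v

# Eq. 11 view: a sum of per-block terms with separate weight matrices.
term_blocks = sum(h @ Wi @ vi for Wi, vi in zip(W_blocks, v_blocks))

assert np.allclose(term_concat, term_blocks)
print(term_concat, term_blocks)
```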

A convolutional RBM replaces the weight matrix W with a filter \Omega and defines the energy function:

E(v, h) = \frac{\|v - b_v\|^2}{2\sigma_v^2} - h^\top b_h - h \cdot \left(\Omega \ast \frac{v}{\sigma_v}\right),    (12)

where \ast denotes convolution and the dot product sums over all spatial positions and channels of h. When the input v is split into L spatially structured v_i, each with their own height, width, and channels, we can define a new energy function which uses L separate filters to define an RBM with a single hidden tensor state:

E(v, h) = \frac{\|v - b_v\|^2}{2\sigma_v^2} - h^\top b_h - \sum_{i=1}^{L} h \cdot \left(\Omega_i \ast \frac{v_i}{\sigma_v}\right),    (13)

where each \Omega_i has the same number of output filters, and the kernel size and/or stride is chosen to make sure that the width and height of \Omega_i \ast v_i is identical. For example, the invertible network for our CIFAR10 model outputs 3 v_i's with height and width equal to (16, 16), (8, 8), (4, 4). Thus \Omega_1 has a 5x5 kernel and stride 4, \Omega_2 has a 3x3 kernel and stride 2, and \Omega_3 has a 3x3 kernel and stride 1. Each kernel has 100 filters, leading h to have size (4, 4, 100). This can be seen pictorially in Figure 2.

Figure 2. Visualization of our shared convolutional RBM structure. The image is mapped to spatially structured v_1, v_2, v_3 by the invertible neural network f_\theta. Each v_i is then convolved with a distinct filter \Omega_i which maps to the shared hidden state h.
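As a sketch of the shared hidden state of Equation 13 for the CIFAR10 configuration above, the PyTorch snippet below convolves three spatially structured v_i with separate filter banks \Omega_i and sums the results into one 4x4x100 pre-activation. The channel counts and padding values are assumptions (the paper does not report them); the kernel sizes, strides, and 100 hidden channels follow Appendix B.

```python
import torch
import torch.nn as nn

hidden_channels = 100

# v_1: 16x16, v_2: 8x8, v_3: 4x4 feature maps from the invertible network.
# Channel counts below are placeholders, not values from the paper.
v1 = torch.randn(1, 6, 16, 16)
v2 = torch.randn(1, 12, 8, 8)
v3 = torch.randn(1, 48, 4, 4)

# One filter bank Omega_i per scale; kernel size and stride as in Appendix B,
# padding chosen (assumption) so that every branch outputs a 4x4 grid.
omega1 = nn.Conv2d(6, hidden_channels, kernel_size=5, stride=4, padding=2, bias=False)
omega2 = nn.Conv2d(12, hidden_channels, kernel_size=3, stride=2, padding=1, bias=False)
omega3 = nn.Conv2d(48, hidden_channels, kernel_size=3, stride=1, padding=1, bias=False)

# Pre-activation of the shared hidden state h: sum of the per-scale terms
# plus a single hidden bias shared across scales (Eq. 13).
b_h = torch.zeros(hidden_channels, 1, 1)
pre_h = omega1(v1) + omega2(v2) + omega3(v3) + b_h

print(pre_h.shape)          # torch.Size([1, 100, 4, 4])
p_h = torch.sigmoid(pre_h)  # Bernoulli parameters for blocked Gibbs over h
```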
