Deep Neural Nets As A Method For Quantitative Structure Activity .

Article
pubs.acs.org/jcim

Deep Neural Nets as a Method for Quantitative Structure-Activity Relationships

Junshui Ma,*,† Robert P. Sheridan,‡ Andy Liaw,† George E. Dahl,§ and Vladimir Svetnik†
†Biometrics Research Department and ‡Structural Chemistry Department, Merck Research Laboratories, Rahway, New Jersey 07065, United States
§Computer Science Department, University of Toronto, Toronto, Ontario ON M5S, Canada
Supporting Information

ABSTRACT: Neural networks were widely used for quantitative structure-activity relationships (QSAR) in the 1990s. Because of various practical issues (e.g., slow on large problems, difficult to train, prone to overfitting, etc.), they were superseded by more robust methods like support vector machine (SVM) and random forest (RF), which arose in the early 2000s. The last 10 years have witnessed a revival of neural networks in the machine learning community thanks to new methods for preventing overfitting, more efficient training algorithms, and advancements in computer hardware. In particular, deep neural nets (DNNs), i.e., neural nets with more than one hidden layer, have found great success in many applications, such as computer vision and natural language processing. Here we show that DNNs can routinely make better prospective predictions than RF on a set of large diverse QSAR data sets that are taken from Merck’s drug discovery effort. The number of adjustable parameters needed for DNNs is fairly large, but our results show that it is not necessary to optimize them for individual data sets, and a single set of recommended parameters can achieve better performance than RF for most of the data sets we studied. The usefulness of the parameters is demonstrated on additional data sets not used in the calibration. Although training DNNs is still computationally intensive, using graphical processing units (GPUs) can make this issue manageable.
INTRODUCTION

Quantitative structure-activity relationships (QSAR) is a very commonly used technique in the pharmaceutical industry for predicting on-target and off-target activities. Such predictions help prioritize the experiments during the drug discovery process and, it is hoped, will substantially reduce the experimental work that needs to be done. In a drug discovery environment, QSAR is often used to prioritize large numbers of compounds, and in that case the importance of having each individual prediction be accurate is lessened. Thus, models with predictive R² of as low as 0.3 can still be quite useful. That said, higher prediction accuracy is always desirable. However, there are practical constraints on the QSAR methods that might be used. For example:

1. QSAR data sets in an industrial environment may involve a large number of compounds (100 000) and a large number of descriptors (several thousands).
2. Fingerprint descriptors are frequently used. In these cases, the descriptors are sparse and only 5% of them are nonzero. Also, strong correlations can exist between different descriptors.
3. There is a need to maintain many models (e.g., dozens) on many different targets.
4. These models need to be updated routinely (e.g., monthly).

Even in well-supported, high-performance in-house computing environments, computer time and memory may become limiting. In our environment, an ideal QSAR method should be able to build a predictive model from 300 000 molecules with 10 000 descriptors within 24 h elapsed time, without manual intervention.
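To give a feel for why memory can become limiting at this scale, the following back-of-the-envelope sketch compares dense and sparse storage for a 300 000 × 10 000 descriptor matrix in which 5% of the entries are nonzero. The 4-byte values, 4-byte indices, and the compressed sparse row (CSR) layout are assumptions made for illustration; the function names are hypothetical and nothing here describes the storage actually used in this work.

```python
def dense_bytes(n_rows, n_cols, bytes_per_value=4):
    """Memory for a dense matrix, assuming 4-byte (single precision) values."""
    return n_rows * n_cols * bytes_per_value


def csr_bytes(n_rows, n_cols, nonzero_percent, bytes_per_value=4, bytes_per_index=4):
    """Memory for a CSR matrix: one value and one column index per nonzero,
    plus one row pointer per row (and one extra). Index width is an assumption."""
    nnz = n_rows * n_cols * nonzero_percent // 100
    return nnz * (bytes_per_value + bytes_per_index) + (n_rows + 1) * bytes_per_index


n_molecules = 300_000
n_descriptors = 10_000

dense = dense_bytes(n_molecules, n_descriptors)
sparse = csr_bytes(n_molecules, n_descriptors, nonzero_percent=5)

print(f"dense : {dense / 1e9:.1f} GB")
print(f"sparse: {sparse / 1e9:.1f} GB")
```

Under these assumptions the dense matrix is roughly ten times larger than the sparse one, which is one reason sparse fingerprint descriptors are usually kept in a sparse representation.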
QSAR methods that are particularly computer-intensive or require the adjustment of many sensitive parameters to achieve good prediction for an individual QSAR data set are less attractive.

Because of these constraints, only a small number of the many machine learning algorithms that have been proposed are suitable for general QSAR applications in drug discovery. Currently, the most commonly used methods are variations on random forest (RF) [1] and support vector machine (SVM) [2], which are among the most predictive [3,4]. In particular, RF has been very popular since it was introduced as a QSAR method by Svetnik et al. [5] Due to its high prediction accuracy, ease of use, and robustness to adjustable parameters, RF has been something of a “gold standard” to which other QSAR methods are compared. This is also true for non-QSAR types of machine learning [6].

In 2012, Merck sponsored a Kaggle competition (www.kaggle.com) to examine how well the state of the art of machine learning methods can perform in QSAR problems. We selected 15 QSAR data sets of various sizes (2000 to 50 000 molecules) using a common descriptor type. Each data set was divided into a training set and test set. Kaggle contestants were given descriptors and activities for the training set and descriptors only for the test set. Contestants were allowed to generate models using any machine learning method or combinations thereof, and predict the activities of test set molecules. Contestants could submit as many separate sets of predictions as they wished within a certain time period. The winning entry (submitted by one of the authors, George Dahl) improved the mean R² averaged over the 15 data sets from 0.42 (for RF) to 0.49. While the improvement might not seem large, we have seldom seen any method in the past 10 years that could consistently outperform RF by such a margin, so we felt this was an interesting result.

The winning entry used an ensemble of different methods, including deep neural net (DNN), gradient boosting machine (GBM) [3], and Gaussian process (GP) regression [7]. Here we focus on DNN, since it is the major contributor to the high prediction accuracy of the winning entry, and we would like to investigate the usefulness of DNN by itself as a QSAR method.

DNNs were one of the increasingly popular methods in the machine learning community in the past 8 years and produced disruptively high performance in speech recognition [8], computer vision [9], and other artificial intelligence applications. One of the major differences between DNNs today and the classical artificial neural networks widely used for chemical applications in the 1990s is that DNNs have more than one intermediate (i.e., hidden) layer and more neurons in each layer and are thus both “deeper” and “wider.”

The classical neural networks suffered from a number of practical difficulties. For example, they could handle only a limited number of input descriptors. Therefore, descriptor selection or extraction methods had to be applied to reduce the effective number of descriptors from thousands to tens or at most hundreds. Valuable predictive information was thus lost. Also, to avoid overfitting the training data and to reduce computation burden, the number of hidden layers was limited to one, and the number of neurons in that hidden layer had to be limited. Thanks to the advancements in theoretical methods, optimization algorithms, and computing hardware, most of the issues with classical neural networks have been resolved. Nowadays, neural networks with multiple hidden layers and thousands of neurons in each layer can be routinely applied to data sets with hundreds of thousands of compounds and thousands of descriptors without the need for data reduction. Also, overfitting can be controlled even when the nets have millions of weights.

Table 1. Data Sets for Prospective Prediction

data set      description

Kaggle Data Sets
              CYP P450 3A4 inhibition log(IC50) M
              binding to cannabinoid receptor 1 log(IC50) M
              inhibition of dipeptidyl peptidase 4 log(IC50) M
              inhibition of HIV integrase in a cell based assay log(IC50) M
              inhibition of HIV protease log(IC50) M
              logD measured by HPLC method
              percent remaining after 30 min microsomal incubation
NK1           inhibition of neurokinin1 (substance P) receptor binding log(IC50) M
OX1           inhibition of orexin 1 receptor log(Ki) M
OX2           inhibition of orexin 2 receptor log(Ki) M
PGP           transport by p-glycoprotein log(BA/AB)
PPB           human plasma protein binding log(bound/unbound)
RAT           log(rat bioavailability) at 2 mg/kg
              time dependent 3A4 inhibitions log(IC50 without NADPH/IC50 with NADPH)
              human thrombin inhibition log(IC50) M

Additional Data Sets
              CYP P450 2C8 inhibition log(IC50) M
              CYP P450 2C9 inhibition log(IC50) M
              CYP P450 2D6 inhibition log(IC50) M
              binding to Angiotensin-II receptor log(IC50) M
BACE          inhibition of beta-secretase log(IC50) M
CAV           inhibition of Cav1.2 ion channel
CLINT         clearance by human microsome log(clearance) μL/min·mg
ERK2          inhibition of ERK2 kinase log(IC50) M
FACTORXIA     inhibition of factor XIa log(IC50) M
FASSIF        solubility in simulated gut conditions log(solubility) mol/L
HERG          inhibition of hERG channel log(IC50) M
HERG (full    inhibition of hERG ion channel log(IC50) M
              inhibition of Nav1.5 ion channel log(IC50) M
              apparent passive permeability in PK1 cells log(permeability) cm/s
              induction of 3A4 by pregnane X receptor; percentage relative to rifampicin

Journal of Chemical Information and Modeling
Received: December 17, 2014
DOI: 10.1021/ci500747n, J. Chem. Inf. Model. XXXX, XXX, XXX XXX

Figure 1. Architecture of deep neural nets.

On the other hand, DNNs, as with any neural network method, require the user to set a number of adjustable parameters. In this paper, we examine 15 diverse QSAR data sets and confirm that DNNs in most cases can make better predictions than RF. We also demonstrate that it is possible to have a single set of adjustable parameters that performs well for most data sets, and it is not necessary to optimize the parameters for each data set separately. This makes DNNs a practical method for QSAR in an industrial drug discovery environment. Previously, Dahl et al. [10] used DNNs for QSAR problems, but with a less realistic classification formulation of the QSAR problem, and on public data without a prospective time-split of training and test sets. Additionally, Dahl et al. optimized adjustable parameters separately on each assay and did not focus on the practicalities of industrial drug discovery tasks.

METHODS

Data Sets. Table 1 shows the data sets used in this study. These are in-house Merck data sets including on-target and ADME (absorption, distribution, metabolism, and excretion) activities. The 15 labeled “Kaggle Data Sets” are the same data sets we used for the Kaggle competition, which are a subset of the data sets in the work of Chen et al. [11] A separate group of 15 different data sets labeled “Additional Data Sets” are used to validate the conclusions acquired from the Kaggle data sets.

For this study, it is useful to use proprietary data sets for two reasons:

1. We wanted data sets that are realistically large and whose compound activity measurements have a realistic amount of experimental uncertainty and include a non-negligible amount of qualified data.
2. Time-split validation (see below), which we consider more realistic than any random cross-validation, requires dates of testing, and these are almost impossible to find in public domain data sets.

The Kaggle data sets are provided as Supporting Information. Due to the proprietary nature of the compounds, as in the Kaggle competition, the descriptor names are disguised so the compound structures cannot be reverse engineered from the descriptors. However, comparisons can be made between different QSAR methods.

A number of these data sets contain significant amounts of “qualified data”. For example, one might know IC50 > 30 μM because 30 μM was the highest concentration tested. It is quite common for affinity data in the pharmaceutical industry to have this characteristic. Most off-the-shelf QSAR methods can handle only fixed numbers, so for the purposes of regression models, those activities were treated as fixed numbers, for example, IC50 = 30 μM or −log(IC50) = 4.5. Our experience is that keeping such data in the QSAR models is necessary.
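As a worked illustration of how a qualified measurement becomes a fixed number, the sketch below converts the 30 μM limit from the example above into the value of about 4.5 quoted there. It assumes the activity is the negative base-10 logarithm of the molar concentration, which is consistent with that figure; the function name is hypothetical.

```python
import math


def neg_log10_molar(concentration_micromolar):
    """Convert a concentration in micromolar to -log10 of its molar value."""
    return -math.log10(concentration_micromolar * 1e-6)


# A qualified result such as "IC50 above 30 uM" (30 uM being the highest
# concentration tested) is stored as if the IC50 were exactly 30 uM.
highest_tested_micromolar = 30.0
activity = neg_log10_molar(highest_tested_micromolar)

print(round(activity, 2))
```

Every compound that was inactive at the highest tested concentration therefore receives the same activity value, which is what allows an ordinary regression method to use it.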

Without such qualified data in the models, less active compounds are predicted to be more active than they really are.

In order to evaluate QSAR methods, each of these data sets was split into two nonoverlapping subsets: a training set and a test set. Although a usual way of making the split is by random selection, i.e., “split at random,” in actual practice in a pharmaceutical environment, QSAR models are applied “prospectively”. That is, predictions are made for compounds not yet tested in the appropriate assay, and these compounds may or may not have analogs in the training set. The best way of simulating this is to generate training and test sets by “time-split”. For each data set, the first 75% of the molecules assayed for the particular activity form the training set, while the remaining 25% of the compounds assayed later form the test set. We have found that, for regressions, R² from time-split validation better estimates the R² for true prospective prediction than R² from the “split at random” scheme [12]. Since training and test sets are not randomly selected from the same pool of compounds, the data distributions in these two subsets are frequently not the same as, or even similar to, each other. This violates the underlying assumption of many machine learning methods and poses a challenge to them. Both the training and test data sets of the Kaggle data sets are provided as Supporting Information.

Descriptors. Each molecule is represented by a list of features, i.e., “descriptors” in QSAR nomenclature. Our previous experience in QSAR favors substructure descriptors (e.g., atom pairs (AP), MACCS keys, circular fingerprints, etc.) for general utility over descriptors that apply to the whole molecule (e.g., number of donors, LOGP, molecular weight, etc.). In this paper, we use a set of descriptors that is the union of AP, the original “atom pair” descriptor from Carhart et al. [13], and DP descriptors (“donor acceptor pair”), also called “BP” in the work of Kearsley et al. [14] Both descriptors are of the following form:

    atom type i - (distance in bonds) - atom type j

For AP, atom type includes the element, number of nonhydrogen neighbors, and number of pi electrons; it is very specific. For DP, atom type is one of seven (cation, anion, neutral donor, neutral acceptor, polar, hydrophobe, and other).

Random Forest. The main purpose of this paper is to compare DNN to RF. RF is an ensemble recursive partitioning method where each recursive partitioning “tree” is generated from a bootstrapped sample of compounds and a random subset of descriptors is used at each branching of each node. The trees are not pruned. RF can handle regression problems or classification problems. RF naturally handles correlation between descriptors, and does not need a separate descriptor selection procedure to obtain good performance. Importantly, while there are a handful of adjustable parameters (e.g., number of trees, fraction of descriptors used at each branching, node size, etc.), the quality of predictions is generally insensitive to changes in these parameters. Therefore, the same set of parameters can be effectively used in various problems.

The version of RF we are using is a modification of the original FORTRAN code from the work of Breiman [1]. It has been parallelized to run one tree per processor on a cluster. Such parallelization is necessary to run some of our larger data sets in a reasonable time. For all RF models, we generate 100 trees with m/3 descriptors used at each branch-point, where m is the number of unique descriptors in the training set. Tree nodes with 5 or fewer molecules are not split further. We apply these parameters to every data set.

Deep Neural Nets. A neural network is a network composed of simulated “neurons”. Figure 1a shows a neuron in its detailed form and simplified form. Each neuron has multiple inputs (the input arrows) and one output (the output arrow). Each input arrow is associated with a weight $w_i$. The neuron is also associated with a function, $f(z)$, called the activation function, and a default bias term $b$. Thus, when a vector of input descriptors $X = [x_1 \cdots x_N]^T$ of a molecule goes through a neuron, the output of the neuron can be represented mathematically in eq 1:

$$O = f\left(\sum_{i=1}^{N} w_i x_i + b\right) \tag{1}$$

A row of neurons forms a layer of the neural network, and a DNN is built from several layers of neurons, which is illustrated in Figure 1b.

Normally, there are three types of layers in a DNN:

(1) the input layer (i.e., the bottom layer), where the descriptors of a molecule are entered
(2) the output layer (i.e., the top layer), where predictions are generated
(3) the hidden (middle) layers; the word “deep” in deep neural nets implies more than one hidden layer.

There are two popular choices of activation functions in the hidden layers: (1) the sigmoid function and (2) the rectified linear unit (ReLU) function. Both functions and their derivatives are shown in Figure 2.

The output layer can have one or more neurons, and each output neuron generates a prediction for a separate end point (e.g., assay result). That is, a DNN can naturally model multiple end points at the same time. The activation function of the neurons in the output layer is usually a linear function, which is shown in Figure 3.

The layout of a DNN, including the number of layers and the number of neurons in each layer, needs to be prespecified, along with the choice of the activation function in each neuron. Therefore, to train a DNN is to maximize an objective function by optimizing the weights and bias of each neuron.
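As an illustration of eq 1 and of how layers are stacked, the following minimal sketch computes the output of a single neuron and then passes one descriptor vector through hidden layers with ReLU activation and a linear output layer. It uses plain Python lists for clarity; the function names and the toy weights are made up for this example and do not come from the implementation used in this work.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def linear(z):
    return z


def neuron_output(x, w, b, f):
    """Eq 1: O = f(sum_i w_i * x_i + b) for a single neuron."""
    return f(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)


def layer_output(x, weights, biases, f):
    """One layer: every neuron sees the same input vector x."""
    return [neuron_output(x, w, b, f) for w, b in zip(weights, biases)]


def forward(x, hidden_layers, output_layer, hidden_activation=relu):
    """Pass x through each hidden layer, then through a linear output layer."""
    a = x
    for weights, biases in hidden_layers:
        a = layer_output(a, weights, biases, hidden_activation)
    weights, biases = output_layer
    return layer_output(a, weights, biases, linear)


# Toy network: 2 descriptors -> 2 hidden neurons -> 2 hidden neurons -> 1 output.
x = [1.0, 2.0]
hidden_layers = [
    ([[0.5, -0.25], [1.0, 1.0]], [0.0, -1.0]),
    ([[1.0, 0.5], [-1.0, 0.0]], [0.0, 0.0]),
]
output_layer = ([[2.0, 3.0]], [0.5])

prediction = forward(x, hidden_layers, output_layer)
print(prediction)
```

Adding a second row of weights to `output_layer` would give a second output neuron, i.e., a second end point predicted from the same hidden layers.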
The full set of weights and biases to be optimized is

$$\Theta = \left(\{w_{i,j}\}, \{b_j\},\; i = 1, \ldots, N_j,\; j = 1, \ldots, L + 1\right)$$

where $N_j$ is the number of neurons in the $j$th layer and $L$ is the number of hidden layers. The extra one layer of $j$ is for the output layer.

The training procedure is the well-known backward propagation (BP) algorithm implemented using mini-batched stochastic gradient descent (SGD) with momentum [15]. The individual values for Θ are first assigned random values. The molecules in the training set are randomly shuffled and then evenly divided into small groups of molecules called “mini-batches”. Each mini-batch is used to update the values of Θ once using the BP algorithm. When all the mini-batches from the training set are used, it is said that the training procedure finishes one “epoch”. The training procedure of a DNN usually requires many epochs. That is, the training set is reused many times during the training. The number of epochs is an adjustable parameter.

The number of elements in Θ for a QSAR task can be very large. For example, the training data set can have 8000 descriptors, and the DNN can have three hidden layers.
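To make the size of the parameter set concrete, the sketch below counts the weights and biases of a fully connected net for the example in the text: 8000 descriptors and three hidden layers of 2000 neurons each. The single output neuron is an assumption made for illustration, and the function name is hypothetical.

```python
def count_parameters(n_inputs, hidden_sizes, n_outputs=1):
    """Count weights and biases in a fully connected feed-forward net."""
    sizes = [n_inputs] + list(hidden_sizes) + [n_outputs]
    # One weight per connection between consecutive layers.
    n_weights = sum(n_in * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))
    # One bias per neuron in every layer except the input layer.
    n_biases = sum(sizes[1:])
    return n_weights, n_biases


n_weights, n_biases = count_parameters(8000, [2000, 2000, 2000])
print(n_weights, n_biases, n_weights + n_biases)
```

Most of the parameters sit between the input layer and the first hidden layer (8000 × 2000 weights), so the descriptor count dominates the model size in this configuration.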

With each hidden layer having 2000 neurons, such a DNN will have over 24 million tunable values in Θ. Therefore, the DNN trained using the BP algorithm is prone to overfitting. Advancements in avoiding overfitting made over the past eight years played a critical role in the revival of neural networks. Among the several methods to avoid overfitting, the two most popular ones are (1) the generative unsupervised pretraining proposed by Hinton et al. [16] and (2) the procedure of “drop-out” proposed by Srivastava et al. [17] The first approach can mitigate overfitting because it acts as a data-dependent regularizer of Θ, i.e., constraining the values of Θ. Instead of using random values to initialize Θ in a DNN, it generates values of Θ by using an unsupervised learning procedure conducted on the input descriptors only, without considering the activities of the compounds they represent. The subsequent supervised BP training algorithm just fine-tunes Θ starting from the values produced from the unsupervised learning. The second approach introduces instability to the architecture of the DNN by randomly “dropping” some neurons in each mini-batch of training. It has been shown that the drop-out procedure is equivalent to adding a regularization process to the conventional neural network to minimize overfitting [18]. These two approaches can be used separately or jointly in a DNN training process.

Figure 2. Activation functions used in the hidden layers.

Figure 3. Activation function in the output layer.

As with a conventional neural network, a DNN can have multiple neurons in the output layer, with each output neuron corresponding to a different QSAR model. We will call this type of DNN joint DNNs, which was called multitask DNNs in the work of Dahl et al. [10] Joint DNNs can simultaneously model multiple QSAR tasks, and all QSAR models embedded in a joint DNN share the same weights and bias in the hidden layers but have their own unique weights and bias in the output layer. Generally speaking, the hidden layers function as a complex feature/descriptor optimization process, while the output layer acts as a classifier. That is, all involved QSAR activities share the same feature-extraction process but have their own prediction based on the weights and bias associated with the corresponding output neuron. As we will see, joint DNNs are especially useful for those QSAR tasks with a smaller training set. The training set of a joint DNN is formed by merging training sets of all involved QSAR tasks. DNNs generally benefit from a large training set and can potentially borrow learned molecule structure knowledge across QSAR tasks by extracting better QSAR features via the shared hidden layers. Therefore, a DNN user can choose either to train a DNN from a single training set or to train a joint DNN from multiple training sets simultaneously. Many models presented in this paper were trained as joint DNNs with all 15 data sets. Since jointly training multiple QSAR data sets in a single model is not a standard approach for most non-neural-net QSAR methods, we need to show the difference in performance between joint DNNs and individual DNNs trained with a single data set.

In order to improve the numeric stability, the input data in a QSAR data set is sometimes preprocessed. For example, the activities in the training set are usually normalized to zero mean and unit variance. Also, the descriptors, x, can undergo some transformations, such as logarithmic transformation (i.e., y = log(x + 1)) or binary transformation (i.e., y = 1 if x > 0, otherwise y = 0). Both transformations were specially designed for substructure descriptors, which are used in this study, where the possible values are integers 0, 1, 2, 3, .... For other descriptor types, one would have to adjust the mathematical form of both transformations to achieve the same goal.

For a typical QSAR task, training a DNN is quite computationally intensive due to the large number of molecules in the training set, and the large number of neurons needed for the task. Fortunately, the computation involved in training a DNN is primarily large matrix operations. An increasingly popular computing technique, called GPU (graphical processing unit) computing, can be very efficient for such large matrix operations.
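The preprocessing options described in this section (the two descriptor transformations and the normalization of activities) can be sketched as follows. The natural logarithm and the population variance (dividing by n) are assumptions, since the text specifies neither the log base nor the variance estimator; the function names are hypothetical.

```python
import math


def log_transform(x):
    """Logarithmic transformation y = log(x + 1); natural log assumed."""
    return math.log(x + 1)


def binary_transform(x):
    """Binary transformation: y = 1 if x > 0, otherwise y = 0."""
    return 1 if x > 0 else 0


def standardize(values):
    """Scale activities to zero mean and unit variance (population variance assumed)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


# Substructure descriptors are counts: 0, 1, 2, 3, ...
counts = [0, 1, 3, 0, 7]
print([round(log_transform(c), 3) for c in counts])
print([binary_transform(c) for c in counts])

activities = [4.5, 5.2, 6.8, 7.1]
scaled = standardize(activities)
```

Both descriptor transformations leave zero counts at zero, so the sparsity of the fingerprint descriptors is preserved.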

GPU computing can dramatically reduce the time needed to train a DNN.

To summarize, the adjustable algorithmic parameters (also called metaparameters or hyperparameters in the machine learning literature) of a DNN are as follows:

Related to the data:
• Options for descriptor transformation: (1) no transformation, (2) logarithmic transformation, i.e., y = log(x + 1), or (3) binary transformation, i.e., y = 1 if x > 0, otherwise y = 0.

Related to the network architecture:
• Number of hidden layers
• Number of neurons in each hidden layer
• Choices of activation functions of the hidden layers: (1) sigmoid function and (2) rectified linear unit (ReLU)

Related to the DNN training strategy:
• Training a DNN from a single training set or a joint DNN from multiple training sets
• Percentage of neurons to drop out in each layer
• Using the unsupervised pretraining to initialize the parameter Θ or not

Related to the mini-batched stochastic gradient descent procedure in the BP algorithm:
• Number of molecules in each mini-batch, i.e., the mini-batch size
• Number of epochs, i.e., how many times the training set is used
• Parameters to control the gradient descent optimization procedure, including (1) learning rate, (2) momentum strength, and (3) weight cost strength [10]

One of the goals of this paper is to acquire insights into how adjusting these parameters can alter the predictive capability of DNNs for QSAR tasks. Also, we would like to find out whether it is possible for DNNs to produce consistently good results for a diverse set of QSAR tasks using one set of values for the adjustable parameters, which is subsequently called an algorithmic parameter setting.

Metrics. In this paper, the metric to evaluate prediction performance is R², which is the squared Pearson correlation coefficient between predicted and observed activities in the test set. The same metric was used in the Kaggle competition. R² measures the degree of concordance between the predictions and corresponding observations. This value is especially relevant when the whole range of activities is included in the test set. R² is an attractive measurement for model comparison across many data sets, because it is unitless and ranges from 0 to 1 for all data sets. We found in our examples that other popular metrics, such as normalized root mean squared error (RMSE), i.e., RMSE divided by the standard deviation of observed activity, are inversely related to R², so the conclusions would not change if we used the other metrics.

Workflow. One key question that this paper tries to answer is whether we can find a set of values for the algorithmic parameters of DNNs so that DNNs can consistently make more accurate predictions than RF does for a diverse set of QSAR data sets.

Due to the large number of adjustable parameters, it is prohibitively time-consuming to evaluate all combinations of possible values. The approach we took was to carefully select a reasonable number of parameter settings by adjusting the values of one or two parameters at a time, and then calculate the R² values of DNNs trained with the selected parameter settings. For each data set, we ultimately trained and evaluated at least 71 DNNs with different parameter settings. These results provided us with insights into sensitivities of many adjustable parameters, allowed us to focus on a smaller number of parameters, and finally allowed us to generate a set of recommended values for all algorithmic parameters, which can lead to consistently good DNNs across the 15 diverse QSAR data sets.

The DNN algorithms were implemented in Python and were derived from the code that George Dahl’s team developed to win the Merck Kaggle competition. The Python modules gnumpy [19] and cudamat [20] are used to implement GPU computing. The hardware platform used in this study is a Windows 7 workstation, equipped with dual 6-core Xeon CPUs, 16 GB RAM, and two NVIDIA Tesla C2070 GPU cards.

Figure 4. Overall DNN vs RF using arbitrarily selected parameter values. Each column represents a QSAR data set, and each circle represents the improvement, measured in R², of a DNN over RF. The horizontal dashed red line indicates 0, where DNNs have the same performance as RF. A positive value means that the corresponding DNN outperforms RF. The horizontal dotted green line indicates the overall improvement of DNNs over RF measured in mean R². The data sets in which DNNs dominate RF for all arbitrarily selected parameter settings are colored blue; the data set in which RF dominates DNNs for all parameter settings is colored black; the other data sets are colored gray.

RESULTS

DNNs Trained with Arbitrarily Selected Parameters. First, we want to find out how well DNNs can perform relative
to RF. Therefore, over 50 DNNs were trained using different parameter settings. These parameter settings were arbitrarily selected, but they attempted to cover a sufficient range of values of each adjustable parameter. (A full list of the parameter settings is available as Supporting Information.) More specifically, our choices for each parameter are listed as follows:

• Each of three options of data preprocessing (i.e., (1) no transformation, (2) logarithmic transformation, and (3) binary transformation) was selected.
• The number of hidden layers ranged from 1 to 4.
• The number of neurons in each hidden layer ranged from 100 to 4500.
• Each of the two activation functions (i.e., (1) sigmoid and (2) ReLU) was selected.
• DNNs were trained both (1) separately from an individual QSAR data set and (2) jointly from a data set combining all 15 data sets.
• The input layer had either no dropouts or 10% dropouts. The hidden layers had 25% dropouts.
• The network parameters were initialized as random values, and no unsupervised pretraining was used.
• The size of mini-batch was chosen as either 128 or 300.
• The number of epochs ranged from 25 to 350.
• The parameters for the optimization procedure were fixed as their default values. That is, learning rate is 0.05, momentum strength is 0.9, and weight cost strength is 0.0001.

Figure 4 shows the difference in R² between DNNs and RF for each data set. Each column represents a QSAR data set, and each circle represents the improvement, measured in R², of a DNN over RF. A positive value means that the corresponding DNN outperforms RF. A boxplot with whiskers is also shown for each data set. Figure 4 demonstrates that, with rather arbitrarily selected parameter settings, DNNs on average outperform RF in 11 out of the 15 Kaggle data sets. Moreover, in five data sets, DNNs do better than RF for all parameter settings. Only in one data set (TDI), the RF is

Table 2. Comparing Test R² of Different Models
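The R² values compared throughout these results are squared Pearson correlation coefficients between predicted and observed test-set activities. A minimal sketch of that metric is shown below; the function names and the toy numbers are invented for illustration and are not taken from the data sets or code of this work.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def r_squared(predicted, observed):
    """R2 as used here: the squared Pearson correlation, always between 0 and 1."""
    return pearson_r(predicted, observed) ** 2


observed = [4.5, 5.0, 6.2, 7.1, 5.8]
predicted_a = [4.9, 5.1, 5.9, 6.6, 5.5]   # hypothetical model A
predicted_b = [5.6, 4.8, 5.7, 6.0, 6.1]   # hypothetical model B

improvement = r_squared(predicted_a, observed) - r_squared(predicted_b, observed)
print(round(improvement, 3))
```

Because this metric is a squared correlation, it is unchanged by any linear rescaling or shift of the predictions, so it measures how well the ranking and relative spacing of activities are reproduced rather than absolute error.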

Quantitative structure activity relationships (QSAR) is a very commonly used technique in the pharmaceutical industry for predicting on-target and oﬀ-target activities. Such predictions help prioritize the experiments during the drug discovery process and, it is hoped, will substantially reduce the experimental work that needs to be done.

Related Documents:

Invited: Co-Design of Deep Neural Nets and Neural Net ...
