Black Box FDR

Wesley Tansey 1 2, Yixin Wang 1 3, David M. Blei 1 3 4, Raul Rabadan 2

1 Data Science Institute, Columbia University, New York, NY, USA; 2 Department of Systems Biology, Columbia University Medical Center, New York, NY, USA; 3 Department of Statistics, Columbia University, New York, NY, USA; 4 Department of Computer Science, Columbia University, New York, NY, USA. Correspondence to: Wesley Tansey <wt2274@columbia.edu>.

Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018. Copyright 2018 by the author(s).

Abstract

Analyzing large-scale, multi-experiment studies requires scientists to test each experimental outcome for statistical significance and then assess the results as a whole. We present Black Box FDR (BB-FDR), an empirical-Bayes method for analyzing multi-experiment studies when many covariates are gathered per experiment. BB-FDR learns a series of black box predictive models to boost power and control the false discovery rate (FDR) at two stages of study analysis. In Stage 1, it uses a deep neural network prior to report which experiments yielded significant outcomes. In Stage 2, a separate black box model of each covariate is used to select features that have significant predictive power across all experiments. In benchmarks, BB-FDR outperforms competing state-of-the-art methods in both stages of analysis. We apply BB-FDR to two real studies on cancer drug efficacy. For both studies, BB-FDR increases the proportion of significant outcomes discovered and selects variables that reveal key genomic drivers of drug sensitivity and resistance in cancer.

1. Introduction

High-throughput screening (HTS) techniques have fundamentally changed the landscape of modern biological experimentation. Rather than conducting just one experiment at a time, HTS enables scientists to perform hundreds of parallel experiments, each with different biological samples and different interventions. At the same time, HTS also enables scientists to gather rich contextual information about each experiment by profiling the samples under study using techniques like DNA sequencing. Thus, each HTS study produces a dataset of many experiments, where each experiment contains both an outcome variable and a high-dimensional feature set describing the context.

Figure 1 shows a slice of the Genomics of Drug Sensitivity in Cancer (GDSC) dataset (Yang et al., 2012), an HTS study investigating how cancer cell lines respond to different cancer therapeutics. The left panel shows the relative response of 30 different cancer cell lines (C1, C2, ..., C30) treated with the drug Nutlin-3. For each cell line, the treatment response (black triangles) is overlaid on top of the untreated control replicate distribution (gray box plots). Even when no drug is applied, each cell line still exhibits natural variation. The first goal in analyzing these data is therefore to address the question of whether a given cell line responded to the treatment. Concretely, we need to perform a hypothesis test for each cell line, where the null hypothesis is that the drug had no effect. Absent other information, this would be a classic multiple hypothesis testing (MHT) problem.

But HTS studies such as GDSC differ from the classical setup by also producing a rich set of side information for each experiment. The right panel of Figure 1 shows a subset of the genomic profile for each cell line, with a black dot indicating the cell line has a mutation in that gene. Biologically, a mutated gene can lead to different phenotypic behavior that may cause sensitivity or resistance to a drug. Statistically, this means the likelihood of a cell line responding to treatment is a latent function of that cell line's genomic profile. Identifying which mutations are associated with treatment response could guide future experiments and the development of new targeted therapies.
Deriving scientific insight from patterns across experiments represents a second phase of hypothesis testing, where the null hypothesis is that a given gene is not associated with drug response. We term these two phases Stage 1 and Stage 2 and ask two scientifically-motivated statistical inference questions:

- Stage 1: How do we leverage the available side information in an HTS study to increase how many significant outcomes we can detect?

- Stage 2: Can we discover which variables are associated with significant outcomes, even when the underlying function is high-dimensional and nonlinear?

Figure 1. Left: a subset of 30 cell line experiments from the Nutlin-3 case study in Section 5. Control replicates (gray box plots) and cell line responses (black triangles) are measured as z-scores relative to mean control values. Right: a subset of the corresponding genomic features for each experiment; black dots indicate a cell line has a recurrent mutation in the gene labeled on the x-axis. The goal in Stage 1 analysis is to select cell lines that showed a significant response (circled in blue). In Stage 2, the genomic features are analyzed to understand the mutations driving drug response (circled in orange).

We answer both of these questions and propose Black Box FDR (BB-FDR), a method for analyzing multi-experiment studies with many covariates gathered per experiment. BB-FDR uses the covariates to build a deep probabilistic model that predicts how likely a given experiment is to generate significant outcomes a priori. It uses this prior model to adaptively select significant outcomes in a manner that controls the overall false discovery rate (FDR) at a specified Stage 1 level. BB-FDR then builds a predictive model of each covariate to perform variable selection on the Stage 1 model while conserving a specified Stage 2 FDR threshold.

We validate BB-FDR on both synthetic and real data. BB-FDR outperforms other state-of-the-art Stage 1 methods in a series of benchmarks, including the recently-proposed NeuralFDR (Xia et al., 2017). BB-FDR is also a more pragmatic choice compared to a fully-Bayesian approach: it scales trivially to thousands of covariates, can learn arbitrarily complex functions, and runs easily on a laptop.

We apply BB-FDR to a real-world case study of two cancer drug screenings. BB-FDR finds more significant discoveries on the data and recovers an informative set of biologically plausible genes that may convey drug sensitivity and resistance in cancer.

2. Multiple testing and FDR control

In the classical MHT setup, z = (z_1, ..., z_n) is a set of independent observations of the outcomes of n experiments. For each experiment, a treatment is applied to a target and the treatment has either no effect (h_i = 0) or some effect (h_i = 1). If the treatment has no effect, the distribution of the test statistic is the null distribution f_0(z); otherwise it follows an unknown alternative distribution f_1(z). The null hypothesis for every experiment is that the test statistic was drawn from the null distribution: H_0: h_i = 0.

2.1. False discovery rate control

In most experiments of interest, it is impossible to determine h_i with no error. For a given prediction ĥ_i, we say it is a true positive or a true discovery if ĥ_i = 1 = h_i, and a false positive or false discovery if ĥ_i = 1 ≠ h_i. Let S = {i : h_i = 1} be the set of observations for which the treatment had an effect and Ŝ = {i : ĥ_i = 1} be the set of predicted discoveries. We seek procedures that maximize the true positive rate (TPR), also known as power, while controlling the false discovery rate: the expected proportion of the predicted discoveries that are actually false positives,

FDR := E[FDP],   FDP = #{i : i ∈ Ŝ \ S} / #{i : i ∈ Ŝ}.   (1)

FDP in (1) is the false discovery proportion: the actual proportion of false positives in the predicted discovery set for a specific experiment. While ideally we would like to control the FDP, the randomness of the outcome variables makes this impossible in practice. Thus FDR is the typical error measure targeted in modern scientific analyses.
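As a point of reference for this covariate-free setting, the classic Benjamini-Hochberg (BH) step-up procedure targets exactly the FDR criterion in (1). A minimal sketch, where the p-values at the bottom are hypothetical inputs:

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.10):
    """BH step-up: reject the k smallest p-values, where k is the
    largest index with p_(k) <= alpha * k / n."""
    p = np.asarray(pvals, dtype=float)
    n = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, n + 1) / n
    below = np.nonzero(p[order] <= thresholds)[0]
    reject = np.zeros(n, dtype=bool)
    if below.size > 0:
        reject[order[: below.max() + 1]] = True
    return reject

# Hypothetical p-values: three strong signals among ten tests.
pvals = [0.001, 0.004, 0.009, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
print(benjamini_hochberg(pvals).sum())  # 3 rejections at alpha = 0.10
```

Note that BH consumes only the p-values themselves; the methods discussed next use side information to go beyond this baseline.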

2.2. Related work

Controlling FDR in multiple hypothesis testing has a long history in statistics and machine learning. The Benjamini-Hochberg (BH) procedure (Benjamini & Hochberg, 1995) is the classic technique and still the most widely used in science. Many other methods have since been developed to take advantage of study-specific information to increase power. Recent examples include accumulation tests for ordering information (Li & Barber, 2017), the p-filter for grouping and test statistic dependency (Ramdas et al., 2017), FDR smoothing for spatial testing (Tansey et al., 2017), FDR regression for low-dimensional covariates (Scott et al., 2015), and, most recently, NeuralFDR for high-dimensional covariates (Xia et al., 2017). We consider high-dimensional covariates and compare against NeuralFDR in Section 4.

3. Black Box FDR

Consider a study with n independent experiments that produces a set of independent test statistics z = (z_1, ..., z_n) corresponding to the outcome measurements, as in Section 2. However, now each experiment also has a vector of m covariates X_i. = (X_i1, ..., X_im) containing side information that may influence the outcome of that experiment. Specifically, whether the experiment comes from the null distribution (h_i = 0) or the alternative (h_i = 1) is allowed to depend arbitrarily on X_i.

BB-FDR extends the empirical-Bayes two-groups model of Efron (2008) by building a hierarchical probabilistic model with experiment-specific priors modeled through a deep neural network. We first estimate the alternative distribution f_1 offline using predictive recursion (Newton, 2002). This follows other recent extensions to the two-groups model (Scott et al., 2015; Tansey et al., 2017) and enjoys strong empirical performance and consistency guarantees (Tokdar et al., 2009). BB-FDR then focuses on modeling the experiment-specific prior, assuming the null and alternative distributions are fixed.

3.1. Stage 1: determining significant outcomes

We model the test statistic as arising from a mixture model of two components, the null (f_0) and the alternative (f_1). An experiment-specific weight c_i then models the prior probability of the test statistic coming from the alternative (i.e. the probability of the treatment having an effect a priori). We place a beta prior on each experiment-specific prior c_i and model the parameters of the hyperprior with a black box function G parameterized by θ; in our implementation, G is a deep neural network. The complete model for BB-FDR is:

z_i ~ h_i f_1(z_i) + (1 - h_i) f_0(z_i)
h_i ~ Bernoulli(c_i)                         (2)
c_i ~ Beta(a_i, b_i)
a_i, b_i = G_θ,i(X).

We optimize θ by integrating out h_i and maximizing the complete data log-likelihood,

p_θ(z_i) = ∫_0^1 (c_i f_1(z_i) + (1 - c_i) f_0(z_i)) Beta(c_i | G_θ,i(X)) dc_i.   (3)

Figure 2 shows the BB-FDR graphical model.

Figure 2. The graphical model for BB-FDR.

The beta prior is a departure from other two-groups extensions, which use a flatter hierarchy and learn a predictive model for c_i directly (Scott et al., 2015; Tansey et al., 2017). We found the flat approach difficult to train, leading to a degenerate G that always predicts the global mean prior. A hierarchical prior improves training for two reasons. First, optimization is easier and more stable because the output of the function is two soft-plus activations; compared to a sigmoid, this form leads to less saturated gradients. Second, the additional hierarchy allows the model to assign different degrees of confidence to each experiment, changing the model from homoskedastic to heteroskedastic.
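As an illustration, the hyperprior network G can be realized with two soft-plus output heads. The sketch below is a toy numpy stand-in; the layer sizes and random weights are illustrative, not the paper's trained architecture, and the +1 shift is the concavity adjustment described in the text:

```python
import numpy as np

def softplus(x):
    # Numerically stable soft-plus: log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def prior_params(X, W1, W2):
    """Toy two-layer network G mapping covariates X (n, m) to
    beta-prior parameters (a, b), one pair per experiment.
    W1, W2 are illustrative stand-ins for trained weights."""
    hidden = np.maximum(0.0, X @ W1)      # ReLU hidden layer
    a_raw, b_raw = (hidden @ W2).T        # two output heads
    # Soft-plus keeps a, b positive; adding 1 keeps Beta(a, b) concave.
    return softplus(a_raw) + 1.0, softplus(b_raw) + 1.0

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))           # 5 experiments, 8 covariates
W1 = rng.standard_normal((8, 16))
W2 = rng.standard_normal((16, 2))
a, b = prior_params(X, W1, W2)
print(np.all(a > 1.0), np.all(b > 1.0))   # True True
```

Soft-plus grows linearly for large inputs, which is the less-saturating behavior the text contrasts with a sigmoid.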
Finally, we found it important to enforce concavity of the beta distribution; we thus add 1 to both a_i and b_i.

We fit the model in (2) with stochastic gradient descent (SGD) on an L2-regularized loss function,

minimize_θ R(θ) = -Σ_i log p_θ(z_i) + λ ||G_θ(X)||_F^2,   (4)

where ||·||_F is the Frobenius norm. In pilot studies, we found adding a small amount of L2 regularization prevented over-fitting at virtually no cost to statistical power. For computational purposes, we approximate the integral in (3) by a fine-grained numerical grid.

Once the optimized parameters θ̂ are chosen, we calculate the posterior probability of each test statistic coming from the alternative,

ŵ_i = p_θ̂(h_i = 1 | z_i) = ∫_0^1 [ c_i f_1(z_i) / (c_i f_1(z_i) + (1 - c_i) f_0(z_i)) ] Beta(c_i | G_θ̂,i(X)) dc_i.   (5)

Assuming the posteriors are accurate, rejecting the ith hypothesis will produce 1 - ŵ_i false positives in expectation. Therefore we can maximize the total number of discoveries with a simple step-down procedure. First, we sort the posteriors in descending order by the likelihood of the test statistics being drawn from the alternative. We then reject the first q hypotheses, where 0 ≤ q ≤ n is the largest possible index such that the expected proportion of false discoveries is below the FDR threshold. Formally, this procedure solves the optimization problem

maximize_q  q   subject to   (1/q) Σ_{i=1}^q (1 - ŵ_(i)) ≤ α,   (6)

for a given FDR threshold α, where ŵ_(i) denotes the ith largest posterior; by convention, 0/0 = 0.

The model in (2)-(6) handles Stage 1 of the analysis. The black box model G uses the entire feature vector X_i. of every experiment to predict the prior parameters over c_i. The observations z_i are then used to calculate the posterior probability ŵ_i that the treatment had an effect. The selection procedure in (6) uses these posteriors to reject a maximum number of null hypotheses while conserving the FDR.

3.2. Stage 2: identifying important variables

Using a flexible black box model for G in (2) involves a trade-off. On one hand, it enables BB-FDR to learn a rich class of functions for the relationship between the covariates and the test statistic; as we show in Section 4, this increases power in Stage 1 compared to a standard linear model. On the other hand, variable selection (Stage 2) is straightforward in linear models, whereas black box models are by definition opaque. Understanding which variables deep learning models use to make predictions is an ongoing area of research in both machine learning (e.g. Elenberg et al., 2017) and specific scientific disciplines (e.g. Olden & Jackson, 2002; Giam & Olden, 2015, in ecology).
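Returning briefly to Stage 1: once the posteriors ŵ_i are in hand, the step-down rule in (6) reduces to sorting and a cumulative mean. A sketch with hypothetical posterior values:

```python
import numpy as np

def select_discoveries(w_hat, alpha=0.10):
    """Step-down selection: sort posteriors descending and take the
    largest prefix whose expected false discovery proportion,
    mean(1 - w_hat) over the prefix, stays at or below alpha."""
    w = np.asarray(w_hat, dtype=float)
    order = np.argsort(-w)
    expected_fdp = np.cumsum(1.0 - w[order]) / np.arange(1, len(w) + 1)
    # expected_fdp is nondecreasing, so counting entries below alpha
    # gives the largest admissible prefix length q.
    q = int(np.sum(expected_fdp <= alpha))
    return order[:q]

# Hypothetical posteriors for six experiments.
w_hat = [0.99, 0.97, 0.95, 0.80, 0.50, 0.10]
print(select_discoveries(w_hat))  # selects the four most confident: [0 1 2 3]
```

The fourth experiment is admitted even though ŵ = 0.80, because the running average of (1 - ŵ) over the top four is still below the 10% threshold.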
As far as we are aware, there are currently no methods that provide variable selection with FDR control when the covariates may have an arbitrary dependency structure.

To overcome this challenge, BB-FDR uses conditional randomization tests (CRTs) (Candes et al., 2018). The idea of a CRT is to model each coordinate of the feature matrix, X_.j, using only the other coordinates X_.-j. The conditional distribution P(X_.j | X_.-j) then represents a valid null distribution for testing the hypothesis X_.j ⊥ Z | X_.-j, where Z is the test statistic. The corresponding p-value can be calculated by sampling from the conditional to approximate the true p-value,

p_j = E_{X̃_.j ~ P(X_.j | X_.-j)} [ I( t(z, X) ≤ t(z, (X̃_.j, X_.-j)) ) ],

where t is the test statistic of interest. Once the p-values have been estimated for all features, we can apply standard BH correction and report significant features.

BB-FDR tests which features are associated with a change in the posterior probability of z_i coming from the alternative. It uses the negative entropy of the posteriors as the test statistic,

t(z, X) = Σ_i ŵ_i log ŵ_i + Σ_i (1 - ŵ_i) log(1 - ŵ_i).   (7)

Intuitively, if a feature is useful in predicting treatment efficacy, it should reduce the overall entropy of the posterior. By definition, a feature sampled from the null adds no new information to the model; it cannot systematically reduce the entropy.

For this procedure to retain frequentist consistency guarantees, both the conditional null distribution and the model of the prior must be the true distributions. In practice, one never has access to these, so we estimate both; for the conditional null, we use gradient boosting trees (Chen & Guestrin, 2016).

4. Benchmarks

We perform a series of benchmark studies to assess the performance of BB-FDR in both stages of inference. For each benchmark, we compare the power of BB-FDR to other state-of-the-art approaches.
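Before turning to the benchmark details, the Stage 2 procedure of Section 3.2 can be made concrete. The sketch below computes the entropy statistic (7) and a CRT-style p-value against resampled copies of a feature; the posteriors and the "refit" null posteriors are hypothetical stand-ins (a real run would refit the prior model on each resampled feature):

```python
import numpy as np

def neg_entropy(w):
    # Test statistic (7): negative entropy of the posteriors.
    w = np.clip(w, 1e-12, 1 - 1e-12)
    return np.sum(w * np.log(w) + (1 - w) * np.log(1 - w))

def crt_pvalue(t_obs, t_null_draws):
    """CRT p-value: fraction of null resamples whose statistic is at
    least as large as the observed one (with add-one correction)."""
    t_null = np.asarray(t_null_draws)
    return (1 + np.sum(t_null >= t_obs)) / (1 + len(t_null))

# Hypothetical example: an informative feature concentrates the
# posteriors (low entropy); null resamples leave them diffuse.
rng = np.random.default_rng(1)
w_observed = np.array([0.99, 0.98, 0.02, 0.01])   # confident posteriors
t_obs = neg_entropy(w_observed)
# Stand-in for posteriors refit after resampling the feature from
# its estimated conditional null: diffuse values near 0.5.
t_null = [neg_entropy(rng.uniform(0.3, 0.7, size=4)) for _ in range(99)]
p = crt_pvalue(t_obs, t_null)
print(p < 0.05)  # True: the feature is flagged as informative
```

Because none of the diffuse null resamples can match the concentration of the observed posteriors, the p-value comes out at its minimum of 1/100 here.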
In all studies, we consider binary covariates and real-valued z-scores as test statistics. Across experiments, we found BB-FDR is particularly suitable for large samples: it outperforms competing methods in both stages while being more computationally efficient.

4.1. Setup

We consider three different ground truth models for P(X), the joint distribution over the covariates, and P(h = 1 | X), the prior probability of coming from the alternative distribution given the covariates:

- Constant: All covariates are sampled IID normal; the prior is independent of the covariates, with P(h_i = 1 | X) = 0.5.

- Linear: Covariates are sampled from a multivariate normal with full covariance matrix (i.e. conditionally linear); the prior is a linear function with IID standard normal coefficients for each covariate.

- Nonlinear: Covariates and prior coefficients are generated similarly. We first draw 20 IID Bernoulli(0.5) latent variables. For each covariate, 5 pairs of latent variables (u_i, u_j) are chosen and, with equal probability, are either ANDed or XORed together and multiplied by a draw from a standard normal; the latent weights are summed to get the final logit value for the covariate or coefficient.

For each of the three ground truth models, we consider two different alternative distributions:

- Well-Separated (WS): a 3-component Gaussian mixture model, f_1(z) = 0.48 N(-2, 1) + 0.04 N(0, 16) + 0.48 N(2, 1).

- Poorly-Separated (PS): a single normal with high overlap with the null, f_1(z) = N(0, 9).

Figure 3 shows the densities used in our benchmarks.

Figure 3. The two alternative densities used in our benchmarks. The well-separated (WS) density has little overlap with the null, making for a stronger signal.

For each of the 6 combinations of the above scenarios, we run 100 independent trials. Each trial uses 50 covariates; for all trials with a non-constant prior, 25 of the variables are used in the true data-generating distribution and the other 25 are null variables with no association with the outcome. To measure sample efficiency, we also vary the sample size from n = 100 to n = 50,000. The target FDR threshold is set to 10% for both stages of inference.

We compare BB-FDR to the classic Benjamini-Hochberg (BH) method (Benjamini & Hochberg, 1995), the recently-proposed NeuralFDR (Xia et al., 2017), and a fully-Bayesian logistic regression model for c_i in place of the black box prior in (2). For NeuralFDR, we use the default recommended settings, including five random restarts and a ten-layer deep neural network. The fully-Bayesian method uses a standard normal prior on the weights and an inverse-Wishart prior on the variance, with weak hyperpriors.
In the nonlinear scenario, we specify all possible pairwise interactions as the covariate set for the fully-Bayesian model to ensure it is well-specified. We fit the model using Polya-gamma sampling (Polson et al., 2013) with 5000 burn-in iterations and 1000 samples. For BB-FDR, we use a 50-200-200-2 network with ReLU activations; for training we use RMSprop (Tieleman & Hinton, 2012) with dropout, a learning rate of 3 × 10^-4, and a batch size of 100, and run for 50 epochs, with 3 folds to create 3 separate models as in NeuralFDR; we set the λ regularization term to 10^-4.

4.2. Stage 1 performance

Figure 4 shows the results for the Stage 1 benchmarks, where the goal is to determine for which experiments the treatment had an effect. The four methods generally conserve FDR at the specified 10% threshold, though NeuralFDR seems to systematically violate FDR in the low-sample regime.

Across all experiments, we see that both BH and NeuralFDR under-perform the two Bayesian methods. In the case of BH, this is straightforward: it uses only the p-value from each experiment and has no notion of side information. NeuralFDR, on the other hand, uses a deep neural network and several random restarts. There are a few possible reasons for its poor performance. First, the NeuralFDR method was reported to be very difficult to train by the original authors, so it is possible that it is simply not finding good fits of the model. Second, BB-FDR assumes that the alternative distribution is conditionally independent of the prior; NeuralFDR makes no such assumption and may lose some power as a result. Finally, NeuralFDR was originally tested on 1- and 2-dimensional problems against relatively weak baselines. Our benchmarks examine its performance in a higher-dimensional setting and with several uninformative features that may make fitting NeuralFDR difficult.

Since the fully-Bayesian method is well-specified in every benchmark, it serves as an oracle model to establish a reasonable upper bound on Stage 1 performance.
However, the oracle power depends on the MCMC approximation of the posterior being well-mixed. As the sample size grows, the empirical-Bayes model used by BB-FDR gains an increasingly precise approximation to the true posterior. In the large-sample regime with a well-separated alternative, BB-FDR outperforms even the oracle. Furthermore, the fully-Bayesian method takes several hours to fit in the nonlinear scenarios; BB-FDR fits within a few minutes and can easily be run on a laptop.

Figure 4. Hypothesis testing results on the synthetic datasets, averaged over 100 trials at varying sample sizes on the two different alternative distributions. Panels: (a) Constant (PS), (b) Constant (WS), (c) Linear (PS), (d) Linear (WS), (e) Nonlinear (PS), (f) Nonlinear (WS). Solid lines show power; dashed lines show estimated FDR; the red horizontal line denotes the specified 10% FDR threshold. In general, the Benjamini-Hochberg and NeuralFDR methods have lower power since they do not model the alternative. The fully-Bayesian method has high power in the low-to-moderate sample regime, but as the sample size grows the empirical-Bayes approach of BB-FDR becomes more effective.

Figure 5. Variable selection results at a 10% FDR threshold for the fully-Bayesian model (Full-Bayes) and BB-FDR on the (a) Linear (PS) and (b) Nonlinear (PS) scenarios. In low-sample regimes, the conditional null distribution used in the CRT procedure is poorly fit and results in violated FDR thresholds. At moderate-to-large samples, BB-FDR has higher power than the fully-Bayesian model and conserves FDR.

4.3. Stage 2 performance

Neither BH nor NeuralFDR provides support for detecting important features (Stage 2), so we could not compare against them. For the Bayesian linear regression, we take the 90% posterior credible interval over the coefficient value for each covariate. If the interval does not contain zero, we reject the null hypothesis and report it as a discovered feature; this approach is standard in the Bayesian literature (Gelman et al., 2014).

Figure 5 presents the results of the variable selection benchmarks for the poorly-separated alternative distribution; results for the well-separated alternative are similar. We omit the constant scenario, since there are no features to discover. In the small-sample regime, the conditional distributions are poor estimators of the conditional null distribution for each feature. This leads to BB-FDR overestimating the number of signal features and violating the FDR threshold. As the sample size grows, the conditional null and black box prior become more accurate, leading to FDR control and higher power, respectively; in the large-sample regime, BB-FDR outperforms the fully-Bayesian approach.

We conclude by noting that BB-FDR is competitive with the fully-Bayesian approach even when the latter is well-specified. In practical data analysis scenarios, such as the cancer study we discuss next, we do not know the true prior function. It may easily contain many higher-order interaction terms that are prohibitive to consider explicitly in a fully-Bayesian model, making BB-FDR a pragmatic choice for real-world scientific datasets.

5. Cancer drug screening

As a case study of how BB-FDR is useful in practice, we apply it to two high-throughput cancer drug screening studies (Lapatinib and Nutlin-3) from the Genomics of Drug Sensitivity in Cancer (GDSC) (Yang et al., 2012). For both drug studies, BB-FDR increases the number of Stage 1 discoveries over classical BH correction; results for NeuralFDR were similar to BH and are omitted. In Stage 2, BB-FDR discovers biologically-plausible genes that may have a causal link to drug sensitivity and resistance. Experimental details are available in the supplement.

5.1. Analysis overview

Analysis of the two drug studies broadly follows the two stages outlined in the motivating example in Section 1. The Stage 1 task is to determine, for a specific drug being tested on a specific cell line, whether the drug had any effect. As with any biological process, natural variation injects randomness at many levels of the experiment: how fast the cells grow, how each cell responds, etc. Thus Stage 1 requires performing statistical hypothesis testing to determine if the cell population after treatment is truly smaller than would be expected from a control (null) population.

The inferential goal in Stage 2 is to gain scientific insight about which genes may be driving drug response. This involves building a statistical model of the relationship between the genomic profile of a cell line and its corresponding treatment response, then performing variable selection on the model. The selected genes form the basis for potential mechanisms of action, and future experiments can be designed to test for a causal link or to investigate new drugs that better target the proteins encoded by the genes.

5.2. Results

Figure 6 shows the aggregate number of treatment effects discovered by both BH and BB-FDR. For both drugs, BB-FDR provides approximately a 50% increase in Stage 1 discoveries compared to BH. The genomic profiles of the cell lines provide enough prior information that even some outcomes with a z-score above zero are still found to be significant. This flexibility is impossible with classical Stage 1 testing methods like BH that do not consider covariate information.

Figure 6. Discoveries found by BB-FDR on the two drugs, compared to the discoveries found by a naive BH approach: (a) BH on Lapatinib (117 discoveries), (b) BH on Nutlin-3 (151 discoveries), (c) BB-FDR on Lapatinib (181 discoveries), (d) BB-FDR on Nutlin-3 (222 discoveries). BB-FDR leverages the genomic profiling information of the cell lines to identify 50% more discoveries at the same 10% FDR threshold.

Table 1 lists the genes reported by BB-FDR in Stage 2. Interpreting the quality of the results requires familiarity with genomics and cancer biology. Below, we briefly detail the scientific rationale behind the biological plausibility of the Stage 2 results and refer the reader to Weinberg (2013) for a full review.

Table 1. Significant gene mutations identified by BB-FDR that are associated with sensitivity and resistance to each drug. Both lists align well with known genomic targets of Lapatinib and Nutlin-3.

Lapatinib: BRCA1, BRCA2, CDK4, FGFR2, KIT, MSH2
Nutlin-3: P300, FLCN, FLT3, MET, KIT, MSH6, SETD2, TP53, BCR-ABL

Lapatinib has been approved for the treatment of HER2-positive breast cancers. BB-FDR indicates that BRCA1 and BRCA2 are associated with responses to Lapatinib. Both are tumor suppressor genes that are seen mutated in more than 10% of breast cancers (BRCA stands for "breast cancer"), and thus cancer type may represent a latent confounder for drug efficacy that induces a conditional dependence. Lapatinib targets over-expression of the gene ERBB2, which can be caused by a mutant CDK4 gene. FGFR2 and KIT are also commonly associated with breast cancers (Slattery et al., 2013; Zhu et al., 2014), and BRCA1 is known to induce inactivation of the tumor suppressor MSH2 (Atalay et al., 2002). Given Lapatinib's success as a breast cancer drug, the connection between all of the selected genes and breast cancer is a reassuring sign that BB-FDR selected biologically plausible features.

Nutlin-3 is an inhibitor of the oncogene MDM2, which negatively regulates TP53. When highly over-expressed, MDM2 can functionally inactivate TP53. By targeting MDM2, Nutlin-3 enables a non-mutated ("wild type") TP53 to trigger apoptosis in cancer cells. However, if TP53 is mutated, Nutlin-3 will be ineffective, and hence its mutation state is an important driver of Nutlin-3 sensitivity. When wild type TP53 is present, MET controls the fate of the cell (Sullivan et al., 2012), SETD2 functionality is required to activate TP53 (Carvalho et al., 2014), P300 mediates TP53 acetylation (Reed & Quelle, 2014), and BCR-ABL is a gene fusion that induces loss of TP53 (Pierce et al., 2000). These genes interact in complex, non-linear ways, yet BB-FDR is still able to identify them as important. Finally, FLCN is a tumor suppressor gene that can delay the cell cycle like TP53 (Laviolette et al., 2013). The mechanism by which FLCN and TP53 are interrelated is currently unclear, representing a potential target for future experiments.

6. Conclusion

We presented Black Box FDR (BB-FDR), an empirical-Bayes method that increases statistical power in multi-experiment scientific studies when side information is available for each experiment. BB-FDR combines deep probabilistic
