Bayesian Time Varying Coefficient Model with Applications to Marketing Mix Modeling


Edwin Ng, Uber Technologies, Inc (edwinng@uber.com)
Zhishi Wang, Uber Technologies, Inc (zhishiw@uber.com)
Athena Dai, Uber Technologies, Inc (athena.dai@uber.com)

ABSTRACT

Both Bayesian and varying coefficient models are very useful tools in practice as they can be used to model parameter heterogeneity in a generalizable way. Motivated by the need to enhance Marketing Mix Modeling (MMM) at Uber, we propose a Bayesian Time Varying Coefficient (BTVC) model, equipped with a hierarchical Bayesian structure. This model differs from other time-varying coefficient models in that the coefficients are weighted over a set of local latent variables following certain probabilistic distributions. Stochastic Variational Inference (SVI) is used to approximate the posteriors of latent variables and dynamic coefficients. The proposed model also helps address many challenges faced by traditional MMM approaches. We used simulations as well as real-world marketing datasets to demonstrate our model's superior performance in terms of both accuracy and interpretability.

CCS CONCEPTS

• Theory of computation → Probabilistic computation; • Information systems → Computational advertising; • Mathematics of computing → Bayesian computation.

KEYWORDS

Marketing Mix Modeling, Time Varying Coefficient Model, Hierarchical Bayesian Model, Bayesian Time Series

ACM Reference Format:
Edwin Ng, Zhishi Wang, and Athena Dai. 2021. Bayesian Time Varying Coefficient Model with Applications to Marketing Mix Modeling. In Proceedings of The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 14–18, 2021, Singapore (KDD '21). ACM, New York, NY, USA, 6 pages.

1 INTRODUCTION

Marketing, as an essential growth driver, accounts for sizable investment levels at many companies. Given these large investments, it is not surprising that understanding the return on and optimizing the allocation of marketing investment is of foundational importance to marketing practitioners. For many decades, Marketing Mix Modeling (a.k.a. MMM) has been leveraged as one of the most important tools in marketers' arsenal to address such needs. Recent consumer privacy initiatives (e.g., Apple's announcement of no-IDFA in iOS 14) further underscore the strategic importance of future-proofing any marketing measurement game plan with MMM.

While randomized experiments [26] and causal models [14] are often used for causal inference, they can be either costly or simply infeasible [17] under some circumstances. As an alternative, MMM offers a solution by leveraging aggregated time-series data and regression to quantify the relationship between marketing and demand [19]. MMM can be further tailored and enhanced for different requirements and purposes, such as controlling for seasonality, trend, and other control factors [16] and introducing a geo-level hierarchy [24].
More importantly, the primary use case of MMM is often not to predict sales, but rather to quantify the marginal effects of the different marketing tactics.

There are various issues and challenges that have to be accounted for when building MMM [5]. First, advertising media evolve at a fast pace, which requires modelers to take new marketing levers into account constantly and ultimately forces them to face the "small n, large p" problem. Second, in order to derive actionable insights, modelers tend to pick a high level of data granularity. However, higher data granularity may lead to sparse observations and outliers, so practitioners need to strike a balance between the limited amount of reliable historical data and a proper level of data granularity. Third, the sequential nature of the data makes it more susceptible to correlated errors, which violates a basic assumption of ordinary least squares [21]. Fourth, there are severe endogeneity and multicollinearity concerns due to common marketing planning practices and media dynamics. For instance, setting the marketing budget as a percent of expected revenue is a widespread practice that contributes to both endogeneity and multicollinearity (i.e., highly correlated channel-level spend) in the models. Self-selection bias, especially for demand-capturing channels such as branded paid search [2], can also lead to inflated measurement results if not properly addressed. Fifth, in practice MMM usually involves a large amount of investment and a diverse set of stakeholders with whom alignment needs to be secured; as such, the bar for model interpretability is very high. Lastly, it is often hard to rely on traditional machine learning approaches such as cross-validation when tuning parameters and choosing models for MMM, since there is rarely enough data and the holdout periods may not be representative of the series to forecast.

It has been a long journey to build an in-house MMM solution from zero to one at Uber, which takes collaborative efforts across marketers, engineers, and data scientists. Throughout this journey, we seek to address all of the above challenges. The preferred modeling solution needs the capability of deriving time-varying elasticity along with other temporal patterns from observational studies. More importantly, randomized experimentation results, which are generally deemed the gold standard in measuring causality, should be incorporated to calibrate the marginal effects of marketing levers. Benefits from both experimentation and regression modeling can be maximized when combined into one holistic framework.

In this paper, we introduce a class of Bayesian time varying coefficient (BTVC) models that power Uber's MMM solution. Our work brings together the ideas of Bayesian modeling and kernel regression. The Bayesian framework allows a natural way to incorporate experimentation results and to understand the uncertainty of measurement for different marketing levers. Kernel regression is used to produce the time-varying coefficients that capture the dynamics of marketing levers in an efficient and robust way.

The remainder of this paper is organized as follows. In section 2, we describe the problem formulation and the related work. In section 3, we discuss the proposed modeling framework with emphasis on applications to MMM. In section 4, simulations as well as real-case benchmark studies are presented. In section 5, we describe how the proposed models are deployed using Uber's machine learning platform. Section 6 concludes.

2 PROBLEM FORMULATION

2.1 Basic Marketing Mix Model

Expressing sales as a function of spending variables with diminishing marginal returns [9] is one of the fundamental properties of an attribution or marketing response model. In view of that, our model can be expressed in a multiplicative format as below:

$$\hat{y}_t = g(t) \cdot \prod_{p=1}^{P} f_{t,p}(x_{t,p}), \quad t = 1, \cdots, T, \quad (1)$$

where $x_{t,p}$ are the regressors (i.e., the ads spending variables in our case), $\hat{y}_t$ is the marketing response, $g$ is a time-series process, $f$ is the cost curve function, $P$ is the number of regressors, and $T$ is the number of time points. The choice of $f$ is desired to have the following properties:

• $\hat{y}_t$ has an explainable structure that decomposes into different driving factors,
• temporal effects such as trend and seasonality of $\hat{y}_t$ are captured,
• $f_{t,p}$ is differentiable and monotonically increasing,
• $\hat{y}_t$ has diminishing marginal returns with respect to $x_{t,p}$.

Equation (1) has an intuitive form:

$$\hat{y}_t = e^{l_t} \cdot e^{s_t} \cdot \prod_{p=1}^{P} x_{t,p}^{\beta_{t,p}}, \quad 0 < \beta_{t,p} < 1, \ \forall t, p, \quad (2)$$

where $e^{l_t}$ is the trend component, $e^{s_t}$ is the seasonality, and $\beta_{t,p}$ are channel-specific time-varying coefficients.

2.2 Related Work

With a log-log transformation, Equation (2) can be re-written as

$$\ln(\hat{y}_t) = l_t + s_t + \sum_{p=1}^{P} \ln(x_{t,p})\,\beta_{t,p} = l_t + s_t + r_t, \quad t = 1, \cdots, T. \quad (3)$$
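To make the linearization concrete, here is a minimal numpy sketch (sizes and component shapes are illustrative, not taken from the paper) checking that the multiplicative form in Equation (2) and the log-log form in Equation (3) agree:

```python
import numpy as np

rng = np.random.default_rng(0)
T, P = 100, 3                                         # illustrative sizes

x = rng.lognormal(mean=1.0, sigma=0.5, size=(T, P))   # positive spend levels x_{t,p}
beta = rng.uniform(0.1, 0.9, size=(T, P))             # 0 < beta_{t,p} < 1: diminishing returns
l = 0.01 * np.arange(T)                               # trend component l_t
s = 0.2 * np.sin(2 * np.pi * np.arange(T) / 7)        # weekly seasonality s_t

# Multiplicative form, Equation (2)
y_hat = np.exp(l) * np.exp(s) * np.prod(x ** beta, axis=1)

# Additive log-log form, Equation (3)
log_y_hat = l + s + np.sum(np.log(x) * beta, axis=1)

assert np.allclose(np.log(y_hat), log_y_hat)          # the two forms coincide
```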
A natural idea is to use state-space models such as the Dynamic Linear Model (DLM) [27] or the Kalman filter [7] to solve Equation (3). However, there are some caveats associated with these approaches, especially given the goal we want to achieve with MMM:

• DLM with Markov Chain Monte Carlo (MCMC) sampling is not efficient and can be costly, especially for high-dimensional problems, which require sampling for a large number of regressors and time steps.
• Although the Kalman filter provides analytical solutions, it relies heavily on the assumption that the noise is normally distributed, which may be violated in real applications. It also leaves limited room for further customization, such as ingesting priors on coefficients or applying restrictions on coefficient signs (e.g., a positive coefficient sign for marketing spend).

Meanwhile, parametric and non-parametric statistical methods have been proposed [8]. Wu and Chiang [28] considered a nonparametric varying coefficient regression model with a longitudinal dependent variable and cross-sectional covariates; two kernel estimators based on componentwise local least squares criteria were proposed to estimate the time varying coefficients. Li et al. [18] proposed a semiparametric smooth coefficient model as a useful yet flexible specification for studying a general regression relationship with time varying coefficients, using a local least squares method with a kernel weight function to estimate the smooth coefficient function. Nonetheless, the frequentist approaches can be expensive when computing local estimates along the time dimension, and there is no straightforward way to incorporate information from experimentation results. As such, we are motivated to develop a new approach to derive time varying coefficients under a Bayesian framework for our MMM applications.

3 METHODS

3.1 Time Varying Coefficient Regression

In view of the increased complexity of the regression problem in practical MMM, we propose a Bayesian Time Varying Coefficient (BTVC) model, inspired by Generalized Additive Models (GAM) [11] and kernel regression smoothing. The key idea behind BTVC is to express regression coefficients as a weighted sum of local latent variables.

First, we define a latent variable $b_{j,p}$ for the $p$-th regressor at time $t_j$, $p = 1, \cdots, P$, $j = 1, \cdots, J$, $t_j \in \{1, \cdots, T\}$. There are $J$ latent variables in total for each regressor. From the perspective of spline regression, $b_{j,p}$ can be viewed as a knot distributed at time $t_j$ for a regressor. Let $w$ be a time-based weighting function such that

$$\beta_{t,p} = \sum_j w_j(t) \cdot b_{j,p}. \quad (4)$$

It is intuitive to use a weighting function that takes into account the time distance between $t_j$ and $t$:

$$w_j(t) = k(t, t_j) \Big/ \sum_{j=1}^{J} k(t, t_j), \quad (5)$$

where $k(\cdot, \cdot)$ is the kernel function, and the denominator normalizes the weights across knots. In practice, we have different choices for the kernel function, such as the Gaussian kernel, the quadratic kernel, or any other custom kernel. In subsection 3.3, we discuss this in more detail.

We can also rewrite Equation (4) in matrix form:

$$\beta = Kb, \quad (6)$$

where $\beta$ is the $T \times P$ coefficient matrix with entries $\beta_{t,p}$, $K$ is the $T \times J$ kernel matrix with normalized weight entries $w_j(t)$, and $b$ is the $J \times P$ knot matrix with entries $b_{j,p}$. At time point $t$, the regression component is

$$r_t = X_t \beta_t^T, \quad (7)$$

where $\beta_t = (\beta_{t,1}, \cdots, \beta_{t,P})$ and $X_t$ is the $t$-th row of the regressor covariate matrix.

Besides the regression component, we can also apply Equation (4) to other components such as trend and seasonality in Equation (3). Specifically, for the trend component,

$$\beta_{lev} = K_{lev} b_{lev}, \quad l_t = \beta_{t,lev}. \quad (8)$$

The trend component can be viewed as a dynamic intercept. For the seasonality component,

$$\beta_{seas} = K_{seas} b_{seas}, \quad s_t = X_{t,seas} \beta_{t,seas}^T. \quad (9)$$

$X_{t,seas}$ is the $t$-th row of the seasonality covariate matrix derived from Fourier series. In subsection 3.4, we discuss the seasonality in more detail.

Instead of estimating the local knots directly (i.e., $b$, $b_{lev}$, and $b_{seas}$ in the above equations) by optimizing an objective function, we introduce a Bayesian framework along with customizable priors to conduct posterior sampling.

3.2 Bayesian Framework

To capture the sequential dynamics and cyclical patterns, we use a Laplace prior to model adjacent knots:

$$b_{j,lev} \sim \mathrm{Laplace}(b_{j-1,lev}, \sigma_{lev}), \quad b_{j,seas} \sim \mathrm{Laplace}(b_{j-1,seas}, \sigma_{seas}). \quad (10)$$

The initial values ($b_{0,lev}$ and $b_{0,seas}$) can be sampled from a Laplace distribution with mean 0. A similar approach can be found in models implemented in Facebook's Prophet package [25], which uses a Laplace prior to model adjacent change points of the trend component.

For the regression component, we introduce a two-layer hierarchy for more robust sampling due to the sparsity in the channel spending data:

$$\mu_{reg} \sim \mathcal{N}^{+}(\mu_{pool}, \sigma^2_{pool}), \quad b_{reg} \sim \mathcal{N}^{+}(\mu_{reg}, \sigma^2_{reg}), \quad (11)$$

where the superscript $+$ denotes a folded normal distribution (a positive restriction on the coefficient signs).

In the hierarchy, the latent variable $\mu_{reg}$ depicts the overall mean of the set of knots of a single marketing lever. We can treat this as the overall estimate of a channel coefficient across time. This provides two favorable behaviors for the model: during a period with no spending on a channel, the coefficient knot estimates of that channel shrink towards the overall estimate, and the model is guarded against over-fitting due to volatile local structure.

The two-layer hierarchy is widely adopted in hierarchical Bayesian models, and the shrinkage property is sometimes called the pooling effect on regression coefficients [10]. Figure 1 depicts the model flowchart of BTVC. Stochastic Variational Inference [13] is used to estimate the knot coefficient posteriors, from which the time varying coefficient estimates can be derived using Equation (7), Equation (8), and Equation (9).

Figure 1: BTVC model flowchart. Blue boxes present the prior related input, where priors derived from lift tests for specific channels can be readily ingested in the framework. The orange box represents the kernel function in use. Green boxes represent the sampled posteriors, which are also the quantities of interest.

3.3 Kernel Selection

For the kernel function used for trend and seasonality, we propose a customized kernel:

$$k_{lev}(t, t_l) = 1 - \frac{|t - t_l|}{t_{j+1} - t_j}, \quad \text{if } t_j \le t \le t_{j+1} \text{ and } l \in \{j, j+1\}, \quad (12)$$

with zero values assigned otherwise. This kernel bears some similarity to the triangular kernel.
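The following numpy sketch shows how Equations (5), (6), and (12) fit together; the knot placement, knot values, and sizes are illustrative assumptions rather than the paper's settings:

```python
import numpy as np

def level_kernel(tgrid, knots):
    """Customized kernel of Equation (12): each time point receives weight only
    from its two bracketing knots, decaying linearly with time distance."""
    K = np.zeros((len(tgrid), len(knots)))
    for i, t in enumerate(tgrid):
        j = np.clip(np.searchsorted(knots, t, side="right") - 1, 0, len(knots) - 2)
        width = knots[j + 1] - knots[j]              # t_{j+1} - t_j
        K[i, j] = 1 - abs(t - knots[j]) / width
        K[i, j + 1] = 1 - abs(t - knots[j + 1]) / width
    return K

def normalize(K):
    """Equation (5): normalize kernel values so the weights across knots sum to 1."""
    return K / K.sum(axis=1, keepdims=True)

T, J, P = 300, 10, 3                                  # illustrative sizes
tgrid = np.arange(T, dtype=float)
knots = np.linspace(0.0, T - 1, J)                    # knot locations t_j
b = np.random.default_rng(1).normal(size=(J, P))      # latent knot values b_{j,p}

K = normalize(level_kernel(tgrid, knots))
beta = K @ b                                          # Equation (6): beta = Kb, shape (T, P)
```

For this particular kernel the two bracketing weights already sum to one, so the normalization step is a no-op; it matters for kernels with unbounded support, such as the Gaussian kernel discussed next.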

For the kernel function used for regression, we adopt the Gaussian kernel:

$$k_{reg}(t, t_j; \rho) = \exp\left(-\frac{(t - t_j)^2}{2\rho^2}\right), \quad (13)$$

where $\rho$ is the scale parameter. Other kernels, such as the Epanechnikov kernel and the quadratic kernel, could also be leveraged for the regression component.

3.4 Seasonality

Seasonality is a pattern that repeats over a regular period in a time series. To estimate seasonality, a standard approach is to decompose a time series into trend, seasonality, and irregular components using Fourier analysis [6]. This method represents the time series by a set of elementary functions, called a basis, such that all functions under study can be written as linear combinations of the elementary functions in the basis. These elementary functions involve sine and cosine functions or complex exponentials; the Fourier series approach describes the fluctuation of a time series in terms of sinusoidal behavior at various frequencies.

Specifically, for a given period $S$ and a given order $k$, two series, $\cos(2k\pi t/S)$ and $\sin(2k\pi t/S)$, are generated to capture the seasonality. For example, with daily data, $S = 7$ represents weekly seasonality, while $S = 365.25$ represents yearly seasonality.
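A minimal sketch of this feature construction (the Fourier orders chosen here are illustrative):

```python
import numpy as np

def fourier_features(t, period, order):
    """Seasonality regressors cos(2*k*pi*t/S) and sin(2*k*pi*t/S) for k = 1..order."""
    t = np.asarray(t, dtype=float)
    cols = []
    for k in range(1, order + 1):
        cols.append(np.cos(2.0 * np.pi * k * t / period))
        cols.append(np.sin(2.0 * np.pi * k * t / period))
    return np.column_stack(cols)              # shape (len(t), 2 * order)

t = np.arange(730)                            # two years of daily data
X_weekly = fourier_features(t, period=7, order=3)        # weekly seasonality, S = 7
X_yearly = fourier_features(t, period=365.25, order=10)  # yearly seasonality, S = 365.25
```

These columns form the seasonality covariate matrix $X_{t,seas}$ used in Equation (9).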
4 RESULTS

4.1 Simulations

4.1.1 Coefficient Curve Fitting. We conduct a simulation study based on the following model:

$$y_t = \mathrm{trend}_t + \beta_{1t} x_{1t} + \beta_{2t} x_{2t} + \beta_{3t} x_{3t} + \epsilon_t, \quad t = 1, \cdots, T,$$

where the trend and $\beta_{1t}$, $\beta_{2t}$, $\beta_{3t}$ are all random walks. The covariates $x_{1t}, x_{2t}, x_{3t} \sim N(3, 1)$ are independent of the error term $\epsilon_t \sim N(0, .3)$.

In our study, we compare BTVC with two other time varying regression models available on R CRAN: Bayesian structural time series (BSTS) [22] and time varying coefficients for single and multi-equation regressions (tvReg) [4]. We set $T = 300$ and calculate the average mean squared error (MSE) against the truth of each regressor across 100 simulations. The estimated coefficient curves of one sample are plotted in Figure 2. The results are reported in Table 1, which demonstrates that BTVC achieves better accuracy on the coefficient estimation than the other two models in consideration.

Table 1: Average of mean squared errors based on 100 simulations.

Model   β₁(t)    β₂(t)    β₃(t)
BSTS    0.0067   0.0078   0.0080
tvReg   0.0103   0.0103   0.0096
BTVC    0.0030   0.0026   0.0029

Figure 2: Comparison of the BSTS, tvReg, and BTVC estimates of the coefficient functions. The true values are plotted in grey dots; the blue line is the BSTS estimate, the green line refers to tvReg, and the red line to BTVC.
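The data-generating process is simple to reproduce; below is a sketch under stated assumptions (the paper does not give the random-walk innovation scales, and we read $N(0, .3)$ as a standard deviation of 0.3):

```python
import numpy as np

rng = np.random.default_rng(42)
T, P = 300, 3

# Trend and coefficients evolve as random walks; the innovation scales here
# are illustrative assumptions, not the paper's settings.
trend = np.cumsum(rng.normal(scale=0.05, size=T))
beta = np.cumsum(rng.normal(scale=0.02, size=(T, P)), axis=0)

x = rng.normal(loc=3.0, scale=1.0, size=(T, P))   # covariates ~ N(3, 1)
eps = rng.normal(scale=0.3, size=T)               # error term, sd assumed to be 0.3

y = trend + np.sum(beta * x, axis=1) + eps        # simulated response

def mse(estimate, truth):
    """Per-regressor mean squared error, the metric reported in Table 1."""
    return np.mean((estimate - truth) ** 2, axis=0)
```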

4.1.2 Experimentation Calibration. One appealing property of the BTVC model is its flexibility to ingest experimentation-based priors for any regressors (e.g., advertising channels), since experiments are often deemed a trustworthy source for tackling the challenges mentioned in section 1.

To illustrate this feature of BTVC, we first fit a BTVC model on simulated data generated with a scheme similar to the one outlined in subsubsection 4.1.1. Next, we assume there is one lift test for the first and third regressors, respectively, and two lift tests for the second regressor. All tests have a 30-step duration. We use the simulated values as the "truth" derived from the tests, and ingest them as priors into BTVC models. The results are summarized in Figure 3. As expected, the confidence intervals during the ingestion periods and the adjacent neighborhood become narrower, compared to the ones without prior knowledge. Moreover, with this calibration, the coefficient curves are more aligned with the truth around the neighborhood of the test ingestion period. To demonstrate this, in Table 2 we calculate the symmetric mean absolute percentage error (SMAPE) and the pinball loss (with 2.5% and 97.5% target quantiles) between the truth and the estimates for the 30 steps following the prior ingestion period. More importantly, at Uber we apply BTVC to real MMM applications, where its flexibility in ingesting multiple experimentation insights leads to significant improvement in attribution projection accuracy.

Figure 3: (a) The coefficient estimation of the BTVC model without prior ingestion. (b) The coefficient estimation of the BTVC model with prior ingestion. The test ingestion periods are highlighted in blue dots. The black solid lines are the simulated truth, while the red ones are the estimates. The shaded bands represent the 95% confidence intervals.

Table 2: SMAPE and pinball loss of coefficient estimates for models without and with prior ingestion. The metrics are calculated using the 30-step coefficients following the test ingestion period. Lower (2.5%) and upper (97.5%) quantiles are reported for the pinball loss. With prior ingestion, the coefficient estimation accuracy improves significantly.

                SMAPE                   Pinball loss
        w/o priors  w/ priors   w/o priors          w/ priors
                                lower    upper      lower    upper
β₁(t)   0.39        0.21        0.0009   0.0032     0.0005   0.0019
β₂(t)   1.37        1.25        0.0021   0.0019     0.0011   0.0014
β₃(t)   0.30        0.18        0.0017   0.0028     0.0014   0.0013

4.2 Real Case Studies

To benchmark the model's forecasting accuracy, we conduct a real case study using Uber Eats data across 10 major countries or markets. Each country series consists of the daily number of first orders on Uber Eats by newly acquired users. The data range spans from January 2018 to January 2021, including a typical Covid-19 period; the regime change caused by Covid-19 poses a big challenge for modeling.

We compared BTVC with two other time series modeling techniques, SARIMA [23] and Facebook Prophet [25]. Both Prophet and BTVC models use Maximum A Posteriori (MAP) estimates, and they are configured as similarly as possible in terms of optimization and seasonality settings. For SARIMA, we fit a $(1, 1, 1) \times (1, 0, 0)_S$ structure by maximum likelihood estimation (MLE), where $S$ represents the choice of seasonality; in our case, $S = 7$ for weekly seasonality.

We use out-of-sample SMAPE as the benchmark performance metric, with a 28-day forecast horizon on 6 different splits with expanding training windows (i.e., 6 different cuts of the data with incremental training size).

Figure 4 depicts the SMAPE results across the 10 countries, and Table 3 gives the average and standard deviation of the SMAPE values for the three models in consideration. BTVC outperforms the other two models for the majority of the 10 countries in terms of SMAPE.

Figure 4: Bar plots of SMAPE on 10 countries with Uber Eats data. Blue bars represent the results from BTVC, orange ones Prophet, and yellow ones SARIMA.

Table 3: SMAPE comparison across models (columns: Model, Mean of SMAPE, Std of SMAPE). It shows that BTVC outperforms the other two models in terms of average SMAPE across the 10 countries.

5 ARCHITECTURE

5.1 Implementation

We implemented the BTVC model as a feature branch of our open-sourced package Orbit [20] by Uber. Orbit is a software package aiming to simplify time series inference and forecasting with structural Bayesian time series models for real-world cases and research. It provides a familiar and intuitive initialize-fit-predict interface for time series tasks, while utilizing probabilistic programming languages such as Stan [3] and Pyro [1] under the hood.
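As an illustration of that interface, here is a hypothetical sketch of fitting a kernel-based time-varying regression in Orbit; the class and argument names follow Orbit's public API as we understand it, but the dataset, column names, and exact signatures are assumptions and may differ across package versions:

```python
import pandas as pd
from orbit.models import KTR   # Orbit's kernel-based time-varying regression

# Hypothetical dataset with a date column, a log-transformed response,
# and channel-level spend regressors.
df = pd.read_csv("mmm_data.csv", parse_dates=["date"])

model = KTR(
    response_col="log_orders",                 # response on the log scale, per Equation (3)
    date_col="date",
    regressor_col=["search_spend", "social_spend", "tv_spend"],  # illustrative channels
    regressor_sign=["+", "+", "+"],            # positive-sign restriction on spend coefficients
    seasonality=[7, 365.25],                   # weekly and yearly seasonality
    estimator="pyro-svi",                      # stochastic variational inference backend
)
model.fit(df=df)

predictions = model.predict(df=df)             # fitted values / forecasts
coefficients = model.get_regression_coefs()    # time-varying coefficient estimates
```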

5.2 Deployment

The BTVC deployment system is customized by leveraging Michelangelo [12], a machine learning (ML) platform developed by Uber. Michelangelo provides centralized workflow management for the end-to-end modeling process. With the help of Michelangelo, the BTVC deployment system is able to automate data preprocessing, model training, validation, prediction, and measurement monitoring at scale.

Figure 5: BTVC deployment system.

The deployment workflow summarized in Figure 5 consists of three main components:

• Hyperparameter management layer: stores and manages the various supplementary data needed for BTVC model training, such as normalization scalars, adstock [15], and lift-test-based priors, as well as model-specific hyperparameters.
• Orchestration layer: uploads and triggers the model training job.
• Model container: a Docker container including all the essential modeling code, integrated with Michelangelo's ecosystem.

6 CONCLUSION

In this paper, we propose a Bayesian Time Varying Coefficient (BTVC) model developed in particular for MMM applications at Uber. By assuming the local latent variables follow certain probabilistic distributions, a kernel-based smoothing technique is applied to produce the dynamic coefficients. This modeling framework entails a comprehensive solution for the challenges faced by traditional MMM. More importantly, it enables marketers to leverage multiple experimentation results in an intuitive yet scientific way. Simulations and real-case benchmark studies demonstrate BTVC's superiority in prediction accuracy and flexibility in experimentation ingestion. We also present the model deployment system, which can serve model training and predictions in real time, in a scalable way, without human oversight or intervention.

7 ACKNOWLEDGMENTS

The authors would like to thank Sharon Shen, Qin Chen, Ruyi Ding, Vincent Pham, and Ariel Jiang for their help on this project, and Dirk Beyer and Kim Larsen for their comments on this paper.

REFERENCES

[1] Eli Bingham, Jonathan P Chen, Martin Jankowiak, Fritz Obermeyer, Neeraj Pradhan, Theofanis Karaletsos, Rohit Singh, Paul Szerlip, Paul Horsfall, and Noah D Goodman. 2019. Pyro: Deep universal probabilistic programming. The Journal of Machine Learning Research 20, 1 (2019), 973–978.
[2] Thomas Blake, Chris Nosko, and Steven Tadelis. 2015. Consumer heterogeneity and paid search effectiveness: A large-scale field experiment. Econometrica 83, 1 (2015), 155–174.
[3] Bob Carpenter, Andrew Gelman, Matthew D Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. Stan: A probabilistic programming language. Journal of Statistical Software 76, 1 (2017).
[4] Isabel Casas and Ruben Fernandez-Casal. 2021. tvReg: Time-Varying Coefficients Linear Regression for Single and Multi-Equations. https://CRAN.R-project.org/package=tvReg. R package version 0.5.4.
[5] David Chan and Mike Perry. 2017. Challenges and opportunities in media mix modeling. (2017).
[6] Alysha M De Livera, Rob J Hyndman, and Ralph D Snyder. 2011. Forecasting time series with complex seasonal patterns using exponential smoothing. Journal of the American Statistical Association 106, 496 (2011), 1513–1527.
[7] James Durbin and Siem Jan Koopman. 2012. Time Series Analysis by State Space Methods. Oxford University Press.
[8] Jianqing Fan and Wenyang Zhang. 2008. Statistical methods with varying coefficient models. Statistics and Its Interface 1, 1 (2008), 179.
[9] Paul W Farris, Dominique M Hanssens, James D Lenskold, and David J Reibstein. 2015. Marketing return on investment: Seeking clarity for concept and measurement. Applied Marketing Analytics 1, 3 (2015), 267–282.
[10] Andrew Gelman, John B Carlin, Hal S Stern, David B Dunson, Aki Vehtari, and Donald B Rubin. 2013. Bayesian Data Analysis. CRC Press.
[11] Trevor J Hastie and Robert J Tibshirani. 1990. Generalized Additive Models. Vol. 43. CRC Press.
[12] Jeremy Hermann and Mike Del Balso. 2017. Meet Michelangelo: Uber's machine learning platform. https://eng.uber.com/michelangelo/.
[13] Matthew D Hoffman, David M Blei, Chong Wang, and John Paisley. 2013. Stochastic variational inference. Journal of Machine Learning Research 14, 5 (2013).
[14] Guido W Imbens and Donald B Rubin. 2015. Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press.
[15] Yuxue Jin, Yueqing Wang, Yunting Sun, David Chan, and Jim Koehler. 2017. Bayesian methods for media mix modeling with carryover and shape effects. (2017).
[16] Kim Larsen. 2018. Data Science Can't Replace Human Marketers Just Yet – Here's Why. 55d85c
[17] Randall A Lewis and Justin M Rao. 2015. The unfavorable economics of measuring the returns to advertising. The Quarterly Journal of Economics 130, 4 (2015), 1941–1973.
[18] Qi Li, Cliff J Huang, Dong Li, and Tsu-Tan Fu. 2002. Semiparametric smooth coefficient models. Journal of Business & Economic Statistics 20, 3 (2002), 412–422.
[19] E Jerome McCarthy. 1978. Basic Marketing: A Managerial Approach. RD Irwin.
[20] Edwin Ng, Zhishi Wang, Huigang Chen, Steve Yang, and Slawek Smyl. 2021. Orbit: Probabilistic Forecast with Exponential Smoothing. arXiv:stat.CO/2004.08492.
[21] John O Rawlings, Sastry G Pantula, and David A Dickey. 2001. Applied Regression Analysis: A Research Tool. Springer Science & Business Media.
[22] Steven L Scott and Hal R Varian. 2014. Predicting the present with Bayesian structural time series. International Journal of Mathematical Modelling and Numerical Optimisation 5, 1-2 (2014), 4–23.
[23] Skipper Seabold and Josef Perktold. 2010. statsmodels: Econometric and statistical modeling with Python. In 9th Python in Science Conference. Package version 0.11.1.
[24] Yunting Sun, Yueqing Wang, Yuxue Jin, David Chan, and Jim Koehler. 2017. Geo-level Bayesian hierarchical media mix modeling. (2017).
[25] Sean J Taylor and Benjamin Letham. 2018. Forecasting at scale. The American Statistician 72, 1 (2018), 37–45. Package version 0.7.1.
[26] Jon Vaver and Jim Koehler. 2011. Measuring ad effectiveness using geo experiments. (2011).
[27] Mike West and Jeff Harrison. 2006. Bayesian Forecasting and Dynamic Models. Springer Science & Business Media.
[28] Colin O Wu and Chin-Tsang Chiang. 2000. Kernel smoothing on varying coefficient models with longitudinal dependent variable. Statistica Sinica (2000), 433–456.

