A Multistage Architecture for Statistical Inference with Stochastic Signal Acquisition


J Sign Process Syst
DOI 10.1007/s11265-015-1020-6

A Multistage Architecture for Statistical Inference with Stochastic Signal Acquisition

Ryan M. Corey · Andrew C. Singer

Received: 31 January 2015 / Accepted: 23 June 2015
© Springer Science+Business Media New York 2015

Abstract  We describe a statistical inference approach for designing signal acquisition interfaces and inference systems with stochastic devices. A signal is observed by an array of binary comparison sensors, such as highly scaled comparators in an analog-to-digital converter, that exhibit random offsets in their reference levels due to process variations or other uncertainties. These offsets can limit the performance of conventional measurement devices. In our approach, we build redundancy into the sensor array and use statistical estimation techniques to account for uncertainty in the observations and produce a more reliable estimate of the acquired signal. We develop an observational model and find a Cramér-Rao lower bound on the achievable square error performance of such a system. We then propose a two-stage inference architecture that uses a coarse estimate to select a subset of the sensor outputs for further processing, reducing the overall complexity of the system while achieving near-optimal performance. The performance of the architecture is demonstrated using a simulated prototype for parameter estimation and symbol detection applications. The results suggest the feasibility of using unreliable components to build reliable signal acquisition and inference systems.

Keywords  Statistical inference · Parameter estimation · Stochastic circuits · Quantization

This work was supported in part by Systems on Nanoscale Information fabriCs (SONIC), one of the six STARnet centers sponsored by MARCO and DARPA. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant Number DGE-1144245.

Ryan M. Corey, corey1@illinois.edu
University of Illinois at Urbana-Champaign, Urbana, Illinois, USA

1 Introduction

This paper addresses the problem of statistical decision making using sensors that exhibit uncertainty in their behavior. In particular, we are concerned with a sensor array that measures an unknown variable by comparing it to a number of reference levels, such as a flash analog-to-digital converter (ADC) that measures a voltage using a series of electronic comparator circuits. The sensors do not have fixed known reference levels; instead, the levels vary randomly around a nominal reference level. Such sensors may have advantages in size, power, speed, or cost compared to more reliable sensors, but cannot be used in conventional deterministic architectures. We propose an alternative architecture that leverages redundancy and statistical inference techniques to build reliable inference systems from such unreliable components.

An important motivation of this work is electronic mixed-signal interfaces built with emerging circuit technologies that exhibit larger behavioral uncertainty than conventional technologies. High-speed analog-to-digital converter circuits, for example, are often limited in speed and power by large comparator circuits. Smaller circuits generally operate faster and consume less power, but are more sensitive to process variations. Small comparators may exhibit uncertain offsets to their reference levels, in part, because of mismatch between transistor threshold voltages [1]. These voltage variations, which can be caused by dopant fluctuations during production, are generally modeled as having a normal distribution with variance inversely proportional to the gate area [2]. If the random offsets are large compared to the spacing between nominal reference levels, they can cause errors in the output of a conventional converter.

There have been a number of proposed designs for ADCs using comparator circuits with large offsets. Since the offsets are generally static with time, they can be corrected using analog offset cancellation [3] or digital techniques, such as disabling comparators with large offsets [4] or using extra logic for error correction [5]. These designs all correct or compensate for the offsets so that the converter behaves more like a conventional flash ADC. Other recent work has directly leveraged the offset statistics rather than compensating for them. In [6], many comparators were produced with intentionally large variations and only a few were chosen to create a reference ladder. The comparators in [7] were also designed to be highly variable, but the true levels were not measured; instead, the outputs were averaged to produce an estimate based on the offset statistics.

Although this problem has been addressed primarily in the circuits literature, it can be framed as a more general statistical inference problem: there is an unknown parameter (the input voltage) that is observed indirectly using a set of uncertain observations (sensor outputs) whose statistics depend on that parameter. We can therefore apply tools from parameter estimation and machine learning to design a robust inference architecture that incorporates and even exploits device-level uncertainty. The results presented here do not apply only to quantization circuits, but to any problem in which a parameter is observed via uncertain binary thresholds. Indeed, the mathematical model developed in Section 2 is equivalent to a distributed estimation system subject to bandwidth constraints [8, 9]. The results in [9] for parameter estimation over a bandlimited sensor network parallel the results presented here, including the form of the Cramér-Rao lower bound.
A key advantage of a statistical inference approach is that it can be applied at the system level: if the sensor array is to be used as a front-end to a decision-making system, such as a classifier, the array outputs can be used directly for inference; there is no need to design a separate quantizer and classifier.

In this paper, which expands on results first presented in [10], we consider a general architecture with an arbitrary number of nominal levels and of sensors per level, generalizing both the conventional flash ADC (many levels with one sensor per level) and the design of [7] (few levels with many sensors each). We develop an observational model that relates the array outputs to the input parameter and the offset statistics, then use this model to predict the achievable performance of an optimal parameter estimator. Because one motivation of using unreliable sensors is to improve speed and efficiency, it may not be practical to use such an optimal estimator; fortunately, the statistical model suggests an effective strategy for approximate inference using a multistage design. We will show how the sensor outputs can be reduced to a vector of sufficient statistics and then further reduced to a scalar statistic with little loss of information. This design takes advantage of the device statistics with relatively small computational overhead.

2 System Model

The sensor array architecture is shown in Fig. 1. The array observes an unknown real-valued variable x using a number of binary sensors. There are r sensors at each of n nominal reference levels v_1, ..., v_n. Each sensor has a random reference level V_{i,j} that is offset from its nominal reference level v_i for i = 1, ..., n and j = 1, ..., r. If r = 1, then the architecture is a conventional reference ladder with one sensor assigned to each level. If n = 1, then all of the sensors are identical, as in [7].
The choice of n and r enables a tradeoff between performance and complexity.

The offsets are assumed to be independent and identically distributed with a known cumulative distribution function (CDF) F_V. This assumption is reasonable as long as the spatial correlation in offsets and the independent noise on each sensor are small compared to the offsets. For comparator circuits, the offsets are often assumed to be normally distributed and the numerical results, figures, and simulations in this paper will use a normal distribution; however, the analytical results are derived for arbitrary offset distributions.

Figure 1  The sensor array (top) uses r binary comparison sensors at each of n nominal reference levels. Each sensor (bottom) has a random offset to its reference level. The sum of outputs at each nominal level is a sufficient statistic for x.
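The observational model above is straightforward to simulate. The sketch below is a minimal illustration, not the authors' code: the function name `sense` and all parameter values are made up here, and normal offsets are assumed, as in the paper's own simulations. It draws one random reference level per sensor and returns the per-level sums that Section 2 identifies as sufficient statistics.

```python
import random

def sense(x, levels, r, sigma, rng):
    """Simulate one reading of the sensor array: r binary comparators
    per nominal level v_i, each with an independent N(0, sigma^2)
    offset to its reference.  Returns T, where T[i] is the number of
    sensors at level v_i whose random reference fell at or below x."""
    T = []
    for v in levels:
        hits = 0
        for _ in range(r):
            V_ij = v + rng.gauss(0.0, sigma)   # random reference level
            hits += 1 if x >= V_ij else 0      # binary output Y_ij
        T.append(hits)
    return T

levels = [i * 0.5 for i in range(-6, 7)]       # nominal levels v_1..v_n
T = sense(x=0.3, levels=levels, r=32, sigma=1.0, rng=random.Random(0))
```

Levels far below x report counts near r and levels far above x report counts near 0; the informative levels are the few near x.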

Each sensor has a binary output Y_{i,j} = 1 if x ≥ V_{i,j} and 0 otherwise. The set of Y_{i,j} for i = 1, ..., n and j = 1, ..., r is the observed data to be used for inference about x. Each Y_{i,j} has a Bernoulli probability mass function (PMF):

p_{Y_{i,j}|X}(1 | x) = Pr{x ≥ V_{i,j}}   (1)
                     = Pr{x ≥ v_i + V}   (2)
                     = Pr{V ≤ x − v_i}   (3)
                     = F_V(x − v_i).   (4)

For brevity, define F_i(x) = F_V(x − v_i). Similarly, F̄_i(x) = p_{Y_{i,j}|X}(0 | x) = 1 − F_V(x − v_i). When F_i is differentiable at x, denote the probability density function (PDF) by f_i(x) = ∂F_i(x)/∂x. These nr binary observations and the corresponding PMFs can be used to estimate x. However, for high-resolution systems with many sensors, it may be impractical to use the full set of nr-dimensional data. Fortunately, the dimensionality of the observation can be significantly reduced by making some reasonable assumptions about the system.

Because the sensors all have independent and identically distributed random offsets, the observations can be reduced to an n-dimensional sufficient statistic T that fully preserves the information about x. Let T_i = Σ_{j=1}^{r} Y_{i,j} for i = 1, ..., n. As the sum of independent Bernoulli variables, each T_i has a binomial conditional PMF:

p_{T_i|X}(t_i | x) = \binom{r}{t_i} F_i(x)^{t_i} F̄_i(x)^{r−t_i}.   (5)

The conditional mean of T_i is given by

E_x[T_i] = r F_i(x),   (6)

where E_x[·] denotes the conditional expectation given x, and its conditional variance is

Var_x(T_i) = r F_i(x) F̄_i(x).   (7)

Because each T_i is conditionally independent given x, the PMF of the total observation is the product of these PMFs for i = 1, ..., n. If the offset density has finite support, as in Fig. 2, then it is possible that some or many of the T_i are deterministic; in particular, if F_i(x) = 1 then T_i = r and if F_i(x) = 0 then T_i = 0. These components provide information by restricting the range of x for which the observation has nonzero probability, but do not otherwise contribute to the conditional PMF.
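As a quick numerical check of (5)-(7), one can draw many independent readings of a single level and compare the sample mean and variance of T_i with rF_i(x) and rF_i(x)F̄_i(x). The sketch below assumes normal offsets, so that F_i(x) = Φ((x − v_i)/σ), evaluated here through math.erf; all names and parameter values are illustrative.

```python
import math
import random

def Phi(u):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

rng = random.Random(1)
x, v_i, sigma, r = 0.4, 0.0, 1.0, 20
F_i = Phi((x - v_i) / sigma)        # success probability F_i(x)

# Draw 20000 independent readings of T_i = sum_j Y_ij.
samples = [sum(1 for _ in range(r) if x >= v_i + rng.gauss(0.0, sigma))
           for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
# Eq. (6): E[T_i] = r F_i(x);  Eq. (7): Var(T_i) = r F_i(x)(1 - F_i(x))
```

The sample moments land within Monte Carlo error of the binomial values, confirming that each T_i behaves as Binomial(r, F_i(x)).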
For the remaining components, whose level densities have support at x and therefore satisfy F̄_i(x) > 0, the PMF can be expressed in exponential form as

p_{T|X}(t | x) = ∏_{i: F̄_i(x)>0} \binom{r}{t_i} F̄_i(x)^r ( F_i(x) / F̄_i(x) )^{t_i}   (8)
              = h(t) exp( Σ_{i: F̄_i(x)>0} [t_i η_i(x) + A_i(x)] ),   (9)

where h(t) = ∏_{i: F̄_i(x)>0} \binom{r}{t_i}, A_i(x) = r ln F̄_i(x), and

η_i(x) = ln( F_i(x) / F̄_i(x) ).   (10)

If F_i is differentiable at x and f_i(x) > 0 for all i included in the sum, then (9) forms a curved exponential family [11] with natural parameter η(x).

Figure 2  If the offset distributions have finite support, then only levels with support at x (bold curves) contribute to the PMF; the other observations are deterministic given x.

Although T is a sufficient statistic, it may not be a complete sufficient statistic; that is, there may exist a lower-dimensional statistic that also preserves the observed information about x. This is the case, in particular, when each η_i(x) is a linear function of x. If η_i(x) = a_i x + b_i, then the PMF can be expressed

p_{T|X}(t | x) = h(t) exp( Σ_{i=1}^{n} [a_i x t_i + b_i t_i + A_i(x)] )   (11)
              = h′(t) exp( x Σ_{i=1}^{n} a_i t_i + Σ_{i=1}^{n} A_i(x) ),   (12)

where the terms containing b_i have been absorbed into h′(t). By the completeness theorem for exponential families [12], the sum S = Σ_{i=1}^{n} a_i T_i is a complete sufficient statistic. The natural parameter has a linear form if the random offsets have a logistic distribution; in that case, the sum of the outputs of all the sensors is a complete sufficient statistic and the observations can be represented by this scalar with no loss of information about x. In Section 5, we will use this special case to find approximate estimators for more general distributions.
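The logistic special case is easy to verify directly: with offset CDF F_V(u) = 1/(1 + e^{−u/s}), the log-odds in (10) collapse to η_i(x) = (x − v_i)/s, which is linear in x with the same slope 1/s at every level. A minimal check (the scale s and the test levels below are illustrative choices, not values from the paper):

```python
import math

s = 0.8                                  # logistic scale parameter

def F_logistic(u):
    """CDF of a zero-mean logistic offset with scale s."""
    return 1.0 / (1.0 + math.exp(-u / s))

def eta(x, v_i):
    """Natural parameter (10): log-odds of the sensor output."""
    F = F_logistic(x - v_i)
    return math.log(F / (1.0 - F))

# eta_i(x) = (x - v_i)/s exactly, so a_i = 1/s for every level and the
# plain sum S = T_1 + ... + T_n is a complete sufficient statistic.
for v_i in (-1.0, 0.0, 2.5):
    for x in (-0.7, 0.1, 1.3):
        assert abs(eta(x, v_i) - (x - v_i) / s) < 1e-9
```

Because the slope is identical at every level, summing all nr sensor outputs loses nothing; for other offset distributions the same sum serves only as an approximation.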
3 Statistical Inference

In many inference applications, the system is designed to infer the value of x using the observations from the sensor array. There are several metrics that can be used to characterize the performance of the inference system. If x is known to take values in a discrete set, such as a finite alphabet of symbols, then we are often concerned with the probability of detection error. If x is drawn from a continuous set, then we often evaluate the performance of an estimator x̂(T) at a particular value of x by the bias, E_x[x̂(T)] − x, and variance, Var_x(x̂(T)), of the estimator.

In both cases, a reasonable decision-making strategy is the maximum likelihood rule, which selects the value of x

that maximizes the conditional probability of the observation given x:

x̂_ML(T) = arg max_x ln p_{T|X}(T | x)   (13)
         = arg max_x Σ_{i: F̄_i(x)>0} [η_i(x) T_i + A_i(x)].   (14)

If η_i(x) were linear for all i, then the maximum could be expressed in terms of the scalar sufficient statistic:

x̂_ML(T) = arg max_x Σ_{i=1}^{n} [a_i x T_i + b_i T_i + A_i(x)]   (15)
         = arg max_x [ S x + Σ_{i=1}^{n} A_i(x) ].   (16)

3.1 Maximum Likelihood Parameter Estimation

First consider the problem of estimating x over a continuous set using the maximum likelihood estimator (MLE). Assume that x is in the support of at least one level density, i.e., f_i(x) > 0 for at least one i ∈ {1, ..., n}, so that the parameter is identifiable and the log-likelihood is differentiable. If the log-likelihood function (14) is strictly concave in x, its maximum is the solution to the likelihood equation

0 = ∂/∂x ln p_{T|X}(T | x)   (17)
  = Σ_{i: f_i(x)>0} ∂/∂x [η_i(x) T_i + A_i(x)]   (18)
  = Σ_{i: f_i(x)>0} η_i′(x) [T_i − r F_i(x)].   (19)

Substituting the mean of the binomial distribution (6), the likelihood equation can also be expressed

Σ_{i: f_i(x)>0} η_i′(x) T_i = Σ_{i: f_i(x)>0} η_i′(x) E_x[T_i].   (20)

In the special case where each η_i(x) is linear, (20) reduces to S(T) = E_x[S(T)].

3.2 Achievable Performance

We can use tools from information theory to bound the achievable performance of the estimator. An estimator is called unbiased if E_x[x̂(T)] = x. For any unbiased estimator x̂_UE, the Cramér-Rao lower bound (CRLB) on the conditional variance is

Var_x(x̂_UE(T)) ≥ I(x)^{−1},   (21)

where I(x) is the Fisher information provided by T about x [12]. The Fisher information is well defined for the reference levels for which F_i(x) is differentiable at x and f_i(x) > 0. It is given by

I(x) = E_x[ −∂²/∂x² ln p_{T|X}(T | x) ]   (22)
     = Σ_{i: f_i(x)>0} E_x[ −∂²/∂x² ln p_{T_i|X}(T_i | x) ]   (23)
     = Σ_{i: f_i(x)>0} E_x[ −∂²/∂x² (η_i(x) T_i + A_i(x)) ]   (24)
     = Σ_{i: f_i(x)>0} r f_i(x)² / ( F_i(x) F̄_i(x) )   (25)
     = r Σ_{i: f_i(x)>0} η_i′(x) f_i(x).   (26)

Figure 3 shows the Fisher information contribution of a single sensor as a function of the distance between the nominal level v_i and the parameter x for normally distributed offsets with mean zero and variance σ².
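The maximum likelihood rule (13)-(14) has no closed form for normal offsets, but the maximization is one-dimensional and easy to carry out numerically. The sketch below is an illustrative implementation, not the paper's simulation code: it maximizes the binomial log-likelihood of the statistics T_i over a grid of candidate x values, with made-up names and parameters and normal offsets assumed.

```python
import math
import random

def Phi(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def log_likelihood(x, T, levels, r, sigma):
    """Sum of binomial log-PMFs (5); the binomial coefficients are
    omitted since they do not depend on x and cannot change the
    arg max in (13)-(14)."""
    ll = 0.0
    for v, t in zip(levels, T):
        F = min(max(Phi((x - v) / sigma), 1e-12), 1.0 - 1e-12)
        ll += t * math.log(F) + (r - t) * math.log(1.0 - F)
    return ll

rng = random.Random(7)
levels = [i * 0.5 for i in range(-6, 7)]
r, sigma, x_true = 64, 1.0, 0.3

# One simulated array reading at x_true.
T = [sum(1 for _ in range(r) if x_true >= v + rng.gauss(0.0, sigma))
     for v in levels]

grid = [g / 500.0 for g in range(-1500, 1501)]     # candidate x values
x_ml = max(grid, key=lambda x: log_likelihood(x, T, levels, r, sigma))
```

With r = 64 sensors per level the estimate typically lands within a few hundredths of x_true; since the log-likelihood is concave here, a bracketing search such as golden-section would avoid the grid entirely.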
The sensors provide the most Fisher information about signals near their nominal reference levels and little information about signals far from their levels.

To find a simple, approximate expression for the CRLB of a high-resolution sensor array, notice that Eq. 26 has the form of a probability-weighted mean of η_i′(x), which resembles an expectation. Suppose that the nominal reference levels are densely spaced a uniform distance Δv ≪ σ apart and that x is far from the smallest and largest reference levels. Let V be a random variable distributed according to the offset PDF f_V. The total Fisher information (26) can be approximated as

I(x) = (r/Δv) Σ_i η′(x − v_i) f(x − v_i) Δv   (27)
     ≈ (r/Δv) Σ_i η′(x − v_i) Pr{V ∈ (x − v_{i+1}, x − v_i)}   (28)
     ≈ (r/Δv) E[η′(V)].   (29)

Figure 3  Fisher information contributed by observation T_i with normally distributed offsets. The information is largest when x = v_i.
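The expectation E[η′(V)] in (29) can be evaluated numerically: since η′(u) = f(u)/(F(u)F̄(u)), it equals the integral of f(u)²/(F(u)F̄(u)). The sketch below applies the trapezoidal rule for a unit-variance normal offset and compares the result against the logistic closed form π/√3; the variable names and the integration window are illustrative choices.

```python
import math

def phi(u):
    """Standard normal PDF."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def Phi(u):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

# E[eta'(V)] = integral of f(u)^2 / (F(u)(1 - F(u))) du.
# Trapezoidal rule on [-6, 6]; the integrand is negligible beyond.
N, a, b = 12000, -6.0, 6.0
h = (b - a) / N
total = 0.0
for k in range(N + 1):
    u = a + k * h
    w = 0.5 if k in (0, N) else 1.0
    total += w * phi(u) ** 2 / (Phi(u) * (1.0 - Phi(u)))
E_eta_prime = h * total                     # about 1.806 for sigma = 1

logistic_value = math.pi / math.sqrt(3.0)   # about 1.814, closed form
```

The two distributions give nearly the same constant, so under this approximation the high-resolution CRLB appears fairly insensitive to the exact offset shape, provided the tails are light enough for the integral to converge.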

For a normal distribution with zero mean and variance σ², E[η′(V)] = 1.806/σ by numerical integration. For a logistic distribution with variance σ², it is π/(√3 σ) in closed form. There are some densities, such as the uniform density, for which the Fisher information curve does not have finite area and therefore this approximation is not valid. Figure 4 shows the exact Fisher information as a function of x and Δv for a sample sensor array with normally distributed offsets. The approximation (29) is close when Δv ≪ σ. Thus, the high-resolution CRLB appears to be proportional to σΔv/r, the ratio of the standard deviation of the offsets to the average density of sensor levels.

Figure 4  Fisher information I(x) as a function of x for an array of sensors with nominal levels uniformly spaced Δv apart and normally distributed offsets with unit variance. The peaks of each curve correspond to the nominal levels. As the spacing grows smaller, the Fisher information approaches the high-resolution approximation (29), shown by the horizontal gray line I(x)Δv = 1.806.

4 A Multistage Architecture

Solving the likelihood equation exactly requires the full observation vector T; yet Fig. 3 shows that only a subset of the sensor measurements contribute significant information about x. The calculation could be simplified significantly if the low-information observations were removed. Therefore, we propose the two-step estimation architecture shown in Fig. 5. A simple coarse estimator makes a rough estimate of x and uses it to select a subset of the observations. That lower-dimensional observation vector is then passed to a more complex inference function that makes the final decision or estimate.

Figure 5  A multistage architecture for statistical inference problems. A coarse estimate is used to select a subset of the output statistics that contains most of the information about the parameter.

4.1 Subset Selection

Denote the selected subset by T′. Its length n′ is a design choice that will be discussed later. For simplicity, we will assume that n′ is fixed, though in implementation it may be smaller for x values near the boundaries of the level range. For the remainder of this paper, we relabel F_k, f_k, A_k, and η_k to correspond to the elements T′_k of T′ for k = 1, ..., n′. We assume that n′ is small enough that f_k(x) > 0 for all k.

The likelihood for T′ has the same form as that for T (9):

p_{T′|X}(t′ | x) = h(t′) exp( Σ_{k=1}^{n′} [η_k(x) t′_k + A_k(x)] ).   (30)

Similarly, the Fisher information is

I′(x) = r Σ_{k=1}^{n′} η_k′(x) f_k(x).   (31)

The first stage need not have high resolution, so there are many possible solutions for selecting the subset. One convenient approach is to assign each subset a corresponding coarse estimate and choose one such estimate based on a few of the observations. For example, suppose that the coarse estimator has access to a single sensor output (say, Y_{i,1}) from each nominal reference level and that n′ is even. Let Q = Σ_{i=1}^{n} Y_{i,1} be the sum of these outputs. A simple subset selection rule chooses T′ = (T_{Q−n′/2+1}, ..., T_{Q+n′/2}) and corresponding coarse estimate x_0 = v_1 + Δv(Q − 1/2) for n′/2 ≤ Q ≤ n − n′/2. This rule is used in the simulations in Section 6. The subset selection stage can be made more or less accurate based on the requirements of the inference stage.

4.2 Linear Model

When the subset is restricted to a sufficiently small range of x values, the nonlinear function η_k(x) can be well approximated by a linear function with slope η_k′(x_0), where x_0 is the

coarse estimate from the first stage of the estimator. Then the likelihood function is approximately

p_{T′|X}(t′ | x) ≈ h(t′) exp( Σ_{k=1}^{n′} [η_k′(x_0) x t′_k + A_k(x)] ),   (32)

where h(t′) includes all terms that depend only on t′ and not on x. In that case, the n′-dimensional vector T′ can be reduced to the scalar statistic

S(T′, x_0) = Σ_{k=1}^{n′} η_k′(x_0) T′_k,   (33)

which has conditional mean

E_x[S(T′, x_0)] = Σ_{k=1}^{n′} η_k′(x_0) r F_k(x)   (34)

and conditional variance

Var_x(S(T′, x_0)) = Σ_{k=1}^{n′} η_k′(x_0)² r F_k(x) F̄_k(x).   (35)

Figure 6  Under the linear approximation of the log-likelihood of a subset of observations T′, we can derive a linear estimator for x.
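The two stages of Sections 4.1 and 4.2 fit together in a few lines. The sketch below is an illustrative mock-up with invented names and parameter values, assuming normal offsets: it forms the coarse statistic Q from one sensor per level, selects the n′ statistics around the coarse estimate, and collapses them to the scalar S of (33) using the linearized weights η_k′(x_0).

```python
import math
import random

def phi(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def Phi(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def eta_prime(u):
    """Derivative of the log-odds for a standard normal offset."""
    return phi(u) / (Phi(u) * (1.0 - Phi(u)))

rng = random.Random(3)
n, r, sigma, dv = 12, 16, 1.0, 0.5
levels = [(i + 1) * dv for i in range(n)]     # nominal levels v_1..v_n
n_sub = 4                                     # subset length n'
x = 3.2                                       # unknown input

# Full array outputs Y[i][j] and per-level sums T.
Y = [[1 if x >= v + rng.gauss(0.0, sigma) else 0 for _ in range(r)]
     for v in levels]
T = [sum(row) for row in Y]

# Stage 1: Q counts levels whose first sensor fired, clipped so the
# selected window stays inside the level range.
Q = sum(Y[i][0] for i in range(n))
Q = min(max(Q, n_sub // 2), n - n_sub // 2)
x0 = levels[0] + dv * (Q - 0.5)               # coarse estimate

# Stage 2: keep T_{Q-n'/2+1}..T_{Q+n'/2} (0-based slice below) and
# reduce them to the scalar statistic S of (33).
sub = list(range(Q - n_sub // 2, Q + n_sub // 2))
S = sum(eta_prime((x0 - levels[k]) / sigma) / sigma * T[k] for k in sub)
```

Only the n′ selected counts and their fixed weights reach the second stage, so the fine stage touches n′ ≪ n values; a final estimate of x can then be read off by matching S against its conditional mean (34).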

