Lecture 3 Aggregation And Least Squares - Historyofdsc


Lecture 3: Aggregation and Least Squares
History of Data Science, Spring 2022 @ UC San Diego
Suraj Rampure

Announcements
- Homework 3 is released and is due on Sunday, April 17th at 11:59PM.
- Homework 1 is graded! Make sure to look at the solutions, posted on Slack and on the course website.

Agenda
- Pythagorean means.
- Tycho Brahe's use of the mean.
- A precursor to least squares: Boscovich's method.
- Legendre, Gauss, and least squares.

Means

Means
- The concept of the "arithmetic mean" was known to the Pythagoreans. In fact, they are known for establishing three types of means.
- However, means were not used to summarize data until much, much later.

Pythagorean means
$$\text{mean}(a, b) = \frac{a + b}{2}$$
From Archytas (member of the Pythagorean school of thought)¹:
"There are three 'means' in music: one is the arithmetic, the second is the geometric, and the third is the subcontrary, which they call 'harmonic'. The arithmetic mean is when there are three terms showing successively the same excess: the second exceeds the third by the same amount as the first exceeds the second. In this proportion, the ratio of the larger numbers is less, that of the smaller numbers greater."
In symbols, for three terms $a$, $b$, $c$: $a - b = b - c$, so $b = \frac{a + c}{2}$.
General arithmetic mean: $\frac{a_1 + a_2 + \cdots + a_n}{n}$
1. http://www.cs.uni.edu/ campbell/stat/pyth.html

Pythagorean means
From Archytas (member of the Pythagorean school of thought)¹:
"The geometric mean is when the second is to the third as the first is to the second; in this, the greater numbers have the same ratio as the smaller numbers."
In symbols: $\frac{a}{b} = \frac{b}{c}$, so $b = \sqrt{ac}$.
"The subcontrary, which we call harmonic, is as follows: by whatever part of itself the first term exceeds […] the third. In this proportion, the ratio of the larger numbers is larger, and of the lower numbers less."
In symbols: $\frac{a - b}{a} = \frac{b - c}{c}$, so $b = \frac{2ac}{a + c}$, i.e. $\frac{1}{b} = \frac{1}{2}\left(\frac{1}{a} + \frac{1}{c}\right)$.
General harmonic mean: $\frac{n}{\frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_n}}$
1. http://www.cs.uni.edu/ campbell/stat/pyth.html
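The three means can be computed directly from their general formulas. Here is a minimal Python sketch. The function names are illustrative, and the code assumes all inputs are positive.

```python
import math

def arithmetic_mean(values):
    # (a_1 + a_2 + ... + a_n) / n
    return sum(values) / len(values)

def geometric_mean(values):
    # n-th root of a_1 * a_2 * ... * a_n, computed with logarithms
    # to avoid overflow; assumes every value is positive
    return math.exp(sum(math.log(v) for v in values) / len(values))

def harmonic_mean(values):
    # n / (1/a_1 + 1/a_2 + ... + 1/a_n); assumes every value is positive
    return len(values) / sum(1 / v for v in values)

terms = [8, 2]  # hypothetical first and third terms
print(arithmetic_mean(terms))  # 5.0
print(geometric_mean(terms))   # approximately 4.0
print(harmonic_mean(terms))    # 3.2
```

With a first term of 8 and a third term of 2, the harmonic middle term 3.2 satisfies $\frac{a-b}{a} = \frac{b-c}{c}$. The first term exceeds the middle term by 4.8, which is 0.6 of 8. The middle term exceeds the third by 1.2, which is 0.6 of 2. Python's standard `statistics` module also provides `harmonic_mean` and, from Python 3.8 onward, `geometric_mean`.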

Tycho Brahe
- Recall, Tycho Brahe (1546-1601) was a Danish astronomer.¹
- He was a pioneer in measuring the positions of stars in the night sky without the use of telescopes.
- Kepler used Brahe's data when creating his laws of planetary motion.
- He is also one of the earliest scientists documented as having used the mean to combine observations.²
- He also supposedly lost his nose in a fight and wore a fake nose.
(Figure: Tycho Brahe's triangular sextant)
1. anish-astronomer
2. Pearson and Kendall, Studies in the History of Probability and Statistics, p. 122-123

Right ascension
- One of the earliest documented examples of combining observations is in the work of Tycho Brahe, who was measuring the right ascension of α Arietis (a star).
- Right ascension is the celestial equivalent of longitude on Earth.
- It is measured in units of time, relative to when a reference point (the "vernal equinox") crosses the meridian overhead.
- e.g. if an object's right ascension is 2 hours and 15 minutes, you will see it cross your meridian (the north-south line through the point directly overhead) 2 hours and 15 minutes after the reference point does.
- This is similar to GMT-8 meaning "8 hours behind Greenwich Mean Time."
(Figure: diagram labeled "apparent path of sun")

- Brahe collected several measurements of the right ascension of α Arietis from 1582 to 1588, with the goal of coming up with a single value.
- He selected 3 values from 1582, and 12 values from the next 6 years, each of which was the mean of two other observations.
- Question: how do we interpret these numbers and verify that he did indeed take the mean of each pair?
Source: Pearson and Kendall, Studies in the History of Probability and Statistics, p. 122-123

Aside: measuring time in degrees
- Right ascension is measured in time and can vary from 0 hours to 24 hours, because one rotation of the Earth takes 24 hours.
- A circle has 360° in it, so one way of describing time is with the equivalence $360° = 24$ hours.
- This means that $15° = 1$ hour, and $1° = 4$ minutes.
- We can further subdivide each degree into 60 arcminutes, denoted by ′, and each arcminute into 60 arcseconds, denoted by ″.
- As an example, let's try to convert the following measurement into regular minutes: 82° 15′ 10″

Worked example: 82° 15′ 10″
1. Convert to degrees: $82 + \frac{15}{60} + \frac{10}{3600} \approx 82.2528°$
2. Convert to minutes, using $1° = 4$ minutes: $4 \times \left(82 + \frac{15}{60} + \frac{10}{3600}\right) = 328 + 1 + 0.0111 \approx 329.01$ minutes

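A second worked example converts what appears to be 34° 13′ 4″ (the handwritten work is partly illegible). The steps are the same: (1) convert to degrees, then (2) convert to minutes. The whole degrees alone give $4 \times 34 = 136$ minutes, so the result is roughly 2 hours, 16 minutes. The arcminutes and arcseconds add less than one more minute.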
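The two conversion steps can be sketched in Python. This is a minimal sketch, and the helper names are hypothetical rather than taken from the course notebook.

```python
def dms_to_degrees(degrees, arcminutes, arcseconds):
    # 1 degree = 60 arcminutes = 3600 arcseconds
    return degrees + arcminutes / 60 + arcseconds / 3600

def degrees_to_time_minutes(deg):
    # 360 degrees = 24 hours = 1440 minutes, so 1 degree = 4 minutes
    return 4 * deg

minutes = degrees_to_time_minutes(dms_to_degrees(82, 15, 10))
hours, rem = divmod(minutes, 60)
print(round(minutes, 4))                      # 329.0111
print(int(hours), "h", round(rem, 2), "min")  # 5 h 29.01 min
```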

Back to Brahe's data
- Now that we know how to interpret these numbers, we can verify that the operation Brahe used on each pair was the mean.
- Strategy: to compute mean(d1, d2):
  - Convert d1 and d2 to minutes (i.e. regular numbers) and compute their mean.
  - Convert the mean back into degrees-arcminutes-arcseconds.
- Let's try this in a Jupyter Notebook!
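Here is a minimal sketch of this strategy in Python. The two readings are hypothetical values, not Brahe's data. Rounding the result to the nearest whole arcsecond is an assumption made for display.

```python
def dms_to_minutes(d, m, s):
    # Degrees-arcminutes-arcseconds -> time minutes (1 degree = 4 minutes)
    return 4 * (d + m / 60 + s / 3600)

def minutes_to_dms(minutes):
    # Time minutes -> degrees -> total arcseconds, rounded to the nearest
    # whole arcsecond (assumption), then split into (degrees, arcminutes, arcseconds)
    total_arcsec = round(minutes / 4 * 3600)
    d, rest = divmod(total_arcsec, 3600)
    m, s = divmod(rest, 60)
    return d, m, s

def mean_dms(first, second):
    # Convert both readings to minutes, average them, and convert back
    return minutes_to_dms((dms_to_minutes(*first) + dms_to_minutes(*second)) / 2)

print(mean_dms((26, 0, 20), (26, 1, 0)))  # (26, 0, 40)
```

Working in total arcseconds before splitting the result avoids awkward carries, such as 60″ appearing instead of 1′.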

Reducing observational error
- The values in the right-most column are far less spread out than the values in the middle column. Brahe used the mean to eliminate systematic errors.¹
- The final right ascension that Brahe reported was 26° 0′ 30″. This is very close to both the mean of all 15 numbers in the right column and the mean of just the bottom 12.
- Per his biographer¹, the correct value of the right ascension of α Arietis at the time was 26° 0′ 45″, only 15″ away from Brahe's value.
1. Pearson and Kendall, Studies in the History of Probability and Statistics, p. 122-123

The mean and least squares

For context
- Without proper context, it may not be clear what aggregation (e.g. taking the mean or median of a set of values) has to do with least squares. As you learned in DSC 10, least squares is the foundation of linear regression.
- This connection is made clearer in DSC 40A.
- We'll spend a little bit of time providing this context as we move into the origins of least squares.

Making predictions
- As you've seen in DSC 10, the slope and intercept of the line of best fit come from finding the values of $a$ and $b$ that minimize mean squared error:
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\big(y_i - (a + b x_i)\big)^2$$
- What if we want to use a simpler prediction technique and make the same constant prediction for every observation?
- To do this, we'd need to find the constant $c$ that minimizes mean squared error:
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - c)^2$$

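$$R(c) = \frac{1}{n}\sum_{i=1}^{n}(y_i - c)^2 \quad\text{is minimized by}\quad c^* = \frac{y_1 + y_2 + \cdots + y_n}{n} = \bar{y}$$
Minimizing the squared error (least squares) leads to the mean (aggregation).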
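A short calculus argument confirms that the mean is the minimizer. This is a standard derivation, included as a sketch rather than a reproduction of the slide. With $R(c) = \frac{1}{n}\sum_{i=1}^{n}(y_i - c)^2$:

$$
\begin{aligned}
\frac{dR}{dc} &= \frac{1}{n}\sum_{i=1}^{n} 2(y_i - c)(-1) = -\frac{2}{n}\sum_{i=1}^{n}(y_i - c) \\
0 &= \sum_{i=1}^{n}(y_i - c^*) = \sum_{i=1}^{n} y_i - n c^* \\
c^* &= \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}
\end{aligned}
$$

Since $\frac{d^2R}{dc^2} = 2 > 0$, this critical point is a minimum.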

Other types of error
- Why do we minimize mean squared error?
- Instead of squaring the errors before taking the mean, is there another operation we could apply?
- We could've used the absolute value, giving mean absolute error:
$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}|y_i - c|$$
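It is a standard result, though not stated on this slide, that mean squared error is minimized by the mean and mean absolute error is minimized by the median. A brute-force check makes this concrete. This is a minimal sketch on hypothetical data, searching a grid of candidate constants in steps of 0.01.

```python
import statistics

def mse(values, c):
    # Mean squared error of the constant prediction c
    return sum((y - c) ** 2 for y in values) / len(values)

def mae(values, c):
    # Mean absolute error of the constant prediction c
    return sum(abs(y - c) for y in values) / len(values)

ys = [2, 3, 5, 7, 23]  # hypothetical data with one large value

# Try every constant from 0.00 to 30.00 in steps of 0.01
grid = [i / 100 for i in range(0, 3001)]
best_mse = min(grid, key=lambda c: mse(ys, c))
best_mae = min(grid, key=lambda c: mae(ys, c))

print(best_mse, statistics.mean(ys))
print(best_mae, statistics.median(ys))
```

On this data, the squared-error search lands on 8.0 (the mean) and the absolute-error search lands on 5.0 (the median). The single large value 23 pulls the mean upward but does not move the median.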

Mean squared error vs. sum of squared errors
- Minimizing mean squared error is the same as minimizing the sum of squared errors, since the sum is just $n$ times the mean.
- Key idea: the value of $x$ that minimizes $f(x)$ is the same value of $x$ that minimizes $k \cdot f(x)$, if $k$ is some positive constant.
- Many of the original authors we will study aimed to minimize the sum of squared errors, not the mean. This is the same task.

Boscovich’s method

The length of a meridian arc
- A meridian arc is a curve drawn between two points on the surface of the Earth that have the same longitude.
- In the mid-1700s, geodesists were concerned with studying the shape of the Earth.
- The Earth is an ellipsoid that is slightly flatter at the poles than it is at the equator.
- Their goal at the time was to determine the relationship between the length of one degree of latitude near the North Pole and the length of one degree of latitude elsewhere on Earth.
- To do this, they measured the lengths of several meridian arcs.

Boscovich's data
- Roger Joseph Boscovich (1711-1787) was a Dalmatian astronomer, mathematician, and Jesuit priest.
- He obtained data containing the length of one degree of latitude at five different spots on Earth.
(Figure: map of the measurement locations, labeled Croatia, Ecuador, and Finland)
Source: Stigler, Studies in the History of Probability and Statistics, p. 43

The model
- $a$: arc length (known); $\theta$: latitude (known); $z$ and $y$: unknowns.
- A rough approximation for the length of an arc is
$$a = z + y \sin^2 \theta$$
where $z$ is the length of a degree at the equator and $y$ is the "excess".
- If $y = 0$, then the Earth is a perfect sphere, and meridian arcs have the same length ($z$) at any latitude.
- If $y > 0$, the Earth is flatter towards the poles, and meridian arcs range from length $z$ at the equator to length $z + y$ at the North Pole.

An abundance of data
$$a = z + y \sin^2 \theta$$
- If Boscovich had just 2 observations, he'd have a system of two equations and two unknowns, and he would be able to solve for $z$ and $y$.
- However, he had 5 observations, and he had to devise a method of computing $z$ and $y$ that uses all 5 observations.
- Ideas?
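With only two observations, the model gives two linear equations in $z$ and $y$, which can be solved exactly. This is a minimal sketch; the observations are hypothetical values in made-up units, not Boscovich's data.

```python
import math

def predicted_arc(z, y, theta_deg):
    # Model from the slide: a = z + y * sin^2(theta)
    return z + y * math.sin(math.radians(theta_deg)) ** 2

def solve_two_observations(obs1, obs2):
    # Each observation (theta_deg, a) gives one equation a = z + y * s, with s = sin^2(theta).
    # Subtracting the two equations eliminates z; requires s1 != s2 (different latitudes).
    (t1, a1), (t2, a2) = obs1, obs2
    s1 = math.sin(math.radians(t1)) ** 2
    s2 = math.sin(math.radians(t2)) ** 2
    y = (a2 - a1) / (s2 - s1)
    z = a1 - y * s1
    return z, y

# Hypothetical observations: (latitude in degrees, arc length)
z, y = solve_two_observations((0, 100.0), (60, 103.0))
print(z, y)
```

This gives $z = 100$ and $y \approx 4$. A third observation that does not lie exactly on this curve cannot be fit without error. That is the problem Boscovich faced with five observations.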

Boscovich's method
- For each of our five observations $(\theta_i, a_i)$, we can write $a_i \approx z + y \sin^2 \theta_i$.
- Boscovich described a method for selecting $z$ and $y$:
  1. For each $i$, write $e_i = a_i - (z + y \sin^2 \theta_i)$.
  2. Choose $z$ and $y$ such that $\sum_i e_i = 0$ and the sum of absolute errors $\sum_i |e_i|$ is minimized.
- What does this resemble?
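The two conditions can be turned into a short procedure. The first condition forces the fitted curve through the point of averages, $(\bar{s}, \bar{a})$ with $s = \sin^2\theta$. This leaves a one-variable problem: minimize a sum of absolute values in $y$. That objective is convex and piecewise linear, so its minimum lies at one of the kink points. This is a minimal sketch under those observations, not a reconstruction of Boscovich's own geometric procedure. The data are hypothetical.

```python
import math

def boscovich_fit(thetas_deg, arcs):
    """Fit a = z + y*sin^2(theta) so that (1) the errors sum to zero and
    (2) the sum of absolute errors is as small as possible."""
    s = [math.sin(math.radians(t)) ** 2 for t in thetas_deg]
    n = len(s)
    s_bar = sum(s) / n
    a_bar = sum(arcs) / n

    # Condition (1) forces z = a_bar - y * s_bar, so e_i = (a_i - a_bar) - y * (s_i - s_bar)
    x = [si - s_bar for si in s]
    d = [ai - a_bar for ai in arcs]

    def total_abs_error(y):
        return sum(abs(di - y * xi) for di, xi in zip(d, x))

    # Condition (2): total_abs_error is convex and piecewise linear in y,
    # so its minimum is attained at one of the kink points y = d_i / x_i
    candidates = [di / xi for di, xi in zip(d, x) if xi != 0]
    y = min(candidates, key=total_abs_error)
    z = a_bar - y * s_bar
    return z, y

thetas = [0, 20, 40, 65, 80]                 # hypothetical latitudes (degrees)
arcs = [100.2, 100.3, 101.9, 103.1, 104.0]   # hypothetical arc lengths
z, y = boscovich_fit(thetas, arcs)
residuals = [a - (z + y * math.sin(math.radians(t)) ** 2) for t, a in zip(thetas, arcs)]
print(z, y, sum(residuals))
```

By construction, the residuals sum to zero, up to floating-point error. The remaining freedom in $y$ is spent on making the total absolute error as small as possible.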

Least squares

Legendre
- Adrien-Marie Legendre (1752-1833) was a French mathematician who was also active in the field of geodesy.¹
- In 1791, the French Academy of Sciences defined a meter as one ten-millionth of the length of the meridian arc starting at the North Pole, passing through Paris, and ending at the equator.
- He helped measure the length of a meter.
1. Legendre

Legendre's least squares
- In an 1805 paper about measuring the orbits of comets, Legendre published an appendix titled "Sur la Methode des moindres quarres" ("On the method of least squares"). It detailed a general procedure for estimating coefficients of linear equations.
- He wrote (translated): "Of all the principles which can be proposed for [making estimates from a sample], I think there is none more general, more exact, and more easy of application, than that of which we have made use which consists of rendering the sum of the squares of the errors a minimum."
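Legendre's appendix is stated for linear equations in general. As a hedged illustration, here is his criterion applied to the arc model from the Boscovich slides, using hypothetical data. With one slope and one intercept, the minimizer of the sum of squared errors has the standard closed form used in simple linear regression. This is a minimal sketch, not Legendre's own worked computation.

```python
import math

def least_squares_fit(thetas_deg, arcs):
    """Choose z, y minimizing sum((a_i - z - y*sin^2(theta_i))^2)."""
    s = [math.sin(math.radians(t)) ** 2 for t in thetas_deg]
    n = len(s)
    s_bar = sum(s) / n
    a_bar = sum(arcs) / n
    # Closed form for one predictor: slope = Sxy / Sxx, intercept = a_bar - slope * s_bar
    # (requires at least two distinct latitudes so that Sxx > 0)
    sxy = sum((si - s_bar) * (ai - a_bar) for si, ai in zip(s, arcs))
    sxx = sum((si - s_bar) ** 2 for si in s)
    y = sxy / sxx
    z = a_bar - y * s_bar
    return z, y

def sum_squared_errors(thetas_deg, arcs, z, y):
    return sum((a - (z + y * math.sin(math.radians(t)) ** 2)) ** 2
               for t, a in zip(thetas_deg, arcs))

thetas = [0, 20, 40, 65, 80]                 # hypothetical latitudes (degrees)
arcs = [100.2, 100.3, 101.9, 103.1, 104.0]   # hypothetical arc lengths
z, y = least_squares_fit(thetas, arcs)
print(z, y, sum_squared_errors(thetas, arcs, z, y))
```

Any small change to the fitted $z$ or $y$ increases the sum of squared errors, which is exactly the property Legendre's principle asks for.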

Gauss
- Carl Friedrich Gauss (1777-1855)¹ was a German mathematician and is one of the most accomplished mathematicians of all time.
- He is known for developing or contributing to:
  - Least squares.
  - The normal (Gaussian) distribution.
  - Algebra and number theory. (He supposedly summed the positive integers between 1 and 100 very quickly.)
  - Electromagnetism.
  - Not Gaussian elimination, despite the name!
1. h-Gauss

Gauss and least squares
- In 1809, Gauss published "Theory of the Motion of the Heavenly Bodies Moving About the Sun in Conic Sections", and in it he used the method of least squares to calculate the shapes of orbits.
- Legendre published about least squares in 1805, 4 years before. However, Gauss claimed to have known about least squares since 1795.
- Evidence: Gauss was able to predict the precise location of the planetoid Ceres using his method of least squares.
  - Ceres was observed starting on January 1st, 1801, for a period of 40 days.
  - Several astronomers competed to predict where it would be spotted again, and Gauss's prediction was the only correct one.²
1. h-Gauss
2. uss-and-the-method-of-least-squares

Error distributions
- One of the key differences between Gauss's and Legendre's approaches to least squares was that Gauss linked the theory of least squares to probability theory.
- Specifically, he posed the least squares model
$$y_i = a + b x_i + \epsilon_i$$
where $\epsilon_i$ is a random variable that follows the error distribution
$$\phi(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
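One standard way to see the link between this error distribution and least squares is the modern maximum-likelihood argument, which is not necessarily how Gauss presented it. Assume independent residuals with mean 0 and a fixed $\sigma$. Then the negative log-likelihood equals the sum of squared errors divided by $2\sigma^2$, plus a constant. This is a minimal numerical check with hypothetical residuals.

```python
import math

def normal_pdf(x, mu, sigma):
    # phi(x | mu, sigma) = exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def negative_log_likelihood(residuals, sigma):
    # Treat each residual as an independent draw from a normal distribution with mean 0
    return -sum(math.log(normal_pdf(r, 0.0, sigma)) for r in residuals)

residuals = [0.5, -1.2, 0.3, 0.9, -0.4]  # hypothetical values of y_i - (a + b x_i)
sigma = 1.5                              # assumed fixed, known spread
n = len(residuals)
sse = sum(r ** 2 for r in residuals)

# SSE / (2 sigma^2) plus a term that does not depend on a or b
rhs = sse / (2 * sigma ** 2) + n * math.log(math.sqrt(2 * math.pi) * sigma)
print(negative_log_likelihood(residuals, sigma), rhs)
```

Because the extra term does not involve $a$ or $b$, choosing $a$ and $b$ to maximize the likelihood is the same as choosing them to minimize the sum of squared errors.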

We will study Gauss’ derivation of the (now-called) Gaussian/normal distribution in Lecture 5.

Summary, next time

Summary, next time
- Many of the advances in aggregation and statistical estimation from the 1500s to the 1800s were motivated by geodesy and astronomy:
  - Tycho Brahe's use of the mean.
  - Boscovich's method regarding meridian arcs.
  - Legendre's method of least squares.
- Next time: percentiles and regression.
