FINAL EXAM - Data-8.github.io


DATA 8, FALL 2016
FINAL EXAM
A. Adhikari

NAME (FIRST LAST):
SID:

TIME AND CONDITIONS: 3 hours; closed book/notes/internet; no calculator/computer

QUESTIONS AND ANSWERS

There are 16 questions. Not all questions will take the same amount of time.

You may answer any part of any question. If the answer to one part depends on another that you couldn’t do, you can still provide an answer such as “The answer to part (a), divided by 2.”

When answers involve calculations that can’t easily be done by mental arithmetic, please leave the arithmetic unsimplified, unless you need to carry out a straightforward calculation in order to complete the problem. Leave arithmetic expressions in any form that can be typed (perhaps laboriously) into a calculator to get the decimal answer.

Explanations are expected to be concise. One or two clear sentences should be enough. Calculations and code are sufficient as explanations.

GRADING

The exam is worth 100 points. Questions 1-6 are worth 5 points each. Questions 7-11 are worth 6 points each. Questions 12-16 are worth 8 points each.

We will give partial credit, but only for substantial progress towards a correct answer. We get to decide what “substantial progress” means.

Commit yourself to a single answer for each part of each question. If you give multiple answers (such as both True and False), please don’t expect credit, even if the right answer is among those that you gave.

FORMAT

Please write your name on each page in the space provided. This will identify your work should there be any mechanical problems during scanning.

There is space for your answer below each question. Please do not write outside the black boundary; the scanner and Gradescope won’t read it.

If you need scratch paper, please use the backs of the pages of the exam, but be aware that they will not be graded.

A reference sheet of code and formulas will be provided, but it does not contain everything that was covered in class.

HONOR CODE

Data Science and the entire academic enterprise are based on one quality: integrity. We are all part of a community that doesn’t fabricate evidence, doesn’t fudge data, doesn’t steal other people’s work, doesn’t lie and cheat. You trust that we will treat you fairly and with respect. We trust that you will treat us and your fellow students fairly and with respect. Please abide by UC Berkeley’s Honor Code:

“As a member of the UC Berkeley community, I act with honesty, integrity, and respect for others.”

Please sign here to commit to following the Honor Code:

1. Each individual in a population belongs to one of two classes: a triangle or a square. Two attributes are going to be used to classify new individuals. The training set, consisting of 12 points, is shown on the right. Both of the attributes have been measured in standard units so that distances are comparable on the two axes.

(a) On the graph, mark one new point (not in the training set) that a 3-nearest neighbor classifier using this training set would classify as a triangle. You don’t have to provide reasoning.

(b) The training set is provided again for your reference. This time, the graph also contains a new point not in the training set, shown as a star. Circle the three nearest neighbors (in the training set) of the star, and classify it using two different classifiers below. Just underline the right shape. You don’t have to provide reasoning.

1-nearest neighbor:    Triangle    Square
3-nearest neighbor:    Triangle    Square

(c) Suppose a new point is below average in both attributes. In which class would it be placed by the 3-nearest neighbor classifier? Explain briefly.

2. The figure below appears on the website of the Canadian National Household Survey. The graphs attempt to display the distribution of family income: the graph on the left shows the incomes in 2005 and the one on the right shows incomes in 2010.

In each of the two graphs, the eleventh bar from the left is unusually tall compared to the tenth bar. Explain why.

3. In a population of tiny birds, the diameter of the egg and the weight of the hatchling (the baby bird that hatches from the egg) follow the regression model. The summary statistics in the sample are:

correlation: 0.75

          egg diameter (mm)    bird weight (gm)
mean      23                   6
SD        0.5                  0.4

(a) Find the regression estimate of the weight of a bird that hatches from an egg of diameter 24 mm.

(b) If you use the sample to make a bootstrap prediction interval at x = 24 mm, the interval is for predicting the height of the
(i) regression line
(ii) true line in the regression model
at x = 24. Pick one option and explain your choice.
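
A hand calculation for part (a) can be checked with a few lines of plain Python. This is only a sketch of the standard-units form of the regression estimate; the function name `regression_estimate` and its parameter names are made up for illustration, and the summary statistics are read as mean 23 mm and 6 gm, SD 0.5 mm and 0.4 gm, correlation 0.75.

```python
def regression_estimate(x, mean_x, sd_x, mean_y, sd_y, r):
    """Regression estimate of y at x, computed through standard units."""
    x_su = (x - mean_x) / sd_x      # x converted to standard units
    y_su = r * x_su                 # the regression line in standard units has slope r
    return mean_y + y_su * sd_y     # convert the estimate back to original units

# Summary statistics as read from the question (hypothetical variable name).
estimate = regression_estimate(24, mean_x=23, sd_x=0.5, mean_y=6, sd_y=0.4, r=0.75)
print(estimate)
```

The egg is 2 SDs above average in diameter, so the estimate places the weight 0.75 × 2 = 1.5 SDs above the average weight.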

4. A data science class has 500 students. As part of an assignment, each student tests the fairness of a coin using data from his/her own set of tosses of the coin. All 500 students test the same coin, and they all test the same pair of hypotheses:

Null: The coin is fair.
Alternative: The coin is not fair.

All of the students use the 5% cutoff for the P-value. You can assume that all the students perform the same test based on the same large number of tosses.

Suppose that, unknown to the students, the coin is fair. About how many students will conclude that the coin is not fair? Pick one option and justify your choice.
(i) No students
(ii) 5 students
(iii) 10 students
(iv) 25 students
(v) 250 students
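
The setup in this question can be simulated. The sketch below is one possible version under stated assumptions: each student tosses 1,000 times (the question only says "a large number"), and the test is a two-sided test using the normal approximation, which rejects when the number of heads is more than 1.96 SDs from half the tosses. The name `count_rejections` and its parameters are hypothetical.

```python
import math
import random

def count_rejections(students=500, tosses=1000, seed=0):
    """Count how many students reject 'the coin is fair' when the coin really is fair."""
    rng = random.Random(seed)
    sd_heads = math.sqrt(tosses) / 2      # SD of the number of heads for a fair coin
    cutoff = 1.96 * sd_heads              # two-sided 5% cutoff, normal approximation
    rejections = 0
    for _ in range(students):
        # Each random bit is one fair toss; count the 1s as heads.
        heads = bin(rng.getrandbits(tosses)).count("1")
        if abs(heads - tosses / 2) > cutoff:
            rejections += 1
    return rejections

expected = 0.05 * 500                     # the 5% cutoff applied to 500 students
print(expected, count_rejections())
```

The simulated count varies with the seed, but it should land in the neighborhood of the expected count rather than near 0 or 250.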

5. In a population, 85% of the people are in Class A and the remaining 15% are in Class B. For people in Class A, a classifier has an accuracy of 90% (that is, among Class A people, 90% are classified as Class A and 10% as Class B). For people in Class B, the accuracy of the classifier is 98%.

One person is picked at random from the population.

(a) What is the chance that the person is classified correctly?

(b) Given that the person is classified correctly, what is the chance that the person is in Class B?
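
The two parts correspond to the law of total probability and Bayes' rule. As a rough check of an unsimplified arithmetic expression, the calculation can be typed out as below; the variable names are chosen for illustration only.

```python
# Proportions and accuracies as stated in the question.
p_a, p_b = 0.85, 0.15          # chance of Class A, Class B
acc_a, acc_b = 0.90, 0.98      # classifier accuracy within each class

# (a) Total probability: correct and in A, plus correct and in B.
p_correct = p_a * acc_a + p_b * acc_b

# (b) Bayes' rule: among correctly classified people, the share in Class B.
p_b_given_correct = (p_b * acc_b) / p_correct

print(p_correct, p_b_given_correct)
```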

6. A new function that takes a numerical argument is defined as follows:

    def my_function(c):
        if c < -2:
            return 4
        elif c > 2:
            return 4
        else:
            return abs(c) + 2

(a) Draw the plot generated by the following code. You don’t have to worry about exactly what labels Python will put on the axes. Just make sure the horizontal and vertical coordinates of your points are clear.

    t = Table().with_column('x', np.arange(-3, 3.1, 1))
    t.with_column('y', t.apply(my_function, 'x')).scatter(0, 1)

(b) Pick the option that best completes the sentence, and explain your choice.
The expression minimize(my_function) evaluates to
(i) -3
(ii) 0
(iii) 1
(iv) 2
(v) 3
(vi) 3.1
(vii) 4

7. A hospital system has data on the systolic and diastolic blood pressures (both measured in millimeters of mercury) of hundreds of thousands of patients. Assume that the scatter plot of the two variables is roughly football shaped with an unknown correlation coefficient r.

The table bp consists of one row for each of 300 patients sampled at random from the population of patients. The table has two columns. Column Systolic contains the systolic blood pressures and column Diastolic contains the diastolic blood pressures.

(a) Complete the code below so that the last line evaluates to an array consisting of the end points of an approximate 90% bootstrap confidence interval for r, based on 10,000 repetitions of the bootstrap process. You may use a function corr that takes as its arguments two numerical arrays of the same length and returns the correlation between them. You do not need to define corr.

    r_values = make_array()
    for i in np.arange(__________):
        resample = bp.__________
        new_r = corr(resample.__________, resample.__________)
        r_values = np.append(__________)
    left_end = percentile(__________)
    right_end = percentile(__________)
    make_array(__________)

(b) How would you use the interval constructed in part (a) to test whether or not r = 0.6? Your answer should include the cutoff for the P-value. [No code is required for this answer. Just explain in words.]
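
The bootstrap percentile method described in part (a) can be sketched without the course library. The version below is a standalone illustration, not the expected exam answer: `corr` and `percentile` are simple stand-ins written here (the `percentile` helper assumes the "smallest value at least as large as p% of the values" convention), the blood pressure data are synthetic, and only 1,000 repetitions are used in the example run to keep it quick.

```python
import math
import random

def corr(xs, ys):
    """Correlation: the average product of the two variables in standard units."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n * sd_x * sd_y)

def percentile(p, values):
    """Smallest value that is at least as large as p% of the values."""
    ordered = sorted(values)
    k = max(math.ceil(p * len(ordered) / 100), 1)
    return ordered[k - 1]

def bootstrap_ci_for_r(pairs, repetitions=10_000, level=90, seed=0):
    """Percentile bootstrap interval for the correlation of (x, y) pairs."""
    rng = random.Random(seed)
    n = len(pairs)
    r_values = []
    for _ in range(repetitions):
        # Resample n rows at random with replacement from the original sample.
        resample = [pairs[rng.randrange(n)] for _ in range(n)]
        xs, ys = zip(*resample)
        r_values.append(corr(xs, ys))
    tail = (100 - level) / 2
    return percentile(tail, r_values), percentile(100 - tail, r_values)

# Synthetic stand-in for the 300-row table bp (hypothetical numbers).
data_rng = random.Random(8)
systolic = [data_rng.gauss(120, 15) for _ in range(300)]
sample = [(s, 0.5 * s + data_rng.gauss(20, 8)) for s in systolic]

interval = bootstrap_ci_for_r(sample, repetitions=1000)
print(interval)
```

A 90% interval takes the 5th and 95th percentiles of the bootstrapped correlations, leaving 5% in each tail.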

8. The prices of 152 cars are summarized in the table below. Prices are in thousands of dollars. Each interval includes the left end point but not the right.

interval      number of cars
[10, 22)      26
[22, 27)      26
[27, 34)      30
[34, 46)      29
[46, 58)      14
[58, 70)      14
[70, 110)     13

(a) One of the graphs below is a histogram of these data. Which is it, and why? [No, you don’t need vertical scales or a calculator.]

(i)    (ii)    (iii)

(b) The prices are sorted in increasing order and placed in the array prices. Thus the expression len(prices) evaluates to 152. Here are the first 20 entries of prices:

[Only part of this list survives in the transcription, in scrambled order: 9.01, 16.39, 19.04, 16.91, 19.08, 17.05, 19.14, 18.24, 19.14, 18.25, 19.24, 18.56, 19.32]

What does the following expression evaluate to, and why?

    percentile(10, prices)

9. Researchers studying health insurance in the United States have gathered data on whether or not people are insured.

There are several thousand people in the study. The table insured contains one row for each person. The table has three columns in the following order: the column Name contains the person’s name; Zip Code contains the zip code of the person’s home address; and Insured is a 0/1 variable where 1 means “insured” and 0 means “not insured”.

The table states consists of one row for each zip code in the United States. The first column is labeled Zip Code and contains the zip code; the second column is labeled State and contains the name of the state (such as California, or New York) in which that zip code is located.

Write Python code in each of the following parts. You can use multiple lines of code. The last line of your code should evaluate to the element described in the question.

(a) the proportion of insured people in the study

(b) a state that has the largest number of insured people among all the states represented in the study

(c) a state that has the largest proportion of insured people among all the states represented in the study

10. A population consists of more than half a million people. Histogram A below is an empirical histogram of the mean weight (in pounds) of a random sample of 100 people drawn with replacement from the population, based on 25,000 repetitions of the sampling process. Histogram B is an empirical histogram of the mean weight of a random sample drawn with replacement from the population, also based on 25,000 repetitions, but the sample size is unknown.

(a) Pick one option and justify your choice:
The SD of the 25,000 sample means used to construct Histogram A is closest to
(i) 1 pound
(ii) 2 pounds
(iii) 3 pounds
(iv) 4 pounds
(v) 10 pounds
(vi) 20 pounds

(b) Pick one option and justify your choice:
The size of each of the 25,000 samples whose means were used to construct Histogram B is closest to
(i) 100
(ii) 200
(iii) 400
(iv) 800
(v) 1600

11. The plot on the right shows 15 points along with the regression line. The data represent thousands of women in the United States, grouped by height to the nearest inch. For example, all the women whose heights are 62 inches to the nearest inch form one group. The value on the horizontal axis is the height to the nearest inch, and the value on the vertical axis is the average weight of women in the corresponding group. The correlation is about 0.995.

(a) One of the graphs below is the residual plot of this regression. Which is it, and why?

(b) If you draw a scatter plot consisting of one point for each of the thousands of women, with her height on the horizontal axis and her weight on the vertical, will your scatter show a correlation of about 0.995, more than 0.995, or less than 0.995? Pick one option and explain your choice with reference to the scatter plot of heights to the nearest inch and average weights given in this problem.

12. In a large random sample of U.S. households, the median annual income is $54,000. This original sample is bootstrapped 5,000 times and the sample median is recorded for each of the bootstrap samples. The middle 95% interval of these values is ($53,000, $55,000).

(a) True or false (explain your answer):
The interval ($53,000, $55,000) is an approximate bootstrap 95% confidence interval for the median income of all the households in the sample.

(b) Pick the option that you think best completes the sentence, and explain your choice.
The percent of all U.S. households with annual incomes in the range ($53,000, $55,000)
(i) is about 95%.
(ii) is about 50%.
(iii) cannot be approximated based on the information given.

(c) Pick the option that you think best completes the sentence, and explain your choice.
If you calculate the mean of each of the 5,000 bootstrap samples and take the middle 95% interval of the 5,000 means, the center of the new interval will be
(i) less than $54,000.
(ii) about $54,000.
(iii) more than $54,000.

13. The “handedness” of a person refers to whether the person mainly uses their left hand or right hand; some people are equally at ease with both hands and are called “ambidextrous”. In a study of whether handedness is related to gender, a random sample of 1,000 people was taken in a county. There were 488 men and 512 women in the sample, and the distributions of handedness of males and females came out as follows:

[The table of handedness proportions for males and females did not survive transcription; the recoverable fragments are the labels “right handed” and “left” and the digits 0790.006.]

(a) To test whether or not handedness and gender are related, we need null and alternative hypotheses. Does the null hypothesis say that the two distributions displayed above are the same? If not, which two distributions does it compare, and what does it say about them?

(b) State the alternative hypothesis.

(c) Justify a choice of test statistic and find its observed value in the sample.

(d) To carry out the test, the process starts with (pick one option and justify your choice):
(i) drawing 512 times at random with replacement from the distribution of males in the table above.
(ii) drawing 488 times at random with replacement from the distribution of females in the table above.
(iii) permuting all 1000 people and labeling the first 488 “male” and the remaining 512 “female”.

14. The code below generates a plot.

    data = Table().with_columns('x', make_array(-1, 2, 0),
                                'y', make_array( 2, -4, 0))

    def mse(slope):
        intercept = 0
        predictions = slope*data.column('x') + intercept
        return np.mean((predictions - data.column('y'))**2)

    slopes = Table().with_column('potential slope', np.arange(-3, 1, 1))
    mses = slopes.apply(mse, 'potential slope')
    slopes.with_column('MSE', mses).scatter('potential slope', 'MSE')

(a) Draw the plot. Don’t worry about the labels that Python will put on the axis. Just make sure that you provide coordinates of some points so that it is clear what you are plotting.

(b) Consider the following four equations for lines. Among these, which has the lowest mean-squared error in predicting the ‘y’ column of data based on the ‘x’ column, according to the plot you made?
(i) y = -3*x + 0
(ii) y = -2*x + 0
(iii) y = -1*x + 0
(iv) y = 0*x + 0
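
The points on the plot can be worked out by hand, and the arithmetic can be checked with a plain-Python version of the same mean squared error calculation. This sketch replaces the table and array operations with lists; it assumes the data are x = (-1, 2, 0) and y = (2, -4, 0) and that the slopes tried are -3, -2, -1 and 0, as produced by np.arange(-3, 1, 1).

```python
# Data from the question, as plain lists instead of table columns.
xs = [-1, 2, 0]
ys = [2, -4, 0]

def mse(slope, intercept=0):
    """Mean squared error of the line y = slope*x + intercept on the data."""
    squared_errors = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return sum(squared_errors) / len(squared_errors)

# One (slope, MSE) pair per point on the scatter plot.
points = [(slope, mse(slope)) for slope in range(-3, 1)]
best_slope = min(points, key=lambda point: point[1])[0]

for slope, error in points:
    print(slope, error)
```

Because every y value here is exactly -2 times the corresponding x value, each error is (slope + 2) times x, which explains the symmetric shape of the plotted points around one slope.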

15. A random sample of 1,000 12-year-olds in a state took a multiple choice test. One of the questions had five possible answers, one of which was correct. Test results showed that 180 of the 1000 students got that question right.

This alarmed some educators, who said, “The kids did worse than they would have by random guessing!” But other educators said the results were like random guessing, allowing for chance variation.

Show how to perform a statistical test to see which educators’ viewpoint is better supported by the data, in the following steps.

(a) State the null hypothesis as a clearly specified chance model.

(b) State the alternative hypothesis. Keep in mind that the goal of the statistical test is to decide between the two viewpoints of the educators.

(c) Suppose the test is performed using as its test statistic the number of students who get the answer right. Draw a sketch of the empirical distribution of this statistic under the null hypothesis. Mark the observed value of the test statistic in a reasonable place on the horizontal axis (it doesn’t have to be exact but it should make sense).

(d) On the sketch above, shade the area corresponding to the P-value. In the space below, explain why you chose to shade that region.
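
If the null hypothesis is taken to be that each of the 1,000 students guesses at random, independently, with chance 1/5 of being right, then the number of correct answers is binomial and the left-hand tail at the observed value can be computed exactly. The sketch below assumes that chance model and a one-sided (left tail) P-value; the function name `binomial_lower_tail` is made up for illustration.

```python
import math

def binomial_lower_tail(k_max, n, p):
    """Chance of k_max or fewer successes in n independent trials with success chance p."""
    return sum(
        math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(k_max + 1)
    )

# Under random guessing, the expected count is 1000 * 1/5 = 200 and the
# SD is sqrt(1000 * 0.2 * 0.8), so 180 sits somewhat below the center.
expected_count = 1000 / 5
sd_count = math.sqrt(1000 * 0.2 * 0.8)
p_value = binomial_lower_tail(180, 1000, 1 / 5)

print(expected_count, sd_count, p_value)
```

The left tail is used because the alarmed educators' claim is that students did worse than guessing, so small counts are the ones that favor the alternative.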

16. Bootstrapping is a way of replicating a sample so that you get a sample that is similar but most likely not exactly the same as the original sample. However, there is a chance that a bootstrap sample is exactly the same as the original. In this problem you will find that chance.

(a) The original sample consists of four people: John, Paul, George, and Ringo. This sample will be bootstrapped. Find the chance that all four people appear in the bootstrap sample. Your answer should just be an arithmetic expression; no code is needed.

(b) The original sample consists of N people. The sample will be bootstrapped. Write a Python function called same that takes N as its argument and returns the chance that all N people appear in the bootstrap sample. [There are many different ways of writing this code. Any correct way is fine.]
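
As the question notes, there are many ways to write `same`. One possible version is sketched below, based on the counting argument that a bootstrap sample of size N has N^N equally likely sequences of draws, of which N! contain every person exactly once. The helper `same_by_enumeration` is a hypothetical brute-force check that is only practical for small N.

```python
import math
from itertools import product

def same(N):
    """Chance that all N people appear in a bootstrap sample of size N."""
    return math.factorial(N) / N ** N

def same_by_enumeration(N):
    """Brute-force check: list every possible sequence of N draws with replacement."""
    outcomes = list(product(range(N), repeat=N))
    all_present = sum(1 for draws in outcomes if len(set(draws)) == N)
    return all_present / len(outcomes)

# For the four-person sample in part (a): 4*3*2*1 favorable sequences out of 4**4.
print(same(4), same_by_enumeration(4))
```

Equivalently, the first draw can be anyone, the second must be one of the N - 1 people not yet drawn, and so on, giving (N/N) × ((N-1)/N) × ... × (1/N).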
