Stat 152 Team 1 - Weight Gain
December 15, 2006

Contents
1 Authors
2 Introduction
3 Summary of Findings
4 Methodology
5 Questionnaire design
6 Findings
7 Problems Encountered
8 Conclusion and Suggestions
9 Group Reports

1 Authors
Michael Hoisie
Kevin Teng
Alex Young

2 Introduction

The purpose of this survey is to determine what factors affect weight change of UC Berkeley engineering students. We test a variety of factors and how they are correlated to weight change, including the amount spent on food, exercise time, class year, current unit count, total course count, age, and gender.

Weight-related issues have become leading public health concerns in the United States. For instance, over the past two and a half decades, obesity has doubled in adults and tripled in children and adolescents, and over 32% of adults ages 20 and older were obese [1]. Because of these concerns, it is important to address which factors affect weight in Berkeley students.

There are many factors that can affect the weight of college students. Primarily, most students are overloaded with schoolwork, and may not be able to find time for eating healthy meals and exercising. Students who are subscribed to a meal plan may be more likely to gain weight by having access to unlimited quantities of food, and older students who do not have access to cafeteria food may eat more sporadically. Another factor is money: students with more money are capable of buying more food (or higher quality foods). Because the weight trends in college are likely to be maintained after graduation, it is important to address any trends in weight change and identify the relevant factors.

[1] Ogden et al. Prevalence of Overweight and Obesity in the United States, 1999-2004. JAMA 2006.

3 Summary of Findings

In our sample, 54 out of the 257 people surveyed gained weight in the past year, 39 lost weight, and the rest did not report any significant weight changes. Most of the factors that we recorded did not have significant effects on weight gain or loss. When we regressed weight change on spending on food, there was no significant correlation. There was also no significant relationship between weight change and exercise time, course load, or living status. There are two factors, however, that seemed to have an effect on weight change. One of them was gender: males tended to lose more weight over the past year than females. The other factor that seemed to have an effect on weight change was class: sophomores and seniors tended to gain more weight than other classes.

4 Methodology

Our designated target population is all upperclassmen at UCB College of Engineering. This target population was selected because the College of Engineering is smaller and more restrictive than other colleges, yet it is still a relatively good representative of the total student body population. Information about courses in the College of Engineering is readily available online, so planning the survey and selecting sampling units is easier than for other colleges. Although we limit our scope to engineering students, the quality of statistics is accurate enough so that our analyses and results should not be compromised.

Our sampled population is respondents of the distributed questionnaire. Since we carried out the survey in both lectures and discussions, the sampling frame consists of UCB upper division engineers who attend the upper division engineering courses in which we administer the survey.

We estimated our population size, the number of all upperclassmen at UCB College of Engineering, to be $N = 1765.127$. The estimation is based on the 2005-2006 College of Engineering statistics, provided by the Office of Student Research [2]. This is the best estimate of our target population since the 2006-2007 data has not yet been published. However, the number of students in the College of Engineering is unlikely to change significantly over the course of one year, so the 2005-2006 estimates should be sufficient.

We can then determine the minimum number of students for our sample size, $n$, by using the following expression:

$$n_0 = \frac{z_{\alpha/2}^2 S^2}{e^2}$$

And the final estimate:

$$n = \frac{n_0}{1 + \frac{n_0}{N}}$$

We obtained a minimum sample size of $n = 235$ for a 90% confidence interval and a margin of error of 0.05. Unfortunately, at the time of the survey, we had not yet learned about the design effect (deff) and thus did not take it into account in calculating the minimum sample size. The actual minimum sample size for our complex survey is deff*n. The deff can be computed through the following expression:

$$\mathrm{deff} = 1 + \delta(n - 1)$$

Here, $n$ represents the average cluster size (not the sample size above), or 46.7 students enrolled in a class. Meanwhile, $\delta$ is the intraclass correlation, which represents the likelihood that two elements within the same cluster have the same value for a given statistic relative to two elements chosen completely at random in the population. Unfortunately this cannot be calculated before the survey data has been collected. If we make the assumption that the intraclass correlation is negligible, then our calculated minimum sample size using the SRS method is still justifiable.

[2] http://calprofiles.berkeley.edu/index.cfm

5 Questionnaire design

Although there were several strategies used to design the questionnaire, the two overarching principles were to keep it simple, and to design the questions to maximize responses. In order to keep the survey as short as possible, questions which were likely to produce vague responses were omitted. Many of the initial questions were designed to elicit responses that provided interesting background information, but were not wholly relevant to the survey goal. These questions were gradually stripped from the questionnaire because they added unnecessary complexity. Some questions were omitted because they weren't relevant to the sample population. Also, some categorical questions were omitted (e.g. "What is your ethnicity?") because they weren't likely to be directly correlated with weight change without the issue of confounding factors.

The broad and overarching questions were shifted towards the end of the survey. This allowed students to get accustomed to the format of the survey, and be more comfortable with answering the broad questions later. Because some categorical questions may not cover all students, a zero-value was included in some cases to give students a broader range of answers. Some open-ended questions, such as "What sorts of foods do you buy the most?" were instead formatted as multiple choice questions to reduce the survey response time. Categorical questions which may have an effect on weight, such as "What is your sex?" were added to the survey. Some groups of different questions were combined into more generalized questions. Personal questions such as "Are you comfortable with your weight?" were designed in the multiple choice format.

6 Findings

Figure 1 shows changes in weight over the past year. From this graph, we see that a majority of people did not report any weight changes. Slightly more people reported a weight gain than a weight loss, but the small difference may not be statistically significant.

Figure 1: Histogram of weight gain in the sample population
Figure 2: Weight gain vs. price per meal

We first test if price spent per meal influences
weight gain. Intuitively, the amount of money spent on food should be linearly related to weight gain. However, according to figure 2, which displays a histogram of weight gain against price spent per meal, we see there is no significant relationship between these two variables.

Figure 3: Weight gain vs. major
Figure 4: Exercise time per major

Next we test if weight gain is influenced by exercise time. Figure 3 shows weight gain against major, and figure 4 shows exercise time per major. Initially, we thought that students with high levels of exercise would either lose or maintain their weight. From the data, however, the relationship is ambiguous: some students in majors (such as ME) who exercise often do tend to maintain their weight. In contrast, IEOR students also reported high levels of exercise, but they gained weight. Exercise time does not have any predictable effects on weight gain.

Figure 5: Weight gain vs. units taken
Figure 6: Weight gain vs. courses taken

Next we test how course load affects weight gain. Intuitively, there should not be a correlation between workload and weight gain. While it can be argued that students under high stress eat less healthy meals, and may gain weight as a result, they are also more likely to eat irregularly. Figures 5 and 6 show how weight gain is related to unit load and course load, respectively; there is no clear relationship.

Figure 7: Weight gain vs. gender

Next we test how gender affects weight gain. Figure 7 shows the relationship between weight gain and gender. From the results of the linear regression, we see that male engineering students tended to lose weight during the last year, and females tended to gain weight. However, both men and women are more likely to maintain their weight throughout the year. Because males are more numerous in the sampled population, the results for males are more accurate.

Figure 8: Weight gain vs. school year

Finally, we examine how school year affects weight change. From figure 8, we see that first-year, third-year, fifth-year, and graduate level engineering students tend to keep a steady weight. Students in the second or fourth year, however, had their weight change throughout the year. Intuitively, it makes sense that second-years mentioned a weight change in the past year. Freshmen in Berkeley tend to gain weight because of the unrestricted access to the cafeteria.

After doing a multiple linear regression of weight gain on all the variables (exercise time, school year, units taken, courses taken, age, gender, and living status), only school year and gender seem to have some impact on weight gain.

7 Problems Encountered

After a preliminary analysis of the data processed, there were some problems with regard to the survey.

One problem was the question asking the number of hours that students spent on exercise. The question expected a numerical value for each category. However, because the input was placed at the beginning of the line, the result was often a check mark rather than a numerical value. The "hours" part of the question was not emphasized enough, and perhaps should have been in bold script.

There were some other problems encountered in the questionnaire design. Some of the questions had too large an agreement scale. For instance, the question addressing the "Freshman 15" myth had an agreement scale with 10 possible values. Instead of 10 choices, it would have been better to restrict the number to 3 or 4.

The other problem was that many respondents did not answer the last page of the questionnaire. This could be because they did not realize there was a last page, did not have enough time to answer those questions, or were unwilling to answer them. Many of the questions on the last page were related to economic status, how they receive funds, and living status. Because these categorical questions were important to the results of our survey, it may have been beneficial to condense the survey into two pages. In addition, there was no statement on the questionnaire about the purpose of our survey, so students may not have felt the survey was legitimate.

The goal of our survey was to gain insight on the behavior of upperclassmen engineering students. However, after surveying it was clear that not all of the students were engineering majors, nor were they all upperclassmen. Because these students were in the sampling frame but did not fit our target population, the number of surveys we could use for each class was reduced.

8 Conclusion and Suggestions

This was a good exercise for us to gain experience performing actual sampling surveys in the real world. We learned how to design a sampling plan for a complex survey, create a clear and meaningful questionnaire, gather data efficiently and accurately, perform statistical data analysis, and write up a comprehensive statistical report. Most importantly, we also recognized some of the mistakes we made and can avoid the same mistakes in the future.

Now we can offer suggestions on how to improve future sampling surveys on the same topic. First, we should make sure that our questions are precise and clear, and that the responses can be easily translated into meaningful statistical data. When we wish to find out if there are any relationships between the data provided by two different questions, we should make sure that the questions and answer choices are worded in such a way that there can be a clear, direct comparison without any confounding variables. For instance, if we want to see if money spent on food has any impact on a student's weight change within a year, we should ask about how the student's food spending habits have changed compared to last year rather than just asking about how much money the student is spending this year.

Another suggestion for improvement is to convert seemingly categorical variables into relevant numerical values for statistical analysis. For instance, rather than assigning Gained Weight = 1, Lost Weight = 2, and Maintained Weight = 3 within a table, it is more practical to use Lost Weight = -1, Maintained Weight = 0, and Gained Weight = 1. With this change, calculations can be done to combine these three answer choices into one simple average number when comparing among different groups of people.

Finally, for future sampling design, we can improve on how to handle non-response bias. For this survey, we make the huge assumption that nonresponse data is missing at random within a class and thus is ignorable. However, it is fallacious to simply assume that the respondents and non-respondents are distributed equally without any data to back it up. Instead, we should perform two-phase sampling in order to gather data from nonrespondents to help reduce this bias.
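The recoding suggested in the conclusion (Lost = -1, Maintained = 0, Gained = 1) can be sketched in a few lines of Python. This is an illustrative sketch only, not the analysis the team ran: the label strings and the helper name `mean_weight_change` are hypothetical, and the example assumes that everyone who did not report a change maintained their weight, using the overall counts from the Summary of Findings (54 gained, 39 lost, 257 surveyed).

```python
from statistics import mean

# Hypothetical labels; the signed codes follow the report's suggestion.
CODES = {"lost": -1, "maintained": 0, "gained": 1}


def mean_weight_change(responses):
    """Average the signed codes so a group is summarized by one number.

    A positive mean means more respondents gained than lost weight,
    a negative mean the opposite.
    """
    return mean(CODES[r] for r in responses)


# Overall counts from the report; "maintained" is assumed to be the remainder.
gained, lost, surveyed = 54, 39, 257
responses = (
    ["gained"] * gained
    + ["lost"] * lost
    + ["maintained"] * (surveyed - gained - lost)
)
overall = mean_weight_change(responses)
print(f"mean coded weight change: {overall:.4f}")
```

The same helper could be applied to each subgroup (for example per gender or class year) to compare groups on a single scale, which is not possible with the arbitrary 1/2/3 coding.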

9 Group Reports

Justin Mungal
Harmony Chi
Sibo Zhao
Ricky Sun
Stat 152 Team 1, Group 1
October 12, 2006

Statistics 152: Team 1 Sampling Design

Purpose:
The purpose of this survey is to test for the existence of a correlation between the amount of money spent by UC Berkeley engineering upperclassmen on food and weight gained in the past year. One might hastily assume that the more money a student spends on food, the more weight one would gain given that they would have more food to eat. However, one must take into consideration that a student could spend more money on food by purchasing organic and quality food items that could provide a healthier diet, thereby maintaining or even decreasing body weight. Thus the question of a correlation is no longer trivial and is interesting enough for further analysis.

Methodology:
Our designated target population is all upperclassmen at UCB College of Engineering. This target population was selected because the College of Engineering is smaller and more restrictive than other colleges, yet it is still a relatively good representative of the total student body population. Also, information about the College of Engineering is readily available online. The quality of statistics is accurate enough so that our analyses and results will not be compromised.

Our sampled population will be all respondents of the distributed questionnaire. Since we will be carrying out the survey in lectures and discussions, the sampling frame will consist of UCB upper division engineers who attend the upper division engineering courses we survey in.

We estimated our population size, the number of all upperclassmen at UCB College of Engineering, to be $N = 1765.127$. The estimation is based on the 2005-2006 College of Engineering statistics, provided by the Office of Student Research. This is the best estimate of our target population since the 2006-2007 data has not been published, and the previous year's data is the closest representative. We can then determine the total number of students to be sampled, $n$, by using the following formula:

$$n = \frac{Z_{\alpha/2}^2 S^2}{e^2 + \frac{Z_{\alpha/2}^2 S^2}{N}},$$

where $S^2 = \frac{1}{2}\left(1 - \frac{1}{2}\right)$ is used to get the "safe" estimation since there is no prior survey conducted regarding this topic.

When $e = 0.03$ and $Z_{\alpha/2} = 1.96$ for a 95% confidence interval, we calculated that $n = 665$.
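As a rough check, the formula above can be evaluated in Python for the settings considered in this report. This is a minimal sketch: the function name `srs_sample_size` is an assumption made for illustration, while the population estimate N = 1765.127 and the z-values are taken from the text.

```python
def srs_sample_size(z, e, N, p=0.5):
    """Minimum SRS sample size for a proportion, with finite population correction.

    z: critical value Z_{alpha/2}; e: margin of error; N: population size;
    p: assumed proportion (0.5 gives the largest, "safe" variance).
    """
    s2 = p * (1 - p)              # S^2 = 1/2 * (1 - 1/2) = 0.25 by default
    n0 = (z ** 2) * s2 / e ** 2   # sample size ignoring the population size
    return n0 / (1 + n0 / N)      # algebraically equal to Z^2 S^2 / (e^2 + Z^2 S^2 / N)


N = 1765.127  # estimated number of upperclassmen, from the report

# (z, e) pairs: 95% CI with e = 0.03, 95% CI with e = 0.05, 90% CI with e = 0.05
for z, e in [(1.96, 0.03), (1.96, 0.05), (1.645, 0.05)]:
    print(f"z={z}, e={e}: n = {srs_sample_size(z, e, N):.2f}")
```

Rounded to whole students, the three settings reproduce the sample sizes quoted in the report (665, 315, and 235).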

When $e = 0.05$ and $Z_{\alpha/2} = 1.96$ for a 95% confidence interval, we calculated that $n = 315$.
When $e = 0.05$ and $Z_{\alpha/2} = 1.645$ for a 90% confidence interval, we calculated that $n = 235$.

While $e = 0.03$ and $Z_{\alpha/2} = 1.96$ is the norm when estimating the required sample size, it is not practical for us to survey 665 students because we do not have the resources to do so. We can reduce the sample size by either increasing the margin of error $e$ or lowering the confidence level. While changing the margin of error to 0.05 and the confidence interval to 90% will decrease the sample size to 235 students, we thought it was best that we keep the confidence interval at 95% and increase the margin of error to 0.05. A sample size of 315 is attainable while giving us more accuracy than 235.

We first stratified the data by engineering discipline: Bioengineering (BioE), Civil & Environmental Engineering (CEE), Electrical Engineering & Computer Science (EECS), Industrial Engineering & Operations Research (IEOR), Material Science Engineering (MSE), Mechanical Engineering (ME), and Nuclear Engineering (NE). Note that students in other engineering sciences (e.g., Engineering Physics, Engineering Mathematics) are assumed to be included in the sampling frame, given that they must take upper division classes from the previously listed major engineering disciplines. The decision to stratify was made because an SRS would be overly time consuming: one would have to devise a plan of randomly sampling all engineering upperclassmen at once. Also, engineers from different disciplines have very different food consumption patterns. For example, an EECS major may only have time to eat while doing labs or late at night, while engineers from other disciplines have much more freedom. Our surveys will take place in upper division engineering classes because this ensures efficiency in data collection given that the target population is upperclassmen in the College of Engineering. Stratification was performed by grouping students by engineering departments (EECS, IEOR, etc.) because classes are listed in the College of Engineering catalog by department.

After stratifying the College of Engineering into seven strata, we used the two-stage cluster sampling method to select our samples. We estimated the number of upper division students in each department by multiplying the number of undergraduate students in each department by the percentage of upper division students in the College of Engineering. This is the best estimation we can obtain because the number of upper division students in each Engineering department is not reported. Next, we needed to calculate the sample size for each stratum, or department. The stratum sample size is determined by the ratio of upper division students in each department to the number of upper division students in the College of Engineering; we will call this ratio r. The sample size for each department is then r*n (see the following table of all departments and their corresponding sample sizes).

Table: Stratum (Department), # of Undergrad Students, # of Upper Division Students, Sample Size $n_h$

Next, we collected data for the upper division UCB engineering class sizes from the Online Schedule of Classes for the current semester (http://schedule.berkeley.edu/srchfall.html). Each class was assigned a ratio based on the number of people enrolled in the course to ensure that everyone has the same probability of being chosen. For each class, the ratio equals the number of people enrolled in the class divided by the total number of people in the stratum. Cumulative frequency is then calculated based on the ratio we just assigned. Correspondingly, random numbers are generated between 0 and 1 using the Excel function RAND(). The larger classes have a higher chance of being selected because their cumulative frequency intervals are larger. Each randomly generated number falls into a cumulative frequency interval, and the class corresponding to that interval is chosen. Also, to ensure that we cover enough classes to achieve our desired sample size, we generate enough random numbers so that the classes we sample have a total enrollment of approximately $1.5 n_h$ to $2 n_h$. Notice that we are overestimating the number of classes we need to sample. This is because we want to take into account the people who are enrolled in the class but do not show up on the day of the survey, or who show up but are not engineering majors.

For lectures that are too large to sample in lecture, we chose to sample all the discussion sections for that lecture if they are offered. Instead of listing the lecture as a primary sampling unit, we have each discussion section for that lecture as a primary sampling unit. This can be done because we assume that students from the same lecture have similar backgrounds, and they are evenly distributed throughout all the discussion sections. Furthermore, since we have no way to reach the students for classes that do not meet regularly in an assigned classroom, such as independent studies classes, we decided to omit them.
This will not produce a large bias since these classes are very small in size, usually 3 to 10 students, and these students are most likely taking other engineering courses as well. Not only would it be costly and time consuming to track down these students, but since we are excluding those who have already taken the survey in other engineering classes to avoid multiple submissions from one observational unit, it seems logical to omit these classes. In addition, if a class is cross-listed, we will survey everybody in the course and group it under the department that had the most students enrolled in the course. Also, if more than one randomly generated number falls into the same cumulative frequency interval, we discard all but one of those randomly generated numbers. Once the classes are selected, under two-stage cluster sampling, we survey every student in those classes and take an SRS of the number of surveys we need based on the calculation above. If a student has already taken a survey in another class, we ask them not to take it again.

To demonstrate the method we used in choosing the classes to sample, consider the following class list of the Nuclear Engineering Department:

Class     Size   Ratio      Cumulative
NE 101    37     0.316239   0.3162393
NE 104B   3      0.025641   0.3418803
NE 120    19     0.162393   0.5042735
NE 124    20     0.17094    0.6752137
NE 161    22     0.188034   0.8632479
NE 180    16     0.136752   1
Total     117    1

$n_h = 5$

RAND()    Class to be sampled
0.83126   NE 161

The first randomly generated number is 0.83126; it falls in the interval [0.675214, 0.863248], so NE 161 is included in our sample. Since we only needed 5 students from the Nuclear Engineering Department, and the first randomly generated number gave us NE 161, which has 22 students, it is enough to just sample NE 161. After all surveys are completed for this class, we randomly pick out 5 of them to be included in our sample. Using the same method for all departments, a total of 14 classes were selected.

The classes to be sampled are displayed in the following table:

Class to be sampled   Size of class   Class to be sampled   Size of class
BioE 104              35              EECS 143              50
BioE/MSE 118          42              IEOR 150              32
CEE 124               43              ME 102 LAB 3          28
CEE 130 L2            53              ME 110                68
EECS 126              51              ME 109 DIS 1          44
EECS 123              18              MSE 102               64
EECS 141              55              NE 161                22

Lastly, to make the survey process run smoothly, it is recommended that surveyors contact those teaching the class to be surveyed. Together they can decide on a time that would be most appropriate and most comprehensive for a class survey to be conducted. If a professor decides that we cannot survey his or her class, we may need to sample another class similar to that one. Also, as a warning to surveyors, caution should be taken to ensure that duplicate surveys are not collected. This can be done with verbal and/or written directions requesting that the survey only be participated in once. All surveyors should be consistent with these preliminary directions. To avoid questionnaire bias, multiple questions regarding students' weight and money spent on food should be asked. The intention of asking the same question in multiple ways is to catch inconsistencies as well as to gain context for the participant's response to the most interesting questions on the questionnaire.
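The class-selection step described in this report (selection with probability proportional to class size, via cumulative frequencies) can be sketched as follows, using the Nuclear Engineering class list from the example. The team used Excel's RAND(); the Python helper `pick_class` and the respondent IDs below are hypothetical stand-ins for illustration, not the original procedure.

```python
import bisect
import random
from itertools import accumulate

# Nuclear Engineering class list (name, enrollment) from the report.
ne_classes = [
    ("NE 101", 37), ("NE 104B", 3), ("NE 120", 19),
    ("NE 124", 20), ("NE 161", 22), ("NE 180", 16),
]


def pick_class(classes, u):
    """Return the class whose cumulative-frequency interval contains u in [0, 1)."""
    sizes = [size for _, size in classes]
    total = sum(sizes)
    # Upper end of each class's interval, e.g. 37/117, 40/117, ..., 117/117.
    cumulative = [c / total for c in accumulate(sizes)]
    idx = bisect.bisect_right(cumulative, u)
    return classes[min(idx, len(classes) - 1)][0]


# First stage: the random number from the report's example selects NE 161.
chosen = pick_class(ne_classes, 0.83126)
print(chosen)

# Second stage: draw n_h = 5 completed surveys at random from the chosen class.
# The IDs 1..22 are placeholders for the 22 enrolled students.
respondent_ids = list(range(1, 23))
second_stage = random.sample(respondent_ids, 5)
print(sorted(second_stage))
```

Because each interval's width equals the class's share of the stratum enrollment, larger classes are proportionally more likely to be chosen, which matches the rationale given in the report.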

Stat 152
Professor Huang
Chen Chang
Nancy Yao
Dina Oba
Andrew Park

Group I Questionnaire Group Report

Introduction:
Our objective was to come up with questions with regards to our Team I project proposal. The purpose of our survey is to test for the existence of a correlation between the amount of money spent by UC Berkeley engineering upperclassmen on food and weight gained in the past year. The two dependent variables are clearly amount of money spent on food and the weight of upperclassmen engineering students. Thus our motivation is clear and our goal is to derive the simplest and quickest survey to perform that is still comprehensive enough to gather enough data for analysis as desired.

The Questionnaire Design Process:
The following questions were omitted from the final survey design with reasoning provided:

36. How important is food to you?
People from our own Stat 152 class didn't provide any useful responses during the testing demo we conducted.

All general information questions were shifted towards the end of the survey because students felt they were too personal to be right at the beginning; it is better to let the test participant get accustomed to the background of the survey contents before asking for personal details.

Questions with options to select a number on a scale from 1-10 had a zero added, making the scale 0-10, to be even more inclusive in the event that the test participant feels he or she is somehow not included amongst the possible choices. For instance:
11. How many meals (not including snacks) do you eat per day?
12. How often do you snack per day?

There was a suggestion to add an additional question of "Which grocery stores do you go to more?" but we chose not to add it because we were trying to shorten the questionnaire and the follow-up question didn't directly pertain to our topic.

Parts of the following question were omitted to shorten the length of the survey:
19. What sorts of foods do you buy the most? (choose one from each row)

23. Do you have a meal plan?
This question resulted in the response "no" more often than not so we got rid of the follow-up question although it would be useful to know if these students had a meal plan.
Removed 24. If you answered yes to the above question, how many points do you have per semester?

We got the suggestion to add in the question, "Do your parents cook food for you and then bring it over?" but we decided to omit it because it was hard to define and we figured that it applied to a small portion of people.

26. Do you eat while you are studying?
27. Do your eating habits change during midterm/final weeks?
We decided to omit both of these questions because they were less relevant than certain other questions, and in the interest of shortening the survey some questions had to go.

8. Do you have a job/paying internship or do you get a consistent allowance?
Revised to the following to make it clearer:
22. Do you have a job/paying internship or do you get a consistent allowance? You may circle more than one.

We added in the question of gender as part of the general information section because it may be desirable to view the weight and food money spent correlation across sexes, as women and men certainly have different diets and appetites.

30. How many hours per week do you spend at the gym/playing sports/exercising?
Omitted because we combined this question into a better worded question 31.

17. How much do you spend per trip to the grocery store?
This question was made clearer by changing the wording to "How much do you spend per trip to the grocery store on yourself?"

20. Do you buy bulk (as in do you shop at Costco, Sam's Club, etc)?
This question was omitted because question 21 was reworded to incorporate the response to question 20 and it is no longer necessary.

25. How many meals a week do you eat with friends?
This question is not directly related to the survey topic so it was discarded.

6. What is your ethnicity?
This question from the background information section was tossed out because it is irrelevant to the topic and some people may not feel like disclosing the classification.

Second Revision Below:

13. On average, how many caffeinated beverages (soda, coffee, energy drinks, etc) do you consume per week?
4. Approximately how many of your meals per week are spent eating fast food like McDonald's, Taco Bell, etc?
8. How often do you buy organic food items?
On a scale of relevancy, these three questions were eliminated for the sake of the length of the survey, as it was four pages long and Professor Huang commented that it was too lengthy for students to take and should take less than 3 minutes. These three were chosen to be removed because they have sibling questions which are closely related, and the loss is not great as long as one or the other is retained.

Categories were deleted to make the survey look shorter and because they became irrelevant, since there were only one or two questions under each category.

The Final Questionnaire:
After all the changes and modifications noted
