Data and Society: Anonymity - Lecture 6 - Computer Science


Data and Society
Anonymity – Lecture 6
2/11/21
Fran Berman, Data and Society, CSCI 4370/6370

Today (2/11/21)
Briefing Instructions
Lecture / Discussion – Anonymity
Student Presentations

Briefing Instructions – Team Project
Briefing topic: Your team should choose a data-related bill currently in Congress. Describe its content, status, and potential implications for enforcement for your reader – your boss / elected official.
Briefings are informational pieces. Everything you need to know should be summarized and explained within the briefing. Your briefing should summarize key aspects of the bill (see next slide) and provide a recommendation.
Briefing is due by 11:59 p.m., 2/26/21
Format: Two pages, .docx, 11 pt. font or larger, cite references in text as needed
Briefing is worth 15 points. How your team will be graded:
Each team member will get the same briefing grade.
Content: 8 points
  – Does the briefing address the questions?
  – Is it clear and self-contained?
  – Is the briefing interesting to read?
  – Did you provide a recommendation?
Writing: 7 points
  – Is the writing compelling, concise and informative?
  – Does the piece read well? Is it informative?
  – Is the spelling and grammar correct? Is the piece appropriately referenced?

Briefing Structure
A briefing is informational, often prepared for a decision-maker, who may need to make hard choices about topics that they do not understand well or don’t have time to research in-depth.
A briefing fills in key details your decision-maker needs to know. Your briefing will also propose a recommendation on whether to vote for the bill or not.
Your briefing should address the following questions:
  – What bill are you describing, who introduced it, and when?
  – What does the bill do? (significant aspects)
  – Who will the bill impact and how?
  – What are its limitations? (legal limitations, what it does not cover)
  – How will it be enforced?
  – What is your recommendation? (should your stakeholder vote for or against this bill and why)
  – References as needed (not counted in page count)

Useful resources
Focus on data-related bills in the 116th and 117th Congress that HAVE NOT been passed
  – Privacy bills in the 116th df/LSB/LSB10441
  – https://www.congress.gov/browse

Briefing Teams – email bermaf@rpi.edu if you don’t have your partner’s email
Justin C. and Nicholas J.
Jeff H. and Isaac L.
Jin H. and Ishita P.
Nathan S. and Eric X.
Davis E. and Hannah L.
Adam M. and Sola S.
Angelina M. and Liam M.
Grant B. and Justin O.
Julian C. and Chris P.
Josh M. and Greg S.

Reading / Speaker for February 18
Reading for next time: “Ben Wizner: Pull back to lback-to-reveal/

Date   Topic                                                   Speaker
1-28   The Data-driven World                                   Fran
2-1    Data and COVID-19                                       Fran
2-4    Data and Privacy – Intro                                Fran
2-8    Data and Privacy – Differential Privacy                 Fran
2-11   Data and Privacy – Anonymity / Briefing Instructions    Fran
2-15   NO CLASS / PRESIDENT’S DAY
2-18   Data and Privacy – Law                                  Ben Wizner
2-22   Digital rights in the EU and China                      Fran
2-25   Data and Discrimination 1                               Fran
3-1    Data and Discrimination 2                               Fran
3-4    Data and Elections 1                                    Fran
3-8    Data and Elections 2                                    Fran
3-11   NO CLASS / WRITING DAY
3-15   Data and Astronomy                                      Alyssa Goodman
3-18   Data Science                                            Fran
3-22   Digital Humanities                                      Brett Bobley
3-25   Data Stewardship and Preservation                       Fran
3-29   Data and the IoT                                        Fran
4-1    Data and Smart Farms                                    Rich Wolski
4-5    Data and Self-Driving Cars                              Fran
4-8    Data and Ethics 1                                       Fran
4-12   Data and Ethics 2                                       Fran
4-15   Cybersecurity                                           Fran
4-19   Data and Dating                                         Fran
4-22   Data and Social Media                                   Fran
4-26   Tech in the News                                        Fran
4-29   Wrap-up / Discussion                                    Fran
5-3    NO CLASS

Lecture – Anonymity
Anonymity (Sweeney)
Netflix Competition (Narayanan and Shmatikov)

Privacy and Anonymity
Privacy: The state of being free from being observed or disturbed by other people; the state of being free from public attention.
Anonymity: Lack of outstanding, individual, or unusual features; impersonality.

Anonymity: Can you keep your data private by removing explicit identifiers?
Latanya Sweeney et al.: Removing / changing explicit identifiers will not get you anonymity

Key Definitions (informal)
Anonymous data: Data that cannot be manipulated or linked to confidently identify the entity that is the subject of the data.
Explicit identifier: Set of data elements (e.g. {name, address} or {name, phone number}) for which, with no additional information, the designated person can be directly and uniquely ascertained.
Quasi-identifier: Set of data elements that in combination can be used to identify an entity uniquely or almost uniquely.
De-identified data: Data with explicit identifiers removed, generalized, or replaced with a made-up alternative.

Re-identifying data may be straightforward
Most states collect hospital discharge data, which is distributed to researchers, sold to industry, and often made publicly available.
When coupled with census data or voter registration data, combinations of characteristics can identify individuals, even if the data has been de-identified.
  – {Zip, gender, month and year of birth}
  – {Zip, gender, age}
  – {County, gender, date of birth}
  – {County, gender, age}
  – Etc.
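A minimal sketch of the linking step described above, assuming toy data: the records, names, and the `link` helper are all hypothetical and only illustrate how a public list that carries names can be joined to a de-identified release on a shared quasi-identifier.

```python
from collections import defaultdict

# Hypothetical toy records; none of these values come from the lecture.
hospital = [  # de-identified release: no names, but quasi-identifiers remain
    {"zip": "12180", "gender": "F", "birth": "1975-03", "diagnosis": "asthma"},
    {"zip": "12180", "gender": "M", "birth": "1982-11", "diagnosis": "diabetes"},
    {"zip": "12203", "gender": "F", "birth": "1975-03", "diagnosis": "fracture"},
]
voters = [  # public list: names plus the same quasi-identifiers
    {"name": "A. Example", "zip": "12180", "gender": "F", "birth": "1975-03"},
    {"name": "B. Sample", "zip": "12180", "gender": "M", "birth": "1982-11"},
    {"name": "C. Placeholder", "zip": "12180", "gender": "M", "birth": "1982-11"},
]
QUASI = ("zip", "gender", "birth")  # {Zip, gender, month and year of birth}


def link(deidentified, public, keys=QUASI):
    """Map a name to a de-identified record when exactly one name fits its key.

    Simplification: if several de-identified records shared one key, the
    last one would overwrite the earlier ones.
    """
    names = defaultdict(list)
    for person in public:
        names[tuple(person[k] for k in keys)].append(person["name"])
    matches = {}
    for record in deidentified:
        candidates = names.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # a unique candidate means re-identification
            matches[candidates[0]] = record
    return matches


matches = link(hospital, voters)
for name, record in matches.items():
    print(name, "->", record["diagnosis"])
```

In this toy data only the first hospital record is linked: the second has two candidate names, and the third has no counterpart in the public list.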

Sweeney/2000: Simple quasi-identifiers can be used to identify entities
Sweeney showed that {zip, gender, birthdate} (quasi-identifier) could be used to identify most people (87%) in the U.S.
  – Methodology used public data (1990 U.S. Census) and other publicly and semi-publicly available health data
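The kind of measurement behind a figure like 87% can be sketched as the share of people whose quasi-identifier combination is unique in a population. The `fraction_unique` helper and the four-person population below are hypothetical; this is not Sweeney's data or code.

```python
from collections import Counter


def fraction_unique(records, keys):
    """Share of records whose combination of `keys` occurs exactly once."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if counts[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


# Hypothetical population, for illustration only.
population = [
    {"zip": "12180", "gender": "F", "birthdate": "1975-03-02"},
    {"zip": "12180", "gender": "F", "birthdate": "1975-03-19"},
    {"zip": "12180", "gender": "M", "birthdate": "1982-11-05"},
    {"zip": "12203", "gender": "M", "birthdate": "1982-11-05"},
]

print(fraction_unique(population, ("gender",)))
print(fraction_unique(population, ("zip", "gender")))
print(fraction_unique(population, ("zip", "gender", "birthdate")))
```

Each attribute added to the quasi-identifier can only keep or increase the share of unique people, which is why a few ordinary fields can be enough.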

K-anonymity (Sweeney and Samarati)
K-anonymity is a property of a data set, usually used to describe the data set’s level of anonymity.
A data set is said to be k-anonymous if the information for each person contained in the set cannot be distinguished from that of at least k-1 other individuals whose information also appears in the data set.
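One way to read the definition: group the records by their quasi-identifier values, and k is the size of the smallest group. The sketch below assumes that reading; the `k_anonymity` function name and the sample rows are hypothetical.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values.

    Assumes `records` is non-empty.
    """
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


# Hypothetical rows, for illustration only.
rows = [
    {"age": "30-39", "gender": "F", "condition": "flu"},
    {"age": "30-39", "gender": "F", "condition": "asthma"},
    {"age": "40-49", "gender": "M", "condition": "flu"},
    {"age": "40-49", "gender": "M", "condition": "fracture"},
]
print(k_anonymity(rows, ("age", "gender")))

# One person with a unique combination drops the whole data set to k = 1.
rows_with_outlier = rows + [{"age": "50-59", "gender": "F", "condition": "flu"}]
print(k_anonymity(rows_with_outlier, ("age", "gender")))
```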

Ways to increase k in k-anonymity: Suppression and Generalization
Suppression: Some values of the attributes are replaced by *
Generalization: Individual values of the attributes are replaced by a broader category
Upper table has 1-anonymity because Ramsha can be identified uniquely by age.
Lower table has 2-anonymity with respect to {age, gender, state of domicile}
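A small sketch of both operations, assuming hypothetical rows rather than the tables from the slide: names are suppressed with `*`, exact ages are generalized to ten-year ranges, and k is measured before and after. All function names here are made up for the example.

```python
from collections import Counter

QUASI = ("age", "gender", "state")


def k_anonymity(records, quasi_identifiers=QUASI):
    """Size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def generalize_age(age, width=10):
    """Generalization: replace an exact age with a range such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize(records):
    """Suppress the name and generalize the age of every record."""
    return [
        {**r, "name": "*", "age": generalize_age(r["age"])}
        for r in records
    ]


# Hypothetical rows, for illustration only.
raw = [
    {"name": "P1", "age": 23, "gender": "F", "state": "NY", "condition": "flu"},
    {"name": "P2", "age": 27, "gender": "F", "state": "NY", "condition": "asthma"},
    {"name": "P3", "age": 34, "gender": "M", "state": "NY", "condition": "flu"},
    {"name": "P4", "age": 38, "gender": "M", "state": "NY", "condition": "fracture"},
]
released = anonymize(raw)

print(k_anonymity(raw))       # every exact age is distinct
print(k_anonymity(released))  # ages now fall into shared ranges
```

The trade-off is visible in the output rows: the released data is harder to link, but an analyst can no longer see exact ages.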

Caveats
K-anonymity is susceptible to attacks:
K-anonymity fails in high-dimensional data sets and in most real-world datasets of individual recommendations and purchases.
When background knowledge is available to an attacker, such attacks become even more effective.
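The high-dimensional caveat can be illustrated with a toy example: as more attributes have to be treated as quasi-identifiers, the groups of indistinguishable records shrink. The data below is hypothetical (1 means the user rated that movie), and `k_for` is a made-up helper.

```python
from collections import Counter


def k_for(records, attrs):
    """Smallest group size when `attrs` are treated as the quasi-identifier."""
    groups = Counter(tuple(r[a] for a in attrs) for r in records)
    return min(groups.values())


# Hypothetical users: which of three movies each one rated.
users = [
    {"m1": 1, "m2": 0, "m3": 0},
    {"m1": 1, "m2": 0, "m3": 1},
    {"m1": 1, "m2": 1, "m3": 0},
    {"m1": 1, "m2": 1, "m3": 1},
]

for attrs in (["m1"], ["m1", "m2"], ["m1", "m2", "m3"]):
    print(len(attrs), "attributes -> k =", k_for(users, attrs))
```

With thousands of movies per user instead of three, almost every record ends up in a group of its own unless the data is generalized so heavily that it loses most of its value.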

The 2006 Netflix Competition
Netflix 2006 competition: Netflix offered a $1M prize for improving their movie recommendation service.
  – Dataset with 100M movie ratings created by 480K Netflix subscribers between 1999 and 2005 provided.
  – Ratings did not appear to have been perturbed significantly
Netflix Prize dataset did not provide user names. In answer to the question “Is there any customer information in the database that should be kept private?”, Netflix said (FAQ):
  – “No, all customer identifying information has been removed; all that remains are ratings and dates. This follows our privacy policy, which you can review here. Even if, for example, you knew all your own ratings and their dates you probably couldn’t identify them reliably in the data because only a small sample [of data] was included (less than one-tenth of our complete dataset) and that data was subject to perturbation. Of course, since you know all your own ratings that really isn’t a privacy problem, is it?”
Narayanan and Shmatikov showed that the data set could be de-anonymized

Netflix data set: removing identifying information was not sufficient for anonymity
Researchers used an “adversary approach” to de-anonymize users, even when some of the auxiliary information was imprecise
Used additional information to identify individual subscribers:
  – Private records of Netflix subscribers they knew.
  – Public IMDB ratings
Narayanan and Shmatikov studied the question “How much does the adversary need to know about a Netflix subscriber in order to identify her record in the data set, and thus learn her complete movie viewing history”.

Methodology
Adversary’s goal is to de-anonymize an anonymous record R from the public database.
  – Adversary has a small bit of auxiliary information or background knowledge related to R (restricted to a subset of R’s attributes)
  – The auxiliary information may be imprecise or incorrect
Designate Netflix users to be “similar” when two subscribers create ratings that are close with respect to date (within 3 days, within 14 days, within infinity days [no date given]) and value (same or within 1)
De-anonymization approach:
  – Assign a numerical score to each record based on how well it matches some auxiliary/outside information
  – Use matching criteria to see if there is a match between records in the database and auxiliary information
  – Select “best guess” candidate records with highest score(s)
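The three steps above can be sketched as follows. This is a simplified stand-in, not the algorithm from the paper (which is more elaborate, for instance in how it weights rarely rated movies): here the score is just the number of auxiliary ratings that a record matches within the date and value tolerances. All names, movies, and ratings are hypothetical.

```python
from datetime import date


def matches(aux_item, rec_item, max_days=14, max_rating_gap=1):
    """True if two (rating, date) pairs are 'similar' within the tolerances.

    max_days=None models the 'no date given' case: only the rating is compared.
    """
    aux_rating, aux_date = aux_item
    rec_rating, rec_date = rec_item
    close_rating = abs(aux_rating - rec_rating) <= max_rating_gap
    close_date = max_days is None or abs((aux_date - rec_date).days) <= max_days
    return close_rating and close_date


def score(aux, record, **tolerances):
    """Number of auxiliary items that the record matches (simplified score)."""
    return sum(
        1
        for movie, item in aux.items()
        if movie in record and matches(item, record[movie], **tolerances)
    )


def best_guess(aux, database, **tolerances):
    """Return the record ids with the highest score, plus all scores."""
    scores = {rid: score(aux, rec, **tolerances) for rid, rec in database.items()}
    top = max(scores.values())
    return [rid for rid, s in scores.items() if s == top], scores


# Hypothetical anonymized database: record id -> {movie: (rating, date)}.
database = {
    "r1": {"Movie A": (5, date(2005, 3, 1)), "Movie B": (2, date(2005, 3, 10)),
           "Movie C": (4, date(2005, 6, 1))},
    "r2": {"Movie A": (1, date(2005, 3, 2)), "Movie D": (3, date(2005, 4, 1))},
    "r3": {"Movie B": (2, date(2004, 1, 1)), "Movie C": (4, date(2005, 6, 20))},
}
# Imprecise auxiliary knowledge about one subscriber (e.g. from public ratings).
aux = {"Movie A": (4, date(2005, 3, 3)), "Movie B": (2, date(2005, 3, 12)),
       "Movie C": (5, date(2005, 6, 10))}

candidates, scores = best_guess(aux, database)
print(candidates, scores)
```

Even though none of the auxiliary ratings is exact, record `r1` stands out clearly. A real attack would also need a rule for declining to guess when the top score is not clearly separated from the rest.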

Additional information
How does an adversary get auxiliary information?
  – Conversation or overheard information
  – Personal blogs and Google searches
  – Public IMDB ratings (likely strong correlation with Netflix ratings)
Researchers used a few dozen IMDB users to breach data set
Sparsity of information in dataset increases the probability that the adversary strategy succeeds in de-anonymizing the data and decreases the amount of auxiliary information needed.
  – True of Netflix dataset
  – Many real-world datasets containing individual transactions, preferences, etc. are sparse

Results
Source: http://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf

Why does de-anonymizing the Netflix dataset matter?
Cross-correlation methods can reveal other non-public personal information (viewing habits may indicate political, sexual, religious or other preferences)
Privacy breaches can endanger “future privacy” – private information in future sessions
General methodology can be used with other similar sparse datasets (e.g. those focusing on social relationships)
Could violate data privacy policy that claims that Netflix customer data will be shared and used anonymously

Lecture 8 References (not already on slides)
“Simple Demographics Often Identify People Uniquely”, Latanya Sweeney, CMU Working ability/paper1.pdf
K-anonymity, https://www.quora.com/What-is-k-anonymity
“Robust de-anonymization of sparse data sets”, Arvind Narayanan and Vitaly Shmatikov, http://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf
“Have I Been Pwned - which tells you if passwords were breached - is going open source”, The unt

Presentations

Upcoming Presentations
February 18
  “Analysis: California privacy reboot puts rights in spotlight”, Bloomberg ts-in-spotlight
  “To fix social media now, focus on privacy, not platforms”, The orms
February 22
  “Grindr on the hook for 10M euro violations over GDPR consent violations”, /?guccounter 1&guce referrer aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&guce referrer sig 1Fn d-YBJCxvHYwuGyB9scWgeT
  “How the West got China’s social credit system wrong,” dit-score-system/

Need Volunteers
February 25 (Vaccines and discrimination)
  “Where do the vaccine doses go and who gets them? The algorithms decide.”, New York Times, inealgorithms.html?referringSource articleShare (Nicholas J.)
  “Getting a Covid vaccine can be required by your boss. Why that's a good thing - and a danger”, NBC News, 89 (Julian C.)

Presentations for February 11
“We’re banning facial recognition. We’re missing the point.” New York Times, ecognition-ban-privacy.html (Josh)
“This site published every face from Parler’s Capitol riot videos”, Wired, https://www.wired.com/story/faces- (Nate)
