Is Machine Learning Speaking My Language? Gender Bias and Under-Representation in Natural Language Processing Across Human Languages

Jeanna Neefe Matthews, Yan Chen, Christopher Mahoney, Isabella Grasso, Esma Wali, Abigail Matthews, Thomas Middleton, Mariama Njie

Expanded Tech Report (April 25 2021)

ABSTRACT

Natural Language Processing (NLP) systems are at the heart of many critical automated decision-making systems making crucial recommendations about our future world. However, these systems reflect a wide range of bias, from gender bias to a bias in which voices they represent. In this paper, a team including speakers of 9 languages - Chinese, Spanish, English, Arabic, German, French, Farsi, Urdu, and Wolof - reports and analyzes measurements of gender bias in the Wikipedia corpora for these 9 languages. In the process, we also document how our work exposes crucial gaps in the NLP-pipeline for many languages. Despite substantial investments in multilingual support, the modern NLP-pipeline still systematically and dramatically under-represents the majority of human voices in the NLP-guided decisions that are shaping our collective future. We develop extensions to profession-level and corpus-level gender bias metric calculations originally designed for English and apply them to 8 other languages, including languages like Spanish, Arabic, German, French and Urdu that have grammatically gendered nouns including different feminine, masculine and neuter profession words. We compare these gender bias measurements across the Wikipedia corpora in different languages as well as across some corpora of more traditional literature.

KEYWORDS

gender bias, natural language processing, Wikipedia

1. INTRODUCTION

Corpora of human language are regularly fed into machine learning systems as a key way to learn about the world. Natural Language Processing plays a significant role in many powerful applications such as speech recognition, text translation, and autocomplete and is at the heart of many critical automated decision systems making crucial recommendations about our future world (Yordanov 2018)(Banerjee 2020)(Garbade 2018). Systems are taught to identify spam email, suggest medical articles or diagnoses related to a patient’s symptoms, sort resumes based on relevance for a given position, and many other tasks that form key components of critical decision making systems in areas such as criminal justice, credit, housing, allocation of public resources and more. Much like facial recognition systems are often trained to represent white men more than black women (Buolamwini 2018), machine learning systems are often trained to represent human expression in languages such as English and Chinese more than in languages such as Urdu or Wolof.

The degree to which some languages are under-represented in commonly used text-based corpora is well-recognized, but the ways in which this effect is magnified throughout the NLP-tool chain is less discussed. Despite huge and admirable investments in multilingual support in projects like Wikipedia (Wikipedia 2020C), BERT (Devlin et al. 2018), Word2Vec (Mikolov et al. 2013), Wikipedia2Vec (Yamada et al. 2018)(Ousia 2016), Natural Language Toolkit (NLTK 2005), MultiNLI (Williams et al. 2020), many NLP tools are only developed for and tested on one or at most a handful of human languages and important advancements in NLP research are rarely extended to or evaluated for multiple languages. For some languages, the NLP-pipeline is streamlined: large publicly available corpora and even pre-trained models exist, tools run without errors and there is a rich set of research results applied to that language. However, for the vast majority of human languages, there is hurdle after hurdle. Even when a tool technically does support a given language, that support often comes with substantial caveats such as higher error rates and surprising problems. Also, lack of representation at early stages of the pipeline (e.g. small corpora) adds to the lack of representation in later stages of the pipeline (e.g. lack of tool support o

The defining set is a list of gendered word pairs used to define what a gendered relationship looks like. Bolukbasi et al.’s original defining set contained 10 English word pairs (she-he, daughter-son, her-his, mother-father, woman-man, gal-guy, Mary-John, ... she-he They are the same words in some languages like Wolof, Farsi, and Urdu. In German ...
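As a rough illustration of how a defining set might be used, the sketch below stores the seven word pairs quoted in the excerpt and derives a single "gender direction" from a table of word vectors. Several things here are assumptions rather than the report's method: the function names (`gender_direction`, `projection`), the 3-dimensional toy vectors in `TOY`, and the choice to average normalized difference vectors, which is a simplification (Bolukbasi et al. apply PCA to the pair differences). Pairs whose two words have identical vectors, as would happen when a language uses the same word for both members of a pair, are simply skipped; that handling is also an assumption.

```python
import math

# The seven pairs quoted in the excerpt (female term first).
# The excerpt elides the remaining pairs, so they are not filled in here.
DEFINING_SET = [
    ("she", "he"),
    ("daughter", "son"),
    ("her", "his"),
    ("mother", "father"),
    ("woman", "man"),
    ("gal", "guy"),
    ("Mary", "John"),
]


def norm(v):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    n = norm(v)
    return [x / n for x in v]


def gender_direction(embeddings, pairs):
    """Average the normalized (female - male) differences over usable pairs.

    A pair is skipped when either word is missing from `embeddings` or when
    the two vectors are identical (zero difference). This averaging is a
    simplified stand-in for the PCA step used by Bolukbasi et al.
    """
    diffs = []
    for female, male in pairs:
        if female not in embeddings or male not in embeddings:
            continue
        diff = [f - m for f, m in zip(embeddings[female], embeddings[male])]
        if norm(diff) == 0.0:
            continue  # same word or same vector: carries no gender signal
        diffs.append(normalize(diff))
    if not diffs:
        raise ValueError("no usable defining pair found in the embeddings")
    dim = len(diffs[0])
    mean = [sum(d[i] for d in diffs) / len(diffs) for i in range(dim)]
    return normalize(mean)


def projection(embeddings, word, direction):
    """Cosine between a word's vector and the unit-length gender direction."""
    unit = normalize(embeddings[word])
    return sum(a * b for a, b in zip(unit, direction))


# Hypothetical toy vectors, invented purely for illustration.
TOY = {
    "she": [1.0, 0.2, 0.0],
    "he": [-1.0, 0.2, 0.0],
    "woman": [0.8, 0.0, 0.5],
    "man": [-0.8, 0.0, 0.5],
    "nurse": [0.5, 0.5, 0.5],
    "engineer": [-0.5, 0.5, 0.5],
}

direction = gender_direction(TOY, DEFINING_SET)
nurse_score = projection(TOY, "nurse", direction)
engineer_score = projection(TOY, "engineer", direction)
```

With these toy vectors, `nurse_score` comes out positive and `engineer_score` negative, but only because the vectors were invented to show the sign convention (positive leans toward the first word of each pair). Real measurements would use vectors trained on an actual corpus, such as the Wikipedia corpora discussed in the report.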
