Course "Empirical Evaluation in Informatics"BenchmarkingLutz PrecheltFreie Universität Berlin, Institut für Informatik Example 1: SPEC CPU2000 Benchmark measure task Quality attributes: accessibility, sample comparisonProblems: cost, taskcomposition, overfittingLutz Prechelt, prechelt@inf.fu-berlin.deaffordability, clarity, portability,scalability, relevance.Example 2: TREC1 / 37

"Empirische Bewertung in der Informatik"Vergleichstests (Benchmarks)Prof. Dr. Lutz PrecheltFreie Universität Berlin, Institut für Informatik Beispiel 1: SPEC CPU2000 Qualitätsmerkmale: Benchmark Maß Aufgabe Zugänglichkeit, Aufwand, VergleichProbleme: Kosten,Aufgabenauswahl,ÜberanpassungLutz Prechelt, prechelt@inf.fu-berlin.de Klarheit, Portierbarkeit,Skalierbarkeit, RelevanzBeispiel 2: TREC2 / 37

"Benchmark"Merriam-Webster online dictionary, m-w.com: a mark on a permanent object indicating elevation and serving as areference in topographic surveys and tidal observations a point of reference from which measurements may be made a standardized problem or test that serves as a basis forevaluation or comparison (as of computer systemperformance)Lutz Prechelt, prechelt@inf.fu-berlin.de3 / 37

Example 1: SPEC CPU2000
SPEC: Standard Performance Evaluation Corporation
- A not-for-profit consortium of HW and SW vendors etc.
- Develops standardized measurement procedures (benchmarks) for various aspects of computer system performance:
  - CPU (including cache and memory)
  - Cloud platforms, virtualization
  - Graphics
  - High-performance computing (message-passing, shared-memory)
  - Java (client, server)
  - Mail server
  - Storage (network file system etc.)
  - Power consumption
- We consider the CPU benchmark

Sources
- http://www.spec.org
- John Henning: "SPEC CPU2000: Measuring CPU Performance in the New Millennium", IEEE Computer, May 2000
- The benchmark suite has had five versions: CPU92, CPU95, CPU2000, CPU2006, CPU2017. CPU2017 still has the same basic architecture.

CPU2000 approach
- Select a number of real-world programs
  - must be portable to all Unix and Windows systems of interest
  - balance different aspects such as pipelining, cache, memory performance etc.
  - some emphasize floating point computations (SPECfp2000)
  - others have only integer operations (SPECint2000)
  - now: SPECspeed2017 Integer, SPECspeed2017 Floating Point, SPECrate2017 Integer, SPECrate2017 Floating Point
    - rate vs. speed for multi-core vs. single-core performance
- Specify concrete program runs for each program
- Package programs and runs so as to make them easily applicable on any new system
  - application requires recompilation: SPEC also tests compiler performance!

CPU2000 performance measures
There are 2 x 2 different measurement modes:
- 2 different compiler settings:
  - using basic compiler optimization settings: SPECint_base2000, SPECfp_base2000
  - using aggressive settings: SPECint2000, SPECfp2000
    - requires experimentation and experience with the compiler
- 2 different measurements:
  - measuring speed (1 task)
  - measuring throughput (multiple tasks): SPECint_rate2000, SPECint_rate_base2000, etc.
    - throughput is relevant for multi-user systems or long-running processes
(Benchmarks need to decide on many details!)

CPU2000 performance measures (2)
- Performance is expressed relative to a reference machine
  - Sun Ultra 5, 300 MHz, defined to have performance 100
  - used to normalize the measurements from the different programs
- Overall performance is determined as the geometric mean over the n benchmark programs (see the sketch below)
  - geometric mean: n-th root of the product
  - e.g. the mean of 100 and 200 is about 141
  - best results require steady performance across all programs
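A minimal sketch of this scoring scheme, not SPEC's official tooling; the program names and run times below are hypothetical:

```python
import math

# Hypothetical run times in seconds; the reference machine is defined to score 100.
reference_times = {"gzip": 1400, "gcc": 1100, "mcf": 1800}   # reference machine
measured_times  = {"gzip":  700, "gcc":  550, "mcf":  450}   # machine under test

# Per-program ratio: how much faster than the reference machine,
# scaled so that the reference machine itself would score exactly 100.
ratios = [100 * reference_times[p] / measured_times[p] for p in reference_times]

# Overall score: geometric mean (n-th root of the product) of the ratios.
score = math.prod(ratios) ** (1 / len(ratios))
print(ratios)        # [200.0, 200.0, 400.0]
print(round(score))  # 252 -- one slow program would pull this down noticeably
```

The geometric mean rewards steady performance: a machine that is extremely fast on one program but slow on another gets less credit than under an arithmetic mean (e.g. 100 and 200 average to about 141, not 150).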

CPU2000 integer benchmark composition
[Table: the SPECint2000 benchmark programs]

Floating point benchmark composition
[Table: the SPECfp2000 benchmark programs]

Reasons for selecting a program (or not)
Should candidate program X be part of the benchmark?
- Yes, if:
  - it has many users and solves an interesting problem
  - it exercises hardware resources significantly
  - it is different from other programs in the set
- No, if:
  - it is not a complete application
  - it is too difficult to port
  - it performs too much I/O
  - it is too similar to other programs in the set
- These factors are weighed against each other

Some results
[Figure: SPEC CPU2000 results for several machines, in groups of 4]
From top to bottom (in each group of 4 machines):
- Processor clock speed: 500, 500, 533, 500 MHz
- L1 cache size: 16, 16, 16, 128 KB
- L3 cache size: 8, 2, 4, 4 MB
Which one will be slowest?

Problems of SPEC CPU2000
- Portability
  - It is quite difficult to get all benchmark programs to work on all processors and operating systems
  - SPEC uses 'benchathons': multi-day meetings where engineers cooperate to resolve open problems for the next version of the benchmark
- Which programs go into the benchmark set?
  - Won't a company's SPEC members try to get programs into the set that favor that company's machines?
  - No, for two reasons:
    1. SPEC is rather cooperative. These are engineers; they value technical merit.
    2. The benchmark is too complex to predict which program might benefit my company's next-generation machine more than its competitors.

Problems of SPEC CPU2000 (2)
Or: How to shoot yourself in the foot
- Compiler optimizations can break a program's semantics
  - SPEC has to check the results produced for correctness
- Is execution time the right basic measurement?
  - The programs do have small source code differences on various operating systems (in particular for C and C++: #ifdef)
    - library not fully standardized, big-endian vs. little-endian, etc.
  - Even identical programs with identical inputs may do different numbers of iterations (see the sketch below)
    - implementation differences of floating point operations
    - SPEC allows such differences within limits
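A small illustration of the last point, not taken from SPEC: the loop below counts how often a value can be halved before it becomes invisible next to 1.0. The same algorithm with the same input takes a different number of iterations depending only on the floating-point precision used.

```python
import numpy as np

def halving_steps(one):
    """Count halvings of x until 1 + x == 1 in the given precision."""
    x = one
    steps = 0
    while one + x != one:
        x = x / 2
        steps += 1
    return steps

# Same algorithm, same input -- different iteration counts per precision:
print(halving_steps(np.float32(1.0)))  # 24 (single precision)
print(halving_steps(np.float64(1.0)))  # 53 (double precision)
```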

General benchmarking methodology
- Benchmarking is one of several evaluation methods
- We have now seen a concrete example: SPEC CPU2000
- Now let us look at the general methodology

Source
Literature:
- Susan Sim, Steve Easterbrook, Richard Holt: "Using benchmarking to advance research: A challenge to software engineering", 25th Intl. Conf. on SW Engineering, IEEE CS Press, May 2003

Benchmark parts
A benchmark consists of three main ingredients:
- Performance measure(s)
  - as a measure of fitness-for-purpose
  - measurement is often automatic and usually quantitative, but could also be manual and/or qualitative
- Task sample
  - one or several concrete tasks, specified in detail
  - should be relevant and representative
- Comparison
  - measurement results are collected and compared
  - provides motivation for using the benchmark
  - promotes progress

Benchmarking methodology
1. Agree on a performance measure
2. Agree on a benchmarking approach
3. Define the benchmark content
4. Define a benchmarking procedure
5. Define a result report format
6. Package and distribute the benchmark
7. Collect and catalog benchmark results

Benchmarks define paradigms
- A scientific benchmark operationalizes a research paradigm
  - Paradigm: dominant view of a discipline
  - Reflects consensus on what is important
  - Immature fields cannot agree on benchmarks
- A commercial benchmark (such as SPEC) reflects a mainstream

Why are benchmarks helpful?
- Technical factors
  - easy-to-understand and easy-to-use technique
  - high amount of control
  - support replication of findings, hence credibility
- Sociological factors
  - focus attention on what is (considered) important
  - define implicit rules for conducting research
    - hence promote collaboration among researchers
    - help create a community with common interests
  - promote openness
    - force the dirty details into the open
    - make hiding flaws difficult

Problems with benchmarks
- Cost
  - Designing, composing, implementing, and packaging a benchmark is a very work-intensive task
  - Can only be done by a significant group of experts; takes long
- Task composition
  - Agreeing on what exactly goes into a benchmark task is difficult:
    - different players may have different foci of interest
    - different players may want to emphasize their own strengths
    - real-world usage profiles are usually unknown
- Overfitting
  - If the same benchmark task is used too long, the systems will adapt to it too specifically: benchmark performance will increase although real performance does not

Quality attributes of good benchmarks
- Accessibility: should be publicly available and easy to obtain
- Affordability: effort required for executing the benchmark must be adequate
- Clarity: specification must be unambiguous
- Portability, scalability: must be easily applicable to different objects under study
- Relevance: task must be representative of the real world
- Solvability (relevant for methods benchmarks): objects under study must be able to "succeed"

A short benchmark example
Image segmentation benchmark:
- Given a picture, the user marks known foreground (white) and possible foreground (gray)
- The segmentation algorithm tries to extract exactly all foreground
- The result is compared against "ground truth" using a distance measure (see the sketch below)
- Source: m/2005/2489/00/24890253.pdf
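A minimal sketch, not tied to this benchmark's actual distance measure, of one simple way such a comparison could be scored: the fraction of pixels on which the predicted foreground mask and the ground-truth mask disagree. The masks below are made up.

```python
import numpy as np

def segmentation_error(predicted, ground_truth):
    """Fraction of pixels where the predicted foreground mask
    disagrees with the ground-truth mask (boolean arrays, same shape)."""
    return np.mean(predicted != ground_truth)

# Tiny 3x3 example: one of nine pixels is misclassified.
truth = np.array([[0, 1, 1],
                  [0, 1, 1],
                  [0, 0, 0]], dtype=bool)
pred  = np.array([[0, 1, 1],
                  [0, 1, 0],   # misses one foreground pixel
                  [0, 0, 0]], dtype=bool)
print(segmentation_error(pred, truth))  # ~0.111
```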

Example 2: TREC
- Text REtrieval Conference, held annually since 1992
- Topic: information retrieval of text documents
  - Given a large set of documents and a query, find all documents relevant to the query and no others (like a web search engine)
  - Documents are ranked by perceived relevance
- Performance measures (see the sketch below):
  - Precision: fraction of retrieved documents that are relevant
  - Recall: fraction of relevant documents that are retrieved
- Core activity is comparing results (and the approaches for getting them) on pre-defined tasks used by the participants
- TREC now has many different tasks
  - Each of them is a separate benchmark
  - Number of tasks at TREC overall: 1992: 2, 2005: 15, 2018: 7
  - There is even a formalized procedure for proposing new tracks
- We will look at only one of them: "Ad-hoc retrieval"
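A minimal sketch of these two measures, using hypothetical document IDs rather than TREC data:

```python
# Hypothetical document IDs: what the system returned vs. what is actually relevant.
retrieved = {"d01", "d02", "d03", "d07", "d09"}
relevant  = {"d02", "d03", "d04", "d05"}

hits = retrieved & relevant                 # relevant documents that were retrieved
precision = len(hits) / len(retrieved)      # fraction of retrieved docs that are relevant
recall    = len(hits) / len(relevant)       # fraction of relevant docs that were retrieved

print(precision)  # 0.4  (2 of 5 retrieved documents are relevant)
print(recall)     # 0.5  (2 of 4 relevant documents were retrieved)
```

Note the trade-off: returning more documents can only raise recall, but it usually lowers precision, which is why TREC reports both.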

Sources
- Conference homepage: http://trec.nist.gov
- Ellen M. Voorhees, Donna Harman: "Overview of the Eighth Text REtrieval Conference (TREC-8)", 1999

TREC "Ad hoc retrieval" task started at TREC-1 (1992), used through TREC-8 (1999) then discontinued because performance had leveled off no more progress, the benchmark had done its job! Corpus contained 740 000 news articles in 1992 had grown to 1.5 Mio (2.2 GB) by 1998Benchmark composition: 50 different query classes (called 'topics') are used and changed each year Performance measures are Precision and Recall Comparison is done at the conferenceLutz Prechelt, prechelt@inf.fu-berlin.de26 / 37

An example 'topic definition'
[Figure: a topic definition from TREC-8 (1999); earlier topic definitions were more detailed]

TREC procedure
- Dozens of research groups from universities and companies participate:
  - run all 50 queries through their system
  - conversion from topic definition to query can be automatic or manual
    - two separate performance comparisons
  - submit raw retrieval results
- Conference organizers evaluate results and compile performance statistics
  - Precision: fraction of results that are correct
  - Recall: fraction of eligible documents that are in the results
- At the conference, the performance of each group is known
  - presentations explain the techniques used

Results (TREC-8, automatic query formulation)
[Figure: precision-recall curves; AUC (area under the curve) about 0.3]

Results (TREC-8, manual query formulation)
[Figure: precision-recall curves; AUC (area under the curve) about 0.35-0.5]

Year-to-year improvement levels off
[Figure: results over the years for only one system (SMART), but they would be similar for most others]

Problem: How to judge query results
- How can anyone possibly know which of 1.5 million documents are relevant for any one query?
  - necessary for computing recall
- TREC procedure (see the sketch below):
  - For each query, take the results of a subset of all participants
  - Take the top 100 highest-ranked outputs from each
    - e.g. TREC-8: 7100 outputs from 71 systems
  - Merge them into the candidate set
    - e.g. TREC-8: 1736 unique documents (24 per system on average)
  - Have human assessors judge the relevance of each document
  - Overall, consider only those documents relevant that were (a) in this set and (b) judged relevant by the assessor
    - e.g. TREC-8: 94 relevant documents
- (What are the problems with this procedure?)
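A minimal sketch of this merging step (commonly called 'pooling'), with hypothetical ranked result lists; everything outside the merged pool is later treated as irrelevant:

```python
def build_candidate_pool(ranked_results, k=100):
    """Merge the top-k documents of each participating system into one set."""
    pool = set()
    for system, ranking in ranked_results.items():
        pool.update(ranking[:k])
    return pool

# Hypothetical ranked result lists (document IDs, best match first) for one topic.
ranked_results = {
    "system_A": ["d12", "d07", "d33", "d05"],
    "system_B": ["d07", "d12", "d19", "d40"],
    "system_C": ["d98", "d07", "d12", "d33"],
}

pool = build_candidate_pool(ranked_results, k=3)
print(sorted(pool))  # ['d07', 'd12', 'd19', 'd33', 'd98'] -- only these get assessed
```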

TREC recall measurement problems
1. Human assessors make errors
   - This is bad for all participants whose systems do not (at those points) make the same errors
2. There are often many more relevant documents in the corpus beyond the candidate set
   - The procedure will consider them all irrelevant
   - This is bad for participants who did not contribute to the candidate set and who
     - find documents of a different nature than the contributors, or
     - rank relevance differently than the contributors
How could TREC evaluate how serious this problem is?

Precision decrease for system A when hits unique to system A are left out
[Figure: precision of each system A recomputed after removing A's unique contributions from the candidate set]
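A minimal sketch of this leave-one-out check, with hypothetical data rather than the TREC evaluation code: recompute a system's precision after removing from the relevance judgments the documents that only this system contributed to the pool. A small drop suggests the pooling bias is not severe.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are judged relevant."""
    return len(set(retrieved) & relevant) / len(retrieved)

# Hypothetical judged-relevant documents and system A's retrieved list.
relevant  = {"d02", "d03", "d04", "d05", "d09"}
a_results = ["d02", "d03", "d09", "d11", "d20"]

# Documents that only system A contributed to the assessment pool.
unique_to_a = {"d09"}

full     = precision(a_results, relevant)                # 0.6
left_out = precision(a_results, relevant - unique_to_a)  # 0.4

print(full, left_out)  # the smaller the drop, the less the evaluation depends
                       # on system A's own unique contributions to the pool
```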

Summary
- Benchmarks consist of a performance measure, a task, and direct comparison of different results
  - Selecting tasks (and sometimes measures) is not straightforward!
- They apply to classical performance fields such as hardware, to capabilities of intelligent software (e.g. TREC), or even to methods to be applied by human beings
  - Measurement in a benchmark may even have subjective components
  - Even benchmarks can have credibility problems
- Putting together a benchmark is difficult, costly, and usually produces disputes over the task composition
- A good benchmark is a powerful and cost-effective evaluation tool.

Further literature
- ICPE: Int'l Conf. on Performance Engineering
- Web search for other computer benchmarks
- Related approach: RoboCup
  - Robot performance cannot be quantified well, so direct games and tournaments are used instead

Thank you!
