DATA PREPARATION AND ANALYSIS LABORATORY LAB

Upload by : Ryan Jay

DATA PREPARATION AND ANALYSIS LABORATORY
LAB MANUAL

Academic Year: 2019
Subject Code: BCSB20
Regulations: IARE - R18
Semester: II
Branch: CSE

Prepared By
Mrs. G Sulakshana
Assistant Professor

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
INSTITUTE OF AERONAUTICAL ENGINEERING
(Autonomous)
Dundigal, Hyderabad – 500 043

1. PROGRAM OUTCOMES:

M.TECH - PROGRAM OUTCOMES (POs)
PO1  Analyze a problem, identify and define computing requirements, design and implement appropriate solutions.
PO2  Solve complex heterogeneous data intensive analytical based problems of real time scenario using state of the art hardware/software tools.
PO3  Demonstrate a degree of mastery in emerging areas of CSE/IT like IoT, AI, Data Analytics, Machine Learning, cyber security, etc.
PO4  Write and present a substantial technical report/document.
PO5  Independently carry out research/investigation and development work to solve practical problems.
PO6  Function effectively on teams to establish goals, plan tasks, meet deadlines, manage risk and produce deliverables.
PO7  Engage in life-long learning and professional development through self-study, continuing education, professional and doctoral level studies.

2. PROGRAM SPECIFIC OUTCOMES:

PROGRAM SPECIFIC OUTCOMES (PEOs)
PEO1  Independently design and develop computer software systems and products based on sound theoretical principles and appropriate software development skills.
PEO2  Demonstrate knowledge of technological advances through active participation in life-long
PEO3  Accept to take up responsibilities upon employment in the areas of teaching, research, and software development.
PEO4  Exhibit technical communication, collaboration and mentoring skills and assume roles both as team members and as team leaders in an organization.

ATTAINMENT OF PROGRAM OUTCOMES

S. No | Experiment | Program Outcomes Attained
1 | DATA PRE-PROCESSING AND DATA CUBE: Data preprocessing methods on student and labor datasets; implement data cube for data warehouse on 3-dimensional data | PO1
2 | DATA CLEANING: Implement various missing handling mechanisms; implement various noisy handling mechanisms | PO1
3 | EXPLORATORY ANALYSIS: Develop k-means and MST based clustering techniques; develop the methodology for assessment of clusters for given dataset | PO2
4 | ASSOCIATION ANALYSIS: Design algorithms for association rule mining | PO2
5 | HYPOTHESIS GENERATION: Derive the hypothesis for association rules to discover strong association rules; use confidence and support thresholds | PO1
6 | TRANSFORMATION TECHNIQUES: Construct Haar wavelet transformation for numerical data; construct principal component analysis (PCA) for 5-dimensional data | PO7
7 | DATA VISUALIZATION: Implement binning visualizations for any real time dataset; implement linear regression techniques | PO2
8 | CLUSTERS ASSESSMENT: Visualize the clusters for any synthetic dataset; implement the program for converting the clusters into histograms | PO7
9 | HIERARCHICAL CLUSTERING: Write a program to implement agglomerative clustering technique; write a program to implement divisive hierarchical clustering technique | PO7
10 | SCALABILITY ALGORITHMS: Develop scalable clustering algorithms; develop scalable a priori algorithm | PO2

SYLLABUS:

II Semester: CSE
Course Code: BCSB20
Category: Core
Hours / Week: L 0, T 0, P 4
Credits: 2
Maximum Marks: CIA 30, SEE 70, Total 100
Contact Classes: Nil
Total Tutorials: Nil
Total Practical Classes: 36
Total Classes: 36

OBJECTIVES:
The course should enable the students to:
I. Learn pre-processing methods for multi-dimensional data
II. Practice data cleaning mechanisms
III. Learn various exploratory data analysis techniques
IV. Develop visualizations for clusters or partitions

LIST OF EXPERIMENTS

Week-1 DATA PRE-PROCESSING AND DATA CUBE
Data preprocessing methods on student and labor datasets; implement data cube for data warehouse on 3-dimensional data

Week-2 DATA CLEANING
Implement various missing handling mechanisms; implement various noisy handling mechanisms

Week-3 EXPLORATORY ANALYSIS
Develop k-means and MST based clustering techniques; develop the methodology for assessment of clusters for given dataset

Week-4 ASSOCIATION ANALYSIS
Design algorithms for association rule mining

Week-5 HYPOTHESIS GENERATION
Derive the hypothesis for association rules to discover strong association rules; use confidence and support thresholds.

Week-6 TRANSFORMATION TECHNIQUES
Construct Haar wavelet transformation for numerical data; construct principal component analysis (PCA) for 5-dimensional data.

Week-7 DATA VISUALIZATION
Implement binning visualizations for any real time dataset; implement linear regression techniques

Week-8 CLUSTERS ASSESSMENT

Visualize the clusters for any synthetic dataset; implement the program for converting the clusters into histograms

Week-9 HIERARCHICAL CLUSTERING
Write a program to implement agglomerative clustering technique; write a program to implement divisive hierarchical clustering technique

Week-10 SCALABILITY ALGORITHMS
Develop scalable clustering algorithms; develop scalable a priori algorithm

Reference Books:
1. Sinan Ozdemir, “Principles of Data Science”, Packt Publishers, 2016.

Web References:
1. https://paginas.fe.up.pt/ ec/files 1112/week 03 Data Preparation.pdf
2. . nd-analysis/

SOFTWARE AND HARDWARE REQUIREMENTS FOR 18 STUDENTS:
SOFTWARE: Open source Weka 3.8, Python
HARDWARE: 18 numbers of Intel Desktop Computers with 4 GB RAM


WEEK-1

Aim: Data preprocessing methods on student and labor datasets

Description:
We need to create an Employee Table with a training data set that includes attributes such as name, id, salary, experience, gender, and phone number.

Procedure:
Steps:
1) Open Start - Programs - Accessories - Notepad.
2) Type the following training data set in Notepad for the Employee Table.

@relation employee
@attribute name {x,y,z,a,b}
@attribute id numeric
@attribute salary {low,medium,high}
@attribute exp numeric
@attribute gender {male,female}
@attribute phone ...
...emale,200200
b,105,high,2,male,240240

3) Save the file in .arff format.
4) Minimize the arff file and then open Start - Programs - weka-3-4.
5) Click on weka-3-4; the Weka dialog box is displayed on the screen.
6) The dialog box offers four modes; click on Explorer.
7) Explorer shows many options. Click on ‘open file’ and select the arff file.
8) Click on the Edit button, which shows the Employee Table in Weka.

Training Data Set - Weather Table

Result:
This program has been successfully executed.
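The ARFF layout typed in step 2 can also be produced and read back programmatically. The following is a minimal Python sketch, not part of the manual's Weka procedure. The attribute declarations follow the listing above; the `numeric` type for `phone` and every data row except `b,105,high,2,male,240240` are assumptions, because the original rows are truncated. The function names `to_arff` and `parse_arff` are hypothetical helpers.

```python
# Attribute declarations follow the manual; "numeric" for phone is an assumption.
ATTRIBUTES = [
    ("name", ["x", "y", "z", "a", "b"]),
    ("id", "numeric"),
    ("salary", ["low", "medium", "high"]),
    ("exp", "numeric"),
    ("gender", ["male", "female"]),
    ("phone", "numeric"),
]

# Hypothetical rows, except the last one, which appears in the manual.
ROWS = [
    ["x", 101, "low", 2, "male", 250311],
    ["y", 102, "medium", 3, "female", 251665],
    ["b", 105, "high", 2, "male", 240240],
]


def to_arff(relation, attributes, rows):
    """Serialize a relation name, attribute list, and rows as ARFF text."""
    lines = [f"@relation {relation}"]
    for name, kind in attributes:
        if isinstance(kind, list):  # nominal attribute: {v1,v2,...}
            kind = "{" + ",".join(kind) + "}"
        lines.append(f"@attribute {name} {kind}")
    lines.append("@data")
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


def parse_arff(text):
    """Read back the relation name, attribute names, and raw string rows."""
    relation, names, rows, in_data = None, [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append(line.split(","))
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            names.append(line.split(None, 2)[1])
        elif lower.startswith("@data"):
            in_data = True
    return relation, names, rows


if __name__ == "__main__":
    print(to_arff("employee", ATTRIBUTES, ROWS))
```

The parser is deliberately simple: it handles only unquoted, comma-separated values like those in this experiment.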

Aim: Implement data cube for data warehouse on 3-dimensional data

Description:
We need to create a Weather Table with a training data set that includes attributes such as outlook, temperature, humidity, windy, and play.

Procedure:
Steps:
1) Open Start - Programs - Accessories - Notepad.
2) Type the following training data set in Notepad for the Weather Table.

@relation weather
@attribute outlook {sunny,rainy,overcast}
@attribute temparature numeric
@attribute humidity numeric
@attribute windy {true,false}
@attribute play ...
....0,false,yes

3) Save the file in .arff format.
4) Minimize the arff file and then open Start - Programs - weka-3-4.
5) Click on weka-3-4; the Weka dialog box is displayed on the screen.
6) The dialog box offers four modes; click on Explorer.
7) Explorer shows many options. Click on ‘open file’ and select the arff file.
8) Click on the Edit button, which shows the Weather Table in Weka.

Training Data Set - Weather Table

Result:
This program has been successfully executed.
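The aim of this experiment is a data cube over three dimensions, while the Weka steps only load the table. As a hedged illustration of the cube idea itself, the Python sketch below aggregates a small fact table over every subset of three dimensions, giving 2^3 = 8 cuboids. The choice of `outlook`, `windy`, and `play` as dimensions, the count measure, and all fact rows are assumptions made for illustration; `build_cube` is a hypothetical helper.

```python
from collections import defaultdict
from itertools import combinations

# Three nominal attributes of the weather table used as dimensions (an assumption).
DIMENSIONS = ("outlook", "windy", "play")

# Hypothetical fact rows: (outlook, windy, play, count of days).
FACTS = [
    ("sunny", "false", "no", 2),
    ("sunny", "true", "no", 1),
    ("sunny", "false", "yes", 1),
    ("overcast", "false", "yes", 2),
    ("rainy", "true", "no", 2),
    ("rainy", "false", "yes", 3),
]


def build_cube(facts, dimensions):
    """Sum the measure for every group-by subset of the dimensions."""
    cube = {}
    n = len(dimensions)
    for k in range(n + 1):
        for dims in combinations(range(n), k):
            cuboid = defaultdict(int)
            for row in facts:
                key = tuple(row[i] for i in dims)
                cuboid[key] += row[-1]  # last field is the measure
            cube[tuple(dimensions[i] for i in dims)] = dict(cuboid)
    return cube


if __name__ == "__main__":
    cube = build_cube(FACTS, DIMENSIONS)
    for dims, cuboid in cube.items():
        print(dims or "(all)", cuboid)
```

The empty tuple of dimensions is the apex cuboid (the grand total), and the full tuple is the base cuboid.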

WEEK-2

Aim: Implement various missing handling mechanisms

Description:
Real-world databases are highly susceptible to noise, missing values, and inconsistency because of their huge size. Pre-processing the data improves the quality of the data and of the mining results, and it also improves efficiency.
There are 3 pre-processing techniques:
1) Add
2) Remove
3) Normalization

Creation of Weather Table:
Procedure:
1) Open Start - Programs - Accessories - Notepad.
2) Type the following training data set in Notepad for the Weather Table.

@relation weather
@attribute outlook {sunny,rainy,overcast}
@attribute temparature numeric
@attribute humidity numeric
@attribute windy {true,false}
@attribute play ...
...ainy,75.0,80.0,false,yes

3) Save the file in .arff format.
4) Minimize the arff file and then open Start - Programs - weka-3-4.
5) Click on weka-3-4; the Weka dialog box is displayed on the screen.
6) The dialog box offers four modes; click on Explorer.
7) Explorer shows many options. Click on ‘open file’ and select the arff file.
8) Click on the Edit button, which shows the Weather Table in Weka.
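The stated aim is handling missing values, which ARFF files mark with `?`. One common mechanism is to replace a missing numeric value with the column mean and a missing nominal value with the column mode; Weka's ReplaceMissingValues filter works along these lines. The Python sketch below shows that idea under stated assumptions: the rows are hypothetical, and `impute` is an illustrative helper, not a Weka call.

```python
from collections import Counter
from statistics import mean

MISSING = "?"  # ARFF marker for a missing value

# Hypothetical weather rows: outlook, temperature, humidity, windy, play.
ROWS = [
    ["sunny", "85", "85", "false", "no"],
    ["sunny", "80", "?", "true", "no"],
    ["?", "83", "86", "false", "yes"],
    ["rainy", "70", "96", "false", "yes"],
    ["rainy", "?", "80", "false", "yes"],
    ["rainy", "65", "70", "true", "no"],
]
NUMERIC_COLUMNS = {1, 2}  # temperature and humidity


def impute(rows, numeric_columns):
    """Fill '?' with the column mean (numeric) or mode (nominal)."""
    fills = []
    for index, column in enumerate(zip(*rows)):
        present = [value for value in column if value != MISSING]
        if index in numeric_columns:
            fills.append(mean(float(value) for value in present))
        else:
            fills.append(Counter(present).most_common(1)[0][0])
    cleaned = []
    for row in rows:
        new_row = []
        for index, value in enumerate(row):
            if value == MISSING:
                new_row.append(fills[index])
            elif index in numeric_columns:
                new_row.append(float(value))
            else:
                new_row.append(value)
        cleaned.append(new_row)
    return cleaned


if __name__ == "__main__":
    for row in impute(ROWS, NUMERIC_COLUMNS):
        print(row)
```

Mean and mode imputation keeps every row but shrinks the variance of the filled columns; dropping incomplete rows is the usual alternative when few values are missing.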

Weather Table after removing attributes WINDY, PLAY:

Normalize - Pre-Processing Technique:
Procedure:
1) Start - Programs - Weka-3-4 - Weka-3-4
2) Click on Explorer.
3) Click on Open file.
4) Select the Weather.arff file and click on Open.
5) Click on the Choose button and select the Filters option.
6) Under Filters, there are Supervised and Unsupervised filters.
7) Click on Unsupervised.
8) Select attribute, then Normalize.
9) Select the attributes temparature and humidity to normalize.
10) Click on the Apply button and then Save.
11) Click on the Edit button; it shows a new Weather Table with normalized values in Weka.

Weather Table after Normalizing TEMPARATURE, HUMIDITY:
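Weka's unsupervised Normalize filter, with its default settings, rescales each numeric attribute to the range [0, 1] using min-max scaling. A minimal Python sketch of that arithmetic follows. The sample temperature values are hypothetical, and returning 0.0 for a constant column is a choice made for this sketch.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1]: (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; this sketch maps it to 0.0.
        return [0.0 for _ in values]
    return [(value - lo) / (hi - lo) for value in values]


if __name__ == "__main__":
    temperatures = [64, 75, 85]  # hypothetical sample values
    print(min_max_normalize(temperatures))
```

For example, with a minimum of 64 and a maximum of 85, the value 75 maps to (75 - 64) / (85 - 64) = 11/21.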

Add - Pre-Processing Technique:
Procedure:
1) Start - Programs - Weka-3-4 - Weka-3-4
2) Click on Explorer.
3) Click on Open file.
4) Select the Weather.arff file and click on Open.
5) Click on the Choose button and select the Filters option.
6) Under Filters, there are Supervised and Unsupervised filters.
7) Click on Unsupervised.
8) Select attribute, then Add.
9) A new window opens.
10) In that window, enter the attribute index, type, date format, and nominal label values for Climate.
11) Click on OK.
12) Press the Apply button; a new attribute is added to the Weather Table.
13) Save the file.
14) Click on the Edit button; it shows a new Weather Table in Weka.

Weather Table after adding new attribute CLIMATE:

...05,high,2,male,240240

3) Save the file in .arff format.
4) Minimize the arff file and then open Start - Programs - weka-3-4.
5) Click on weka-3-4; the Weka dialog box is displayed on the screen.
6) The dialog box offers four modes; click on Explorer.
7) Explorer shows many options. Click on ‘open file’ and select the arff file.
8) Click on the Edit button, which shows the Employee Table in Weka.

Training Data Set - Employee Table

Add - Pre-Processing Technique:
Procedure:
1) Start - Programs - Weka-3-4 - Weka-3-4
2) Click on Explorer.
3) Click on Open file.
4) Select the Employee.arff file and click on Open.
5) Click on the Choose button and select the Filters option.
6) Under Filters, there are Supervised and Unsupervised filters.
7) Click on Unsupervised.
8) Select attribute, then Add.
9) A new window opens.
10) In that window, enter the attribute index, type, date format, and nominal label values for Address.
11) Click on OK.
12) Press the Apply button; a new attribute is added to the Employee Table.
13) Save the file.
14) Click on the Edit button; it shows a new Employee Table in Weka.

Employee Table after adding new attribute ADDRESS:

Remove - Pre-Processing Technique:
Procedure:
1) Start - Programs - Weka-3-4 - Weka-3-4
2) Click on Explorer.
3) Click on Open file.
4) Select the Employee.arff file and click on Open.
5) Click on the Choose button and select the Filters option.
6) Under Filters, there are Supervised and Unsupervised filters.
7) Click on Unsupervised.
8) Select attribute, then Remove.
9) Select the attributes salary and gender to remove.
10) Click on the Remove button and then Save.
11) Click on the Edit button; it shows a new Employee Table in Weka.

Employee Table after removing attributes SALARY, GENDER:
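The Add and Remove techniques amount to inserting or dropping a column of the table. The Python sketch below illustrates both on an in-memory table. The employee rows are hypothetical apart from the one row that survives in the manual, and `add_attribute` and `remove_attributes` are illustrative helpers, not Weka calls. A newly added attribute is filled with `?` by default, matching how Weka's Add filter creates it with missing values.

```python
HEADER = ["name", "id", "salary", "exp", "gender", "phone"]

# Hypothetical rows, except the last one, which appears in the manual.
ROWS = [
    ["x", 101, "low", 2, "male", 250311],
    ["b", 105, "high", 2, "male", 240240],
]


def add_attribute(header, rows, name, values=None, index=None):
    """Return a copy of the table with a new column at `index` (default: last)."""
    if values is None:
        values = ["?"] * len(rows)  # new attribute starts as missing
    if len(values) != len(rows):
        raise ValueError("one value per row is required")
    position = len(header) if index is None else index
    new_header = header[:position] + [name] + header[position:]
    new_rows = [
        row[:position] + [value] + row[position:]
        for row, value in zip(rows, values)
    ]
    return new_header, new_rows


def remove_attributes(header, rows, names):
    """Return a copy of the table without the named columns."""
    keep = [i for i, column in enumerate(header) if column not in names]
    return [header[i] for i in keep], [[row[i] for i in keep] for row in rows]


if __name__ == "__main__":
    print(add_attribute(HEADER, ROWS, "address"))
    print(remove_attributes(HEADER, ROWS, {"salary", "gender"}))
```

Both helpers return new lists and leave the input table unchanged, so the two operations can be tried independently on the same data.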

Normalize - Pre-Processing Technique:
Procedure:
1) Start - Programs - Weka-3-4 - Weka-3-4
2) Click on Explorer.
3) Click on Open file.
4) Select the Employee.arff file and click on Open.
5) Click on the Choose button and select the Filters option.
6) Under Filters, there are Supervised and Unsupervised filters.
7) Click on Unsupervised.
8) Select attribute, then Normalize.
9) Select the attributes id, experience, and phone to normalize.
10) Click on the Apply button and then Save.
11) Click on the Edit button; it shows a new Employee Table with normalized values in Weka.

Employee Table after Normalizing ID, EXP, PHONE:

Result:
This program has been successfully executed.

WEEK-3

Aim: Develop k-means and MST based clustering techniques

Description:
The Knowledge Flow provides an alternative to the Explorer as a graphical front end to WEKA’s algorithms. Knowledge Flow is a work in progress, so some of the functionality of the Explorer is not yet available in it. On the other hand, some things can be done in Knowledge Flow but not in the Explorer. Knowledge Flow presents a dataflow interface to WEKA. The user can select WEKA components from a toolbar, place them on a layout canvas, and connect them together to form a knowledge flow for processing and analyzing the data.

Creation of Weather Table:
Procedure:
1) Open Start - Programs - Accessories - Notepad.
2) Type the following training data set in Notepad for the Weather Table.

@relation weather
@attribute outlook {sunny,rainy,overcast}
@attribute temparature numeric
@attribute humidity numeric
@attribute windy {true,false}
@attribute play ...
...ainy,75.0,80.0,false,yes

3) Save the file in .arff format.
4) Minimize the arff file and then open Start - Programs - weka-3-4.
5) Click on weka-3-4; the Weka dialog box is displayed on the screen.
6) The dialog box offers four modes; click on Explorer.
7) Explorer shows many options. Click on ‘open file’ and select the arff file.
8) Click on the Edit button, which shows the Weather Table in Weka.

Output:
Training Data Set - Weather Table

Procedure for Knowledge Flow:
1) Open Start - Programs - Weka-3-4 - Weka-3-4.
2) Open the Knowledge Flow.
3) Select the Data Sources component and add Arff Loader to the knowledge layout canvas.
4) Select the Filters component and add Attribute Selection and Normalize to the knowledge layout canvas.
5) Select the Data Sinks component and add Arff Saver to the knowledge layout canvas.
6) Right click on Arff Loader and select the Configure option; a new window opens, in which you select Weather.arff.
7) Right click on Arff Loader and select the Dataset option, then establish a link between Arff Loader and Attribute Selection.
8) Right click on Attribute Selection and select the Dataset option, then establish a link between Attribute Selection and Normalize.
9) Right click on Attribute Selection, select the Configure option, and choose the best attribute for the Weather data.
10) Right click on Normalize and select the Dataset option, then establish a link between Normalize and Arff Saver.
11) Right click on Arff Saver and select the Configure option; a new window opens, in which you set the path and enter .arff in the look-in dialog box to save the normalized data.
12) Right click on Arff Loader and click on the Start Loading option; everything is then executed one step at a time.
13) Check whether the output has been created by browsing to the chosen path.
14) Rename the data file to a.arff.
15) Double click on a.arff; the output opens automatically in MS-Excel.

Result:
This program has been successfully executed.
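The aim of this experiment names k-means clustering, whereas the Knowledge Flow steps build a load, select, normalize, and save pipeline. As a complement, here is a minimal Python sketch of the standard k-means loop (assign each point to its nearest centroid, then recompute each centroid as the mean of its members). The (temperature, humidity) points are hypothetical, and seeding the centroids with the first k points is an assumption made to keep the run deterministic; it is not how Weka initializes its clusterer.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical (temperature, humidity) points.
POINTS = [(85, 85), (64, 65), (80, 90), (65, 70), (83, 86), (68, 72)]


def k_means(points, k, iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    # Deterministic seed for this sketch: the first k points.
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignment


if __name__ == "__main__":
    centroids, assignment = k_means(POINTS, k=2)
    print(centroids)
    print(assignment)
```

The result of k-means depends on the initial centroids, so practical implementations usually run several random starts and keep the best one.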

Aim: Develop the methodology for assessment of clusters for given dataset

Description:
The Knowledge Flow provides an alternative to the Explorer as a graphical front end to WEKA’s algorithms. Knowledge Flow is a work in progress, so some of the functionality of the Explorer is not yet available in it. On the other hand, some things can be done in Knowledge Flow but not in the Explorer. Knowledge Flow presents a dataflow interface to WEKA. The user can select WEKA components from a toolbar, place them on a layout canvas, and connect them together to form a knowledge flow for processing and analyzing the data.

Creation of Employee Table:
Procedure:
1) Open Start - Programs - Accessories - Notepad.
2) Type the following training data set in Notepad for the Employee Table.

@relation employee
@attribute eid numeric
@attribute ename {ravi,ramana,ram,kavya,navya}
@attribute salary numeric
@attribute exp numeric
@attribute address ...
...3,kdp
112,kavya,13000,4,k...

4) Minimize the arff file and then open Start - Programs - weka-3-4.
5) Click on weka-3-4; the Weka dialog box is displayed on the screen.
6) The dialog box offers four modes; click on Explorer.
7) Explorer shows many options. Click on ‘open file’ and select the arff file.
8) Click on edit b
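One simple way to assess a clustering is the within-cluster sum of squared errors (SSE): the squared distance of every point to the centroid of its own cluster, summed over all points. Lower values mean tighter clusters for the same number of clusters. The Python sketch below computes it for a given assignment. The points and both assignments are hypothetical, and `within_cluster_sse` is an illustrative helper rather than the manual's method.

```python
def within_cluster_sse(points, assignment):
    """Sum of squared distances from each point to its cluster centroid."""
    clusters = {}
    for point, label in zip(points, assignment):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        centroid = [sum(axis) / len(members) for axis in zip(*members)]
        for point in members:
            total += sum((v - c) ** 2 for v, c in zip(point, centroid))
    return total


if __name__ == "__main__":
    points = [(1, 1), (1, 3), (5, 5), (7, 5)]  # hypothetical data
    print(within_cluster_sse(points, [0, 0, 1, 1]))  # compact grouping
    print(within_cluster_sse(points, [0, 1, 0, 1]))  # poor grouping
```

Comparing the SSE of two candidate groupings of the same data, as in the usage above, shows which one keeps points closer to their cluster centers.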
