ExploreKit: Automatic Feature Generation And Selection

effective ranking requires taking into account the size and feature composition of the analyzed dataset. We therefore propose a novel ML-based approach for candidate feature ranking. We define meta-features to represent both the dataset and the candidate feature, and train a feature ranking classifier. This is, to the best of our knowledge, the first such attempt.

In the candidate features evaluation & selection phase we use greedy search to evaluate the ranked candidate features. We evaluate the performance of the joint set F_i ∪ {f_i,j^cand} for each f_i,j^cand ∈ Ranked(F_i^cand) and compute the reduction in classification error compared with F_i. When the performance improvement exceeds a predefined threshold w, the evaluation process terminates and we select the current candidate feature, denoted f_i^select. We define the joint set F_i ∪ {f_i^select} as the current feature set of the following iteration, F_i+1. Next we describe the phases of this process in detail.

A. Generation of Candidate Features

The goal of this phase is to generate a large set of candidate features F_i^cand using the current feature set F_i. We first present the operators used in the candidate feature generation and then describe our proposed process.

1) Operator Types: We apply three types of operators to generate the candidate features set F_i^cand for iteration i: unary, binary and higher-order.

Unary operators: applied on a single feature. Each operator in this group belongs to one of two sub-groups:

Discretizers: used to convert continuous and date-time features into discrete (i.e. categorical) ones. Discretization [4] is necessary in many popular classification algorithms (e.g., Decision Trees and Naive Bayes) and has also been shown to contribute to performance [1]. For our evaluation, we have implemented the EqualRange discretization for numeric features (partition the range of values of the feature into X equal segments) and the DayOfWeek, MonthOfYear and IsWeekend operators for date-time features. Another important benefit of discretization is that it provides us with transformations of continuous features that can be utilized by the higher-order operators.

Normalizers: used to fit the scale of continuous (i.e. numeric) features to specific distributions. Normalization has also been shown to be critical to the performance of multiple machine learning algorithms [3]. For our experiments we have implemented normalization to the range [0,1].

Binary operators: applied on a pair of features. This group currently consists of the four basic arithmetic operations: +, −, ×, ÷.

Higher-order operators: use multiple (two or more) features for the generation of a new one. We have implemented five operators in this group: GroupByThenMax, GroupByThenMin, GroupByThenAvg, GroupByThenStdev and GroupByThenCount. These operators implement the SQL-based operations with the same names.

It is important to point out that in our experiments we only use a small set of operators. Additional operators can be easily created and added to our framework, including many domain-specific operators for fields such as biology or time-series analysis.

2) Generating the Candidate Features Set: We generate the candidate features by applying the operators in the following order:

1. We apply the unary operators on all possible features in the features set. We use this step to create F_u,i, the normalized and discretized versions of all non-discrete features in F_i.

2. We apply the binary and higher-order operators on the unified set F_i ∪ F_u,i. All possible valid feature combinations are generated. We denote the features generated in this step as F_o,i.

3. For every applicable (i.e. non-discrete) feature in F_o,i, we once again apply all the unary operators. We denote the features set generated by this step as F_ou,i.

4. The final candidate features set for iteration i is the union of all generated feature sets: F_i^cand = F_u,i ∪ F_o,i ∪ F_ou,i.

In order to limit the size of F_i^cand, the features are only combined once: none of the generated candidate features is re-used to generate additional features.
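The four generation steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: features are pandas Series, only two binary operators are shown, and the helper names (`equal_range`, `normalize`, `generate_candidates`) are ours.

```python
import itertools

import pandas as pd

# Illustrative stand-ins for two of the unary operators named above.
def equal_range(col, segments=10):
    """EqualRange: partition the feature's range of values into equal segments."""
    return pd.cut(col, bins=segments, labels=False)

def normalize(col):
    """Normalizer: rescale a numeric feature to the range [0, 1]."""
    return (col - col.min()) / (col.max() - col.min())

def generate_candidates(df, numeric_cols):
    """One iteration of the four generation steps (binary operators only)."""
    # Step 1: unary operators on all non-discrete features -> F_u,i
    f_u = {f"eqr({c})": equal_range(df[c]) for c in numeric_cols}
    f_u.update({f"norm({c})": normalize(df[c]) for c in numeric_cols})

    # Step 2: binary operators on the unified set F_i U F_u,i -> F_o,i
    pool = {**{c: df[c] for c in numeric_cols}, **f_u}
    f_o = {}
    for (a, sa), (b, sb) in itertools.combinations(pool.items(), 2):
        f_o[f"add({a},{b})"] = sa + sb
        f_o[f"mul({a},{b})"] = sa * sb

    # Step 3: the unary operators are applied once more on F_o,i -> F_ou,i
    f_ou = {f"norm({k})": normalize(v) for k, v in f_o.items()}

    # Step 4: F_i^cand = F_u,i U F_o,i U F_ou,i. Features are combined only
    # once: nothing generated here is fed back into step 2.
    return {**f_u, **f_o, **f_ou}
```

Even with two features and two binary operators, one iteration already yields dozens of candidates, which is why the ranking phase is needed before evaluation.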
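The greedy evaluation & selection phase described earlier can be sketched the same way. The interfaces here are illustrative assumptions: the ranked list, the error oracle `error_with`, and the threshold value are placeholders, not the paper's exact components.

```python
# Greedy search over the ranked candidates: stop at the first feature whose
# addition reduces the classification error by more than the threshold w.
def select_feature(ranked_candidates, current_error, error_with, w=0.01):
    """Return f_i^select, or None if no candidate clears the threshold.

    ranked_candidates: candidate features, best-ranked first.
    error_with(f): classification error of the joint set F_i U {f}.
    """
    for f in ranked_candidates:
        if current_error - error_with(f) > w:
            return f          # f_i^select; F_i+1 = F_i U {f}
    return None               # search for this iteration ends empty-handed
```

Because the candidates are visited in ranked order, a good ranking classifier lets the search terminate after evaluating only a small prefix of the (very large) candidate set.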

Gilad Katz (University of California, Berkeley) giladk@berkeley.edu
Eui Chul Richard Shin (University of California, Berkeley) ricshin@berkeley.edu
Dawn Song (University of California, Berkeley) dawnsong@cs.berkeley.edu

Abstract—Feature generation is one of the challenging aspects of
