Author: Dr. Hans Hofmann
Source: [UCI](https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)) - 1994
Please cite: [UCI](https://archive.ics.uci.edu/ml/citation_policy.html)

German Credit dataset

This dataset classifies people described by a set of attributes as good or bad credit risks.

This dataset comes with a cost matrix:
```
                 Predicted
                 Good   Bad
Actual   Good      0     1
         Bad       5     0
```
It is worse to class a customer as good when they are bad (cost 5) than it is to class a customer as bad when they are good (cost 1).

### Attribute description
1. Status of existing checking account, in Deutsche Mark.
2. Duration in months
3. Credit history (credits taken, paid back duly, delays, critical accounts)
4. Purpose of the credit (car, television, ...)
5. Credit amount
6. Status of savings account/bonds, in Deutsche Mark.
7. Present employment, in number of years.
8. Installment rate in percentage of disposable income
9. Personal status (married, single, ...) and sex
10. Other debtors / guarantors
11. Present residence since X years
12. Property (e.g. real estate)
13. Age in years
14. Other installment plans (banks, stores)
15. Housing (rent, own, ...)
16. Number of existing credits at this bank
17. Job
18. Number of people for whom the applicant is liable to provide maintenance
19. Telephone (yes, no)
20. Foreign worker (yes, no)
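To make the cost matrix concrete, here is a minimal Java sketch (plain Java, no Weka classes) that weights a 2x2 confusion matrix by the costs above. The class name CreditCost, the index convention (0 = good, 1 = bad) and the confusion-matrix counts in main are illustrative assumptions, not taken from the dataset or from Weka. Inside Weka, the same matrix can be supplied to cost-sensitive evaluation or to the CostSensitiveClassifier meta-classifier; this sketch only shows the arithmetic.

```java
/**
 * Sketch: total misclassification cost under the German Credit cost matrix.
 * Rows are the actual class, columns the predicted class;
 * index 0 = good, index 1 = bad (an assumed convention).
 */
class CreditCost {

    // Cost matrix from the dataset description.
    static final int[][] COST = {
        {0, 1},  // actual good: predicted good costs 0, predicted bad costs 1
        {5, 0}   // actual bad:  predicted good costs 5, predicted bad costs 0
    };

    /** Sums COST[actual][predicted] * count over a 2x2 confusion matrix. */
    static int totalCost(int[][] confusion) {
        int total = 0;
        for (int actual = 0; actual < 2; actual++) {
            for (int predicted = 0; predicted < 2; predicted++) {
                total += COST[actual][predicted] * confusion[actual][predicted];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical confusion-matrix counts, for illustration only.
        int[][] confusion = {
            {620, 80},  // good customers: 620 classed good, 80 classed bad
            {90, 210}   // bad customers:  90 classed good, 210 classed bad
        };
        System.out.println("Total cost: " + totalCost(confusion));
    }
}
```

With these hypothetical counts, the 80 good customers classed as bad cost 80 * 1 and the 90 bad customers classed as good cost 90 * 5, so the program prints "Total cost: 530". Note how the fewer bad-as-good errors still dominate the total.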
Here are some small programs purporting to show the versatility of the Weka data mining/machine learning system and what it can do. I will not explain everything (in fact, I will not explain very much at all). At the Weka site http://www.cs.waikato.ac.nz/~ml/weka/index.html you can read more about the system and download it.
- Statlog (German Credit Data) Data Set. Abstract: This dataset classifies people described by a set of attributes as good or bad credit risks. It comes in two formats (one all numeric) and also comes with a cost matrix.
- Below are some sample datasets that have been used with Auto-WEKA. Each zip has two files, test.arff and train.arff, in WEKA's native format. To use these zip files with Auto-WEKA, you need to pass them to an InstanceGenerator that will split them into different subsets to allow for processes like cross-validation. To perform 10-fold cross-validation with a specific seed, you can use the …
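The Auto-WEKA instructions are cut off in the source, so the exact call is unknown. As a hedged illustration of what "10-fold cross-validation with a specific seed" means, here is a generic Java sketch. The class KFoldSplit and its folds method are hypothetical and are not the Auto-WEKA InstanceGenerator API. Shuffling with a java.util.Random created from a fixed seed makes the split reproducible.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Generic sketch of a seeded k-fold split (hypothetical helper,
 * not the Auto-WEKA InstanceGenerator API).
 */
class KFoldSplit {

    /** Returns, for each of k folds, the instance indices held out as that fold's test set. */
    static List<List<Integer>> folds(int numInstances, int k, long seed) {
        if (k < 2 || k > numInstances) {
            throw new IllegalArgumentException("need 2 <= k <= numInstances");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            indices.add(i);
        }
        // The same seed always produces the same shuffle, so the split is reproducible.
        Collections.shuffle(indices, new Random(seed));

        List<List<Integer>> result = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            result.add(new ArrayList<>());
        }
        // Deal the shuffled indices round-robin so fold sizes differ by at most one.
        for (int pos = 0; pos < numInstances; pos++) {
            result.get(pos % k).add(indices.get(pos));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> split = folds(25, 10, 42L);
        for (int f = 0; f < split.size(); f++) {
            // The training set for fold f is every index not listed here.
            System.out.println("fold " + f + " test indices: " + split.get(f));
        }
    }
}
```

Each index lands in exactly one test fold. Weka's built-in cross-validation also stratifies the folds by class when the class attribute is nominal, which this sketch does not do.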
This is an analysis and classification of the German credit data (more information at this pdf). Three classifiers were tested, Support Vector Machines (SVM), Random Forests, and Naive Bayes, to select the best-performing one for this data.
Also see the Weka MOOCs:
- Advanced Data Mining with Weka
- Simple Weka Applet 1
Here you can see some of the algorithms at work, using different data sets (or providing one of your own in ARFF data format).
Source files: WekaApplet1.java, Weka1.java
- Weka J48 Applet
Applet using (some of) the options of the J48 algorithm.
Source files: WekaJ48Applet.java, WekaJ48.java
- Simple text classification (information)
Very simple text classification applets.
Source files: TextClassifierApplet.java, TextClassifier.java
- Very simple Association Rules (Apriori) applets
Source files: AssociationRulesApplet.java, AssociationRules.java
To install the ExpandFreqField filter in Weka:
- Copy the file ExpandFreqField.java to the Weka directory
weka/filters/unsupervised/instance
- Add the following line to the file
weka/gui/GenericObjectEditor.props
together with the other filters.unsupervised.instance filters:
weka.filters.unsupervised.instance.ExpandFreqField,\
(don't forget the trailing backslash, which continues the property value onto the next line).
- Compile the Java file
- Start Weka Explorer
Some of my other pages about Weka
Also see the following pages on my site mentioning Weka.
- If you can read Swedish (or are courageous), you may see my data mining presentation pages, where some of the basic principles and algorithms of machine learning and data mining are explained.
- The badge problem which is an analysis of a (recreational) data set, using Weka.
- My Data Mining, Machine Learning etc page.
ARFF data files
The data file normally used by Weka is in ARFF file format, which consists of special tags that indicate the different parts of the data file (mostly attribute names, attribute types, attribute values, and the data). Here is a list of some ARFF files you can use; many are standard data sets often used in the machine learning community. Most of them are available from the Weka site. Many of them are also described and downloadable from http://www.ics.uci.edu/~mlearn/MLRepository.html.
If you click on a link in the list below, you can see for yourself what the data set looks like. Please note that some files are quite big, and for some algorithms they will take a lot of time (often a lot of time!). The number in parentheses is the size in bytes. Some of the files have quite good comments on the data set; others have no explanation at all (they were probably converted from some other source by me).
One more thing: the class attribute (i.e. the attribute we want to learn) must be the last attribute.
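To illustrate the ARFF layout described above (header tags, then @data rows) and the class-last convention, here is a deliberately minimal Java reader. TinyArff is a hypothetical helper, not Weka's loader: it assumes dense, comma-separated rows, no quoted values, no sparse format and single-token attribute names, and the toy data set in main is made up. In real Weka code you would load the file with Weka's own classes and mark the class with data.setClassIndex(data.numAttributes() - 1).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal ARFF reader sketch (hypothetical helper, not Weka's loader).
 * Handles only dense rows, unquoted values and single-token attribute names.
 */
class TinyArff {

    final List<String> attributeNames = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    static TinyArff parse(String text) {
        TinyArff arff = new TinyArff();
        boolean inData = false;
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // blank line or ARFF comment
            }
            String lower = line.toLowerCase();
            if (lower.startsWith("@attribute")) {
                // "@attribute name type": the name is the second token.
                arff.attributeNames.add(line.split("\\s+")[1]);
            } else if (lower.startsWith("@data")) {
                inData = true;
            } else if (inData) {
                String[] values = line.split(",");
                for (int i = 0; i < values.length; i++) {
                    values[i] = values[i].trim();
                }
                arff.rows.add(values);
            }
            // @relation and other header lines are ignored in this sketch.
        }
        return arff;
    }

    /** Following the convention above, the class is the last attribute. */
    String classAttribute() {
        return attributeNames.get(attributeNames.size() - 1);
    }

    public static void main(String[] args) {
        String toy = String.join("\n",
            "% A made-up toy data set",
            "@relation toy",
            "@attribute outlook {sunny, rainy}",
            "@attribute windy {TRUE, FALSE}",
            "@attribute play {yes, no}",
            "@data",
            "sunny,FALSE,yes",
            "rainy,TRUE,no");
        TinyArff arff = TinyArff.parse(toy);
        System.out.println("class attribute: " + arff.classAttribute()); // play
        System.out.println("rows: " + arff.rows.size());                  // 2
    }
}
```

Running main prints "class attribute: play" and "rows: 2": because play is declared last, it is the attribute a classifier would learn to predict.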
- http://www.hakank.org/weka/zoo2_x.arff (6296)
- http://www.hakank.org/weka/golf.arff (383)
- http://www.hakank.org/weka/cpu.arff (6936)
- http://www.hakank.org/weka/sunburn.arff (573)
- http://www.hakank.org/weka/wine.arff (13790)
- http://www.hakank.org/weka/iris_discretized.arff (12390)
- http://www.hakank.org/weka/shape.arff (296)
- http://www.hakank.org/weka/titanic.arff (42322)
- http://www.hakank.org/weka/disease.arff (457)
- http://www.hakank.org/weka/labor_discretized.arff (9595)
- http://www.hakank.org/weka/zoo.arff (9408)
- http://www.hakank.org/weka/monk3.arff (1944)
- http://www.hakank.org/weka/monk2.arff (2602)
- http://www.hakank.org/weka/monk1.arff (1972)
- http://www.hakank.org/weka/credit.arff (23254)
- http://www.hakank.org/weka/contact-lenses.arff (2890)
- http://www.hakank.org/weka/iris.arff (7486)
- http://www.hakank.org/weka/labor.arff (8255)
- http://www.hakank.org/weka/weather.arff (489)
- http://www.hakank.org/weka/weather.nominal.arff (587)
- http://www.hakank.org/weka/BC.arff (25063)
- http://www.hakank.org/weka/G2.arff (8125)
- http://www.hakank.org/weka/GL.arff (10504)
- http://www.hakank.org/weka/HD.arff (22564)
- http://www.hakank.org/weka/HE.arff (8639)
- http://www.hakank.org/weka/HO.arff (29907)
- http://www.hakank.org/weka/IR.arff (4919)
- http://www.hakank.org/weka/LA.arff (4817)
- http://www.hakank.org/weka/LY.arff (11150)
- http://www.hakank.org/weka/SO.arff (8068)
- http://www.hakank.org/weka/V1.arff (31252)
- http://www.hakank.org/weka/VO.arff (33016)
- http://www.hakank.org/weka/auto93.arff (13617)
- http://www.hakank.org/weka/tic-tac-toe.arff (26569)
- http://www.hakank.org/weka/prnn_virus3.arff (6657) From 'Pattern Recognition and Neural Networks' by B.D. Ripley
- http://www.hakank.org/weka/prnn_viruses.arff (7672) From 'Pattern Recognition and Neural Networks' by B.D. Ripley
- http://www.hakank.org/weka/badges_plain.arff (11262) (see my analysis of this data set here)
- http://www.hakank.org/weka/badges2.arff (21295) (see my analysis of this data set here)
- http://www.hakank.org/weka/spambase.arff (700661)
- http://www.hakank.org/weka/spambase_real.arff (700659)
- http://www.hakank.org/weka/ticdata_categ.arff (1012920) (Caravan data)
- http://www.hakank.org/weka/exper1.arff (106047)
- http://www.hakank.org/weka/soybean.arff (202935)
- http://www.hakank.org/weka/CH.arff (483568)
- http://www.hakank.org/weka/HY.arff (336201)
- http://www.hakank.org/weka/MU.arff (743765)
- http://www.hakank.org/weka/SE.arff (337512)
- http://www.hakank.org/weka/kropt.arff (532550)
ARFF versions of DASL data
DASL - The Data and Story Library is a great collection of data sets, with background stories and some analysis. For ARFF versions of these data sets, see ARFF versions of DASL data sets.

Related pages:
- My Eureqa page: Eureqa is a great tool for symbolic regression
- My JGAP page: I have written my own symbolic regression program using JGAP (Java)
Created by Hakan Kjellerstrand