Weka Machine Learning Project

An exciting and potentially far-reaching development in computer science is the invention and application of methods of machine learning. These enable a computer program to automatically analyse a large body of data and decide what information is most relevant. This crystallised information can then be used to automatically make predictions or to help people make decisions faster and more accurately.

The overall goal of our project is to build a state-of-the-art facility for developing machine learning (ML) techniques and to apply them to real-world data mining problems. Our team has incorporated several standard ML techniques into a software "workbench" called WEKA, for Waikato Environment for Knowledge Analysis. With it, a specialist in a particular field is able to use ML to derive useful knowledge from databases that are far too large to be analysed by hand. WEKA's users are ML researchers and industrial scientists, but it is also widely used for teaching.

Our objectives are to

Our machine learning package is publicly available and provides a collection of algorithms for solving real-world data mining problems. The software is written entirely in Java and includes a uniform interface to a number of standard ML techniques. Please feel free to browse around.
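To illustrate what a "uniform interface" to several ML techniques can look like in Java, here is a minimal sketch. It does not use or reproduce WEKA's actual API: the names `Classifier`, `MajorityClass`, `NearestNeighbour`, `train` and `predict` are hypothetical and chosen only for illustration, and the two learners are toy stand-ins for real algorithms.

```java
import java.util.HashMap;
import java.util.Map;

public class UniformInterfaceSketch {

    /** Hypothetical common contract shared by every learner; not WEKA's real API. */
    interface Classifier {
        void train(double[][] features, String[] labels);
        String predict(double[] instance);
    }

    /** Baseline learner: always predicts the most frequent training label. */
    static class MajorityClass implements Classifier {
        private String majority;

        public void train(double[][] features, String[] labels) {
            Map<String, Integer> counts = new HashMap<>();
            int best = 0;
            for (String label : labels) {
                int c = counts.merge(label, 1, Integer::sum);
                if (c > best) {
                    best = c;
                    majority = label;
                }
            }
        }

        public String predict(double[] instance) {
            return majority;
        }
    }

    /** 1-nearest-neighbour learner using squared Euclidean distance. */
    static class NearestNeighbour implements Classifier {
        private double[][] features;
        private String[] labels;

        public void train(double[][] features, String[] labels) {
            // Lazy learner: training only stores the data.
            this.features = features;
            this.labels = labels;
        }

        public String predict(double[] instance) {
            int bestIndex = 0;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int i = 0; i < features.length; i++) {
                double d = 0.0;
                for (int j = 0; j < instance.length; j++) {
                    double diff = features[i][j] - instance[j];
                    d += diff * diff;
                }
                if (d < bestDist) {
                    bestDist = d;
                    bestIndex = i;
                }
            }
            return labels[bestIndex];
        }
    }

    public static void main(String[] args) {
        double[][] x = { {1.0}, {1.2}, {0.9}, {5.0} };
        String[] y = { "low", "low", "low", "high" };

        // The calling code is identical for every learner.
        Classifier[] learners = { new MajorityClass(), new NearestNeighbour() };
        for (Classifier c : learners) {
            c.train(x, y);
            System.out.println(c.getClass().getSimpleName()
                    + " -> " + c.predict(new double[] {4.8}));
        }
    }
}
```

Because both learners implement the same contract, the loop in `main` can train and query them without knowing which technique it is running. That interchangeability is the property the sketch is meant to show.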


*This information was entered by Leonard D'Avolio in an effort to pre-populate ORBIT.  If this is your project and you'd like to be granted edit capabilities, or you'd like it removed, just shoot me a note at ldavolio@gmail.com


Eibe Frank
Ian Witten

Associated Institutions

The University of Waikato

Application Domains
  • Domain independent
Other Resource Type
Software Subtype
  • Algorithm implementation
  • Association analysis
  • Classification
  • Cluster analysis
  • Data mining/Machine learning
  • Outlier (anomaly) detection
  • Text mining
Programming Languages
  • Java
Operating Systems
Included Components
  • Application Programming Interface
  • Graphical User Interface
  • Library of modular components
Dataset Subtype
Data Model Subtype
Online Resource Subtype
Knowledge Base Subtype
Intended User Types
  • Informatics researcher
  • NLP researcher or developer
  • Software developer

Witten, Frank. Data Mining: Practical Machine Learning Tools and Techniques (Second Edition). Morgan Kaufmann, June 2005. 525 pages. Paper. ISBN 0-12-088407-0.

Available Documentation
  • Web page/HTML documentation
Licensing Type
Open source
Licensing Notes
Released under the GNU General Public License. The open source company Pentaho is taking over development and maintenance.
Development Milestones

The current release series is Weka 3.