Data Mining
- Def. - discovery of unexpected correlations in data; also called Knowledge Discovery in Databases (KDD)
- Motivation
- Huge amounts of data that MBAs/doctors/general
researchers want analyzed
- What it's not - statistics, AI, and information retrieval,
which do not have scalable algorithms to generate rules
- Need to find relationships
- Process
- Data Cleaning
- Warehousing
- Eliminate using Domain Knowledge
- Mine
- Are there useful rules?
- Applications
- Decision support
- Market analysis & Management (Business Stuff)
- Risk Analysis / Forecasting (More Business Stuff)
- Fraud detection
- Any general research (Medical/Engineering)
- Intelligent query answering
- Sports
- Real system - IDIS (as best I can remember)
- Future Research:
- Algorithms that improve the cleaning
- Noise
- Incomplete data
- Reduce data & number of attributes without risk of
losing discoveries
- Methods for reducing attributes/data
- Algorithms that improve the Warehousing
- Handling different types of data/databases
- Mining from the web
- Algorithms that improve the mining
- Regression
- χ² (chi-squared) - bivariate test of statistical
significance
- Entropy-based discretization
- Discretization - next 3 can also be used for
cleaning
- Binning
- Cluster analysis
- Increase performance of any of these
- Applications
- Tools
- Integration with existing knowledge
- Data security
- Social impact
- privacy, integrity, security
- Visualization of rules
- New/Improved Data Mining Language Grammar
- New/Improved Sampling methods
- Mining non-traditional data such as spatial,
multimedia, time series data, etc.
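
Three of the mining techniques listed above (binning, the χ² test, and entropy-based discretization) are small enough to sketch in code. This is a minimal illustration in plain Python, not taken from the notes: the function names (`equal_width_bins`, `chi_square`, `best_entropy_split`) are hypothetical, binning is assumed to be equal-width, and the entropy-based method is reduced to picking a single cut point.

```python
from collections import Counter
from math import log2


def equal_width_bins(values, n_bins):
    """Binning: map each value to one of n_bins equal-width intervals (0-based)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 when all values are equal
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def chi_square(table):
    """Pearson chi-squared statistic for a contingency table (list of rows).

    Assumes every row and column has a nonzero total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two attributes were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_entropy_split(values, labels):
    """Entropy-based discretization, one step: the cut point that minimizes
    the weighted entropy of the class labels on either side of the cut."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_score, best_cut = float("inf"), None
    for k in range(1, n):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no cut point between identical values
        left = [label for _, label in pairs[:k]]
        right = [label for _, label in pairs[k:]]
        score = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if score < best_score:
            best_score = score
            best_cut = (pairs[k - 1][0] + pairs[k][0]) / 2
    return best_cut
```

A larger `chi_square` value suggests a stronger association between the two attributes; turning it into a significance level still requires the χ² distribution with the appropriate degrees of freedom, which this sketch leaves out. Applying `best_entropy_split` recursively to each side of the cut would give a multi-interval discretization.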