Data mining refers to the extraction, or “mining”, of knowledge from large amounts of data. More precisely, data mining is the extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from a huge amount of data.
Knowledge discovery is a process, depicted in the figure, that consists of an iterative sequence of the following steps.
Knowledge Discovery Process
1 Data cleaning: It is the process of removing noise and inconsistent data.
2 Data integration: It is the process of combining multiple data sources.
3 Data selection: It is the process where data relevant to the analysis task are retrieved from the database.
4 Data transformation: It is the process where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations.
Sometimes data transformation and consolidation are performed before the data selection process, particularly in the case of data warehousing.
Data reduction may also be performed to obtain a smaller representation of the original data without sacrificing its integrity.
5 Data mining: It is an essential process where intelligent methods are applied in order to extract data patterns.
6 Pattern evaluation: It is the process of identifying the truly interesting patterns representing knowledge, based on some interestingness measures.
7 Knowledge presentation: It is the process where visualization and knowledge presentation techniques are used to present the mined knowledge to the users.
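
The seven steps above can be sketched as a small pipeline. The following is only an illustrative sketch, not a definitive implementation: the toy purchase records, the function names, the choice of item-pair counting as the "intelligent method", and the use of support as the interestingness measure are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: (customer, item, quantity) records from two sources.
store_a = [
    ("c1", "bread", 1), ("c1", "milk", 2), ("c2", "bread", 1),
    ("c2", "milk", 1), ("c2", "bag", 1), ("c3", "bread", 1),
    ("c3", None, 1),                      # noisy row: missing item
]
store_b = [
    ("c4", "Milk ", 1), ("c4", "bread", 1),
    ("c4", "eggs", -3),                   # inconsistent row: negative quantity
    ("c5", "eggs", 1), ("c5", "milk", 1),
]

def clean(rows):
    """Step 1: drop noisy/inconsistent rows and normalise item names."""
    return [(c, item.strip().lower(), q) for c, item, q in rows if item and q > 0]

def integrate(*sources):
    """Step 2: combine multiple data sources into one collection."""
    return [row for source in sources for row in source]

def select(rows, relevant_items):
    """Step 3: keep only the data relevant to the analysis task."""
    return [row for row in rows if row[1] in relevant_items]

def transform(rows):
    """Step 4: aggregate rows into one basket (set of items) per customer."""
    baskets = {}
    for customer, item, _quantity in rows:
        baskets.setdefault(customer, set()).add(item)
    return baskets

def mine(baskets):
    """Step 5: count how many baskets contain each pair of items."""
    counts = Counter()
    for items in baskets.values():
        counts.update(combinations(sorted(items), 2))
    return counts

def evaluate(pair_counts, n_baskets, min_support):
    """Step 6: keep pairs whose support (share of baskets) meets the threshold."""
    return {pair: n / n_baskets
            for pair, n in pair_counts.items()
            if n / n_baskets >= min_support}

def present(patterns):
    """Step 7: render the retained patterns as readable lines."""
    ranked = sorted(patterns.items(), key=lambda kv: -kv[1])
    return [f"{a} + {b}: support {s:.2f}" for (a, b), s in ranked]

rows = integrate(clean(store_a), clean(store_b))
rows = select(rows, relevant_items={"bread", "milk", "eggs"})
baskets = transform(rows)
patterns = evaluate(mine(baskets), n_baskets=len(baskets), min_support=0.5)
report = present(patterns)
print("\n".join(report))
```

In this sketch, support is just one possible interestingness measure, and the 0.5 threshold is arbitrary. A real system would typically iterate, revisiting earlier steps when the evaluated patterns turn out to be uninformative.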