Tip

Major issues in data mining

Jiawei Han and Micheline Kamber

Every project should be undertaken with all the necessary preparations. Data mining is no different. This tip from Jiawei Han and Micheline Kamber's book Data Mining: Concepts and Techniques (Morgan Kaufmann) provides a list of the major issues involved in data mining.


Mining methodology and user interaction issues: These reflect the kinds of knowledge mined, the ability to mine knowledge at multiple granularities, the use of domain knowledge, ad hoc mining, and knowledge visualization.

  • Mining different kinds of knowledge in databases: Data mining should cover a wide spectrum of data analysis and knowledge discovery tasks, including data characterization, discrimination, association, classification, clustering, trend and deviation analysis, and similarity analysis.
  • Interactive mining of knowledge at multiple levels of abstraction: The data mining process should be interactive. Interactive mining allows users to focus the search for patterns, providing and refining data mining requests based on returned results.
  • Incorporation of background knowledge: Background knowledge may be used to guide the discovery process and allow discovered patterns to be expressed in concise terms and at different levels of abstraction.
  • Data mining query languages and ad hoc mining: Relational query languages (such as SQL) allow users to pose ad hoc queries for data retrieval. Data mining query languages should similarly let users describe ad hoc mining tasks.
  • Presentation and visualization of data mining results: Discovered knowledge should be expressed in high-level languages, visual representations, or other expressive forms so that knowledge can be easily understood and directly usable by humans.
  • Handling noisy or incomplete data: Data stored in a database may contain noise, exceptional cases, or incomplete data objects. When mining data regularities, these objects may confuse the process, causing the knowledge model constructed to overfit the data.
  • Pattern evaluation--the interestingness problem: A data mining system can uncover thousands of patterns. Many of the patterns discovered may be uninteresting to the given user, representing common knowledge or lacking novelty.
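
One common way to approach the interestingness problem is to keep only patterns that pass objective thresholds. The sketch below is an illustration, not the book's method: it scores single-item association rules by support and confidence and discards the rest. The toy transactions, the function names (count, interesting_rules), and the threshold values are all assumptions made for this example.

```python
from itertools import combinations

# Hypothetical toy transactions (assumed for illustration; not from the source).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def interesting_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for rules lhs -> rhs that
    meet both thresholds. Only single-item rules are considered."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        both = count({a, b}, transactions)
        for lhs, rhs in ((a, b), (b, a)):
            lhs_count = count({lhs}, transactions)
            if lhs_count == 0:
                continue  # rule is undefined when lhs never occurs
            support = both / n
            confidence = both / lhs_count
            if support >= min_support and confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


for lhs, rhs, s, c in interesting_rules(transactions):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

Thresholds like these only filter on objective measures. Whether a surviving rule is novel or merely common knowledge to a given user still depends on that user's background knowledge.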

Performance issues: These include efficiency, scalability, and parallelization of data mining algorithms.

  • Efficiency and scalability of data mining algorithms: To effectively extract information from a huge amount of data in databases, data mining algorithms must be efficient and scalable.
  • Parallel, distributed, and incremental mining algorithms: The huge size of many databases, the wide distribution of data, and the computational complexity of some data mining methods are factors motivating the development of algorithms that divide data into partitions that can be processed in parallel.
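
The partition-and-merge idea can be sketched as follows: each partition is scanned independently to produce local counts, and the local counts are then merged. This is a minimal illustration under stated assumptions, not a specific published algorithm. The function names (count_pairs, mine_partitions) and the toy data are hypothetical, and threads stand in for what would more likely be separate processes or machines in practice.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def count_pairs(partition):
    """Count co-occurring item pairs within one partition of transactions."""
    counts = Counter()
    for transaction in partition:
        for pair in combinations(sorted(transaction), 2):
            counts[pair] += 1
    return counts


def mine_partitions(partitions, max_workers=2):
    """Count pairs in each partition concurrently, then merge local counts."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        local_counts = list(pool.map(count_pairs, partitions))
    total = Counter()
    for counts in local_counts:
        total.update(counts)  # Counter.update adds counts together
    return total


# Hypothetical toy data, already split into two partitions.
partitions = [
    [{"bread", "milk"}, {"bread", "butter"}],
    [{"bread", "milk", "butter"}, {"milk"}],
]
totals = mine_partitions(partitions)

# Incremental step: fold in newly arrived transactions without
# rescanning the partitions that were already counted.
totals.update(count_pairs([{"bread", "milk"}]))
print(totals.most_common())
```

Because the merge is a simple sum, the same code covers the incremental case: new data is counted on its own and added to the running totals.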

Issues relating to the diversity of database types:

  • Handling of relational and complex types of data: Specific data mining systems should be constructed for mining specific kinds of data.
  • Mining information from heterogeneous databases and global information systems: Local- and wide-area computer networks (such as the Internet) connect many sources of data, forming huge, distributed, and heterogeneous databases.

The above issues are considered major requirements and challenges for the further evolution of data mining technology. Some of these challenges have been addressed to a certain extent in recent data mining research and development and are now considered requirements; others are still at the research stage.




This was first published in August 2001
