Tip

What defines a data mining query?

Jiawei Han and Micheline Kamber

This tip from Han and Kamber's book Data Mining: Concepts and Techniques (Morgan Kaufmann) examines the components of a data mining query.

Each user will have a data mining task in mind, that is, some form of data analysis that she would like to have performed. A data mining task can be specified in the form of a data mining query, which is input to the data mining system. A data mining query is defined in terms of the following primitives.

  • Task-relevant data: This is the portion of the database to be investigated. For example, suppose that you are a manager of AllElectronics in charge of sales in the United States and Canada. In particular, you would like to study the buying trends of customers in Canada. Rather than mining the entire database, you can specify that only the data relating to customer purchases in Canada need be retrieved, along with the related customer profile information. You can also specify attributes of interest to be considered in the mining process. These are referred to as relevant attributes. For example, if you are interested only in studying possible relationships between, say, the items purchased and customer annual income and age, then the attribute "name" of the relation "item", and the attributes "income" and "age" of the relation "customer", can be specified as the relevant attributes for mining.
  • The kinds of knowledge to be mined: This specifies the data mining functions to be performed, such as characterization, discrimination, association, classification, clustering, or evolution analysis. For instance, if studying the buying habits of customers in Canada, you may choose to mine associations between customer profiles and the items that these customers like to buy.
  • Background knowledge: Users can specify background knowledge, or knowledge about the domain to be mined. This knowledge is useful for guiding the knowledge discovery process and for evaluating the patterns found. There are several kinds of background knowledge. One popular form of background knowledge is known as concept hierarchies. Concept hierarchies are useful in that they allow data to be mined at multiple levels of abstraction. Other examples include user beliefs regarding relationships in the data. These can be used to evaluate the discovered patterns according to their degree of unexpectedness (where unexpected patterns are deemed interesting) or expectedness (where patterns that confirm a user hypothesis are considered interesting).
  • "Interestingness" measures: These functions are used to separate uninteresting patterns from knowledge. They may be used to guide the mining process or, after discovery, to evaluate the discovered patterns. Different kinds of knowledge may have different interestingness measures. For example, interestingness measures for association rules include support (the percentage of task-relevant data tuples for which the rule pattern appears) and confidence (an estimate of the strength of the implication of the rule). Rules whose support and confidence values are below user-specified thresholds are considered uninteresting.
  • Presentation and visualization of discovered patterns: This refers to the form in which discovered patterns are to be displayed. Users can choose from different forms for knowledge presentation, such as rules, tables, charts, graphs, decision trees, and cubes.
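
To make the primitives concrete, here is a minimal Python sketch of how a query might bundle them, and how support and confidence could be computed for a single association rule. It is an illustration under stated assumptions, not the book's query language: the MiningQuery class, the function names, the toy transactions, the concept hierarchy, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MiningQuery:
    """Hypothetical container for the five primitives (not from the book)."""
    relevant_rows: list                 # task-relevant data, already filtered
    knowledge_kind: str                 # kind of knowledge, e.g. "association"
    concept_hierarchy: dict = field(default_factory=dict)  # background knowledge
    min_support: float = 0.0            # interestingness thresholds
    min_confidence: float = 0.0
    presentation: str = "rules"         # form in which patterns are displayed


def generalize(transaction, hierarchy):
    """Replace each item by its parent concept, if the hierarchy has one."""
    return {hierarchy.get(item, item) for item in transaction}


def support_and_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent => consequent.

    support    = fraction of transactions containing both sides of the rule
    confidence = fraction of antecedent-containing transactions that also
                 contain the consequent
    """
    antecedent, consequent = set(antecedent), set(consequent)
    both = antecedent | consequent
    n_total = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


def is_interesting(query, antecedent, consequent):
    """A rule is kept only if it meets both user-specified thresholds."""
    rows = [generalize(t, query.concept_hierarchy) for t in query.relevant_rows]
    s, c = support_and_confidence(rows, antecedent, consequent)
    return s >= query.min_support and c >= query.min_confidence


# Made-up purchases standing in for the task-relevant data.
query = MiningQuery(
    relevant_rows=[
        {"laptop", "software"},
        {"desktop", "software", "printer"},
        {"laptop"},
        {"printer"},
    ],
    knowledge_kind="association",
    concept_hierarchy={"laptop": "computer", "desktop": "computer"},
    min_support=0.4,
    min_confidence=0.6,
)

print(is_interesting(query, {"computer"}, {"software"}))
print(is_interesting(query, {"printer"}, {"software"}))
```

In this sketch the concept hierarchy rolls "laptop" and "desktop" up to "computer" before the rule is evaluated, which is one simple way of mining at a higher level of abstraction. With the assumed thresholds, the rule computer => software passes both tests, while printer => software falls below them and would be treated as uninteresting.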

 

This was first published in June 2001
