Domain Driven Data Mining

Towards Domain-Driven, Actionable Knowledge Discovery and Delivery

Glossary

Below, we list some of the terms selected from the book: Domain Driven Data Mining.

 

Actionability measures the ability of a pattern to prompt a user to take concrete

actions to his/her advantage in the real world. The pattern satisfies both technical

and business performance needs from both objective and subjective perspectives.

It particularly measures the ability to suggest business decision-making actions.

 

Actionable knowledge discovery is an iterative optimization process toward the

actionable pattern set, considering the surrounding business environment and problem

states. It is a closed-loop, iterative refinement process in which multiple rounds of feedback,

iterations and refinement are involved in the understanding of data, resources, the

roles and utilization of relevant intelligence, the presentation of patterns, the delivery

specification, and knowledge validation.

 

Actionable knowledge delivery aims to deliver knowledge that is of solid foundation,

business-friendly, and can be taken over by business people for decision making

seamlessly. During the process and iterations of actionable knowledge

discovery, understanding and deliverables are progressively improved and enhanced

toward the final deliverables satisfying user and business needs and supporting direct

decision-making actions. Its main objective is to enhance the actionability

of identified patterns for smart problem-solving.

 

Actionable pattern satisfies both technical and business interestingness needs, is

business-friendly and understandable, reflects user preferences and business needs,

and can be seamlessly taken over by business people for taking decision-making actions.

Actionable patterns can support business problem-solving by taking actions

recommended by the pattern, and correspondingly transform the problem status

from an initially non-optimal status to a greatly improved one.

 

Agent-driven data mining (ADDM) refers to the contributions made by multiagents

for enhancing data mining tasks. ADDM can contribute to the problem solving

of many data mining issues, e.g., agent-based data mining infrastructure and architecture,

agent-based interactive mining, agent-based user interaction, automated

pattern mining, agent-based distributed data mining, multi-agent dynamic mining,

multi-agent mobility mining, agent-based multiple data source mining, agent-based

peer-to-peer data mining, and multi-agent web mining.

 

Agent mining, namely agents and data mining interaction and integration, is a new

research area that draws on the respective strengths of both agents and data mining to

handle either critical challenges in an individual party or mutual issues. Agent mining

studies the methodologies, principles, techniques and applications of the integration

and interaction between agents and data mining, as well as the community that

focuses on the study of agent mining. The interaction and integration between agents

and data mining are comprehensive, multiple dimensional, and inter-disciplinary.

 

Business interestingness Business interestingness of a pattern is determined from

domain-oriented personal, social, economic, user preference and/or psychoanalytic

aspects. It consists of both subjective and objective aspects.

 

Closed-loop mining The discovery of patterns proceeds through a process with closed-loop

feedback and iterations. Actionable knowledge discovery in a constraint-based

context is more likely to be a closed-loop process than an open one. A closed-loop

process indicates that the outputs of data mining are fed back to change relevant

parameters or factors in particular stages. The feedback and change effect may be

embodied through analyzing and adjusting the relationships between outputs and

particular parameters and factors, and eventually tuning the parameters and factors

accordingly.
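
The feedback idea can be illustrated with a minimal Python sketch. It assumes a toy miner of frequent single items and feeds the size of the mining output back into one parameter, the minimum support, until the output falls inside a target range. The function names, the target range and the step size are hypothetical choices made for illustration only; they are not taken from the book.

```python
from collections import Counter


def mine_frequent_items(transactions, min_support):
    """Return the items whose relative frequency is at least min_support."""
    counts = Counter(item for t in transactions for item in set(t))
    n = len(transactions)
    return {item for item, c in counts.items() if c / n >= min_support}


def closed_loop_mining(transactions, min_support=0.9, min_patterns=1,
                       max_patterns=2, step=0.1, max_rounds=20):
    """Tune min_support by feeding the mining output back into the parameter."""
    patterns = set()
    for _ in range(max_rounds):
        patterns = mine_frequent_items(transactions, min_support)
        if len(patterns) > max_patterns:
            # Too many patterns: tighten the parameter.
            min_support = min(1.0, min_support + step)
        elif len(patterns) < min_patterns:
            # Too few patterns: relax the parameter.
            min_support = max(0.0, min_support - step)
        else:
            # Output is inside the target range: stop the loop.
            break
    return patterns, min_support
```

An open process would call `mine_frequent_items` once with a fixed threshold; the loop above differs only in that each output changes the parameter used in the next round.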

 

Cluster pattern consists of more than two patterns that are correlated with each other in terms of

a pattern merging method G and merged into a cluster. Atomic patterns are combined in terms of

certain relationships from the structural (for instance, Peer-to-Peer relation, Master-Slave

relation) or timeframe (for example, Independent relation, Concurrent relation,

Sequential relation, or Hybrid relation) perspectives.

 

Constraint refers to conditions applied on or involved in the process of actionable

knowledge discovery and delivery, including domain constraints, data constraints,

interestingness constraints, and deliverable constraints.

 

Contrast pattern results from the mining process in which one considers the mining

of patterns/models that contrast two or more datasets, classes, conditions, time

periods, and so forth. It captures the situations or contexts (the conditional contrast

bases) where small changes in patterns to the base make big differences in matching

datasets.

 

Coupled sequence refers to multiple sequences of itemsets, which are coupled

with each other in terms of certain relationships. Examples are the trade sequence,

buy sequence and sell sequence in stock markets, which are coupled in terms

of trading mechanisms, trading rules, investment intentions, etc.

 

Combined association rule consists of association rules identified from multiple

datasets, which are combined into one combined pattern in terms of a certain relationship.

 

Combined association cluster is a set of combined association rules based on a

combined rule pair, where the rules in the cluster share the same underlying pattern

but have different additional pattern increments on the left side.

 

Combined association pair consists of a pair of association rules.

 

Combined mining is a two-step to multi-step data mining and post-analysis procedure,

consisting of mining atomic patterns, merging atomic pattern sets into a combined

pattern set, or merging dataset-specific combined patterns into a higher-level

combined pattern set. It directly analyzes complex data from multiple sources or

with heterogeneous features, such as those covering demographics, behavior and business

impacts. The aim of combined mining is to identify more informative knowledge

that can provide an informative and comprehensive presentation of problem solutions.

The deliverables of combined mining are combined patterns.

 

Combined pattern consists of multiple components, a pair or cluster of atomic

patterns, identified in individual sources or based on individual methods. As a result

of combined mining, the delivery of combined patterns presents an in-depth and

more comprehensive indication for taking decision-making actions, which makes the

patterns more informative and actionable than patterns composed of a single aspect

only, or identified by a single method.

 

Data constraint refers to constraints on particular data, which may be embodied in terms of aspects

such as very large volume, ill-structure, multimedia, diversity, high dimensionality,

high frequency and density, distribution and privacy, dynamics and changes.

 

Data intelligence reveals interesting stories and/or indicators hidden in data about

a business problem. The intelligence of data emerges in the form of interesting

patterns and actionable knowledge. It consists of multiple levels of data intelligence,

namely explicit intelligence, implicit intelligence, syntactic intelligence, and semantic

intelligence.

 

Decremental cluster pattern also called decremental pattern cluster, is a special

cluster of combined patterns, within which a former atomic pattern has an additional

pattern increment compared to its next adjacent constituent pattern.

 

Decremental pair pattern also called decremental pattern pair, is a pair of combined

patterns which are paired in terms of a certain relationship, within which the

first atomic pattern has a pattern increment part compared to the second constituent.

 

Deliverable constraint refers to conditions on deliverables such as business rules,

processes, information flow, and presentation, which may need to be integrated into the

domain environment. For instance, learned patterns can be converted into operationalizable

business rules for business people's use.

 

Derivative pattern is a pattern derived from an underlying pattern by

appending additional pattern components to the base pattern. When it is applied

to the impact-oriented pattern, the extension leads to the difference between the

outcomes of the constituent patterns. The derivative relationship can be unordered

or ordered.

 

Discriminative pattern or discriminating pattern, refers to a pattern that draws

distinctions from other candidates, usually on the basis of differences in class,

category, significance, or impact. Its opposite form is often called

indiscriminative pattern.

 

Domain constraint includes the domain and characteristics of a problem, domain

terminology, specific business processes, policies and regulations, and particular user profiles

and preferred deliverables.

 

Domain driven data mining also called Domain Driven Actionable Knowledge

Delivery, building on top of the traditional data-centered pattern mining framework, refers

to the set of methodologies, frameworks, approaches, techniques, tools and systems

that involve human, domain, organizational and social, and network and web factors

in the environment, for the discovery and delivery of actionable knowledge.

 

Domain factor consists of the involvement of domain knowledge and experts, the

consideration of constraints, and the development of in-depth patterns, which are

essential for filtering subtle concerns while capturing incisive issues.

 

Domain intelligence refers to the intelligence that emerges from the involvement

of domain factors and resources in pattern mining, which wrap not only a problem

but its target data and environment. Domain intelligence is embodied through

its involvement in the KDD process, modeling and systems. It consists of qualitative

and quantitative domain intelligence.

 

Dynamic chart is a pattern presentation method, which presents the dynamics of

sequential patterns, activity interaction, and impact change, and the formation of

associated pairs and clusters in terms of pattern interestingness.

 

Emerging patterns are sets of items whose frequency changes significantly from

one dataset to another. They describe significant changes (differences or trends) between

two classes of data.
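
A minimal Python sketch of how such a frequency change might be checked is given below. It assumes that frequency is measured as support (the fraction of transactions containing the itemset) and that the change is measured by a growth rate, the ratio of the two supports, which is a measure commonly used for emerging patterns. The threshold value and all function names are illustrative assumptions.

```python
def support(itemset, dataset):
    """Fraction of transactions in dataset that contain every item of itemset."""
    itemset = set(itemset)
    if not dataset:
        return 0.0
    return sum(itemset <= set(t) for t in dataset) / len(dataset)


def growth_rate(itemset, d1, d2):
    """Ratio of the support in d2 to the support in d1."""
    s1, s2 = support(itemset, d1), support(itemset, d2)
    if s1 == 0:
        # Absent in d1: infinite growth if present in d2, otherwise no change.
        return float("inf") if s2 > 0 else 0.0
    return s2 / s1


def is_emerging(itemset, d1, d2, min_growth=2.0):
    """True if the support grows from d1 to d2 by at least min_growth (assumed threshold)."""
    return growth_rate(itemset, d1, d2) >= min_growth
```

For example, an itemset with support 0.25 in the first dataset and 0.75 in the second has a growth rate of 3 and would be reported as emerging under the assumed threshold of 2.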

 

General pattern refers to the pattern mined based on technical significance associated

with the algorithm used.

 

Human intelligence refers to (1) explicit or direct involvement of human knowledge

or a human as a problem-solving constituent, etc., and (2) implicit or indirect

involvement of human knowledge or a human as a system component.

 

Impact-oriented pattern An impact-oriented pattern consists of two components,

namely the left-hand itemsets and the right-hand target impact associated with the

left-hand itemsets. It means that the occurrence of the left-hand itemsets is likely to result

in the impact defined on the right-hand side.
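
The two components can be represented with a small data structure, sketched below in Python. The class name, the field names and the confidence-style estimate of how likely the impact is are assumptions made for illustration; the glossary itself does not prescribe a particular measure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImpactOrientedPattern:
    items: frozenset  # left-hand itemset
    impact: str       # right-hand target impact


def confidence(pattern, records):
    """Share of records containing the left-hand items that also show the impact.

    records is a list of (itemset, impact_label) pairs.
    """
    matching = [label for items, label in records if pattern.items <= set(items)]
    if not matching:
        return 0.0
    return sum(label == pattern.impact for label in matching) / len(matching)
```

A high value of this estimate on historical records is one way to read the statement that the left-hand itemsets are likely to lead to the right-hand impact.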

 

Impact-reversed pattern An impact-reversed pattern consists of an underlying activity

pattern and a derivative pattern with an incremental component. In the reversal from one

pattern's impact (T1) to the other's (T2), the extra itemset plays an important

role.

 

Incremental cluster pattern also called incremental pattern cluster, is a cluster of

combined patterns coupled in terms of certain relationships, within which additional

pattern increments are appended to every previously adjacent constituent pattern.

 

Incremental pair pattern also called incremental pattern pair, is a pair of combined

patterns which are paired in terms of a certain relationship, within which the

second atomic pattern has an additional pattern increment part compared to the first

constituent. An example is a contrast pattern consisting of an underlying pattern and

a derivative pattern.

 

In-depth pattern also called deep pattern, uncovers not only appearance dynamics

and rules but also inside driving forces, reflects not only technical concerns but also

business expectations, and discloses not only generic knowledge but also something

that can support straightforward decision-making actions. It is a pattern actionable

in the business world. An in-depth pattern is either filtered and summarized in terms of

business expectations on top of general pattern(s), or itself discloses deep data intelligence.

In-depth pattern mining discovers more interesting and actionable patterns

from a domain-specific perspective.

 

Interestingness measures the significance of a pattern learned on a dataset through

a certain method. The pattern interestingness is specified in terms of technical interestingness

and business interestingness, from both objective and subjective perspectives.

 

Interestingness constraint determines what makes one rule, pattern or finding

more interesting than another.

 

Intelligence meta-synthesis involves, synthesizes and uses ubiquitous intelligence

surrounding actionable knowledge discovery and delivery in complex data and environment.

 

Knowledge actionability Given a pattern P, its actionable capability is described

as the degree to which it can satisfy both technical interestingness and business

interestingness. If both technical and business interestingness, or a hybrid interestingness

measure integrating both aspects, are satisfied, P is called an actionable pattern.
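
The two conditions in this definition can be sketched in Python as follows. The scores, the thresholds and the weighted sum used as a hybrid measure are assumptions chosen for illustration; the definition only requires that some hybrid measure integrate both aspects, without fixing its form.

```python
def is_actionable(tech_score, biz_score, tech_min, biz_min):
    """True if both technical and business interestingness meet their thresholds."""
    return tech_score >= tech_min and biz_score >= biz_min


def is_actionable_hybrid(tech_score, biz_score, hybrid_min, weight=0.5):
    """True if a weighted sum of both scores (one assumed hybrid measure) meets hybrid_min."""
    hybrid = weight * tech_score + (1 - weight) * biz_score
    return hybrid >= hybrid_min
```

Note that the two checks can disagree: a pattern that is technically strong but of little business interest fails the first check, yet may pass a hybrid check if the weight favours the technical score.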

 

Market microstructure data refers to the data acquired in capital markets, which

is produced in terms of the theory of market microstructure and trading rules. Market

microstructure data presents special data complexities, such as high frequency, high

density, massive quantity, data stream, time series, multiple coupled sequences, etc.

 

Market microstructure pattern refers to the pattern learned on market microstructure

data.

 

Multi-feature combined mining is a kind of combined mining which learns patterns

by involving multiple feature sets, usually heterogeneous. For instance, a combined pattern

may consist of demographic features, business policy-related features,

and customer behavioral data.

 

Multi-method combined mining is a kind of combined mining which learns by involving

multiple data mining methods. It consists of serial multi-method combined

mining, parallel multi-method combined mining, and closed-loop multi-method

combined mining.

 

Multi-source combined mining is a kind of combined mining which learns patterns

by involving multiple data sets, usually distributed and heterogeneous.

 

Network intelligence refers to the intelligence that emerges from both web and

broad-based network information, facilities, services and processing surrounding a

data mining problem and system. It involves both web intelligence and broad-based

network intelligence.

 

Objective business interestingness measures to what extent the findings satisfy

business needs and user preferences based on the objective criteria.

 

Objective technical interestingness is embodied by measures capturing the complexities

of a pattern and its statistical significance. It could be a set of criteria.

 

Organizational factor refers to many aspects existing in an organization, such as

organizational goals, actors, roles, structures, behavior, evolution, dynamics, interaction,

process, organizational/business regulation and convention, workflow and

actors surrounding a real-world data mining problem.

 

Organizational intelligence refers to the intelligence that emerges from involving

organization-oriented factors and resources into pattern mining. The organizational

intelligence is embodied through its involvement in the KDD process, modeling and

systems.

 

Pair pattern consists of two atomic patterns that are correlated with each other in

terms of a pattern merging method into a pair.

 

Pattern summarization is a process of data mining, which summarizes learned

patterns into higher-level patterns.

 

Pattern merging is a process of data mining, which merges multiple relevant patterns

into one or a set of combined patterns. For instance, local patterns from corresponding

data miners are merged into global pattern sets, atomic pattern

sets are merged into a combined pattern set, or dataset-specific combined patterns are merged into

a higher-level combined pattern set.

 

Pattern increment refers to the additional component on top of an underlying pattern

(a prefix or postfix) to form a derivative pattern, or an incremental pattern. For

instance, with an underlying pattern U, different pattern increments V1, ..., Vn

may be added to U to form different derivative patterns (U, V1), (U, V2), ..., (U, Vn).
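
As a minimal sketch, the construction of derivative patterns from an underlying pattern U and increments V1, ..., Vn can be written in Python as below. Representing patterns as tuples of items and the `underlying_as_prefix` flag are assumptions made for illustration.

```python
def derive_patterns(underlying, increments, underlying_as_prefix=True):
    """Combine an underlying pattern with each increment to form derivative patterns.

    With underlying_as_prefix=True the increment is appended after the
    underlying pattern; otherwise it is placed before it.
    """
    base = tuple(underlying)
    derived = []
    for inc in increments:
        inc = tuple(inc)
        derived.append(base + inc if underlying_as_prefix else inc + base)
    return derived
```

Calling `derive_patterns(("u1", "u2"), [("v1",), ("v2",)])` yields one derivative pattern per increment, each sharing the same underlying part.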

 

Pattern interaction refers to the process and protocol by which patterns interact

with each other to form new patterns. Cluster patterns and pair patterns may

result from pattern interaction. Many pattern interaction mechanisms can be created,

for instance, pattern clustering and the classification of patterns.

 

Pattern impact refers to the business impact associated with a pattern or a set of

patterns. For instance, a frequent sequence is likely associated with the occurrence

of government debt; here, government debt is the impact.

 

Post analysis refers to techniques that are used to post-process learned patterns,

for instance, to prune rules, reduce redundancy, summarize learned rules, merge

patterns, match expected patterns by similarity difference, and extract actions

from learned rules.
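
One of the listed post-analysis steps, reducing redundancy, is sketched below in Python. The redundancy criterion is an assumption chosen for illustration: a rule is dropped when a more general rule (a strictly smaller antecedent with the same consequent) has a confidence at least as high. Other criteria are equally possible.

```python
def prune_redundant(rules):
    """Remove rules made redundant by a more general rule.

    rules is a list of (antecedent, consequent, confidence) triples,
    where antecedent is a frozenset of items.
    """
    kept = []
    for ante, cons, conf in rules:
        redundant = any(
            o_cons == cons and o_ante < ante and o_conf >= conf
            for o_ante, o_cons, o_conf in rules
        )
        if not redundant:
            kept.append((ante, cons, conf))
    return kept
```

No new round of pattern mining is run here; the function only filters rules that have already been learned, which is what separates post analysis from post mining.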

 

Post mining refers to the pattern mining on learned patterns, or on learned patterns

combined with additional data. The main difference between post analysis and post

mining is whether another round of pattern mining process is conducted on the

learned pattern set or not.

 

Reverse pattern also called impact-reversed pattern, is a pattern corresponding

to another pattern, which triggers the impact change from one impact to another, usually

the opposite one.

 

Social factor refers to aspects related to human social intelligence such as social

cognition, emotional intelligence, consensus construction, and group decision;

animat/agent-based social intelligence aspects such as swarm/collective intelligence

aspects, behavior/group dynamics aspects, as well as many common aspects such as

collective interaction, social behavior network, social interaction rules, protocols,

norms, trust and reputation, and privacy, risk, and security in a social context, etc.

 

Social intelligence refers to the intelligence that emerges from the group interactions,

behaviors and corresponding regulation surrounding a data mining problem.

Social intelligence covers both human social intelligence and animat/agent-based

social intelligence.

 

Subjective business interestingness measures business and user concerns from

the subjective perspectives such as psychoanalytic factors.

 

Subjective technical interestingness focuses on and is based on technical means, and

recognizes to what extent a pattern is of interest to a particular technical method.

 

Technical interestingness The technical interestingness of a pattern is highly dependent

on certain technical measures specified for a data mining method. Technical

interestingness is further measured in terms of objective technical measures and

subjective technical measures.

 

Ubiquitous intelligence refers to the emergence of intelligence from many related

aspects surrounding a data mining task, such as in-depth data intelligence, domain

intelligence, human intelligence, network and web intelligence, and/or organizational

and social intelligence. Real-world data mining applications often involve

multiple aspects of intelligence, so a key task for actionable knowledge discovery and

delivery is to synthesize such ubiquitous intelligence. For this, methodologies, techniques

and tools for intelligence meta-synthesis in domain driven data mining are

necessary. The theory of M-computing, M-interaction and M-space provides a solution

for this purpose.

 

Underlying pattern also called base pattern, is the base of a combined pattern, on top

of which new patterns are generated. An underlying pattern may be taken as

the prefix or postfix of a derivative pattern.