Hội thảo Khoa học Quốc tế
...
2. BACKGROUND
2.1. Association Rule Mining Problem: Over the last decade, researchers have found that
association rule mining (ARM) is one of the core processes of data mining. ARM discovers all
relationships among frequent patterns and requires no supervision. It operates on variable-length
data and produces comprehensible results. Modern organizations often have a geographically
distributed structure. Typically, each location stores its ever-increasing volume of daily data
locally. With data organized this way, centralized data mining cannot feasibly discover useful
patterns because it incurs a large network communication cost. Distributed data mining overcomes
this problem.
2.2. Apriori Algorithm: Apriori is an ARM algorithm developed by IBM’s Quest project team for
mining association rules in large transaction databases. They split the ARM problem into two parts.
1. Find all itemsets in the data set whose transaction support is greater than the minimum
support. These are called frequent itemsets.
2. Generate the desired rules from these frequent itemsets. Consider this example: if LMNO and
LM are frequent itemsets, then we can determine whether the rule LM → NO holds by computing
the ratio R = support(LMNO)/support(LM). The rule holds only if R ≥ minimum confidence. Note
that the rule will have minimum support because LMNO is frequent. The method is highly
adaptable. The Apriori method is given below.
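The two parts described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names `apriori` and `rules`, the representation of transactions as sets of single-letter items, and the inclusive treatment of the minimum support threshold (`>=`) are assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Part 1: return {itemset: support} for every frequent itemset.

    Support is the fraction of transactions containing the itemset.
    Assumption: the threshold is treated as inclusive (>=).
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
    return frequent


def rules(frequent, min_confidence):
    """Part 2: return (antecedent, consequent, R) for rules with R >= min_confidence."""
    result = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                # R = support(itemset) / support(antecedent), as in the LM -> NO example.
                ratio = sup / frequent[a]
                if ratio >= min_confidence:
                    result.append((a, itemset - a, ratio))
    return result
```

For example, with the five hypothetical transactions LMNO, LMNO, LM, LMN and NO, support(LMNO) is 2/5 and support(LM) is 4/5, so the rule LM → NO has R = 0.5 and is reported for any minimum confidence up to 0.5.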
2.3. Distributed Association Rule Mining: Distributed association rule mining (DARM) finds rules
from different spatial data sets located in a distributed environment. In contrast to a parallel
system, a distributed network does not offer fast communication, so distributed mining usually
aims to minimize communication cost. Researchers proposed the Fast Distributed Mining (FDM)
algorithm to mine rules from distributed data sets partitioned among three different sites. At each
site, FDM finds the local support counts and prunes all locally infrequent itemsets. After
completing local pruning, each site broadcasts messages to every other site to request their
support counts. It then decides whether the large itemsets are globally frequent and generates the
candidate itemsets from those globally frequent itemsets.
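One round of the local-pruning and count-exchange steps described above can be simulated in a single process as follows. This is only a sketch under stated assumptions: the names `local_counts` and `fdm_round` are hypothetical, each site is modeled as a list of transactions, and the broadcast is replaced by direct access to the other sites' counts. It relies on the fact that an itemset that is globally frequent must be locally frequent at one site or more, so only itemsets that survive local pruning somewhere need to be polled.

```python
def local_counts(transactions, candidates):
    """Count, at one site, how many local transactions contain each candidate."""
    return {c: sum(c <= t for t in transactions) for c in candidates}


def fdm_round(sites, candidates, min_support):
    """Return the candidates that are globally frequent after one round.

    sites: list of local databases, each a list of frozensets (hypothetical layout).
    candidates: iterable of frozensets to test in this round.
    min_support: minimum support as a fraction of transactions.
    """
    candidates = list(candidates)
    # Step 1: every site computes its local support counts.
    counts = [local_counts(db, candidates) for db in sites]
    # Step 2: local pruning, keeping only locally frequent candidates per site.
    locally_frequent = [
        {c for c, n in cnt.items() if n >= min_support * len(db)}
        for cnt, db in zip(counts, sites)
    ]
    # Step 3: only candidates that are locally frequent somewhere are polled.
    to_poll = set().union(*locally_frequent)
    # Step 4 (simulated broadcast): sum the support counts from all sites.
    total = sum(len(db) for db in sites)
    return {
        c for c in to_poll
        if sum(cnt[c] for cnt in counts) >= min_support * total
    }
```

The globally frequent itemsets returned by one round would then be used to generate the candidates for the next round, as the text describes.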
3. ANALYSIS OF DATA IN DISTRIBUTED ENVIRONMENT
Data mining is a technique for retrieving useful information from large databases. It has two
main goals: prediction and description. Several data mining techniques are available for this
purpose, such as ARM, clustering, and classification. Here we apply the SNARM concept to
geographical regions, that is, spatial association rule mining, in which data are retrieved from
geographical areas. Spatial association rule mining is used to find relationships between different
attributes subject to threshold values of support and confidence, and to compute the frequent
itemsets in the distributed environment. In this process, we divide the entire region into three
different regions; in general, n regions are selected, each with its own spatial database SDB1,
SDB2, ..., SDBn and its own key value SK1, SK2, ..., SKn. Every region computes its frequent
itemsets and their support values. The regions are arranged in a ring architecture and then compute
the partial support. Region 1 sends its partial support (PS) value to region 2, region 2 sends its
value to region 3, and so on until region n, which sends its value back to region 1. Region 1
subtracts all the random number values from the partial support value to obtain the actual support,
and then broadcasts the actual support value to all regions in the distributed environment.
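The ring exchange described above resembles a masked (secure) sum and can be sketched as follows. The text does not say how the random numbers are generated or how region 1 learns them, so this sketch assumes that each region adds one random mask to the value it forwards and that region 1 can obtain the sum of the masks. The function name `ring_secure_sum`, the `modulus` parameter, and the use of modular arithmetic are also assumptions made for illustration.

```python
import random


def ring_secure_sum(partial_supports, modulus=10**9, rng=None):
    """Simulate one pass around the ring and return the actual support count.

    partial_supports: local support counts of regions 1..n, in ring order.
    modulus: assumed bound larger than any possible total support.
    rng: optional random.Random instance, useful for reproducible runs.
    """
    rng = rng or random.Random()
    # Assumption: every region draws one random mask for this itemset.
    masks = [rng.randrange(modulus) for _ in partial_supports]

    running = 0
    # Region 1 -> region 2 -> ... -> region n -> back to region 1.
    # Each region adds its partial support plus its mask before forwarding,
    # so no region sees another region's unmasked count.
    for ps, mask in zip(partial_supports, masks):
        running = (running + ps + mask) % modulus

    # Region 1 removes all the random values to recover the actual support.
    # Assumption: region 1 knows the sum of the masks.
    return (running - sum(masks)) % modulus
```

For instance, with hypothetical partial supports 12, 7 and 20 from three regions, region 1 recovers an actual support of 39 regardless of the masks drawn, and would then broadcast that value to the other regions.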