[84720] Article: Hierarchical Clustering in Scalable Distributed Two-Layer Datastore for Big Data as a Service
Journal: International Conference on Enterprise Systems (ES)
Volume: 6, Pages: 138-145
ISSN: 2572-6609
ISBN: 978-1-5386-8388-0
Publisher: IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA
Published: 2018
Publication series: International Conference on Enterprise Systems (ES)
MNiSW group: International conference proceedings (registered in Web of Science)
MNiSW points: 15
Web of Science classification: Proceedings Paper
In this paper we propose a highly scalable approach to data clustering that can be applied in cloud-based big data services. We present a hierarchical approach to automatic data clustering in a Scalable Distributed Two-Layer Datastore (SD2DS) system, extending the LH* scheme so that data items can be addressed based on their content. We show that the total clustering error decreases as the bucket structure grows. Moreover, our method allows new data items to be added incrementally and enables parallel data processing. We carried out simulations for 3 different cluster shapes and 5 different noise ratios to verify the correctness of our solution. Additionally, we compare our solution with common clustering methods such as K-means, agglomerative clustering, and BIRCH.
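For readers unfamiliar with the addressing that LH* builds on, the sketch below shows the classic linear-hashing rule for mapping a key to a bucket. It is only an illustration of that well-known base scheme, not the content-based extension proposed in the paper, which the abstract does not detail. The function names `lh_address` and `bucket_count` and the parameter names are hypothetical, chosen for this example.

```python
def bucket_count(level: int, split_pointer: int, initial_buckets: int = 1) -> int:
    """Number of buckets in a linear-hashing file at the given state."""
    return initial_buckets * 2 ** level + split_pointer


def lh_address(key: int, level: int, split_pointer: int,
               initial_buckets: int = 1) -> int:
    """Classic linear-hashing address of an integer key.

    Assumption: keys are non-negative integers and the hash family is
    h_i(key) = key mod (initial_buckets * 2**i).
    """
    if key < 0:
        raise ValueError("key must be non-negative")
    # First try the hash function of the current level.
    address = key % (initial_buckets * 2 ** level)
    # Buckets below the split pointer have already been split in this
    # round, so their keys are spread using the next-level function.
    if address < split_pointer:
        address = key % (initial_buckets * 2 ** (level + 1))
    return address


if __name__ == "__main__":
    # File state: level 1, one bucket already split, so 3 buckets exist.
    for key in (4, 5, 6, 7):
        print(key, "->", lh_address(key, level=1, split_pointer=1))
```

Because the file grows one bucket at a time as the split pointer advances, new items can be added incrementally without rehashing the whole structure, which is the property the abstract relies on for scalability.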