Structural-Semantic Clustering for Architectural Models

By: Thi Dinh Tran, Maria Teresa Rossi, Davide Soldati, Mauro Sonzogni, Amleto Di Salle, Ludovico Iovino, Leonardo Mariani

Abstract

Architectural models are rich representations of system designs that can be reused and adapted to design new systems. The many publicly available architectural models (e.g., on GitHub) offer the opportunity to reuse domain knowledge to bootstrap the design of new systems, avoiding common mistakes and promoting the reuse of good principles and practices. Discovering related architectures to distill domain knowledge, however, is challenging, and this body of public knowledge often remains underexploited. To address this need, this paper investigates the application of structural-semantic clustering strategies to automatically cluster sets of architectural models. We consider multiple structural clustering strategies, which represent models as graphs, and multiple semantic clustering strategies, which use embeddings to compare the names and contents of models. To assess these strategies, we built a curated dataset of 1,202 manually clustered models collected from GitHub, which is, to the best of our knowledge, the largest ground truth of clustered architectural models. The empirical study analyses the effectiveness of these clustering strategies, since they are the instrument that can enable the discovery of the implicit architectural knowledge available in models.

Keywords

Software Architecture, Architectural Models, Clustering, Structural Similarity, Semantic Similarity

Cite as:

Thi Dinh Tran, Maria Teresa Rossi, Davide Soldati, Mauro Sonzogni, Amleto Di Salle, Ludovico Iovino, Leonardo Mariani, “Structural-Semantic Clustering for Architectural Models”, Journal of Object Technology, Volume 25, no. 3 (2026), pp. 3:309-322, doi:10.5381/jot.2026.25.3.a24.
