Article: A comprehensive approach to preprocessing data for bibliometric analysis
Journal: Scientometrics
ISSN: 1588-2861
Published: September 2025
MNiSW group: Publication in journals included in the MNiSW ministerial list (Part A)
MNiSW points: 100
Keywords: Bibliometrics, Bibliographic data, Data cleaning, Data join, Disambiguation, Thesaurus
Bibliometric analysis, also known as bibliometrics, has been conducted for several decades
to evaluate scientific research based on data available on bibliographic platforms,
such as the popular Web of Science or Scopus. Research papers that include bibliometric
analysis typically ignore the problem of bibliographic data preprocessing, in particular its
key aspect, data cleaning. Discussion of bibliographic data preprocessing in the literature
is sparse and scattered; studies usually address only selected components of the
overall process. This review article aims to fill that gap by analysing the problem in depth:
it presents issues arising from the structure of bibliographic data and discusses combining
data from various sources, creating thesauri and conducting bibliometric analyses,
drawing partly on the author's own experience. A brief description of the most popular software
dedicated to bibliometrics, such as BibExcel, Bibliometrix, CiteSpace, CitNetExplorer,
SciMAT, Sci2 Tool, and VOSviewer, is also provided, highlighting the operations available
in these applications for the preliminary processing of bibliographic data. The work allows
us to draw the following conclusions. Preprocessing is more difficult and demanding than some
authors suggest, and than others imply when they claim, without giving details, that it has
already been accomplished. Data cleaning operations are carried out at various stages of preprocessing,
sometimes repeatedly, and the order in which they are performed may be significant, as it can
determine the success or failure of the process, in particular when data from different
sources are combined. No software allows the entire preprocessing procedure for bibliographic
data to be executed automatically, and manual work is unavoidable at various stages of the
process. The contribution of this work to the field of bibliometric analysis is a methodological
synthesis that considers the issue holistically, enabling a more comprehensive understanding of it.
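The abstract stresses that the order of cleaning operations may decide whether combining data from different sources succeeds. The minimal Python sketch below illustrates one such case under stated assumptions: the records, their field names and the `normalize_doi` / `merge_on_doi` helpers are hypothetical and do not reflect the actual export formats of Web of Science or Scopus. Deduplicating on raw DOI strings before normalizing them leaves a duplicate behind; normalizing first merges it.

```python
"""Sketch: why the order of cleaning steps matters when joining two sources.

All records and helper names are hypothetical; real Web of Science / Scopus
exports use their own field names and formats, which are not modelled here.
"""

import re

# Hypothetical records exported from two bibliographic platforms.
source_a = [
    {"title": "Mapping science", "doi": "10.1000/ABC.123"},
    {"title": "Citation networks", "doi": "10.1000/xyz.9"},
]
source_b = [
    {"title": "Mapping Science", "doi": "https://doi.org/10.1000/abc.123"},
    {"title": "Topic drift", "doi": "doi:10.1000/q.77"},
]

# Resolver URLs ("https://doi.org/", "http://dx.doi.org/") or a "doi:" label.
_PREFIX = re.compile(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi(doi: str) -> str:
    """Strip resolver/label prefixes and lowercase (DOIs are case-insensitive)."""
    return _PREFIX.sub("", doi.strip()).lower()


def merge_on_doi(records, normalize_first: bool):
    """Deduplicate records by DOI, optionally normalizing the key first."""
    merged = {}
    for rec in records:
        key = normalize_doi(rec["doi"]) if normalize_first else rec["doi"]
        merged.setdefault(key, rec)  # keep the first record seen for each key
    return list(merged.values())


combined = source_a + source_b
print(len(merge_on_doi(combined, normalize_first=False)))  # 4: duplicate survives
print(len(merge_on_doi(combined, normalize_first=True)))   # 3: duplicate merged
```

Matching on DOIs alone is a simplification: records without a DOI would need other matching criteria, which this sketch does not attempt.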
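Thesauri, which the article lists among the preprocessing topics, are typically used to map variant forms of keywords or names to a single preferred form. The Python sketch below is a hypothetical illustration: `THESAURUS`, `normalize_keyword` and `apply_thesaurus` are invented names, and the dictionary does not reproduce the thesaurus file format of any particular tool (VOSviewer, for instance, accepts its own thesaurus files). It also shows a small ordering dependency: keywords are normalized before the thesaurus lookup, otherwise a variant such as "Data Cleansing" would not match its lowercase entry.

```python
"""Sketch: unifying keyword variants with a thesaurus before analysis."""

import re

# Hypothetical thesaurus: variant label -> preferred label.
# Keys are stored already normalized (lowercase, single spaces).
THESAURUS = {
    "bibliometric analysis": "bibliometrics",
    "bibliometric": "bibliometrics",
    "data cleansing": "data cleaning",
    "name disambiguation": "disambiguation",
}


def normalize_keyword(kw: str) -> str:
    """Trim, collapse internal whitespace, and lowercase."""
    return re.sub(r"\s+", " ", kw.strip()).lower()


def apply_thesaurus(keywords: list[str]) -> list[str]:
    """Normalize each keyword, replace it via the thesaurus, and drop duplicates."""
    result = []
    for kw in keywords:
        norm = normalize_keyword(kw)               # step 1: normalize
        preferred = THESAURUS.get(norm, norm)      # step 2: thesaurus lookup
        if preferred not in result:                # keep first occurrence, in order
            result.append(preferred)
    return result


print(apply_thesaurus(["Bibliometric  Analysis", "Data Cleansing", "bibliometrics", "Scopus"]))
# ['bibliometrics', 'data cleaning', 'scopus']
```

In practice such a mapping has to be built and checked by hand, which is consistent with the abstract's point that manual work cannot be avoided.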