Publications
[140400] Article:

A comprehensive approach to preprocessing data for bibliometric analysis

Journal: Scientometrics
ISSN: 1588-2861
Published: September 2025
 
Authors / Editors / Creators
Name: Marzena Nowakowska
Faculty: WZiMK
Department: Katedra Technologii Informatycznych*
For declaration no. 3: Yes
Affiliation group: counted in "N"
Scientific discipline: Management and quality sciences
Percentage share: 100
Points for employee evaluation: 100.00
Points according to evaluation criteria: 100.00

MNiSW group: Publication in journals listed in the register of the minister MNiSzW (part A)
MNiSW points: 100


Keywords:

Bibliometrics, Bibliographic data, Data cleaning, Data join, Disambiguation, Thesaurus



Abstract:

Bibliometric analysis, also known as bibliometrics, has been conducted for several decades
to evaluate scientific research based on data available on bibliographic platforms,
such as the popular Web of Science or Scopus. Research papers that include bibliometric
analysis typically ignore the problem of bibliographic data preprocessing, in particular its
important aspect of data cleaning. Discussion of bibliographic data preprocessing in the literature
is sparse and scattered; studies usually address only single, selected components of the
entire endeavour. This review article aims to fill that gap by analysing the problem extensively
and presenting the issues that arise from the structure of bibliographic data, from combining
data from various sources, and from creating thesauri and conducting bibliometric analyses,
drawing also on the author’s own experience. A brief description of the most popular software
dedicated to bibliometrics, such as BibExcel, Bibliometrix, CiteSpace, CitNetExplorer,
SciMAT, Sci2 Tool, and VOSviewer, is also provided, highlighting the operations available
in these applications for the preliminary processing of bibliographic data. The work supports
the following conclusions. The task is more difficult and demanding than some
authors suggest, including those who claim vaguely, and without further detail, that it has
already been accomplished. Data cleaning operations are carried out at various stages of preprocessing,
sometimes repeatedly, and the order in which they are performed can be significant because it
determines the success or failure of the process, in particular when combining data from
different sources. No software allows the entire bibliographic data preprocessing
procedure to be executed automatically. Moreover, manual work is inevitable at various
stages of the process. The contribution of this work to the field of bibliometric analysis is
a methodological synthesis that considers the issue holistically and thereby enables
a more comprehensive understanding of it.
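
The abstract's point that the order of cleaning operations matters when combining sources can be illustrated with a minimal sketch. It is not taken from the article: the record layout (`doi`, `title`, `keywords`), the function names, and the sample thesaurus are all hypothetical, and real Web of Science or Scopus exports use their own field names. The sketch normalizes identifiers and keywords first and joins afterwards, so that the same paper exported by two platforms collapses into one record.

```python
import re


def normalize_doi(doi):
    """Lower-case a DOI and strip a resolver prefix so equal DOIs compare equal."""
    if not doi:
        return None
    doi = doi.strip().lower()
    doi = re.sub(r"^https?://(dx\.)?doi\.org/", "", doi)
    return doi or None


def normalize_title(title):
    """Fallback join key: lower-case, drop punctuation, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", title.lower()).split())


def apply_thesaurus(keywords, thesaurus):
    """Map keyword variants to a preferred term; unknown terms pass through."""
    cleaned = []
    for kw in keywords:
        key = kw.strip().lower()
        term = thesaurus.get(key, key)
        if term not in cleaned:
            cleaned.append(term)
    return cleaned


def merge_sources(wos_records, scopus_records, thesaurus):
    """Clean each record, then join on the cleaned key (hypothetical layout)."""
    merged = {}
    for source, records in (("wos", wos_records), ("scopus", scopus_records)):
        for rec in records:
            # Cleaning happens before the join: raw DOIs would not match.
            key = normalize_doi(rec.get("doi")) or normalize_title(rec["title"])
            entry = merged.setdefault(
                key, {"title": rec["title"], "keywords": [], "sources": []}
            )
            entry["sources"].append(source)
            for kw in apply_thesaurus(rec.get("keywords", []), thesaurus):
                if kw not in entry["keywords"]:
                    entry["keywords"].append(kw)
    return list(merged.values())


# Illustrative data only; none of these records are real.
thesaurus = {
    "bibliometric analysis": "bibliometrics",
    "data cleansing": "data cleaning",
}
wos = [
    {"doi": "10.1000/ABC", "title": "A Study", "keywords": ["Bibliometric analysis"]},
]
scopus = [
    {
        "doi": "https://doi.org/10.1000/abc",
        "title": "A study.",
        "keywords": ["Bibliometrics", "Data cleansing"],
    },
    {"doi": None, "title": "Another paper", "keywords": []},
]
result = merge_sources(wos, scopus, thesaurus)
```

In this sketch a record that carries a DOI in one source but not in the other would not be matched, which is one small instance of why the abstract regards manual work as unavoidable.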