
Validation of the Astro dataset clustering solutions with external data

Scientometrics

Abstract

We conduct an independent cluster validation study of published clustering solutions for a research testbed corpus, the Astro dataset of publication records from astronomy and astrophysics. We extend the dataset by collecting external validation data that serve as proxies for the latent structure of the corpus. Specifically, we collect (1) grant funding information related to the publications, (2) data on topical special issues, (3) data on specific journals’ internal topic classifications and (4) usage data from the main online bibliographic database of the discipline. The latter three types of data are newly introduced for clustering validation, and we set out the rationale for using them for this task. We find that one solution based on the global citation network achieves better results than its competitors on three of the four validation data sources, but that another solution based on bibliographic coupling performs best on the special issues data.
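The abstract does not state the exact validation measure. One plausible pairwise formulation, sketched here as an assumption rather than as the paper's definition, treats every pair of publications that share an external label (the same grant, special issue, or journal section) as a positive pair and asks what share of those pairs a clustering solution places in the same cluster. The function name `pairwise_true_positive_ratio` and the toy identifiers are hypothetical.

```python
from itertools import combinations


def pairwise_true_positive_ratio(clusters, groups):
    """Share of externally linked paper pairs that fall in the same cluster.

    Hypothetical sketch, not the paper's exact metric.
    clusters: dict mapping paper id -> cluster id (one clustering solution)
    groups:   dict mapping external label (e.g. a grant or special issue)
              -> iterable of paper ids carrying that label
    """
    linked_pairs = set()
    for papers in groups.values():
        # Only papers present in the clustering can be evaluated.
        members = sorted({p for p in papers if p in clusters})
        linked_pairs.update(combinations(members, 2))
    if not linked_pairs:
        return None  # ratio is undefined when no linked pairs exist
    hits = sum(1 for a, b in linked_pairs if clusters[a] == clusters[b])
    return hits / len(linked_pairs)


clusters = {"p1": 0, "p2": 0, "p3": 1, "p4": 1}
groups = {"grant-A": ["p1", "p2", "p3"], "issue-X": ["p3", "p4"]}
print(pairwise_true_positive_ratio(clusters, groups))  # 0.5
```

Deduplicating pairs across labels keeps a pair from being counted twice when two papers share, for example, both a grant and a special issue. Returning `None` mirrors the situation where a ratio cannot be calculated for a cluster or source.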


Fig. 1


Data availability

The original validation data is published at https://zenodo.org/record/4061694.

Notes

  1. See http://www.topic-challenge.info.

  2. http://141.20.126.171/solutions.html.

  3. http://54xushuo.net/wiki/lib/exe/fetch.php?media=resources:datasets:xlza_2018.zip.

  4. A possible explanation for this result is that these two solutions, eb and en, unlike the others, rely not on direct citation, which is likely to be relatively rare between papers of a single special issue, but on bibliographic coupling and NLP-enhanced text similarity.
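The reasoning in note 4 can be illustrated with a minimal sketch using hypothetical paper and reference identifiers. Papers written in parallel for one special issue seldom cite each other, so a direct-citation link is absent, yet they often share cited references, which bibliographic coupling counts. This shows only the raw pairwise relations, not how solutions eb or en actually construct their clusters.

```python
def direct_citation(refs, a, b):
    """True if paper a cites paper b or b cites a.

    refs: dict mapping paper id -> set of cited paper ids
    """
    return b in refs.get(a, set()) or a in refs.get(b, set())


def coupling_strength(refs, a, b):
    """Bibliographic coupling strength: number of references a and b share."""
    return len(refs.get(a, set()) & refs.get(b, set()))


# Two hypothetical special-issue papers with overlapping reference lists.
refs = {"si1": {"r1", "r2", "r3"}, "si2": {"r2", "r3", "r4"}}
print(direct_citation(refs, "si1", "si2"))    # False
print(coupling_strength(refs, "si1", "si2"))  # 2
```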


Acknowledgements

This research has made use of NASA’s Astrophysics Data System. We would like to thank Anastasiia Tcypina and Nikolai Schmarbeck for help with data collection. We further thank Clarivate Analytics for granting permission to use the Astro dataset which is derived from the Web of Science database. We also thank Michael J. Kurtz for his explanations of the ADS service.

Author information


Correspondence to Paul Donner.

Appendix

Table 9 Topical sections of four astronomy journals with occurrence counts
Table 10 Three best- and three worst-performing clusters per solution (only clusters for which all four true positive ratios could be calculated)


About this article


Cite this article

Donner, P. Validation of the Astro dataset clustering solutions with external data. Scientometrics 126, 1619–1645 (2021). https://doi.org/10.1007/s11192-020-03780-3



  • DOI: https://doi.org/10.1007/s11192-020-03780-3
