
Multi-label classification and knowledge extraction from oncology-related content on online social networks

Artificial Intelligence Review

Abstract

This study aims at automatically processing and extracting knowledge from large amounts of oncology-related content on online social networks (OSN). A large number of OSN textual posts concerning major cancer types are automatically scraped and structured using natural language processing techniques. Machine learning models are trained to assign multiple labels to these posts based on the type of knowledge they contain, if any, and the trained models are then used to automatically classify large-scale collections of textual posts. Statistical inferences are drawn from these predictions to extract general concepts and abstract knowledge. Different approaches for constructing document feature vectors had no tangible effect on classification accuracy. Among the classifiers compared, logistic regression achieved the highest overall accuracy (96.4%) and \(\overline{F1}\) (73.4) in a 13-way multi-label classification of textual posts. The most common topic was seeking or providing moral support for cancer patients, followed by providing technical information about cancer causes and treatments. This study also automatically detects the most common causes and treatments of different types of cancer discussed on OSN. Seeking or providing moral support for cancer patients shared the largest overlap with other topics, i.e. moral support tends to be present even in OSN posts that focus on other topics. By contrast, providing technical information about cancer diagnosis and providing technical information about cancer prevention were the most isolated topics: OSN posts on these topics tend not to allude to other topics. OSN posts that seek financial support overlap, if at all, only with the moral support topic. Our methodology and results give public health professionals an opportunity to monitor which topics are being discussed on OSN and to what extent, and what specific information and knowledge are being disseminated over OSN, and to assess the veracity of that information in close to real time.
This helps them develop policies that encourage, discourage, or modify the consumption of viral oncology-related information on OSN.
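The abstract reports an overall accuracy and a mean F1 score, \(\overline{F1}\), for a 13-way multi-label classification, as well as how strongly topics overlap. The exact definitions are not given in this excerpt, so the minimal sketch below rests on stated assumptions: overall accuracy is taken to be the fraction of correct per-label decisions (one minus Hamming loss), \(\overline{F1}\) the unweighted mean of per-label F1 scores, and the overlap of topic A with topic B the share of posts labelled A that are also labelled B. The function names and the three-label toy data are hypothetical and are not the authors' code.

```python
from typing import Sequence

# Assumed layout: one row per post, one 0/1 column per topic label.
Matrix = Sequence[Sequence[int]]


def label_accuracy(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of (post, label) decisions that are correct (1 - Hamming loss)."""
    total = correct = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            total += 1
            correct += int(t == p)
    return correct / total


def mean_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Unweighted mean of per-label F1 scores (macro F1), expressed in percent."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Assumed convention: a label that is never present and never predicted scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return 100 * sum(scores) / n_labels


def overlap(y: Matrix, a: int, b: int) -> float:
    """Share of posts carrying label a that also carry label b (directional)."""
    with_a = [row for row in y if row[a] == 1]
    if not with_a:
        return 0.0
    return sum(row[b] for row in with_a) / len(with_a)


# Hypothetical toy data: 4 posts, 3 topic labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]

print(round(label_accuracy(y_true, y_pred), 3))  # 0.833
print(round(mean_f1(y_true, y_pred), 2))  # 77.78
print(overlap(y_true, 0, 1))  # 0.5
```

The toy run also shows why the two figures in the abstract can differ so much: with many labels that are mostly absent, per-label accuracy stays high even when individual labels are missed, whereas F1 penalises every missed positive.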




Notes

  1. I, you’d, below, so, who, is, mightn’t, did, for, the, any, each, hers, more, own, mustn, about, o, wouldn’t, between, off, s, doesn, ve, it’s, as, just, be, won’t, they, your, yourselves, isn’t, from, where, y, d, ourselves, she’s, at, our, why, him, you, can, himself, such, haven, to, most, you’ve, above, myself, than, now, here, only, it, through, aren, while, has, am, aren’t, but, down, too, hadn, he, other, there, having, not, itself, shouldn’t, up, until, on, didn, how, been, both, her, wouldn, shouldn, nor, being, shan’t, further, themselves, or, herself, all, theirs, during, no, out, after, needn’t, ain, should’ve, which, under, couldn’t, whom, doesn’t, their, ma, yours, you’re, if, these, my, again, wasn, weren’t, you’ll, wasn’t, when, don, because, hadn’t, that’ll, once, over, will, some, isn, does, shan, its, had, what, didn’t, were, an, re, and, are, against, into, have, mustn’t, this, do, in, before, yourself, t, same, was, doing, mightn, we, weren, haven’t, that, needn, few, hasn’t, me, she, ours, of, with, don’t, m, a, couldn, by, hasn, won, then, should, them, those, very, his, ll.
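Note 1 appears to be a stop-word list. It resembles NLTK's English list, including contraction fragments such as "mustn" and "ve". Assuming these words are discarded during preprocessing, a filtering step might look like the minimal sketch below. The regular-expression tokenizer, the function name, and the abbreviated word set are illustrative assumptions, not the authors' pipeline.

```python
import re
from typing import List

# Illustrative subset of the words listed in note 1 (a full pipeline would load
# all of them). Curly apostrophes in posts are normalised to straight ones first.
STOP_WORDS = {
    "i", "you'd", "below", "so", "who", "is", "for", "the", "any", "about",
    "it's", "just", "be", "won't", "they", "your", "from", "at", "our", "can",
    "to", "most", "than", "now", "it", "has", "but", "not", "on", "how", "of",
    "with", "a", "by", "and", "are", "have", "this", "do", "in", "was", "we",
    "my", "me", "she", "that", "very", "during",
}

# Assumed tokenizer: runs of letters, optionally joined by one apostrophe.
TOKEN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def remove_stop_words(post: str) -> List[str]:
    """Lower-case a post, tokenise it, and drop tokens found in STOP_WORDS."""
    text = post.lower().replace("\u2019", "'")
    return [tok for tok in TOKEN.findall(text) if tok not in STOP_WORDS]


print(remove_stop_words("Is exercise good for the immune system during chemo? It\u2019s worth asking your doctor"))
# ['exercise', 'good', 'immune', 'system', 'chemo', 'worth', 'asking', 'doctor']
```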


Author information

Correspondence to Mahdi Hashemi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Hashemi, M., Hall, M. Multi-label classification and knowledge extraction from oncology-related content on online social networks. Artif Intell Rev 53, 5957–5994 (2020). https://doi.org/10.1007/s10462-020-09839-0



  • DOI: https://doi.org/10.1007/s10462-020-09839-0

