Carlos A. Almenara

Topic modeling using natural language processing and machine learning

In 2020, during the COVID-19 lockdown, I had the opportunity to analyze 7899 abstracts of articles from eating disorder journals. I employed bibliometrics, network analysis, and topic modeling.


Today, I would like to share with you some interesting results: the topics found and their trends over a 40-year span (see Figure below).


I performed the analyses using Python. First, it was necessary to apply some natural language processing (NLP) techniques to clean and preprocess the dataset. Then, I used machine learning to extract the topics. More precisely, I combined TF-IDF (term frequency-inverse document frequency) weighting with non-negative matrix factorization (NMF, also abbreviated NNMF).


As can be seen below, a total of 10 topics were extracted (topics were manually labeled):


Topic 1 - Risk factors of eating disorders (2809 documents)

Topic 2 - Body image dissatisfaction (1136 documents)

Topic 3 - Binge Eating Disorder diagnosis (928 documents)

Topic 4 - Weight loss, weight control, and diet (735 documents)

Topic 5 - Clinical groups (671 documents)

Topic 6 - Treatment outcome (379 documents)

Topic 7 - Family and parent-child (356 documents)

Topic 8 - Binge and purge episodes (328 documents)

Topic 9 - Gender and subgroups (282 documents)

Topic 10 - Eating Disorder Not Otherwise Specified (EDNOS; 275 documents)
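The document counts above add up to 7899, which is consistent with each abstract being assigned to a single topic. One common way to do that, assumed here rather than taken from the paper, is to give each document the topic with the largest weight in the document-topic matrix produced by NMF. A small sketch with made-up weights:

```python
from collections import Counter

# Hypothetical document-topic weights (rows: documents, columns: topics),
# such as the matrix W returned by an NMF fit. The values are made up.
doc_topic_weights = [
    [0.80, 0.10, 0.00],
    [0.05, 0.60, 0.20],
    [0.70, 0.00, 0.10],
    [0.00, 0.20, 0.90],
]


def dominant_topic(weights):
    """Return the 0-based index of the largest weight in one row."""
    return max(range(len(weights)), key=weights.__getitem__)


# Assign every document to its dominant topic, then count documents per topic.
assignments = [dominant_topic(row) for row in doc_topic_weights]
topic_sizes = Counter(assignments)

for topic, size in sorted(topic_sizes.items()):
    print(f"Topic {topic + 1}: {size} documents")
```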


An interesting finding that you can spot in the Figure is that studies with clinical samples were more prominent during the 1980s and mid-1990s, whilst studies focusing on risk factors have grown exponentially since the end of the 1990s. Something similar happened with studies on treatment outcomes. The take-home message is that there was a shift around the end of the 1990s, with the eating disorders literature turning its focus towards prevention and better treatment outcomes (evidence-based interventions).
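A trend like this can be computed by grouping the topic assignments by publication year. The sketch below uses made-up records and computes each topic's share of a year's documents. Whether the Figure plots shares or raw counts is not stated in this post, so the use of shares is an assumption, and the function name is hypothetical.

```python
from collections import Counter, defaultdict

# Made-up (publication_year, topic_label) pairs, for illustration only.
records = [
    (1985, "Clinical groups"),
    (1985, "Clinical groups"),
    (1985, "Risk factors"),
    (2005, "Risk factors"),
    (2005, "Risk factors"),
    (2005, "Treatment outcome"),
    (2005, "Clinical groups"),
]


def topic_shares_by_year(records):
    """Return {year: {topic: share of that year's documents}}."""
    counts = defaultdict(Counter)
    for year, topic in records:
        counts[year][topic] += 1
    shares = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        shares[year] = {topic: n / total for topic, n in counter.items()}
    return shares


shares = topic_shares_by_year(records)
for year in sorted(shares):
    print(year, shares[year])
```

Using shares rather than counts keeps the overall growth in the number of publications from hiding which topics gained or lost ground.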


For further information, you can read the paper and take a look at the Python code on GitHub.

