Burst analysis identifies sudden increases, or “bursts”, in how frequently certain terms or concepts are used over time: a term becomes unusually active for a period and then fades away.
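To make the idea concrete, here is a minimal Python sketch of a simplified threshold heuristic. It is not necessarily the algorithm used for this analysis. The function name `find_bursts`, the `ratio` parameter, and the counts are all hypothetical and chosen for illustration. A term is flagged as bursting in a year when its share of that year's titles is at least `ratio` times its share over the whole period, and consecutive flagged years are merged into one interval.

```python
def find_bursts(term_counts, totals, ratio=2.0):
    """Return (start_year, end_year) runs where a term's yearly share
    is at least `ratio` times its share over the whole period.

    Assumes `totals` has a positive entry for every year in the range.
    """
    years = sorted(totals)
    overall = sum(term_counts.get(y, 0) for y in years) / sum(totals.values())
    bursts, start, prev = [], None, None
    for year in years:
        share = term_counts.get(year, 0) / totals[year]
        if overall > 0 and share >= ratio * overall:
            if start is None:
                start = year      # a new burst begins
            prev = year           # extend the current burst
        elif start is not None:
            bursts.append((start, prev))  # the burst has faded
            start = None
    if start is not None:
        bursts.append((start, prev))      # burst still active at the end
    return bursts

# Made-up counts for illustration only (not taken from the MEDLINE data).
totals = {2000: 10, 2001: 10, 2002: 10, 2003: 10, 2004: 10}
term_counts = {2000: 1, 2001: 7, 2002: 8, 2003: 1}
print(find_bursts(term_counts, totals))  # [(2001, 2002)]
```

Each returned interval corresponds to one period during which the term was unusually active.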
This is a burst analysis of publications on mesothelioma from the 1930s to 2007. The data was taken from the MEDLINE dataset at http://sdb.cns.iu.edu. The dataset was a .csv file in which each row holds the details of one publication, such as article title, year of publication, author name, and publication mode.
This analysis detects the “bursty” terms used in the titles of papers on mesothelioma. The contents of the field article_title were normalized: the text was lowercased, tokenized, and stemmed, and stop words were removed. Burst analysis was carried out with respect to the year of publication, which represents when the events or topics were in use.
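The normalization step can be sketched roughly as follows. This is an illustrative Python approximation, not the tool actually used for the analysis. The stop-word list is a tiny hypothetical one, and `crude_stem` is a toy suffix stripper standing in for a real stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to", "with"}

# Suffixes removed by the toy stemmer, checked in this order.
SUFFIXES = ("ing", "es", "s")

def crude_stem(token):
    """Strip one common suffix; a stand-in for a real stemmer such as Porter's."""
    for suffix in SUFFIXES:
        # Keep at least three characters so very short words stay intact.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def normalize_title(title):
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", title.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

print(normalize_title("Tumors of the Pleura and Peritoneum"))
# ['tumor', 'pleura', 'peritoneum']
```

Applying a function like this to every article_title and pairing the resulting terms with the year of publication gives the per-year term counts that a burst analysis works from.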
The results of the burst analysis were visualized as a temporal bar graph, which is given below:
We see that pleura was the most frequently used term in the context of mesothelioma during the period 1930 to 1966. Several other terms were actively used in various time frames before fading away.