Every academic knows that quite some time can pass from submission of a paper to a journal until acceptance and eventually publication. Whether it's the tedious communications with the Editors about the font size of a figure label or the questions in suspiciously minute detail from Reviewer 2.

Past publications in a journal can be indicative for how long your own submission process could take. For this case study, I have looked at how long it takes for papers in the field of Computational Neuroscience (which is the field I am working in) to be accepted at Nature Neuroscience, one of the most prestigious neuroscience journals out there.

To get this information, we can crawl publicly available data from the Nature website. Every article's page has the time of submission, acceptance and publication that we can gather to compute an average. An example is shown below.

We use the wonderful functions provided in requests_html to scrape the data. The script below goes through all search results from the Nature website, filtered by a specific subject, in this case it is "computational-neuroscience". We get a list of all articles by filtering the relevant elements we want to iterate through using the function base_html.html.find(). Then, we can easily get the relevant elements of the search results (the title and the html links to each article for example) using the attributes .text and .links.

from requests_html import HTMLSession
session = HTMLSession()

data = pd.DataFrame(columns=['journal', 'url', 'title', 'received', 'accepted', 'published', 'timeForAcceptance', 'timeForPublishing'])

for page_nr in range(1, 5):
    base_url = "https://www.nature.com/search?order=relevance&journal=neuro&subject=computational-neuroscience&article_type=research&page="
    base_url +=  str(page_nr)
    base_html = session.get(base_url)

    articles_journal = base_html.html.find('div.grid.grid-7.mq640-grid-12.mt10')
    articles = base_html.html.find("a[href*=articles]")

    for ai, a in enumerate(articles_journal):
        filter_journal = 'Nature Neuroscience'
        if filter_journal in a.text:
            print(" - - - - - - - - ")
            print('Article nr {} on page {} in Journal {}'.format(ai, page_nr, filter_journal))
            article_title = articles[ai].text
            article_suburl = list(articles[ai].links)[0]

            article_url = "https://www.nature.com{}".format(article_suburl)
            print("Getting {}".format(article_url))
            article_html = session.get(article_url)

            dates = article_html.html.find("time")
            print('> Received: ', dates[1].attrs['datetime'])
            print('> Accepted: ', dates[2].attrs['datetime'])
            print('> Published: ', dates[3].attrs['datetime'])
            received_date = parser.parse(dates[1].attrs['datetime'])
            accepted_date = parser.parse(dates[2].attrs['datetime'])
            published_date = parser.parse(dates[3].attrs['datetime'])

            timeForAcceptance = accepted_date - received_date
            timeForPublishing = published_date - received_date

            print(timeForAcceptance, 'between',received_date, 'and', accepted_date)

            data = data.append({'journal' : filter_journal, 
                 'url' : article_suburl,
                 'title' : article_title,
                 'received' : received_date,
                 'accepted' : accepted_date, 
                 'published' : published_date,
                 'timeForAcceptance' : timeForAcceptance,
                 'timeForPublishing' : timeForPublishing}, ignore_index = True)

Let's have a look at the aggregated Dataframe:

journal url title received accepted published timeForAcceptance timeForPublishing
0 Nature Neuroscience /articles/s41593-020-00753-w Strong inhibitory signaling underlies stable t... 2020-02-17 2020-11-05 2020-12-07 262 days 294 days
1 Nature Neuroscience /articles/s41593-020-00744-x Parameterizing neural power spectra into perio... 2019-05-31 2020-10-20 2020-11-23 508 days 542 days
2 Nature Neuroscience /articles/s41593-020-00733-0 Modeling behaviorally relevant neural dynamics... 2019-09-04 2020-10-02 2020-11-09 394 days 432 days
3 Nature Neuroscience /articles/s41593-020-00732-1 A cerebello-olivary signal for negative predic... 2019-11-25 2020-10-02 2020-11-09 312 days 350 days
4 Nature Neuroscience /articles/s41593-020-00719-y Edge-centric functional network representation... 2019-09-09 2020-09-03 2020-10-19 360 days 406 days
... ... ... ... ... ... ... ... ...
194 Nature Neuroscience /articles/nn.2797 Reversible large-scale modification of cortica... 2011-01-12 2011-03-04 2011-04-17 51 days 95 days
195 Nature Neuroscience /articles/nn.2904 Differential roles of human striatum and amygd... 2011-04-12 2011-07-07 2011-09-11 86 days 152 days
196 Nature Neuroscience /articles/nn.2868 High-accuracy neurite reconstruction for high-... 2011-02-28 2011-05-23 2011-07-10 84 days 132 days
197 Nature Neuroscience /articles/nn.2693 Hippocampal brain-network coordination during ... 2010-07-12 2010-10-06 2010-11-21 86 days 132 days
198 Nature Neuroscience /articles/nn.2872 Owl's behavior and neural representation predi... 2010-12-21 2011-04-29 2011-07-03 129 days 194 days

199 rows × 8 columns

We need to put the data into buckets of years and days for calculating the average.

data['year'] = data.apply(lambda row: row.published.year, axis=1)
data['days'] = data.apply(lambda row: row.timeForAcceptance.days, axis=1)
data['daysp'] = data.apply(lambda row: row.timeForPublishing.days, axis=1)

Let's plot the data:

plt.figure(figsize=(6, 3), dpi=300)
xfmt = md.DateFormatter('%Y')

years = [parser.parse(str(data.groupby(by='year').days.mean().index[d])) for d in range(len(data.groupby(by='year')))]
years_beginning = [datetime.datetime(y.year, month=1, day=1) for y in years]
mean_time = data.groupby(by='year').days.mean()
std_time = data.groupby(by='year').days.std()
plt.plot(years_beginning, mean_time, label='Yearly mean until accepted', c='C3')

plt.plot(years_beginning, data.groupby(by='year').daysp.mean(), label='Yearly mean until published', c='C0')

for di in range(len(data)):
    plt.scatter(data.iloc[di].received, data.iloc[di].days, c='C3', s=5, edgecolor='k', linewidth=0.5)
plt.xlabel("Time of submission")
plt.title("Time for acceptance of Computational\nNeuroscience papers in Nature Neuroscience")

So, how long?

On average, it took 157.47 days for a paper to be published from the day it was submitted.

This page was last built on 09.12.20 17:28:39