Waiting for R2...
How long does it take for a computational neuroscience paper to get accepted after a submission to Nature Neuroscience? I wanted to know, so I built a scraper.
Every academic knows that quite some time can pass between submitting a paper to a journal and its acceptance and eventual publication, whether that's due to tedious communications with the editors about the font size of a figure label or suspiciously detailed questions from Reviewer 2.
Past publications in a journal can indicate how long your own submission process might take. For this case study, I looked at how long it takes for papers in Computational Neuroscience (the field I work in) to be accepted at Nature Neuroscience, one of the most prestigious neuroscience journals out there.
To get this information, we can crawl publicly available data from the Nature website. Every article's page lists the dates of submission, acceptance, and publication, which we can gather to compute an average. An example is shown below.
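The article pages expose these dates in ISO format, so once they are extracted, the turnaround times reduce to simple date arithmetic. A minimal sketch with made-up dates (the real values are scraped below):

```python
from datetime import date

# Hypothetical dates in the ISO format used on article pages
received = date.fromisoformat("2019-02-11")
accepted = date.fromisoformat("2019-09-23")
published = date.fromisoformat("2019-11-04")

print((accepted - received).days)    # days under review: 224
print((published - received).days)   # days until publication: 266
```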
We use the wonderful functions provided in `requests_html` to scrape the data. The script below goes through all search results from the Nature website, filtered by a specific subject, in this case "computational-neuroscience". We get a list of all articles by filtering the relevant elements we want to iterate through using the function `base_html.html.find()`. Then we can easily get the relevant elements of the search results (for example the title and the HTML link to each article) using the attributes `.text` and `.links`.
#hide_output
import pandas as pd
from dateutil import parser
from requests_html import HTMLSession

session = HTMLSession()
data = pd.DataFrame(columns=['journal', 'url', 'title', 'received', 'accepted',
                             'published', 'timeForAcceptance', 'timeForPublishing'])
for page_nr in range(1, 5):
    base_url = ("https://www.nature.com/search?order=relevance&journal=neuro"
                "&subject=computational-neuroscience&article_type=research&page=")
    base_url += str(page_nr)
    base_html = session.get(base_url)
    articles_journal = base_html.html.find('div.grid.grid-7.mq640-grid-12.mt10')
    articles = base_html.html.find("a[href*=articles]")
    for ai, a in enumerate(articles_journal):
        filter_journal = 'Nature Neuroscience'
        if filter_journal in a.text:
            print(" - - - - - - - - ")
            print('Article nr {} on page {} in Journal {}'.format(ai, page_nr, filter_journal))
            article_title = articles[ai].text
            print("\"{}\"".format(article_title))
            article_suburl = list(articles[ai].links)[0]
            article_url = "https://www.nature.com{}".format(article_suburl)
            print("Getting {}".format(article_url))
            article_html = session.get(article_url)
            # Extract the <time> elements holding the received/accepted/published dates
            dates = article_html.html.find("time")
            print('> Received: ', dates[1].attrs['datetime'])
            print('> Accepted: ', dates[2].attrs['datetime'])
            print('> Published: ', dates[3].attrs['datetime'])
            received_date = parser.parse(dates[1].attrs['datetime'])
            accepted_date = parser.parse(dates[2].attrs['datetime'])
            published_date = parser.parse(dates[3].attrs['datetime'])
            timeForAcceptance = accepted_date - received_date
            timeForPublishing = published_date - received_date
            print(timeForAcceptance, 'between', received_date, 'and', accepted_date)
            data = data.append({'journal': filter_journal,
                                'url': article_suburl,
                                'title': article_title,
                                'received': received_date,
                                'accepted': accepted_date,
                                'published': published_date,
                                'timeForAcceptance': timeForAcceptance,
                                'timeForPublishing': timeForPublishing},
                               ignore_index=True)
Let's have a look at the aggregated DataFrame:
data
To compute the averages, we bucket the data by publication year and convert the timedeltas into day counts.
data['year'] = data.apply(lambda row: row.published.year, axis=1)
data['days'] = data.apply(lambda row: row.timeForAcceptance.days, axis=1)
data['daysp'] = data.apply(lambda row: row.timeForPublishing.days, axis=1)
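To illustrate how the yearly averages used in the plot below come together, here is a minimal sketch of a groupby-mean on toy records (the numbers are invented, not scraped):

```python
import pandas as pd

# Toy records standing in for the scraped data (values are made up)
toy = pd.DataFrame({'year': [2018, 2018, 2019],
                    'days': [100, 200, 150]})
print(toy.groupby(by='year').days.mean())
# 2018 -> 150.0, 2019 -> 150.0
```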
Let's plot the data:
import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as md
from dateutil import parser

plt.figure(figsize=(6, 3), dpi=300)
ax = plt.gca()
xfmt = md.DateFormatter('%Y')
ax.xaxis.set_major_formatter(xfmt)
# One x position per publication year, anchored at January 1st
years = [parser.parse(str(y)) for y in data.groupby(by='year').days.mean().index]
years_beginning = [datetime.datetime(y.year, month=1, day=1) for y in years]
mean_time = data.groupby(by='year').days.mean()
std_time = data.groupby(by='year').days.std()
plt.plot(years_beginning, mean_time, label='Yearly mean until accepted', c='C3')
plt.plot(years_beginning, data.groupby(by='year').daysp.mean(), label='Yearly mean until published', c='C0')
plt.legend(fontsize=10)
# Scatter the individual papers on top of the yearly means
for di in range(len(data)):
    plt.scatter(data.iloc[di].received, data.iloc[di].days, c='C3', s=5, edgecolor='k', linewidth=0.5)
plt.xlabel("Time of submission")
plt.ylabel("Days")
plt.title("Time for acceptance of Computational\nNeuroscience papers in Nature Neuroscience")
plt.savefig("../images/icon_natureneuroscience.png");