Discussion and Conclusions

We see growth across all the topics we used to categorise data availability statements submitted to 176 journals between 2013 and 2019 (Figure \ref{530186}). 
Figure 3 and Figure \ref{552435} illustrate how our methods delivered results rather than offering analysis: they show, respectively, the overall percentage of topics identified and the relationship of topics to the individual documents analysed. Figure \ref{295183} shows the relationship of topics to the words used by authors, and offers readers a simple visual validation of our methods and results; for example, the intense yellow area above topic 8, "Available on reasonable request", indicates strong relationships with the words: reasonable, request, corresponding author, finding, study, available and datum. Figure \ref{763853} shows trends in topics over time, which are discussed in more detail below.
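As a concrete illustration of this kind of validation, the sketch below shows how the top-weighted words for each topic can be inspected from a fitted topic model, the same topic-word relationship visualised in Figure \ref{295183}. The use of scikit-learn's LatentDirichletAllocation, the toy corpus, and all variable names are assumptions made for illustration, not a description of the pipeline used in this study.

```python
# Minimal sketch (assumed tooling, not the study's actual pipeline):
# fit a topic model over data availability statements and print the
# top-weighted words per topic for visual validation of topic labels.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus; the real analysis used the full set of submitted statements.
statements = [
    "Data are available from the corresponding author on reasonable request.",
    "The data that support the findings of this study are openly available "
    "in a public repository with a permanent DOI.",
    "Data are included in the supporting information of this article.",
]

vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(statements)

# The study reports 20 topics; a toy corpus cannot support that many,
# so a small illustrative model is fitted here.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
lda.fit(doc_term)

vocab = vectorizer.get_feature_names_out()
for topic_idx, weights in enumerate(lda.components_, start=1):
    top_words = [vocab[i] for i in weights.argsort()[::-1][:7]]
    print(f"Topic {topic_idx}: {', '.join(top_words)}")
```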
We see a particularly sharp increase in early 2019, after the launch of the Expects Data policy which, for journals that adopt it, requires a data availability statement in every article \cite{open}. Implementing this requirement correlates with many more submitted data availability statements (as is to be expected of a successful implementation). It also correlates with many more declarations by researchers that data are available on request (for example, topic 8 in Figure \ref{763853}, and the related topics 4, 9, 10, 12, 14, and 20). This is an improvement over the absence of any statement about data. It seems reasonable to anticipate that, as researchers become familiar with data sharing, interested in it, able to do it, and required to do it, the high proportion of data availability statements categorised as topic 8 (and the related topics listed above) will gradually be replaced by data availability statements that describe shared data (like topics 1, 2, 3, 15, 17, 18, and 19).
For data that have been shared, topic 19 is a good standard to aspire to: it indicates that data are shared in a repository with a permanent digital object identifier (DOI). The number of data availability statements categorised as topic 19 shows steady growth over six years. This is reassuring, but topic 19 does not show the sharp increase in 2019 that we might expect to correlate with the launch of our Expects Data policy. Several related topics that also describe data shared online (topics 1, 2, 3, 15, 17, and 18) do show the expected sharp increase in early 2019. For topic 19, the consistent growth may be real and based on author behaviour, or it may be an artefact of the analysis that we could investigate in future work.
Data that are available in genetics databases, per topic 15, also show an interesting trend: steeper growth between 2014 and 2016; a distinct flat period between 2016 and 2018; and then steep growth in 2019, correlating with the launch of our Expects Data policy. This could be an area for future analysis. It is also interesting to note the continuing presence and moderate growth of topics 13 and 16, which indicate data that have been shared in journal supporting information.
To conclude, if our goal is simply to enable research authors to describe in their journal articles whether or not they have shared the new data they have created, then this can be achieved with a policy that requires data availability statements. If our goal is to increase data sharing, then launching a policy and studying the data collected from it may also be valuable: it creates insights into how to enable and support better experiences for researchers, more data sharing, and higher-quality data sharing. For example, data from this study could help identify which kinds of articles without shared data are similar to those with shared data, and to which journals both are submitted. With that information we could design and launch supportive policies and services where they are most likely to be welcomed by researchers, and therefore where they are most likely to have a positive impact.
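One hedged sketch of that idea, assuming the model's document-topic proportions are available as a matrix: compare each article without shared data to the average topic profile of articles with shared data, and flag close matches as candidates for targeted support. All arrays, flags, and the similarity threshold below are illustrative placeholders, not data or results from this study.

```python
# Illustrative sketch only: the topic proportions, sharing flags, and the 0.9
# threshold are placeholders, not outputs of this study.
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two topic-proportion vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
doc_topics = rng.dirichlet(np.ones(20), size=100)  # placeholder: 100 articles, 20 topics
shared = np.zeros(100, dtype=bool)
shared[:40] = True  # placeholder flag: article described shared data

# Average topic profile of articles that shared data.
centroid_shared = doc_topics[shared].mean(axis=0)

# Articles without shared data whose topic profile resembles that centroid
# might be good candidates for supportive data-sharing policies and services.
for idx in np.where(~shared)[0]:
    if cosine_similarity(doc_topics[idx], centroid_shared) > 0.9:
        print(f"Article {idx} resembles articles that shared data")
```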
It is ironic to write an article about data availability while not sharing the data set. The data availability statements we analysed were submitted by researchers to Wiley as part of journal articles, some of which we published. We analysed this information to improve our understanding of researchers' practices, and to improve our products and services. That kind of use is covered by both our privacy policy and the licence researchers give us to publish their work. We did not ask those researchers, when they submitted their articles, whether we could share data about their data availability statements, and for this reason we chose not to share the data set. With that in mind, the final and perhaps most important lesson for us from this study is an appreciation for the value of careful study designs and data management plans \cite{dcc}, created before starting a study.

Data availability statement

Research data are not shared.

Disclosure of conflicts of interest

All authors are employed by Wiley and benefit from the company's success.

Acknowledgements

Thanks to Elisha Morris at Wiley for the literature search and analysis we used to write our introduction. Thanks to Yan Wu at Wiley for insights into data sharing requirements in China. Thanks to Gary Spencer at Wiley for useful discussions about author behaviour and manuscript submission processes. Thanks to Alex Moscrop at Wiley for providing our data. This article was written collaboratively and preprinted using Authorea; thanks to Alberto Pepe and the Authorea team.