In 2017, the eResearch teams of five major universities initiated discussions on research data culture. A primary concern was the perceived rapid growth of data and its potential budgetary implications for CFOs. Contrary to general assumptions, initial measurements revealed a lower growth rate than expected, for reasons that remain unclear. This finding was subsequently supported by the MacroView project, a two-year effort that estimated the scale of Australia's research data for December 2021 and December 2022. Current extrapolations suggest approximately 370 petabytes (PB) of research data in 2022, growing at 22-25% per year, which would put the total at 550-600 PB today. However, institutions lacked the capacity to report on significant aspects of their data holdings, so the detailed characteristics of this research data asset remain unknown. To manage research data as a valuable asset, we believe it is essential to understand its origin, initial use, replication, ongoing use, and longevity. We are therefore examining data use at a research-intensive Australian university in partnership with Arcitecta. Using their Mediaflux system, which in this case manages around 17 PB of data, we have generated de-identified, event-by-event logs of all service activities. These logs allow us to visualise changes in the state of the data asset, including the age profile of data in use and the patterns of data and metadata creation and access. Our overall goal is to define a set of measurable attributes that can form the basis for organisation-wide data reporting guidelines. We also hope to observe the different data life cycles that occur in practice. The poster will detail our methodology and preliminary results.
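The step from 370 PB in 2022 to 550-600 PB today is simple compound growth. The minimal sketch below reproduces that arithmetic under two stated assumptions: the 22-25% rate is annual, and "today" is roughly two years after the December 2022 estimate. The function name `project_volume` is illustrative only and not part of any MacroView or Mediaflux tooling.

```python
def project_volume(base_pb: float, annual_rate: float, years: float) -> float:
    """Compound growth: base volume * (1 + annual rate) ** elapsed years."""
    return base_pb * (1.0 + annual_rate) ** years


BASE_2022_PB = 370.0  # estimated Australian research data volume, 2022
YEARS = 2.0  # assumption: "today" is about two years after December 2022

if __name__ == "__main__":
    for rate in (0.22, 0.25):
        projected = project_volume(BASE_2022_PB, rate, YEARS)
        print(f"{rate:.0%}: {projected:.0f} PB")
```

Under these assumptions the script prints about 551 PB at 22% and 578 PB at 25%, consistent with the quoted range; reaching the upper figure of 600 PB at 25% corresponds to slightly more than two years of growth.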