Reproducibility is the ability to reproduce a study using the same data and protocol. Many disciplines face reproducibility challenges, raising concerns about the reliability of science. Fields that rely on data analysis, such as the social sciences, health economics, and epidemiology, are also affected. A central factor in this issue is the data analysis strategy. Effective analysis involves numerous choices: study design, theoretical model, assumptions, judgment criteria, parameter settings, and data quality. The complexity of statistical analysis and programming can further limit reproducibility. Among complex analyses, state sequence analysis (SSA) enables the exploration of temporal patterns and transitions over time. We developed and evaluated SSAW (State Sequence Analysis Workflow), a scientific workflow that simplifies and standardizes SSA (https://github.com/bakrimmadi/SSA_workflow2). SSAW automates the key steps of SSA, from data transformation, dissimilarity selection, and hyperparameter tuning to clustering and visualization, while fully documenting each stage. Built on Snakemake, SSAW ensures reproducibility through isolated Conda environments and supports both non-expert and expert users. The quality of the workflow was assessed against several criteria, including modularity, reproducibility, sustainability, and transparency, as well as its compliance with the FAIR (Findable, Accessible, Interoperable, Reusable) principles. We showed that SSAW complies with the FAIR principles and with complementary workflow quality criteria. To assess its applicability, we applied SSAW to a publicly available sequence dataset that followed 712 individuals aged 16–19 monthly from September 1993 to June 1999, with the aim of identifying the subgroups most at risk of long-term unemployment. Using SSAW, we obtained clear visualizations that facilitate the interpretation of trajectories and clusters.
The results were consistent with those reported in the published analysis. SSAW not only streamlines the entire analysis but also introduces an innovative strategy to guide clustering decisions based on statistical coherence and stability. SSAW is the first FAIR-compliant framework for SSA, and it significantly improves the reproducibility and traceability of analysis choices, bridging the gap between advanced methodological requirements and the need for transparent, reproducible practices. By combining methodological innovation with FAIR principles, SSAW makes trajectory analysis more transparent, reusable, and accessible across disciplines.
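To make the dissimilarity and clustering steps of SSA more concrete, here is a minimal Python sketch of a generic SSA core. It is not SSAW's implementation. It assumes optimal matching (an edit distance with insertion/deletion cost 1 and substitution cost 2, a common textbook setting) as the dissimilarity, and average-linkage agglomerative clustering with a fixed number of clusters. The function names, cost values, state codes (S = school, E = employment, U = unemployment), and toy sequences are hypothetical. SSAW's own dissimilarity selection, hyperparameter tuning, and coherence/stability-based choice of the number of clusters are not reproduced here.

```python
from itertools import combinations


def om_distance(a, b, indel=1.0, sub=2.0):
    """Optimal matching distance between two state sequences (hypothetical helper).

    Edit distance where inserting or deleting one state costs `indel` and
    replacing a state by a different one costs `sub`.
    """
    n, m = len(a), len(b)
    # dp[i][j] = minimal cost to turn the prefix a[:i] into the prefix b[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * indel
    for j in range(1, m + 1):
        dp[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else sub
            dp[i][j] = min(dp[i - 1][j] + indel,      # delete a[i-1]
                           dp[i][j - 1] + indel,      # insert b[j-1]
                           dp[i - 1][j - 1] + cost)   # match or substitute
    return dp[n][m]


def distance_matrix(seqs, **costs):
    """Symmetric matrix of pairwise optimal matching distances."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = om_distance(seqs[i], seqs[j], **costs)
    return d


def average_linkage(d, k):
    """Agglomerative clustering with average linkage, stopped at k clusters."""
    clusters = [[i] for i in range(len(d))]

    def link(c1, c2):
        # Mean pairwise distance between members of two clusters
        return sum(d[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the closest pair of clusters (a < b, so deleting b is safe)
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[a].extend(clusters[b])
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical toy data: ten monthly states per individual
seqs = [
    "SSSSEEEEEE",
    "SSSEEEEEEE",
    "SSSSSEEEEE",
    "UUUUUUUUEE",
    "UUUUUUUUUU",
    "UUUUUUUUUE",
]

if __name__ == "__main__":
    print(average_linkage(distance_matrix(seqs), k=2))
```

With these costs, one substitution costs the same as a deletion plus an insertion, so the distance equals `len(a) + len(b)` minus twice the length of the longest common subsequence. On the toy data, the school-to-employment trajectories (indices 0 to 2) separate from the mostly unemployed ones (indices 3 to 5). Changing the costs or the linkage can change the partition, which is the kind of analysis choice that benefits from being documented.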