Provenance metadata for statistical data: An introduction to Structured Data Transformation Language (SDTL)

Authors

  • George Alter University of Michigan
  • Darrell Donakowski Center for Political Studies, University of Michigan, 426 Thompson Street, 4200 Bay, Ann Arbor, MI 48104-1248 USA
  • Jack Gager Metadata Technology North America Inc., 3915 Papermill Dr., Knoxville, TN 37909, USA
  • Pascal Heus Metadata Technology North America Inc., 3915 Papermill Dr., Knoxville, TN 37909, USA
  • Carson Hunter Metadata Technology North America Inc., 3915 Papermill Dr., Knoxville, TN 37909, USA
  • Sanda Ionescu ICPSR, University of Michigan, PO Box 1248, Ann Arbor, MI 48106-1248 USA
  • Jeremy Iverson Colectica, 1428 Washington Ave S STE 203, Minneapolis, MN 55454 USA
  • H.V. Jagadish Computer Science and Engineering, University of Michigan, Bob and Betty Beyster Building, 2260 Hayward Street, Ann Arbor, MI 48109-2121 USA
  • Carl Lagoze School of Information, University of Michigan, 105 S State St, Ann Arbor, MI 48109 USA
  • Jared Lyle ICPSR, University of Michigan, PO Box 1248, Ann Arbor, MI 48106-1248 USA
  • Alexander Mueller ICPSR, University of Michigan, PO Box 1248, Ann Arbor, MI 48106-1248 USA
  • Sigbjørn Revheim NSD - Norwegian Centre for Research Data, Harald Hårfagres gate 29, N-5007 Bergen, Norway
  • Matthew A. Richardson ICPSR, University of Michigan, PO Box 1248, Ann Arbor, MI 48106-1248 USA
  • Risnes Ørnulf NSD - Norwegian Centre for Research Data, Harald Hårfagres gate 29, N-5007 Bergen, Norway
  • Karunakara Seelam ICPSR, University of Michigan, PO Box 1248, Ann Arbor, MI 48106-1248 USA
  • Dan Smith Colectica, 1428 Washington Ave S STE 203, Minneapolis, MN 55454 USA
  • Tom Smith NORC, 1155 E 60th Street, Chicago, IL 60637 USA
  • Jie Song Computer Science and Engineering, University of Michigan, Bob and Betty Beyster Building, 2260 Hayward Street, Ann Arbor, MI 48109-2121 USA
  • Yashas Jaydeep Vaidya Computer Science and Engineering, University of Michigan, Bob and Betty Beyster Building, 2260 Hayward Street, Ann Arbor, MI 48109-2121 USA
  • Ole Voldsater NSD - Norwegian Centre for Research Data, Harald Hårfagres gate 29, N-5007 Bergen, Norway

DOI:

https://doi.org/10.29173/iq983

Keywords:

metadata, provenance, statistical data

Abstract

Structured Data Transformation Language (SDTL) provides structured, machine-actionable representations of the data transformation commands found in statistical analysis software. The Continuous Capture of Metadata for Statistical Data Project (C2Metadata) created SDTL as part of an automated system that captures provenance metadata from data transformation scripts and adds variable derivations to standard metadata files. SDTL also has potential for auditing scripts and for translating scripts between languages. SDTL is expressed in a set of JSON schemas, which are machine-actionable and easily serialized to other formats. Statistical software languages have a number of special features that have been carried into SDTL. We explain how SDTL handles differences among statistical languages and complex operations, such as merging files and reshaping data tables from “wide” to “long”.

Published

2020-12-18

How to Cite

Alter, G., Donakowski, D., Gager, J., Heus, P., Hunter, C., Ionescu, S., Iverson, J., Jagadish, H., Lagoze, C., Lyle, J., Mueller, A., Revheim, S., Richardson, M. A., Ørnulf, R., Seelam, K., Smith, D., Smith, T., Song, J., Vaidya, Y. J., & Voldsater, O. (2020). Provenance metadata for statistical data: An introduction to Structured Data Transformation Language (SDTL). IASSIST Quarterly, 44(4). https://doi.org/10.29173/iq983