TranSMART loader

This package contains classes that represent the core domain objects stored in the TranSMART platform, an open source data sharing and analytics platform for translational biomedical research.

It also provides a utility that writes such objects to tab-separated files that can be loaded into a TranSMART database using the transmart-copy tool.

⚠️ Note: this is a very preliminary version, still under development. Issues can be reported at https://github.com/thehyve/python_transmart_loader/issues.

Installation and usage

To install transmart_loader from PyPI, run:

pip install transmart-loader

or from sources:

git clone https://github.com/thehyve/python_transmart_loader.git
cd python_transmart_loader
pip install .

Usage

Usage examples can be found in these projects:

Documentation

Full documentation of the package is available at Read the Docs.

Known issues

  • Date values are not correctly translated

Development

For a quick reference on software development, we refer to the software guide checklist.

Python versions

This repository is set up with Python version 3.6.

Add or remove Python versions based on project requirements. The guide contains more information about Python versions and writing Python 2 and 3 compatible code.

Package management and dependencies

This project uses pip for installing dependencies and package management.

  • Dependencies should be added to setup.py in the install_requires list.

Testing and code coverage

  • Tests are in the tests folder.
  • The tests folder contains:
    • A test that checks whether files for transmart-copy are generated for fake data (file: test_transmart_loader)
    • A test that checks whether your code conforms to the Python style guide (PEP 8) (file: test_lint.py)
  • The testing framework used is PyTest
  • Tests can be run with python setup.py test

Documentation

  • Documentation should be put in the docs folder.
  • To generate HTML documentation, run python setup.py build_sphinx

Coding style conventions and code quality

  • Check your code style with prospector
  • You may need to run pip install .[dev] first to install the required dependencies

License

Copyright (c) 2019 The Hyve B.V.

The TranSMART loader is licensed under the MIT License. See the file LICENSE.

Credits

This package was created with Cookiecutter and the NLeSC/python-template.

Data model

_images/transmart_class_diagram.jpg

TranSMART data model

_images/transmart_database_diagram.jpg

TranSMART database tables

API Reference

transmart_loader package

Documentation about TranSMART loader

Submodules

transmart_loader.collection_validator module

class transmart_loader.collection_validator.CollectionValidator

Bases: transmart_loader.collection_visitor.CollectionVisitor

Validation class for TranSMART data collections.

static validate(collection: transmart_loader.transmart.DataCollection)
visit_concept(concept: transmart_loader.transmart.Concept) → None
visit_node(node: transmart_loader.transmart.TreeNode) → None
visit_observation(observation: transmart_loader.transmart.Observation) → None
visit_patient(patient: transmart_loader.transmart.Patient) → None
visit_study(study: transmart_loader.transmart.Study) → None
visit_trial_visit(trial_visit: transmart_loader.transmart.TrialVisit) → None
visit_visit(visit: transmart_loader.transmart.Visit) → None

transmart_loader.collection_visitor module

class transmart_loader.collection_visitor.CollectionVisitor

Bases: object

Visitor class for TranSMART data collections

visit(collection: Optional[transmart_loader.transmart.DataCollection]) → None
visit_concept(concept: transmart_loader.transmart.Concept) → None
visit_node(node: transmart_loader.transmart.TreeNode) → None
visit_observation(observation: transmart_loader.transmart.Observation) → None
visit_patient(patient: transmart_loader.transmart.Patient) → None
visit_study(study: transmart_loader.transmart.Study) → None
visit_trial_visit(trial_visit: transmart_loader.transmart.TrialVisit) → None
visit_visit(visit: transmart_loader.transmart.Visit) → None
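The visitor interface listed above can be illustrated with a small, self-contained sketch. The classes below are simplified stand-ins written for this example, not the package's own classes: `Collection`, `Visitor` and `CountingVisitor` are hypothetical names, only two entity types are shown, and the traversal order is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concept:
    """Simplified stand-in for a concept (not the package class)."""
    concept_code: str


@dataclass
class Patient:
    """Simplified stand-in for a patient (not the package class)."""
    identifier: str


@dataclass
class Collection:
    """Simplified stand-in for a data collection (not the package class)."""
    concepts: List[Concept] = field(default_factory=list)
    patients: List[Patient] = field(default_factory=list)


class Visitor:
    """Base visitor: walks a collection and calls one hook per entity."""

    def visit_concept(self, concept: Concept) -> None:
        pass  # subclasses override the hooks they care about

    def visit_patient(self, patient: Patient) -> None:
        pass

    def visit(self, collection: Collection) -> None:
        # Assumed order: concepts first, then patients.
        for concept in collection.concepts:
            self.visit_concept(concept)
        for patient in collection.patients:
            self.visit_patient(patient)


class CountingVisitor(Visitor):
    """Example subclass that counts the entities it visits."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {'concepts': 0, 'patients': 0}

    def visit_concept(self, concept: Concept) -> None:
        self.counts['concepts'] += 1

    def visit_patient(self, patient: Patient) -> None:
        self.counts['patients'] += 1
```

A validator or a file writer can follow the same shape: it subclasses the base visitor and overrides only the hooks it needs.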

transmart_loader.console module

class transmart_loader.console.Console

Bases: object

A helper class for displaying messages on the console (stderr).

Black = '\x1b[30m'
BlackBackground = '\x1b[40m'
Blue = '\x1b[94m'
Green = '\x1b[92m'
GreenBackground = '\x1b[42m'
Grey = '\x1b[37m'
Red = '\x1b[91m'
RedBackground = '\x1b[41m'
Reset = '\x1b[0m'
Yellow = '\x1b[93m'
YellowBackground = '\x1b[103m'
static error(message)
static info(message)
static success(message)
static title(title)
static warning(message)
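The colour constants above are ANSI escape sequences. The sketch below shows how such constants are typically combined with a message and written to stderr. It is an illustration only: the helper names `colourise` and `print_error` are hypothetical, and which colour each Console method actually uses is not stated in this reference.

```python
import sys

# Escape codes copied from the constants listed above.
RED = '\x1b[91m'
GREEN = '\x1b[92m'
RESET = '\x1b[0m'


def colourise(message: str, colour: str) -> str:
    """Wrap a message in a colour code and reset the terminal afterwards."""
    return f'{colour}{message}{RESET}'


def print_error(message: str) -> None:
    """Print a message to stderr in red (assumed colour choice)."""
    print(colourise(message, RED), file=sys.stderr)
```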

transmart_loader.copy_writer module

class transmart_loader.copy_writer.TransmartCopyWriter(output_dir: str)

Bases: transmart_loader.collection_visitor.CollectionVisitor

Writes TranSMART data collections to a folder with files that can be loaded into a TranSMART database using transmart-copy.

concepts_header = ['concept_cd', 'concept_path', 'name_char']
dimensions_header = ['id', 'name', 'modifier_code', 'value_type']
init_writers() → None

Creates files and initialises writers for the output files in transmart-copy format.

observations_header = ['encounter_num', 'patient_num', 'concept_cd', 'provider_id', 'start_date', 'end_date', 'modifier_cd', 'instance_num', 'trial_visit_num', 'valtype_cd', 'tval_char', 'nval_num', 'observation_blob']
patient_mappings_header = ['patient_ide', 'patient_ide_source', 'patient_num']
patients_header = ['patient_num', 'sex_cd']
prepare_output_dir() → None

Creates an output directory if it does not exist. Fails if the output directory exists and is not empty.
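The behaviour described for prepare_output_dir can be sketched with the standard library. This is a minimal illustration under stated assumptions, not the package's implementation: the function name `ensure_empty_dir` and the use of `RuntimeError` are hypothetical.

```python
import os


def ensure_empty_dir(output_dir: str) -> None:
    """Create output_dir if missing; fail if it exists and is not empty."""
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
        return
    if not os.path.isdir(output_dir):
        # Assumed error type; the package may raise a different exception.
        raise RuntimeError(f'Not a directory: {output_dir}')
    if os.listdir(output_dir):
        raise RuntimeError(f'Directory is not empty: {output_dir}')
```

Refusing to write into a non-empty directory avoids mixing files from different runs.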

studies_header = ['study_num', 'study_id', 'secure_obj_token']
study_dimensions_header = ['study_id', 'dimension_description_id']
tree_nodes_header = ['c_hlevel', 'c_fullname', 'c_name', 'c_visualattributes', 'c_basecode', 'c_facttablecolumn', 'c_tablename', 'c_columnname', 'c_columndatatype', 'c_operator', 'c_dimcode', 'secure_obj_token']
trial_visits_header = ['trial_visit_num', 'study_num', 'rel_time_unit_cd', 'rel_time_num', 'rel_time_label']
value_type_codes = {<ValueType.Numeric: 1>: 'N', <ValueType.Categorical: 2>: 'T', <ValueType.Date: 4>: 'D', <ValueType.Text: 3>: 'B'}
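The value_type_codes mapping above is shown as a Python repr, which can be hard to read. The sketch below restates it with a stand-in enum that mirrors the ValueType members and values listed in this reference. The helper `valtype_code` is a hypothetical name; how the writer uses the code when filling the observation columns is not shown here.

```python
from enum import Enum


class ValueType(Enum):
    """Stand-in enum mirroring the documented members and values."""
    Numeric = 1
    Categorical = 2
    Text = 3
    Date = 4


# Same pairs as the documented value_type_codes attribute.
VALUE_TYPE_CODES = {
    ValueType.Numeric: 'N',
    ValueType.Categorical: 'T',
    ValueType.Date: 'D',
    ValueType.Text: 'B',
}


def valtype_code(value_type: ValueType) -> str:
    """Look up the single-letter code for a value type."""
    return VALUE_TYPE_CODES[value_type]
```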
visit_concept(concept: transmart_loader.transmart.Concept) → None

Serialises a Concept entity to a TSV file.

Parameters: concept – the Concept entity
visit_node(node: transmart_loader.transmart.TreeNode) → None
visit_observation(observation: transmart_loader.transmart.Observation) → None

Serialises an Observation entity to a TSV file.

FIXME: fix date value serialisation

Parameters: observation – the Observation entity
visit_patient(patient: transmart_loader.transmart.Patient) → None

Serialises a Patient entity and related PatientMapping entities to TSV files.

Parameters: patient – the Patient entity
visit_study(study: transmart_loader.transmart.Study) → None

Serialises a Study entity to a TSV file.

Parameters: study – the Study entity
visit_tree_node(node: transmart_loader.transmart.TreeNode, level=0, parent_path='\\')

Serialises a TreeNode entity and its children to a TSV file.

Parameters:
  • node – the TreeNode entity
  • level – the hierarchy level of the node
  • parent_path – the path of the parent node.
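The level and parent_path parameters suggest a recursive walk over the ontology tree. The sketch below shows one way such a traversal could work. It is an assumption-laden illustration, not the package's code: `Node` and `walk` are hypothetical names, and the path format (backslash-separated, with a trailing backslash) is assumed from the default parent_path of '\\'.

```python
from typing import Iterator, List, Tuple


class Node:
    """Hypothetical stand-in for an ontology node with children."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: List['Node'] = []

    def add_child(self, child: 'Node') -> None:
        self.children.append(child)


def walk(node: Node, level: int = 0,
         parent_path: str = '\\') -> Iterator[Tuple[int, str, str]]:
    """Yield (level, full path, name) for a node and all its descendants."""
    # Assumed path format: parent path + node name + trailing backslash.
    node_path = parent_path + node.name + '\\'
    yield level, node_path, node.name
    for child in node.children:
        # Children sit one level deeper and extend the parent's path.
        yield from walk(child, level + 1, node_path)
```

Each yielded tuple corresponds roughly to the c_hlevel, c_fullname and c_name columns in the tree nodes header.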
visit_trial_visit(trial_visit: transmart_loader.transmart.TrialVisit) → None

Serialises a TrialVisit entity to a TSV file.

Parameters: trial_visit – the TrialVisit entity
visit_visit(visit: transmart_loader.transmart.Visit) → None

Serialises a Visit entity to a TSV file. NB: this requires all patient visits to be cleared before loading new visits for the patient.

Parameters: visit – the Visit entity
visits_header = ['encounter_num', 'patient_num', 'active_status_cd', 'start_date', 'end_date', 'inout_cd', 'location_cd', 'location_path', 'length_of_stay', 'visit_blob']
write_collection(collection: transmart_loader.transmart.DataCollection) → None
write_dimension(dimension: transmart_loader.transmart.Dimension) → None

Serialises a Dimension entity to a TSV file.

Parameters: dimension – the Dimension entity
write_dimensions() → None

Writes dimensions metadata and links all studies to the dimensions.

write_study_dimensions(study_index)
class transmart_loader.copy_writer.VisualAttribute

Bases: enum.Enum

Visual attribute of an ontology node

Categorical = 8
Container = 3
Date = 7
Folder = 2
Leaf = 1
Numerical = 5
Study = 4
Text = 6
transmart_loader.copy_writer.format_date(value: Optional[datetime.date]) → Optional[str]
transmart_loader.copy_writer.get_concept_node_row(node: transmart_loader.transmart.ConceptNode, level, node_path)
transmart_loader.copy_writer.get_folder_node_row(node: transmart_loader.transmart.TreeNode, level, node_path)
transmart_loader.copy_writer.get_study_node_row(node: transmart_loader.transmart.StudyNode, level, node_path)

transmart_loader.csv_types module

class transmart_loader.csv_types.CsvWriter

Bases: object

An abstract version of the writer object returned by csv.writer().

writerow(row: Sequence[Any]) → None
writerows(rows: Sequence[Sequence[Any]]) → None

transmart_loader.loader_exception module

exception transmart_loader.loader_exception.LoaderException

Bases: Exception

transmart_loader.transmart module

class transmart_loader.transmart.CategoricalValue(value: str)

Bases: transmart_loader.transmart.Value

A categorical value

value()
value_type()
class transmart_loader.transmart.Concept(concept_code: str, name: str, concept_path: str, value_type: transmart_loader.transmart.ValueType)

Bases: object

Concepts to classify observations

class transmart_loader.transmart.ConceptNode(concept: transmart_loader.transmart.Concept)

Bases: transmart_loader.transmart.TreeNode

Concept node

class transmart_loader.transmart.DataCollection(concepts: Iterable[transmart_loader.transmart.Concept], studies: Iterable[transmart_loader.transmart.Study], trial_visits: Iterable[transmart_loader.transmart.TrialVisit], visits: Iterable[transmart_loader.transmart.Visit], ontology: Iterable[transmart_loader.transmart.TreeNode], patients: Iterable[transmart_loader.transmart.Patient], observations: Iterable[transmart_loader.transmart.Observation])

Bases: object

A data collection that can be loaded into TranSMART

class transmart_loader.transmart.DateValue(value: datetime.date)

Bases: transmart_loader.transmart.Value

A date value

value()
value_type()
class transmart_loader.transmart.Dimension(name: str, modifier_code: Optional[str] = None, value_type: Optional[transmart_loader.transmart.ValueType] = None)

Bases: object

Dimension metadata

class transmart_loader.transmart.NumericalValue(value: float)

Bases: transmart_loader.transmart.Value

A numerical value

value()
value_type()
class transmart_loader.transmart.Observation(patient: transmart_loader.transmart.Patient, concept: transmart_loader.transmart.Concept, visit: Optional[transmart_loader.transmart.Visit], trial_visit: transmart_loader.transmart.TrialVisit, start_date: Optional[datetime.date], end_date: Optional[datetime.date], value: transmart_loader.transmart.Value)

Bases: object

Data about an observed event or an attribute of a patient

class transmart_loader.transmart.ObservationMetadata

Bases: object

Metadata about an observation

class transmart_loader.transmart.Patient(identifier: str, sex: str, mappings: Sequence[transmart_loader.transmart.PatientMapping])

Bases: object

Patient properties

class transmart_loader.transmart.PatientMapping(source: str, identifier: str)

Bases: object

Patient identifiers

class transmart_loader.transmart.Study(study_id: str, name: str)

Bases: object

class transmart_loader.transmart.StudyNode(study: transmart_loader.transmart.Study)

Bases: transmart_loader.transmart.TreeNode

Study node

class transmart_loader.transmart.TextValue(value: str)

Bases: transmart_loader.transmart.Value

A text value

value()
value_type()
class transmart_loader.transmart.TreeNode(name: str)

Bases: object

Ontology node

add_child(child: transmart_loader.transmart.TreeNode)
class transmart_loader.transmart.TrialVisit(study: transmart_loader.transmart.Study, rel_time_label: str, rel_time_unit: Optional[str] = None, rel_time: Optional[int] = None)

Bases: object

Trial visit

class transmart_loader.transmart.Value

Bases: object

An observed value

value
value_type
class transmart_loader.transmart.ValueType

Bases: enum.Enum

Type of an observed value

Categorical = 2
Date = 4
Numeric = 1
Text = 3
class transmart_loader.transmart.Visit(patient: transmart_loader.transmart.Patient, identifier: str, active_status: Optional[str], start_date: Optional[datetime.date], end_date: Optional[datetime.date], inout: Optional[str], location: Optional[str], length_of_stay: Optional[int])

Bases: object

Patient visit

transmart_loader.tsv_writer module

class transmart_loader.tsv_writer.TsvWriter(path: str)

Bases: transmart_loader.csv_types.CsvWriter

Tab-separated values writer. Creates a new file when initialised and fails when the file already exists.

close() → None
writerow(row: Sequence[Any]) → None
writerows(rows: Sequence[Sequence[Any]]) → None
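The behaviour described for TsvWriter (create a new file, fail if it already exists, write tab-separated rows) can be sketched with the standard csv module. This is a minimal illustration, not the package's implementation: the class name `SimpleTsvWriter` is hypothetical, and the '\n' line terminator is an assumption.

```python
import csv
from typing import Any, Sequence


class SimpleTsvWriter:
    """Hypothetical tab-separated writer that refuses to overwrite files."""

    def __init__(self, path: str) -> None:
        # Mode 'x' creates the file and raises FileExistsError if it exists.
        self._file = open(path, 'x', newline='')
        # Assumed dialect: tab delimiter, Unix line endings.
        self._writer = csv.writer(
            self._file, delimiter='\t', lineterminator='\n')

    def writerow(self, row: Sequence[Any]) -> None:
        self._writer.writerow(row)

    def writerows(self, rows: Sequence[Sequence[Any]]) -> None:
        self._writer.writerows(rows)

    def close(self) -> None:
        self._file.close()
```

A caller would typically write a header row first, for example the patients header documented for TransmartCopyWriter, followed by one row per entity, and then close the writer.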
