Getting Started
This page takes you from installation to a working SynRXN data-loading workflow. The examples use released data for reproducibility and exact Git commits for development snapshots.
Each example uses DataLoader to resolve, cache, and read a curated benchmark table.
Requirements
Python: 3.11 or newer is recommended.
Operating system: Linux, macOS, or Windows with WSL.
Network access: required the first time you download data from Zenodo or GitHub.
Persistent cache: recommended for repeated experiments, for example ~/.cache/synrxn.
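The requirements above can be checked up front with a short standard-library script. This is a minimal sketch: the 3.11 threshold and the cache path come from this page, while the helper name `check_environment` is hypothetical and not part of SynRXN.

```python
import sys
from pathlib import Path


def check_environment(cache_dir: Path, min_python: tuple = (3, 11)) -> dict:
    """Report whether the interpreter and cache directory look usable.

    Hypothetical helper for illustration; not part of the SynRXN API.
    """
    # Create the cache directory (and any parents) if it does not exist yet.
    cache_dir.mkdir(parents=True, exist_ok=True)
    return {
        "python_ok": sys.version_info[:2] >= min_python,
        "python_version": ".".join(map(str, sys.version_info[:3])),
        "cache_dir": str(cache_dir),
        "cache_exists": cache_dir.is_dir(),
    }
```

Calling `check_environment(Path("~/.cache/synrxn").expanduser())` returns a small report that can be printed or logged before the first download.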
Installation
From PyPI
Use this route for released package builds and published data snapshots.
python -m pip install --upgrade pip
python -m pip install synrxn
Install the broader optional dependency stack when you need all extras:
python -m pip install "synrxn[all]"
From source
Use this route when you are developing SynRXN, editing documentation, or rebuilding curated datasets.
git clone https://github.com/TieuLongPhan/SynRXN.git
cd SynRXN
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"
Build the documentation locally
The enhanced documentation uses the PyData Sphinx theme.
cd doc
python -m pip install -r requirements.txt
sphinx-build -b html . _build/html
Open doc/_build/html/index.html in a browser.
Verify the installation
Run a minimal import and version check:
python - <<'PY'
import importlib.metadata as metadata
import synrxn
print("synrxn", metadata.version("synrxn"))
print("module", synrxn.__file__)
PY
Load your first dataset
The example below loads the balanced Schneider classification benchmark from a versioned Zenodo release and inspects the resulting pandas DataFrame.
from pathlib import Path

from synrxn.data import DataLoader

# Reuse one cache directory across runs so assets are not requested repeatedly.
cache_dir = Path("~/.cache/synrxn").expanduser()

loader = DataLoader(
    task="classification",
    source="zenodo",
    version="1.0.0",
    cache_dir=cache_dir,
)
print("Available datasets:", loader.available_names())

# Load the balanced Schneider benchmark as a pandas DataFrame.
df = loader.load("schneider_b")
print("Shape:", df.shape)
print("Columns:", df.columns.tolist())
print(df.head(3))
Use a pinned commit during development
For development workflows, pinning a commit gives an exact source snapshot. This is stronger than latest because a fixed SHA cannot change silently between runs.
from pathlib import Path

from synrxn.data import DataLoader

loader = DataLoader(
    task="property",
    source="commit",
    # Full commit SHA: pins the exact repository state.
    version="3e1612e2199e8b0e369fce3ed9aff3dda68e4c32",
    cache_dir=Path("~/.cache/synrxn").expanduser(),
    gh_enable=True,
)

df = loader.load("b97xd3")
print(df[["r_id", "ea", "dh"]].head())
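Because an abbreviated hash or a branch name would defeat the purpose of pinning, it can help to validate the version string before using it. The check below is a small sketch that assumes SHA-1 commit hashes (40 hexadecimal characters, as used on GitHub); `is_full_commit_sha` is a hypothetical helper, not a SynRXN function.

```python
import re

# A full SHA-1 Git commit hash is exactly 40 hexadecimal characters.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_full_commit_sha(version: str) -> bool:
    """Return True if `version` looks like a full 40-character SHA-1 hash."""
    return _FULL_SHA.fullmatch(version.lower()) is not None


print(is_full_commit_sha("3e1612e2199e8b0e369fce3ed9aff3dda68e4c32"))  # True
print(is_full_commit_sha("3e1612e"))  # False: abbreviated hash
print(is_full_commit_sha("main"))  # False: branch name
```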
Work with train/validation/test splits
Some datasets already include a split column from the original benchmark.
When a fresh deterministic split is needed, use
synrxn.split.repeated_kfold.RepeatedKFoldsSplitter.
from pathlib import Path

from synrxn.data import DataLoader
from synrxn.split.repeated_kfold import RepeatedKFoldsSplitter

loader = DataLoader(
    task="property",
    source="zenodo",
    version="1.0.0",
    cache_dir=Path("~/.cache/synrxn").expanduser(),
)
df = loader.load("b97xd3")

# A fixed random_state makes the split deterministic across runs.
splitter = RepeatedKFoldsSplitter(
    n_splits=5,
    n_repeats=2,
    random_state=42,
    val_ratio=0.1,
)
split_indices = splitter.split(df)
print(split_indices)
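To make the splitter parameters concrete, here is a plain-Python sketch of a generic repeated k-fold split with a validation hold-out. It is not SynRXN's implementation: the function name `repeated_kfold_indices` is hypothetical, and reading `val_ratio` as the fraction of each fold's non-test rows moved into validation is an assumption.

```python
import random


def repeated_kfold_indices(n_rows, n_splits=5, n_repeats=2, random_state=42, val_ratio=0.1):
    """Yield (repeat, fold, train, val, test) index tuples.

    Generic illustration only; hypothetical helper, not the SynRXN splitter.
    """
    rng = random.Random(random_state)
    for repeat in range(n_repeats):
        # Reshuffle all row indices once per repeat.
        order = list(range(n_rows))
        rng.shuffle(order)
        # Deal the shuffled indices into n_splits nearly equal folds.
        folds = [order[i::n_splits] for i in range(n_splits)]
        for fold, test in enumerate(folds):
            rest = [i for j, f in enumerate(folds) if j != fold for i in f]
            # Assumption: val_ratio is a fraction of the non-test rows.
            n_val = int(round(val_ratio * len(rest)))
            yield repeat, fold, rest[n_val:], rest[:n_val], test


splits = list(repeated_kfold_indices(n_rows=100))
print(len(splits))  # 5 folds x 2 repeats = 10 splits
repeat, fold, train, val, test = splits[0]
print(len(train), len(val), len(test))  # 72 8 20
```

With the same `random_state`, the generator produces identical splits on every run, which is the property a deterministic split relies on.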
Reproducibility tip: For a paper or benchmark report, record the dataset name, task family, source mode, version or commit SHA, cache manifest if available, split seed, and package version.
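One way to follow this tip is to write the fields to a small JSON record next to your results. The sketch below uses only the standard library; the `RunRecord` schema and its field names are hypothetical, the package version is a placeholder, and the cache manifest is omitted because its format is not described on this page.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class RunRecord:
    """Provenance fields from the reproducibility tip (hypothetical schema)."""

    dataset: str
    task: str
    source: str
    version: str  # release version or full commit SHA
    split_seed: Optional[int]
    package_version: str


def to_json(record: RunRecord) -> str:
    # sort_keys keeps the output stable so records diff cleanly.
    return json.dumps(asdict(record), indent=2, sort_keys=True)


record = RunRecord(
    dataset="b97xd3",
    task="property",
    source="zenodo",
    version="1.0.0",
    split_seed=42,
    package_version="unknown",  # placeholder; see the installation check for the real value
)
print(to_json(record))
```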
Source modes at a glance
| Source mode | Best use case | Example version | Reproducibility level |
|---|---|---|---|
| | Published results and long-term archiving. | | High, citable release snapshot. |
| | Aligning data with a GitHub release tag or branch. | | High for tags, lower for branches. |
| | Exact development snapshot. | full commit SHA | Highest for source-state reproducibility. |
| | Exploration during active development. | | Low unless the resolved commit is recorded. |
Troubleshooting
Installation cannot find optional packages
Upgrade packaging tools first:
python -m pip install --upgrade pip setuptools wheel
Zenodo downloads fail or are rate-limited
Prefer GitHub-hosted release assets for routine experiments. For Zenodo-based loading, check network access and use a persistent cache so assets are not repeatedly requested.
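For transient failures, a retry loop with exponential backoff is a common complement to caching. The helper below is a generic sketch, not part of SynRXN: `retry_with_backoff` is a hypothetical name, and catching `OSError` is an assumption (it covers `urllib.error.URLError`, but this page does not say which exception the loader raises on a failed download).

```python
import time


def retry_with_backoff(fetch, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `fetch()` until it succeeds, doubling the wait after each failure.

    Hypothetical helper for illustration; `fetch` is any zero-argument
    callable that raises OSError on a network problem.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
```

A call such as `retry_with_backoff(lambda: loader.load("schneider_b"))` would wrap the loading step, provided the `except` clause is adjusted to the exception type actually raised.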
Autodoc cannot import synrxn during documentation builds
Install the package in editable mode from the project root before building docs:
python -m pip install -e ".[dev]"
cd doc
sphinx-build -b html . _build/html
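If the build still fails, a quick check is whether the interpreter that runs Sphinx can locate the package at all. This sketch uses the standard-library `importlib.util.find_spec`; the helper name `is_importable` is hypothetical.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


print("synrxn importable:", is_importable("synrxn"))
```

Run it with the same Python executable and virtual environment that you use for `sphinx-build`; a `False` result suggests the editable install went into a different environment.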
Next steps
Data Concept explains task folders, schema conventions, and source modes.
Data Records lists curated benchmark records and row counts.
Tutorials and Examples gives complete workflows for released data, pinned commits, splitting, and rebuilds.
API Reference documents DataLoader and split utilities.