Gene2Go Reader Dataset
Dataset to handle Gene Ontology (GO) terms.
Gene2GoReaderDataset reads Gene Ontology (GO) terms from the landing.ncbigene.gene2go file, using the goatools library under the hood.
Gene2GoReaderDataset loads data from a Gene Annotation File (GAF) using goatools.Gene2GoReader, with built-in support for local or remote filesystems (via fsspec) and optional on-the-fly decompression of gzip-compressed GAFs.
Example usage for the YAML API:
Attributes
DEFAULT_LOAD_ARGS
DEFAULT_FS_ARGS
Methods
__init__(*, filepath, load_args=None, version=None, credentials=None, fs_args=None, metadata=None)
: Constructs the dataset, configuring protocol, filesystem, versioning, and optional gzip decompression settings.
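A minimal, stdlib-only sketch of the bookkeeping the constructor might do. It assumes fsspec-style `protocol://path` strings and assumes that user-supplied `load_args` and `fs_args` are layered over the class defaults. `split_protocol`, `merge_with_defaults`, and `Gene2GoInitSketch` are hypothetical names. The empty default dictionaries are placeholders, because the real `DEFAULT_LOAD_ARGS` and `DEFAULT_FS_ARGS` values are not listed here. Versioning, credentials, metadata, and the actual fsspec filesystem construction are left out.

```python
from typing import Any, Optional

# Placeholder defaults (assumption): the real DEFAULT_LOAD_ARGS / DEFAULT_FS_ARGS are not documented here.
DEFAULT_LOAD_ARGS: dict[str, Any] = {}
DEFAULT_FS_ARGS: dict[str, Any] = {}


def split_protocol(filepath: str) -> tuple[str, str]:
    """Hypothetical helper: 's3://bucket/gene2go.gz' -> ('s3', 'bucket/gene2go.gz')."""
    if "://" in filepath:
        protocol, path = filepath.split("://", 1)
        return protocol, path
    # Paths without a scheme are treated as local files.
    return "file", filepath


def merge_with_defaults(
    defaults: dict[str, Any], overrides: Optional[dict[str, Any]]
) -> dict[str, Any]:
    """Return a new dict in which user-supplied keys override the defaults."""
    return {**defaults, **(overrides or {})}


class Gene2GoInitSketch:
    """Hypothetical stand-in for the constructor's bookkeeping; no filesystem is created."""

    def __init__(
        self,
        *,
        filepath: str,
        load_args: Optional[dict[str, Any]] = None,
        fs_args: Optional[dict[str, Any]] = None,
    ) -> None:
        self.protocol, self.path = split_protocol(filepath)
        self.load_args = merge_with_defaults(DEFAULT_LOAD_ARGS, load_args)
        self.fs_args = merge_with_defaults(DEFAULT_FS_ARGS, fs_args)
        # Decompression is only needed when the source path ends in .gz.
        self.is_gzipped = self.path.endswith(".gz")


cfg = Gene2GoInitSketch(filepath="s3://my-bucket/gene2go.gz")  # "my-bucket" is a made-up name
print(cfg.protocol, cfg.path, cfg.is_gzipped)  # s3 my-bucket/gene2go.gz True
```

Merging into a fresh dictionary keeps the class-level defaults from being mutated when one instance receives custom arguments.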
_describe() → dict[str, Any]
: Returns a dictionary with the dataset's identifying parameters: filepath, protocol, load_args, and version.
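The hypothetical `DescribeSketch` class below illustrates only the return shape, using the four keys named above. It assumes a Kedro-style base class, where `_describe()` is commonly used when the dataset is printed or logged. `version` is typed loosely because its real type is not documented here.

```python
from typing import Any


class DescribeSketch:
    """Hypothetical minimal class showing the shape of a _describe() result."""

    def __init__(
        self,
        filepath: str,
        protocol: str,
        load_args: dict[str, Any],
        version: Any = None,
    ) -> None:
        self._filepath = filepath
        self._protocol = protocol
        self._load_args = load_args
        self._version = version

    def _describe(self) -> dict[str, Any]:
        # Only the four identifying keys listed above; credentials and fs_args are not included.
        return {
            "filepath": self._filepath,
            "protocol": self._protocol,
            "load_args": self._load_args,
            "version": self._version,
        }


print(DescribeSketch("data/gene2go.gz", "file", {})._describe())
```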
load() → Gene2GoReader
: Loads the GAF file, decompressing it first if necessary, and returns a Gene2GoReader instance.
- Detects a .gz extension and, if no decompressed copy exists yet, decompresses the file to a .gaf file.
- Suppresses stdout during Gene2GoReader initialization.
- Raises DatasetError on failure to decompress or load.
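The load steps listed above can be sketched with the standard library alone, assuming a local path. `ensure_decompressed`, `load_quietly`, and `reader_factory` are hypothetical names. `reader_factory` stands in for goatools' `Gene2GoReader` so the sketch runs without goatools installed. `DatasetError` is a local stand-in class, the `.gz` to `.gaf` naming rule is assumed, and remote (fsspec) access is omitted.

```python
import contextlib
import gzip
import io
import os
import shutil
from typing import Any, Callable


class DatasetError(Exception):
    """Local stand-in for the DatasetError this guide refers to."""


def ensure_decompressed(path: str) -> str:
    """Return a path to an uncompressed copy of `path`, creating it at most once."""
    if not path.endswith(".gz"):
        return path
    base = path[: -len(".gz")]
    # Assumed naming rule: gene2go.gz -> gene2go.gaf, annotations.gaf.gz -> annotations.gaf
    target = base if base.endswith(".gaf") else base + ".gaf"
    if os.path.exists(target):
        return target  # reuse the existing decompressed copy
    try:
        with gzip.open(path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
    except (OSError, EOFError) as exc:
        # Do not leave a truncated copy behind for the next load to reuse.
        if os.path.exists(target):
            os.remove(target)
        raise DatasetError(f"Failed to decompress {path}") from exc
    return target


def load_quietly(path: str, reader_factory: Callable[[str], Any]) -> Any:
    """Decompress if needed, then build the reader with its stdout output suppressed."""
    local_path = ensure_decompressed(path)
    try:
        # Anything the reader prints during construction goes to a throwaway buffer.
        with contextlib.redirect_stdout(io.StringIO()):
            return reader_factory(local_path)
    except Exception as exc:
        raise DatasetError(f"Failed to load {local_path}") from exc
```

Removing a partially written `.gaf` file on failure matters because of the "reuse an existing copy" shortcut. Without the cleanup, a truncated file from a failed run would be picked up silently on the next load.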
save(data: Gene2GoReader) → None
: Always raises DatasetError because the dataset is read-only.
_exists() → bool
: Returns True if the (possibly compressed) source file exists on the configured filesystem, False otherwise.
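fsspec filesystem objects expose an `exists(path)` method, so an existence check can simply delegate to the configured filesystem. The sketch below assumes that delegation. `SupportsExists`, `LocalFS`, and `source_exists` are hypothetical names, and `LocalFS` stands in for a real fsspec filesystem so the example needs only the standard library.

```python
import os
from typing import Protocol


class SupportsExists(Protocol):
    """Anything with an fsspec-style exists(path) method."""

    def exists(self, path: str) -> bool: ...


class LocalFS:
    """Minimal local stand-in for an fsspec filesystem."""

    def exists(self, path: str) -> bool:
        return os.path.exists(path)


def source_exists(fs: SupportsExists, path: str) -> bool:
    # Check the configured source path as given, which may be the .gz file.
    return fs.exists(path)
```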
_release() → None
: Releases any cached resources and invalidates the filesystem cache for the dataset's path.
_invalidate_cache() → None
: Helper method to clear fsspec's internal cache for the dataset's filepath.
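fsspec filesystems provide `invalidate_cache(path=None)` for discarding cached listings. The sketch below shows one way `_release()` could delegate to `_invalidate_cache()`. `ListingCacheFS` is a hypothetical stand-in for an fsspec filesystem. Clearing an in-memory `_cached_reader` is an assumption about what "cached resources" covers.

```python
from typing import Any, Optional


class ListingCacheFS:
    """Hypothetical filesystem with a listings cache, modelled loosely on fsspec's dircache."""

    def __init__(self) -> None:
        self.dircache: dict[str, Any] = {}

    def invalidate_cache(self, path: Optional[str] = None) -> None:
        # Drop the entry for one path, or everything when no path is given.
        if path is None:
            self.dircache.clear()
        else:
            self.dircache.pop(path, None)


class ReleasableSketch:
    """Hypothetical dataset fragment showing _release() delegating to _invalidate_cache()."""

    def __init__(self, fs: ListingCacheFS, path: str) -> None:
        self._fs = fs
        self._path = path
        self._cached_reader: Any = None

    def _release(self) -> None:
        self._cached_reader = None  # drop any in-memory object held by the dataset
        self._invalidate_cache()

    def _invalidate_cache(self) -> None:
        # Only this dataset's path is evicted; other cached listings survive.
        self._fs.invalidate_cache(self._path)
```

Evicting only the dataset's own path keeps cached listings that other datasets sharing the same filesystem object may still rely on.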