bsie.extractor package

Extractors produce triples from some content.

Each Extractor class is linked to the Reader class whose content it requires.

class bsie.extractor.Extractor(schema: Schema)

Bases: ABC

Produce (subject, predicate, value) triples from some content. The Extractor produces principal predicates that provide information about the content itself (i.e., triples that include the subject), and may also generate triples with auxiliary predicates if the extracted value is itself a node.

CONTENT_READER: str | None = None
abstract extract(subject: Node, content: Any, principals: Iterable[Predicate]) → Iterator[Tuple[Node, Predicate, Any]]

Return (node, predicate, value) triples.

property principals: Iterator[Predicate]

Return the principal predicates, i.e., relations from/to the extraction subject.

property schema: Schema

Return the extractor’s schema.
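
The interface above can be illustrated with a minimal, self-contained sketch. It does not import bsie: the `Extractor` base class below is a stand-in that only mirrors the documented members, `Node` and `Predicate` are replaced by plain strings, the schema is a tuple of predicate names, and the `FileSize` class, the `ex:` names, and the reader name are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable, Iterator, Optional, Tuple

# Stand-ins for bsie's Node and Predicate types (assumption: plain strings).
Node = str
Predicate = str


class Extractor(ABC):
    """Stand-in that mirrors the documented interface; not bsie's own code."""

    # Name of the Reader class whose content this extractor requires.
    CONTENT_READER: Optional[str] = None

    def __init__(self, schema: Tuple[Predicate, ...]):
        self._schema = schema

    @property
    def schema(self) -> Tuple[Predicate, ...]:
        return self._schema

    @property
    def principals(self) -> Iterator[Predicate]:
        # Assumption: every predicate in this toy schema is a principal.
        return iter(self._schema)

    @abstractmethod
    def extract(
        self, subject: Node, content: Any, principals: Iterable[Predicate]
    ) -> Iterator[Tuple[Node, Predicate, Any]]:
        """Yield (node, predicate, value) triples."""


class FileSize(Extractor):
    """Hypothetical extractor that reports a file size from a stat-like dict."""

    CONTENT_READER = 'example.reader.Stat'  # hypothetical reader name

    def __init__(self):
        super().__init__(schema=('ex:filesize',))

    def extract(self, subject, content, principals):
        # Only emit triples for the principals the caller asked for.
        for pred in principals:
            if pred == 'ex:filesize':
                yield subject, pred, content['size']


extractor = FileSize()
triples = list(
    extractor.extract('ex:file#1', {'size': 1024}, extractor.principals)
)
```

Here `extract` is written as a generator, which matches the documented `Iterator` return type and lets a caller consume triples lazily.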

class bsie.extractor.ExtractorBuilder(specs: List[Dict[str, Dict[str, Any]]])

Bases: object

Build `bsie.base.Extractor` instances.

It is permissible to build multiple instances of the same extractor (typically with different arguments), hence the ExtractorBuilder receives a list of build specifications. Each specification is a dict with a single key (the extractor’s qualified name) whose value is a dict to be used as keyword arguments. Example: [{'bsie.extractor.generic.path.Path': {}}]

build(index: int) → Extractor

Return an instance of the n-th extractor in the build specifications (n=index).
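
One way such a builder could turn a specification into an instance is sketched below. This is not bsie's implementation: `SketchBuilder` is a hypothetical class, the index is assumed to be zero-based, and the standard-library class `fractions.Fraction` stands in for an extractor so that the example runs without bsie installed.

```python
import importlib
from typing import Any, Dict, List


class SketchBuilder:
    """Hypothetical builder that resolves qualified names from build specs."""

    def __init__(self, specs: List[Dict[str, Dict[str, Any]]]):
        self.specs = specs

    def build(self, index: int) -> Any:
        spec = self.specs[index]  # assumption: zero-based index
        if len(spec) != 1:
            raise ValueError('each specification must have exactly one key')
        # Unpack the single (qualified name, keyword arguments) pair.
        (name, kwargs), = spec.items()
        # Split 'package.module.Class' into module path and class name.
        module_name, _, class_name = name.rpartition('.')
        cls = getattr(importlib.import_module(module_name), class_name)
        return cls(**kwargs)


# The same class appears twice with different arguments, which is why
# the specifications form a list rather than a single dict.
specs = [
    {'fractions.Fraction': {'numerator': 1, 'denominator': 2}},
    {'fractions.Fraction': {'numerator': 3, 'denominator': 4}},
]
builder = SketchBuilder(specs)
half = builder.build(0)
three_quarters = builder.build(1)
```

Keying the specifications by position rather than by name is what allows two entries for the same class to coexist.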