Concept information

Preferred term

Tokenization  

Type

  • Model function

  • Named individual

Broader concept

Entry terms

  • Segmentation
  • Segmentation and Tokenization

Scope note

  • Tokenization is commonly seen as an independent process of linguistic analysis, in which the input stream of characters is segmented into an ordered sequence of word-like units, usually called tokens, which serve as input items for subsequent steps of linguistic processing. Tokens may correspond to words, numbers, punctuation marks or even proper names. The recognized tokens are usually classified according to their syntax (see the sketch below). Since the notion of tokenization means different things to different people, some tokenization tools also perform additional tasks such as sentence boundary detection, handling of end-of-line hyphenation, or splitting of conjoined clitics and contractions.
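The process described above can be pictured with a short, self-contained sketch. The regular expression, category names and sample sentence below are illustrative assumptions, not the implementation of any particular tokenization tool:

```python
import re

# Illustrative tokenizer sketch (not any specific tool's implementation):
# it segments an input character stream into an ordered sequence of tokens
# and classifies each token with a simple syntactic category.
TOKEN_PATTERN = re.compile(
    r"(?P<NUMBER>\d+(?:\.\d+)?)"   # integers and decimals
    r"|(?P<WORD>\w+(?:-\w+)*)"     # words, including hyphenated forms
    r"|(?P<PUNCT>[^\w\s])"         # single punctuation characters
)

def tokenize(text: str) -> list[tuple[str, str]]:
    """Return (token, category) pairs in reading order."""
    return [(m.group(), m.lastgroup) for m in TOKEN_PATTERN.finditer(text)]

if __name__ == "__main__":
    sample = "Dr. Smith paid 3.50 euros; he didn't complain."
    for token, category in tokenize(sample):
        print(f"{category:6} {token}")
```

The sketch deliberately omits the additional tasks mentioned in the scope note: it performs no sentence boundary detection (so "Dr." is not distinguished from a sentence-final period), does not rejoin end-of-line hyphenation, and splits contractions such as "didn't" naively rather than handling clitics explicitly.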

Description

  • The task/process of recognizing and tagging tokens (words, punctuation marks, digits, etc.) in a text.

URI

http://w3id.org/meta-share/omtd-share/Tokenization
