Codeq NLP API Tutorial

In this tutorial on Codeq’s NLP API, we focus on a single module that extracts semantic roles from text. Previous tutorials in this series can be found here:

The complete list of modules in the Codeq NLP API can be found here:

Define an NLP pipeline and analyze a text

As usual, the first step is to declare an instance of the NLP API client and use it to send a text along with a pipeline variable indicating the name of the Semantic Roles annotator. The output is a Document object that contains a list of analyzed Sentences. For a quick look at the output, use the method document.pretty_print().

from codeq_nlp_api import CodeqClient

client = CodeqClient(user_id="USER_ID", user_key="USER_KEY")

pipe = [
    "semantic_roles"
]

text = "A pneumonia outbreak was reported in Wuhan, China in December 2019."

document = client.analyze(text, pipeline=pipe)

print(document.pretty_print())

Semantic Role Labelling

The goal of this module is to identify the main events and participants in sentences and classify the different types of relations between them. In the extraction of Semantic Roles, events are called predicates, while the participants are known as the arguments of a given predicate. Arguments can denote specific types of relations, for example they can be an Agent, a Patient or a Location in relation to the predicate.
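To make the terminology concrete, the sketch below writes out by hand how the example sentence of this tutorial breaks down into a predicate and its arguments. The dictionary keys mirror those used in the API output shown later in this tutorial; the dictionary itself is a hand-written illustration, not something returned by the API.

```python
# Hand-written illustration (not API output): one predicate and its arguments
# for "A pneumonia outbreak was reported in Wuhan, China in December 2019."
semantic_role = {
    "predicate_lemma": "report",
    "predicate_token": "reported",
    "predicate_position": 5,
    "arguments": [
        {"type": "Patient/Theme", "tokens": ["A", "pneumonia", "outbreak"], "positions": [1, 2, 3]},
        {"type": "Location", "tokens": ["in", "Wuhan", ",", "China"], "positions": [6, 7, 8, 9]},
        {"type": "Temporal", "tokens": ["in", "December", "2019"], "positions": [10, 11, 12]},
    ],
}

# Each argument answers a question about the event: what, where, when.
for arg in semantic_role["arguments"]:
    print("%s -> %s" % (arg["type"], " ".join(arg["tokens"])))
```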

More details about the Semantic Role Labeler and an example of its application can be found here:

  • KEY: semantic_roles

Output Labels:

  • Agent/Experiencer
pipe = [
    "semantic_roles"
]

text = "A pneumonia outbreak was reported in Wuhan, China in December 2019."

document = client.analyze(text, pipeline=pipe)

for sentence in document.sentences:
    raw_sentence = sentence.raw_sentence
    semantic_roles = sentence.semantic_roles

    print(raw_sentence)
    for sr in semantic_roles:
        predicate_lemma = sr['predicate_lemma']
        predicate_token = sr['predicate_token']
        predicate_position = sr['predicate_position']
        print("")
        print("predicate_lemma: %s" % predicate_lemma)
        print("predicate_token: %s" % predicate_token)
        print("predicate_position: %s" % predicate_position)
        if 'arguments' in sr:
            print("arguments:")
            for arg in sr['arguments']:
                arg_type = arg['type']
                arg_tokens = arg['tokens']
                arg_tokens_position = arg['positions']
                print("- type: %s" % arg_type)
                print("- tokens: %s" % arg_tokens)
                print("- positions: %s\n" % arg_tokens_position)

# Output:
#
# sentence: A pneumonia outbreak was reported in Wuhan, China in December 2019.
#
# semantic_roles:
#
# predicate_lemma: be
# predicate_token: was
# predicate_position: 4
#
# predicate_lemma: report
# predicate_token: reported
# predicate_position: 5
# arguments:
# - type: Patient/Theme
# - tokens: ['A', 'pneumonia', 'outbreak']
# - positions: [1, 2, 3]
#
# - type: Location
# - tokens: ['in', 'Wuhan', ',', 'China']
# - positions: [6, 7, 8, 9]
#
# - type: Temporal
# - tokens: ['in', 'December', '2019']
# - positions: [10, 11, 12]

From the output above we can observe the following:

  • All semantic roles contain a predicate and, if present, a list of arguments for that predicate.
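Because the list of arguments is optional, code that consumes this output should not assume that the 'arguments' key is present. The following is a minimal sketch of one way to handle that, assuming the dictionary layout shown in the output above. The helper name find_arguments and the sample data are hypothetical: the data is written by hand to mirror that output rather than fetched from the API.

```python
def find_arguments(semantic_roles, arg_type):
    """Return (predicate_lemma, argument_text) pairs for arguments of one type."""
    matches = []
    for sr in semantic_roles:
        # Predicates such as the auxiliary "was" may carry no 'arguments' key,
        # so fall back to an empty list instead of indexing directly.
        for arg in sr.get("arguments", []):
            if arg["type"] == arg_type:
                matches.append((sr["predicate_lemma"], " ".join(arg["tokens"])))
    return matches


# Hand-written sample mirroring the output above (not fetched from the API).
sample_roles = [
    {"predicate_lemma": "be", "predicate_token": "was", "predicate_position": 4},
    {
        "predicate_lemma": "report",
        "predicate_token": "reported",
        "predicate_position": 5,
        "arguments": [
            {"type": "Patient/Theme", "tokens": ["A", "pneumonia", "outbreak"], "positions": [1, 2, 3]},
            {"type": "Location", "tokens": ["in", "Wuhan", ",", "China"], "positions": [6, 7, 8, 9]},
            {"type": "Temporal", "tokens": ["in", "December", "2019"], "positions": [10, 11, 12]},
        ],
    },
]

print(find_arguments(sample_roles, "Location"))
# prints [('report', 'in Wuhan , China')]
```

Joining the tokens with a space keeps the tokenization visible (note the space before the comma); the 'positions' list can be used instead when the exact span in the sentence is needed.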

Wrap up

In this tutorial we described how to use the Semantic Role Labeler of the Codeq NLP API. The code below summarizes how to iterate over its output:
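As a compact recap, here is a minimal sketch of that iteration. It assumes the attributes and dictionary keys used earlier in this tutorial (sentence.semantic_roles, 'predicate_lemma', 'arguments', 'type', 'tokens'). The function name summarize and the stand-in sentence object are hypothetical, so that the snippet runs without API credentials; with a real response, document.sentences would be passed instead.

```python
from types import SimpleNamespace


def summarize(sentences):
    """Yield one line per predicate: its lemma followed by its typed arguments."""
    for sentence in sentences:
        for sr in sentence.semantic_roles:
            # 'arguments' is optional, so default to an empty list.
            args = ["%s=%s" % (a["type"], " ".join(a["tokens"]))
                    for a in sr.get("arguments", [])]
            yield "%s(%s)" % (sr["predicate_lemma"], "; ".join(args))


# Stand-in for one element of document.sentences (hand-written, not API output).
stub_sentence = SimpleNamespace(
    raw_sentence="A pneumonia outbreak was reported in Wuhan, China in December 2019.",
    semantic_roles=[
        {"predicate_lemma": "be", "predicate_token": "was", "predicate_position": 4},
        {
            "predicate_lemma": "report",
            "predicate_token": "reported",
            "predicate_position": 5,
            "arguments": [
                {"type": "Patient/Theme", "tokens": ["A", "pneumonia", "outbreak"], "positions": [1, 2, 3]},
                {"type": "Location", "tokens": ["in", "Wuhan", ",", "China"], "positions": [6, 7, 8, 9]},
                {"type": "Temporal", "tokens": ["in", "December", "2019"], "positions": [10, 11, 12]},
            ],
        },
    ],
)

for line in summarize([stub_sentence]):
    print(line)
```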

  • Take a look at our documentation to learn more about the NLP tools we provide.

Senior Computational Linguist at Codeq