Codeq NLP API Tutorial

Part 8. Semantic Roles

Dec 14, 2020

In this tutorial on Codeq’s NLP API, we focus on a single module that extracts semantic roles from text. Previous tutorials in this series can be found here:

The complete list of modules in the Codeq NLP API can be found here:

Codeq NLP API Documentation

Before you start using Codeq’s NLP API, you need to sign up to generate a User ID and User Key (api.codeq.com).

Define an NLP pipeline and analyze a text

As usual, the first step is to create an instance of the NLP API client. You then use it to send a text along with a pipeline variable that names the Semantic Roles annotator. The output is a Document object that contains a list of analyzed Sentences. For a quick look at the output, use the method document.pretty_print().

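from codeq_nlp_api import CodeqClient

# Replace USER_ID and USER_KEY with the credentials generated at sign-up
client = CodeqClient(user_id="USER_ID", user_key="USER_KEY")

# The pipeline lists the annotators to run: here, only the Semantic Role Labeler
pipe = [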
    "semantic_roles"
]

text = "A pneumonia outbreak was reported in Wuhan, China in December 2019."

document = client.analyze(text, pipeline=pipe)

print(document.pretty_print())

Semantic Role Labelling

The goal of this module is to identify the main events and participants in a sentence and to classify the types of relations between them. In Semantic Role extraction, events are called predicates, and participants are called the arguments of a given predicate. Each argument denotes a specific type of relation to its predicate. For example, an argument can be the Agent, the Patient or the Location of the predicate.
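To make the predicate/argument terminology concrete, here is a minimal sketch that renders one semantic role as a compact frame of the form lemma(Type='text', ...). It assumes each role is a dict with the `predicate_lemma` and optional `arguments` fields used in the iteration example in this tutorial. The sample data is copied from that example's output for the `reported` predicate. The helper names `join_tokens` and `render_frame` are hypothetical and are not part of the Codeq API.

```python
from typing import Any, Dict, List

PUNCTUATION = {",", ".", ";", ":", "!", "?"}

def join_tokens(tokens: List[str]) -> str:
    """Join tokens with spaces, attaching punctuation to the preceding word."""
    text = ""
    for tok in tokens:
        # Add a space before every token except the first one and punctuation
        if text and tok not in PUNCTUATION:
            text += " "
        text += tok
    return text

def render_frame(role: Dict[str, Any]) -> str:
    """Render a predicate and its arguments as lemma(Type='text', ...)."""
    args = ", ".join(
        "%s='%s'" % (arg["type"], join_tokens(arg["tokens"]))
        for arg in role.get("arguments", [])  # some predicates have no arguments
    )
    return "%s(%s)" % (role["predicate_lemma"], args)

# Sample role copied from the example output in this tutorial
role = {
    "predicate_lemma": "report",
    "predicate_token": "reported",
    "predicate_position": 5,
    "arguments": [
        {"type": "Patient/Theme", "tokens": ["A", "pneumonia", "outbreak"], "positions": [1, 2, 3]},
        {"type": "Location", "tokens": ["in", "Wuhan", ",", "China"], "positions": [6, 7, 8, 9]},
        {"type": "Temporal", "tokens": ["in", "December", "2019"], "positions": [10, 11, 12]},
    ],
}

print(render_frame(role))
# → report(Patient/Theme='A pneumonia outbreak', Location='in Wuhan, China', Temporal='in December 2019')
```

A predicate without arguments, such as the `be` predicate in the same example, renders as `be()`.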

More details about the Semantic Role Labeler and an example of its application can be found here:

Exploring CORD-19 with Codeq NLP API and Semantic Roles

An exploratory analysis showing how Semantic Roles can be used to extract Knowledge-Rich Contexts.

KEY: semantic_roles
ATTR: sentence.semantic_roles

Output Labels:

Agent/Experiencer
Patient/Theme
Instrument/Beneficiary/Goal
StartingPoint/Attribute
EndingPoint
Location
Purpose
Cause
Temporal
Modifier
Negative
GenericArgument
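Every argument carries exactly one of the labels above. A common next step is therefore to collect argument text by label, for example all Location or Temporal arguments in a document. The sketch below assumes the list-of-dicts shape of `sentence.semantic_roles` used in the iteration example in this tutorial. `OUTPUT_LABELS` and `arguments_by_label` are hypothetical names. The check against the label list is our own addition.

```python
from collections import defaultdict
from typing import Any, Dict, List

# The argument labels listed in this tutorial
OUTPUT_LABELS = {
    "Agent/Experiencer", "Patient/Theme", "Instrument/Beneficiary/Goal",
    "StartingPoint/Attribute", "EndingPoint", "Location", "Purpose",
    "Cause", "Temporal", "Modifier", "Negative", "GenericArgument",
}

def arguments_by_label(semantic_roles: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each argument label to the list of argument texts found with it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for role in semantic_roles:
        # Predicates without arguments simply contribute nothing
        for arg in role.get("arguments", []):
            if arg["type"] not in OUTPUT_LABELS:
                raise ValueError("unexpected argument label: %s" % arg["type"])
            grouped[arg["type"]].append(" ".join(arg["tokens"]))
    return dict(grouped)

# Roles copied from the example output in this tutorial
roles = [
    {"predicate_lemma": "be", "predicate_token": "was", "predicate_position": 4},
    {
        "predicate_lemma": "report", "predicate_token": "reported", "predicate_position": 5,
        "arguments": [
            {"type": "Patient/Theme", "tokens": ["A", "pneumonia", "outbreak"], "positions": [1, 2, 3]},
            {"type": "Location", "tokens": ["in", "Wuhan", ",", "China"], "positions": [6, 7, 8, 9]},
            {"type": "Temporal", "tokens": ["in", "December", "2019"], "positions": [10, 11, 12]},
        ],
    },
]

print(arguments_by_label(roles))
```

Raising on an unknown label, rather than silently dropping it, makes it obvious if the service ever returns a label outside this list. Note that the plain `" ".join` keeps the space before punctuation, so the Location text comes out as "in Wuhan , China".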

pipe = [
    "semantic_roles"
]

text = "A pneumonia outbreak was reported in Wuhan, China in December 2019."

document = client.analyze(text, pipeline=pipe)

for sentence in document.sentences:
    raw_sentence = sentence.raw_sentence
    semantic_roles = sentence.semantic_roles

    print("sentence: %s" % raw_sentence)
    print("")
    print("semantic_roles:")
    for sr in semantic_roles:
        predicate_lemma = sr['predicate_lemma']
        predicate_token = sr['predicate_token']
        predicate_position = sr['predicate_position']
        print("")
        print("predicate_lemma: %s" % predicate_lemma)
        print("predicate_token: %s" % predicate_token)
        print("predicate_position: %s" % predicate_position)
        if 'arguments' in sr:
            print("arguments:")
            for arg in sr['arguments']:
                arg_type = arg['type']
                arg_tokens = arg['tokens']
                arg_tokens_position = arg['positions']
                print("- type: %s" % arg_type)
                print("- tokens: %s" % arg_tokens)
                print("- positions: %s\n" % arg_tokens_position)

# Output:
# 
# sentence: A pneumonia outbreak was reported in Wuhan, China in December 2019.
# 
# semantic_roles:
# 
# predicate_lemma: be
# predicate_token: was
# predicate_position: 4
# 
# predicate_lemma: report
# predicate_token: reported
# predicate_position: 5
# arguments:
# - type: Patient/Theme
# - tokens: ['A', 'pneumonia', 'outbreak']
# - positions: [1, 2, 3]
# 
# - type: Location
# - tokens: ['in', 'Wuhan', ',', 'China']
# - positions: [6, 7, 8, 9]
# 
# - type: Temporal
# - tokens: ['in', 'December', '2019']
# - positions: [10, 11, 12]

From the output above we can observe the following:

Every semantic role contains a predicate and, if present, a list of arguments for that predicate.
Predicates include the token, the lemma (base form) and the position in the sentence.
Each argument contains its type (see Output Labels above), its tokens and the positions of those tokens in the sentence.
All token positions in the sentence start from 1, not from 0 as Python list indices do.
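Because positions are 1-based, you must subtract one from each position before indexing a Python list of the sentence's tokens. The following is a minimal sketch, assuming you already have the sentence tokens as a plain list. Here they are hardcoded to match the positions in the example output, and the final period at position 13 is an assumption. `tokens_at` is a hypothetical helper and is not part of the API.

```python
from typing import List

def tokens_at(sentence_tokens: List[str], positions: List[int]) -> List[str]:
    """Return the tokens at the given 1-based positions."""
    # Position 0 would silently map to index -1 (the last token), so reject it
    if any(p < 1 for p in positions):
        raise ValueError("positions are 1-based and must be >= 1")
    # Subtract 1 to turn each 1-based API position into a 0-based list index
    return [sentence_tokens[p - 1] for p in positions]

# Tokens of the example sentence, listed in order (position 1 is "A")
sentence_tokens = ["A", "pneumonia", "outbreak", "was", "reported", "in",
                   "Wuhan", ",", "China", "in", "December", "2019", "."]

print(tokens_at(sentence_tokens, [6, 7, 8, 9]))  # ['in', 'Wuhan', ',', 'China']
print(tokens_at(sentence_tokens, [5]))           # ['reported']
```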

Wrap up

In this tutorial we described how to use the Semantic Role Labeler of the Codeq NLP API and how to iterate over its output.

Take a look at our documentation to learn more about the NLP tools we provide.
Do you need inspiration? Go to our use case demos and see how you can integrate different tools.
In our NLP demos section you can also try our tools and find examples of the output of each module.