Codeq NLP API Tutorial

Codeq’s NLP API includes text summarization modules that can help you identify the most relevant content in a text. In this tutorial we show how to use the modules for text summarization, sentence compression and keyphrase extraction.

Previous tutorials can be found here:

The complete list of modules in the Codeq NLP API can be found here:

Define an NLP pipeline and analyze a text

To call Codeq’s NLP API, create an instance of our Python SDK client, passing your API credentials as parameters. Then declare a pipeline containing the annotators you are interested in. With the client and the pipeline you can send a text and receive a Document object that contains the output of the requested annotators. For a quick overview of the output, use the method document.pretty_print(). For each annotator in this tutorial we will detail:

  • the keyword (KEY) used to call the annotator,
from codeq_nlp_api import CodeqClient

client = CodeqClient(user_id="USER_ID", user_key="USER_KEY")

pipe = [
"summarize", "compress", "summarize_compress", "keyphrases"
]

text = "A gunman who shot dead a uniformed officer outside the Biloxi police station remained on the run on Monday, the subject of an intense manhunt along Mississippi’s Gulf coast. It was unclear what prompted the killing of Officer Robert McKeithen, a 23-year veteran who was scheduled to retire this year. Biloxi’s police chief, John Miller, said police did not know if he was targeted, or the victim of a random act. The animal that did this is still on the run, Miller told reporters. We’re going to do everything within our power to bring him to justice for Robert and his family. Authorities say the man approached McKeithen in the station’s parking lot on Sunday night and shot him multiple times, either before or after coming inside the station."

document = client.analyze(text, pipeline=pipe)
print(document.pretty_print())

Summarization

This module generates an extractive summary made up of the most relevant sentences of the input text. The output of this annotator is stored at the Document level.

  • KEY: summarize
pipe = [
"summarize"
]

text = "A gunman who shot dead a uniformed officer outside the Biloxi police station remained on the run on Monday, the subject of an intense manhunt along Mississippi’s Gulf coast. It was unclear what prompted the killing of Officer Robert McKeithen, a 23-year veteran who was scheduled to retire this year. Biloxi’s police chief, John Miller, said police did not know if he was targeted, or the victim of a random act. The animal that did this is still on the run, Miller told reporters. We’re going to do everything within our power to bring him to justice for Robert and his family. Authorities say the man approached McKeithen in the station’s parking lot on Sunday night and shot him multiple times, either before or after coming inside the station."

document = client.analyze(text, pipeline=pipe)

summary = document.summary

print(summary)
# Output:
#
# It was unclear what prompted the killing of Officer Robert McKeithen, a 23-year veteran who was scheduled to retire this year. Biloxi's police chief, John Miller, said police did not know if he was targeted, or the victim of a random act.
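
The tutorial does not describe how the summarize annotator ranks sentences. As a rough intuition for what "extractive" means, the sketch below scores each sentence by the average frequency of its content words across the text and returns the top-k sentences in their original order. This is a generic frequency heuristic chosen for illustration, not Codeq's method; the stopword list, the naive sentence splitter and the k parameter are all assumptions.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption; real systems use much larger lists).
STOPWORDS = {
    "a", "an", "the", "of", "on", "in", "to", "and", "or", "for", "if",
    "is", "was", "it", "he", "him", "his", "who", "what", "that", "this",
    "did", "not", "we", "our", "be",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercased alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w not in STOPWORDS]


def extractive_summary(text, k=2):
    """Return the k highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freqs = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        # Average frequency (over the whole text) of the sentence's content words.
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Keep the top k indices, then restore document order.
    return " ".join(sentences[i] for i in sorted(ranked[:k]))
```

Calling extractive_summary(text, k=2) on the tutorial's text returns two of its six sentences. Which two depends entirely on this toy scoring, so the result need not match the annotator's output shown above.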

Sentence Compression

This module generates, from a given sentence, a new, shorter sentence that retains the main point of the original while possibly omitting less central details. It can be thought of as the single-sentence counterpart of document summarization.

The output of this module is stored at the Sentence level.

  • KEY: compress
pipe = [
"compress"
]

text = "A gunman who shot dead a uniformed officer outside the Biloxi police station remained on the run on Monday, the subject of an intense manhunt along Mississippi’s Gulf coast. It was unclear what prompted the killing of Officer Robert McKeithen, a 23-year veteran who was scheduled to retire this year. Biloxi’s police chief, John Miller, said police did not know if he was targeted, or the victim of a random act. The animal that did this is still on the run, Miller told reporters. We’re going to do everything within our power to bring him to justice for Robert and his family. Authorities say the man approached McKeithen in the station’s parking lot on Sunday night and shot him multiple times, either before or after coming inside the station."

document = client.analyze(text, pipeline=pipe)

# Print only the sentences that the annotator actually shortened.
for sentence in document.sentences:
    raw_sentence = sentence.raw_sentence
    compressed_sentence = sentence.compressed_sentence
    if raw_sentence != compressed_sentence:
        print("original: %s" % raw_sentence)
        print("compressed: %s\n" % compressed_sentence)
# Output:
#
# original: A gunman who shot dead a uniformed officer outside the Biloxi police station remained on the run on Monday, the subject of an intense manhunt along Mississippi's Gulf coast.
# compressed: A gunman who shot dead a uniformed officer outside the Biloxi police station remained on the run on Monday.
#
# original: It was unclear what prompted the killing of Officer Robert McKeithen, a 23-year veteran who was scheduled to retire this year.
# compressed: It was unclear what prompted the killing of Officer Robert McKeithen.
#
# original: Authorities say the man approached McKeithen in the station's parking lot on Sunday night and shot him multiple times, either before or after coming inside the station.
# compressed: Authorities say the man approached McKeithen in the station's parking lot on Sunday night and shot him multiple times.
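
In all three examples above, the sentence is shortened purely by deleting words, in each case a trailing clause. The sketch below checks that property and measures how much of a sentence was kept. It compares word tokens, so punctuation changes such as a comma becoming a period are ignored. The helper names are hypothetical and not part of the SDK, and the tutorial does not state that the annotator always compresses by deletion alone.

```python
import re


def words(sentence):
    """Lowercased word tokens; punctuation is ignored."""
    return re.findall(r"\w+", sentence.lower())


def is_deletion_only(original, compressed):
    """True if the compressed tokens appear in the original, in the same order."""
    remaining = iter(words(original))
    # `tok in remaining` consumes the iterator up to the first match,
    # which makes this an in-order subsequence test.
    return all(tok in remaining for tok in words(compressed))


def compression_ratio(original, compressed):
    """Share of the original word tokens kept in the compressed sentence."""
    return len(words(compressed)) / len(words(original))
```

Applied inside the loop above, compression_ratio(raw_sentence, compressed_sentence) gives 0.5 for the McKeithen sentence: 11 of its 22 word tokens are kept.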

Summarization with Compression

In this case, the annotator generates an extractive summary made up of the most relevant sentences of the input text, each in its compressed form, regardless of whether the compress annotator is included in the pipeline.

The output of this module is stored at the Document level.

  • KEY: summarize_compress
pipe = [
"summarize", "summarize_compress"
]

text = "A gunman who shot dead a uniformed officer outside the Biloxi police station remained on the run on Monday, the subject of an intense manhunt along Mississippi’s Gulf coast. It was unclear what prompted the killing of Officer Robert McKeithen, a 23-year veteran who was scheduled to retire this year. Biloxi’s police chief, John Miller, said police did not know if he was targeted, or the victim of a random act. The animal that did this is still on the run, Miller told reporters. We’re going to do everything within our power to bring him to justice for Robert and his family. Authorities say the man approached McKeithen in the station’s parking lot on Sunday night and shot him multiple times, either before or after coming inside the station."

document = client.analyze(text, pipeline=pipe)

print("summary: %s\n" % document.summary)
print("compressed_summary: %s" % document.compressed_summary)
# Output:
#
# summary: It was unclear what prompted the killing of Officer Robert McKeithen, a 23-year veteran who was scheduled to retire this year. Biloxi's police chief, John Miller, said police did not know if he was targeted, or the victim of a random act.
#
# compressed_summary: It was unclear what prompted the killing of Officer Robert McKeithen. Biloxi's police chief, John Miller, said police did not know if he was targeted, or the victim of a random act.
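
In this example, the compressed summary is the plain summary with each sentence replaced by its compressed form from the Sentence Compression example. The police chief sentence stays unchanged because the compressor did not shorten it. The sketch below reproduces that observed relationship with hard-coded strings. It is an illustration of this output, not a description of how summarize_compress works internally, and compressed_summary is a hypothetical helper.

```python
def compressed_summary(summary_sentences, compressions):
    """Join the compressed form of each summary sentence.

    `compressions` maps an original sentence to its compressed form;
    sentences without an entry are kept unchanged.
    """
    return " ".join(compressions.get(s, s) for s in summary_sentences)


# Sentences copied from the outputs shown in this tutorial.
summary = [
    "It was unclear what prompted the killing of Officer Robert McKeithen, "
    "a 23-year veteran who was scheduled to retire this year.",
    "Biloxi's police chief, John Miller, said police did not know if he was "
    "targeted, or the victim of a random act.",
]
compressions = {
    summary[0]: "It was unclear what prompted the killing of Officer Robert McKeithen.",
}

print(compressed_summary(summary, compressions))
```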

Keyphrase Extraction

This module finds, for a given document, a list of short phrases that give a reader a sense of the topics the document covers. For example, for a technical document the keyphrases should include the technical terms most relevant to its topic, whereas for a news article they should include the names of people, organizations, etc. relevant to the article.

The output of this module is stored at the Document level in two forms: a list of keyphrases as strings, and a list of keyphrases each paired with its relevance score.

  • KEY: keyphrases
pipe = [
"keyphrases"
]
text = "A gunman who shot dead a uniformed officer outside the Biloxi police station remained on the run on Monday, the subject of an intense manhunt along Mississippi’s Gulf coast. It was unclear what prompted the killing of Officer Robert McKeithen, a 23-year veteran who was scheduled to retire this year. Biloxi’s police chief, John Miller, said police did not know if he was targeted, or the victim of a random act. The animal that did this is still on the run, Miller told reporters. We’re going to do everything within our power to bring him to justice for Robert and his family. Authorities say the man approached McKeithen in the station’s parking lot on Sunday night and shot him multiple times, either before or after coming inside the station."

document = client.analyze(text, pipeline=pipe)

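# Plain keyphrases (strings).
print("Keyphrases:\n")
for phrase in document.keyphrases:
    print(phrase)

# Keyphrases paired with their relevance scores.
print("\nKeyphrases Scored:\n")
for phrase_and_score in document.keyphrases_scored:
    print(phrase_and_score)
# Output:
#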
# Keyphrases:
#
# Biloxi police station
# Biloxi 's police chief
# Officer Robert McKeithen
# the station 's parking lot
# Mississippi 's Gulf coast
# Sunday night and shot
# him multiple times
# Monday
# John Miller
#
# Keyphrases Scored:
#
# ['Biloxi police station', 0.14236053468171103]
# ["Biloxi 's police chief", 0.12844081612661434]
# ['Officer Robert McKeithen', 0.12583178746051182]
# ["the station 's parking lot", 0.11744640267914488]
# ["Mississippi 's Gulf coast", 0.11721120752467706]
# ['Sunday night and shot', 0.11115943448261123]
# ['him multiple times', 0.09358212954494648]
# ['Monday', 0.08635075551783025]
# ['John Miller', 0.0776169319819529]
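
Because keyphrases_scored pairs each phrase with a score, you can post-process it on the client side. The sketch below sorts by score and keeps either the top k phrases or those at or above a minimum score. It assumes each item is a [phrase, score] pair as printed above. top_keyphrases is a hypothetical helper, not part of the SDK, and any threshold you pass is an arbitrary choice.

```python
def top_keyphrases(scored, k=None, min_score=0.0):
    """Phrases sorted by descending score, filtered by min_score, truncated to k."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    kept = [phrase for phrase, score in ranked if score >= min_score]
    return kept if k is None else kept[:k]


# Scored keyphrases copied from the output above.
scored = [
    ["Biloxi police station", 0.14236053468171103],
    ["Biloxi 's police chief", 0.12844081612661434],
    ["Officer Robert McKeithen", 0.12583178746051182],
    ["the station 's parking lot", 0.11744640267914488],
    ["Mississippi 's Gulf coast", 0.11721120752467706],
    ["Sunday night and shot", 0.11115943448261123],
    ["him multiple times", 0.09358212954494648],
    ["Monday", 0.08635075551783025],
    ["John Miller", 0.0776169319819529],
]

for phrase in top_keyphrases(scored, min_score=0.1):
    print(phrase)
```

With this output, a minimum score of 0.1 keeps the first six phrases and drops "him multiple times", "Monday" and "John Miller".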

Wrap up

In this tutorial we described some modules of the Codeq NLP API that can be used to summarize texts and get relevant keyphrases. The code below summarizes the pipeline names to call each annotator and the variables used to store their output:
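
Collected from the examples above, the keys and the attributes that hold their output can be written as a plain Python mapping. This is a reference table covering only the four annotators in this tutorial, not part of the SDK. outputs_for is a hypothetical helper that returns nothing for keys outside that set.

```python
# Pipeline key -> attribute(s) where this tutorial read the annotator's output.
OUTPUTS = {
    "summarize": ["document.summary"],
    "compress": ["sentence.compressed_sentence"],  # one per item in document.sentences
    "summarize_compress": ["document.compressed_summary"],
    "keyphrases": ["document.keyphrases", "document.keyphrases_scored"],
}


def outputs_for(pipeline):
    """Output attributes to read for a pipeline, in pipeline order."""
    return [attr for key in pipeline for attr in OUTPUTS.get(key, [])]


pipe = ["summarize", "compress", "summarize_compress", "keyphrases"]
for attr in outputs_for(pipe):
    print(attr)
```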

  • Take a look at our documentation to learn more about the NLP tools we provide.

Senior Computational Linguist at Codeq