Codeq NLP API Tutorial

Part 9. Semantic Similarity

In this tutorial we will showcase a module of Codeq’s NLP API that can be used to analyze the semantic similarity between texts. Previous tutorials of this series can be found here:

The complete list of modules in the Codeq NLP API can be found here:

Calling the Semantic Similarity endpoint

The endpoint to get the semantic similarity between texts can also be called using an instance of our Python SDK. As usual, to create an instance of this client, you need to use your API credentials as input parameters.

from codeq_nlp_api import CodeqClient

client = CodeqClient(user_id="USER_ID", user_key="USER_KEY")

Instead of defining a pipeline with the names of some NLP Annotators, as we have been doing in previous tutorials, in this case you need to use a different method of the client to get the similarity between texts:

client.analyze_text_similarity(text1, text2)

This method takes two strings as input and returns a dict containing the text_similarity_score:

text1 = 'More than 100 injured in Texas plant blast'
text2 = 'Hundreds believed injured in Texas fertiliser plant blast'

similarity_score = client.analyze_text_similarity(text1, text2)

print(similarity_score)
# Output:
#
# {"text_similarity_score": 4.55188775062561}

The similarity score indicates the semantic relatedness between the input texts on a scale from 1 to 5, where 1 means the texts are highly unrelated and 5 means they are highly related:

  • 5 - The two sentences are completely equivalent, as they mean the same thing.
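One way to act on this score is to compare it against a cut-off. The snippet below is a minimal sketch of that idea: the helper is_related and the threshold of 4.0 are assumptions made for illustration, not part of the Codeq SDK, and the right cut-off will depend on your data. The response dict is the example output shown earlier.

```python
from typing import Mapping

# Hypothetical cut-off, chosen only for illustration; tune it on your own data.
RELATED_THRESHOLD = 4.0


def is_related(response: Mapping[str, float],
               threshold: float = RELATED_THRESHOLD) -> bool:
    """Return True when the similarity score is at or above the threshold."""
    score = response["text_similarity_score"]
    return score >= threshold


# Response copied from the example output shown earlier in this tutorial.
response = {"text_similarity_score": 4.55188775062561}
print(is_related(response))  # True
```

Keeping the threshold as a parameter makes it easy to tighten or relax the decision without touching the call to the API.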

Wrap up

In this tutorial we described how to use the Semantic Similarity endpoint of the Codeq NLP API. The code below summarizes how to iterate over its output:
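As a minimal sketch of such a loop, the snippet below scores one query against several candidate texts and sorts them by text_similarity_score. The helper rank_by_similarity and the class StubClient are hypothetical names introduced here. StubClient is an offline stand-in that scores by word overlap so the example runs without credentials; it is not the real service and its scores will differ from the API's. The only SDK call assumed is client.analyze_text_similarity(text1, text2), as described above.

```python
from typing import List, Tuple


def rank_by_similarity(client, query: str,
                       candidates: List[str]) -> List[Tuple[str, float]]:
    """Score each candidate against the query, most similar first."""
    scored = []
    for candidate in candidates:
        response = client.analyze_text_similarity(query, candidate)
        scored.append((candidate, response["text_similarity_score"]))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


class StubClient:
    """Hypothetical offline stand-in for CodeqClient; NOT the real service.

    Scores are plain word overlap mapped onto the 1-to-5 range.
    """

    def analyze_text_similarity(self, text1: str, text2: str) -> dict:
        words1 = set(text1.lower().split())
        words2 = set(text2.lower().split())
        union = words1 | words2
        overlap = len(words1 & words2) / len(union) if union else 0.0
        return {"text_similarity_score": 1.0 + 4.0 * overlap}


query = "More than 100 injured in Texas plant blast"
candidates = [
    "Stock markets close higher on Friday",
    "Hundreds believed injured in Texas fertiliser plant blast",
]

# Swap StubClient() for CodeqClient(user_id=..., user_key=...) to call the API.
for text, score in rank_by_similarity(StubClient(), query, candidates):
    print(f"{score:.2f}  {text}")
```

Because rank_by_similarity only relies on the analyze_text_similarity method, the same function should work unchanged once the stub is replaced with a real client instance.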

  • Take a look at our documentation to learn more about the NLP tools we provide.

Senior Computational Linguist at Codeq