TransWikia.com

Web API to query word frequency data

Software Recommendations Asked by hippietrail on August 11, 2020

Many algorithms used in clustering, keyword analysis, tf-idf, etc. are based on comparative word frequency.

Usually you need to calculate your own word frequencies from your own corpus. Very large corpora are better, but of course this takes a lot of work, space, time, etc., and distracts from the task at hand.
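As a rough illustration of what calculating your own frequencies involves, here is a minimal sketch. It assumes a naive regex tokenizer and a corpus small enough to fit in memory; a real pipeline would likely need a proper tokenizer and far more text. The function name `per_million` is made up for this example.

```python
import re
from collections import Counter

def per_million(texts):
    """Return {word: occurrences per million tokens} for an iterable of strings."""
    counts = Counter()
    for text in texts:
        # \w+ is Unicode-aware in Python 3, so accented letters stay inside the token.
        counts.update(re.findall(r"\w+", text.lower()))
    total = sum(counts.values())
    if total == 0:
        return {}
    # Scale raw counts so that values are comparable across corpora of different sizes.
    return {word: n * 1_000_000 / total for word, n in counts.items()}

freqs = per_million(["The cat sat on the mat.", "The dog barked."])
print(freqs["the"])
```

Normalizing to "per million tokens" is what makes frequencies from different corpora comparable, which is the property a ready-made frequency service would have to provide.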

I’m wondering if there are any Web API providers that have done all this for you and provide programmatic access to frequency data via the web.

Requirements:

  • English is a must, other languages are a big plus.
  • Gratis is better than paid, open is better than closed.
  • Optional stemming and/or lemmatization would be a plus but not required.
  • Any requirements for registration, throttling, daily limits, etc are OK.
  • Any format is OK but urlencoded and JSON are expected.
  • Unicode support is a very strong preference.
    (Should not blow up on words like café, naïve, etc.)

One Answer

To get the frequency of the word "smartass" per million words, query:

https://api.datamuse.com/words?sp=smartass&md=f&max=1

It outputs:

[{"word":"smartass","score":129630,"tags":["f:0.067229"]}]
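To make the pieces of that query and response explicit, here is a small offline sketch. It builds the same URL with percent-encoding (so that words like café survive the trip) and pulls the frequency out of the sample response shown above. The helper names `build_url` and `parse_freq` are hypothetical, and looking the tag up by its "f:" prefix is a defensive assumption rather than documented behaviour.

```python
import json
from urllib.parse import urlencode

def build_url(term):
    # urlencode percent-encodes the term as UTF-8, so non-ASCII letters are safe.
    query = urlencode({"sp": term, "md": "f", "max": 1})
    return "https://api.datamuse.com/words?" + query

def parse_freq(body):
    """Return the value of the 'f:' tag of the first result, or 0.0 if absent."""
    results = json.loads(body)
    if not results:
        return 0.0
    for tag in results[0].get("tags", []):
        if tag.startswith("f:"):
            return float(tag[2:])
    return 0.0

# The sample response copied from above.
sample = '[{"word":"smartass","score":129630,"tags":["f:0.067229"]}]'
print(build_url("smartass"))
print(parse_freq(sample))
```

The "score" field is ignored on purpose: only the "f:" tag carries the per-million frequency.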

Extract the result from the returned JSON, e.g. in Python like this (note that the score is NOT the count; the frequency is in the "f:" tag):

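import time

import requests

_wait = 0.5  # seconds to sleep before retrying a failed request

def get_freq(term):
    """Return the frequency of `term` per million words, or 0.0 if unknown."""
    # Passing params lets requests URL-encode the term, so 'café' is safe.
    params = {'sp': term, 'md': 'f', 'max': 1}
    while True:
        try:
            response = requests.get('https://api.datamuse.com/words',
                                    params=params, timeout=10)
            response.raise_for_status()
            results = response.json()
            break
        except (requests.RequestException, ValueError):
            print('Could not get response. Sleep and retry...')
            time.sleep(_wait)
    if not results:
        return 0.0
    # Find the frequency tag by its prefix instead of assuming it comes first.
    for tag in results[0].get('tags', []):
        if tag.startswith('f:'):
            return float(tag[2:])
    return 0.0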

You can call this up to 100,000 times a day. With a single process, the limit seems to be maintained automatically: each response is delayed so that throughput comes to roughly 100k responses per day.
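If you run more than one process, or just don't want to rely on that server-side delay, you could pace the requests yourself. Below is a minimal sketch, assuming evenly spaced requests are acceptable; the `Throttle` class and its parameters are made up for illustration and are not part of the API.

```python
import time

DAILY_LIMIT = 100_000                      # the daily limit mentioned above
MIN_INTERVAL = 24 * 60 * 60 / DAILY_LIMIT  # 0.864 seconds between requests

class Throttle:
    """Blocks so that successive wait() calls are at least `interval` seconds apart."""

    def __init__(self, interval=MIN_INTERVAL, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock    # injectable so the class can be tested without real waiting
        self.sleep = sleep
        self.last = None      # time of the previous call, None before the first one

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now
```

Calling `throttle.wait()` once before each request keeps a single process under the limit; with several processes, each would need a proportionally larger interval.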

The counts come from the Google Books Ngrams corpus.

Correct answer by Radio Controlled on August 11, 2020
