Lecture 4

This lecture, taught by Prof. Cathy Yi-Hsuan Chen, focuses on testing API accessibility via Postman. We demonstrate three cases of API retrieval.

Specifically, the code can be found on GitHub

Here is a Postman Tutorial that guides you through the interface of Postman

Outlines


What is an API?

  • An application programming interface (API) is an interface or communication protocol between different parts of a computer program intended to simplify the implementation and maintenance of software.
  • The term API often refers to a specific kind of interface between a client and a server
  • Web API: a client makes a request in a specific format and will always get a response in a specific format, or initiate a defined action from the server
  • Here is a short video introducing APIs: [YouTube](https://www.youtube.com/watch?v=s7wmiS2mSXY)

Web API

  • A web API is typically defined as a set of specifications, such as Hypertext Transfer Protocol (HTTP) request messages, along with a definition of the structure of response messages, usually in an Extensible Markup Language (XML) or JavaScript Object Notation (JSON) format
  • In the social media space, web APIs have allowed web communities to facilitate sharing content and data between communities and applications.
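Since web API responses arrive as JSON text, they can be parsed directly into native Python objects. A minimal illustration (the sample payload below is made up for demonstration, not a real API response):

```python
import json

# Illustrative JSON payload, shaped like a typical web API response
response_body = '{"status": "ok", "totalResults": 2, "articles": [{"title": "A"}, {"title": "B"}]}'

data = json.loads(response_body)   # parse JSON text into a Python dict
print(data["status"])              # -> ok
print(len(data["articles"]))       # -> 2
```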

Postman API

  • The Postman API allows you to programmatically access data stored in your Postman account with ease
  • You need a valid API Key to send requests to the API endpoints
  • The API has an access rate limit applied to it
  • Response to every request is sent in JSON format
  • The request method (verb) determines the nature of the action you intend to perform. A request made using the GET method implies that you want to fetch something from Postman, and POST implies you want to save something new to Postman

API examples

News API

News API is great as a data source for news tickers and other applications where you want to show your users live headlines. News API tracks headlines in 7 categories across over 50 countries, from over a hundred top publications and blogs, in near real time

  • Read the Terms of Use before using the API! Term

  • Top headlines: provides live top and breaking headlines for a country, specific category in a country, single source, or multiple sources. You can also search with keywords. Articles are sorted by the earliest date published first.

  • Request parameters: country; category; sources; pageSize; page; apiKey
  • Response: status; totalResults; articles; sources; authors; title; description; url; content...
QueryParams={'country':'us', 'apiKey':API_KEY}
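With these query parameters, `requests` assembles the final request URL itself; preparing the request lets you inspect that URL before sending anything (API_KEY is a placeholder here, and no call is made):

```python
import requests

API_KEY = "YOUR-NEWSAPI-KEY"   # placeholder, not a real key
QueryParams = {'country': 'us', 'apiKey': API_KEY}

req = requests.Request("GET", "https://newsapi.org/v2/top-headlines",
                       params=QueryParams)
prepared = req.prepare()
print(prepared.url)   # base URI plus the encoded query string
```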
  • Define a class for NewsAPI
import requests
import json
from enum import Enum


class NewsApi:
    """Class for accessing News API functions.

  Methods:
    GetResource  -- get the resources of news.
    GetHeadlines -- Download top headlines of specific country.
    GetEverything -- Download top headlines of specific country.

  For more detail on News API, see the documentation at
  https://newsapi.org/
  """

    def __init__(self, key: str):

        self.baseUri = "https://newsapi.org/v2/"
        self.api_key = key

    def GetSources(self, category='business', country='us'):
        """Download the available news sources for a category and country."""
        fullUri = self.baseUri + "sources"
        getParams = {'country': country, "category": category }
        getParams['apiKey'] = self.api_key
        try:
            result = requests.get(fullUri, params=getParams)
        except Exception as e:
            print("HTTP Request fail {}\r\n{}".format(fullUri, e))
            return None
        return json.loads(result.content)

    def GetHeadlines(self, country = 'us'):
        """Download top headlines of specific country.
    """
        fullUri = self.baseUri + "top-headlines"
        getParams = {'country': country }
        getParams['apiKey'] = self.api_key

        try:
            result = requests.get(fullUri, params=getParams)
        except Exception as e:
            print("HTTP Request fail {}\r\n{}".format(fullUri, e))
            return None
        return json.loads(result.content)

    def GetEverything(self, symbol):
        """Download top headlines of specific country.
    """
        fullUri = self.baseUri + "everything"
        getParams = {'q': symbol }
        getParams['apiKey'] = self.api_key

        try:
            result = requests.get(fullUri, params=getParams)
        except Exception as e:
            print("HTTP Request fail {}\r\n{}".format(fullUri, e))
            return None
        return json.loads(result.content)
from NewsAPI import NewsApi    # create module NewsAPI for object class "NewsApi"
import pandas as pd
import os


def CreateDF(JsonArray, columns):
    """Build a DataFrame from a list of JSON objects, keeping only columns."""
    rows = []
    for item in JsonArray:
        rows.append({column: item[column] for column in columns})

    # build the frame in one go; DataFrame.append was removed in pandas 2.0
    return pd.DataFrame(rows, columns=columns)


def main():
    # access_token_NewsAPI.txt must contain your personal access token
    with open("access_token_NewsAPI.txt", "r") as f:
        myKey = f.read()[:-1]

    api = NewsApi(myKey)

    # get sources of news
    columns = ['id', 'name', 'description']
    rst_source = api.GetSources()
    df = CreateDF(rst_source['sources'], columns)
    df.to_csv('source_list.csv')


    # get news for specific country
    rst_country = api.GetHeadlines()
    columns = [ 'author', 'publishedAt', 'title', 'description','content', 'url']
    df = CreateDF(rst_country['articles'], columns)
    df.to_csv('Headlines_country.csv')

    # get news for a specific symbol
    symbol = 'aapl'
    rst_symbol = api.GetEverything(symbol)
    columns = ['author', 'publishedAt', 'title', 'description', 'content', 'url']
    df = CreateDF(rst_symbol['articles'], columns)
    df.to_csv('Headlines_symbol.csv')


main()

James Quick

StockTwits API

  • Find the list of symbols used by StockTwits here

  • The Web API provided by StockTwits is described in its doc. Please read the API terms and conditions

  • The StockTwits API only allows clients to make a limited number of calls in a given hour. This policy affects the endpoints in different ways: API rate limiting

  • Search for symbols or users

    • Get message streams in the form of JSON
    • NLP for messages
    • Create timeline (dataframe)
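The StockTwitsAPI module imported in the script below is not listed in these notes. A minimal sketch of what it might contain, assuming the public streams/symbol endpoint and an access_token query parameter (both hedged assumptions based on the StockTwits docs linked above):

```python
import requests


class StockTwitsApi:
    """Minimal client for the StockTwits streams API (sketch)."""

    def __init__(self, access_token):
        self.base_uri = "https://api.stocktwits.com/api/2/"
        self.access_token = access_token

    def stream_url(self, symbol):
        # URL of the message stream for one symbol
        return self.base_uri + "streams/symbol/{}.json".format(symbol)

    def stream_symbol(self, symbol):
        # return the raw requests.Response; the caller uses .json(),
        # matching how collect() below consumes it
        return requests.get(self.stream_url(symbol),
                            params={"access_token": self.access_token})
```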
import importlib
import StockTwitsAPI
importlib.reload(StockTwitsAPI)
import json
import datetime as dt
import time
from StockTwitsAPI import StockTwitsApi    # create module StockTwitsAPI for object class "StockTwitsApi"
import preprocessing as pre             # create preprocessing class for NLP
importlib.reload(pre)

def collect(symbol, access_token):
    api = StockTwitsApi(access_token)
    stream = api.stream_symbol(symbol).json()
    status = stream["response"]["status"]   # 200 on success
    if status != 200:
        print("Request failed with status {}".format(status))
        return
    with open("stream{}.json".format(symbol), "w", encoding="utf-8") as f:
        json.dump(stream, f)

def main():

    with open("access_token_stockTwits.txt", "r") as f:
        access_token = f.read()[:-1]
    start = dt.datetime.now()
    Symbol='IBB'      # symbol of interest, can be a list of symbols
    collect(symbol=Symbol, access_token=access_token)   # single request
    print("Making timeline...")
    pre.make_timeline(symbol=Symbol)

main()
  • Create a class for NLP (Natural Language Processing) tasks, including removing punctuation and stop words. Add "negtag_" to all words following one of NEGWORDS
from string import punctuation
import numpy as np
from numpy.random import binomial
import pandas as pd
import json

PUNCTUATION = list(punctuation.replace("?", "").replace("!", ""))
NEGWORDS = ["not", "no", "never", "dont", "cant", "wont"]   # negation triggers (assumed list)
STOPWORDS = ["an", "a", "the"] + NEGWORDS

def read_stream(symbol):
    """Return the stream of messages saved on disk for symbol.

    An empty list is returned if the corresponding JSON file doesn't exist yet.
    """
    try:
        with open("stream{}.json".format(symbol), "r", encoding="utf-8") as f:
            stream = json.load(f)
        return(stream)
    except FileNotFoundError:
        print("Stream not found for {}.".format(symbol))
        return([])

def make_timeline(symbol, path="timeline.csv"):
  """Create and save timeline to disk.

  The timeline is the time-ordered list of all messages in database without
  redundancies.
  It is saved as a CSV file with columns:
    id -- the unique id of the message
    created_at -- time at which was posted the message
    body -- the actual text of the message
    user -- the unique id of the author
    username -- the username of the author at the time he/she posted the message
    declared_sentiment -- 1 if the message was declared "Bullish" by the author,
                          -1 if it was declared "Bearish",
                          None (or empty field) otherwise
  """
  messages = []

  def extract_infos(message):
    return(message["id"], {
                           "created_at": message["created_at"],
                           "body": preprocess(message),
                           "user": message["user"]["id"],
                           "username": message["user"]["username"],
                           "declared_sentiment": _declared_sentiment(message)
                           })

  stream = read_stream(symbol)
  if stream:   # read_stream returns [] when no file exists yet
    messages.extend([extract_infos(m) for m in stream["messages"]])
  print("Loading and preprocessing...")
  print("Creating timeline...")
  dic = {id: infos for id, infos in messages}
  df = pd.DataFrame.from_dict(dic, orient='index')
  df["set"] = np.array(["training", "testing"])[binomial(1, 0.25, size=len(df))]
  df.index.name = "id"
  print("Writing to disk...")
  df.to_csv(path)
  print("Done.")


def preprocess(m):
  """Preprocess messages and return the corresponding strings.

  message must be a dict with at least keys "body" and "symbol", or a string.
  """
  txt = _replace_symbols_users_links(m)
  txt = _remove_null_bytes(txt)
  txt = txt.lower()
  txt = txt.replace("\n", " ")
  txt = txt.replace("\r", " ")
  txt = _change_tags(txt, "$", "moneytag")
  txt = _change_tags(txt, "€", "moneytag")
  txt = _remove_punctuation(txt)
  txt = _numbertags(txt)
  txt = _contract_spaces(txt)
  txt = _remove_stopwords(txt)
  if txt.startswith(" "):
    txt = txt[1:]
  return(txt)

def _remove_null_bytes(txt):
  return(txt.replace("\0", ""))

def _remove_stopwords(txt):
  """Delete from txt all words contained in STOPWORDS."""
  words = [word for word in txt.split(" ") if word not in STOPWORDS]
  return(" ".join(words))

def _contract_spaces(txt):
  """Contract all repetitions of spaces to one space."""
  while "  " in txt:
    txt = txt.replace("  ", " ")
  return(txt)

def _replace_symbols_users_links(m):
  if type(m) is str:
    # SHOULD REPLACE SYMBOLS, USERS AND LINKS FOR RAW TEXT
    return(m)
  else:
    txt = m["body"]
    #symbols
    for s in m["symbols"]:
      txt = txt.replace("$" + s["symbol"], "cashtag")
      if "aliases" in s:
        for alias in s["aliases"]:
          txt = txt.replace("$" + alias, "cashtag")
    #users
    if "mentioned_users" in m:
      for u in m["mentioned_users"]:
        txt = txt.replace(u,  "usertag")
    #links
    if "links" in m:
      for l in m["links"]:
        txt = txt.replace(l["url"], "linktag")
  return(txt)

def _numbertags(txt):
  """Replace all numbers by "numbertag"."""
  words = txt.split(" ")
  for i, word in enumerate(words):
    if word.isnumeric():
      words[i] = "numbertag"
  return(" ".join(words))

def _change_tags(txt, tag, newtag):
  """Replace words starting with tag by newtag."""
  words = txt.split(" ")
  for i, word in enumerate(words):
    if word.startswith(tag):
      words[i] = newtag
  return(" ".join(words))

def _remove_punctuation(txt):
  characters = list(txt)
  n = len(characters)
  insert, offset = [], 0
  for i in range(n):
    if characters[i] in PUNCTUATION:
      if 0 < i < n-1 and " " not in [characters[i-1], characters[i+1]]:
        characters[i] = " "
      else:
        characters[i] = ""
    if characters[i] in ["!", "?"] and characters[i-1] != " ":
      insert.append((i + offset, " "))
      offset += 1
  for i, c in insert:
    characters.insert(i, c)
  return("".join(characters))

def _declared_sentiment(message):
  stm = message["entities"]["sentiment"]
  return(2*int(stm["basic"] == "Bullish") - 1 if stm is not None else None)
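Note that the listing above removes NEGWORDS as stop words but does not itself add the "negtag_" prefix described earlier. A minimal sketch of that step, using the same assumed NEGWORDS list; the negation scope chosen here (every word after the negation word, to the end of the message) is one possible convention:

```python
NEGWORDS = ["not", "no", "never", "dont", "cant", "wont"]   # assumed list

def negation_tag(txt):
  """Prefix every word after a negation word with "negtag_"."""
  words = txt.split(" ")
  negated = False
  for i, word in enumerate(words):
    if word in NEGWORDS:
      negated = True          # everything after this word is negated
    elif negated:
      words[i] = "negtag_" + word
  return(" ".join(words))

print(negation_tag("i do not like this stock"))
# -> i do not negtag_like negtag_this negtag_stock
```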


NASA API

NASA API portal. The objective of this site is to make NASA data, including imagery, eminently accessible to application developers. The api.nasa.gov catalog is growing


  • Get API key here
  • Browse APIs here

  • Example: the Techport API makes NASA technology project data available in a machine-readable format. TechPort is NASA's resource for collecting and sharing information about NASA-funded technology development. Techport allows the public to discover the technologies NASA is working on every day to explore space, understand the universe, and improve aeronautics. NASA is developing technologies in areas such as propulsion, nanotechnology, robotics, and human health. NASA is committed to making its data available and machine-readable through an Application Programming Interface (API) to better serve its user communities. As such, the NASA TechPort system provides a RESTful web services API that can be used to export TechPort data into either an XML or a JSON format, which can then be further processed and analyzed.

  • Define a class for NASA API

import requests
import json

class NasaApi:
    """Class for accessing News API functions.
  For more detail on NASA API, see the documentation at
  https://api.nasa.gov/
  """

    def __init__(self, key: str):

        self.baseUri = "https://api.nasa.gov/techport/api/projects"
        self.api_key = key


    def Get(self):
        """Get list of projects """
        fullUri = self.baseUri
        getParams = {'api_key': self.api_key}
        # getParams['api_key'] = self.api_key
        try:
            result = requests.get(fullUri, params=getParams)
        except Exception as e:
            print("HTTP Request fail {}\r\n{}".format(fullUri, e))
            return None
        return json.loads(result.content)

    def GetProject(self, id):
        """Get a single project by id."""
        fullUri = self.baseUri + "/{}.json".format(id)
        getParams = {'api_key': self.api_key}
        try:
            result = requests.get(fullUri, params=getParams)
        except Exception as e:
            print("HTTP Request fail {}\r\n{}".format(fullUri, e))
            return None
        return json.loads(result.content)
  • get the list of NASA projects and retrieve selected projects (use the predefined Nasa class)
import importlib
import NasaAPI
importlib.reload(NasaAPI)
from NasaAPI import NasaApi
import pandas as pd

def main():

    with open("access_token_Nasa.txt", "r") as f:
        myKey = f.read()[:-1]

    api = NasaApi(myKey)

    # get list of project
    Proj_list= api.Get()
    Proj_DF = pd.DataFrame(Proj_list['projects']['projects'])

    # get detail of specific project
    selectID= Proj_DF['id'][:10]    # select first 10 projects
    mydata = []

    for i, item in enumerate(selectID):
        rst= api.GetProject(item)
        print('making dataframe for {}... ({}/{})'.format(item, i + 1, len(selectID)), end="\r")
        mydata.append({'id': item, 'title': rst['project']['title'], 'description': rst['project']['description'], 'benefits': rst['project']['benefits']})

    NASA_project_DF = pd.DataFrame(mydata)
    NASA_project_DF.to_csv("NASA_Project.csv")

main()

Additional Resources
