
Bayesian Classification Server API

The Viabl.ai Bayesian Classification Server is a multi-tenanted server providing a machine learning classifier based on the Naïve Bayes algorithm. It can be used to classify both structured and unstructured data.

This section describes the Bayesian server API.

Usage Overview

The typical process for using the Bayesian classifier is as follows:

  • Add a domain to the server. A domain is a name used to reference a collection of texts. A classic example is a domain called “spam” for a classifier that determines whether or not an email is spam.
  • Train the domain by providing multiple texts, each with its known classification (category).
  • Once trained, the domain can be presented with uncategorized texts. The result is a sorted list of candidate categories along with their probabilities.
  • As more classified texts become available, the classifier can be trained further without re-presenting the original texts.
  • (Optional) Each domain can store user-specified data alongside it. The structured data classifier, for example, uses this to hold dataset statistics gathered during training, which are later used to classify new data.
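The workflow above can be expressed as the ordered sequence of requests a client would send. The sketch below only builds the URL paths and JSON bodies (it does not contact a server); the domain name, texts, categories and the `buildWorkflow` function itself are hypothetical illustrations.

```javascript
// Hypothetical helper: turns the usage workflow into an ordered list of
// { path, body } pairs that a client would POST to the server.
function buildWorkflow(domain, trainingExamples, unknownText) {
  const requests = [];

  // Step 1: create the domain.
  requests.push({ path: "/add_domain", body: { domain } });

  // Step 2: train it, one labelled text per /train request.
  for (const { words, category } of trainingExamples) {
    requests.push({ path: "/train", body: { domain, words, category } });
  }

  // Step 3: classify a new, unlabelled text.
  requests.push({ path: "/classify", body: { domain, words: unknownText } });

  return requests;
}

// Illustrative (made-up) data.
const plan = buildWorkflow(
  "spam",
  [
    { words: "Win a free prize now", category: "spam" },
    { words: "Minutes from Tuesday's meeting", category: "not_spam" },
  ],
  "Claim your free prize"
);

for (const { path, body } of plan) {
  console.log(path, JSON.stringify(body));
}
```

Further training (step 4) is simply more `/train` requests for the same domain, and step 5 would add a `/write_metadata` request.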

Special Features

The categorization process is enhanced by several additional text pre-processing steps:

  • Character case is ignored: texts are treated case-insensitively.
  • Stop words are removed from the texts before processing. The stop-word list can be viewed and amended in the settings.json file.
  • Individual words are stemmed using the Porter stemming algorithm to remove the more common morphological and inflectional endings.
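As a rough illustration of the first two steps, the following sketch lowercases a text, splits it into tokens and drops stop words. It is not the server's implementation: the tokenization rule (ASCII letters, digits and apostrophes) and the tiny `STOP_WORDS` set are assumptions (the real list lives in settings.json), and Porter stemming is deliberately left out.

```javascript
// Hypothetical, tiny stop-word sample; the server's real list is in settings.json.
const STOP_WORDS = new Set(["a", "an", "and", "the", "is", "of", "to", "for"]);

// Sketch of case folding + stop-word removal (no stemming).
function preprocess(text) {
  return text
    .toLowerCase()                        // case is ignored
    .split(/[^a-z0-9']+/)                 // crude tokenization on non-word characters (ASCII only)
    .filter((token) => token.length > 0)  // drop empty strings left by split
    .filter((token) => !STOP_WORDS.has(token));
}

console.log(preprocess("The Invoice is attached for the ORDER"));
// → ["invoice", "attached", "order"]
```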

Calling the Classifier

The classifier provides a REST-based API that is called using the HTTP POST method. The operation to perform is selected by the URL path, and the request BODY contains a stringified JSON object. The port on which the classifier listens is defined in the settings.json file.
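A minimal Node.js (version 18 or later, for the built-in `fetch`) sketch of a single call is shown below. It assumes the server is reachable over plain HTTP, that replies are JSON, and that a `Content-Type: application/json` header is acceptable; `callClassifier` and its parameters are hypothetical names, and `port` should be whatever settings.json specifies. The optional `fetchImpl` argument exists only so the function can be exercised without a live server.

```javascript
// Sketch: POST a stringified JSON body to one API path and return the parsed reply.
// Assumptions: plain HTTP, JSON replies, Content-Type header accepted by the server.
async function callClassifier(host, port, path, body, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(`http://${host}:${port}${path}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body), // the BODY is a stringified JSON object
  });
  if (!response.ok) {
    throw new Error(`${path} failed with HTTP status ${response.status}`);
  }
  return response.json();
}

// Usage (port 8080 is a placeholder; use the value from settings.json):
// const { data } = await callClassifier("localhost", 8080, "/list_domains", {});
```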

API Methods

Add a new domain

URL path: /add_domain

Add a new categorization domain. A domain is used to collect and classify a group of related texts, for example “spam” or “credit_authorization”.

BODY message:

{
    "domain": "the name of the domain to add"
}

Remove a domain

URL path: /remove_domain

Remove a previously added domain. Note: all training data for the domain will be lost.

BODY message:

{
    "domain": "the name of the domain to remove"
}
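Since removing a domain discards its training data, one reasonable pattern is to pair `/add_domain` and `/remove_domain` only for throwaway domains, for example in tests. The sketch below assumes a hypothetical `call(path, body)` function that POSTs a JSON body and resolves with the parsed reply; `withScratchDomain` is likewise an illustrative name.

```javascript
// Sketch: create a temporary domain, run `work` against it, and always remove it.
// `call(path, body)` is a hypothetical POST helper (e.g. a wrapper around fetch).
async function withScratchDomain(call, domain, work) {
  await call("/add_domain", { domain });
  try {
    return await work(domain);
  } finally {
    // Runs whether `work` succeeds or throws; all training data is discarded.
    await call("/remove_domain", { domain });
  }
}

// Usage (hypothetical):
// const reply = await withScratchDomain(call, "scratch_test", async (d) => {
//   await call("/train", { domain: d, words: "free prize", category: "spam" });
//   return call("/classify", { domain: d, words: "prize" });
// });
```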

List the domains defined on a server

URL path: /list_domains

Retrieve the names of all domains defined on the server.

BODY message:

{}

Return message:

{
    "data": [
        "domain name 1",
        "domain name 2"
    ]
}
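A client that must not fail if its domain already exists can check `/list_domains` before calling `/add_domain`, as in this sketch. The server's behaviour when adding a duplicate domain is not described in this document, so checking first avoids depending on it. `call(path, body)` is assumed to be a hypothetical helper that POSTs a JSON body and resolves with the parsed JSON reply, and `ensureDomain` is an illustrative name.

```javascript
// Sketch: add `domain` only if /list_domains does not already report it.
// Returns true if the domain was created, false if it already existed.
async function ensureDomain(call, domain) {
  const { data: domains } = await call("/list_domains", {});
  if (domains.includes(domain)) {
    return false; // already present; nothing to do
  }
  await call("/add_domain", { domain });
  return true; // newly created
}
```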

List the categories for a domain

URL path: /list_categories

Retrieve the exhaustive list of categories for a (trained) domain. The return message also provides the record count of each category.

BODY message:

{
    "domain": "the name of the domain whose categories to retrieve"
}

Return message:

{
    "data": [
        {
            "name": "pass",
            "count": 2400
        },
        {
            "name": "fail",
            "count": 60
        }
    ]
}
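The counts returned by `/list_categories` show how balanced the training data is. The sketch below converts them to relative frequencies, which are the standard estimate of each category's prior probability in Naïve Bayes; whether the server computes its priors exactly this way (for example, with or without smoothing) is not stated here, so treat `categoryPriors` as an illustrative helper only.

```javascript
// Sketch: relative frequency of each category from a /list_categories reply.
function categoryPriors(reply) {
  const total = reply.data.reduce((sum, c) => sum + c.count, 0);
  return Object.fromEntries(
    reply.data.map((c) => [c.name, total === 0 ? 0 : c.count / total])
  );
}

// The example reply from above.
const reply = {
  data: [
    { name: "pass", count: 2400 },
    { name: "fail", count: 60 },
  ],
};
console.log(categoryPriors(reply)); // pass ≈ 0.976, fail ≈ 0.024
```

For the sample reply, `pass` accounts for about 97.6% of the records, a strong imbalance worth keeping in mind when interpreting classification results.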

Train the classifier

URL path: /train

Train the classifier (for a specified domain) on a single text string.

BODY message:

{
    "domain": "the name of the domain to train",
    "words": "the text to learn from",
    "category": "the category associated with this text"
}
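Because `/train` accepts one text per request, a labelled dataset is sent as a series of calls. This sketch sends them one at a time and stops at the first failure, reporting how many examples were trained; it assumes a hypothetical `call(path, body)` POST helper, and `trainBatch` is an illustrative name. Sequential sending is a conservative choice, since concurrent training of a single domain is not addressed in this document.

```javascript
// Sketch: train a domain on many labelled texts, one /train request each.
// Resolves with the number of examples trained; rejects on the first failure.
async function trainBatch(call, domain, examples) {
  let trained = 0;
  for (const { words, category } of examples) {
    try {
      await call("/train", { domain, words, category });
    } catch (err) {
      throw new Error(
        `training stopped after ${trained} of ${examples.length} examples: ${err.message}`
      );
    }
    trained += 1;
  }
  return trained;
}
```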

Store metadata for a domain

URL path: /write_metadata

Store user-supplied data against a specified domain.

BODY message:

{
    "domain": "the name of the domain to associate the data with",
    "data": user-supplied data of any type (object, array, etc.)
}

Retrieve metadata for a domain

URL path: /read_metadata

Retrieve the (previously supplied) data for a specified domain.

BODY message:

{
    "domain": "the name of the domain to retrieve data for"
}

Return message:

{
    "data": user-supplied data of any type (object, array, etc.)
}
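The structured-data use case mentioned in the usage overview can be sketched as follows: numeric column statistics are computed at training time, stored with `/write_metadata`, and read back with `/read_metadata` before new records are classified. The column names, the `{ min, max, mean }` shape and the helper names are hypothetical, and `call(path, body)` is assumed to POST a JSON body and resolve with the parsed reply.

```javascript
// Sketch: per-column statistics for numeric data (assumes `rows` is non-empty).
function columnStats(rows, columns) {
  const stats = {};
  for (const col of columns) {
    const values = rows.map((r) => r[col]);
    stats[col] = {
      min: Math.min(...values),
      max: Math.max(...values),
      mean: values.reduce((a, b) => a + b, 0) / values.length,
    };
  }
  return stats;
}

// Store the statistics against the domain.
async function saveStats(call, domain, stats) {
  await call("/write_metadata", { domain, data: stats });
}

// Retrieve them later; the server returns them under "data".
async function loadStats(call, domain) {
  const reply = await call("/read_metadata", { domain });
  return reply.data;
}

// Made-up training records.
const rows = [
  { income: 30000, age: 25 },
  { income: 50000, age: 41 },
  { income: 70000, age: 58 },
];
console.log(columnStats(rows, ["income", "age"]));
// income: min 30000, max 70000, mean 50000; age: min 25, max 58, mean ≈ 41.33
```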

Classify an unknown text

URL path: /classify

Classify an unclassified text string against a specified domain.

BODY message:

{
    "domain": "the name of the domain to classify against",
    "words": "the text to classify"
}

Example return message:

{
    "results": [
        {
            "category": "reject",
            "posterior": 1
        },
        {
            "category": "accept",
            "posterior": 0.71
        }
    ]
}

Note: The number of categories returned is limited to a maximum of 10. In addition, any category with a posterior (classification certainty) below 0.005 is not returned.
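On the client side, a `/classify` reply usually needs reducing to a single decision. Given the note above, `results` holds at most 10 entries and could be empty if every category falls below 0.005. The posteriors in the example reply do not sum to 1, so this sketch compares them only by rank and against a threshold; `bestCategory` and its `minPosterior` default of 0.5 are hypothetical, application-level choices rather than server settings.

```javascript
// Sketch: pick the top category, or null if nothing clears a client-chosen threshold.
function bestCategory(reply, minPosterior = 0.5) {
  // The reply is described as sorted; sort a copy anyway to be safe.
  const results = [...(reply.results ?? [])].sort((a, b) => b.posterior - a.posterior);
  if (results.length === 0 || results[0].posterior < minPosterior) {
    return null; // no confident answer; e.g. route to manual review
  }
  return results[0].category;
}

// The example reply from above.
const reply = {
  results: [
    { category: "reject", posterior: 1 },
    { category: "accept", posterior: 0.71 },
  ],
};
console.log(bestCategory(reply)); // → "reject"
```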
