Starfruit Tagger

Starfruit is a system which suggests tags for news and sport articles using a classifier trained on past tagging choices by BBC journalists. The suggested tags are BBC Things.

You can request tags representing the primary topics of an article ("about" tags) using the topics method. This typically yields 3-5 tags per article.

Alternatively, you can request tags for every entity or theme mentioned in an article ("mentions" tags) using the mentions method. This typically yields 5-20 tags per article.

The tags method returns both topics and mentions tags and is faster than calling the topics and mentions methods one after the other.

Finally, the names method finds mentions of people, organisations, places, etc. without limiting or matching them to BBC Things.

The BBC Things supported by Starfruit are limited to those which have previously been used by journalists to tag news and sport articles. You can search for those matching a particular text string using the search method.

Note that there may sometimes be errors and omissions in Starfruit results. Feedback is welcome and can help us improve the system.

Topics API method

The Topics API accepts the following requests, where <text> is the text to tag or <uri> is a URI to tag.

The optional section parameter can be included when the section URL identifier is known, e.g. "/news/health".

The optional top parameter can be used to obtain a fixed number of results. However, this may include results with scores which fall below the normal cut-off threshold (0.5).

Note that this method is only suitable for news and sports articles. The mentions method should be used for other types of content and for text streams which combine different news stories.
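
As a rough illustration of how the optional parameters combine, the sketch below builds a query URL for a topics request. The base URL and the "text" and "uri" query parameter names are assumptions made for this example, since the actual request formats are not reproduced on this page; only "section" and "top" are named above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real Starfruit URL is not given in this document.
BASE_URL = "https://starfruit.example/topics"


def build_topics_url(text=None, uri=None, section=None, top=None):
    """Build a GET URL for a topics request from exactly one of text or uri.

    The "text" and "uri" parameter names are assumptions; "section" and
    "top" are the optional parameters described above.
    """
    if (text is None) == (uri is None):
        raise ValueError("supply exactly one of text or uri")
    params = {"text": text} if text is not None else {"uri": uri}
    if section is not None:
        params["section"] = section  # e.g. "/news/health"
    if top is not None:
        params["top"] = int(top)  # fixed number of results, may dip below 0.5
    return BASE_URL + "?" + urlencode(params)


url = build_topics_url(text="Hello world", section="/news/health", top=3)
```

Keeping the optional parameters out of the query unless they are set leaves the service free to apply its own defaults, including the normal 0.5 cut-off when top is omitted.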

Mentions API method

The Mentions API accepts the following requests, where <text> is the text to tag or <uri> is a URI to tag.

The optional type parameter controls whether entities or themes are returned. The default is to return both.

The optional scope parameter controls whether all results are returned or only those found in the title and leading sentences. The default is all.

The optional method parameter controls whether the standard or fast method is used. The default is standard. The fast method provides much lower latency but the results do not include confidence scores and are predominantly restricted to entities.

The optional threshold parameter is applied to the confidence scores. The default value is 0.35 which gives optimum precision. For optimum recall the threshold can be set to 0.0. Mentions with confidence scores below the threshold will be omitted. Note that the threshold is ignored when the fast method is used.
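
The threshold behaviour described above can be mimicked on the client side. The following is a minimal sketch, not the service's implementation: it assumes each mention has already been reduced to a dictionary with hypothetical "label" and "score" keys, which is not the actual JSON-LD response shape.

```python
DEFAULT_THRESHOLD = 0.35  # default stated above


def filter_mentions(mentions, threshold=DEFAULT_THRESHOLD, method="standard"):
    """Drop mentions whose confidence score is below the threshold.

    `mentions` is a list of dicts with hypothetical "label" and "score"
    keys. With the fast method there are no confidence scores, so the
    threshold is ignored and every mention is kept.
    """
    if method == "fast":
        return list(mentions)
    return [m for m in mentions if m["score"] >= threshold]


sample = [
    {"label": "A", "score": 0.9},
    {"label": "B", "score": 0.35},
    {"label": "C", "score": 0.1},
]
precise = filter_mentions(sample)                 # keeps A and B
everything = filter_mentions(sample, threshold=0.0)  # keeps A, B and C
```

A mention scoring exactly at the threshold is kept, since only scores below the threshold are omitted.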

Tags API method

The Tags API accepts the following requests, where <text> is the text to tag or <uri> is a URI to tag.

The optional parameters are as defined above.

Names API method

The Names API accepts the following requests, where <text> is the text to tag or <uri> is a URI to tag.

The optional scope parameter controls whether all results are returned or only those found in the title and leading sentences. The default is all.

Search API method

The Search API method finds BBC Things which match a specified text string. The method accepts the following requests, where <text> is the text to search for.

The optional type parameter controls whether entities or themes are returned. The default is to return both.

All methods

All methods accept GET or POST requests and return a JSON-LD response.
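
To illustrate the GET and POST options, the sketch below prepares (but does not send) a request with Python's standard library. The base URL, the path layout, the "text" parameter name and the form-encoded POST body are all assumptions for this example; the Accept header uses the registered JSON-LD media type, though whether the service requires it is not stated here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL: the real Starfruit host is not given in this document.
BASE_URL = "https://starfruit.example"


def make_request(method_name, params, http_method="GET"):
    """Prepare a urllib Request for one of the API methods without sending it.

    Assumes parameters travel in the query string for GET and as a
    form-encoded body for POST; the actual service may differ.
    """
    query = urlencode(params)
    url = f"{BASE_URL}/{method_name}"
    headers = {"Accept": "application/ld+json"}  # JSON-LD media type
    if http_method == "GET":
        return Request(f"{url}?{query}", headers=headers, method="GET")
    if http_method == "POST":
        headers["Content-Type"] = "application/x-www-form-urlencoded"
        return Request(url, data=query.encode("utf-8"), headers=headers, method="POST")
    raise ValueError("http_method must be GET or POST")


get_req = make_request("mentions", {"text": "Hello world", "threshold": 0.0})
post_req = make_request("mentions", {"text": "Hello world"}, http_method="POST")
```

POST is likely the more practical choice for full article text, since long query strings can exceed URL length limits. The prepared request could then be passed to urllib.request.urlopen and the body decoded with json.loads.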

For more information, please refer to the Starfruit Confluence page.

Contact

For more information please contact: chris.newell@bbc.co.uk