Content Discovery API for RAG Summaries and Insights

This tutorial walks through an implementation of retrieval-augmented generation (RAG) over Farcaster content using semantic search. We hope it sparks creativity and serves as a bridge for exploration, uniting our collective effort to separate the wheat from the chaff while celebrating how differently each of us decides what counts as wheat and what counts as chaff.


Semantic Search

We begin with your 'mbd API key, which you can get from this link. Once you have it, store it in the MBD_API_KEY environment variable, and let us make your first call to the 'mbd Content Discovery API. The get_items_with_metadata function below takes a text query and the number of top results to return (top_k). Under the hood, your query is converted into a 384-dimensional embedding with the all-MiniLM-L6-v2 model, and the top_k results are the casts in our vector database whose embeddings have the highest cosine similarity to that query embedding.
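
To make the ranking step concrete, here is a minimal sketch of top-k retrieval by cosine similarity in plain Python. It is an illustration under stated assumptions, not the 'mbd implementation: cosine_similarity and rank_top_k are hypothetical helpers, and the three-dimensional vectors are made-up stand-ins for real 384-dimensional all-MiniLM-L6-v2 embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_top_k(query_vec, corpus, top_k=10):
    """Return the top_k (item_id, score) pairs, highest similarity first.

    corpus maps item_id -> embedding vector with the same dimension as query_vec.
    This brute-force scan is for illustration only (hypothetical helper).
    """
    scored = [(item_id, cosine_similarity(query_vec, vec)) for item_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up 3-dimensional vectors standing in for 384-dimensional embeddings
corpus = {
    "cast_a": [0.9, 0.1, 0.0],
    "cast_b": [0.1, 0.9, 0.0],
    "cast_c": [0.7, 0.3, 0.1],
}
query_vec = [1.0, 0.0, 0.0]

for item_id, score in rank_top_k(query_vec, corpus, top_k=2):
    print(item_id, round(score, 3))
```

Cosine similarity compares the direction of two vectors rather than their length, so cast_a, which points almost the same way as the query vector, ranks first, followed by cast_c.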

Setting return_ai_labels to True in the request body returns classification labels for four categories: topics, emotions, sentiments, and moderation. Setting return_metadata to True returns the text of each post.

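import os

import requests

def get_items_with_metadata(query, top_k=10):
	"""Run a semantic search over Farcaster casts and return the list of result items."""
	url = 'https://api.mbd.xyz/v1/farcaster/casts/search/semantic'
	# Read the API key from the environment rather than hard-coding it
	MBD_API_KEY = os.getenv('MBD_API_KEY')
	if not MBD_API_KEY:
		raise RuntimeError('Set the MBD_API_KEY environment variable first')
	
	headers = {
		"accept": "application/json",
		"HTTP-Referer": "https://docs.mbd.xyz/",
		"X-Title": "mbd_docs",
		"content-type": "application/json",
		"x-api-key": MBD_API_KEY
	}
	
	# Ask for AI labels and post metadata (the cast text) alongside each result
	data = {
		'return_ai_labels': True,
		'return_metadata': True,
		'query': query,
		'top_k': top_k
	}
	
	# A timeout keeps the call from hanging indefinitely on network problems
	response = requests.post(url, headers=headers, json=data, timeout=30)
	
	if response.status_code == 200:
		# The list of results sits under the 'body' key of the JSON response
		return response.json()['body']
	
	# On failure, report the error and return an empty result list
	print('Failed:', response.status_code)
	print(response.text)
	return []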

The next step is to combine the text of each post with its annotations. This enriches the context we can later provide to the LLM of our choice when we make our RAG calls. The build_enriched_context function takes care of merging everything into tidy text, one block per post.

def build_enriched_context(items, max_items=10):
	"""Merge each post's text and AI labels into a single block of context text."""
	context = []
	for item in items[:max_items]:
		# Fall back gracefully if an item is missing its metadata or labels
		text = (item.get('metadata') or {}).get('text', 'No text found')
		labels = item.get('ai_labels') or {}
		
		labels_parts = []
		for category, values in labels.items():
			if values:  # Only include non-empty categories
				labels_parts.append(f"{category.capitalize()}: {', '.join(values)}")
		labels_str = "; ".join(labels_parts) if labels_parts else "No labels available"
		context.append(f"Text: {text}\nLabels: {labels_str}")
	# Separate consecutive posts with a blank line
	return "\n\n".join(context)

Let us test what we have built so far by running our first search with the query "Be the change you want to see in the world". We first call get_items_with_metadata, whose output feeds directly into the build_enriched_context function. Notice that we set the number of results to fetch and process to 100 for a larger context:

query = "Be the change you want to see in the world"
items = get_items_with_metadata(query, top_k=100)
context = build_enriched_context(items, max_items=100)

Here is what the beginning of context looks like when printed, covering the first three items:

"Text: Be the change that you wish to see in the world.\nLabels: Topics: diaries_daily_life; Emotion: joy, optimism\n\nText: Be the change that you wish to see in the world .\nLabels: Topics: diaries_daily_life; Emotion: joy, optimism\n\nText: Be the change that you wish to see in the world.\nLabels: Topics: diaries_daily_life; Emotion: joy, optimism\n\n

Notice that the sentiment and moderation labels are missing from the posts printed above. This is because, for accuracy, a label is only returned when its score is at least 0.7; a category with no label above that threshold comes back as an empty list, and build_enriched_context skips it. Also, since this is a famous quote, it is to be expected that many posts have more or less the same text.
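
The thresholding described above can be sketched as follows. This is only an illustration: it assumes a hypothetical raw-score shape of {category: {label: score}}, keep_confident_labels is a hypothetical helper, and the scores are invented for the example. The 'mbd API itself returns the already-filtered label lists.

```python
LABEL_THRESHOLD = 0.7  # minimum score for a label to be kept, as described above


def keep_confident_labels(scored_labels, threshold=LABEL_THRESHOLD):
    """Map each category to the labels whose score is at least `threshold`.

    scored_labels uses a hypothetical {category: {label: score}} shape.
    Categories with no label at or above the threshold end up as empty lists.
    """
    return {
        category: [label for label, score in scores.items() if score >= threshold]
        for category, scores in scored_labels.items()
    }


# Invented scores, for illustration only
raw = {
    "topics": {"diaries_daily_life": 0.91, "arts_culture": 0.22},
    "emotion": {"joy": 0.84, "optimism": 0.78, "anticipation": 0.31},
    "sentiment": {"positive": 0.55},
    "moderation": {"spam": 0.12},
}

print(keep_confident_labels(raw))
# {'topics': ['diaries_daily_life'], 'emotion': ['joy', 'optimism'], 'sentiment': [], 'moderation': []}
```

With these made-up scores, sentiment and moderation end up empty, which matches the shape of the labels seen in the output above.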

Summarising and Extracting Insights with RAG

Let us now put the constructed context to use by sending it to OpenAI through the Chat Completions API. Feel free to try different prompts for the system and user roles to suit your use case.

from openai import OpenAI

# The client would also read OPENAI_API_KEY on its own; passing it makes the dependency explicit
openai_api_key = os.environ.get('OPENAI_API_KEY')
openai_client = OpenAI(api_key=openai_api_key)
completion = openai_client.chat.completions.create(
	model="gpt-4",
	messages=[
		{"role": "system", "content": "You are an insightful and brief assistant. Please provide a summary of the social media posts and a description of the overall labels present."},
		{"role": "user", "content": f"The following social media posts are provided as context: {context} Based on these posts, can you please provide a summary and describe the overall labels?"}
	],
	temperature=0.7,
	max_tokens=150,  # caps the length of the generated summary
	top_p=1.0,
	frequency_penalty=0.0,
	presence_penalty=0.0
)
# The generated summary is in the first choice's message
print(completion.choices[0].message.content)
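
Since the prompts are the part you are most likely to iterate on, it can help to build the messages list in one place. The sketch below uses build_messages, a hypothetical helper (not part of the OpenAI SDK) that reproduces the prompts above. It makes no network call, so you can inspect exactly what would be sent.

```python
SYSTEM_PROMPT = (
    "You are an insightful and brief assistant. Please provide a summary of the "
    "social media posts and a description of the overall labels present."
)


def build_messages(context, system_prompt=SYSTEM_PROMPT):
    """Return a Chat Completions `messages` list asking for a summary of `context`."""
    user_prompt = (
        f"The following social media posts are provided as context: {context} "
        "Based on these posts, can you please provide a summary and describe the overall labels?"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages("Text: Be the change...\nLabels: Topics: diaries_daily_life")
print(messages[0]["role"], messages[1]["role"])  # system user
```

You would then pass messages=build_messages(context) to openai_client.chat.completions.create, and supply a different system_prompt to experiment with other instructions.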

This gives us the following output, summarising what the 100 posts are expressing along with an overview of their annotations:

The social media posts predominantly revolve around the theme of change, often paired with the famous quote by Mahatma Gandhi, "Be the change you wish to see in the world." The posts inspire optimism and joy, with many users expressing their desire to see positive transformations in the world or their own lives. The posts are largely categorized under the topic of 'diaries_daily_life', with a few posts falling under 'news_social_concern'. Some unique posts fall under 'business_entrepreneurs', 'arts_culture', and 'film_tv_video'. The emotions conveyed are predominantly optimism and joy, with some instances of love and anticipation, and one post expressing anger.

Interestingly, one post was labelled with anger. Spotting it by reading through the results one by one would have taken quite some effort. This highlights one benefit of RAG here: it can surface a divergence of opinion or, at the very least, point us to text that may have been misclassified. The cast in question is the following:

{'item_id': '0xfd032e3dd8560ebe86e285565bde8bdeab99630b', 'score': 0.629180193, 'ai_labels': {'topics': ['news_social_concern'], 'sentiment': [], 'emotion': ['anger', 'optimism'], 'moderation': []}, 'metadata': {'text': 'Donald Trump can change the world'}}
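
Outliers like this one can also be found without the LLM or eye inspection, by tallying labels directly over the items returned by get_items_with_metadata. The sketch below assumes the item shape shown above; count_labels and items_with_label are hypothetical helpers written for this example. The first sample item copies the cast above, and the second is a made-up example.

```python
from collections import Counter


def count_labels(items, category="emotion"):
    """Count how many items carry each label in the given category."""
    counts = Counter()
    for item in items:
        labels = item.get("ai_labels") or {}
        counts.update(labels.get(category, []))
    return counts


def items_with_label(items, label, category="emotion"):
    """Return the items whose labels in `category` include `label`."""
    return [
        item for item in items
        if label in (item.get("ai_labels") or {}).get(category, [])
    ]


items = [
    # The cast shown above
    {"item_id": "0xfd032e3dd8560ebe86e285565bde8bdeab99630b", "score": 0.629180193,
     "ai_labels": {"topics": ["news_social_concern"], "sentiment": [],
                   "emotion": ["anger", "optimism"], "moderation": []},
     "metadata": {"text": "Donald Trump can change the world"}},
    # A made-up item, for illustration only
    {"item_id": "example", "score": 0.9,
     "ai_labels": {"topics": ["diaries_daily_life"], "emotion": ["joy", "optimism"]},
     "metadata": {"text": "Be the change that you wish to see in the world."}},
]

print(count_labels(items).most_common())
for item in items_with_label(items, "anger"):
    print(item["metadata"]["text"])  # Donald Trump can change the world
```

Running the same two helpers over the full list of 100 items gives exact label counts to compare against the LLM's description of the overall labels.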