Categories
Articles Blog Engineering

Visual Prompting: LLMs vs. Image Generation

We’ve been trying a lot of different things in Project Cyborg, our quest to create the DevOps bot. The technology around AI is complicated and evolving quickly. Once you move away from chatbots and start building more complicated things, like embeddings and agents, you have to hold a lot of information in your mind. It would be nice to visualize this info.

Visual prompting is what we were looking for, and it’s more complicated than I expected.

Visual Prompting for Image Generation

My AI work has been almost exclusively with LLMs and text generation; I haven’t had much need for image generation. The tech is really interesting, but not very useful for creating a DevOps bot. However, I did hear about Chainner, a visual composer for image generation. Its interface will be familiar to you if you’ve worked with a node-based shader editor before.

Example of Chainner for images

This is a really cool way of working with image generation models. Instead of working in Python to create images, you can work on them visually. Some things just make more sense mapped out visually. This could help us mentally simplify some of the complex tasks we’re dealing with. This made me wonder: could I modify Chainner to work with LLMs?

Chainner for LLMs

Chainner doesn’t have anything built-in for LLMs. I’m not really surprised. However, it is well designed, and so it wasn’t very difficult to see how I would implement it myself.

I started with a simple LLM node. Here’s a sample from the code:

@NodeFactory.register("chainner:llm:agent_node")
class PromptNode(NodeBase):
    def __init__(self):
        super().__init__()
        self.description = "This is a node for making LLM prompts through an agent"
        self.inputs = [
            TextInput(
                "Prompt",
            ),
            TextInput(
                "Prefix",
            ).make_optional(),
            TextInput(
                "Suffix",
            ).make_optional(),
            TextInput(
                "Separator",
                default="####",
            ).make_optional(),
            EnumInput(
                LLMOptions,
                "LLM",
                option_labels={k: k.value for k in LLMOptions},
            ),
            BoolInput("Use Google"),
            BoolInput("Use Vectorstore"),
            DirectoryInput("Vectorstore Directory", has_handle=True).make_optional(),
            EnumInput(
                EmbeddingsOptions,
                "Embeddings Platform",
                option_labels={k: k.value for k in EmbeddingsOptions},
            ),
        ]
        self.outputs = [
            TextOutput(
                "Result",
            )
        ]

        self.category = LLMCategory
        self.name = "LLM Agent"
        self.icon = "MdCalculate"
        self.sub = "Language Models"
LLM node example

With that working, I moved on to creating a node for a Vectorstore (aka embeddings).

@NodeFactory.register("chainner:llm:load_vectorstore")
class VectorstoreNode(NodeBase):
    def __init__(self):
        super().__init__()
        self.description = "This is a node for loading a vectorstore"
        self.inputs = [
            EnumInput(
                EmbeddingsOptions,
                "Embeddings Platform",
                option_labels={k: k.value for k in EmbeddingsOptions},
            ),
            DirectoryInput("Vectorstore Directory", has_handle=True),
        ]
        self.outputs = [
            TextOutput(
                "Vectorstore",
            )
        ]

        self.category = LLMCategory
        self.name = "Load Vectorstore"
        self.icon = "MdCalculate"
        self.sub = "Language Models"
A vectorstore node

You get the idea of the workflow. At the end of my experimenting, I ended up with a sample graph that looked like this:

An agent example

Roadblocks for Chainner

It’s about here that I had to abandon the experiment.

It was looking cool, and I liked the concept. There was only one problem: it wasn’t going to work, not as Chainner was designed. I don’t want to get too deep into the weeds, but there’s a dependency issue. We’re using self-hosted embeddings on some of our vectorstores in Project Cyborg, which means we’re running open-source AI models for some of the embeddings. To do this, we spin up spot instances on Lambda Labs. One of the Python libraries you need to run self-hosted embeddings only works on Unix-based systems (shoutout to a file-path faux pas). That’s not a problem if you’re working in VSCode or on a command line. It is a problem when you need to run a GUI app on Windows.

There are also some other problems with the visual scripting in general that stopped me from pursuing it further.

Other Visual Prompting Solutions

The day after I decided to stop pursuing the Chainner option, Langflow was released.

Look familiar?

Langflow is a visual interface for Langchain. So, basically exactly what I was doing. It is very new and under development, but it does some things very well. If you’re looking to create a simple app using an agent, and you don’t know Python, Langflow gives you an option. It doesn’t currently support exporting to code, so it has limited use in production. You could treat it as interactive outlining.

It does highlight the biggest problem currently with visual prompt engineering: you still need a strong understanding of the systems at play. To even use Langflow, you have to understand what a zero-shot agent is, how it interacts with an LLM chain, and how you would create tools and supply them to the agent. You don’t really gain much in terms of complexity reduction, and you lose a lot in terms of customization. Unless you customize your nodes to expose every single parameter that the underlying API supplies, you end up creating tons of separate, very similar nodes. For LLMs, visual graphs are only really useful for small tasks.

Ultimately, the existing solutions serve a purpose, but they don’t really reduce the cognitive load of working with LLMs. You still need to know all of the same things, you just might be able to look at it in a different light. With everything changing so fast, it makes more sense for us to stick with good, old-fashioned programming. Hopefully, visual prompting will catch up, and be useful for more than image processing and chatbots.


How to take the brain out of the box: AI Agents

An AI Agent at work answers questions ChatGPT can’t

Working with LLMs is complicated. For simple setups, like general-purpose chatbots (ChatGPT) or classification, you have few moving pieces. But when it’s time to get serious work done, you have to coax your model into doing a lot more. We’re working on Project Cyborg, a DevOps bot that can identify security flaws, find cost-saving opportunities in your cloud deployments, and help you follow best practices. What we need is an AI Agent.

Why do we need an agent?

Let’s start at the base of modern AI: the Large Language Model (LLM).

LLMs work on prediction. Give an LLM a prompt, and it will try to predict what the right answer is (a completion). Everything we do with AI and text generation is powered by LLMs. GPT-3, GPT-3.5 and GPT-4 are all LLMs. The problem is that they are limited to their initial training data. These models cannot access the outside world. They are a brain in a box.

You have a few different options depending on your use case. You can use fine-tuning, where the model undergoes another training stage. Fine-tuning is excellent and has a lot of use cases (like classification), but it still doesn’t let you use live data. You can also use embeddings. These let you extend the context length (memory) of your AI so that it can process more data at once. Embeddings help a lot, but they don’t help the LLM take action in the outside world.

The other option is to use an AI agent.

What is an Agent?

Here’s the simplest definition:

An AI agent is powered by an LLM, and it uses tools (like Google Search, a calculator, or a vectorstore) to interact with the outside world.

That way, you can take advantage of the communication skills of an LLM, and also work on real-world problems. Without an agent, LLMs are limited to things like chatbots, classification and generative text. With agents, you can have a bot that can pull live information and make changes in the world. You’re giving your brain in a box a body.

How can we do this? Well, I’m going to be using Langchain, which comes with multiple agent implementations. These are based on ReAct, a technique outlined in a paper by researchers at Princeton and Google. The details are complicated, but the implementation is fairly simple: you tell your AI model to respond in a certain style. You ask it to think things through step by step, and then take actions using tools. LLMs can’t use tools by default; left alone, they’ll just make up what a tool would have returned. For example, if you give one access to Google, it will pretend to run a Google search. That’s where you step in: you set up the tools so that an actual Google search gets made, and then feed the real results back into the LLM.
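Stripped of the details, that loop looks something like this. This is a toy illustration, not Langchain’s actual implementation: the stubbed-out `fake_llm` and the single `search` tool are stand-ins for a real model and real tools.

```python
def fake_llm(prompt):
    # A real agent would call an LLM here. This stub asks for a search,
    # then gives a final answer once it sees an observation in the prompt.
    if "Observation:" in prompt:
        return "Final Answer: 42"
    return "Action: search\nAction Input: meaning of life"

# The tool registry: names the model can ask for, mapped to real functions
tools = {"search": lambda query: "The answer is 42."}

def run_agent(question, llm=fake_llm, max_steps=5):
    prompt = f"Question: {question}"
    for _ in range(max_steps):
        response = llm(prompt)
        if response.startswith("Final Answer:"):
            return response.split("Final Answer:", 1)[1].strip()
        # Parse the tool call the model asked for and actually run it
        action = response.split("Action:")[1].split("\n")[0].strip()
        action_input = response.split("Action Input:")[1].strip()
        observation = tools[action](action_input)
        # Feed the real result back into the next prompt
        prompt += f"\n{response}\nObservation: {observation}"
    return None
```

The key point is the last line of the loop: the tool’s real output is appended as an “Observation” so the model reasons over facts it could never have predicted on its own.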

The results can seem magical.

Example: AI Agent with Google Search

Let’s start with a simple agent that has access to two tools.

from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.llms import OpenAI
# We'll use an OpenAI model (Davinci by default) as the "brain" of our agent
llm = OpenAI(temperature=0)

# We'll provide two tools to the agent to solve problems: Google, and a tool for handling math
tools = load_tools(["google-search", "llm-math"], llm=llm)

# This agent is based on the ReAct paper
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)

while True:
    prompt = input("What would you like the agent to tell you (press CTRL+C to quit)?")
    agent(prompt)

These agent examples look the best in video form:

Example: AI Agent with Access to External Documents (Vectorstore)

Here’s another example that uses a tool to pull information about Azure. I converted the official Azure documentation into a Vectorstore (aka embeddings). This is being used by Project Cyborg so that our DevOps bot can understand best practices and the capabilities of Azure.

from langchain.agents import Tool

tools = [
    Tool(
        name="Azure QA System",
        func=chain,
        description="useful for when you need to answer questions about Azure. Input should be a fully formed question.",
        return_direct=False,
    )
]

Here it is in action:

AI Agents make LLMs useful

Chatbots are cool, and they are very useful for many things. They can’t do everything, though. Most of the time, your AI will need access to live info, and you’d like it to be able to do things for you, not just be a very smart brain that can talk. Agents can do that for you. We’re figuring out how we can use them here at Electric Pipelines. If you want help figuring out how agents could help your business, let us know! We’d be happy to talk.


What does AI Embedding have to do with Devops?

AI embeddings are powerful. We’re working on Project Cyborg, a project to create a DevOps bot.

There are a lot of steps to get there. Our bot should be able to analyze real-world systems and find out where we could implement best practices. It should be able to look at security systems and cloud deployments to help us better serve our customers.

To that end, our bot needs to know what best practices are. All of the documentation for Azure and AWS is available for free, and it’s searchable. However, online documentation doesn’t help with problem solving. It only helps if you have someone capable running a search. We want to be able to search based on our problems and real-world deployments. The solution: embeddings.

AI Embeddings

Here’s the technical definition: Text embeddings measure the relatedness of text strings.

Let’s talk application: embeddings let us compare the meaning of sentences. Instead of needing to know the exact right words for what you’re searching for, you can search more generally. Embedding enables that.

Embeddings work by converting text into a list of numbers. Then, those numbers can be compared to one another later, and similarities can be found that a human couldn’t detect. Converting text to embeddings is not terribly difficult. OpenAI offers an embedding model that runs off of Ada, their cheapest model. Ada has a problem, though.
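As a toy illustration of the comparison step, relatedness is typically measured with cosine similarity between the number lists. The three-number vectors below are made up for the example; Ada’s real embeddings are 1,536 numbers long.

```python
import math

def cosine_similarity(a, b):
    # 1.0 means the vectors point the same way; near 0 means unrelated
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up toy vectors standing in for real embeddings
dog = [0.9, 0.1, 0.0]
puppy = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# "dog" and "puppy" score far closer to each other than "dog" and "invoice"
dog_vs_puppy = cosine_similarity(dog, puppy)
dog_vs_invoice = cosine_similarity(dog, invoice)
```

This is the comparison a human couldn’t do by eye: the numbers encode meaning, so similar sentences end up with similar vectors.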

Ada has a memory problem

Ada is a powerful model, and if it can keep track of what it’s supposed to be doing, it does excellent work. However, it has a short context length, which is just a fancy way of saying it has Alzheimer’s. You can’t give Ada a long document and have it remember all of it; it can only hold a few sentences in its memory at a time. More advanced models, like Davinci, have much better memory. We need a way to get Ada to remember more.

Langchain

We’ve been using Langchain for a few different parts of Project Cyborg, and it has great tooling in place for embeddings as well. It has tools to split documents up into shorter chunks, so that Ada can process them one at a time. It can then store these chunks together in a document store, which acts as long-term memory for Ada. You can embed large documents and collections of documents together, and then access them later.
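That splitting step can be sketched like this. This is a simplified stand-in: Langchain’s actual text splitters are smarter about separators and sentence boundaries, but the sliding-window idea is the same.

```python
def split_text(text, chunk_size=1000, overlap=100):
    # Slide a window across the text so each chunk fits in Ada's context,
    # overlapping a little so no passage gets cut off without context
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Each chunk then gets embedded separately, and the store remembers which document and position it came from.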

Breaking documents up into smaller pieces lets you search your store for just the chunks relevant to a question. Let’s go over some examples.

You can see the document (data-factory.txt) and the different chunks (5076, 234, 5536) it’s pulling from for the answer
In this case, it pulls from multiple different documents to formulate an answer

Here you can see that we ask a question. An AI model ingests our question and then checks its long-term memory (our document store) for the answer. If it finds the answer, it replies with it and references where that answer came from.

Fine-Tuning vs. Embedding

Embeddings differ from fine-tuning in a few ways. Most relevant, embeddings are cheaper and easier to run, both for in-house models and for OpenAI models. Once you’ve saved your documents into a store, you can access them using only a few tokens and with off-the-shelf models. The downside comes in the initial embedding: converting a lot of documents to an embedded format, like we needed to, takes millions of tokens. Even at low rates, that can add up.

Fine-tuned usage is significantly more expensive across the board

On the flip side, fine-tuning will typically use far fewer tokens than embedding, so even though the cost per token is much higher, it can be cheaper to fine-tune a model than to build out an embedded document store. However, running a fine-tuned model is expensive: if you use OpenAI, the cost per token is 4x the price of an off-the-shelf model. So, pick your poison. Some applications are better served by paying the higher up-front cost of embedding in exchange for cheaper processing later.
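That trade-off can be put in back-of-the-envelope form. The per-token rates below are HYPOTHETICAL placeholders, not OpenAI’s actual prices; plug in your provider’s current pricing before trusting any numbers. The only figure taken from the text is the 4x inference multiplier for fine-tuned models.

```python
# HYPOTHETICAL per-token rates for illustration only
EMBED_RATE = 0.0000004      # $/token to embed documents up front
BASE_RATE = 0.000002        # $/token to query an off-the-shelf model
TUNED_RATE = 4 * BASE_RATE  # fine-tuned inference at ~4x, per the text

def embedding_total(corpus_tokens, queries, tokens_per_query):
    # Big one-time embedding cost, then cheap off-the-shelf queries
    return corpus_tokens * EMBED_RATE + queries * tokens_per_query * BASE_RATE

def fine_tune_total(training_cost, queries, tokens_per_query):
    # Smaller up-front cost, but every query pays the 4x inference rate
    return training_cost + queries * tokens_per_query * TUNED_RATE
```

With numbers like these, fine-tuning wins at low query volume and embeddings win once query volume grows, which is the “pick your poison” above.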


Using Classification to Create an AI Bot to Scrape the News

Classification

We’re hard at work on Project Cyborg, our DevOps bot designed to enhance our team to provide 10x the DevOps services per person. Building a bot like this takes a lot of pieces working in concert. To that end, we need a step in our chain to classify requests: does a query need to go to our Containerization model or our Security model? The solution: classification. A model that can figure out what kind of prompt it has been given. Then, we can pass along the prompt to the correct model. To test out the options on OpenAI for classification, I trained a model to determine if news articles would be relevant to our business or not.

Google News

I started by pulling down the articles from Google News.

from GoogleNews import GoogleNews

start_date = '01-01-2023'
end_date = '02-02-2023'
search_term = "Topic:Technology"

googlenews = GoogleNews(start=start_date, end=end_date)
googlenews.search(search_term)
result = googlenews.result()

This way, I can pull down a list of Google News articles with a certain search term within a date range. Google News does not do a good job of staying on topic by itself.

The second result Google News returns here already moves away from what we searched for

So, once I had this list of articles with full text and summaries, I loaded them into a dataframe using Pandas and output that to an Excel sheet.

import pandas as pd

for i in range(2, 20):
    # Grab each additional page of results (result() accumulates across pages)
    googlenews.getpage(i)

# Load the accumulated articles into a Pandas DataFrame
result = googlenews.result()
df = pd.DataFrame(result)
An example of some of the Google News data we pulled in

Fine-Tuning for Classification

Then comes the human effort. I need to teach the bot what articles I consider relevant to our business. So, I took the Excel sheet and added another column, Relevancy.

The updated spreadsheet had a column for relevancy

I then manually ran down a lot of articles, looked at titles, summaries and sometimes the full text, and marked them as relevant or irrelevant.

Then I took the information I had for each article (title, summary, and full text) and combined it into one column; that forms the prompt for the fine-tuning. The completion is taken from the relevancy column. I put these two columns into a CSV file, which will be the training set for our fine-tuned model.
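The column-combining step might look something like this in Pandas. The column names and the single sample row are made up for illustration; OpenAI’s fine-tuning format just needs `prompt` and `completion` columns.

```python
import pandas as pd

# Made-up sample row standing in for the labeled spreadsheet
df = pd.DataFrame({
    "title": ["New Kubernetes release"],
    "summary": ["K8s 1.26 ships with new features..."],
    "full_text": ["The Kubernetes project today announced..."],
    "relevancy": ["relevant"],
})

training = pd.DataFrame({
    # One combined text field becomes the prompt...
    "prompt": df["title"] + "\n" + df["summary"] + "\n" + df["full_text"],
    # ...and the human relevancy label becomes the completion
    "completion": df["relevancy"],
})
training.to_csv("training_set.csv", index=False)
```

That CSV is what goes into OpenAI’s data preparation tool in the next step.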

Once I had the dataset, it was time to train the model. I ran the csv through OpenAI’s data preparation tool.

OpenAI’s fine-tuning data preparation tool makes sure your dataset is properly formatted for fine-tuning

I got out our training dataset and our validation dataset. With those in hand, it was time to train a model. I selected Ada, the least-advanced GPT-3 model available. It’s not close to ChatGPT, but it is good for simple things like classification. A few cents and half an hour later, I had a fine-tuned model.

Results

I can now integrate the fine-tuned model into my Google News scraping app. Now, it can pull down articles from a search term, and automatically determine if they are relevant or not. The relevant ones go into a spreadsheet to be viewed later. The app dynamically builds prompts that match the training data, and so I end up with a spreadsheet with only relevant articles.

A table with Google News articles only relevant to our company
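Plugging the model into the scraper might look roughly like this, assuming the pre-1.0 `openai` Python library that was current at the time. The model name is a placeholder (yours comes from the fine-tuning job output), and the prompt format has to match your training data exactly.

```python
# Placeholder: use the model name your fine-tuning job actually produced
FINE_TUNED_MODEL = "ada:ft-your-org-2023-02-02"

def build_prompt(title, summary, full_text):
    # Must mirror the training prompts, including the separator that
    # OpenAI's data preparation tool appends to each prompt
    return f"{title}\n{summary}\n{full_text}\n\n###\n\n"

def classify(title, summary, full_text):
    # Imported here so build_prompt works without the openai package installed
    import openai
    response = openai.Completion.create(
        model=FINE_TUNED_MODEL,
        prompt=build_prompt(title, summary, full_text),
        max_tokens=1,
        temperature=0,
    )
    # The single generated token is the relevancy label
    return response["choices"][0]["text"].strip()
```

Because the prompts are built the same way as the training rows, the model only ever has to emit one short label per article.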
