Visual Prompting: LLMs vs. Image Generation

We’ve been trying a lot of different things in Project Cyborg, our quest to create the DevOps bot. The technology around AI is complicated and evolving quickly. Once you move beyond chatbots and start building more complicated things, like working with embeddings and agents, you have to hold a lot of information in your head. It would be nice to visualize that information.

Visual prompting is what we were looking for, and it’s more complicated than I expected.

Visual Prompting for Image Generation

My AI work so far has been almost exclusively with LLMs and text generation. I haven’t had much need for image generation. The tech is really interesting, but not very useful for creating a DevOps bot. However, I did hear about Chainner, a visual composer for image generation. Its interface will be familiar if you’ve ever worked with a node-based shader editor.

Example of Chainner’s node graph for image generation

This is a really cool way of working with image generation models. Instead of writing Python to create images, you can compose them visually. Some things just make more sense mapped out visually. This could help us mentally simplify some of the complex tasks we’re dealing with. It made me wonder: could I modify Chainner to work with LLMs?

Chainner for LLMs

Chainner doesn’t have anything built in for LLMs, which isn’t surprising. However, it is well designed, so it wasn’t very difficult to see how I could add that support myself.

I started with a simple LLM node. Here’s a sample from the code:

@NodeFactory.register("chainner:llm:agent_node")
class PromptNode(NodeBase):
    def __init__(self):
        super().__init__()
        self.description = "This is a node for making LLM prompts through an agent"
        self.inputs = [
            TextInput(
                "Prompt",
            ),
            TextInput(
                "Prefix",
            ).make_optional(),
            TextInput(
                "Suffix",
            ).make_optional(),
            TextInput(
                "Separator",
                default="####",
            ).make_optional(),
            EnumInput(
                LLMOptions,
                "LLM",
                option_labels={k: k.value for k in LLMOptions},
            ),
            BoolInput("Use Google"),
            BoolInput("Use Vectorstore"),
            DirectoryInput("Vectorstore Directory", has_handle=True).make_optional(),
            EnumInput(
                EmbeddingsOptions,
                "Embeddings Platform",
                option_labels={k: k.value for k in EmbeddingsOptions},
            ),
        ]
        self.outputs = [
            TextOutput(
                "Result",
            )
        ]

        self.category = LLMCategory
        self.name = "LLM Agent"
        self.icon = "MdCalculate"
        self.sub = "Language Models"
LLM node example
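
The constructor above only declares the node’s inputs and outputs; the actual call to the model happens elsewhere in the node. As a rough illustration (not the post’s actual code), here’s what that logic might look like as a standalone helper, assuming the early-2023 LangChain API (OpenAI, initialize_agent, load_tools) and an OPENAI_API_KEY in the environment. run_llm_agent and its exact arguments are my invention.

# Illustrative sketch only -- not code from the post. Shows the kind of
# logic the LLM Agent node would run when executed.
from typing import Optional

from langchain.agents import initialize_agent, load_tools
from langchain.llms import OpenAI


def run_llm_agent(
    prompt: str,
    prefix: Optional[str] = None,
    suffix: Optional[str] = None,
    separator: str = "####",
    use_google: bool = False,
) -> str:
    # Stitch the optional prefix/suffix around the prompt with the separator.
    parts = []
    if prefix:
        parts += [prefix, separator]
    parts.append(prompt)
    if suffix:
        parts += [separator, suffix]
    full_prompt = "\n".join(parts)

    llm = OpenAI(temperature=0)  # the LLM enum input would pick the backend

    if use_google:
        # Hand the agent a web-search tool (needs Google API credentials).
        tools = load_tools(["google-search"])
        agent = initialize_agent(
            tools, llm, agent="zero-shot-react-description", verbose=True
        )
        return agent.run(full_prompt)

    # No tools requested: call the model directly.
    return llm(full_prompt)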

With that working, I moved on to creating a node for a vectorstore (i.e., an embeddings store).

@NodeFactory.register("chainner:llm:load_vectorstore")
class VectorstoreNode(NodeBase):
    def __init__(self):
        super().__init__()
        self.description = "This is a node for loading a vectorstore"
        self.inputs = [
            EnumInput(
                EmbeddingsOptions,
                "Embeddings Platform",
                option_labels={k: k.value for k in EmbeddingsOptions},
            ),
            DirectoryInput("Vectorstore Directory", has_handle=True),
        ]
        self.outputs = [
            TextOutput(
                "Vectorstore",
            )
        ]

        self.category = LLMCategory
        self.name = "Load Vectorstore"
        self.icon = "MdCalculate"
        self.sub = "Language Models"
A vectorstore node
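
Again, the snippet only shows the node’s wiring. One way the load step could work, assuming LangChain’s Chroma wrapper and OpenAI embeddings (the real node would switch on the EmbeddingsOptions enum instead), is sketched below; load_vectorstore is a hypothetical helper, not code from the post.

# Illustrative sketch only: load a persisted vectorstore from a directory
# using LangChain's Chroma wrapper with OpenAI embeddings.
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma


def load_vectorstore(directory: str) -> Chroma:
    embeddings = OpenAIEmbeddings()
    # Chroma persists to a directory on disk; point it at the node's input.
    return Chroma(persist_directory=directory, embedding_function=embeddings)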

You get the idea of the workflow. At the end of my experimenting, I ended up with a sample graph that looked like this:

An agent example

Roadblocks for Chainner

It’s about here that I had to abandon the experiment.

It was looking cool, and I liked the concept. There was only one problem: it wasn’t going to work, not as Chainner is designed. I don’t want to get too deep into the weeds, but there’s a dependency issue. We’re using self-hosted embeddings for some of our vectorstores in Project Cyborg, which means running open-source models for those embeddings. To do this, we spin up spot instances on Lambda Labs. One of the Python libraries you need to run self-hosted embeddings only works on Unix-based systems (shoutout to the file-path faux pas). That’s not a problem if you’re working in VS Code or a command line. It is a problem when you need to run an app with a GUI on Windows.
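
For illustration only, since the post doesn’t name the offending library: with a Unix-only dependency in the stack, about the best a Windows GUI build can do is detect the platform and fail early with a clear message.

# Illustrative sketch, not from the post: fail fast on Windows instead of
# crashing deep inside a Unix-only embeddings dependency.
import sys

if sys.platform.startswith("win"):
    raise RuntimeError(
        "Self-hosted embeddings require a Unix-like OS; "
        "run this backend on Linux or macOS instead."
    )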

There are also some other problems with visual scripting in general that stopped me from pursuing it further.

Other Visual Prompting Solutions

The day after I decided to stop pursuing the Chainner option, Langflow was released.

Look familiar?

Langflow is a visual interface for Langchain. So, basically, exactly what I was doing. It is very new and still under development, but it does some things very well. If you’re looking to create a simple app using an agent and you don’t know Python, Langflow gives you an option. It doesn’t currently support exporting to code, though, so it has limited use in production. You could treat it as interactive outlining.

It does highlight the biggest problem with visual prompt engineering right now: you still need a strong understanding of the systems at play. To even use Langflow, you have to understand what a zero-shot agent is, how it interacts with an LLM chain, and how you would create tools and supply them to the agent. You don’t really gain a lot in terms of complexity reduction, and you lose a lot in terms of customization. Unless you customize your nodes so that they expose every single parameter the underlying API supports, you end up creating tons of separate, very similar nodes. For LLMs, visual graphs are really only useful for small tasks.
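
To make that concrete, here’s roughly the LangChain code that a minimal Langflow agent graph corresponds to, written against the LangChain API as it existed at the time (a sketch, not anything exported from Langflow). Every concept in it, the LLM, the chain behind the tool, the tool descriptions, the zero-shot agent, is something the visual graph still requires you to understand.

# Sketch of the code a simple agent graph maps onto -- you still have to
# know what each of these pieces does to wire the nodes up visually.
from langchain.agents import Tool, initialize_agent
from langchain.chains import LLMMathChain
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)

# "Tools" are just named functions the agent may decide to call,
# chosen purely from their descriptions.
math_chain = LLMMathChain(llm=llm)
tools = [
    Tool(
        name="Calculator",
        func=math_chain.run,
        description="useful for answering math questions",
    ),
]

# The zero-shot ReAct agent reasons step by step over which tool to use.
agent = initialize_agent(tools, llm, agent="zero-shot-react-description")
print(agent.run("What is 37 raised to the 0.5 power?"))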

Ultimately, the existing solutions serve a purpose, but they don’t really reduce the cognitive load of working with LLMs. You still need to know all of the same things; you just get to look at them in a different light. With everything changing so fast, it makes more sense for us to stick with good, old-fashioned programming. Hopefully, visual prompting will catch up and become useful for more than image processing and chatbots.
