I find it interesting that their entity extraction method for building a knowledge graph does not use or require one of the 'in-vogue' extraction libraries like instructor, Marvin, or Guardrails (all of which build on pydantic). They just tell the LLM to list graph nodes and edges, do some basic delimiter parsing, and load the result right into a networkx graph [1]. Is this because GPT-4 and the like have become very reliable at following specific formatting instructions, like a particular JSON schema?
It looks like they just provide a number of examples in the prompt that follow the schema they want [2].
[1] https://github.com/microsoft/graphrag/blob/main/graphrag/ind...
[2] https://github.com/microsoft/graphrag/blob/main/graphrag/ind...
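For anyone curious, the general shape of that kind of parsing is something like the sketch below; the record format and delimiter are illustrative placeholders, not necessarily what the linked code uses:

    import networkx as nx

    # Hypothetical LLM output: one record per line, fields separated by a delimiter.
    # The real graphrag format may differ; this just shows the shape of the idea.
    llm_output_lines = [
        "entity|Apple|organization|Consumer electronics company",
        "entity|Tim Cook|person|CEO of Apple",
        "relationship|Tim Cook|Apple|leads the company",
    ]

    FIELD_DELIM = "|"

    graph = nx.Graph()
    for line in llm_output_lines:
        parts = [p.strip() for p in line.split(FIELD_DELIM)]
        if parts[0] == "entity" and len(parts) >= 4:
            name, entity_type, description = parts[1], parts[2], parts[3]
            graph.add_node(name, type=entity_type, description=description)
        elif parts[0] == "relationship" and len(parts) >= 4:
            source, target, description = parts[1], parts[2], parts[3]
            graph.add_edge(source, target, description=description)

    print(graph.nodes(data=True))
    print(graph.edges(data=True))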
Regardless of how reliable LLMs are, they will never be perfect, so GraphRAG will error when the RNG gods feel like ruining your day. Use pydantic.
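A minimal sketch of what that guard looks like (general pydantic pattern, not GraphRAG's actual code):

    from pydantic import BaseModel, ValidationError

    class Entity(BaseModel):
        name: str
        type: str
        description: str

    def parse_entities(raw_records: list[dict]) -> list[Entity]:
        # Validate records parsed out of the LLM response.
        entities = []
        for record in raw_records:
            try:
                entities.append(Entity(**record))
            except ValidationError as exc:
                # In a real pipeline you'd likely re-prompt or skip instead of raising.
                raise ValueError(f"Model returned a malformed record: {record}") from exc
        return entities

    parse_entities([{"name": "Apple", "type": "organization", "description": "Makes phones"}])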
Was looking forward to this. Haven’t looked at the code yet, but yeah - it’s a bit of a red flag that it doesn’t use an established parsing and validation library.
I mean, don't throw the baby out with the bathwater. It's research code: take the good, chuck the bad.
Do you have any insight into whether these Python libraries use native extensions for the heavy lifting? The current situation involves layers upon layers of parsing and serializing/deserializing in Python.
Sorry, what kind of extensions do you mean? I'm not aware of any; this stuff is just boilerplate.
Using a library like Instructor adds a significant token overhead. While that overhead can be justified, if their use case performs fine without it, I don’t see any reason to add such a dependency.
That's what I was wondering: how does the token overhead of function calling with, say, Instructor compare with a regular chat response that includes few-shot examples of the schema?
Maybe Instructor makes the most sense only when you're working with potentially malicious user data.
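For reference, the Instructor route looks roughly like this (API shape as of instructor 1.x; the model name and schema are just examples). The schema lives in a pydantic model and is sent as a tool/function definition, with validation and retries on top, which is where the extra tokens come from:

    import instructor
    from openai import OpenAI
    from pydantic import BaseModel

    class Relationship(BaseModel):
        source: str
        target: str
        description: str

    class ExtractionResult(BaseModel):
        entities: list[str]
        relationships: list[Relationship]

    # instructor wraps the OpenAI client and retries until the response
    # validates against the response_model.
    client = instructor.from_openai(OpenAI())

    result = client.chat.completions.create(
        model="gpt-4o",  # example model name
        response_model=ExtractionResult,
        messages=[{"role": "user", "content": "Extract entities and relationships from: ..."}],
    )
    print(result.model_dump())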
Anecdotally, I found GPT-3.5 was decently reliable at returning structured data, but it took some prompt tuning; GPT-4 has never returned invalid JSON when asked for it, though it doesn't always return the same fields or field names. It performs better when the schema is explicit and included in the prompt, but that can add a non-trivial number of tokens to each request.
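To put a rough number on that overhead, you can count the schema's tokens with tiktoken (the schema here is just an example):

    import json
    import tiktoken

    # Example schema you might paste into the prompt; yours would differ.
    schema = {
        "type": "object",
        "properties": {
            "entities": {"type": "array", "items": {"type": "string"}},
            "relationships": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "source": {"type": "string"},
                        "target": {"type": "string"},
                        "description": {"type": "string"},
                    },
                },
            },
        },
    }

    enc = tiktoken.encoding_for_model("gpt-4")
    print(len(enc.encode(json.dumps(schema))), "prompt tokens spent on the schema alone")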
LLMs are very good at knowledge extraction and following instructions, especially if you provide examples. What you see in the prompt you linked is an example of in-context learning, in particular a method called Few-Shot prompting [1]. You provide the model some specific examples of input and desired output, and it will follow the example as best it can. Which, with the latest frontier models, is pretty darn well.
[1] https://www.promptingguide.ai/techniques/fewshot
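Concretely, a few-shot extraction prompt looks something like this; the examples and output format below are made up for illustration, not taken from the GraphRAG prompt:

    # Toy few-shot prompt for entity/relationship extraction.
    FEW_SHOT_PROMPT = """\
    Extract entities and relationships from the text.
    Output one record per line:
      entity|NAME|TYPE
      relationship|SOURCE|TARGET|DESCRIPTION

    Example 1
    Text: "Marie Curie won the Nobel Prize in Physics in 1903."
    Output:
    entity|Marie Curie|person
    entity|Nobel Prize in Physics|award
    relationship|Marie Curie|Nobel Prize in Physics|won in 1903

    Example 2
    Text: "Apple acquired Beats in 2014."
    Output:
    entity|Apple|organization
    entity|Beats|organization
    relationship|Apple|Beats|acquired in 2014

    Text: "{input_text}"
    Output:
    """

    prompt = FEW_SHOT_PROMPT.format(input_text="Satya Nadella is the CEO of Microsoft.")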