| 1 | +""" |
| 2 | +LangChain has an incredible caching system layered on top of the Large Languange Models |
| 3 | +(LLMs) that you use it with. The resources on this though are pretty scarce, and as of this |
| 4 | +recording, many of them are still outdated. If you search its official documentation, you will see it still |
| 5 | +uses the old .predict() and if you try to run it, you'll get issues like this https://github.com/hwchase17/langchain/issues/6740 |
| 6 | +and it doesn't run. I did a quick google search and checked the first two results and they are |
| 7 | +just lazy copy-and-paste from the docs, which means it's also wrong and plainly doesn't work. |
| 8 | +
|
But caching is so, so useful, especially if you're building with LLMs for production use, where
you're feeding a large context to the model: using GPT to query your own personal knowledge base
(I have a video on exactly that, where I load in my bullet journals in markdown format and build
a query engine on top of them with GPT), or getting GPT to tutor you on a book while you learn a
foreign language (also have a video on that). In both cases you're feeding a large context to the
model, and without caching you'll incur quite a bit of cost and your queries will be slow.

So let me show you how to cache with LangChain; it's surprisingly easy. All of this code will be
on my GitHub, along with the rest of this LLM series if you've been following along. We're on video
number 13 now, so we've covered a lot, and caching is a great addition to your LLM development toolkit.

Let's open up a file and start with LangChain's implementation of an in-memory cache; name it
demo, whatever. Before looking at the code, if you had asked me to guess, I would have said it
uses Python's LRU cache, which is part of the standard library. I love caching, and I have a video
on LRU cache if you want to introduce built-in caching to your Python programs. But I took a look
at the code and realized I was wrong: it's far simpler than that, it's just a dictionary. https://github.com/hwchase17/langchain/blob/master/langchain/cache.py#L102
(I'll put a small dictionary-cache sketch right after this docstring, and an lru_cache sketch just
after the imports, so you can see both ideas.)

And as a quick primer: a cache is just a dictionary that stores the result of a function call,
so that repeated calls with the same arguments don't have to recompute the result. If you have a
function that takes a long time to run, you cache its result, and the next time you call it with
the same input you get the same result back via a quick dictionary lookup instead of burning your
OpenAI credits, your compute, or whatever resource the computation uses. It saves you a lot of
time and money, and if you're not caching these repeated queries, you're leaving money on the table.
"""

import time
from dotenv import load_dotenv
import langchain
from langchain.llms import OpenAI
from langchain.callbacks import get_openai_callback
from langchain.cache import InMemoryCache

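# And the standard-library version I originally guessed LangChain used:
# functools.lru_cache memoizes a function in a dictionary behind the scenes
# and evicts the least-recently-used entries once maxsize is reached.
# (toy_expensive_call_lru is again a made-up stand-in for a slow call.)
from functools import lru_cache

@lru_cache(maxsize=128)
def toy_expensive_call_lru(prompt: str) -> str:
    time.sleep(1)  # only runs on a miss; repeated prompts return instantly
    return f"response to {prompt!r}"
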
load_dotenv()

# To make the effect of caching obvious, we deliberately use a slower model.
llm = OpenAI(model_name="text-davinci-002")

# Point LangChain's module-level cache at an in-memory cache (a plain dict
# under the hood) before making any calls.
langchain.llm_cache = InMemoryCache()

# First call: a cache miss, so this goes out to the OpenAI API and costs tokens.
with get_openai_callback() as cb:
    start = time.time()
    result = llm("What doesn't fall far from the tree?")
    print(result)
    end = time.time()
    print("--- cb")
    print(str(cb) + f" ({end - start:.2f} seconds)")

# Same prompt again, twice: both calls are cache hits, so the callback should
# report zero tokens spent and the elapsed time should be near-instant.
with get_openai_callback() as cb2:
    start = time.time()
    result2 = llm("What doesn't fall far from the tree?")
    result3 = llm("What doesn't fall far from the tree?")
    end = time.time()
    print(result2)
    print(result3)
    print("--- cb2")
    print(str(cb2) + f" ({end - start:.2f} seconds)")
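
# Finally, the in-memory cache lives only as long as the process, and you can
# also reset it by hand. A minimal sketch, assuming your langchain version
# exposes .clear() on the cache; if it doesn't, assigning a fresh
# InMemoryCache() to langchain.llm_cache has the same effect.
langchain.llm_cache.clear()

with get_openai_callback() as cb3:
    start = time.time()
    result4 = llm("What doesn't fall far from the tree?")  # fresh miss: hits the API again
    end = time.time()
    print(result4)
    print("--- cb3 (after clearing the cache)")
    print(str(cb3) + f" ({end - start:.2f} seconds)")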