{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Fleche Storage Backends\n", "\n", "This notebook demonstrates the usage of the different storage backends available in `fleche`.\n", "In `fleche`, a `Cache` is composed of two storage components:\n", "1. **values**: Stores the actual results of the functions.\n", "2. **calls**: Stores the metadata about the function calls (arguments, function name, etc.) and references to the stored values.\n", "\n", "You can mix and match different storage backends for values and calls to suit your needs." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Memory Storage\n", "\n", "The `Memory` storage backend keeps all the cached data in memory. This is the simplest backend and is useful for testing or when you don't need to persist the cache beyond the current process." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2026-03-22T19:56:49.949163Z", "iopub.status.busy": "2026-03-22T19:56:49.948989Z", "iopub.status.idle": "2026-03-22T19:56:50.449807Z", "shell.execute_reply": "2026-03-22T19:56:50.448807Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "No config file found. 
Using default memory cache.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "() ()\n", "Executing add(2, 3)\n", "Result 1: 5\n", "Result 2: 5\n", "Cache keys in calls storage: ['947d73b2adcb9f69e1ddd5d05678a08983d98a971591910b0466084214a9cc08']\n" ] } ], "source": [ "from fleche import fleche, cache\n", "from fleche.caches import Cache\n", "from fleche.storage import Memory\n", "\n", "# Use Memory for both values and calls\n", "memory_cache = Cache(values=Memory({}), _calls=Memory({}))\n", "\n", "with cache(memory_cache):\n", "    @fleche\n", "    def add(a, b):\n", "        print(f\"Executing add({a}, {b})\")\n", "        return a + b\n", "\n", "    print(f\"Result 1: {add(2, 3)}\")\n", "    print(f\"Result 2: {add(2, 3)}\")  # served from the cache, so add() does not run again\n", "print('Cache keys in calls storage:', list(memory_cache.calls.list()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## PickleFile Storage\n", "\n", "The `PickleFile` storage backend serializes data with Python's standard `pickle` module and writes each entry to its own file. It requires a `root` directory."
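, "\n", "\n", "To make the file-per-key layout concrete, here is a minimal standard-library sketch. It is not `fleche`'s implementation of `PickleFile`: the class `SimplePickleStore` and its methods are hypothetical, and the sketch assumes keys are hex digests that are safe to use directly as file names." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pickle\n", "import tempfile\n", "from pathlib import Path\n", "\n", "\n", "class SimplePickleStore:\n", "    # Hypothetical sketch: one pickle file per key under `root`\n", "    def __init__(self, root):\n", "        self.root = Path(root)\n", "        self.root.mkdir(parents=True, exist_ok=True)\n", "\n", "    def save(self, key, value):\n", "        # The key doubles as the file name inside the root directory\n", "        (self.root / key).write_bytes(pickle.dumps(value))\n", "\n", "    def load(self, key):\n", "        return pickle.loads((self.root / key).read_bytes())\n", "\n", "    def keys(self):\n", "        return sorted(p.name for p in self.root.iterdir())\n", "\n", "\n", "with tempfile.TemporaryDirectory() as tmp:\n", "    store = SimplePickleStore(Path(tmp) / 'values')\n", "    store.save('abc123', {'greeting': 'Hello, World!'})\n", "    listed = store.keys()\n", "    loaded = store.load('abc123')\n", "\n", "print(listed)\n", "print(loaded)"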
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2026-03-22T19:56:50.451880Z", "iopub.status.busy": "2026-03-22T19:56:50.451542Z", "iopub.status.idle": "2026-03-22T19:56:50.575246Z", "shell.execute_reply": "2026-03-22T19:56:50.574139Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "() ()\n", "Executing greet(World)\n", "Hello, World!\n", "Hello, World!\n", "Files in calls storage:\n", "e9cc541e2a3c72d117bec87999a8966057cdd5f356c83a98583288163d8ef535\r\n" ] } ], "source": [ "import shutil\n", "from pathlib import Path\n", "from fleche.storage import PickleFile\n", "\n", "shutil.rmtree('.pickle_cache', ignore_errors=True)\n", "pickle_cache = Cache(\n", "    values=PickleFile.with_pickle(root='.pickle_cache/values'),\n", "    _calls=PickleFile.with_pickle(root='.pickle_cache/calls')\n", ")\n", "\n", "with cache(pickle_cache):\n", "    @fleche\n", "    def greet(name):\n", "        print(f\"Executing greet({name})\")\n", "        return f\"Hello, {name}!\"\n", "\n", "    print(greet(\"World\"))\n", "    print(greet(\"World\"))\n", "\n", "print(\"Files in calls storage:\")\n", "!ls .pickle_cache/calls" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compressed PickleFile Storage\n", "\n", "You can also enable gzip compression for `PickleFile` (and its variants `CloudpickleFile` and `DillFile`) by passing `compress=True`."
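, "\n", "\n", "The idea behind `compress=True` can be illustrated with the standard library alone. The sketch below gzip-compresses a pickled payload, checks the gzip magic number, and restores the object. It shows the general technique and makes no claim about the exact calls `fleche` uses internally." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import gzip\n", "import pickle\n", "\n", "payload = 'a' * 1000\n", "raw = pickle.dumps(payload)\n", "compressed = gzip.compress(raw)\n", "\n", "# Every gzip stream starts with the magic bytes 0x1f 0x8b\n", "starts_with_magic = compressed[:2] == bytes([0x1f, 0x8b])\n", "print(f'pickled: {len(raw)} bytes, gzipped: {len(compressed)} bytes, magic: {starts_with_magic}')\n", "\n", "# Decompress first, then unpickle, to get the original object back\n", "restored = pickle.loads(gzip.decompress(compressed))\n", "print(restored == payload)"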
] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2026-03-22T19:56:50.577316Z", "iopub.status.busy": "2026-03-22T19:56:50.577099Z", "iopub.status.idle": "2026-03-22T19:56:50.585392Z", "shell.execute_reply": "2026-03-22T19:56:50.584495Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "() ()\n", "Is gzip compressed: True\n" ] } ], "source": [ "import shutil\n", "shutil.rmtree('.compressed_pickle_cache', ignore_errors=True)\n", "compressed_cache = Cache(\n", "    values=PickleFile.with_pickle(root='.compressed_pickle_cache/values', compress=True),\n", "    _calls=PickleFile.with_pickle(root='.compressed_pickle_cache/calls', compress=True)\n", ")\n", "\n", "with cache(compressed_cache):\n", "    @fleche\n", "    def big_result(n):\n", "        return \"a\" * n\n", "\n", "    big_result(1000)\n", "\n", "file_path = list(Path('.compressed_pickle_cache/values').iterdir())[0]\n", "with open(file_path, 'rb') as f:\n", "    header = f.read(2)\n", "    # Gzip files start with the magic number 0x1f 0x8b\n", "    is_compressed = (header[0] == 0x1f) and (header[1] == 0x8b)\n", "    print(f\"Is gzip compressed: {is_compressed}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## CloudpickleFile Storage\n", "\n", "The `CloudpickleFile` storage backend is similar to `PickleFile` but uses `cloudpickle` for serialization; here it is created with `PickleFile.with_cloudpickle`. `cloudpickle` can serialize objects that the standard `pickle` module cannot, such as lambdas, and it serializes functions defined interactively (for example, in a notebook) by value rather than by reference."
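, "\n", "\n", "A quick way to see the difference is to pickle a lambda. This standalone sketch first tries the standard `pickle` module and then `cloudpickle.dumps`. The second part runs only if `cloudpickle` is installed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pickle\n", "\n", "square = lambda x: x * x\n", "\n", "try:\n", "    pickle.dumps(square)\n", "    stdlib_ok = True\n", "except (pickle.PicklingError, AttributeError, TypeError) as exc:\n", "    # Standard pickle stores functions by reference, and a lambda has no importable name\n", "    stdlib_ok = False\n", "    print('pickle failed:', type(exc).__name__)\n", "\n", "try:\n", "    import cloudpickle\n", "except ImportError:\n", "    print('cloudpickle is not installed; skipping')\n", "else:\n", "    # cloudpickle serializes the function by value; the result loads with plain pickle\n", "    restored = pickle.loads(cloudpickle.dumps(square))\n", "    print('cloudpickle round trip:', restored(4))"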
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2026-03-22T19:56:50.587178Z", "iopub.status.busy": "2026-03-22T19:56:50.586992Z", "iopub.status.idle": "2026-03-22T19:56:50.706524Z", "shell.execute_reply": "2026-03-22T19:56:50.705580Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "() ()\n", "Executing mul(3, 4)\n", "Result: 12\n", "Result: 12\n", "Files in values storage:\n", "056a34cfa7662755ef47b05e966a8cd152ad9cd663134c03b0a4e6f61349c7e5\r\n", "65d52a82c5a72f12ca0499522dc9274a0e6822e1038630ba68f94400b3e4c98f\r\n", "83ada2198553b88cb3d0882f7fca8c4e9531049b978df3e9e3b5d6301c6c0bfa\r\n" ] } ], "source": [ "from fleche.storage import PickleFile\n", "\n", "shutil.rmtree('.cloudpickle_cache', ignore_errors=True)\n", "cp_cache = Cache(\n", "    values=PickleFile.with_cloudpickle(root='.cloudpickle_cache/values'),\n", "    _calls=PickleFile.with_cloudpickle(root='.cloudpickle_cache/calls')\n", ")\n", "\n", "with cache(cp_cache):\n", "    @fleche\n", "    def mul(a, b):\n", "        print(f\"Executing mul({a}, {b})\")\n", "        return a * b\n", "\n", "    print(f\"Result: {mul(3, 4)}\")\n", "    print(f\"Result: {mul(3, 4)}\")\n", "\n", "print(\"Files in values storage:\")\n", "!ls .cloudpickle_cache/values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## DillFile Storage\n", "\n", "The `DillFile` storage backend is similar to `CloudpickleFile` but uses `dill` for serialization; here it is created with `PickleFile.with_dill`. Like `cloudpickle`, `dill` can serialize many objects that the standard `pickle` module cannot, such as lambdas." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2026-03-22T19:56:50.708571Z", "iopub.status.busy": "2026-03-22T19:56:50.708378Z", "iopub.status.idle": "2026-03-22T19:56:50.829061Z", "shell.execute_reply": "2026-03-22T19:56:50.827888Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "() ()\n", "Executing add(3, 4)\n", "Result: 7\n", "Result: 7\n", "Files in values storage:\n", "307ea96b8b27c58a4c93e2e34e3783e475dc0e8ef99b4c5bbfa6b67e2cab7d80\r\n", "65d52a82c5a72f12ca0499522dc9274a0e6822e1038630ba68f94400b3e4c98f\r\n", "83ada2198553b88cb3d0882f7fca8c4e9531049b978df3e9e3b5d6301c6c0bfa\r\n" ] } ], "source": [ "from fleche.storage import PickleFile\n", "\n", "shutil.rmtree('.dill_cache', ignore_errors=True)\n", "dill_cache = Cache(\n", "    values=PickleFile.with_dill(root='.dill_cache/values'),\n", "    _calls=PickleFile.with_dill(root='.dill_cache/calls')\n", ")\n", "\n", "with cache(dill_cache):\n", "    @fleche\n", "    def add(a, b):\n", "        print(f\"Executing add({a}, {b})\")\n", "        return a + b\n", "\n", "    print(f\"Result: {add(3, 4)}\")\n", "    print(f\"Result: {add(3, 4)}\")\n", "\n", "print(\"Files in values storage:\")\n", "!ls .dill_cache/values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## BagOfHoldingH5File Storage\n", "\n", "The `BagOfHoldingH5File` storage backend uses the `bagofholding` library to store data in HDF5 files. This is particularly efficient for large numerical arrays, such as NumPy arrays."
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2026-03-22T19:56:50.831273Z", "iopub.status.busy": "2026-03-22T19:56:50.831064Z", "iopub.status.idle": "2026-03-22T19:56:50.959794Z", "shell.execute_reply": "2026-03-22T19:56:50.958659Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "() ()\n", "Executing make_array(5)\n", "Array sum: 25.0\n", "Array sum: 25.0\n", "Files in H5 values storage:\n", "183708770a6459111eb4effccefdb31bac94cee07d7ea356d8e99b08b8551bcf\r\n", "5b07a837d91d67764109c11fb912c9b91b4e9d9d4c909fca542e5fbe39ddc244\r\n" ] } ], "source": [ "import numpy as np\n", "from fleche.storage import BagOfHoldingH5File\n", "\n", "shutil.rmtree('.boh_cache', ignore_errors=True)\n", "boh_cache = Cache(\n", "    values=BagOfHoldingH5File(root='.boh_cache/values'),\n", "    _calls=PickleFile.with_cloudpickle(root='.boh_cache/calls')  # calls can use a different backend than values\n", ")\n", "\n", "with cache(boh_cache):\n", "    @fleche\n", "    def make_array(n):\n", "        print(f\"Executing make_array({n})\")\n", "        return np.ones((n, n))\n", "\n", "    print(f\"Array sum: {make_array(5).sum()}\")\n", "    print(f\"Array sum: {make_array(5).sum()}\")\n", "\n", "print(\"Files in H5 values storage:\")\n", "!ls .boh_cache/values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sql Storage (Call Storage Only)\n", "\n", "The `Sql` storage backend uses SQLAlchemy to store call metadata in a SQL database such as SQLite. This makes the call metadata queryable, but `Sql` can only be used for the **calls** component of a `Cache`."
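, "\n", "\n", "As a standalone illustration of the kind of query that SQL-backed call metadata allows, the sketch below uses Python's built-in `sqlite3` module. The `calls` table and its columns are hypothetical and are not the schema that `fleche`'s `Sql` backend creates." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sqlite3\n", "\n", "conn = sqlite3.connect(':memory:')\n", "# Hypothetical schema: one row per cached call\n", "conn.execute('CREATE TABLE calls (call_key TEXT PRIMARY KEY, function_name TEXT, args TEXT)')\n", "conn.executemany(\n", "    'INSERT INTO calls VALUES (?, ?, ?)',\n", "    [('k1', 'power', '(2, 8)'), ('k2', 'power', '(3, 2)'), ('k3', 'add', '(2, 3)')],\n", ")\n", "\n", "# Metadata questions become plain SQL, e.g. how many calls are cached per function\n", "counts = dict(conn.execute('SELECT function_name, COUNT(*) FROM calls GROUP BY function_name'))\n", "print(counts)\n", "conn.close()"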
] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2026-03-22T19:56:50.961998Z", "iopub.status.busy": "2026-03-22T19:56:50.961789Z", "iopub.status.idle": "2026-03-22T19:56:51.006219Z", "shell.execute_reply": "2026-03-22T19:56:51.005233Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "() ()\n", "Executing power(2, 8)\n", "Calls in SQL storage:\n", "['bbbcff21a09114bee332b0f0da9085948aadc5177ac01dccbf54023426813087']\n" ] } ], "source": [ "from fleche.storage import Sql\n", "import os\n", "\n", "if os.path.exists('calls.db'):\n", "    os.remove('calls.db')\n", "\n", "sql_cache = Cache(\n", "    values=Memory({}),\n", "    _calls=Sql(url='sqlite:///calls.db')\n", ")\n", "\n", "with cache(sql_cache):\n", "    @fleche\n", "    def power(a, b):\n", "        print(f\"Executing power({a}, {b})\")\n", "        return a ** b\n", "\n", "    power(2, 8)\n", "    power(2, 8)\n", "\n", "print(\"Calls in SQL storage:\")\n", "print(list(sql_cache.calls.list()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Mix and Match\n", "\n", "As the `BagOfHoldingH5File` and `Sql` examples show, you can combine different backends for values and calls. For instance, you might use `BagOfHoldingH5File` for large values and `Sql` for calls to enable efficient metadata queries."
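, "\n", "\n", "The value/call split itself can be sketched in a few lines: a cache that takes two independent mappings, one for results and one for call records. `TwoStoreCache` and its `call` method are hypothetical. Hashing the `repr` of the function name and arguments is an assumption made for illustration, not necessarily how `fleche` derives its keys." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import hashlib\n", "\n", "\n", "class TwoStoreCache:\n", "    # Hypothetical cache with separate stores for values and call metadata\n", "    def __init__(self, values, calls):\n", "        self.values = values  # key -> result\n", "        self.calls = calls    # key -> call record that points at a value key\n", "\n", "    def call(self, func, *args):\n", "        key = hashlib.sha256(repr((func.__name__, args)).encode()).hexdigest()\n", "        if key not in self.calls:\n", "            self.values[key] = func(*args)\n", "            self.calls[key] = {'function': func.__name__, 'args': args, 'value_key': key}\n", "        return self.values[self.calls[key]['value_key']]\n", "\n", "\n", "executions = []\n", "\n", "\n", "def slow_power(a, b):\n", "    executions.append((a, b))\n", "    return a ** b\n", "\n", "\n", "# Any mapping works on either side; here both are plain dicts\n", "two_store = TwoStoreCache(values={}, calls={})\n", "first = two_store.call(slow_power, 2, 8)\n", "second = two_store.call(slow_power, 2, 8)\n", "print(first, second, len(executions))"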
] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2026-03-22T19:56:51.008027Z", "iopub.status.busy": "2026-03-22T19:56:51.007846Z", "iopub.status.idle": "2026-03-22T19:56:51.032418Z", "shell.execute_reply": "2026-03-22T19:56:51.031506Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "() ()\n", "Value storage: DestructuringStorage\n", "Call storage: Sql\n" ] } ], "source": [ "mixed_cache = Cache(\n", "    values=BagOfHoldingH5File(root='.mixed_cache/values'),\n", "    _calls=Sql(url='sqlite:///.mixed_cache/calls.db')\n", ")\n", "\n", "with cache(mixed_cache):\n", "    @fleche\n", "    def compute_heavy(x):\n", "        return np.random.rand(x, x)\n", "\n", "    compute_heavy(10)\n", "\n", "print(f\"Value storage: {type(mixed_cache.values).__name__}\")\n", "print(f\"Call storage: {type(mixed_cache.calls).__name__}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clean Up\n", "\n", "Remove the temporary cache directories and the SQLite database created in this notebook."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import shutil\n", "\n", "# Remove the directories created by the file-based backends above\n", "for directory in ['.pickle_cache', '.compressed_pickle_cache', '.cloudpickle_cache',\n", "                  '.dill_cache', '.boh_cache', '.mixed_cache']:\n", "    shutil.rmtree(directory, ignore_errors=True)\n", "\n", "# Remove the SQLite database used by the Sql example\n", "if os.path.exists('calls.db'):\n", "    os.remove('calls.db')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.13" } }, "nbformat": 4, "nbformat_minor": 4 }