Fleche Storage Backends

This notebook demonstrates the storage backends available in fleche. A Cache is composed of two storage components:

  1. values: Stores the actual results of the functions.

  2. calls: Stores the metadata about the function calls (arguments, function name, etc.) and references to the stored values.

You can mix and match different storage backends for values and calls to suit your needs.
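
The split between the two components can be pictured with plain dictionaries. The following is a minimal sketch, not fleche's implementation: the names `TwoStoreCache`, `digest`, `call_key`, and `value_key` are hypothetical, and the hashing scheme is an assumption chosen only for illustration.

```python
import hashlib
import pickle


def digest(obj):
    """Return a short hex digest of a picklable object."""
    return hashlib.sha256(pickle.dumps(obj)).hexdigest()[:16]


class TwoStoreCache:
    """Toy cache with separate stores for call metadata and result values."""

    def __init__(self):
        self.values = {}  # value key -> result
        self.calls = {}   # call key -> metadata plus a reference to a value

    def call(self, func, *args):
        call_key = digest((func.__name__, args))
        if call_key in self.calls:
            # Cache hit: follow the reference from the call record to the value.
            return self.values[self.calls[call_key]["value_key"]]
        result = func(*args)
        value_key = digest(result)
        self.values[value_key] = result
        self.calls[call_key] = {
            "function": func.__name__,
            "arguments": args,
            "value_key": value_key,
        }
        return result


executions = []


def add(a, b):
    executions.append((a, b))  # record every real execution
    return a + b


toy = TwoStoreCache()
first = toy.call(add, 2, 3)
second = toy.call(add, 2, 3)  # served from the stores; add() is not re-run
print(first, second, executions)
```

In this sketch the call record holds only a key into the value store, which is what would allow the two stores to live in different places.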

Memory Storage

The Memory storage backend keeps all the cached data in memory. This is the simplest backend and is useful for testing or when you don’t need to persist the cache beyond the current process.

[ ]:
from fleche import fleche, cache
from fleche.caches import Cache
from fleche.storage import ValueMemory, CallMemory

# Using Memory for both values and calls
memory_cache = Cache(values=ValueMemory({}), calls=CallMemory({}))

with cache(memory_cache):
    @fleche
    def add(a, b):
        print(f"Executing add({a}, {b})")
        return a + b

    print(f"Result 1: {add(2, 3)}")
    print(f"Result 2: {add(2, 3)}")  # This should be cached

print("Cache keys in calls storage:", list(memory_cache.calls.list()))

PickleFile Storage

The PickleFile storage backend serializes data using Python’s standard pickle module and stores it in individual files. It requires a root directory.

[ ]:
import shutil
from pathlib import Path
from fleche.storage import ValuePickleFile, CallPickleFile

shutil.rmtree('.pickle_cache', ignore_errors=True)
pickle_cache = Cache(
    values=ValuePickleFile.with_pickle(root='.pickle_cache/values'),
    calls=CallPickleFile.with_pickle(root='.pickle_cache/calls')
)

with cache(pickle_cache):
    @fleche
    def greet(name):
        print(f"Executing greet({name})")
        return f"Hello, {name}!"

    print(greet("World"))
    print(greet("World"))

print("Files in calls storage:")
!ls .pickle_cache/calls
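
To illustrate what "individual files under a root directory" means, here is a minimal standard-library sketch. It is an assumption about the general idea, not fleche's code: the class `PickleDirStore` and its methods are hypothetical, and fleche's actual file names and layout may differ.

```python
import pickle
import tempfile
from pathlib import Path


class PickleDirStore:
    """Toy store that writes each entry to its own pickle file under root."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, key, obj):
        with open(self.root / key, "wb") as f:
            pickle.dump(obj, f)

    def load(self, key):
        with open(self.root / key, "rb") as f:
            return pickle.load(f)

    def keys(self):
        # One file per entry, so listing the directory lists the keys.
        return sorted(p.name for p in self.root.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    store = PickleDirStore(Path(tmp) / "values")
    store.save("greeting", "Hello, World!")
    stored_keys = store.keys()
    loaded = store.load("greeting")

print(stored_keys, loaded)
```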

Compressed PickleFile Storage

You can also enable gzip compression for PickleFile (and its variants CloudpickleFile and DillFile) by passing compress=True.

[ ]:
import shutil
from pathlib import Path

shutil.rmtree('.compressed_pickle_cache', ignore_errors=True)
compressed_cache = Cache(
    values=ValuePickleFile.with_pickle(root='.compressed_pickle_cache/values', compress=True),
    calls=CallPickleFile.with_pickle(root='.compressed_pickle_cache/calls', compress=True)
)

with cache(compressed_cache):
    @fleche
    def big_result(n):
        return "a" * n

    big_result(1000)

file_path = next(Path('.compressed_pickle_cache/values').iterdir())
with open(file_path, 'rb') as f:
    # A gzip stream starts with the magic bytes 0x1f 0x8b
    is_compressed = f.read(2) == b'\x1f\x8b'
print(f"Is gzip compressed: {is_compressed}")
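
The effect of compression on a repetitive payload can be checked with the standard library alone. This sketch uses only `pickle` and `gzip` and makes no assumption about how fleche combines them internally.

```python
import gzip
import pickle

payload = "a" * 1000
raw = pickle.dumps(payload)
compressed = gzip.compress(raw)

# Every gzip stream starts with the two magic bytes 0x1f 0x8b.
has_gzip_magic = compressed[:2] == b"\x1f\x8b"

# Decompressing and unpickling restores the original object.
restored = pickle.loads(gzip.decompress(compressed))

print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")
```

Highly repetitive data such as this string shrinks considerably; data that is already compressed or close to random may not shrink at all.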

CloudpickleFile Storage

The CloudpickleFile storage backend works like PickleFile but serializes with cloudpickle. cloudpickle can handle objects that the standard pickle module cannot serialize, such as lambdas, and it serializes interactively defined functions by value so that they can be loaded in another process.

[ ]:
shutil.rmtree('.cloudpickle_cache', ignore_errors=True)
cp_cache = Cache(
    values=ValuePickleFile.with_cloudpickle(root='.cloudpickle_cache/values'),
    calls=CallPickleFile.with_cloudpickle(root='.cloudpickle_cache/calls')
)

with cache(cp_cache):
    @fleche
    def mul(a, b):
        print(f"Executing mul({a}, {b})")
        return a * b

    print(f"Result: {mul(3, 4)}")
    print(f"Result: {mul(3, 4)}")

print("Files in values storage:")
!ls .cloudpickle_cache/values
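
The limitation of the standard pickle module that motivates this backend can be reproduced without any third-party package. The helper `can_pickle` below is a hypothetical name introduced for this sketch.

```python
import pickle


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# A built-in function is pickled by reference to its importable name.
builtin_ok = can_pickle(len)

# A lambda has no importable name, so the reference lookup fails.
lambda_ok = can_pickle(lambda x: x * 2)

print(f"built-in: {builtin_ok}, lambda: {lambda_ok}")
```

A backend based on cloudpickle is worth considering when cached values or call arguments contain objects of the second kind.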

DillFile Storage

The DillFile storage backend is similar to CloudpickleFile but uses dill for serialization.

[ ]:
shutil.rmtree('.dill_cache', ignore_errors=True)
dill_cache = Cache(
    values=ValuePickleFile.with_dill(root='.dill_cache/values'),
    calls=CallPickleFile.with_dill(root='.dill_cache/calls')
)

with cache(dill_cache):
    @fleche
    def add(a, b):
        print(f"Executing add({a}, {b})")
        return a + b

    print(f"Result: {add(3, 4)}")
    print(f"Result: {add(3, 4)}")

print("Files in values storage:")
!ls .dill_cache/values

BagOfHoldingH5File Storage

The BagOfHoldingH5File storage backend uses the bagofholding library to store data in HDF5 files. This is particularly efficient for large numerical arrays (e.g., NumPy arrays).

[ ]:
import numpy as np
from fleche.storage import ValueBagOfHoldingH5File

shutil.rmtree('.boh_cache', ignore_errors=True)
boh_cache = Cache(
    values=ValueBagOfHoldingH5File(root='.boh_cache/values'),
    calls=CallPickleFile.with_cloudpickle(root='.boh_cache/calls')  # We can use Cloudpickle for calls
)

with cache(boh_cache):
    @fleche
    def make_array(n):
        print(f"Executing make_array({n})")
        return np.ones((n, n))

    print(f"Array sum: {make_array(5).sum()}")
    print(f"Array sum: {make_array(5).sum()}")

print("Files in H5 values storage:")
!ls .boh_cache/values

Sql Storage (Call Storage Only)

The Sql storage backend uses SQLAlchemy to store call metadata in a SQL database such as SQLite. Because the metadata lives in a database, it can be queried with SQL. This backend can only be used for the calls component of a Cache.

[ ]:
import os
from fleche.storage import Sql

# Start from a fresh database file
if os.path.exists('calls.db'):
    os.remove('calls.db')

sql_cache = Cache(
    values=ValueMemory({}),
    calls=Sql(url='sqlite:///calls.db')
)

with cache(sql_cache):
    @fleche
    def power(a, b):
        print(f"Executing power({a}, {b})")
        return a ** b

    power(2, 8)
    power(2, 8)

print("Calls in SQL storage:")
print(list(sql_cache.calls.list()))
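
To show why a SQL database is convenient for call metadata, the following sketch queries a small table with the standard `sqlite3` module. The table name, columns, and rows are hypothetical and chosen for illustration; the schema that fleche actually creates may look different.

```python
import json
import sqlite3

# Hypothetical schema for illustration only; fleche's actual tables may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls ("
    "key TEXT PRIMARY KEY, function TEXT, arguments TEXT, value_key TEXT)"
)
rows = [
    ("c1", "power", json.dumps({"a": 2, "b": 8}), "v1"),
    ("c2", "power", json.dumps({"a": 3, "b": 2}), "v2"),
    ("c3", "add", json.dumps({"a": 2, "b": 3}), "v3"),
]
conn.executemany("INSERT INTO calls VALUES (?, ?, ?, ?)", rows)

# Find every recorded call of one function without loading any stored value.
power_calls = conn.execute(
    "SELECT key, arguments FROM calls WHERE function = ? ORDER BY key",
    ("power",),
).fetchall()

# Count the recorded calls per function.
counts = dict(
    conn.execute("SELECT function, COUNT(*) FROM calls GROUP BY function")
)
conn.close()

print(power_calls)
print(counts)
```

Filtering and aggregation of this kind would require loading and scanning every record with a file-per-call backend.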

Mix and Match

As seen in the BagOfHoldingH5File and Sql examples, you can mix different backends for values and calls. For instance, you might want to use BagOfHoldingH5File for large values but Sql for calls to enable efficient metadata querying.

[ ]:
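# Start from a fresh directory; SQLite does not create missing parent directories
shutil.rmtree('.mixed_cache', ignore_errors=True)
Path('.mixed_cache').mkdir()
mixed_cache = Cache(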
    values=ValueBagOfHoldingH5File(root='.mixed_cache/values'),
    calls=Sql(url='sqlite:///.mixed_cache/calls.db')
)

with cache(mixed_cache):
    @fleche
    def compute_heavy(x):
        return np.random.rand(x, x)

    compute_heavy(10)

print(f"Value storage: {type(mixed_cache.values).__name__}")
print(f"Call storage: {type(mixed_cache.calls).__name__}")

Clean Up

Remove the temporary cache directories and the SQLite database file created above.

[ ]:
import os
import shutil
for d in ('.pickle_cache', '.compressed_pickle_cache', '.cloudpickle_cache',
          '.dill_cache', '.boh_cache', '.mixed_cache'):
    shutil.rmtree(d, ignore_errors=True)
if os.path.exists('calls.db'):
    os.remove('calls.db')
print('Cleaned up temporary cache directories.')
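
As an alternative to removing directories by hand, a temporary directory can clean up after itself. The sketch below uses only the standard library; passing the resulting paths as `root` to a file-based backend is assumed to work like any other path and is not shown here.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory(prefix="cache_demo_") as tmp:
    root = Path(tmp)
    values_root = root / "values"
    calls_root = root / "calls"
    values_root.mkdir()
    calls_root.mkdir()
    # Stand-in for files that a cache backend would write.
    (values_root / "example.bin").write_bytes(b"cached bytes")
    existed_inside = values_root.exists()

# Leaving the with block deletes the directory and everything in it.
exists_after = root.exists()
print(existed_inside, exists_after)
```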