Data Generators

Lariv relies heavily on programmatically generated synthetic data for local testing and in CI/CD pipelines.

The Architecture

Because Lariv's plugins are strictly decoupled, data generation scripts cannot simply be run in an arbitrary order. For example, the p_invoices plugin cannot generate mock invoices until the p_orders plugin has generated mock orders.

To solve this, Lariv uses a GeneratorRegistry that topologically sorts the registered generators by their declared dependencies.
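To make the idea concrete, here is a minimal sketch of how such a registry could work. It is not Lariv's actual implementation: the class below is a hypothetical stand-in built on the standard library's graphlib.TopologicalSorter, the resolve_order helper is invented for illustration, and the assumption that orders depend on users is added only so the example has a single valid order.

```python
from graphlib import TopologicalSorter


class GeneratorRegistry:
    """Hypothetical stand-in for lariv.registry.GeneratorRegistry."""

    _generators = {}  # string key -> generator class

    @classmethod
    def register(cls, key):
        def decorator(generator_cls):
            cls._generators[key] = generator_cls
            return generator_cls
        return decorator

    @classmethod
    def resolve_order(cls):
        # Illustrative helper, not part of the documented API.
        # Map each key to the keys it depends on; TopologicalSorter
        # yields dependencies before the generators that need them.
        graph = {
            key: getattr(generator_cls, "dependencies", [])
            for key, generator_cls in cls._generators.items()
        }
        return list(TopologicalSorter(graph).static_order())

    @classmethod
    def run_all(cls):
        for key in cls.resolve_order():
            cls._generators[key]().run()


@GeneratorRegistry.register("invoices_generator")
class InvoiceGenerator:
    dependencies = ["users_generator", "orders_generator"]

    def run(self):
        print("Generating mock invoices...")


@GeneratorRegistry.register("orders_generator")
class OrderGenerator:
    dependencies = ["users_generator"]  # assumption for this sketch

    def run(self):
        print("Generating mock orders...")


@GeneratorRegistry.register("users_generator")
class UserGenerator:
    dependencies = []

    def run(self):
        print("Generating mock users...")


order = GeneratorRegistry.resolve_order()
GeneratorRegistry.run_all()
```

Registration order does not matter here: invoices are registered first but run last, because the sort is driven only by the dependencies lists. A dependency cycle would make TopologicalSorter raise graphlib.CycleError.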

Writing a Generator

Every plugin that needs data generation defines a plain Python class decorated with GeneratorRegistry.register.

# plugins/p_invoices/generator.py
from lariv.registry import GeneratorRegistry

@GeneratorRegistry.register("invoices_generator")
class InvoiceGenerator:
    # Explicitly define dependencies on other generators by their string keys
    dependencies = ["users_generator", "orders_generator"]

    def run(self):
        """Entry point invoked by generate_data."""
        from .models import Invoice
        print("Generating mock invoices...")
        Invoice.objects.create(...)

Running Generators

The generate_data.py script (run via python manage.py generate_data) acts as the conductor: it hands control to the registry, which decides the execution order.

# lariv/generate_data.py
from lariv.registry import GeneratorRegistry

class DataGenerator:
    def generate_all_data(self):
        print("\nStarting data generation...")
        # Automatically resolves the DAG dependencies and executes `.run()` in order
        GeneratorRegistry.run_all()

When generate_all_data runs, GeneratorRegistry.run_all() resolves the dependency graph and ensures users_generator and orders_generator complete successfully before attempting to execute InvoiceGenerator.run().
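The phrase "complete successfully" suggests that a failing upstream generator should prevent its dependents from running. The source does not say how Lariv handles failures, so the following is only a sketch of one plausible behavior: exceptions propagate and halt the run. The dependency map, the make_runner helper, and the run_all function are all hypothetical stand-ins, not Lariv code.

```python
from graphlib import TopologicalSorter

# Dependency map mirroring the example: key -> keys that must run first.
dependencies = {
    "users_generator": [],
    "orders_generator": [],
    "invoices_generator": ["users_generator", "orders_generator"],
}

executed = []  # keys of generators that finished without error


def make_runner(key, fail=False):
    """Build a hypothetical run() callable that records its own success."""
    def run():
        if fail:
            raise RuntimeError(f"{key} failed")
        executed.append(key)
    return run


runners = {
    "users_generator": make_runner("users_generator"),
    "orders_generator": make_runner("orders_generator", fail=True),
    "invoices_generator": make_runner("invoices_generator"),
}


def run_all(dependencies, runners):
    # Dependencies come first in static_order(); an exception raised by
    # any runner propagates and stops the loop before dependents run.
    for key in TopologicalSorter(dependencies).static_order():
        runners[key]()


error = None
try:
    run_all(dependencies, runners)
except RuntimeError as exc:
    error = str(exc)
```

Because invoices_generator is always sorted after orders_generator, the simulated failure in orders_generator means the invoice step is never reached.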