Core Concepts
BSPump is built around a composable architecture where data flows through pipelines consisting of sources, processors, and sinks. Understanding these core concepts is essential for building effective data pipelines.
Architecture Overview
┌─────────────────────────────────────────────────────────────┐
│                         Application                         │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                       Pipeline                        │  │
│  │                                                       │  │
│  │  ┌────────┐   ┌───────────┐   ┌───────────┐   ┌────┐  │  │
│  │  │ Source │──▶│ Processor │──▶│ Processor │──▶│Sink│  │  │
│  │  └────────┘   └───────────┘   └───────────┘   └────┘  │  │
│  │                                                       │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
│     ┌─────────────┐   ┌─────────────┐   ┌─────────────┐     │
│     │ Connection  │   │   Lookup    │   │   Trigger   │     │
│     └─────────────┘   └─────────────┘   └─────────────┘     │
└─────────────────────────────────────────────────────────────┘
Key Components
- Pipeline
The core abstraction that chains components together. Events flow from source through processors to the sink.
- Source
Entry point for data. Sources either pull data by polling or receive pushed data, for example from webhooks. See Source.
- Processor
Transforms, filters, or enriches events. Multiple processors can be chained together. See Processor.
- Sink
Exit point for data. Sinks write events to external systems, files, or other destinations. See Sink.
- Connection
Shared, reusable connections to external systems (databases, message queues, etc.). See Connection.
- Lookup
Data enrichment tables that can be used to add context to events. See Lookup.
- Trigger
Controls when sources produce events (cron schedules, pub/sub, etc.). See Trigger.
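The roles listed above can be sketched in plain Python. This is an illustrative model only: `Pipeline`, `ListSink`, `country_lookup`, and `enrich_country` are hypothetical names invented for this sketch and are not BSPump classes or functions. It assumes a source is any iterable, a processor is a callable, and a lookup is a dictionary.

```python
from typing import Any, Callable, Iterable, List

# Hypothetical stand-ins for illustration; these are NOT BSPump classes.
Processor = Callable[[Any], Any]


class ListSink:
    """Exit point: collects events in memory instead of writing them out."""

    def __init__(self) -> None:
        self.events: List[Any] = []

    def write(self, event: Any) -> None:
        self.events.append(event)


class Pipeline:
    """Chains a source, an ordered list of processors, and a sink."""

    def __init__(self, source: Iterable[Any], processors: List[Processor], sink: ListSink) -> None:
        self.source = source
        self.processors = processors
        self.sink = sink

    def run(self) -> None:
        for event in self.source:
            # Each processor receives the output of the previous one.
            for processor in self.processors:
                event = processor(event)
            self.sink.write(event)


# A lookup modelled as a plain dictionary that adds context to events.
country_lookup = {"cz": "Czechia", "de": "Germany"}


def enrich_country(event: dict) -> dict:
    # Enrichment: copy the event and add a field resolved from the lookup.
    return {**event, "country": country_lookup.get(event["cc"], "unknown")}


sink = ListSink()
pipeline = Pipeline(
    source=[{"cc": "cz"}, {"cc": "xx"}],
    processors=[enrich_country],
    sink=sink,
)
pipeline.run()
print(sink.events)
```

Keeping the lookup outside the processor mirrors the idea that lookups and connections are shared resources, separate from any single pipeline component.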
Event Flow
Events flow through the pipeline in a linear fashion:
1. The Source generates or receives an event.
2. The event passes through each Processor in order.
3. Each Processor can transform, filter, or split the event.
4. The Sink receives the final event and outputs it.
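A minimal sketch of how transforming, filtering, and splitting can coexist in one linear flow. It assumes a convention made up for this example: every processor returns a list of zero (filter), one (transform), or several (split) output events. The function names are hypothetical, and BSPump's own processor interface may differ.

```python
from typing import Any, Callable, Iterable, List

# Hypothetical convention for this sketch only: a processor returns a list of
# zero (filter), one (transform), or many (split) output events.
Processor = Callable[[Any], List[Any]]


def split_lines(event: str) -> List[str]:
    # Split: one input event becomes one event per line.
    return event.splitlines()


def drop_blank(event: str) -> List[str]:
    # Filter: blank lines produce no output events.
    return [event] if event.strip() else []


def to_upper(event: str) -> List[str]:
    # Transform: exactly one output per input.
    return [event.upper()]


def run_pipeline(source: Iterable[Any], processors: List[Processor]) -> List[Any]:
    delivered = []  # stands in for the sink
    for event in source:
        batch = [event]
        for processor in processors:
            # Apply the processor to every event produced by the previous stage.
            batch = [out for item in batch for out in processor(item)]
        delivered.extend(batch)
    return delivered


result = run_pipeline(["a\n\nb", "c"], [split_lines, drop_blank, to_upper])
print(result)  # ['A', 'B', 'C']
```

Returning a list from every processor keeps the driver loop uniform: a filtered event simply yields an empty batch, so later processors and the sink never see it.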
Events can be any Python object, but are commonly:
- Bytes (raw data)
- Dictionaries (structured data)
- Dataclasses or typed objects
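As an illustration of these event types, the sketch below moves a single event through all three forms using only the standard library. The `Reading` dataclass and the `decode` and `to_reading` functions are hypothetical examples, not part of BSPump, and the JSON payload is an assumed input format.

```python
import json
from dataclasses import dataclass


@dataclass
class Reading:
    sensor: str
    value: float


def decode(event: bytes) -> dict:
    # Bytes (raw data) -> dictionary (structured data)
    return json.loads(event.decode("utf-8"))


def to_reading(event: dict) -> Reading:
    # Dictionary -> typed object
    return Reading(sensor=event["sensor"], value=float(event["value"]))


raw = b'{"sensor": "t1", "value": 21.5}'
reading = to_reading(decode(raw))
print(reading)  # Reading(sensor='t1', value=21.5)
```

Each step could be its own processor, so the event type changes as it moves toward the sink.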
Async-First Design
BSPump is built on Python’s asyncio, enabling:
- Non-blocking I/O operations
- High concurrency with minimal threads
- Efficient handling of many simultaneous connections
- Natural integration with async libraries
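The concurrency benefit can be seen with asyncio alone. In this sketch, `fetch` is a hypothetical coroutine in which `asyncio.sleep` stands in for a non-blocking network or disk wait; nothing here uses BSPump itself. Ten simulated waits of 0.1 seconds each overlap on a single thread, so the total time stays close to one wait rather than the sum of all ten.

```python
import asyncio
import time


async def fetch(i: int) -> int:
    # asyncio.sleep stands in for a non-blocking network or disk wait.
    await asyncio.sleep(0.1)
    return i * 2


async def main() -> list:
    # All ten waits overlap on a single thread; results keep argument order.
    return await asyncio.gather(*(fetch(i) for i in range(10)))


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

Replacing `asyncio.sleep` with a blocking call such as `time.sleep` would serialize the waits, which is why blocking operations inside coroutines should generally be avoided.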