Skip to content

scystream-sdk documentation

Release Notes#

scystream-sdk 1.0.0 - Release Notes#

The first version of the scystream sdk.

Providing basic functionalities:

Defining entrypoints.
Introducing the cbc.yaml.
Defining settings for entrypoints, parsing ENV-Variables.
Parsing & cross validation of cbc.yaml with actual code definition.
Reading & writing to a postgres database, integrating Apache Spark.
Reading & writing from/to a S3 bucket, currently not using Apache Spark.

scystream-sdk 1.1.0 - Release Notes#

Updated ENV settings.

We have added two predefined classes for postgres and file in- and outputs.

These contain default ENV-Keys such as the PG_HOST for defining the host of a postgres in-/ouput or S3_ACCESS_KEY for defining the access key of an S3 Bucket.

These settings can ultimatively be used to configure the corresponding connections.

scystream-sdk 1.2.0 - Release Notes#

Removed adjustable config path from SDKConfig

The configuration option config_path was removed. Every compute block repository must now contain the cbc.yaml within it’s root directory and with the file-name: cbc.yaml

scystream-sdk 1.3.0 - Release Notes#

Added#

Introduced PandasPostgresOperations for interacting with PostgreSQL using Pandas DataFrames
Added BasePostgresOperations abstraction to unify database operations across backends
Added support for optional PostgreSQL dependencies via extras_require["postgres"]
Introduced table name validation (PostgreSQL 63 character limit)

Changed#

Refactored PostgreSQL handling into backend-specific implementations:
- SparkPostgresOperations
- PandasPostgresOperations
Simplified database interaction API to remove duplicated logic across compute blocks
Updated internal structure to better support future extensibility (e.g. additional backends)
Added DB_NAME to PostgresSettings and PostgresConfig
Change PG_PORT type to int in PostgresSettings and PostgresConfig

Breaking Changes#

DB_NAME is now a required configuration parameter in both PostgresConfig and PostgresSettings
Removed implicit/default database usage (e.g. defaulting to "postgres")
Users must now explicitly install optional dependencies:
- scystream-sdk[postgres] for Pandas-based operations

scystream-sdk 1.4.0 - Release Notes#

Added#

Introduced DSN-based database configuration across the SDK
Added support for generic database backends via SQLAlchemy-compatible DSNs
Introduced DatabaseSettings to replace database-specific settings
Unified database interaction to support multiple backends (e.g. PostgreSQL, MySQL, SQLite)

Changed#

Replaced PostgresSettings with DatabaseSettings - Now requires:
- DB_DSN (full database connection string)
- DB_TABLE (target table name)
Refactored database operations to be backend-agnostic
Updated Spark and Pandas database integrations to use DSN-based connections
Simplified database configuration by removing individual connection parameters (host, port, user, password)

Breaking Changes#

Removed PostgresConfig entirely
Replaced PostgresSettings with DatabaseSettings
Input/Output type pg_table has been renamed to database_table
Users must now provide a full DSN instead of individual connection parameters
Existing compute blocks using PostgreSQL-specific configuration must be migrated

scystream-sdk 1.5.0 - Release Notes#

Added#

Added support for database schemas in database operations
Introduced optional DB_SCHEMA configuration for DatabaseSettings
Schema is now applied automatically to all database read/write operations
Added internal normalization of schema values (e.g. "" and whitespace are treated as None)

Notes#

Schema support is primarily intended for PostgreSQL
Automatic schema creation is currently only supported for PostgreSQL
For other databases (e.g. MySQL, SQLite), schema behavior depends on the underlying backend