Airflow XCom Exclusive
💡 Tip: Use the TaskFlow API for the cleanest, most "exclusive" feeling data flow. It handles the keys and references for you, minimizing the risk of pulling the wrong data.
In this example, task1 shares customer data through XCom, and task2 retrieves the data and processes it.
DB queries slow down, causing the Airflow Scheduler to lag.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def push_function(**context):
        # The return value is automatically pushed to XCom
        # under the default key 'return_value'
        return "secret_data_123"


    def pull_function(**context):
        ti = context['ti']
        # Pull the value from the task 'push_task' only
        value = ti.xcom_pull(task_ids='push_task')
        print(f"Pulled value: {value}")


    with DAG('xcom_traditional_example', start_date=datetime(2023, 1, 1), schedule=None) as dag:
        push_task = PythonOperator(
            task_id='push_task',
            python_callable=push_function
        )
        pull_task = PythonOperator(
            task_id='pull_task',
            python_callable=pull_function
        )

        # pull_task runs only after push_task has pushed its XCom
        push_task >> pull_task

B. The TaskFlow API Approach (Recommended)

"Airflow XCom exclusive" refers to the practice of pushing XCom data targeted specifically for one or more downstream tasks, ensuring no other tasks mistakenly consume or rely on that data. It is a best practice for maintaining modularity and preventing unintended dependencies between tasks.
For the ultimate control, what you might call an "exclusive" backend in the truest sense, you can build a custom XCom class. This is the solution when you need to store XComs in multiple locations simultaneously (e.g., S3 and a local cache), enforce strict data-type policies (such as allowing only JSON-serializable objects), or implement complex encryption or compression logic beyond the built-in options. This method requires subclassing airflow.models.xcom.BaseXCom and overriding its core methods. The key methods to implement are serialize_value and deserialize_value, which control how data is prepared for storage and how it is reconstructed when pulled. You can also override the clear method to manage the lifecycle of data in your custom backend when tasks are retried or cleared.
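As a rough illustration of the kind of logic that serialize_value and deserialize_value might delegate to, the sketch below enforces a JSON-only policy and compresses the payload with zlib. It uses only the standard library and deliberately does not subclass BaseXCom, since the exact method signatures may differ between Airflow versions. The helper names encode_xcom_value and decode_xcom_value are hypothetical.

```python
import json
import zlib
from typing import Any


def encode_xcom_value(value: Any) -> bytes:
    """Hypothetical helper: JSON-only policy plus zlib compression."""
    try:
        # json.dumps raises TypeError for objects it cannot serialize
        text = json.dumps(value)
    except TypeError as exc:
        raise TypeError(
            f"XCom values must be JSON-serializable, got {type(value).__name__}"
        ) from exc
    return zlib.compress(text.encode("utf-8"))


def decode_xcom_value(blob: bytes) -> Any:
    """Hypothetical helper: reverse of encode_xcom_value."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


if __name__ == "__main__":
    payload = {"customer_id": 42, "tags": ["new", "eu"]}
    blob = encode_xcom_value(payload)
    # Round trip: the decoded value should equal the original payload
    print(decode_xcom_value(blob) == payload)
```

In a custom backend, a serialize_value override could call the first helper and a deserialize_value override the second. Rejecting non-JSON objects at push time surfaces a policy violation in the task that caused it rather than in a downstream consumer.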
If your tasks pass sensitive connection tokens, temporary credentials, or PII via XCom, you must protect that data from appearing in plain text within the Airflow Web UI. Airflow automatically masks values whose keys contain terms like secret, password, auth, or token.
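To make the key-name idea concrete, here is a minimal standard-library sketch that redacts values stored under sensitive-looking keys before a dictionary is pushed or logged. It illustrates the concept only and is not Airflow's own masking code. The function name redact_sensitive and the term list are assumptions chosen to mirror the terms mentioned above.

```python
from typing import Any

# Assumed term list, taken from the terms named in the text above
SENSITIVE_TERMS = ("secret", "password", "auth", "token")


def redact_sensitive(data: Any, mask: str = "***") -> Any:
    """Return a copy of data with values under sensitive-looking keys masked."""
    if isinstance(data, dict):
        redacted = {}
        for key, value in data.items():
            if any(term in str(key).lower() for term in SENSITIVE_TERMS):
                # The key name looks sensitive: hide the whole value
                redacted[key] = mask
            else:
                # Otherwise keep looking inside nested structures
                redacted[key] = redact_sensitive(value, mask)
        return redacted
    if isinstance(data, list):
        return [redact_sensitive(item, mask) for item in data]
    return data


if __name__ == "__main__":
    print(redact_sensitive({"user": "ann", "api_token": "abc123"}))
```

Substring matching is intentionally broad, so a key such as author would also be masked under this term list. A stricter list may be preferable in practice.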
If you are using traditional operators (like PythonOperator or BashOperator), never pull all XComs globally. Always restrict the xcom_pull function by specifying the exact task_ids.
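One way to make that rule hard to forget is a small wrapper that always requires an explicit task ID. The sketch below rests on assumptions: pull_exclusive and FakeTaskInstance are hypothetical names, and the fake class exists only so the snippet runs outside Airflow. Inside a real task you would pass context['ti'], whose xcom_pull accepts the same task_ids and key arguments.

```python
from typing import Any


def pull_exclusive(ti: Any, task_id: str, key: str = "return_value") -> Any:
    """Pull one XCom from exactly one upstream task, failing loudly if absent."""
    if not task_id:
        raise ValueError("task_id must be given explicitly")
    value = ti.xcom_pull(task_ids=task_id, key=key)
    if value is None:
        raise LookupError(f"No XCom with key {key!r} from task {task_id!r}")
    return value


class FakeTaskInstance:
    """Stand-in for Airflow's task instance, used only to demo the helper."""

    def __init__(self, store: dict) -> None:
        self._store = store  # maps (task_id, key) -> value

    def xcom_pull(self, task_ids: str, key: str = "return_value") -> Any:
        return self._store.get((task_ids, key))


if __name__ == "__main__":
    ti = FakeTaskInstance({("push_task", "return_value"): "secret_data_123"})
    print(pull_exclusive(ti, "push_task"))
```

Note that this helper treats a None result as missing data. That is a design choice, so it would not suit tasks that legitimately push None.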