# Confluent Cloud Flink SQL Documentation

> Comprehensive documentation for the Apache Flink SQL dialect used in Confluent Cloud. This includes SQL syntax, functions, operators, and best practices for stream processing with Flink SQL in Confluent Cloud.

## Overview

This documentation covers:

- Flink SQL syntax and semantics
- Built-in functions and operators
- Stream processing concepts
- Confluent Cloud-specific features
- Best practices and examples

## Core Documentation

- [Flink SQL Autopilot in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/concepts/autopilot.html): Flink SQL Autopilot for Confluent Cloud: Autopilot scales up and scales down the compute resources that SQL statements use in Confluent Cloud for Apache Flink®. Autopilot assigns resources efficiently...
- [Batch and Stream Processing in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/concepts/batch-and-stream-processing.html): Batch and Stream Processing in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® supports both batch and stream processing, which enables you to process data in either finite (bounde...
- [Comparing Apache Flink with Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/concepts/comparison-with-apache-flink.html): Comparing Apache Flink with Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® supports many of the capabilities of Apache Flink® and provides additional features. Also, Confluent Clo...
- [Compute Pools in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/concepts/compute-pools.html): Compute Pools in Confluent Cloud for Apache Flink: A compute pool in Confluent Cloud for Apache Flink® represents a set of compute resources bound to a region that is used to run your SQL statements.
- [Delivery Guarantees and Latency in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/concepts/delivery-guarantees.html): Delivery Guarantees and Latency in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® provides exactly-once semantics end-to-end by default, which means that every input message is ref...
- [Determinism with continuous Flink SQL queries in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/concepts/determinism.html): Determinism in Continuous Queries on Confluent Cloud for Apache Flink: This topic answers the following questions about determinism in Confluent Cloud for Apache Flink®: What is determinism? Is all ba...
- [Tables and Topics in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/concepts/dynamic-tables.html): Tables and Topics in Confluent Cloud for Apache Flink: Apache Flink® and the Table API use the concept of dynamic tables to facilitate the manipulation and processing of streaming data. Dynamic tables...
- [Billing in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/concepts/flink-billing.html): Billing on Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® is a serverless stream-processing platform with usage-based pricing, where you are charged only for the duration that you...
- [Private Networking with Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/concepts/flink-private-networking.html): Private Networking with Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® supports private networking on AWS, Azure, and Google Cloud. This feature enables Flink to securely read and...
- [Stream Processing Concepts in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/concepts/overview.html): Stream Processing Concepts in Confluent Cloud for Apache Flink: Apache Flink® SQL, a high-level API powered by Confluent Cloud for Apache Flink, offers a simple and easy way to leverage the power of s...
- [Schema and Statement Evolution with Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/concepts/schema-statement-evolution.html): Schema and Statement Evolution with Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® enables evolving your statements over time as your schemas change. This topic describes these co...
- [Snapshot Queries in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/concepts/snapshot-queries.html): Snapshot Queries in Confluent Cloud for Apache Flink: In Confluent Cloud for Apache Flink®, a snapshot query is a query that reads data from a table at a specific point in time. In contrast with a str...
- [Statement CFU Metrics in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/concepts/statement-cfu-metrics.html): Statement CFU Metrics in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® provides detailed metrics to help you understand and manage your resource utilization. One critical aspect
- [Flink SQL Statements in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/concepts/statements.html): Flink SQL Statements in Confluent Cloud for Apache Flink: In Confluent Cloud for Apache Flink®, a statement represents a high-level resource that’s created when you enter a SQL query. Each statement h...
- [Time and Watermarks in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/concepts/timely-stream-processing.html): Time and Watermarks in Confluent Cloud for Apache Flink: Timely stream processing is an extension of stateful stream processing that incorporates time into the computation. It’s commonly used for time...
- [User-defined Functions in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/concepts/user-defined-functions.html): User-defined Functions in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® supports user-defined functions (UDFs), which are extension points for running custom logic that you can’t...
- [FAQ for Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/flink-faq.html): Frequently Asked Questions for Confluent Cloud for Apache Flink: This topic provides answers to frequently asked questions about Confluent Cloud for Apache Flink®. What is Confluent Cloud for Apache F...
- [Get Help with Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/get-help.html): Get Help with Confluent Cloud for Apache Flink: You can request support in the Confluent Support Portal. You can access the portal directly, or you can navigate to it from the Confluent Cloud Console
- [Get Started with Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/get-started/overview.html): Get Started with Confluent Cloud for Apache Flink: Welcome to Confluent Cloud for Apache Flink®. This section guides you through the steps to get your queries running using the Confluent Cloud Console...
- [Flink SQL Quick Start on Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/get-started/quick-start-cloud-console.html): Flink SQL Quick Start with Confluent Cloud Console: This quick start gets you up and running with Confluent Cloud for Apache Flink®. The following steps show how to create a workspace for running SQL
- [Java Table API Quick Start on Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/get-started/quick-start-java-table-api.html): Java Table API Quick Start on Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® supports programming applications with the Table API. Confluent provides a plugin for running applicat...
- [Python Table API Quick Start on Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/get-started/quick-start-python-table-api.html): Python Table API Quick Start on Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® supports programming applications with the Table API. Confluent provides a plugin for running applic...
- [SQL Shell Quick Start on Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/get-started/quick-start-shell.html): Flink SQL Shell Quick Start on Confluent Cloud for Apache Flink: This quick start walks you through the following steps to get you up and running with Confluent Cloud for Apache Flink®. Step 1: Log in...
- [Aggregate a Data Stream in a Tumbling Window with Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/how-to-guides/aggregate-tumbling-window.html): Aggregate a Stream in a Tumbling Window with Confluent Cloud for Apache Flink: Aggregation over windows is central to processing streaming data. Confluent Cloud for Apache Flink® supports Windowing Ta...
- [Combine Streams and Track Most Recent Records with Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/how-to-guides/combine-and-track-most-recent-records.html): Combine Streams and Track Most Recent Records with Confluent Cloud for Apache Flink: When working with streaming data, it’s common to need to combine information from multiple sources while tracking t...
- [Compare Current and Previous Values in a Data Stream with Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/how-to-guides/compare-current-and-previous-values.html): Compare Current and Previous Values in a Data Stream with Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® provides a LAG function, which is a built-in function that enables you to
- [Convert the Serialization Format of a Topic with Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/how-to-guides/convert-serialization-format.html): Convert the Serialization Format of a Topic with Confluent Cloud for Apache Flink: This guide shows how to use Confluent Cloud for Apache Flink® to transform a topic serialized in Avro Schema Registry...
- [Create a User-Defined Function with Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/how-to-guides/create-udf.html): Create a User-Defined Function with Confluent Cloud for Apache Flink: A user-defined function (UDF) extends the capabilities of Confluent Cloud for Apache Flink® and enables you to implement custom lo...
- [Deduplicate Rows in a Table with Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/how-to-guides/deduplicate-rows.html): Deduplicate Rows in a Table with Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® enables generating a table that contains only unique records from an input table with only a few cl...
- [Log Debug Messages in a User Defined Function with Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/how-to-guides/enable-udf-logging.html): Log Debug Messages in a User Defined Function for Confluent Cloud for Apache Flink: When you create a user defined function (UDF) with Confluent Cloud for Apache Flink®, you have the option of logging...
- [Mask Fields in a Table with Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/how-to-guides/mask-fields.html): Mask Fields in a Table with Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® enables generating a topic that contains masked fields from an input topic with only a few clicks. In th...
- [Handle Multiple Event Types In Tables in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/how-to-guides/multiple-event-types.html): Handle Multiple Event Types with Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® provides several ways to work with Kafka topics containing multiple event types. This guide explain...
- [How-to Guides for Developing Flink Applications on Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/how-to-guides/overview.html): How-to Guides for Confluent Cloud for Apache Flink: Discover how Confluent Cloud for Apache Flink® can help you accomplish common processing tasks such as joins and aggregations. This section provides...
- [Process schemaless events with Flink SQL in Confluent Cloud | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/how-to-guides/process-schemaless-events.html): Process Schemaless Events with Confluent Cloud for Apache Flink: This guide explains how to use Confluent Cloud for Apache Flink to handle and process events in Apache Kafka® topics that don’t use serial...
- [Profile a Query with Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/how-to-guides/profile-query.html): Profile a Query with Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® enables you to profile the performance of your queries. The Query Profiler provides enhanced visibility into ho...
- [Resolve Statement Issues in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/how-to-guides/resolve-common-query-problems.html): Resolve Statement Issues in Confluent Cloud for Apache Flink: Inefficient Flink SQL queries in Confluent Cloud for Apache Flink® can cause performance issues that impact your data processing pipeline....
- [Run a Snapshot Query in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/how-to-guides/run-snapshot-query.html): Run a Snapshot Query with Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® supports snapshot queries that read data from a table at a specific point in time. In contrast with a stre...
- [Scan and Summarize Flink Tables in Confluent Cloud | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/how-to-guides/scan-and-summarize-tables.html): Scan and Summarize Tables with Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® provides graphical tools in your workspaces that enable scanning and summarizing data visually in Fli...
- [Transform a Topic with Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/how-to-guides/transform-topic.html): Transform a Topic with Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® enables generating a transformed topic from an input topic’s properties, like partition count, key, serializa...
- [View Time Series Data in Confluent Cloud | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/how-to-guides/view-time-series-data.html): View Time Series Data with Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® enables visualizing time-series data in real time. The output of certain SQL statements renders as time-se...
- [Best Practices for Moving SQL Statements to Production in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/operate-and-deploy/best-practices.html): Move SQL Statements to Production in Confluent Cloud for Apache Flink: When you move your Flink SQL statements to production in Confluent Cloud for Apache Flink®, consider the following recommendation...
- [Carry-over Offsets in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/operate-and-deploy/carry-over-offsets.html): Carry-over Offsets in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® supports carry-over offsets, which means that you can use the topic offsets from one statement to start a new
- [Manage Flink Compute Pools in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/operate-and-deploy/create-compute-pool.html): Manage Compute Pools in Confluent Cloud for Apache Flink: A compute pool represents the compute resources that are used to run your SQL statements. The resources provided by a compute pool are shared
- [Deploy a Flink SQL Statement in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/operate-and-deploy/deploy-flink-sql-statement.html): Deploy a Flink SQL Statement Using CI/CD and Confluent Cloud for Apache Flink: GitHub Actions is a powerful feature on GitHub that enables automating your software development workflows. If your sourc...
- [Grant Role-Based Access for Flink SQL Statements in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/operate-and-deploy/flink-rbac.html): Grant Role-Based Access in Confluent Cloud for Apache Flink: When deploying Flink SQL statements in production, you must configure appropriate access controls for different types of users and workload...
- [Flink REST API in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/operate-and-deploy/flink-rest-api.html): Flink SQL REST API for Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® provides a REST API for managing your Flink SQL statements, compute pools, and connections programmatically.
- [Generate an API key for Programmatic Access to Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/operate-and-deploy/generate-api-key-for-flink.html): Generate an API Key for Access in Confluent Cloud for Apache Flink: To manage Flink workloads programmatically in Confluent Cloud for Apache Flink®, you need an API key that’s specific to Flink. You c...
- [Manage Flink Connections in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/operate-and-deploy/manage-connections.html): Manage Connections in Confluent Cloud for Apache Flink: A connection in Confluent Cloud for Apache Flink® represents an external service that is used in your Flink statements. Connections are used to
- [Monitor and Manage Flink SQL Statements in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/operate-and-deploy/monitor-statements.html): Monitor and Manage Flink SQL Statements in Confluent Cloud for Apache Flink: You start a stream-processing app on Confluent Cloud for Apache Flink® by running a SQL statement. Once a statement is runn...
- [Operate and Deploy Flink SQL Statements with Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/operate-and-deploy/overview.html): Operate and Deploy Flink Statements with Confluent Cloud for Apache Flink: Confluent provides tools for operating Confluent Cloud for Apache Flink® in the Cloud Console, the Confluent CLI, the Conflue...
- [Enable Private Networking with Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/operate-and-deploy/private-networking.html): Enable Private Networking with Confluent Cloud for Apache Flink: You have these options for using private networking with Confluent Cloud for Apache Flink®. PrivateLink Attachment: Works with any type...
- [Flink SQL Query Profiler in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/operate-and-deploy/query-profiler.html): Flink SQL Query Profiler in Confluent Cloud for Apache Flink: The Query Profiler is a tool in Confluent Cloud for Apache Flink® that provides enhanced visibility into how a Flink SQL statement is proc...
- [Stream Processing with Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/overview.html): Stream Processing with Confluent Cloud for Apache Flink: Apache Flink® is a powerful, scalable stream processing framework for running complex, stateful, low-latency streaming applications on large vo...
- [Supported Cloud Regions for Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/cloud-regions.html): Supported Cloud Regions for Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® is available on AWS, Azure, and Google Cloud. Flink is supported in the following regions. AWS supported...
- [Flink SQL Data Types in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/datatypes.html): Data Types in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® has a rich set of native data types that you can use in SQL statements and queries. The query planner supports the fol...
- [Example Data Streams in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/example-data.html): Example Data Streams in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® provides an Examples catalog that has mock data streams you can use for experimenting with Flink SQL queries...
- [Confluent CLI commands with Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/flink-sql-cli.html): Confluent CLI commands with Confluent Cloud for Apache Flink: Manage Flink SQL statements and compute pools in Confluent Cloud for Apache Flink® by using the confluent flink commands in the Confluent
- [SQL Information Schema in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/flink-sql-information-schema.html): Information Schema in Confluent Cloud for Apache Flink: An information schema, or data dictionary, is a standard SQL schema with a collection of predefined views that enable accessing metadata about o...
- [SQL aggregate functions in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/functions/aggregate-functions.html): Aggregate Functions in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® provides these built-in functions to aggregate rows in Flink SQL queries: AVG COLLECT COUNT CUME_DIST DENSE_R...
- [SQL Collection Functions in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/functions/collection-functions.html): Collection Functions in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® provides these built-in collection functions to use in Flink SQL queries: ARRAY ARRAY_AGG ARRAY_APPEND ARRAY...
- [SQL comparison functions in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/functions/comparison-functions.html): Comparison Functions in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® provides these built-in comparison functions to use in SQL queries: Equality operations Logical operations C...
- [SQL conditional functions in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/functions/conditional-functions.html): Conditional Functions in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® provides these built-in functions for controlling execution flow in SQL queries: CASE CASE WHEN CONDITION C...
- [SQL Datetime Functions in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/functions/datetime-functions.html): Datetime Functions in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® provides these built-in functions for handling date and time logic in SQL queries: Date Time Timestamp Utility...
- [SQL hash functions in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/functions/hash-functions.html): Hash Functions in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® provides these built-in functions to generate hash codes in SQL queries: MD5 SHA1 SHA2 SHA224 SHA256 SHA384 SHA512...
- [SQL JSON functions in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/functions/json-functions.html): JSON Functions in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® provides these built-in functions to help with JSON in SQL queries: IS JSON JSON_ARRAY JSON_ARRAYAGG JSON_EXISTS J...
- [Machine-Learning Preprocessing Functions in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/functions/ml-preprocessing-functions.html): Machine-Learning Preprocessing Functions in Confluent Cloud for Apache Flink: The following built-in functions are available for ML preprocessing in Confluent Cloud for Apache Flink®. These functions
- [AI Model Inference and Machine Learning Functions in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/functions/model-inference-functions.html): AI Model Inference and Machine Learning Functions in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® provides built-in functions for invoking remote AI/ML models in Flink SQL queri...
- [SQL numeric functions in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/functions/numeric-functions.html): Numeric Functions in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® provides these built-in numeric functions to use in SQL queries: Numeric Trigonometry Random number generators
- [SQL Functions in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/functions/overview.html): Flink SQL Functions in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® enables you to do data transformations and other operations with the following built-in functions. Aggregate
- [SQL string functions in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/functions/string-functions.html): String Functions in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® provides these built-in string functions to use in SQL queries: ASCII BTRIM string1 || string2 CHARACTER_LENGTH
- [Table API functions in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/functions/table-api-functions.html): Table API in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® supports programming applications with the Table API. For more information, see the Table API Overview. To get started
- [Flink SQL Keywords in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/keywords.html): Flink SQL Reserved Keywords in Confluent Cloud for Apache Flink: Keywords are words that have significance in Confluent Cloud for Apache Flink®. Some keywords, like AND, CHAR, and SELECT are reserved
- [Flink SQL and Table API Reference in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/overview.html): Flink SQL and Table API Reference in Confluent Cloud for Apache Flink: This section describes the SQL language support in Confluent Cloud for Apache Flink®, including Data Definition Language (DDL) st...
- [SQL Deduplication Queries in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/queries/deduplication.html): Deduplication Queries in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® enables removing duplicate rows over a set of columns in a Flink SQL table. Syntax: SELECT [column_list] FR...
- [SQL Group Aggregation Queries in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/queries/group-aggregation.html): Group Aggregation Queries in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® enables computing a single result from multiple input rows in a Flink SQL table. Description: Compute a...
- [SQL INSERT INTO FROM SELECT Statement in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/queries/insert-into-from-select.html): INSERT INTO FROM SELECT Statement in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® enables inserting SELECT query results directly into a Flink SQL table. Syntax: [EXECUTE] INSER...
- [SQL INSERT VALUES Statement in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/queries/insert-values.html): INSERT VALUES Statement in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® enables inserting data directly into a Flink SQL table. Syntax: [EXECUTE] INSERT { INTO | OVERWRITE } [ca...
- [SQL Join Queries in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/queries/joins.html): Join Queries in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® enables joining data streams over Flink SQL dynamic tables. Description: Flink supports complex and flexible join opera...
- [SQL LIMIT clause in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/queries/limit.html): LIMIT Clause in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® enables constraining the number of rows returned by a SELECT statement. Description: The LIMIT clause constrains the...
- [SQL Pattern Recognition Queries in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/queries/match_recognize.html): Pattern Recognition Queries in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® enables pattern detection in event streams. Syntax: SELECT T.aid, T.bid, T.cid FROM MyTable MATCH_REC...
- [SQL ORDER BY Clause in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/queries/orderby.html): ORDER BY Clause in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® enables sorting rows from a SELECT statement. Description: The ORDER BY clause causes the result rows to be sorte...
- [SQL OVER Aggregation Queries in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/queries/over-aggregation.html): OVER Aggregation Queries in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® enables computing an aggregated value for every row over a range of ordered rows. Syntax: SELECT agg_fun...
- [SQL Queries in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/queries/overview.html): Flink SQL Queries in Confluent Cloud for Apache Flink: In Confluent Cloud for Apache Flink®, Data Manipulation Language (DML) statements, also known as queries, are declarative verbs that read and mod...
- [SQL SELECT statement in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/queries/select.html): SELECT Statement in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® enables querying the content of your tables by using familiar SELECT syntax. Syntax: SELECT [DISTINCT] select_li...
- [SQL Set Logic in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/queries/set-logic.html): Set Logic in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® enables set logic operations on tables in SQL statements. EXCEPT EXISTS IN INTERSECT UNION Example data: The following
- [SQL Statement Sets in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/queries/statement-set.html): EXECUTE STATEMENT SET in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® enables executing multiple SQL statements as a single, optimized statement by using statement sets. Syntax:...
- [SQL Top-N queries in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/queries/topn.html): Top-N Queries in Confluent Cloud for Apache Flink: Confluent Cloud for Apache Flink® enables finding the smallest or largest values, ordered by columns, in a table. Syntax: SELECT [column_list] FROM (...
- [SQL Window Aggregation Queries in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/queries/window-aggregation.html): Window Aggregation Queries in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables aggregating data over windows in a table. Syntax¶ SELECT ... FROM -- relation... - [SQL Window Deduplication Queries in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/queries/window-deduplication.html): Window Deduplication Queries in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables removing duplicate rows over a set of columns in a windowed table. Syntax¶ SELECT [column_li... - [SQL Window Join Queries in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/queries/window-join.html): Window Join Queries in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables joining data over time windows in dynamic tables. Syntax¶ The following shows the syntax of the INNER... - [SQL Window Top-N Queries in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/queries/window-topn.html): Window Top-N Queries in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables Window Top-N queries in dynamic tables. Syntax¶ SELECT [column_list] FROM ( SELECT [column_list], RO... - [SQL Windowing Table-Valued Functions in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/queries/window-tvf.html): Windowing Table-Valued Functions (Windowing TVFs) in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® provides several window table-valued functions (TVFs) for dividing the elements... 
- [SQL WITH Clause in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/queries/with.html): WITH Clause in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables writing auxiliary statements to use in larger SQL queries. Syntax¶ WITH [ , ... ] SELE... - [Data Type Mappings with Flink SQL Statements in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/serialization.html): Data Type Mappings in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® supports records in the Avro Schema Registry, JSON_SR, and Protobuf Schema Registry formats. Avro schemas JSON... - [Flink SQL Examples in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/sql-examples.html): Flink SQL Examples in Confluent Cloud for Apache Flink¶ The following code examples show common Flink SQL use cases with Confluent Cloud for Apache Flink®. CREATE TABLE Inferred tables ALTER TABLE SEL... - [Flink SQL Syntax in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/sql-syntax.html): Flink SQL Syntax in Confluent Cloud for Apache Flink¶ SQL is a domain-specific language for managing and manipulating data. It’s used primarily to work with structured data, where the types and relati... - [SQL ALTER CONNECTION Statement in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/statements/alter-connection.html): ALTER CONNECTION Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® supports creating secure connections to external services and data sources. You can use these connecti... 
- [SQL ALTER MODEL Statement in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/statements/alter-model.html): ALTER MODEL Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables real-time inference and prediction with AI models. Use the CREATE MODEL statement to register an AI... - [SQL ALTER TABLE Statement in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/statements/alter-table.html): ALTER TABLE Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables changing properties of an existing table. Syntax¶ ALTER TABLE [catalog_name.][db_name.]table_name {... - [SQL ALTER VIEW Statement in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/statements/alter-view.html): ALTER VIEW Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables modifying properties of an existing view. Syntax¶ ALTER VIEW [catalog_name.][db_name.]view_name RENA... - [SQL CREATE CONNECTION Statement in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/statements/create-connection.html): CREATE CONNECTION Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® supports creating secure connections to external services and data sources. You can use these connect... - [SQL CREATE FUNCTION Statement in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/statements/create-function.html): CREATE FUNCTION Statement¶ Confluent Cloud for Apache Flink® enables registering custom user-defined functions (UDFs) by using the CREATE FUNCTION statement. When your UDFs are registered in a Flink... 
- [SQL CREATE MODEL Statement in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/statements/create-model.html): CREATE MODEL Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables real-time inference and prediction with AI and ML models. The Flink SQL interface is available in - [SQL CREATE TABLE Statement in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/statements/create-table.html): CREATE TABLE Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables creating tables backed by Apache Kafka® topics by using the CREATE TABLE statement. With Flink tab... - [SQL CREATE VIEW Statement in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/statements/create-view.html): CREATE VIEW Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables creating views based on statement expressions by using the CREATE VIEW statement. With Flink views,... - [SQL DESCRIBE Statement in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/statements/describe.html): DESCRIBE Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables viewing the schema of an Apache Kafka® topic. Also, you can view details of an AI model, function, or - [SQL DROP CONNECTION Statement in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/statements/drop-connection.html): DROP CONNECTION Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® supports creating secure connections to external services and data sources. You can use these connectio... 
- [SQL DROP MODEL Statement in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/statements/drop-model.html): DROP MODEL Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables real-time inference and prediction with AI models. Use the CREATE MODEL statement to register an AI - [SQL DROP TABLE Statement in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/statements/drop-table.html): DROP TABLE Statement in Confluent Cloud for Apache Flink¶ The DROP TABLE statement removes a table definition from Confluent Cloud for Apache Flink® and, depending on the table type, will also delete - [SQL DROP VIEW Statement in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/statements/drop-view.html): DROP VIEW Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables dropping views using the DROP VIEW statement. When a view is dropped, its definition is removed from - [SQL EXPLAIN Statement in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/statements/explain.html): EXPLAIN Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables viewing and analyzing the query plans of Flink SQL statements. Syntax¶ EXPLAIN { | Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables setting the current Apache Kafka® cluster with the USE statement. Syntax¶ U... - [Table API on Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/table-api.html): Table API on Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® supports programming applications with the Table API in Java and Python. Confluent provides a plugin for running applic... 
- [SQL Timezone Types in Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/flink/reference/timezone.html): Timezone Types in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® provides rich data types for date and time, including these: DATE TIME TIMESTAMP TIMESTAMP_LTZ INTERVAL YEAR TO MO... - [Flink authentication and authorization event methods (Confluent Cloud audit logs) | Confluent Documentation](https://docs.confluent.io/cloud/current/monitoring/audit-logging/event-methods/flink-authn-authz.html): Flink Authentication and Authorization Auditable Event Methods on Confluent Cloud¶ Expand all examples | Collapse all examples Confluent Cloud audit logs contain records of auditable events for authen... - [Auditable event methods for Apache Flink (Confluent Cloud) | Confluent Documentation](https://docs.confluent.io/cloud/current/monitoring/audit-logging/event-methods/flink.html): Auditable Event Methods for Apache Flink on Confluent Cloud¶ Auditable event methods for Confluent Cloud for Apache Flink are triggered by operations on Apache Flink® in Confluent Cloud and send event... - [Query Encrypted Data with Flink & Confluent Cloud | Confluent Documentation](https://docs.confluent.io/cloud/current/security/encrypt/csfle/flink-integration.html): Secure Stream Processing: Query Encrypted Data with Flink on Confluent Cloud¶ Processing sensitive data like personally identifiable information (PII) or financial records in real-time data streams pr... - [Query Tableflow Tables with Confluent Cloud for Apache Flink | Confluent Documentation](https://docs.confluent.io/cloud/current/topics/tableflow/how-to-guides/query-engines/query-with-flink.html): Query Tableflow Tables with Flink in Confluent Cloud for Apache Flink®¶ Confluent Cloud for Apache Flink® supports snapshot queries that read data from a Tableflow-enabled topic at a specific point in... 
- [confluent flink application create | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/application/confluent_flink_application_create.html): confluent flink application create Description Create a Flink application. confluent flink application create [flags] Flags --environment string REQUIRED: Name of the Flink envir... - [confluent flink application delete | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/application/confluent_flink_application_delete.html): confluent flink application delete Description Delete one or more Flink applications. confluent flink application delete [name-2] ... [name-n] [flags] Flags --environment string REQUIRED: - [confluent flink application describe | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/application/confluent_flink_application_describe.html): confluent flink application describe Description Describe a Flink application. confluent flink application describe [flags] Flags --environment string REQUIRED: Name of the Flink environment... - [confluent flink application list | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/application/confluent_flink_application_list.html): confluent flink application list Description List Flink applications. confluent flink application list [flags] Flags --environment string REQUIRED: Name of the Flink environment. --url string Base - [confluent flink application update | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/application/confluent_flink_application_update.html): confluent flink application update Description Update a Flink application. confluent flink application update [flags] Flags --environment string REQUIRED: Name of the Flink envir... 
- [confluent flink application web-ui-forward | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/application/confluent_flink_application_web-ui-forward.html): confluent flink application web-ui-forward Description Forward the web UI of a Flink application. confluent flink application web-ui-forward [flags] Flags --environment string REQUIRED: Name... - [confluent flink application | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/application/index.html): confluent flink application Aliases application, app Description Manage Flink applications. Subcommands Command Description confluent flink application create Create a Flink application. confluent... - [confluent flink artifact create | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/artifact/confluent_flink_artifact_create.html): confluent flink artifact create Description Create a Flink UDF artifact. confluent flink artifact create [flags] Flags --artifact-file string REQUIRED: Flink artifact JAR file or ZIP - [confluent flink artifact delete | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/artifact/confluent_flink_artifact_delete.html): confluent flink artifact delete Description Delete one or more Flink UDF artifacts. confluent flink artifact delete [id-2] ... [id-n] [flags] Flags --cloud string REQUIRED: Specify the cloud... - [confluent flink artifact describe | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/artifact/confluent_flink_artifact_describe.html): confluent flink artifact describe Description Describe a Flink UDF artifact. confluent flink artifact describe [flags] Flags --cloud string REQUIRED: Specify the cloud provider as "aws", "azur... 
- [confluent flink artifact list | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/artifact/confluent_flink_artifact_list.html): confluent flink artifact list Description List Flink UDF artifacts. confluent flink artifact list [flags] Flags --cloud string REQUIRED: Specify the cloud provider as "aws", "azure", or "gcp". --re... - [confluent flink artifact | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/artifact/index.html): confluent flink artifact Description Manage Flink UDF artifacts. Subcommands Command Description confluent flink artifact create Create a Flink UDF artifact. confluent flink artifact delete Delete - [confluent flink catalog create | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/catalog/confluent_flink_catalog_create.html): confluent flink catalog create Description Create a Flink catalog in Confluent Platform that provides metadata about tables and other database objects such as views and functions. confluent flink ca... - [confluent flink catalog delete | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/catalog/confluent_flink_catalog_delete.html): confluent flink catalog delete Description Delete one or more Flink catalogs in Confluent Platform. confluent flink catalog delete [name-2] ... [name-n] [flags] Flags --url string Base URL... - [confluent flink catalog describe | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/catalog/confluent_flink_catalog_describe.html): confluent flink catalog describe Description Describe a Flink catalog in Confluent Platform. confluent flink catalog describe [flags] Flags --url string Base URL of the Confluent Manager for... 
- [confluent flink catalog list | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/catalog/confluent_flink_catalog_list.html): confluent flink catalog list Description List Flink catalogs in Confluent Platform. confluent flink catalog list [flags] Flags --url string Base URL of the Confluent Manager for Apache Flink (CMF).... - [confluent flink catalog | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/catalog/index.html): confluent flink catalog Description Manage Flink catalogs in Confluent Platform. Subcommands Command Description confluent flink catalog create Create a Flink catalog. confluent flink catalog delet... - [confluent flink compute-pool create | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/compute-pool/confluent_flink_compute-pool_create.html): confluent flink compute-pool create Description Cloud Create a Flink compute pool. confluent flink compute-pool create [flags] On-Premises Create a Flink compute pool in Confluent Platform. c... - [confluent flink compute-pool delete | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/compute-pool/confluent_flink_compute-pool_delete.html): confluent flink compute-pool delete Description Cloud Delete one or more Flink compute pools. confluent flink compute-pool delete [id-2] ... [id-n] [flags] On-Premises Delete one or more Flin... - [confluent flink compute-pool describe | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/compute-pool/confluent_flink_compute-pool_describe.html): confluent flink compute-pool describe Description Cloud Describe a Flink compute pool. confluent flink compute-pool describe [id] [flags] On-Premises Describe a Flink compute pool in Confluent Platf... 
- [confluent flink compute-pool list | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/compute-pool/confluent_flink_compute-pool_list.html): confluent flink compute-pool list Description Cloud List Flink compute pools. confluent flink compute-pool list [flags] On-Premises List Flink compute pools in Confluent Platform. confluent flink co... - [confluent flink compute-pool unset | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/compute-pool/confluent_flink_compute-pool_unset.html): confluent flink compute-pool unset Description Unset the current Flink compute pool that was set with the use command. confluent flink compute-pool unset [flags] Flags -o, --output string Specify t... - [confluent flink compute-pool update | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/compute-pool/confluent_flink_compute-pool_update.html): confluent flink compute-pool update Description Update a Flink compute pool. confluent flink compute-pool update [id] [flags] Flags --name string Name of the compute pool. --max-cfu int32 Maximum n... - [confluent flink compute-pool use | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/compute-pool/confluent_flink_compute-pool_use.html): confluent flink compute-pool use Description Choose a Flink compute pool to be used in subsequent commands which support passing a compute pool with the --compute-pool flag. confluent flink compute-... - [confluent flink compute-pool | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/compute-pool/index.html): confluent flink compute-pool Description Manage Flink compute pools. Subcommands Cloud Command Description confluent flink compute-pool create Create a Flink compute pool. confluent flink compute-p... 
- [confluent flink shell | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/confluent_flink_shell.html): confluent flink shell Description Start Flink interactive SQL client. confluent flink shell [flags] Flags --compute-pool string Flink compute pool ID. --service-account string Service account ID. -... - [confluent flink connection create | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/connection/confluent_flink_connection_create.html): confluent flink connection create Description Create a Flink connection. confluent flink connection create [flags] Flags --cloud string REQUIRED: Specify the cloud provider as "aws", "azure"... - [confluent flink connection delete | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/connection/confluent_flink_connection_delete.html): confluent flink connection delete Description Delete one or more Flink connections. confluent flink connection delete [name-2] ... [name-n] [flags] Flags --cloud string REQUIRED: Specify t... - [confluent flink connection describe | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/connection/confluent_flink_connection_describe.html): confluent flink connection describe Description Describe a Flink connection. confluent flink connection describe [flags] Flags --cloud string REQUIRED: Specify the cloud provider as "aws", "... - [confluent flink connection list | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/connection/confluent_flink_connection_list.html): confluent flink connection list Description List Flink connections. confluent flink connection list [flags] Flags --cloud string REQUIRED: Specify the cloud provider as "aws", "azure", or "gcp". --... 
- [confluent flink connection update | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/connection/confluent_flink_connection_update.html): confluent flink connection update Description Update a Flink connection. Only secret can be updated. confluent flink connection update [flags] Flags --cloud string REQUIRED: Specify the clou... - [confluent flink connection | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/connection/index.html): confluent flink connection Description Manage Flink connections. Subcommands Command Description confluent flink connection create Create a Flink connection. confluent flink connection delete Delet... - [confluent flink connectivity-type use | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/connectivity-type/confluent_flink_connectivity-type_use.html): confluent flink connectivity-type use Description Select a Flink connectivity type for the current environment as “public” or “private”. If unspecified, the CLI will default to public connectivity t... - [confluent flink connectivity-type | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/connectivity-type/index.html): confluent flink connectivity-type Description Manage Flink connectivity type. Subcommands Command Description confluent flink connectivity-type use Select a Flink connectivity type. - [confluent flink endpoint list | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/endpoint/confluent_flink_endpoint_list.html): confluent flink endpoint list Description List Flink endpoint. confluent flink endpoint list [flags] Flags --context string CLI context name. -o, --output string Specify the output format as "human... 
- [confluent flink endpoint unset | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/endpoint/confluent_flink_endpoint_unset.html): confluent flink endpoint unset Description Unset the current Flink endpoint that was previously set with the use command. confluent flink endpoint unset [flags] Global Flags -h, --help Show help fo... - [confluent flink endpoint use | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/endpoint/confluent_flink_endpoint_use.html): confluent flink endpoint use Description Use a Flink endpoint as active endpoint for all subsequent Flink dataplane commands in current environment, such as flink connection, flink statement and fli... - [confluent flink endpoint | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/endpoint/index.html): confluent flink endpoint Description Manage Flink endpoint. Subcommands Command Description confluent flink endpoint list List Flink endpoint. confluent flink endpoint unset Unset the current Flink... - [confluent flink environment create | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/environment/confluent_flink_environment_create.html): confluent flink environment create Description Create a Flink environment. confluent flink environment create [flags] Flags --kubernetes-namespace string REQUIRED: Kubernetes namespace to de... - [confluent flink environment delete | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/environment/confluent_flink_environment_delete.html): confluent flink environment delete Description Delete one or more Flink environments. confluent flink environment delete [name-2] ... [name-n] [flags] Flags --url string Base URL of the Co... 
- [confluent flink environment describe | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/environment/confluent_flink_environment_describe.html): confluent flink environment describe Description Describe a Flink environment. confluent flink environment describe [flags] Flags --url string Base URL of the Confluent Manager for Apache Fl... - [confluent flink environment list | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/environment/confluent_flink_environment_list.html): confluent flink environment list Description List Flink environments. confluent flink environment list [flags] Flags --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environme... - [confluent flink environment update | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/environment/confluent_flink_environment_update.html): confluent flink environment update Description Update a Flink environment. confluent flink environment update [flags] Flags --url string Base URL of the Confluent Manager for Apache Flink (C... - [confluent flink environment | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/environment/index.html): confluent flink environment Aliases environment, env Description Manage Flink environments. Subcommands Command Description confluent flink environment create Create a Flink environment. confluent... - [confluent flink | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/index.html): confluent flink Description Manage Apache Flink. Subcommands Cloud Command Description confluent flink artifact Manage Flink UDF artifacts. confluent flink compute-pool Manage Flink compute pools. 
- [confluent flink region list | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/region/confluent_flink_region_list.html): confluent flink region list Description List Flink regions. confluent flink region list [flags] Flags --cloud string Specify the cloud provider as "aws", "azure", or "gcp". --context string CLI con... - [confluent flink region unset | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/region/confluent_flink_region_unset.html): confluent flink region unset Description Unset the current Flink cloud and region that was set with the use command. confluent flink region unset [flags] Global Flags -h, --help Show help for this - [confluent flink region use | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/region/confluent_flink_region_use.html): confluent flink region use Description Choose a Flink region to be used in subsequent commands which support passing a region with the --region flag. confluent flink region use [flags] Flags --clou... - [confluent flink region | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/region/index.html): confluent flink region Description Manage Flink regions. Subcommands Command Description confluent flink region list List Flink regions. confluent flink region unset Unset the current Flink cloud a... - [confluent flink statement create | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/statement/confluent_flink_statement_create.html): confluent flink statement create Description Cloud Create a Flink SQL statement. confluent flink statement create [name] [flags] On-Premises Create a Flink SQL statement in Confluent Platform. confl... 
- [confluent flink statement delete | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/statement/confluent_flink_statement_delete.html): confluent flink statement delete Description Cloud Delete one or more Flink SQL statements. confluent flink statement delete [name-2] ... [name-n] [flags] On-Premises Delete one or more Fli... - [confluent flink statement describe | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/statement/confluent_flink_statement_describe.html): confluent flink statement describe Description Cloud Describe a Flink SQL statement. confluent flink statement describe [flags] On-Premises Describe a Flink SQL statement in Confluent Platfor... - [confluent flink statement list | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/statement/confluent_flink_statement_list.html): confluent flink statement list Description Cloud List Flink SQL statements. confluent flink statement list [flags] On-Premises List Flink SQL statements in Confluent Platform. confluent flink statem... - [confluent flink statement rescale | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/statement/confluent_flink_statement_rescale.html): confluent flink statement rescale Description Rescale a Flink SQL statement in Confluent Platform. confluent flink statement rescale [flags] Flags --environment string REQUIRED: Na... - [confluent flink statement resume | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/statement/confluent_flink_statement_resume.html): confluent flink statement resume Description Cloud Resume a Flink SQL statement. confluent flink statement resume [flags] On-Premises Resume a Flink SQL statement in Confluent Platform. confl... 
- [confluent flink statement stop | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/statement/confluent_flink_statement_stop.html): confluent flink statement stop Description Cloud Stop a Flink SQL statement. confluent flink statement stop [flags] On-Premises Stop a Flink SQL statement in Confluent Platform. confluent fli... - [confluent flink statement update | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/statement/confluent_flink_statement_update.html): confluent flink statement update Description Update a Flink SQL statement. confluent flink statement update [flags] Flags --principal string A user or service account the statement runs as. - [confluent flink statement web-ui-forward | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/statement/confluent_flink_statement_web-ui-forward.html): confluent flink statement web-ui-forward Description Forward the web UI of a Flink statement in Confluent Platform. confluent flink statement web-ui-forward [flags] Flags --environment strin... - [confluent flink statement exception list | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/statement/exception/confluent_flink_statement_exception_list.html): confluent flink statement exception list Description Cloud List exceptions for a Flink SQL statement. confluent flink statement exception list [flags] On-Premises List exceptions fo... - [confluent flink statement exception | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/statement/exception/index.html): confluent flink statement exception Description Manage Flink SQL statement exceptions. Subcommands Command Description confluent flink statement exception list List exceptions for a Flink SQL state... 
- [confluent flink statement | Confluent Documentation](https://docs.confluent.io/confluent-cli/current/command-reference/flink/statement/index.html): confluent flink statement Description Manage Flink SQL statements. Subcommands Cloud Command Description confluent flink statement create Create a Flink SQL statement. confluent flink statement del... - [Manage Confluent Platform for Apache Flink Applications Using Confluent for Kubernetes | Confluent Documentation](https://docs.confluent.io/operator/current/co-manage-flink.html): Manage Flink Applications Using Confluent for Kubernetes Apache Flink® is a powerful, scalable, and secure stream processing framework for running complex, stateful, low-latency streaming application... ## Full Documentation Content ### Flink SQL Autopilot in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/concepts/autopilot.html Flink SQL Autopilot for Confluent Cloud¶ Autopilot scales up and scales down the compute resources that SQL statements use in Confluent Cloud for Apache Flink®. Autopilot assigns resources efficiently to SQL statements submitted in Confluent Cloud and provides elastic autoscaling for the entire time the job is running. One of the biggest benefits of using Confluent Cloud for Apache Flink is the built-in Autopilot capability. Autopilot takes care of all the work required to scale up or scale down the compute resources that a SQL statement consumes. Resources are scaled up when a SQL statement has an increased need for resources and scaled down when resources are not being used. This is all done automatically, and no manual work is required to monitor or adjust resources. This removes the complexity of managing your own infrastructure, removes the need for over-provisioning, and ensures that you never have to pay more than needed. 
The autoscaling process is based on parallelism, which is the number of parallel operations that occur when the SQL statement is running. A SQL statement performs at its best when it has the optimal resources for its required parallelism. Scaling status¶ The scaling status in the SQL workspace shows you how the statement resources are scaling. These are the possible scaling statuses. Scaling Status Description Fine The SQL statement has enough resources to run at the required parallelism. Pending Scale Down The SQL statement has more resources than required and will be scaled down. Pending Scale Up The SQL statement doesn’t have enough resources and will be scaled up. Compute Pool Exhausted There aren’t enough resources in the compute pool for the statement to run with the required parallelism. Compute Pool Exhausted¶ The compute pool has run out of resources. SQL statements may run with a reduced parallelism, which could affect the overall performance of the statement, or a statement may not be able to run at all, because all resources in the compute pool are in use. There are two ways to resolve this situation: You can add more resources by increasing the CFU limit on the compute pool. You can stop some running statements to free up existing resources. Messages Behind¶ Messages Behind is another indicator of how the statement is performing. The overall goal of Autopilot is to ensure that the SQL statement keeps up with the throughput of the source tables and topics, and to keep Messages Behind as close to zero as possible. In Apache Kafka® terms, Messages Behind is the Consumer Lag. A low or decreasing Messages Behind value indicates that Autopilot is doing its job successfully. The following table describes scenarios in which Autopilot is scaling resources correctly or where it may be struggling. 

| Messages Behind and Scaling Status | Description |
| --- | --- |
| Messages Behind is increasing; Scaling status = “Pending Scale Up” | Autopilot has identified a need for scaling up and will increase the Statement resources. Once resources have been scaled up, the Messages Behind should start decreasing. |
| Messages Behind is increasing; Scaling status = “Fine” | There is likely a problem. Reach out to Confluent Support. For more information, see Get Help with Confluent Cloud for Apache Flink. |
| Messages Behind is not increasing; Compute Pool is Exhausted | The statement resources can keep up with throughput but Autopilot needs to assign more resources to improve performance capacity. You can either add more resources by increasing the CFU limit on the compute pool or stop some running statements to free up existing resources. |

Related content¶ Compute Pools DDL Statements Determinism in Continuous Queries Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.

---

### Batch and Stream Processing in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/concepts/batch-and-stream-processing.html

Batch and Stream Processing in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® supports both batch and stream processing, which enables you to process data in either finite (bounded) or infinite (unbounded) modes. Understanding the differences between these modes is crucial for designing efficient data pipelines and analytics solutions. Overview¶ Flink is a distributed processing engine that excels at both batch and stream processing. While both modes share the same underlying engine and APIs, they have distinct characteristics, optimizations, and use cases. In Confluent Cloud for Apache Flink, batch mode is available by using snapshot queries. Batch processing¶ Batch processing in Flink operates on bounded datasets, which are finite, static collections of data.
This processing mode has the following key characteristics. It processes complete, finite datasets, like files or database snapshots. Batch jobs run to completion and then terminate. It is optimized for throughput, focusing on processing large volumes of data efficiently. Batch processing can sort, aggregate, and join across the entire dataset. The system can drop state as soon as it is no longer needed. Use cases: - Historical data analysis - ETL (Extract, Transform, Load) operations - Report generation - Data warehousing Stream processing¶ Stream processing in Flink handles unbounded data streams, which have data that arrives continuously and might never end. This processing mode has the following key characteristics. It processes infinite, continuous data streams, such as Kafka topics or sensor feeds. Stream processing jobs run indefinitely, processing data as it arrives. It focuses on processing data with minimal delay for low latency. It produces incremental results as new data arrives. The system must retain state to handle late or out-of-order events. Use cases: - Real-time analytics - Fraud detection - IoT data processing - Live dashboards Bounded and unbounded tables comparison¶ In Flink, tables can be either bounded (batch) or unbounded (streaming). The following table compares the key differences between these two modes. Aspect Bounded Mode (Batch) Unbounded Mode (Streaming) Data Size Finite (static) Infinite (dynamic, continuous) Processing Style Batch processing Real-time/continuous processing Query Semantics All data available at once Data arrives over time State Management Minimal, can drop state when done Must retain state for late/out-of-order data Use Cases ETL, reporting, historical analytics Real-time analytics, monitoring, alerting Differences between batch and stream processing¶ The following table compares the important differences between batch and stream processing. 

| Aspect | Batch Processing | Stream Processing |
| --- | --- | --- |
| Data Model | Processes complete, finite datasets. | Processes infinite, continuous data streams. |
| Execution Model | Jobs run to completion. | Jobs run continuously. |
| Latency vs. Throughput | Optimized for high throughput. | Optimized for low latency. |
| State Management | Minimal state, which is dropped when no longer needed. | Robust state, which is retained for late or out-of-order data. |
| Fault Tolerance | Can restart from the beginning. | Requires checkpointing for fault recovery. |
| Query Semantics | All data is available at once, so global operations are possible. | Data arrives over time, so results are incremental. |
| SQL/API Differences | ORDER BY: You can use any sort order. Windowing: Supports time-based windows on static data. Deduplication: Deduplication is global. | ORDER BY: The primary sort must be on a time attribute. Windowing: Uses windows to scope aggregations over unbounded data. Deduplication: Deduplication is incremental and often uses windows. Session Windows: Supported. |

Unified processing model¶ One important advantage of Flink is its unified processing model. This means that the same runtime engine handles both batch and streaming. The engine treats batch processing as a special case of stream processing. The same APIs and operators work for both modes. You can use the same code for both batch and streaming applications. This unified approach enables you to: Build applications that process both historical and real-time data. Seamlessly transition between batch and streaming modes. Maintain consistent semantics across processing modes. Leverage the same tools and libraries for both paradigms. Time and watermarks¶ Time and watermarks are important concepts in Flink that help you process data correctly. Batch mode: Time is fixed. All data is available, so event time and processing time are equivalent. Streaming mode: Time is dynamic. Streaming mode uses watermarks to track event time progress and handle out-of-order data.
Windowing: In streaming, you use windows (tumbling, hopping, cumulative, session) to group data for aggregation. In batch, windows apply to static data. For more information, see Time and Watermarks. Determinism¶ Determinism is a key concept in Flink that helps you ensure that your queries always produce the same results. Batch: Re-running a batch job on the same data yields the same result, except for non-deterministic functions like UUID(). Streaming: Results can vary due to timing, order of arrival, and late data. Determinism is harder to guarantee. For more information, see Determinism in Continuous Queries. Snapshot queries and batch mode¶ In Confluent Cloud for Apache Flink, batch mode is available by using snapshot queries. Snapshot queries: These are batch queries that automatically bound the input sources as of the current time. Batch optimizations: Batch mode enables optimizations like global sorting, blocking operators, and efficient joins. Snapshot queries benefit from these optimizations. Resource usage: Batch jobs, which are snapshot queries in Confluent Cloud for Apache Flink, release resources when finished. Streaming jobs hold resources as long as they run. For more information, see Snapshot Queries. Examples¶ The following code example shows a batch query. -- Count all orders in a bounded table SELECT COUNT(*) FROM orders; The following code example shows a streaming query. -- Count orders per minute in an unbounded stream. SELECT window_start, window_end, COUNT(*) FROM TABLE( TUMBLE(TABLE orders, DESCRIPTOR(order_time), INTERVAL '1' MINUTE)) GROUP BY window_start, window_end; Related content¶ Deduplication Determinism in Continuous Queries ORDER BY Clause Snapshot Queries Time and Watermarks Window Aggregation Window Deduplication Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. 
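The comparison above notes that deduplication in streaming is incremental and that streaming sorts must lead with a time attribute. The following is a minimal sketch of the common Flink SQL deduplication pattern under those constraints. It assumes the `orders` table from the examples has a hypothetical `order_id` column and that `order_time` is a time attribute; neither assumption is confirmed by this page.

```sql
-- Sketch only: keep the first row seen for each order_id.
-- order_id is a hypothetical column; order_time is assumed to be a time attribute.
SELECT order_id, order_time
FROM (
  SELECT
    order_id,
    order_time,
    ROW_NUMBER() OVER (
      PARTITION BY order_id
      ORDER BY order_time ASC
    ) AS row_num
  FROM orders
)
WHERE row_num = 1;
```

Ordering by `order_time DESC` instead would keep the latest row per key, which generally turns the result into an updating table rather than an append-only one.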
#### Code Examples ```sql -- Count all orders in a bounded table SELECT COUNT(*) FROM orders; ``` ```sql -- Count orders per minute in an unbounded stream. SELECT window_start, window_end, COUNT(*) FROM TABLE( TUMBLE(TABLE orders, DESCRIPTOR(order_time), INTERVAL '1' MINUTE)) GROUP BY window_start, window_end; ``` --- ### Comparing Apache Flink with Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/concepts/comparison-with-apache-flink.html Comparing Apache Flink with Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® supports many of the capabilities of Apache Flink® and provides additional features. Also, Confluent Cloud for Apache Flink has some different behaviors and limitations relative to Apache Flink. This topic describes the key differences between Confluent Cloud for Apache Flink and Apache Flink. Additional features¶ The following list shows features provided by Confluent Cloud for Apache Flink that go beyond what Apache Flink offers. Auto-inference of environments, clusters, topics, and schemas¶ In Apache Flink, you must define and configure your tables and their schemas, including authentication and authorization to Apache Kafka®. Confluent Cloud for Apache Flink maps environments, clusters, topics, and schemas automatically from Confluent Cloud to the corresponding Apache Flink concepts of catalogs, databases, tables, and table schemas. Autoscaling¶ Autopilot scales up and scales down the compute resources that SQL statements use in Confluent Cloud. The autoscaling process is based on parallelism, which is the number of parallel operations that occur when the SQL statement is running. A SQL statement performs at its best when it has the optimal resources for its required parallelism. Default system column implementation¶ Confluent Cloud for Apache Flink has a default implementation for a system column named $rowtime. 
This column is mapped to the Kafka record timestamp, which can be either LogAppendTime or CreateTime. Default watermark strategy¶ Flink requires a watermark strategy for a variety of features, such as windowing and temporal joins. Confluent Cloud for Apache Flink has a default watermark strategy applied on all tables/topics, which is based on the $rowtime system column. Apache Flink requires you to define a watermark strategy manually. For more information, see Event Time and Watermarks. Because the default strategy is defined for general usage, there are cases that require a custom strategy, for example, when records in your streams arrive more than 7 days late. You can override the default strategy with a custom strategy by using the ALTER TABLE statement. Schema Registry support for JSON_SR and Protobuf¶ Confluent Cloud for Apache Flink has support for Schema Registry formats AVRO, JSON_SR, and Protobuf, while Apache Flink currently supports only Schema Registry AVRO. INFORMATION_SCHEMA support¶ Confluent Cloud for Apache Flink has an implementation for INFORMATION_SCHEMA, which is a system view that provides insights on catalogs, databases, tables, and schemas. This doesn’t exist in Apache Flink. Behavioral differences¶ The following list shows differences in behavior between Confluent Cloud for Apache Flink and Apache Flink. Configuration options¶ Apache Flink supports various optimization configuration options on different levels, like Execution Options, Optimizer Options, Table Options, and SQL Client Options. Confluent Cloud for Apache Flink supports only the necessary subset of these options. Some of these options have different names in Confluent Cloud for Apache Flink, as shown in the following table.

| Confluent Cloud for Apache Flink | Apache Flink |
| --- | --- |
| client.results-timeout | table.exec.async-lookup.timeout |
| client.statement-name | – |
| sql.current-catalog | table.builtin-catalog-name |
| sql.current-database | table.builtin-database-name |
| sql.dry-run | – |
| sql.inline-result | – |
| sql.local-time-zone | table.local-time-zone |
| sql.state-ttl | table.exec.state.ttl |
| sql.tables.scan.bounded.timestamp-millis | scan.bounded.timestamp-millis |
| sql.tables.scan.bounded.mode | scan.bounded.mode |
| sql.tables.scan.idle-timeout | table.exec.source.idle-timeout |
| sql.tables.scan.startup.timestamp-millis | scan.startup.timestamp-millis |
| sql.tables.scan.startup.mode | scan.startup.mode |
| sql.tables.scan.watermark-alignment.max-allowed-drift | scan.watermark.alignment.max-drift |

CREATE statements provision underlying resources¶ When you run a CREATE TABLE statement in Confluent Cloud for Apache Flink, it creates the underlying Kafka topic and a Schema Registry schema in Confluent Cloud. In Apache Flink, a CREATE TABLE statement only registers the object in the Apache Flink catalog and doesn’t create an underlying resource. This also means that temporary tables are not supported in Confluent Cloud for Apache Flink, while they are in Apache Flink. One Kafka connector and only Confluent Cloud support¶ Apache Flink contains a Kafka connector and an Upsert-Kafka connector, which, combined with the format, defines whether the source/sink is treated as an append-stream or update stream. Confluent Cloud for Apache Flink has only one Kafka connector and determines if the source/sink is an append-stream or update stream by examining the changelog.mode connector option. Confluent Cloud for Apache Flink only supports reading from and writing to Kafka topics that are located on Confluent Cloud. Apache Flink supports other connectors, like Kinesis, Pulsar, JDBC, etc., and also other Kafka environments, like on-premises and different cloud service providers.
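As an illustration of how the changelog.mode option described above selects between append and update semantics, here is a minimal sketch of a table declared as an update stream. The table name `orders_latest` and its columns are hypothetical, and the example assumes that `'upsert'` is an accepted value for the option and that a primary key is needed for it; check the CREATE TABLE reference before relying on either.

```sql
-- Sketch only: orders_latest, order_id, and amount are hypothetical names.
-- Assumes 'upsert' is a valid changelog.mode value and that upsert mode
-- needs a primary key to identify the row being updated.
CREATE TABLE orders_latest (
  order_id STRING,
  amount DOUBLE,
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'changelog.mode' = 'upsert'
);
```

Because CREATE TABLE provisions real resources in Confluent Cloud, running a statement like this would also create a backing Kafka topic and Schema Registry schema, so it is best tried in a scratch environment.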
Limitations¶ The following list shows limitations of Confluent Cloud for Apache Flink compared with Apache Flink. Windowing functions syntax¶ Confluent Cloud for Apache Flink supports the TUMBLE, HOP, SESSION, and CUMULATE windowing functions only by using so-called Table-Valued Functions syntax. Apache Flink supports these windowing functions also by using the outdated Group Window Aggregations functions. Unsupported statements and features¶ Confluent Cloud for Apache Flink does not support the following statements and features. ANALYZE statements CALL statements CATALOG commands other than SHOW (No CREATE/DROP/ALTER) DATABASE command other than SHOW (No CREATE/DROP/ALTER) DELETE statements DROP CATALOG and DROP DATABASE JAR statements LOAD / UNLOAD statements TRUNCATE statements UPDATE statements Processing time operations, like PROCTIME(), TUMBLE_PROCTIME, HOP_PROCTIME, SESSION_PROCTIME, and CUMULATE_PROCTIME Limited support for ALTER¶ Confluent Cloud for Apache Flink has limited support for ALTER TABLE compared with Apache Flink. In Confluent Cloud for Apache Flink, you can use ALTER TABLE only to change the watermark strategy, add a metadata column, or change a parameter value. Related content¶ Flink SQL Autopilot Compute Pools DDL Statements in Confluent Cloud for Apache Flink Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql LogAppendTime ``` ```sql TUMBLE_PROCTIME ``` ```sql HOP_PROCTIME ``` ```sql SESSION_PROCTIME ``` ```sql CUMULATE_PROCTIME ``` --- ### Compute Pools in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/concepts/compute-pools.html Compute Pools in Confluent Cloud for Apache Flink¶ A compute pool in Confluent Cloud for Apache Flink® represents a set of compute resources bound to a region that is used to run your SQL statements. 
The resources provided by a compute pool are shared between all statements that use it. The capacity of a compute pool is measured in CFUs. Compute pools expand and shrink automatically based on the resources required by the statements using them. A compute pool without any running statements scales down to zero. The maximum size of a compute pool is configured during creation. A compute pool is provisioned in a specific region. The statements using a compute pool can only read and write Apache Kafka® topics in the same region as the compute pool. Compute pools fulfill two roles: Workload Isolation: Statements in different compute pools are isolated from each other. Budgeting: Statements within a compute pool can’t use more than the configured maximum number of CFUs. Compute pools and isolation¶ All statements using the same compute pool compete for resources. Although Confluent Cloud’s Autopilot aims to provide each statement with the resources it needs, this might not always be possible, in particular, when the maximum resources of the compute pool are exhausted. To avoid situations in which statements with different latency and availability requirements compete for resources, Confluent recommends using separate compute pools for different use cases, for example, ad-hoc exploration vs. mission-critical, long-running queries. Because statements may affect each other, Confluent recommends sharing compute pools only between statements with comparable requirements. Manage compute pools¶ You can use these Confluent tools to create and manage compute pools. Cloud Console Confluent CLI REST API Confluent Terraform Provider Authorization¶ You must be authorized to create, update, delete (FlinkAdmin) or use (FlinkDeveloper) a compute pool. For more information, see Grant Role-Based Access in Confluent Cloud for Apache Flink. Move statements between compute pools¶ You can move a statement from one compute pool to another.
This can be useful if you’re close to maxing out the resources in one pool. To move a running statement, you must stop the statement, change its compute pool, then restart the statement. Related content¶ Billing on Confluent Cloud for Apache Flink DDL Statements #### Code Examples ```sql FlinkDeveloper ``` --- ### Delivery Guarantees and Latency in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/concepts/delivery-guarantees.html Delivery Guarantees and Latency in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® provides exactly-once semantics end-to-end by default, which means that every input message is reflected exactly once in the output of a statement and every output message is delivered exactly once. To achieve this, Confluent Cloud for Apache Flink relies on Apache Flink®’s checkpointing mechanism and Kafka transactions. While checkpointing and fault tolerance are Confluent’s responsibility, it is important to understand the implications of how Flink reads from and writes to Kafka: Flink statements write to Kafka by using transactions. Transactions are committed periodically, approximately every minute. Flink by default only reads committed messages from Kafka. For more information, see isolation.level. This implies that depending on the delivery guarantees required by your use case, you can currently achieve different end-to-end latencies with Flink. Exactly-Once: If you require exactly-once, the latency is roughly one minute and is dominated by the interval at which Kafka transactions are committed. In this case, ensure that all consumers of the output topics of Flink statements use isolation.level: read_committed or set the Flink table option 'kafka.consumer.isolation-level' = 'read-committed'.
At-Least-Once: If at-least-once is sufficient for your use case, you can read from the output topics with isolation.level: read_uncommitted, which is the default in Kafka, or set the Flink table option 'kafka.consumer.isolation-level' = 'read-uncommitted'. With this configuration, depending on the logic of your query, you can achieve an end-to-end latency below 100 ms, but you may see some output messages twice. This happens when Flink needs to abort a transaction that your consumer has already read. You won’t see incorrect results, but you may see the same correct result multiple times. Note Confluent is actively working on reducing the latency under exactly-once semantics. If your use case requires a lower latency, reach out to Support or your account manager. Related content¶ Statements Determinism Flink SQL Queries Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql isolation.level: read_committed ``` ```sql 'kafka.consumer.isolation-level' = 'read-committed' ``` ```sql isolation.level: read_uncommitted ``` ```sql 'kafka.consumer.isolation-level' = 'read-uncommitted' ``` --- ### Determinism with continuous Flink SQL queries in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/concepts/determinism.html Determinism in Continuous Queries on Confluent Cloud for Apache Flink¶ This topic answers the following questions about determinism in Confluent Cloud for Apache Flink®: What is determinism? Is all batch processing deterministic?
Two examples of batch queries with non-deterministic results Non-determinism in batch processing Determinism in stream processing Non-determinism in streaming Non-deterministic update in streaming What is determinism?¶ Paraphrasing the SQL standard’s description of determinism, an operation is deterministic if it reliably computes identical results when repeated with identical input values. Determinism for regular batch queries¶ In a classic batch scenario, repeated execution of the same query for a given bounded data set will yield consistent results, which is the most intuitive understanding of determinism. In practice, however, the same query does not always return consistent results on a batch process either, as shown by the following example queries. Two examples of batch queries with non-deterministic results¶ For example, consider a newly created website click log table: CREATE TABLE clicks ( uid VARCHAR(128), cTime TIMESTAMP(3), url VARCHAR(256) ) Some new records are written to the table: +------+---------------------+------------+ | uid | cTime | url | +------+---------------------+------------+ | Mary | 2023-08-22 12:00:01 | /home | | Bob | 2023-08-22 12:00:01 | /home | | Mary | 2023-08-22 12:00:05 | /prod?id=1 | | Liz | 2023-08-22 12:01:00 | /home | | Mary | 2023-08-22 12:01:30 | /cart | | Bob | 2023-08-22 12:01:35 | /prod?id=3 | +------+---------------------+------------+ The following query applies a time filter to the click log table and wants to return the last two minutes of click records: SELECT * FROM clicks WHERE cTime BETWEEN TIMESTAMPADD(MINUTE, -2, CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP; When the query was submitted at “2023-08-22 12:02:00”, it returned all 6 rows in the table, and when it was executed again a minute later, at “2023-08-22 12:03:00”, it returned only 3 items: +------+---------------------+------------+ | uid | cTime | url | +------+---------------------+------------+ | Liz | 2023-08-22 12:01:00 | /home | | Mary | 2023-08-22 
12:01:30 | /cart | | Bob | 2023-08-22 12:01:35 | /prod?id=3 | +------+---------------------+------------+ Another query wants to add a unique identifier to each returned record, since the clicks table doesn’t have a primary key. SELECT UUID() AS uuid, * FROM clicks LIMIT 3; Executing this query twice in a row generates a different uuid identifier for each row: -- first execution +--------------------------------+------+---------------------+------------+ | uuid | uid | cTime | url | +--------------------------------+------+---------------------+------------+ | aaaa4894-16d4-44d0-a763-03f... | Mary | 2023-08-22 12:00:01 | /home | | ed26fd46-960e-4228-aaf2-0aa... | Bob | 2023-08-22 12:00:01 | /home | | 1886afc7-dfc6-4b20-a881-b0e... | Mary | 2023-08-22 12:00:05 | /prod?id=1 | +--------------------------------+------+---------------------+------------+ -- second execution +--------------------------------+------+---------------------+------------+ | uuid | uid | cTime | url | +--------------------------------+------+---------------------+------------+ | 95f7301f-bcf2-4b6f-9cf3-1ea... | Mary | 2023-08-22 12:00:01 | /home | | 63301e2d-d180-4089-876f-683... | Bob | 2023-08-22 12:00:01 | /home | | f24456d3-e942-43d1-a00f-fdb... | Mary | 2023-08-22 12:00:05 | /prod?id=1 | +--------------------------------+------+---------------------+------------+ Non-determinism in batch processing¶ The non-determinism in batch processing is caused mainly by the non-deterministic functions, as shown in the previous query examples, where the built-in functions CURRENT_TIMESTAMP and UUID() actually behave differently in batch processing. 
Compare with the following query: SELECT CURRENT_TIMESTAMP, * FROM clicks; CURRENT_TIMESTAMP has the same value on all returned records: +-------------------------+------+---------------------+------------+ | CURRENT_TIMESTAMP | uid | cTime | url | +-------------------------+------+---------------------+------------+ | 2023-08-23 17:25:46.831 | Mary | 2023-08-22 12:00:01 | /home | | 2023-08-23 17:25:46.831 | Bob | 2023-08-22 12:00:01 | /home | | 2023-08-23 17:25:46.831 | Mary | 2023-08-22 12:00:05 | /prod?id=1 | | 2023-08-23 17:25:46.831 | Liz | 2023-08-22 12:01:00 | /home | | 2023-08-23 17:25:46.831 | Mary | 2023-08-22 12:01:30 | /cart | | 2023-08-23 17:25:46.831 | Bob | 2023-08-22 12:01:35 | /prod?id=3 | +-------------------------+------+---------------------+------------+ This difference arises because Flink SQL inherits the definition of functions from Apache Calcite, which has two types of functions other than deterministic functions: non-deterministic functions and dynamic functions. The non-deterministic functions are executed at runtime in clusters and evaluated per record. The dynamic functions determine the corresponding values only when the query plan is generated. They’re not executed at runtime, so different values are obtained at different times, but the same values are obtained during the same execution. Built-in dynamic functions are mainly temporal functions. Determinism in stream processing¶ A core difference between streaming and batch is the unboundedness of the data. Flink SQL abstracts stream processing as continuous queries on dynamic tables. So the dynamic function in the batch query example is equivalent to a non-deterministic function in stream processing, where logically every change in the base table triggers the query to be executed.
If the clicks log table in the previous example is from an Apache Kafka® topic that’s continuously written, the same query in stream mode returns a CURRENT_TIMESTAMP that changes over time. SELECT CURRENT_TIMESTAMP, * FROM clicks; For example: +-------------------------+------+---------------------+------------+ | CURRENT_TIMESTAMP | uid | cTime | url | +-------------------------+------+---------------------+------------+ | 2023-08-23 17:25:46.831 | Mary | 2023-08-22 12:00:01 | /home | | 2023-08-23 17:25:47.001 | Bob | 2023-08-22 12:00:01 | /home | | 2023-08-23 17:25:47.310 | Mary | 2023-08-22 12:00:05 | /prod?id=1 | +-------------------------+------+---------------------+------------+ Non-determinism in streaming¶ In addition to the non-deterministic functions, there are other factors that may generate non-determinism: Non-deterministic back read of a source connector. Querying based on processing time. Processing time is not supported in Confluent Cloud for Apache Flink. Clearing internal state data based on TTL. Non-deterministic back read of source connector¶ For Flink SQL, the determinism provided is limited to the computation only, because it doesn’t store user data itself. Here, it’s necessary to distinguish between the managed internal state in streaming mode and the user data itself. If the source connector’s implementation can’t provide deterministic back read, it brings non-determinism to the input data, which may produce non-deterministic results. Common examples are inconsistent data for multiple reads from the same offset, or requests for data that no longer exists because of the retention time, for example, when the requested data is beyond the configured TTL of a Kafka topic. Clear internal state data based on TTL¶ Because of the unbounded nature of stream processing, the internal state data maintained by long-running streaming queries in operations like regular join and group aggregation (non-windowed aggregation) may continuously increase.
Setting a state TTL to clean up internal state data is often a necessary compromise, but it may make the computation results non-deterministic.

The impact of non-determinism differs from query to query. For some queries, it only produces non-deterministic results, which means that the query works, but multiple runs fail to produce consistent results. Other queries can suffer more serious effects, like incorrect results or runtime errors. The main cause of these failures is the “non-deterministic update”.

#### Non-deterministic update in streaming

Flink implements a complete incremental update mechanism based on the abstraction of continuous queries on dynamic tables. All operations that need to generate incremental messages maintain complete internal state data. The operation of the entire query pipeline, including the complete DAG from source to sink operators, relies on the correct delivery of update messages between operators. Non-determinism can break this guarantee and lead to errors.

What is a “non-deterministic update” (NDU)? Update messages (the changelog) may contain these kinds of message types:

- Insert (I)
- Delete (D)
- Update_Before (UB)
- Update_After (UA)

In an insert-only changelog pipeline, there’s no NDU problem. When the changelog contains at least one D, UB, or UA message in addition to I messages, the update key of the messages, which can be regarded as the primary key of the changelog, is deduced from the query. When the update key can be deduced, the operators in the pipeline maintain the internal state by the update key. When the update key can’t be deduced, possibly because no primary key is defined in the CDC source table or sink table, or because the key can’t be derived through some operations from the semantics of the query, all operators that maintain internal state can process update (D/UB/UA) messages only through complete rows. Sink nodes work in retract mode when no primary key is defined, and delete operations are performed by complete rows.
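
To make the complete-row requirement concrete, here is a minimal Python sketch of a keyless sink that retracts by matching whole rows. It is an illustration under stated assumptions, not Flink's implementation: `RetractSink`, `project`, and `replay` are hypothetical names, and the random UUID column stands in for any non-deterministic column that is re-evaluated when the retraction message is produced.

```python
import uuid


class RetractSink:
    """Hypothetical keyless sink: retractions must match a stored row exactly."""

    def __init__(self):
        self.rows = []

    def apply(self, kind, row):
        if kind in ("I", "UA"):
            self.rows.append(row)
        elif kind in ("D", "UB"):
            if row not in self.rows:
                raise LookupError(f"no stored row matches retraction {row!r}")
            self.rows.remove(row)


def project(uid, url, deterministic):
    # The first column is either a constant or a value recomputed on every call.
    tag = "fixed" if deterministic else str(uuid.uuid4())
    return (tag, uid, url)


def replay(deterministic):
    sink = RetractSink()
    sink.apply("I", project("Mary", "/home", deterministic))
    try:
        # The projection is evaluated again for the retraction message.
        sink.apply("UB", project("Mary", "/home", deterministic))
        sink.apply("UA", project("Mary", "/cart", deterministic))
    except LookupError:
        return "ndu-error", sink.rows
    return "ok", sink.rows


print(replay(deterministic=True)[0])
print(replay(deterministic=False)[0])
```

With a deterministic projection, the Update_Before row matches the stored row and the update succeeds. With the non-deterministic column, the retraction carries a different value, so no stored row matches and the stale row stays behind.
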
This means that in update-by-row mode, the update messages received by operators that maintain state must not be affected by non-deterministic column values. Otherwise, NDU problems occur and result in computation errors. For a query pipeline with update messages for which the update key can’t be derived, the following are the most important sources of NDU problems:

- Non-deterministic functions, including scalar, table, and aggregate functions, whether built-in or custom
- LookupJoin on an evolving source
- A CDC source that carries metadata fields, like system columns, that don’t belong to the row entity itself

Exceptions caused by cleaning internal state data based on TTL are discussed separately as a runtime fault-tolerant handling strategy. For more information, see FLINK-24666.

#### Related content

- Flink SQL Queries

Note: This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.

#### Code Examples

```sql
CREATE TABLE clicks (
  uid VARCHAR(128),
  cTime TIMESTAMP(3),
  url VARCHAR(256)
)
```

```sql
+------+---------------------+------------+
| uid  | cTime               | url        |
+------+---------------------+------------+
| Mary | 2023-08-22 12:00:01 | /home      |
| Bob  | 2023-08-22 12:00:01 | /home      |
| Mary | 2023-08-22 12:00:05 | /prod?id=1 |
| Liz  | 2023-08-22 12:01:00 | /home      |
| Mary | 2023-08-22 12:01:30 | /cart      |
| Bob  | 2023-08-22 12:01:35 | /prod?id=3 |
+------+---------------------+------------+
```

```sql
SELECT * FROM clicks
WHERE cTime BETWEEN TIMESTAMPADD(MINUTE, -2, CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP;
```

```sql
+------+---------------------+------------+
| uid  | cTime               | url        |
+------+---------------------+------------+
| Liz  | 2023-08-22 12:01:00 | /home      |
| Mary | 2023-08-22 12:01:30 | /cart      |
| Bob  | 2023-08-22 12:01:35 | /prod?id=3 |
+------+---------------------+------------+
```

```sql
SELECT UUID() AS uuid, * FROM clicks LIMIT 3;
```

```sql
-- first execution
+--------------------------------+------+---------------------+------------+ | uuid | uid | cTime | url | +--------------------------------+------+---------------------+------------+ | aaaa4894-16d4-44d0-a763-03f... | Mary | 2023-08-22 12:00:01 | /home | | ed26fd46-960e-4228-aaf2-0aa... | Bob | 2023-08-22 12:00:01 | /home | | 1886afc7-dfc6-4b20-a881-b0e... | Mary | 2023-08-22 12:00:05 | /prod?id=1 | +--------------------------------+------+---------------------+------------+ -- second execution +--------------------------------+------+---------------------+------------+ | uuid | uid | cTime | url | +--------------------------------+------+---------------------+------------+ | 95f7301f-bcf2-4b6f-9cf3-1ea... | Mary | 2023-08-22 12:00:01 | /home | | 63301e2d-d180-4089-876f-683... | Bob | 2023-08-22 12:00:01 | /home | | f24456d3-e942-43d1-a00f-fdb... | Mary | 2023-08-22 12:00:05 | /prod?id=1 | +--------------------------------+------+---------------------+------------+ ``` ```sql CURRENT_TIMESTAMP ``` ```sql SELECT CURRENT_TIMESTAMP, * FROM clicks; ``` ```sql CURRENT_TIMESTAMP ``` ```sql +-------------------------+------+---------------------+------------+ | CURRENT_TIMESTAMP | uid | cTime | url | +-------------------------+------+---------------------+------------+ | 2023-08-23 17:25:46.831 | Mary | 2023-08-22 12:00:01 | /home | | 2023-08-23 17:25:46.831 | Bob | 2023-08-22 12:00:01 | /home | | 2023-08-23 17:25:46.831 | Mary | 2023-08-22 12:00:05 | /prod?id=1 | | 2023-08-23 17:25:46.831 | Liz | 2023-08-22 12:01:00 | /home | | 2023-08-23 17:25:46.831 | Mary | 2023-08-22 12:01:30 | /cart | | 2023-08-23 17:25:46.831 | Bob | 2023-08-22 12:01:35 | /prod?id=3 | +-------------------------+------+---------------------+------------+ ``` ```sql CURRENT_TIMESTAMP ``` ```sql SELECT CURRENT_TIMESTAMP, * FROM clicks; ``` ```sql +-------------------------+------+---------------------+------------+ | CURRENT_TIMESTAMP | uid | cTime | url | 
+-------------------------+------+---------------------+------------+
| 2023-08-23 17:25:46.831 | Mary | 2023-08-22 12:00:01 | /home      |
| 2023-08-23 17:25:47.001 | Bob  | 2023-08-22 12:00:01 | /home      |
| 2023-08-23 17:25:47.310 | Mary | 2023-08-22 12:00:05 | /prod?id=1 |
+-------------------------+------+---------------------+------------+
```

---

### Tables and Topics in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/concepts/dynamic-tables.html

Apache Flink® and the Table API use the concept of dynamic tables to facilitate the manipulation and processing of streaming data. Dynamic tables represent an abstraction for working with both batch and streaming data in a unified manner, offering a flexible and expressive way to define, modify, and query structured data. In contrast to the static tables that represent batch data, dynamic tables change over time. But like static batch tables, systems can execute queries over dynamic tables.

Confluent Cloud for Apache Flink® implements ANSI-Standard SQL and has the familiar concepts of catalogs, databases, and tables. Confluent Cloud maps a Flink catalog to an environment and vice-versa. Similarly, Flink databases and tables are mapped to Apache Kafka® clusters and topics. For more information, see Metadata mapping between Kafka cluster, topics, schemas, and Flink.

#### Dynamic tables and continuous queries

Every table in Flink is equivalent to a stream of events describing the changes that are being made to that table. A stream of changes like this is called a changelog stream. Essentially, a stream is the changelog of a table, and every table is backed by a stream. This is also the case for regular database tables.

Querying a dynamic table yields a continuous query. A continuous query never terminates and produces a dynamic result: another dynamic table.
The query continuously updates its dynamic result table to reflect changes on its dynamic input tables. Essentially, a continuous query on a dynamic table is similar to a query that defines a materialized view. The output of a continuous query is always equivalent to the result of the same query executed in batch mode on a snapshot of the input tables.

##### Append-only table

*Stream-table duality for an append-only table*

In this animation, the only changes happening to the Orders table are the new orders being appended to the end of the table. The corresponding changelog stream is just a stream of INSERT events. Adding another order to the table is the same as adding another INSERT statement to the stream, as shown below the table. This is an example of an append-only or insert-only table.

##### Updating table

Not all tables are append-only tables. Tables can also contain events that modify or delete existing rows. The changelog stream used by Flink SQL contains three additional event types to accommodate different ways that tables can be updated. Besides the regular Insertion event, Update Before and Update After are a pair of events that work together to update an earlier result. The Delete event has the effect you would expect, removing a record from the table.

*Stream-table duality for an updating table*

This animation has the same starting point as the previous example that showed the append-only table. But this time, an order has been cancelled, and the item in that order hasn’t been sold. The result of this event is that the Bestsellers table is updated, rather than receiving another insert. The update starts with appending another order to the append-only/insert-only Orders table, which is registered as an INSERT event in the changelog stream. Because the SQL statement is doing grouping, the result is an updating table instead of an append-only/insert-only table. In this example, an order for 15 hats is cancelled.
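
The way a grouped query turns an insert-only input into an updating result can be sketched in Python. This is a toy model rather than Flink's runtime: `grouped_sum_changelog` and `materialize` are hypothetical helpers, the cancellation is modeled as an order with a negative quantity (an assumption made for brevity), and the starting total of 50 hats is the figure used in this example's walkthrough.

```python
def grouped_sum_changelog(orders):
    """Emit changelog events for a running SUM(quantity) grouped by item."""
    totals = {}
    events = []
    for item, quantity in orders:
        if item not in totals:
            totals[item] = quantity
            events.append(("INSERT", item, quantity))
        else:
            old_total = totals[item]
            totals[item] = old_total + quantity
            # Retract the previous result, then emit the replacement.
            events.append(("UPDATE_BEFORE", item, old_total))
            events.append(("UPDATE_AFTER", item, totals[item]))
    return events


def materialize(events):
    """Apply changelog events in order to rebuild the result table."""
    table = {}
    for kind, item, total in events:
        if kind in ("INSERT", "UPDATE_AFTER"):
            table[item] = total
        else:  # UPDATE_BEFORE or DELETE removes the old entry
            table.pop(item, None)
    return table


# Insert-only input: an order for 50 hats, then the 15-hat cancellation.
orders = [("hats", 50), ("hats", -15)]
events = grouped_sum_changelog(orders)
bestsellers = materialize(events)
for event in events:
    print(event)
print(bestsellers)
```

The input contains only appended rows, yet the output changelog contains a retraction and a replacement, and replaying it leaves a single `hats` entry with the reduced total.
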
To process the event with the 15-hat order cancellation, the query produces two update events:

- The first is an UPDATE_BEFORE event that retracts the current result that showed 50 hats as the bestselling item.
- The second is an UPDATE_AFTER event that replaces the old entry with a new one that shows 35 hats.

Conceptually, the UPDATE_BEFORE event is processed first, which removes the old entry from the Bestsellers table. Then, the sink processes the UPDATE_AFTER event, which inserts the updated results.

The following figure visualizes the relationship of streams, dynamic tables, and continuous queries:

1. A stream is converted into a dynamic table.
2. A continuous query is evaluated on the dynamic table yielding a new dynamic table.
3. The resulting dynamic table is converted back into a stream.

Dynamic tables are a logical concept. The only state that is actually materialized by the Flink SQL runtime is whatever is strictly necessary to produce correct results for the specific query being executed. For example, the previous diagram shows a query executing a simple filter. This requires no state, so nothing is materialized.

#### Changelog entries

Flink provides four different types of changelog entries:

| Short name | Long name | Semantics |
| --- | --- | --- |
| +I | Insertion | Records only the insertions that occur. |
| -U | Update Before | Retracts a previously emitted result. Update Before is an update operation with the previous content of the updated row. This kind occurs together with Update After (+U) for modeling an update that must retract the previous row first. It is useful in cases of a non-idempotent update, which is an update of a row that is not uniquely identifiable by a key. |
| +U | Update After | Updates a previously emitted result. Update After is an update operation with new content for the updated row. This kind can occur together with Update Before (-U) for modeling an update that must retract the previous row first, or it can describe an idempotent update, which is an update of a row that is uniquely identifiable by a key. |
| -D | Delete | Deletes the last result. |

The - character always means that a row is being removed.

If the downstream system supports upserting, you should use a primary key in Confluent Cloud for Apache Flink to avoid the need to use Update Before.

Depending on the combination of source, sink, and business logic applied, you can end up with the following types of changelog streams.

| Changelog stream types | Stream category | Changelog entry types |
| --- | --- | --- |
| Appending stream | Append stream | Contains only +I |
| Upserting stream | Update stream | +I, +U, -D (never contains -U but can contain +U and/or -D) |
| Retracting stream | Update stream | +I, +U, -U, -D (contains +I and can contain -U and/or -D) |

All streams can have +I / inserts. Both retract and upsert streams can have -D / deletes and +U / upserts (upsert afters). Only retract streams can have -U.

#### Related content

- Flink SQL Queries

Note: This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.

#### Code Examples

```sql
Bestsellers
```

---

### Billing in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/concepts/flink-billing.html

Confluent Cloud for Apache Flink® is a serverless stream-processing platform with usage-based pricing, where you are charged only for the duration that your queries are running. You configure Flink by creating a Flink compute pool. You are charged for the CFUs consumed when you run statements within a compute pool. While the compute pool itself scales elastically to provide the necessary resources, your cost is determined by the actual CFUs used per minute, not the provisioned size of the pool.
You can configure the maximum size of a compute pool to limit your spending.

#### CFUs

A CFU is a logical unit of processing power that is used to measure the resources consumed by Confluent Cloud for Apache Flink. Each Flink statement consumes a minimum of 1 CFU-minute but may consume more depending on the needs of the workload.

#### CFU billing

You are billed for the total number of CFUs consumed inside a compute pool per minute. Usage is stated in hours in order to apply hourly pricing to minute-by-minute use. For example, 30 CFU-minutes is 0.5 CFU-hours.

CFU pricing: $0.21/CFU-hour, calculated by the minute ($0.0035/CFU-minute). Prices vary by cloud region.

#### Networking fees

Using Flink to read and write data from Apache Kafka® doesn’t add any new Flink-specific networking fees, but you’re still responsible for the Confluent Cloud networking rates for data read from and written to your Kafka clusters. These are existing Kafka costs, not new charges created by Flink.

#### Cost Management

You can’t define the number of CFUs required for individual statements. CFUs are counted by Confluent Cloud for Apache Flink. You can configure the maximum size of a compute pool to limit your spending by setting a parameter named MAX_CFU, which sets an upper limit on the hourly spend on the compute pool. If the size of the workload in a pool exceeds MAX_CFU, new statements are rejected. Existing workloads continue running but may experience increased latency.

Note: You can increase the MAX_CFU value after you create a compute pool, but decreasing the initial MAX_CFU value is not supported. For more information, see Update a compute pool.

For more information on CFU prices, see Confluent Cloud Pricing.

#### Pricing examples

Data streaming is a real-time business, and data streams oscillate on a minute-by-minute basis, creating peaks and troughs of utilization. You don’t want to allocate and overpay for processing capacity that you aren’t using.
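
The CFU billing arithmetic can be checked with a short Python sketch. The function names are hypothetical, and the $0.21/CFU-hour rate is the one quoted in this section, which varies by cloud region. `Decimal` is used so that the per-minute rate comes out as exactly 0.0035.

```python
from decimal import Decimal

# Rate quoted in this section; actual prices vary by cloud region.
PRICE_PER_CFU_HOUR = Decimal("0.21")
PRICE_PER_CFU_MINUTE = PRICE_PER_CFU_HOUR / 60


def cfu_hours(cfu_minutes):
    """Convert metered CFU-minutes into the CFU-hours stated for billing."""
    return Decimal(cfu_minutes) / 60


def charge(cfu_minutes):
    """Charge in dollars for the given number of CFU-minutes."""
    return Decimal(cfu_minutes) * PRICE_PER_CFU_MINUTE


print(PRICE_PER_CFU_MINUTE)  # per-minute rate derived from the hourly rate
print(cfu_hours(30))         # 30 CFU-minutes expressed in CFU-hours
print(charge(30))            # charge for 30 CFU-minutes
```

Because usage is metered per minute, the charge is simply the CFU-minute count multiplied by the per-minute rate. The conversion to CFU-hours affects only how usage is stated.
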
With Confluent Cloud for Apache Flink, you pay only for the processing power that you actually use. The following examples provide additional detail on how pricing works when processing streams using Confluent Cloud for Apache Flink. Data exploration and discovery¶ Most SQL queries are short-running, interactive queries that help software and data engineers understand the streams they have access to. Querying the streams directly is an important and necessary step in the iterative development of apps and pipelines. In the following example, one user executes five different queries. Unlike other Flink offerings, Confluent Cloud for Apache Flink’s serverless architecture charges you only for the five minutes when these queries are executing, with all users able to share the resources of a single compute pool. It doesn’t matter if these queries are executed by the same person, by five different people at the same time or, as shown below, at different points in the hour. Example pricing calculation Number of queries executed = 5 Total CFU-minutes consumed = 5 Total charge: 5 CFU-minutes x $0.0035/CFU-minute = $0.0175 Note: The charge appears on the invoice as “0.083 CFU-hours x $0.21/CFU-hour”. Many data streaming apps and statements¶ Data streaming architectures are composed of many applications, each with their own workload requirements. An architecture can be a mix of interactive, terminating statements and continuous, streaming statements. Confluent Cloud for Apache Flink automatically scales the processing power of the Flink compute pool up and down in real-time to ensure your apps have the processing power they need, while charging only for the minutes needed. In the following example, five streaming statements are running in a single compute pool. The data streams are oscillating, and you can see spikes of utilization for short periods within the hour. 
Each statement attracts a minimum price of 1 CFU-minute ($0.0035 in this example) and is automatically scaled up and down as needed on a per-minute basis.

| Statement | CFU-minutes | Statement Type |
| --- | --- | --- |
| Q1 | 5 | Interactive |
| Q2 | 60 | Streaming |
| Q3 | 110 | Streaming |
| Q4 | 10 | Interactive |
| Q5 | 124 | Streaming |
| Total | 309 | |

Example pricing calculation:

- Number of statements executing = 5
- Total CFU-minutes consumed = 309
- Total charge: 309 CFU-minutes x $0.0035/CFU-minute = $1.0815

Note: The charge appears on the invoice as “5.15 CFU-hours x $0.21/CFU-hour”.

#### Related content

- Compute Pools
- Confluent Cloud Pricing

---

### Private Networking with Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/concepts/flink-private-networking.html

Confluent Cloud for Apache Flink® supports private networking on AWS, Azure, and Google Cloud. This feature enables Flink to securely read and write data stored in Confluent Cloud clusters that are located in private networking, with no data flowing to the public internet. With private networking, you can use Flink and Apache Kafka® together for stream processing in Confluent Cloud, even in the most stringent regulatory environments.

Confluent Cloud for Apache Flink supports private networking for AWS and Azure in all regions where Flink is supported. Google Cloud supports private networking in most regions where Flink is supported. For the regions that support Flink private networking, see Supported Cloud Regions.

#### Connectivity options

There are a number of ways to access Flink with private networking. In all cases, they allow access to all types of private clusters (Enterprise, Dedicated, Freight), with all types of connectivity (VNET/VPC, Peering, Transit Gateway, PNI).

- PrivateLink Attachment: Works with any type of cluster and is available on AWS and Azure.
- Confluent Cloud network (CCN): Available on AWS and Azure for all types of networks.
If you already have an existing Confluent Cloud network, this is the easiest way to get started, but it works only on AWS when a Confluent Cloud network is already configured. If you need to create a new Confluent Cloud network, follow the steps in Create Confluent Cloud Network on AWS. PrivateLink Attachment¶ A PrivateLink Attachment is a resource that enables you to connect to Confluent serverless products, like Enterprise clusters and Flink. For Flink, the new PrivateLink Attachment is used only to establish a connection between your clients (like Cloud Console UI, Confluent CLI, Terraform, apps using the Confluent REST API) and Flink. Flink-to-Kafka is routed internally within Confluent Cloud. As a result, this PLATT is used only for submitting statements and fetching results from the client. For Dedicated clusters, regardless of the Kafka cluster connection type (Private Link, Peering, or Transit Gateway), Flink requires that you define a PLATT in the same region of the cluster, even if a private link exists for the Dedicated cluster. For Enterprise clusters, you can reuse the same PLATT used by your Enterprise clusters. By creating a PrivateLink Attachment to a Confluent Cloud environment in a region, you are enabling Flink statements created in that environment to securely access data in any of the Flink clusters in the same region, regardless of their environment. Access to the Flink clusters is governed by RBAC. Also, a PrivateLink Attachment enables your data-movement components in Confluent Cloud, including Flink statements and cluster links, to move data between all of the private networks in the organization, including the Confluent Cloud networks associated with any Dedicated Kafka clusters. For more information, see Enable private networking with PrivateLink Attachment. 
Confluent Cloud network (CCN)¶ If you have an existing Confluent Cloud network, this is the easiest way to get set up, but it works only on AWS and Azure when a Confluent Cloud network is configured already and at least one Kafka Dedicated cluster exists in the environment and region where you need to use Flink. For existing Kafka Dedicated users, this option requires no effort to configure, if everything is already configured for Kafka. If a reverse proxy is not set up, this requires setup for Flink or the use of a VM within the VPC to access Flink. To create a Confluent Cloud network, follow the steps in Create Confluent Cloud Network on AWS. For more information, see Enable private networking with Confluent Cloud Network. Protect resources with IP Filtering¶ With IP Filtering, you can enhance security for your Flink resources (statements and workspaces) based on trusted source IP addresses. IP Filtering is an authorization feature that allows you to create IP filters for your Confluent Cloud organization that permit inbound requests only from specified IP groups. All incoming API requests that originate from IP addresses not included in your IP filters are denied. For Flink resources, you can implement the following access controls: No public networks: Select the predefined No Public Networks group (ipg-none) to block all public network access, allowing access only from private network connections. This IP group cannot be combined with other IP groups in the same filter. Public: The default option if no IP filters are set. Flink statements and workspaces are accessible from all source networks when connecting over the public internet. While SQL queries are visible, private cluster data remains protected, and you can’t issue statements accessing private clusters. Public with restricted IP list: Create custom IP groups containing specific CIDR blocks to allow access only from trusted networks while maintaining the same protection for private cluster data. 
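
The allow-list behavior of the three access controls can be illustrated with Python's standard `ipaddress` module. This is a conceptual sketch only and not Confluent's implementation: `is_request_allowed` is a hypothetical helper, `None` models the default case with no IP filters, an empty list models the No Public Networks group as an assumption, and the addresses come from ranges reserved for documentation.

```python
import ipaddress


def is_request_allowed(source_ip, allowed_cidrs):
    """Return True if a public request from source_ip passes the filter.

    allowed_cidrs is None when no IP filter is configured (default: public),
    or a list of CIDR blocks; an empty list admits no public address.
    """
    if allowed_cidrs is None:
        return True
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


trusted = ["203.0.113.0/24"]  # hypothetical custom IP group
print(is_request_allowed("203.0.113.10", trusted))  # inside the trusted block
print(is_request_allowed("198.51.100.7", trusted))  # outside the trusted block
print(is_request_allowed("198.51.100.7", None))     # no filter configured
print(is_request_allowed("203.0.113.10", []))       # no public networks
```
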
IP Filtering applies only to requests made over public networks and doesn’t limit requests made over private network connections. When creating IP filters for Flink resources, select the Flink operation group to control access to all operations related to Apache Flink data. For more information on setting IP filters, see IP Filtering and Manage IP Filters. The IP Filtering feature replaces the previous distinction between public and private Flink statements and workspaces. Administrators can modify access controls at any time by updating IP filters. For data protection in Kafka clusters, access is governed by network settings of the cluster: You can always read public data regardless of the connectivity, whether public or private. To read or write data in a private cluster, the cluster must use private connectivity. To prevent data exfiltration, you can’t write to public clusters when using private connectivity. Available endpoints for an environment and region¶ The following section shows the endpoints that are available for connecting to Flink. While the public endpoint is always present, others may require some effort to be created. Public endpoint PrivateLink Attachment Private connectivity through Confluent Cloud network The following table shows how to get the endpoint value by using different Confluent interfaces. Interface Location Endpoint Cloud Console Flink Endpoints page Full FQDN shown for each network connection Confluent CLI confluent flink endpoint list Full FQDN shown for each network connection Network UI/API/CLI Network management details page in Environment overview GET /network/ confluent network describe Read the endpoint_suffix attribute, for example, -abc1de.us-east-1.aws.glb.confluent.cloud Replace with the relevant value, for example, flink for Flink or flinkpls for Language Service. Assign in interface (UI/CLI/Terraform) The following table shows the endpoint patterns for different DNS and cluster type combinations. 
Networking DNS Cluster Type Endpoints PrivateLink Private Enterprise (PrivateLink Attachment) flink.$region.$cloud.private.confluent.cloud flinkpls.$region.$cloud.private.confluent.cloud Dedicated flink.dom$id.$region.$cloud.private.confluent.cloud flinkpls.dom$id.$region.$cloud.private.confluent.cloud Public Dedicated flink-$nid.$region.$cloud.glb.confluent.cloud flinkpls-$nid.$region.$cloud.glb.confluent.cloud VPC Peering / Transit Gateway w/ /16 CIDR Public Dedicated flink-$nid.$region.$cloud.confluent.cloud flinkpls-$nid.$region.$cloud.confluent.cloud VPC Peering / Transit Gateway w/ /27 CIDRs Public Dedicated flink-$nid.$region.$cloud.glb.confluent.cloud flinkpls-$nid.$region.$cloud.glb.confluent.cloud Public endpoint¶ Source: Always present. Considerations: Can’t access Kafka private data. Kafka data access and scope: Can access public cluster data (read/write) in cloud region for this organization. Access to Flink statement and workspace: Configurable with IP Filtering. Endpoints: flink...confluent.cloud, for example: flink.us-east-2.aws.confluent.cloud. PrivateLink Attachment¶ Source: Must create a Private Link Attachment for the environment/region. Considerations: A single VPC can’t have private link connections to multiple Confluent Cloud environments. Available on AWS and Azure. Can access private cluster data (read/write) in Enterprise, Dedicated or Freight clusters for the cloud region for the organization of the endpoint. Can access public cluster data (read only). Access all Flink resources in the same environment and region of the endpoint Endpoints: flink...private.confluent.cloud, for example: flink.us-east-2.aws.private.confluent.cloud Private connectivity through Confluent Cloud network¶ Source: Created with Kafka Dedicated clusters. Considerations: Easiest way to use Flink when the network is created already for Dedicated clusters. Available on AWS and Azure for all types of Confluent Cloud network. 
Can access private cluster data (read/write) in Enterprise, Dedicated or Freight clusters for the organization of the region. Can access public cluster data (read only). Access all Flink resources in the same environment and region of the endpoint.

To find the endpoints from the Cloud Console or Confluent CLI, see Available endpoints for an environment and region.

#### Access private networking with the Confluent CLI

1. Run the `confluent flink region --cloud --region` command to select a cloud provider and region.
2. Run the `confluent flink endpoint list` command to list all endpoints, both public and private.
3. Run the `confluent flink endpoint use` command to select an endpoint.

In addition to the main Flink endpoint listed here, you must have access to `flinkpls....private.confluent.cloud` (for private DNS resolution) or `flinkpls-...private.confluent.cloud` (for public DNS resolution) to access the language service for autocompletion in the Flink SQL shell. With public DNS resolution, routing is done transparently, but if you use private DNS resolution, you must make sure to route this endpoint from your client. For more information, see private DNS resolution.

#### Access private networking with the Cloud Console

By default, public networking is used, which won’t work if IP Filtering is set, or if the cluster is private, or both. You can set defaults for each cloud region in an environment by using the Flink Endpoints page. The default is per-user. When a default is set, it is used for all pages that access Flink, for example, the statement list, workspace list, and workspaces. If no default is set, the public endpoint is used.

#### Related content

- Video: Flink Queries on Dedicated PrivateLink Kafka Clusters in Confluent Cloud
- Use Confluent Cloud with Private Networking
- Flink Compute Pools

Note: This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.
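
As a small illustration of the endpoint naming shown on this page, the following Python sketch composes the public endpoint and the PrivateLink Attachment endpoints for an Enterprise cluster from a region and cloud. The helper names are hypothetical, and only the patterns `flink.$region.$cloud.confluent.cloud` and `flink.$region.$cloud.private.confluent.cloud` (with `flinkpls` for the language service) are covered. In practice, read the actual values from the Flink Endpoints page or `confluent flink endpoint list` rather than constructing them.

```python
def public_endpoint(region, cloud):
    """Public Flink endpoint, for example flink.us-east-2.aws.confluent.cloud."""
    return f"flink.{region}.{cloud}.confluent.cloud"


def privatelink_attachment_endpoint(region, cloud, service="flink"):
    """PrivateLink Attachment endpoint (private DNS, Enterprise pattern).

    service is "flink" for Flink or "flinkpls" for the language service.
    """
    if service not in ("flink", "flinkpls"):
        raise ValueError(f"unknown service: {service!r}")
    return f"{service}.{region}.{cloud}.private.confluent.cloud"


print(public_endpoint("us-east-2", "aws"))
print(privatelink_attachment_endpoint("us-east-2", "aws"))
print(privatelink_attachment_endpoint("us-east-2", "aws", service="flinkpls"))
```
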
#### Code Examples ```sql confluent flink endpoint list ``` ```sql confluent network describe ``` ```sql endpoint_suffix ``` ```sql -abc1de.us-east-1.aws.glb.confluent.cloud ``` ```sql ``` ```sql flink.$region.$cloud.private.confluent.cloud ``` ```sql flinkpls.$region.$cloud.private.confluent.cloud ``` ```sql flink.dom$id.$region.$cloud.private.confluent.cloud ``` ```sql flinkpls.dom$id.$region.$cloud.private.confluent.cloud ``` ```sql flink-$nid.$region.$cloud.glb.confluent.cloud ``` ```sql flinkpls-$nid.$region.$cloud.glb.confluent.cloud ``` ```sql flink-$nid.$region.$cloud.confluent.cloud ``` ```sql flinkpls-$nid.$region.$cloud.confluent.cloud ``` ```sql flink-$nid.$region.$cloud.glb.confluent.cloud ``` ```sql flinkpls-$nid.$region.$cloud.glb.confluent.cloud ``` ```sql flink...confluent.cloud ``` ```sql flink.us-east-2.aws.confluent.cloud ``` ```sql flink...private.confluent.cloud ``` ```sql flink.us-east-2.aws.private.confluent.cloud ``` ```sql confluent flink region --cloud --region ``` ```sql confluent flink endpoint list ``` ```sql confluent flink endpoint use ``` ```sql flinkpls....private.confluent.cloud ``` ```sql flinkpls-...private.confluent.cloud ``` --- ### Stream Processing Concepts in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/concepts/overview.html Stream Processing Concepts in Confluent Cloud for Apache Flink¶ Apache Flink® SQL, a high-level API powered by Confluent Cloud for Apache Flink, offers a simple and easy way to leverage the power of stream processing. With support for a wide variety of built-in functions, queries, and statements, Flink SQL provides real-time insights into streaming data. Time is a critical element in stream processing, and Flink SQL makes it easy to process data as it arrives, avoiding delays. By using SQL syntax, you can declare expressions that filter, aggregate, route, and mutate streams of data, simplifying your data processing workflows. 
Stream processing¶ Streams are the de-facto way to create data. Whether the data comprises events from web servers, trades from a stock exchange, or sensor readings from a machine on a factory floor, data is created as part of a stream. When you analyze data, you can either organize your processing around bounded or unbounded streams, and which of these paradigms you choose has significant consequences. Batch processing is the paradigm at work when you process a bounded data stream. In this mode of operation, you can choose to ingest the entire dataset before producing any results, which means that it’s possible, for example, to sort the data, compute global statistics, or produce a final report that summarizes all of the input. Snapshot queries are a type of batch processing query that enables you to process a subset of data from a Kafka topic. Stream processing, on the other hand, involves unbounded data streams. Conceptually, at least, the input may never end, and so you must process the data continuously as it arrives. Bounded and unbounded tables¶ In the context of a Flink table, bounded mode refers to processing data that is finite, which means that the dataset has a clear beginning and end and does not grow continuously or update over time. This is in contrast to unbounded mode, where data arrives as a continuous stream, potentially with no end. The scan.bounded.mode property controls how Flink consumes data from a Kafka topic. A table can be bounded by committed offsets in Kafka brokers of a specific consumer group, by latest offsets, or by a user-supplied timestamp. Key characteristics of bounded mode¶ Finite data: The table represents a static dataset, similar to a traditional table in a relational database or a file in a data lake. Once all records are read, there is no more data to process. Batch processing: Operations on bounded tables are executed in batch mode. 
This means Flink processes all the available data, computes the results, and then the job finishes. This is suitable for use cases like ETL, reporting, and historical analysis. Optimized execution: Since the system knows the data is finite, it can apply optimizations that are not possible with unbounded (streaming) data. For example, it can sort by any column, perform global aggregations, and use blocking operators. No need for state retention: Unlike streaming mode, where Flink must keep state around to handle late or out-of-order events, batch mode can drop state as soon as it is no longer needed, reducing resource usage. The following table compares the characteristics of bounded and unbounded tables.

| Aspect | Bounded Mode (Batch) | Unbounded Mode (Streaming) |
|---|---|---|
| Data Size | Finite (static) | Infinite (dynamic, continuous) |
| Processing Style | Batch processing | Real-time/continuous processing |
| Query Semantics | All data available at once | Data arrives over time |
| State Management | Minimal, can drop state when done | Must retain state for late/out-of-order data |
| Use Cases | ETL, reporting, historical analytics | Real-time analytics, monitoring, alerting |

Parallel dataflows¶ Programs in Flink are inherently parallel and distributed. During execution, a stream has one or more stream partitions, and each operator has one or more operator subtasks. The operator subtasks are independent of one another, and execute in different threads and possibly on different machines or containers. The number of operator subtasks is the parallelism of that particular operator. Different operators of the same program may have different levels of parallelism. A parallel dataflow in Flink with condensed view (above) and parallelized view (below).¶ Streams can transport data between two operators in a one-to-one (or forwarding) pattern, or in a redistributing pattern: One-to-one streams (for example between the Source and the map() operators in the figure above) preserve the partitioning and ordering of the elements.
That means that subtask[1] of the map() operator will see the same elements in the same order as they were produced by subtask[1] of the Source operator. Redistributing streams (as between map() and keyBy/window above, as well as between keyBy/window and Sink) change the partitioning of streams. Each operator subtask sends data to different target subtasks, depending on the selected transformation. Examples are keyBy() (which re-partitions by hashing the key), broadcast(), or rebalance() (which re-partitions randomly). In a redistributing exchange the ordering among the elements is only preserved within each pair of sending and receiving subtasks (for example, subtask[1] of map() and subtask[2] of keyBy/window). So, for example, the redistribution between the keyBy/window and the Sink operators shown above introduces non-determinism regarding the order in which the aggregated results for different keys arrive at the Sink. Timely stream processing¶ For most streaming applications it is very valuable to be able to re-process historic data with the same code that is used to process live data, and to produce deterministic, consistent results regardless. It can also be crucial to pay attention to the order in which events occurred, rather than the order in which they are delivered for processing, and to be able to reason about when a set of events is (or should be) complete. For example, consider the set of events involved in an e-commerce transaction, or financial trade. These requirements for timely stream processing can be met by using event time timestamps that are recorded in the data stream, rather than using the clocks of the machines processing the data. Stateful stream processing¶ Flink operations can be stateful. This means that how one event is handled can depend on the accumulated effect of all the events that came before it.
State may be used for something simple, such as counting events per minute to display on a dashboard, or for something more complex, such as computing features for a fraud detection model. A Flink application is run in parallel on a distributed cluster. The various parallel instances of a given operator will execute independently, in separate threads, and in general will be running on different machines. The set of parallel instances of a stateful operator is effectively a sharded key-value store. Each parallel instance is responsible for handling events for a specific group of keys, and the state for those keys is kept locally. The following diagram shows a job running with a parallelism of two across the first three operators in the job graph, terminating in a sink that has a parallelism of one. The third operator is stateful, and a fully-connected network shuffle is occurring between the second and third operators. This is being done to partition the stream by some key, so that all of the events that need to be processed together will be. A Flink job running with a parallelism of two.¶ State is always accessed locally, which helps Flink applications achieve high throughput and low-latency. State management¶ Fault tolerance via state snapshots¶ Flink is able to provide fault-tolerant, exactly-once semantics through a combination of state snapshots and stream replay. These snapshots capture the entire state of the distributed pipeline, recording offsets into the input queues as well as the state throughout the job graph that has resulted from having ingested the data up to that point. When a failure occurs, the sources are rewound, the state is restored, and processing is resumed. As depicted above, these state snapshots are captured asynchronously, without impeding the ongoing processing. Table programs that run in streaming mode leverage all capabilities of Flink as a stateful stream processor. 
In particular, a table program can be configured with a state backend and various checkpointing options for handling different requirements regarding state size and fault tolerance. State usage¶ Due to the declarative nature of Table API and SQL programs, it’s not always obvious where and how much state is used within a pipeline. The planner decides whether state is necessary to compute a correct result. A pipeline is optimized to claim as little state as possible given the current set of optimizer rules. Conceptually, source tables are never kept entirely in state. An implementer deals with logical tables, named dynamic tables. Their state requirements depend on the operations that are in use. Queries such as SELECT ... FROM ... WHERE which consist only of field projections or filters are usually stateless pipelines. But operations like joins, aggregations, or deduplications require keeping intermediate results in a fault-tolerant storage for which Flink state abstractions are used. Refer to the individual operator documentation for more details about how much state is required and how to limit a potentially ever-growing state size. For example, a regular SQL join of two tables requires the operator to keep both input tables in state entirely. For correct SQL semantics, the runtime needs to assume that a match could occur at any point in time from both sides of the join. Flink provides optimized window and interval joins that aim to keep the state size small by exploiting the concept of watermark strategies. Another example is the following query that computes the number of clicks per session. SELECT sessionId, COUNT(*) FROM clicks GROUP BY sessionId; The sessionId attribute is used as a grouping key and the continuous query maintains a count for each sessionId it observes. The sessionId attribute is evolving over time and sessionId values are only active until the session ends, i.e., for a limited period of time. 
However, the continuous query cannot know about this property of sessionId and expects that every sessionId value can occur at any point of time. It maintains a count for each observed sessionId value. Consequently, the total state size of the query is continuously growing as more and more sessionId values are observed. Dataflow Model¶ Flink implements many techniques from the Dataflow Model. The following articles provide a good introduction to event time and watermark strategies. Blog post: Streaming 101 by Tyler Akidau Dataflow Model Related content¶ Autopilot Comparison with Apache Flink Compute Pools Dynamic Tables Statements Time and Watermarks Time attributes Joins in Continuous Queries Determinism in Continuous Queries Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql SELECT ... FROM ... WHERE ``` ```sql SELECT sessionId, COUNT(*) FROM clicks GROUP BY sessionId; ``` --- ### Schema and Statement Evolution with Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/concepts/schema-statement-evolution.html Schema and Statement Evolution with Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables evolving your statements over time as your schemas change. This topic describes these concepts: How you can evolve your statements and the tables they maintain over time. How statements behave when the schema of their source tables change. Example¶ Throughout this topic, the following statement is used as a running example. 
SET 'sql.state-ttl' = '1h'; SET 'client.statement-name' = 'orders-with-customers-v1-1'; CREATE FUNCTION to_minor_currency AS 'io.confluent.flink.demo.toMinorCurrency' USING JAR 'confluent-artifact://ccp-lzj320/ver-4y0qw7'; CREATE TABLE v_orders AS SELECT `order`.* FROM sales_lifecycle_events WHERE `order` IS NOT NULL; CREATE TABLE orders_with_customers_v1 PRIMARY KEY (v_orders.order_id) DISTRIBUTED INTO 10 BUCKETS AS SELECT v_orders.order_id, v_orders.product, to_minor_currency(v_orders.price), customers.* FROM v_orders JOIN customers FOR SYSTEM_TIME AS OF v_orders.`$rowtime` ON v_orders.customer_id = customers.id; The orders_with_customers_v1 table uses a user-defined function named to_minor_currency and joins a table named v_orders with the up-to-date customer information from the customers table. Fundamentals¶ Mutability of statements and tables¶ A statement has the following components: an immutable query, for example: SELECT v_orders.product, to_minor_currency(v_orders.price), customers.* FROM v_orders JOIN customers FOR SYSTEM_TIME AS OF v_orders.`$rowtime` ON v_orders.customer_id = customers.id; immutable statement properties, for example: 'sql.state-ttl' = '1h' a mutable principal, that is, the user or service account under which this statement runs. The principal and compute pool are mutable when stopping and resuming the statement. Note that stopping and resuming the statement results in a temporarily higher materialization delay and latency. The query and options of a statement (SELECT ...) are immutable, which means that you can’t change them after the statement has been created. Note If your use case requires a lower latency, reach out to Confluent Support or your account manager. The table which the statement is writing to has these components: An immutable name, for example: orders_with_customers_v1. Mutable constraints, for example: PRIMARY KEY (v_orders.order_id) A mutable watermark definition.
A mutable column definition. Partially mutable table options. The name of a table is immutable, because it maps one-to-one to the underlying topic, which you can’t rename. The watermark strategy is mutable by using the ALTER TABLE ... MODIFY/DROP WATERMARK ...; statement. For more information, see ALTER TABLE Statement in Confluent Cloud for Apache Flink. The table options of the table are mutable by using the ALTER TABLE SET (...); statement. For more information, see ALTER TABLE Statement in Confluent Cloud for Apache Flink. The constraints are partially mutable by using the ALTER TABLE ADD/DROP PRIMARY KEY statement. Statements take a snapshot of their dependencies¶ A statement almost always references other catalog objects such as tables and functions. In the current example, the orders_with_customers_v1 table references these objects: A table named customers. A table named v_orders. A user-defined function named to_minor_currency. When a statement is created, it takes a snapshot of the configuration of all the catalog objects that it depends on. Changes to these objects, or their deletion from the catalog, are not propagated to existing statements, which means that: A change to the watermark strategy of a source table is not picked up by existing statements that reference the table. A change to a table option of a source table is not picked up by existing statements that reference the table. A change to the implementation of a user-defined function is not picked up by existing statements that reference the function. If an underlying physical resource that statements require at runtime, like the topic, is deleted, the statements transition into the FAILED, STOPPED, or RECOVERING state, depending on which resource was deleted. Schema compatibility modes¶ When a statement is created, it must be bootstrapped from its source tables. For this, Flink must be able to read the source tables from the beginning (or any other specified offsets).
As mentioned previously, statements use the latest schema version, at the time of statement creation, for each source table as the read schema. You have these options for handling changes to base schemas: Compatibility Mode FULL or FULL_TRANSITIVE BACKWARD_TRANSITIVE compatibility mode and upgrade consumers first Compatibility groups and migration rules To maximize compatibility with Flink, you should use FULL_TRANSITIVE or FULL as the schema compatibility mode, which eases migrations. Note that in Confluent Cloud, the default compatibility mode is BACKWARD. Sometimes, you may need to make changes beyond what the FULL_TRANSITIVE and FULL modes enable, so Confluent Cloud for Apache Flink gives you the additional options of BACKWARD_TRANSITIVE compatibility mode and Compatibility groups and migration rules for handling changes to base schemas. Compatibility Mode FULL or FULL_TRANSITIVE¶ If you use the FULL or FULL_TRANSITIVE compatibility mode, the order you upgrade your statements doesn’t matter. FULL limits the changes that you can make to your tables to adding and removing optional fields. You can make any compatible changes to the source tables, and none of the statements that reference them will break. BACKWARD_TRANSITIVE compatibility mode and upgrade consumers first¶ BACKWARD_TRANSITIVE mandates that consumers are upgraded prior to producers. This means that if you evolve your schema according to the BACKWARD_TRANSITIVE rules (delete fields, add optional fields), you always need to upgrade all statements that are reading from the corresponding source tables before producing any records to the table that uses the next schema version, as described in Query Evolution. Compatibility groups and migration rules¶ If you need to make a non-compatible change to a table, either using FULL or BACKWARD_TRANSITIVE, Confluent Cloud for Apache Flink also supports compatibility groups and migration rules. 
For more information, see Data Contracts for Schema Registry on Confluent Cloud. Note If you need to make changes to your schemas that aren’t possible under schema compatibility mode FULL, use compatibility mode FULL for all topics and rely on compatibility groups and migration rules. Statements and schema evolution¶ When following the practices in the previous section, statements won’t fail when fields are added or optional fields are removed from their source tables, but these new fields aren’t picked up or forwarded to the sink tables. They are ignored by any previously created statements, and the *-operators are not evaluated dynamically when the schema changes. Note If you’re interested in providing feedback about configuring statements to pick up schema changes of source tables dynamically, reach out to Confluent Support or your account manager. Query evolution¶ As stated previously, the query in a statement is immutable. But you may encounter situations in which you want to change the logic of a long-running statement: You may have to fix a bug in your query. For example, you may have to handle an arithmetic error that occurs only when the statement has already existed for a long time by adding another branch in a CASE clause. You may want to evolve the logic of your statement. You may want your statement to pick up configuration updates to any of the catalog objects that it references, like tables or functions. The general strategy for query evolution is to replace the existing statement and the corresponding tables it maintains with a new statement and new tables, as shown in the following steps: Use CREATE TABLE ... AS ... to create a new version of the table, orders_with_customers_v2. Wait until the new statement has caught up with the latest messages of its source tables, which means that the “Messages Behind” metric is close to zero.
Note that Confluent Cloud Autopilot automatically configures the statement to catch up as quickly as the compute resources provided by the assigned compute pool allow. Migrate all consumers to the new tables. The best way to find all downstream consumers of a table topic in Confluent Cloud is to use Stream Lineage. Stop the orders-with-customers-v1-1 statement. This base strategy has these features: It works for any type of statement. It requires that all relevant input messages are retained in the source tables. It requires existing consumers to switch to different topics manually, and thereby to read the …v2 table from the earliest or any manually specified offset. You can adjust the base strategy in multiple ways, depending on your circumstances. Limit reprocessing to a partial history¶ Compared to the base strategy, this strategy limits the messages that are reprocessed to a subset of the messages retained in the source tables. You may not want to reprocess the full history of messages that’s retained in all source tables, but instead specify a different starting offset. For this, you can override the scan.startup.mode that is defined for the table, which by default is earliest, using dynamic table option hints. SET 'sql.state-ttl' = '1h'; SET 'client.statement-name' = 'orders-with-customers-v2-1'; CREATE TABLE orders_with_customers_v2 PRIMARY KEY (orders.order_id) DISTRIBUTED INTO 10 BUCKETS AS SELECT orders.order_id, orders.product, to_minor_currency(orders.price), customers.* FROM orders /*+ OPTIONS('scan.startup.mode' = 'timestamp', 'scan.startup.timestamp-millis' = '1717763226336') */ JOIN customers /*+ OPTIONS('scan.startup.mode' = 'timestamp', 'scan.startup.timestamp-millis' = '1717763226336') */ ON orders.customer_id = customers.id; Alternatively, you can set this by using statement properties, like sql.tables.scan.startup.mode, and the SET statement.
While dynamic table option hints enable you to configure the starting offset for each table independently, the statement properties affect the starting offset for all tables that this statement reads from. When reprocessing a partial history of the source tables, and depending on your query, you may want to add an additional filter predicate to your tables, to avoid incorrect results. For example, if your query performs windowed aggregations on ten-minute tumbling windows, you may want to start reading from exactly the beginning of a window to avoid an incomplete window at the start. This could be achieved by adding a WHERE event_time > '' clause to the respective source tables, where event_time is the name of the column that is used for windowing, and the timestamp inside the quotes lies within the history of messages that are reprocessed and aligns with the start of one of the ten-minute windows, for example, 2024-06-11 15:40:00. Special case: Carrying over offsets of previous statements¶ When a statement is stopped, status.latest_offsets contains the latest offset for each partition of each of the source tables: status: latestOffsets: topic1: partition:0,offset:23;partition:1,offset:65 topic2: partition:0,offset:53;partition:1,offset:56 latestOffsetsTimestamp: You can use these offsets to specify the starting offsets of a new statement by using dynamic table option hints, so the new statement continues exactly where the previous statement left off. This strategy enables you to evolve statements arbitrarily with exactly-once semantics across the update, if and only if the statement is “stateless”, which means that every output message is affected by a single input message.
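As a minimal sketch of carrying over offsets, the following statement restarts a stateless filter from the offsets reported for topic1 in the status example above. It assumes that a table named orders is backed by topic1 and that the Kafka table options scan.startup.mode = 'specific-offsets' and scan.startup.specific-offsets can be set through hints. The table names and the statement name are hypothetical, and whether the reported offset is the last record read or the next record to read should be verified before relying on this for an exactly-once handover.

```sql
-- Hypothetical name for the replacement statement.
SET 'client.statement-name' = 'shipped-orders-v2';

-- Assumption: `orders` is backed by topic1, so the offsets are copied
-- from status.latestOffsets.topic1 of the stopped statement.
INSERT INTO shipped_orders
SELECT *
FROM orders /*+ OPTIONS(
  'scan.startup.mode' = 'specific-offsets',
  'scan.startup.specific-offsets' = 'partition:0,offset:23;partition:1,offset:65') */
WHERE status = 'shipped';
```

Because the hint is attached to a single table, a statement that reads from several source tables would need one hint per table, each with the offsets of its own topic.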
The following statements are common examples of “stateless” statements: Filters INSERT INTO shipped_orders SELECT * FROM orders WHERE status = 'shipped'; Routers EXECUTE STATEMENT SET BEGIN INSERT INTO shipped_orders SELECT * FROM orders WHERE status = 'shipped'; INSERT INTO cancelled_orders SELECT * FROM orders WHERE status = 'cancelled'; INSERT INTO returned_orders SELECT * FROM orders WHERE status = 'returned'; INSERT INTO other_orders SELECT * FROM orders WHERE status NOT IN ('returned', 'shipped', 'cancelled'); END; Per-row transformations, including UDFs and array expansions: INSERT INTO ordered_products SELECT o.*, order_products.* FROM orders AS o CROSS JOIN UNNEST(o.products) AS `order_products` (product_id, category, quantity, unit_price, net_price); For more information, see Carry-over Offsets. In-place upgrade¶ Compared to the base strategy, the in-place upgrade strategy has these features: It works only for tables that have a primary key, so that the new statement updates all rows written by the old statement. It works only for compatible changes, both semantically and in terms of the schema. It doesn’t require consumers to switch manually to new topics, but it does require consumers to be able to handle out-of-order, late, bulk updates to all keys. Instead of creating a new results table, you can also replace the original CREATE TABLE ... AS ... statement with an INSERT INTO statement that produces updates into the same table as before. The upgrade procedure then looks like this: Stop the old orders-with-customers-v1-1 statement. Once the old statement is stopped, create the new statement, orders-with-customers-v1-2. This strategy can and often will be combined with limited reprocessing to a partial history. Specifically, in the case of an exactly-once upgrade of a stateless statement, it makes sense to continue publishing messages to the same topic, provided this was a compatible change.
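The in-place upgrade procedure can be sketched as follows, reusing the running example. This is an illustration under assumptions rather than a definitive recipe: the query body is shown unchanged, standing in for whatever compatible revision you need, and the selected columns are assumed to match the existing schema of orders_with_customers_v1.

```sql
-- Step 1 (done outside this script): stop the old statement,
-- orders-with-customers-v1-1.

-- Step 2: create the replacement statement. It writes into the existing
-- table instead of creating a new one.
SET 'sql.state-ttl' = '1h';
SET 'client.statement-name' = 'orders-with-customers-v1-2';

INSERT INTO orders_with_customers_v1
SELECT
  v_orders.order_id,
  v_orders.product,
  to_minor_currency(v_orders.price),
  customers.*
FROM v_orders
JOIN customers FOR SYSTEM_TIME AS OF v_orders.`$rowtime`
  ON v_orders.customer_id = customers.id;
```

Because the target table has a primary key, rows produced by the new statement overwrite rows with the same key that the old statement wrote, which is why consumers must tolerate late, bulk updates.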
Related content¶ Flink implements many techniques from the Dataflow Model. For a good introduction to event time and watermarks, have a look at these articles. Data Contracts Stream Lineage HINTS CREATE TABLE SET Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql SET 'sql.state-ttl' = '1h'; SET 'client.statement-name' = 'orders-with-customers-v1-1'; CREATE FUNCTION to_minor_currency AS 'io.confluent.flink.demo.toMinorCurrency' USING JAR 'confluent-artifact://ccp-lzj320/ver-4y0qw7'; CREATE TABLE v_orders AS SELECT `order`.* FROM sales_lifecycle_events WHERE `order` IS NOT NULL; CREATE TABLE orders_with_customers_v1 PRIMARY KEY (v_orders.order_id) DISTRIBUTED INTO 10 BUCKETS AS SELECT v_orders.order_id, v_orders.product, to_minor_currency(v_orders.price), customers.* FROM v_orders JOIN customers FOR SYSTEM_TIME AS OF v_orders.`$rowtime` ON v_orders.customer_id = customers.id; ``` ```sql orders_with_customers_v1 ``` ```sql to_minor_currency ``` ```sql SELECT v_orders.product, to_minor_currency(v_orders.price), customers.* FROM v_orders JOIN customers FOR SYSTEM_TIME AS OF v_orders.`$rowtime` ON v_orders.customer_id = customers.id; ``` ```sql 'sql.state-ttl' = '1h' ``` ```sql (SELECT ...) ``` ```sql orders_with_customers_v1 ``` ```sql PRIMARY KEY (v_orders.order_id) ``` ```sql ALTER TABLE ... MODIFY/DROP WATERMARK ...; ``` ```sql ALTER TABLE SET (...); ``` ```sql ALTER TABLE ADD/DROP PRIMARY KEY ``` ```sql orders_with_customers_v1 ``` ```sql to_minor_currency ``` ```sql FULL_TRANSITIVE ``` ```sql FULL_TRANSITIVE ``` ```sql FULL_TRANSITIVE ``` ```sql BACKWARD_TRANSITIVE ``` ```sql BACKWARD_TRANSITIVE ``` ```sql BACKWARD_TRANSITIVE ``` ```sql CREATE TABLE ... AS ...
``` ```sql orders_with_customers_v2 ``` ```sql orders-with-customers-v1-1 ``` ```sql scan.startup.mode ``` ```sql SET 'sql.state-ttl' = '1h'; SET 'client.statement-name' = 'orders-with-customers-v2-1'; CREATE TABLE orders_with_customers_v2 PRIMARY KEY (orders.order_id) DISTRIBUTED INTO 10 BUCKETS AS SELECT orders.order_id, orders.product, to_minor_currency(orders.price), customers.* FROM orders /*+ OPTIONS('scan.startup.mode' = 'timestamp', 'scan.startup.timestamp-millis' = '1717763226336') */ JOIN customers /*+ OPTIONS('scan.startup.mode' = 'timestamp', 'scan.startup.timestamp-millis' = '1717763226336') */ ON orders.customer_id = customers.id; ``` ```sql sql.tables.scan.startup.mode ``` ```sql WHERE event_time > '' ``` ```sql 2024-06-11 15:40:00 ``` ```sql status.latest_offsets ``` ```sql status: latestOffsets: topic1: partition:0,offset:23;partition:1,offset:65 topic2: partition:0,offset:53;partition:1,offset:56 latestOffsetsTimestamp: ``` ```sql INSERT INTO shipped_orders SELECT * FROM orders WHERE status = 'shipped'; ``` ```sql EXECUTE STATEMENT SET BEGIN INSERT INTO shipped_orders SELECT * FROM orders WHERE status = 'shipped'; INSERT INTO cancelled_orders SELECT * FROM orders WHERE status = 'cancelled'; INSERT INTO returned_orders SELECT * FROM orders WHERE status = 'returned'; INSERT INTO other_orders SELECT * FROM orders WHERE status NOT IN ('returned', 'shipped', 'cancelled'); END; ``` ```sql INSERT INTO ordered_products SELECT o.*, order_products.* FROM orders AS o CROSS JOIN UNNEST(o.products) AS `order_products` (product_id, category, quantity, unit_price, net_price); ``` ```sql CREATE TABLE ... AS ...
``` ```sql orders-with-customers-v1-1 ``` ```sql orders-with-customers-v1-2 ``` --- ### Snapshot Queries in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/concepts/snapshot-queries.html Snapshot Queries in Confluent Cloud for Apache Flink¶ In Confluent Cloud for Apache Flink®, a snapshot query is a query that reads data from a table at a specific point in time. In contrast with a streaming query, which runs continuously and returns results incrementally, a snapshot query runs, returns results, and then exits. Snapshot queries are also known as point-in-time or pull queries. You can query Kafka topics as well as Apache Iceberg™ tables by using Confluent Tableflow. Note Snapshot query is an Early Access Program feature in Confluent Cloud for Apache Flink. An Early Access feature is a component of Confluent Cloud introduced to gain feedback. This feature should be used only for evaluation and non-production testing purposes or to provide feedback to Confluent, particularly as it becomes more widely available in follow-on preview editions. Early Access Program features are intended for evaluation use in development and testing environments only, and not for production use. Early Access Program features are provided: (a) without support; (b) “AS IS”; and (c) without indemnification, warranty, or condition of any kind. No service level commitment will apply to Early Access Program features. Early Access Program features are considered to be a Proof of Concept as defined in the Confluent Cloud Terms of Service. Confluent may discontinue providing preview releases of the Early Access Program features at any time in Confluent’s sole discretion. Snapshot query uses¶ A snapshot query returns a consistent view of your data at the current point in time, similar to taking a photograph of your data at that moment. 
This is particularly useful when you need to: Generate reports that reflect your data’s state at a specific time Analyze historical data for auditing or compliance purposes Compare data states across different points in time Debug or investigate issues by examining past data states For example, if you want to know the total number of orders in your system at the current time, you can use a snapshot query. Snapshot mode¶ A snapshot query is an ordinary Flink SQL statement that has one additional property, named sql.snapshot.mode. To enable snapshot queries, set the sql.snapshot.mode property to now. You can set this property in the following ways: SQL Workspace: Toggle the Mode dropdown to Snapshot. Flink SQL: Prepend your query with SET 'sql.snapshot.mode' = 'now';. Table API: In the Cloud.Properties project file, add sql.snapshot.mode = now. REST API: In the statement’s spec.properties map, add "sql.snapshot.mode": "now". Terraform: In the statement properties, add "sql.snapshot.mode" = "now". Snapshot queries use Flink’s batch execution mode, which enables you to run batch processing jobs beside your existing stream processing workloads, within the same Confluent Cloud environment. Also, Confluent Cloud for Apache Flink bounds all sources, which means that Flink processes only a finite set of records up to a specific point in time, rather than continuously processing an infinite stream of incoming data. How snapshot queries work¶ When you execute a snapshot query, Flink performs the following steps: Determines the Kafka offsets corresponding to your current timestamp across all partitions Reads data from the source topics up to these offsets Processes the records to build the state of your tables at this point in time Returns the query results based on this state The query execution is optimized to use Kafka’s time index for efficient offset lookup, to leverage parallel processing across partitions, and to minimize the amount of data that needs to be processed. 
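For instance, the order-count use case mentioned earlier could be written as the following sketch. The orders table and the total_orders alias are hypothetical; only the sql.snapshot.mode property and its now value come from the text above.

```sql
-- Switch this statement to snapshot (bounded, batch) execution.
SET 'sql.snapshot.mode' = 'now';

-- Counts the records available in the hypothetical `orders` table
-- up to the moment the statement is submitted, then exits.
SELECT COUNT(*) AS total_orders
FROM orders;
```

Without the SET line, the same SELECT would run as a continuous streaming query and keep emitting updated counts as new orders arrive.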
Snapshot queries and Tableflow¶ If Tableflow is enabled on a topic, snapshot queries on the topic run in a hybrid mode. If Tableflow is not enabled on a topic, the query reads from Kafka. If Tableflow is enabled on a topic, the query reads from both Kafka and Parquet, for Confluent Managed Storage and custom storage (BYOS). Run a snapshot query¶ To run a snapshot query, in a Flink workspace or the Flink SQL shell, prepend your query with the following SET statement: SET 'sql.snapshot.mode' = 'now'; Also, in a Flink workspace, you can change the Mode dropdown setting to Snapshot. For more information, see Run a Snapshot Query. Technical Details¶ Timestamp Resolution: Timestamps are processed with millisecond precision State Handling: For tables with state (like aggregations), Flink reconstructs the state by processing all relevant records up to the specified timestamp Parallelism: Queries are automatically parallelized across available compute resources Resource Optimization: Flink uses Kafka’s time index to quickly locate the relevant offsets, minimizing unnecessary data scanning Relationship to Batch Mode¶ Snapshot queries are closely related to Flink’s batch processing mode. When you execute a snapshot query: Flink automatically switches to batch mode processing The query processes a finite, bounded dataset up to the current timestamp The computation benefits from batch optimizations like sort-merge joins Resources are released once the query completes Results are deterministic and reproducible This behavior contrasts with streaming queries which: Process continuous, unbounded data streams Maintain persistent state and resources Produce incremental, real-time results May give different results when rerun due to new data Billing¶ Snapshot queries are billed in CFUs, in the same way that streaming queries are. For more information, see Flink Billing. 
Related content¶ Run a Snapshot Query Query Tableflow Tables with Flink Statements Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql sql.snapshot.mode ``` ```sql sql.snapshot.mode ``` ```sql SET 'sql.snapshot.mode' = 'now'; ``` ```sql Cloud.Properties ``` ```sql sql.snapshot.mode = now ``` ```sql spec.properties ``` ```sql "sql.snapshot.mode": "now" ``` ```sql "sql.snapshot.mode" = "now" ``` ```sql SET 'sql.snapshot.mode' = 'now'; ``` --- ### Statement CFU Metrics in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/concepts/statement-cfu-metrics.html Statement CFU Metrics in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® provides detailed metrics to help you understand and manage your resource utilization. One critical aspect of this is statement CFU metrics. How to use statement CFU metrics¶ The statement CFU metrics give you insights into the resource consumed by individual statements running inside your compute pools. Specifically, the statement CFU metrics enable you to: Monitor individual statement usage: Accurately measure the number of CFUs each statement consumes over time. This metric is available for all types of statements submitted in Confluent Cloud for Apache Flink. Track resource distribution: Understand how the resources in a compute pool are being distributed among the statements running in the compute pool. Identify high-consumption statements: Pinpoint which statements are consuming the most CFUs, enabling you to optimize the statement’s Flink SQL code or adjust the resources available to this statement. By monitoring statement-level CFU consumption, you can make informed decisions about your Flink application’s cost efficiency and resource utilization. 
You can’t set minimum or maximum CFU limits on individual statements, but maximum CFU limits are configurable at the compute-pool level. Where to view statement CFU metrics?¶ The statement CFU consumption metrics are available to view in the statements summary table and in the statement side panel. Statements summary table: Get an overview of CFU consumption for all your statements directly within the statements summary table. This provides a quick way to identify the most resource intensive statements. Statement side panel: For a deeper dive into a statement’s resource usage, open the statement side panel. Here, you’ll find the current CFU consumption and a time-series chart that visualizes how the statement’s CFU consumption has evolved over time. How UDF resource consumption is represented¶ The statement CFU metric shows the resources consumed by your SQL statements and the resources consumed by any UDF instances the statement might invoke. Resources consumed by individual UDF instances will sometimes appear as fractional CFU values. This is because multiple UDF instances can be consolidated, or “rolled into,” a single CFU of resources. Up to three instances of a UDF can be combined into one CFU. The distribution of CFUs amongst UDFs in a compute pool is flexible. Three UDF instances across different statements can be rolled into a single CFU, as long as the statements are in the same compute pool. Also, different UDF functions and their instances can be consolidated into a single CFU, as long as they are in the same compute pool. Understanding differences between CFUs for compute pool and statement¶ When monitoring resource consumption in Confluent Cloud for Apache Flink, you might observe minor differences between your compute pool CFU metrics and the aggregated sum of your statement CFU metrics. These discrepancies are expected and are caused by rounding. 
If your statements use UDFs, you may see a maximum discrepancy of 2 CFUs between the compute pool CFU metrics and the total sum of your statement CFU metrics. For statements not utilizing UDFs, the maximum expected discrepancy between the compute pool CFU metrics and the total sum of your statement CFU metrics is 1 CFU. Note You are billed based on the compute pool CFU metrics, not on the summed total of individual statement CFU metrics. Related content¶ Billing Compute Pools Flink SQL Statements Confluent Cloud Pricing --- ### Flink SQL Statements in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/concepts/statements.html Flink SQL Statements in Confluent Cloud for Apache Flink¶ In Confluent Cloud for Apache Flink®, a statement represents a high-level resource that’s created when you enter a SQL query. Each statement has a property that holds the SQL query that you entered. Based on the SQL query, the statement may be one of these kinds: A metadata operation, or DDL statement A background statement, which writes data back to a table/topic while running in the background A foreground statement, which writes data back to the UI or a client. In all of these cases, the statement represents any SQL statement for Data Definition Language (DDL), Data Manipulation Language (DML), and Data Query Language (DQL). When you submit a SQL query, Confluent Cloud creates a statement resource. You can create a statement resource from any Confluent-supported interface, including the SQL shell, Confluent CLI, Cloud Console, the REST API, and Terraform. The SQL query within a statement is immutable, which means that you can’t make changes to the SQL query once it’s been submitted. If you need to edit a statement, stop the running statement and create a new statement. You can change the security principal for the statement. 
If a statement is running under a user account, you can change it to run under a service account by using the Confluent Cloud Console, Confluent CLI, the REST API, or the Terraform provider. Running a statement under a service account provides better security and stability, ensuring that your statements aren’t affected by changes in user status or authorization. Also, you can change the compute pool that runs a statement. This can be useful if you’re close to maxing out the resources in one pool. You must stop the statement before changing the principal or compute pool, then restart the statement after the change. Confluent Cloud for Apache Flink enforces a 30-day retention for statements in terminal states. For example, once a statement transitions to the STOPPED state, it no longer consumes compute and is deleted after 30 days. If there is no consumer for the results of a foreground statement for five minutes or longer, Confluent Cloud moves the statement to the STOPPED state. Limit on query text size¶ Confluent Cloud for Apache Flink has a limit of 4 MB on the size of query text. This limit includes string and binary literals that are part of the query. The maximum length of a statement name is 72 characters. If you combine multiple SQL statements into a single semicolon-separated string, the length limit applies to the entire string. If the query size is greater than the 4 MB limit, you receive the following error. This query is too large to process (exceeds 4194304 bytes). This can happen due to: * Complex query structure. * Too many columns selected or expanded due to * usage. * Multiple table joins. * Large number of conditions. Try simplifying your query or breaking it into smaller parts. Lifecycle operations statements¶ These are the supported lifecycle operations for a statement. Statements have a lifecycle that includes the following states: Pending: The statement has been submitted and Flink is preparing to start running the statement. 
Running: Flink is actively running the statement. Completed: The statement has completed all of its work. Deleting: The statement is being deleted. Failed: The statement has encountered an error and is no longer running. Degraded: The statement appears unhealthy, for example, no transactions have been committed for a long time, or the statement has frequently restarted recently. Stopping: The statement is about to be stopped. Stopped: The statement has been stopped and is no longer running. Submit a statement¶ SQL shell Cloud Console REST API statements endpoint List running statements¶ SQL shell SHOW JOBS statement Confluent CLI Cloud Console REST API statements endpoint Describe a statement¶ Confluent CLI Cloud Console REST API statement endpoint Delete a statement¶ Confluent CLI Cloud Console REST API DELETE request List statement exceptions¶ Confluent CLI Cloud Console Stop and resume a statement¶ Confluent CLI REST API UPDATE request Cloud Console Queries in Flink¶ Flink enables issuing queries with an ANSI-standard SQL on data at rest (batch) and data in motion (streams). These are the queries that are possible with Flink SQL. Metadata queriesCRUD on catalogs, databases, tables, etc. Because Flink implements ANSI-Standard SQL, Flink uses a database analogy, and similar to a database, it uses the concepts of catalogs, databases and tables. In Apache Kafka®, these concepts map to environments, Kafka clusters, and topics, respectively. Ad-hoc / exploratory queriesYou can issue queries on a topic and see the results immediately. A query can be a batch query (“show me what happened up to now”), or a transient streaming query (“show me what happened up to now and give me updates for the near future”). In this case, when the query or the session is ended, no more compute is needed. Streaming queriesThese queries run continuously and read data from one or more tables/topics and write results of the queries to one table/topic. 
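The three kinds of queries described above might look like the following sketch. The table names `orders` and `large_orders` and the `price` column are hypothetical, and `large_orders` is assumed to exist already with a schema compatible with `orders`.

```sql
-- Metadata query: list the tables (topics) in the current database (Kafka cluster).
SHOW TABLES;

-- Ad-hoc / exploratory query: results are returned to the UI or client
-- (a foreground statement).
SELECT * FROM orders LIMIT 10;

-- Streaming query: runs continuously and writes its results to another
-- table/topic (a background statement).
INSERT INTO large_orders
SELECT * FROM orders WHERE price > 100;
```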
In general, Flink supports both batch and stream processing, but the exact subset of allowed operations differs slightly depending on the type of query. For more information, see Flink SQL Queries. All queries are executed in streaming execution mode, whether the sources are bounded or unbounded. Data lifecycle¶ Broadly speaking, the Flink SQL lifecycle is: Data is read into a Flink table from Kafka via the Flink connector for Kafka. Data is processed using SQL statements. Data is processed using Flink task managers (managed by Confluent and not exposed to users), which are part of the Flink runtime. Some data may be stored temporarily as state in Flink while it’s being processed. Data is returned to the user as a result-set. The result-set may be bounded, in which case the query terminates. The result-set may be unbounded, in which case the query runs until canceled manually. OR Data is written back out to one or more tables. Data is stored in Kafka topics. Schema for the table is stored in Flink Metastore and synchronized out to Schema Registry. Flink SQL Data Definition Language (DDL) statements¶ Data Definition Language (DDL) statements are imperative verbs that define metadata in Flink SQL by adding, changing, or deleting tables. Data Definition Language statements modify metadata only and don’t operate on data. Use these statements with declarative Flink SQL Queries to create your Flink SQL applications. Flink SQL makes it simple to develop streaming applications using standard SQL. It’s easy to learn Flink SQL if you’ve ever worked with a database or SQL-like system that’s ANSI-SQL 2011 compliant. Available DDL statements¶ These are the available DDL statements in Confluent Cloud for Flink SQL.
ALTER ALTER MODEL Statement in Confluent Cloud for Apache Flink ALTER TABLE Statement in Confluent Cloud for Apache Flink ALTER VIEW Statement in Confluent Cloud for Apache Flink CREATE CREATE FUNCTION Statement CREATE MODEL Statement in Confluent Cloud for Apache Flink CREATE TABLE Statement in Confluent Cloud for Apache Flink CREATE VIEW Statement in Confluent Cloud for Apache Flink DESCRIBE DESCRIBE Statement in Confluent Cloud for Apache Flink DROP DROP MODEL Statement in Confluent Cloud for Apache Flink DROP TABLE Statement in Confluent Cloud for Apache Flink DROP VIEW Statement in Confluent Cloud for Apache Flink EXPLAIN EXPLAIN Statement in Confluent Cloud for Apache Flink RESET RESET Statement in Confluent Cloud for Apache Flink SET SET Statement in Confluent Cloud for Apache Flink SHOW SHOW Statements in Confluent Cloud for Apache Flink USE USE CATALOG Statement in Confluent Cloud for Apache Flink USE Statement in Confluent Cloud for Apache Flink Related content¶ Flink SQL Queries Stream Processing Concepts Built-in Functions Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql This query is too large to process (exceeds 4194304 bytes). This can happen due to: * Complex query structure. * Too many columns selected or expanded due to * usage. * Multiple table joins. * Large number of conditions. Try simplifying your query or breaking it into smaller parts. ``` --- ### Time and Watermarks in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/concepts/timely-stream-processing.html Time and Watermarks in Confluent Cloud for Apache Flink¶ Timely stream processing is an extension of stateful stream processing that incorporates time into the computation. It’s commonly used for time series analysis, aggregations based on windows, and event processing where the time of occurrence is important. 
If you’re working with timely Apache Flink® applications on Confluent Cloud, it’s important to consider certain factors to ensure optimal performance. Learn more about these considerations in the following sections. Notions of time: Event Time and Processing Time¶ When referring to time in a streaming program, like when you define windows, different notions of time may apply. Processing time¶ Processing time refers to the system time of the machine that’s executing the operation. When a streaming program runs on processing time, all time-based operations, like time windows, use the system clock of the machines that run the operator. An hourly processing time window includes all records that arrived at a specific operator between the times when the system clock indicated the full hour. For example, if an application begins running at 9:15 AM, the first hourly processing time window includes events processed between 9:15 AM and 10:00 AM, the next window includes events processed between 10:00 AM and 11:00 AM, and so on. Processing time is the simplest notion of time and requires no coordination between streams and machines. It provides the best performance and the lowest latency. But in distributed and asynchronous environments, processing time doesn’t provide determinism, because it’s susceptible to the speed at which records arrive in the system, like from a message queue, to the speed at which records flow between operators inside the system, and to outages (scheduled, or otherwise). Event time¶ Event time is the time that each individual event occurred on its producing device. This time is typically embedded within the records before they enter Flink, and this event timestamp can be extracted from each record. In event time, the progress of time depends on the data, not on any wall clocks. Event-time programs must specify how to generate event-time watermarks, which is the mechanism that signals progress in event time. 
This watermarking mechanism is described in the Event Time and Watermarks section. In a perfect world, event-time processing would yield completely consistent and deterministic results, regardless of when events arrive, or their ordering. But unless the events are known to arrive in-order (by timestamp), event-time processing incurs some latency while waiting for out-of-order events. Because it’s only possible to wait for a finite period of time, this places a limit on how deterministic event-time applications can be. Assuming all of the data has arrived, event-time operations behave as expected, and produce correct and consistent results even when working with out-of-order or late events, or when reprocessing historic data. For example, an hourly event-time window contains all records that carry an event timestamp that falls into that hour, regardless of the order in which they arrive, or when they’re processed. For more information, see Lateness. Sometimes when an event-time program is processing live data in real-time, it uses some processing time operations in order to guarantee that they are progressing in a timely fashion. Event Time and Processing Time¶ Event Time and Watermarks¶ Event time¶ A stream processor that supports event time needs a way to measure the progress of event time. For example, a window operator that builds hourly windows needs to be notified when event time has passed beyond the end of an hour, so that the operator can close the window in progress. Event time can progress independently of processing time, as measured by wall clocks. For example, in one program, the current event time of an operator may trail slightly behind the processing time, accounting for a delay in receiving the events, while both proceed at the same speed. But another streaming program might progress through weeks of event time with only a few seconds of processing, by fast-forwarding through some historic data already buffered in an Apache Kafka® topic. 
Watermarks¶ The mechanism in Flink to measure progress in event time is watermarks. Watermarks determine when to make progress during processing or wait for more records. Certain SQL operations, like windows, interval joins, time-versioned joins, and MATCH_RECOGNIZE require watermarks. Without watermarks, they don’t produce output. By default, every table has a watermark strategy applied. A watermark means, “I have seen all records until this point in time”. It’s a long value that usually represents epoch milliseconds. The watermark of an operator is the minimum of received watermarks over all partitions of all inputs. It triggers the execution of time-based operations within this operator before sending the watermark downstream. Watermarks can be emitted for every record, or they can be computed and emitted on a wall-clock interval. By default, Flink emits them every 200 ms. The built-in function, CURRENT_WATERMARK, enables printing the current watermark for the executing operator. Providing a timestamp is a prerequisite for providing a default watermark. Without providing some timestamp, neither a watermark nor a time attribute is possible. In Flink SQL, only time attributes can be used for time-based operations. A time attribute must be of type TIMESTAMP(p) or TIMESTAMP_LTZ(p), with 0 <= p <= 3. Defining a watermark over a timestamp makes it a time attribute. This is shown as a ROWTIME in a DESCRIBE statement. Watermarks and timestamps¶ Every Kafka record has a message timestamp which is part of the message format, and not in the payload or headers. Timestamp semantics can be CreateTime (default) or LogAppendTime. The timestamp is overwritten by the broker only if LogAppendTime is configured. Otherwise, it depends on the producer, which means that the timestamp can be user-defined, or it is set using the client’s clock if not defined by the user. In most cases, a Kafka record’s timestamp is expressed in epoch milliseconds in UTC. 
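To observe the current watermark next to each record's timestamp, a query along these lines can help. This is a sketch that assumes a hypothetical `orders` table and assumes that the table exposes the Kafka record timestamp through a `$rowtime` system column with the default watermark strategy; neither is described in this section, so adjust the names if your table defines its own time attribute.

```sql
-- Hypothetical table name; `$rowtime` is assumed to be the table's time attribute.
SELECT
  `$rowtime` AS record_time,
  CURRENT_WATERMARK(`$rowtime`) AS current_wm
FROM orders;
```

The watermark column may be NULL for the first rows, until the operator has received a watermark from all of its inputs.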
Watermarks flow as part of the data stream and carry a timestamp t. A Watermark(t) declares that event time has reached time t in that stream, meaning that there should be no more elements from the stream with a timestamp t’ <= t, that is, events with timestamps older or equal to the watermark. The following diagram shows a stream of events with logical timestamps and watermarks flowing inline. In this example, the events are in order with respect to their timestamps, meaning that the watermarks are simply periodic markers in the stream. A data stream with in-order events and watermarks¶ Watermarks are crucial for out-of-order streams, as shown in the following diagram, where the events are not ordered by their timestamps. In general, a watermark declares that by this point in the stream, all events up to a certain timestamp should have arrived. Once a watermark reaches an operator, the operator can advance its internal event time clock to the value of the watermark. A data stream with out-of-order events and watermarks¶ Event time is inherited by a freshly created stream element (or elements) from either the event that produced them or from the watermark that triggered creation of these elements. Watermarks in parallel streams¶ Watermarks are generated at, or directly after, source functions. Each parallel subtask of a source function usually generates its watermarks independently. These watermarks define the event time at that particular parallel source. As the watermarks flow through the streaming program, they advance the event time at the operators where they arrive. Whenever an operator advances its event time, it generates a new watermark downstream for its successor operators. Some operators consume multiple input streams. For example, a union, or operators following a keyBy(…) or partition(…) function consume multiple input streams. Such an operator’s current event time is the minimum of its input streams’ event times. 
As its input streams update their event times, so does the operator. The following diagram shows an example of events and watermarks flowing through parallel streams, and operators tracking event time. Parallel data streams and operators with events and watermarks¶ Lateness¶ It’s possible that certain elements violate the watermark condition, meaning that even after the Watermark(t) has occurred, more elements with timestamp t’ <= t occur. In many real-world systems, certain elements can be delayed for arbitrary lengths of time, making it impossible to specify a time by which all elements of a certain event timestamp will have occurred. Furthermore, even if the lateness can be bounded, delaying the watermarks by too much is often not desirable, because it causes too much delay in the evaluation of event-time windows. For this reason, streaming programs may explicitly expect some late elements. Late elements are elements that arrive after the system’s event time clock, as signaled by the watermarks, has already passed the time of the late element’s timestamp. Currently, Flink does not support late events or allowed lateness. Windowing¶ Aggregating events, for example in counts and sums, works differently with streams than in batch processing. For example, it’s impossible to count all elements in a stream, because streams are, in general, infinite (unbounded). Instead, aggregates on streams, like counts and sums, are scoped by windows, such as “count over the last 5 minutes”, or “sum of the last 100 elements”. Time windows and count windows on a data stream¶ Windows can be time driven, for example, “every 30 seconds”, or data driven, for example, “every 100 elements”.
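A time-driven window such as "count per 5 minutes" can be sketched with a tumbling windowing table-valued function. The `orders` table is hypothetical, and the `$rowtime` column is assumed to be the table's event-time attribute; neither name comes from this section.

```sql
-- Count events in fixed, non-overlapping 5-minute windows.
-- Hypothetical table; `$rowtime` is assumed to be its time attribute.
SELECT window_start, window_end, COUNT(*) AS event_count
FROM TABLE(
  TUMBLE(TABLE orders, DESCRIPTOR(`$rowtime`), INTERVAL '5' MINUTES))
GROUP BY window_start, window_end;
```

Each window emits its count only after the watermark passes the end of the window, which is why the query produces no output until watermarks advance.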
There are different types of windows, for example: Tumbling windows: no overlap Sliding windows: with overlap Session windows: punctuated by a gap of inactivity For more information, see: Window Aggregation Queries in Confluent Cloud for Apache Flink Window Deduplication Queries in Confluent Cloud for Apache Flink Window Join Queries in Confluent Cloud for Apache Flink Window Top-N Queries in Confluent Cloud for Apache Flink Windowing Table-Valued Functions (Windowing TVFs) in Confluent Cloud for Apache Flink Watermarks and windows¶ In the following example, the source is a Kafka topic with 4 partitions. The Flink job is running with a parallelism of 2, and each instance of the Kafka source reads from 2 partitions. Each event has a key, shown as a letter from A to D, and a timestamp. The events shown in bold text have already been read. The events in gray, to the left of the read position, will be read next. The events that have already been read are shuffled by key into the window operators, where the events are counted by key for each hour. Example Flink job graph with windows and watermarks.¶ Because the hour from 1 to 2 o’clock hasn’t been finalized yet, the windows keep track of the counters for that hour. There have been two events for key A for that hour, one event for key B, and so on. Because events for the following hour have already begun to appear, these windows also maintain counters for the hour from 2 o’clock to 3 o’clock. These windows wait for watermarks to trigger them to produce their results. The watermarks come from the watermark generators in the Kafka source operators. For each Kafka partition, the watermark generator keeps track of the largest timestamp seen so far, and subtracts from that an estimate of the expected out-of-orderness. For example, for Partition 1, the largest timestamp is 1:30. Assuming that the events are at most 1 minute out of order, then the watermark for Partition 1 is 1:29. 
A similar computation for Partition 3 yields a watermark of 1:30, and so on for the remaining partitions. Each of the two Kafka source instances takes as its watermark the minimum of these per-partition watermarks. From the point of view of the uppermost Kafka source operator, the watermark it produces should include a timestamp that reflects how complete the stream is that it is producing. This stream from Kafka Source 1 includes events from both Partition 1 and Partition 3, so it can be no more complete than the furthest behind of these two partitions, which is Partition 1. Although Partition 1 has seen an event with a timestamp as late as 1:30, it reports its watermark as 1:29, because it allows for its events to be up to one minute out-of-order. This same reasoning is applied as the watermarks flow downstream through the job graph. Each instance of the window operator has received watermarks from the two Kafka source instances. The current watermark at both of the window operators is 1:17, because this is the furthest behind of the watermarks coming into the windows from the Kafka sources. The furthest behind of all four Kafka partitions determines the overall progress of the windows. Watermark alignment¶ Watermark alignment enables you to specify how tightly synchronized your streams should be, preventing any of the sources from getting too far ahead of the others. It addresses the problem of temporal joins between streams with progressively diverging timestamps. When performing temporal joins between two streams, if one stream is significantly ahead of the other, data from the leading stream must be buffered while waiting for the watermark of the lagging stream to advance. As timestamps diverge further, the buffering requirements grow, potentially causing performance degradation and operational issues, like checkpointing failures.
Watermark alignment enables you to pause reading from streams that are too far ahead, enabling lagging streams to catch up and preventing the situation from worsening. This feature is particularly valuable when joining streams that have naturally diverging timestamps, such as when one data source produces events more frequently or with different timing characteristics than another. Watermark alignment provides these benefits: Reduces memory buffering requirements Improves performance by preventing excessive data buffering Prevents operational problems like checkpointing failures Provides control over stream synchronization In Confluent Cloud for Apache Flink, watermark alignment is enabled by default. Set the sql.tables.scan.watermark-alignment.max-allowed-drift session option to change the maximum allowed deviation, or watermark drift. The default maximum watermark drift is 5 minutes. This value matches the default maximum idleness detection timeout, which is also 5 minutes. Otherwise, watermark alignment would occur while Flink waits for a partition to switch to idle, potentially wasting CPU resources. Only increase the watermark alignment’s maximum allowed drift to match the idleness timeout when you increase the idleness timeout. Decreasing the watermark alignment’s maximum allowed drift may be justified if record throughput, expressed as records per minute of event time, is too large for windowed/temporal operators to buffer the default 5 minutes of data and the window’s length is lower than 5 minutes. Time attributes¶ Confluent Cloud for Apache Flink can process data based on different notions of time. Event time refers to stream processing based on timestamps that are attached to each row. The timestamps can encode when an event happened. Processing time refers to the machine’s system time that’s executing the operation. Processing time is also known as “epoch time”, for example, Java’s System.currentTimeMillis().
Processing time is not supported in Confluent Cloud for Apache Flink. Time attributes can be part of every table schema. They are defined when creating a table from a CREATE TABLE DDL statement. Once a time attribute is defined, it can be referenced as a field and used in time-based operations. As long as a time attribute is not modified and is simply forwarded from one part of a query to another, it remains a valid time attribute. Time attributes behave like regular timestamps, and are accessible for calculations. When used in calculations, time attributes are materialized and act as standard timestamps, but ordinary timestamps can’t be used in place of, or converted to, time attributes. Event time¶ Event time enables a table program to produce results based on timestamps in every record, which allows for consistent results despite out-of-order or late events. Event time also ensures the replayability of the results of the table program when reading records from persistent storage. Also, event time enables unified syntax for table programs in both batch and streaming environments. A time attribute in a streaming environment can be a regular column of a row in a batch environment. To handle out-of-order events and to distinguish between on-time and late events in streaming, Flink must know the timestamp for each row, and it also needs regular indications of how far along in event time the processing has progressed so far, by using watermarks. You can define event-time attributes in CREATE TABLE statements. Defining in DDL¶ The event-time attribute is defined by using a WATERMARK clause in a CREATE TABLE DDL statement. A watermark statement defines a watermark generation expression on an existing event-time field, which marks the event-time field as the event-time attribute. For more information about watermark strategies, see Watermark clause. Flink SQL supports defining an event-time attribute on TIMESTAMP and TIMESTAMP_LTZ columns. 
If the timestamp data in the source is represented as year-month-day-hour-minute-second, usually a string value without time-zone information, for example, 2020-04-15 20:13:40.564, it’s recommended to define the event-time attribute as a TIMESTAMP column. CREATE TABLE user_actions ( user_name STRING, data STRING, user_action_time TIMESTAMP(3), -- Declare the user_action_time column as an event-time attribute -- and use a 5-seconds-delayed watermark strategy. WATERMARK FOR user_action_time AS user_action_time - INTERVAL '5' SECOND ) WITH ( ... ); SELECT TUMBLE_START(user_action_time, INTERVAL '10' MINUTE), COUNT(DISTINCT user_name) FROM user_actions GROUP BY TUMBLE(user_action_time, INTERVAL '10' MINUTE); If the timestamp data in the source is represented as epoch time, which is usually a LONG value like 1618989564564, consider defining the event-time attribute as a TIMESTAMP_LTZ column. CREATE TABLE user_actions ( user_name STRING, data STRING, ts BIGINT, time_ltz AS TO_TIMESTAMP_LTZ(ts, 3), -- Declare the time_ltz column as an event-time attribute -- and use a 5-seconds-delayed watermark strategy. WATERMARK FOR time_ltz AS time_ltz - INTERVAL '5' SECOND ) WITH ( ... ); SELECT TUMBLE_START(time_ltz, INTERVAL '10' MINUTE), COUNT(DISTINCT user_name) FROM user_actions GROUP BY TUMBLE(time_ltz, INTERVAL '10' MINUTE); Processing time¶ Processing time enables a table program to produce results based on the time of the local machine. It’s the simplest notion of time, but it generates non-deterministic results. Processing time doesn’t require timestamp extraction or watermark generation. Processing time is not supported in Confluent Cloud for Apache Flink. Related content¶ Flink implements many techniques from the Dataflow Model. For a good introduction to event time and watermarks, have a look at these articles. 
Course: Watermarks Demystified Video: Watermark Alignment Explained in 2 Minutes Blog post: Introducing Stream Windows in Apache Flink Streaming 101 (O’Reilly online learning) by Tyler Akidau Dataflow Model CREATE TABLE Statement in Confluent Cloud for Apache Flink Flink SQL Queries Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.

#### Code Examples

```sql
CREATE TABLE user_actions (
  user_name STRING,
  data STRING,
  user_action_time TIMESTAMP(3),
  -- Declare the user_action_time column as an event-time attribute
  -- and use a 5-seconds-delayed watermark strategy.
  WATERMARK FOR user_action_time AS user_action_time - INTERVAL '5' SECOND
) WITH (
  ...
);

SELECT TUMBLE_START(user_action_time, INTERVAL '10' MINUTE), COUNT(DISTINCT user_name)
FROM user_actions
GROUP BY TUMBLE(user_action_time, INTERVAL '10' MINUTE);
```

```sql
CREATE TABLE user_actions (
  user_name STRING,
  data STRING,
  ts BIGINT,
  time_ltz AS TO_TIMESTAMP_LTZ(ts, 3),
  -- Declare the time_ltz column as an event-time attribute
  -- and use a 5-seconds-delayed watermark strategy.
  WATERMARK FOR time_ltz AS time_ltz - INTERVAL '5' SECOND
) WITH (
  ...
);

SELECT TUMBLE_START(time_ltz, INTERVAL '10' MINUTE), COUNT(DISTINCT user_name)
FROM user_actions
GROUP BY TUMBLE(time_ltz, INTERVAL '10' MINUTE);
```

---

### User-defined Functions in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/concepts/user-defined-functions.html

User-defined Functions in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® supports user-defined functions (UDFs), which are extension points for running custom logic that you can’t express in the system-provided Flink SQL queries or with the Table API. You can implement user-defined functions in Java, and you can use third-party libraries within a UDF. Confluent Cloud for Apache Flink supports scalar functions (UDFs), which map scalar values to a new scalar value, and table functions (UDTFs), which map multiple scalar values to multiple output rows. Create an example UDF: Create a User Defined Function Add logging to your UDFs: Enable Logging in a User Defined Function Availability: UDF regional availability Limitations: UDF limitations Example code: Flink UDF Java Examples Artifacts¶ Artifacts are Java packages, or JAR files, that contain user-defined functions and all of the required dependencies. Artifacts are uploaded to Confluent Cloud and scoped to a specific region in a Confluent Cloud environment. To be used for UDFs, artifacts must follow a few common implementation principles, which are described in the following sections. To use a UDF, you must register one or more functions that reference the artifact. Functions¶ Functions are SQL objects that reference a class in an artifact and can be used in any SQL statement or Table API program. After an artifact is uploaded, you register a function by using the CREATE FUNCTION statement. After a function is registered, you can invoke it from any SQL statement or Table API program.
The following example shows how to register a TShirtSizingIsSmaller function and invoke it in a SQL statement. -- Register the function. CREATE FUNCTION is_smaller AS 'com.example.my.TShirtSizingIsSmaller' USING JAR 'confluent-artifact:///'; -- Invoke the function. SELECT IS_SMALLER ('L', 'M'); To build and upload a UDF to Confluent Cloud for Apache Flink for use in Flink SQL or the Table API, see Create a UDF. RBAC¶ To upload artifacts, register functions, and invoke functions, you must have the FlinkDeveloper role or higher. For more information, see Grant Role-Based Access. Shared responsibility¶ Confluent supports the UDF infrastructure in Confluent Cloud only. It is your responsibility to troubleshoot custom UDF issues for functions you build or that are provided to you by others. The following provides additional details about shared support responsibilities. Customer Managed: You are responsible for function logic. Confluent does not provide any support for debugging services and features within UDFs. Confluent Managed: Confluent is responsible for managing the Flink services and custom compute platform, and provides support for these. Scalar functions¶ A user-defined scalar function maps zero, one, or multiple scalar values to a new scalar value. You can use any data type listed in Data Types as a parameter or return type of an evaluation method. To define a scalar function, extend the ScalarFunction base class in org.apache.flink.table.functions and implement one or more evaluation methods named eval(...). The following code example shows how to define your own hash code function. 
import org.apache.flink.table.annotation.DataTypeHint; import org.apache.flink.table.annotation.InputGroup; import org.apache.flink.table.api.*; import org.apache.flink.table.functions.ScalarFunction; import static org.apache.flink.table.api.Expressions.*; public static class HashFunction extends ScalarFunction { // take any data type and return INT public int eval(@DataTypeHint(inputGroup = InputGroup.ANY) Object o) { return o.hashCode(); } } The following example shows how to call the HashFunction UDF in a Flink SQL statement. SELECT HashFunction(myField) FROM MyTable; To build and upload a UDF to Confluent Cloud for Apache Flink for use in Flink SQL, see Create a User Defined Function. Table functions¶ Confluent Cloud for Apache Flink also supports user-defined table functions (UDTFs), which take multiple scalar values as input arguments and return multiple rows as output, instead of a single value. To create a user-defined table function, extend the TableFunction base class in org.apache.flink.table.functions and implement one or more evaluation methods named eval(...). Input and output data types are inferred automatically by using reflection. The generic argument T of the class determines the output data type. Unlike scalar functions, the evaluation method itself doesn’t have a return type. Instead, a table function provides a collect(T) method that’s called within every evaluation method to emit zero, one, or more records. In the Table API, a table function is used with the .joinLateral(...) or .leftOuterJoinLateral(...) operators. The joinLateral operator cross-joins each row from the outer table (the table on the left of the operator) with all rows produced by the table-valued function (on the right side of the operator). The leftOuterJoinLateral operator joins each row from the outer table with all rows produced by the table-valued function and preserves outer rows for which the table function returns an empty table.
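The difference between the two lateral-join operators described above can be sketched in plain Java. This is only an illustration of the join semantics under simplifying assumptions: rows are modeled as strings, and `LateralJoinSketch`, `splitWords`, and both join helpers are hypothetical names that do not exist in Flink or the Table API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Plain-Java illustration of lateral-join semantics. It does not use Flink classes.
final class LateralJoinSketch {

    // Hypothetical stand-in for a table function: emits one "word:length" row per
    // whitespace-separated word and emits nothing for a blank input.
    static List<String> splitWords(String line) {
        List<String> rows = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                rows.add(word + ":" + word.length());
            }
        }
        return rows;
    }

    // Inner lateral join: pair each outer row with every row the function emits.
    // Outer rows for which the function emits nothing are dropped.
    static List<String> joinLateral(List<String> outer, Function<String, List<String>> fn) {
        List<String> out = new ArrayList<>();
        for (String row : outer) {
            for (String produced : fn.apply(row)) {
                out.add(row + " | " + produced);
            }
        }
        return out;
    }

    // Left outer lateral join: like joinLateral, but an outer row with an empty
    // function result is kept and padded with null.
    static List<String> leftOuterJoinLateral(List<String> outer, Function<String, List<String>> fn) {
        List<String> out = new ArrayList<>();
        for (String row : outer) {
            List<String> produced = fn.apply(row);
            if (produced.isEmpty()) {
                out.add(row + " | null");
            }
            for (String p : produced) {
                out.add(row + " | " + p);
            }
        }
        return out;
    }
}
```

With an outer table that contains one row with two words and one blank row, `joinLateral` returns two joined rows and drops the blank row, while `leftOuterJoinLateral` returns the same two rows plus the blank row padded with null.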
Note User-defined table functions are distinct from the Table API but can be used in Table API code. In SQL, use LATERAL TABLE() with JOIN or LEFT JOIN with an ON TRUE join condition. The following code example shows how to implement a simple string splitting function. import org.apache.flink.table.annotation.DataTypeHint; import org.apache.flink.table.annotation.FunctionHint; import org.apache.flink.table.api.*; import org.apache.flink.table.functions.TableFunction; import org.apache.flink.types.Row; import static org.apache.flink.table.api.Expressions.*; @FunctionHint(output = @DataTypeHint("ROW<word STRING, length INT>")) public static class SplitFunction extends TableFunction<Row> { public void eval(String str) { for (String s : str.split(" ")) { // use collect(...) to emit a row collect(Row.of(s, s.length())); } } } The following example shows how to call the SplitFunction UDTF in a Flink SQL statement. SELECT myField, word, length FROM MyTable LEFT JOIN LATERAL TABLE(SplitFunction(myField)) ON TRUE; To build and upload a user-defined table function to Confluent Cloud for Apache Flink for use in Flink SQL, see Create a User Defined Table Function. Implementation considerations¶ All UDFs adhere to a few common implementation principles, which are described in the following sections: Function class, Evaluation methods, Type inference, Named parameters, Scalar functions, and Table functions. The following code example shows how to implement a simple scalar function and how to call it in Flink SQL. For the Table API, you can register the function in code and invoke it. For SQL queries, your UDF must be registered by using the CREATE FUNCTION statement. For more information, see Create a User-defined Function.
import org.apache.flink.table.api.*; import org.apache.flink.table.functions.ScalarFunction; import static org.apache.flink.table.api.Expressions.*; // define function logic public static class SubstringFunction extends ScalarFunction { public String eval(String s, Integer begin, Integer end) { return s.substring(begin, end); } } The following example shows how to call the SubstringFunction UDF in a Flink SQL statement. SELECT SubstringFunction('test string', 2, 5); Function class¶ Your implementation class must extend one of the system-provided base classes. Scalar functions extend the org.apache.flink.table.functions.ScalarFunction class. Table functions extend the org.apache.flink.table.functions.TableFunction class. The class must be declared public, not abstract, and must be accessible globally. Non-static inner or anonymous classes are not supported. Evaluation methods¶ You define the behavior of a scalar function by implementing a custom evaluation method, named eval, which must be declared public. You can overload evaluation methods by implementing multiple methods named eval. The evaluation method is called by code-generated operators during runtime. Regular JVM method-calling semantics apply, so these implementation options are available: You can implement overloaded methods, like eval(Integer) and eval(LocalDateTime). You can use var-args, like eval(Integer...). You can use object inheritance, like eval(Object) that takes both LocalDateTime and Integer. You can use combinations of these, like eval(Object...) that takes all kinds of arguments. The ScalarFunction base class provides a set of optional methods that you can override, open(), close(), isDeterministic(), and supportsConstantFolding(). You can use the open() method for initialization work and the close() method for cleanup work. Internally, Table API and SQL code generation works with primitive values where possible. 
To reduce overhead during runtime, a user-defined scalar function should declare parameters and result types as primitive types instead of their boxed classes. For example, DATE and TIME values are represented as int, and TIMESTAMP values are represented as long. The following code example shows a user-defined function that has overloaded eval methods. import org.apache.flink.table.functions.ScalarFunction; // function with overloaded evaluation methods public static class SumFunction extends ScalarFunction { public Integer eval(Integer a, Integer b) { return a + b; } public Integer eval(String a, String b) { return Integer.valueOf(a) + Integer.valueOf(b); } public Integer eval(Double... d) { double result = 0; for (double value : d) result += value; return (int) result; } } Type inference¶ The Table API is strongly typed, so both function parameters and return types must be mapped to a data type. The Flink planner needs information about expected types, precision, and scale. It also needs information about how internal data structures are represented as JVM objects when calling a user-defined function. Type inference is the process of validating input arguments and deriving data types for both the parameters and the result of a function. User-defined functions in Flink implement automatic type-inference extraction that derives data types from the function’s class and its evaluation methods by using reflection. If this implicit, reflection-based extraction fails, you can help the extraction process by annotating affected parameters, classes, or methods with @DataTypeHint and @FunctionHint. Automatic type inference¶ Automatic type inference inspects the function’s class and evaluation methods to derive data types for the arguments and return value of a function. The @DataTypeHint and @FunctionHint annotations support automatic extraction. For a list of classes that implicitly map to a data type, see Data type extraction.
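To make the reflection step more concrete, the following plain-Java sketch lists the parameter and return types that reflection exposes for a class's public eval methods. It is not Flink's extraction logic, which also maps Java classes to SQL data types and applies hints. The names `EvalSignatureSketch`, `SumLike`, and `evalSignatures` are hypothetical and used only for illustration.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java illustration only. It does not use or reproduce Flink's type extractor.
final class EvalSignatureSketch {

    // Hypothetical function class with two overloaded evaluation methods.
    static final class SumLike {
        public int eval(int a, int b) {
            return a + b;
        }

        public int eval(String a, String b) {
            return Integer.parseInt(a) + Integer.parseInt(b);
        }
    }

    // Collects "eval(paramTypes) -> returnType" for every public, non-synthetic
    // method named eval. The result is sorted because reflection does not
    // guarantee any particular method order.
    static List<String> evalSignatures(Class<?> functionClass) {
        List<String> signatures = new ArrayList<>();
        for (Method m : functionClass.getDeclaredMethods()) {
            boolean isPublicEval = m.getName().equals("eval")
                    && Modifier.isPublic(m.getModifiers())
                    && !m.isSynthetic();
            if (!isPublicEval) {
                continue;
            }
            StringBuilder sb = new StringBuilder("eval(");
            Class<?>[] params = m.getParameterTypes();
            for (int i = 0; i < params.length; i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(params[i].getSimpleName());
            }
            sb.append(") -> ").append(m.getReturnType().getSimpleName());
            signatures.add(sb.toString());
        }
        Collections.sort(signatures);
        return signatures;
    }
}
```

Calling `EvalSignatureSketch.evalSignatures(EvalSignatureSketch.SumLike.class)` returns one entry per overload. Erased generics and types such as `Object` carry too little information in this raw view, which is the kind of gap that @DataTypeHint and @FunctionHint are meant to fill.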
Data type hints¶ In some situations, you may need to guide automatic extraction inline for the parameters and return types of a function. In these cases, you can use the @DataTypeHint annotation to define data types. The following code example shows how to use data type hints. import java.math.BigDecimal; import java.nio.ByteBuffer; import java.time.Instant; import org.apache.flink.table.annotation.DataTypeHint; import org.apache.flink.table.annotation.InputGroup; import org.apache.flink.table.functions.ScalarFunction; import org.apache.flink.types.Row; // user-defined function that has overloaded evaluation methods. public static class OverloadedFunction extends ScalarFunction { // No hint required for type inference. public Long eval(long a, long b) { return a + b; } // Define the precision and scale of a decimal. public @DataTypeHint("DECIMAL(12, 3)") BigDecimal eval(double a, double b) { return BigDecimal.valueOf(a + b); } // Define a nested data type. @DataTypeHint("ROW<s STRING, t TIMESTAMP_LTZ(3)>") public Row eval(int i) { return Row.of(String.valueOf(i), Instant.ofEpochSecond(i)); } // Enable wildcard input and custom serialized output. @DataTypeHint(value = "RAW", bridgedTo = ByteBuffer.class) public ByteBuffer eval(@DataTypeHint(inputGroup = InputGroup.ANY) Object o) { return MyUtils.serializeToByteBuffer(o); } } Function hints¶ In some situations, you may want one evaluation method to handle multiple different data types, or you may have overloaded evaluation methods with a common result type that should be declared only once. The @FunctionHint annotation provides a mapping from argument data types to a result data type. It enables annotating entire function classes or evaluation methods for input, accumulator, and result data types. You can declare one or more annotations on a class or individually for each evaluation method for overloading function signatures. All hint parameters are optional. If a parameter is not defined, the default reflection-based extraction is used.
Hint parameters defined on a function class are inherited by all evaluation methods. The following code example shows how to use function hints. import org.apache.flink.table.annotation.DataTypeHint; import org.apache.flink.table.annotation.FunctionHint; import org.apache.flink.table.functions.TableFunction; import org.apache.flink.types.Row; // User-defined function with overloaded evaluation methods // but globally defined output type. @FunctionHint(output = @DataTypeHint("ROW<s STRING, i INT>")) public static class OverloadedFunction extends TableFunction<Row> { public void eval(int a, int b) { collect(Row.of("Sum", a + b)); } // Overloading arguments is still possible. public void eval() { collect(Row.of("Empty args", -1)); } } // Decouples the type inference from evaluation methods. // The type inference is entirely determined by the function hints. @FunctionHint( input = {@DataTypeHint("INT"), @DataTypeHint("INT")}, output = @DataTypeHint("INT") ) @FunctionHint( input = {@DataTypeHint("BIGINT"), @DataTypeHint("BIGINT")}, output = @DataTypeHint("BIGINT") ) @FunctionHint( input = {}, output = @DataTypeHint("BOOLEAN") ) public static class OverloadedFunction extends TableFunction<Object> { // Ensure a method exists that the JVM can call. public void eval(Object... o) { if (o.length == 0) { collect(false); } else { collect(o[0]); } } } Named parameters¶ When you call a user-defined function, you can use parameter names to specify the values of the parameters. Named parameters enable passing both the parameter name and value to a function. This approach avoids confusion caused by incorrect parameter order, and it improves code readability and maintainability. Also, with named parameters you can omit optional parameters, which are filled with null by default. Use the @ArgumentHint annotation to specify the name, type, and whether a parameter is required or not. The following code examples demonstrate how to use @ArgumentHint in different scopes.
Use the @ArgumentHint annotation on the parameters of the eval method of the function: import org.apache.flink.table.annotation.ArgumentHint; import org.apache.flink.table.annotation.DataTypeHint; import org.apache.flink.table.functions.ScalarFunction; public static class NamedParameterClass extends ScalarFunction { // Use the @ArgumentHint annotation to specify the name, type, and whether a parameter is required. public String eval(@ArgumentHint(name = "param1", isOptional = false, type = @DataTypeHint("STRING")) String s1, @ArgumentHint(name = "param2", isOptional = true, type = @DataTypeHint("INT")) Integer s2) { return s1 + ", " + s2; } } Use the @ArgumentHint annotation on the eval method of the function. import org.apache.flink.table.annotation.ArgumentHint; import org.apache.flink.table.annotation.DataTypeHint; import org.apache.flink.table.annotation.FunctionHint; import org.apache.flink.table.functions.ScalarFunction; public static class NamedParameterClass extends ScalarFunction { // Use the @ArgumentHint annotation to specify the name, type, and whether a parameter is required. @FunctionHint( argument = {@ArgumentHint(name = "param1", isOptional = false, type = @DataTypeHint("STRING")), @ArgumentHint(name = "param2", isOptional = true, type = @DataTypeHint("INTEGER"))} ) public String eval(String s1, Integer s2) { return s1 + ", " + s2; } } Use the @ArgumentHint annotation on the class of the function. import org.apache.flink.table.annotation.ArgumentHint; import org.apache.flink.table.annotation.DataTypeHint; import org.apache.flink.table.annotation.FunctionHint; import org.apache.flink.table.functions.ScalarFunction; // Use the @ArgumentHint annotation to specify the name, type, and whether a parameter is required. @FunctionHint( argument = {@ArgumentHint(name = "param1", isOptional = false, type = @DataTypeHint("STRING")), @ArgumentHint(name = "param2", isOptional = true, type = @DataTypeHint("INTEGER"))} ) public static class NamedParameterClass extends ScalarFunction { public String eval(String s1, Integer s2) { return s1 + ", " + s2; } } The @ArgumentHint annotation already contains the @DataTypeHint annotation, so you can’t use it with @DataTypeHint in @FunctionHint.
When applied to function parameters, @ArgumentHint can’t be used with @DataTypeHint at the same time, so you should use @ArgumentHint instead. Named parameters take effect only when the corresponding class doesn’t contain overloaded evaluation methods or variable-argument evaluation methods; otherwise, using named parameters causes an error. Determinism¶ Every user-defined function class can declare whether it produces deterministic results by overriding the isDeterministic() method. If the function is not purely functional, like random(), date(), or now(), the method must return false. By default, isDeterministic() returns true. Also, the isDeterministic() method may influence the runtime behavior. A runtime implementation might be called at two different stages. During planning¶ During planning, in the so-called pre-flight phase, if a function is called with constant expressions, or if constant expressions can be derived from the given statement, a function is pre-evaluated for constant expression reduction and might not be executed on the cluster. In these cases, you can use the isDeterministic() method to disable constant expression reduction. For example, the following calls to ABS are executed during planning: SELECT ABS(-1) FROM t; SELECT ABS(field) FROM t WHERE field = -1; But the following call to ABS is not executed during planning: SELECT ABS(field) FROM t; During runtime¶ If a function is called with non-constant expressions or isDeterministic() returns false, the function is executed on the cluster. System function determinism¶ The determinism of system (built-in) functions is immutable. According to Apache Calcite’s SqlOperator definition, there are two kinds of functions which are not deterministic: dynamic functions and non-deterministic functions. /** * Returns whether a call to this operator is guaranteed to always return * the same result given the same operands; true is assumed by default.
*/ public boolean isDeterministic() { return true; } /** * Returns whether it is unsafe to cache query plans referencing this * operator; false is assumed by default. */ public boolean isDynamicFunction() { return false; } If isDeterministic() returns false, the function is evaluated per-record during runtime. If isDynamicFunction() returns true, the function can be evaluated only at query start. A dynamic function is pre-evaluated during planning only in batch mode. In streaming mode, it is equivalent to a non-deterministic function, because the query is executed continuously under the abstraction of a continuous query over dynamic tables, so dynamic functions are also re-evaluated for each query execution, which is equivalent to per-record evaluation in the current implementation. The isDynamicFunction method applies only to system functions. The following system functions are always non-deterministic, which means they are evaluated per-record during runtime, both in batch and streaming mode: CURRENT_ROW_TIMESTAMP, RAND, RAND_INTEGER, UNIX_TIMESTAMP, UUID. The following system temporal functions are dynamic and are pre-evaluated during planning (query start) for batch mode and evaluated per-record for streaming mode: CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, LOCALTIME, LOCALTIMESTAMP, NOW. UDF regional availability¶ Flink UDFs are available in the following AWS regions: ap-east-1, ap-northeast-2, ap-south-1, ap-southeast-1, ap-southeast-2, ca-central-1, eu-central-1, eu-central-2, eu-north-1, eu-west-1, eu-west-2, me-south-1, sa-east-1, us-east-1, us-east-2, us-west-2. Flink UDFs are available in the following Azure regions: australiaeast, brazilsouth, centralindia, centralus, eastus, eastus2, francecentral, northeurope, southcentralus, southeastasia, spaincentral, uaenorth, uksouth, westeurope, westus2, westus3. UDF limitations¶ User-defined functions have the following limitations. Confluent CLI version 4.13.0 or later is required.
External network calls from UDFs are not supported. JDK 17 is the latest supported Java version for uploaded JAR files. Each Flink statement can have no more than 10 UDFs. Each organization/cloud/region/environment can have no more than 100 Flink artifacts. The size limit of each artifact is 100 MB. Aggregates are not supported. Table aggregates are not supported. Temporary functions are not supported. The ALTER FUNCTION statement is not supported. UDFs can’t be used in combination with MATCH_RECOGNIZE. Vararg functions are not supported. User-defined structured types are not supported. Python is not supported. Both inputs and outputs of the UDF have a row-size limit of 4MB. Custom type inference is not supported. Constant expression reduction is not supported. The UDF feature is optimized for streaming processing, so the initial query may be slow, but after the initial query, a UDF runs with low latency. File system access limitations¶ The file system is read-only in the runtime environment. UDFs can’t create, write, or modify files on the file system. This includes temporary files, model files, or any other file operations. Libraries that require file system write access, like those using JNI/native binaries that extract files from JARs, are not supported. JNI and native binary limitations¶ Libraries that use Java Native Interface (JNI) or require native binaries are not supported due to filesystem restrictions and potential architecture compatibility issues. UDF logging limitations¶ Log4j logging only: External UDF loggers can be composed only with the Apache Log4j logging framework. Burst rate to 1000/s: UDF logging supports up to 1000 log events per second for each UDF during a short burst of high activity. This helps to optimize performance and to reduce noise in logs. Events that exceed the maximum rate are dropped. Related content¶ CREATE FUNCTION Create a User-defined Function. 
Flink SQL Queries Flink UDF Java Examples

#### Code Examples

```sql
-- Register the function.
CREATE FUNCTION is_smaller
  AS 'com.example.my.TShirtSizingIsSmaller'
  USING JAR 'confluent-artifact:///';

-- Invoke the function.
SELECT IS_SMALLER ('L', 'M');
```

```java
import org.apache.flink.table.annotation.DataTypeHint;
import org.apache.flink.table.annotation.InputGroup;
import org.apache.flink.table.api.*;
import org.apache.flink.table.functions.ScalarFunction;
import static org.apache.flink.table.api.Expressions.*;

public static class HashFunction extends ScalarFunction {

  // take any data type and return INT
  public int eval(@DataTypeHint(inputGroup = InputGroup.ANY) Object o) {
    return o.hashCode();
  }
}
```

```sql
SELECT HashFunction(myField) FROM MyTable;
```

```java
import org.apache.flink.table.annotation.DataTypeHint;
import org.apache.flink.table.annotation.FunctionHint;
import org.apache.flink.table.api.*;
import org.apache.flink.table.functions.TableFunction;
import org.apache.flink.types.Row;
import static org.apache.flink.table.api.Expressions.*;

@FunctionHint(output = @DataTypeHint("ROW<word STRING, length INT>"))
public static class SplitFunction extends TableFunction<Row> {

  public void eval(String str) {
    for (String s : str.split(" ")) {
      // use collect(...) to emit a row
      collect(Row.of(s, s.length()));
    }
  }
}
```

```sql
SELECT myField, word, length
FROM MyTable
LEFT JOIN LATERAL TABLE(SplitFunction(myField)) ON TRUE;
```

```java
import org.apache.flink.table.api.*;
import org.apache.flink.table.functions.ScalarFunction;
import static org.apache.flink.table.api.Expressions.*;

// define function logic
public static class SubstringFunction extends ScalarFunction {
  public String eval(String s, Integer begin, Integer end) {
    return s.substring(begin, end);
  }
}
```

```sql
SELECT SubstringFunction('test string', 2, 5);
```

```java
import org.apache.flink.table.functions.ScalarFunction;

// function with overloaded evaluation methods
public static class SumFunction extends ScalarFunction {

  public Integer eval(Integer a, Integer b) {
    return a + b;
  }

  public Integer eval(String a, String b) {
    return Integer.valueOf(a) + Integer.valueOf(b);
  }

  public Integer eval(Double... d) {
    double result = 0;
    for (double value : d)
      result += value;
    return (int) result;
  }
}
```

```java
import java.math.BigDecimal;
import java.nio.ByteBuffer;
import java.time.Instant;
import org.apache.flink.table.annotation.DataTypeHint;
import org.apache.flink.table.annotation.InputGroup;
import org.apache.flink.table.functions.ScalarFunction;
import org.apache.flink.types.Row;

// user-defined function that has overloaded evaluation methods.
public static class OverloadedFunction extends ScalarFunction {

  // No hint required for type inference.
  public Long eval(long a, long b) {
    return a + b;
  }

  // Define the precision and scale of a decimal.
  public @DataTypeHint("DECIMAL(12, 3)") BigDecimal eval(double a, double b) {
    return BigDecimal.valueOf(a + b);
  }

  // Define a nested data type.
  @DataTypeHint("ROW<s STRING, t TIMESTAMP_LTZ(3)>")
  public Row eval(int i) {
    return Row.of(String.valueOf(i), Instant.ofEpochSecond(i));
  }

  // Enable wildcard input and custom serialized output.
  @DataTypeHint(value = "RAW", bridgedTo = ByteBuffer.class)
  public ByteBuffer eval(@DataTypeHint(inputGroup = InputGroup.ANY) Object o) {
    return MyUtils.serializeToByteBuffer(o);
  }
}
```

```java
import org.apache.flink.table.annotation.DataTypeHint;
import org.apache.flink.table.annotation.FunctionHint;
import org.apache.flink.table.functions.TableFunction;
import org.apache.flink.types.Row;

// User-defined function with overloaded evaluation methods
// but globally defined output type.
@FunctionHint(output = @DataTypeHint("ROW<s STRING, i INT>"))
public static class OverloadedFunction extends TableFunction<Row> {

  public void eval(int a, int b) {
    collect(Row.of("Sum", a + b));
  }

  // Overloading arguments is still possible.
  public void eval() {
    collect(Row.of("Empty args", -1));
  }
}

// Decouples the type inference from evaluation methods.
// The type inference is entirely determined by the function hints.
@FunctionHint(
  input = {@DataTypeHint("INT"), @DataTypeHint("INT")},
  output = @DataTypeHint("INT")
)
@FunctionHint(
  input = {@DataTypeHint("BIGINT"), @DataTypeHint("BIGINT")},
  output = @DataTypeHint("BIGINT")
)
@FunctionHint(
  input = {},
  output = @DataTypeHint("BOOLEAN")
)
public static class OverloadedFunction extends TableFunction<Object> {

  // Ensure a method exists that the JVM can call.
  public void eval(Object... o) {
    if (o.length == 0) {
      collect(false);
    } else {
      collect(o[0]);
    }
  }
}
```

```java
import org.apache.flink.table.annotation.ArgumentHint;
import org.apache.flink.table.annotation.DataTypeHint;
import org.apache.flink.table.functions.ScalarFunction;

public static class NamedParameterClass extends ScalarFunction {

  // Use the @ArgumentHint annotation to specify the name, type, and whether a parameter is required.
  public String eval(
      @ArgumentHint(name = "param1", isOptional = false, type = @DataTypeHint("STRING")) String s1,
      @ArgumentHint(name = "param2", isOptional = true, type = @DataTypeHint("INT")) Integer s2) {
    return s1 + ", " + s2;
  }
}
```

```java
import org.apache.flink.table.annotation.ArgumentHint;
import org.apache.flink.table.annotation.DataTypeHint;
import org.apache.flink.table.annotation.FunctionHint;
import org.apache.flink.table.functions.ScalarFunction;

public static class NamedParameterClass extends ScalarFunction {

  // Use the @ArgumentHint annotation to specify the name, type, and whether a parameter is required.
  @FunctionHint(
    argument = {
      @ArgumentHint(name = "param1", isOptional = false, type = @DataTypeHint("STRING")),
      @ArgumentHint(name = "param2", isOptional = true, type = @DataTypeHint("INTEGER"))
    }
  )
  public String eval(String s1, Integer s2) {
    return s1 + ", " + s2;
  }
}
```

```java
import org.apache.flink.table.annotation.ArgumentHint;
import org.apache.flink.table.annotation.DataTypeHint;
import org.apache.flink.table.annotation.FunctionHint;
import org.apache.flink.table.functions.ScalarFunction;

// Use the @ArgumentHint annotation to specify the name, type, and whether a parameter is required.
@FunctionHint(
  argument = {
    @ArgumentHint(name = "param1", isOptional = false, type = @DataTypeHint("STRING")),
    @ArgumentHint(name = "param2", isOptional = true, type = @DataTypeHint("INTEGER"))
  }
)
public static class NamedParameterClass extends ScalarFunction {

  public String eval(String s1, Integer s2) {
    return s1 + ", " + s2;
  }
}
```

```sql
SELECT ABS(-1) FROM t;
SELECT ABS(field) FROM t WHERE field = -1;
```

```sql
SELECT ABS(field) FROM t;
```

```java
/**
 * Returns whether a call to this operator is guaranteed to always return
 * the same result given the same operands; true is assumed by default.
 */
public boolean isDeterministic() {
  return true;
}

/**
 * Returns whether it is unsafe to cache query plans referencing this
 * operator; false is assumed by default.
 */
public boolean isDynamicFunction() {
  return false;
}
```

---

### FAQ for Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/flink-faq.html

Frequently Asked Questions for Confluent Cloud for Apache Flink¶ This topic provides answers to frequently asked questions about Confluent Cloud for Apache Flink®. What is Confluent Cloud for Apache Flink?¶ Confluent Cloud for Apache Flink is a fully managed, cloud-native service for stream processing using Flink SQL. It enables you to process, analyze, and transform data in real time directly on your Confluent Cloud-managed Kafka clusters. How do I get started with Confluent Cloud for Apache Flink?¶ Get started by clicking SQL Workspaces in the Confluent Cloud Console.
For more information, see Flink SQL Quick Start with Confluent Cloud Console. Also, you can run the confluent flink shell command to start the Flink SQL shell. For more information, see Flink SQL Shell Quick Start. What is a compute pool?¶ A compute pool is a dedicated set of resources, measured in CFUs, that runs your Flink SQL statements. You must create a compute pool before running statements. Multiple statements can share a compute pool, and you can scale pools up or down as needed. For more information, see Compute Pools. How is Confluent Cloud for Apache Flink billed?¶ Billing is based on the number of CFUs provisioned in your compute pools and the duration for which they are running. You are charged for the resources allocated, not per statement. For more information, see Billing. What are the prerequisites for using Confluent Cloud for Apache Flink?¶ You need a Confluent Cloud account and an environment with Stream Governance enabled. You must have the appropriate roles and permissions, for example, the FlinkDeveloper role to run statements. You need access to at least one compute pool. What sources and sinks are supported?¶ Confluent Cloud for Apache Flink supports reading from and writing to Kafka topics in your Confluent Cloud environment. In addition, you can use Confluent’s AI/ML features to perform searches on external tables. You can also use Confluent Tableflow to materialize streams to external tables. How do I monitor my Flink SQL statements?¶ You can monitor statements using the Cloud Console, which provides status, metrics, and logs. For advanced monitoring, use the Metrics API and Notifications for Confluent Cloud to set up alerts for failures, lag, and resource utilization. For more information, see Best practices for alerting. What happens if my statement fails?¶ If a statement fails, you will see an error message in the Cloud Console. You can view logs and metrics to diagnose the issue.
Statements can be restarted after resolving the underlying problem. Can I use Flink SQL to join multiple topics?¶ Yes, you can use Flink SQL to join multiple Kafka topics, perform aggregations, windowing, filtering, and more. For more information, see the Flink SQL statements. How do I manage schema evolution?¶ Flink SQL integrates with Confluent’s Schema Registry. When reading from or writing to topics with Avro, Protobuf, or JSON Schema, Flink SQL uses the registered schemas and handles compatible schema evolution. How do I control access to Flink resources?¶ Access to Flink resources is managed using Role-Based Access Control (RBAC) in Confluent Cloud. Assign users and service accounts the appropriate roles, such as FlinkAdmin or FlinkDeveloper, to control what actions they can perform. For more information, see Grant Role-Based Access. How do I secure my Flink SQL jobs and data?¶ Confluent Cloud for Apache Flink uses the same security model as the rest of Confluent Cloud, including RBAC, API keys, and network controls. Make sure to assign the minimum required permissions to users and service accounts. For more information, see Grant Role-Based Access in Confluent Cloud for Apache Flink. How do I move my SQL statements to production?¶ To move your Flink SQL statements to production, follow best practices such as using service accounts, applying least-privilege permissions, and thoroughly testing your statements in a development environment before deploying them to production compute pools. For detailed guidance, see Best Practices for Moving SQL Statements to Production. You can use GitHub Actions and Terraform to deploy your Flink SQL statements to production. For more information, see Deploy a Flink SQL Statement Using CI/CD. Where can I get help or support?¶ If you have questions or need support, you can use the in-product help in the Confluent Cloud Console, visit the Flink documentation, or reach out through the established channels. 
You can also ask questions in the Confluent Community forums or contact Confluent Support if you have a support plan. Related content¶ Flink SQL Quick Start with Confluent Cloud Console Flink SQL Shell Quick Start Stream Processing Concepts Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql confluent flink shell ``` --- ### Get Help with Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/get-help.html Get Help with Confluent Cloud for Apache Flink¶ You can request support in the Confluent Support Portal. You can access the portal directly, or you can navigate to it from the Confluent Cloud Console by selecting the Support menu identified by the help icon () in the upper-right and choosing Support portal. For more information, see Confluent Support for Confluent Cloud. Confluent Community Slack¶ There is a dedicated #flink channel in the Confluent Community Slack. Join with this link to ask questions, provide feedback, and engage with other users. Troubleshoot Flink in Confluent Cloud Console¶ If issues occur while running Flink in Cloud Console, consider generating a HAR file and uploading it to the Confluent Community Slack channel or sending it to the Support Portal. For more information, see Generate a HAR file for Troubleshooting on Confluent Cloud. Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. --- ### Get Started with Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/get-started/overview.html Get Started with Confluent Cloud for Apache Flink¶ Welcome to Confluent Cloud for Apache Flink®. This section guides you through the steps to get your queries running using the Confluent Cloud Console (browser-based) and the Flink SQL shell (CLI-based). 
Get Started for Free Sign up for a Confluent Cloud trial and get $400 of free credit. If you’re currently using Confluent Cloud in a region that doesn’t yet support Flink, so you can’t use your data in existing Apache Kafka® topics, you can still try out Flink SQL by using sample data generators or the Example catalog, which are used in the quick starts and How-to Guides for Confluent Cloud for Apache Flink. Choose one of the following quick starts to get started with Flink SQL on Confluent Cloud: Flink SQL Quick Start with Confluent Cloud Console Flink SQL Shell Quick Start Also, you can access Flink by using the REST API and the Confluent Terraform Provider. REST API-based data streams Sample Project for Confluent Terraform Provider If you get stuck, have a question, or want to provide feedback or feature requests, don’t hesitate to reach out. Check out Get Help with Confluent Cloud for Apache Flink for our support channels. Next steps¶ Flink SQL Quick Start with Confluent Cloud Console Flink SQL Shell Quick Start Related content¶ Stream Processing Concepts Flink SQL Queries Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. --- ### Flink SQL Quick Start on Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/get-started/quick-start-cloud-console.html Flink SQL Quick Start with Confluent Cloud Console¶ This quick start gets you up and running with Confluent Cloud for Apache Flink®. The following steps show how to create a workspace for running SQL statements on streaming data. In this quick start guide, you perform the following steps: Step 1: Create a workspace Step 2: Run SQL statements Step 3: Query streaming data Step 4: Query existing topics (optional) Prerequisites¶ Access to Confluent Cloud. 
Step 1: Create a workspace¶ Workspaces provide an intuitive, flexible UI for dynamically exploring and interacting with all of your data on Confluent Cloud using Flink SQL. In a workspace, you can save your queries, run multiple queries simultaneously in a single view, and browse your catalogs, databases, and tables. Log in to Confluent Cloud Console at https://confluent.cloud/login. In the navigation menu, click Stream processing to open the Stream processing page. In the dropdown, select the environment where you want to run Flink SQL, or use the default environment. If you have Kafka topics that you want to run SQL queries on, choose the environment that has these topics. Click Create new workspace, and in the dialog, select the cloud provider and region. If you have Kafka topics that you want to run SQL queries on, select the region that has your Kafka cluster. Click Create workspace. A new workspace opens with an example query in the code editor, or cell. Under the hood, Confluent Cloud for Apache Flink is creating a compute pool, which represents the compute resources that are used to run your SQL statements. The resources provided by the compute pool are shared among all statements that use it. It enables you to limit or guarantee resources as your use cases require. A compute pool is bound to a region. There is no cost for creating compute pools. It may take a minute or two for the compute pool to be provisioned. You can change the compute pool where a workspace runs by clicking the workspace settings icon and choosing from the Compute pool selection dropdown. Step 2: Run SQL statements¶ When the compute pool status changes from Provisioning to Running, it’s ready to run queries. In the cell of the new workspace, you can start running SQL statements. Click Run. The example statement is submitted, and information about the statement is displayed, including its status and a unique identifier. 
Click the Statement name link to open the statement details view, which displays the statement status and other information. Click X to dismiss the details view. After an initialization period, the query results display beneath the cell. Your output should resemble: EXPR$0 0 1 2 Copy the following SQL and paste it into the cell. The statement runs the CURRENT_TIMESTAMP function, which is one of many built-in functions provided by Confluent Cloud for Apache Flink. SELECT CURRENT_TIMESTAMP; Click Run. The result from the statement is displayed beneath the cell. Your output should resemble: CURRENT_TIMESTAMP 2024-03-15 16:23:18.912 Step 3: Query streaming data¶ Flink SQL enables using familiar SQL syntax to query streaming data. Confluent Cloud for Apache Flink provides example data streams that you can experiment with. In this step, you query the orders table from the marketplace database in the examples catalog. In Flink SQL, catalog objects, like tables, are scoped by catalog and database. A catalog is a collection of databases that share the same namespace. A database is a collection of tables that share the same namespace. In Confluent Cloud, an environment is mapped to a Flink catalog, and a Kafka cluster is mapped to a Flink database. You can always use three-part identifiers for your tables, like catalog.database.table, but it’s more convenient to set a default. Set the default catalog and database by using the Use catalog and Use database dropdown menus in the top-right corner of the workspace. Select examples for the catalog, and marketplace for the database. Click to create a new cell, and run the following statement to list all the tables in the marketplace database. SHOW TABLES; Your output should resemble: table name clicks customers orders products Run the following statement to inspect the orders data stream. 
SELECT * FROM orders; Your output should resemble: order_id customer_id product_id price 36d77b21-e68f-4123-b87a-cc19ac1f36ac 3137 1305 65.71 7fd3cd2a-392b-4f8f-b953-0bfa1d331354 3063 1327 17.75 1a223c61-38a5-4b8c-8465-2a6b359bf05e 3064 1166 14.95 ... Click Stop to end the query. Step 4: Query existing topics (optional)¶ If you’ve created the workspace in a region where you already have Kafka clusters and topics, you can explore this data with Flink SQL. Confluent Cloud for Apache Flink automatically registers Flink tables on your topics, so you can run statements on your streaming data. Set the default catalog and database by using the Use catalog and Use database dropdown menus. You can find your catalogs and databases in the navigation menu on the left side of the workspace. Click to create a new cell, and run the following statement to list all the tables in the database that you selected as the default. SHOW TABLES; You can browse any of your tables by running a SELECT statement. SELECT * FROM ; Next steps¶ How-to Guides for Confluent Cloud for Apache Flink Flink SQL Shell Quick Start Related content¶ DDL Statements Stream Processing Concepts Built-in Functions Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql EXPR$0 0 1 2 ``` ```sql SELECT CURRENT_TIMESTAMP; ``` ```sql CURRENT_TIMESTAMP 2024-03-15 16:23:18.912 ``` ```sql marketplace ``` ```sql catalog.database.table ``` ```sql SHOW TABLES; ``` ```sql table name clicks customers orders products ``` ```sql SELECT * FROM orders; ``` ```sql order_id customer_id product_id price 36d77b21-e68f-4123-b87a-cc19ac1f36ac 3137 1305 65.71 7fd3cd2a-392b-4f8f-b953-0bfa1d331354 3063 1327 17.75 1a223c61-38a5-4b8c-8465-2a6b359bf05e 3064 1166 14.95 ... 
``` ```sql SHOW TABLES; ``` ```sql SELECT * FROM ; ``` --- ### Java Table API Quick Start on Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/get-started/quick-start-java-table-api.html Java Table API Quick Start on Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® supports programming applications with the Table API. Confluent provides a plugin for running applications that use the Table API on Confluent Cloud. For more information, see Table API. For code examples, see Java Examples for Table API on Confluent Cloud. For a Confluent Developer course, see Apache Flink Table API: Processing Data Streams in Java. Note The Flink Table API is available for preview. A Preview feature is a Confluent Cloud component that is being introduced to gain early feedback from developers. Preview features can be used for evaluation and non-production testing purposes or to provide feedback to Confluent. The warranty, SLA, and Support Services provisions of your agreement with Confluent do not apply to Preview features. Confluent may discontinue providing preview releases of the Preview features at any time in Confluent’s sole discretion. Comments, questions, and suggestions related to the Table API are encouraged and can be submitted through the established channels. Prerequisites¶ Access to Confluent Cloud A compute pool in Confluent Cloud An Apache Kafka® cluster, if you want to run examples that store data in Kafka Java version 11 or later Maven (see Installing Apache Maven) To run Table API and Flink SQL programs, you must generate an API key that’s specific to the Flink environment. Also, you need Confluent Cloud account details, like your organization and environment identifiers. Flink API Key: Follow the steps in Generate a Flink API key. For convenience, assign your Flink key and secret to the FLINK_API_KEY and FLINK_API_SECRET environment variables.
Organization ID: The identifier of your organization, for example, b0b421724-4586-4a07-b787-d0bb5aacbf87. For convenience, assign your organization identifier to the ORG_ID environment variable. Environment ID: The identifier of the environment where your Flink SQL statements run, for example, env-z3y2x1. For convenience, assign your environment identifier to the ENV_ID environment variable. Cloud provider name: The name of the cloud provider where your cluster runs, for example, aws. To see the available providers, run the confluent flink region list command. For convenience, assign your cloud provider to the CLOUD_PROVIDER environment variable. Cloud region: The name of the region where your cluster runs, for example, us-east-1. To see the available regions, run the confluent flink region list command. For convenience, assign your cloud region to the CLOUD_REGION environment variable. export CLOUD_PROVIDER="aws" export CLOUD_REGION="us-east-1" export FLINK_API_KEY="" export FLINK_API_SECRET="" export ORG_ID="" export ENV_ID="" export COMPUTE_POOL_ID="" Compile and run a Table API program¶ The following code example shows how to run a “Hello World” statement and how to query an example data stream. Copy the following project object model (POM) into a file named pom.xml.
pom.xml 4.0.0 example flink-table-api-java-hello-world 1.0 jar Apache Flink® Table API Java Hello World Example on Confluent Cloud 2.1.0 2.1-8 11 UTF-8 ${target.java.version} ${target.java.version} 2.17.1 confluent https://packages.confluent.io/maven/ apache.snapshots Apache Development Snapshot Repository https://repository.apache.org/content/repositories/snapshots/ false true org.apache.flink flink-table-api-java ${flink.version} io.confluent.flink confluent-flink-table-api-java-plugin ${confluent-plugin.version} org.apache.logging.log4j log4j-slf4j-impl ${log4j.version} runtime org.apache.logging.log4j log4j-api ${log4j.version} runtime org.apache.logging.log4j log4j-core ${log4j.version} runtime ./example org.apache.maven.plugins maven-compiler-plugin 3.10.1 ${target.java.version} ${target.java.version} org.apache.maven.plugins maven-shade-plugin 3.4.1 package shade org.apache.flink:flink-shaded-force-shading com.google.code.findbugs:jsr305 *:* META-INF/*.SF META-INF/*.DSA META-INF/*.RSA example.hello_table_api org.eclipse.m2e lifecycle-mapping 1.0.0 org.apache.maven.plugins maven-shade-plugin [3.1.1,) shade org.apache.maven.plugins maven-compiler-plugin [3.1,) testCompile compile Create a directory named “example”. mkdir example Create a file named hello_table_api.java in the example directory. touch example/hello_table_api.java Copy the following code into hello_table_api.java. package example; import io.confluent.flink.plugin.ConfluentSettings; import io.confluent.flink.plugin.ConfluentTools; import org.apache.flink.table.api.EnvironmentSettings; import org.apache.flink.table.api.Table; import org.apache.flink.table.api.TableEnvironment; import org.apache.flink.types.Row; import java.util.List; /** * A table program example to get started with the Apache Flink® Table API. * *

It executes two foreground statements in Confluent Cloud. The results of both statements are * printed to the console. */ public class hello_table_api { // All logic is defined in a main() method. It can run both in an IDE or CI/CD system. public static void main(String[] args) { // Set up connection properties to Confluent Cloud. // Use the fromGlobalVariables() method if you assigned environment variables. // EnvironmentSettings settings = ConfluentSettings.fromGlobalVariables(); // Use the fromArgs(args) method if you want to run with command-line arguments. EnvironmentSettings settings = ConfluentSettings.fromArgs(args); // Initialize the session context to get started. TableEnvironment env = TableEnvironment.create(settings); System.out.println("Running with printing..."); // The Table API centers on 'Table' objects, which help in defining data pipelines // fluently. You can define pipelines fully programmatically. Table table = env.fromValues("Hello world!"); // Also, You can define pipelines with embedded Flink SQL. // Table table = env.sqlQuery("SELECT 'Hello world!'"); // Once the pipeline is defined, execute it on Confluent Cloud. // If no target table has been defined, results are streamed back and can be printed // locally. This can be useful for development and debugging. table.execute().print(); System.out.println("Running with collecting..."); // Results can be collected locally and accessed individually. // This can be useful for testing. Table moreHellos = env.fromValues("Hello Bob", "Hello Alice", "Hello Peter").as("greeting"); List rows = ConfluentTools.collectChangelog(moreHellos, 10); rows.forEach( r -> { String column = r.getFieldAs("greeting"); System.out.println("Greeting: " + column); }); } } Run the following command to build the jar file. mvn clean package Run the jar. 
If you assigned your cloud configuration to the environment variables specified in the Prerequisites section, and you used the fromGlobalVariables method in the hello_table_api code, you don’t need to provide the command-line options. java -jar target/flink-table-api-java-hello-world-1.0.jar \ --cloud aws \ --region us-east-1 \ --flink-api-key key \ --flink-api-secret secret \ --organization-id b0b21724-4586-4a07-b787-d0bb5aacbf87 \ --environment-id env-z3y2x1 \ --compute-pool-id lfcp-8m03rm Your output should resemble: Running with printing... +----+--------------------------------+ | op | f0 | +----+--------------------------------+ | +I | Hello world! | +----+--------------------------------+ 1 row in set Running with collecting... Greeting: Hello Bob Greeting: Hello Alice Greeting: Hello Peter Next steps¶ Python Table API Quick Start How-to Guides for Confluent Cloud for Apache Flink Related content¶ Course: Apache Flink® Table API: Processing Data Streams in Java GitHub repo: Java Examples for Table API on Confluent Cloud GitHub repo: Python Examples for Table API on Confluent Cloud Built-in Functions Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. 
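The relationship between the environment variables from the Prerequisites section and the command-line options of the `java -jar` command can be sketched in a few lines of Python. The helper below is hypothetical and not part of the Confluent plugin: the function name `build_jar_args` and the variable-to-option mapping are assumptions inferred from the names used in this quick start. It only assembles the argument list and reports variables that are missing; it doesn't start the program.

```python
import os

# Assumed mapping from the environment variables in the Prerequisites
# section to the command-line options shown in the java -jar example.
ENV_TO_OPTION = {
    "CLOUD_PROVIDER": "--cloud",
    "CLOUD_REGION": "--region",
    "FLINK_API_KEY": "--flink-api-key",
    "FLINK_API_SECRET": "--flink-api-secret",
    "ORG_ID": "--organization-id",
    "ENV_ID": "--environment-id",
    "COMPUTE_POOL_ID": "--compute-pool-id",
}

# Jar path produced by "mvn clean package" in this quick start.
JAR_PATH = "target/flink-table-api-java-hello-world-1.0.jar"


def build_jar_args(env=None):
    """Return the argv list for the hello-world jar (hypothetical helper).

    Raises ValueError naming every variable that is unset or empty.
    """
    env = os.environ if env is None else env
    missing = [name for name in ENV_TO_OPTION if not env.get(name)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    args = ["java", "-jar", JAR_PATH]
    for name, option in ENV_TO_OPTION.items():
        args += [option, env[name]]
    return args


if __name__ == "__main__":
    try:
        build_jar_args()
        print("All required variables are set.")
    except ValueError as err:
        print(err)
```

Passing the resulting list to `subprocess.run` would launch the jar without going through a shell, which avoids quoting problems with the API secret. Checking every variable up front means a single error message lists everything that still needs to be exported.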
#### Code Examples ```sql b0b421724-4586-4a07-b787-d0bb5aacbf87 ``` ```sql confluent flink region list ``` ```sql confluent flink region list ``` ```sql export CLOUD_PROVIDER="aws" export CLOUD_REGION="us-east-1" export FLINK_API_KEY="" export FLINK_API_SECRET="" export ORG_ID="" export ENV_ID="" export COMPUTE_POOL_ID="" ``` ```sql 4.0.0 example flink-table-api-java-hello-world 1.0 jar Apache Flink® Table API Java Hello World Example on Confluent Cloud 2.1.0 2.1-8 11 UTF-8 ${target.java.version} ${target.java.version} 2.17.1 confluent https://packages.confluent.io/maven/ apache.snapshots Apache Development Snapshot Repository https://repository.apache.org/content/repositories/snapshots/ false true org.apache.flink flink-table-api-java ${flink.version} io.confluent.flink confluent-flink-table-api-java-plugin ${confluent-plugin.version} org.apache.logging.log4j log4j-slf4j-impl ${log4j.version} runtime org.apache.logging.log4j log4j-api ${log4j.version} runtime org.apache.logging.log4j log4j-core ${log4j.version} runtime ./example org.apache.maven.plugins maven-compiler-plugin 3.10.1 ${target.java.version} ${target.java.version} org.apache.maven.plugins maven-shade-plugin 3.4.1 package shade org.apache.flink:flink-shaded-force-shading com.google.code.findbugs:jsr305 *:* META-INF/*.SF META-INF/*.DSA META-INF/*.RSA example.hello_table_api org.eclipse.m2e lifecycle-mapping 1.0.0 org.apache.maven.plugins maven-shade-plugin [3.1.1,) shade org.apache.maven.plugins maven-compiler-plugin [3.1,) testCompile compile ``` ```sql mkdir example ``` ```sql hello_table_api.java ``` ```sql touch example/hello_table_api.java ``` ```sql hello_table_api.java ``` ```sql package example; import io.confluent.flink.plugin.ConfluentSettings; import io.confluent.flink.plugin.ConfluentTools; import org.apache.flink.table.api.EnvironmentSettings; import org.apache.flink.table.api.Table; import org.apache.flink.table.api.TableEnvironment; import org.apache.flink.types.Row; import 
java.util.List; /** * A table program example to get started with the Apache Flink® Table API. * *

It executes two foreground statements in Confluent Cloud. The results of both statements are * printed to the console. */ public class hello_table_api { // All logic is defined in a main() method. It can run both in an IDE or CI/CD system. public static void main(String[] args) { // Set up connection properties to Confluent Cloud. // Use the fromGlobalVariables() method if you assigned environment variables. // EnvironmentSettings settings = ConfluentSettings.fromGlobalVariables(); // Use the fromArgs(args) method if you want to run with command-line arguments. EnvironmentSettings settings = ConfluentSettings.fromArgs(args); // Initialize the session context to get started. TableEnvironment env = TableEnvironment.create(settings); System.out.println("Running with printing..."); // The Table API centers on 'Table' objects, which help in defining data pipelines // fluently. You can define pipelines fully programmatically. Table table = env.fromValues("Hello world!"); // Also, You can define pipelines with embedded Flink SQL. // Table table = env.sqlQuery("SELECT 'Hello world!'"); // Once the pipeline is defined, execute it on Confluent Cloud. // If no target table has been defined, results are streamed back and can be printed // locally. This can be useful for development and debugging. table.execute().print(); System.out.println("Running with collecting..."); // Results can be collected locally and accessed individually. // This can be useful for testing. 
Table moreHellos = env.fromValues("Hello Bob", "Hello Alice", "Hello Peter").as("greeting"); List rows = ConfluentTools.collectChangelog(moreHellos, 10); rows.forEach( r -> { String column = r.getFieldAs("greeting"); System.out.println("Greeting: " + column); }); } } ``` ```sql mvn clean package ``` ```sql fromGlobalVariables ``` ```sql hello_table_api ``` ```sql java -jar target/flink-table-api-java-hello-world-1.0.jar \ --cloud aws \ --region us-east-1 \ --flink-api-key key \ --flink-api-secret secret \ --organization-id b0b21724-4586-4a07-b787-d0bb5aacbf87 \ --environment-id env-z3y2x1 \ --compute-pool-id lfcp-8m03rm ``` ```sql Running with printing... +----+--------------------------------+ | op | f0 | +----+--------------------------------+ | +I | Hello world! | +----+--------------------------------+ 1 row in set Running with collecting... Greeting: Hello Bob Greeting: Hello Alice Greeting: Hello Peter ``` --- ### Python Table API Quick Start on Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/get-started/quick-start-python-table-api.html Python Table API Quick Start on Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® supports programming applications with the Table API. Confluent provides a plugin for running applications that use the Table API on Confluent Cloud. For more information, see Table API. For code examples, see Python Examples for Table API on Confluent Cloud. Note The Flink Table API is available for preview. A Preview feature is a Confluent Cloud component that is being introduced to gain early feedback from developers. Preview features can be used for evaluation and non-production testing purposes or to provide feedback to Confluent. The warranty, SLA, and Support Services provisions of your agreement with Confluent do not apply to Preview features. 
Confluent may discontinue providing preview releases of the Preview features at any time in Confluent’s sole discretion. Comments, questions, and suggestions related to the Table API are encouraged and can be submitted through the established channels. Prerequisites¶ Access to Confluent Cloud A compute pool in Confluent Cloud An Apache Kafka® cluster, if you want to run examples that store data in Kafka Java version 11 or later Environment variables as defined in Environment variables. The uv package manager to manage your Python versions and environments. Only Python versions 3.9 to 3.12 are supported. To run Table API and Flink SQL programs, you must generate an API key that’s specific to the Flink environment. Also, you need Confluent Cloud account details, like your organization and environment identifiers. Flink API Key: Follow the steps in Generate a Flink API key. For convenience, assign your Flink key and secret to the FLINK_API_KEY and FLINK_API_SECRET environment variables. Organization ID: The identifier of your organization, for example, b0b421724-4586-4a07-b787-d0bb5aacbf87. For convenience, assign your organization identifier to the ORG_ID environment variable. Environment ID: The identifier of the environment where your Flink SQL statements run, for example, env-z3y2x1. For convenience, assign your environment identifier to the ENV_ID environment variable. Cloud provider name: The name of the cloud provider where your cluster runs, for example, aws. To see the available providers, run the confluent flink region list command. For convenience, assign your cloud provider to the CLOUD_PROVIDER environment variable. Cloud region: The name of the region where your cluster runs, for example, us-east-1. To see the available regions, run the confluent flink region list command. For convenience, assign your cloud region to the CLOUD_REGION environment variable.
export CLOUD_PROVIDER="aws" export CLOUD_REGION="us-east-1" export FLINK_API_KEY="" export FLINK_API_SECRET="" export ORG_ID="" export ENV_ID="" export COMPUTE_POOL_ID="" Note The Flink Python API communicates with a Java process. You must have at least Java 11 installed. Check that your JAVA_HOME environment variable is set correctly. Checking only java -version might not be sufficient. echo $JAVA_HOME If required, install openjdk and export the JAVA_HOME variable: brew install openjdk && export JAVA_HOME=$(/usr/libexec/java_home) && echo $JAVA_HOME Setup your environment and run a Table API program¶ Use uv to create a virtual environment that contains all required dependencies and project files. Use one of the following commands to install uv. curl -LsSf https://astral.sh/uv/install.sh | sh # or brew install uv # or pip install uv Create a new virtual environment. uv venv --python 3.11 Copy the following code into a file named hello_table_api.py. # /// script # requires-python = ">=3.9,<3.12" # dependencies = [ # "confluent-flink-table-api-python-plugin>=2.1-8", # ] # /// from pyflink.table.confluent import ConfluentSettings, ConfluentTools from pyflink.table import TableEnvironment, Row from pyflink.table.expressions import col, row def run(): # Set up the connection to Confluent Cloud settings = ConfluentSettings.from_global_variables() env = TableEnvironment.create(settings) # Run your first Flink statement in Table API env.from_elements([row("Hello world!")]).execute().print() # Or use SQL env.sql_query("SELECT 'Hello world!'").execute().print() # Structure your code with Table objects - the main ingredient of Table API. 
table = env.from_path("examples.marketplace.clicks") \ .filter(col("user_agent").like("Mozilla%")) \ .select(col("click_id"), col("user_id")) table.print_schema() print(table.explain()) # Use the provided tools to test on a subset of the streaming data expected = ConfluentTools.collect_materialized_limit(table, 50) actual = [Row(42, 500)] if expected != actual: print("Results don't match!") if __name__ == "__main__": run() Run the following command to execute the Table API program from the directory where you created hello_table_api.py. uv run hello_table_api.py Related content¶ Filter Kafka messages in Python using Flink’s Table API GitHub repo: Java Examples for Table API on Confluent Cloud GitHub repo: Python Examples for Table API on Confluent Cloud Java Table API Quick Start How-to Guides for Confluent Cloud for Apache Flink Built-in Functions Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql b0b421724-4586-4a07-b787-d0bb5aacbf87 ``` ```sql confluent flink region list ``` ```sql confluent flink region list ``` ```sql export CLOUD_PROVIDER="aws" export CLOUD_REGION="us-east-1" export FLINK_API_KEY="" export FLINK_API_SECRET="" export ORG_ID="" export ENV_ID="" export COMPUTE_POOL_ID="" ``` ```sql java -version ``` ```sql echo $JAVA_HOME ``` ```sql brew install openjdk && export JAVA_HOME=$(/usr/libexec/java_home) && echo $JAVA_HOME ``` ```sql curl -LsSf https://astral.sh/uv/install.sh | sh # or brew install uv # or pip install uv ``` ```sql uv venv --python 3.11 ``` ```sql hello_table_api.py ``` ```sql # /// script # requires-python = ">=3.9,<3.12" # dependencies = [ # "confluent-flink-table-api-python-plugin>=2.1-8", # ] # /// from pyflink.table.confluent import ConfluentSettings, ConfluentTools from pyflink.table import TableEnvironment, Row from pyflink.table.expressions import col, row def run(): # Set up the connection to Confluent Cloud settings = 
ConfluentSettings.from_global_variables() env = TableEnvironment.create(settings) # Run your first Flink statement in Table API env.from_elements([row("Hello world!")]).execute().print() # Or use SQL env.sql_query("SELECT 'Hello world!'").execute().print() # Structure your code with Table objects - the main ingredient of Table API. table = env.from_path("examples.marketplace.clicks") \ .filter(col("user_agent").like("Mozilla%")) \ .select(col("click_id"), col("user_id")) table.print_schema() print(table.explain()) # Use the provided tools to test on a subset of the streaming data expected = ConfluentTools.collect_materialized_limit(table, 50) actual = [Row(42, 500)] if expected != actual: print("Results don't match!") if __name__ == "__main__": run() ``` ```sql hello_table_api.py ``` ```sql uv run hello_table_api.py ``` --- ### SQL Shell Quick Start on Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/get-started/quick-start-shell.html Flink SQL Shell Quick Start on Confluent Cloud for Apache Flink¶ This quick start walks you through the following steps to get you up and running with Confluent Cloud for Apache Flink®. Step 1: Log in to Confluent Cloud with the Confluent CLI Step 2: Start the Flink SQL shell Step 3: Submit a SQL statement Step 4: Create and populate a table Step 5: Query streaming data Prerequisites¶ You need the following prerequisites to use Confluent Cloud for Apache Flink. Access to Confluent Cloud. The organization ID, environment ID, and compute pool ID for your organization. The OrganizationAdmin, EnvironmentAdmin, or FlinkAdmin role for creating compute pools, or the FlinkDeveloper role if you already have a compute pool. If you don’t have the appropriate role, reach out to your OrganizationAdmin or EnvironmentAdmin. The Confluent CLI. 
To use the Flink SQL shell, update to the latest version of the Confluent CLI by running the following command: confluent update --yes If you used Homebrew to install the Confluent CLI, update the CLI by using the brew upgrade command instead of confluent update. For more information, see Confluent CLI. Step 1: Log in to Confluent Cloud with the Confluent CLI¶ Run the following CLI command to log in to Confluent Cloud. confluent login --save --organization ${ORG_ID} Your output should resemble: Assuming https protocol. Logged in as "" for organization "" (""). Step 2: Start the Flink SQL shell¶ Start the Flink SQL shell by running the confluent flink shell command. The shell connects with Confluent Cloud. Important: This guide focuses on ad-hoc statements. To run statements in long-running jobs, provide the --service-account option in the confluent flink shell command. When you start the shell without this option, statements run with your user account. For more information, see Service Accounts on Confluent Cloud. Run the following CLI command to start the Flink SQL shell. confluent flink shell --compute-pool ${COMPUTE_POOL_ID} --environment ${ENV_ID} Your output should resemble: Welcome! To exit, press Ctrl-Q or type "exit". [Ctrl-Q] Quit [Ctrl-S] Toggle Smart Completion > You’re ready to start processing data by submitting statements to Flink SQL. Step 3: Submit a SQL statement¶ In the SQL shell, run the following statement to see Flink SQL in action. The CURRENT_TIMESTAMP function returns the local date and time. SELECT CURRENT_TIMESTAMP; Your output should resemble: Statement name: ab12345c-6e11-7bcd-9 Statement successfully submitted. Fetching results... +-------------------------+ | CURRENT_TIMESTAMP | +-------------------------+ | 2023-07-05 18:57:53.867 | +-------------------------+ For all functions and statements supported by Flink SQL, see Flink SQL Reference. 
Step 4: Create and populate a table¶ The following steps show how to create a table, populate it with a few records, and query it to view the records it contains. Run the following statement to create a table that holds pseudorandom float values. CREATE TABLE random_float_table( ts TIMESTAMP_LTZ(3), random_value FLOAT); Run the following INSERT VALUES statement to populate random_float_table with records that have a timestamp field and a float field. Timestamp values are generated by the CURRENT_TIMESTAMP function, and float values are generated by multiplying the result of the RAND_INTEGER(INT) function by a decimal constant. INSERT INTO random_float_table VALUES (CURRENT_TIMESTAMP, RAND_INTEGER(100)*0.02), (CURRENT_TIMESTAMP, RAND_INTEGER(1000)*0.05), (CURRENT_TIMESTAMP, RAND_INTEGER(10000)*0.20), (CURRENT_TIMESTAMP, RAND_INTEGER(100000)*0.22), (CURRENT_TIMESTAMP, RAND_INTEGER(1000000)*0.7); Press ENTER to return to the SQL shell. Because INSERT INTO VALUES is a point-in-time statement, it exits after it finishes inserting records. Run the following statement to query random_float_table for all of its records. SELECT * FROM random_float_table; Your output should resemble: ts random_value 2023-09-07 20:24:19.366 0.46 2023-09-07 20:24:19.276 28.75 2023-09-07 20:24:19.367 1467.2 2023-09-07 20:24:19.368 7953.88 2023-09-07 20:24:19.465 685883.1 Press Q to exit the results view and stop the statement. Run the SHOW JOBS statement to get the status of statements in your SQL environment. SHOW JOBS; Your output should resemble: Statement name: dbdb79f8-7e6e-4b03 Statement successfully submitted. Waiting for statement to be ready. Statement phase is PENDING. Statement phase is COMPLETED. 
+--------------------+-----------+--------------------------------+--------------+------------------+ | Name | Phase | Statement | Compute Pool | Creation Time | +--------------------+-----------+--------------------------------+--------------+------------------+ | f8f118e1-bd79-40c1 | COMPLETED | CREATE TABLE random_float_t... | lfcp-xxxxxx | 2023-09-07 20... | | a30f8a59-af67-4bf6 | COMPLETED | INSERT INTO random_float_ta... | lfcp-xxxxxx | 2023-09-07 20... | +--------------------+-----------+--------------------------------+--------------+------------------+ Step 5: Query streaming data¶ Flink SQL enables using familiar SQL syntax to query streaming data. Confluent Cloud for Apache Flink provides example data streams that you can experiment with. In this step, you query the orders table from the marketplace database in the examples catalog. In Flink SQL, catalog objects, like tables, are scoped by catalog and database. A catalog is a collection of databases that share the same namespace. A database is a collection of tables that share the same namespace. In Confluent Cloud, an environment is mapped to a Flink catalog, and a Kafka cluster is mapped to a Flink database. You can always use three-part identifiers for your tables, like catalog.database.table, but it’s more convenient to set a default. Run the following statement to set the default catalog. USE CATALOG `examples`; Your output should resemble: +---------------------+----------+ | Key | Value | +---------------------+----------+ | sql.current-catalog | examples | +---------------------+----------+ Run the following statement to set the default database. USE `marketplace`; Your output should resemble: +----------------------+-------------+ | Key | Value | +----------------------+-------------+ | sql.current-database | marketplace | +----------------------+-------------+ Run the following statement to see the list of available tables. 
SHOW TABLES; Your output should resemble: +------------+ | Table Name | +------------+ | clicks | | customers | | orders | | products | +------------+ Run the following statement to inspect the orders data stream. SELECT * FROM orders; Your output should resemble: order_id customer_id product_id price 36d77b21-e68f-4123-b87a-cc19ac1f36ac 3137 1305 65.71 7fd3cd2a-392b-4f8f-b953-0bfa1d331354 3063 1327 17.75 1a223c61-38a5-4b8c-8465-2a6b359bf05e 3064 1166 14.95 ... Press Q to exit the results view and stop the statement. Congratulations, you have run your first Flink SQL statements on Confluent Cloud using the SQL Shell. Next steps¶ How-to Guides for Confluent Cloud for Apache Flink Related content¶ Course: Apache Flink 101 Course: Building Flink Applications in Java DDL Statements Stream Processing Concepts Built-in Functions Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql confluent update --yes ``` ```sql brew upgrade ``` ```sql confluent update ``` ```sql confluent login --save --organization ${ORG_ID} ``` ```sql Assuming https protocol. Logged in as "" for organization "" (""). ``` ```sql confluent flink shell ``` ```sql --service-account ``` ```sql confluent flink shell ``` ```sql confluent flink shell --compute-pool ${COMPUTE_POOL_ID} --environment ${ENV_ID} ``` ```sql Welcome! To exit, press Ctrl-Q or type "exit". [Ctrl-Q] Quit [Ctrl-S] Toggle Smart Completion > ``` ```sql SELECT CURRENT_TIMESTAMP; ``` ```sql Statement name: ab12345c-6e11-7bcd-9 Statement successfully submitted. Fetching results... 
+-------------------------+ | CURRENT_TIMESTAMP | +-------------------------+ | 2023-07-05 18:57:53.867 | +-------------------------+ ``` ```sql CREATE TABLE random_float_table( ts TIMESTAMP_LTZ(3), random_value FLOAT); ``` ```sql random_int_table ``` ```sql INSERT INTO random_float_table VALUES (CURRENT_TIMESTAMP, RAND_INTEGER(100)*0.02), (CURRENT_TIMESTAMP, RAND_INTEGER(1000)*0.05), (CURRENT_TIMESTAMP, RAND_INTEGER(10000)*0.20), (CURRENT_TIMESTAMP, RAND_INTEGER(100000)*0.22), (CURRENT_TIMESTAMP, RAND_INTEGER(1000000)*0.7); ``` ```sql random_float_table ``` ```sql SELECT * FROM random_float_table; ``` ```sql ts random_value 2023-09-07 20:24:19.366 0.46 2023-09-07 20:24:19.276 28.75 2023-09-07 20:24:19.367 1467.2 2023-09-07 20:24:19.368 7953.88 2023-09-07 20:24:19.465 685883.1 ``` ```sql Statement name: dbdb79f8-7e6e-4b03 Statement successfully submitted. Waiting for statement to be ready. Statement phase is PENDING. Statement phase is COMPLETED. +--------------------+-----------+--------------------------------+--------------+------------------+ | Name | Phase | Statement | Compute Pool | Creation Time | +--------------------+-----------+--------------------------------+--------------+------------------+ | f8f118e1-bd79-40c1 | COMPLETED | CREATE TABLE random_float_t... | lfcp-xxxxxx | 2023-09-07 20... | | a30f8a59-af67-4bf6 | COMPLETED | INSERT INTO random_float_ta... | lfcp-xxxxxx | 2023-09-07 20... 
| +--------------------+-----------+--------------------------------+--------------+------------------+ ``` ```sql marketplace ``` ```sql catalog.database.table ``` ```sql USE CATALOG `examples`; ``` ```sql +---------------------+----------+ | Key | Value | +---------------------+----------+ | sql.current-catalog | examples | +---------------------+----------+ ``` ```sql USE `marketplace`; ``` ```sql +----------------------+-------------+ | Key | Value | +----------------------+-------------+ | sql.current-database | marketplace | +----------------------+-------------+ ``` ```sql SHOW TABLES; ``` ```sql +------------+ | Table Name | +------------+ | clicks | | customers | | orders | | products | +------------+ ``` ```sql SELECT * FROM orders; ``` ```sql order_id customer_id product_id price 36d77b21-e68f-4123-b87a-cc19ac1f36ac 3137 1305 65.71 7fd3cd2a-392b-4f8f-b953-0bfa1d331354 3063 1327 17.75 1a223c61-38a5-4b8c-8465-2a6b359bf05e 3064 1166 14.95 ... ``` --- ### Aggregate a Data Stream in a Tumbling Window with Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/how-to-guides/aggregate-tumbling-window.html Aggregate a Stream in a Tumbling Window with Confluent Cloud for Apache Flink¶ Aggregation over windows is central to processing streaming data. Confluent Cloud for Apache Flink® supports Windowing Table-Valued Functions (Windowing TVFs), a SQL-standard syntax for splitting an infinite stream into windows of finite size and computing aggregations within each window. This is often used to find the minimum, maximum, or average within a group, to find the first or last record, or to calculate totals. In this guide, you learn how to run a Flink SQL statement that identifies the maximum and minimum order values in a continuous stream of orders data. 
This topic shows the following steps: Step 1: Inspect the example stream Step 2: View aggregated results in a tumbling window Prerequisites¶ Access to Confluent Cloud. The OrganizationAdmin, EnvironmentAdmin, or FlinkAdmin role for creating compute pools, or the FlinkDeveloper role if you already have a compute pool. If you don’t have the appropriate role, contact your OrganizationAdmin or EnvironmentAdmin. For more information, see Grant Role-Based Access in Confluent Cloud for Apache Flink. A provisioned Flink compute pool. Step 1: Inspect the example stream¶ In this step, you query the read-only orders table in the examples.marketplace database to inspect the stream for fields that you can aggregate. Log in to Confluent Cloud and navigate to your Flink workspace. In the Use catalog dropdown, select your environment. In the Use database dropdown, select your Kafka cluster. Run the following statement to inspect the example orders stream. SELECT * FROM examples.marketplace.orders; Your output should resemble: order_id customer_id product_id price 68362284-34df-41a3-87fb-50b79647b786 3195 1267 47.48 6e03663e-d20b-4a23-848a-aec959d794e3 3094 1412 50.92 84217b5d-7dcb-46d1-9600-675a3734a3ed 3038 1094 83.56 ... Step 2: View aggregated results in a tumbling window¶ Run the following statement to start a windowed query on the orders data. SELECT window_start, window_end, MIN(price) as minimum_order_value, MAX(price) as maximum_order_value FROM TABLE(TUMBLE(TABLE examples.marketplace.orders, DESCRIPTOR($rowtime), INTERVAL '10' SECOND)) GROUP BY window_start, window_end; Your output should resemble: window_start window_end minimum_order_value maximum_order_value 2023-09-12 08:54:20.000 2023-09-12 08:54:30.000 10.05 99.75 2023-09-12 08:54:30.000 2023-09-12 08:54:40.000 10.22 99.88 2023-09-12 08:54:40.000 2023-09-12 08:54:50.000 10.09 150.45 ... The Flink statement created with this query identifies the minimum and maximum order value in each 10-second window. 
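To make the windowing behavior concrete, the following is a minimal plain-Python sketch of what a 10-second tumbling window with MIN and MAX computes. It is a conceptual model only, not Flink code or a Confluent API: the function name `tumble_min_max` and the sample events are hypothetical, and the model ignores watermarks and late-arriving data, which Flink uses to decide when a window's result is emitted.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def tumble_min_max(events, size_seconds=10):
    """Assign (timestamp, price) events to fixed, non-overlapping windows.

    Returns one (window_start, window_end, min_price, max_price) tuple per
    window, ordered by window start. Windows are start-inclusive and
    end-exclusive.
    """
    size = timedelta(seconds=size_seconds)
    epoch = datetime(1970, 1, 1)
    windows = defaultdict(list)
    for ts, price in events:
        # Align the window start to a multiple of the window size since the epoch.
        index = (ts - epoch) // size  # timedelta // timedelta yields an int
        windows[epoch + index * size].append(price)
    return [
        (start, start + size, min(prices), max(prices))
        for start, prices in sorted(windows.items())
    ]


# Hypothetical sample events; the values are illustrative only.
events = [
    (datetime(2023, 9, 12, 8, 54, 21), 10.05),
    (datetime(2023, 9, 12, 8, 54, 29), 99.75),
    (datetime(2023, 9, 12, 8, 54, 30), 10.22),  # falls into the next window
    (datetime(2023, 9, 12, 8, 54, 38), 99.88),
]
for row in tumble_min_max(events):
    print(row)
```

Each event belongs to exactly one window, which is the property that distinguishes tumbling windows from overlapping window types.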
Related content¶ Compare Current and Previous Values in a Data Stream Windowing Table-Valued Functions Window Aggregation Queries Window Deduplication Queries Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql examples.marketplace ``` ```sql SELECT * FROM examples.marketplace.orders; ``` ```sql order_id customer_id product_id price 68362284-34df-41a3-87fb-50b79647b786 3195 1267 47.48 6e03663e-d20b-4a23-848a-aec959d794e3 3094 1412 50.92 84217b5d-7dcb-46d1-9600-675a3734a3ed 3038 1094 83.56 ... ``` ```sql SELECT window_start, window_end, MIN(price) as minimum_order_value, MAX(price) as maximum_order_value FROM TABLE(TUMBLE(TABLE examples.marketplace.orders, DESCRIPTOR($rowtime), INTERVAL '10' SECOND)) GROUP BY window_start, window_end; ``` ```sql window_start window_end minimum_order_value maximum_order_value 2023-09-12 08:54:20.000 2023-09-12 08:54:30.000 10.05 99.75 2023-09-12 08:54:30.000 2023-09-12 08:54:40.000 10.22 99.88 2023-09-12 08:54:40.000 2023-09-12 08:54:50.000 10.09 150.45 ... ``` --- ### Combine Streams and Track Most Recent Records with Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/how-to-guides/combine-and-track-most-recent-records.html Combine Streams and Track Most Recent Records with Confluent Cloud for Apache Flink¶ When working with streaming data, it’s common to need to combine information from multiple sources while tracking the most recent record data. Confluent Cloud for Apache Flink® provides powerful capabilities to merge streams and maintain up-to-date information for each record, regardless of which stream it originated from. In this guide, you learn how to run a Flink SQL statement that combines multiple data streams and keeps track of the most recent information for each record by using window functions. 
While this example uses order and clickstream data, the pattern can be applied to any number of streams that share a common identifier. This topic shows the following steps: Step 1: Inspect the example source streams Step 2: Create a unified view with most recent records Prerequisites¶ Access to Confluent Cloud. The OrganizationAdmin, EnvironmentAdmin, or FlinkAdmin role for creating compute pools, or the FlinkDeveloper role if you already have a compute pool. If you don’t have the appropriate role, contact your OrganizationAdmin or EnvironmentAdmin. For more information, see Grant Role-Based Access in Confluent Cloud for Apache Flink. A provisioned Flink compute pool. Step 1: Inspect the example source streams¶ In this step, you examine the read-only orders and clicks tables in the examples.marketplace database to identify: The common identifier field that links the streams The unique fields from each stream that you want to track Log in to Confluent Cloud and navigate to your Flink workspace. Examine your source streams. 
The following example includes orders and clicks: -- First stream SELECT * FROM `examples`.`marketplace`.`orders`; -- Second stream SELECT * FROM `examples`.`marketplace`.`clicks`; Your output from orders should resemble: order_id customer_id product_id price be396ae5-d7d9-4454-99d7-9b1c155d51d4 3243 1304 99.55 79e295d3-5a0b-4127-9337-9a483794e7d4 3132 1201 21.43 9b59d319-c37a-4088-a803-350d43bc5382 3099 1271 66.70 8aaa9d8e-d8f7-4bb5-9d59-ce4d0cfc9a92 3181 1028 76.23 e681fa67-3a1e-4e99-ba03-da9fb5d12845 3186 1212 69.67 89ba7186-f927-462b-860a-68b8c9d51a06 3238 1336 76.89 ebfec6c6-3294-444b-82e5-5a66e7dc5cd5 3233 1223 23.69 Your output from clicks should resemble: click_id user_id url user_agent view_time a5c31d8b-cc93-4a48-a7d9-c1d389c83f4a 3099 https://www.acme.com/product/foxmh Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0 79 b7d42e6f-85a1-4f7b-b1c2-d3e456789abc 3262 https://www.acme.com/product/lruuv Mozilla/5.0 (iPhone; CPU OS 9_3_5 like Mac OS X) AppleWebKit/601.1.46 108 c8e53f7a-96b2-4a8c-c2d3-e4f567890def 3181 https://www.acme.com/product/vfzsy Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) 33 d9f64g8b-a7c3-4b9d-d3e4-f5g678901hij 4882 https://www.acme.com/product/zkxun Opera/9.80 (Windows NT 6.0) Presto/2.12.388 Version/12.14 99 e74441b6-09da-4113-b8f9-db12cee90c77 3500 https://www.acme.com/product/lruuv Mozilla/5.0 (iPhone; CPU iPhone OS 11_4_1 like Mac OS X) AppleWebKit/6... 116 f39236ac-2646-4e5d-bab2-cd4445630529 4360 https://www.acme.com/product/vfzsy Mozilla/4.0 (compatible; Win32; WinHttp.WinHttpRequest.5) 52 3f3b06df-aa2b-417e-833e-ccc232536c4a 4171 https://www.acme.com/product/foxmh Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) C... 82 ee9fe475-5420-410d-90ae-47987eba32d5 4095 https://www.acme.com/product/ifgcb Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) Gecko/20100101 Firefox/1... 
119 e75faa6f-78d3-45e0-817e-1338381f53a2 4904 https://www.acme.com/product/ffnsl Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like ... 36 77c6acbb-eb71-4a49-96e5-714f8b024c98 4681 https://www.acme.com/product/zkxun Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.11) Gecko GranParadiso/3... 67 Step 2: Create a unified view with most recent records¶ Run the following statement to combine multiple streams while tracking the most recent information for each record: -- This query combines order and click data, tracking the latest values -- for each customer's interactions across both datasets -- First, combine order data and clickstream data into a single structure -- Note: Fields not present in one source are filled with NULL WITH combined_data AS ( -- Orders data with empty click-related fields SELECT customer_id, order_id, product_id, price, CAST(NULL AS STRING) AS url, -- Click-specific fields set to NULL CAST(NULL AS STRING) AS user_agent, -- for order records CAST(NULL AS INT) AS view_time, $rowtime FROM `examples`.`marketplace`.`orders` UNION ALL -- Click data with empty order-related fields SELECT user_id AS customer_id, -- Normalize user_id to match customer_id CAST(NULL AS STRING) AS order_id, -- Order-specific fields set to NULL CAST(NULL AS STRING) AS product_id, -- for click records CAST(NULL AS DOUBLE) AS price, url, user_agent, view_time, $rowtime FROM `examples`.`marketplace`.`clicks` ) -- For each customer, maintain the latest value for each field -- using window functions over the combined dataset SELECT LAST_VALUE(customer_id) OVER w AS customer_id, LAST_VALUE(order_id) OVER w AS order_id, LAST_VALUE(product_id) OVER w AS product_id, LAST_VALUE(price) OVER w AS price, LAST_VALUE(url) OVER w AS url, LAST_VALUE(user_agent) OVER w AS user_agent, LAST_VALUE(view_time) OVER w AS view_time, MAX($rowtime) OVER w AS rowtime -- Track the latest event timestamp FROM combined_data -- Define window for tracking latest values per customer WINDOW 
w AS ( PARTITION BY customer_id -- Group all events by customer ORDER BY $rowtime -- Order by event timestamp ROWS BETWEEN UNBOUNDED PRECEDING -- Consider all previous events AND CURRENT ROW -- up to the current one ) Your output should resemble: customer_id order_id product_id price url user_agent view_time rowtime 3243 be396ae5-d7d9-4454-99d7-9b1c155d51d4 1304 99.55 NULL NULL NULL 2024-10-22T08:21:07.620Z 3132 79e295d3-5a0b-4127-9337-9a483794e7d4 1201 21.43 NULL NULL NULL 2024-10-22T08:21:07.640Z 3099 9b59d319-c37a-4088-a803-350d43bc5382 1271 66.7 https://www.acme.com/product/foxmh Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0 79 2024-10-22T08:21:07.600Z 3262 NULL NULL NULL https://www.acme.com/product/lruuv Mozilla/5.0 (iPhone; CPU OS 9_3_5 like Mac OS X) AppleWebKit/601.1.46 108 2024-10-22T08:21:07.637Z 3181 8aaa9d8e-d8f7-4bb5-9d59-ce4d0cfc9a92 1028 76.23 https://www.acme.com/product/vfzsy Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) 33 2024-10-22T08:21:07.656Z 3186 e681fa67-3a1e-4e99-ba03-da9fb5d12845 1212 69.67 NULL NULL NULL 2024-10-22T08:21:07.660Z 4882 NULL NULL NULL https://www.acme.com/product/zkxun Opera/9.80 (Windows NT 6.0) Presto/2.12.388 Version/12.14 99 2024-10-22T08:21:07.676Z 3238 89ba7186-f927-462b-860a-68b8c9d51a06 1336 76.89 NULL NULL NULL 2024-10-22T08:21:07.679Z 3233 ebfec6c6-3294-444b-82e5-5a66e7dc5cd5 1223 23.69 NULL NULL NULL 2024-10-22T08:21:07.699Z This pattern works by: Using a Common Table Expression (CTE) to combine all streams Setting fields not present in each stream to NULL Using window functions to track the most recent data for each field Partitioning by the common identifier to group related records Ordering by the watermark timestamp ($rowtime) to ensure proper temporal sequencing You can adapt this pattern by: Adding more streams to the UNION ALL Changing the common identifier field in the PARTITION BY clause Modifying the selected fields based on your needs Using a custom defined 
watermark strategy Key considerations¶ When applying this pattern, consider: All streams must have a common identifier field Timestamp fields should be consistent across streams NULL handling may need adjustment based on your use case Why UNION ALL vs. JOIN?¶ While it might seem natural to use a JOIN to combine data from multiple streams, the UNION ALL approach shown in this pattern offers several important advantages for streaming use cases. Consider what would happen with a join-based approach: SELECT COALESCE(o.customer_id, c.user_id) as customer_id, o.order_id, o.product_id, o.price, c.url, c.user_agent, c.view_time FROM orders o FULL OUTER JOIN clicks c ON o.customer_id = c.user_id This join would need to maintain state for both streams to match records, leading to several challenges in a streaming context: State management and performance¶ When using a join, Flink must maintain state for both sides of the join operation to match records. This state grows over time as new records arrive, consuming more resources. In contrast, the UNION ALL pattern simply combines records as they arrive, without needing to maintain state for matching. Handling late-arriving data¶ With a join, if a click record arrives late, Flink would need to match it against all historical order records for that customer. Similarly, a late order would need to be matched against historical clicks. This can lead to reprocessing of historical data and potential out-of-order results. The UNION ALL pattern handles each record independently, making late-arriving data much simpler to process. Append-only output¶ The combination of UNION ALL with window functions produces an append-only output stream, where each record contains the complete latest state for a customer at the time of each event. 
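The append-only behavior can be modeled in a few lines of plain Python. This is a conceptual sketch, not Flink code: `latest_state_per_customer`, the reduced field list, and the sample events are hypothetical, the events are assumed to arrive already ordered by event time, and the model assumes that LAST_VALUE skips NULL inputs, which is consistent with the sample output shown earlier.

```python
FIELDS = ("customer_id", "order_id", "price", "url", "view_time")


def latest_state_per_customer(events):
    """Emit one merged row per input event.

    Each output row holds the latest non-None value seen so far for every
    field of that event's customer, mirroring LAST_VALUE(...) OVER a window
    partitioned by customer_id and ordered by event time.
    """
    state = {}   # customer_id -> latest non-None value per field
    output = []
    for event in events:
        current = state.setdefault(event["customer_id"], dict.fromkeys(FIELDS))
        for field in FIELDS:
            value = event.get(field)
            if value is not None:
                current[field] = value
        # Append a copy: earlier output rows are never modified afterwards.
        output.append(dict(current))
    return output


# Hypothetical combined stream: an order, an unrelated click, then a click
# from the customer who placed the order.
events = [
    {"customer_id": 3099, "order_id": "order-1", "price": 89.99},
    {"customer_id": 3262, "url": "https://www.acme.com/product/lruuv", "view_time": 108},
    {"customer_id": 3099, "url": "https://www.acme.com/product/vfzsy", "view_time": 45},
]
rows = latest_state_per_customer(events)
for row in rows:
    print(row)
```

Three input events produce three output rows, and the third row carries both the order fields and the click fields for customer 3099.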
When materializing these results, you can: Use an append-only table to maintain the history of how each customer’s state changed over time Use an upsert table to maintain only the current state for each customer For example, when new events arrive for customer 3099 (first an order, then a click): customer_id order_id product_id price url user_agent view_time rowtime 3099 e681fa67-3a1e-4e99-ba03-da9fb5d12845 1424 89.99 NULL NULL NULL 2024-10-22T08:21:08.620Z 3099 e681fa67-3a1e-4e99-ba03-da9fb5d12845 1424 89.99 https://www.acme.com/product/vfzsy Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0 45 2024-10-22T08:21:09.620Z Each event produces a new output record with the complete latest state for that customer. In contrast, a join produces a changelog output where existing records may be updated, requiring downstream systems to handle inserts, updates, and deletions. Related content¶ Compare Current and Previous Values in a Data Stream Window Aggregation Queries Handle Multiple Event Types Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. 
#### Code Examples ```sql examples.marketplace ``` ```sql -- First stream SELECT * FROM `examples`.`marketplace`.`orders`; -- Second stream SELECT * FROM `examples`.`marketplace`.`clicks`; ``` ```sql order_id customer_id product_id price be396ae5-d7d9-4454-99d7-9b1c155d51d4 3243 1304 99.55 79e295d3-5a0b-4127-9337-9a483794e7d4 3132 1201 21.43 9b59d319-c37a-4088-a803-350d43bc5382 3099 1271 66.70 8aaa9d8e-d8f7-4bb5-9d59-ce4d0cfc9a92 3181 1028 76.23 e681fa67-3a1e-4e99-ba03-da9fb5d12845 3186 1212 69.67 89ba7186-f927-462b-860a-68b8c9d51a06 3238 1336 76.89 ebfec6c6-3294-444b-82e5-5a66e7dc5cd5 3233 1223 23.69 ``` ```sql click_id user_id url user_agent view_time a5c31d8b-cc93-4a48-a7d9-c1d389c83f4a 3099 https://www.acme.com/product/foxmh Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0 79 b7d42e6f-85a1-4f7b-b1c2-d3e456789abc 3262 https://www.acme.com/product/lruuv Mozilla/5.0 (iPhone; CPU OS 9_3_5 like Mac OS X) AppleWebKit/601.1.46 108 c8e53f7a-96b2-4a8c-c2d3-e4f567890def 3181 https://www.acme.com/product/vfzsy Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) 33 d9f64g8b-a7c3-4b9d-d3e4-f5g678901hij 4882 https://www.acme.com/product/zkxun Opera/9.80 (Windows NT 6.0) Presto/2.12.388 Version/12.14 99 e74441b6-09da-4113-b8f9-db12cee90c77 3500 https://www.acme.com/product/lruuv Mozilla/5.0 (iPhone; CPU iPhone OS 11_4_1 like Mac OS X) AppleWebKit/6... 116 f39236ac-2646-4e5d-bab2-cd4445630529 4360 https://www.acme.com/product/vfzsy Mozilla/4.0 (compatible; Win32; WinHttp.WinHttpRequest.5) 52 3f3b06df-aa2b-417e-833e-ccc232536c4a 4171 https://www.acme.com/product/foxmh Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) C... 82 ee9fe475-5420-410d-90ae-47987eba32d5 4095 https://www.acme.com/product/ifgcb Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) Gecko/20100101 Firefox/1... 119 e75faa6f-78d3-45e0-817e-1338381f53a2 4904 https://www.acme.com/product/ffnsl Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like ... 
36 77c6acbb-eb71-4a49-96e5-714f8b024c98 4681 https://www.acme.com/product/zkxun Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.11) Gecko GranParadiso/3... 67 ``` ```sql -- This query combines order and click data, tracking the latest values -- for each customer's interactions across both datasets -- First, combine order data and clickstream data into a single structure -- Note: Fields not present in one source are filled with NULL WITH combined_data AS ( -- Orders data with empty click-related fields SELECT customer_id, order_id, product_id, price, CAST(NULL AS STRING) AS url, -- Click-specific fields set to NULL CAST(NULL AS STRING) AS user_agent, -- for order records CAST(NULL AS INT) AS view_time, $rowtime FROM `examples`.`marketplace`.`orders` UNION ALL -- Click data with empty order-related fields SELECT user_id AS customer_id, -- Normalize user_id to match customer_id CAST(NULL AS STRING) AS order_id, -- Order-specific fields set to NULL CAST(NULL AS STRING) AS product_id, -- for click records CAST(NULL AS DOUBLE) AS price, url, user_agent, view_time, $rowtime FROM `examples`.`marketplace`.`clicks` ) -- For each customer, maintain the latest value for each field -- using window functions over the combined dataset SELECT LAST_VALUE(customer_id) OVER w AS customer_id, LAST_VALUE(order_id) OVER w AS order_id, LAST_VALUE(product_id) OVER w AS product_id, LAST_VALUE(price) OVER w AS price, LAST_VALUE(url) OVER w AS url, LAST_VALUE(user_agent) OVER w AS user_agent, LAST_VALUE(view_time) OVER w AS view_time, MAX($rowtime) OVER w AS rowtime -- Track the latest event timestamp FROM combined_data -- Define window for tracking latest values per customer WINDOW w AS ( PARTITION BY customer_id -- Group all events by customer ORDER BY $rowtime -- Order by event timestamp ROWS BETWEEN UNBOUNDED PRECEDING -- Consider all previous events AND CURRENT ROW -- up to the current one ) ``` ```sql customer_id order_id product_id price url user_agent view_time rowtime 3243 
be396ae5-d7d9-4454-99d7-9b1c155d51d4 1304 99.55 NULL NULL NULL 2024-10-22T08:21:07.620Z 3132 79e295d3-5a0b-4127-9337-9a483794e7d4 1201 21.43 NULL NULL NULL 2024-10-22T08:21:07.640Z 3099 9b59d319-c37a-4088-a803-350d43bc5382 1271 66.7 https://www.acme.com/product/foxmh Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0 79 2024-10-22T08:21:07.600Z 3262 NULL NULL NULL https://www.acme.com/product/lruuv Mozilla/5.0 (iPhone; CPU OS 9_3_5 like Mac OS X) AppleWebKit/601.1.46 108 2024-10-22T08:21:07.637Z 3181 8aaa9d8e-d8f7-4bb5-9d59-ce4d0cfc9a92 1028 76.23 https://www.acme.com/product/vfzsy Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) 33 2024-10-22T08:21:07.656Z 3186 e681fa67-3a1e-4e99-ba03-da9fb5d12845 1212 69.67 NULL NULL NULL 2024-10-22T08:21:07.660Z 4882 NULL NULL NULL https://www.acme.com/product/zkxun Opera/9.80 (Windows NT 6.0) Presto/2.12.388 Version/12.14 99 2024-10-22T08:21:07.676Z 3238 89ba7186-f927-462b-860a-68b8c9d51a06 1336 76.89 NULL NULL NULL 2024-10-22T08:21:07.679Z 3233 ebfec6c6-3294-444b-82e5-5a66e7dc5cd5 1223 23.69 NULL NULL NULL 2024-10-22T08:21:07.699Z ``` ```sql SELECT COALESCE(o.customer_id, c.user_id) as customer_id, o.order_id, o.product_id, o.price, c.url, c.user_agent, c.view_time FROM orders o FULL OUTER JOIN clicks c ON o.customer_id = c.user_id ``` ```sql customer_id order_id product_id price url user_agent view_time rowtime 3099 e681fa67-3a1e-4e99-ba03-da9fb5d12845 1424 89.99 NULL NULL NULL 2024-10-22T08:21:08.620Z 3099 e681fa67-3a1e-4e99-ba03-da9fb5d12845 1424 89.99 https://www.acme.com/product/vfzsy Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0 45 2024-10-22T08:21:09.620Z ``` --- ### Compare Current and Previous Values in a Data Stream with Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/how-to-guides/compare-current-and-previous-values.html Compare Current and Previous Values in a Data Stream with Confluent 
Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® provides a LAG function, a built-in function that enables you to access data from a previous event alongside the current row, without the need for a self-join. It gives you the ability to analyze the differences between consecutive rows or to create more complex calculations based on previous events. This can be particularly useful in scenarios such as comparing daily sales values. In this guide, you learn how to run a Flink SQL statement that uses the LAG function to compare current and previous order values from a continuous stream of orders data. This topic shows the following steps: Step 1: Inspect the example stream Step 2: View aggregated results Prerequisites¶ Access to Confluent Cloud. The OrganizationAdmin, EnvironmentAdmin, or FlinkAdmin role for creating compute pools, or the FlinkDeveloper role if you already have a compute pool. If you don’t have the appropriate role, contact your OrganizationAdmin or EnvironmentAdmin. For more information, see Grant Role-Based Access in Confluent Cloud for Apache Flink. A provisioned Flink compute pool. Step 1: Inspect the example stream¶ In this step, you query the read-only orders table in the examples.marketplace database to inspect the stream for fields that you can compare. Log in to Confluent Cloud and navigate to your Flink workspace. In the Use catalog dropdown, select your environment. In the Use database dropdown, select your Kafka cluster. Run the following statement to inspect the example orders stream. SELECT * FROM examples.marketplace.orders; Your output should resemble: order_id customer_id product_id price 68362284-34df-41a3-87fb-50b79647b786 3195 1267 47.48 6e03663e-d20b-4a23-848a-aec959d794e3 3094 1412 50.92 84217b5d-7dcb-46d1-9600-675a3734a3ed 3038 1094 83.56 ... 
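The per-customer LAG semantics described in the introduction can be sketched in plain Python. This is a conceptual model rather than Flink code: the function name `lag_by_customer` and the sample orders are hypothetical, and the input is assumed to be ordered by event time already.

```python
def lag_by_customer(orders):
    """Pair each order with the previous order of the same customer.

    Mirrors LAG(col, 1) OVER (PARTITION BY customer_id ORDER BY event time):
    the "previous" columns are None for a customer's first order.
    """
    previous = {}  # customer_id -> (order_id, price) of the most recent order
    output = []
    for customer_id, order_id, price in orders:
        prev_order_id, prev_price = previous.get(customer_id, (None, None))
        output.append((customer_id, order_id, price, prev_order_id, prev_price))
        previous[customer_id] = (order_id, price)
    return output


# Hypothetical orders, already sorted by event time.
orders = [
    (3213, "order-a", 57.89),
    (3142, "order-b", 10.77),
    (3213, "order-c", 89.34),
]
lagged = lag_by_customer(orders)
for row in lagged:
    print(row)
```

Only one remembered row per customer is needed to look back by one event, which is why no self-join is required.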
Step 2: View aggregated results¶ Run the following statement to start a query on the orders data using the LAG function to return current and previous order data for each customer. SELECT $rowtime AS row_time , customer_id , order_id , price , LAG(order_id, 1) OVER (PARTITION BY customer_id ORDER BY $rowtime) previous_order_id , LAG(price, 1) OVER (PARTITION BY customer_id ORDER BY $rowtime) previous_order_price FROM examples.marketplace.orders; Your output should resemble: row_time customer_id order_id price previous_order_id previous_order_price 2024-01-11 15:42:00.557 3213 821f81d4-d912-4e0f-ab8b-88fe8d9af397 89.34 2c26a03b-4cd5-4df6-90d0-0b11916533d2 57.89 2024-01-11 15:42:01.079 3090 57b20b43-3f52-49d8-b8bc-3a55d0440482 50.22 c913ea7b-a7dc-4b22-b966-8df3f28e8e5e 66.12 2024-01-11 15:42:01.391 3142 8a536722-3e4f-4920-bd33-2b981179b8f8 10.77 NULL NULL 2024-01-11 15:42:01.482 3006 cabf50e8-129d-4b71-b253-894526a571c1 113.12 NULL NULL 2024-01-11 15:42:01.681 3009 fd96d839-f06b-43ef-a23f-38e4ca6849b4 78.01 d5cdafb2-ddf1-4161-8843-48ae5f46f524 102.34 2024-01-11 15:42:01.910 3158 16165e84-d1d6-49b9-afaf-1856c4f2a751 354.11 NULL NULL ... Note that there are some NULL values for previous_order_id and previous_order_price. For these customers, the current order is the first order they have made, so there is no historical previous order data to return. Related content¶ Aggregate a Stream in a Tumbling Window Aggregate Functions Time Attributes DDL Statements Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql examples.marketplace ``` ```sql SELECT * FROM examples.marketplace.orders; ``` ```sql order_id customer_id product_id price 68362284-34df-41a3-87fb-50b79647b786 3195 1267 47.48 6e03663e-d20b-4a23-848a-aec959d794e3 3094 1412 50.92 84217b5d-7dcb-46d1-9600-675a3734a3ed 3038 1094 83.56 ... 
``` ```sql SELECT $rowtime AS row_time , customer_id , order_id , price , LAG(order_id, 1) OVER (PARTITION BY customer_id ORDER BY $rowtime) previous_order_id , LAG(price, 1) OVER (PARTITION BY customer_id ORDER BY $rowtime) previous_order_price FROM examples.marketplace.orders; ``` ```sql row_time customer_id order_id price previous_order_id previous_order_price 2024-01-11 15:42:00.557 3213 821f81d4-d912-4e0f-ab8b-88fe8d9af397 89.34 2c26a03b-4cd5-4df6-90d0-0b11916533d2 57.89 2024-01-11 15:42:01.079 3090 57b20b43-3f52-49d8-b8bc-3a55d0440482 50.22 c913ea7b-a7dc-4b22-b966-8df3f28e8e5e 66.12 2024-01-11 15:42:01.391 3142 8a536722-3e4f-4920-bd33-2b981179b8f8 10.77 NULL NULL 2024-01-11 15:42:01.482 3006 cabf50e8-129d-4b71-b253-894526a571c1 113.12 NULL NULL 2024-01-11 15:42:01.681 3009 fd96d839-f06b-43ef-a23f-38e4ca6849b4 78.01 d5cdafb2-ddf1-4161-8843-48ae5f46f524 102.34 2024-01-11 15:42:01.910 3158 16165e84-d1d6-49b9-afaf-1856c4f2a751 354.11 NULL NULL ... ``` ```sql previous_order_id ``` ```sql previous_order_price ``` --- ### Convert the Serialization Format of a Topic with Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/how-to-guides/convert-serialization-format.html Convert the Serialization Format of a Topic with Confluent Cloud for Apache Flink¶ This guide shows how to use Confluent Cloud for Apache Flink® to transform a topic serialized in Avro Schema Registry format to a topic serialized in JSON Schema Registry format. The Apache Flink® type system is used to map the data types between these two wire formats. This topic shows the following steps: Step 1: Create a streaming data source using Avro Step 2: Inspect the source data Step 3: Convert the serialization format to JSON Step 4: Delete the long-running statement Prerequisites¶ You need the following prerequisites to use Confluent Cloud for Apache Flink. Access to Confluent Cloud. 
The organization ID, environment ID, and compute pool ID for your organization. The OrganizationAdmin, EnvironmentAdmin, or FlinkAdmin role for creating compute pools, or the FlinkDeveloper role if you already have a compute pool. If you don’t have the appropriate role, reach out to your OrganizationAdmin or EnvironmentAdmin. The Confluent CLI. To use the Flink SQL shell, update to the latest version of the Confluent CLI by running the following command: confluent update --yes If you used homebrew to install the Confluent CLI, update the CLI by using the brew upgrade command, instead of confluent update. For more information, see Confluent CLI. Step 1: Create a streaming data source using Avro¶ The streaming data for this topic is produced by a Datagen Source Connector that’s configured with the Gaming player activity template. It produces mock data to an Apache Kafka® topic named gaming_player_activity_source. The connector produces player score records that are randomly generated from the gaming_player_activity.avro file. Log in to the Confluent Cloud Console and navigate to the environment that hosts Flink SQL. In the navigation menu, select Connectors. The Connectors page opens. Click Add Connector The Connector Plugins page opens. In the Search connectors box, enter “datagen”. From the search results, click the Sample Data connector. If the Launch Sample Data dialog opens, click Advanced settings. In the Add Datagen Source Connector page, complete the following steps. 1: Create a topic2: Kafka credentials3: Configuration4: Sizing5: Review and Launch Click Add new topic, and in the Topic name field, enter “gaming_player_activity_source”. Click Create with defaults. Confluent Cloud creates the Kafka topic that the connector produces records to. Note When you’re in a Confluent Cloud environment that has Flink SQL, a SQL table is created automatically when you create a Kafka topic. In the Topics list, select gaming_player_activity_source and click Continue. 
Select the way you want to provide Kafka cluster credentials. You can choose one of the following options: My account: This setting allows your connector to globally access everything that you have access to. With a user account, the connector uses an API key and secret to access the Kafka cluster. This option is not recommended for production. Service account: This setting limits the access for your connector by using a service account. This option is recommended for production. Use an existing API key: This setting allows you to specify an API key and a secret pair. You can use an existing pair or create a new one. This method is not recommended for production environments. Note Freight clusters support only service accounts for Kafka authentication. In the Kafka credentials pane, leave Global access selected, and click Generate API key & download. This creates an API key and secret that allow the connector to access your cluster, and downloads the key and secret to your computer. Click Continue. On the Configuration page, select AVRO for the output record value format. Selecting AVRO configures the connector to associate a schema with the gaming_player_activity_source topic and register it with Schema Registry. In the Select a template section, click Show more options, and click the Gaming player activity tile. Click Show advanced configurations, and in the Max interval between messages (ms) textbox, enter 10. Click Continue. For Connector sizing, leave the slider at the default of 1 task and click Continue. In the Connector name box, select the text and replace it with “gaming_player_activity_source_connector”. Click Continue to start the connector. The status of your new connector reads Provisioning, which lasts for a few seconds. When the status of the new connector changes from Provisioning to Running, you have a producer sending an event stream to your topic in the Confluent Cloud cluster. 
Step 2: Inspect the source data¶ In Cloud Console, navigate to your environment’s Flink workspace, or using the Confluent CLI, open a SQL shell from the Confluent CLI. If you use the workspace in Cloud Console, set the Use catalog and Use database controls to your environment and Kafka cluster. If you use the Flink SQL shell, run the following statements to set the current environment and Kafka cluster. USE CATALOG ; USE DATABASE ; Run the following statement to see the data flowing into the gaming_player_activity_source table. SELECT * FROM gaming_player_activity_source; Your output should resemble: key player_id game_room_id points coordinates x'31303833' 1083 4634 85 [30,39] x'31303731' 1071 3406 432 [91,61] x'31303239' 1029 3078 359 [63,04] x'31303736' 1076 4501 256 [73,12] x'31303437' 1047 3644 375 [24,55] ... If you add $rowtime to the SELECT statement, you can see the Kafka timestamp for each record. SELECT $rowtime, * FROM gaming_player_activity_source; Your output should resemble: $rowtime key player_id game_room_id points coordinates 2023-11-08 14:27:27.647 x'31303838' 1088 4198 22 [02,86] 2023-11-08 14:27:27.695 x'31303638' 1068 1446 132 [80,86] 2023-11-08 14:27:27.729 x'31303536' 1056 4839 125 [35,74] 2023-11-08 14:27:27.732 x'31303530' 1050 4517 221 [11,69] 2023-11-08 14:27:27.746 x'31303438' 1048 3337 339 [91,10] ... Step 3: Convert the serialization format to JSON¶ Run the following statement to confirm that the current format of this table is Avro Schema Registry. 
SHOW CREATE TABLE gaming_player_activity_source; Your output should resemble: +-------------------------------------------------------------+ | SHOW CREATE TABLE | +-------------------------------------------------------------+ | CREATE TABLE `env`.`clus`.`gaming_player_activity_source` ( | | `key` VARBINARY(2147483647), | | `player_id` INT NOT NULL, | | `game_room_id` INT NOT NULL, | | `points` INT NOT NULL, | | `coordinates` VARCHAR(2147483647) NOT NULL, | | ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS | | WITH ( | | 'changelog.mode' = 'append', | | 'connector' = 'confluent', | | 'kafka.cleanup-policy' = 'delete', | | 'kafka.max-message-size' = '2097164 bytes', | | 'kafka.partitions' = '6', | | 'kafka.retention.size' = '0 bytes', | | 'kafka.retention.time' = '604800000 ms', | | 'key.format' = 'raw', | | 'scan.bounded.mode' = 'unbounded', | | 'scan.startup.mode' = 'earliest-offset', | | 'value.format' = 'avro-registry' | | ) | | | +-------------------------------------------------------------+ Run the following statement to create a second table that has the same schema but is configured with the value format set to JSON with Schema Registry. The key format is unchanged. CREATE TABLE gaming_player_activity_source_json ( `key` VARBINARY(2147483647), `player_id` INT NOT NULL, `game_room_id` INT NOT NULL, `points` INT NOT NULL, `coordinates` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( 'value.format' = 'json-registry', 'key.format' = 'raw' ); This statement creates a corresponding Kafka topic and Schema Registry subject named gaming_player_activity_source_json-value for the value. Run the following SQL to create a long-running statement that continuously transforms gaming_player_activity_source records into gaming_player_activity_source_json records. 
INSERT INTO gaming_player_activity_source_json SELECT * FROM gaming_player_activity_source; Run the following statement to confirm that records are continuously appended to the target table: SELECT * FROM gaming_player_activity_source_json; Your output should resemble: key player_id game_room_id points coordinates x'31303834' 1084 3583 211 [51,93] x'31303037' 1007 2268 55 [98,72] x'31303230' 1020 1625 431 [01,08] x'31303934' 1094 4760 43 [80,71] x'31303539' 1059 2822 390 [33,74] ... Tip Run the SHOW JOBS; statement to see the phase of statements that you’ve started in your workspace or Flink SQL shell. Run the following statement to confirm that the format of the gaming_player_activity_source_json table is JSON. SHOW CREATE TABLE gaming_player_activity_source_json; Your output should resemble: +--------------------------------------------------------------------------------------+ | SHOW CREATE TABLE | +--------------------------------------------------------------------------------------+ | CREATE TABLE `jim-flink-test-env`.`cluster_0`.`gaming_player_activity_source_json` ( | | `key` VARBINARY(2147483647), | | `player_id` INT NOT NULL, | | `game_room_id` INT NOT NULL, | | `points` INT NOT NULL, | | `coordinates` VARCHAR(2147483647) NOT NULL | | ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS | | WITH ( | | 'changelog.mode' = 'append', | | 'connector' = 'confluent', | | 'kafka.cleanup-policy' = 'delete', | | 'kafka.max-message-size' = '2097164 bytes', | | 'kafka.partitions' = '6', | | 'kafka.retention.size' = '0 bytes', | | 'kafka.retention.time' = '604800000 ms', | | 'key.format' = 'raw', | | 'scan.bounded.mode' = 'unbounded', | | 'scan.startup.mode' = 'earliest-offset', | | 'value.format' = 'json-registry' | | ) | | | +--------------------------------------------------------------------------------------+ Step 4: Delete the long-running statement¶ Your INSERT INTO statement is converting records in the Avro format to the JSON format continuously. 
When you’re done with this guide, free resources in your compute pool by deleting the long-running statement. In Cloud Console, navigate to the Flink page in your environment and click Flink statements. In the statements list, find the statement that has a status of Running. In the Actions column, click … and select Delete statement. In the Confirm statement deletion dialog, copy and paste the statement name and click Confirm. Related content¶ Data Type Mappings WITH options Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql confluent update --yes ``` ```sql brew upgrade ``` ```sql confluent update ``` ```sql gaming_player_activity_source ``` ```sql gaming_player_activity_source ``` ```sql USE CATALOG ; USE DATABASE ; ``` ```sql gaming_player_activity_source ``` ```sql SELECT * FROM gaming_player_activity_source; ``` ```sql key player_id game_room_id points coordinates x'31303833' 1083 4634 85 [30,39] x'31303731' 1071 3406 432 [91,61] x'31303239' 1029 3078 359 [63,04] x'31303736' 1076 4501 256 [73,12] x'31303437' 1047 3644 375 [24,55] ... ``` ```sql SELECT $rowtime, * FROM gaming_player_activity_source; ``` ```sql $rowtime key player_id game_room_id points coordinates 2023-11-08 14:27:27.647 x'31303838' 1088 4198 22 [02,86] 2023-11-08 14:27:27.695 x'31303638' 1068 1446 132 [80,86] 2023-11-08 14:27:27.729 x'31303536' 1056 4839 125 [35,74] 2023-11-08 14:27:27.732 x'31303530' 1050 4517 221 [11,69] 2023-11-08 14:27:27.746 x'31303438' 1048 3337 339 [91,10] ... 
``` ```sql SHOW CREATE TABLE gaming_player_activity_source; ``` ```sql +-------------------------------------------------------------+ | SHOW CREATE TABLE | +-------------------------------------------------------------+ | CREATE TABLE `env`.`clus`.`gaming_player_activity_source` ( | | `key` VARBINARY(2147483647), | | `player_id` INT NOT NULL, | | `game_room_id` INT NOT NULL, | | `points` INT NOT NULL, | | `coordinates` VARCHAR(2147483647) NOT NULL, | | ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS | | WITH ( | | 'changelog.mode' = 'append', | | 'connector' = 'confluent', | | 'kafka.cleanup-policy' = 'delete', | | 'kafka.max-message-size' = '2097164 bytes', | | 'kafka.partitions' = '6', | | 'kafka.retention.size' = '0 bytes', | | 'kafka.retention.time' = '604800000 ms', | | 'key.format' = 'raw', | | 'scan.bounded.mode' = 'unbounded', | | 'scan.startup.mode' = 'earliest-offset', | | 'value.format' = 'avro-registry' | | ) | | | +-------------------------------------------------------------+ ``` ```sql CREATE TABLE gaming_player_activity_source_json ( `key` VARBINARY(2147483647), `player_id` INT NOT NULL, `game_room_id` INT NOT NULL, `points` INT NOT NULL, `coordinates` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( 'value.format' = 'json-registry', 'key.format' = 'raw' ); ``` ```sql gaming_player_activity_source_json-value ``` ```sql gaming_player_activity_source ``` ```sql gaming_player_activity_source_json ``` ```sql INSERT INTO gaming_player_activity_source_json SELECT * FROM gaming_player_activity_source; ``` ```sql SELECT * FROM gaming_player_activity_source_json; ``` ```sql key player_id game_room_id points coordinates x'31303834' 1084 3583 211 [51,93] x'31303037' 1007 2268 55 [98,72] x'31303230' 1020 1625 431 [01,08] x'31303934' 1094 4760 43 [80,71] x'31303539' 1059 2822 390 [33,74] ... 
``` ```sql gaming_player_activity_source_json ``` ```sql SHOW CREATE TABLE gaming_player_activity_source_json; ``` ```sql +--------------------------------------------------------------------------------------+ | SHOW CREATE TABLE | +--------------------------------------------------------------------------------------+ | CREATE TABLE `jim-flink-test-env`.`cluster_0`.`gaming_player_activity_source_json` ( | | `key` VARBINARY(2147483647), | | `player_id` INT NOT NULL, | | `game_room_id` INT NOT NULL, | | `points` INT NOT NULL, | | `coordinates` VARCHAR(2147483647) NOT NULL | | ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS | | WITH ( | | 'changelog.mode' = 'append', | | 'connector' = 'confluent', | | 'kafka.cleanup-policy' = 'delete', | | 'kafka.max-message-size' = '2097164 bytes', | | 'kafka.partitions' = '6', | | 'kafka.retention.size' = '0 bytes', | | 'kafka.retention.time' = '604800000 ms', | | 'key.format' = 'raw', | | 'scan.bounded.mode' = 'unbounded', | | 'scan.startup.mode' = 'earliest-offset', | | 'value.format' = 'json-registry' | | ) | | | +--------------------------------------------------------------------------------------+ ``` --- ### Create a User-Defined Function with Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/how-to-guides/create-udf.html Create a User-Defined Function with Confluent Cloud for Apache Flink¶ A user-defined function (UDF) extends the capabilities of Confluent Cloud for Apache Flink® and enables you to implement custom logic beyond what is supported by SQL. For example, you can implement functions like encoding and decoding a string, performing geospatial calculations, encrypting and decrypting fields, or reusing an existing library or code from a third-party supplier. Confluent Cloud for Apache Flink supports UDFs written in Java. Package your custom function and its dependencies into a JAR file and upload it as an artifact to Confluent Cloud. 
Register the function in a Flink database by using the CREATE FUNCTION statement, and invoke your UDF in Flink SQL or the Table API. Confluent Cloud provides the infrastructure to run your code. For a list of cloud service providers and regions that support UDFs, see UDF regional availability. The following steps show how to implement a simple user-defined scalar function, upload it to Confluent Cloud, and use it in a Flink SQL statement. Step 1: Build the uber jar Step 2: Upload the jar as a Flink artifact Step 3: Register the UDF Step 4: Use the UDF in a Flink SQL query Step 5: Implement UDF logging (optional) Step 6: Delete the UDF After you build and run the scalar function, try building a table function. For more code examples, see Flink UDF Java Examples. Permanent and in-line UDFs¶ Starting with Confluent Table API plugin version 2.1-8, you can simplify the process of creating and managing UDFs. Permanent UDFs are registered automatically and can be used in any Flink SQL or Table API program. The Table API creates a temporary JAR file containing all transitive classes required to run the function, uploads it to Confluent Cloud, and registers the function using the previously uploaded artifact. In-line UDFs are defined and used in the same Table API program. Note Permanent and in-line UDFs are an Open Preview feature in Confluent Cloud. A Preview feature is a Confluent Cloud component that is being introduced to gain early feedback from developers. Preview features can be used for evaluation and non-production testing purposes or to provide feedback to Confluent. The warranty, SLA, and Support Services provisions of your agreement with Confluent do not apply to Preview features. Confluent may discontinue providing preview releases of the Preview features at any time in Confluent’s sole discretion. The following example shows how to create and call a permanent UDF and an in-line UDF. 
For the full code listing, see Example_09_Functions.java in the flink-table-api-java-examples repository. Implement a permanent and in-line UDF package io.confluent.flink.examples.table; import io.confluent.flink.plugin.ConfluentSettings; import org.apache.flink.table.api.EnvironmentSettings; import org.apache.flink.table.api.TableEnvironment; import org.apache.flink.table.functions.ScalarFunction; import org.apache.flink.table.functions.TableFunction; import java.util.List; import static org.apache.flink.table.api.Expressions.$; import static org.apache.flink.table.api.Expressions.array; import static org.apache.flink.table.api.Expressions.call; import static org.apache.flink.table.api.Expressions.row; /** * A table program example showing how to use User-Defined Functions * (UDFs) in the Flink Table API. * *
<p>The Flink Table API simplifies the process of creating and managing UDFs. * * <ul> * <li>It helps create a JAR file containing all required dependencies for a given UDF. * <li>Uploads the JAR to the Confluent artifact API. * <li>Creates SQL functions for given artifacts. * </ul>
*/ public class Example_09_Functions { // Fill this with an environment you have write access to static final String TARGET_CATALOG = ""; // Fill this with a Kafka cluster you have write access to static final String TARGET_DATABASE = ""; // All logic is defined in a main() method. It can run both in an IDE or CI/CD system. public static void main(String[] args) { // Setup connection properties to Confluent Cloud EnvironmentSettings settings = ConfluentSettings.fromResource("/cloud.properties"); // Initialize the session context to get started TableEnvironment env = TableEnvironment.create(settings); // Set default catalog and database env.useCatalog(TARGET_CATALOG); env.useDatabase(TARGET_DATABASE); System.out.println("Registering a scalar function..."); // The Table API underneath creates a temporary JAR file containing all transitive classes // required to run the function, uploads it to Confluent Cloud, and registers the function // using the previously uploaded artifact. env.createFunction("CustomTax", CustomTax.class, true); // As of now, Scalar and Table functions are supported. System.out.println("Registering a table function..."); env.createFunction("Explode", Explode.class, true); // Once registered, the functions can be used in Table API and SQL queries. System.out.println("Executing registered UDFs..."); env.fromValues(row("Apple", "USA", 2), row("Apple", "EU", 3)) .select( $("f0").as("product"), $("f1").as("location"), $("f2").times(call("CustomTax", $("f1"))).as("tax")) .execute() .print(); env.fromValues( row(1L, "Ann", array("Apples", "Bananas")), row(2L, "Peter", array("Apples", "Pears"))) .joinLateral(call("Explode", $("f2")).as("fruit")) .select($("f0").as("id"), $("f1").as("name"), $("fruit")) .execute() .print(); // Instead of registering functions permanently, you can embed UDFs directly into queries // without registering them first. This will upload all the functions of the query as a // single artifact to Confluent Cloud. 
Moreover, the functions lifecycle will be bound to // the lifecycle of the query. System.out.println("Executing inline UDFs..."); env.fromValues(row("Apple", "USA", 2), row("Apple", "EU", 3)) .select( $("f0").as("product"), $("f1").as("location"), $("f2").times(call(CustomTax.class, $("f1"))).as("tax")) .execute() .print(); env.fromValues( row(1L, "Ann", array("Apples", "Bananas")), row(2L, "Peter", array("Apples", "Pears"))) .joinLateral(call(Explode.class, $("f2")).as("fruit")) .select($("f0").as("id"), $("f1").as("name"), $("fruit")) .execute() .print(); } /** A scalar function that calculates a custom tax based on the provided location. */ public static class CustomTax extends ScalarFunction { public int eval(String location) { if (location.equals("USA")) { return 10; } if (location.equals("EU")) { return 5; } return 0; } } /** A table function that explodes an array of strings into multiple rows. */ public static class Explode extends TableFunction<String> { public void eval(List<String> arr) { for (String i : arr) { collect(i); } } } } Prerequisites¶ You need the following prerequisites to use Confluent Cloud for Apache Flink. Access to Confluent Cloud. The organization ID, environment ID, and compute pool ID for your organization. The OrganizationAdmin, EnvironmentAdmin, or FlinkAdmin role for creating compute pools, or the FlinkDeveloper role if you already have a compute pool. If you don’t have the appropriate role, reach out to your OrganizationAdmin or EnvironmentAdmin. The Confluent CLI. To use the Flink SQL shell, update to the latest version of the Confluent CLI by running the following command: confluent update --yes If you used Homebrew to install the Confluent CLI, update the CLI by using the brew upgrade command, instead of confluent update. For more information, see Confluent CLI. A provisioned Flink compute pool in Confluent Cloud. 
Apache Maven software project management tool (see Installing Apache Maven) Java 11 to Java 17 Sufficient permissions to upload and invoke UDFs in Confluent Cloud. For more information, see Flink RBAC. If using the Table API only, Flink versions 1.18.x and 1.19.x of flink-table-api-java are supported. Step 1: Build the uber jar¶ In this section, you compile a simple Java class, named TShirtSizingIsSmaller into a jar file. The project is based on the ScalarFunction class in the Flink Table API. The TShirtSizingIsSmaller.java class has an eval function that compares two T-shirt sizes and returns the smaller size. Copy the following project object model into a file named pom.xml. Important You can’t use your own Flink-related jars. If you package Flink core dependencies as part of the jar, you may break the dependency. Also, this example shows how to capture all dependencies greedily, possibly including more than needed. As an alternative, you can optimize on artifact size by listing all dependencies and including their transitive dependencies. pom.xml 4.0.0 example udf_example 1.0 11 11 UTF-8 org.apache.flink flink-table-api-java 2.1.0 provided ./example org.apache.maven.plugins maven-shade-plugin 3.6.0 *:* * META-INF/*.SF META-INF/*.DSA META-INF/*.RSA package shade Create a directory named “example”. mkdir example In the example directory, create a file named TShirtSizingIsSmaller.java. touch example/TShirtSizingIsSmaller.java Copy the following code into TShirtSizingIsSmaller.java. package com.example.my; import org.apache.flink.table.functions.ScalarFunction; import java.util.Arrays; import java.util.List; import java.util.stream.IntStream; /** TShirt sizing function for demo. 
*/ public class TShirtSizingIsSmaller extends ScalarFunction { public static final String NAME = "IS_SMALLER"; private static final List<Size> ORDERED_SIZES = Arrays.asList( new Size("X-Small", "XS"), new Size("Small", "S"), new Size("Medium", "M"), new Size("Large", "L"), new Size("X-Large", "XL"), new Size("XX-Large", "XXL")); public boolean eval(String shirt1, String shirt2) { int size1 = findSize(shirt1); int size2 = findSize(shirt2); // If either can't be found just say false rather than throw an error if (size1 == -1 || size2 == -1) { return false; } return size1 < size2; } private int findSize(String shirt) { return IntStream.range(0, ORDERED_SIZES.size()) .filter( i -> { Size s = ORDERED_SIZES.get(i); return s.name.equalsIgnoreCase(shirt) || s.abbreviation.equalsIgnoreCase(shirt); }) .findFirst() .orElse(-1); } private static class Size { private final String name; private final String abbreviation; public Size(String name, String abbreviation) { this.name = name; this.abbreviation = abbreviation; } } } Run the following command to build the jar file. mvn clean package Run the following command to check the contents of your jar. jar -tf target/udf_example-1.0.jar | grep -i TShirtSizingIsSmaller Your output should resemble: com/example/my/TShirtSizingIsSmaller$Size.class com/example/my/TShirtSizingIsSmaller.class Step 2: Upload the jar as a Flink artifact¶ You can use the Confluent Cloud Console, the Confluent CLI, or the REST API to upload your UDF. Confluent Cloud Console | Confluent CLI | REST API Log in to Confluent Cloud and navigate to your Flink workspace. Navigate to the environment where you want to run the UDF. Click Flink, and in the Flink page, click Artifacts. Click Upload artifact to open the upload pane. In the Cloud provider dropdown, select AWS, and in the Region dropdown, select the cloud region. Click Upload your JAR file and navigate to the location of your JAR file, which in the current example is target/udf_example-1.0.jar. 
When your JAR file is uploaded, it appears in the Artifacts list. In the list, click the row for your UDF artifact to open the details pane. Log in to Confluent Cloud. confluent login --organization-id ${ORG_ID} --prompt Run the following command to upload the jar to Confluent Cloud. confluent flink artifact create udf_example \ --artifact-file target/udf_example-1.0.jar \ --cloud ${CLOUD_PROVIDER} \ --region ${CLOUD_REGION} \ --environment ${ENV_ID} Your output should resemble: +--------------------+-------------+ | ID | cfa-ldxmro | | Name | udf_example | | Version | ver-81vxm5 | | Cloud | aws | | Region | us-east-1 | | Environment | env-z3q9rd | | Content Format | JAR | | Description | | | Documentation Link | | +--------------------+-------------+ Note the artifact ID and version of your UDF, which in this example are cfa-ldxmro and ver-81vxm5, because you use them later to register the UDF in Flink SQL and to manage it. Run the following command to view all of the available UDF artifacts. confluent flink artifact list \ --cloud ${CLOUD_PROVIDER} \ --region ${CLOUD_REGION} Your output should resemble: ID | Name | Cloud | Region | Environment -------------+-------------+-------+-----------+-------------- cfa-ldxmro | udf_example | AWS | us-east-1 | env-z3q9rd Run the following command to view the details of your UDF. You can use the artifact ID from the previous step or the artifact name to specify your UDF. 
# use the artifact ID confluent flink artifact describe \ cfa-ldxmro \ --cloud ${CLOUD_PROVIDER} \ --region ${CLOUD_REGION} # use the artifact name confluent flink artifact describe \ udf_example \ --cloud ${CLOUD_PROVIDER} \ --region ${CLOUD_REGION} Your output should resemble: +--------------------+-------------+ | ID | cfa-ldxmro | | Name | udf_example | | Version | ver-81vxm5 | | Cloud | aws | | Region | us-east-1 | | Environment | env-z3q9rd | | Content Format | JAR | | Description | | | Documentation Link | | +--------------------+-------------+ You can upload your JAR file by requesting a presigned upload URL, then uploading the file by using the presigned URL information. For more information, see Create a Flink artifact. Step 3: Register the UDF¶ UDFs are registered inside a Flink database, which means that you must specify the Confluent Cloud environment (Flink catalog) and Kafka cluster (Flink database) where you want to use the UDF. You can use the Confluent Cloud Console, the Confluent CLI, the Confluent Terraform provider, or the REST API to register your UDF. Confluent Cloud ConsoleConfluent CLITerraformREST API In the Flink page, click Compute pools. In the tile for the compute pool where you want to run the UDF, click Open SQL workspace. In the Use catalog dropdown, select the environment where you want to run the UDF. In the Use database dropdown, select Kafka cluster that you want to run the UDF. Run the following command to start the Flink shell. confluent flink shell --environment ${ENV_ID} --compute-pool ${COMPUTE_POOL_ID} Run the following statements to specify the catalog and database. -- Specify your catalog. This example uses the default. USE CATALOG default; Your output should resemble: +---------------------+---------+ | Key | Value | +---------------------+---------+ | sql.current-catalog | default | +---------------------+---------+ Specify the database you want to use, for example, cluster_0. -- Specify your database. 
This example uses cluster_0. USE cluster_0; Your output should resemble: +----------------------+-----------+ | Key | Value | +----------------------+-----------+ | sql.current-database | cluster_0 | +----------------------+-----------+

You can register a previously uploaded UDF by using the Confluent Terraform provider. For more information, see confluent_flink_artifact Resource.

You can register a UDF by sending a POST request to the Create Artifact endpoint. For more information, see Create a Flink artifact.

In Cloud Console or the Confluent CLI, run the CREATE FUNCTION statement to register your UDF in the current catalog and database. Substitute your UDF’s value for . CREATE FUNCTION is_smaller AS 'com.example.my.TShirtSizingIsSmaller' USING JAR 'confluent-artifact://'; Your output should resemble: Function 'is_smaller' created.

Step 4: Use the UDF in a Flink SQL query

After it is registered, your UDF is available to use in queries. Run the following statement to view the UDFs in the current database. SHOW USER FUNCTIONS; Your output should resemble: +---------------+ | function name | +---------------+ | is_smaller | +---------------+

Run the following statement to create a sizes table. CREATE TABLE sizes ( `size_1` STRING, `size_2` STRING );

Run the following statement to populate the sizes table with values. INSERT INTO sizes VALUES ('XL', 'L'), ('small', 'L'), ('M', 'L'), ('XXL', 'XL');

Run the following statement to view the rows in the sizes table. SELECT * FROM sizes; Your output should resemble: size_1 size_2 XL L small L M L XXL XL

Run the following statement to execute the is_smaller function on the data in the sizes table. SELECT size_1, size_2, is_smaller (size_1, size_2) AS is_smaller FROM sizes; Your output should resemble: size_1 size_2 is_smaller XL L FALSE small L TRUE M L TRUE XXL XL FALSE

Step 5: Implement UDF logging (optional)

If you want to log UDF status messages, follow the steps in Log Debug Messages in UDFs.
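To check the Step 4 results by hand, the following plain-Java sketch mirrors the comparison that the is_smaller function performs. It is only an illustration: the class name SizeOrder and its methods are hypothetical, it has no Flink dependency, and the size list is copied from the TShirtSizingIsSmaller class in this guide.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, Flink-free mirror of the is_smaller comparison, for checking results by hand. */
final class SizeOrder {
    // Each entry is {full name, abbreviation}, ordered from smallest to largest.
    private static final List<String[]> ORDERED_SIZES = Arrays.asList(
        new String[] {"X-Small", "XS"},
        new String[] {"Small", "S"},
        new String[] {"Medium", "M"},
        new String[] {"Large", "L"},
        new String[] {"X-Large", "XL"},
        new String[] {"XX-Large", "XXL"});

    /** Returns the position of the size in the ordered list, or -1 if it is unknown. */
    static int indexOf(String size) {
        for (int i = 0; i < ORDERED_SIZES.size(); i++) {
            String[] entry = ORDERED_SIZES.get(i);
            if (entry[0].equalsIgnoreCase(size) || entry[1].equalsIgnoreCase(size)) {
                return i;
            }
        }
        return -1;
    }

    /** True only when both sizes are known and the first is strictly smaller. */
    static boolean isSmaller(String first, String second) {
        int a = indexOf(first);
        int b = indexOf(second);
        return a != -1 && b != -1 && a < b;
    }

    public static void main(String[] args) {
        // Same pairs as the rows inserted into the sizes table.
        String[][] rows = {{"XL", "L"}, {"small", "L"}, {"M", "L"}, {"XXL", "XL"}};
        for (String[] row : rows) {
            System.out.println(row[0] + " " + row[1] + " " + isSmaller(row[0], row[1]));
        }
    }
}
```

Note that a size that is not in the list makes the comparison return false instead of raising an error, so a FALSE result in the query output can mean either "not smaller" or "unrecognized size".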
Step 6: Delete the UDF

When you’re finished using the UDF, you can delete it from the current database. You can use the Confluent Cloud Console, the Confluent CLI, the Confluent Terraform provider, or the REST API to delete your UDF.

Drop the function

Run the following statement to remove the is_smaller function from the current database. DROP FUNCTION is_smaller; Your output should resemble: Function 'is_smaller' dropped. Currently running statements are not affected and continue running. Exit the Flink shell. exit;

Delete the JAR artifact

Confluent Cloud Console: Navigate to the environment where your UDF is registered. Click Flink, and in the Flink page, click Artifacts. In the artifacts list, find the UDF you want to delete. In the Actions column, click the icon, and in the context menu, select Delete artifact. In the confirmation dialog, type “udf_example”, and click Confirm. The “Artifact deleted successfully” message appears.

Confluent CLI: Run the following command to delete the artifact from the environment. confluent flink artifact delete \ \ --cloud ${CLOUD_PROVIDER} \ --region ${CLOUD_REGION} You receive a warning about breaking Flink statements that use the artifact. Type “y” when you’re prompted to proceed. Your output should resemble: Deleted Flink artifact "".

Terraform: You can delete a UDF by using the Confluent Terraform provider. For more information, see confluent_flink_artifact Resource.

REST API: You can delete a UDF by sending a DELETE request to the Delete Artifact endpoint. For more information, see Delete an artifact.

Implement a user-defined table function

In the previous steps, you implemented a UDF with a simple scalar function. Confluent Cloud for Apache Flink also supports user-defined table functions (UDTFs), which take multiple scalar values as input arguments and return multiple rows as output, instead of a single value.
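As a rough, Flink-free illustration of the "scalar inputs in, multiple rows out" contract, the sketch below returns a list in which each element stands in for one emitted row. The class RowSplitter and its method are hypothetical and are not part of the Flink Table API. One detail that matters for any split-style function written in Java: String.split interprets its argument as a regular expression, so this sketch quotes the delimiter with Pattern.quote to treat it as literal text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a table function: one input string yields zero or more rows. */
final class RowSplitter {

    /** Returns one list element per output row; a null input or delimiter yields no rows. */
    static List<String> splitToRows(String input, String delimiter) {
        List<String> rows = new ArrayList<>();
        if (input == null || delimiter == null) {
            return rows;
        }
        // Pattern.quote makes delimiters such as "|" or "." match literally.
        for (String part : input.split(Pattern.quote(delimiter))) {
            rows.add(part);
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(splitToRows("A;B", ";"));
        System.out.println(splitToRows("C|D|E|F", "|"));
    }
}
```

Without the quoting step, a delimiter such as "|" would be read as a regex alternation and split the input between every character.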
The following steps show how to implement a simple UDTF, upload it to Confluent Cloud, and use it in a Flink SQL statement.

- Step 1: Build the uber jar
- Step 2: Upload the UDTF jar as a Flink artifact
- Step 3: Register the UDTF
- Step 4: Use the UDTF in a Flink SQL query

Step 1: Build the uber jar

In this section, you compile a simple Java class, named SplitFunction, into a jar file, similar to the previous section. The class is based on the TableFunction class in the Flink Table API. The SplitFunction.java class has an eval function that uses the Java split method to break up a string into words and emits each word as a separate row.

In the example directory, create a file named SplitFunction.java. touch example/SplitFunction.java

Copy the following code into SplitFunction.java. package com.example.my; import org.apache.flink.table.annotation.DataTypeHint; import org.apache.flink.table.annotation.FunctionHint; import org.apache.flink.table.api.*; import org.apache.flink.table.functions.TableFunction; import org.apache.flink.types.Row; import static org.apache.flink.table.api.Expressions.*; @FunctionHint(output = @DataTypeHint("ROW<word STRING>")) public class SplitFunction extends TableFunction<Row> { public void eval(String str, String delimiter) { for (String s : str.split(delimiter)) { // use collect(...) to emit a row collect(Row.of(s)); } } }

Run the following command to build the jar file. You can use the POM file from the previous section. mvn clean package

Run the following command to check the contents of your jar. jar -tf target/udf_example-1.0.jar | grep -i SplitFunction Your output should resemble: com/example/my/SplitFunction.class

Step 2: Upload the UDTF jar as a Flink artifact

Confluent Cloud Console: Log in to Confluent Cloud and navigate to your Flink workspace. Navigate to the environment where you want to run the UDF. Click Flink, and in the Flink page, click Artifacts. Click Upload artifact to open the upload pane.
In the Cloud provider dropdown, select AWS, and in the Region dropdown, select the cloud region. Click Upload your JAR file and navigate to the location of your JAR file, which in the current example is target/udf_example-1.0.jar. When your JAR file is uploaded, it appears in the Artifacts list. In the list, click the row for your UDF artifact to open the details pane. Log in to Confluent Cloud. confluent login --organization-id ${ORG_ID} --prompt Run the following command to upload the jar to Confluent Cloud. confluent flink artifact create udf_table_example \ --artifact-file target/udf_example-1.0.jar \ --cloud ${CLOUD_PROVIDER} \ --region ${CLOUD_REGION} \ --environment ${ENV_ID} Your output should resemble: +--------------------+-------------------+ | ID | cfa-l5xp82 | | Name | udf_table_example | | Version | ver-0x37m2 | | Cloud | aws | | Region | us-east-1 | | Environment | env-z3q9rd | | Content Format | JAR | | Description | | | Documentation Link | | +--------------------+-------------------+ Note the artifact ID and version of your UDTF, which in this example are cfa-l5xp82 and ver-0x37m2, because you use them later to register the UDTF in Flink SQL and to manage it. Step 3: Register the UDTF¶ In the Flink shell or the Cloud Console, specify the catalog and database (environment and cluster) where you want to use the UDTF, as you did in the previous section. Run the CREATE FUNCTION statement to register your UDTF in the current catalog and database. Substitute your UDTF’s value for . CREATE FUNCTION split_string AS 'com.example.my.SplitFunction' USING JAR 'confluent-artifact://'; Your output should resemble: Function 'split_string' created. Step 4: Use the UDTF in a Flink SQL query¶ Once it is registered, your UDTF is available to use in queries. Run the following statement to view the UDFs in the current database. 
SHOW USER FUNCTIONS; Your output should resemble: +---------------+ | Function Name | +---------------+ | split_string | +---------------+ Run the following statement to execute the split_string function. SELECT * FROM (VALUES 'A;B', 'C;D;E;F') as T(f), LATERAL TABLE(split_string(f, ';')) Your output should resemble: f word A;B A A;B B C;D;E;F C C;D;E;F D C;D;E;F E C;D;E;F F When you’re done with the example UDTF, drop the function and delete the JAR artifact as you did in Step 6: Delete the UDF. Related content¶ Enable UDF Logging confluent flink artifact create CREATE FUNCTION Statement Artifacts endpoints Flink UDF Java Examples Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql package io.confluent.flink.examples.table; import io.confluent.flink.plugin.ConfluentSettings; import org.apache.flink.table.api.EnvironmentSettings; import org.apache.flink.table.api.TableEnvironment; import org.apache.flink.table.functions.ScalarFunction; import org.apache.flink.table.functions.TableFunction; import java.util.List; import static org.apache.flink.table.api.Expressions.$; import static org.apache.flink.table.api.Expressions.array; import static org.apache.flink.table.api.Expressions.call; import static org.apache.flink.table.api.Expressions.row; /** * A table program example showing how to use User-Defined Functions * (UDFs) in the Flink Table API. * *

 * <p>The Flink Table API simplifies the process of creating and managing UDFs.
 *
 * <ul>
 *   <li>It helps creating a JAR file containing all required dependencies for a given UDF.
 *   <li>Uploads the JAR to Confluent artifact API.
 *   <li>Creates SQL functions for given artifacts.
 * </ul>
*/ public class Example_09_Functions { // Fill this with an environment you have write access to static final String TARGET_CATALOG = ""; // Fill this with a Kafka cluster you have write access to static final String TARGET_DATABASE = ""; // All logic is defined in a main() method. It can run both in an IDE or CI/CD system. public static void main(String[] args) { // Setup connection properties to Confluent Cloud EnvironmentSettings settings = ConfluentSettings.fromResource("/cloud.properties"); // Initialize the session context to get started TableEnvironment env = TableEnvironment.create(settings); // Set default catalog and database env.useCatalog(TARGET_CATALOG); env.useDatabase(TARGET_DATABASE); System.out.println("Registering a scalar function..."); // The Table API underneath creates a temporary JAR file containing all transitive classes // required to run the function, uploads it to Confluent Cloud, and registers the function // using the previously uploaded artifact. env.createFunction("CustomTax", CustomTax.class, true); // As of now, Scalar and Table functions are supported. System.out.println("Registering a table function..."); env.createFunction("Explode", Explode.class, true); // Once registered, the functions can be used in Table API and SQL queries. System.out.println("Executing registered UDFs..."); env.fromValues(row("Apple", "USA", 2), row("Apple", "EU", 3)) .select( $("f0").as("product"), $("f1").as("location"), $("f2").times(call("CustomTax", $("f1"))).as("tax")) .execute() .print(); env.fromValues( row(1L, "Ann", array("Apples", "Bananas")), row(2L, "Peter", array("Apples", "Pears"))) .joinLateral(call("Explode", $("f2")).as("fruit")) .select($("f0").as("id"), $("f1").as("name"), $("fruit")) .execute() .print(); // Instead of registering functions permanently, you can embed UDFs directly into queries // without registering them first. This will upload all the functions of the query as a // single artifact to Confluent Cloud. 
Moreover, the functions lifecycle will be bound to // the lifecycle of the query. System.out.println("Executing inline UDFs..."); env.fromValues(row("Apple", "USA", 2), row("Apple", "EU", 3)) .select( $("f0").as("product"), $("f1").as("location"), $("f2").times(call(CustomTax.class, $("f1"))).as("tax")) .execute() .print(); env.fromValues( row(1L, "Ann", array("Apples", "Bananas")), row(2L, "Peter", array("Apples", "Pears"))) .joinLateral(call(Explode.class, $("f2")).as("fruit")) .select($("f0").as("id"), $("f1").as("name"), $("fruit")) .execute() .print(); } /** A scalar function that calculates a custom tax based on the provided location. */ public static class CustomTax extends ScalarFunction { public int eval(String location) { if (location.equals("USA")) { return 10; } if (location.equals("EU")) { return 5; } return 0; } } /** A table function that explodes an array of string into multiple rows. */ public static class Explode extends TableFunction { public void eval(List arr) { for (String i : arr) { collect(i); } } } } ``` ```sql confluent update --yes ``` ```sql brew upgrade ``` ```sql confluent update ``` ```sql flink-table-api-java ``` ```sql TShirtSizingIsSmaller ``` ```sql ScalarFunction ``` ```sql TShirtSizingIsSmaller.java ``` ```sql 4.0.0 example udf_example 1.0 11 11 UTF-8 org.apache.flink flink-table-api-java 2.1.0 provided ./example org.apache.maven.plugins maven-shade-plugin 3.6.0 *:* * META-INF/*.SF META-INF/*.DSA META-INF/*.RSA package shade ``` ```sql mkdir example ``` ```sql TShirtSizingIsSmaller.java ``` ```sql touch example/TShirtSizingIsSmaller.java ``` ```sql TShirtSizingIsSmaller.java ``` ```sql package com.example.my; import org.apache.flink.table.functions.ScalarFunction; import java.util.Arrays; import java.util.List; import java.util.stream.IntStream; /** TShirt sizing function for demo. 
*/ public class TShirtSizingIsSmaller extends ScalarFunction { public static final String NAME = "IS_SMALLER"; private static final List ORDERED_SIZES = Arrays.asList( new Size("X-Small", "XS"), new Size("Small", "S"), new Size("Medium", "M"), new Size("Large", "L"), new Size("X-Large", "XL"), new Size("XX-Large", "XXL")); public boolean eval(String shirt1, String shirt2) { int size1 = findSize(shirt1); int size2 = findSize(shirt2); // If either can't be found just say false rather than throw an error if (size1 == -1 || size2 == -1) { return false; } return size1 < size2; } private int findSize(String shirt) { return IntStream.range(0, ORDERED_SIZES.size()) .filter( i -> { Size s = ORDERED_SIZES.get(i); return s.name.equalsIgnoreCase(shirt) || s.abbreviation.equalsIgnoreCase(shirt); }) .findFirst() .orElse(-1); } private static class Size { private final String name; private final String abbreviation; public Size(String name, String abbreviation) { this.name = name; this.abbreviation = abbreviation; } } } ``` ```sql mvn clean package ``` ```sql jar -tf target/udf_example-1.0.jar | grep -i TShirtSizingIsSmaller ``` ```sql com/example/my/TShirtSizingIsSmaller$Size.class com/example/my/TShirtSizingIsSmaller.class ``` ```sql target/udf_example-1.0.jar ``` ```sql confluent login --organization-id ${ORG_ID} --prompt ``` ```sql confluent flink artifact create udf_example \ --artifact-file target/udf_example-1.0.jar \ --cloud ${CLOUD_PROVIDER} \ --region ${CLOUD_REGION} \ --environment ${ENV_ID} ``` ```sql +--------------------+-------------+ | ID | cfa-ldxmro | | Name | udf_example | | Version | ver-81vxm5 | | Cloud | aws | | Region | us-east-1 | | Environment | env-z3q9rd | | Content Format | JAR | | Description | | | Documentation Link | | +--------------------+-------------+ ``` ```sql confluent flink artifact list \ --cloud ${CLOUD_PROVIDER} \ --region ${CLOUD_REGION} ``` ```sql ID | Name | Cloud | Region | Environment 
-------------+-------------+-------+-----------+-------------- cfa-ldxmro | udf_example | AWS | us-east-1 | env-z3q9rd ``` ```sql # use the artifact ID confluent flink artifact describe \ cfa-ldxmro \ --cloud ${CLOUD_PROVIDER} \ --region ${CLOUD_REGION} # use the artifact name confluent flink artifact describe \ udf_example \ --cloud ${CLOUD_PROVIDER} \ --region ${CLOUD_REGION} ``` ```sql +--------------------+-------------+ | ID | cfa-ldxmro | | Name | udf_example | | Version | ver-81vxm5 | | Cloud | aws | | Region | us-east-1 | | Environment | env-z3q9rd | | Content Format | JAR | | Description | | | Documentation Link | | +--------------------+-------------+ ``` ```sql confluent flink shell --environment ${ENV_ID} --compute-pool ${COMPUTE_POOL_ID} ``` ```sql -- Specify your catalog. This example uses the default. USE CATALOG default; ``` ```sql +---------------------+---------+ | Key | Value | +---------------------+---------+ | sql.current-catalog | default | +---------------------+---------+ ``` ```sql -- Specify your database. This example uses cluster_0. USE cluster_0; ``` ```sql +----------------------+-----------+ | Key | Value | +----------------------+-----------+ | sql.current-database | cluster_0 | +----------------------+-----------+ ``` ```sql ``` ```sql CREATE FUNCTION is_smaller AS 'com.example.my.TShirtSizingIsSmaller' USING JAR 'confluent-artifact://'; ``` ```sql Function 'is_smaller' created. 
```

```sql
SHOW USER FUNCTIONS;
```

```text
+---------------+
| function name |
+---------------+
| is_smaller    |
+---------------+
```

```sql
CREATE TABLE sizes (
  `size_1` STRING,
  `size_2` STRING
);
```

```sql
INSERT INTO sizes VALUES
  ('XL', 'L'),
  ('small', 'L'),
  ('M', 'L'),
  ('XXL', 'XL');
```

```sql
SELECT * FROM sizes;
```

```text
size_1 size_2
XL     L
small  L
M      L
XXL    XL
```

```sql
SELECT size_1, size_2, is_smaller (size_1, size_2) AS is_smaller FROM sizes;
```

```text
size_1 size_2 is_smaller
XL     L      FALSE
small  L      TRUE
M      L      TRUE
XXL    XL     FALSE
```

```sql
DROP FUNCTION is_smaller;
```

```text
Function 'is_smaller' dropped.
```

```shell
confluent flink artifact delete \
  \
  --cloud ${CLOUD_PROVIDER} \
  --region ${CLOUD_REGION}
```

```text
Deleted Flink artifact "".
```

```shell
touch example/SplitFunction.java
```

```java
package com.example.my;

import org.apache.flink.table.annotation.DataTypeHint;
import org.apache.flink.table.annotation.FunctionHint;
import org.apache.flink.table.api.*;
import org.apache.flink.table.functions.TableFunction;
import org.apache.flink.types.Row;

import static org.apache.flink.table.api.Expressions.*;

@FunctionHint(output = @DataTypeHint("ROW<word STRING>"))
public class SplitFunction extends TableFunction<Row> {

  public void eval(String str, String delimiter) {
    for (String s : str.split(delimiter)) {
      // use collect(...) to emit a row
      collect(Row.of(s));
    }
  }
}
```

```shell
mvn clean package
```

```shell
jar -tf target/udf_example-1.0.jar | grep -i SplitFunction
```

```text
com/example/my/SplitFunction.class
```

```shell
confluent login --organization-id ${ORG_ID} --prompt
```

```shell
confluent flink artifact create udf_table_example \
  --artifact-file target/udf_example-1.0.jar \
  --cloud ${CLOUD_PROVIDER} \
  --region ${CLOUD_REGION} \
  --environment ${ENV_ID}
```

```text
+--------------------+-------------------+
| ID                 | cfa-l5xp82        |
| Name               | udf_table_example |
| Version            | ver-0x37m2        |
| Cloud              | aws               |
| Region             | us-east-1         |
| Environment        | env-z3q9rd        |
| Content Format     | JAR               |
| Description        |                   |
| Documentation Link |                   |
+--------------------+-------------------+
```

```sql
CREATE FUNCTION split_string AS 'com.example.my.SplitFunction' USING JAR 'confluent-artifact://';
```

```text
Function 'split_string' created.
```

```sql
SHOW USER FUNCTIONS;
```

```text
+---------------+
| Function Name |
+---------------+
| split_string  |
+---------------+
```

```sql
SELECT * FROM (VALUES 'A;B', 'C;D;E;F') as T(f), LATERAL TABLE(split_string(f, ';'))
```

```text
f        word
A;B      A
A;B      B
C;D;E;F  C
C;D;E;F  D
C;D;E;F  E
C;D;E;F  F
```

---

### Deduplicate Rows in a Table with Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/how-to-guides/deduplicate-rows.html

Confluent Cloud for Apache Flink® enables generating a table that contains only unique records from an input table with only a few clicks. In this guide, you create a Flink table and apply the Deduplicate Rows action to generate a topic that has only unique records, by using a deduplication statement. The Deduplicate Rows action creates a Flink SQL statement for you, but no knowledge of Flink SQL is required to use it.
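As a rough mental model of what a deduplication statement computes, the following plain-Java sketch keeps one row per key from a finite list. It is not the SQL that the action generates, and the names DedupSketch and User are hypothetical. The fields user_id and registertime are borrowed from the example table used later in this guide, and the sketch assumes that the row with the smallest registertime is the one to keep.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of keep-one-row-per-key deduplication over a bounded list. */
final class DedupSketch {

    /** Minimal stand-in for a table row; only the fields the sketch needs. */
    static final class User {
        final String userId;
        final long registerTime;

        User(String userId, long registerTime) {
            this.userId = userId;
            this.registerTime = registerTime;
        }
    }

    /** Keeps one row per userId: the smallest registerTime, with the first row seen winning ties. */
    static List<User> deduplicate(List<User> rows) {
        // LinkedHashMap preserves the order in which keys were first seen.
        Map<String, User> keptPerKey = new LinkedHashMap<>();
        for (User row : rows) {
            User kept = keptPerKey.get(row.userId);
            if (kept == null || row.registerTime < kept.registerTime) {
                keptPerKey.put(row.userId, row);
            }
        }
        return new ArrayList<>(keptPerKey.values());
    }

    public static void main(String[] args) {
        List<User> rows = List.of(
            new User("Trinity", 1677260733L),
            new User("Trinity", 1677260733L),
            new User("Dozer", 1677260823L));
        for (User user : deduplicate(rows)) {
            System.out.println(user.userId + " " + user.registerTime);
        }
    }
}
```

The sketch works on a list that is fully known in advance. A Flink statement runs continuously over an unbounded stream and updates its result as rows arrive, so treat this only as a way to reason about which row survives for each key.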
This guide shows the following steps: Step 1: Create a users table Step 2: Apply the Deduplicate Topic action Step 3: Inspect the output table Prerequisites¶ Access to Confluent Cloud. The OrganizationAdmin, EnvironmentAdmin, or FlinkAdmin role for creating compute pools, or the FlinkDeveloper role if you already have a compute pool. If you don’t have the appropriate role, contact your OrganizationAdmin or EnvironmentAdmin. For more information, see Grant Role-Based Access in Confluent Cloud for Apache Flink. A provisioned Flink compute pool. Step 1: Create a users table¶ Before you can deduplicate rows, you need a table with sample data that contains duplicates. In this step, you create a simple users table and populate it with mock records, some of which are duplicated intentionally. Log in to Confluent Cloud and navigate to your Flink workspace. Run the following statement to create a users table. CREATE TABLE users ( user_id STRING NOT NULL, registertime BIGINT, gender STRING, regionid STRING ); Insert rows with mock data into the users table. INSERT INTO users VALUES ('Thomas A. Anderson', 1677260724, 'male', 'Region_4'), ('Thomas A. Anderson', 1677260724, 'male', 'Region_4'), ('Trinity', 1677260733, 'female', 'Region_4'), ('Trinity', 1677260733, 'female', 'Region_4'), ('Morpheus', 1677260742, 'male', 'Region_8'), ('Morpheus', 1677260742, 'male', 'Region_8'), ('Dozer', 1677260823, 'male', 'Region_1'), ('Agent Smith', 1677260955, 'male', 'Region_0'), ('Persephone', 1677260901, 'female', 'Region_2'), ('Niobe', 1677260921, 'female', 'Region_3'), ('Niobe', 1677260921, 'female', 'Region_3'), ('Niobe', 1677260921, 'female', 'Region_3'), ('Zee', 1677260922, 'female', 'Region_5'); Inspect the inserted rows. SELECT * FROM users; Your output should resemble: user_id registertime gender regionid Thomas A. Anderson 1677260724 male Region_4 Thomas A. 
Anderson 1677260724 male Region_4 Trinity 1677260733 female Region_4 Trinity 1677260733 female Region_4 Morpheus 1677260742 male Region_8 Morpheus 1677260742 male Region_8 Dozer 1677260823 male Region_1 Agent Smith 1677260955 male Region_0 Persephone 1677260901 female Region_2 Niobe 1677260921 female Region_3 Niobe 1677260921 female Region_3 Niobe 1677260921 female Region_3 Zee 1677260922 female Region_5 Step 2: Apply the Deduplicate Topic action¶ In the previous step, you created a Flink table that had duplicate rows. In this step, you apply the Deduplicate Topic action to create an output table that has only unique rows. In the navigation menu, click Data portal. In the Data portal page, click the Environment dropdown menu and select the environment for your workspace. In the Recently created section, find your users topic and click it to open the details pane. Click Actions, and in the Actions list, click Deduplicate topic to open the Deduplicate topic dialog. In the Fields to deduplicate dropdown, select user_id. Flink uses the deduplication field as the output message key. This means that the output topic’s row key may be different from the input topic’s row key, because the deduplication statement’s DISTRIBUTED BY clause determines the output topic’s key. For this example, the output message key is the user_id field. In the Compute pool dropdown, select the compute pool you want to use. (Optional) In the Runtime configuration section, select Run with a service account to run the deduplicate query with a service account principal. Use this option for production queries. Note The service account you select must have the DeveloperManage and DeveloperWrite roles to create topics, schemas, and run Flink statements. For more information, see Grant Role-Based Access. Click the Show SQL toggle to view the statement that the action will run. 
For this example, the deduplication query depends on the registertime field, so you must modify the generated statement to use the registertime field as the field to sort on. Click Open SQL editor to modify the statement. A Flink workspace opens with the generated statement in the cell. In the cell, replace $rowtime with registertime in the ORDER BY clause. CREATE TABLE ``.``.`users_deduplicate` ( PRIMARY KEY (`user_id`) NOT ENFORCED ) DISTRIBUTED BY HASH( `user_id` ) WITH ( 'changelog.mode' = 'upsert', 'value.format'='avro-registry', 'key.format'='avro-registry' ) AS SELECT `user_id`, `registertime`, `gender`, `regionid` FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY `user_id` ORDER BY registertime ASC) AS row_num FROM ``.``.`users`) WHERE row_num = 1; Click Run to execute the deduplication query. The CREATE TABLE AS SELECT statement creates the users_deduplicate table and populates it with rows from the users table using a deduplication query. When the Statement status changes to Running, you can query the users_deduplicate table. Step 3: Inspect the output table¶ The statement generated by the Deduplicate Topic action created an output table named users_deduplicate. In this step, you query the output table to see the deduplicated rows. Run the following statement to inspect the users_deduplicate output table. SELECT * FROM users_deduplicate; Your output should resemble: user_id registertime gender regionid Thomas A. 
Anderson 1677260724 male Region_4 Trinity 1677260733 female Region_4 Morpheus 1677260742 male Region_8 Dozer 1677260823 male Region_1 Agent Smith 1677260955 male Region_0 Persephone 1677260901 female Region_2 Niobe 1677260921 female Region_3 Zee 1677260922 female Region_5 Related content¶ Flink action: Mask Fields in a Table Flink action: Transform a Topic Flink action: Create an Embedding Aggregate a Stream in a Tumbling Window Compare Current and Previous Values in a Data Stream Convert the Serialization Format of a Topic Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql CREATE TABLE users ( user_id STRING NOT NULL, registertime BIGINT, gender STRING, regionid STRING ); ``` ```sql INSERT INTO users VALUES ('Thomas A. Anderson', 1677260724, 'male', 'Region_4'), ('Thomas A. Anderson', 1677260724, 'male', 'Region_4'), ('Trinity', 1677260733, 'female', 'Region_4'), ('Trinity', 1677260733, 'female', 'Region_4'), ('Morpheus', 1677260742, 'male', 'Region_8'), ('Morpheus', 1677260742, 'male', 'Region_8'), ('Dozer', 1677260823, 'male', 'Region_1'), ('Agent Smith', 1677260955, 'male', 'Region_0'), ('Persephone', 1677260901, 'female', 'Region_2'), ('Niobe', 1677260921, 'female', 'Region_3'), ('Niobe', 1677260921, 'female', 'Region_3'), ('Niobe', 1677260921, 'female', 'Region_3'), ('Zee', 1677260922, 'female', 'Region_5'); ``` ```sql SELECT * FROM users; ``` ```sql user_id registertime gender regionid Thomas A. Anderson 1677260724 male Region_4 Thomas A. 
Anderson 1677260724 male Region_4 Trinity 1677260733 female Region_4 Trinity 1677260733 female Region_4 Morpheus 1677260742 male Region_8 Morpheus 1677260742 male Region_8 Dozer 1677260823 male Region_1 Agent Smith 1677260955 male Region_0 Persephone 1677260901 female Region_2 Niobe 1677260921 female Region_3 Niobe 1677260921 female Region_3 Niobe 1677260921 female Region_3 Zee 1677260922 female Region_5 ``` ```sql registertime ``` ```sql registertime ``` ```sql registertime ``` ```sql CREATE TABLE ``.``.`users_deduplicate` ( PRIMARY KEY (`user_id`) NOT ENFORCED ) DISTRIBUTED BY HASH( `user_id` ) WITH ( 'changelog.mode' = 'upsert', 'value.format'='avro-registry', 'key.format'='avro-registry' ) AS SELECT `user_id`, `registertime`, `gender`, `regionid` FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY `user_id` ORDER BY registertime ASC) AS row_num FROM ``.``.`users`) WHERE row_num = 1; ``` ```sql users_deduplicate ``` ```sql users_deduplicate ``` ```sql users_deduplicate ``` ```sql users_deduplicate ``` ```sql SELECT * FROM users_deduplicate; ``` ```sql user_id registertime gender regionid Thomas A. Anderson 1677260724 male Region_4 Trinity 1677260733 female Region_4 Morpheus 1677260742 male Region_8 Dozer 1677260823 male Region_1 Agent Smith 1677260955 male Region_0 Persephone 1677260901 female Region_2 Niobe 1677260921 female Region_3 Zee 1677260922 female Region_5 ``` --- ### Log Debug Messages in a User Defined Function with Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/how-to-guides/enable-udf-logging.html Log Debug Messages in a User Defined Function for Confluent Cloud for Apache Flink¶ When you create a user defined function (UDF) with Confluent Cloud for Apache Flink®, you have the option of logging events to help with monitoring and debugging. Your log messages appear in the Confluent Cloud Console’s statement log page. 
For more information on creating UDFs, see Create a User Defined Function.

Limitations

UDF logging has these limitations.

- Log4j logging only: External UDF loggers can be composed only with the Apache Log4j logging framework.
- Burst rate of up to 1000/s: UDF logging supports up to 1000 log events per second for each UDF during a short burst of high activity. This helps to optimize performance and to reduce noise in logs. Events that exceed the maximum rate are dropped.

Implement logging code

In your UDF project, import the org.apache.logging.log4j.LogManager and org.apache.logging.log4j.Logger classes. Get the Logger instance by calling the LogManager.getLogger() method. package your.package.namespace; import org.apache.flink.table.functions.ScalarFunction; import org.apache.logging.log4j.LogManager; import org.apache.logging.log4j.Logger; import java.util.Date; /* This class is a SumScalar function that logs messages at different levels */ public class LogSumScalarFunction extends ScalarFunction { private static final Logger LOGGER = LogManager.getLogger(); public int eval(int a, int b) { String value = String.format("SumScalar of %d and %d", a, b); Date now = new java.util.Date(); // You can choose the logging level for log messages. LOGGER.info(value + " info log messages by log4j logger --- " + now); LOGGER.error(value + " error log messages by log4j logger --- " + now); LOGGER.warn(value + " warn log messages by log4j logger --- " + now); LOGGER.debug(value + " debug log messages by log4j logger --- " + now); return a + b; } }

The following log levels are supported: OFF, FATAL, ERROR, WARN, INFO, DEBUG, TRACE, and ALL.

View logged events

After the instrumented UDF statements run, you can view logged events in the Confluent Cloud Console’s event logging page.
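Because events above the burst rate are dropped, a UDF that logs on every eval call can lose messages under load. One possible mitigation, offered here as an assumption and not as a documented Confluent feature, is to throttle logging inside the UDF itself so that the function decides which messages to skip. The LogThrottle class below is a hypothetical helper with no Log4j or Flink dependency; it uses a simple fixed one-second window.

```java
/** Hypothetical helper: allows at most maxPerSecond events in each one-second window. */
final class LogThrottle {
    private final int maxPerSecond;
    private long windowStartMillis;
    private int countInWindow;

    LogThrottle(int maxPerSecond) {
        this.maxPerSecond = maxPerSecond;
    }

    /** Returns true if an event at the given timestamp may be logged. */
    synchronized boolean tryAcquire(long nowMillis) {
        if (nowMillis - windowStartMillis >= 1000) {
            // The previous window has expired, so start a new one.
            windowStartMillis = nowMillis;
            countInWindow = 0;
        }
        if (countInWindow < maxPerSecond) {
            countInWindow++;
            return true;
        }
        return false;
    }

    /** Convenience overload that uses the system clock. */
    boolean tryAcquire() {
        return tryAcquire(System.currentTimeMillis());
    }

    public static void main(String[] args) {
        LogThrottle throttle = new LogThrottle(2);
        long start = 10_000L;
        for (long offset = 0; offset <= 1000; offset += 250) {
            System.out.println((start + offset) + " allowed=" + throttle.tryAcquire(start + offset));
        }
    }
}
```

In a UDF, the eval method could call tryAcquire() and log only when it returns true. A fixed window can let through up to twice the limit across a window boundary, so a limit comfortably below 1000 is the safer choice.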
Related content¶ User-defined Functions Create a User-defined Function confluent flink artifact create CREATE FUNCTION Statement Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql org.apache.logging.log4j.LogManager ``` ```sql org.apache.logging.log4j.Logger ``` ```sql LogManager.getLogger() ``` ```sql package your.package.namespace; import org.apache.flink.table.functions.ScalarFunction; import org.apache.logging.log4j.LogManager; import org.apache.logging.log4j.Logger; import java.util.Date; /* This class is a SumScalar function that logs messages at different levels */ public class LogSumScalarFunction extends ScalarFunction { private static final Logger LOGGER = LogManager.getLogger(); public int eval(int a, int b) { String value = String.format("SumScalar of %d and %d", a, b); Date now = new java.util.Date(); // You can choose the logging level for log messages. LOGGER.info(value + " info log messages by log4j logger --- " + now); LOGGER.error(value + " error log messages by log4j logger --- " + now); LOGGER.warn(value + " warn log messages by log4j logger --- " + now); LOGGER.debug(value + " debug log messages by log4j logger --- " + now); return a + b; } } ``` --- ### Mask Fields in a Table with Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/how-to-guides/mask-fields.html Mask Fields in a Table with Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables generating a topic that contains masked fields from an input topic with only a few clicks. In this guide, you create a Flink table and apply the Mask Fields action to generate a topic that has user names masked out, by using a preconfigured regular expression. The Mask Fields action creates a Flink SQL statement for you, but no knowledge of Flink SQL is required to use it. 
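To make regular-expression masking concrete, the sketch below replaces every word character in a string with an asterisk. This is an illustration under stated assumptions: the class MaskSketch is hypothetical, the pattern \w and the replacement character are guesses at what a word-character mask does, and the statement that the Mask Fields action actually generates may differ.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of masking a field value with a regular expression. */
final class MaskSketch {
    // Assumption: every word character (letter, digit, or underscore) is masked.
    private static final Pattern WORD_CHAR = Pattern.compile("\\w");

    /** Replaces each word character with '*'; other characters are kept as they are. */
    static String mask(String value) {
        if (value == null) {
            return null;
        }
        return WORD_CHAR.matcher(value).replaceAll("*");
    }

    public static void main(String[] args) {
        System.out.println(mask("Korey Hand"));
        System.out.println(mask("Dr. Andrew Terry"));
    }
}
```

Spaces and punctuation survive the mask, so the length and shape of a name remain visible. If that leaks too much information for your use case, a pattern that replaces the whole value is the stricter option.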
This guide shows the following steps: Step 1: Inspect the example stream Step 2: Create a source table Step 3: Apply the Mask Fields action Step 4: Inspect the output table Step 5: Stop the persistent query Prerequisites¶ Access to Confluent Cloud. The OrganizationAdmin, EnvironmentAdmin, or FlinkAdmin role for creating compute pools, or the FlinkDeveloper role if you already have a compute pool. If you don’t have the appropriate role, contact your OrganizationAdmin or EnvironmentAdmin. For more information, see Grant Role-Based Access in Confluent Cloud for Apache Flink. A provisioned Flink compute pool. Step 1: Inspect the example stream¶ In this step, you query the read-only customers table in the examples.marketplace database to inspect the stream for fields that you can mask. Log in to Confluent Cloud and navigate to your Flink workspace. In the Use catalog dropdown, select your environment. In the Use database dropdown, select your Kafka cluster. Run the following statement to inspect the example customers stream. SELECT * FROM examples.marketplace.customers; Your output should resemble: customer_id name address postcode city email 3134 Dr. Andrew Terry 45488 Eileen Walk 78690 Latoyiaberg romaine.lynch@hotmail.com 3243 Miss Shelby Lueilwitz 199 Bernardina Brook 79991 Johnburgh dominick.oconner@hotmail.c… 3027 Korey Hand 655 Murray Turnpike 08917 Port Sukshire karlyn.ziemann@yahoo.com ... Step 2: Create a source table¶ In this step, you create a customers_source table for the data from the example customers stream. You use the INSERT INTO FROM SELECT statement to populate the table with streaming data. Run the following statement to register the customers_source table. Confluent Cloud for Apache Flink automatically creates a backing Kafka topic that has the same name. -- Register a customers source table.
CREATE TABLE customers_source ( customer_id INT NOT NULL, name STRING, address STRING, postcode STRING, city STRING, email STRING, PRIMARY KEY(`customer_id`) NOT ENFORCED ); Run the following statement to populate the customers_source table with data from the example customers stream. -- Persistent query to stream data from -- the customers example stream to the -- customers_source table. INSERT INTO customers_source( customer_id, name, address, postcode, city, email ) SELECT customer_id, name, address, postcode, city, email FROM examples.marketplace.customers; Run the following statement to inspect the customers_source table. SELECT * FROM customers_source; Your output should resemble: customer_id name address postcode city email 3088 Phil Grimes 07738 Zieme Court 84845 Port Dillontown garnett.abernathy@hotmail.com 3022 Jeana Gaylord 021 Morgan Drives 35160 West Celena emile.daniel@gmail.com 3097 Lily Ryan 671 Logan Throughway 58261 Dickinsonburgh ivory.lockman@gmail.com ... Step 3: Apply the Mask Fields action¶ In the previous step, you created a Flink table whose rows contain customer names, which might be confidential data. In this step, you apply the Mask Fields action to create an output table that has the contents of the name field masked. Navigate to the Environments page, and in the navigation menu, click Data portal. In the Data portal page, click the dropdown menu and select the environment for your workspace. In the Recently created section, find your customers_source topic and click it to open the details pane. Click Actions, and in the Actions list, click Mask fields to open the Mask fields dialog. In the Field to mask dropdown, select name. In the Regex for name dropdown, select Word characters. In the Runtime configuration section, either select an existing service account or create a new service account for the current action. Note The service account you select must have the EnvironmentAdmin role to create topics and schemas and to run Flink statements.
Optionally, click the Show SQL toggle to view the statements that the action will run. The code resembles: CREATE TABLE ``.``.`customers_source_mask` LIKE ``.``.`customers_source` INSERT INTO ``.``.`customers_source_mask` SELECT `customer_id`, REGEXP_REPLACE(`name`, '(\w)', '*') as `name`, address, postcode, city, email FROM ``.``.`customers_source`; Click Confirm. The action runs the CREATE TABLE and INSERT INTO statements. These statements register the customers_source_mask table and populate it with rows from the customers_source table. The strings in the name column are masked by the REGEXP_REPLACE function. Step 4: Inspect the output table¶ The statements that were generated by the Mask Fields action created an output table named customers_source_mask. In this step, you query the output table to see the masked field values. Return to your workspace and run the following command to inspect the customers_source_mask output table. SELECT * FROM customers_source_mask; Your output should resemble: customer_id name address postcode city email 3104 **** *** ****** 342 Odis Hollow 27615 West Florentino bryce.hodkiewicz@hotmail.c… 3058 **** ******* ****** 33569 Turner Glens 14107 Schummchester sarah.roob@yahoo.com 3138 **** ****** ******** 944 Elden Walks 39293 New Ernestbury velvet.volkman@gmail.com ... Step 5: Stop the persistent query¶ The INSERT INTO statement that was created by the Mask Fields action runs continuously until you stop it manually. Free resources in your compute pool by deleting the long-running statement. Navigate to the Flink page in your environment and click Flink statements. In the statements list, find the statement that has a status of Running. In the Actions column, click … and select Delete statement. In the Confirm statement deletion dialog, copy and paste the statement name and click Confirm. 
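
The masking expression that the action generates in Step 3, `REGEXP_REPLACE(name, '(\w)', '*')`, can be tried out locally before you run it against a stream. The following is a minimal sketch, assuming that the pattern behaves the same way under Java's `java.util.regex` as it does in Flink SQL; `NameMasker` and `mask` are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

/** Hypothetical local stand-in for REGEXP_REPLACE(name, '(\w)', '*'). */
final class NameMasker {
    // Same pattern as the generated statement: each word character is one match.
    private static final Pattern WORD_CHAR = Pattern.compile("(\\w)");

    static String mask(String name) {
        if (name == null) {
            return null; // assumption: mirrors SQL, where a NULL input yields NULL
        }
        // Spaces and punctuation are not word characters, so they pass through unchanged.
        return WORD_CHAR.matcher(name).replaceAll("*");
    }

    public static void main(String[] args) {
        System.out.println(mask("Dr. Andrew Terry")); // prints "**. ****** *****"
        System.out.println(mask("Korey Hand"));       // prints "***** ****"
    }
}
```

Because only word characters are replaced, the masked value keeps the length and word boundaries of the original, which matches the shape of the `name` column in the Step 4 output. If that leaks more than you want, a different pattern is needed.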
Related content¶ Flink action: Deduplicate Rows in a Table Flink action: Transform a Topic Flink action: Create an Embedding Aggregate a Stream in a Tumbling Window Compare Current and Previous Values in a Data Stream Convert the Serialization Format of a Topic Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql examples.marketplace ``` ```sql SELECT * FROM examples.marketplace.customers; ``` ```sql customer_id name address postcode city email 3134 Dr. Andrew Terry 45488 Eileen Walk 78690 Latoyiaberg romaine.lynch@hotmail.com 3243 Miss Shelby Lueilwitz 199 Bernardina Brook 79991 Johnburgh dominick.oconner@hotmail.c… 3027 Korey Hand 655 Murray Turnpike 08917 Port Sukshire karlyn.ziemann@yahoo.com ... ``` ```sql customers_source ``` ```sql customers_source ``` ```sql -- Register a customers source table. CREATE TABLE customers_source ( customer_id INT NOT NULL, name STRING, address STRING, postcode STRING, city STRING, email STRING, PRIMARY KEY(`customer_id`) NOT ENFORCED ); ``` ```sql customers_source ``` ```sql -- Persistent query to stream data from -- the customers example stream to the -- customers_source table. INSERT INTO customers_source( customer_id, name, address, postcode, city, email ) SELECT customer_id, name, address, postcode, city, email FROM examples.marketplace.customers; ``` ```sql customers_source ``` ```sql SELECT * FROM customers_source; ``` ```sql customer_id name address postcode city email 3088 Phil Grimes 07738 Zieme Court 84845 Port Dillontown garnett.abernathy@hotmail.com 3022 Jeana Gaylord 021 Morgan Drives 35160 West Celena emile.daniel@gmail.com 3097 Lily Ryan 671 Logan Throughway 58261 Dickinsonburgh ivory.lockman@gmail.com ... 
``` ```sql CREATE TABLE ``.``.`customers_source_mask` LIKE ``.``.`customers_source` INSERT INTO ``.``.`customers_source_mask` SELECT `customer_id`, REGEXP_REPLACE(`name`, '(\w)', '*') as `name`, address, postcode, city, email FROM ``.``.`customers_source`; ``` ```sql customers_source_mask ``` ```sql customers_source ``` ```sql customers_source_mask ``` ```sql customers_source_mask ``` ```sql SELECT * FROM customers_source_mask; ``` ```sql customer_id name address postcode city email 3104 **** *** ****** 342 Odis Hollow 27615 West Florentino bryce.hodkiewicz@hotmail.c… 3058 **** ******* ****** 33569 Turner Glens 14107 Schummchester sarah.roob@yahoo.com 3138 **** ****** ******** 944 Elden Walks 39293 New Ernestbury velvet.volkman@gmail.com ... ``` --- ### Handle Multiple Event Types In Tables in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/how-to-guides/multiple-event-types.html Handle Multiple Event Types with Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® provides several ways to work with Kafka topics containing multiple event types. This guide explains how Flink automatically infers and handles different event type patterns, allowing you to query and process mixed event streams effectively. Overview¶ When working with Kafka topics containing multiple event types, Flink automatically infers table schemas based on the Schema Registry configuration and schema format. The following sections describe the supported approaches in order of recommendation. Using Schema References¶ Schema references provide the most robust way to handle multiple event types in a single topic. With this approach, you define a main schema that references other schemas, allowing for modular schema management and independent evolution of event types. For example, consider a topic that combines purchase and pageview events. Schema for purchase events. 
Avro: { "type":"record", "namespace": "io.confluent.developer.avro", "name":"Purchase", "fields": [ {"name": "item", "type":"string"}, {"name": "amount", "type": "double"}, {"name": "customer_id", "type": "string"} ] } JSON Schema: { "$schema": "http://json-schema.org/draft-07/schema#", "title": "Purchase", "type": "object", "properties": { "item": { "type": "string" }, "amount": { "type": "number" }, "customer_id": { "type": "string" } }, "required": ["item", "amount", "customer_id"] } Protobuf: syntax = "proto3"; package io.confluent.developer.proto; message Purchase { string item = 1; double amount = 2; string customer_id = 3; } Schema for pageview events. Avro: { "type":"record", "namespace": "io.confluent.developer.avro", "name":"Pageview", "fields": [ {"name": "url", "type":"string"}, {"name": "is_special", "type": "boolean"}, {"name": "customer_id", "type": "string"} ] } JSON Schema: { "$schema": "http://json-schema.org/draft-07/schema#", "title": "Pageview", "type": "object", "properties": { "url": { "type": "string" }, "is_special": { "type": "boolean" }, "customer_id": { "type": "string" } }, "required": ["url", "is_special", "customer_id"] } Protobuf: syntax = "proto3"; package io.confluent.developer.proto; message Pageview { string url = 1; bool is_special = 2; string customer_id = 3; } Combined schema that references both event types: Avro: [ "io.confluent.developer.avro.Purchase", "io.confluent.developer.avro.Pageview" ] JSON Schema: { "$schema": "http://json-schema.org/draft-07/schema#", "title": "CustomerEvent", "type": "object", "oneOf": [ { "$ref": "io.confluent.developer.json.Purchase" }, { "$ref": "io.confluent.developer.json.Pageview" } ] } Protobuf: syntax = "proto3"; package io.confluent.developer.proto; import "purchase.proto"; import "pageview.proto"; message CustomerEvent { oneof action { Purchase purchase = 1; Pageview pageview = 2; } } When these schemas are registered in Schema Registry and used with the default TopicNameStrategy, Flink automatically
infers the table structure. You can see this structure using: SHOW CREATE TABLE `customer-events`; The output shows a table structure that includes columns for both event types: Avro: CREATE TABLE `customer-events` ( `key` VARBINARY(2147483647), `Purchase` ROW<`item` VARCHAR(2147483647), `amount` DOUBLE, `customer_id` VARCHAR(2147483647)>, `Pageview` ROW<`url` VARCHAR(2147483647), `is_special` BOOLEAN, `customer_id` VARCHAR(2147483647)> ) JSON Schema: CREATE TABLE `customer-events` ( `key` VARBINARY(2147483647), `connect_union_field_0` ROW<`amount` DOUBLE NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL, `item` VARCHAR(2147483647) NOT NULL>, `connect_union_field_1` ROW<`customer_id` VARCHAR(2147483647) NOT NULL, `is_special` BOOLEAN NOT NULL, `url` VARCHAR(2147483647) NOT NULL> ) Protobuf: CREATE TABLE `customer-events` ( `key` VARBINARY(2147483647), `action` ROW< `purchase` ROW<`item` VARCHAR(2147483647) NOT NULL, `amount` DOUBLE NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL>, `pageview` ROW<`url` VARCHAR(2147483647) NOT NULL, `is_special` BOOLEAN NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL> > ) You can query specific event types using standard SQL. The exact syntax depends on your schema format: Avro: -- Query purchase events SELECT Purchase.* FROM `customer-events` WHERE Purchase IS NOT NULL; -- Query pageview events SELECT Pageview.* FROM `customer-events` WHERE Pageview IS NOT NULL; JSON Schema: -- Query purchase events SELECT connect_union_field_0.* FROM `customer-events` WHERE connect_union_field_0 IS NOT NULL; -- Query pageview events SELECT connect_union_field_1.* FROM `customer-events` WHERE connect_union_field_1 IS NOT NULL; Protobuf: -- Query purchase events SELECT action.purchase.* FROM `customer-events` WHERE action.purchase IS NOT NULL; -- Query pageview events SELECT action.pageview.* FROM `customer-events` WHERE action.pageview IS NOT NULL; Using Union Types¶ Flink automatically handles union types across different schema formats.
With this approach, all event types are defined within a single schema using the format’s native union type mechanism: Avro unions, JSON Schema oneOf, or Protocol Buffers oneof. For example, consider a schema combining order and shipment events: Avro: { "type": "record", "namespace": "io.confluent.examples.avro", "name": "AllTypes", "fields": [ { "name": "event_type", "type": [ { "type": "record", "name": "Order", "fields": [ {"name": "order_id", "type": "string"}, {"name": "amount", "type": "double"} ] }, { "type": "record", "name": "Shipment", "fields": [ {"name": "tracking_id", "type": "string"}, {"name": "status", "type": "string"} ] } ] } ] } JSON Schema: { "$schema": "http://json-schema.org/draft-07/schema#", "title": "AllTypes", "type": "object", "oneOf": [ { "type": "object", "title": "Order", "properties": { "order_id": { "type": "string" }, "amount": { "type": "number" } }, "required": ["order_id", "amount"] }, { "type": "object", "title": "Shipment", "properties": { "tracking_id": { "type": "string" }, "status": { "type": "string" } }, "required": ["tracking_id", "status"] } ] } Protobuf: syntax = "proto3"; package io.confluent.examples.proto; message Order { string order_id = 1; double amount = 2; } message Shipment { string tracking_id = 1; string status = 2; } message AllTypes { oneof event_type { Order order = 1; Shipment shipment = 2; } } When using these union types with TopicNameStrategy, Flink automatically creates a table structure based on your schema format.
You can see this structure using: SHOW CREATE TABLE `events`; The output shows a table structure that reflects how each format handles unions: Avro: CREATE TABLE `events` ( `key` VARBINARY(2147483647), `event_type` ROW< `Order` ROW<`order_id` VARCHAR(2147483647) NOT NULL, `amount` DOUBLE NOT NULL>, `Shipment` ROW<`tracking_id` VARCHAR(2147483647) NOT NULL, `status` VARCHAR(2147483647) NOT NULL> > NOT NULL ) You can query specific event types: -- Query orders SELECT event_type.Order.* FROM `events` WHERE event_type.Order IS NOT NULL; -- Query shipments SELECT event_type.Shipment.* FROM `events` WHERE event_type.Shipment IS NOT NULL; JSON Schema: CREATE TABLE `events` ( `key` VARBINARY(2147483647), `connect_union_field_0` ROW<`amount` DOUBLE NOT NULL, `order_id` VARCHAR(2147483647) NOT NULL>, `connect_union_field_1` ROW<`status` VARCHAR(2147483647) NOT NULL, `tracking_id` VARCHAR(2147483647) NOT NULL> ) You can query specific event types: -- Query orders SELECT connect_union_field_0.* FROM `events` WHERE connect_union_field_0 IS NOT NULL; -- Query shipments SELECT connect_union_field_1.* FROM `events` WHERE connect_union_field_1 IS NOT NULL; Protobuf: CREATE TABLE `events` ( `key` VARBINARY(2147483647), `AllTypes` ROW< `event_type` ROW< `order` ROW<`order_id` VARCHAR(2147483647) NOT NULL, `amount` DOUBLE NOT NULL>, `shipment` ROW<`tracking_id` VARCHAR(2147483647) NOT NULL, `status` VARCHAR(2147483647) NOT NULL> > > ) You can query specific event types: -- Query orders SELECT AllTypes.event_type.order.* FROM `events` WHERE AllTypes.event_type.order IS NOT NULL; -- Query shipments SELECT AllTypes.event_type.shipment.* FROM `events` WHERE AllTypes.event_type.shipment IS NOT NULL; Using RecordNameStrategy Or TopicRecordNameStrategy Strategies¶ For topics using RecordNameStrategy or TopicRecordNameStrategy, Flink initially infers a raw binary table: CREATE TABLE `events` ( `key` VARBINARY(2147483647), `value` VARBINARY(2147483647) ) To work with these events, you need to manually
configure the table with the appropriate subject names: ALTER TABLE events SET ( 'value.format' = 'avro-registry', 'value.avro-registry.subject-names' = 'com.example.events.OrderEvent;com.example.events.ShipmentEvent' ); If your topic uses keyed messages, you may also need to configure the key format: ALTER TABLE events SET ( 'key.format' = 'avro-registry', 'key.avro-registry.subject-names' = 'com.example.events.OrderKey' ); Replace avro-registry with json-registry or proto-registry based on your schema format. Best Practices¶ Use schema references with TopicNameStrategy when possible, as this provides the best balance of flexibility and manageability. If schema references aren’t suitable, use union types for a simpler schema management approach. Configure alternative subject name strategies only when working with existing systems that require them. Related Content¶ Schema References Flink SQL Data Type Mappings Subject Name Strategy Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. 
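
The two `ALTER TABLE` statements in the RecordNameStrategy section differ only in the `key` or `value` prefix, the registry format, and the semicolon-separated subject list, so they lend themselves to being generated. The following is a rough sketch of that idea, not a Confluent API: `SubjectNameOptions` and `alterTable` are hypothetical names, the helper uses only the option keys shown in this guide, and it performs no escaping of quotes in its inputs.

```java
import java.util.List;

/** Hypothetical helper that renders the ALTER TABLE statements shown in this guide. */
final class SubjectNameOptions {

    /**
     * @param table    table name, for example "events"
     * @param part     "key" or "value"
     * @param format   "avro-registry", "json-registry", or "proto-registry"
     * @param subjects subject names to register for that part of the record
     */
    static String alterTable(String table, String part, String format, List<String> subjects) {
        if (!part.equals("key") && !part.equals("value")) {
            throw new IllegalArgumentException("part must be \"key\" or \"value\"");
        }
        if (subjects.isEmpty()) {
            throw new IllegalArgumentException("at least one subject name is required");
        }
        // Multiple subject names are joined with semicolons, as in the example above.
        String joined = String.join(";", subjects);
        return "ALTER TABLE `" + table + "` SET (\n"
                + "  '" + part + ".format' = '" + format + "',\n"
                + "  '" + part + "." + format + ".subject-names' = '" + joined + "'\n"
                + ");";
    }

    public static void main(String[] args) {
        System.out.println(alterTable("events", "value", "avro-registry",
                List.of("com.example.events.OrderEvent", "com.example.events.ShipmentEvent")));
    }
}
```

Generating the statement keeps the format name consistent between the `format` option and the `subject-names` option key, which is the detail most likely to drift when you switch from `avro-registry` to `json-registry` or `proto-registry` by hand.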
#### Code Examples ```sql { "type":"record", "namespace": "io.confluent.developer.avro", "name":"Purchase", "fields": [ {"name": "item", "type":"string"}, {"name": "amount", "type": "double"}, {"name": "customer_id", "type": "string"} ] } ``` ```sql { "$schema": "http://json-schema.org/draft-07/schema#", "title": "Purchase", "type": "object", "properties": { "item": { "type": "string" }, "amount": { "type": "number" }, "customer_id": { "type": "string" } }, "required": ["item", "amount", "customer_id"] } ``` ```sql syntax = "proto3"; package io.confluent.developer.proto; message Purchase { string item = 1; double amount = 2; string customer_id = 3; } ``` ```sql { "type":"record", "namespace": "io.confluent.developer.avro", "name":"Pageview", "fields": [ {"name": "url", "type":"string"}, {"name": "is_special", "type": "boolean"}, {"name": "customer_id", "type": "string"} ] } ``` ```sql { "$schema": "http://json-schema.org/draft-07/schema#", "title": "Pageview", "type": "object", "properties": { "url": { "type": "string" }, "is_special": { "type": "boolean" }, "customer_id": { "type": "string" } }, "required": ["url", "is_special", "customer_id"] } ``` ```sql syntax = "proto3"; package io.confluent.developer.proto; message Pageview { string url = 1; bool is_special = 2; string customer_id = 3; } ``` ```sql [ "io.confluent.developer.avro.Purchase", "io.confluent.developer.avro.Pageview" ] ``` ```sql { "$schema": "http://json-schema.org/draft-07/schema#", "title": "CustomerEvent", "type": "object", "oneOf": [ { "$ref": "io.confluent.developer.json.Purchase" }, { "$ref": "io.confluent.developer.json.Pageview" } ] } ``` ```sql syntax = "proto3"; package io.confluent.developer.proto; import "purchase.proto"; import "pageview.proto"; message CustomerEvent { oneof action { Purchase purchase = 1; Pageview pageview = 2; } } ``` ```sql SHOW CREATE TABLE `customer-events`; ``` ```sql CREATE TABLE `customer-events` ( `key` VARBINARY(2147483647), `Purchase` ROW<`item` 
VARCHAR(2147483647), `amount` DOUBLE, `customer_id` VARCHAR(2147483647)>, `Pageview` ROW<`url` VARCHAR(2147483647), `is_special` BOOLEAN, `customer_id` VARCHAR(2147483647)> ) ``` ```sql CREATE TABLE `customer-events` ( `key` VARBINARY(2147483647), `connect_union_field_0` ROW<`amount` DOUBLE NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL, `item` VARCHAR(2147483647) NOT NULL>, `connect_union_field_1` ROW<`customer_id` VARCHAR(2147483647) NOT NULL, `is_special` BOOLEAN NOT NULL, `url` VARCHAR(2147483647) NOT NULL> ) ``` ```sql CREATE TABLE `customer-events` ( `key` VARBINARY(2147483647), `action` ROW< `purchase` ROW<`item` VARCHAR(2147483647) NOT NULL, `amount` DOUBLE NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL>, `pageview` ROW<`url` VARCHAR(2147483647) NOT NULL, `is_special` BOOLEAN NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL> > ) ``` ```sql -- Query purchase events SELECT Purchase.* FROM `customer-events` WHERE Purchase IS NOT NULL; -- Query pageview events SELECT Pageview.* FROM `customer-events` WHERE Pageview IS NOT NULL; ``` ```sql -- Query purchase events SELECT connect_union_field_0.* FROM `customer-events` WHERE connect_union_field_0 IS NOT NULL; -- Query pageview events SELECT connect_union_field_1.* FROM `customer-events` WHERE connect_union_field_1 IS NOT NULL; ``` ```sql -- Query purchase events SELECT action.purchase.* FROM `customer-events` WHERE action.purchase IS NOT NULL; -- Query pageview events SELECT action.pageview.* FROM `customer-events` WHERE action.pageview IS NOT NULL; ``` ```sql { "type": "record", "namespace": "io.confluent.examples.avro", "name": "AllTypes", "fields": [ { "name": "event_type", "type": [ { "type": "record", "name": "Order", "fields": [ {"name": "order_id", "type": "string"}, {"name": "amount", "type": "double"} ] }, { "type": "record", "name": "Shipment", "fields": [ {"name": "tracking_id", "type": "string"}, {"name": "status", "type": "string"} ] } ] } ] } ``` ```sql { "$schema":
"http://json-schema.org/draft-07/schema#", "title": "AllTypes", "type": "object", "oneOf": [ { "type": "object", "title": "Order", "properties": { "order_id": { "type": "string" }, "amount": { "type": "number" } }, "required": ["order_id", "amount"] }, { "type": "object", "title": "Shipment", "properties": { "tracking_id": { "type": "string" }, "status": { "type": "string" } }, "required": ["tracking_id", "status"] } ] } ``` ```sql syntax = "proto3"; package io.confluent.examples.proto; message Order { string order_id = 1; double amount = 2; } message Shipment { string tracking_id = 1; string status = 2; } message AllTypes { oneof event_type { Order order = 1; Shipment shipment = 2; } } ``` ```sql SHOW CREATE TABLE `events`; ``` ```sql CREATE TABLE `events` ( `key` VARBINARY(2147483647), `event_type` ROW< `Order` ROW<`order_id` VARCHAR(2147483647) NOT NULL, `amount` DOUBLE NOT NULL>, `Shipment` ROW<`tracking_id` VARCHAR(2147483647) NOT NULL, `status` VARCHAR(2147483647) NOT NULL> > NOT NULL ) ``` ```sql -- Query orders SELECT event_type.Order.* FROM `events` WHERE event_type.Order IS NOT NULL; -- Query shipments SELECT event_type.Shipment.* FROM `events` WHERE event_type.Shipment IS NOT NULL; ``` ```sql CREATE TABLE `events` ( `key` VARBINARY(2147483647), `connect_union_field_0` ROW<`amount` DOUBLE NOT NULL, `order_id` VARCHAR(2147483647) NOT NULL>, `connect_union_field_1` ROW<`status` VARCHAR(2147483647) NOT NULL, `tracking_id` VARCHAR(2147483647) NOT NULL> ) ``` ```sql -- Query orders SELECT connect_union_field_0.* FROM `events` WHERE connect_union_field_0 IS NOT NULL; -- Query shipments SELECT connect_union_field_1.* FROM `events` WHERE connect_union_field_1 IS NOT NULL; ``` ```sql CREATE TABLE `events` ( `key` VARBINARY(2147483647), `AllTypes` ROW< `event_type` ROW< `order` ROW<`order_id` VARCHAR(2147483647) NOT NULL, `amount` DOUBLE NOT NULL>, `shipment` ROW<`tracking_id` VARCHAR(2147483647) NOT NULL, `status` VARCHAR(2147483647) NOT NULL> > > ) ``` ```sql --
Query orders SELECT AllTypes.event_type.order.* FROM `events` WHERE AllTypes.event_type.order IS NOT NULL; -- Query shipments SELECT AllTypes.event_type.shipment.* FROM `events` WHERE AllTypes.event_type.shipment IS NOT NULL; ``` ```sql CREATE TABLE `events` ( `key` VARBINARY(2147483647), `value` VARBINARY(2147483647) ) ``` ```sql ALTER TABLE events SET ( 'value.format' = 'avro-registry', 'value.avro-registry.subject-names' = 'com.example.events.OrderEvent;com.example.events.ShipmentEvent' ); ``` ```sql ALTER TABLE events SET ( 'key.format' = 'avro-registry', 'key.avro-registry.subject-names' = 'com.example.events.OrderKey' ); ``` ```sql avro-registry ``` ```sql json-registry ``` ```sql proto-registry ``` --- ### How-to Guides for Developing Flink Applications on Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/how-to-guides/overview.html How-to Guides for Confluent Cloud for Apache Flink¶ Discover how Confluent Cloud for Apache Flink® can help you accomplish common processing tasks such as joins and aggregations. This section provides step-by-step guidance on how to use Flink to process your data efficiently and effectively. Aggregate a Stream in a Tumbling Window Combine Streams and Track Most Recent Records Compare Current and Previous Values in a Data Stream Convert the Serialization Format of a Topic Create a User Defined Function Handle Multiple Event Types Process Schemaless Events Resolve Statement Issues Run a Snapshot Query Scan and Summarize Tables View Time Series Data Flink actions¶ Confluent Cloud for Apache Flink provides Flink Actions that enable you to perform specific data-processing tasks on topics with minimal configuration. These actions are designed to simplify common workloads by providing a user-friendly interface to configure and execute them. Create an Embedding: Convert data in a topic’s column into a vector embedding for AI model inference. 
Deduplicate Rows in a Table: Remove duplicate records from a topic based on specified fields, ensuring that only unique records are retained in the output topic. Mask Fields in a Table: Mask sensitive data in specified fields of a topic by replacing the original data with a static value. Transform a Topic: Change a topic’s properties by applying custom Flink SQL transformations. Related content¶ Video: How to Set Idle Timeouts Video: How to Analyze Data from a REST API with Flink SQL Video: How To Use Streaming Joins with Apache Flink Video: How to Visualize Real-Time Data from Apache Kafka using Apache Flink SQL and Streamlit Use Flink SQL with Kafka, Streamlit, and the Alpaca API Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. --- ### Process schemaless events with Flink SQL in Confluent Cloud | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/how-to-guides/process-schemaless-events.html Process Schemaless Events with Confluent Cloud for Apache Flink¶ This guide explains how to use Confluent Cloud for Apache Flink to handle and process events in Apache Kafka® topics that don’t use serializers that are compatible with Schema Registry, while still leveraging Schema Registry for data processing with Flink SQL. Overview¶ When working with Kafka topics containing events that aren’t serialized with Schema Registry-compatible serializers, you can still use Flink SQL to process your data. This approach enables you to handle “schemaless” events by defining a schema separately in Schema Registry. Prerequisites¶ Access to Confluent Cloud A Kafka topic containing events you want to process Appropriate permissions to access Schema Registry in Confluent Cloud Step 1: Submit your schema to Schema Registry¶ Log in to the Confluent Cloud Console. Navigate to the Topics Overview page. Locate your topic and click it to open the topic details page. Click Set a schema.
Submit your schema in Avro, Protobuf, or JSON format. Note: With JSON you can define a partial schema, which means that not all fields that can exist in the payload need to be defined in the schema at first. Flink ignores fields that aren’t defined. Also, the order of these fields doesn’t matter for JSON. This differs from Avro and Protobuf, where you must define all fields in the right order. If some fields don’t appear in every event, you can mark these fields as optional. The following example schemas show how sensor data might be represented in full JSON, partial JSON, Avro, and Protobuf formats. Full JSON: { "$schema": "http://json-schema.org/draft-07/schema#", "additionalProperties": false, "properties": { "humidity": { "description": "The humidity reading as a percentage", "type": "number" }, "id": { "description": "The unique identifier for the event", "type": "string" }, "temperature": { "description": "The temperature reading in Celsius", "type": "number" }, "timestamp": { "description": "The timestamp of the event in milliseconds since the epoch", "type": "integer" } }, "required": [ "id" ], "title": "DynamicEvent", "type": "object" } Partial JSON: { "$schema": "http://json-schema.org/draft-07/schema#", "additionalProperties": false, "properties": { "id": { "description": "The unique identifier for the event", "type": "string" } }, "required": [ "id" ], "title": "DynamicEvent", "type": "object" } Avro: { "fields": [ { "name": "id", "type": "string" }, { "default": null, "name": "timestamp", "type": [ "null", "long" ] }, { "default": null, "name": "temperature", "type": [ "null", "float" ] }, { "default": null, "name": "humidity", "type": [ "null", "float" ] } ], "name": "DynamicEvent", "type": "record" } Protobuf: syntax = "proto3"; package example; message DynamicEvent { string id = 1; optional int64 timestamp = 2; optional float temperature = 3; optional float humidity = 4; } Step 2: Query your table¶ Once you’ve submitted the schema, you can start
querying your topic immediately by using Flink SQL. The defined schema is used to interpret the data, even if the events themselves don’t contain schema information. Flink first tries to deserialize the data as if it were serialized with Schema Registry serializers, and otherwise treats the incoming bytes as Avro, Protobuf, or JSON. Important considerations¶ When possible, use the Schema Registry serializers to gain the benefits of properly governing your data streams. This method works even if your events don’t include schema version information in their byte stream. You can submit a partial schema only for JSON. Flink processes the defined fields and ignores the rest. With this approach, automatic schema evolution within the stream is not supported. If you want to evolve the schema, you must evolve it manually and consider the impact as described in Schema Evolution and Compatibility for Schema Registry on Confluent Cloud. Related content¶ Handle Multiple Event Types Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.
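
The fallback described in Step 2 depends on whether a message carries the Schema Registry framing. In the Confluent wire format, a serialized payload begins with a magic byte of 0 followed by a 4-byte big-endian schema ID. The following sketch only illustrates that header check; it is not how Flink itself is implemented, and `WireFormat` and `schemaId` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.OptionalInt;

/** Hypothetical helper that inspects the Schema Registry header of a Kafka payload. */
final class WireFormat {
    private static final byte MAGIC_BYTE = 0x0;
    private static final int HEADER_LENGTH = 5; // 1 magic byte + 4-byte schema ID

    /** Returns the schema ID if the payload starts with the header, otherwise empty. */
    static OptionalInt schemaId(byte[] payload) {
        if (payload == null || payload.length < HEADER_LENGTH || payload[0] != MAGIC_BYTE) {
            return OptionalInt.empty();
        }
        // ByteBuffer reads big-endian by default, which matches the wire format.
        return OptionalInt.of(ByteBuffer.wrap(payload, 1, 4).getInt());
    }

    public static void main(String[] args) {
        byte[] framed = {0, 0, 0, 0, 42, '{', '}'};
        byte[] plainJson = "{\"id\":\"sensor-1\"}".getBytes(StandardCharsets.UTF_8);
        System.out.println(schemaId(framed).isPresent());    // header found, schema ID 42
        System.out.println(schemaId(plainJson).isPresent()); // no header: plain JSON bytes
    }
}
```

A check like this is a heuristic: a schemaless binary payload can happen to start with a zero byte, which is one more reason to prefer the Schema Registry serializers when you control the producer.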
#### Code Examples ```sql { "$schema": "http://json-schema.org/draft-07/schema#", "additionalProperties": false, "properties": { "humidity": { "description": "The humidity reading as a percentage", "type": "number" }, "id": { "description": "The unique identifier for the event", "type": "string" }, "temperature": { "description": "The temperature reading in Celsius", "type": "number" }, "timestamp": { "description": "The timestamp of the event in milliseconds since the epoch", "type": "integer" } }, "required": [ "id" ], "title": "DynamicEvent", "type": "object" } ``` ```sql { "$schema": "http://json-schema.org/draft-07/schema#", "additionalProperties": false, "properties": { "id": { "description": "The unique identifier for the event", "type": "string" } }, "required": [ "id" ], "title": "DynamicEvent", "type": "object" } ``` ```sql { "fields": [ { "name": "id", "type": "string" }, { "default": null, "name": "timestamp", "type": [ "null", "long" ] }, { "default": null, "name": "temperature", "type": [ "null", "float" ] }, { "default": null, "name": "humidity", "type": [ "null", "float" ] } ], "name": "DynamicEvent", "type": "record" } ``` ```sql syntax = "proto3"; package example; message DynamicEvent { string id = 1; optional int64 timestamp = 2; optional float temperature = 3; optional float humidity = 4; } ``` --- ### Profile a Query with Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/how-to-guides/profile-query.html Profile a Query with Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables you to profile the performance of your queries. The Query Profiler provides enhanced visibility into how a Flink statement is processing data, which enables rapid identification of bottlenecks, data skew issues, and other performance issues. The profiler updates metrics in near real-time, enabling you to monitor query performance as data flows through your pipeline. 
For more information about the Query Profiler, see Flink SQL Query Profiler. Prerequisites¶ Access to Confluent Cloud. The OrganizationAdmin, EnvironmentAdmin, or FlinkAdmin role for creating compute pools, or the FlinkDeveloper role if you already have a compute pool. If you don’t have the appropriate role, contact your OrganizationAdmin or EnvironmentAdmin. For more information, see Grant Role-Based Access in Confluent Cloud for Apache Flink. A provisioned Flink compute pool. Step 1: Analyze and run a statement¶ In this step, you use the EXPLAIN statement to perform a static analysis on a query and then start the query. The query is a temporal join between the orders and customers tables. Log in to Confluent Cloud and navigate to your Flink workspace. Run the following EXPLAIN statement to view a static analysis of a query. EXPLAIN SELECT o.order_id, o.`$rowtime`, c.customer_id, c.name, o.price FROM examples.marketplace.orders o JOIN examples.marketplace.customers FOR SYSTEM_TIME AS OF o.`$rowtime` c ON o.customer_id = c.customer_id WHERE o.`$rowtime` >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR; Your output should resemble: == Physical Plan == StreamSink [12] +- StreamCalc [11] +- StreamTemporalJoin [10] +- StreamExchange [3] : +- StreamCalc [2] : +- StreamTableSourceScan [1] +- StreamExchange [9] +- StreamCalc [8] +- StreamChangelogNormalize [7] +- StreamExchange [6] +- StreamCalc [5] +- StreamTableSourceScan [4] ... Run the statement. SELECT o.order_id, o.`$rowtime`, c.customer_id, c.name, o.price FROM examples.marketplace.orders o JOIN examples.marketplace.customers FOR SYSTEM_TIME AS OF o.`$rowtime` c ON o.customer_id = c.customer_id WHERE o.`$rowtime` >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR; Step 2: Profile the query¶ In this step, you use the Query Profiler to monitor the performance of the query. The Query Profiler helps identify performance bottlenecks by showing where records are flowing slowly or backing up in the pipeline. 
Navigate to your environment’s overview page. In the navigation menu, click Flink, and in the overview page, click Flink statements. In the statement list, click your statement to open the statement details page. Click Query profiler to view the profiler graph. The Query Profiler opens and shows a graph of the Flink tasks that are running. The graph shows the physical execution plan of your query, with each operator represented as a node. The nodes are connected by arrows showing the flow of data between operators. For each operator node, you can see: The operator name and ID Metrics like Idleness and Backpressure Resource utilization like CPU and memory usage Key operators in the current temporal join query include: StreamTableSourceScan nodes [1] and [4] reading from the orders and customers tables StreamCalc nodes [2], [5], [8], [11] performing filtering and projection StreamExchange nodes [3], [6], [9] handling data redistribution between tasks StreamChangelogNormalize [7] processing changelog records from the versioned customers table StreamTemporalJoin [10] joining the orders with customer versions based on event time StreamSink [12] writing results to the output Click the title bar of the TemporalJoin node to open the operator details pane. Click Operator to view details about the operators in the task. Expand State Size to view the amount of data currently stored by the task. In the graph, click on other operator nodes to see metrics about their performance. Related content¶ Flink SQL Query Profiler EXPLAIN Statement Flink SQL Statements Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. 
#### Code Examples ```sql EXPLAIN SELECT o.order_id, o.`$rowtime`, c.customer_id, c.name, o.price FROM examples.marketplace.orders o JOIN examples.marketplace.customers FOR SYSTEM_TIME AS OF o.`$rowtime` c ON o.customer_id = c.customer_id WHERE o.`$rowtime` >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR; ``` ```text == Physical Plan == StreamSink [12] +- StreamCalc [11] +- StreamTemporalJoin [10] +- StreamExchange [3] : +- StreamCalc [2] : +- StreamTableSourceScan [1] +- StreamExchange [9] +- StreamCalc [8] +- StreamChangelogNormalize [7] +- StreamExchange [6] +- StreamCalc [5] +- StreamTableSourceScan [4] ... ``` ```sql SELECT o.order_id, o.`$rowtime`, c.customer_id, c.name, o.price FROM examples.marketplace.orders o JOIN examples.marketplace.customers FOR SYSTEM_TIME AS OF o.`$rowtime` c ON o.customer_id = c.customer_id WHERE o.`$rowtime` >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR; ``` --- ### Resolve Statement Issues in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/how-to-guides/resolve-common-query-problems.html Resolve Statement Issues in Confluent Cloud for Apache Flink¶ Inefficient Flink SQL queries in Confluent Cloud for Apache Flink® can cause performance issues that impact your data processing pipeline. These inefficiencies can be identified early through warnings when you submit your query, or they become apparent later when your statement enters a DEGRADED state during execution. This page explains how to identify and resolve query inefficiencies, providing a comprehensive approach to troubleshooting statement performance problems. Statement enters DEGRADED state¶ When a Flink statement is unable to make consistent progress, it may enter a DEGRADED state. This typically occurs due to performance bottlenecks or resource constraints. You may see the following error message: Your Flink statement has entered a Degraded state because it is unable to make consistent progress. 
This can be caused by inefficient query logic or insufficient compute resources. Please review your statement for performance bottlenecks. If the issue persists, consider scaling your compute pool or contacting Confluent support for assistance. To resolve a DEGRADED state, follow these steps: Check for Statement Advisor warnings: Review and resolve any warnings that were returned during query submission. If you’re unsure whether warnings were shown, run your query with the EXPLAIN statement to see if warnings are generated. Profile your query: Use the Query Profiler to identify performance bottlenecks and data flow issues in your statement. Review compute resources: Check if your compute pool has reached its maximum CFU limit. If so, consider: Increasing the maximum CFU limit for your compute pool Moving the statement to a dedicated compute pool with more CFU capacity Optimizing your query to reduce resource consumption Optimize query logic: Based on the warnings and profiling results, implement the specific optimizations described in the following warning sections. Primary key differs from derived upsert key¶ [Warning] The primary key "" does not match the upsert key "" that is derived from the query. If the primary key and upsert key don't match, the system needs to add a state-intensive operation for correction, which can result in a DEGRADED statement and higher CFU consumption. If possible, revisit the table declaration with the primary key or change your query. For more information, see https://cnfl.io/primary_vs_upsert_key. This warning occurs when you insert data into a table where the table’s defined PRIMARY KEY doesn’t align with the key columns derived from the INSERT INTO ... SELECT or CREATE TABLE ... AS SELECT query’s grouping or source. When the keys mismatch, Flink must introduce an expensive internal operator (UpsertMaterialize) to ensure correctness, which consumes more state and resources. 
The following example illustrates a query that triggers this warning: -- Create a table to store customer total orders CREATE TABLE customer_orders ( total_orders INT PRIMARY KEY NOT ENFORCED, -- Primary Key is total_orders customer_name STRING ); -- Insert aggregated order counts per customer INSERT INTO customer_orders SELECT SUM(order_count), customer_name -- Upsert key derived from GROUP BY is customer_name FROM ( VALUES ('Bob', 2), -- Bob placed 2 orders ('Alice', 1), -- Alice placed 1 order ('Bob', 2) -- Bob placed 2 more orders ) AS OrderData(customer_name, order_count) GROUP BY customer_name; To resolve this warning: Align Primary Key: Modify the PRIMARY KEY definition in your CREATE TABLE statement to match the columns used to uniquely identify rows in your INSERT query (often the GROUP BY columns). In the example above, changing the primary key to customer_name resolves the warning. Modify Query: Adjust your INSERT INTO ... SELECT query so the selected columns or grouping aligns with the existing primary key definition. This might involve changing the GROUP BY clause or the columns being selected. Check for warnings: If you’re unsure whether your query produces this warning, run it with the EXPLAIN statement to see if warnings are generated. High state operator without state TTL¶ [Warning] Your query includes one or more highly state-intensive operators but does not set a time-to-live (TTL) value, which means that the system potentially needs to store an infinite amount of state. This can result in a DEGRADED statement and higher CFU consumption. If possible, change your query to use a different operator, or set a time-to-live (TTL) value. For more information, see https://cnfl.io/high_state_intensive_operators. Certain SQL operations, like joins on unbounded streams or aggregations without windowing, require Flink to maintain internal state. 
If this state isn’t configured to expire (using a Time-To-Live or TTL setting), it can grow indefinitely, leading to excessive memory usage, performance degradation, and higher costs. The following example illustrates a query that triggers this warning: -- Joining two unbounded streams without TTL SELECT c.*, o.* FROM `examples`.`marketplace`.`clicks` c INNER JOIN `examples`.`marketplace`.`orders` o ON c.user_id = o.customer_id; To resolve this warning: Set State TTL: Configure a state time-to-live (TTL) for the table(s) involved in the stateful operation. This ensures that state older than the specified duration is automatically cleared. This can be done for the full statement via the SET 'sql.state-ttl' option or for individual tables via State TTL Hints. Use Windowed Operations: If applicable, rewrite your query to use windowed operations, like windowed joins or windowed aggregations, instead of unbounded operations. Windows inherently limit the amount of state required. Refactor Query: Analyze whether the stateful operation is necessary or whether the query logic can be changed to avoid large state requirements. Check for warnings: If you’re unsure whether your query produces this warning, run it with the EXPLAIN statement to see if warnings are generated. Missing window_start or window_end in GROUP BY for window aggregation¶ [Warning] Your query contains only "window_end" in the GROUP BY clause, with no corresponding "window_start". This means that the query is considered a regular aggregation query and not a windowed aggregation, which can result in unexpected, continuously updating output and higher CFU consumption. if you want a windowed aggregation in your query, ensure that you include both "window_start" and "window_end" in the GROUP BY clause. For more information, see https://cnfl.io/regular_vs_window_aggregation. A similar warning appears if only window_start is included without window_end. 
When performing windowed aggregations, using functions like TUMBLE, HOP, CUMULATE, or SESSION, you typically group by the window boundaries (window_start and window_end) along with any other grouping keys. If you include only one of the window boundary columns, either window_start or window_end, in the GROUP BY clause, Flink interprets this as a regular, non-windowed aggregation. This leads to continuously updating results for each input row rather than a single result per window, which is usually not the intended behavior and can consume more resources. The following example illustrates a query that triggers this warning: -- Incorrect GROUP BY for TUMBLE window SELECT window_end, SUM(price) as `sum` FROM TABLE( TUMBLE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES) ) GROUP BY window_end; -- Missing window_start To resolve this warning: Include both window boundaries: When performing windowed aggregations, ensure that your GROUP BY clause includes both window_start and window_end. Check for warnings: If you’re unsure whether your query produces this warning, run it with the EXPLAIN statement to see if warnings are generated. The following example shows the revised query that resolves this warning: -- Correct GROUP BY for TUMBLE window SELECT window_start, window_end, SUM(price) as `sum` FROM TABLE( TUMBLE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES) ) GROUP BY window_start, window_end; -- Includes both window boundaries Session window without a PARTITION BY key¶ [Warning] Your query uses a SESSION window without a PARTITION BY clause. This results in all data being processed by a single, non-parallel task, which can create a significant bottleneck, leading to poor performance and high resource consumption. To improve performance and enable parallel execution, specify a PARTITION BY key in your SESSION window. 
For more information, see https://cnfl.io/session_without_partioning. When using a SESSION window, data is grouped into sessions based on periods of activity, which are separated by a specified gap of inactivity. If you don’t include a PARTITION BY clause, all data will be sent to a single, non-parallel task to correctly identify these sessions. This creates a significant performance bottleneck and prevents the query from scaling. The following example shows a query that triggers this warning: -- This query uses a SESSION window without a PARTITION BY key SELECT * FROM SESSION( TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES ); To resolve this warning: Add a PARTITION BY key: Modify your SESSION window definition to include a PARTITION BY clause. This partitions the data by the specified key(s), allowing the sessionization to be performed independently and in parallel for each partition. This is important for performance and scalability. Check for warnings: If you’re unsure whether your query produces this warning, run it with the EXPLAIN statement to see if warnings are generated. The following example shows the revised query that resolves the warning: -- Corrected query with PARTITION BY to enable parallel execution SELECT * FROM SESSION( TABLE `examples`.`marketplace`.`orders` PARTITION BY customer_id, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES ); Related content¶ Profile a Query EXPLAIN Statement SET HINTS Window Aggregation Window TopN Window Join Window Deduplication Interval join Temporal join Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```text Your Flink statement has entered a Degraded state because it is unable to make consistent progress. This can be caused by inefficient query logic or insufficient compute resources. Please review your statement for performance bottlenecks. 
If the issue persists, consider scaling your compute pool or contacting Confluent support for assistance. ``` ```text [Warning] The primary key "" does not match the upsert key "" that is derived from the query. If the primary key and upsert key don't match, the system needs to add a state-intensive operation for correction, which can result in a DEGRADED statement and higher CFU consumption. If possible, revisit the table declaration with the primary key or change your query. For more information, see https://cnfl.io/primary_vs_upsert_key. ``` ```sql -- Create a table to store customer total orders CREATE TABLE customer_orders ( total_orders INT PRIMARY KEY NOT ENFORCED, -- Primary Key is total_orders customer_name STRING ); -- Insert aggregated order counts per customer INSERT INTO customer_orders SELECT SUM(order_count), customer_name -- Upsert key derived from GROUP BY is customer_name FROM ( VALUES ('Bob', 2), -- Bob placed 2 orders ('Alice', 1), -- Alice placed 1 order ('Bob', 2) -- Bob placed 2 more orders ) AS OrderData(customer_name, order_count) GROUP BY customer_name; ``` ```text [Warning] Your query includes one or more highly state-intensive operators but does not set a time-to-live (TTL) value, which means that the system potentially needs to store an infinite amount of state. This can result in a DEGRADED statement and higher CFU consumption. If possible, change your query to use a different operator, or set a time-to-live (TTL) value. For more information, see https://cnfl.io/high_state_intensive_operators. 
``` ```sql -- Joining two unbounded streams without TTL SELECT c.*, o.* FROM `examples`.`marketplace`.`clicks` c INNER JOIN `examples`.`marketplace`.`orders` o ON c.user_id = o.customer_id; ``` ```text [Warning] Your query contains only "window_end" in the GROUP BY clause, with no corresponding "window_start". This means that the query is considered a regular aggregation query and not a windowed aggregation, which can result in unexpected, continuously updating output and higher CFU consumption. if you want a windowed aggregation in your query, ensure that you include both "window_start" and "window_end" in the GROUP BY clause. For more information, see https://cnfl.io/regular_vs_window_aggregation. ``` ```sql -- Incorrect GROUP BY for TUMBLE window SELECT window_end, SUM(price) as `sum` FROM TABLE( TUMBLE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES) ) GROUP BY window_end; -- Missing window_start ``` ```sql -- Correct GROUP BY for TUMBLE window SELECT window_start, window_end, SUM(price) as `sum` FROM TABLE( TUMBLE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES) ) GROUP BY window_start, window_end; -- Includes both window boundaries ``` ```text [Warning] Your query uses a SESSION window without a PARTITION BY clause. This results in all data being processed by a single, non-parallel task, which can create a significant bottleneck, leading to poor performance and high resource consumption. To improve performance and enable parallel execution, specify a PARTITION BY key in your SESSION window. For more information, see https://cnfl.io/session_without_partioning. 
``` ```sql -- This query uses a SESSION window without a PARTITION BY key SELECT * FROM SESSION( TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES ); ``` ```sql -- Corrected query with PARTITION BY to enable parallel execution SELECT * FROM SESSION( TABLE `examples`.`marketplace`.`orders` PARTITION BY customer_id, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES ); ``` --- ### Run a Snapshot Query with Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/how-to-guides/run-snapshot-query.html Run a Snapshot Query with Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® supports snapshot queries that read data from a table at a specific point in time. In contrast with a streaming query, which runs continuously and returns results incrementally, a snapshot query runs, returns results, and then exits. This guide shows how to run a snapshot query on a Flink table. Step 1: Create an example data stream Step 2: Run a snapshot query on the topic Step 3: Set the snapshot mode in SQL Note Snapshot query is an Early Access Program feature in Confluent Cloud for Apache Flink. An Early Access feature is a component of Confluent Cloud introduced to gain feedback. This feature should be used only for evaluation and non-production testing purposes or to provide feedback to Confluent, particularly as it becomes more widely available in follow-on preview editions. Early Access Program features are intended for evaluation use in development and testing environments only, and not for production use. Early Access Program features are provided: (a) without support; (b) “AS IS”; and (c) without indemnification, warranty, or condition of any kind. No service level commitment will apply to Early Access Program features. Early Access Program features are considered to be a Proof of Concept as defined in the Confluent Cloud Terms of Service. 
Confluent may discontinue providing preview releases of the Early Access Program features at any time in Confluent’s sole discretion. Prerequisites¶ Access to Confluent Cloud. The OrganizationAdmin, EnvironmentAdmin, or FlinkAdmin role for creating compute pools, or the FlinkDeveloper role if you already have a compute pool. If you don’t have the appropriate role, contact your OrganizationAdmin or EnvironmentAdmin. For more information, see Grant Role-Based Access in Confluent Cloud for Apache Flink. A provisioned Flink compute pool. Step 1: Create an example data stream¶ In this step, you create a Datagen source connector that produces a stream of data. If you have a topic with data, you can skip this step and proceed to Step 2: Run a snapshot query on the topic. In the Confluent Cloud UI, go to the Environments page. Select the environment where you want to create the connector. In the Overview page, click the cluster that you want to use. In the navigation menu, click Connectors. Click Add connector, and in the Connector Plugins page, click Sample Data. In the Launch Sample Data dialog, click Users, and click Launch. It may take a few minutes to create the connector. Step 2: Run a snapshot query on the topic¶ In the navigation menu, click Topics. In the topics list, find the topic you want to query. If you created a Datagen source connector, the topic is named sample_data_users. Click the topic name to open the topic details page. Click Query with Flink. A Flink workspace opens with a SQL editor that you can use to run a snapshot query. In the cell, find the Mode dropdown, which defaults to Streaming. Change the mode to Snapshot and click Run. The query runs and returns all of the messages that have been produced to the topic at the current point in time. Step 3: Set the snapshot mode in SQL¶ You can set the snapshot mode in SQL by using the SET statement to assign the sql.snapshot.mode configuration option. 
In the cell, prepend the SELECT statement with the following SET statement: SET 'sql.snapshot.mode' = 'now'; SELECT * FROM ``.``.`sample_data_users`; Click Run. The query runs and returns all of the messages that have been produced to the topic at the current point in time. Related content¶ Snapshot Queries Query Tableflow Tables with Flink Statements Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql sample_data_users ``` ```sql sql.snapshot.mode ``` ```sql SET 'sql.snapshot.mode' = 'now'; SELECT * FROM ``.``.`sample_data_users`; ``` --- ### Scan and Summarize Flink Tables in Confluent Cloud | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/how-to-guides/scan-and-summarize-tables.html Scan and Summarize Tables with Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® provides graphical tools in your workspaces that enable scanning and summarizing data visually in Flink tables. Distributions of values for each column in a table are shown in embedded charts, or sparklines. You can highlight values in one chart to filter corresponding values in all columns, revealing connections and relationships in your data. Overview¶ When you explore data in a table, you frequently want to find a row (scan), or you may want to understand the shape of the data (summarize). Scan¶ Cloud Console workspaces provide a search box that enables scanning the data for particular rows. For example, if you’re interested in the orders that are placed by a particular customer, you can enter the customer’s ID in the search box to scan the table for relevant rows. Summarize¶ In a Cloud Console workspace, when you run a Flink SQL statement that returns a table, sparklines are displayed automatically and show the distribution of distinct values in each column. These charts update automatically as new rows arrive from the data stream. 
The workspace enables filtering rows by interacting with these charts. For example, in an orders table, you can apply a filter that shows only rows for low-price items and compare these results with another filter that shows high-price items to see if there’s a different distribution of items between the price ranges. Explore example data¶ Log in to the Confluent Cloud Console and navigate to an environment that hosts Flink SQL. In the navigation menu, click Stream processing to open the Stream processing page. If you have a workspace set up already, click its tile, or click Create workspace to create a new one. In the workspace, use the Catalog and Database dropdown controls to select the examples catalog and the marketplace database. Run the following statement to query the orders stream for all rows. SELECT * FROM orders; Your output should resemble: At the top of each column, a chart is displayed. The charts update as new rows stream into the query results. Each chart shows the distribution of distinct values in the column, for strings, booleans, numbers, and categories. An icon displays the data type of the column. Also, the arrow icon enables sorting rows by the column values. At the bottom of each column, aggregated values are displayed that summarize aspects of the data in the column, like the count of rows and the number of distinct values, or cardinality. For columns with numerical values, you can see statistics, like the average, minimum, and maximum values. The number of rows displayed is limited to 5000 or to the LIMIT value you specify in your query. For example, the following statement limits the query result to 50 rows. SELECT * FROM orders LIMIT 50; At the bottom of the price column, change the dropdown control from Count to Average. The average value of the most recent prices displays and updates as new rows arrive. Select other statistics for prices, like Max and Min. 
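The statistics shown at the bottom of each column can also be approximated directly in SQL. The following is a minimal sketch, assuming the examples catalog and marketplace database are selected as in the steps above; the column aliases are illustrative. Unlike the workspace summary, which covers only the displayed rows, this statement aggregates over every row it reads and keeps updating as new orders arrive.

```sql
-- Continuously updating summary of the orders stream.
-- Aliases are illustrative; COUNT(DISTINCT ...) gives the cardinality.
SELECT
  COUNT(*)                    AS order_count,
  COUNT(DISTINCT customer_id) AS distinct_customers,
  AVG(price)                  AS avg_price,
  MIN(price)                  AS min_price,
  MAX(price)                  AS max_price
FROM orders;
```

Because this is an unbounded aggregation, it keeps state for as long as it runs, so stop the statement when you have finished exploring.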
Search for values¶ The search box enables finding values across all columns in the currently displayed result set. The search box doesn’t filter the data. It’s useful for scanning for a particular row or narrowing the results down to a particular row. In the search box, type “3000”. All rows that have a customer_id value of 3000 are displayed, which enables viewing all orders from this customer. Click x in the search to clear it. In the search box, type “1000”. All rows that have a product_id value of 1000 are displayed, which enables viewing all orders for this product. Click x in the search box to clear it. In the search box, type “3050”, and in the price column, click the double-arrow icon. All rows for customer 3050 are displayed, and the rows are sorted by price, from lowest to highest. In the price column, click the arrow icon. All rows for customer 3050 are displayed, and the rows are sorted by price, from highest to lowest. Click x in the search box to clear it, and click the arrow icon in the price column to reset the rows to unsorted. Apply a filter¶ Any column that has numerical or datetime data is filterable. Filters apply across all columns in the table. Filters apply only in the graphical display and don’t affect the underlying data stream. Hover over the leftmost bar in the price chart. The cursor changes to a + target, and a summary of the rows represented by the bar appears in a popup. Click-drag, or brush the cursor over the first three bars in the price chart. A filter is applied to the price data, so only the rows with prices that fall within the selected range are displayed. This filter shows the orders for the least expensive products. When a filter is applied, the unfiltered data is shown in gray. Above the charts, the current filter is displayed and if you click on it you will see (and be able to adjust) its settings. You can apply more than one filter. In the customer_id chart, brush the first three bars. 
A filter is applied to the customer data. In conjunction with the filter you applied already to the price data, the displayed rows show the least expensive products ordered by customers with IDs between 3000 and 3029, inclusive. Click x in the filters to clear them. View changes over time¶ If your data contains a datetime column, each numerical column can show its average value over time in addition to its distribution. If the data is filtered, the unfiltered average value is also shown for context. You can hover over the chart for exact values. Related content¶ View Time Series Data Aggregate a Stream in a Tumbling Window Compare Current and Previous Values in a Data Stream Convert the Serialization Format of a Topic Flink action: Mask Fields in a Table Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql SELECT * FROM orders; ``` ```sql SELECT * FROM orders LIMIT 50; ``` --- ### Transform a Topic with Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/how-to-guides/transform-topic.html Transform a Topic with Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables generating a transformed topic from an input topic’s properties, like partition count, key, serialization format, and field names, with only a few clicks. In this guide, you create a Flink table and apply a transformation that creates an output topic with these changes: Rename a field Specify a bucket key Change the key and value serialization format Specify a different partition count The Transform Topic action creates a Flink SQL statement for you, but no knowledge of Flink SQL is required to use it. This guide shows the following steps: Step 1: Create a users table Step 2: Apply the Transform Topic action Step 3: Inspect the transformed topic Prerequisites¶ Access to Confluent Cloud. 
The OrganizationAdmin, EnvironmentAdmin, or FlinkAdmin role for creating compute pools, or the FlinkDeveloper role if you already have a compute pool. If you don’t have the appropriate role, contact your OrganizationAdmin or EnvironmentAdmin. For more information, see Grant Role-Based Access in Confluent Cloud for Apache Flink. A provisioned Flink compute pool. Step 1: Create a users table¶ Log in to Confluent Cloud and navigate to your Flink workspace. Run the following statement to create a users table. -- Create a users table. CREATE TABLE users ( user_id STRING, registertime BIGINT, gender STRING, regionid STRING ); Insert rows with mock data into the users table. -- Populate the table with mock users data. INSERT INTO users VALUES ('Thomas A. Anderson', 1677260724, 'male', 'Region_4'), ('Trinity', 1677260733, 'female', 'Region_4'), ('Morpheus', 1677260742, 'male', 'Region_8'), ('Dozer', 1677260823, 'male', 'Region_1'), ('Agent Smith', 1677260955, 'male', 'Region_0'), ('Persephone', 1677260901, 'female', 'Region_2'), ('Niobe', 1677260921, 'female', 'Region_3'), ('Zee', 1677260922, 'female', 'Region_5'); Inspect the inserted rows. SELECT * FROM users; Your output should resemble: user_id registertime gender regionid Thomas A. Anderson 1677260724 male Region_4 Trinity 1677260733 female Region_4 Morpheus 1677260742 male Region_8 Dozer 1677260823 male Region_1 Agent Smith 1677260955 male Region_0 Persephone 1677260901 female Region_2 Niobe 1677260921 female Region_3 Zee 1677260922 female Region_5 Step 2: Apply the Transform Topic action¶ In the previous step, you created a Flink table and populated it with a few rows. In this step, you apply the Transform Topic action to create a transformed output table. Navigate to the Environments page, and in the navigation menu, click Data portal. In the Data portal page, click the dropdown menu and select the environment for your workspace. 
In the Recently created section, find your users topic and click it to open the details pane. In the details pane, click Actions, and in the Actions list, click Transform topic to open the dialog. In the Action details section, set up the transformation. user_id field: select the Key field checkbox. registertime field: enter registration_time. Partition count property: enter 3. Serialization format property: select JSON Schema. By default, the name of the transformed topic is users_transform, and you can change this as desired. In the Runtime configuration section, configure how the transformation statement will run. (Optional) Select the Flink compute pool to run the transformation statement. The current compute pool is selected as the default. (Optional) Select Run with a service account for production jobs. The service account you select must have the EnvironmentAdmin role to create topics and schemas and to run Flink statements. (Optional) Select Show SQL to view the Flink statement that does the transformation work. Your Flink SQL should resemble: CREATE TABLE `your-env`.`your-cluster`.`users_transform` DISTRIBUTED BY HASH ( `user_id` ) INTO 3 BUCKETS WITH ( 'value.format' = 'json-registry', 'key.format' = 'json-registry' ) AS SELECT `user_id`, `registertime` as `registration_time`, `gender`, `regionid` FROM `your-env`.`your-cluster`.`users`; Click Confirm and run to run the transformation statement. A Summary page displays the result of the job submission, showing the statement name and other details. Step 3: Inspect the transformed topic¶ In the Summary page, click the Output topic link for the users_transform topic, and in the topic’s details pane, click Query to open a Flink workspace. Run the following statement to view the rows in the users_transform table. Note the renamed registration_time column. SELECT * FROM `users_transform`; Click Stop to end the statement. Run the following command to confirm that the user_id field in the transformed table is a key field.
DESCRIBE `users_transform`; Your output should resemble: +-------------------+-----------+----------+------------+ | Column Name | Data Type | Nullable | Extras | +-------------------+-----------+----------+------------+ | user_id | STRING | NULL | BUCKET KEY | | registration_time | BIGINT | NULL | | | gender | STRING | NULL | | | regionid | STRING | NULL | | +-------------------+-----------+----------+------------+ Run the following command to confirm the serialization format and partition count on the transformed topic. SHOW CREATE TABLE `users_transform`; Your output should resemble: CREATE TABLE `your-env`.`your-cluster`.`users_transform` ( `user_id` VARCHAR(2147483647), `registration_time` BIGINT, `gender` VARCHAR(2147483647), `regionid` VARCHAR(2147483647) ) DISTRIBUTED BY HASH(`user_id`) INTO 3 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'kafka.cleanup-policy' = 'delete', 'kafka.max-message-size' = '2097164 bytes', 'kafka.retention.size' = '0 bytes', 'kafka.retention.time' = '7 d', 'key.format' = 'json-registry', 'scan.bounded.mode' = 'unbounded', 'scan.startup.mode' = 'earliest-offset', 'value.format' = 'json-registry' ) Related content¶ Flink action: Deduplicate Rows in a Table Flink action: Mask Fields in a Table Flink action: Create an Embedding Aggregate a Stream in a Tumbling Window Compare Current and Previous Values in a Data Stream Convert the Serialization Format of a Topic Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql -- Create a users table. CREATE TABLE users ( user_id STRING, registertime BIGINT, gender STRING, regionid STRING ); ``` ```sql -- Populate the table with mock users data. INSERT INTO users VALUES ('Thomas A.
Anderson', 1677260724, 'male', 'Region_4'), ('Trinity', 1677260733, 'female', 'Region_4'), ('Morpheus', 1677260742, 'male', 'Region_8'), ('Dozer', 1677260823, 'male', 'Region_1'), ('Agent Smith', 1677260955, 'male', 'Region_0'), ('Persephone', 1677260901, 'female', 'Region_2'), ('Niobe', 1677260921, 'female', 'Region_3'), ('Zee', 1677260922, 'female', 'Region_5'); ``` ```sql SELECT * FROM users; ``` ```sql user_id registertime gender regionid Thomas A. Anderson 1677260724 male Region_4 Trinity 1677260733 female Region_4 Morpheus 1677260742 male Region_8 Dozer 1677260823 male Region_1 Agent Smith 1677260955 male Region_0 Persephone 1677260901 female Region_2 Niobe 1677260921 female Region_3 Zee 1677260922 female Region_5 ``` ```sql users_transform ``` ```sql CREATE TABLE `your-env`.`your-cluster`.`users_transform` DISTRIBUTED BY HASH ( `user_id` ) INTO 3 BUCKETS WITH ( 'value.format' = 'json-registry', 'key.format' = 'json-registry' ) AS SELECT `user_id`, `registertime` as `registration_time`, `gender`, `regionid` FROM `your-env`.`your-cluster`.`users`; ``` ```sql SELECT * FROM `users_transform`; ``` ```sql DESCRIBE `users_transform`; ``` ```sql +-------------------+-----------+----------+------------+ | Column Name | Data Type | Nullable | Extras | +-------------------+-----------+----------+------------+ | user_id | STRING | NULL | BUCKET KEY | | registration_time | BIGINT | NULL | | | gender | STRING | NULL | | | regionid | STRING | NULL | | +-------------------+-----------+----------+------------+ ``` ```sql SHOW CREATE TABLE `users_transform`; ``` ```sql CREATE TABLE `your-env`.`your-cluster`.`users_transform` ( `user_id` VARCHAR(2147483647), `registration_time` BIGINT, `gender` VARCHAR(2147483647), `regionid` VARCHAR(2147483647) ) DISTRIBUTED BY HASH(`user_id`) INTO 3 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'kafka.cleanup-policy' = 'delete', 'kafka.max-message-size' = '2097164 bytes', 'kafka.retention.size' = '0 
bytes', 'kafka.retention.time' = '7 d', 'key.format' = 'json-registry', 'scan.bounded.mode' = 'unbounded', 'scan.startup.mode' = 'earliest-offset', 'value.format' = 'json-registry' ) ``` --- ### View Time Series Data in Confluent Cloud | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/how-to-guides/view-time-series-data.html View Time Series Data with Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables visualizing time-series data in real time. The output of certain SQL statements renders as time-series charts. Whenever a statement’s output has at least one time column and at least one numeric column, it is charted automatically in a time-series graph when you toggle to chart mode. You can further customize charts interactively: you can choose a different x-axis column, add multiple series, change the chart’s time granularity, and filter the overall time range. Prerequisites¶ Access to Confluent Cloud. The OrganizationAdmin, EnvironmentAdmin, or FlinkAdmin role for creating compute pools, or the FlinkDeveloper role if you already have a compute pool. If you don’t have the appropriate role, contact your OrganizationAdmin or EnvironmentAdmin. For more information, see Grant Role-Based Access in Confluent Cloud for Apache Flink. A provisioned Flink compute pool. Step 1: Open a workspace¶ Log in to Confluent Cloud Console at https://confluent.cloud/login. Open a Flink workspace. Use the Catalog and Database dropdown controls to select the examples catalog and the marketplace database. Step 2: Generate time-series data¶ Run the following statement to generate three time-series signals. 
SELECT $rowtime AS row_timestamp, RAND() * 0.10 * SIN(0.10 * UNIX_TIMESTAMP() + 0) AS series1, RAND() * 0.10 * SIN(0.10 * UNIX_TIMESTAMP() + 1.1e3) AS series2, RAND() * 0.03 * SIN(0.10 * UNIX_TIMESTAMP() + 1.2e3) AS series3 FROM orders; Step 3: View time-series data¶ Click the time-series toggle () to open the time-series visualizer. Your output should resemble: The upper pane shows the series1 signal. The lower pane enables scrolling through the data as it streams through the visualizer. On the right side of the lower pane, click and drag it to the left. On the left side of the lower pane, click and drag it to the right. These gestures define the width of the view window that displays in the upper pane. Click and drag it to the right. The view in the upper pane adjusts to display the data within the window. As data continues to stream, the window in the lower pane moves to the left, while the display in the upper pane remains centered on the data selected in the window. Double-click to reset the view. Click Add Column, and in the context menu, select series2 and series3 to display the other signals. Click to download the current visualization as a PNG file. Click the time-series toggle () to close the visualizer. Related content¶ Scan and Summarize Tables Compare Current and Previous Values in a Data Stream Convert the Serialization Format of a Topic Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. 
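As an additional sketch of a statement whose output can be charted, the following query produces one time column and two numeric columns by aggregating orders per minute. It assumes that the orders table in the marketplace database has a numeric price column; the one-minute window size and the output column names are arbitrary choices for illustration.

```sql
-- Per-minute average price and order count.
-- window_start serves as the time column; avg_price and order_count are
-- the numeric series. The price column is an assumption about the table.
SELECT
  window_start,
  AVG(price) AS avg_price,
  COUNT(*) AS order_count
FROM TABLE(
  TUMBLE(TABLE orders, DESCRIPTOR($rowtime), INTERVAL '1' MINUTE))
GROUP BY window_start, window_end;
```

Because the result has a time column and numeric columns, it should meet the charting condition described in this guide.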
#### Code Examples ```sql SELECT $rowtime AS row_timestamp, RAND() * 0.10 * SIN(0.10 * UNIX_TIMESTAMP() + 0) AS series1, RAND() * 0.10 * SIN(0.10 * UNIX_TIMESTAMP() + 1.1e3) AS series2, RAND() * 0.03 * SIN(0.10 * UNIX_TIMESTAMP() + 1.2e3) AS series3 FROM orders; ``` --- ### Best Practices for Moving SQL Statements to Production in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/operate-and-deploy/best-practices.html Move SQL Statements to Production in Confluent Cloud for Apache Flink¶ When you move your Flink SQL statements to production in Confluent Cloud for Apache Flink®, consider the following recommendations and best practices. Validate your watermark strategy Validate or disable idleness handling Choose the correct Schema Registry compatibility type Separate workloads of different priorities into separate compute pools Use event-time temporal joins instead of streaming joins Implement state time-to-live (TTL) Use service account API keys for production Assign custom names to Flink SQL statements Validate your watermark strategy¶ When moving your Flink SQL statements to production, it’s crucial to validate your watermark strategy. Watermarks in Flink track the progress of event time and provide a way to trigger time-based operations. Confluent Cloud for Apache Flink provides a default watermark strategy for all tables, whether they’re created automatically from a Kafka topic or from a CREATE TABLE statement. The default watermark strategy is applied on the $rowtime system column, which is mapped to the associated timestamp of a Kafka record. Watermarks for this default strategy are calculated per Kafka partition, and at least 250 events are required per partition. Here are some situations when you need to define your own custom watermark strategy: When the event time needs to be based on data from the payload and not the timestamp of the Kafka record. If a delay of longer than 7 days can occur. 
When events might not arrive in the exact order they were generated. When data may arrive late due to network latency or processing delays. Validate or disable idleness handling¶ One critical aspect to consider when moving your Flink SQL statements to production is the handling of idleness in data streams. If no events arrive within a certain time (timeout duration) on a Kafka partition, that partition is marked as idle and does not contribute to the watermark calculation until a new event comes in. This situation creates a problem: if some partitions continue to receive events while others are idle, the overall watermark computation, which is based on the minimum across all parallel watermarks, may be inaccurately held back. Confluent Cloud for Apache Flink dynamically adjusts the consideration of idle partitions in watermark calculations with Confluent’s Progressive Idleness feature. The idle-time detection starts small at 15 seconds but grows linearly with the age of the statement up to a maximum of 5 minutes. Progressive Idleness can cause wrong watermarks if a partition is marked as idle too quickly, and this can cause the system to move ahead too quickly, impacting data processing. When you move your Flink SQL statement into production, make sure that you have validated how you want to handle idleness. You can configure or disable this behavior by using the sql.tables.scan.idle-timeout option. Choose the correct Schema Registry compatibility type¶ The Confluent Schema Registry plays a pivotal role in ensuring that the schemas of the data flowing through your Kafka topics are consistent, compatible, and evolve in a controlled manner. One of the key decisions in this process is selecting the appropriate schema compatibility type. Consider using FULL_TRANSITIVE compatibility to ensure that any new schema is fully compatible with all previous versions of the schema. 
This comprehensive check minimizes the risk of introducing changes that could disrupt data-processing applications relying on the data. When choosing any of the other compatibility modes, you need to consider the consequences on currently-running statements, especially since a Flink statement is both a producer and a consumer at the same time. Separate workloads of different priorities into separate compute pools¶ All statements using the same compute pools compete for resources. Although the Confluent Cloud Autopilot aims to provide each statement with the resources it needs, this may not always be possible, in particular, when the maximum resources of the compute pool are exhausted. To avoid situations in which statements with different latency and availability requirements compete for resources, consider using separate compute pools for different use cases, for example, ad-hoc exploration vs. mission-critical, long-running queries. Because statements may affect each other, you should share compute pools only between statements with comparable requirements. Use event-time temporal joins instead of streaming joins¶ When processing data streams, choosing the right type of join operation is crucial for efficiency and performance. Event-time temporal joins offer significant advantages over regular streaming joins. Temporal joins are particularly useful when the join condition is based on a time attribute. They enable you to join a primary stream with a historical version of another table, using the state of that table as it existed at the time of the event. This results in more efficient processing, because it avoids the need to keep large amounts of state in memory. Traditional streaming joins involve keeping a stateful representation of all joined records, which can be inefficient and resource-intensive, especially with large datasets or high-velocity streams. 
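As a minimal sketch of the difference, the following statements contrast a regular streaming join with an event-time temporal join. The orders and currency_rates tables and their columns are hypothetical. The temporal join assumes that currency_rates is a versioned table with a primary key on currency and a watermark, and that the event time of orders is the $rowtime system column.

```sql
-- Hypothetical tables: orders(order_id, price, currency) and
-- currency_rates(currency, rate) with PRIMARY KEY (currency) NOT ENFORCED.

-- Regular streaming join: rows from both inputs are kept in state so that
-- any later row on either side can still find its matches.
SELECT o.order_id, o.price * r.rate AS converted_price
FROM orders AS o
JOIN currency_rates AS r
  ON o.currency = r.currency;

-- Event-time temporal join: each order is joined with the version of the
-- rate that was valid at the order's event time.
SELECT o.order_id, o.price * r.rate AS converted_price
FROM orders AS o
JOIN currency_rates FOR SYSTEM_TIME AS OF o.`$rowtime` AS r
  ON o.currency = r.currency;
```

The only syntactic difference is the FOR SYSTEM_TIME AS OF clause, which tells Flink which version of the right-hand table to use for each row.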
Also, event-time temporal joins typically result in insert-only outputs when your inputs are also insert-only, which means that once a record is processed and joined, it is not updated or deleted later. Streaming joins often need to handle updates and deletions. When moving to production, prefer using temporal joins wherever applicable to ensure your data processing is efficient and performant. Avoid traditional streaming joins unless necessary, as they can lead to increased resource consumption and complexity. Implement state time-to-live (TTL)¶ Some stateful operations in Flink require storing state, like streaming joins and pattern matching. Managing this state effectively is crucial for application performance, resource optimization, and cost reduction. The state time-to-live (TTL) feature enables specifying a minimum time interval for how long idle state, meaning state that is not updated, is retained. This mechanism ensures that state is cleared at some time after the idle duration. When moving to production, you should configure the sql.state-ttl setting carefully to balance performance versus correctness of the results. Use service account API keys for production¶ API keys for Confluent Cloud can be created with user accounts and service accounts. A service account is intended to provide an identity for an application or service that needs to perform programmatic operations within Confluent Cloud. When moving to production, ensure that only service account API keys are used. Avoid user account API keys, except for development and testing. If a user leaves and a user account is deleted, all API keys created with that user account are deleted, and applications might break. Assign custom names to Flink SQL statements¶ Custom naming facilitates easier management, monitoring, and debugging of your streaming applications by providing clear, identifiable references to specific operations or data flows. You can do this easily by using the client.statement-name option. 
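The options named in the sections above can be set in a Flink workspace or the Flink SQL shell before you submit a statement. The following is an illustrative sketch only: the statement name, the TTL and idle-timeout values, and the orders table with its customer_id column are assumptions, not recommendations.

```sql
-- Give the statement a custom, recognizable name (hypothetical name).
SET 'client.statement-name' = 'orders-per-customer-v1';

-- Retain idle state for one hour (example value; tune for your data).
SET 'sql.state-ttl' = '1 h';

-- Configure idleness handling explicitly (example value).
SET 'sql.tables.scan.idle-timeout' = '5 min';

-- A stateful statement, so the state TTL setting is relevant.
SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;
```

A shorter TTL reduces the amount of state that is kept, but a customer whose state has expired starts again from a fresh count, which is the performance versus correctness trade-off described above.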
Review error handling and monitoring best practices¶ Review these topics: Error handling and recovery Best practices for alerting Notifications Related content¶ Flink Compute Pools Billing on Confluent Cloud for Apache Flink Managing and Monitoring Statements Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql FULL_TRANSITIVE ``` --- ### Carry-over Offsets in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/operate-and-deploy/carry-over-offsets.html Carry-over Offsets in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® supports carry-over offsets, which means that you can use the topic offsets from one statement to start a new statement. Carry-over offsets provide a streamlined way to update Flink statements without data loss. This feature eliminates the manual complexity of copying offsets between statements and reduces the need to monitor statement status when deploying CI/CD pipelines. Automatic orchestration handles the upgrade process. The system automatically waits for the old statement to stop before starting the new one, providing a seamless transition of processing between statements. Carry-over offsets are available only when replacing an existing statement. This feature enables you to evolve statements with exactly-once semantics across the update when the statement is “stateless”, as determined by the system. At a high level, “stateless” applies to statements that can process each event independently and in any order. For other scenarios, such as aggregates, lag, windows, pattern matching, or use of upsert sink, this feature can’t be used, because the update may cause inconsistent results. 
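To make the distinction concrete, here is a hedged sketch of one statement that would typically qualify as stateless and one that would not. The user_events table and its columns are borrowed from the examples later in this topic and are otherwise hypothetical. The system, not this sketch, makes the final determination.

```sql
-- Likely stateless: each row is filtered and projected independently,
-- in any order.
SELECT user_id, event_type
FROM user_events
WHERE event_type = 'purchase';

-- Stateful: the aggregate must remember counts across rows, so carry-over
-- offsets can't be used when replacing a statement like this one.
SELECT user_id, COUNT(*) AS purchase_count
FROM user_events
WHERE event_type = 'purchase'
GROUP BY user_id;
```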
To use carry-over offsets, add the sql.tables.initial-offset-from property to the statement configuration when you create your new statement. In the Confluent Cloud Console and the Flink SQL shell, you can set the property by using the SET statement, for example: SET 'sql.tables.initial-offset-from' = '' Set the value to the name of the statement that you want to use as the reference for the carry-over offsets. If you’re using the Statements API or the Confluent Terraform provider, you can set the property by using the properties field, for example: { "properties": { "sql.tables.initial-offset-from": "" } } Considerations for carry-over offsets¶ Regional limitations¶ The referenced statement must be in the same organization, environment, and region as the new statement. Cross-region offset carry-over is not supported using this property. Timeout Behavior¶ New statements will wait up to 6 hours for the referenced statement to stop. If the timeout expires, the new statement will fail with an error message indicating the reason. Table Options Priority¶ Explicit table options in your SQL text take precedence over inherited offsets. Only tables without explicit options will use carried-over offsets. Example of table options priority: INSERT INTO output SELECT * FROM table1 UNION ALL SELECT * FROM table2 /*+ OPTIONS('scan.startup.mode' = 'latest-offset') */; Result: table1 uses carried-over offsets, table2 uses the specified latest-offset mode. Common Issues¶ Statement Not Found Error¶ Verify the referenced statement name is correct. Ensure the statement exists in the same org/env/region. Timeout Exceeded¶ Check if the old statement is actually stopping. Verify there are no blocking conditions preventing termination. Invalid SQL Error¶ The new statement’s syntax is validated immediately upon creation. Fix SQL syntax errors before the offset carry-over process begins. 
Referenced Statement Savepoint Failed¶ The statement failed to be submitted because the referenced statement didn’t enter a stopped state gracefully. Data inconsistencies can occur when using offsets from failed savepoints. Try to resume the referenced statement and stop it again. If there are still issues, contact Confluent Support. Examples¶ Statement already stopped¶ You have a stopped statement named my-original-statement. Create a new statement with updated logic: SET 'sql.tables.initial-offset-from' = 'my-original-statement'; INSERT INTO enhanced_output SELECT user_id, event_type, `timestamp`, new_field FROM user_events WHERE event_type IN ('click', 'view', 'purchase'); Statement still running¶ Your original statement metrics-processor-v1 is still running. Create the new statement with the carry-over property set: SET 'sql.tables.initial-offset-from' = 'metrics-processor-v1'; INSERT INTO enhanced_output SELECT user_id, event_type, `timestamp`, new_field FROM user_events WHERE event_type IN ('click', 'view', 'purchase'); The new statement remains in the “Pending” state until you stop metrics-processor-v1. Related content¶ Schema and Statement Evolution Flink SQL Shell Quick Start Flink SQL Shell Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. 
#### Code Examples ```sql sql.tables.initial-offset-from ``` ```sql SET 'sql.tables.initial-offset-from' = '' ``` ```sql { "properties": { "sql.tables.initial-offset-from": "" } } ``` ```sql INSERT INTO output SELECT * FROM table1 UNION ALL SELECT * FROM table2 /*+ OPTIONS('scan.startup.mode' = 'latest-offset') */; ``` ```sql latest-offset ``` ```sql my-original-statement ``` ```sql SET 'sql.tables.initial-offset-from' = 'my-original-statement'; INSERT INTO enhanced_output SELECT user_id, event_type, `timestamp`, new_field FROM user_events WHERE event_type IN ('click', 'view', 'purchase'); ``` ```sql metrics-processor-v1 ``` ```sql SET 'sql.tables.initial-offset-from' = 'metrics-processor-v1'; INSERT INTO enhanced_output SELECT user_id, event_type, `timestamp`, new_field FROM user_events WHERE event_type IN ('click', 'view', 'purchase'); ``` ```sql metrics-processor-v1 ``` --- ### Manage Flink Compute Pools in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/operate-and-deploy/create-compute-pool.html Manage Compute Pools in Confluent Cloud for Apache Flink¶ A compute pool represents the compute resources that are used to run your SQL statements. The resources provided by a compute pool are shared among all statements that use it. It enables you to limit or guarantee resources as your use cases require. A compute pool is bound to a region. There is no cost for creating compute pools. To create a compute pool, you need the OrganizationAdmin, EnvironmentAdmin, or FlinkAdmin RBAC role. In addition to the Cloud Console, Confluent provides these tools for creating and managing Flink compute pools: Confluent CLI Confluent Cloud REST API Confluent Terraform provider Create a compute pool¶ Confluent Cloud ConsoleConfluent CLIREST APITerraform In the navigation menu, click Environments, and click the tile for the environment where you want to use Flink SQL. In the environment details page, click Flink. 
In the Flink page, click Compute pools, if it’s not selected already. Click Create compute pool to open the Create compute pool page. In the Region dropdown, select the region that hosts the data you want to process with SQL, or use any region if you just want to try out Flink using sample data. Click Continue. In the Pool name textbox, enter “my-compute-pool”. In the Max CFUs dropdown, select 10. For more information, see CFUs. Note You can increase the Max CFUs value later, but decreasing Max CFUs is not supported. Click Continue, and on the Review and create page, click Finish. A tile for your compute pool appears on the Flink page. It shows the pool in the Provisioning state. It may take a few minutes for the pool to enter the Running state. Tip The tile for your compute pool provides the Confluent CLI command for using the pool from the CLI. Learn more about the CLI in the Flink SQL Shell Quick Start. Run the confluent flink compute-pool create command to create a compute pool. Creating a compute pool requires the following inputs: export COMPUTE_POOL_NAME= # human-readable name, for example, "my-compute-pool" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export ENV_ID="" # example: "env-z3y2x1" export MAX_CFU="" # example: 5 Run the following command to create a compute pool in the specified cloud provider and environment. confluent flink compute-pool create ${COMPUTE_POOL_NAME} \ --cloud ${CLOUD_PROVIDER} \ --region ${CLOUD_REGION} \ --max-cfu ${MAX_CFU} \ --environment ${ENV_ID} Your output should resemble: +-------------+-----------------+ | Current | false | | ID | lfcp-xxd6og | | Name | my-compute-pool | | Environment | env-z3y2x1 | | Current CFU | 0 | | Max CFU | 5 | | Cloud | AWS | | Region | us-east-1 | | Status | PROVISIONING | +-------------+-----------------+ Create a compute pool in your environment by sending a POST request to the Compute Pools endpoint. 
This request uses your Cloud API key instead of the Flink API key. Creating a compute pool requires the following inputs: export COMPUTE_POOL_NAME="" # human readable name, for example: "my-compute-pool" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ENV_ID="" # example: "env-z3y2x1" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export MAX_CFU="" # example: 5 export JSON_DATA="" The following JSON shows an example payload. The network key is optional. { "spec": { "display_name": "${COMPUTE_POOL_NAME}", "cloud": "${CLOUD_PROVIDER}", "region": "${CLOUD_REGION}", "max_cfu": ${MAX_CFU}, "environment": { "id": "${ENV_ID}" }, "network": { "id": "n-00000", "environment": "string" } } } Quotation mark characters in the JSON string must be escaped, so the payload string to send resembles the following: export JSON_DATA="{ \"spec\": { \"display_name\": \"${COMPUTE_POOL_NAME}\", \"cloud\": \"${CLOUD_PROVIDER}\", \"region\": \"${CLOUD_REGION}\", \"max_cfu\": ${MAX_CFU}, \"environment\": { \"id\": \"${ENV_ID}\" } } }" The following command sends a POST request to create a compute pool. 
curl --request POST \ --url https://api.confluent.cloud/fcpm/v2/compute-pools \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" \ --header 'content-type: application/json' \ --data "${JSON_DATA}" Your output should resemble: Response from a request to create a compute pool { "api_version": "fcpm/v2", "id": "lfcp-6g7h8i", "kind": "ComputePool", "metadata": { "created_at": "2024-02-27T22:44:27.18964Z", "resource_name": "crn://confluent.cloud/organization=b0b21724-4586-4a07-b787-d0bb5aacbf87/environment=env-z3y2x1/flink-region=aws.us-east-1/compute-pool=lfcp-6g7h8i", "self": "https://api.confluent.cloud/fcpm/v2/compute-pools/lfcp-6g7h8i", "updated_at": "2024-02-27T22:44:27.18964Z" }, "spec": { "cloud": "AWS", "display_name": "my-compute-pool", "environment": { "id": "env-z3y2x1", "related": "https://api.confluent.cloud/fcpm/v2/compute-pools/lfcp-6g7h8i", "resource_name": "crn://confluent.cloud/organization=b0b21724-4586-4a07-b787-d0bb5aacbf87/environment=env-z3y2x1" }, "http_endpoint": "https://flink.us-east-1.aws.confluent.cloud/sql/v1/organizations/b0b21724-4586-4a07-b787-d0bb5aacbf87/environments/env-z3y2x1", "max_cfu": 5, "region": "us-east-1" }, "status": { "current_cfu": 0, "phase": "PROVISIONING" } } To create a compute pool by using the Confluent Terraform provider, use the confluent_flink_compute_pool resource. Configure your Terraform file. Provide your Confluent Cloud API key and secret. terraform { required_providers { confluent = { source = "confluentinc/confluent" version = "2.44.0" } } } provider "confluent" { cloud_api_key = var.confluent_cloud_api_key # optionally use CONFLUENT_CLOUD_API_KEY env var cloud_api_secret = var.confluent_cloud_api_secret # optionally use CONFLUENT_CLOUD_API_SECRET env var } Define the environment where the compute pool will be created. 
resource "confluent_environment" "development" { display_name = "Development" lifecycle { prevent_destroy = true } } Define the confluent_flink_compute_pool resource with the required parameters, like display_name, cloud, region, max_cfu, and the environment ID. resource "confluent_flink_compute_pool" "main" { display_name = "standard_compute_pool" cloud = "AWS" region = "us-east-1" max_cfu = 5 environment { id = confluent_environment.development.id } } Run the terraform apply command to create the resources. terraform apply If you need to import an existing compute pool, use the terraform import command. export CONFLUENT_CLOUD_API_KEY="" export CONFLUENT_CLOUD_API_SECRET="" terraform import confluent_flink_compute_pool.main / For more information, see confluent_flink_compute_pool resource. View details for a compute pool¶ Confluent Cloud ConsoleConfluent CLIREST APITerraform In the navigation menu, click Environments, and click the tile for the environment where you use Flink SQL. In the environment details page, click Flink. In the Flink page, click Compute pools, if it’s not selected already. The available compute pools are listed as tiles, with details like Max CFUs and the cloud provider and region. If the tile for your compute pool isn’t visible, start typing in the Search pools textbox to filter the view. Click the tile for your compute pool to open the details page, which shows information like consumption metrics and Flink SQL statements that are associated with the compute pool. Run the confluent flink compute-pool describe command to get details about a compute pool. Describing a compute pool requires the following inputs: export COMPUTE_POOL_ID="" # example: "lfcp-8m03rm" export ENV_ID="" # example: "env-z3y2x1" Run the following command to get details about a compute pool in the specified environment. 
confluent flink compute-pool describe ${COMPUTE_POOL_ID} \ --environment ${ENV_ID} Your output should resemble: +-------------+-----------------+ | Current | false | | ID | lfcp-xxd6og | | Name | my-compute-pool | | Environment | env-z3y2x1 | | Current CFU | 0 | | Max CFU | 5 | | Cloud | AWS | | Region | us-east-1 | | Status | PROVISIONED | +-------------+-----------------+ Get the details about a compute pool in your environment by sending a GET request to the Compute Pools endpoint. This request uses your Cloud API key instead of the Flink API key. Getting details about a compute pool requires the following inputs: export COMPUTE_POOL_ID="" # example: "lfcp-8m03rm" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ENV_ID="" # example: "env-z3y2x1" Run the following command to get details about the compute pool specified in the COMPUTE_POOL_ID environment variable. curl --request GET \ --url "https://api.confluent.cloud/fcpm/v2/compute-pools/${COMPUTE_POOL_ID}?environment=${ENV_ID}" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" Your output should resemble: Response from a request to read a compute pool { "api_version": "fcpm/v2", "id": "lfcp-6g7h8i", "kind": "ComputePool", "metadata": { "created_at": "2024-02-27T22:44:27.18964Z", "resource_name": "crn://confluent.cloud/organization=b0b21724-4586-4a07-b787-d0bb5aacbf87/environment=env-z3y2x1/flink-region=aws.us-east-1/compute-pool=lfcp-6g7h8i", "self": "https://api.confluent.cloud/fcpm/v2/compute-pools/lfcp-6g7h8i", "updated_at": "2024-02-27T22:44:27.18964Z" }, "spec": { "cloud": "AWS", "display_name": "my-compute-pool", "environment": { "id": "env-z3y2x1", "related": "https://api.confluent.cloud/fcpm/v2/compute-pools/lfcp-6g7h8i", "resource_name": "crn://confluent.cloud/organization=b0b21724-4586-4a07-b787-d0bb5aacbf87/environment=env-z3y2x1" }, "http_endpoint": 
"https://flink.us-east-1.aws.confluent.cloud/sql/v1/organizations/b0b21724-4586-4a07-b787-d0bb5aacbf87/environments/env-z3y2x1", "max_cfu": 5, "region": "us-east-1" }, "status": { "current_cfu": 0, "phase": "PROVISIONED" } } To view details for a compute pool by using the Confluent Terraform provider, use the confluent_flink_compute_pool data source in a data block. data "confluent_flink_compute_pool" "example_using_id" { id = "lfcp-abc123" environment { id = "" } } output "example_using_id" { value = data.confluent_flink_compute_pool.example_using_id } data "confluent_flink_compute_pool" "example_using_name" { display_name = "my_compute_pool" environment { id = "" } } output "example_using_name" { value = data.confluent_flink_compute_pool.example_using_name } Run the terraform apply or terraform output command. The example_using_id and example_using_name outputs contain details for the compute pool with the specified ID or name. For more information, see confluent_flink_compute_pool data source. List compute pools¶ Confluent Cloud ConsoleConfluent CLIREST APITerraform In the navigation menu, click Environments, and click the tile for the environment where you use Flink SQL. In the environment details page, click Flink. In the Flink page, click Compute pools, if it’s not selected already. The available compute pools are listed as tiles, with details like Max CFUs and the cloud provider and region. Run the confluent flink compute-pool list command to list the compute pools in the specified environment. Listing compute pools may require the following inputs, depending on the command: export CLOUD_REGION="" # example: "us-east-1" export ENV_ID="" # example: "env-z3y2x1" Run the following command to list the compute pools in the specified environment.
confluent flink compute-pool list --environment ${ENV_ID} Your output should resemble: Current | ID | Name | Environment | Current CFU | Max CFU | Cloud | Region | Status ----------+-------------+---------------------------+-------------+-------------+---------+-------+-----------+-------------- * | lfcp-xxd6og | my-compute-pool | env-z3y2x1 | 0 | 5 | AWS | us-east-1 | PROVISIONED | lfcp-8m03rm | test-blue-compute-pool | env-z3q9rd | 0 | 10 | AWS | us-east-1 | PROVISIONED ... List the compute pools in your environment by sending a GET request to the Compute Pools endpoint. This request uses your Cloud API key instead of the Flink API key. Listing the compute pools in your environment requires the following inputs: export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ENV_ID="" # example: "env-z3y2x1" Run the following command to list the compute pools in your environment. curl --request GET \ --url "https://confluent.cloud/api/fcpm/v2/compute-pools?environment=${ENV_ID}&page_size=100" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" \ | jq -r '.data[] | .spec.display_name, {id}' Your output should resemble: compute_pool_0 { "id": "lfcp-j123kl" } compute_pool_2 { "id": "lfcp-abc1de" } my-lfcp-01 { "id": "lfcp-l2mn3o" } ... Find your compute pool in the list and save its ID in an environment variable. export COMPUTE_POOL_ID="" To list all compute pools using the Confluent Terraform provider, use the confluent_flink_compute_pool data source and the data argument. provider "confluent" { cloud_api_key = var.confluent_cloud_api_key cloud_api_secret = var.confluent_cloud_api_secret } data "confluent_flink_compute_pools" "all_pools" { environment_id = "" } output "compute_pools" { value = data.confluent_flink_compute_pools.all_pools.compute_pools } Run the terraform apply or terraform output command. 
The compute_pools output contains a list of all compute pools in your environment. To filter the compute pools by a specific attribute, region, availability, or name, use the filter argument within the data block. data "confluent_flink_compute_pools" "pools_in_us_east" { environment_id = "" filter = "region == ''" } For more information, see confluent_flink_compute_pool. Update a compute pool¶ You can update the name of the compute pool, its environment, and the MAX_CFUs setting. You can increase the Max CFUs value, but decreasing Max CFUs is not supported. Confluent Cloud ConsoleConfluent CLIREST APITerraform In the navigation menu, click Environments, and click the tile for the environment where you use Flink SQL. In the environment details page, click Flink. In the Flink page, click Compute pools, if it’s not selected already. In the listed compute pools, find the one you want to update, and click the options icon (⋮). In the context menu, click either Edit display name or Edit max CFUs and follow the instructions in the dialog. Click the tile for your compute pool to open the details page. In the details page, you can update the compute pool’s description or add metadata tags. Also, you can manage Flink SQL statements that are associated with the compute pool. Run the confluent flink compute-pool update command to update a compute pool. Updating a compute pool may require the following inputs, depending on the command: export COMPUTE_POOL_NAME= # human-readable name, for example, "my-compute-pool" export COMPUTE_POOL_ID="" # example: "lfcp-8m03rm" export ENV_ID="" # example: "env-z3y2x1" export MAX_CFU="" # example: 5 Run the following command to update a compute pool in the specified environment. 
confluent flink compute-pool update ${COMPUTE_POOL_ID} \ --environment ${ENV_ID} \ --name ${COMPUTE_POOL_NAME} \ --max-cfu ${MAX_CFU} Your output should resemble: +-------------+----------------------+ | Current | false | | ID | lfcp-xxd6og | | Name | renamed-compute-pool | | Environment | env-z3y2x1 | | Current CFU | 0 | | Max CFU | 10 | | Cloud | AWS | | Region | us-east-1 | | Status | PROVISIONED | +-------------+----------------------+ Update a compute pool in your environment by sending a PATCH request to the Compute Pools endpoint. This request uses your Cloud API key instead of the Flink API key. Updating a compute pool requires the following inputs: export COMPUTE_POOL_ID="" # example: "lfcp-8m03rm" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ENV_ID="" # example: "env-z3y2x1" export MAX_CFU="" # example: 5 export JSON_DATA="" The following JSON shows an example payload. The network key is optional. { "spec": { "display_name": "${COMPUTE_POOL_NAME}", "max_cfu": ${MAX_CFU}, "environment": { "id": "${ENV_ID}" } } } Quotation mark characters in the JSON string must be escaped, so the payload string to send resembles the following: export JSON_DATA="{ \"spec\": { \"display_name\": \"${COMPUTE_POOL_NAME}\", \"max_cfu\": ${MAX_CFU}, \"environment\": { \"id\": \"${ENV_ID}\" } } }" Run the following command to update the compute pool specified in the COMPUTE_POOL_ID environment variable. curl --request PATCH \ --url "https://api.confluent.cloud/fcpm/v2/compute-pools/${COMPUTE_POOL_ID}" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" \ --header 'content-type: application/json' \ --data "${JSON_DATA}" To update a compute pool by using the Confluent Terraform provider, use the confluent_flink_compute_pool resource. 
Find the definition for the compute pool resource in your Terraform configuration, for example: resource "confluent_flink_compute_pool" "example" { cloud = "AWS" region = "us-west-2" max_cfu = 10 # other required parameters } Modify the attributes of the confluent_flink_compute_pool resource in the Terraform configuration file. The following example updates the max_cfu attribute. resource "confluent_flink_compute_pool" "example" { cloud = "AWS" region = "us-west-2" max_cfu = 20 # Updated value # other required parameters } Run the terraform apply command to update the compute pool with the new configuration. terraform apply For more information, see confluent_flink_compute_pool. Delete a compute pool¶ Confluent Cloud ConsoleConfluent CLIREST APITerraform In the navigation menu, click Environments, and click the tile for the environment where you want to use Flink SQL. In the environment details page, click Flink. In the Flink page, click Compute pools, if it’s not selected already. In the listed compute pools, find the one you want to delete, and click the options icon (⋮). In the context menu, click Delete compute pool, and in the dialog, enter the compute pool name to confirm deletion. Run the confluent flink compute-pool delete command to delete a compute pool. Run the following command to delete a compute pool in the specified environment. The optional --force flag skips the confirmation prompt. confluent flink compute-pool delete ${COMPUTE_POOL_ID} \ --environment ${ENV_ID} --force Your output should resemble: Deleted Flink compute pool "lfcp-xxd6og". Delete a compute pool in your environment by sending a DELETE request to the Compute Pools endpoint. This request uses your Cloud API key instead of the Flink API key. 
Deleting a compute pool requires the following inputs: export COMPUTE_POOL_ID="" # example: "lfcp-8m03rm" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ENV_ID="" # example: "env-z3y2x1" Run the following command to delete the compute pool specified in the COMPUTE_POOL_ID environment variable. curl --request DELETE \ --url "https://api.confluent.cloud/fcpm/v2/compute-pools/${COMPUTE_POOL_ID}?environment=${ENV_ID}" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" To delete a compute pool by using the Confluent Terraform provider, use the confluent_flink_compute_pool resource. Define the compute pool resource in your Terraform configuration file, for example: resource "confluent_flink_compute_pool" "main" { display_name = "standard_compute_pool" cloud = "AWS" region = "us-east-1" max_cfu = 5 environment { id = "" } } To avoid accidental deletions, review the plan before applying the destroy command. terraform plan -destroy -target=confluent_flink_compute_pool.main To delete the compute pool, run the following command to target the specific resource. This command deletes only the compute pool and not other resources. terraform apply -destroy -target=confluent_flink_compute_pool.main To remove all resources defined in your Terraform configuration file, including the compute pool, run the terraform destroy command. terraform destroy For more information, see confluent_flink_compute_pool. Related content¶ Flink Compute Pools Billing on Confluent Cloud for Apache Flink Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. 
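The REST API update example on this page escapes the JSON payload by hand and builds the Basic authorization header with the base64 command. As a minimal sketch of the same request, assuming you prefer to build it in Python, the following code uses the standard library to serialize the payload and encode the credentials. The function names and the sample values are hypothetical and are not part of any Confluent tooling; the endpoint URL and payload fields are the ones shown in the update example. The request is only constructed here, not sent.

```python
import base64
import json
import urllib.request


def basic_auth_header(api_key: str, api_secret: str) -> str:
    """Return the value for an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{api_key}:{api_secret}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_update_request(pool_id, env_id, display_name, max_cfu,
                         api_key, api_secret):
    """Build (but do not send) a PATCH request that updates a compute pool.

    Hypothetical helper: the payload mirrors the example JSON in the
    update section, and json.dumps handles all quoting.
    """
    payload = {
        "spec": {
            "display_name": display_name,
            "max_cfu": max_cfu,
            "environment": {"id": env_id},
        }
    }
    return urllib.request.Request(
        url=f"https://api.confluent.cloud/fcpm/v2/compute-pools/{pool_id}",
        data=json.dumps(payload).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": basic_auth_header(api_key, api_secret),
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder credentials and IDs, for illustration only.
    request = build_update_request(
        "lfcp-8m03rm", "env-z3y2x1", "renamed-compute-pool", 10,
        "my-cloud-api-key", "my-cloud-api-secret",
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would send the request. Serializing a dictionary avoids the escaped quotation marks that the shell variable requires.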
--- ### Deploy a Flink SQL Statement in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/operate-and-deploy/deploy-flink-sql-statement.html Deploy a Flink SQL Statement Using CI/CD and Confluent Cloud for Apache Flink¶ GitHub Actions is a powerful feature on GitHub that enables automating your software development workflows. If your source code is stored in a GitHub repository, you can easily create a custom workflow in GitHub Actions to build, test, package, release, or deploy any code project.
This topic shows how to create a CI/CD workflow that deploys an Apache Flink® SQL statement programmatically on Confluent Cloud for Apache Flink by using HashiCorp Terraform and GitHub Actions. With the steps in this topic, you can streamline your development process. In this walkthrough, you perform the following steps: Step 1: Set up a Terraform Cloud workspace Step 2: Set up a repository and secrets in GitHub Step 3. Create a CI/CD workflow in GitHub Actions Step 4. Deploy resources in Confluent Cloud Step 5. Deploy a Flink SQL statement Prerequisites¶ You need the following prerequisites to complete this tutorial: Access to Confluent Cloud A GitHub account to set up a repository and create the CI/CD workflow A Terraform Cloud account Step 1: Set up a Terraform Cloud workspace¶ You need a Terraform Cloud account to follow this tutorial. If you don’t have one yet, create an account for free at Terraform Cloud. With a Terraform Cloud account, you can manage your infrastructure-as-code and collaborate with your team. Create a workspace¶ If you have created a new Terraform Cloud account and the Getting Started page is displayed, click Create a new organization, and in the Organization name textbox, enter “flink_ccloud”. Click Create organization. Otherwise, from the Terraform Cloud homepage, click New to create a new workspace. In the Create a new workspace page, click the API-Driven Workflow tile, and in the Workspace name textbox, enter “cicd_flink_ccloud”. Click Create to create the workspace. Create a Terraform Cloud API token¶ By creating an API token, you can authenticate securely with Terraform Cloud and integrate it with GitHub Actions. Save the token in a secure location, and don’t share it with anyone. At the top of the navigation menu, click your user icon and select User settings. In the navigation menu, click Tokens, and in the Tokens page, click Create an API token.
Give your token a meaningful description, like “github_actions”, and click Generate token. Your token appears in the Tokens list. Save the API token in a secure location. It won’t be displayed again. Step 2: Set up a repository and secrets in GitHub¶ To create an Action Secret in GitHub for securely storing the API token from Terraform Cloud, follow these steps. Log in to your GitHub account and create a new repository. In the Create a new repository page, use the Owner dropdown to choose an owner, and give the repository a unique name, like “”. Click Create. In the repository details page, click Settings. In the navigation menu, click Secrets and variables, and in the context menu, select Actions to open the Actions secrets and variables page. Click New repository secret. In the New secret page, enter the following settings. In the Name textbox, enter “TF_API_TOKEN”. In the Secret textbox, enter the API token value that you saved from the previous Terraform Cloud step. Click Add secret to save the Action Secret. By creating an Action Secret for the API token, you can use it securely in your CI/CD pipelines, such as in GitHub Actions. Keep the secret safe, and don’t share it with anyone who shouldn’t have access to it. Step 3. Create a CI/CD workflow in GitHub Actions¶ The following steps show how to create an Action Workflow for automating the deployment of a Flink SQL statement on Confluent Cloud using Terraform. In the toolbar at the top of the screen, click Actions. The Get started with GitHub Actions page opens. Click set up a workflow yourself ->. If you already have a workflow defined, click new workflow, and then click set up a workflow yourself ->. Copy the following YAML into the editor. This YAML file defines a workflow that runs when changes are pushed to the main branch of your repository. It includes a job named “terraform_flink_ccloud_tutorial” that runs on the latest version of Ubuntu. 
The job includes these steps: Check out the code Set up Terraform Log in to Terraform Cloud using the API token stored in the Action Secret Initialize Terraform Apply the Terraform configuration to deploy changes to your Confluent Cloud account on: push: branches: - main jobs: terraform_flink_ccloud_tutorial: name: "terraform_flink_ccloud_tutorial" runs-on: ubuntu-latest steps: - name: Checkout uses: actions/checkout@v4 - name: Setup Terraform uses: hashicorp/setup-terraform@v3 with: cli_config_credentials_token: ${{ secrets.TF_API_TOKEN }} - name: Terraform Init id: init run: terraform init - name: Terraform Validate id: validate run: terraform validate -no-color - name: Terraform Plan id: plan run: terraform plan env: TF_VAR_confluent_cloud_api_key: ${{ secrets.CONFLUENT_CLOUD_API_KEY }} TF_VAR_confluent_cloud_api_secret: ${{ secrets.CONFLUENT_CLOUD_API_SECRET }} - name: Terraform Apply id: apply run: terraform apply -auto-approve env: TF_VAR_confluent_cloud_api_key: ${{ secrets.CONFLUENT_CLOUD_API_KEY }} TF_VAR_confluent_cloud_api_secret: ${{ secrets.CONFLUENT_CLOUD_API_SECRET }} Click Commit changes, and in the dialog, enter a description in the Extended description textbox, for example, “CI/CD workflow to automate deployment on Confluent Cloud”. Click Commit changes. The file main.yml is created in the .github/workflows directory in your repository. With this Action Workflow, your deployment of Flink SQL statements on Confluent Cloud is now automatic. Step 4. Deploy resources in Confluent Cloud¶ In this section, you deploy a Flink SQL statement programmatically to Confluent Cloud that runs continuously until stopped manually. In VS Code or another IDE, clone your repository and create a new file in the root named “main.tf” with the following code. Replace the organization and workspace names with your Terraform Cloud organization name and workspace names from Step 1. 
terraform { cloud { organization = "" workspaces { name = "cicd_flink_ccloud" } } required_providers { confluent = { source = "confluentinc/confluent" version = "2.2.0" } } } Commit and push the changes to the repository. The CI/CD workflow that you created previously runs automatically. Verify that it’s running by navigating to the Actions section in your repository and clicking on the latest workflow run. Create a Confluent Cloud API key¶ To access Confluent Cloud securely, you must have a Confluent Cloud API key. After you generate an API key, you store it securely in your GitHub repository’s Secrets and variables page, the same way that you stored the Terraform API token. Follow the instructions here to create a new API key for Confluent Cloud, and on the https://confluent.cloud/settings/api-keys page, select the Cloud resource management tile for the API key’s resource scope. You will use this API key to communicate securely with Confluent Cloud. Return to the Settings page for your GitHub repository, and in the navigation menu, click Secrets and variables. In the context menu, select Actions to open the Actions secrets and variables page. Click New repository secret. In the New secret page, enter the following settings. In the Name textbox, enter “CONFLUENT_CLOUD_API_KEY”. In the Secret textbox, enter the Cloud API key. Click Add secret to save the Cloud API key as an Action Secret. Click New repository secret and repeat the previous steps for the Cloud API secret. Name the secret “CONFLUENT_CLOUD_API_SECRET”. Your Repository secrets list should now contain TF_API_TOKEN, CONFLUENT_CLOUD_API_KEY, and CONFLUENT_CLOUD_API_SECRET. Deploy resources¶ In this section, you add resources to your Terraform configuration file and provision them when the GitHub Action runs. In your repository, create a new file named “variables.tf” with the following code.
variable "confluent_cloud_api_key" { description = "Confluent Cloud API Key" type = string } variable "confluent_cloud_api_secret" { description = "Confluent Cloud API Secret" type = string sensitive = true } In the “main.tf” file, add the following code. This code references the Cloud API key and secret you added in the previous steps and creates a new environment and Kafka cluster for your organization. Optionally, you can choose to use an existing environment. locals { cloud = "AWS" region = "us-east-2" } provider "confluent" { cloud_api_key = var.confluent_cloud_api_key cloud_api_secret = var.confluent_cloud_api_secret } # Create a new environment. resource "confluent_environment" "my_env" { display_name = "my_env" stream_governance { package = "ESSENTIALS" } } # Create a new Kafka cluster. resource "confluent_kafka_cluster" "my_kafka_cluster" { display_name = "my_kafka_cluster" availability = "SINGLE_ZONE" cloud = local.cloud region = local.region basic {} environment { id = confluent_environment.my_env.id } depends_on = [ confluent_environment.my_env ] } # Access the Stream Governance Essentials package for the environment. data "confluent_schema_registry_cluster" "my_sr_cluster" { environment { id = confluent_environment.my_env.id } } Create a Service Account and provide a role binding by adding the following code to “main.tf”. The role binding gives the Service Account the necessary permissions to create topics, Flink statements, and other resources. In production, you may want to assign a less privileged role than OrganizationAdmin. # Create a new Service Account. This will be used during Kafka API key creation and Flink SQL statement submission. resource "confluent_service_account" "my_service_account" { display_name = "my_service_account" } data "confluent_organization" "my_org" {} # Assign the OrganizationAdmin role binding to the above Service Account. # This will give the Service Account the necessary permissions to create topics, Flink statements, etc.
# In production, you may want to assign a less privileged role. resource "confluent_role_binding" "my_org_admin_role_binding" { principal = "User:${confluent_service_account.my_service_account.id}" role_name = "OrganizationAdmin" crn_pattern = data.confluent_organization.my_org.resource_name depends_on = [ confluent_service_account.my_service_account ] } Push all changes to your repository and check the Actions page to ensure the workflow runs successfully. At this point, you should have a new environment, an Apache Kafka® cluster, and a Stream Governance package provisioned in your Confluent Cloud organization. Step 5. Deploy a Flink SQL statement¶ To use Flink, you must create a Flink compute pool. A compute pool represents a set of compute resources that are bound to a region and are used to run your Flink SQL statements. For more information, see Compute Pools. Create a new compute pool by adding the following code to “main.tf”. # Create a Flink compute pool to execute a Flink SQL statement. resource "confluent_flink_compute_pool" "my_compute_pool" { display_name = "my_compute_pool" cloud = local.cloud region = local.region max_cfu = 10 environment { id = confluent_environment.my_env.id } depends_on = [ confluent_environment.my_env ] } Create a Flink-specific API key, which is required for submitting statements to Confluent Cloud, by adding the following code to “main.tf”. # Create a Flink-specific API key that will be used to submit statements. 
data "confluent_flink_region" "my_flink_region" { cloud = local.cloud region = local.region } resource "confluent_api_key" "my_flink_api_key" { display_name = "my_flink_api_key" owner { id = confluent_service_account.my_service_account.id api_version = confluent_service_account.my_service_account.api_version kind = confluent_service_account.my_service_account.kind } managed_resource { id = data.confluent_flink_region.my_flink_region.id api_version = data.confluent_flink_region.my_flink_region.api_version kind = data.confluent_flink_region.my_flink_region.kind environment { id = confluent_environment.my_env.id } } depends_on = [ confluent_environment.my_env, confluent_service_account.my_service_account ] } Deploy a Flink SQL statement on Confluent Cloud by adding the following code to “main.tf”. The statement consumes data from examples.marketplace.orders, aggregates in 1 minute windows and ingests the filtered data into sink_topic. Because you’re using a Service Account, the statement runs in Confluent Cloud continuously until manually stopped. # Deploy a Flink SQL statement to Confluent Cloud. resource "confluent_flink_statement" "my_flink_statement" { organization { id = data.confluent_organization.my_org.id } environment { id = confluent_environment.my_env.id } compute_pool { id = confluent_flink_compute_pool.my_compute_pool.id } principal { id = confluent_service_account.my_service_account.id } # This SQL reads data from source_topic, filters it, and ingests the filtered data into sink_topic. 
statement = < https://flink.${CLOUD_REGION}.${CLOUD_PROVIDER}.confluent.cloud/ https://flink.${CLOUD_REGION}.${CLOUD_PROVIDER}.private.confluent.cloud/ For example, if you send a request to the us-east-1 AWS region without a private network, the host is: https://flink.us-east-1.aws.confluent.cloud With a private network, the host is: https://flink.us-east-1.aws.private.confluent.cloud Generate a Flink API key¶ To access the REST API, you need an API key specifically for Flink. This key is distinct from the Confluent Cloud API key. Before you create an API key for Flink access, decide whether you want to create long-running Flink SQL statements. If you need long-running Flink SQL statements, Confluent recommends using a service account and creating an API key for it. If you want to run only interactive queries or statements for a short time while developing queries, you can create an API key for your user account. Follow the steps in Generate an API Key for Access. Run the following commands to save your API key and secret in environment variables. export FLINK_API_KEY="" export FLINK_API_SECRET="" The REST API uses basic authentication, which means that you provide a base64-encoded string made from your Flink API key and secret in the request header. You can use the base64 command to encode the “key:secret” string. Be sure to use the -n option of the echo command to prevent newlines from being embedded in the encoded string. If you’re on Linux, be sure to use the -w 0 option of the base64 command, to prevent the string from being line-wrapped. 
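The encoding step described above can also be written so that it does not depend on platform-specific base64 flags. The following is a minimal sketch, not an official tool: the function name basic_auth_token is made up for illustration, and it removes line breaks with tr instead of relying on the -w 0 option, which may not be available in every base64 implementation.

```shell
# Hypothetical helper: build the Basic-auth token from a key and a secret.
# printf avoids the trailing newline that a plain echo would add, and
# tr strips any line wrapping that base64 may insert into long output.
basic_auth_token() {
  # $1 = API key, $2 = API secret
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

# Example usage (assumes FLINK_API_KEY and FLINK_API_SECRET are exported):
# export BASE64_FLINK_KEY_AND_SECRET=$(basic_auth_token "${FLINK_API_KEY}" "${FLINK_API_SECRET}")
```

The resulting string is what goes after "Authorization: Basic" in the request header.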
For convenience, save the encoded string in an environment variable: export BASE64_FLINK_KEY_AND_SECRET=$(echo -n "${FLINK_API_KEY}:${FLINK_API_SECRET}" | base64 -w 0) Manage statements¶ Using requests to the Flink REST API, you can perform these actions: Submit a statement Get a statement List statements Update metadata for a statement Delete a statement Flink SQL statement schema¶ A statement has the following schema: api_version: "sql/v1" kind: "Statement" organization_id: "" # Identifier of your Confluent Cloud organization environment_id: "" # Identifier of your Confluent Cloud environment name: "" # Primary identifier of the statement, must be unique within the environment, 100 max length, [a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)* metadata: created_at: "" # Creation timestamp of this resource updated_at: "" # Last updated timestamp of this resource resource_version: "" # Generated by the system and updated whenever the statement is updated (including by the system). Opaque and should not be parsed. self: "" # An absolute URL to this resource uid: "" # uid is unique in time and space (i.e., even if the name is re-used) spec: compute_pool_id: "" # The ID of the compute pool the statement should run in. DNS Subdomain (RFC 1123) – 255 max len, [a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)* principal: "" # user or service account ID properties: map[string]string # Optional. request/client properties statement: "SELECT * from Orders;" # The raw SQL text stopped: false # Boolean, specifying if the statement should be stopped status: phase: PENDING | RUNNING | COMPLETED | DELETING | FAILING | FAILED detail: "" # Optional. Human-readable description of phase. result_schema: "" # Optional. JSON object in TableSchema format; describes the data returned by the results serving API. 
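The name constraint in the schema above can be checked locally before a request is sent. The following is a minimal sketch under stated assumptions: is_valid_statement_name is a hypothetical helper, it applies the regular expression exactly as written in the schema plus the 100-character limit, and the service may enforce additional rules that this check does not know about.

```shell
# Hypothetical helper: return 0 if the name satisfies the documented pattern
# and length limit, and 1 otherwise.
is_valid_statement_name() {
  # Reject empty names and names longer than 100 characters.
  [ -n "$1" ] && [ "${#1}" -le 100 ] || return 1
  # Reject names that contain line breaks, because grep matches line by line.
  [ "$(printf '%s' "$1" | wc -l)" -eq 0 ] || return 1
  # LC_ALL=C keeps the [a-z] range limited to lowercase ASCII letters.
  printf '%s\n' "$1" | LC_ALL=C grep -Eq \
    '^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
}

# Example usage:
# is_valid_statement_name "${STATEMENT_NAME}" || echo "invalid statement name" >&2
```

With this check, a name such as user-filter passes, while User-Filter, user_filter, and names that start or end with a hyphen are rejected.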
The statement name has a maximum length of 100 characters and must satisfy the following regular expression: [a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)* The underscore character (_) and period character (.) are not supported. Submit a statement¶ You can submit a Flink SQL statement by sending a POST request to the Statements endpoint. Submitting a Flink SQL statement requires the following inputs: export FLINK_API_KEY="" export FLINK_API_SECRET="" export BASE64_FLINK_KEY_AND_SECRET=$(echo -n "${FLINK_API_KEY}:${FLINK_API_SECRET}" | base64 -w 0) export STATEMENT_NAME="" # example: "user-filter" export ORG_ID="" # example: "b0b21724-4586-4a07-b787-d0bb5aacbf87" export ENV_ID="" # example: "env-z3y2x1" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export COMPUTE_POOL_ID="" # example: "lfcp-8m03rm" export PRINCIPAL_ID="" # (optional) example: "sa-23kgz4" for a service account, or "u-aq1dr2" for a user account export SQL_CODE="" # example: "SELECT * FROM USERS;" export JSON_DATA="" The PRINCIPAL_ID parameter is optional. Confluent Cloud infers the principal from the provided Flink API key. 
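If the SQL text in SQL_CODE contains double quotes, backslashes, or line breaks, it has to be escaped before it is embedded in a JSON string. The following is a minimal sketch, assuming sed and awk are available: json_escape is a hypothetical helper that handles only backslashes, double quotes, and line breaks, and it drops a trailing line break. For arbitrary input, a JSON-aware tool such as jq is the safer choice.

```shell
# Hypothetical helper: escape a string for use inside a JSON string literal.
json_escape() {
  printf '%s' "$1" |
    # Escape backslashes first, then double quotes.
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' |
    # Join lines with a literal backslash-n sequence.
    awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

# Example usage (RAW_SQL is a hypothetical variable that holds the unescaped statement):
# export SQL_CODE=$(json_escape "${RAW_SQL}")
```

Backslashes are escaped before quotes so that the backslashes introduced for the quotes are not escaped a second time.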
The following JSON shows an example payload: { "name": "${STATEMENT_NAME}", "organization_id": "${ORG_ID}", "environment_id": "${ENV_ID}", "spec": { "statement": "${SQL_CODE}", "properties": { "key1": "value1", "key2": "value2" }, "compute_pool_id": "${COMPUTE_POOL_ID}", "principal": "${PRINCIPAL_ID}", "stopped": false } } Quotation mark characters in the JSON string must be escaped, so the payload string to send resembles the following: export JSON_DATA="{ \"name\": \"${STATEMENT_NAME}\", \"organization_id\": \"${ORG_ID}\", \"environment_id\": \"${ENV_ID}\", \"spec\": { \"statement\": \"${SQL_CODE}\", \"properties\": { \"key1\": \"value1\", \"key2\": \"value2\" }, \"compute_pool_id\": \"${COMPUTE_POOL_ID}\", \"principal\": \"${PRINCIPAL_ID}\", \"stopped\": false } }" The following command sends a POST request that submits a Flink SQL statement. curl --request POST \ --url "https://flink.${CLOUD_REGION}.${CLOUD_PROVIDER}.confluent.cloud/sql/v1/organizations/${ORG_ID}/environments/${ENV_ID}/statements" \ --header "Authorization: Basic ${BASE64_FLINK_KEY_AND_SECRET}" \ --header 'content-type: application/json' \ --data "${JSON_DATA}" Your output should resemble: Response from a request to submit a SQL statement { "api_version": "sql/v1", "environment_id": "env-z3y2x1", "kind": "Statement", "metadata": { "created_at": "2023-12-16T17:12:08.914198Z", "resource_version": "1", "self": "https://flink.us-east-1.aws.confluent.cloud/sql/v1/organizations/b0b21724-4586-4a07-b787-d0bb5aacbf87/environments/env-z3y2x1/statements/demo-statement-1", "uid": "0005dd7b-8a7e-4274-b97e-c21b134d98f0", "updated_at": "2023-12-16T17:12:08.914198Z" }, "name": "demo-statement-1", "organization_id": "b0b21724-4586-4a07-b787-d0bb5aacbf87", "spec": { "compute_pool_id": "lfcp-8m03rm", "principal": "u-aq1dr2", "properties": null, "statement": "select 1;", "stopped": false }, "status": { "detail": "", "phase": "PENDING" } } Get a statement¶ Get the details about a Flink SQL statement by sending a 
GET request to the Statements endpoint. Getting a Flink SQL statement requires the following inputs: export FLINK_API_KEY="" export FLINK_API_SECRET="" export BASE64_FLINK_KEY_AND_SECRET=$(echo -n "${FLINK_API_KEY}:${FLINK_API_SECRET}" | base64 -w 0) export STATEMENT_NAME="" # example: "user-filter" export ORG_ID="" # example: "b0b21724-4586-4a07-b787-d0bb5aacbf87" export ENV_ID="" # example: "env-z3y2x1" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" The following command gets a Flink SQL statement’s details by its name. Attempting to get a deleted statement returns 404. curl --request GET \ --url "https://flink.${CLOUD_REGION}.${CLOUD_PROVIDER}.confluent.cloud/sql/v1/organizations/${ORG_ID}/environments/${ENV_ID}/statements/${STATEMENT_NAME}" \ --header "Authorization: Basic ${BASE64_FLINK_KEY_AND_SECRET}" Your output should resemble: Response from a request to get a SQL statement { "api_version": "sql/v1", "environment_id": "env-z3y2x1", "kind": "Statement", "metadata": { "created_at": "2023-12-16T16:08:36.650591Z", "resource_version": "13", "self": "https://flink.us-east-1.aws.confluent.cloud/sql/v1/organizations/b0b21724-4586-4a07-b787-d0bb5aacbf87/environments/env-z3y2x1/statements/demo-statement-1", "uid": "5387a4a4-02dd-4375-8db1-80bdd82ede96", "updated_at": "2023-12-16T16:10:05.353298Z" }, "name": "demo-statement-1", "organization_id": "b0b21724-4586-4a07-b787-d0bb5aacbf87", "spec": { "compute_pool_id": "lfcp-8m03rm", "principal": "u-aq1dr2", "properties": null, "statement": "select 1;", "stopped": false }, "status": { "detail": "", "phase": "COMPLETED", "result_schema": { "columns": [ { "name": "EXPR$0", "type": { "nullable": false, "type": "INTEGER" } } ] } } } Tip Pipe the result through jq to extract the code for the Flink SQL statement: curl --request GET \ --url "https://flink.${CLOUD_REGION}.${CLOUD_PROVIDER}.confluent.cloud/sql/v1/organizations/${ORG_ID}/environments/${ENV_ID}/statements/${STATEMENT_NAME}" \ 
--header "Authorization: Basic ${BASE64_FLINK_KEY_AND_SECRET}" \ | jq -r '.spec.statement' Your output should resemble: select 1; List statements¶ List the statements in an environment by sending a GET request to the Statements endpoint. Request Query Parameters spec.compute_pool_id (optional): Fetch only the statements under this compute pool ID. page_token (optional): Retrieve a page based on a previously received token (via the metadata.next field of StatementList). page_size (optional): Maximum number of items to return in a page. Listing all Flink SQL statements requires the following inputs: export FLINK_API_KEY="" export FLINK_API_SECRET="" export BASE64_FLINK_KEY_AND_SECRET=$(echo -n "${FLINK_API_KEY}:${FLINK_API_SECRET}" | base64 -w 0) export ORG_ID="" # example: "b0b21724-4586-4a07-b787-d0bb5aacbf87" export ENV_ID="environment-id" # example: "env-z3y2x1" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" The following command returns details for all non-deleted Flink SQL statements under the scope of the environment (one or more compute pools) where you have permission to do a GET request. 
curl --request GET \ --url "https://flink.${CLOUD_REGION}.${CLOUD_PROVIDER}.confluent.cloud/sql/v1/organizations/${ORG_ID}/environments/${ENV_ID}/statements" \ --header "Authorization: Basic ${BASE64_FLINK_KEY_AND_SECRET}" Your output should resemble: Response from a request to list the statements in an environment { "api_version": "sql/v1", "data": [ { "api_version": "sql/v1", "environment_id": "env-z3y2x1", "kind": "Statement", "metadata": { "created_at": "2023-12-16T16:08:36.650591Z", "resource_version": "13", "self": "https://flink.us-east-1.aws.confluent.cloud/sql/v1/organizations/b0b21724-4586-4a07-b787-d0bb5aacbf87/environments/env-z3y2x1/statements/demo-statement-1", "uid": "5387a4a4-02dd-4375-8db1-80bdd82ede96", "updated_at": "2023-12-16T16:10:05.353298Z" }, "name": "demo-statement-1", "organization_id": "b0b21724-4586-4a07-b787-d0bb5aacbf87", "spec": { "compute_pool_id": "lfcp-8m03rm", "principal": "u-aq1dr2", "properties": null, "statement": "select 1;", "stopped": false }, "status": { "detail": "", "phase": "COMPLETED", "result_schema": { "columns": [ { "name": "EXPR$0", "type": { "nullable": false, "type": "INTEGER" } } ] } } } Update metadata for a statement¶ Update the metadata for a statement by sending a PUT request to the Statements endpoint. You can stop and resume a statement by setting stopped in the spec to true to stop the statement and false to resume the statement. You can update the statement’s name, compute pool, and security principal. To update the compute pool or principal, you must stop the statement, send the update request, then restart the statement. The statement’s code is immutable. You must specify a resource version in the payload metadata. 
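Because every update must carry the current resource version, one way to stop a statement is to fetch it, set spec.stopped to true, and send the whole object back, retrying if the server answers with HTTP 409 because the statement changed in the meantime. The following is a sketch under stated assumptions, not a definitive implementation: the function names are hypothetical, jq is assumed to be installed, the environment variables are the ones used in the other requests on this page, and a successful update is assumed to return HTTP 200.

```shell
# Build the statement URL from the exported environment variables.
statement_url() {
  printf 'https://flink.%s.%s.confluent.cloud/sql/v1/organizations/%s/environments/%s/statements/%s' \
    "${CLOUD_REGION}" "${CLOUD_PROVIDER}" "${ORG_ID}" "${ENV_ID}" "${STATEMENT_NAME}"
}

# Hypothetical helper: print the current statement as JSON.
get_statement() {
  curl --silent --fail --request GET --url "$(statement_url)" \
    --header "Authorization: Basic ${BASE64_FLINK_KEY_AND_SECRET}"
}

# Hypothetical helper: send the JSON in $1 and print only the HTTP status code.
put_statement() {
  curl --silent --output /dev/null --write-out '%{http_code}' \
    --request PUT --url "$(statement_url)" \
    --header "Authorization: Basic ${BASE64_FLINK_KEY_AND_SECRET}" \
    --header 'content-type: application/json' \
    --data "$1"
}

# Stop the statement, refetching and retrying on 409 up to $1 times (default 5).
stop_statement_with_retry() {
  local max_attempts attempt current updated status
  max_attempts=${1:-5}
  attempt=1
  while [ "${attempt}" -le "${max_attempts}" ]; do
    # Fetch the latest version, which includes metadata.resource_version.
    current=$(get_statement) || return 1
    # Apply the modification to the freshly fetched object.
    updated=$(printf '%s' "${current}" | jq -c '.spec.stopped = true') || return 1
    status=$(put_statement "${updated}")
    case "${status}" in
      200) return 0 ;;
      409) attempt=$((attempt + 1)) ;;  # conflict: refetch and try again
      *) echo "update failed with HTTP ${status}" >&2; return 1 ;;
    esac
  done
  echo "gave up after ${max_attempts} conflicting updates" >&2
  return 1
}
```

Capping the number of attempts avoids an endless loop when another client keeps updating the same statement.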
Updating metadata for an existing Flink SQL statement requires the following inputs: export FLINK_API_KEY="" export FLINK_API_SECRET="" export BASE64_FLINK_KEY_AND_SECRET=$(echo -n "${FLINK_API_KEY}:${FLINK_API_SECRET}" | base64 -w 0) export STATEMENT_NAME="" # example: "user-filter" export ORG_ID="" # example: "b0b21724-4586-4a07-b787-d0bb5aacbf87" export ENV_ID="" # example: "env-z3y2x1" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export COMPUTE_POOL_ID="" # example: "lfcp-8m03rm" export PRINCIPAL_ID="" # (optional) example: "sa-23kgz4" for a service account, or "u-aq1dr2" for a user account export SQL_CODE="" # example: "SELECT * FROM USERS;" export RESOURCE_VERSION="" # example: "a3e", must be fetched from the latest version of the statement export JSON_DATA="" The PRINCIPAL_ID parameter is optional. Confluent Cloud infers the principal from the provided Flink API key. The following JSON shows an example payload: { "name": "${STATEMENT_NAME}", "organization_id": "${ORG_ID}", "environment_id": "${ENV_ID}", "spec": { "statement": "${SQL_CODE}", "properties": { "key1": "value1", "key2": "value2" }, "compute_pool_id": "${COMPUTE_POOL_ID}", "principal": "${PRINCIPAL_ID}", "stopped": false }, "metadata": { "resource_version": "${RESOURCE_VERSION}" } } Quotation mark characters in the JSON string must be escaped, so the payload string to send resembles the following: export JSON_DATA="{ \"name\": \"${STATEMENT_NAME}\", \"organization_id\": \"${ORG_ID}\", \"environment_id\": \"${ENV_ID}\", \"spec\": { \"statement\": \"${SQL_CODE}\", \"properties\": { \"key1\": \"value1\", \"key2\": \"value2\" }, \"compute_pool_id\": \"${COMPUTE_POOL_ID}\", \"principal\": \"${PRINCIPAL_ID}\", \"stopped\": false }, \"metadata\": { \"resource_version\": \"${RESOURCE_VERSION}\" } }" The following command sends a PUT request that updates metadata for an existing Flink SQL statement. 
curl --request PUT \ --url "https://flink.${CLOUD_REGION}.${CLOUD_PROVIDER}.confluent.cloud/sql/v1/organizations/${ORG_ID}/environments/${ENV_ID}/statements/${STATEMENT_NAME}" \ --header "Authorization: Basic ${BASE64_FLINK_KEY_AND_SECRET}" \ --header 'content-type: application/json' \ --data "${JSON_DATA}" Resource version is required in the PUT request and changes every time the statement is updated, by the system or by the user. It’s not possible to calculate the resource version ahead of time, so if the statement has changed since it was fetched, you must submit a GET request, reapply the modifications, and try the update again. This means you must loop and retry on 409 errors. The following pseudocode shows the loop.

while true:
    statement = getStatement()
    # make modifications to the current statement
    statement.spec.stopped = True
    # send the update
    response = updateStatement(statement)
    # if a conflict, retry
    if response.code == 409:
        continue
    elif response.code == 200:
        return "success"
    else:
        return response.error()

Delete a statement¶ Delete a statement from the compute pool by sending a DELETE request to the Statements endpoint. Once a statement is deleted, the deletion can’t be undone. State is cleaned up by Confluent Cloud. When deletion is complete, the statement is no longer accessible. Deleting a statement requires the following inputs: export FLINK_API_KEY="" export FLINK_API_SECRET="" export BASE64_FLINK_KEY_AND_SECRET=$(echo -n "${FLINK_API_KEY}:${FLINK_API_SECRET}" | base64 -w 0) export STATEMENT_NAME="" # example: "user-filter" export ORG_ID="" # example: "b0b21724-4586-4a07-b787-d0bb5aacbf87" export ENV_ID="" # example: "env-z3y2x1" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" The following command deletes a statement in the specified organization and environment. 
curl --request DELETE \ --url "https://flink.${CLOUD_REGION}.${CLOUD_PROVIDER}.confluent.cloud/sql/v1/organizations/${ORG_ID}/environments/${ENV_ID}/statements/${STATEMENT_NAME}" \ --header "Authorization: Basic ${BASE64_FLINK_KEY_AND_SECRET}" Manage compute pools¶ Using requests to the Flink REST API, you can perform these actions: List Flink compute pools Create a Flink compute pool Read a Flink compute pool Update a Flink compute pool Delete a Flink compute pool You must be authorized to create, update, delete (FlinkAdmin) or use (FlinkDeveloper) a compute pool. For more information, see Grant Role-Based Access in Confluent Cloud for Apache Flink. List Flink compute pools¶ List the compute pools in your environment by sending a GET request to the Compute Pools endpoint. This request uses your Cloud API key instead of the Flink API key. Listing the compute pools in your environment requires the following inputs: export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ENV_ID="" # example: "env-z3y2x1" Run the following command to list the compute pools in your environment. curl --request GET \ --url "https://confluent.cloud/api/fcpm/v2/compute-pools?environment=${ENV_ID}&page_size=100" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" \ | jq -r '.data[] | .spec.display_name, {id}' Your output should resemble: compute_pool_0 { "id": "lfcp-j123kl" } compute_pool_2 { "id": "lfcp-abc1de" } my-lfcp-01 { "id": "lfcp-l2mn3o" } ... Find your compute pool in the list and save its ID in an environment variable. export COMPUTE_POOL_ID="" Create a Flink compute pool¶ Create a compute pool in your environment by sending a POST request to the Compute Pools endpoint. This request uses your Cloud API key instead of the Flink API key. 
Creating a compute pool requires the following inputs: export COMPUTE_POOL_NAME="" # human readable name, for example: "my-compute-pool" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ENV_ID="" # example: "env-z3y2x1" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export MAX_CFU="" # example: 5 export JSON_DATA="" The following JSON shows an example payload. The network key is optional. { "spec": { "display_name": "${COMPUTE_POOL_NAME}", "cloud": "${CLOUD_PROVIDER}", "region": "${CLOUD_REGION}", "max_cfu": ${MAX_CFU}, "environment": { "id": "${ENV_ID}" }, "network": { "id": "n-00000", "environment": "string" } } } Quotation mark characters in the JSON string must be escaped, so the payload string to send resembles the following: export JSON_DATA="{ \"spec\": { \"display_name\": \"${COMPUTE_POOL_NAME}\", \"cloud\": \"${CLOUD_PROVIDER}\", \"region\": \"${CLOUD_REGION}\", \"max_cfu\": ${MAX_CFU}, \"environment\": { \"id\": \"${ENV_ID}\" } } }" The following command sends a POST request to create a compute pool. 
curl --request POST \ --url https://api.confluent.cloud/fcpm/v2/compute-pools \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" \ --header 'content-type: application/json' \ --data "${JSON_DATA}" Your output should resemble: Response from a request to create a compute pool { "api_version": "fcpm/v2", "id": "lfcp-6g7h8i", "kind": "ComputePool", "metadata": { "created_at": "2024-02-27T22:44:27.18964Z", "resource_name": "crn://confluent.cloud/organization=b0b21724-4586-4a07-b787-d0bb5aacbf87/environment=env-z3y2x1/flink-region=aws.us-east-1/compute-pool=lfcp-6g7h8i", "self": "https://api.confluent.cloud/fcpm/v2/compute-pools/lfcp-6g7h8i", "updated_at": "2024-02-27T22:44:27.18964Z" }, "spec": { "cloud": "AWS", "display_name": "my-compute-pool", "environment": { "id": "env-z3y2x1", "related": "https://api.confluent.cloud/fcpm/v2/compute-pools/lfcp-6g7h8i", "resource_name": "crn://confluent.cloud/organization=b0b21724-4586-4a07-b787-d0bb5aacbf87/environment=env-z3y2x1" }, "http_endpoint": "https://flink.us-east-1.aws.confluent.cloud/sql/v1/organizations/b0b21724-4586-4a07-b787-d0bb5aacbf87/environments/env-z3y2x1", "max_cfu": 5, "region": "us-east-1" }, "status": { "current_cfu": 0, "phase": "PROVISIONING" } } Read a Flink compute pool¶ Get the details about a compute pool in your environment by sending a GET request to the Compute Pools endpoint. This request uses your Cloud API key instead of the Flink API key. Getting details about a compute pool requires the following inputs: export COMPUTE_POOL_ID="" # example: "lfcp-8m03rm" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ENV_ID="" # example: "env-z3y2x1" Run the following command to get details about the compute pool specified in the COMPUTE_POOL_ID environment variable. 
curl --request GET \ --url "https://api.confluent.cloud/fcpm/v2/compute-pools/${COMPUTE_POOL_ID}?environment=${ENV_ID}" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" Your output should resemble: Response from a request to read a compute pool { "api_version": "fcpm/v2", "id": "lfcp-6g7h8i", "kind": "ComputePool", "metadata": { "created_at": "2024-02-27T22:44:27.18964Z", "resource_name": "crn://confluent.cloud/organization=b0b21724-4586-4a07-b787-d0bb5aacbf87/environment=env-z3y2x1/flink-region=aws.us-east-1/compute-pool=lfcp-6g7h8i", "self": "https://api.confluent.cloud/fcpm/v2/compute-pools/lfcp-6g7h8i", "updated_at": "2024-02-27T22:44:27.18964Z" }, "spec": { "cloud": "AWS", "display_name": "my-compute-pool", "environment": { "id": "env-z3y2x1", "related": "https://api.confluent.cloud/fcpm/v2/compute-pools/lfcp-6g7h8i", "resource_name": "crn://confluent.cloud/organization=b0b21724-4586-4a07-b787-d0bb5aacbf87/environment=env-z3y2x1" }, "http_endpoint": "https://flink.us-east-1.aws.confluent.cloud/sql/v1/organizations/b0b21724-4586-4a07-b787-d0bb5aacbf87/environments/env-z3y2x1", "max_cfu": 5, "region": "us-east-1" }, "status": { "current_cfu": 0, "phase": "PROVISIONED" } } Update a Flink compute pool¶ Update a compute pool in your environment by sending a PATCH request to the Compute Pools endpoint. This request uses your Cloud API key instead of the Flink API key. Updating a compute pool requires the following inputs: export COMPUTE_POOL_ID="" # example: "lfcp-8m03rm" export COMPUTE_POOL_NAME="" # human readable name, for example: "my-compute-pool" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ENV_ID="" # example: "env-z3y2x1" export MAX_CFU="" # example: 5 export JSON_DATA="" The following JSON shows an example payload. The network key is optional. 
{ "spec": { "display_name": "${COMPUTE_POOL_NAME}", "max_cfu": ${MAX_CFU}, "environment": { "id": "${ENV_ID}" } } } Quotation mark characters in the JSON string must be escaped, so the payload string to send resembles the following: export JSON_DATA="{ \"spec\": { \"display_name\": \"${COMPUTE_POOL_NAME}\", \"max_cfu\": ${MAX_CFU}, \"environment\": { \"id\": \"${ENV_ID}\" } } }" Run the following command to update the compute pool specified in the COMPUTE_POOL_ID environment variable. curl --request PATCH \ --url "https://api.confluent.cloud/fcpm/v2/compute-pools/${COMPUTE_POOL_ID}" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" \ --header 'content-type: application/json' \ --data "${JSON_DATA}" Delete a Flink compute pool¶ Delete a compute pool in your environment by sending a DELETE request to the Compute Pools endpoint. This request uses your Cloud API key instead of the Flink API key. Deleting a compute pool requires the following inputs: export COMPUTE_POOL_ID="" # example: "lfcp-8m03rm" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ENV_ID="" # example: "env-z3y2x1" Run the following command to delete the compute pool specified in the COMPUTE_POOL_ID environment variable. curl --request DELETE \ --url "https://api.confluent.cloud/fcpm/v2/compute-pools/${COMPUTE_POOL_ID}?environment=${ENV_ID}" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" List Flink regions¶ List the regions where Flink is available by sending a GET request to the Regions endpoint. This request uses your Cloud API key instead of the Flink API key. Listing the available Flink regions requires the following inputs: export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) Run the following command to list the available Flink regions. 
curl --request GET \ --url "https://api.confluent.cloud/fcpm/v2/regions" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" \ | jq -r '.data[].id' Your output should resemble: aws.eu-central-1 aws.us-east-1 aws.eu-west-1 aws.us-east-2 ... Manage Flink artifacts¶ Using requests to the Flink REST API, you can perform these actions: List Flink artifacts Create a Flink artifact Read an artifact Update an artifact Delete an artifact An artifact has the following schema: api_version: artifact/v1 kind: FlinkArtifact id: dlz-f3a90de metadata: self: 'https://api.confluent.cloud/artifact/v1/flink-artifacts/fa-12345' resource_name: crn://confluent.cloud/organization=/flink-artifact=fa-12345 created_at: '2006-01-02T15:04:05-07:00' updated_at: '2006-01-02T15:04:05-07:00' deleted_at: '2006-01-02T15:04:05-07:00' cloud: AWS region: us-east-1 environment: env-00000 display_name: string class: io.confluent.example.SumScalarFunction content_format: JAR description: string documentation_link: '^$|^(http://|https://).' runtime_language: JAVA versions: - version: cfa-ver-001 release_notes: string is_beta: true artifact_id: {} upload_source: api_version: artifact.v1/UploadSource kind: PresignedUrl id: dlz-f3a90de metadata: self: https://api.confluent.cloud/artifact.v1/UploadSource/presigned-urls/pu-12345 resource_name: crn://confluent.cloud/organization=/presigned-url=pu-12345 created_at: '2006-01-02T15:04:05-07:00' updated_at: '2006-01-02T15:04:05-07:00' deleted_at: '2006-01-02T15:04:05-07:00' location: PRESIGNED_URL_LOCATION upload_id: List Flink artifacts¶ List the artifacts, like user-defined functions (UDFs), in your environment by sending a GET request to the List Artifacts endpoint. This request uses your Cloud API key instead of the Flink API key. 
Listing the artifacts in your environment requires the following inputs: export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ENV_ID="" # example: "env-z3y2x1" Run the following command to list the artifacts in your environment. curl --request GET \ --url "https://api.confluent.cloud/artifact/v1/flink-artifacts?cloud=${CLOUD_PROVIDER}&region=${CLOUD_REGION}&environment=${ENV_ID}" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" \ | jq -r '.data[] | .spec.display_name, {id}' Your output should resemble: { "id": "cfa-e8rzq7" } Create a Flink artifact¶ Creating an artifact, like a user-defined function (UDF), requires these steps: Request a presigned upload URL for a new Flink Artifact by sending a POST request to the Presigned URLs endpoint. Upload your JAR file to the object storage provider by using the results from the presigned URL request. Create the artifact in your environment by sending a POST request to the Create Artifact endpoint. These requests use your Cloud API key instead of the Flink API key. Creating an artifact in your environment requires the following inputs: export ARTIFACT_DISPLAY_NAME="" # example: "my-udf" export ARTIFACT_DESCRIPTION="" # example: "This is a demo UDF." export ARTIFACT_DOC_LINK="" # example: "https://docs.example.com/my-udf" export CLASS_NAME="" # example: "io.confluent.example.SumScalarFunction" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ENV_ID="" # example: "env-z3y2x1" The following JSON shows an example payload. 
{ "content_format": "JAR", "cloud": "${CLOUD_PROVIDER}", "environment": "${ENV_ID}", "region": "${CLOUD_REGION}" } Quotation mark characters in the JSON string must be escaped, so the payload string to send resembles the following: export JSON_DATA="{ \"content_format\": \"JAR\", \"cloud\": \"${CLOUD_PROVIDER}\", \"environment\": \"${ENV_ID}\", \"region\": \"${CLOUD_REGION}\" }" Run the following command to request the upload identifier and the presigned upload URL for your artifact. curl --request POST \ --url https://api.confluent.cloud/artifact/v1/presigned-upload-url \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" \ --header 'content-type: application/json' \ --data "${JSON_DATA}" Your output should resemble: { "api_version": "artifact/v1", "cloud": "AWS", "content_format": "JAR", "kind": "PresignedUrl", "region": "us-east-1", "upload_form_data": { "bucket": "confluent-custom-connectors-prod-us-east-1", "key": "staging/ccp/v1//custom-plugins//plugin.jar", "policy": "ey…", "x-amz-algorithm": "AWS4-HMAC-SHA256", "x-amz-credential": "AS…/20241121/us-east-1/s3/aws4_request", "x-amz-date": "20241121T212232Z", "x-amz-security-token": "IQ…", "x-amz-signature": "52…" }, "upload_id": "", "upload_url": "https://confluent-custom-connectors-prod-us-east-1.s3.dualstack.us-east-1.amazonaws.com/" } For convenience, save the security details in environment variables: export UPLOAD_ID="" export UPLOAD_URL="" export UPLOAD_BUCKET="" export UPLOAD_KEY="" export UPLOAD_POLICY="" export X_AMZ_ALGORITHM="" export X_AMZ_CREDENTIAL="" export X_AMZ_DATE="" export X_AMZ_SECURITY_TOKEN="" export X_AMZ_SIGNATURE="" Once you have the presigned URL, ID, bucket policy, and other security details, upload your JAR to the bucket. The following example provides a curl command you can use to upload your JAR file. Note When specifying the JAR file to upload, you must use the @ symbol at the start of the file path. For example, -F file=@. 
If the @ symbol is not used, you may see an error stating that “Your proposed upload is smaller than the minimum allowed size.” curl -X POST "${UPLOAD_URL}" \ -F "bucket=${UPLOAD_BUCKET}" \ -F "key=${UPLOAD_KEY}" \ -F "policy=${UPLOAD_POLICY}" \ -F "x-amz-algorithm=${X_AMZ_ALGORITHM}" \ -F "x-amz-credential=${X_AMZ_CREDENTIAL}" \ -F "x-amz-date=${X_AMZ_DATE}" \ -F "x-amz-security-token=${X_AMZ_SECURITY_TOKEN}" \ -F "x-amz-signature=${X_AMZ_SIGNATURE}" \ -F file=@/path/to/udf_file.jar When your JAR file is uploaded to the object store, you can create the UDF in Confluent Cloud for Apache Flink by sending a POST request to the Create Artifact endpoint. The following JSON shows an example payload. { "cloud": "${CLOUD_PROVIDER}", "region": "${CLOUD_REGION}", "environment": "${ENV_ID}", "display_name": "${ARTIFACT_DISPLAY_NAME}", "class": "${CLASS_NAME}", "content_format": "JAR", "description": "${ARTIFACT_DESCRIPTION}", "documentation_link": "${ARTIFACT_DOC_LINK}", "runtime_language": "JAVA", "upload_source": { "location": "PRESIGNED_URL_LOCATION", "upload_id": "${UPLOAD_ID}" } } Quotation mark characters in the JSON string must be escaped, so the payload string resembles the following: export JSON_DATA="{ \"cloud\": \"${CLOUD_PROVIDER}\", \"region\": \"${CLOUD_REGION}\", \"environment\": \"${ENV_ID}\", \"display_name\": \"${ARTIFACT_DISPLAY_NAME}\", \"class\": \"${CLASS_NAME}\", \"content_format\": \"JAR\", \"description\": \"${ARTIFACT_DESCRIPTION}\", \"documentation_link\": \"${ARTIFACT_DOC_LINK}\", \"runtime_language\": \"JAVA\", \"upload_source\": { \"location\": \"PRESIGNED_URL_LOCATION\", \"upload_id\": \"${UPLOAD_ID}\" } }" Run the following command to create the artifact in your environment. 
curl --request POST \
  --url "https://api.confluent.cloud/artifact/v1/flink-artifacts?cloud=${CLOUD_PROVIDER}&region=${CLOUD_REGION}&environment=${ENV_ID}" \
  --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" \
  --header 'content-type: application/json' \
  --data "${JSON_DATA}"

Read an artifact

Get the details about an artifact in your environment by sending a GET request to the Read Artifact endpoint. This request uses your Cloud API key instead of the Flink API key.

Getting details about an artifact requires the following inputs:

export ARTIFACT_ID="" # example: cfa-e8rzq7
export CLOUD_PROVIDER="" # example: "aws"
export CLOUD_REGION="" # example: "us-east-1"
export CLOUD_API_KEY=""
export CLOUD_API_SECRET=""
export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0)
export ENV_ID="" # example: "env-z3y2x1"

Run the following command to get details about the artifact specified by the ARTIFACT_ID environment variable.

curl --request GET \
  --url "https://api.confluent.cloud/artifact/v1/flink-artifacts/${ARTIFACT_ID}?cloud=${CLOUD_PROVIDER}&region=${CLOUD_REGION}&environment=${ENV_ID}" \
  --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}"

Your output should resemble:

{ "api_version": "artifact/v1", "class": "default", "cloud": "AWS", "content_format": "JAR", "description": "", "display_name": "udf_example", "documentation_link": "", "environment": "env-z3q9rd", "id": "cfa-e8rzq7", "kind": "FlinkArtifact", "metadata": { "created_at": "2024-11-21T21:52:43.788042Z", "resource_name": "crn://confluent.cloud/organization=/flink-artifact=cfa-e8rzq7", "self": "https://api.confluent.cloud/artifact/v1/flink-artifacts/cfa-e8rzq7", "updated_at": "2024-11-21T21:52:44.625318Z" }, "region": "us-east-1", "runtime_language": "JAVA", "versions": [ { "artifact_id": {}, "is_beta": false, "release_notes": "", "upload_source": { "location": "PRESIGNED_URL_LOCATION", "upload_id": "" }, "version": "ver-xq72dk" } ] }

Update an artifact

Update an artifact in your environment by sending a PATCH request to the Update Artifact endpoint. This request uses your Cloud API key instead of the Flink API key.

Updating an artifact in your environment requires the following inputs:

export ARTIFACT_ID="" # example: cfa-e8rzq7
export ARTIFACT_DISPLAY_NAME="" # example: "my-udf"
export ARTIFACT_DESCRIPTION="" # example: "This is a demo UDF."
export ARTIFACT_DOC_LINK="" # example: "https://docs.example.com/my-udf", "^$|^(http://|https://)."
export CLASS_NAME="" # example: "io.confluent.example.SumScalarFunction"
export CLOUD_PROVIDER="" # example: "aws"
export CLOUD_REGION="" # example: "us-east-1"
export CLOUD_API_KEY=""
export CLOUD_API_SECRET=""
export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0)
export ENV_ID="" # example: "env-z3y2x1"

The following JSON shows an example payload.

{ "cloud": "${CLOUD_PROVIDER}", "region": "${CLOUD_REGION}", "environment": "${ENV_ID}", "display_name": "${ARTIFACT_DISPLAY_NAME}", "content_format": "JAR", "description": "${ARTIFACT_DESCRIPTION}", "documentation_link": "${ARTIFACT_DOC_LINK}", "runtime_language": "JAVA", "versions": [ { "version": "cfa-ver-001", "release_notes": "string", "is_beta": true, "artifact_id": { "cloud": "${CLOUD_PROVIDER}", "region": "${CLOUD_REGION}", "environment": "${ENV_ID}", "display_name": "${ARTIFACT_DISPLAY_NAME}", "class": "${CLASS_NAME}", "content_format": "JAR", "description": "${ARTIFACT_DESCRIPTION}", "documentation_link": "${ARTIFACT_DOC_LINK}", "runtime_language": "JAVA", "versions": [ {} ] }, "upload_source": { "location": "PRESIGNED_URL_LOCATION", "upload_id": "${UPLOAD_ID}" } } ] }

Quotation mark characters in the JSON string must be escaped, so the payload string resembles the following:

export JSON_DATA="{ \"cloud\": \"${CLOUD_PROVIDER}\", \"region\": \"${CLOUD_REGION}\", \"environment\": \"${ENV_ID}\", \"display_name\": \"${ARTIFACT_DISPLAY_NAME}\", \"content_format\": \"JAR\", \"description\": \"${ARTIFACT_DESCRIPTION}\", \"documentation_link\": \"${ARTIFACT_DOC_LINK}\", \"runtime_language\": \"JAVA\", \"versions\": [ { \"version\": \"cfa-ver-001\", \"release_notes\": \"string\", \"is_beta\": true, \"artifact_id\": { \"cloud\": \"${CLOUD_PROVIDER}\", \"region\": \"${CLOUD_REGION}\", \"environment\": \"${ENV_ID}\", \"display_name\": \"${ARTIFACT_DISPLAY_NAME}\", \"class\": \"${CLASS_NAME}\", \"content_format\": \"JAR\", \"description\": \"${ARTIFACT_DESCRIPTION}\", \"documentation_link\": \"${ARTIFACT_DOC_LINK}\", \"runtime_language\": \"JAVA\", \"versions\": [ {} ] }, \"upload_source\": { \"location\": \"PRESIGNED_URL_LOCATION\", \"upload_id\": \"${UPLOAD_ID}\" } } ] }"

Run the following command to update the artifact specified by the ARTIFACT_ID environment variable.

curl --request PATCH \
  --url "https://api.confluent.cloud/artifact/v1/flink-artifacts/${ARTIFACT_ID}?cloud=${CLOUD_PROVIDER}&region=${CLOUD_REGION}&environment=${ENV_ID}" \
  --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" \
  --header 'content-type: application/json' \
  --data "${JSON_DATA}"

Delete an artifact

Delete an artifact in your environment by sending a DELETE request to the Delete Artifact endpoint. This request uses your Cloud API key instead of the Flink API key.

Deleting an artifact in your environment requires the following inputs:

export ARTIFACT_ID="" # example: cfa-e8rzq7
export CLOUD_PROVIDER="" # example: "aws"
export CLOUD_REGION="" # example: "us-east-1"
export CLOUD_API_KEY=""
export CLOUD_API_SECRET=""
export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0)
export ENV_ID="" # example: "env-z3y2x1"

Run the following command to delete the artifact specified by the ARTIFACT_ID environment variable.
curl --request DELETE \
  --url "https://api.confluent.cloud/artifact/v1/flink-artifacts/${ARTIFACT_ID}?cloud=${CLOUD_PROVIDER}&region=${CLOUD_REGION}&environment=${ENV_ID}" \
  --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}"

Manage UDF logging

When you create a user-defined function (UDF) with Confluent Cloud for Apache Flink®, you have the option of enabling logging to a Kafka topic to help with monitoring and debugging. For more information, see Enable Logging in a User Defined Function.

Using requests to the Flink REST API, you can perform these actions:

- Enable logging
- List UDF logs
- Disable a UDF log
- View log details
- Update the logging level for a UDF log

Managing UDF logs requires the following inputs:

export UDF_LOG_ID="" # example: "ccl-4l5klo"
export UDF_LOG_TOPIC_NAME="" # example: "udf_log"
export KAFKA_CLUSTER_ID="" # example: "lkc-12345"
export CLOUD_PROVIDER="" # example: "aws"
export CLOUD_REGION="" # example: "us-east-1"
export CLOUD_API_KEY=""
export CLOUD_API_SECRET=""
export ENV_ID="" # example: "env-z3y2x1"

Enable logging

Run the following command to enable UDF logging.

cat << EOF | curl --silent -X POST -u ${CLOUD_API_KEY}:${CLOUD_API_SECRET} \
  -d @- https://api.confluent.cloud/ccl/v1/custom-code-loggings
{
  "cloud":"${CLOUD_PROVIDER}",
  "region":"${CLOUD_REGION}",
  "environment": { "id":"${ENV_ID}" },
  "destination_settings":{ "kind":"Kafka", "cluster_id":"${KAFKA_CLUSTER_ID}", "topic":"${UDF_LOG_TOPIC_NAME}", "log_level":"info" }
}
EOF

List UDF logs

To list the active UDF logs, run the following command.

curl --silent -X GET \
  -u ${CLOUD_API_KEY}:${CLOUD_API_SECRET} \
  https://api.confluent.cloud/ccl/v1/custom-code-loggings?environment=${ENV_ID}

Disable a UDF log

Run the following command to disable UDF logging.
curl --silent -X DELETE \ -u ${CLOUD_API_KEY}:${CLOUD_API_SECRET} \ https://api.confluent.cloud/ccl/v1/custom-code-loggings/${UDF_LOG_ID}?environment=${ENV_ID} View log details¶ Run the following command to view the details of a UDF log. curl --silent -X GET \ -u ${CLOUD_API_KEY}:${CLOUD_API_SECRET} \ https://api.confluent.cloud/ccl/v1/custom-code-loggings/${UDF_LOG_ID}?environment=${ENV_ID} Update the logging level for a UDF log¶ Run the following command to change the logging level for an active UDF log. cat < https://flink.${CLOUD_REGION}.${CLOUD_PROVIDER}.confluent.cloud/ https://flink.${CLOUD_REGION}.${CLOUD_PROVIDER}.private.confluent.cloud/ ``` ```sql https://flink.us-east-1.aws.confluent.cloud ``` ```sql https://flink.us-east-1.aws.private.confluent.cloud ``` ```sql export FLINK_API_KEY="" export FLINK_API_SECRET="" ``` ```sql export BASE64_FLINK_KEY_AND_SECRET=$(echo -n "${FLINK_API_KEY}:${FLINK_API_SECRET}" | base64 -w 0) ``` ```sql api_version: "sql/v1" kind: "Statement" organization_id: "" # Identifier of your Confluent Cloud organization environment_id: "" # Identifier of your Confluent Cloud environment name: "" # Primary identifier of the statement, must be unique within the environment, 100 max length, [a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)* metadata: created_at: "" # Creation timestamp of this resource updated_at: "" # Last updated timestamp of this resource resource_version: "" # Generated by the system and updated whenever the statement is updated (including by the system). Opaque and should not be parsed. self: "" # An absolute URL to this resource uid: "" # uid is unique in time and space (i.e., even if the name is re-used) spec: compute_pool_id: "" # The ID of the compute pool the statement should run in. DNS Subdomain (RFC 1123) – 255 max len, [a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)* principal: "" # user or service account ID properties: map[string]string # Optional. 
request/client properties statement: "SELECT * from Orders;" # The raw SQL text stopped: false # Boolean, specifying if the statement should be stopped status: phase: PENDING | RUNNING | COMPLETED | DELETING | FAILING | FAILED detail: "" # Optional. Human-readable description of phase. result_schema: "" # Optional. JSON object in TableSchema format; describes the data returned by the results serving API. ``` ```sql [a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)* ``` ```sql export FLINK_API_KEY="" export FLINK_API_SECRET="" export BASE64_FLINK_KEY_AND_SECRET=$(echo -n "${FLINK_API_KEY}:${FLINK_API_SECRET}" | base64 -w 0) export STATEMENT_NAME="" # example: "user-filter" export ORG_ID="" # example: "b0b21724-4586-4a07-b787-d0bb5aacbf87" export ENV_ID="" # example: "env-z3y2x1" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export COMPUTE_POOL_ID="" # example: "lfcp-8m03rm" export PRINCIPAL_ID="" # (optional) example: "sa-23kgz4" for a service account, or "u-aq1dr2" for a user account export SQL_CODE="" # example: "SELECT * FROM USERS;" export JSON_DATA="" ``` ```sql { "name": "${STATEMENT_NAME}", "organization_id": "${ORG_ID}", "environment_id": "${ENV_ID}", "spec": { "statement": "${SQL_CODE}", "properties": { "key1": "value1", "key2": "value2" }, "compute_pool_id": "${COMPUTE_POOL_ID}", "principal": "${PRINCIPAL_ID}", "stopped": false } } ``` ```sql export JSON_DATA="{ \"name\": \"${STATEMENT_NAME}\", \"organization_id\": \"${ORG_ID}\", \"environment_id\": \"${ENV_ID}\", \"spec\": { \"statement\": \"${SQL_CODE}\", \"properties\": { \"key1\": \"value1\", \"key2\": \"value2\" }, \"compute_pool_id\": \"${COMPUTE_POOL_ID}\", \"principal\": \"${PRINCIPAL_ID}\", \"stopped\": false } }" ``` ```sql curl --request POST \ --url "https://flink.${CLOUD_REGION}.${CLOUD_PROVIDER}.confluent.cloud/sql/v1/organizations/${ORG_ID}/environments/${ENV_ID}/statements" \ --header "Authorization: Basic 
${BASE64_FLINK_KEY_AND_SECRET}" \ --header 'content-type: application/json' \ --data "${JSON_DATA}" ``` ```sql { "api_version": "sql/v1", "environment_id": "env-z3y2x1", "kind": "Statement", "metadata": { "created_at": "2023-12-16T17:12:08.914198Z", "resource_version": "1", "self": "https://flink.us-east-1.aws.confluent.cloud/sql/v1/organizations/b0b21724-4586-4a07-b787-d0bb5aacbf87/environments/env-z3y2x1/statements/demo-statement-1", "uid": "0005dd7b-8a7e-4274-b97e-c21b134d98f0", "updated_at": "2023-12-16T17:12:08.914198Z" }, "name": "demo-statement-1", "organization_id": "b0b21724-4586-4a07-b787-d0bb5aacbf87", "spec": { "compute_pool_id": "lfcp-8m03rm", "principal": "u-aq1dr2", "properties": null, "statement": "select 1;", "stopped": false }, "status": { "detail": "", "phase": "PENDING" } } ``` ```sql export FLINK_API_KEY="" export FLINK_API_SECRET="" export BASE64_FLINK_KEY_AND_SECRET=$(echo -n "${FLINK_API_KEY}:${FLINK_API_SECRET}" | base64 -w 0) export STATEMENT_NAME="" # example: "user-filter" export ORG_ID="" # example: "b0b21724-4586-4a07-b787-d0bb5aacbf87" export ENV_ID="" # example: "env-z3y2x1" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" ``` ```sql curl --request GET \ --url "https://flink.${CLOUD_REGION}.${CLOUD_PROVIDER}.confluent.cloud/sql/v1/organizations/${ORG_ID}/environments/${ENV_ID}/statements/${STATEMENT_NAME}" \ --header "Authorization: Basic ${BASE64_FLINK_KEY_AND_SECRET}" ``` ```sql { "api_version": "sql/v1", "environment_id": "env-z3y2x1", "kind": "Statement", "metadata": { "created_at": "2023-12-16T16:08:36.650591Z", "resource_version": "13", "self": "https://flink.us-east-1.aws.confluent.cloud/sql/v1/organizations/b0b21724-4586-4a07-b787-d0bb5aacbf87/environments/env-z3y2x1/statements/demo-statement-1", "uid": "5387a4a4-02dd-4375-8db1-80bdd82ede96", "updated_at": "2023-12-16T16:10:05.353298Z" }, "name": "demo-statement-1", "organization_id": "b0b21724-4586-4a07-b787-d0bb5aacbf87", "spec": { 
"compute_pool_id": "lfcp-8m03rm", "principal": "u-aq1dr2", "properties": null, "statement": "select 1;", "stopped": false }, "status": { "detail": "", "phase": "COMPLETED", "result_schema": { "columns": [ { "name": "EXPR$0", "type": { "nullable": false, "type": "INTEGER" } } ] } } } ``` ```sql curl --request GET \ --url "https://flink.${CLOUD_REGION}.${CLOUD_PROVIDER}.confluent.cloud/sql/v1/organizations/${ORG_ID}/environments/${ENV_ID}/statements/${STATEMENT_NAME}" \ --header "Authorization: Basic ${BASE64_FLINK_KEY_AND_SECRET}" \ | jq -r '.spec.statement' ``` ```sql spec.compute_pool_id ``` ```sql metadata.next ``` ```sql StatementList ``` ```sql export FLINK_API_KEY="" export FLINK_API_SECRET="" export BASE64_FLINK_KEY_AND_SECRET=$(echo -n "${FLINK_API_KEY}:${FLINK_API_SECRET}" | base64 -w 0) export ORG_ID="" # example: "b0b21724-4586-4a07-b787-d0bb5aacbf87" export ENV_ID="environment-id" # example: "env-z3y2x1" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" ``` ```sql curl --request GET \ --url "https://flink.${CLOUD_REGION}.${CLOUD_PROVIDER}.confluent.cloud/sql/v1/organizations/${ORG_ID}/environments/${ENV_ID}/statements" \ --header "Authorization: Basic ${BASE64_FLINK_KEY_AND_SECRET}" ``` ```sql { "api_version": "sql/v1", "data": [ { "api_version": "sql/v1", "environment_id": "env-z3y2x1", "kind": "Statement", "metadata": { "created_at": "2023-12-16T16:08:36.650591Z", "resource_version": "13", "self": "https://flink.us-east-1.aws.confluent.cloud/sql/v1/organizations/b0b21724-4586-4a07-b787-d0bb5aacbf87/environments/env-z3y2x1/statements/demo-statement-1", "uid": "5387a4a4-02dd-4375-8db1-80bdd82ede96", "updated_at": "2023-12-16T16:10:05.353298Z" }, "name": "demo-statement-1", "organization_id": "b0b21724-4586-4a07-b787-d0bb5aacbf87", "spec": { "compute_pool_id": "lfcp-8m03rm", "principal": "u-aq1dr2", "properties": null, "statement": "select 1;", "stopped": false }, "status": { "detail": "", "phase": "COMPLETED", 
"result_schema": { "columns": [ { "name": "EXPR$0", "type": { "nullable": false, "type": "INTEGER" } } ] } } } ``` ```sql export FLINK_API_KEY="" export FLINK_API_SECRET="" export BASE64_FLINK_KEY_AND_SECRET=$(echo -n "${FLINK_API_KEY}:${FLINK_API_SECRET}" | base64 -w 0) export STATEMENT_NAME="" # example: "user-filter" export ORG_ID="" # example: "b0b21724-4586-4a07-b787-d0bb5aacbf87" export ENV_ID="" # example: "env-z3y2x1" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export COMPUTE_POOL_ID="" # example: "lfcp-8m03rm" export PRINCIPAL_ID="" # (optional) example: "sa-23kgz4" for a service account, or "u-aq1dr2" for a user account export SQL_CODE="" # example: "SELECT * FROM USERS;" export RESOURCE_VERSION="" # example: "a3e", must be fetched from the latest version of the statement export JSON_DATA="" ``` ```sql { "name": "${STATEMENT_NAME}", "organization_id": "${ORG_ID}", "environment_id": "${ENV_ID}", "spec": { "statement": "${SQL_CODE}", "properties": { "key1": "value1", "key2": "value2" }, "compute_pool_id": "${COMPUTE_POOL_ID}", "principal": "${PRINCIPAL_ID}", "stopped": false }, "metadata": { "resource_version": "${RESOURCE_VERSION}" } } ``` ```sql export JSON_DATA="{ \"name\": \"${STATEMENT_NAME}\", \"organization_id\": \"${ORG_ID}\", \"environment_id\": \"${ENV_ID}\", \"spec\": { \"statement\": \"${SQL_CODE}\", \"properties\": { \"key1\": \"value1\", \"key2\": \"value2\" }, \"compute_pool_id\": \"${COMPUTE_POOL_ID}\", \"principal\": \"${PRINCIPAL_ID}\", \"stopped\": false }, \"metadata\": { \"resource_version\": \"${RESOURCE_VERSION}\" } }" ``` ```sql curl --request PUT \ --url "https://flink.${CLOUD_REGION}.${CLOUD_PROVIDER}.confluent.cloud/sql/v1/organizations/${ORG_ID}/environments/${ENV_ID}/statements/${STATEMENT_NAME}" \ --header "Authorization: Basic ${BASE64_FLINK_KEY_AND_SECRET}" \ --header 'content-type: application/json' \ --data "${JSON_DATA}" ``` ```sql while true: statement = getStatement() # make 
modifications to the current statement statement.spec.stopped = True # send the update response = updateStatement(statement) # if a conflict, retry if response.code == 409: continue elif response.code == 200: return "success" else: return response.error() ``` ```sql export FLINK_API_KEY="" export FLINK_API_SECRET="" export BASE64_FLINK_KEY_AND_SECRET=$(echo -n "${FLINK_API_KEY}:${FLINK_API_SECRET}" | base64 -w 0) export STATEMENT_NAME="" # example: "user-filter" export ORG_ID="" # example: "b0b21724-4586-4a07-b787-d0bb5aacbf87" export ENV_ID="" # example: "env-z3y2x1" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" ``` ```sql curl --request DELETE \ --url "https://flink.${CLOUD_REGION}.${CLOUD_PROVIDER}.confluent.cloud/sql/v1/organizations/${ORG_ID}/environments/${ENV_ID}/statements/${STATEMENT_NAME}" \ --header "Authorization: Basic ${BASE64_FLINK_KEY_AND_SECRET}" ``` ```sql FlinkDeveloper ``` ```sql export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ENV_ID="" # example: "env-z3y2x1" ``` ```sql curl --request GET \ --url "https://confluent.cloud/api/fcpm/v2/compute-pools?environment=${ENV_ID}&page_size=100" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" \ | jq -r '.data[] | .spec.display_name, {id}' ``` ```sql compute_pool_0 { "id": "lfcp-j123kl" } compute_pool_2 { "id": "lfcp-abc1de" } my-lfcp-01 { "id": "lfcp-l2mn3o" } ... 
``` ```sql export COMPUTE_POOL_ID="" ``` ```sql export COMPUTE_POOL_NAME="" # human readable name, for example: "my-compute-pool" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ENV_ID="" # example: "env-z3y2x1" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export MAX_CFU="" # example: 5 export JSON_DATA="" ``` ```sql { "spec": { "display_name": "${COMPUTE_POOL_NAME}", "cloud": "${CLOUD_PROVIDER}", "region": "${CLOUD_REGION}", "max_cfu": ${MAX_CFU}, "environment": { "id": "${ENV_ID}" }, "network": { "id": "n-00000", "environment": "string" } } } ``` ```sql export JSON_DATA="{ \"spec\": { \"display_name\": \"${COMPUTE_POOL_NAME}\", \"cloud\": \"${CLOUD_PROVIDER}\", \"region\": \"${CLOUD_REGION}\", \"max_cfu\": ${MAX_CFU}, \"environment\": { \"id\": \"${ENV_ID}\" } } }" ``` ```sql curl --request POST \ --url https://api.confluent.cloud/fcpm/v2/compute-pools \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" \ --header 'content-type: application/json' \ --data "${JSON_DATA}" ``` ```sql { "api_version": "fcpm/v2", "id": "lfcp-6g7h8i", "kind": "ComputePool", "metadata": { "created_at": "2024-02-27T22:44:27.18964Z", "resource_name": "crn://confluent.cloud/organization=b0b21724-4586-4a07-b787-d0bb5aacbf87/environment=env-z3y2x1/flink-region=aws.us-east-1/compute-pool=lfcp-6g7h8i", "self": "https://api.confluent.cloud/fcpm/v2/compute-pools/lfcp-6g7h8i", "updated_at": "2024-02-27T22:44:27.18964Z" }, "spec": { "cloud": "AWS", "display_name": "my-compute-pool", "environment": { "id": "env-z3y2x1", "related": "https://api.confluent.cloud/fcpm/v2/compute-pools/lfcp-6g7h8i", "resource_name": "crn://confluent.cloud/organization=b0b21724-4586-4a07-b787-d0bb5aacbf87/environment=env-z3y2x1" }, "http_endpoint": 
"https://flink.us-east-1.aws.confluent.cloud/sql/v1/organizations/b0b21724-4586-4a07-b787-d0bb5aacbf87/environments/env-z3y2x1", "max_cfu": 5, "region": "us-east-1" }, "status": { "current_cfu": 0, "phase": "PROVISIONING" } } ``` ```sql export COMPUTE_POOL_ID="" # example: "lfcp-8m03rm" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ENV_ID="" # example: "env-z3y2x1" ``` ```sql curl --request GET \ --url "https://api.confluent.cloud/fcpm/v2/compute-pools/${COMPUTE_POOL_ID}?environment=${ENV_ID}" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" ``` ```sql { "api_version": "fcpm/v2", "id": "lfcp-6g7h8i", "kind": "ComputePool", "metadata": { "created_at": "2024-02-27T22:44:27.18964Z", "resource_name": "crn://confluent.cloud/organization=b0b21724-4586-4a07-b787-d0bb5aacbf87/environment=env-z3y2x1/flink-region=aws.us-east-1/compute-pool=lfcp-6g7h8i", "self": "https://api.confluent.cloud/fcpm/v2/compute-pools/lfcp-6g7h8i", "updated_at": "2024-02-27T22:44:27.18964Z" }, "spec": { "cloud": "AWS", "display_name": "my-compute-pool", "environment": { "id": "env-z3y2x1", "related": "https://api.confluent.cloud/fcpm/v2/compute-pools/lfcp-6g7h8i", "resource_name": "crn://confluent.cloud/organization=b0b21724-4586-4a07-b787-d0bb5aacbf87/environment=env-z3y2x1" }, "http_endpoint": "https://flink.us-east-1.aws.confluent.cloud/sql/v1/organizations/b0b21724-4586-4a07-b787-d0bb5aacbf87/environments/env-z3y2x1", "max_cfu": 5, "region": "us-east-1" }, "status": { "current_cfu": 0, "phase": "PROVISIONED" } } ``` ```sql export COMPUTE_POOL_ID="" # example: "lfcp-8m03rm" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ENV_ID="" # example: "env-z3y2x1" export MAX_CFU="" # example: 5 export JSON_DATA="" ``` ```sql { "spec": { "display_name": 
"${COMPUTE_POOL_NAME}", "max_cfu": ${MAX_CFU}, "environment": { "id": "${ENV_ID}" } } } ``` ```sql export JSON_DATA="{ \"spec\": { \"display_name\": \"${COMPUTE_POOL_NAME}\", \"max_cfu\": ${MAX_CFU}, \"environment\": { \"id\": \"${ENV_ID}\" } } }" ``` ```sql curl --request PATCH \ --url "https://api.confluent.cloud/fcpm/v2/compute-pools/${COMPUTE_POOL_ID}" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" \ --header 'content-type: application/json' \ --data "${JSON_DATA}" ``` ```sql export COMPUTE_POOL_ID="" # example: "lfcp-8m03rm" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ENV_ID="" # example: "env-z3y2x1" ``` ```sql curl --request DELETE \ --url "https://api.confluent.cloud/fcpm/v2/compute-pools/${COMPUTE_POOL_ID}?environment=${ENV_ID}" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" ``` ```sql export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) ``` ```sql curl --request GET \ --url "https://api.confluent.cloud/fcpm/v2/regions" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" \ | jq -r '.data[].id' ``` ```sql aws.eu-central-1 aws.us-east-1 aws.eu-west-1 aws.us-east-2 ... ``` ```sql api_version: artifact/v1 kind: FlinkArtifact id: dlz-f3a90de metadata: self: 'https://api.confluent.cloud/artifact/v1/flink-artifacts/fa-12345' resource_name: crn://confluent.cloud/organization=/flink-artifact=fa-12345 created_at: '2006-01-02T15:04:05-07:00' updated_at: '2006-01-02T15:04:05-07:00' deleted_at: '2006-01-02T15:04:05-07:00' cloud: AWS region: us-east-1 environment: env-00000 display_name: string class: io.confluent.example.SumScalarFunction content_format: JAR description: string documentation_link: '^$|^(http://|https://).' 
runtime_language: JAVA versions: - version: cfa-ver-001 release_notes: string is_beta: true artifact_id: {} upload_source: api_version: artifact.v1/UploadSource kind: PresignedUrl id: dlz-f3a90de metadata: self: https://api.confluent.cloud/artifact.v1/UploadSource/presigned-urls/pu-12345 resource_name: crn://confluent.cloud/organization=/presigned-url=pu-12345 created_at: '2006-01-02T15:04:05-07:00' updated_at: '2006-01-02T15:04:05-07:00' deleted_at: '2006-01-02T15:04:05-07:00' location: PRESIGNED_URL_LOCATION upload_id: ``` ```sql export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ENV_ID="" # example: "env-z3y2x1" ``` ```sql curl --request GET \ --url "https://api.confluent.cloud/artifact/v1/flink-artifacts?cloud=${CLOUD_PROVIDER}&region=${CLOUD_REGION}&environment=${ENV_ID}" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" \ | jq -r '.data[] | .spec.display_name, {id}' ``` ```sql { "id": "cfa-e8rzq7" } ``` ```sql export ARTIFACT_DISPLAY_NAME="" # example: "my-udf" export ARTIFACT_DESCRIPTION="" # example: "This is a demo UDF."
export ARTIFACT_DOC_LINK="" # example: "https://docs.example.com/my-udf" export CLASS_NAME="" # example: "io.confluent.example.SumScalarFunction" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ENV_ID="" # example: "env-z3y2x1" ``` ```sql { "content_format": "JAR", "cloud": "${CLOUD_PROVIDER}", "environment": "${ENV_ID}", "region": "${CLOUD_REGION}" } ``` ```sql export JSON_DATA="{ \"content_format\": \"JAR\", \"cloud\": \"${CLOUD_PROVIDER}\", \"environment\": \"${ENV_ID}\", \"region\": \"${CLOUD_REGION}\" }" ``` ```sql curl --request POST \ --url https://api.confluent.cloud/artifact/v1/presigned-upload-url \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" \ --header 'content-type: application/json' \ --data "${JSON_DATA}" ``` ```sql { "api_version": "artifact/v1", "cloud": "AWS", "content_format": "JAR", "kind": "PresignedUrl", "region": "us-east-1", "upload_form_data": { "bucket": "confluent-custom-connectors-prod-us-east-1", "key": "staging/ccp/v1//custom-plugins//plugin.jar", "policy": "ey…", "x-amz-algorithm": "AWS4-HMAC-SHA256", "x-amz-credential": "AS…/20241121/us-east-1/s3/aws4_request", "x-amz-date": "20241121T212232Z", "x-amz-security-token": "IQ…", "x-amz-signature": "52…" }, "upload_id": "", "upload_url": "https://confluent-custom-connectors-prod-us-east-1.s3.dualstack.us-east-1.amazonaws.com/" } ``` ```sql export UPLOAD_ID="" export UPLOAD_URL="" export UPLOAD_BUCKET="" export UPLOAD_KEY="" export UPLOAD_POLICY="" export UPLOAD_KEY="" export X_AMZ_ALGORITHM="" export X_AMZ_CREDENTIAL="" export X_AMZ_DATE="" export X_AMZ_SECURITY_TOKEN="" export X_AMZ_SIGNATURE="" ``` ```sql -F file=@ ``` ```sql Your proposed upload is smaller than the minimum allowed size. 
``` ```sql curl -X POST "${UPLOAD_URL}" \ -F "bucket=${UPLOAD_BUCKET}" \ -F "key=${UPLOAD_KEY}" \ -F "policy=${UPLOAD_POLICY}" \ -F "x-amz-algorithm=${X_AMZ_ALGORITHM}" \ -F "x-amz-credential=${X_AMZ_CREDENTIAL}" \ -F "x-amz-date=${X_AMZ_DATE}" \ -F "x-amz-security-token=${X_AMZ_SECURITY_TOKEN}" \ -F "x-amz-signature=${X_AMZ_SIGNATURE}" \ -F file=@/path/to/udf_file.jar ``` ```sql { "cloud": "${CLOUD_PROVIDER}", "region": "${CLOUD_REGION}", "environment": "${ENV_ID}", "display_name": "${ARTIFACT_DISPLAY_NAME}", "class": "${CLASS_NAME}", "content_format": "JAR", "description": "${ARTIFACT_DESCRIPTION}", "documentation_link": "${ARTIFACT_DOC_LINK}", "runtime_language": "JAVA", "upload_source": { "location": "PRESIGNED_URL_LOCATION", "upload_id": "${UPLOAD_ID}" } } ``` ```sql export JSON_DATA="{ \"cloud\": \"${CLOUD_PROVIDER}\", \"region\": \"${CLOUD_REGION}\", \"environment\": \"${ENV_ID}\", \"display_name\": \"${ARTIFACT_DISPLAY_NAME}\", \"class\": \"${CLASS_NAME}\", \"content_format\": \"JAR\", \"description\": \"${ARTIFACT_DESCRIPTION}\", \"documentation_link\": \"${ARTIFACT_DOC_LINK}\", \"runtime_language\": \"JAVA\", \"upload_source\": { \"location\": \"PRESIGNED_URL_LOCATION\", \"upload_id\": \"${UPLOAD_ID}\" } }" ``` ```sql curl --request POST \ --url "https://api.confluent.cloud/artifact/v1/flink-artifacts?cloud=${CLOUD_PROVIDER}&region=${CLOUD_REGION}&environment=${ENV_ID}" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" \ --header 'content-type: application/json' \ --data "${JSON_DATA}" ``` ```sql export ARTIFACT_ID="" # example: cfa-e8rzq7 export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ENV_ID="" # example: "env-z3y2x1" ``` ```sql curl --request GET \ --url
"https://api.confluent.cloud/artifact/v1/flink-artifacts/${ARTIFACT_ID}?cloud=${CLOUD_PROVIDER}&region=${CLOUD_REGION}&environment=${ENV_ID}" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" ``` ```sql { "api_version": "artifact/v1", "class": "default", "cloud": "AWS", "content_format": "JAR", "description": "", "display_name": "udf_example", "documentation_link": "", "environment": "env-z3q9rd", "id": "cfa-e8rzq7", "kind": "FlinkArtifact", "metadata": { "created_at": "2024-11-21T21:52:43.788042Z", "resource_name": "crn://confluent.cloud/organization=/flink-artifact=cfa-e8rzq7", "self": "https://api.confluent.cloud/artifact/v1/flink-artifacts/cfa-e8rzq7", "updated_at": "2024-11-21T21:52:44.625318Z" }, "region": "us-east-1", "runtime_language": "JAVA", "versions": [ { "artifact_id": {}, "is_beta": false, "release_notes": "", "upload_source": { "location": "PRESIGNED_URL_LOCATION", "upload_id": "" }, "version": "ver-xq72dk" } ] } ``` ```sql export ARTIFACT_ID="" # example: cfa-e8rzq7 export ARTIFACT_DISPLAY_NAME="" # example: "my-udf" export ARTIFACT_DESCRIPTION="" # example: "This is a demo UDF." export ARTIFACT_DOC_LINK="" # example: "https://docs.example.com/my-udf", "^$|^(http://|https://)."
export CLASS_NAME="" # example: "io.confluent.example.SumScalarFunction" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ENV_ID="" # example: "env-z3y2x1" ``` ```sql { "cloud": "${CLOUD_PROVIDER}", "region": "${CLOUD_REGION}", "environment": "${ENV_ID}", "display_name": "${ARTIFACT_DISPLAY_NAME}", "content_format": "JAR", "description": "${ARTIFACT_DESCRIPTION}", "documentation_link": "${ARTIFACT_DOC_LINK}", "runtime_language": "JAVA", "versions": [ { "version": "cfa-ver-001", "release_notes": "string", "is_beta": true, "artifact_id": { "cloud": "${CLOUD_PROVIDER}", "region": "${CLOUD_REGION}", "environment": "${ENV_ID}", "display_name": "${ARTIFACT_DISPLAY_NAME}", "class": "${CLASS_NAME}", "content_format": "JAR", "description": "${ARTIFACT_DESCRIPTION}", "documentation_link": "${ARTIFACT_DOC_LINK}", "runtime_language": "JAVA", "versions": [ {} ] }, "upload_source": { "location": "PRESIGNED_URL_LOCATION", "upload_id": "${UPLOAD_ID}" } } ] } ``` ```sql export JSON_DATA="{ \"cloud\": \"${CLOUD_PROVIDER}\", \"region\": \"${CLOUD_REGION}\", \"environment\": \"${ENV_ID}\", \"display_name\": \"${ARTIFACT_DISPLAY_NAME}\", \"content_format\": \"JAR\", \"description\": \"${ARTIFACT_DESCRIPTION}\", \"documentation_link\": \"${ARTIFACT_DOC_LINK}\", \"runtime_language\": \"JAVA\", \"versions\": [ { \"version\": \"cfa-ver-001\", \"release_notes\": \"string\", \"is_beta\": true, \"artifact_id\": { \"cloud\": \"${CLOUD_PROVIDER}\", \"region\": \"${CLOUD_REGION}\", \"environment\": \"${ENV_ID}\", \"display_name\": \"${ARTIFACT_DISPLAY_NAME}\", \"class\": \"${CLASS_NAME}\", \"content_format\": \"JAR\", \"description\": \"${ARTIFACT_DESCRIPTION}\", \"documentation_link\": \"${ARTIFACT_DOC_LINK}\", \"runtime_language\": \"JAVA\", \"versions\": [ {} ] }, \"upload_source\": { \"location\":
\"PRESIGNED_URL_LOCATION\", \"upload_id\": \"${UPLOAD_ID}\" } } ] }" ``` ```sql curl --request PATCH \ --url "https://api.confluent.cloud/artifact/v1/flink-artifacts/${ARTIFACT_ID}?cloud=${CLOUD_PROVIDER}&region=${CLOUD_REGION}&environment=${ENV_ID}" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" \ --header 'content-type: application/json' \ --data "${JSON_DATA}" ``` ```sql export ARTIFACT_ID="" # example: cfa-e8rzq7 export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ENV_ID="" # example: "env-z3y2x1" ``` ```sql curl --request DELETE \ --url "https://api.confluent.cloud/artifact/v1/flink-artifacts/${ARTIFACT_ID}?cloud=${CLOUD_PROVIDER}&region=${CLOUD_REGION}&environment=${ENV_ID}" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" ``` ```sql export UDF_LOG_ID="" # example: "ccl-4l5klo" export UDF_LOG_TOPIC_NAME="" # example: "udf_log" export KAFKA_CLUSTER_ID="" # example: "lkc-12345" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export ENV_ID="" # example: "env-z3y2x1" ``` ```sql cat << EOF | curl --silent -X POST -u ${CLOUD_API_KEY}:${CLOUD_API_SECRET} \ -d @- https://api.confluent.cloud/ccl/v1/custom-code-loggings { "cloud":"${CLOUD_PROVIDER}", "region":"${CLOUD_REGION}", "environment": { "id":"${ENV_ID}" }, "destination_settings":{ "kind":"Kafka", "cluster_id":"${KAFKA_CLUSTER_ID}", "topic":"${UDF_LOG_TOPIC_NAME}", "log_level":"info" } } EOF ``` ```sql curl --silent -X GET \ -u ${CLOUD_API_KEY}:${CLOUD_API_SECRET} \ https://api.confluent.cloud/ccl/v1/custom-code-loggings?environment=${ENV_ID} ``` ```sql curl --silent -X DELETE \ -u ${CLOUD_API_KEY}:${CLOUD_API_SECRET} \
https://api.confluent.cloud/ccl/v1/custom-code-loggings/${UDF_LOG_ID}?environment=${ENV_ID} ``` ```sql curl --silent -X GET \ -u ${CLOUD_API_KEY}:${CLOUD_API_SECRET} \ https://api.confluent.cloud/ccl/v1/custom-code-loggings/${UDF_LOG_ID}?environment=${ENV_ID} ``` ```sql cat < option. This enables submitting long-running Flink SQL statements. To create an API key for Flink access by using the Confluent Cloud APIs, you must first create a Cloud API key. To generate the Flink key, you send your Cloud API key and secret in the request header, encoded as a base64 string. Create a Cloud API key for the principal, which is either a service account or your user account. For more information, see Add an API key. Assign the Cloud API key and secret to environment variables that you use in your REST API requests. export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export PRINCIPAL_ID="" # or "" export ENV_REGION_ID="." # example: "env-z3y2x1.aws.us-east-1" The ENV_REGION_ID variable is a concatenation of your environment ID and the cloud provider region of your Kafka cluster, separated by a . character. To see the available regions, run the confluent flink region list command. Run the following command to send a POST request to the api-keys endpoint. The REST API uses basic authentication, which means that you provide a base64-encoded string made from your Cloud API key and secret in the request header. 
curl --request POST \ --url 'https://api.confluent.cloud/iam/v2/api-keys' \ --header "Authorization: Basic $(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0)" \ --header 'content-type: application/json' \ --data "{\"spec\":{\"display_name\":\"flinkapikey\",\"owner\":{\"id\":\"${PRINCIPAL_ID}\"},\"resource\":{\"api_version\":\"fcpm/v2\",\"id\":\"${ENV_REGION_ID}\"}}}" Your output should resemble: { "api_version": "iam/v2", "id": "KJDYFDMBOBDNQEIU", "kind": "ApiKey", "metadata": { "created_at": "2023-12-15T23:10:20.406556Z", "resource_name": "crn://api.confluent.cloud/organization=b0b21724-4586-4a07-b787-d0bb5aacbf87/user=u-lq1dr3/api-key=KJDYFDMBOBDNQEIU", "self": "https://api.confluent.cloud/iam/v2/api-keys/KJDYFDMBOBDNQEIU", "updated_at": "2023-12-15T23:10:20.406556Z" }, "spec": { "description": "", "display_name": "flinkapikey", "owner": { "api_version": "iam/v2", "id": "u-lq1dr3", "kind": "User", "related": "https://api.confluent.cloud/iam/v2/users/u-lq2dr7", "resource_name": "crn://api.confluent.cloud/organization=b0b21724-4586-4a07-b787-d0bb5aacbf87/user=u-lq2dr7" }, "resource": { "api_version": "fcpm/v2", "id": "env-z3q9rd.aws.us-east-1", "kind": "Region", "related": "https://api.confluent.cloud/fcpm/v2/regions?cloud=aws", "resource_name": "crn://api.confluent.cloud/organization=b0b21724-4586-4a07-b787-d0bb5aacbf87/environment=env-z3q9rd/flink-region=aws.us-east-1" }, "secret": "B0BYFzyd0bb5Q58ZZJJYV52mbwDDHnZx21f0gOTz2k6Qv2V9I4KraVztwFOlQx6z" } } You can use the Confluent Terraform Provider to generate an API key for Flink access. Follow the steps in Sample Project for Confluent Terraform Provider and use the configuration shown in Example Flink API Key. When your API key and secret are generated, save them in environment variables for later use. export FLINK_API_KEY="" export FLINK_API_SECRET="" You can manage the API key by using the Confluent CLI commands. For more information, see confluent api-key. You can also use the REST API and Cloud Console.
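Inline JSON inside a double-quoted shell argument is easy to get wrong, because every inner quotation mark has to be escaped. As a minimal sketch, the same request body and Basic authorization header can be produced with the Python standard library. The helper names `basic_auth_header` and `flink_api_key_body` are hypothetical, the sample values are placeholders, and nothing is sent over the network.

```python
import base64
import json


def basic_auth_header(api_key: str, api_secret: str) -> str:
    """Return the value of a Basic Authorization header for a key/secret pair."""
    raw = f"{api_key}:{api_secret}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def flink_api_key_body(principal_id: str, env_region_id: str,
                       display_name: str = "flinkapikey") -> str:
    """Serialize the api-keys request body; json.dumps takes care of quoting."""
    body = {
        "spec": {
            "display_name": display_name,
            "owner": {"id": principal_id},
            "resource": {"api_version": "fcpm/v2", "id": env_region_id},
        }
    }
    return json.dumps(body)


if __name__ == "__main__":
    # Placeholder values only; substitute your own key, secret, and IDs.
    print(basic_auth_header("MY_CLOUD_KEY", "MY_CLOUD_SECRET"))
    print(flink_api_key_body("u-lq1dr3", "env-z3q9rd.aws.us-east-1"))
```

The printed body can be passed to `curl --data` from a file or variable, which avoids hand-escaping the quotation marks.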
Next steps¶ Flink SQL REST API Related content¶ Manage API Keys Confluent CLI commands with Confluent Cloud for Apache Flink Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql env-abc123.aws.us-east-2 ``` ```sql export PRINCIPAL_ID="" ``` ```sql confluent login ``` ```sql confluent flink region list ``` ```sql Current | Name | Cloud | Region ----------+--------------------------+-------+--------------- | Frankfurt (eu-central-1) | aws | eu-central-1 | Ireland (eu-west-1) | aws | eu-west-1 * | N. Virginia (us-east-1) | aws | us-east-1 | Ohio (us-east-2) | aws | us-east-2 ``` ```sql # Example values for environment variables. export CLOUD_PROVIDER=aws export CLOUD_REGION=us-east-1 export ENV_ID=env-a12b34 # Generate the API key and secret. confluent api-key create \ --resource flink \ --cloud ${CLOUD_PROVIDER} \ --region ${CLOUD_REGION} \ --environment ${ENV_ID} ``` ```sql It may take a couple of minutes for the API key to be ready. Save the API key and secret. The secret is not retrievable later. +------------+------------------------------------------------------------------+ | API Key | ABC1DDN2BNASQVRU | | API Secret | B0b+xCoSPY2pSNETeuyrziWmsPmou0WP9rH0Nxed4y4/msnESzjj7kBrRWGOMu1a | +------------+------------------------------------------------------------------+ ``` ```sql confluent api-key create --resource flink ``` ```sql --service-account ``` ```sql export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export PRINCIPAL_ID="" # or "" export ENV_REGION_ID="." 
# example: "env-z3y2x1.aws.us-east-1" ``` ```sql confluent flink region list ``` ```sql curl --request POST \ --url 'https://api.confluent.cloud/iam/v2/api-keys' \ --header "Authorization: Basic $(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0)" \ --header 'content-type: application/json' \ --data "{\"spec\":{\"display_name\":\"flinkapikey\",\"owner\":{\"id\":\"${PRINCIPAL_ID}\"},\"resource\":{\"api_version\":\"fcpm/v2\",\"id\":\"${ENV_REGION_ID}\"}}}" ``` ```sql { "api_version": "iam/v2", "id": "KJDYFDMBOBDNQEIU", "kind": "ApiKey", "metadata": { "created_at": "2023-12-15T23:10:20.406556Z", "resource_name": "crn://api.confluent.cloud/organization=b0b21724-4586-4a07-b787-d0bb5aacbf87/user=u-lq1dr3/api-key=KJDYFDMBOBDNQEIU", "self": "https://api.confluent.cloud/iam/v2/api-keys/KJDYFDMBOBDNQEIU", "updated_at": "2023-12-15T23:10:20.406556Z" }, "spec": { "description": "", "display_name": "flinkapikey", "owner": { "api_version": "iam/v2", "id": "u-lq1dr3", "kind": "User", "related": "https://api.confluent.cloud/iam/v2/users/u-lq2dr7", "resource_name": "crn://api.confluent.cloud/organization=b0b21724-4586-4a07-b787-d0bb5aacbf87/user=u-lq2dr7" }, "resource": { "api_version": "fcpm/v2", "id": "env-z3q9rd.aws.us-east-1", "kind": "Region", "related": "https://api.confluent.cloud/fcpm/v2/regions?cloud=aws", "resource_name": "crn://api.confluent.cloud/organization=b0b21724-4586-4a07-b787-d0bb5aacbf87/environment=env-z3q9rd/flink-region=aws.us-east-1" }, "secret": "B0BYFzyd0bb5Q58ZZJJYV52mbwDDHnZx21f0gOTz2k6Qv2V9I4KraVztwFOlQx6z" } } ``` ```sql export FLINK_API_KEY="" export FLINK_API_SECRET="" ``` --- ### Manage Flink Connections in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/operate-and-deploy/manage-connections.html Manage Connections in Confluent Cloud for Apache Flink¶ A connection in Confluent Cloud for Apache Flink® represents an external service that is used in your Flink statements.
Connections are used to access external services, such as databases, APIs, and other systems, from your Flink statements. To create a connection, you need the OrganizationAdmin, EnvironmentAdmin, or FlinkAdmin RBAC role. Confluent Cloud for Apache Flink makes a best-effort attempt to redact sensitive values from the CREATE CONNECTION and ALTER CONNECTION statements by masking the values for the known sensitive keys. In Confluent Cloud Console, the sensitive values are redacted in the Flink SQL workspace if you navigate away from the workspace and return, or if you reload the page in the browser. Alternatively, you can use the Confluent CLI commands to create and manage connections. In addition, if the syntax in the CREATE CONNECTION statement is incorrect, Confluent Cloud for Apache Flink may not detect the secrets. For example, if you type CREATE CONNECTION my_conn WITH ('ap-key' = 'x'), Flink won’t redact the x, because api-key is misspelled. Note Connection resources are an Open Preview feature in Confluent Cloud. A Preview feature is a Confluent Cloud component that is being introduced to gain early feedback from developers. Preview features can be used for evaluation and non-production testing purposes or to provide feedback to Confluent. The warranty, SLA, and Support Services provisions of your agreement with Confluent do not apply to Preview features. Confluent may discontinue providing preview releases of the Preview features at any time in Confluent’s sole discretion. Create a connection¶ You can create a connection by using Flink SQL, the Confluent Cloud Console, the Confluent CLI, the REST API, or Terraform. In the Confluent Cloud Console or in the Flink SQL shell, run the CREATE CONNECTION statement to create a connection. The following example creates an OpenAI connection with an API key.
CREATE CONNECTION `my-connection` WITH ( 'type' = 'OPENAI', 'endpoint' = 'https://.openai.azure.com/openai/deployments//chat/completions?api-version=2025-01-01-preview', 'api-key' = '' ); The following example creates a MongoDB connection with basic authorization. CREATE CONNECTION `my-mongodb-connection` WITH ( 'type' = 'MONGODB', 'endpoint' = 'mongodb+srv://myCluster.mongodb.net/myDatabase', 'username' = '', 'password' = '' ); Run the CREATE TABLE statement to create a table that uses the connection. The following example creates a MongoDB external table that uses the MongoDB connection. -- Use the MongoDB connection to create a MongoDB external table. CREATE TABLE mongodb_movies_full_text_search ( title STRING, plot STRING ) WITH ( 'connector' = 'mongodb', 'mongodb.connection' = 'my-mongodb-connection', 'mongodb.database' = 'sample_mflix', 'mongodb.collection' = 'movies', 'mongodb.index' = 'default' ); In the navigation menu, click Environments, and click the tile for the environment where you’re using Flink SQL. In the navigation menu, click Integrations. Click Connections, then click Add connection. The available services are listed. Click the tile of the service you want to connect to, and click Continue. The Define endpoint and credentials page opens. In the Endpoint textbox, enter the URL for the service you want to connect to. In the following fields, enter your credentials, which may be an API key, a username/password pair, or another type of credential, like a Service Account Key, depending on the service. Click Continue. The Review and launch page opens. In the Cloud provider and Region dropdowns, select the cloud provider and region where your Flink statements run. Important You can access the connection only from a workspace that is in the same region as the connection. Click Create connection. The connection is created and you can use it in your Flink statements. 
Note You can edit the credentials later, but you can’t change the other properties, like the cloud provider or region. Run the confluent flink connection create command to create a connection. Creating a connection requires the following inputs. Credentials vary by service. export CONNECTION_NAME="" # human-readable name, for example, "azure-openai-connection" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export ENV_ID="" # example: "env-a1b2c3" export CONNECTION_TYPE="" # example: "azureopenai" export ENDPOINT="" # example: "https://.openai.azure.com/openai/deployments//chat/completions?api-version=2025-01-01-preview" export API_KEY="" Run the following command to create a connection in the specified cloud provider and environment. confluent flink connection create ${CONNECTION_NAME} \ --cloud ${CLOUD_PROVIDER} \ --region ${CLOUD_REGION} \ --environment ${ENV_ID} \ --type ${CONNECTION_TYPE} \ --endpoint ${ENDPOINT} \ --api-key ${API_KEY} Your output should resemble: +---------------+------------------------------------+ | Creation Date | 2025-08-13 22:04:57.972969 | | | +0000 UTC | | Name | azure-openai-connection | | Environment | env-a1b2c3 | | Cloud | aws | | Region | us-west-2 | | Type | AZUREOPENAI | | Endpoint | https:// | | Data | | | Status | | +---------------+------------------------------------+ Create a connection in your environment by sending a POST request to the Connections endpoint. Creating a connection requires the following inputs. Credentials vary by service. 
export CONNECTION_NAME="" # example: "my-openai-connection" export CONNECTION_TYPE="" # example: "OPENAI" export ENDPOINT="" # example: "https://api.openai.com/v1/chat/completions" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ORG_ID="" # example: "b0b21724-4586-4a07-b787-d0bb5aacbf87" export ENV_ID="" # example: "env-a1b2c3" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export JSON_DATA="" The following JSON shows an example payload. The auth_data key varies by service. { "name": "${CONNECTION_NAME}", "spec": { "connection_type": "${CONNECTION_TYPE}", "endpoint": "${ENDPOINT}", "auth_data": { "kind": "PlaintextProvider", "data": "string" } }, "metadata": {} } Quotation mark characters in the JSON string must be escaped, so the payload string to send resembles the following: export JSON_DATA="{ \"name\": \"${CONNECTION_NAME}\", \"spec\": { \"connection_type\": \"${CONNECTION_TYPE}\", \"endpoint\": \"${ENDPOINT}\", \"auth_data\": { \"kind\": \"PlaintextProvider\", \"data\": \"string\" } }, \"metadata\": {} }" The following command sends a POST request to create a connection. 
curl --request POST \ --url "https://flink.region.provider.confluent.cloud/sql/v1/organizations/${ORG_ID}/environments/${ENV_ID}/connections" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" \ --header 'content-type: application/json' \ --data "${JSON_DATA}" Your output should resemble: Response from a request to create a connection { "api_version": "sql/v1", "kind": "Connection", "metadata": { "self": "https://flink.us-west1.aws.confluent.cloud/sql/v1/organizations/org-abc/environments/env-a1b2c3/connections/my-openai-connection", "resource_name": "", "created_at": "2006-01-02T15:04:05-07:00", "updated_at": "2006-01-02T15:04:05-07:00", "deleted_at": "2006-01-02T15:04:05-07:00", "uid": "12345678-1234-1234-1234-123456789012", "resource_version": "a23av" }, "name": "my-openai-connection", "spec": { "connection_type": "OPENAI", "endpoint": "https://api.openai.com/v1/chat/completions", "auth_data": { "kind": "PlaintextProvider", "data": "string" } }, "status": { "phase": "READY", "detail": "Lookup failed: ai.openai.com" } } } To create a connection by using the Confluent Terraform provider, use the confluent_flink_connection resource. Configure your Terraform file. Provide your Confluent Cloud API key and secret. terraform { required_providers { confluent = { source = "confluentinc/confluent" version = "2.44.0" } } } provider "confluent" { cloud_api_key = var.confluent_cloud_api_key # optionally use CONFLUENT_CLOUD_API_KEY env var cloud_api_secret = var.confluent_cloud_api_secret # optionally use CONFLUENT_CLOUD_API_SECRET env var } Define the confluent_flink_connection resource with the required parameters, like display_name, cloud, region, and the environment ID. 
resource "confluent_flink_connection" "openai-connection" { organization { id = data.confluent_organization.main.id } environment { id = data.confluent_environment.staging.id } compute_pool { id = confluent_flink_compute_pool.example.id } principal { id = confluent_service_account.app-manager-flink.id } rest_endpoint = data.confluent_flink_region.main.rest_endpoint credentials { key = confluent_api_key.env-admin-flink-api-key.id secret = confluent_api_key.env-admin-flink-api-key.secret } display_name = "connection1" type = "OPENAI" endpoint = "https://api.openai.com/v1/chat/completions" api_key ="API_Key_value" lifecycle { prevent_destroy = true } } Run the terraform apply command to create the resources. terraform apply For more information, see confluent_flink_connection resource. View details for a connection¶ Flink SQLConfluent Cloud ConsoleConfluent CLIREST APITerraformIn the Confluent Cloud Console or in the Flink SQL shell, run the DESCRIBE CONNECTION statement to get details about a connection. DESCRIBE CONNECTION `my-connection`; Your output should resemble: +---------------+------------------------------------+ | Creation Date | 2025-08-13 22:04:57.972969 | | | +0000 UTC | | Name | azure-openai-connection | | Environment | env-a1b2c3 | | Cloud | aws | | Region | us-west-2 | | Type | AZUREOPENAI | | Endpoint | https:// | | Data | | | Status | | +---------------+------------------------------------+ In the navigation menu, click Environments, and click the tile for the environment where you’re using Flink SQL. In the navigation menu, click Integrations. Click Connections. In the listed connections, find the one you want to view. If you have many connections in the list, use the search bar to find the connection. Click the connection name to view the connection details. Run the confluent flink connection describe command to get details about a connection. 
Describing a connection requires the following inputs: export CONNECTION_NAME="" # example: "azure-openai-connection" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export ENV_ID="" # example: "env-a1b2c3" Run the following command to get details about a connection. confluent flink connection describe ${CONNECTION_NAME} \ --cloud ${CLOUD_PROVIDER} \ --region ${CLOUD_REGION} \ --environment ${ENV_ID} Your output should resemble: +---------------+------------------------------------+ | Creation Date | 2025-08-13 22:04:57.972969 | | | +0000 UTC | | Name | azure-openai-connection | | Environment | env-a1b2c3 | | Cloud | aws | | Region | us-west-2 | | Type | AZUREOPENAI | | Endpoint | https:// | | Data | | | Status | | +---------------+------------------------------------+ Get the details about a connection in your environment by sending a GET request to the Connections endpoint. This request uses your Cloud API key instead of the Flink API key. Getting details about a connection requires the following inputs: export CONNECTION_NAME="" # example: "my-openai-connection" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ORG_ID="" # example: "b0b21724-4586-4a07-b787-d0bb5aacbf87" export ENV_ID="" # example: "env-a1b2c3" Run the following command to get details about the connection specified in the CONNECTION_NAME environment variable. 
curl --request GET \ --url "https://flink.region.provider.confluent.cloud/sql/v1/organizations/${ORG_ID}/environments/${ENV_ID}/connections/${CONNECTION_NAME}" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" Your output should resemble: Response from a request to read a connection { "api_version": "sql/v1", "kind": "Connection", "metadata": { "self": "https://flink.us-west1.aws.confluent.cloud/sql/v1/organizations/org-abc/environments/env-123/connections/my-openai-connection", "resource_name": "", "created_at": "2006-01-02T15:04:05-07:00", "updated_at": "2006-01-02T15:04:05-07:00", "deleted_at": "2006-01-02T15:04:05-07:00", "uid": "12345678-1234-1234-1234-123456789012", "resource_version": "a23av" }, "name": "my-openai-connection", "spec": { "connection_type": "OPENAI", "endpoint": "https://api.openai.com/v1/chat/completions", "auth_data": { "kind": "PlaintextProvider", "data": "string" } }, "status": { "phase": "READY", "detail": "Lookup failed: ai.openai.com" } } To view details for a connection by using the Confluent Terraform provider, use the confluent_flink_connection data source. data "confluent_flink_connection" "existing_connection" { organization { id = "" } environment { id = "" } compute_pool { id = "" } principal { id = "" } rest_endpoint = "" credentials { key = "" secret = "" } display_name = "my_connection" type = "JDBC" } output "connection_endpoint" { value = data.confluent_flink_connection.existing_connection.endpoint } Run the terraform apply or terraform output command. The connection_endpoint output contains details for the connection. To inspect specific attributes after your configuration has been applied, run the terraform output command. terraform output connection_endpoint For more information, see confluent_flink_connection data source. 
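The REST requests on this page share one URL shape and return a `status` object with a `phase` and a `detail`. As a rough sketch, assuming the host follows the `flink.<region>.<cloud>.confluent.cloud` pattern that appears in the sample `self` links, the following Python helpers build the connection URL and read the status from a response body. The function names are hypothetical, only the standard library is used, and no request is made.

```python
import json
from urllib.parse import quote


def connection_url(region, cloud, org_id, env_id, name=None):
    """Build the Connections endpoint URL; append the name for a single connection."""
    # Assumption: the host pattern is inferred from the sample responses.
    base = (
        f"https://flink.{region}.{cloud}.confluent.cloud/sql/v1"
        f"/organizations/{quote(org_id, safe='')}"
        f"/environments/{quote(env_id, safe='')}/connections"
    )
    return base if name is None else f"{base}/{quote(name, safe='')}"


def connection_status(response_text):
    """Return (phase, detail) from a Connection response, or (None, None) if absent."""
    status = json.loads(response_text).get("status", {})
    return status.get("phase"), status.get("detail")


if __name__ == "__main__":
    # Placeholder identifiers taken from the examples in this section.
    print(connection_url("us-east-1", "aws", "org-abc", "env-123", "my-openai-connection"))
    print(connection_status('{"status": {"phase": "READY", "detail": ""}}'))
```

Percent-encoding the path segments keeps the URL valid if a connection name contains characters that are not URL-safe.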
List connections¶ Flink SQLConfluent Cloud ConsoleConfluent CLIREST APITerraformIn the Confluent Cloud Console or in the Flink SQL shell, run the SHOW CONNECTIONS statement to list the connections. SHOW CONNECTIONS; Your output should resemble: Creation Date | Name | Environment | Cloud | Region | Type | Endpoint | Data | Status | Status Detail ---------------------------------+--------------------------+-------------+-------+-----------+-------------+---------------------------------+------------+--------+---------------- 2025-08-13 21:05:15.035376 | azureopenai-connection-2 | env-a1b2c3 | aws | us-west-2 | AZUREOPENAI | https:// | | | +0000 UTC | | | | | | | | | 2025-08-13 22:04:57.972969 | azure-openai-connection | env-a1b2c3 | aws | us-west-2 | AZUREOPENAI | https:// | | | +0000 UTC | | | | | | | | | In the navigation menu, click Environments, and click the tile for the environment where you’re using Flink SQL. In the navigation menu, click Integrations. Click Connections. The available connections are listed. Run the confluent flink connection list command to list connections in the specified environment. Listing connections requires the following inputs: export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export ENV_ID="" # example: "env-a1b2c3" Run the following command to list connections in the specified environment. 
confluent flink connection list --cloud ${CLOUD_PROVIDER} \ --region ${CLOUD_REGION} \ --environment ${ENV_ID} Your output should resemble: Creation Date | Name | Environment | Cloud | Region | Type | Endpoint | Data | Status | Status Detail ---------------------------------+--------------------------+-------------+-------+-----------+-------------+---------------------------------+------------+--------+---------------- 2025-08-13 21:05:15.035376 | azureopenai-connection-2 | env-a1b2c3 | aws | us-west-2 | AZUREOPENAI | https:// | | | +0000 UTC | | | | | | | | | 2025-08-13 22:04:57.972969 | azure-openai-connection | env-a1b2c3 | aws | us-west-2 | AZUREOPENAI | https:// | | | +0000 UTC | | | | | | | | | List the connections in your environment by sending a GET request to the Connections endpoint. This request uses your Cloud API key instead of the Flink API key. Listing the connections in your environment requires the following inputs: export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ORG_ID="" # example: "b0b21724-4586-4a07-b787-d0bb5aacbf87" export ENV_ID="" # example: "env-a1b2c3" Run the following command to list the connections in your environment. 
curl --request GET \ --url "https://flink.region.provider.confluent.cloud/sql/v1/organizations/${ORG_ID}/environments/${ENV_ID}/connections" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" Your output should resemble: Response from a request to list connections { "api_version": "sql/v1", "kind": "ConnectionList", "metadata": { "first": "https://flink.us-west1.aws.confluent.cloud/sql/v1/environments/env-abc123/connections", "last": "", "prev": "", "next": "https://flink.us-west1.aws.confluent.cloud/sql/v1/environments/env-abc123/connections?page_token=UvmDWOB1iwfAIBPj6EYb", "total_size": 123, "self": "https://flink.us-west1.aws.confluent.cloud/sql/v1/environments/env-123/connections" }, "data": [ { "api_version": "sql/v1", "kind": "Connection", "metadata": { "self": "https://flink.us-west1.aws.confluent.cloud/sql/v1/organizations/org-abc/environments/env-123/connections/my-openai-connection", "resource_name": "", "created_at": "2006-01-02T15:04:05-07:00", "updated_at": "2006-01-02T15:04:05-07:00", "deleted_at": "2006-01-02T15:04:05-07:00", "uid": "12345678-1234-1234-1234-123456789012", "resource_version": "a23av" }, "name": "my-openai-connection", "spec": { "connection_type": "OPENAI", "endpoint": "https://api.openai.com/v1/chat/completions", "auth_data": { "kind": "PlaintextProvider", "data": "string" } } }, "status": { "phase": "READY", "detail": "Lookup failed: ai.openai.com" } } ] } The Confluent Terraform provider does not support a plural data source or enumeration method that enables you to list all existing connection resources in one operation. To view all connections, you must use Flink SQL, Confluent Cloud Console, the CLI, or the REST API. If you use the Flink SQL REST API, you could integrate the response list into Terraform workflows by scripting an external data source that queries the Flink SQL API, and using an external provider, parses the results and feeds them into Terraform. This is a custom integration, not a supported feature. 
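A minimal sketch of the scripting side of such a custom integration follows. It assumes the list response has already been saved to a file, and it relies on the convention that a program called by Terraform's `external` data source prints a single JSON object whose values are all strings. The function name and the file argument are hypothetical, and this is not a supported Confluent feature.

```python
import json
import sys


def connections_to_external_result(connection_list):
    """Flatten a ConnectionList response into a {name: connection_type} map of strings."""
    result = {}
    for item in connection_list.get("data", []):
        name = item.get("name")
        if name:
            spec = item.get("spec", {})
            result[str(name)] = str(spec.get("connection_type", ""))
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumption: argv[1] is the path to a saved response from the list request.
    with open(sys.argv[1], encoding="utf-8") as handle:
        json.dump(connections_to_external_result(json.load(handle)), sys.stdout)
```

Keeping the output to a flat map of strings means that nested details, such as status, would need their own keys or a separate lookup.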
For more information, see confluent_flink_connection. Update a connection¶ You can update only the credentials for a connection. You can update a connection by using Flink SQL, the Confluent Cloud Console, the Confluent CLI, the REST API, or Terraform. In the Confluent Cloud Console or in the Flink SQL shell, run the ALTER CONNECTION statement to update the connection. ALTER CONNECTION `my-connection` SET ('api-key' = ''); Your output should resemble: +---------------+------------------------------------+ | Creation Date | 2025-08-13 22:04:57.972969 | | | +0000 UTC | | Name | azure-openai-connection | | Environment | env-a1b2c3 | | Cloud | aws | | Region | us-west-2 | | Type | AZUREOPENAI | | Endpoint | https:// | | Data | | | Status | | +---------------+------------------------------------+ In the navigation menu, click Environments, and click the tile for the environment where you’re using Flink SQL. In the navigation menu, click Integrations. Click Connections. In the listed connections, find the one you want to update, and click the options icon (⋮). In the context menu, click Edit connection. In the credentials fields, enter the new credentials for the connection. Click Save changes. The connection is updated, and you can use it in your Flink statements. Run the confluent flink connection update command to update a connection. Updating a connection requires the following inputs. Credentials vary by service. export CONNECTION_NAME="" # example: "azure-openai-connection" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export ENV_ID="" # example: "env-a1b2c3" export ENDPOINT="" # example: "https://.openai.azure.com/openai/deployments//chat/completions?api-version=2025-01-01-preview" export NEWAPI_KEY="" Run the following command to update a connection.
confluent flink connection update ${CONNECTION_NAME} \ --cloud ${CLOUD_PROVIDER} \ --region ${CLOUD_REGION} \ --environment ${ENV_ID} \ --endpoint ${ENDPOINT} \ --api-key ${NEWAPI_KEY} Your output should resemble: +---------------+------------------------------------+ | Creation Date | 2025-08-13 22:04:57.972969 | | | +0000 UTC | | Name | azure-openai-connection | | Environment | env-a1b2c3 | | Cloud | aws | | Region | us-west-2 | | Type | AZUREOPENAI | | Endpoint | https:// | | Data | | | Status | | +---------------+------------------------------------+ Update a connection in your environment by sending a PUT request to the Connections endpoint. This request uses your Cloud API key instead of the Flink API key. Updating a connection requires the following inputs: export CONNECTION_NAME="" # example: "my-openai-connection" export CONNECTION_TYPE="" # example: "OPENAI" export ENDPOINT="" # example: "https://api.openai.com/v1/chat/completions" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ORG_ID="" # example: "b0b21724-4586-4a07-b787-d0bb5aacbf87" export ENV_ID="" # example: "env-a1b2c3" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export JSON_DATA="" The following JSON shows an example payload. The auth_data key varies by service.
{ "name": "${CONNECTION_NAME}", "spec": { "connection_type": "${CONNECTION_TYPE}", "endpoint": "${ENDPOINT}", "auth_data": { "kind": "PlaintextProvider", "data": "string" } }, "metadata": {} } Quotation mark characters in the JSON string must be escaped, so the payload string to send resembles the following: export JSON_DATA="{ \"name\": \"${CONNECTION_NAME}\", \"spec\": { \"connection_type\": \"${CONNECTION_TYPE}\", \"endpoint\": \"${ENDPOINT}\", \"auth_data\": { \"kind\": \"PlaintextProvider\", \"data\": \"string\" } }, \"metadata\": {} }" The following command sends a PUT request to update a connection. curl --request PUT \ --url "https://flink.region.provider.confluent.cloud/sql/v1/organizations/${ORG_ID}/environments/${ENV_ID}/connections/${CONNECTION_NAME}" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" \ --header 'content-type: application/json' \ --data "${JSON_DATA}" Your output should resemble: Response from a request to update a connection { "api_version": "sql/v1", "kind": "Connection", "metadata": { "self": "https://flink.us-west1.aws.confluent.cloud/sql/v1/organizations/org-abc/environments/env-a1b2c3/connections/my-openai-connection", "resource_name": "", "created_at": "2006-01-02T15:04:05-07:00", "updated_at": "2006-01-02T15:04:05-07:00", "deleted_at": "2006-01-02T15:04:05-07:00", "uid": "12345678-1234-1234-1234-123456789012", "resource_version": "a23av" }, "name": "my-openai-connection", "spec": { "connection_type": "OPENAI", "endpoint": "https://api.openai.com/v1/chat/completions", "auth_data": { "kind": "PlaintextProvider", "data": "string" } }, "status": { "phase": "READY", "detail": "Lookup failed: ai.openai.com" } } } To update a connection by using the Confluent Terraform provider, use the confluent_flink_connection resource. Find the definition for the connection resource in your Terraform configuration, for example: resource "confluent_flink_connection" "openai-connection" { ... 
credentials { api_key = confluent_api_key.env-admin-flink-api-key.id } } Modify the attributes of the confluent_flink_connection resource in the Terraform configuration file. The following example updates the api_key attribute. resource "confluent_flink_connection" "openai-connection" { ... credentials { api_key = confluent_api_key.env-admin-flink-api-key.id # Updated value } } Run the terraform apply command to update the connection with the new configuration. terraform apply For more information, see confluent_flink_connection. Delete a connection¶ In the Confluent Cloud Console or in the Flink SQL shell, run the DROP CONNECTION statement to delete the connection. DROP CONNECTION `my-connection`; To delete a connection by using the Cloud Console interface, follow these steps. In the navigation menu, click Environments, and click the tile for the environment where you’re using Flink SQL. In the navigation menu, click Integrations. Click Connections. In the listed connections, find the one you want to delete, and click the options icon (⋮). In the context menu, click Delete connection. In the dialog, enter the connection name, and click Confirm. The connection is deleted. Run the confluent flink connection delete command to delete a connection. Deleting a connection requires the following inputs: export CONNECTION_NAME="" # example: "azure-openai-connection" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export ENV_ID="" # example: "env-a1b2c3" Run the following command to delete a connection. confluent flink connection delete ${CONNECTION_NAME} \ --cloud ${CLOUD_PROVIDER} \ --region ${CLOUD_REGION} \ --environment ${ENV_ID} Your output should resemble: Deleted Flink connection "azure-openai-connection". Delete a connection in your environment by sending a DELETE request to the Connections endpoint. This request uses your Cloud API key instead of the Flink API key. 
Deleting a connection requires the following inputs: export CONNECTION_NAME="" # example: "my-openai-connection" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ORG_ID="" # example: "b0b21724-4586-4a07-b787-d0bb5aacbf87" export ENV_ID="" # example: "env-a1b2c3" Run the following command to delete the connection specified in the CONNECTION_NAME environment variable. curl --request DELETE \ --url "https://flink.region.provider.confluent.cloud/sql/v1/organizations/${ORG_ID}/environments/${ENV_ID}/connections/${CONNECTION_NAME}" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" To delete a connection by using the Confluent Terraform provider, use the confluent_flink_connection resource. Find the definition for the connection resource in your Terraform configuration and copy the name of the resource. In the following example, the resource name is main. resource "confluent_flink_connection" "main" { display_name = "standard_connection" ... } To avoid accidental deletions, review the plan before you apply the destroy command. terraform plan -destroy -target=confluent_flink_connection.main To delete the connection, run the following command to target the specific resource. This command deletes only the connection and not other resources. terraform apply -destroy -target=confluent_flink_connection.main To remove all resources defined in your Terraform configuration file, including the connection, run the terraform destroy command. terraform destroy For more information, see confluent_flink_connection. Related content¶ Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. 
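The REST API examples on this page build JSON_DATA by escaping each quotation mark by hand, and they encode the credentials with base64 -w 0, an option that isn't available in every base64 implementation. As a minimal sketch, assuming a bash-compatible shell, the following helpers produce the same values with a here-document and a portable base64 pipeline. The function names basic_auth_token and connection_payload are hypothetical and are not part of the Confluent CLI, and the key, secret, and connection values are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of any Confluent tool.

# Encode "key:secret" for an HTTP Basic Authorization header.
# Piping through tr strips any line wraps, so the "-w 0" flag isn't needed.
basic_auth_token() {
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

# Build the connection payload with a here-document, so the quotation marks
# don't need to be escaped. Arguments: name, connection type, endpoint.
# Assumption: the values contain no characters that require JSON escaping.
connection_payload() {
  cat <<EOF
{
  "name": "$1",
  "spec": {
    "connection_type": "$2",
    "endpoint": "$3",
    "auth_data": { "kind": "PlaintextProvider", "data": "string" }
  },
  "metadata": {}
}
EOF
}

# Placeholder credentials and connection values.
BASE64_CLOUD_KEY_AND_SECRET=$(basic_auth_token "key" "secret")
JSON_DATA=$(connection_payload "my-openai-connection" "OPENAI" "https://api.openai.com/v1/chat/completions")

echo "${JSON_DATA}"
```

The resulting BASE64_CLOUD_KEY_AND_SECRET and JSON_DATA variables can be passed to the curl commands in the same way as the hand-escaped versions.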
#### Code Examples ```sql CREATE CONNECTION my_conn WITH ('api-key' = 'x'); ``` ```sql CREATE CONNECTION `my-connection` WITH ( 'type' = 'OPENAI', 'endpoint' = 'https://.openai.azure.com/openai/deployments//chat/completions?api-version=2025-01-01-preview', 'api-key' = '' ); ``` ```sql CREATE CONNECTION `my-mongodb-connection` WITH ( 'type' = 'MONGODB', 'endpoint' = 'mongodb+srv://myCluster.mongodb.net/myDatabase', 'username' = '', 'password' = '' ); ``` ```sql -- Use the MongoDB connection to create a MongoDB external table. CREATE TABLE mongodb_movies_full_text_search ( title STRING, plot STRING ) WITH ( 'connector' = 'mongodb', 'mongodb.connection' = 'my-mongodb-connection', 'mongodb.database' = 'sample_mflix', 'mongodb.collection' = 'movies', 'mongodb.index' = 'default' ); ``` ```sql export CONNECTION_NAME="" # human-readable name, for example, "azure-openai-connection" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export ENV_ID="" # example: "env-a1b2c3" export CONNECTION_TYPE="" # example: "azureopenai" export ENDPOINT="" # example: "https://.openai.azure.com/openai/deployments//chat/completions?api-version=2025-01-01-preview" export API_KEY="" ``` ```sql confluent flink connection create ${CONNECTION_NAME} \ --cloud ${CLOUD_PROVIDER} \ --region ${CLOUD_REGION} \ --environment ${ENV_ID} \ --type ${CONNECTION_TYPE} \ --endpoint ${ENDPOINT} \ --api-key ${API_KEY} ``` ```sql +---------------+------------------------------------+ | Creation Date | 2025-08-13 22:04:57.972969 | | | +0000 UTC | | Name | azure-openai-connection | | Environment | env-a1b2c3 | | Cloud | aws | | Region | us-west-2 | | Type | AZUREOPENAI | | Endpoint | https:// | | Data | | | Status | | +---------------+------------------------------------+ ``` ```sql export CONNECTION_NAME="" # example: "my-openai-connection" export CONNECTION_TYPE="" # example: "OPENAI" export ENDPOINT="" # example: "https://api.openai.com/v1/chat/completions" export 
CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ORG_ID="" # example: "b0b21724-4586-4a07-b787-d0bb5aacbf87" export ENV_ID="" # example: "env-a1b2c3" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export JSON_DATA="" ``` ```sql { "name": "${CONNECTION_NAME}", "spec": { "connection_type": "${CONNECTION_TYPE}", "endpoint": "${ENDPOINT}", "auth_data": { "kind": "PlaintextProvider", "data": "string" } }, "metadata": {} } ``` ```sql export JSON_DATA="{ \"name\": \"${CONNECTION_NAME}\", \"spec\": { \"connection_type\": \"${CONNECTION_TYPE}\", \"endpoint\": \"${ENDPOINT}\", \"auth_data\": { \"kind\": \"PlaintextProvider\", \"data\": \"string\" } }, \"metadata\": {} }" ``` ```sql curl --request POST \ --url "https://flink.region.provider.confluent.cloud/sql/v1/organizations/${ORG_ID}/environments/${ENV_ID}/connections" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" \ --header 'content-type: application/json' \ --data "${JSON_DATA}" ``` ```sql { "api_version": "sql/v1", "kind": "Connection", "metadata": { "self": "https://flink.us-west1.aws.confluent.cloud/sql/v1/organizations/org-abc/environments/env-a1b2c3/connections/my-openai-connection", "resource_name": "", "created_at": "2006-01-02T15:04:05-07:00", "updated_at": "2006-01-02T15:04:05-07:00", "deleted_at": "2006-01-02T15:04:05-07:00", "uid": "12345678-1234-1234-1234-123456789012", "resource_version": "a23av" }, "name": "my-openai-connection", "spec": { "connection_type": "OPENAI", "endpoint": "https://api.openai.com/v1/chat/completions", "auth_data": { "kind": "PlaintextProvider", "data": "string" } }, "status": { "phase": "READY", "detail": "Lookup failed: ai.openai.com" } } } ``` ```sql terraform { required_providers { confluent = { source = "confluentinc/confluent" version = "2.44.0" } } } provider "confluent" { cloud_api_key = var.confluent_cloud_api_key # 
optionally use CONFLUENT_CLOUD_API_KEY env var cloud_api_secret = var.confluent_cloud_api_secret # optionally use CONFLUENT_CLOUD_API_SECRET env var } ``` ```sql confluent_flink_connection ``` ```sql display_name ``` ```sql resource "confluent_flink_connection" "openai-connection" { organization { id = data.confluent_organization.main.id } environment { id = data.confluent_environment.staging.id } compute_pool { id = confluent_flink_compute_pool.example.id } principal { id = confluent_service_account.app-manager-flink.id } rest_endpoint = data.confluent_flink_region.main.rest_endpoint credentials { key = confluent_api_key.env-admin-flink-api-key.id secret = confluent_api_key.env-admin-flink-api-key.secret } display_name = "connection1" type = "OPENAI" endpoint = "https://api.openai.com/v1/chat/completions" api_key ="API_Key_value" lifecycle { prevent_destroy = true } } ``` ```sql terraform apply ``` ```sql terraform apply ``` ```sql DESCRIBE CONNECTION `my-connection`; ``` ```sql +---------------+------------------------------------+ | Creation Date | 2025-08-13 22:04:57.972969 | | | +0000 UTC | | Name | azure-openai-connection | | Environment | env-a1b2c3 | | Cloud | aws | | Region | us-west-2 | | Type | AZUREOPENAI | | Endpoint | https:// | | Data | | | Status | | +---------------+------------------------------------+ ``` ```sql export CONNECTION_NAME="" # example: "azure-openai-connection" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export ENV_ID="" # example: "env-a1b2c3" ``` ```sql confluent flink connection describe ${CONNECTION_NAME} \ --cloud ${CLOUD_PROVIDER} \ --region ${CLOUD_REGION} \ --environment ${ENV_ID} ``` ```sql +---------------+------------------------------------+ | Creation Date | 2025-08-13 22:04:57.972969 | | | +0000 UTC | | Name | azure-openai-connection | | Environment | env-a1b2c3 | | Cloud | aws | | Region | us-west-2 | | Type | AZUREOPENAI | | Endpoint | https:// | | Data | | | Status | | 
+---------------+------------------------------------+ ``` ```sql export CONNECTION_NAME="" # example: "my-openai-connection" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ORG_ID="" # example: "b0b21724-4586-4a07-b787-d0bb5aacbf87" export ENV_ID="" # example: "env-a1b2c3" ``` ```sql curl --request GET \ --url "https://flink.region.provider.confluent.cloud/sql/v1/organizations/${ORG_ID}/environments/${ENV_ID}/connections/${CONNECTION_NAME}" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" ``` ```sql { "api_version": "sql/v1", "kind": "Connection", "metadata": { "self": "https://flink.us-west1.aws.confluent.cloud/sql/v1/organizations/org-abc/environments/env-123/connections/my-openai-connection", "resource_name": "", "created_at": "2006-01-02T15:04:05-07:00", "updated_at": "2006-01-02T15:04:05-07:00", "deleted_at": "2006-01-02T15:04:05-07:00", "uid": "12345678-1234-1234-1234-123456789012", "resource_version": "a23av" }, "name": "my-openai-connection", "spec": { "connection_type": "OPENAI", "endpoint": "https://api.openai.com/v1/chat/completions", "auth_data": { "kind": "PlaintextProvider", "data": "string" } }, "status": { "phase": "READY", "detail": "Lookup failed: ai.openai.com" } } ``` ```sql data "confluent_flink_connection" "existing_connection" { organization { id = "" } environment { id = "" } compute_pool { id = "" } principal { id = "" } rest_endpoint = "" credentials { key = "" secret = "" } display_name = "my_connection" type = "JDBC" } output "connection_endpoint" { value = data.confluent_flink_connection.existing_connection.endpoint } ``` ```sql terraform apply ``` ```sql terraform output ``` ```sql connection_endpoint ``` ```sql terraform output ``` ```sql terraform output connection_endpoint ``` ```sql SHOW CONNECTIONS; ``` ```sql Creation Date | Name | Environment | Cloud | Region | Type | Endpoint | Data | Status | Status 
Detail ---------------------------------+--------------------------+-------------+-------+-----------+-------------+---------------------------------+------------+--------+---------------- 2025-08-13 21:05:15.035376 | azureopenai-connection-2 | env-a1b2c3 | aws | us-west-2 | AZUREOPENAI | https:// | | | +0000 UTC | | | | | | | | | 2025-08-13 22:04:57.972969 | azure-openai-connection | env-a1b2c3 | aws | us-west-2 | AZUREOPENAI | https:// | | | +0000 UTC | | | | | | | | | ``` ```sql export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export ENV_ID="" # example: "env-a1b2c3" ``` ```sql confluent flink connection list --cloud ${CLOUD_PROVIDER} \ --region ${CLOUD_REGION} \ --environment ${ENV_ID} ``` ```sql Creation Date | Name | Environment | Cloud | Region | Type | Endpoint | Data | Status | Status Detail ---------------------------------+--------------------------+-------------+-------+-----------+-------------+---------------------------------+------------+--------+---------------- 2025-08-13 21:05:15.035376 | azureopenai-connection-2 | env-a1b2c3 | aws | us-west-2 | AZUREOPENAI | https:// | | | +0000 UTC | | | | | | | | | 2025-08-13 22:04:57.972969 | azure-openai-connection | env-a1b2c3 | aws | us-west-2 | AZUREOPENAI | https:// | | | +0000 UTC | | | | | | | | | ``` ```sql export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ORG_ID="" # example: "b0b21724-4586-4a07-b787-d0bb5aacbf87" export ENV_ID="" # example: "env-a1b2c3" ``` ```sql curl --request GET \ --url "https://flink.region.provider.confluent.cloud/sql/v1/organizations/${ORG_ID}/environments/${ENV_ID}/connections" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" ``` ```sql { "api_version": "sql/v1", "kind": "ConnectionList", "metadata": { "first": "https://flink.us-west1.aws.confluent.cloud/sql/v1/environments/env-abc123/connections", "last": "", 
"prev": "", "next": "https://flink.us-west1.aws.confluent.cloud/sql/v1/environments/env-abc123/connections?page_token=UvmDWOB1iwfAIBPj6EYb", "total_size": 123, "self": "https://flink.us-west1.aws.confluent.cloud/sql/v1/environments/env-123/connections" }, "data": [ { "api_version": "sql/v1", "kind": "Connection", "metadata": { "self": "https://flink.us-west1.aws.confluent.cloud/sql/v1/organizations/org-abc/environments/env-123/connections/my-openai-connection", "resource_name": "", "created_at": "2006-01-02T15:04:05-07:00", "updated_at": "2006-01-02T15:04:05-07:00", "deleted_at": "2006-01-02T15:04:05-07:00", "uid": "12345678-1234-1234-1234-123456789012", "resource_version": "a23av" }, "name": "my-openai-connection", "spec": { "connection_type": "OPENAI", "endpoint": "https://api.openai.com/v1/chat/completions", "auth_data": { "kind": "PlaintextProvider", "data": "string" } } }, "status": { "phase": "READY", "detail": "Lookup failed: ai.openai.com" } } ] } ``` ```sql ALTER CONNECTION `my-connection` SET ('api-key' = ''); ``` ```sql +---------------+------------------------------------+ | Creation Date | 2025-08-13 22:04:57.972969 | | | +0000 UTC | | Name | azure-openai-connection | | Environment | env-a1b2c3 | | Cloud | aws | | Region | us-west-2 | | Type | AZUREOPENAI | | Endpoint | https:// | | Data | | | Status | | +---------------+------------------------------------+ ``` ```sql export CONNECTION_NAME="" # example: "azure-openai-connection" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export ENV_ID="" # example: "env-a1b2c3" export ENDPOINT="" # example: "https://.openai.azure.com/openai/deployments//chat/completions?api-version=2025-01-01-preview" export NEWAPI_KEY="" ``` ```sql confluent flink connection update ${CONNECTION_NAME} \ --cloud ${CLOUD_PROVIDER} \ --region ${CLOUD_REGION} \ --environment ${ENV_ID} \ --endpoint ${ENDPOINT} \ --api-key ${NEWAPI_KEY} ``` ```sql 
+---------------+------------------------------------+ | Creation Date | 2025-08-13 22:04:57.972969 | | | +0000 UTC | | Name | azure-openai-connection | | Environment | env-a1b2c3 | | Cloud | aws | | Region | us-west-2 | | Type | AZUREOPENAI | | Endpoint | https:// | | Data | | | Status | | +---------------+------------------------------------+ ``` ```sql export CONNECTION_NAME="" # example: "my-openai-connection" export CONNECTION_TYPE="" # example: "OPENAI" export ENDPOINT="" # example: "https://api.openai.com/v1/chat/completions" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ORG_ID="" # example: "b0b21724-4586-4a07-b787-d0bb5aacbf87" export ENV_ID="" # example: "env-a1b2c3" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export JSON_DATA="" ``` ```sql { "name": "${CONNECTION_NAME}", "spec": { "connection_type": "${CONNECTION_TYPE}", "endpoint": "${ENDPOINT}", "auth_data": { "kind": "PlaintextProvider", "data": "string" } }, "metadata": {} } ``` ```sql export JSON_DATA="{ \"name\": \"${CONNECTION_NAME}\", \"spec\": { \"connection_type\": \"${CONNECTION_TYPE}\", \"endpoint\": \"${ENDPOINT}\", \"auth_data\": { \"kind\": \"PlaintextProvider\", \"data\": \"string\" } }, \"metadata\": {} }" ``` ```sql curl --request PUT \ --url "https://flink.region.provider.confluent.cloud/sql/v1/organizations/${ORG_ID}/environments/${ENV_ID}/connections/${CONNECTION_NAME}" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" \ --header 'content-type: application/json' \ --data "${JSON_DATA}" ``` ```sql { "api_version": "sql/v1", "kind": "Connection", "metadata": { "self": "https://flink.us-west1.aws.confluent.cloud/sql/v1/organizations/org-abc/environments/env-a1b2c3/connections/my-openai-connection", "resource_name": "", "created_at": "2006-01-02T15:04:05-07:00", "updated_at": "2006-01-02T15:04:05-07:00", "deleted_at": 
"2006-01-02T15:04:05-07:00", "uid": "12345678-1234-1234-1234-123456789012", "resource_version": "a23av" }, "name": "my-openai-connection", "spec": { "connection_type": "OPENAI", "endpoint": "https://api.openai.com/v1/chat/completions", "auth_data": { "kind": "PlaintextProvider", "data": "string" } }, "status": { "phase": "READY", "detail": "Lookup failed: ai.openai.com" } } } ``` ```sql resource "confluent_flink_connection" "openai-connection" { ... credentials { api_key = confluent_api_key.env-admin-flink-api-key.id } } ``` ```sql confluent_flink_connection ``` ```sql resource "confluent_flink_connection" "openai-connection" { ... credentials { api_key = confluent_api_key.env-admin-flink-api-key.id # Updated value } } ``` ```sql terraform apply ``` ```sql terraform apply ``` ```sql DROP CONNECTION `my-connection`; ``` ```sql export CONNECTION_NAME="" # example: "azure-openai-connection" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export ENV_ID="" # example: "env-a1b2c3" ``` ```sql confluent flink connection delete ${CONNECTION_NAME} \ --cloud ${CLOUD_PROVIDER} \ --region ${CLOUD_REGION} \ --environment ${ENV_ID} ``` ```sql Deleted Flink connection "azure-openai-connection". ``` ```sql export CONNECTION_NAME="" # example: "my-openai-connection" export CLOUD_API_KEY="" export CLOUD_API_SECRET="" export BASE64_CLOUD_KEY_AND_SECRET=$(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0) export ORG_ID="" # example: "b0b21724-4586-4a07-b787-d0bb5aacbf87" export ENV_ID="" # example: "env-a1b2c3" ``` ```sql curl --request DELETE \ --url "https://flink.region.provider.confluent.cloud/sql/v1/organizations/${ORG_ID}/environments/${ENV_ID}/connections/${CONNECTION_NAME}" \ --header "Authorization: Basic ${BASE64_CLOUD_KEY_AND_SECRET}" ``` ```sql resource "confluent_flink_connection" "main" { display_name = "standard_connection" ... 
} ``` ```sql terraform plan -destroy -target=confluent_flink_connection.main ``` ```sql terraform apply -destroy -target=confluent_flink_connection.main ``` ```sql terraform destroy ``` --- ### Monitor and Manage Flink SQL Statements in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/operate-and-deploy/monitor-statements.html Monitor and Manage Flink SQL Statements in Confluent Cloud for Apache Flink¶ You start a stream-processing app on Confluent Cloud for Apache Flink® by running a SQL statement. Once a statement is running, you can monitor its progress by using the Confluent Cloud Console. You can also set up integrations with monitoring services like Prometheus and Datadog. View and monitor statements in Cloud Console¶ Cloud Console shows details about your statements on the Flink page. If you don’t have running statements currently, run a SQL query like INSERT INTO FROM SELECT in the Flink SQL shell or in a workspace. Log in to the Confluent Cloud Console. Navigate to the Environments page. Click the tile that has the environment where your Flink compute pools are provisioned. Click Flink, and in the Flink page, click Flink statements. The Statements list opens. You can use the Filter options on the page to identify the statements you want to view. The following fields are available in the Flink statements table to help you monitor your statements. Flink Statement Name: The name of the statement. The name is populated automatically when a statement is submitted. You can set the name by using the SET command. Status: Represents what is currently happening with the statement. These are the status values: Pending: The statement has been submitted and Flink is preparing to start running the statement. Running: Flink is actively running the statement. Completed: The statement has completed all of its work. 
Deleting: The statement is being deleted. Failed: The statement has encountered an error and is no longer running. Degraded: The statement appears unhealthy, for example, no transactions have been committed for a long time, or the statement has frequently restarted recently. Stopping: The statement is about to be stopped. Stopped: The statement has been stopped and is no longer running. Statement Type: The type of SQL function that is used in the statement. Statement CFU: The number of CFUs that the statement is consuming. State size (GB): The size of the state used by the statement, in gigabytes. Created: Indicates when the statement started running. If you stop and resume the statement, the Created date shows the date when the statement was first submitted. Messages Behind: The consumer lag of the statement. An indicator also shows whether the back pressure is increasing, decreasing, or stable. Ideally, the Messages Behind metric should be as close to zero as possible. A low, close-to-zero consumer lag is the best indicator that your statement is running smoothly and keeping up with all of its inputs. A growing consumer lag indicates that there is a problem. Messages in: The count of messages in per minute, which represents the rate at which records are read. A watermark is also shown for the messages read. The watermark displayed in the Flink statements table is the minimum watermark from the source(s) in the query. Messages out: The count of messages out per minute, which represents the rate at which records are written. A watermark is also shown for the messages written. The watermark displayed in the Flink statements table is the minimum watermark from the sink(s) in the query. Account: The name of the user account or service account that the statement is running with. When you click a statement, a detailed side panel opens. 
The panel provides information on the statement at a more granular level, showing how messages are being read from sources and written to sinks. The watermarks for each individual source and sink table are shown in this panel along with the statement’s catalog, database, local time zone, and Scaling status. The SQL Content section shows the code used to generate the statement. The panel also contains interactive graphs of the statement’s performance over time. There are charts for # Messages behind, Messages in per minute, and Messages out per minute. Manage statements in Cloud Console¶ Cloud Console gives you actions to manage your statements on the Flink page. In the statement list, click the checkbox next to one of your statements to select it. Click Actions. A menu opens, showing options for managing the statement’s status. You can select Stop statement, Resume statement, or Delete statement. Flink metrics integrations¶ Confluent Cloud for Apache Flink supports metrics integrations with services like Prometheus and Datadog. If you don’t have running statements currently, run a SQL query like INSERT INTO FROM SELECT in the Flink SQL shell or in a workspace. Log in to the Confluent Cloud Console. Open the Administration menu and select Metrics to open the Metrics integration page. In the Explore available metrics section, click the Metric dropdown. Scroll until you find the Flink compute pool and Flink statement metrics, for example, Messages behind. This list doesn’t include all available metrics. For a full list of available metrics, see Metrics API Reference. Click the Resource dropdown and select the corresponding compute pool or statement that you want to monitor. A graph showing the most recent data for your selected Flink metric displays. Click New integration to export your metrics to a monitoring service. For more information, see Integrate with third-party monitoring tools. 
For an introductory example of setting up monitoring with Grafana and Prometheus, see the Flink Monitoring repository. Error handling and recovery¶ When errors occur during the runtime of a statement, Confluent Cloud for Apache Flink handles them differently depending on the type of error: Statement failures: When a statement encounters an error that prevents it from continuing, it moves to the FAILED state. FAILED statements do not consume any CFUs. You’ll see an error message in the statement details explaining what went wrong. Common causes of statement failures include data format issues (deserialization errors), query logic problems (division by zero, invalid operations), missing or inaccessible topics, and insufficient permissions. For deserialization errors, you can use custom error handling rules to skip problematic records or send them to a dead letter queue instead of failing the entire statement. FAILED statements can be resumed, but you must fix the underlying issue first to prevent the statement from failing again immediately. For more information on evolving statements, see Schema and Statement Evolution. Statement degradation: When a statement encounters issues but can continue running, it may enter the DEGRADED state. For more information, see Degraded statements. Degraded statements¶ When a statement enters the DEGRADED state, it means the statement is unable to make consistent progress. There are two scenarios that can cause this: Query-related issues: When the degradation is caused by inefficient query logic or insufficient compute resources, you’ll see an error message like: Your Flink statement has entered a Degraded state because it is unable to make consistent progress. This can be caused by inefficient query logic or insufficient compute resources. Please review your statement for performance bottlenecks. If the issue persists, consider scaling your compute pool or contacting Confluent support for assistance. 
System-related issues: When the degradation is caused by an unknown or internal system error, you’ll see this error message: An internal system error has been detected that requires attention from our engineering team. We are actively working to resolve this issue. No action is required on your part at this time. If the issue persists, please contact Confluent support for further assistance. DEGRADED statements continue to consume CFUs. For query-related issues, see Resolve Common Statement Problems for a troubleshooting guide. Custom error handling rules¶ Confluent Cloud for Apache Flink supports custom error handling for deserialization errors by using the error-handling.mode table property. You can choose to fail, ignore, or log problematic records to a Dead Letter Queue (DLQ). When the mode is set to log, errors are sent to a DLQ table. Notifications¶ Confluent Cloud for Apache Flink integrates with Notifications for Confluent Cloud. The following notifications are available for Flink statements. They apply only to background Data Manipulation Language (DML) statements like INSERT INTO, EXECUTE STATEMENT SET, or CREATE TABLE AS. Statement failure: This notification is triggered when a statement transitions from RUNNING to FAILED. A statement transitions to FAILED on exceptions that Confluent classifies as USER, as opposed to SYSTEM exceptions. Statement degraded: This notification is triggered when a statement transitions from RUNNING to DEGRADED. Statement stuck in pending: This notification is triggered when a newly submitted statement stays in PENDING for a long time. The time period for a statement to be considered stuck in the PENDING state depends on the cloud provider that’s running your Flink statements: AWS: 10 minutes; Azure: 30 minutes; Google Cloud: 10 minutes. Statement auto-stopped: This notification is triggered when a statement moves into STOPPED because the compute pool it is using was deleted by a user. 
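The stuck-in-pending thresholds above can be sketched as a small lookup, which may be useful if you track statement age in your own tooling. This is an illustrative sketch only: the function names are hypothetical, the provider identifiers aws, azure, and gcp are assumed to match the values you pass to the CLI --cloud flag, and treating "at least the threshold" as stuck is an assumption, because the text above doesn't state whether the boundary is inclusive.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of any Confluent tool.

# Print the number of minutes a statement can stay PENDING before it's
# considered stuck, for the given cloud provider identifier.
pending_threshold_minutes() {
  case "$1" in
    aws|gcp) echo 10 ;;
    azure)   echo 30 ;;
    *)       echo "unknown provider: $1" >&2; return 1 ;;
  esac
}

# Succeed (exit status 0) when a statement on provider $1 that has been
# PENDING for $2 minutes has reached the threshold.
# Assumption: the boundary is inclusive.
is_stuck_in_pending() {
  local limit
  limit=$(pending_threshold_minutes "$1") || return 2
  [ "$2" -ge "$limit" ]
}

# Example: the same 12-minute wait is evaluated differently per provider.
for provider in aws azure; do
  if is_stuck_in_pending "$provider" 12; then
    echo "$provider: stuck in pending"
  else
    echo "$provider: within threshold"
  fi
done
```

The notification itself is raised by Confluent Cloud; a local check like this only mirrors the documented thresholds.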
Best practices for alerting¶ Use the Metrics API and Notifications for Confluent Cloud to monitor your compute pools and statements over time. You should monitor and configure alerts for the following conditions: Per compute pool: Alert on exhausted compute pools by comparing the current CFUs (io.confluent.flink/compute_pool_utilization/current_cfus) to the maximum CFUs of the pool (io.confluent.flink/compute_pool_utilization/cfu_limit). Statement stuck in pending notifications also indicate compute-pool exhaustion. Per statement: Alert on statement failures (see Notifications). Alert on statement degradation (see Notifications). Alert on an increase of “Messages Behind”, also called consumer lag (metric name: io.confluent.flink/pending_records), over an extended period of time, for example, more than 10 minutes; tune this threshold to your workload. Note that Confluent Cloud for Apache Flink does not appear as a consumer in the regular consumer lag monitoring feature in Confluent Cloud, because it uses the assign() method. (Optional) Alert on an increase of the difference between the output watermark (io.confluent.flink/current_output_watermark_ms) and the input watermark (io.confluent.flink/current_input_watermark_ms). The input watermark corresponds to the time up to which the input data is complete, and the output watermark corresponds to the time up to which the output data is complete. This difference can be considered a measure of the amount of data that’s currently “in-flight”. Depending on the logic of the statement, different patterns are expected. For example, for a tumbling event-time window, expect an increasing difference until the window is fired, at which point the difference drops to zero and starts increasing again. Statement logging¶ Confluent Cloud for Apache Flink supports event logging for statements in Confluent Cloud Console. The following screenshot shows the event log for a statement that failed due to a division by zero error. 
The event log is available in the Logs tab of the statement details page. The statement event log page provides logs for the following events: Changes of lifecycle, for example, PENDING or RUNNING. For more information, see Statement lifecycle. Scaling status changes, for example, OK or Pending Scale Up. For more information, see Scaling status. Autopilot scaling decisions, for example, “Autopilot is requesting to scale the statement to [New CFU Value] CFUs.” or “Autopilot is unable to scale up the statement because the compute pool’s CFU limit has been reached.” Errors and warnings. The Cloud Console enables the following operations: Search: Search for specific log messages. Wildcards are supported. Time range: Select the time range for the log events. Log level: Filter log events by severity: Error, Warning, Info. Chart: View the log events in a chart. Download: Download log events as a CSV or JSON file. UDF logging¶ Log messages from user-defined functions (UDFs) are also shown in the statement log page. For more information, see Log Debug Messages in UDFs. Related content¶ Video: How to work with a paused stream Statements Queries Flink SQL Shell Quick Start Flink SQL Shell Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql Your Flink statement has entered a Degraded state because it is unable to make consistent progress. This can be caused by inefficient query logic or insufficient compute resources. Please review your statement for performance bottlenecks. If the issue persists, consider scaling your compute pool or contacting Confluent support for assistance. ``` ```sql An internal system error has been detected that requires attention from our engineering team. We are actively working to resolve this issue. No action is required on your part at this time. If the issue persists, please contact Confluent support for further assistance. 
``` ```sql io.confluent.flink/compute_pool_utilization/current_cfus ``` ```sql io.confluent.flink/compute_pool_utilization/cfu_limit ``` ```sql io.confluent.flink/pending_records ``` ```sql io.confluent.flink/current_output_watermark_ms ``` ```sql io.confluent.flink/current_input_watermark_ms ``` --- ### Operate and Deploy Flink SQL Statements with Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/operate-and-deploy/overview.html Operate and Deploy Flink Statements with Confluent Cloud for Apache Flink¶ Confluent provides tools for operating Confluent Cloud for Apache Flink® in the Cloud Console, the Confluent CLI, the Confluent Terraform Provider, and the REST API: Deploy a Statement Billing Monitor Statements with Cloud Console CLI commands Terraform resources REST API RBAC Flink API Keys Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. --- ### Enable Private Networking with Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/operate-and-deploy/private-networking.html Enable Private Networking with Confluent Cloud for Apache Flink¶ You have these options for using private networking with Confluent Cloud for Apache Flink®. PrivateLink Attachment: Works with any type of cluster and is available on AWS, Azure, and Google Cloud. For more information, see Supported Cloud Regions. Existing or new Confluent Cloud network (CCN): Available on AWS and Azure. To create a new Confluent Cloud network, follow the steps in Create Confluent Cloud Network on AWS. For more information, see Private Networking with Confluent Cloud for Apache Flink. 
#### Enable private networking with Confluent Cloud Network

If you already have a Confluent Cloud Network (CCN) created and configured, which is usually the case when you have any Dedicated cluster, you can use this network directly to connect to Flink. Little or no setup is required to configure Flink, because you can reuse connectivity to existing Private Endpoints, Peering, or Transit Gateway. To access Flink from your local client, follow these steps.

##### Prerequisites

- Access to Confluent Cloud.
- The OrganizationAdmin, EnvironmentAdmin, or NetworkAdmin role to enable Flink private networking for an environment.

##### Configure DNS resolution

1. Ensure your VPC is configured to route your unique Flink endpoint to Confluent Cloud.
2. Have a client that is running within the VPC, or a proxy that reroutes your client to the VPC. For more information, see Use the Confluent Cloud Console with Private Networking.

If you already configured steps 1 and 2 for Apache Kafka®, you may not need any changes:

- For public DNS resolution with endpoints that resemble `flink-...private.confluent.cloud`: if your local machine was already configured to access Kafka, no additional setup is necessary.
- With PrivateLink only: For private DNS resolution with endpoints that resemble `flink....private.confluent.cloud`, if routing is using `*....private.confluent.cloud`, no additional setup is necessary, but if your routing is using a more specific URL, you must add the Flink endpoint to your routing rules.

Note that if you use a reverse proxy with a custom route added to your local host file, you must add the Flink endpoint to your host file. Routing to `flinkpls...confluent.cloud` is necessary to enable auto-completion and error highlighting in the Flink SQL shell and Confluent Cloud Console.

#### Enable private networking with PrivateLink Attachment

Private networking with PrivateLink Attachment works with any type of cluster and is available on AWS, Azure, and Google Cloud.

##### Prerequisites

Access to Confluent Cloud.
The OrganizationAdmin, EnvironmentAdmin, or NetworkAdmin role to enable Flink private networking for an environment. A VPC in AWS, a VNet in Azure, or a VPC in Google Cloud. Overview¶ In this walkthrough, you perform the following steps. Set up a PrivateLink attachment Create a PrivateLink Attachment. Create a private endpoint. For AWS, create a VPC Interface Endpoint to the PrivateLink Attachment. For Azure, create a private endpoint that’s associated with the PrivateLink Attachment. For Google Cloud, create a private endpoint that’s associated with the PrivateLink Attachment. Create a PrivateLink Attachment Connection. Set up DNS resolution. Connect to the private network: If your client is not in the VPC or VNet, enable the Cloud Console or Confluent CLI to connect to your private network. When the previous steps are completed, you can use Flink over your private network from the Confluent Cloud Console or Confluent CLI. The experience is the same as with public networking. Step 1: Set up a PrivateLink Attachment and connection¶ In AWS, Azure, or Google Cloud, follow these steps to create a PrivateLink Attachment, a private endpoint, a PrivateLink Attachment Connection, and set up a DNS resolution. AWSAzureGoogle Cloud In Confluent Cloud, create a PrivateLinkAttachment. In AWS, create a VPC Interface Endpoint to the PrivateLinkAttachment service. In Confluent Cloud, create a PrivateLinkAttachmentConnection. Set up a DNS resolution. In Confluent Cloud, create a PrivateLinkAttachment. In Azure, create a private endpoint. In Confluent Cloud, create a PrivateLinkAttachmentConnection. Set up a DNS resolution. In Confluent Cloud, create a PrivateLinkAttachment. PrivateLink Attachments are powered by Private Service Connect. In Google Cloud, create a Private Service Connect endpoint to the service attachment URI you get in Step 1. 
If you use the Confluent Cloud Console for configuration, this step is merged into the next step and shows up as the first and second steps in access point creation. In Confluent Cloud, create a PrivateLink Attachment Connection for the Private Service Connect endpoint you created. A PrivateLink Attachment Connection is required for each Private Service Connect endpoint. Set up a DNS resolution. Step 2: Connect to the network with Cloud Console or Confluent CLI¶ If your client is not in the VPC or VNet, enable the Confluent Cloud Console or Confluent CLI to connect to your private network. If you don’t connect from a machine in the VPC or VNet, you see the following error. To connect to Confluent Cloud with your PrivateLink Attachment, see Use Confluent Cloud with Private Networking. One way to connect is to set up a reverse proxy. Create an EC2 instance. Connect to the instance with SSH. Install NGINX. Configure Routing Table. Set up DNS resolution: point to the Flink regional endpoints you use, as described in Step 6 of Configure a proxy. will resemble flink...private.confluent.cloud, for example: flink.us-east-2.aws.private.confluent.cloud. Find the DNS part of the PrivateLink Attachment by navigating to your environment’s Network management page and finding the DNS domain setting. You can find the full list of supported Flink regions by using the Regions endpoint API. Once networking is set up in Cloud Console, the interface uses the correct endpoint automatically, either public or private, based on the presence of a PrivateLink Attachment. If the connection is private, access to the Flink private network works transparently. Related content¶ Use Confluent Cloud with Private Networking Flink Compute Pools Billing on Confluent Cloud for Apache Flink Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. 
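The DNS guidance above says that a wildcard routing rule already covers the Flink endpoint, while a more specific rule does not, so the endpoint must be added. The following is a minimal sketch of that check, not a Confluent tool. The function name `endpoint_is_routed` and both rule lists are hypothetical. Only the endpoint `flink.us-east-2.aws.private.confluent.cloud` comes from the example in the text.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def endpoint_is_routed(endpoint: str, routing_rules: Iterable[str]) -> bool:
    """Return True if any routing rule (shell-style wildcard) matches the endpoint.

    Hostnames are case-insensitive, so both sides are lowercased first.
    """
    host = endpoint.lower()
    return any(fnmatchcase(host, rule.lower()) for rule in routing_rules)


# Endpoint taken from the example in the text.
FLINK_ENDPOINT = "flink.us-east-2.aws.private.confluent.cloud"

# Hypothetical routing rules, for illustration only.
wildcard_rules = ["*.us-east-2.aws.private.confluent.cloud"]
specific_rules = ["kafka-proxy.us-east-2.aws.private.confluent.cloud"]

if __name__ == "__main__":
    # A wildcard rule already covers the Flink endpoint.
    print(endpoint_is_routed(FLINK_ENDPOINT, wildcard_rules))
    # A specific rule does not, so the Flink endpoint must be added.
    print(endpoint_is_routed(FLINK_ENDPOINT, specific_rules))
```

If the check fails for your own rules, add the Flink endpoint to the routing rules or host file, as described in Configure DNS resolution.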
#### Code Examples ```sql flink-...private.confluent.cloud ``` ```sql flink....private.confluent.cloud ``` ```sql *....private.confluent.cloud ``` ```sql flinkpls...confluent.cloud ``` ```sql ``` ```sql ``` ```sql flink...private.confluent.cloud ``` ```sql flink.us-east-2.aws.private.confluent.cloud ``` --- ### Flink SQL Query Profiler in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/operate-and-deploy/query-profiler.html Flink SQL Query Profiler in Confluent Cloud for Apache Flink¶ The Query Profiler is a tool in Confluent Cloud for Apache Flink® that provides enhanced visibility into how a Flink SQL statement is processing data, which enables rapid identification of bottlenecks, data skew issues, and other performance concerns. To use the Query Profiler, see Profile a Query. The Query Profiler is a dynamic, real-time visual dashboard that provides insights into the computations performed by your Flink SQL statements. It boosts observability, enabling you to monitor running statements and diagnose performance issues during execution. The Query Profiler presents key metrics and visual representations of the performance and behavior of individual tasks, subtasks, and operators within a statement. Query Profiler is available in the Confluent Cloud Console. Key features of the Query Profiler include: Monitor in real time: Track the live performance of your Flink SQL statements, enabling you to react quickly to emerging issues. View detailed metrics: The profiler provides a breakdown of performance metrics at various levels, including statement, task, operator, and partition levels, which helps you understand how different components of a Flink SQL job are performing. Visualize data flow: The profiler visualizes data flow as a job graph, showing how data is processed through different tasks and operators. 
This helps you identify operators experiencing high latency, large amounts of state, or workload imbalances. Reduce manual analysis: By offering immediate visibility into performance data, the profiler reduces the need for extensive manual logging and analysis, which can consume significant developer time. This enables you to focus on optimizing your queries and improving performance. The Query Profiler helps you manage the complexities of stream processing applications and optimize query performance in real time. Available metrics¶ The Query Profiler provides the following metrics for the tasks in your Flink statements. Metric Definition Backpressure Percentage of time a task is regulating data flow to match processing speed by reducing pending events. Busyness The percentage of time a task is actively processing data. If a task has multiple subtasks running in parallel, Query Profiler shows the highest busyness value seen among them. Note that idleness and busyness will not always add up to 100%. Bytes in/min Amount of data received by a task per minute. Bytes out/min Amount of data sent by a task per minute. Idleness The percentage of time a task is not actively processing data. If a task has multiple subtasks running in parallel, Query Profiler shows the highest idleness value seen among them. Note that idleness and busyness do not always add up to 100%. Messages in/min Number of events the task receives per minute. Messages out/min Number of events the task sends out per minute. State size Amount of data stored by the task during processing to track information across events. Watermark Timestamp Flink uses to track event time progress and handle out-of-order events. The Query Profiler provides the following metrics for the operators in your Flink statements. Metric Definition Messages in/min Number of events the operator receives per minute. Messages out/min Number of events the operator sends out per minute. 
State size Amount of data stored by the operator during processing to track information across events. Watermark Timestamp Flink uses to track event time progress and handle out-of-order events. The Query Profiler provides the following metrics for the Kafka partitions in your data source(s). Metric Definition Active Percentage of time the partition is active. An active partition processes events and creates watermarks to keep your statements running smoothly. Blocked Percentage of time the partition is blocked. A blocked partition is overwhelmed with data, causing delays in the watermark calculation. Idle Percentage of time the partition is idle. An idle partition has not received any events for a certain time period and is not contributing to the watermark calculation. Related content¶ EXPLAIN Statement Flink SQL Statements Profile a Query Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. --- ### Stream Processing with Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/overview.html Stream Processing with Confluent Cloud for Apache Flink¶ Apache Flink® is a powerful, scalable stream processing framework for running complex, stateful, low-latency streaming applications on large volumes of data. Flink excels at complex, high-performance, mission-critical streaming workloads and is used by many companies for production stream processing applications. Flink is the de facto industry standard for stream processing. Get Started for Free Sign up for a Confluent Cloud trial and get $400 of free credit. Confluent Cloud for Apache Flink provides a cloud-native, serverless service for Flink that enables simple, scalable, and secure stream processing that integrates seamlessly with Apache Kafka®. Your Kafka topics appear automatically as queryable Flink tables, with schemas and metadata attached by Confluent Cloud. 
Confluent Cloud for Apache Flink supports creating stream-processing applications by using Flink SQL, the Flink Table API (Java and Python), and custom user-defined functions. To run Flink on-premises with Confluent Platform, see Confluent Platform for Apache Flink. What is Confluent Cloud for Apache Flink? Cloud native Complete Everywhere Program Flink with SQL, Java, and Python Confluent for VS Code What is Confluent Cloud for Apache Flink?¶ Confluent Cloud for Apache Flink integrates with the Kafka ecosystem¶ Confluent Cloud for Apache Flink is Flink re-imagined as a truly cloud-native service. Confluent’s fully managed Flink service enables you to: Easily filter, join, and enrich your data streams with Flink Enable high-performance and efficient stream processing at any scale, without the complexities of managing infrastructure Experience Kafka and Flink as a unified platform, with fully integrated monitoring, security, and governance When bringing Flink to Confluent Cloud, the goal was to provide a uniquely serverless experience superior to just “cloud-hosted” Flink. Kafka on Confluent Cloud goes beyond Kafka by using the Kora engine, which showcases Confluent’s engineering expertise in building cloud-native data systems. Confluent’s goal is to deliver the same simplicity, security, and scalability for Flink that you expect for Kafka. Confluent Cloud for Apache Flink is engineered to be: Cloud-native: Flink is fully managed on Confluent Cloud and autoscales up and down with your workloads. Complete: Flink is integrated deeply with Confluent Cloud to provide an enterprise-ready experience. Everywhere: Flink is available in AWS, Azure, and Google Cloud. 
Get started with Confluent Cloud for Apache Flink:

- Flink SQL Quick Start with Confluent Cloud Console
- Flink SQL Shell Quick Start

#### Confluent Cloud for Apache Flink is cloud-native

Figure: Confluent Cloud for Apache Flink autoscales with your workloads

Confluent Cloud for Apache Flink provides a cloud-native experience for Flink. This means you can focus fully on your business logic, encapsulated in Flink SQL statements, and Confluent Cloud takes care of what’s needed to run them in a secure, resource-efficient and fault-tolerant manner. You don’t need to know about or interact with Flink clusters, state backends, checkpointing, or any of the other aspects that are usually involved when operating a production-ready Flink deployment.

- Fully managed: On Confluent Cloud, you don’t need to choose a runtime version of Flink. You’re always using the latest version and benefit from continuous improvements and innovations. All of your running statements automatically and transparently receive security patches and minor upgrades of the Flink runtime.
- Autoscaling: All of your Flink SQL statements on Confluent Cloud are monitored continuously and auto-scaled to keep up with the rate of their input topics. The resources required by a statement depend on its complexity and the throughput of topics it reads from.
- Usage-based billing: You pay only for what you use, not what you provision. Flink compute in Confluent Cloud is elastic: once you stop using the compute resources, they are deallocated, and you no longer pay for them. Coupled with the elasticity provided by scale-to-zero, you can benefit from unbounded scalability while maintaining cost efficiency. For more information, see Billing.

#### Confluent Cloud for Apache Flink is complete

Figure: Confluent Cloud for Apache Flink is a unified platform

Confluent has integrated Flink deeply with Confluent Cloud to provide an enterprise-ready, complete experience that enables data discovery and processing using familiar SQL semantics.
Confluent Cloud for Apache Flink is a regional service¶ Confluent Cloud for Apache Flink is a regional service, and you can create compute pools in any of the supported regions. Compute pools represent a set of resources that scale automatically between zero and their maximum size to provide all of the power required by your statements. A compute pool is bound to a region, and the resources provided by a compute pool are shared among all statements that use them. While compute pools are created within an environment, you can query data in any topic in your Confluent Cloud organization, even if the data is in a different environment, as long as it’s in the same region. This enables Flink to do cross-cluster, cross-environment queries while providing low latency. Of course, access control with RBAC still determines the data that can be read or written. Flink can read from and write to any Kafka cluster in the same region, but by design, Confluent Cloud doesn’t allow you to query across regions. This helps you to avoid expensive data transfer charges, and also protects data locality and sovereignty by keeping reads and writes in-region. For a list of available regions, see Supported Cloud Regions. Metadata mapping between Kafka cluster, topics, schemas, and Flink¶ Kafka topics and schemas are always in sync with Flink, simplifying how you can process your data. Any topic created in Kafka is visible directly as a table in Flink, and any table created in Flink is visible as a topic in Kafka. Effectively, Flink provides a SQL interface on top of Confluent Cloud. Because Flink follows the SQL standard, the terminology is slightly different from Kafka. The following table shows the mapping between Kafka and Flink terminology. 
| Kafka | Flink | Notes |
|---|---|---|
| Environment | Catalog | Flink can query and join data that is in any environment/catalog |
| Cluster | Database | Flink can query and join data that is in different clusters/databases |
| Topic + Schema | Table | Kafka topics and Flink tables are always in sync. You never need to declare tables manually for existing topics. Creating a table in Flink creates a topic and the associated schema. |

As a result, when you start using Flink, you can directly access all of the environments, clusters, and topics that you already have in Confluent Cloud, without any additional metadata creation.

Figure: Automatic metadata integration in Confluent Cloud for Apache Flink

Compared with Apache Flink, the main difference is that the Data Definition Language (DDL) statements related to catalogs, databases, and tables act on physical objects and not only on metadata. For example, when you create a table in Flink, the corresponding topic and schema are created immediately in Confluent Cloud.

Confluent Cloud provides a unified approach to metadata management. There is one object definition, and Flink integrates directly with this definition, avoiding unnecessary duplication of metadata and making all topics immediately queryable with Flink SQL. Also, any existing schemas in Schema Registry are used to surface fully-defined entities in Confluent Cloud. If you’re already on Confluent Cloud, you automatically see tables that are ready to query using Flink, simplifying data discovery and exploration.

#### Observability

Confluent Cloud provides you with a curated set of metrics, exposing them through Confluent’s existing Metrics API. If you have established observability platforms in place, Confluent Cloud provides first-class integrations with New Relic, Datadog, Grafana Cloud, and Dynatrace. You can also monitor workloads directly within the Confluent Cloud Console.
Clicking into a compute pool gives you insight into the health and performance of your applications, in addition to the resource consumption of your compute pool. Security¶ Confluent Cloud for Apache Flink has a deep integration with Role-Based Access Control (RBAC), ensuring that you can easily access and process the data that you have access to, and no other data. Access from Flink to the data¶ For ad-hoc queries, you can use your user account, because the permissions of the current user are applied automatically without any additional setting needed. For long-running statements that need to run 24/7, like INSERT INTO, you should use a service account, so the statements are not affected by a user leaving the company or changing teams. Access to Flink¶ To manage Flink access, Confluent has introduced two roles. In both cases, RBAC of the user on the underlying data is still applied. FlinkDeveloper: basic access to Flink, enabling users to query data and manage their own statements. FlinkAdmin: role that enables creating and managing Flink compute pools. Service accounts¶ Service accounts are available for running statements permanently. If you want to run a statement with service account permissions, an OrganizationAdmin must create an Assigner role binding for the user on the service account. For more information, see Production workloads (service accounts). Private networking¶ Confluent Cloud for Apache Flink supports private networking on AWS, Azure, and Google Cloud, providing a simple, secure, and flexible solution that enables new scenarios while keeping your data securely in private networking. All Kafka cluster types are supported, with any type of connectivity (public, Private Links, VPC Peering, and Transit Gateway). For more information, see Private Networking with Flink. Cross-environment queries¶ Flink can perform cross-environment queries when using both public and private networking. 
This can be useful if you want to enable a single networking route from your VPC or VNET. In this case, you can use a single environment and a single PLATT where you run all your Flink workloads and use three-part name queries, to query data in other environments, for example: SELECT * FROM `myEnvironment`.`myDatabase`.`myTable`; As a result, a single routing rule is necessary on the VPC or VNet side, per region, to redirect all traffic to the Flink regional endpoint(s) using this PrivateLink Attachment Connection. To isolate different workloads, you can create different compute pools, which enables you to control budget and scale of these workloads independently. Data access is protected by RBAC at the Kafka cluster (Flink database) or Kafka topic (Flink table) level. If your user account or service account that runs the query doesn’t have access, Flink can’t access sources and destinations. To access Flink statements and workspaces, you must access them from a public IP address, if authorized, or from a PLATT or Confluent Cloud Network from the same environment and region. Flink statements themselves can then access all the environments in the same organization and region. Program Flink with SQL, Java, and Python¶ Confluent Cloud for Apache Flink supports programming your streaming applications in these languages: SQL Java Table API Python Table API Also, you can create custom user-defined functions and call them in your SQL statements. For more information, see User-defined Functions. Note The Flink Table API is available for preview. A Preview feature is a Confluent Cloud component that is being introduced to gain early feedback from developers. Preview features can be used for evaluation and non-production testing purposes or to provide feedback to Confluent. The warranty, SLA, and Support Services provisions of your agreement with Confluent do not apply to Preview features. 
Confluent may discontinue providing preview releases of the Preview features at any time in Confluent’s sole discretion. Comments, questions, and suggestions related to the Table API are encouraged and can be submitted through the established channels.

#### Confluent for VS Code

Install Confluent for VS Code to access Smart Project Templates that accelerate project setup by providing ready-to-use templates tailored for common development patterns. These templates enable you to launch new projects quickly with minimal configuration, significantly reducing setup time.

#### Next steps

- Get Started
- Quick Start with Cloud Console
- Quick Start with Flink SQL Shell
- Java Table API Quick Start
- Python Table API Quick Start

#### Related content

- Stream Processing Concepts
- How-to Guides
- Operate and Deploy
- Flink SQL Reference
- Blog post: Introducing Confluent Cloud for Apache Flink
- Blog post: Your Guide to Flink SQL: An In-Depth Exploration
- Blog post: How to Use Flink SQL, Streamlit, and Kafka: Part 1
- Blog post: How to Use Flink SQL, Streamlit, and Kafka: Part 2
- Blog post: Data Products, Data Contracts, and Change Data Capture
- Course: Apache Flink SQL
- Course: Apache Flink 101
- Course: Building Flink Applications in Java
- Course: Apache Flink® Table API: Processing Data Streams in Java

#### Code Examples

```sql
SELECT * FROM `myEnvironment`.`myDatabase`.`myTable`;
```

---

### Supported Cloud Regions for Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/reference/cloud-regions.html

Confluent Cloud for Apache Flink® is available on AWS, Azure, and Google Cloud. Flink is supported in the following regions.
- AWS supported regions
- Azure supported regions
- Google Cloud supported regions

#### AWS supported regions

| Regions | Networking |
|---|---|
| ap-east-1 | Public and private |
| ap-northeast-1 | Public and private |
| ap-northeast-2 | Public and private |
| ap-south-1 | Public and private |
| ap-southeast-1 | Public and private |
| ap-southeast-2 | Public and private |
| ca-central-1 | Public and private |
| eu-central-1 | Public and private |
| eu-north-1 | Public and private |
| eu-west-1 | Public and private |
| eu-west-2 | Public and private |
| me-south-1 | Public and private |
| sa-east-1 | Public and private |
| us-east-1 | Public and private |
| us-east-2 | Public and private |
| us-west-2 | Public and private |

#### Azure supported regions

| Regions | Networking |
|---|---|
| australiaeast | Public and private |
| brazilsouth | Public and private |
| canadacentral | Public and private |
| centralindia | Public and private |
| centralus | Public and private |
| eastasia | Public and private |
| eastus | Public and private |
| eastus2 | Public and private |
| francecentral | Public and private |
| germanywestcentral | Public and private |
| northeurope | Public and private |
| southcentralus | Public and private |
| southeastasia | Public and private |
| spaincentral | Public and private |
| uaenorth | Public and private |
| uksouth | Public and private |
| westeurope | Public and private |
| westus2 | Public and private |
| westus3 | Public and private |

#### Google Cloud supported regions

| Regions | Networking |
|---|---|
| asia-south1 | Public and private |
| asia-south2 | Public and private |
| asia-southeast1 | Public and private |
| asia-southeast2 | Public and private |
| australia-southeast1 | Public and private |
| europe-west1 | Public and private |
| europe-west2 | Public and private |
| europe-west3 | Public and private |
| europe-west4 | Public and private |
| northamerica-northeast1 | Public and private |
| northamerica-northeast2 | Public and private |
| us-central1 | Public and private |
| us-east1 | Public and private |
| us-east4 | Public and private |
| us-west1 | Public and private |
| us-west2 | Public and private |
| us-west4 | Public and private |

#### List regions by using Cloud Console, Confluent CLI, or REST API

You can see the regions where Confluent Cloud for Apache Flink is supported by using the Confluent Cloud Console, the Confluent CLI, or the Flink REST API.

Confluent Cloud Console

1. Log in to Confluent Cloud and navigate to your environment.
2. Click Flink and ensure that Compute pools is selected.
3. Click Add compute pool.
4. In the Create compute pool page, you can browse the available cloud providers and regions.

Confluent CLI

1. Log in to Confluent Cloud: `confluent login --organization-id ${ORG_ID} --prompt`
2. Use the Confluent CLI command to see the regions where Confluent Cloud for Apache Flink is supported: `confluent flink region list`

   Your output should resemble:

   | Current | Name | Cloud | Region |
   |---|---|---|---|
   | | Belgium (europe-west1) | GCP | europe-west1 |
   | | Canada (ca-central-1) | AWS | ca-central-1 |
   | | Iowa (centralus) | AZURE | centralus |
   | ... | | | |

3. Use grep to filter the list by cloud provider. For example, the following command shows the AWS regions where Flink is available: `confluent flink region list | grep -i aws`

   Your output should resemble:

   | Current | Name | Cloud | Region |
   |---|---|---|---|
   | | Canada (ca-central-1) | AWS | ca-central-1 |
   | | Frankfurt (eu-central-1) | AWS | eu-central-1 |
   | | Ireland (eu-west-1) | AWS | eu-west-1 |
   | ... | | | |

REST API

Send a GET request to the Flink REST API Regions endpoint to list the available regions. For more information, see List Flink Regions.

#### Related content

- Compute Pools
- Create a Compute Pool
- Statements
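As an alternative to `grep`, the region list can be filtered in a script. The following is a minimal sketch that assumes the pipe-separated text layout shown in the sample output above. The helper names `parse_region_table` and `regions_for_cloud` are hypothetical and are not part of the Confluent CLI. The sketch parses captured output and does not call the CLI.

```python
from typing import Dict, List

# Sample text copied from the CLI output shown above.
SAMPLE_OUTPUT = """\
  Current |              Name              | Cloud |        Region
----------+--------------------------------+-------+-----------------------
          | Belgium (europe-west1)         | GCP   | europe-west1
          | Canada (ca-central-1)          | AWS   | ca-central-1
          | Iowa (centralus)               | AZURE | centralus
"""


def parse_region_table(text: str) -> List[Dict[str, str]]:
    """Parse the pipe-separated region table into a list of dicts.

    Assumes four columns per data row (Current, Name, Cloud, Region).
    Header, separator, and other lines are skipped.
    """
    rows = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) != 4 or cells[1] == "Name" or not cells[3]:
            continue
        _current, name, cloud, region = cells
        rows.append({"name": name, "cloud": cloud, "region": region})
    return rows


def regions_for_cloud(rows: List[Dict[str, str]], cloud: str) -> List[str]:
    """Return region IDs whose Cloud column equals `cloud`, ignoring case."""
    return [r["region"] for r in rows if r["cloud"].lower() == cloud.lower()]


if __name__ == "__main__":
    rows = parse_region_table(SAMPLE_OUTPUT)
    print(regions_for_cloud(rows, "aws"))
```

Unlike `grep -i aws`, which matches the substring anywhere in a line, this sketch compares only the Cloud column.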
#### Code Examples

```shell
confluent login --organization-id ${ORG_ID} --prompt
```

```shell
confluent flink region list
```

```text
  Current |              Name              | Cloud |        Region
----------+--------------------------------+-------+-----------------------
          | Belgium (europe-west1)         | GCP   | europe-west1
          | Canada (ca-central-1)          | AWS   | ca-central-1
          | Iowa (centralus)               | AZURE | centralus
...
```

```shell
confluent flink region list | grep -i aws
```

```text
          | Canada (ca-central-1)          | AWS   | ca-central-1
          | Frankfurt (eu-central-1)       | AWS   | eu-central-1
          | Ireland (eu-west-1)            | AWS   | eu-west-1
...
```

---

### Flink SQL Data Types in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/reference/datatypes.html

Confluent Cloud for Apache Flink® has a rich set of native data types that you can use in SQL statements and queries. The query planner supports the following SQL types.

| Flink SQL type | Java type | JSON Schema type | Protobuf type | Avro type | Avro logical type |
|---|---|---|---|---|---|
| ARRAY | t[] | Array | repeated T | array | – |
| BIGINT | long | Number | INT64 | long | – |
| BINARY | byte[] | String | BYTES | fixed | – |
| BOOLEAN | boolean | Boolean | BOOL | boolean | – |
| BYTES / VARBINARY | byte[] | String | BYTES | bytes | – |
| CHAR | String | String | STRING | string | – |
| DATE | java.time.LocalDate | Number | MESSAGE | int | date |
| DECIMAL | java.math.BigDecimal | Number | MESSAGE | bytes | decimal |
| DOUBLE | double | Number | DOUBLE | double | – |
| FLOAT | float | Number | FLOAT | float | – |
| INT | int | Number | INT32 | int | – |
| INTERVAL DAY TO SECOND | java.time.Duration | Not supported | Not supported | Not supported | – |
| INTERVAL YEAR TO MONTH | java.time.Period | Not supported | Not supported | Not supported | – |
| MAP | java.util.Map | Array[Object] / Object | repeated MESSAGE | map / array | – |
| MULTISET | java.util.Map | Array[Object] / Object | repeated MESSAGE | map / array | – |
| NULL | java.lang.Object | oneOf(Null, T) | [1] | union(avro_type, null) | – |
| ROW | org.apache.flink.types.Row | Object | MESSAGE | record [2] | – |
| SMALLINT | short | Number | INT32 | int | – |
| TIME | java.time.LocalTime | Number | – | int | time-millis |
| TIMESTAMP | java.time.LocalDateTime | Number | MESSAGE | long | local-timestamp-millis / local-timestamp-micros |
| TIMESTAMP_LTZ | java.time.Instant | Number | MESSAGE | long | timestamp-millis / timestamp-micros |
| TINYINT | byte | Number | INT32 | int | – |
| VARCHAR / STRING | String | String | STRING | string | – |

[1] See discussion at Flink SQL types to Protobuf types

[2] See discussion at Flink SQL types to Avro types

#### Data type definition

A data type describes the logical type of a value in a SQL table. You use data types to declare the input and output types of an operation. The Flink data types are similar to the SQL standard data type terminology, but for efficient handling of scalar expressions, they also contain information about the nullability of a value.

These are examples of SQL data types:

- `INT`
- `INT NOT NULL`
- `INTERVAL DAY TO SECOND(3)`
- `ROW<fieldOne ARRAY<BOOLEAN>, fieldTwo TIMESTAMP(3)>`

The following sections list all pre-defined data types in Flink SQL.

#### Character strings

##### CHAR

Represents a fixed-length character string.

Declaration

- `CHAR`
- `CHAR(n)`

Bridging to JVM types

| Java Type | Input | Output | Notes |
|---|---|---|---|
| java.lang.String | ✓ | ✓ | Default |
| byte[] | ✓ | ✓ | Assumes UTF-8 encoding |
| org.apache.flink.table.data.StringData | ✓ | ✓ | Internal data structure |

Formats

The following table shows examples of the CHAR type in different formats.

| Format | Example |
|---|---|
| JSON for data type | `{"type":"CHAR","nullable":true,"length":8}` |
| CLI/UI format | `CHAR(8)` |
| JSON for payload | `"Example string"` |
| CLI/UI format for payload | `Example string` |

Declare this type by using CHAR(n), where n is the number of code points. n must have a value between 1 and 2,147,483,647 (both inclusive). If no length is specified, n is equal to 1.

CHAR(0) is not supported for CAST or persistence in catalogs, but it exists in protocols.

##### VARCHAR / STRING

Represents a variable-length character string.
Declaration: `VARCHAR`, `VARCHAR(n)`, `STRING`

Bridging to JVM types

| Java type | Input | Output | Notes |
| --- | --- | --- | --- |
| java.lang.String | ✓ | ✓ | Default |
| byte[] | ✓ | ✓ | Assumes UTF-8 encoding |
| org.apache.flink.table.data.StringData | ✓ | ✓ | Internal data structure |

Formats

The following table shows examples of the VARCHAR type in different formats.

| Format | Example |
| --- | --- |
| JSON for data type | `{"type":"VARCHAR","nullable":true,"length":8}` |
| CLI/UI format | `VARCHAR(800)` |
| JSON for payload | `"Example string"` |
| CLI/UI format for payload | `Example string` |

Declare this type by using VARCHAR(n), where n is the maximum number of code points. n must have a value between 1 and 2,147,483,647 (both inclusive). If no length is specified, n is equal to 1.

STRING is equivalent to VARCHAR(2147483647).

VARCHAR(0) is not supported for CAST or persistence in catalogs, but it exists in protocols.

#### Binary strings

##### BINARY

Represents a fixed-length binary string (a sequence of bytes).

Declaration: `BINARY`, `BINARY(n)`

Bridging to JVM types

| Java type | Input | Output | Notes |
| --- | --- | --- | --- |
| byte[] | ✓ | ✓ | Default |

Formats

The following table shows examples of the BINARY type in different formats.

| Format | Example |
| --- | --- |
| JSON for data type | `{"type":"BINARY","nullable":true,"length":1}` |
| CLI/UI format | `BINARY(3)` |
| JSON for payload | `"x'7f0203'"` |
| CLI/UI format for payload | `x'7f0203'` |

Declare this type by using BINARY(n), where n is the number of bytes. n must have a value between 1 and 2,147,483,647 (both inclusive). If no length is specified, n is equal to 1.

The string representation is in hexadecimal format.

BINARY(0) is not supported for CAST or persistence in catalogs, but it exists in protocols.

##### BYTES / VARBINARY

Represents a variable-length binary string (a sequence of bytes).

Declaration: `BYTES`, `VARBINARY`, `VARBINARY(n)`

Bridging to JVM types

| Java type | Input | Output | Notes |
| --- | --- | --- | --- |
| byte[] | ✓ | ✓ | Default |

Formats

The following table shows examples of the VARBINARY type in different formats.

| Format | Example |
| --- | --- |
| JSON for data type | `{"type":"VARBINARY","nullable":true,"length":1}` |
| CLI/UI format | `VARBINARY(800)` |
| JSON for payload | `"x'7f0203'"` |
| CLI/UI format for payload | `x'7f0203'` |

Declare this type by using VARBINARY(n), where n is the maximum number of bytes. n must have a value between 1 and 2,147,483,647 (both inclusive). If no length is specified, n is equal to 1.

BYTES is equivalent to VARBINARY(2147483647).

VARBINARY(0) is not supported for CAST or persistence in catalogs, but it exists in protocols.

#### Exact numerics

##### BIGINT

Represents an 8-byte signed integer with values from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.

Declaration: `BIGINT`

Bridging to JVM types

| Java type | Input | Output | Notes |
| --- | --- | --- | --- |
| java.lang.Long | ✓ | ✓ | Default |
| long | ✓ | (✓) | Output only if type is not nullable |

Formats

The following table shows examples of the BIGINT type in different formats.

| Format | Example |
| --- | --- |
| JSON for data type | `{"type":"BIGINT","nullable":true}` |
| CLI/UI format | `BIGINT` |
| JSON for payload | `"23"` |
| CLI/UI format for payload | `23` |

##### DECIMAL

Represents a decimal number with fixed precision and scale.

Declaration: `DECIMAL`, `DECIMAL(p)`, `DECIMAL(p, s)`, `DEC`, `DEC(p)`, `DEC(p, s)`, `NUMERIC`, `NUMERIC(p)`, `NUMERIC(p, s)`

Bridging to JVM types

| Java type | Input | Output | Notes |
| --- | --- | --- | --- |
| java.math.BigDecimal | ✓ | ✓ | Default |
| org.apache.flink.table.data.DecimalData | ✓ | ✓ | Internal data structure |

Formats

The following table shows examples of the DECIMAL type in different formats.

| Format | Example |
| --- | --- |
| JSON for data type | `{"type":"DECIMAL","nullable":true,"precision":5,"scale":3}` |
| CLI/UI format | `DECIMAL(5, 3)` |
| JSON for payload | `"12.123"` |
| CLI/UI format for payload | `12.123` |

Declare this type by using DECIMAL(p, s), where p is the number of digits in a number (precision) and s is the number of digits to the right of the decimal point in a number (scale).

p must have a value between 1 and 38 (both inclusive). The default value for p is 10. s must have a value between 0 and p (both inclusive). The default value for s is 0.

The right side is padded with 0. The left side must be padded with spaces, like all other values.
NUMERIC(p, s) and DEC(p, s) are synonyms for this type.

##### INT

Represents a 4-byte signed integer with values from -2,147,483,648 to 2,147,483,647.

Declaration: `INT`, `INTEGER`

Bridging to JVM types

| Java type | Input | Output | Notes |
| --- | --- | --- | --- |
| java.lang.Integer | ✓ | ✓ | Default |
| int | ✓ | (✓) | Output only if type is not nullable |

Formats

The following table shows examples of the INT type in different formats.

| Format | Example |
| --- | --- |
| JSON for data type | `{"type":"INT","nullable":true}` |
| CLI/UI format | `INT` |
| JSON for payload | `"23"` |
| CLI/UI format for payload | `23` |

INTEGER is a synonym for this type.

##### SMALLINT

Represents a 2-byte signed integer with values from -32,768 to 32,767.

Declaration: `SMALLINT`

Bridging to JVM types

| Java type | Input | Output | Notes |
| --- | --- | --- | --- |
| java.lang.Short | ✓ | ✓ | Default |
| short | ✓ | (✓) | Output only if type is not nullable |

Formats

The following table shows examples of the SMALLINT type in different formats.

| Format | Example |
| --- | --- |
| JSON for data type | `{"type":"SMALLINT","nullable":true}` |
| CLI/UI format | `SMALLINT` |
| JSON for payload | `"23"` |
| CLI/UI format for payload | `23` |

##### TINYINT

Represents a 1-byte signed integer with values from -128 to 127.

Declaration: `TINYINT`

Bridging to JVM types

| Java type | Input | Output | Notes |
| --- | --- | --- | --- |
| java.lang.Byte | ✓ | ✓ | Default |
| byte | ✓ | (✓) | Output only if type is not nullable |

Formats

The following table shows examples of the TINYINT type in different formats.

| Format | Example |
| --- | --- |
| JSON for data type | `{"type":"TINYINT","nullable":true}` |
| CLI/UI format | `TINYINT` |
| JSON for payload | `"23"` |
| CLI/UI format for payload | `23` |

#### Approximate numerics

##### DOUBLE

Represents an 8-byte double precision floating point number.

Declaration: `DOUBLE`, `DOUBLE PRECISION`

Bridging to JVM types

| Java type | Input | Output | Notes |
| --- | --- | --- | --- |
| java.lang.Double | ✓ | ✓ | Default |
| double | ✓ | (✓) | Output only if type is not nullable |

Formats

The following table shows examples of the DOUBLE type in different formats.

| Format | Example |
| --- | --- |
| JSON for data type | `{"type":"DOUBLE","nullable":true}` |
| CLI/UI format | `DOUBLE` |
| JSON for payload | `"1.1111112120000001E7"` |
| CLI/UI format for payload | `1.1111112120000001E7` |

DOUBLE PRECISION is a synonym for this type.
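As a rough illustration of the numeric declarations covered so far, the following sketch casts a few literals to these types. The column aliases are made up for this example, and the literal values are borrowed from the payload examples in the format tables above.

```sql
-- Illustrative only: the aliases are hypothetical names.
SELECT
  CAST(23 AS TINYINT)           AS tiny_value,   -- 1-byte signed integer
  CAST(23 AS SMALLINT)          AS small_value,  -- 2-byte signed integer
  CAST(23 AS INT)               AS int_value,    -- 4-byte signed integer
  CAST(23 AS BIGINT)            AS big_value,    -- 8-byte signed integer
  CAST(12.123 AS DECIMAL(5, 3)) AS exact_value,  -- precision 5, scale 3
  CAST(1.1111112E7 AS DOUBLE)   AS approx_value; -- 8-byte floating point
```

Because DECIMAL(5, 3) reserves three of its five digits for the fractional part, only two digits remain to the left of the decimal point, so a value such as 123.456 does not fit this declaration.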
##### FLOAT

Represents a 4-byte single precision floating point number.

Declaration: `FLOAT`

Bridging to JVM types

| Java type | Input | Output | Notes |
| --- | --- | --- | --- |
| java.lang.Float | ✓ | ✓ | Default |
| float | ✓ | (✓) | Output only if type is not nullable |

Formats

The following table shows examples of the FLOAT type in different formats.

| Format | Example |
| --- | --- |
| JSON for data type | `{"type":"FLOAT","nullable":true}` |
| CLI/UI format | `FLOAT` |
| JSON for payload | `"1.1111112E7"` |
| CLI/UI format for payload | `1.1111112E7` |

Compared to the SQL standard, this type doesn't take parameters.

#### Date and time

##### DATE

Represents a date consisting of year-month-day with values ranging from 0000-01-01 to 9999-12-31.

Declaration: `DATE`

Bridging to JVM types

| Java type | Input | Output | Notes |
| --- | --- | --- | --- |
| java.time.LocalDate | ✓ | ✓ | Default |
| java.sql.Date | ✓ | ✓ | |
| java.lang.Integer | ✓ | ✓ | Describes the number of days since Unix epoch |
| int | ✓ | (✓) | Describes the number of days since Unix epoch. Output only if type is not nullable. |

Formats

The following table shows examples of the DATE type in different formats.

| Format | Example |
| --- | --- |
| JSON for data type | `{"type":"DATE","nullable":true}` |
| CLI/UI format | `DATE` |
| JSON for payload | `"2023-04-06"` |
| CLI/UI format for payload | `2023-04-06` |

Compared to the SQL standard, the range starts at year 0000.

##### INTERVAL DAY TO SECOND

Data type for a group of day-time interval types.

Declaration:

- `INTERVAL DAY`
- `INTERVAL DAY(p1)`
- `INTERVAL DAY(p1) TO HOUR`
- `INTERVAL DAY(p1) TO MINUTE`
- `INTERVAL DAY(p1) TO SECOND(p2)`
- `INTERVAL HOUR`
- `INTERVAL HOUR TO MINUTE`
- `INTERVAL HOUR TO SECOND(p2)`
- `INTERVAL MINUTE`
- `INTERVAL MINUTE TO SECOND(p2)`
- `INTERVAL SECOND`
- `INTERVAL SECOND(p2)`

Bridging to JVM types

| Java type | Input | Output | Notes |
| --- | --- | --- | --- |
| java.time.Duration | ✓ | ✓ | Default |
| java.lang.Long | ✓ | ✓ | Describes the number of milliseconds |
| long | ✓ | (✓) | Describes the number of milliseconds. Output only if type is not nullable. |

Formats

The following table shows examples of the INTERVAL DAY TO SECOND type in different formats.

| Format | Example |
| --- | --- |
| JSON for data type | `{"type":"INTERVAL_DAY_TIME","nullable":true,"precision":1,"fractionalPrecision":3,"resolution":"DAY_TO_SECOND"}` |
| CLI/UI format | `INTERVAL DAY(1) TO SECOND(3)` |
| JSON for payload | `"+2 07:33:20.000"` |
| CLI/UI format for payload | `+2 07:33:20.000` |

Declare this type by using the above combinations, where p1 is the number of digits of days (day precision) and p2 is the number of digits of fractional seconds (fractional precision).

p1 must have a value between 1 and 6 (both inclusive). If no p1 is specified, it is equal to 2 by default. p2 must have a value between 0 and 9 (both inclusive). If no p2 is specified, it is equal to 6 by default.

The type must be parameterized to one of these resolutions with up to nanosecond precision:

- Interval of days
- Interval of days to hours
- Interval of days to minutes
- Interval of days to seconds
- Interval of hours
- Interval of hours to minutes
- Interval of hours to seconds
- Interval of minutes
- Interval of minutes to seconds
- Interval of seconds

An interval of day-time consists of `+days hours:minutes:seconds.fractional` with values ranging from -999999 23:59:59.999999999 to +999999 23:59:59.999999999. The value representation is the same for all types of resolutions. For example, an interval of seconds of 70 is always represented in an interval-of-days-to-seconds format (with default precisions): +00 00:01:10.000000.

Formatting intervals is tricky, because they have different resolutions. Depending on the resolution, use the corresponding format:

| Resolution | Format |
| --- | --- |
| DAY | `INTERVAL DAY(1)` |
| DAY_TO_HOUR | `INTERVAL DAY(1) TO HOUR` |
| DAY_TO_MINUTE | `INTERVAL DAY(1) TO MINUTE` |
| DAY_TO_SECOND | `INTERVAL DAY(1) TO SECOND(3)` |
| HOUR | `INTERVAL HOUR` |
| HOUR_TO_MINUTE | `INTERVAL HOUR TO MINUTE` |
| HOUR_TO_SECOND | `INTERVAL HOUR TO SECOND(3)` |
| MINUTE | `INTERVAL MINUTE` |
| MINUTE_TO_SECOND | `INTERVAL MINUTE TO SECOND(3)` |
| SECOND | `INTERVAL SECOND(3)` |

##### INTERVAL YEAR TO MONTH

Data type for a group of year-month interval types.
Declaration: `INTERVAL YEAR`, `INTERVAL YEAR(p)`, `INTERVAL YEAR(p) TO MONTH`, `INTERVAL MONTH`

Bridging to JVM types

| Java type | Input | Output | Notes |
| --- | --- | --- | --- |
| java.time.Period | ✓ | ✓ | Default. Ignores the days part. |
| java.lang.Integer | ✓ | ✓ | Describes the number of months. |
| int | ✓ | (✓) | Describes the number of months. Output only if type is not nullable. |

Formats

The following table shows examples of the INTERVAL YEAR TO MONTH type in different formats.

| Format | Example |
| --- | --- |
| JSON for data type | `{"type":"INTERVAL_YEAR_MONTH","nullable":true,"precision":4,"resolution":"YEAR_TO_MONTH"}` |
| CLI/UI format | `INTERVAL YEAR(4) TO MONTH` |
| JSON for payload | `"+2000-02"` |
| CLI/UI format for payload | `+2000-02` |

Declare this type by using the above combinations, where p is the number of digits of years (year precision). p must have a value between 1 and 4 (both inclusive). If no year precision is specified, p is equal to 2.

The type must be parameterized to one of these resolutions:

- Interval of years
- Interval of years to months
- Interval of months

An interval of year-month consists of `+years-months` with values ranging from -9999-11 to +9999-11. The value representation is the same for all types of resolutions. For example, an interval of months of 50 is always represented in an interval-of-years-to-months format (with default year precision): +04-02.

Formatting intervals is tricky, because they have different resolutions. Depending on the resolution, use the corresponding format:

| Resolution | Format |
| --- | --- |
| YEAR | `INTERVAL YEAR(4)` |
| YEAR_TO_MONTH | `INTERVAL YEAR(4) TO MONTH` |
| MONTH | `INTERVAL MONTH` |

##### TIME

Represents a time without timezone consisting of `hour:minute:second[.fractional]` with up to nanosecond precision and values ranging from 00:00:00.000000000 to 23:59:59.999999999.

Declaration: `TIME`, `TIME(p)`, `TIME WITHOUT TIME ZONE`, `TIME(p) WITHOUT TIME ZONE`

Bridging to JVM types

| Java type | Input | Output | Notes |
| --- | --- | --- | --- |
| java.time.LocalTime | ✓ | ✓ | Default |
| java.sql.Time | ✓ | ✓ | |
| java.lang.Integer | ✓ | ✓ | Describes the number of milliseconds of the day. |
| int | ✓ | (✓) | Describes the number of milliseconds of the day. Output only if type is not nullable. |
| java.lang.Long | ✓ | ✓ | Describes the number of nanoseconds of the day. |
| long | ✓ | (✓) | Describes the number of nanoseconds of the day. Output only if type is not nullable. |

Formats

The following table shows examples of the TIME type in different formats.

| Format | Example |
| --- | --- |
| JSON for data type | `{"type":"TIME_WITHOUT_TIME_ZONE","nullable":true,"precision":3}` |
| CLI/UI format | `TIME(3)` |
| JSON for payload | `"10:56:22.541"` |
| CLI/UI format for payload | `10:56:22.541` |

Declare this type by using TIME(p), where p is the number of digits of fractional seconds (precision). p must have a value between 0 and 9 (both inclusive). If no precision is specified, p is equal to 0.

Compared to the SQL standard, leap seconds (23:59:60 and 23:59:61) are not supported, as the semantics are closer to java.time.LocalTime. A time with timezone is not provided.

TIME acts like a pure string and isn't related to a time zone of any kind, including UTC.

TIME WITHOUT TIME ZONE is a synonym for this type.

##### TIMESTAMP

Represents a timestamp without timezone consisting of `year-month-day hour:minute:second[.fractional]` with up to nanosecond precision and values ranging from 0000-01-01 00:00:00.000000000 to 9999-12-31 23:59:59.999999999.

Declaration: `TIMESTAMP`, `TIMESTAMP(p)`, `TIMESTAMP WITHOUT TIME ZONE`, `TIMESTAMP(p) WITHOUT TIME ZONE`

Bridging to JVM types

| Java type | Input | Output | Notes |
| --- | --- | --- | --- |
| java.time.LocalDateTime | ✓ | ✓ | Default |
| java.sql.Timestamp | ✓ | ✓ | |
| org.apache.flink.table.data.TimestampData | ✓ | ✓ | Internal data structure |

Formats

The following table shows examples of the TIMESTAMP type in different formats.

| Format | Example |
| --- | --- |
| JSON for data type | `{"type":"TIMESTAMP_WITHOUT_TIME_ZONE","nullable":true,"precision":3}` |
| CLI/UI format | `TIMESTAMP(3)` |
| JSON for payload | `"2023-04-06 10:59:32.628"` |
| CLI/UI format for payload | `2023-04-06 10:59:32.628` |

Declare this type by using TIMESTAMP(p), where p is the number of digits of fractional seconds (precision). p must have a value between 0 and 9 (both inclusive). If no precision is specified, p is equal to 6.
A space separates the date and time parts.

Compared to the SQL standard, leap seconds (23:59:60 and 23:59:61) are not supported, as the semantics are closer to java.time.LocalDateTime.

A conversion from and to BIGINT (a JVM long type) is not supported, as this would imply a timezone, but this type is time-zone free. For more java.time.Instant-like semantics, use TIMESTAMP_LTZ.

TIMESTAMP acts like a pure string and isn't related to a time zone of any kind, including UTC.

TIMESTAMP WITHOUT TIME ZONE is a synonym for this type.

##### TIMESTAMP_LTZ

Represents a timestamp with the local timezone consisting of `year-month-day hour:minute:second[.fractional] zone` with up to nanosecond precision and values ranging from 0000-01-01 00:00:00.000000000 +14:59 to 9999-12-31 23:59:59.999999999 -14:59.

Declaration: `TIMESTAMP_LTZ`, `TIMESTAMP_LTZ(p)`, `TIMESTAMP WITH LOCAL TIME ZONE`, `TIMESTAMP(p) WITH LOCAL TIME ZONE`

Bridging to JVM types

| Java type | Input | Output | Notes |
| --- | --- | --- | --- |
| java.time.Instant | ✓ | ✓ | Default |
| java.lang.Integer | ✓ | ✓ | Describes the number of seconds since Unix epoch. |
| int | ✓ | (✓) | Describes the number of seconds since Unix epoch. Output only if type is not nullable. |
| java.lang.Long | ✓ | ✓ | Describes the number of milliseconds since Unix epoch. |
| long | ✓ | (✓) | Describes the number of milliseconds since Unix epoch. Output only if type is not nullable. |
| java.sql.Timestamp | ✓ | ✓ | Describes the number of milliseconds since Unix epoch. |
| org.apache.flink.table.data.TimestampData | ✓ | ✓ | Internal data structure |

Formats

The following table shows examples of the TIMESTAMP_LTZ type in different formats.

| Format | Example |
| --- | --- |
| JSON for data type | `{"type":"TIMESTAMP_WITH_LOCAL_TIME_ZONE","nullable":true,"precision":3}` |
| CLI/UI format | `TIMESTAMP(3) WITH LOCAL TIME ZONE` |
| JSON for payload | `"2023-04-06 11:06:47.224"` |
| CLI/UI format for payload | `2023-04-06 11:06:47.224` |

Declare this type by using TIMESTAMP_LTZ(p), where p is the number of digits of fractional seconds (precision). p must have a value between 0 and 9 (both inclusive). If no precision is specified, p is equal to 6.

Leap seconds (23:59:60 and 23:59:61) are not supported, as the semantics are closer to java.time.OffsetDateTime.

Compared to TIMESTAMP WITH TIME ZONE, the timezone offset information is not stored physically in every datum. Instead, the type assumes java.time.Instant semantics in the UTC timezone at the edges of the table ecosystem. Every datum is interpreted in the local timezone configured in the current session for computation and visualization.

This type fills the gap between time-zone free and time-zone mandatory timestamp types by allowing the interpretation of UTC timestamps according to the configured session timezone.

TIMESTAMP_LTZ resembles a TIMESTAMP without a timezone, but its string representation always takes the session's or query's timezone into account. Internally, it is always in the UTC time zone.

If you require the short format, prefer TIMESTAMP_LTZ(3).

TIMESTAMP WITH LOCAL TIME ZONE is a synonym for this type.

##### TIMESTAMP and TIMESTAMP_LTZ comparison

Although TIMESTAMP and TIMESTAMP_LTZ are similarly named, they represent different concepts.

TIMESTAMP_LTZ:

- TIMESTAMP_LTZ in SQL is similar to the Instant class in Java.
- TIMESTAMP_LTZ represents a moment, or a specific point in the UTC timeline.
- TIMESTAMP_LTZ stores time as a UTC integer, which can be converted dynamically to every other timezone.
- When printing or casting TIMESTAMP_LTZ as a character string, the sql.local-time-zone setting is considered.

TIMESTAMP:

- TIMESTAMP in SQL is similar to LocalDateTime in Java.
- TIMESTAMP has no time zone or offset from UTC, so it can't represent a moment.
- TIMESTAMP stores time as a character string, not related to any timezone.

##### TIMESTAMP WITH TIME ZONE

Represents a timestamp with time zone consisting of `year-month-day hour:minute:second[.fractional] zone` with up to nanosecond precision and values ranging from 0000-01-01 00:00:00.000000000 +14:59 to 9999-12-31 23:59:59.999999999 -14:59.
Declaration: `TIMESTAMP WITH TIME ZONE`, `TIMESTAMP(p) WITH TIME ZONE`

Bridging to JVM types

| Java type | Input | Output | Notes |
| --- | --- | --- | --- |
| java.time.OffsetDateTime | ✓ | ✓ | Default |
| java.time.ZonedDateTime | ✓ | | Ignores the zone ID |

Compared to TIMESTAMP_LTZ, the time zone offset information is stored physically in every datum. It is used individually for every computation, visualization, or communication to external systems.

#### Collection data types

##### ARRAY

Represents an array of elements with the same subtype.

Declaration: `ARRAY<t>`, `t ARRAY`

Bridging to JVM types

| Java type | Input | Output | Notes |
| --- | --- | --- | --- |
| t[] | ✓ | ✓ | Default. Depends on the subtype. |
| java.util.List | ✓ | ✓ | |
| subclass of java.util.List | ✓ | | |
| org.apache.flink.table.data.ArrayData | ✓ | ✓ | Internal data structure |

Formats

The following table shows examples of the ARRAY type in different formats.

| Format | Example |
| --- | --- |
| JSON for data type | `{"type":"ARRAY","nullable":true,"elementType":{"type":"INTEGER","nullable":true}}` |
| CLI/UI format | `ARRAY<INT>` |
| JSON for payload | `["1", "2", "3", null]` |
| CLI/UI format for payload | `[1, 2, 3, NULL]` |

Declare this type by using `ARRAY<t>`, where t is the data type of the contained elements.

Compared to the SQL standard, the maximum cardinality of an array cannot be specified and is fixed at 2,147,483,647. Also, any valid type is supported as a subtype.

`t ARRAY` is a synonym for being closer to the SQL standard. For example, `INT ARRAY` is equivalent to `ARRAY<INT>`.

##### MAP

Represents an associative array that maps keys (including NULL) to values (including NULL).

Declaration: `MAP<kt, vt>`

Bridging to JVM types

| Java type | Input | Output | Notes |
| --- | --- | --- | --- |
| java.util.Map | ✓ | ✓ | Default |
| subclass of java.util.Map | ✓ | | |
| org.apache.flink.table.data.MapData | ✓ | ✓ | Internal data structure |

Formats

The following table shows examples of the MAP type in different formats.

| Format | Example |
| --- | --- |
| JSON for data type | `{"type":"MAP","nullable":true,"keyType":{"type":"INTEGER","nullable":true},"valueType":{"type":"VARCHAR","nullable":true,"length":2147483647}}` |
| CLI/UI format | `MAP<INT, STRING>` |
| JSON for payload | `[["1", "a"], ["2", "b"], [null, "c"]]` |
| CLI/UI format for payload | `{1=a, 2=b, NULL=c}` |

Declare this type by using `MAP<kt, vt>`, where kt is the data type of the key elements and vt is the data type of the value elements.

A map can't contain duplicate keys. Each key can map to at most one value.

There is no restriction on element types. It is the responsibility of the user to ensure uniqueness.

The map type is an extension to the SQL standard.

##### MULTISET

Represents a multiset (a bag).

Declaration: `MULTISET<t>`, `t MULTISET`

Bridging to JVM types

| Java type | Input | Output | Notes |
| --- | --- | --- | --- |
| java.util.Map | ✓ | ✓ | Default. Assigns each value to an integer multiplicity. |
| subclass of java.util.Map | ✓ | | |
| org.apache.flink.table.data.MapData | ✓ | ✓ | Internal data structure |

Formats

The following table shows examples of the MULTISET type in different formats.

| Format | Example |
| --- | --- |
| JSON for data type | `{"type":"MULTISET","nullable":true,"elementType":{"type":"INTEGER","nullable":true}}` |
| CLI/UI format | `MULTISET<INT>` |
| JSON for payload | `[["a", "1"], ["b", "2"], [null, "1"]]` |
| CLI/UI format for payload | `{a=1, b=2, NULL=1}` |

Declare this type by using `MULTISET<t>`, where t is the data type of the contained elements.

Unlike a set, the multiset allows for multiple instances for each of its elements with a common subtype. Each unique value (including NULL) is mapped to some multiplicity.

There is no restriction on element types; it is the responsibility of the user to ensure uniqueness.

`t MULTISET` is a synonym for being closer to the SQL standard. For example, `INT MULTISET` is equivalent to `MULTISET<INT>`.

##### ROW

Represents a sequence of fields.

Declaration:

- `ROW<n0 t0, n1 t1, ...>`
- `ROW<n0 t0 'd0', n1 t1 'd1', ...>`
- `ROW(name0 type0, name1 type1, ...)`
- `ROW(name0 type0 'description0', name1 type1 'description1', ...)`
Bridging to JVM types

| Java type | Input | Output | Notes |
| --- | --- | --- | --- |
| org.apache.flink.types.Row | ✓ | ✓ | Default |
| org.apache.flink.table.data.RowData | ✓ | ✓ | Internal data structure |

Formats

The following table shows examples of the ROW type in different formats.

| Format | Example |
| --- | --- |
| JSON for data type | `{"type":"ROW","nullable":true,"fields":[{"name":"a","fieldType":{"type":"INTEGER","nullable":true}},{"name":"b","fieldType":{"type":"VARCHAR","nullable":true,"length":2147483647}}]}` |
| CLI/UI format | `ROW<a INT, b STRING>` |
| JSON for payload | `[["a", "1"], ["b", "2"], [null, "1"]]` |
| CLI/UI format for payload | `{a=1, b=2, NULL=1}` |

Declare this type by using `ROW<n0 t0 'd0', n1 t1 'd1', ...>`, where n is the unique name of a field, t is the logical type of a field, and d is the description of a field.

A field consists of a field name, a field type, and an optional description. The most specific type of a row of a table is a row type. In this case, each column of the row corresponds to the field of the row type that has the same ordinal position as the column.

To create a table with a row type, use the following syntax:

    CREATE TABLE table_with_row_types (
      `Customer` ROW<name STRING, age INT>,
      `Order` ROW<id BIGINT, title STRING>
    );

To insert a row into a table with a row type, use the following syntax:

    INSERT INTO table_with_row_types VALUES
      (('Alice', 30), (101, 'Book')),
      (('Bob', 25), (102, 'Laptop')),
      (('Charlie', 35), (103, 'Phone')),
      (('Diana', 28), (104, 'Tablet')),
      (('Eve', 22), (105, 'Headphones'));

To work with fields from a row, use dot notation:

    SELECT `Customer`.name, `Customer`.age, `Order`.id, `Order`.title
    FROM table_with_row_types
    WHERE `Customer`.age > 30;

Compared to the SQL standard, an optional field description simplifies the handling of complex structures.

A row type is similar to the STRUCT type known from other non-standard-compliant frameworks.

`ROW(...)` is a synonym for being closer to the SQL standard. For example, `ROW(fieldOne INT, fieldTwo BOOLEAN)` is equivalent to `ROW<fieldOne INT, fieldTwo BOOLEAN>`.

If the fields of the data type contain characters other than [A-Za-z_], use escaping notation. Double backticks escape the backtick character, for example:

    ROW<`a-b` INT, b STRING, `weird_col``_umn` STRING>

Row fields can contain descriptions, for example:

    {"type":"ROW","nullable":true,"fields":[{"name":"a","fieldType":{"type":"INTEGER","nullable":true},"description":"hello"}]}

Format descriptions using single quotes. Double single quotes escape single quotes, for example:

    ROW<a INT 'This field''s content'>

#### Other data types

##### BOOLEAN

Represents a boolean with a (possibly) three-valued logic of TRUE, FALSE, and UNKNOWN.

Declaration: `BOOLEAN`

Bridging to JVM types

| Java type | Input | Output | Notes |
| --- | --- | --- | --- |
| java.lang.Boolean | ✓ | ✓ | Default |
| boolean | ✓ | (✓) | Output only if type is not nullable. |

Formats

The following table shows examples of the BOOLEAN type in different formats.

| Format | Example |
| --- | --- |
| JSON for data type | `{"type":"BOOLEAN","nullable":true}` |
| CLI/UI format | `BOOLEAN` |
| JSON for payload | `null` |
| CLI/UI format for payload | `NULL` |

##### NULL

Data type for representing untyped NULL values.

Declaration: `NULL`

Bridging to JVM types

| Java type | Input | Output | Notes |
| --- | --- | --- | --- |
| java.lang.Object | ✓ | ✓ | Default |
| any class | | (✓) | Any non-primitive type. |

Formats

The following table shows examples of the NULL type in different formats.

| Format | Example |
| --- | --- |
| JSON for data type | `{"type":"NULL"}` |
| CLI/UI format | `NULL` |
| JSON for payload | `null` |
| CLI/UI format for payload | `NULL` |

The NULL type is an extension to the SQL standard. A NULL type has no other value except NULL, so it can be cast to any nullable type, similar to JVM semantics.

This type helps in representing unknown types in API calls that use a NULL literal, as well as bridging to formats such as JSON or Avro that define such a type.

This type is not very useful in practice and is described here only for completeness.

#### Casting

Flink SQL can perform casting between a defined input type and target type. While some casting operations can always succeed regardless of the input value, others can fail at runtime when there's no way to create a value for the target type. For example, it's always possible to convert INT to STRING, but you can't always convert a STRING to INT.
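A minimal sketch of the two directions mentioned above, using inline VALUES tables so that no existing table is assumed. The table alias `t` and the column names are hypothetical.

```sql
-- INT to STRING: valid for every input value.
SELECT CAST(id AS STRING) AS id_text
FROM (VALUES (1), (2), (3)) AS t(id);

-- STRING to INT: valid only when the text is a well-formed integer.
-- Both rows here qualify; text such as 'abc' would not.
SELECT CAST(raw_text AS INT) AS parsed
FROM (VALUES ('42'), ('17')) AS t(raw_text);
```

The second query is accepted because STRING to INT is a valid type pair; whether it succeeds depends on the values that arrive at runtime.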
During the planning stage, the query validator rejects queries for invalid type pairs with a ValidationException, for example, when trying to cast a TIMESTAMP to an INTERVAL. Valid type pairs that can fail at runtime are accepted by the query validator, but this requires you to handle cast failures correctly.

In Flink SQL, casting can be performed by using one of these two built-in functions:

- CAST: The regular cast function defined by the SQL standard. It can fail the job if the cast operation is fallible and the provided input is not valid. Type inference preserves the nullability of the input type.
- TRY_CAST: An extension to the regular cast function that returns NULL if the cast operation fails. Its return type is always nullable.

For example:

    -- returns 42 of type INT NOT NULL
    SELECT CAST('42' AS INT);

    -- returns NULL of type VARCHAR
    SELECT CAST(NULL AS VARCHAR);

    -- throws an exception and fails the job
    SELECT CAST('non-number' AS INT);

    -- returns 42 of type INT
    SELECT TRY_CAST('42' AS INT);

    -- returns NULL of type VARCHAR
    SELECT TRY_CAST(NULL AS VARCHAR);

    -- returns NULL of type INT
    SELECT TRY_CAST('non-number' AS INT);

    -- returns 0 of type INT NOT NULL
    SELECT COALESCE(TRY_CAST('non-number' AS INT), 0);

The following matrix shows the supported cast pairs, where "Y" means supported, "!" means fallible, and "N" means unsupported:

| Input / Target | CHAR¹ / VARCHAR¹ / STRING | BINARY¹ / VARBINARY¹ / BYTES | BOOLEAN | DECIMAL | TINYINT | SMALLINT | INTEGER | BIGINT | FLOAT | DOUBLE | DATE | TIME | TIMESTAMP | TIMESTAMP_LTZ | INTERVAL | ARRAY | MULTISET | MAP | ROW |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CHAR / VARCHAR / STRING | Y | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | ! | N | N | N | N | N |
| BINARY / VARBINARY / BYTES | Y | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N |
| BOOLEAN | Y | N | Y | Y | Y | Y | Y | Y | Y | Y | N | N | N | N | N | N | N | N | N |
| DECIMAL | Y | N | N | Y | Y | Y | Y | Y | Y | Y | N | N | N | N | N | N | N | N | N |
| TINYINT | Y | N | Y | Y | Y | Y | Y | Y | Y | Y | N | N | N² | N² | N | N | N | N | N |
| SMALLINT | Y | N | Y | Y | Y | Y | Y | Y | Y | Y | N | N | N² | N² | N | N | N | N | N |
| INTEGER | Y | N | Y | Y | Y | Y | Y | Y | Y | Y | N | N | N² | N² | Y⁵ | N | N | N | N |
| BIGINT | Y | N | Y | Y | Y | Y | Y | Y | Y | Y | N | N | N² | N² | Y⁶ | N | N | N | N |
| FLOAT | Y | N | N | Y | Y | Y | Y | Y | Y | Y | N | N | N | N | N | N | N | N | N |
| DOUBLE | Y | N | N | Y | Y | Y | Y | Y | Y | Y | N | N | N | N | N | N | N | N | N |
| DATE | Y | N | N | N | N | N | N | N | N | N | Y | N | Y | Y | N | N | N | N | N |
| TIME | Y | N | N | N | N | N | N | N | N | N | N | Y | Y | Y | N | N | N | N | N |
| TIMESTAMP | Y | N | N | N | N | N | N | N | N | N | Y | Y | Y | Y | N | N | N | N | N |
| TIMESTAMP_LTZ | Y | N | N | N | N | N | N | N | N | N | Y | Y | Y | Y | N | N | N | N | N |
| INTERVAL | Y | N | N | N | N | N | Y⁵ | Y⁶ | N | N | N | N | N | N | Y | N | N | N | N |
| ARRAY | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N | !³ | N | N | N |
| MULTISET | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | !³ | N | N |
| MAP | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | !³ | N |
| ROW | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | !³ |

Notes:

1. All the casting to constant length or variable length also trims and pads, according to the type definition.
2. TO_TIMESTAMP and TO_TIMESTAMP_LTZ must be used instead of CAST / TRY_CAST.
3. Supported iff the children type pairs are supported. Fallible iff the children type pairs are fallible.
4. Supported iff the RAW class and serializer are equal.
5. Supported iff INTERVAL is a MONTH TO YEAR range.
6. Supported iff INTERVAL is a DAY TO TIME range.

Note: A cast of a NULL value always returns NULL, regardless of whether the function used is CAST or TRY_CAST.

#### Data type extraction

In many locations in the API, Flink tries to extract data types automatically from class information by using reflection to avoid repetitive manual schema work. But extracting a data type using reflection is not always successful, because logical information might be missing. In these cases, it may be necessary to add additional information close to a class or field declaration to support the extraction logic.

The following table lists classes that map implicitly to a data type without requiring further information.
Other JVM bridging classes require the @DataTypeHint annotation.

| Class | Data type |
| --- | --- |
| boolean | BOOLEAN NOT NULL |
| byte | TINYINT NOT NULL |
| byte[] | BYTES |
| double | DOUBLE NOT NULL |
| float | FLOAT NOT NULL |
| int | INT NOT NULL |
| java.lang.Boolean | BOOLEAN |
| java.lang.Byte | TINYINT |
| java.lang.Double | DOUBLE |
| java.lang.Float | FLOAT |
| java.lang.Integer | INT |
| java.lang.Long | BIGINT |
| java.lang.Short | SMALLINT |
| java.lang.String | STRING |
| java.sql.Date | DATE |
| java.sql.Time | TIME(0) |
| java.sql.Timestamp | TIMESTAMP(9) |
| java.time.Duration | INTERVAL SECOND(9) |
| java.time.Instant | TIMESTAMP_LTZ(9) |
| java.time.LocalDate | DATE |
| java.time.LocalTime | TIME(9) |
| java.time.LocalDateTime | TIMESTAMP(9) |
| java.time.OffsetDateTime | TIMESTAMP(9) WITH TIME ZONE |
| java.time.Period | INTERVAL YEAR(4) TO MONTH |
| `java.util.Map<K, V>` | `MAP<K, V>` |
| short | SMALLINT NOT NULL |
| structured type T | anonymous structured type T |
| long | BIGINT NOT NULL |
| T[] | `ARRAY<T>` |

#### Related content

- DDL Statements
- Flink SQL Queries

Note: This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.
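To make the difference between TIMESTAMP and TIMESTAMP_LTZ described earlier more concrete, the following sketch selects one value of each type and also casts the TIMESTAMP_LTZ value to a string. It assumes the session time zone is controlled through the sql.local-time-zone setting mentioned above; the Europe/Berlin zone and the column aliases are arbitrary choices made for illustration.

```sql
-- Assumed session setting; any valid zone ID could be used instead.
SET 'sql.local-time-zone' = 'Europe/Berlin';

SELECT
  -- Wall-clock value with no time zone attached.
  TIMESTAMP '2023-04-06 10:59:32.628' AS wall_clock,
  -- A point on the UTC timeline, built from epoch milliseconds
  -- (1680778772628 ms corresponds to 2023-04-06 10:59:32.628 UTC).
  TO_TIMESTAMP_LTZ(1680778772628, 3) AS instant_value,
  -- String conversion takes the session time zone into account.
  CAST(TO_TIMESTAMP_LTZ(1680778772628, 3) AS STRING) AS instant_text;
```

Changing the session time zone is expected to change how instant_text is rendered but not wall_clock, because only TIMESTAMP_LTZ is interpreted relative to the session.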
#### Code Examples

```sql
INT
INT NOT NULL
INTERVAL DAY TO SECOND(3)
ROW<fieldOne ARRAY<BOOLEAN>, fieldTwo TIMESTAMP(3)>
```

```sql
CREATE TABLE table_with_row_types (
  `Customer` ROW<name STRING, age INT>,
  `Order` ROW<id BIGINT, title STRING>
);
```

```sql
INSERT INTO table_with_row_types VALUES
  (('Alice', 30), (101, 'Book')),
  (('Bob', 25), (102, 'Laptop')),
  (('Charlie', 35), (103, 'Phone')),
  (('Diana', 28), (104, 'Tablet')),
  (('Eve', 22), (105, 'Headphones'));
```

```sql
SELECT `Customer`.name, `Customer`.age, `Order`.id, `Order`.title
FROM table_with_row_types
WHERE `Customer`.age > 30;
```

```sql
ROW<`a-b` INT, b STRING, `weird_col``_umn` STRING>
```

```sql
-- returns 42 of type INT NOT NULL
SELECT CAST('42' AS INT);

-- returns NULL of type VARCHAR
SELECT CAST(NULL AS VARCHAR);

-- throws an exception and fails the job
SELECT CAST('non-number' AS INT);

-- returns 42 of type INT
SELECT TRY_CAST('42' AS INT);

-- returns NULL of type VARCHAR
SELECT TRY_CAST(NULL AS VARCHAR);

-- returns NULL of type INT
SELECT TRY_CAST('non-number' AS INT);

-- returns 0 of type INT NOT NULL
SELECT COALESCE(TRY_CAST('non-number' AS INT), 0);
```

---

### Example Data Streams in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/reference/example-data.html

Confluent Cloud for Apache Flink® provides an Examples catalog that
has mock data streams you can use for experimenting with Flink SQL queries. The examples catalog is available in all environments. All example tables have $rowtime available as a system column. The SOURCE_WATERMARK() strategy for example tables differs from the SOURCE_WATERMARK() strategy for Kafka-based tables. For the example tables, SOURCE_WATERMARK() corresponds to the maximum timestamp seen to this point. You can use example data in Flink workspaces, Flink shell, Terraform, and all other clients. Example data is read-only, so you can’t use INSERT INTO/ALTER/DROP/CREATE statements on these tables, the database, or the catalog. SHOW statements work for the database, catalog, and tables. SHOW CREATE TABLE works for the example tables. Publish to a Kafka topic¶ You can publish any of the example streams to a Kafka topic by creating a Flink table and populating it with the INSERT INTO FROM SELECT statement. Confluent Cloud for Apache Flink creates a Kafka topic automatically for the table. Run the following statements to create and populate a customers_source table with the examples.marketplace.customers stream. CREATE TABLE customers_source ( customer_id INT, name STRING, address STRING, postcode STRING, city STRING, email STRING, PRIMARY KEY (customer_id) NOT ENFORCED ); INSERT INTO customers_source( customer_id, name, address, postcode, city, email ) SELECT * FROM examples.marketplace.customers; Run the following statement to inspect the customers_source table: SELECT * FROM customers_source; Your output should resemble: customer_id name address postcode city email 3172 Roseanna Bode 6744 Kacy Bypass 22635 Margarettborough rico.zboncak@yahoo.com 3055 Josiah Morissette PhD 61799 Friesen Islands 14194 North Abbybury thomas.dach@gmail.com 3177 Buddy Hill 6836 Graham Street 72767 South Earnest enoch.turcotte@hotmail.com ... Navigate to the Environments page, and in the navigation menu, click Data portal. 
In the Data portal page, click the dropdown menu and select the environment for your workspace. In the Recently created section, find your customers_source topic and click it to open the details pane. Click View all messages to open the Message viewer on the customers_source topic. Observe the example data from the examples.marketplace.customers flowing into the Kafka topic. Important The INSERT INTO statement runs continuously until you stop it manually. Free resources in your compute pool by deleting the long-running statement when you’re done. Marketplace database¶ The marketplace database provides streams that simulate commerce-related data. The marketplace database has these tables: clicks: simulates a stream of user clicks on a web page. customers: simulates a stream of customers who order products. orders: simulates a stream of orders. products: simulates a stream of products that a customer has ordered. clicks table¶ To access the clicks example stream, use the fully qualified string, examples.marketplace.clicks in your queries. The clicks table has the following schema: CREATE TABLE clicks ( click_id STRING, -- UUID user_id INT, -- range between 3000 and 5000 url STRING, -- regex https://www[.]acme[.]com/product/[a-z]{5} user_agent STRING, -- set by the datafaker Internet class view_time INT -- range between 10 and 120 ); The user_agent field is assigned by the datafaker Internet class. 
Run the following statement to inspect the clicks data stream: SELECT * FROM examples.marketplace.clicks; Your output should resemble: click_id user_id url user_agent view_time 23add2ce-da47-47c1-925a-f7c1def06f0c 3278 https://www.acme.com/product/mqwpg Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; AS; rv:11.0) like … 11 b81dc020-5ad2-493f-8175-d3e50e40f411 4919 https://www.acme.com/product/vycnj Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko)… 58 b62ae975-0f5d-4e87-9cbe-45b7661ad327 3461 https://www.acme.com/product/pghkm Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML… 105 ... customers table¶ To access the customers example stream, use the fully qualified string, examples.marketplace.customers, in your queries. The customers table has the following schema: CREATE TABLE customers ( customer_id INT, -- range between 3000 and 3250 name STRING, -- set by the datafaker Name class address STRING, -- set by the datafaker Address class postcode STRING, -- set by the datafaker Address class city STRING, -- set by the datafaker Address class email STRING, -- set by the datafaker Internet class PRIMARY KEY (customer_id) NOT ENFORCED ); The name field is assigned by the datafaker Name class. The address fields are assigned by the datafaker Address class. The email field is assigned by the datafaker Internet class. Run the following statement to inspect the customers data stream: SELECT * FROM examples.marketplace.customers; Your output should resemble: customer_id name address postcode city email 3023 Ellsworth Price 0644 Mara Drive 29407 Emilyhaven sheldon.sipes@gmail.com 3003 Jayme Buckridge 320 Schumm Green 38752 Schowalterchester johnsie.hane@yahoo.com 3010 Les Beier 7032 Gerda Road 66841 Deckowside minnie.becker@hotmail.com ... orders table¶ To access the orders example stream, use the fully qualified string, examples.marketplace.orders, in your queries. 
The customer_id and product_id are suitable for joins with the customers and products streams. CREATE TABLE orders ( order_id STRING, -- UUID customer_id INT, -- range between 3000 and 3250 product_id INT, -- range between 1000 and 1500 price DOUBLE -- range between 0.00 and 100.00 ); Run the following statement to inspect the orders data stream: SELECT * FROM examples.marketplace.orders; Your output should resemble: order_id customer_id product_id price 36d77b21-e68f-4123-b87a-cc19ac1f36ac 3137 1305 65.71 7fd3cd2a-392b-4f8f-b953-0bfa1d331354 3063 1327 17.75 1a223c61-38a5-4b8c-8465-2a6b359bf05e 3064 1166 14.95 ... Run the following statement to join the orders data stream with the customers and products streams. The query shows the name of the customer, and the product name, and the price of the order. SELECT examples.marketplace.customers.name AS customer_name, examples.marketplace.products.name AS product_name, examples.marketplace.orders.price FROM examples.marketplace.products JOIN examples.marketplace.orders ON examples.marketplace.products.product_id = examples.marketplace.orders.product_id JOIN examples.marketplace.customers ON examples.marketplace.customers.customer_id = examples.marketplace.orders.customer_id; Your output should resemble: customer_name product_name price Mr. Lexie Collins Fantastic Rubber Car 32.76 Lyle Spencer Synergistic Leather Clock 21.28 Mrs. Candida Howe Lightweight Silk Hat 35.38 Colette Ebert Sleek Steel Keyboard 92.22 products table¶ To access the products example stream, use the fully qualified string, examples.marketplace.products in your queries. 
CREATE TABLE products ( product_id INT, -- range between 1000 and 1500 name STRING, -- set by the datafaker Commerce class brand STRING, -- set by the datafaker Commerce class vendor STRING, -- set by the datafaker Commerce class department STRING, -- set by the datafaker Commerce class PRIMARY KEY (product_id) NOT ENFORCED ); The product fields are assigned by the datafaker Commerce class. Run the following statement to inspect the products data stream: SELECT * FROM examples.marketplace.products; Your output should resemble: product_id name brand vendor department 1440 Enormous Aluminum Keyboard LG Dollar General Garden & Movies 1404 Practical Plastic Computer Adidas Target Outdoors 1132 Gorgeous Paper Watch Samsung Amazon Home, Kids & Movies ... Related content¶ DDL Statements Flink SQL Queries Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql SOURCE_WATERMARK() ``` ```sql SOURCE_WATERMARK() ``` ```sql SOURCE_WATERMARK() ``` ```sql customers_source ``` ```sql examples.marketplace.customers ``` ```sql CREATE TABLE customers_source ( customer_id INT, name STRING, address STRING, postcode STRING, city STRING, email STRING, PRIMARY KEY (customer_id) NOT ENFORCED ); INSERT INTO customers_source( customer_id, name, address, postcode, city, email ) SELECT * FROM examples.marketplace.customers; ``` ```sql customers_source ``` ```sql SELECT * FROM customers_source; ``` ```sql customer_id name address postcode city email 3172 Roseanna Bode 6744 Kacy Bypass 22635 Margarettborough rico.zboncak@yahoo.com 3055 Josiah Morissette PhD 61799 Friesen Islands 14194 North Abbybury thomas.dach@gmail.com 3177 Buddy Hill 6836 Graham Street 72767 South Earnest enoch.turcotte@hotmail.com ... 
``` ```sql customers_source ``` ```sql examples.marketplace.customers ``` ```sql marketplace ``` ```sql marketplace ``` ```sql examples.marketplace.clicks ``` ```sql CREATE TABLE clicks ( click_id STRING, -- UUID user_id INT, -- range between 3000 and 5000 url STRING, -- regex https://www[.]acme[.]com/product/[a-z]{5} user_agent STRING, -- set by the datafaker Internet class view_time INT -- range between 10 and 120 ); ``` ```sql SELECT * FROM examples.marketplace.clicks; ``` ```sql click_id user_id url user_agent view_time 23add2ce-da47-47c1-925a-f7c1def06f0c 3278 https://www.acme.com/product/mqwpg Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; AS; rv:11.0) like … 11 b81dc020-5ad2-493f-8175-d3e50e40f411 4919 https://www.acme.com/product/vycnj Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko)… 58 b62ae975-0f5d-4e87-9cbe-45b7661ad327 3461 https://www.acme.com/product/pghkm Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML… 105 ... ``` ```sql examples.marketplace.customers ``` ```sql CREATE TABLE customers ( customer_id INT, -- range between 3000 and 3250 name STRING, -- set by the datafaker Name class address STRING, -- set by the datafaker Address class postcode STRING, -- set by the datafaker Address class city STRING, -- set by the datafaker Address class email STRING, -- set by the datafaker Internet class PRIMARY KEY (customer_id) NOT ENFORCED ); ``` ```sql SELECT * FROM examples.marketplace.customers; ``` ```sql customer_id name address postcode city email 3023 Ellsworth Price 0644 Mara Drive 29407 Emilyhaven sheldon.sipes@gmail.com 3003 Jayme Buckridge 320 Schumm Green 38752 Schowalterchester johnsie.hane@yahoo.com 3010 Les Beier 7032 Gerda Road 66841 Deckowside minnie.becker@hotmail.com ... 
``` ```sql examples.marketplace.orders ``` ```sql customer_id ``` ```sql CREATE TABLE orders ( order_id STRING, -- UUID customer_id INT, -- range between 3000 and 3250 product_id INT, -- range between 1000 and 1500 price DOUBLE -- range between 0.00 and 100.00 ); ``` ```sql SELECT * FROM examples.marketplace.orders; ``` ```sql order_id customer_id product_id price 36d77b21-e68f-4123-b87a-cc19ac1f36ac 3137 1305 65.71 7fd3cd2a-392b-4f8f-b953-0bfa1d331354 3063 1327 17.75 1a223c61-38a5-4b8c-8465-2a6b359bf05e 3064 1166 14.95 ... ``` ```sql SELECT examples.marketplace.customers.name AS customer_name, examples.marketplace.products.name AS product_name, examples.marketplace.orders.price FROM examples.marketplace.products JOIN examples.marketplace.orders ON examples.marketplace.products.product_id = examples.marketplace.orders.product_id JOIN examples.marketplace.customers ON examples.marketplace.customers.customer_id = examples.marketplace.orders.customer_id; ``` ```sql customer_name product_name price Mr. Lexie Collins Fantastic Rubber Car 32.76 Lyle Spencer Synergistic Leather Clock 21.28 Mrs. Candida Howe Lightweight Silk Hat 35.38 Colette Ebert Sleek Steel Keyboard 92.22 ``` ```sql examples.marketplace.products ``` ```sql CREATE TABLE products ( product_id INT, -- range between 1000 and 1500 name STRING, -- set by the datafaker Commerce class brand STRING, -- set by the datafaker Commerce class vendor STRING, -- set by the datafaker Commerce class department STRING, -- set by the datafaker Commerce class PRIMARY KEY (product_id) NOT ENFORCED ); ``` ```sql SELECT * FROM examples.marketplace.products; ``` ```sql product_id name brand vendor department 1440 Enormous Aluminum Keyboard LG Dollar General Garden & Movies 1404 Practical Plastic Computer Adidas Target Outdoors 1132 Gorgeous Paper Watch Samsung Amazon Home, Kids & Movies ... 
``` --- ### Confluent CLI commands with Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/flink-sql-cli.html Confluent CLI commands with Confluent Cloud for Apache Flink¶ Manage Flink SQL statements and compute pools in Confluent Cloud for Apache Flink® by using the confluent flink commands in the Confluent CLI. To see the available commands, use the --help option. confluent flink statement --help confluent flink compute-pool --help confluent flink region --help Use the Confluent CLI to manage these features: Statements Compute pools Regions For the complete CLI reference, see confluent flink statement. In addition to the CLI, you can manage Flink statements and compute pools by using these Confluent tools: Flink SQL REST API Cloud Console SQL shell Confluent Terraform Provider Manage statements¶ Using the Confluent CLI, you can perform these actions: Submit a statement List statements Describe a statement List exceptions from a statement Delete a statement Update a statement Managing Flink SQL statements may require the following inputs, depending on the command: export STATEMENT_NAME="" # example: "user-filter" export COMPUTE_POOL_ID="" # example: "lfcp-8m03rm" export CLUSTER_ID="" # example: "lkc-a1b2c3" export PRINCIPAL_ID="" # example: "sa-23kgz4" for a service account, or "u-aq1dr2" for a user account export SQL_CODE="" # example: "SELECT * FROM USERS;" For the complete CLI reference, see confluent flink statement. Submit a statement¶ The confluent flink statement create command submits a statement in your compute pool. Run the following command to submit a Flink SQL statement in the current compute pool with your user account. 
confluent flink statement create --sql "${SQL_CODE}" Your output should resemble: +---------------+------------------------------------------------------------+ | Creation Date | 2024-02-28 21:08:08.9749 +0000 | | | UTC | | Name | cli-2024-02-28-130806-78dd77b5-16a9-40ab-9786-db95b9895eaa | | Statement | Select 1; | | Compute Pool | lfcp-8m09g0 | | Status | PENDING | +---------------+------------------------------------------------------------+ For long-running statements, Confluent recommends submitting statements with a service account instead of your user account. The following command submits a Flink SQL statement for the specified principal in the specified compute pool and Flink database (Kafka cluster). confluent flink statement create ${STATEMENT_NAME} \ --service-account ${PRINCIPAL_ID} \ --sql "${SQL_CODE}" \ --compute-pool ${COMPUTE_POOL_ID} \ --database ${CLUSTER_ID} List statements¶ Run the confluent flink statement list command to list all of the non-deleted statements in your environment. confluent flink statement list Your output should resemble: Creation Date | Name | Statement | Compute Pool | Status | Status Detail --------------------------------+----------------------+--------------------------------+--------------+-----------+--------------------------------- 2023-07-08 21:04:06 +0000 UTC | 4b1d3494-f0f7-460d-9 | INSERT INTO copytopic | lfcp-r2j1x9 | RUNNING | | | SELECT symbol,price from | | | | | topic_datagen; | | | 2023-07-08 21:07:04 +0000 UTC | 6c43b973-b3c6-4be8-9 | INSERT INTO copytopic | lfcp-r2j1x9 | RUNNING | | | SELECT symbol,price from | | | | | topic_datagen; | | | ... To list only the statements that you’ve created, get the context for your current Confluent Cloud login session and provide the context with the context option. 
confluent context list Your output should resemble: Current | Name | Platform | Credential ----------+--------------------------------------------------------+-----------------+------------------------------------ * | login--https://confluent.cloud | confluent.cloud | username- For convenience, save the context in an environment variable: export MY_CONTEXT="login--https://confluent.cloud" Run the confluent flink statement list command with your context. confluent flink statement list ${MY_CONTEXT} Your output should resemble: Creation Date | Name | Statement | Compute Pool | Status | Status Detail ---------------------------------+------------------------------------------------------------+-----------+--------------+-----------+---------------- 2024-02-28 21:08:08.9749 +0000 | cli-2024-02-28-130806-78dd77b5-16a9-40ab-9786-db95b9895eaa | Select 1; | lfcp-8m09g0 | COMPLETED | UTC | | | | | ... To list only the statements in your compute pool, provide the compute pool ID with the --compute-pool option. confluent flink statement list --compute-pool ${COMPUTE_POOL_ID} Describe a statement¶ Run the confluent flink statement describe command to view the details of an existing statement. confluent flink statement describe ${STATEMENT_NAME} Your output should resemble: Creation Date | Name | Statement | Compute Pool | Status | Status Detail --------------------------------+--------------------+------------+--------------+-----------+---------------- 2023-07-19 19:26:52 +0000 UTC | fdc6cbf5-038a-408c | show jobs; | lfcp-a1b2c3 | COMPLETED | List exceptions from a statement¶ Run the confluent flink statement exception list command to get exceptions that have been thrown by a statement. confluent flink statement exception list ${STATEMENT_NAME} Delete a statement¶ Run the confluent flink statement delete command to delete an existing statement permanently. All of its resources, like checkpoints, are also deleted. Deleting a statement stops charges for its use. 
confluent flink statement delete ${STATEMENT_NAME} Your output should resemble: Deleted Flink SQL statement "ac23db14-b5dc-49fb-b". Update a statement¶ Run the confluent flink statement update command to stop an existing statement or resume a stopped statement. # Request to stop a statement. confluent flink statement update ${STATEMENT_NAME} --stopped=true # Request to resume a stopped statement. confluent flink statement update ${STATEMENT_NAME} --stopped=false Manage compute pools¶ Using the Confluent CLI, you can perform these actions: Create a compute pool Describe a compute pool List compute pools Update a compute pool Set the current compute pool Unset the current compute pool Delete a compute pool You must be authorized to create, update, delete (FlinkAdmin) or use (FlinkDeveloper) a compute pool. For more information, see Grant Role-Based Access in Confluent Cloud for Apache Flink. Managing compute pools may require the following inputs, depending on the command: export COMPUTE_POOL_NAME= # human-readable name, for example, "my-compute-pool" export COMPUTE_POOL_ID="" # example: "lfcp-8m03rm" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export MAX_CFU="" # example: 5 For the complete CLI reference, see confluent flink compute-pool. Create a compute pool¶ Run the confluent flink compute-pool create command to create a compute pool. Creating a compute pool requires the following inputs: export COMPUTE_POOL_NAME= # human-readable name, for example, "my-compute-pool" export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" export ENV_ID="" # example: "env-z3y2x1" export MAX_CFU="" # example: 5 Run the following command to create a compute pool in the specified cloud provider and environment. 
confluent flink compute-pool create ${COMPUTE_POOL_NAME} \ --cloud ${CLOUD_PROVIDER} \ --region ${CLOUD_REGION} \ --max-cfu ${MAX_CFU} \ --environment ${ENV_ID} Your output should resemble: +-------------+-----------------+ | Current | false | | ID | lfcp-xxd6og | | Name | my-compute-pool | | Environment | env-z3y2x1 | | Current CFU | 0 | | Max CFU | 5 | | Cloud | AWS | | Region | us-east-1 | | Status | PROVISIONING | +-------------+-----------------+ Describe a compute pool¶ Run the confluent flink compute-pool describe command to get details about a compute pool. Describing a compute pool requires the following inputs: export COMPUTE_POOL_ID="" # example: "lfcp-8m03rm" export ENV_ID="" # example: "env-z3y2x1" Run the following command to get details about a compute pool in the specified environment. confluent flink compute-pool describe ${COMPUTE_POOL_ID} \ --environment ${ENV_ID} Your output should resemble: +-------------+-----------------+ | Current | false | | ID | lfcp-xxd6og | | Name | my-compute-pool | | Environment | env-z3y2x1 | | Current CFU | 0 | | Max CFU | 5 | | Cloud | AWS | | Region | us-east-1 | | Status | PROVISIONED | +-------------+-----------------+ List compute pools¶ Run the confluent flink compute-pool list command to list the compute pools in the specified environment. Listing compute pools may require the following inputs, depending on the command: export CLOUD_REGION="" # example: "us-east-1" export ENV_ID="" # example: "env-z3y2x1" Run the following command to list the compute pools in the specified environment. 
confluent flink compute-pool list --environment ${ENV_ID} Your output should resemble: Current | ID | Name | Environment | Current CFU | Max CFU | Cloud | Region | Status ----------+-------------+---------------------------+-------------+-------------+---------+-------+-----------+-------------- * | lfcp-xxd6og | my-compute-pool | env-z3y2x1 | 0 | 5 | AWS | us-east-1 | PROVISIONED | lfcp-8m03rm | test-blue-compute-pool | env-z3q9rd | 0 | 10 | AWS | us-east-1 | PROVISIONED ... Update a compute pool¶ Run the confluent flink compute-pool update command to update a compute pool. Updating a compute pool may require the following inputs, depending on the command: export COMPUTE_POOL_NAME= # human-readable name, for example, "my-compute-pool" export COMPUTE_POOL_ID="" # example: "lfcp-8m03rm" export ENV_ID="" # example: "env-z3y2x1" export MAX_CFU="" # example: 5 Run the following command to update a compute pool in the specified environment. confluent flink compute-pool update ${COMPUTE_POOL_ID} \ --environment ${ENV_ID} \ --name ${COMPUTE_POOL_NAME} \ --max-cfu ${MAX_CFU} Your output should resemble: +-------------+----------------------+ | Current | false | | ID | lfcp-xxd6og | | Name | renamed-compute-pool | | Environment | env-z3y2x1 | | Current CFU | 0 | | Max CFU | 10 | | Cloud | AWS | | Region | us-east-1 | | Status | PROVISIONED | +-------------+----------------------+ Set the current compute pool¶ Run the confluent flink compute-pool use command to use a compute pool in subsequent commands. Setting a compute pool requires the following inputs: export COMPUTE_POOL_ID="" # example: "lfcp-8m03rm" export ENV_ID="" # example: "env-z3y2x1" Run the following commands to set the current compute pool in the specified environment. First, you must run the confluent environment use command to set the current environment. confluent environment use ${ENV_ID} && \ confluent flink compute-pool use ${COMPUTE_POOL_ID} Your output should resemble: Using environment "env-z3y2x1". 
Using Flink compute pool "lfcp-xxd6og". Unset the current compute pool¶ Run the confluent flink compute-pool unset command to unset the current compute pool. confluent flink compute-pool unset Your output should resemble: Unset Flink compute pool "lfcp-xxd6og". Delete a compute pool¶ Run the confluent flink compute-pool delete command to delete a compute pool. Run the following command to delete a compute pool in the specified environment. The optional --force flag skips the confirmation prompt. confluent flink compute-pool delete ${COMPUTE_POOL_ID} \ --environment ${ENV_ID} --force Your output should resemble: Deleted Flink compute pool "lfcp-xxd6og". Manage regions¶ Using the Confluent CLI, you can perform these actions: List available regions Set the current region Managing Flink SQL regions may require the following inputs, depending on the command: export CLOUD_PROVIDER="" # example: "aws" export CLOUD_REGION="" # example: "us-east-1" For the complete CLI reference, see confluent flink region. List available regions¶ Run the confluent flink region list command to see all available regions where you can run Flink statements. confluent flink region list Your output should resemble: Current | Name | Cloud | Region ----------+-------------------------------+-------+----------------------- | Belgium (europe-west1) | gcp | europe-west1 | Frankfurt (eu-central-1) | aws | eu-central-1 | Frankfurt (europe-west3) | gcp | europe-west3 | Iowa (us-central1) | gcp | us-central1 | Ireland (eu-west-1) | aws | eu-west-1 | Las Vegas (us-west4) | gcp | us-west4 | London (eu-west-2) | aws | eu-west-2 * | N. Virginia (us-east-1) | aws | us-east-1 | N. Virginia (us-east4) | gcp | us-east4 | Netherlands (westeurope) | azure | westeurope | Ohio (us-east-2) | aws | us-east-2 | Oregon (us-west-2) | aws | us-west-2 | S. 
Carolina (us-east1) | gcp | us-east1 | Singapore (ap-southeast-1) | aws | ap-southeast-1 | Singapore (asia-southeast1) | gcp | asia-southeast1 | Singapore (southeastasia) | azure | southeastasia | Sydney (ap-southeast-2) | aws | ap-southeast-2 | Sydney (australia-southeast1) | gcp | australia-southeast1 | Virginia (eastus) | azure | eastus | Virginia (eastus2) | azure | eastus2 | Washington (westus2) | azure | westus2 Run the following command to filter the list of available regions by cloud provider. confluent flink region list --cloud ${CLOUD_PROVIDER} Your output should resemble: Current | Name | Cloud | Region ----------+----------------------------+-------+----------------- | Frankfurt (eu-central-1) | aws | eu-central-1 | Ireland (eu-west-1) | aws | eu-west-1 | London (eu-west-2) | aws | eu-west-2 * | N. Virginia (us-east-1) | aws | us-east-1 | Ohio (us-east-2) | aws | us-east-2 | Oregon (us-west-2) | aws | us-west-2 | Singapore (ap-southeast-1) | aws | ap-southeast-1 | Sydney (ap-southeast-2) | aws | ap-southeast-2 Set the current region¶ Run the confluent flink region use command to set the current region where subsequent Flink statements run. You must have a compute pool in the region to run statements. confluent flink region use --cloud ${CLOUD_PROVIDER} --region ${CLOUD_REGION} For CLOUD_PROVIDER=aws and CLOUD_REGION=us-east-2, your output should resemble: Using Flink region "Ohio (us-east-2)". Related content¶ Flink SQL Shell Quick Start Monitor Flink SQL Statements Flink SQL REST API Confluent Terraform Provider Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. 
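The statement commands described on this page (create, describe, exception list, update, delete) are often run in sequence. The following is a minimal sketch of that lifecycle, assuming the placeholder values used in the examples above. The `run` and `statement_lifecycle` functions and the `DRY_RUN` variable are hypothetical helpers for illustration and are not part of the Confluent CLI. By default the script only prints the commands, so you can review them before executing anything.

```shell
#!/bin/sh
# Sketch only: chains the statement commands shown on this page.
set -eu

# Placeholder values copied from the examples on this page; replace with your own.
STATEMENT_NAME="${STATEMENT_NAME:-user-filter}"
COMPUTE_POOL_ID="${COMPUTE_POOL_ID:-lfcp-8m03rm}"
CLUSTER_ID="${CLUSTER_ID:-lkc-a1b2c3}"
PRINCIPAL_ID="${PRINCIPAL_ID:-sa-23kgz4}"
SQL_CODE="${SQL_CODE:-SELECT * FROM USERS;}"

# Hypothetical helper (not a Confluent CLI feature): prints the command
# unless DRY_RUN=0, in which case it executes the command.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Hypothetical wrapper: submit, inspect, stop, and delete one statement.
statement_lifecycle() {
  run confluent flink statement create "${STATEMENT_NAME}" \
    --service-account "${PRINCIPAL_ID}" \
    --sql "${SQL_CODE}" \
    --compute-pool "${COMPUTE_POOL_ID}" \
    --database "${CLUSTER_ID}"
  run confluent flink statement describe "${STATEMENT_NAME}"
  run confluent flink statement exception list "${STATEMENT_NAME}"
  run confluent flink statement update "${STATEMENT_NAME}" --stopped=true
  run confluent flink statement delete "${STATEMENT_NAME}"
}

statement_lifecycle
```

Setting `DRY_RUN=0` makes the same script call the CLI for real, which requires an authenticated Confluent CLI session. Because the delete step removes the statement and its resources permanently, reviewing the printed commands first is the safer default.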
#### Code Examples ```sql confluent flink statement --help confluent flink compute-pool --help confluent flink region --help ``` ```sql export STATEMENT_NAME="" # example: "user-filter" export COMPUTE_POOL_ID="" # example: "lfcp-8m03rm" export CLUSTER_ID="" # example: "lkc-a1b2c3" export PRINCIPAL_ID="" # example: "sa-23kgz4" for a service account, or "u-aq1dr2" for a user account export SQL_CODE="" # example: "SELECT * FROM USERS;" ``` ```sql confluent flink statement create --sql "${SQL_CODE}" ``` ```sql +---------------+------------------------------------------------------------+ | Creation Date | 2024-02-28 21:08:08.9749 +0000 | | | UTC | | Name | cli-2024-02-28-130806-78dd77b5-16a9-40ab-9786-db95b9895eaa | | Statement | Select 1; | | Compute Pool | lfcp-8m09g0 | | Status | PENDING | +---------------+------------------------------------------------------------+ ``` ```sql confluent flink statement create ${STATEMENT_NAME} \ --service-account ${PRINCIPAL_ID} \ --sql "${SQL_CODE}" \ --compute-pool ${COMPUTE_POOL_ID} \ --database ${CLUSTER_ID} ``` ```sql confluent flink statement list ``` ```sql Creation Date | Name | Statement | Compute Pool | Status | Status Detail --------------------------------+----------------------+--------------------------------+--------------+-----------+--------------------------------- 2023-07-08 21:04:06 +0000 UTC | 4b1d3494-f0f7-460d-9 | INSERT INTO copytopic | lfcp-r2j1x9 | RUNNING | | | SELECT symbol,price from | | | | | topic_datagen; | | | 2023-07-08 21:07:04 +0000 UTC | 6c43b973-b3c6-4be8-9 | INSERT INTO copytopic | lfcp-r2j1x9 | RUNNING | | | SELECT symbol,price from | | | | | topic_datagen; | | | ... 
``` ```sql confluent context list ``` ```sql Current | Name | Platform | Credential ----------+--------------------------------------------------------+-----------------+------------------------------------ * | login--https://confluent.cloud | confluent.cloud | username- ``` ```sql export MY_CONTEXT="login--https://confluent.cloud" ``` ```sql confluent flink statement list ${MY_CONTEXT} ``` ```sql Creation Date | Name | Statement | Compute Pool | Status | Status Detail ---------------------------------+------------------------------------------------------------+-----------+--------------+-----------+---------------- 2024-02-28 21:08:08.9749 +0000 | cli-2024-02-28-130806-78dd77b5-16a9-40ab-9786-db95b9895eaa | Select 1; | lfcp-8m09g0 | COMPLETED | UTC | | | | | ... ``` ```sql --compute-pool ``` ```sql confluent flink statement list --compute-pool ${COMPUTE_POOL_ID} ``` ```sql confluent flink statement describe ${STATEMENT_NAME} ``` ```sql Creation Date | Name | Statement | Compute Pool | Status | Status Detail --------------------------------+--------------------+------------+--------------+-----------+---------------- 2023-07-19 19:26:52 +0000 UTC | fdc6cbf5-038a-408c | show jobs; | lfcp-a1b2c3 | COMPLETED | ``` ```sql confluent flink statement exception list ${STATEMENT_NAME} ``` ```sql confluent flink statement delete ${STATEMENT_NAME} ``` ```sql Deleted Flink SQL statement "ac23db14-b5dc-49fb-b". ``` ```sql # Request to stop a statement. confluent flink statement update ${STATEMENT_NAME} --stopped=true # Request to resume a stopped statement. 
confluent flink statement update ${STATEMENT_NAME} --stopped=false
```

```text
FlinkDeveloper
```

```shell
export COMPUTE_POOL_NAME= # human-readable name, for example, "my-compute-pool"
export COMPUTE_POOL_ID="" # example: "lfcp-8m03rm"
export CLOUD_PROVIDER="" # example: "aws"
export CLOUD_REGION="" # example: "us-east-1"
export MAX_CFU="" # example: 5
```

```shell
export COMPUTE_POOL_NAME= # human-readable name, for example, "my-compute-pool"
export CLOUD_PROVIDER="" # example: "aws"
export CLOUD_REGION="" # example: "us-east-1"
export ENV_ID="" # example: "env-z3y2x1"
export MAX_CFU="" # example: 5
```

```shell
confluent flink compute-pool create ${COMPUTE_POOL_NAME} \
  --cloud ${CLOUD_PROVIDER} \
  --region ${CLOUD_REGION} \
  --max-cfu ${MAX_CFU} \
  --environment ${ENV_ID}
```

```text
+-------------+-----------------+
| Current     | false           |
| ID          | lfcp-xxd6og     |
| Name        | my-compute-pool |
| Environment | env-z3y2x1      |
| Current CFU | 0               |
| Max CFU     | 5               |
| Cloud       | AWS             |
| Region      | us-east-1       |
| Status      | PROVISIONING    |
+-------------+-----------------+
```

```shell
export COMPUTE_POOL_ID="" # example: "lfcp-8m03rm"
export ENV_ID="" # example: "env-z3y2x1"
```

```shell
confluent flink compute-pool describe ${COMPUTE_POOL_ID} \
  --environment ${ENV_ID}
```

```text
+-------------+-----------------+
| Current     | false           |
| ID          | lfcp-xxd6og     |
| Name        | my-compute-pool |
| Environment | env-z3y2x1      |
| Current CFU | 0               |
| Max CFU     | 5               |
| Cloud       | AWS             |
| Region      | us-east-1       |
| Status      | PROVISIONED     |
+-------------+-----------------+
```

```shell
export CLOUD_REGION="" # example: "us-east-1"
export ENV_ID="" # example: "env-z3y2x1"
```

```shell
confluent flink compute-pool list --environment ${ENV_ID}
```

```text
  Current |     ID      |           Name            | Environment | Current CFU | Max CFU | Cloud |  Region   |    Status
----------+-------------+---------------------------+-------------+-------------+---------+-------+-----------+--------------
  *       | lfcp-xxd6og | my-compute-pool           | env-z3y2x1  |           0 |       5 | AWS   | us-east-1 | PROVISIONED
          | lfcp-8m03rm | test-blue-compute-pool    | env-z3q9rd  |           0 |      10 | AWS   | us-east-1 | PROVISIONED
...
```

```shell
export COMPUTE_POOL_NAME= # human-readable name, for example, "my-compute-pool"
export COMPUTE_POOL_ID="" # example: "lfcp-8m03rm"
export ENV_ID="" # example: "env-z3y2x1"
export MAX_CFU="" # example: 5
```

```shell
confluent flink compute-pool update ${COMPUTE_POOL_ID} \
  --environment ${ENV_ID} \
  --name ${COMPUTE_POOL_NAME} \
  --max-cfu ${MAX_CFU}
```

```text
+-------------+----------------------+
| Current     | false                |
| ID          | lfcp-xxd6og          |
| Name        | renamed-compute-pool |
| Environment | env-z3y2x1           |
| Current CFU | 0                    |
| Max CFU     | 10                   |
| Cloud       | AWS                  |
| Region      | us-east-1            |
| Status      | PROVISIONED          |
+-------------+----------------------+
```

```shell
export COMPUTE_POOL_ID="" # example: "lfcp-8m03rm"
export ENV_ID="" # example: "env-z3y2x1"
```

```shell
confluent environment use
```

```shell
confluent environment use ${ENV_ID} && \
  confluent flink compute-pool use ${COMPUTE_POOL_ID}
```

```text
Using environment "env-z3y2x1".
Using Flink compute pool "lfcp-xxd6og".
```

```shell
confluent flink compute-pool unset
```

```text
Unset Flink compute pool "lfcp-xxd6og".
```

```shell
confluent flink compute-pool delete ${COMPUTE_POOL_ID} \
  --environment ${ENV_ID} --force
```

```text
Deleted Flink compute pool "lfcp-xxd6og".
```

```shell
export CLOUD_PROVIDER="" # example: "aws"
export CLOUD_REGION="" # example: "us-east-1"
```

```shell
confluent flink region list
```

```text
  Current |             Name              | Cloud |        Region
----------+-------------------------------+-------+-----------------------
          | Belgium (europe-west1)        | gcp   | europe-west1
          | Frankfurt (eu-central-1)      | aws   | eu-central-1
          | Frankfurt (europe-west3)      | gcp   | europe-west3
          | Iowa (us-central1)            | gcp   | us-central1
          | Ireland (eu-west-1)           | aws   | eu-west-1
          | Las Vegas (us-west4)          | gcp   | us-west4
          | London (eu-west-2)            | aws   | eu-west-2
  *       | N. Virginia (us-east-1)       | aws   | us-east-1
          | N. Virginia (us-east4)        | gcp   | us-east4
          | Netherlands (westeurope)      | azure | westeurope
          | Ohio (us-east-2)              | aws   | us-east-2
          | Oregon (us-west-2)            | aws   | us-west-2
          | S. Carolina (us-east1)        | gcp   | us-east1
          | Singapore (ap-southeast-1)    | aws   | ap-southeast-1
          | Singapore (asia-southeast1)   | gcp   | asia-southeast1
          | Singapore (southeastasia)     | azure | southeastasia
          | Sydney (ap-southeast-2)       | aws   | ap-southeast-2
          | Sydney (australia-southeast1) | gcp   | australia-southeast1
          | Virginia (eastus)             | azure | eastus
          | Virginia (eastus2)            | azure | eastus2
          | Washington (westus2)          | azure | westus2
```

```shell
confluent flink region list --cloud ${CLOUD_PROVIDER}
```

```text
  Current |            Name            | Cloud |     Region
----------+----------------------------+-------+-----------------
          | Frankfurt (eu-central-1)   | aws   | eu-central-1
          | Ireland (eu-west-1)        | aws   | eu-west-1
          | London (eu-west-2)         | aws   | eu-west-2
  *       | N. Virginia (us-east-1)    | aws   | us-east-1
          | Ohio (us-east-2)           | aws   | us-east-2
          | Oregon (us-west-2)         | aws   | us-west-2
          | Singapore (ap-southeast-1) | aws   | ap-southeast-1
          | Sydney (ap-southeast-2)    | aws   | ap-southeast-2
```

```shell
confluent flink region use --cloud ${CLOUD_PROVIDER} --region ${CLOUD_REGION}
```

```shell
CLOUD_PROVIDER=aws
```

```shell
CLOUD_REGION=us-east-2
```

```text
Using Flink region "Ohio (us-east-2)".
```

---

### SQL Information Schema in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/reference/flink-sql-information-schema.html

An information schema, or data dictionary, is a standard SQL schema with a collection of predefined views that enable accessing metadata about objects in Confluent Cloud for Apache Flink®. The Confluent INFORMATION_SCHEMA is based on the SQL-92 ANSI Information Schema, with the addition of views and functions that are specific to Confluent Cloud for Apache Flink. The ANSI standard uses “catalog” to refer to a database.
In Confluent Cloud, “schema” refers to a database. Conceptually, the terms are equivalent.

The views in the INFORMATION_SCHEMA provide information about database objects, such as tables, columns, and constraints. The views are organized into tables that you can query by using standard SQL statements. For example, you can use the INFORMATION_SCHEMA.COLUMNS table to get details about the columns in a table, like the column name, data type, and whether it allows null values. Similarly, you can use the INFORMATION_SCHEMA.TABLES table to get static configuration details of a relation, whether it’s a view, a table, or a system table. For example, you can query for details like the watermark definition and the number of partitions in the topic.

Every Flink catalog has a corresponding INFORMATION_SCHEMA, so you can always run a statement like SELECT (...) FROM .INFORMATION_SCHEMA.TABLES WHERE (...).

Global views are available in every INFORMATION_SCHEMA, which means that you can query for information across catalogs. For example, you can query the global INFORMATION_SCHEMA.CATALOGS view to list all catalogs.

The information schema is a powerful tool for querying metadata about your Flink catalogs and databases, and you can use it for a variety of purposes, such as generating reports, documenting a schema, and troubleshooting performance issues.

The following views are supported in the Confluent INFORMATION_SCHEMA:

- Catalogs and databases: CATALOGS, INFORMATION_SCHEMA_CATALOG_NAME, SCHEMATA / DATABASES
- Functions: PARAMETERS, ROUTINES
- TABLES, COLUMNS, KEY_COLUMN_USAGE, SYSTEM_TABLES, TABLE_CONSTRAINTS, TABLE_OPTIONS, VIEWS

#### Query syntax in INFORMATION_SCHEMA

Metadata queries on the INFORMATION_SCHEMA tables support the following syntax.
Supported data types:

- INT
- STRING

Supported operators:

- SELECT
- WHERE
- UNION ALL

Supported expressions:

- CAST(NULL AS dt), CAST(x as dt)
- UNION ALL (see the COLUMNS example)
- AND, OR
- =, <>, IS NULL, IS NOT NULL
- AS
- STRING and INT literals

The following limitations apply to INFORMATION_SCHEMA:

- You can use INFORMATION_SCHEMA views only in SELECT statements, not in INSERT INTO statements.
- You can’t use INFORMATION_SCHEMA in joins with real tables.
- Only the previously listed equality and basic expressions are supported.

#### Catalogs and databases

#### CATALOGS

The global catalogs view. The rows returned are limited to the schemas that you have permission to interact with. This view is an extension to the SQL standard.

| Column Name | Data Type | Standard | Description |
|---|---|---|---|
| CATALOG_ID | STRING NOT NULL | No | The ID of the catalog/environment, for example, env-xmzdkk. |
| CATALOG_NAME | STRING NOT NULL | No | The human-readable name of the catalog/environment, for example, default. |

Example: Run the following code to query for all catalogs across environments.

    SELECT `CATALOG_ID`, `CATALOG_NAME`
    FROM `INFORMATION_SCHEMA`.`CATALOGS`;

#### INFORMATION_SCHEMA_CATALOG_NAME

Local catalog view. Returns the name of the current information schema’s catalog.

| Column Name | Data Type | Standard | Description |
|---|---|---|---|
| CATALOG_ID | STRING NOT NULL | No | The ID of the catalog/environment, for example, env-xmzdkk. |
| CATALOG_NAME | STRING NOT NULL | Yes | The human-readable name of the catalog/environment, for example, default. |

Example: Run the following code to query for the name of this information schema’s catalog.

    SELECT `CATALOG_ID`, `CATALOG_NAME`
    FROM `INFORMATION_SCHEMA`.`INFORMATION_SCHEMA_CATALOG_NAME`

#### SCHEMATA / DATABASES

Describes databases within the catalog. For convenience, DATABASES is an alias for SCHEMATA. The rows returned are limited to the schemas that you have permission to interact with.

| Column Name | Data Type | Standard | Description |
|---|---|---|---|
| CATALOG_ID | STRING NOT NULL | No | The ID of the catalog/environment, for example, env-xmzdkk. |
| CATALOG_NAME | STRING NOT NULL | Yes | The human-readable name of the catalog/environment, for example, default. |
| SCHEMA_ID | STRING NOT NULL | No | The ID of the database/cluster, for example, lkc-kgjwwv. |
| SCHEMA_NAME | STRING NOT NULL | Yes | The human-readable name of the database/cluster, for example, MyCluster. |

Example: Run the following code to list all Flink databases within a catalog (Kafka clusters within an environment), excluding the information schema.

    SELECT `SCHEMA_ID`, `SCHEMA_NAME`
    FROM `INFORMATION_SCHEMA`.`SCHEMATA`
    WHERE `SCHEMA_NAME` <> 'INFORMATION_SCHEMA';

#### COLUMNS

Describes columns of tables and virtual tables (views) in the catalog.

| Column Name | Data Type | Standard | Description |
|---|---|---|---|
| COLUMN_NAME | STRING NOT NULL | Yes | Column reference. |
| COMMENT | STRING NULL | No (Like Databricks, Snowflake) | An optional comment that describes the relation. |
| DATA_TYPE | STRING NOT NULL | Yes | Type root, for example, VARCHAR or ROW. |
| DISTRIBUTION_ORDINAL_POSITION | INT NULL | No (Like BigQuery for clustering key) | If the table IS_DISTRIBUTED, contains the position of the key in a DISTRIBUTED BY clause. |
| FULL_DATA_TYPE | STRING NOT NULL | No (Like Databricks) | Fully qualified data type, for example, VARCHAR(32) or ROW<…>. |
| GENERATION_EXPRESSION | STRING NULL | Yes (Like BigQuery and Databricks) | For computed columns. |
| IS_GENERATED | STRING NOT NULL | Yes | Indicates whether the column is a computed column. Values are YES or NO. |
| IS_HIDDEN | STRING NOT NULL | No (Like BigQuery) | Indicates whether the column is a system column. Values are YES or NO. |
| IS_METADATA | STRING NOT NULL | No | Indicates whether the column is a metadata column. Values are YES or NO. |
| IS_NULLABLE | STRING NOT NULL | No | Indicates whether the column is nullable. Values are YES or NO. |
| IS_PERSISTED | STRING NOT NULL | No | Indicates whether a metadata column is stored during INSERT INTO. Also YES if a physical column. Values are YES or NO. |
| METADATA_KEY | STRING NULL | No | For metadata columns. |
| ORDINAL_POSITION | INT NOT NULL | Yes | Position of the column in the key, starting at 1. |
| TABLE_CATALOG | STRING NOT NULL | Yes | The human-readable name of the catalog. |
| TABLE_CATALOG_ID | STRING NOT NULL | No | The ID of the catalog. |
| TABLE_NAME | STRING NOT NULL | Yes | The name of the relation. |
| TABLE_SCHEMA | STRING NOT NULL | Yes | The human-readable name of the database. |
| TABLE_SCHEMA_ID | STRING NOT NULL | No | The ID of the database. |

Examples: This example shows a complex query. The complexity comes from reducing the number of requests. Because the views are in normal form, instead of issuing three requests, you can batch them into a single one by using UNION ALL. UNION ALL avoids the need for various inner/outer joins. The result is a sparse table that contains different “sections”.

The overall schema looks like this:

    ( section, column_name, column_pos, column_type, constraint_name, constraint_type, constraint_enforced )

Run the following code to list columns, like name, position, data type, and their primary key characteristics.

    (
      SELECT
        'COLUMNS' AS `section`,
        `COLUMN_NAME` AS `column_name`,
        `ORDINAL_POSITION` AS `column_pos`,
        `FULL_DATA_TYPE` AS `column_type`,
        CAST(NULL AS STRING) AS `constraint_name`,
        CAST(NULL AS STRING) AS `constraint_type`,
        CAST(NULL AS STRING) AS `constraint_enforced`
      FROM ``.`INFORMATION_SCHEMA`.`COLUMNS`
      WHERE `TABLE_CATALOG` = '' AND `TABLE_SCHEMA` = '' AND `TABLE_NAME` = '' AND `IS_HIDDEN` = 'NO'
    )
    UNION ALL
    (
      SELECT
        'TABLE_CONSTRAINTS' AS `section`,
        CAST(NULL AS STRING) AS `column_name`,
        CAST(NULL AS INT) AS `column_pos`,
        CAST(NULL AS STRING) AS `column_type`,
        `CONSTRAINT_NAME` AS `constraint_name`,
        `CONSTRAINT_TYPE` AS `constraint_type`,
        `ENFORCED` AS `constraint_enforced`
      FROM `<>`.`INFORMATION_SCHEMA`.`TABLE_CONSTRAINTS`
      WHERE `CONSTRAINT_CATALOG` = '' AND `CONSTRAINT_SCHEMA` = '' AND `TABLE_CATALOG` = '' AND `TABLE_SCHEMA` = '' AND `TABLE_NAME` = ''
    )
    UNION ALL
    (
      SELECT
        'KEY_COLUMN_USAGE' AS `section`,
        `COLUMN_NAME` AS `column_name`,
        `ORDINAL_POSITION` AS `column_pos`,
        CAST(NULL AS STRING) AS `column_type`,
        `CONSTRAINT_NAME` AS `constraint_name`,
        CAST(NULL AS STRING) AS `constraint_type`,
        CAST(NULL AS STRING) AS `constraint_enforced`
      FROM `<>`.`INFORMATION_SCHEMA`.`KEY_COLUMN_USAGE`
      WHERE `TABLE_CATALOG` = '' AND `TABLE_SCHEMA` = '' AND `TABLE_NAME` = ''
    );

#### KEY_COLUMN_USAGE

Side view of TABLE_CONSTRAINTS for key columns.

| Column Name | Data Type | Standard | Description |
|---|---|---|---|
| COLUMN_NAME | STRING NOT NULL | Yes | The name of the constrained column. |
| CONSTRAINT_CATALOG | STRING NOT NULL | Yes | Catalog name containing the constraint. |
| CONSTRAINT_CATALOG_ID | STRING NOT NULL | No | Catalog ID containing the constraint. |
| CONSTRAINT_SCHEMA | STRING NOT NULL | Yes | Schema name containing the constraint. |
| CONSTRAINT_SCHEMA_ID | STRING NOT NULL | No | Schema ID containing the constraint. |
| CONSTRAINT_NAME | STRING NOT NULL | Yes | Name of the constraint. |
| ORDINAL_POSITION | INT NOT NULL | Yes | The ordinal position of the column within the constraint key (starting at 1). |
| TABLE_CATALOG | STRING NOT NULL | Yes | The human-readable name of the catalog. |
| TABLE_CATALOG_ID | STRING NOT NULL | No | The ID of the catalog. |
| TABLE_NAME | STRING NOT NULL | Yes | The name of the relation. |
| TABLE_SCHEMA | STRING NOT NULL | Yes | The human-readable name of the database. |
| TABLE_SCHEMA_ID | STRING NOT NULL | No | The ID of the database. |

Example: Run the following code to query for a side view of TABLE_CONSTRAINTS for key columns.

    SELECT * FROM `INFORMATION_SCHEMA`.`KEY_COLUMN_USAGE`

#### SYSTEM_TABLES

Contains the object-level metadata for virtual tables within the catalog. Virtual tables do not represent physical data. Instead, they expose specific metadata. The rows returned are limited to the schemas that you have permission to interact with.

| Column Name | Data Type | Description |
|---|---|---|
| BASE_TABLE_NAME | STRING NULL | The name of the relation to which this system view corresponds. |
| COMMENT | STRING NULL | An optional comment that describes the system view. |
| TABLE_CATALOG | STRING NOT NULL | The human-readable name of the catalog. |
| TABLE_CATALOG_ID | STRING NOT NULL | The ID of the catalog. |
| TABLE_NAME | STRING NOT NULL | The name of the relation. |
| TABLE_SCHEMA | STRING NOT NULL | The human-readable name of the database. |
| TABLE_SCHEMA_ID | STRING NOT NULL | The ID of the database. |

#### TABLES

Contains the object-level metadata for tables and virtual tables (views) within the catalog. The rows returned are limited to the schemas that you have permission to interact with.

| Column Name | Data Type | Standard | Description |
|---|---|---|---|
| COMMENT | STRING NULL | No (Like Databricks and Snowflake) | An optional comment that describes the relation. |
| DISTRIBUTION_ALGORITHM | STRING NULL | No | Only HASH is supported. |
| DISTRIBUTION_BUCKETS | INT NULL | No | Number of buckets, if defined. |
| IS_DISTRIBUTED | STRING NOT NULL | No (Like Snowflake for clustering key) | Indicates whether the table is bucketed using the DISTRIBUTED BY clause. Values are YES or NO. |
| IS_WATERMARKED | STRING NOT NULL | No | Indicates whether the table has a watermark from the WATERMARK FOR clause. Values are YES or NO. |
| TABLE_CATALOG | STRING NOT NULL | Yes | The human-readable name of the catalog. |
| TABLE_CATALOG_ID | STRING NOT NULL | No | The ID of the catalog. |
| TABLE_NAME | STRING NOT NULL | Yes | The name of the relation. |
| TABLE_SCHEMA | STRING NOT NULL | Yes | The human-readable name of the database. |
| TABLE_SCHEMA_ID | STRING NOT NULL | No | The ID of the database. |
| TABLE_TYPE | STRING NOT NULL | Yes | Values are BASE TABLE, EXTERNAL TABLE, SYSTEM TABLE, or VIEW [1]. |
| WATERMARK_COLUMN | STRING NULL | No | Time attribute column for which the watermark is defined. |
| WATERMARK_EXPRESSION | STRING NULL | No | Watermark expression. |
| WATERMARK_IS_HIDDEN | STRING NULL | No | Indicates whether the watermark is the default, system-provided one. |

[1] These are the valid values for the TABLE_TYPE column.

- BASE TABLE: For Confluent-native tables that can be used conceptually for reading and writing, like a regular database table.
- EXTERNAL TABLE: For non-native Confluent tables, for example, non-Kafka and Tableflow. Usually, those tables are read-only.
- SYSTEM TABLE: For tables that the system creates, either with a BASE TABLE or on its own. Only $error is supported. Compared with BASE TABLE, these tables are read-only.
- VIEW: For SQL views.

Examples: Run the following code to list all tables within a catalog (Kafka topics within an environment), excluding the information schema.

    SELECT `TABLE_CATALOG`, `TABLE_SCHEMA`, `TABLE_NAME`
    FROM `INFORMATION_SCHEMA`.`TABLES`
    WHERE `TABLE_SCHEMA` <> 'INFORMATION_SCHEMA';

Run the following code to list all tables within a database (Kafka topics within a cluster).

    SELECT `TABLE_CATALOG`, `TABLE_SCHEMA`, `TABLE_NAME`
    FROM ``.`INFORMATION_SCHEMA`.`TABLES`
    WHERE `TABLE_SCHEMA` = '';

#### TABLE_CONSTRAINTS

Side view of TABLES for all primary key constraints within the catalog.

| Column Name | Data Type | Standard | Description |
|---|---|---|---|
| CONSTRAINT_CATALOG | STRING NOT NULL | Yes | Catalog name containing the constraint. |
| CONSTRAINT_CATALOG_ID | STRING NOT NULL | No | Catalog ID containing the constraint. |
| CONSTRAINT_SCHEMA | STRING NOT NULL | Yes | Schema name containing the constraint. |
| CONSTRAINT_SCHEMA_ID | STRING NOT NULL | No | Schema ID containing the constraint. |
| CONSTRAINT_NAME | STRING NOT NULL | Yes | Name of the constraint. |
| CONSTRAINT_TYPE | STRING NOT NULL | Yes | Currently, only PRIMARY KEY. |
| ENFORCED | STRING NOT NULL | Yes | YES if constraint is enforced, otherwise NO. |
| TABLE_CATALOG | STRING NOT NULL | Yes | The human-readable name of the catalog. |
| TABLE_CATALOG_ID | STRING NOT NULL | No | The ID of the catalog. |
| TABLE_NAME | STRING NOT NULL | Yes | The name of the relation. |
| TABLE_SCHEMA | STRING NOT NULL | Yes | The human-readable name of the database. |
| TABLE_SCHEMA_ID | STRING NOT NULL | No | The ID of the database. |

Example: Run the following code to query for a side view of TABLES for all primary key constraints within the catalog.

    SELECT * FROM `INFORMATION_SCHEMA`.`TABLE_CONSTRAINTS`;

#### TABLE_OPTIONS

Side view of TABLES for WITH. Extension to the SQL Standard Information Schema.

| Column Name | Data Type | Description |
|---|---|---|
| TABLE_CATALOG | STRING NOT NULL | The human-readable name of the catalog. |
| TABLE_CATALOG_ID | STRING NOT NULL | The ID of the catalog. |
| TABLE_NAME | STRING NOT NULL | The name of the relation. |
| TABLE_SCHEMA | STRING NOT NULL | The human-readable name of the database. |
| TABLE_SCHEMA_ID | STRING NOT NULL | The ID of the database. |
| OPTION_KEY | STRING NOT NULL | Option key. |
| OPTION_VALUE | STRING NOT NULL | Option value. |

Example: Run the following code to query for a side view of TABLES for WITH.

    SELECT * FROM `INFORMATION_SCHEMA`.`TABLE_OPTIONS`;

#### VIEWS

Contains the object-level metadata for views within the catalog. The rows returned are limited to the schemas that you have permission to interact with.

| Column Name | Data Type | Standard | Description |
|---|---|---|---|
| COMMENT | STRING NULL | No (Like Databricks and Snowflake) | An optional comment that describes the relation. |
| TABLE_CATALOG | STRING NOT NULL | Yes | The human-readable name of the catalog. |
| TABLE_CATALOG_ID | STRING NOT NULL | No | The ID of the catalog. |
| TABLE_NAME | STRING NOT NULL | Yes | The name of the relation. |
| TABLE_SCHEMA | STRING NOT NULL | Yes | The human-readable name of the database. |
| TABLE_SCHEMA_ID | STRING NOT NULL | No | The ID of the database. |
| VIEW_DEFINITION | STRING NULL | Yes | Text of the view’s expanded query expression. Like Databricks, NULL if the user does not own the view. |

#### Functions

Confluent Cloud for Apache Flink supports a number of features for routines:

- overloading
- structured types, that is, Java POJOs
- var-args
- procedures and polymorphic table functions (PTFs)
- table, model arguments with traits for PTFs
- input type strategies and return type strategies

These special cases are considered in the INFORMATION_SCHEMA design:

- overloading: SPECIFIC_NAME with _1, _2 behavior, similar to Databricks
- structured types, return type strategies: FULL_DATA_TYPE and DATA_TYPE = NULL
- var-args, input type strategies: at least indicated with IS_STATIC = NO
- procedures and PTFs, table, model arguments with traits for PTFs: special columns TRAITS and DATA_TYPE = NULL

#### ROUTINES

Contains the object-level metadata for functions within the catalog.
| Column Name | Data Type | Standard | Description |
|---|---|---|---|
| CREATED | TIMESTAMP_LTZ(9) NOT NULL | Yes | Creation time of the function. |
| DATA_TYPE | STRING NULL | Yes | Type root, for example, VARCHAR or ROW. NULL is not standard but is reserved for user-defined functions with type strategies instead of a static return type or procedures. Table-valued functions always return a ROW type: it is either an automatic wrapper in case of a non-row type or a ROW returned by the function. |
| EXTERNAL_ARTIFACTS | STRING NULL | No | Contains the content from USING JAR. Null if not external. For example: confluent-artifact:///. If multiple artifacts are supported, use a semicolon-separated list. The information about this being a JAR can be derived from EXTERNAL_LANGUAGE for now. |
| EXTERNAL_LANGUAGE | STRING NULL | Yes | JAVA or PYTHON. NULL if not external. |
| EXTERNAL_NAME | STRING NULL | Yes | Identifier in the external language. For example, for Java, it’s the fully qualified class path. NULL is for non-external functions, that is, functions implemented in SQL. Contains the content of the AS clause, for example, ‘’: CREATE FUNCTION … AS ‘’ |
| FULL_DATA_TYPE | STRING NULL | No | Fully qualified data type, for example, VARCHAR(32) or ROW<…>. NULL is not standard but reserved for user-defined functions with type strategies instead of a static return type or procedures. Table-valued functions always return a ROW type: it is either an automatic wrapper in case of a non-row type or a ROW returned by the function. |
| FUNCTION_KIND | STRING NULL | No | For ROUTINE_TYPE of FUNCTION or PTF, defines a more specific function kind, corresponding to Flink’s FunctionKind. Values are: TABLE, SCALAR, AGGREGATE, PROCESS_TABLE. Null is reserved for PROCEDURES. |
| FUNCTION_REQUIREMENTS | STRING NULL | No | Semicolon-separated list of requirements for ROUTINE_TYPE of FUNCTION or PTF. Corresponds to Flink’s FunctionRequirement. Values are: OVER_WINDOW_ONLY. Null if there are no calling requirements. |
| IS_DETERMINISTIC | STRING NOT NULL | Yes | YES or NO. |
| IS_DYNAMIC | STRING NOT NULL | No | YES or NO. Whether the signature has static arguments or uses a strategy. In the latter case, PARAMETERS doesn’t contain information about the given specific routine. |
| ROUTINE_BODY | STRING NOT NULL | Yes | EXTERNAL for Java and Python. Deviates from standard because PTFs are also EXTERNAL. |
| ROUTINE_CATALOG | STRING NOT NULL | Yes | Matches SPECIFIC_CATALOG. |
| ROUTINE_CATALOG_ID | STRING NOT NULL | No | Matches SPECIFIC_CATALOG_ID. |
| ROUTINE_NAME | STRING NOT NULL | Yes | Name of the routine. |
| ROUTINE_SCHEMA | STRING NOT NULL | Yes | Matches SPECIFIC_SCHEMA. |
| ROUTINE_SCHEMA_ID | STRING NOT NULL | No | Matches SPECIFIC_SCHEMA_ID. |
| ROUTINE_TYPE | STRING NOT NULL | Yes | FUNCTION for user-defined functions. PTF for user-defined PTFs. |
| SPECIFIC_CATALOG | STRING NOT NULL | Yes | The human-readable name of the catalog. |
| SPECIFIC_CATALOG_ID | STRING NOT NULL | No | The ID of the catalog/environment. |
| SPECIFIC_NAME | STRING NOT NULL | Yes | Uniquely identifies a potentially overloaded routine signature. For example, a function f takes both f(INT) and f(STRING). Each overload gets a specific name such as f_1 or f_2. The specific name is not callable in SQL but is used for references by other INFORMATION_SCHEMA views such as PARAMETERS. |
| SPECIFIC_SCHEMA | STRING NOT NULL | Yes | The human-readable name of the database. |
| SPECIFIC_SCHEMA_ID | STRING NOT NULL | No | The ID of the database. |

#### PARAMETERS

The parameters view supports only functions with static arguments.

| Column Name | Data Type | Standard | Description |
|---|---|---|---|
| DATA_TYPE | STRING NULL | Yes | Type root, for example, VARCHAR or ROW. NULL is not standard but is reserved for PTFs with untyped table arguments. |
| FULL_DATA_TYPE | STRING NULL | No | Fully qualified data type, for example, VARCHAR(32) or ROW<…>. |
| IS_OPTIONAL | STRING NOT NULL | No | YES or NO whether this parameter is optional. |
| ORDINAL_POSITION | INT NOT NULL | Yes | Position (1-based) of the parameter in the signature. |
| PARAMETER_MODE | STRING NOT NULL | Yes | Always IN. Reserved for future use. |
| PARAMETER_NAME | STRING NOT NULL | Yes | Name of the parameter. |
| ROUTINE_NAME | STRING NOT NULL | No | Name of the routine. |
| SPECIFIC_CATALOG | STRING NOT NULL | Yes | The human-readable name of the catalog. |
| SPECIFIC_CATALOG_ID | STRING NOT NULL | No | The ID of the catalog/environment. |
| SPECIFIC_NAME | STRING NOT NULL | Yes | Uniquely identifies a potentially overloaded routine signature. For example, a function f takes both f(INT) and f(STRING). Each overload gets a specific name such as f_1 or f_2. The specific name is not callable in SQL but is used for references by other INFORMATION_SCHEMA views such as ROUTINES. |
| SPECIFIC_SCHEMA | STRING NOT NULL | Yes | The human-readable name of the database. |
| SPECIFIC_SCHEMA_ID | STRING NOT NULL | No | The ID of the database. |
| TRAITS | STRING NOT NULL | No | Semicolon-separated list of traits. By default, SCALAR only. |

Related content:

- Data Types
- Queries
- Reserved Keywords
- Statements

Note: This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.

#### Code Examples

```sql
SELECT (...) FROM .INFORMATION_SCHEMA.TABLES WHERE (...)
```

```sql
SELECT `CATALOG_ID`, `CATALOG_NAME`
FROM `INFORMATION_SCHEMA`.`CATALOGS`;
```

```sql
SELECT `CATALOG_ID`, `CATALOG_NAME`
FROM `INFORMATION_SCHEMA`.`INFORMATION_SCHEMA_CATALOG_NAME`
```

```sql
SELECT `SCHEMA_ID`, `SCHEMA_NAME`
FROM `INFORMATION_SCHEMA`.`SCHEMATA`
WHERE `SCHEMA_NAME` <> 'INFORMATION_SCHEMA';
```

```text
( section, column_name, column_pos, column_type, constraint_name, constraint_type, constraint_enforced )
```

```sql
(
  SELECT
    'COLUMNS' AS `section`,
    `COLUMN_NAME` AS `column_name`,
    `ORDINAL_POSITION` AS `column_pos`,
    `FULL_DATA_TYPE` AS `column_type`,
    CAST(NULL AS STRING) AS `constraint_name`,
    CAST(NULL AS STRING) AS `constraint_type`,
    CAST(NULL AS STRING) AS `constraint_enforced`
  FROM ``.`INFORMATION_SCHEMA`.`COLUMNS`
  WHERE `TABLE_CATALOG` = '' AND `TABLE_SCHEMA` = '' AND `TABLE_NAME` = '' AND `IS_HIDDEN` = 'NO'
)
UNION ALL
(
  SELECT
    'TABLE_CONSTRAINTS' AS `section`,
    CAST(NULL AS STRING) AS `column_name`,
    CAST(NULL AS INT) AS `column_pos`,
    CAST(NULL AS STRING) AS `column_type`,
    `CONSTRAINT_NAME` AS `constraint_name`,
    `CONSTRAINT_TYPE` AS `constraint_type`,
    `ENFORCED` AS `constraint_enforced`
  FROM `<>`.`INFORMATION_SCHEMA`.`TABLE_CONSTRAINTS`
  WHERE `CONSTRAINT_CATALOG` = '' AND `CONSTRAINT_SCHEMA` = '' AND `TABLE_CATALOG` = '' AND `TABLE_SCHEMA` = '' AND `TABLE_NAME` = ''
)
UNION ALL
(
  SELECT
    'KEY_COLUMN_USAGE' AS `section`,
    `COLUMN_NAME` AS `column_name`,
    `ORDINAL_POSITION` AS `column_pos`,
    CAST(NULL AS STRING) AS `column_type`,
    `CONSTRAINT_NAME` AS `constraint_name`,
    CAST(NULL AS STRING) AS `constraint_type`,
    CAST(NULL AS STRING) AS `constraint_enforced`
  FROM `<>`.`INFORMATION_SCHEMA`.`KEY_COLUMN_USAGE`
  WHERE `TABLE_CATALOG` = '' AND `TABLE_SCHEMA` = '' AND `TABLE_NAME` = ''
);
```

```sql
SELECT * FROM `INFORMATION_SCHEMA`.`KEY_COLUMN_USAGE`
```

```text
EXTERNAL TABLE
```

```text
SYSTEM TABLE
```

```sql
SELECT `TABLE_CATALOG`, `TABLE_SCHEMA`, `TABLE_NAME`
FROM `INFORMATION_SCHEMA`.`TABLES`
WHERE `TABLE_SCHEMA` <> 'INFORMATION_SCHEMA';
```
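As a further illustration of the supported query syntax, the following sketch lists the visible columns of a single table together with their nullability. The table name `orders` is hypothetical; the view and column names are the ones documented for the COLUMNS view. The query stays within the supported subset: SELECT, WHERE, AND, equality, and STRING literals.

```sql
-- Hypothetical table name: replace 'orders' with a table in your database.
SELECT `COLUMN_NAME`, `FULL_DATA_TYPE`, `IS_NULLABLE`
FROM `INFORMATION_SCHEMA`.`COLUMNS`
WHERE `TABLE_NAME` = 'orders'
  AND `IS_HIDDEN` = 'NO';  -- skip system columns
```

Filtering on `IS_HIDDEN` mirrors the larger UNION ALL example, which excludes system columns in the same way.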
```sql
SELECT `TABLE_CATALOG`, `TABLE_SCHEMA`, `TABLE_NAME`
FROM ``.`INFORMATION_SCHEMA`.`TABLES`
WHERE `TABLE_SCHEMA` = '';
```

```sql
SELECT * FROM `INFORMATION_SCHEMA`.`TABLE_CONSTRAINTS`;
```

```sql
SELECT * FROM `INFORMATION_SCHEMA`.`TABLE_OPTIONS`;
```

```text
FULL_DATA_TYPE
```

```text
DATA_TYPE = NULL
```

```text
IS_STATIC = NO
```

```text
DATA_TYPE = NULL
```

```text
PROCESS_TABLE
```

```text
OVER_WINDOW_ONLY
```

---

### SQL aggregate functions in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/reference/functions/aggregate-functions.html

Confluent Cloud for Apache Flink® provides these built-in functions to aggregate rows in Flink SQL queries:

- AVG
- COLLECT
- COUNT
- CUME_DIST
- DENSE_RANK
- FIRST_VALUE
- LAG
- LAST_VALUE
- LEAD
- LISTAGG
- MAX
- MIN
- NTILE
- PERCENT_RANK
- RANK
- ROW_NUMBER
- STDDEV_POP
- STDDEV_SAMP
- SUM
- VAR_POP
- VAR_SAMP
- VARIANCE

The aggregate functions take an expression across all the rows as the input and return a single aggregated value as the result.

#### AVG

Syntax: AVG([ ALL | DISTINCT ] expression)

Description: By default or with the ALL keyword, returns the average (arithmetic mean) of expression over all input rows. Use DISTINCT to aggregate over one instance of each unique value.

Example:

    -- returns 1.500000
    SELECT AVG(my_values) FROM (VALUES (0.0), (1.0), (2.0), (3.0)) AS my_values;

#### COLLECT

Syntax: COLLECT([ ALL | DISTINCT ] expression)

Description: By default or with the ALL keyword, returns a multiset of expression over all input rows. NULL values are ignored. Use DISTINCT to include one instance of each unique value.

#### COUNT

Syntax: COUNT([ ALL ] expression | DISTINCT expression1 [, expression2]*)

Description: By default or with ALL, returns the number of input rows for which expression isn’t NULL. Use DISTINCT to count each unique value once. Use COUNT(*) or COUNT(1) to return the number of input rows.
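The three forms of COUNT differ in how they treat NULL values and duplicates. The following is a minimal sketch over an inline VALUES list; the alias `t(v)` and the output column names are chosen for illustration only.

```sql
-- Input rows: 1, 1, 2, NULL
SELECT
  COUNT(*)          AS all_rows,         -- counts every row, including the NULL row
  COUNT(v)          AS non_null_rows,    -- skips the row where v is NULL
  COUNT(DISTINCT v) AS distinct_values   -- counts each non-NULL value once
FROM (VALUES (1), (1), (2), (CAST(NULL AS INT))) AS t(v);
```

Once all four rows are processed, the counts are 4, 3, and 2 respectively.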
Example:

    -- returns 4
    SELECT COUNT(my_values) FROM (VALUES (0), (1), (2), (3)) AS my_values;

#### CUME_DIST

Syntax: CUME_DIST()

Description: Returns the cumulative distribution of a value in a group of values. The result is the number of rows preceding or equal to the current row in the partition ordering divided by the number of rows in the window partition.

#### DENSE_RANK

Syntax: DENSE_RANK()

Description: Returns the rank of a value in a group of values. The result is one plus the previously assigned rank value. Unlike the RANK function, DENSE_RANK doesn’t produce gaps in the ranking sequence.

Related function: RANK

#### FIRST_VALUE

Syntax: FIRST_VALUE(expression)

Description: Returns the first value in an ordered set of values.

Example:

    -- returns first
    SELECT FIRST_VALUE(my_values) FROM (VALUES ('first'), ('second'), ('third')) AS my_values;

Related function: LAST_VALUE

#### LAG

Syntax: LAG(expression [, offset] [, default])

Description: Returns the value of expression at the offset-th row before the current row in the window. The default value of offset is 1, and the default value of the default argument is NULL.

Example: The following example shows how to use the LAG function to see player scores changing over time.

    SELECT $rowtime AS row_time
      , player_id
      , game_room_id
      , points
      , LAG(points, 1) OVER (PARTITION BY player_id ORDER BY $rowtime) previous_points_value
    FROM gaming_player_activity;

For the full code example, see Compare Current and Previous Values in a Data Stream.

Related function: LEAD

#### LAST_VALUE

Syntax: LAST_VALUE(expression)

Description: Returns the last value in an ordered set of values.

Example:

    -- returns third
    SELECT LAST_VALUE(my_values) FROM (VALUES ('first'), ('second'), ('third')) AS my_values;

Related function: FIRST_VALUE

#### LEAD

Syntax: LEAD(expression [, offset] [, default])

Description: Returns the value of expression at the offset-th row after the current row in the window. The default value of offset is 1, and the default value of the default argument is NULL.
Related function: LAG

LISTAGG

Syntax: LISTAGG(expression [, separator])

Description: Concatenates the values of string expressions and inserts separator values between them. The separator isn’t added at the end of the string. The default value of separator is ','.

Example:
-- returns first,second,third
SELECT LISTAGG(my_values) FROM (VALUES ('first'), ('second'), ('third')) AS my_values;

MAX

Syntax: MAX([ ALL | DISTINCT ] expression)

Description: By default or with the ALL keyword, returns the maximum value of expression over all input rows. Use DISTINCT to return one unique instance of each value.

Examples:
-- returns 3
SELECT MAX(my_values) FROM (VALUES (0), (1), (2), (3)) AS my_values;

The following example shows how to use the MAX function to find the highest player score in a tumbling window.
SELECT window_start, window_end, SUM(points) AS total, MIN(points) as min_points, MAX(points) as max_points
FROM TABLE(TUMBLE(TABLE gaming_player_activity_source, DESCRIPTOR($rowtime), INTERVAL '10' SECOND))
GROUP BY window_start, window_end;

For the full code example, see Aggregate a Stream in a Tumbling Window.

Related function: MIN

MIN

Syntax: MIN([ ALL | DISTINCT ] expression )

Description: By default or with the ALL keyword, returns the minimum value of expression across all input rows. Use DISTINCT to return one unique instance of each value.

Examples:
-- returns 0
SELECT MIN(my_values) FROM (VALUES (0), (1), (2), (3)) AS my_values;

The following example shows how to use the MIN function to find the lowest player score in a tumbling window.
SELECT window_start, window_end, SUM(points) AS total, MIN(points) as min_points, MAX(points) as max_points
FROM TABLE(TUMBLE(TABLE gaming_player_activity_source, DESCRIPTOR($rowtime), INTERVAL '10' SECOND))
GROUP BY window_start, window_end;

For the full code example, see Aggregate a Stream in a Tumbling Window.
Related function: MAX

NTILE

Syntax: NTILE(n)

Description: Divides the rows for each window partition into n buckets ranging from 1 to at most n. If the number of rows in the window partition doesn’t divide evenly into the number of buckets, the remainder values are distributed one per bucket, starting with the first bucket. For example, with 6 rows and 4 buckets, the bucket values would be: 1 1 2 2 3 4

PERCENT_RANK

Syntax: PERCENT_RANK()

Description: Returns the percentage ranking of a value in a group of values. The result is the rank value minus one, divided by the number of rows in the partition minus one. If the partition only contains one row, the PERCENT_RANK function returns 0.

RANK

Syntax: RANK()

Description: Returns the rank of a value in a group of values. The result is one plus the number of rows preceding or equal to the current row in the partition ordering. The values produce gaps in the sequence.

Related functions: DENSE_RANK, ROW_NUMBER

ROW_NUMBER

Syntax: ROW_NUMBER()

Description: Assigns a unique, sequential number to each row, starting with one, according to the ordering of rows within the window partition. The ROW_NUMBER and RANK functions are similar. ROW_NUMBER numbers all rows sequentially, for example, 1, 2, 3, 4, 5. RANK provides the same numeric value for ties, for example 1, 2, 2, 4, 5.

Related functions: RANK, DENSE_RANK

STDDEV_POP

Syntax: STDDEV_POP([ ALL | DISTINCT ] expression)

Description: By default or with the ALL keyword, returns the population standard deviation of expression over all input rows. Use DISTINCT to return one unique instance of each value.

Example:
-- returns 0.986154
SELECT STDDEV_POP(my_values) FROM (VALUES (0.5), (1.5), (2.2), (3.2)) AS my_values;

Related function: STDDEV_SAMP

STDDEV_SAMP

Syntax: STDDEV_SAMP([ ALL | DISTINCT ] expression)

Description: By default or with the ALL keyword, returns the sample standard deviation of expression over all input rows. Use DISTINCT to return one unique instance of each value.
Example:
-- returns 1.138713
SELECT STDDEV_SAMP(my_values) FROM (VALUES (0.5), (1.5), (2.2), (3.2)) AS my_values;

Related function: STDDEV_POP

SUM

Syntax: SUM([ ALL | DISTINCT ] expression)

Description: By default or with the ALL keyword, returns the sum of expression across all input rows. Use DISTINCT to return one unique instance of each value.

Examples:
-- returns 6
SELECT SUM(my_values) FROM (VALUES (0), (1), (2), (3)) AS my_values;

The following example shows how to use the SUM function to find the total of player scores in a tumbling window.
SELECT window_start, window_end, SUM(points) AS total, MIN(points) as min_points, MAX(points) as max_points
FROM TABLE(TUMBLE(TABLE gaming_player_activity_source, DESCRIPTOR($rowtime), INTERVAL '10' SECOND))
GROUP BY window_start, window_end;

For the full code example, see Aggregate a Stream in a Tumbling Window.

VAR_POP

Syntax: VAR_POP([ ALL | DISTINCT ] expression)

Description: By default or with the ALL keyword, returns the population variance, which is the square of the population standard deviation, of expression over all input rows. Use DISTINCT to return one unique instance of each value.

Example:
-- returns 0.972500
SELECT VAR_POP(my_values) FROM (VALUES (0.5), (1.5), (2.2), (3.2)) AS my_values;

Related function: VAR_SAMP

VAR_SAMP

Syntax: VAR_SAMP([ ALL | DISTINCT ] expression)

Description: By default or with the ALL keyword, returns the sample variance, which is the square of the sample standard deviation, of expression over all input rows. Use DISTINCT to return one unique instance of each value. The VARIANCE function is equivalent to VAR_SAMP.

Example:
-- returns 1.296667
SELECT VAR_SAMP(my_values) FROM (VALUES (0.5), (1.5), (2.2), (3.2)) AS my_values;

Related functions: STDDEV_POP, VARIANCE

VARIANCE

Syntax: VARIANCE([ ALL | DISTINCT ] expression)

Description: Equivalent to VAR_SAMP.
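
VARIANCE has no example of its own here. As a minimal sketch, the following query reuses the input values from the VAR_SAMP example and computes both functions side by side. The derived-table alias `t(my_values)` is an assumption introduced for illustration, to give the column an explicit name; it does not come from the original examples.

```sql
-- Sketch: VARIANCE and VAR_SAMP over the same input rows.
-- The alias t(my_values) is illustrative, not taken from the source examples.
SELECT
  VAR_SAMP(my_values) AS var_samp_result,
  VARIANCE(my_values) AS variance_result
FROM (VALUES (0.5), (1.5), (2.2), (3.2)) AS t(my_values);
```

Because VARIANCE is documented as equivalent to VAR_SAMP, both columns are expected to hold the same value; the VAR_SAMP example documents 1.296667 for this input.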
Other built-in functions¶ Aggregate Functions Collection Functions Comparison Functions Conditional Functions Datetime Functions Hash Functions JSON Functions ML Preprocessing Functions Model Inference Functions Numeric Functions String Functions Table API Functions Related content¶ User-defined Functions Create a User Defined Function Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql AVG([ ALL | DISTINCT ] expression) ``` ```sql -- returns 1.500000 SELECT AVG(my_values) FROM (VALUES (0.0), (1.0), (2.0), (3.0)) AS my_values; ``` ```sql COLLECT([ ALL | DISTINCT ] expression) ``` ```sql COUNT([ ALL ] expression | DISTINCT expression1 [, expression2]*) ``` ```sql -- returns 4 SELECT COUNT(my_values) FROM (VALUES (0), (1), (2), (3)) AS my_values; ``` ```sql CUME_DIST() ``` ```sql DENSE_RANK() ``` ```sql FIRST_VALUE(expression) ``` ```sql -- returns first SELECT FIRST_VALUE(my_values) FROM (VALUES ('first'), ('second'), ('third')) AS my_values; ``` ```sql LAG(expression [, offset] [, default]) ``` ```sql SELECT $rowtime AS row_time , player_id , game_room_id , points , LAG(points, 1) OVER (PARTITION BY player_id ORDER BY $rowtime) previous_points_value FROM gaming_player_activity; ``` ```sql LAST_VALUE(expression) ``` ```sql -- returns third SELECT LAST_VALUE(my_values) FROM (VALUES ('first'), ('second'), ('third')) AS my_values; ``` ```sql LEAD(expression [, offset] [, default]) ``` ```sql LISTAGG(expression [, separator]) ``` ```sql -- returns first,second,third SELECT LISTAGG(my_values) FROM (VALUES ('first'), ('second'), ('third')) AS my_values; ``` ```sql MAX([ ALL | DISTINCT ] expression) ``` ```sql -- returns 3 SELECT MAX(my_values) FROM (VALUES (0), (1), (2), (3)) AS my_values; ``` ```sql SELECT window_start, window_end, SUM(points) AS total, MIN(points) as min_points, MAX(points) as max_points FROM TABLE(TUMBLE(TABLE gaming_player_activity_source, 
DESCRIPTOR($rowtime), INTERVAL '10' SECOND)) GROUP BY window_start, window_end; ``` ```sql MIN([ ALL | DISTINCT ] expression ) ``` ```sql -- returns 0 SELECT MIN(my_values) FROM (VALUES (0), (1), (2), (3)) AS my_values; ``` ```sql SELECT window_start, window_end, SUM(points) AS total, MIN(points) as min_points, MAX(points) as max_points FROM TABLE(TUMBLE(TABLE gaming_player_activity_source, DESCRIPTOR($rowtime), INTERVAL '10' SECOND)) GROUP BY window_start, window_end; ``` ```sql 1 1 2 2 3 4 ``` ```sql PERCENT_RANK() ``` ```sql PERCENT_RANK ``` ```sql ROW_NUMBER() ``` ```sql 1, 2, 3, 4, 5 ``` ```sql 1, 2, 2, 4, 5 ``` ```sql STDDEV_POP([ ALL | DISTINCT ] expression) ``` ```sql -- returns 0.986154 SELECT STDDEV_POP(my_values) FROM (VALUES (0.5), (1.5), (2.2), (3.2)) AS my_values; ``` ```sql STDDEV_SAMP([ ALL | DISTINCT ] expression) ``` ```sql -- returns 1.138713 SELECT STDDEV_SAMP(my_values) FROM (VALUES (0.5), (1.5), (2.2), (3.2)) AS my_values; ``` ```sql SUM([ ALL | DISTINCT ] expression) ``` ```sql -- returns 6 SELECT SUM(my_values) FROM (VALUES (0), (1), (2), (3)) AS my_values; ``` ```sql SELECT window_start, window_end, SUM(points) AS total, MIN(points) as min_points, MAX(points) as max_points FROM TABLE(TUMBLE(TABLE gaming_player_activity_source, DESCRIPTOR($rowtime), INTERVAL '10' SECOND)) GROUP BY window_start, window_end; ``` ```sql VAR_POP([ ALL | DISTINCT ] expression) ``` ```sql -- returns 0.972500 SELECT VAR_POP(my_values) FROM (VALUES (0.5), (1.5), (2.2), (3.2)) AS my_values; ``` ```sql VAR_SAMP([ ALL | DISTINCT ] expression) ``` ```sql -- returns 1.296667 SELECT VAR_SAMP(my_values) FROM (VALUES (0.5), (1.5), (2.2), (3.2)) AS my_values; ``` ```sql VARIANCE([ ALL | DISTINCT ] expression) ``` --- ### SQL Collection Functions in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/functions/collection-functions.html Collection Functions in Confluent Cloud for Apache Flink¶ Confluent 
Cloud for Apache Flink® provides these built-in collection functions to use in Flink SQL queries: ARRAY, ARRAY_AGG, ARRAY_APPEND, ARRAY_CONCAT, ARRAY_CONTAINS, ARRAY_DISTINCT, ARRAY_EXCEPT, ARRAY_INTERSECT, ARRAY_JOIN, ARRAY_MAX, ARRAY_MIN, ARRAY_POSITION, ARRAY_PREPEND, ARRAY_REMOVE, ARRAY_REVERSE, ARRAY_SLICE, ARRAY_SORT, ARRAY_UNION, CARDINALITY(array), CARDINALITY(map), ELEMENT, GROUP_ID, GROUPING, Implicit row constructor, MAP, MAP_ENTRIES, MAP_FROM_ARRAYS, MAP_KEYS, MAP_UNION, MAP_VALUES.

ARRAY

Syntax: ARRAY ‘[’ value1 [, value2 ]* ‘]’

Description: Creates an array from the specified list of values, (value1, value2, ...). Use the bracket syntax, array_name[INT], to return the element at position INT in the array. The index starts at 1.

Example:
-- returns Java
SELECT ARRAY['Java', 'SQL'][1];

ARRAY_AGG

Syntax: ARRAY_AGG([ ALL | DISTINCT ] expression [ RESPECT NULLS | IGNORE NULLS ])

Description: Concatenates the input rows and returns an array, or NULL if there are no input rows. Use the DISTINCT keyword to specify one unique instance of each value. The ALL keyword concatenates all rows. The default is ALL. By default, NULL values are respected. You can use IGNORE NULLS to skip NULL values. Currently, the ORDER BY clause is not supported.

Example:
-- returns:
-- product_name quantities
-- Apple        [3, 7]
-- Orange       [2]
-- Banana       [5, 4]
WITH sales_data (id, product_name, quantity_sold) AS (
  VALUES (1, 'Apple', 3), (2, 'Banana', 5), (3, 'Apple', 7), (4, 'Orange', 2), (5, 'Banana', 4)
)
SELECT product_name, ARRAY_AGG(quantity_sold) AS quantities
FROM sales_data
GROUP BY product_name;

ARRAY_APPEND

Syntax: ARRAY_APPEND(array, element)

Description: Appends an element to the end of the array and returns the result. If array is NULL, the function returns NULL. If element is NULL, the NULL element is added to the end of the array.
Example:
-- returns [SQL,Java,C#]
SELECT ARRAY_APPEND(ARRAY['SQL', 'Java'], 'C#');

ARRAY_CONCAT

Syntax: ARRAY_CONCAT(array1, array2, …)

Description: Returns an array that is the result of concatenating at least one array. The returned array contains all of the elements in the first array, followed by all of the elements in the second array, and so forth, up to the Nth array. If any input array is NULL, the function returns NULL.

Example:
-- returns [SQL,Java,Python,Python,Rust,Haskell,C#]
SELECT ARRAY_CONCAT(ARRAY['SQL', 'Java'], ARRAY['Python'], ARRAY['Python', 'Rust', 'Haskell', 'C#']);

ARRAY_CONTAINS

Syntax: ARRAY_CONTAINS(array, element)

Description: Returns a value indicating whether the element exists in array. Checking for NULL elements in the array is supported. If array is NULL, the ARRAY_CONTAINS function returns NULL. The specified element is cast implicitly to the array’s element type, if necessary.

Example:
-- returns TRUE
SELECT ARRAY_CONTAINS(ARRAY['Java', 'SQL'], 'SQL');

ARRAY_DISTINCT

Syntax: ARRAY_DISTINCT(array)

Description: Returns an array with unique elements. If array is NULL, the ARRAY_DISTINCT function returns NULL. The order of elements in the source array is preserved in the returned array.

Example:
-- returns [SQL,Java,Python]
SELECT ARRAY_DISTINCT(ARRAY['SQL', 'Java', 'SQL', 'Python', 'SQL']);

ARRAY_EXCEPT

Syntax: ARRAY_EXCEPT(array1, array2)

Description: Returns an array that contains the elements from array1 that are not in array2, without duplicates. The order of the elements from array1 is retained. If no elements remain after excluding the elements in array2 from array1, the function returns an empty array. If one or both arguments are NULL, the function returns NULL.
Example:
-- returns [SQL, Java]
SELECT ARRAY_EXCEPT(ARRAY['SQL', 'Java', 'Python', 'Rust'], ARRAY['Python', 'Rust', 'Haskell', 'C#']);

ARRAY_INTERSECT

Syntax: ARRAY_INTERSECT(array1, array2)

Description: Returns an array that contains the elements from array1 that are also in array2, without duplicates. The order of the elements from array1 is retained. If there are no common elements in array1 and array2, the function returns an empty array. If either array is NULL, the function returns NULL.

Example:
-- returns [Python, Rust]
SELECT ARRAY_INTERSECT(ARRAY['SQL', 'Java', 'Python', 'Rust'], ARRAY['Python', 'Rust', 'Haskell', 'C#']);

ARRAY_JOIN

Syntax: ARRAY_JOIN(array, delimiter [, nullReplacement])

Description: Returns a string that represents the concatenation of the elements in array. Elements are cast to their string representation. The delimiter is a string that separates each pair of consecutive elements of the array. The optional nullReplacement is a string that replaces null elements in the array. If nullReplacement is not specified, null elements in the array are omitted from the resulting string. Returns NULL if any of the inputs is NULL.

Example:
-- returns "Java, SQL, Python, not specified"
SELECT ARRAY_JOIN(ARRAY['Java', 'SQL', 'Python', NULL], ', ', 'not specified');

ARRAY_MAX

Syntax: ARRAY_MAX(array)

Description: Returns the maximum value from array, or NULL if array is NULL.

Example:
-- returns 4
SELECT ARRAY_MAX(ARRAY[1, 2, 3, 4]);

ARRAY_MIN

Syntax: ARRAY_MIN(array)

Description: Returns the minimum value from array, or NULL if array is NULL.

Example:
-- returns 1
SELECT ARRAY_MIN(ARRAY[1, 2, 3, 4]);

ARRAY_POSITION

Syntax: ARRAY_POSITION(array, element)

Description: Returns the position of the first occurrence of element in array as an integer. The index is 1-based, so the first element in the array has index 1. Returns 0 if element is not found in array. Returns NULL if either of the arguments is NULL.
Example:
-- returns 2
SELECT ARRAY_POSITION(ARRAY['Java', 'SQL', 'Python'], 'SQL');

ARRAY_PREPEND

Syntax: ARRAY_PREPEND(array, element)

Description: Prepends an element to the beginning of the array and returns the result. If array is NULL, the function returns NULL. If element is NULL, the NULL element is prepended to the beginning of the array.

Example:
-- returns [SQL,Java,Python]
SELECT ARRAY_PREPEND(ARRAY['Java', 'Python'], 'SQL');

ARRAY_REMOVE

Syntax: ARRAY_REMOVE(array, element)

Description: Removes from array all elements that are equal to element. Order of elements is retained. If array is NULL, the function returns NULL.

Example:
-- returns [Java,Python]
SELECT ARRAY_REMOVE(ARRAY['Java', 'SQL', 'Python'], 'SQL');

ARRAY_REVERSE

Syntax: ARRAY_REVERSE(array)

Description: Returns an array that has elements in the reverse order of the elements in array. If array is NULL, the function returns NULL.

Example:
-- returns [Python,SQL,Java]
SELECT ARRAY_REVERSE(ARRAY['Java', 'SQL', 'Python']);

ARRAY_SLICE

Syntax: ARRAY_SLICE(array, start_offset [, end_offset])

Description: Returns a subarray of the input array between start_offset and end_offset, inclusive. The offsets are 1-based, but 0 is also treated as the beginning of the array. Elements of the subarray are returned in the order they appear in array. Positive values are counted from the beginning of the array. Negative values are counted from the end. If end_offset is omitted, this offset is treated as the length of the array. If start_offset is after end_offset, or both are out of array bounds, an empty array is returned. Returns NULL if any input value is NULL.

Example:
-- returns [SQL,Python,C#,JavaScript]
SELECT ARRAY_SLICE(ARRAY['Java', 'SQL', 'Python', 'C#', 'JavaScript', 'Go'], 2, 5);

ARRAY_SORT

Syntax: ARRAY_SORT(array [, ascending_order [, null_first]])

Description: Returns an array that has the elements of array in sorted order. When only array is specified, the function defaults to ascending order with NULLs at the start.
Specifying ascending_order as TRUE orders the array in ascending order, with NULLs first. Setting ascending_order to FALSE orders the array in descending order, with NULLs last. Independently, specifying null_first as TRUE moves NULLs to the beginning, and specifying null_first as FALSE moves NULLs to the end, irrespective of the sorting order. The function returns NULL if any input is NULL.

Example:
-- returns [1,2,3,4,5]
SELECT ARRAY_SORT(ARRAY[5,4,3,2,1]);
-- returns [NULL,SQL,Python,Java,Go,C#]
SELECT ARRAY_SORT(ARRAY['Java', 'SQL', 'Python', NULL, 'Go', 'C#'], FALSE, TRUE);

ARRAY_UNION

Syntax: ARRAY_UNION(array1, array2)

Description: Returns an array that has the elements from the union of array1 and array2. Duplicate elements are removed. If array1 or array2 is NULL, the function returns NULL.

Example:
-- returns [Java,SQL,Python,C#,Go]
SELECT ARRAY_UNION(ARRAY['Java', 'SQL', 'Python'], ARRAY['C#', 'SQL', 'Go']);

CARDINALITY(array)

Syntax: CARDINALITY(array)

Description: Returns the number of elements in the specified array.

Example:
-- returns 5
SELECT CARDINALITY(ARRAY['Java', 'SQL', 'Python', 'Rust', 'C++']);

CARDINALITY(map)

Syntax: CARDINALITY(map)

Description: Returns the number of entries in the specified map.

Example:
-- returns 3
SELECT CARDINALITY(MAP['Java', 5, 'SQL', 4, 'Python', 3]);

ELEMENT

Syntax: ELEMENT(array)

Description: Returns the sole element of the specified array. The cardinality of array must be 1. Returns NULL if array is empty. Throws an exception if array has more than one element.

Example:
-- returns Java
SELECT ELEMENT(ARRAY['Java']);

GROUP_ID

Syntax: GROUP_ID()

Description: Returns an integer that uniquely identifies the combination of grouping keys.

GROUPING

Syntax:
GROUPING(expression1 [, expression2]* )
GROUPING_ID(expression1 [, expression2]* )

Description: Returns a bit vector of the specified grouping expressions.

Implicit row constructor

Syntax: (value1 [, value2]*)

Description: Returns a row created from a list of values, (value1, value2,...).
The implicit row constructor supports arbitrary expressions as fields and requires at least two fields. The explicit row constructor can deal with an arbitrary number of fields but doesn’t support all kinds of field expressions.

Example:
-- returns (1, SQL)
SELECT (1, 'SQL');

MAP

Syntax: MAP [ key1, value1 [, key2, value2 ], ... ]

Description: Returns a map created from the specified list of key-value pairs, ((key1, value1), (key2, value2), ...). Use the bracket syntax, map_name[key], to return the value that corresponds with the specified key.

Example:
-- returns 4
SELECT MAP['Java', 5, 'SQL', 4, 'Python', 3]['SQL'];

MAP_ENTRIES

Syntax: MAP_ENTRIES(map)

Description: Returns an array with all elements in map. Order of elements in the returned array is not guaranteed.

Example:
-- returns [Java,5,SQL,4,Python,3]
SELECT MAP_ENTRIES(MAP['Java', 5, 'SQL', 4, 'Python', 3]);

MAP_FROM_ARRAYS

Syntax: MAP_FROM_ARRAYS(array_of_keys, array_of_values)

Description: Returns a map created from an array of keys and an array of values. The lengths of array_of_keys and array_of_values must be the same.

Example:
-- returns {key1=Python, key2=SQL, key3=Java}
SELECT MAP_FROM_ARRAYS(ARRAY['key1', 'key2', 'key3'], ARRAY['Python', 'SQL', 'Java']);

MAP_KEYS

Syntax: MAP_KEYS(map)

Description: Returns the keys of map as an array. Order of elements in the returned array is not guaranteed.

Example:
-- returns [Java,Python,SQL]
SELECT MAP_KEYS(MAP['Java', 5, 'SQL', 4, 'Python', 3]);

MAP_UNION

Syntax: MAP_UNION(map1, …)

Description: Returns a map created by merging at least one map. The maps must have a common map type. If there are overlapping keys, the value from the later map overwrites the value from the earlier one: map2 overwrites map1, map3 overwrites map2, and mapn overwrites map(n-1). If any of the maps is NULL, the function returns NULL.
Example:
-- returns ['Java', 5, 'SQL', 4, 'Python', 3, 'C#', 2, 'Rust', 1]
SELECT MAP_UNION(MAP['Java', 5, 'SQL', 4, 'Python', 3], MAP['C#', 2, 'Rust', 1]);

MAP_VALUES

Syntax: MAP_VALUES(map)

Description: Returns the values of map as an array. Order of elements in the returned array is not guaranteed.

Example:
-- returns [3,5,4]
SELECT MAP_VALUES(MAP['Java', 5, 'SQL', 4, 'Python', 3]);

#### Code Examples

```sql ARRAY ‘[’ value1 [, value2 ]* ‘]’ ``` ```sql (value1, value2, ...) ``` ```sql array_name[INT] ``` ```sql -- returns Java SELECT ARRAY['Java', 'SQL'][1]; ``` ```sql ARRAY_AGG([ ALL | DISTINCT ] expression [ RESPECT NULLS | IGNORE NULLS ]) ``` ```sql -- returns: -- product_name quantities -- Apple [3, 7] -- Orange [2] -- Banana [5, 4] WITH sales_data (id, product_name, quantity_sold) AS ( VALUES (1, 'Apple', 3), (2, 'Banana', 5), (3, 'Apple', 7), (4, 'Orange', 2), (5, 'Banana', 4) ) SELECT product_name, ARRAY_AGG(quantity_sold) AS quantities FROM sales_data GROUP BY product_name; ``` ```sql ARRAY_APPEND(array, element) ``` ```sql -- returns [SQL,Java,C#] SELECT ARRAY_APPEND(ARRAY['SQL', 'Java'], 'C#'); ``` ```sql ARRAY_CONCAT(array1, array2, …) ``` ```sql -- returns [SQL,Java,Python,Python,Rust,Haskell,C#] SELECT ARRAY_CONCAT(ARRAY['SQL', 'Java'], ARRAY['Python'], ARRAY['Python', 'Rust', 'Haskell', 'C#']); ``` ```sql ARRAY_CONTAINS(array, element) ``` ```sql ARRAY_CONTAINS ``` ```sql -- returns TRUE SELECT ARRAY_CONTAINS(ARRAY['Java', 'SQL'], 'SQL'); ``` ```sql ARRAY_DISTINCT(array) ``` ```sql ARRAY_DISTINCT ```
```sql -- returns [SQL,Java,Python] SELECT ARRAY_DISTINCT(ARRAY['SQL', 'Java', 'SQL', 'Python', 'SQL']); ``` ```sql ARRAY_EXCEPT(array1, array2) ``` ```sql -- returns [SQL, Java] SELECT ARRAY_EXCEPT(ARRAY['SQL', 'Java', 'Python', 'Rust'], ARRAY['Python', 'Rust', 'Haskell', 'C#']); ``` ```sql ARRAY_INTERSECT(array1, array2) ``` ```sql -- returns [Python, Rust] SELECT ARRAY_INTERSECT(ARRAY['SQL', 'Java', 'Python', 'Rust'], ARRAY['Python', 'Rust', 'Haskell', 'C#']); ``` ```sql ARRAY_JOIN(array, delimiter [, nullReplacement]) ``` ```sql nullReplacement ``` ```sql nullReplacement ``` ```sql -- returns "Java, SQL, Python, not specified" SELECT ARRAY_JOIN(ARRAY['Java', 'SQL', 'Python', NULL], ', ', 'not specified'); ``` ```sql ARRAY_MAX(array) ``` ```sql -- returns 4 SELECT ARRAY_MAX(ARRAY[1, 2, 3, 4]); ``` ```sql ARRAY_MIN(array) ``` ```sql -- returns 1 SELECT ARRAY_MIN(ARRAY[1, 2, 3, 4]); ``` ```sql ARRAY_POSITION(array, element) ``` ```sql -- returns 2 SELECT ARRAY_POSITION(ARRAY['Java', 'SQL', 'Python'], 'SQL'); ``` ```sql ARRAY_PREPEND(array, element) ``` ```sql -- returns [SQL,Java,Python] SELECT ARRAY_PREPEND(ARRAY['Java', 'Python'], 'SQL'); ``` ```sql ARRAY_REMOVE(array, element) ``` ```sql -- returns [Java,Python] SELECT ARRAY_REMOVE(ARRAY['Java', 'SQL', 'Python'], 'SQL'); ``` ```sql ARRAY_REVERSE(array) ``` ```sql -- returns [Python,SQL,Java] SELECT ARRAY_REVERSE(ARRAY['Java', 'SQL', 'Python']); ``` ```sql ARRAY_SLICE(array, start_offset [, end_offset]) ``` ```sql start_offset ``` ```sql start_offset ``` ```sql -- returns [SQL,Python,C#,JavaScript] SELECT ARRAY_SLICE(ARRAY['Java', 'SQL', 'Python', 'C#', 'JavaScript', 'Go'], 2, 5); ``` ```sql ARRAY_SORT(array [, ascending_order [, null_first]]) ``` ```sql ascending_order ``` ```sql ascending_order ``` ```sql -- returns [1,2,3,4,5] SELECT ARRAY_SORT(ARRAY[5,4,3,2,1]); -- returns [NULL,SQL,Python,Java,Go,C#] SELECT ARRAY_SORT(ARRAY['Java', 'SQL', 'Python', NULL, 'Go', 'C#'], FALSE, TRUE); ``` ```sql
ARRAY_UNION(array1, array2) ``` ```sql -- returns [Java,SQL,Python,C#,Go] SELECT ARRAY_UNION(ARRAY['Java', 'SQL', 'Python'], ARRAY['C#', 'SQL', 'Go']); ``` ```sql CARDINALITY(array) ``` ```sql -- returns 5 SELECT CARDINALITY(ARRAY['Java', 'SQL', 'Python', 'Rust', 'C++']); ``` ```sql CARDINALITY(map) ``` ```sql -- returns 3 SELECT CARDINALITY(MAP['Java', 5, 'SQL', 4, 'Python', 3]); ``` ```sql ELEMENT(array) ``` ```sql -- returns Java SELECT ELEMENT(ARRAY['Java']); ``` ```sql GROUPING(expression1 [, expression2]* ) GROUPING_ID(expression1 [, expression2]* ) ``` ```sql (value1 [, value2]*) ``` ```sql (value1, value2,...) ``` ```sql -- returns (1, SQL) SELECT (1, 'SQL'); ``` ```sql MAP [ key1, value1 [, key2, value2 ], ... ] ``` ```sql ((key1, value1), (key2, value2), ...) ``` ```sql map_name[key] ``` ```sql -- returns 4 SELECT MAP['Java', 5, 'SQL', 4, 'Python', 3]['SQL']; ``` ```sql MAP_ENTRIES(map) ``` ```sql -- returns [Java,5,SQL,4,Python,3] SELECT MAP_ENTRIES(MAP['Java', 5, 'SQL', 4, 'Python', 3]); ``` ```sql MAP_FROM_ARRAYS(array_of_keys, array_of_values) ``` ```sql array_of_keys ``` ```sql array_of_values ``` ```sql -- returns {key1=Python, key2=SQL, key3=Java} SELECT MAP_FROM_ARRAYS(ARRAY['key1', 'key2', 'key3'], ARRAY['Python', 'SQL', 'Java']); ``` ```sql MAP_KEYS(map) ``` ```sql -- returns [Java,Python,SQL] SELECT MAP_KEYS(MAP['Java', 5, 'SQL', 4, 'Python', 3]); ``` ```sql MAP_UNION(map1, …) ``` ```sql -- returns ['Java', 5, 'SQL', 4, 'Python', 3, 'C#', 2, 'Rust', 1] SELECT MAP_UNION(MAP['Java', 5, 'SQL', 4, 'Python', 3], MAP['C#', 2, 'Rust', 1]); ``` ```sql MAP_VALUES(map) ``` ```sql -- returns [3,5,4] SELECT MAP_VALUES(MAP['Java', 5, 'SQL', 4, 'Python', 3]); ``` --- ### SQL comparison functions in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/functions/comparison-functions.html Comparison Functions in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® provides 
these built-in comparison functions to use in SQL queries: Equality operations, Logical operations, Comparison functions, Conversion functions.

Equality operations

| SQL function | Description |
| --- | --- |
| value1 = value2 | Returns TRUE if value1 is equal to value2. Returns UNKNOWN if value1 or value2 is NULL. |
| value1 <> value2 | Returns TRUE if value1 is not equal to value2. Returns UNKNOWN if value1 or value2 is NULL. |
| value1 > value2 | Returns TRUE if value1 is greater than value2. Returns UNKNOWN if value1 or value2 is NULL. |
| value1 >= value2 | Returns TRUE if value1 is greater than or equal to value2. Returns UNKNOWN if value1 or value2 is NULL. |
| value1 < value2 | Returns TRUE if value1 is less than value2. Returns UNKNOWN if value1 or value2 is NULL. |
| value1 <= value2 | Returns TRUE if value1 is less than or equal to value2. Returns UNKNOWN if value1 or value2 is NULL. |

Logical operations

| Logical operation | Description |
| --- | --- |
| boolean1 OR boolean2 | Returns TRUE if boolean1 is TRUE or boolean2 is TRUE. Supports three-valued logic. For example, TRUE OR CAST(NULL AS BOOLEAN) returns TRUE. |
| boolean1 AND boolean2 | Returns TRUE if boolean1 and boolean2 are both TRUE. Supports three-valued logic. For example, TRUE AND CAST(NULL AS BOOLEAN) returns UNKNOWN. |
| NOT boolean | Returns TRUE if boolean is FALSE; returns FALSE if boolean is TRUE; returns UNKNOWN if boolean is UNKNOWN. |
| boolean IS FALSE | Returns TRUE if boolean is FALSE; returns FALSE if boolean is TRUE or UNKNOWN. |
| boolean IS NOT FALSE | Returns TRUE if boolean is TRUE or UNKNOWN; returns FALSE if boolean is FALSE. |
| boolean IS TRUE | Returns TRUE if boolean is TRUE; returns FALSE if boolean is FALSE or UNKNOWN. |
| boolean IS NOT TRUE | Returns TRUE if boolean is FALSE or UNKNOWN; returns FALSE if boolean is TRUE. |
| boolean IS UNKNOWN | Returns TRUE if boolean is UNKNOWN; returns FALSE if boolean is TRUE or FALSE. |
| boolean IS NOT UNKNOWN | Returns TRUE if boolean is TRUE or FALSE; returns FALSE if boolean is UNKNOWN. |
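
The three-valued logic described for the logical operations can be hard to picture from the rules alone. The following is a minimal sketch, not an official example: it uses CAST(NULL AS BOOLEAN) to produce an UNKNOWN operand, and the column aliases are hypothetical names chosen for illustration.

```sql
-- Sketch: three-valued logic with an UNKNOWN (NULL) boolean operand.
-- Column aliases are illustrative only.
SELECT
  TRUE OR CAST(NULL AS BOOLEAN)                 AS or_result,    -- TRUE: one TRUE operand is enough for OR
  TRUE AND CAST(NULL AS BOOLEAN)                AS and_result,   -- UNKNOWN: AND needs both operands to be TRUE
  (TRUE AND CAST(NULL AS BOOLEAN)) IS UNKNOWN   AS is_unknown,   -- TRUE
  (TRUE AND CAST(NULL AS BOOLEAN)) IS NOT TRUE  AS is_not_true;  -- TRUE: UNKNOWN is not TRUE
```

The IS [NOT] TRUE, FALSE, and UNKNOWN predicates always return TRUE or FALSE, which makes them useful for turning an UNKNOWN result into a definite filter condition.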
Comparison functions

BETWEEN, NOT BETWEEN, IN, NOT IN, IS DISTINCT FROM, IS NOT DISTINCT FROM, IS NULL, IS NOT NULL, LIKE, NOT LIKE, SIMILAR TO, NOT SIMILAR TO, EXISTS

BETWEEN

Checks whether a value is between two other values.

Syntax: value1 BETWEEN [ ASYMMETRIC | SYMMETRIC ] value2 AND value3

Description: By default (or with the ASYMMETRIC keyword), the BETWEEN function returns TRUE if value1 is greater than or equal to value2 and less than or equal to value3. If SYMMETRIC is specified, the BETWEEN function returns TRUE if value1 is inclusively between value2 and value3. When either value2 or value3 is NULL, returns FALSE or UNKNOWN.

Examples:
-- returns FALSE
SELECT 12 BETWEEN 15 AND 12;
-- returns TRUE
SELECT 12 BETWEEN SYMMETRIC 15 AND 12;
-- returns UNKNOWN
SELECT 12 BETWEEN 10 AND NULL;
-- returns FALSE
SELECT 12 BETWEEN NULL AND 10;
-- returns UNKNOWN
SELECT 12 BETWEEN SYMMETRIC NULL AND 12;

NOT BETWEEN

Checks whether a value is not between two other values.

Syntax: value1 NOT BETWEEN [ ASYMMETRIC | SYMMETRIC ] value2 AND value3

Description: By default (or with the ASYMMETRIC keyword), the NOT BETWEEN function returns TRUE if value1 is less than value2 or greater than value3. If SYMMETRIC is specified, the NOT BETWEEN function returns TRUE if value1 is not inclusively between value2 and value3. When either value2 or value3 is NULL, returns TRUE or UNKNOWN.

Examples:
-- returns TRUE
SELECT 12 NOT BETWEEN 15 AND 12;
-- returns FALSE
SELECT 12 NOT BETWEEN SYMMETRIC 15 AND 12;
-- returns UNKNOWN
SELECT 12 NOT BETWEEN NULL AND 15;
-- returns TRUE
SELECT 12 NOT BETWEEN 15 AND NULL;
-- returns UNKNOWN
SELECT 12 NOT BETWEEN SYMMETRIC 12 AND NULL;

EXISTS

Checks whether a query returns a row.

Syntax: EXISTS (sub-query)

Description: The EXISTS function returns TRUE if sub-query returns at least one row. The EXISTS function is supported only if the operation can be rewritten in a join and group operation.
For streaming queries, the operation is rewritten in a join and group operation. The required state to compute the query result might grow indefinitely, depending on the number of distinct input rows. Provide a query configuration with a valid retention interval to prevent excessive state size.

Examples:
SELECT user_id, item_id
FROM user_behavior
WHERE EXISTS (
  SELECT * FROM category
  WHERE category.item_id = user_behavior.item_id
  AND category.name = 'book'
);

IN

Checks whether a value exists in a list.

Syntax:
value1 IN (value2 [, value3]* )
value IN (sub-query)

Description: The IN function returns TRUE if value1 exists in the specified list (value2, value3, ...). If a subquery is specified, the IN function returns TRUE if value is equal to a row returned by sub-query. When (value2, value3, ...) contains NULL, the IN function returns TRUE if the element can be found and UNKNOWN otherwise. Always returns UNKNOWN if value1 is NULL.

Examples:
-- returns FALSE
SELECT 4 IN (1, 2, 3);
-- returns TRUE
SELECT 1 IN (1, 2, NULL);
-- returns UNKNOWN
SELECT 4 IN (1, 2, NULL);

NOT IN

Checks whether a value doesn’t exist in a list.

Syntax:
value1 NOT IN (value2 [, value3]* )
value NOT IN (sub-query)

Description: The NOT IN function returns TRUE if value1 does not exist in the specified list (value2, value3, ...). If a subquery is specified, the NOT IN function returns TRUE if value isn’t equal to a row returned by sub-query. When (value2, value3, ...) contains NULL, the NOT IN function returns FALSE if value1 can be found and UNKNOWN otherwise. Always returns UNKNOWN if value1 is NULL.

Examples:
-- returns TRUE
SELECT 4 NOT IN (1, 2, 3);
-- returns FALSE
SELECT 1 NOT IN (1, 2, NULL);
-- returns UNKNOWN
SELECT 4 NOT IN (1, 2, NULL);

IS DISTINCT FROM

Checks whether two values are different.

Syntax: value1 IS DISTINCT FROM value2

Description: The IS DISTINCT FROM function returns TRUE if two values are different. NULL values are treated as identical.
Examples:
-- returns TRUE
SELECT 1 IS DISTINCT FROM 2;
-- returns TRUE
SELECT 1 IS DISTINCT FROM NULL;
-- returns FALSE
SELECT NULL IS DISTINCT FROM NULL;

IS NOT DISTINCT FROM¶
Checks whether two values are equal.
Syntax: value1 IS NOT DISTINCT FROM value2
Description: The IS NOT DISTINCT FROM function returns TRUE if two values are equal. NULL values are treated as identical.
Examples:
-- returns FALSE
SELECT 1 IS NOT DISTINCT FROM 2;
-- returns FALSE
SELECT 1 IS NOT DISTINCT FROM NULL;
-- returns TRUE
SELECT NULL IS NOT DISTINCT FROM NULL;

IS NULL¶
Checks whether a value is NULL.
Syntax: value IS NULL
Description: The IS NULL function returns TRUE if value is NULL.
Examples:
-- returns FALSE
SELECT 1 IS NULL;
-- returns TRUE
SELECT NULL IS NULL;

IS NOT NULL¶
Checks whether a value is assigned.
Syntax: value IS NOT NULL
Description: The IS NOT NULL function returns TRUE if value is not NULL.
Examples:
-- returns TRUE
SELECT 1 IS NOT NULL;
-- returns FALSE
SELECT NULL IS NOT NULL;

LIKE¶
Checks whether a string matches a pattern.
Syntax: string1 LIKE string2
Description: The LIKE function returns TRUE if string1 matches the pattern specified by string2. The pattern can contain these special characters:
% matches any number of characters
_ matches a single character
Returns UNKNOWN if either string1 or string2 is NULL.
Examples:
-- returns TRUE
SELECT 'book-23' LIKE 'book-%';
-- returns FALSE
SELECT 'book23' LIKE 'book_';
-- returns TRUE
SELECT 'book2' LIKE 'book_';

NOT LIKE¶
Checks whether a string doesn’t match a pattern.
Syntax: string1 NOT LIKE string2 [ ESCAPE char ]
Description: The NOT LIKE function returns TRUE if string1 does not match the pattern specified by string2. The pattern can contain these special characters:
% matches any number of characters
_ matches a single character
Returns UNKNOWN if string1 or string2 is NULL.
Examples:
-- returns FALSE
SELECT 'book-23' NOT LIKE 'book-%';
-- returns TRUE
SELECT 'book23' NOT LIKE 'book_';
-- returns FALSE
SELECT 'book2' NOT LIKE 'book_';

SIMILAR TO¶
Checks whether a string matches a regular expression.
Syntax: string1 SIMILAR TO string2
Description: The SIMILAR TO function returns TRUE if string1 matches the SQL regular expression in string2. The pattern can contain any characters that are valid in regular expressions, like ., which matches any character, *, which matches zero or more occurrences, and +, which matches one or more occurrences. Returns UNKNOWN if string1 or string2 is NULL.
Examples:
-- returns TRUE
SELECT 'book-523' SIMILAR TO 'book-[0-9]+';
-- returns TRUE
SELECT 'bob.dobbs@example.com' SIMILAR TO '%@example.com';

NOT SIMILAR TO¶
Checks whether a string doesn’t match a regular expression.
Syntax: string1 NOT SIMILAR TO string2 [ ESCAPE char ]
Description: The NOT SIMILAR TO function returns TRUE if string1 does not match the SQL regular expression specified by string2. Returns UNKNOWN if string1 or string2 is NULL.
Examples:
-- returns TRUE
SELECT 'book-nan' NOT SIMILAR TO 'book-[0-9]+';
-- returns TRUE
SELECT 'bob.dobbs@company.com' NOT SIMILAR TO '%@example.com';

Conversion functions¶

CAST, TRY_CAST, TYPEOF

CAST¶
Casts a value to a different type.
Syntax: CAST(value AS type)
Description: The CAST function returns the specified value cast to the type specified by type. A cast error throws an exception and fails the job. When performing a cast operation that may fail, like STRING to INT, prefer TRY_CAST so that errors can be handled. If table.exec.legacy-cast-behaviour is enabled, the CAST function behaves like TRY_CAST.
Examples:
-- returns 42
SELECT CAST('42' AS INT);
-- returns NULL of type STRING
SELECT CAST(NULL AS STRING);
-- throws an exception and fails the job
SELECT CAST('not-a-number' AS INT);

TRY_CAST¶
Casts a value to a different type and returns NULL on error.
Syntax: TRY_CAST(value AS type)
Description: Similar to the CAST function, but in case of error, returns NULL rather than failing the job.
Examples:
-- returns 42
SELECT TRY_CAST('42' AS INT);
-- returns NULL of type STRING
SELECT TRY_CAST(NULL AS STRING);
-- returns NULL of type INT
SELECT TRY_CAST('not-a-number' AS INT);
-- returns 0 of type INT
SELECT COALESCE(TRY_CAST('not-a-number' AS INT), 0);

TYPEOF¶
Gets the string representation of a data type.
Syntax:
TYPEOF(input)
TYPEOF(input, force_serializable)
Description: The TYPEOF function returns the string representation of the input expression’s data type. By default, the returned string is a summary string that might omit certain details for readability. If force_serializable is set to TRUE, the string represents a full data type that can be persisted in a catalog. Anonymous, inline data types have no serializable string representation. In these cases, NULL is returned.
Examples:
-- returns "CHAR(13) NOT NULL"
SELECT TYPEOF('a string type');
-- returns "INT NOT NULL"
SELECT TYPEOF(23);
-- returns "DATE NOT NULL"
SELECT TYPEOF(DATE '2023-05-04');
-- returns "NULL"
SELECT TYPEOF(NULL);

Other built-in functions¶
Aggregate Functions, Collection Functions, Comparison Functions, Conditional Functions, Datetime Functions, Hash Functions, JSON Functions, ML Preprocessing Functions, Model Inference Functions, Numeric Functions, String Functions, Table API Functions

Related content¶
User-defined Functions, Create a User Defined Function

Note: This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.
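As a rough sketch of how the conversion and comparison functions above might be combined, the following query parses a string column defensively and compares two nullable columns in a null-safe way. The table `raw_orders` and the columns `amount_text`, `old_status`, and `new_status` are hypothetical names used only for illustration; they don't come from the reference.

```sql
-- Hypothetical table and columns, for illustration only.
SELECT
  -- TRY_CAST returns NULL instead of failing the job on unparseable input;
  -- COALESCE then substitutes a default value.
  COALESCE(TRY_CAST(amount_text AS INT), 0) AS amount,
  -- IS DISTINCT FROM treats two NULLs as identical, so the result is
  -- always TRUE or FALSE, never UNKNOWN.
  old_status IS DISTINCT FROM new_status AS status_changed
FROM raw_orders
-- Rows where amount_text is NULL evaluate to UNKNOWN here and are dropped.
WHERE amount_text NOT LIKE 'test-%';
```

If rows with a NULL `amount_text` should be kept, one option is to add `OR amount_text IS NULL` to the filter, because NOT LIKE returns UNKNOWN for NULL input.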
#### Code Examples

```sql
value1 = value2
```

```sql
value1 <> value2
```

```sql
value1 > value2
```

```sql
value1 >= value2
```

```sql
value1 < value2
```

```sql
value1 <= value2
```

```sql
boolean1 OR boolean2
```

```sql
TRUE || NULL(BOOLEAN)
```

```sql
boolean1 AND boolean2
```

```sql
TRUE && NULL(BOOLEAN)
```

```sql
NOT boolean
```

```sql
boolean IS FALSE
```

```sql
boolean IS NOT FALSE
```

```sql
boolean IS TRUE
```

```sql
boolean IS NOT TRUE
```

```sql
boolean IS UNKNOWN
```

```sql
boolean IS NOT UNKNOWN
```

```sql
value1 BETWEEN [ ASYMMETRIC | SYMMETRIC ] value2 AND value3
```

```sql
-- returns FALSE
SELECT 12 BETWEEN 15 AND 12;
-- returns TRUE
SELECT 12 BETWEEN SYMMETRIC 15 AND 12;
-- returns UNKNOWN
SELECT 12 BETWEEN 10 AND NULL;
-- returns FALSE
SELECT 12 BETWEEN NULL AND 10;
-- returns UNKNOWN
SELECT 12 BETWEEN SYMMETRIC NULL AND 12;
```

```sql
value1 NOT BETWEEN [ ASYMMETRIC | SYMMETRIC ] value2 AND value3
```

```sql
-- returns TRUE
SELECT 12 NOT BETWEEN 15 AND 12;
-- returns FALSE
SELECT 12 NOT BETWEEN SYMMETRIC 15 AND 12;
-- returns UNKNOWN
SELECT 12 NOT BETWEEN NULL AND 15;
-- returns TRUE
SELECT 12 NOT BETWEEN 15 AND NULL;
-- returns UNKNOWN
SELECT 12 NOT BETWEEN SYMMETRIC 12 AND NULL;
```

```sql
EXISTS (sub-query)
```

```sql
SELECT user_id, item_id
FROM user_behavior
WHERE EXISTS (
  SELECT * FROM category
  WHERE category.item_id = user_behavior.item_id
    AND category.name = 'book'
);
```

```sql
value1 IN (value2 [, value3]* )
value IN (sub-query)
```

```sql
-- returns FALSE
SELECT 4 IN (1, 2, 3);
-- returns TRUE
SELECT 1 IN (1, 2, NULL);
-- returns UNKNOWN
SELECT 4 IN (1, 2, NULL);
```

```sql
value1 NOT IN (value2 [, value3]* )
value NOT IN (sub-query)
```

```sql
-- returns TRUE
SELECT 4 NOT IN (1, 2, 3);
-- returns FALSE
SELECT 1 NOT IN (1, 2, NULL);
-- returns UNKNOWN
SELECT 4 NOT IN (1, 2, NULL);
```

```sql
value1 IS DISTINCT FROM value2
```

```sql
-- returns TRUE
SELECT 1 IS DISTINCT FROM 2;
-- returns TRUE
SELECT 1 IS DISTINCT FROM NULL;
-- returns FALSE
SELECT NULL IS DISTINCT FROM NULL;
```

```sql
value1 IS NOT DISTINCT FROM value2
```

```sql
-- returns FALSE
SELECT 1 IS NOT DISTINCT FROM 2;
-- returns FALSE
SELECT 1 IS NOT DISTINCT FROM NULL;
-- returns TRUE
SELECT NULL IS NOT DISTINCT FROM NULL;
```

```sql
value IS NULL
```

```sql
-- returns FALSE
SELECT 1 IS NULL;
-- returns TRUE
SELECT NULL IS NULL;
```

```sql
value IS NOT NULL
```

```sql
-- returns TRUE
SELECT 1 IS NOT NULL;
-- returns FALSE
SELECT NULL IS NOT NULL;
```

```sql
string1 LIKE string2
```

```sql
-- returns TRUE
SELECT 'book-23' LIKE 'book-%';
-- returns FALSE
SELECT 'book23' LIKE 'book_';
-- returns TRUE
SELECT 'book2' LIKE 'book_';
```

```sql
string1 NOT LIKE string2 [ ESCAPE char ]
```

```sql
-- returns FALSE
SELECT 'book-23' NOT LIKE 'book-%';
-- returns TRUE
SELECT 'book23' NOT LIKE 'book_';
-- returns FALSE
SELECT 'book2' NOT LIKE 'book_';
```

```sql
string1 SIMILAR TO string2
```

```sql
-- returns TRUE
SELECT 'book-523' SIMILAR TO 'book-[0-9]+';
-- returns TRUE
SELECT 'bob.dobbs@example.com' SIMILAR TO '%@example.com';
```

```sql
string1 NOT SIMILAR TO string2 [ ESCAPE char ]
```

```sql
-- returns TRUE
SELECT 'book-nan' NOT SIMILAR TO 'book-[0-9]+';
-- returns TRUE
SELECT 'bob.dobbs@company.com' NOT SIMILAR TO '%@example.com';
```

```sql
CAST(value AS type)
```

```sql
-- returns 42
SELECT CAST('42' AS INT);
-- returns NULL of type STRING
SELECT CAST(NULL AS STRING);
-- throws an exception and fails the job
SELECT CAST('not-a-number' AS INT);
```

```sql
TRY_CAST(value AS type)
```
```sql
-- returns 42
SELECT TRY_CAST('42' AS INT);
-- returns NULL of type STRING
SELECT TRY_CAST(NULL AS STRING);
-- returns NULL of type INT
SELECT TRY_CAST('not-a-number' AS INT);
-- returns 0 of type INT
SELECT COALESCE(TRY_CAST('not-a-number' AS INT), 0);
```

```sql
TYPEOF(input)
TYPEOF(input, force_serializable)
```

```sql
-- returns "CHAR(13) NOT NULL"
SELECT TYPEOF('a string type');
-- returns "INT NOT NULL"
SELECT TYPEOF(23);
-- returns "DATE NOT NULL"
SELECT TYPEOF(DATE '2023-05-04');
-- returns "NULL"
SELECT TYPEOF(NULL);
```

---

### SQL conditional functions in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/reference/functions/conditional-functions.html

Conditional Functions in Confluent Cloud for Apache Flink¶

Confluent Cloud for Apache Flink® provides these built-in functions for controlling execution flow in SQL queries: CASE, CASE WHEN CONDITION, COALESCE, GREATEST, IF, IFNULL, IS_ALPHA, IS_DECIMAL, IS_DIGIT, LEAST, NULLIF

CASE¶
Syntax: CASE value WHEN value1_1 [, value1_2]* THEN result1 (WHEN value2_1 [, value2_2 ]* THEN result2)* (ELSE result_z) END
Description: Returns resultX when the specified value is contained in (valueX_1, valueX_2, ...). If no value matches, CASE returns result_z, if it’s provided, otherwise NULL.

CASE WHEN CONDITION¶
Syntax: CASE WHEN condition1 THEN result1 (WHEN condition2 THEN result2)* (ELSE result_z) END
Description: Returns the resultX of the first conditionX that is met. When no condition is met, returns result_z, if it’s provided, otherwise NULL.

COALESCE¶
Syntax: COALESCE(value1 [, value2]*)
Description: Returns the first argument that is not NULL. If all arguments are NULL, the COALESCE function returns NULL. The return type is the least-restrictive, common type of all the arguments. The return type is nullable if all arguments are nullable as well.
Example: The following SELECT statements return the values indicated in the comment lines.
-- Returns 'default'
SELECT COALESCE(NULL, 'default');
-- Returns the first non-null value among column0 and column1,
-- or 'default' if column0 and column1 are both NULL.
SELECT COALESCE(column0, column1, 'default');

GREATEST¶
Syntax: GREATEST(value1[, value2]*)
Description: Returns the greatest value in the specified list of arguments. Returns NULL if any argument is NULL.
Example:
-- returns 4
SELECT GREATEST(1, 2, 3, 4);
-- returns d
SELECT GREATEST('a', 'b', 'c', 'd');

IF¶
Syntax: IF(condition, true_value, false_value)
Description: Returns true_value if condition is met, otherwise false_value.
Example: The following SELECT statement returns the value indicated in the comment line.
-- returns 5
SELECT IF(5 > 3, 5, 3);

IFNULL¶
Syntax: IFNULL(input, null_replacement)
Description: Returns null_replacement if input is NULL; otherwise returns input. The IFNULL function enables passing nullable columns into a function or table that is declared with a NOT NULL constraint. Compared with COALESCE or CASE, the IFNULL function returns a data type that’s specific with respect to nullability. The returned type is the common type of both arguments but only nullable if the null_replacement is nullable. For example, IFNULL(nullable_column, 5) never returns NULL.

IS_ALPHA¶
Syntax: IS_ALPHA(string)
Description: Returns TRUE if all characters in the specified string are alphabetic, otherwise FALSE.
Example:
-- returns FALSE
SELECT IS_ALPHA('42');
-- returns TRUE
SELECT IS_ALPHA('string');

IS_DECIMAL¶
Syntax: IS_DECIMAL(string)
Description: Returns TRUE if the specified string can be parsed to a valid NUMERIC, otherwise FALSE.
Example:
-- returns TRUE
SELECT IS_DECIMAL('23');
-- returns FALSE
SELECT IS_DECIMAL('not a number');

IS_DIGIT¶
Syntax: IS_DIGIT(string)
Description: Returns TRUE if all characters in the specified string are digits, otherwise FALSE.
Example:
-- returns TRUE
SELECT IS_DIGIT('23');
-- returns FALSE
SELECT IS_DIGIT('2 not a digit 3');

LEAST¶
Syntax: LEAST(value1[, value2]*)
Description: Returns the lowest value in the specified list of arguments.
Returns NULL if any argument is NULL.
Example:
-- returns 1
SELECT LEAST(1, 2, 3, 4);
-- returns a
SELECT LEAST('a', 'b', 'c', 'd');

NULLIF¶
Syntax: NULLIF(value1, value2)
Description: Returns NULL if value1 is equal to value2, otherwise returns value1.
Example:
-- returns NULL
SELECT NULLIF(5, 5);
-- returns 5
SELECT NULLIF(5, 0);

#### Code Examples

```sql
CASE value WHEN value1_1 [, value1_2]* THEN result1 (WHEN value2_1 [, value2_2 ]* THEN result2)* (ELSE result_z) END
```

```sql
CASE WHEN condition1 THEN result1 (WHEN condition2 THEN result2)* (ELSE result_z) END
```

```sql
COALESCE(value1 [, value2]*)
```

```sql
-- Returns 'default'
SELECT COALESCE(NULL, 'default');
-- Returns the first non-null value among column0 and column1,
-- or 'default' if column0 and column1 are both NULL.
SELECT COALESCE(column0, column1, 'default');
```

```sql
GREATEST(value1[, value2]*)
```

```sql
-- returns 4
SELECT GREATEST(1, 2, 3, 4);
-- returns d
SELECT GREATEST('a', 'b', 'c', 'd');
```

```sql
IF(condition, true_value, false_value)
```

```sql
-- returns 5
SELECT IF(5 > 3, 5, 3);
```

```sql
IFNULL(input, null_replacement)
```

```sql
IS_ALPHA(string)
```

```sql
-- returns FALSE
SELECT IS_ALPHA('42');
-- returns TRUE
SELECT IS_ALPHA('string');
```

```sql
IS_DECIMAL(string)
```

```sql
-- returns TRUE
SELECT IS_DECIMAL('23');
-- returns FALSE
SELECT IS_DECIMAL('not a number');
```

```sql
IS_DIGIT(string)
```

```sql
-- returns TRUE
SELECT IS_DIGIT('23');
-- returns FALSE
SELECT IS_DIGIT('2 not a digit 3');
```

```sql
LEAST(value1[, value2]*)
```

```sql
-- returns 1
SELECT LEAST(1, 2, 3, 4);
-- returns a
SELECT LEAST('a', 'b', 'c', 'd');
```

```sql
NULLIF(value1, value2)
```

```sql
-- returns NULL
SELECT NULLIF(5, 5);
-- returns 5
SELECT NULLIF(5, 0);
```

---

### SQL Datetime Functions in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/reference/functions/datetime-functions.html

Datetime Functions in Confluent Cloud for Apache Flink¶

Confluent Cloud for Apache Flink® provides these built-in functions for handling date and time logic in SQL queries:

| Date | Time | Timestamp | Utility |
|---|---|---|---|
| CURRENT_DATE | CONVERT_TZ | CURRENT_TIMESTAMP | CEIL |
| DATE_FORMAT | CURRENT_TIME | CURRENT_ROW_TIMESTAMP | CURRENT_WATERMARK |
| DATE | HOUR | LOCALTIMESTAMP | EXTRACT |
| DAYOFMONTH | LOCALTIME | TIMESTAMP | FLOOR |
| DAYOFWEEK | MINUTE | TO_TIMESTAMP | FROM_UNIXTIME |
| DAYOFYEAR | NOW | TO_TIMESTAMP_LTZ | INTERVAL |
| MONTH | SECOND | TIMESTAMPADD | SOURCE_WATERMARK |
| QUARTER | TIME | TIMESTAMPDIFF | OVERLAPS |
| TO_DATE | | UNIX_TIMESTAMP | |
| WEEK | | UNIX_TIMESTAMP | |
| YEAR | | | |

Time interval and point unit specifiers¶

The following table lists specifiers for time interval and time point units.
| Time interval unit | Time point unit |
|---|---|
| MILLENNIUM | |
| CENTURY | |
| DECADE | |
| YEAR | YEAR |
| YEAR TO MONTH | |
| QUARTER | QUARTER |
| MONTH | MONTH |
| WEEK | WEEK |
| DAY | DAY |
| DAY TO HOUR | |
| DAY TO MINUTE | |
| DAY TO SECOND | |
| HOUR | HOUR |
| HOUR TO MINUTE | |
| HOUR TO SECOND | |
| MINUTE | MINUTE |
| MINUTE TO SECOND | |
| SECOND | SECOND |
| MILLISECOND | MILLISECOND |
| MICROSECOND | MICROSECOND |
| NANOSECOND | |
| EPOCH | |
| DOY | |
| DOW | |
| ISODOW | |
| ISOYEAR | |
| | SQL_TSI_YEAR |
| | SQL_TSI_QUARTER |
| | SQL_TSI_MONTH |
| | SQL_TSI_WEEK |
| | SQL_TSI_DAY |
| | SQL_TSI_HOUR |
| | SQL_TSI_MINUTE |
| | SQL_TSI_SECOND |

CEIL¶
Rounds a time point up.
Syntax: CEIL(timepoint TO timeintervalunit)
Description: The CEIL function returns a value that rounds timepoint up to the time unit specified by timeintervalunit.
Example:
-- returns "12:45:00"
SELECT CEIL(TIME '12:44:31' TO MINUTE);
Related function: FLOOR

CONVERT_TZ¶
Converts a datetime from one time zone to another.
Syntax: CONVERT_TZ(string1, string2, string3)
Description: The CONVERT_TZ function converts a datetime string1 that has the default ISO timestamp format, “yyyy-MM-dd HH:mm:ss”, from the time zone specified by string2 to the time zone specified by string3. The format of the time zone arguments is either an abbreviation, like “PST”, a full name, like “America/Los_Angeles”, or a custom ID, like “GMT-08:00”.
Example:
-- returns "1969-12-31 16:00:00"
SELECT CONVERT_TZ('1970-01-01 00:00:00', 'UTC', 'America/Los_Angeles');

CURRENT_DATE¶
Returns the current date.
Syntax: CURRENT_DATE
Description: The CURRENT_DATE function returns the current SQL date in the local time zone. In streaming mode, the current date is evaluated for each record. In batch mode, the current date is evaluated once when the query starts, and CURRENT_DATE returns the same result for every row.
Example:
-- returns the current date
SELECT CURRENT_DATE;

CURRENT_ROW_TIMESTAMP¶
Returns the current timestamp for each row.
Syntax: CURRENT_ROW_TIMESTAMP()
Description: The CURRENT_ROW_TIMESTAMP function returns the current SQL timestamp in the local time zone. The return type is TIMESTAMP_LTZ(3).
The timestamp is evaluated for each row, in both batch and streaming mode.
Example:
-- returns the timestamp of the current datetime
SELECT CURRENT_ROW_TIMESTAMP();

CURRENT_TIME¶
Syntax: CURRENT_TIME
Description: The CURRENT_TIME function returns the current SQL time in the local time zone. The CURRENT_TIME function is equivalent to LOCALTIME.
Example:
-- returns the current time, for example:
-- 13:03:56
SELECT CURRENT_TIME;

CURRENT_TIMESTAMP¶
Syntax: CURRENT_TIMESTAMP
Description: The CURRENT_TIMESTAMP function returns the current SQL timestamp in the local time zone. The return type is TIMESTAMP_LTZ(3). In streaming mode, the current timestamp is evaluated for each record. In batch mode, the current timestamp is evaluated once when the query starts, and CURRENT_TIMESTAMP returns the same result for every row. The CURRENT_TIMESTAMP function is equivalent to NOW.
Example:
-- returns the current timestamp, for example:
-- 2023-10-16 13:04:58.081
SELECT CURRENT_TIMESTAMP;

CURRENT_WATERMARK¶
Gets the current watermark for a rowtime column.
Syntax: CURRENT_WATERMARK(rowtime)
Description: The CURRENT_WATERMARK function returns the current watermark for the given rowtime attribute, or NULL if no common watermark of all upstream operations is available at the current operation in the pipeline. The return type of the function is inferred to match that of the provided rowtime attribute, but with an adjusted precision of 3. For example, if the rowtime attribute is TIMESTAMP_LTZ(9), the function returns TIMESTAMP_LTZ(3). Because this function can return NULL, queries that use it may need to handle that case explicitly. For more information, see watermarks.
Example: The following example shows how to filter out late data by using the CURRENT_WATERMARK function with a rowtime column named ts.
WHERE CURRENT_WATERMARK(ts) IS NULL OR ts > CURRENT_WATERMARK(ts)
Related function: SOURCE_WATERMARK

DATE_FORMAT¶
Converts a timestamp to a formatted string.
Syntax: DATE_FORMAT(timestamp, date_format)
Description: The DATE_FORMAT function converts the specified timestamp to a string value in the format specified by the date_format string. The format string is compatible with the Java SimpleDateFormat class.
Example:
-- returns "5:32 PM, UTC"
SELECT DATE_FORMAT('2023-03-15 17:32:01.009', 'K:mm a, z');

DATE¶
Parses a DATE from a string.
Syntax: DATE string
Description: The DATE function returns a SQL date parsed from the specified string. The date format of the input string must be “yyyy-MM-dd”.
Example:
-- returns "2023-05-23"
SELECT DATE '2023-05-23';

DAYOFMONTH¶
Gets the day of month from a DATE.
Syntax: DAYOFMONTH(date)
Description: The DAYOFMONTH function returns the day of a month from the specified SQL DATE as an integer between 1 and 31. The DAYOFMONTH function is equivalent to EXTRACT(DAY FROM date).
Example:
-- returns 27
SELECT DAYOFMONTH(DATE '1994-09-27');

DAYOFWEEK¶
Gets the day of week from a DATE.
Syntax: DAYOFWEEK(date)
Description: The DAYOFWEEK function returns the day of a week from the specified SQL DATE as an integer between 1 and 7. The DAYOFWEEK function is equivalent to EXTRACT(DOW FROM date).
Example:
-- returns 3
SELECT DAYOFWEEK(DATE '1994-09-27');

DAYOFYEAR¶
Gets the day of year from a DATE.
Syntax: DAYOFYEAR(date)
Description: The DAYOFYEAR function returns the day of a year from the specified SQL DATE as an integer between 1 and 366. The DAYOFYEAR function is equivalent to EXTRACT(DOY FROM date).
Example:
-- returns 270
SELECT DAYOFYEAR(DATE '1994-09-27');

EXTRACT¶
Gets a time interval unit from a datetime.
Syntax: EXTRACT(timeintervalunit FROM temporal)
Description: The EXTRACT function returns a LONG value extracted from the specified timeintervalunit part of temporal.
Example:
-- returns 5
SELECT EXTRACT(DAY FROM DATE '2006-06-05');
Related functions: DAYOFMONTH, DAYOFWEEK, DAYOFYEAR

FLOOR¶
Rounds a time point down.
Syntax: FLOOR(timepoint TO timeintervalunit)
Description: The FLOOR function returns a value that rounds timepoint down to the time unit specified by timeintervalunit.
Example:
-- returns 12:44:00
SELECT FLOOR(TIME '12:44:31' TO MINUTE);
Related function: CEIL

FROM_UNIXTIME¶
Gets a Unix time as a formatted string.
Syntax: FROM_UNIXTIME(numeric[, string])
Description: The FROM_UNIXTIME function returns a representation of the NUMERIC argument as a value in string format. The default format is “yyyy-MM-dd HH:mm:ss”. The specified NUMERIC is an internal timestamp value representing seconds since “1970-01-01 00:00:00” UTC, such as produced by the UNIX_TIMESTAMP function. The return value is expressed in the session time zone (specified in TableConfig).
Example:
-- Returns "1970-01-01 00:00:44" if in the UTC time zone,
-- but returns "1970-01-01 09:00:44" if in the 'Asia/Tokyo' time zone.
SELECT FROM_UNIXTIME(44);

HOUR¶
Gets the hour of day from a timestamp.
Syntax: HOUR(timestamp)
Description: The HOUR function returns the hour of a day from the specified SQL timestamp as an integer between 0 and 23. The HOUR function is equivalent to EXTRACT(HOUR FROM timestamp).
Example:
-- returns 13
SELECT HOUR(TIMESTAMP '1994-09-27 13:14:15');
Related functions: MINUTE, SECOND

INTERVAL¶
Parses an interval string.
Syntax: INTERVAL string range
Description: The INTERVAL function parses an interval string in the form “dd hh:mm:ss.fff” for SQL intervals of milliseconds, or “yyyy-mm” for SQL intervals of months.
For intervals of milliseconds, these interval ranges apply: DAY, MINUTE, DAY TO HOUR, DAY TO SECOND
For intervals of months, these interval ranges apply: YEAR, YEAR TO MONTH
Examples: The following SELECT statements return the values indicated in the comment lines.
-- returns +10 00:00:00.004
SELECT INTERVAL '10 00:00:00.004' DAY TO SECOND;
-- returns +10 00:00:00.000
SELECT INTERVAL '10' DAY;
-- returns +2-10
SELECT INTERVAL '2-10' YEAR TO MONTH;

LOCALTIME¶
Gets the current local time.
Syntax: LOCALTIME
Description: The LOCALTIME function returns the current SQL time in the local time zone. The return type is TIME(0). In streaming mode, the current local time is evaluated for each record. In batch mode, the current local time is evaluated once when the query starts, and LOCALTIME returns the same result for every row.
Example:
-- returns the local machine time as "hh:mm:ss", for example:
-- 13:16:03
SELECT LOCALTIME;

LOCALTIMESTAMP¶
Gets the current timestamp.
Syntax: LOCALTIMESTAMP
Description: The LOCALTIMESTAMP function returns the current SQL timestamp in the local time zone. The return type is TIMESTAMP(3). In streaming mode, the current timestamp is evaluated for each record. In batch mode, the current timestamp is evaluated once when the query starts, and LOCALTIMESTAMP returns the same result for every row.
Example:
-- returns the local machine datetime as "yyyy-mm-dd hh:mm:ss.sss", for example:
-- 2023-10-16 13:15:32.390
SELECT LOCALTIMESTAMP;

MINUTE¶
Gets the minute of hour from a timestamp.
Syntax: MINUTE(timestamp)
Description: The MINUTE function returns the minute of an hour from the specified SQL timestamp as an integer between 0 and 59. The MINUTE function is equivalent to EXTRACT(MINUTE FROM timestamp).
Example:
-- returns 14
SELECT MINUTE(TIMESTAMP '1994-09-27 13:14:15');
Related functions: HOUR, SECOND

MONTH¶
Gets the month of year from a DATE.
Syntax: MONTH(date)
Description: The MONTH function returns the month of a year from the specified SQL date as an integer between 1 and 12. The MONTH function is equivalent to EXTRACT(MONTH FROM date).
Example:
-- returns 9
SELECT MONTH(DATE '1994-09-27');
Related functions: DAYOFMONTH, DAYOFYEAR, WEEK, YEAR

NOW¶
Gets the current timestamp.
Syntax: NOW()
Description: The NOW function returns the current SQL timestamp in the local time zone. The NOW function is equivalent to CURRENT_TIMESTAMP.
Example:
-- returns the local machine datetime as "yyyy-mm-dd hh:mm:ss.sss", for example:
-- 2023-10-16 13:17:54.382
SELECT NOW();

OVERLAPS¶
Checks whether two time intervals overlap.
Syntax: (timepoint1, temporal1) OVERLAPS (timepoint2, temporal2)
Description: The OVERLAPS function returns TRUE if two time intervals defined by (timepoint1, temporal1) and (timepoint2, temporal2) overlap. The temporal values can be either a time point or a time interval.
Example:
-- returns TRUE
SELECT (TIME '2:55:00', INTERVAL '1' HOUR) OVERLAPS (TIME '3:30:00', INTERVAL '2' HOUR);
-- returns FALSE
SELECT (TIME '9:00:00', TIME '10:00:00') OVERLAPS (TIME '10:15:00', INTERVAL '3' HOUR);

QUARTER¶
Gets the quarter of year from a DATE.
Syntax: QUARTER(date)
Description: The QUARTER function returns the quarter of a year from the specified SQL DATE as an integer between 1 and 4. The QUARTER function is equivalent to EXTRACT(QUARTER FROM date).
Example:
-- returns 3
SELECT QUARTER(DATE '1994-09-27');
Related functions: DAYOFMONTH, DAYOFYEAR, WEEK, YEAR

SECOND¶
Gets the second of minute from a TIMESTAMP.
Syntax: SECOND(timestamp)
Description: The SECOND function returns the second of a minute from the specified SQL TIMESTAMP as an integer between 0 and 59. The SECOND function is equivalent to EXTRACT(SECOND FROM timestamp).
Example:
-- returns 15
SELECT SECOND(TIMESTAMP '1994-09-27 13:14:15');
Related functions: HOUR, MINUTE

SOURCE_WATERMARK¶
Provides a default watermark strategy.
Syntax: WATERMARK FOR column AS SOURCE_WATERMARK()
Description: The SOURCE_WATERMARK function provides a default watermark strategy. Watermarks are assigned per Kafka partition in the source operator. They are based on a moving histogram of observed out-of-orderness in the table, that is, the difference between the event time of the current event and the maximum event time seen so far. The watermark is then assigned as the maximum event time seen to this point, minus the 95% quantile of observed out-of-orderness.
In other words, the default watermark strategy aims to assign watermarks so that at most 5% of messages are “late”, meaning they arrive after the watermark. The minimum out-of-orderness is 50 milliseconds. The maximum out-of-orderness is 7 days. The algorithm always considers the out-of-orderness of the last 5000 events per partition. During warmup, before the algorithm has seen 1000 messages (per partition), it applies an additional safety margin to the observed out-of-orderness. The safety margin depends on the number of messages seen so far.

| Number of messages | Safety margin |
|---|---|
| 1 - 250 | 7 days |
| 251 - 500 | 30s |
| 501 - 750 | 10s |
| 751 - 1000 | 1s |

In effect, the algorithm doesn’t provide a usable watermark before it has seen 250 records per partition.
Example:
-- Create a table that has the default watermark strategy
-- on the ts column.
CREATE TABLE t2 ( i INT, ts TIMESTAMP_LTZ(3), WATERMARK FOR ts AS SOURCE_WATERMARK());
-- The queryable schema for the table has the default watermark
-- strategy on the ts column.
( i INT, ts TIMESTAMP_LTZ(3), `$rowtime` TIMESTAMP_LTZ(3) NOT NULL METADATA VIRTUAL COMMENT 'SYSTEM', WATERMARK FOR ts AS SOURCE_WATERMARK() );
Related functions: CURRENT_WATERMARK, Watermark clause

TIME¶
Parses a string to a TIME.
Syntax: TIME string
Description: The TIME function returns a SQL TIME parsed from the specified string. The time format of the input string must be “HH:mm:ss”.
Example:
-- returns 23:42:55 as a TIME
SELECT TIME '23:42:55';

TIMESTAMP¶
Syntax: TIMESTAMP string
Description: The TIMESTAMP function returns a SQL TIMESTAMP parsed from the specified string. The timestamp format of the input string must be “yyyy-MM-dd HH:mm:ss[.SSS]”.
Example:
-- returns 2023-05-04 23:42:55 as a TIMESTAMP
SELECT TIMESTAMP '2023-05-04 23:42:55';

TO_DATE¶
Converts a date string to a DATE.
Syntax: TO_DATE(string1[, string2])
Description: The TO_DATE function converts the date string string1 with format string2 to a DATE. The default format is ‘yyyy-MM-dd’.
Example:
-- returns 2023-05-04 as a DATE
SELECT TO_DATE('2023-05-04');

TO_TIMESTAMP¶
Converts a date string to a TIMESTAMP.
Syntax: TO_TIMESTAMP(string1[, string2])
Description: The TO_TIMESTAMP function converts datetime string string1 with format string2 under the ‘UTC+0’ time zone to a TIMESTAMP. The default format is ‘yyyy-MM-dd HH:mm:ss’.
Example:
-- returns 2023-05-04 23:42:55.000 as a TIMESTAMP
SELECT TO_TIMESTAMP('2023-05-04 23:42:55', 'yyyy-MM-dd HH:mm:ss');

TO_TIMESTAMP_LTZ¶
Converts a Unix time to a TIMESTAMP_LTZ.
Syntax:
TO_TIMESTAMP_LTZ(numeric, precision)
TO_TIMESTAMP_LTZ(string1[, string2[, string3]])
Description: The first version of the TO_TIMESTAMP_LTZ function converts Unix epoch seconds or epoch milliseconds to a TIMESTAMP_LTZ. These are the valid precision values:
0, which represents TO_TIMESTAMP_LTZ(epoch_seconds, 0)
3, which represents TO_TIMESTAMP_LTZ(epoch_milliseconds, 3)
If no precision is provided, the default precision is 3. The second version converts a timestamp string string1 with format string2 (by default ‘yyyy-MM-dd HH:mm:ss.SSS’) in time zone string3 (by default ‘UTC’) to a TIMESTAMP_LTZ. If any input is NULL, the function returns NULL.
Examples:
-- convert 1000 epoch seconds
-- returns 1970-01-01 00:16:40.000 as a TIMESTAMP_LTZ
SELECT TO_TIMESTAMP_LTZ(1000, 0);
-- convert 1000 epoch milliseconds
-- returns 1970-01-01 00:00:01.000 as a TIMESTAMP_LTZ
SELECT TO_TIMESTAMP_LTZ(1000, 3);
-- convert timestamp string with custom format and timezone
-- returns appropriate TIMESTAMP_LTZ based on the timezone
SELECT TO_TIMESTAMP_LTZ('2023-05-04 12:00:00', 'yyyy-MM-dd HH:mm:ss', 'America/Los_Angeles');

TIMESTAMPADD¶
Adds a time interval to a datetime.
Syntax: TIMESTAMPADD(timeintervalunit, interval, timepoint)
Description: Returns the sum of timepoint and the interval number of time units specified by timeintervalunit.
The unit for the interval is given by the first argument, which must be one of the following values: DAY, HOUR, MINUTE, MONTH, SECOND, YEAR
Example:
-- returns 2000-01-01
SELECT TIMESTAMPADD(DAY, 1, DATE '1999-12-31');
-- returns 2000-01-01 01:00:00
SELECT TIMESTAMPADD(HOUR, 2, TIMESTAMP '1999-12-31 23:00:00');

TIMESTAMPDIFF¶
Computes the interval between two datetimes.
Syntax: TIMESTAMPDIFF(timepointunit, timepoint1, timepoint2)
Description: The TIMESTAMPDIFF function returns the (signed) number of timepointunit between timepoint1 and timepoint2. The unit for the interval is given by the first argument, which must be one of the following values: DAY, HOUR, MINUTE, MONTH, SECOND, YEAR
Example:
-- returns -1
SELECT TIMESTAMPDIFF(DAY, DATE '2000-01-01', DATE '1999-12-31');
-- returns -2
SELECT TIMESTAMPDIFF(HOUR, TIMESTAMP '2000-01-01 01:00:00', TIMESTAMP '1999-12-31 23:00:00');

UNIX_TIMESTAMP¶
Gets the current Unix timestamp in seconds.
Syntax: UNIX_TIMESTAMP()
Description: The UNIX_TIMESTAMP function is not deterministic, which means the value is recalculated for each row.
Example:
-- returns Epoch seconds, for example:
-- 1697487923
SELECT UNIX_TIMESTAMP();

UNIX_TIMESTAMP¶
Converts a datetime string to a Unix timestamp.
Syntax: UNIX_TIMESTAMP(string1[, string2])
Description: The UNIX_TIMESTAMP(string) function converts the specified datetime string string1 in format string2 to a Unix timestamp (in seconds), using the time zone specified in table config. The default format is “yyyy-MM-dd HH:mm:ss”. If a time zone is specified in the datetime string and parsed by the UTC+X format, like yyyy-MM-dd HH:mm:ss.SSS X, this function uses the specified timezone in the datetime string instead of the timezone in the table configuration. If the datetime string can’t be parsed, the default value of Long.MIN_VALUE (-9223372036854775808) is returned.
Examples-- returns 1683201600 SELECT UNIX_TIMESTAMP('2023-05-04 12:00:00'); -- Returns 25201 SELECT UNIX_TIMESTAMP('1970-01-01 08:00:01.001', 'yyyy-MM-dd HH:mm:ss.SSS'); -- Returns 1 SELECT UNIX_TIMESTAMP('1970-01-01 08:00:01.001 +0800', 'yyyy-MM-dd HH:mm:ss.SSS X'); -- Returns 25201 SELECT UNIX_TIMESTAMP('1970-01-01 08:00:01.001 +0800', 'yyyy-MM-dd HH:mm:ss.SSS'); -- Returns -9223372036854775808 SELECT UNIX_TIMESTAMP('1970-01-01 08:00:01.001', 'yyyy-MM-dd HH:mm:ss.SSS X'); WEEK¶ Gets the week of year from a DATE. SyntaxWEEK(date) DescriptionThe WEEK function returns the week of a year from the specified SQL DATE as an integer between 1 and 53. The WEEK function is equivalent to EXTRACT(WEEK FROM date). Example-- returns 39 SELECT WEEK(DATE '1994-09-27'); Related functions DAYOFMONTH DAYOFYEAR QUARTER YEAR YEAR¶ Gets the year from a DATE. SyntaxYEAR(date) The YEAR function returns the year from the specified SQL DATE. The YEAR function is equivalent to EXTRACT(YEAR FROM date). Example-- returns 1994 SELECT YEAR(DATE '1994-09-27'); Related functions DAYOFMONTH DAYOFYEAR QUARTER MONTH Other built-in functions¶ Aggregate Functions Collection Functions Comparison Functions Conditional Functions Datetime Functions Hash Functions JSON Functions ML Preprocessing Functions Model Inference Functions Numeric Functions String Functions Table API Functions Related content¶ User-defined Functions Create a User Defined Function Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. 
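The TIMESTAMPADD and TIMESTAMPDIFF functions described above can be combined. The following is a minimal sketch rather than an example from the source page: the timestamp literal and the `minutes_apart` alias are assumptions chosen for illustration.

```sql
-- Shift a timestamp forward by 2 hours with TIMESTAMPADD, then measure
-- the distance from the original value in minutes with TIMESTAMPDIFF.
-- TIMESTAMPDIFF is signed: it counts from timepoint1 to timepoint2,
-- so a later second argument gives a positive result (2 hours = 120 minutes).
SELECT TIMESTAMPDIFF(
  MINUTE,
  TIMESTAMP '2023-05-04 12:00:00',
  TIMESTAMPADD(HOUR, 2, TIMESTAMP '2023-05-04 12:00:00')
) AS minutes_apart;
```

Swapping the two timepoint arguments should flip the sign, in line with the negative results shown in the TIMESTAMPDIFF examples.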
#### Code Examples

```sql
CEIL(timepoint TO timeintervalunit)
```

```sql
-- returns "12:45:00"
SELECT CEIL(TIME '12:44:31' TO MINUTE);
```

```sql
CONVERT_TZ(string1, string2, string3)
```

```sql
-- returns "1969-12-31 16:00:00"
SELECT CONVERT_TZ('1970-01-01 00:00:00', 'UTC', 'America/Los_Angeles');
```

```sql
CURRENT_DATE
```

```sql
-- returns the current date
SELECT CURRENT_DATE;
```

```sql
CURRENT_ROW_TIMESTAMP()
```

```sql
-- returns the timestamp of the current datetime
SELECT CURRENT_ROW_TIMESTAMP();
```

```sql
CURRENT_TIME
```

```sql
-- returns the current time, for example:
-- 13:03:56
SELECT CURRENT_TIME;
```

```sql
CURRENT_TIMESTAMP
```

```sql
-- returns the current timestamp, for example:
-- 2023-10-16 13:04:58.081
SELECT CURRENT_TIMESTAMP;
```

```sql
CURRENT_WATERMARK(rowtime)
```

```sql
WHERE CURRENT_WATERMARK(ts) IS NULL OR ts > CURRENT_WATERMARK(ts)
```

```sql
DATE_FORMAT(timestamp, date_format)
```

```sql
-- returns "5:32 PM, UTC"
SELECT DATE_FORMAT('2023-03-15 17:32:01.009', 'K:mm a, z');
```

```sql
DATE string
```

```sql
-- returns "2023-05-23"
SELECT DATE '2023-05-23';
```

```sql
DAYOFMONTH(date)
```

```sql
EXTRACT(DAY FROM date)
```

```sql
-- returns 27
SELECT DAYOFMONTH(DATE '1994-09-27');
```

```sql
DAYOFWEEK(date)
```

```sql
EXTRACT(DOW FROM date)
```

```sql
-- returns 3
SELECT DAYOFWEEK(DATE '1994-09-27');
```

```sql
DAYOFYEAR(date)
```

```sql
EXTRACT(DOY FROM date)
```

```sql
-- returns 270
SELECT DAYOFYEAR(DATE '1994-09-27');
```

```sql
EXTRACT(timeintervalunit FROM temporal)
```

```sql
-- returns 5
SELECT EXTRACT(DAY FROM DATE '2006-06-05');
```

```sql
FLOOR(timepoint TO timeintervalunit)
```

```sql
-- returns 12:44:00
SELECT FLOOR(TIME '12:44:31' TO MINUTE);
```

```sql
FROM_UNIXTIME(numeric[, string])
```

```sql
-- Returns "1970-01-01 00:00:44" if in the UTC time zone,
-- but returns "1970-01-01 09:00:44" if in the 'Asia/Tokyo' time zone.
SELECT FROM_UNIXTIME(44);
```

```sql
HOUR(timestamp)
```

```sql
EXTRACT(HOUR FROM timestamp)
```

```sql
-- returns 13
SELECT HOUR(TIMESTAMP '1994-09-27 13:14:15');
```

```sql
INTERVAL string range
```

```sql
-- returns +10 00:00:00.004
SELECT INTERVAL '10 00:00:00.004' DAY TO SECOND;

-- returns +10 00:00:00.000
SELECT INTERVAL '10' DAY;

-- returns +2-10
SELECT INTERVAL '2-10' YEAR TO MONTH;
```

```sql
-- returns the local machine time as "hh:mm:ss", for example:
-- 13:16:03
SELECT LOCALTIME;
```

```sql
LOCALTIMESTAMP
```

```sql
-- returns the local machine datetime as "yyyy-mm-dd hh:mm:ss.sss", for example:
-- 2023-10-16 13:15:32.390
SELECT LOCALTIMESTAMP;
```

```sql
MINUTE(timestamp)
```

```sql
EXTRACT(MINUTE FROM timestamp)
```

```sql
-- returns 14
SELECT MINUTE(TIMESTAMP '1994-09-27 13:14:15');
```

```sql
MONTH(date)
```

```sql
EXTRACT(MONTH FROM date)
```

```sql
-- returns 9
SELECT MONTH(DATE '1994-09-27');
```

```sql
-- returns the local machine datetime as "yyyy-mm-dd hh:mm:ss.sss", for example:
-- 2023-10-16 13:17:54.382
SELECT NOW();
```

```sql
(timepoint1, temporal1) OVERLAPS (timepoint2, temporal2)
```

```sql
-- returns TRUE
SELECT (TIME '2:55:00', INTERVAL '1' HOUR) OVERLAPS (TIME '3:30:00', INTERVAL '2' HOUR);

-- returns FALSE
SELECT (TIME '9:00:00', TIME '10:00:00') OVERLAPS (TIME '10:15:00', INTERVAL '3' HOUR);
```

```sql
QUARTER(date)
```

```sql
EXTRACT(QUARTER FROM date)
```

```sql
-- returns 3
SELECT QUARTER(DATE '1994-09-27');
```

```sql
SECOND(timestamp)
```

```sql
EXTRACT(SECOND FROM timestamp)
```

```sql
-- returns 15
SELECT SECOND(TIMESTAMP '1994-09-27 13:14:15');
```

```sql
WATERMARK FOR column AS SOURCE_WATERMARK()
```

```sql
-- Create a table that has the default watermark strategy
-- on the ts column.
CREATE TABLE t2 (
  i INT,
  ts TIMESTAMP_LTZ(3),
  WATERMARK FOR ts AS SOURCE_WATERMARK());

-- The queryable schema for the table has the default watermark
-- strategy on the ts column.
(
  i INT,
  ts TIMESTAMP_LTZ(3),
  `$rowtime` TIMESTAMP_LTZ(3) NOT NULL METADATA VIRTUAL COMMENT 'SYSTEM',
  WATERMARK FOR ts AS SOURCE_WATERMARK()
);
```

```sql
TIME string
```

```sql
-- returns 23:42:55 as a TIME
SELECT TIME '23:42:55';
```

```sql
TIMESTAMP string
```

```sql
-- returns 2023-05-04 23:42:55 as a TIMESTAMP
SELECT TIMESTAMP '2023-05-04 23:42:55';
```

```sql
TO_DATE(string1[, string2])
```

```sql
-- returns 2023-05-04 as a DATE
SELECT TO_DATE('2023-05-04');
```

```sql
TO_TIMESTAMP(string1[, string2])
```

```sql
-- returns 2023-05-04 23:42:55.000 as a TIMESTAMP
SELECT TO_TIMESTAMP('2023-05-04 23:42:55', 'yyyy-MM-dd HH:mm:ss');
```

```sql
TO_TIMESTAMP_LTZ(numeric, precision)
TO_TIMESTAMP_LTZ(string1[, string2[, string3]])
```

```sql
TO_TIMESTAMP_LTZ(epoch_seconds, 0)
```

```sql
TO_TIMESTAMP_LTZ(epoch_milliseconds, 3)
```

```sql
-- convert 1000 epoch seconds
-- returns 1970-01-01 00:16:40.000 as a TIMESTAMP_LTZ
SELECT TO_TIMESTAMP_LTZ(1000, 0);

-- convert 1000 epoch milliseconds
-- returns 1970-01-01 00:00:01.000 as a TIMESTAMP_LTZ
SELECT TO_TIMESTAMP_LTZ(1000, 3);

-- convert timestamp string with custom format and timezone
-- returns appropriate TIMESTAMP_LTZ based on the timezone
SELECT TO_TIMESTAMP_LTZ('2023-05-04 12:00:00', 'yyyy-MM-dd HH:mm:ss', 'America/Los_Angeles');
```

```sql
TIMESTAMPADD(timeintervalunit, interval, timepoint)
```

```sql
-- returns 2000-01-01
SELECT TIMESTAMPADD(DAY, 1, DATE '1999-12-31');

-- returns 2000-01-01 01:00:00
SELECT TIMESTAMPADD(HOUR, 2, TIMESTAMP '1999-12-31 23:00:00');
```

```sql
TIMESTAMPDIFF(timepointunit, timepoint1, timepoint2)
```

```sql
-- returns -1
SELECT TIMESTAMPDIFF(DAY, DATE '2000-01-01', DATE '1999-12-31');

-- returns -2
SELECT TIMESTAMPDIFF(HOUR, TIMESTAMP '2000-01-01 01:00:00', TIMESTAMP '1999-12-31 23:00:00');
```

```sql
UNIX_TIMESTAMP()
```

```sql
-- returns Epoch seconds, for example:
-- 1697487923
SELECT UNIX_TIMESTAMP();
```

```sql
UNIX_TIMESTAMP(string1[, string2])
```

```sql
-- returns 1683201600
SELECT UNIX_TIMESTAMP('2023-05-04 12:00:00');

-- Returns 25201
SELECT UNIX_TIMESTAMP('1970-01-01 08:00:01.001', 'yyyy-MM-dd HH:mm:ss.SSS');

-- Returns 1
SELECT UNIX_TIMESTAMP('1970-01-01 08:00:01.001 +0800', 'yyyy-MM-dd HH:mm:ss.SSS X');

-- Returns 25201
SELECT UNIX_TIMESTAMP('1970-01-01 08:00:01.001 +0800', 'yyyy-MM-dd HH:mm:ss.SSS');

-- Returns -9223372036854775808
SELECT UNIX_TIMESTAMP('1970-01-01 08:00:01.001', 'yyyy-MM-dd HH:mm:ss.SSS X');
```

```sql
EXTRACT(WEEK FROM date)
```

```sql
-- returns 39
SELECT WEEK(DATE '1994-09-27');
```

```sql
EXTRACT(YEAR FROM date)
```

```sql
-- returns 1994
SELECT YEAR(DATE '1994-09-27');
```

---

### SQL hash functions in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/reference/functions/hash-functions.html

Hash Functions in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® provides these built-in functions to generate hash codes in SQL queries: MD5 SHA1 SHA2 SHA224 SHA256 SHA384 SHA512 MD5¶ Gets the MD5 hash of a string. SyntaxMD5(string) DescriptionThe MD5 function returns the MD5 hash of the specified string as a string of 32 hexadecimal digits. Returns NULL if string is NULL. Example-- returns 99dc0ea422440e5b3f675cffe6d... SELECT MD5('string-to-hash'); SHA1¶ Gets the SHA-1 hash of a string. SyntaxSHA1(string) DescriptionThe SHA1 function returns the SHA-1 hash of the specified string as a string of 40 hexadecimal digits. Returns NULL if string is NULL. Example-- returns 771a2b04044f8c51e3383a2675a... SELECT SHA1('string-to-hash'); SHA2¶ Hashes a string with one of the SHA-2 functions.
SyntaxSHA2(string, hashLength) DescriptionThe SHA2 function returns the hash using the SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512). The first argument, string, is the string to be hashed. The second argument, hashLength, is the bit length of the result. These are the valid bit lengths for hashLength: 224 256 384 512 Returns NULL if string or hashLength is NULL. Example-- returns 222145560dbaa2abc1617e2c7ce... SELECT SHA2('string-to-hash', 512); SHA224¶ Gets the SHA-224 hash of a string. SyntaxSHA224(string) DescriptionThe SHA224 function returns the SHA-224 hash of the specified string as a string of 56 hexadecimal digits. Returns NULL if string is NULL. Example-- returns af1f1c988d9154f2ddb6201f60f... SELECT SHA224('string-to-hash'); SHA256¶ Gets the SHA-256 hash of a string. SyntaxSHA256(string) DescriptionThe SHA256 function returns the SHA-256 hash of the specified string as a string of 64 hexadecimal digits. Returns NULL if string is NULL. Example-- returns 2267d414e45335fd02e64057d55... SELECT SHA256('string-to-hash'); SHA384¶ Gets the SHA-384 hash of a string. SyntaxSHA384(string) DescriptionThe SHA384 function returns the SHA-384 hash of the specified string as a string of 96 hexadecimal digits. Returns NULL if string is NULL. Example-- returns 02ba979b23f1b4a098732463ea8... SELECT SHA384('string-to-hash'); SHA512¶ Gets the SHA-512 hash of a string. SyntaxSHA512(string) DescriptionThe SHA512 function returns the SHA-512 hash of the specified string as a string of 128 hexadecimal digits. Returns NULL if string is NULL. Example-- returns 222145560dbaa2abc1617e2c7ce...
SELECT SHA512('string-to-hash');

#### Code Examples

```sql
MD5(string)
```

```sql
-- returns 99dc0ea422440e5b3f675cffe6d...
SELECT MD5('string-to-hash');
```

```sql
SHA1(string)
```

```sql
-- returns 771a2b04044f8c51e3383a2675a...
SELECT SHA1('string-to-hash');
```

```sql
SHA2(string, hashLength)
```

```sql
-- returns 222145560dbaa2abc1617e2c7ce...
SELECT SHA2('string-to-hash', 512);
```

```sql
SHA224(string)
```

```sql
-- returns af1f1c988d9154f2ddb6201f60f...
SELECT SHA224('string-to-hash');
```

```sql
SHA256(string)
```

```sql
-- returns 2267d414e45335fd02e64057d55...
SELECT SHA256('string-to-hash');
```

```sql
SHA384(string)
```

```sql
-- returns 02ba979b23f1b4a098732463ea8...
SELECT SHA384('string-to-hash');
```

```sql
SHA512(string)
```

```sql
-- returns 222145560dbaa2abc1617e2c7ce...
SELECT SHA512('string-to-hash');
```

---

### SQL JSON functions in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/reference/functions/json-functions.html

JSON Functions in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® provides these built-in functions to help with JSON in SQL queries: IS JSON JSON_ARRAY JSON_ARRAYAGG JSON_EXISTS JSON_OBJECT JSON_OBJECTAGG JSON_QUERY JSON_QUOTE JSON_STRING JSON_UNQUOTE JSON_VALUE JSON functions make use of JSON path expressions as described in ISO/IEC TR 19075-6 of the SQL standard. Their syntax is inspired by and adopts many features of ECMAScript, but is neither a subset nor superset of the standard.
Path expressions come in two flavors, lax and strict. When omitted, it defaults to the strict mode. Strict mode is intended to examine data from a schema perspective and will throw errors whenever data does not adhere to the path expression. However, functions like JSON_VALUE allow defining fallback behavior if an error is encountered. Lax mode, on the other hand, is more forgiving and converts errors to empty sequences. The special character $ denotes the root node in a JSON path. Paths can access properties ($.a), array elements ($.a[0].b), or branch over all elements in an array ($.a[*].b). Known Limitations: Not all features of Lax mode are currently supported. This is an upstream bug (CALCITE-4717). Non-standard behavior is not guaranteed. IS JSON¶ Checks whether a string is valid JSON. SyntaxIS JSON [ { VALUE | SCALAR | ARRAY | OBJECT } ] DescriptionThe IS JSON function determines whether the specified string is valid JSON. Providing the optional type argument constrains the type of JSON object to check for validity. The default is VALUE. If the string is valid JSON but not the provided type, IS JSON returns FALSE. ExamplesThe following SELECT statements return TRUE. -- The following statements return TRUE. SELECT '1' IS JSON; SELECT '[]' IS JSON; SELECT '{}' IS JSON; SELECT '"abc"' IS JSON; SELECT '1' IS JSON SCALAR; SELECT '{}' IS JSON OBJECT; The following SELECT statements return FALSE. -- The following statements return FALSE. SELECT 'abc' IS JSON; SELECT '1' IS JSON ARRAY; SELECT '1' IS JSON OBJECT; SELECT '{}' IS JSON SCALAR; SELECT '{}' IS JSON ARRAY; JSON_ARRAY¶ Creates a JSON array string from a list of values. SyntaxJSON_ARRAY([value]* [ { NULL | ABSENT } ON NULL ]) DescriptionThe JSON_ARRAY function returns a JSON string from the specified list of values. The values can be arbitrary expressions. The ON NULL behavior defines how to handle NULL values. If omitted, ABSENT ON NULL is the default. 
Elements that are created from other JSON construction function calls are inserted directly, rather than as a string. This enables building nested JSON structures by using the JSON_OBJECT and JSON_ARRAY construction functions. ExamplesThe following SELECT statements return the values indicated in the comment lines. -- returns '[]' SELECT JSON_ARRAY(); -- returns '[1,"2"]' SELECT JSON_ARRAY(1, '2'); -- Use an expression as a value. SELECT JSON_ARRAY(orders.orderId); -- ON NULL -- returns '[null]' SELECT JSON_ARRAY(CAST(NULL AS STRING) NULL ON NULL); -- ON NULL -- returns '[]' SELECT JSON_ARRAY(CAST(NULL AS STRING) ABSENT ON NULL); -- returns '[[1]]' SELECT JSON_ARRAY(JSON_ARRAY(1)); -- returns '[{"nested_json":{"value":42}}]' SELECT JSON_ARRAY(JSON('{"nested_json": {"value": 42}}')); JSON_ARRAYAGG¶ Aggregates items into a JSON array string. SyntaxJSON_ARRAYAGG(items [ { NULL | ABSENT } ON NULL ]) DescriptionThe JSON_ARRAYAGG function creates a JSON array string by aggregating the specified items into an array. The item expressions can be arbitrary, including other JSON functions. If a value is NULL, the ON NULL behavior defines what to do. If omitted, ABSENT ON NULL is the default. The JSON_ARRAYAGG function isn’t supported in OVER windows, unbounded session windows, or HOP windows. Example-- '["Apple","Banana","Orange"]' SELECT JSON_ARRAYAGG(product) FROM orders; JSON_EXISTS¶ Checks whether a JSON string satisfies a path search criterion. SyntaxJSON_EXISTS(jsonValue, path [ { TRUE | FALSE | UNKNOWN | ERROR } ON ERROR ]) DescriptionThe JSON_EXISTS function determines whether a JSON string satisfies a specified path search criterion. If the ON ERROR behavior is omitted, the default is FALSE ON ERROR. ExamplesThe following SELECT statements return TRUE. -- The following statements return TRUE. SELECT JSON_EXISTS('{"a": true}', '$.a'); SELECT JSON_EXISTS('{"a": [{ "b": 1 }]}', '$.a[0].b'); SELECT JSON_EXISTS('{"a": true}', 'strict $.b' TRUE ON ERROR); The following SELECT statements return FALSE.
-- The following statements return FALSE. SELECT JSON_EXISTS('{"a": true}', '$.b'); SELECT JSON_EXISTS('{"a": true}', 'strict $.b' FALSE ON ERROR); JSON_OBJECT¶ Creates a JSON object string from a list of key-value pairs. SyntaxJSON_OBJECT([[KEY] key VALUE value]* [ { NULL | ABSENT } ON NULL ]) DescriptionThe JSON_OBJECT function creates a JSON object string from the specified list of key-value pairs. Keys must be non-NULL string literals, and values may be arbitrary expressions. The JSON_OBJECT function returns a JSON string. The ON NULL behavior defines how to treat NULL values. If omitted, NULL ON NULL is the default. Values that are created from other JSON construction function calls are inserted directly, rather than as a string. This enables building nested JSON structures by using the JSON_OBJECT and JSON_ARRAY construction functions. ExamplesThe following SELECT statements return the values indicated in the comment lines. -- returns '{}' SELECT JSON_OBJECT(); -- returns '{"K1":"V1","K2":"V2"}' SELECT JSON_OBJECT('K1' VALUE 'V1', 'K2' VALUE 'V2'); -- Use an expression as a value. SELECT JSON_OBJECT('orderNo' VALUE orders.orderId); -- ON NULL -- '{"K1":null}' SELECT JSON_OBJECT(KEY 'K1' VALUE CAST(NULL AS STRING) NULL ON NULL); -- ON NULL -- '{}' SELECT JSON_OBJECT(KEY 'K1' VALUE CAST(NULL AS STRING) ABSENT ON NULL); -- returns '{"K1":{"nested_json":{"value":42}}}' SELECT JSON_OBJECT('K1' VALUE JSON('{"nested_json": {"value": 42}}')); -- returns '{"K1":{"K2":"V"}}' SELECT JSON_OBJECT( KEY 'K1' VALUE JSON_OBJECT( KEY 'K2' VALUE 'V' ) ); JSON_OBJECTAGG¶ Aggregates key-value expressions into a JSON string. SyntaxJSON_OBJECTAGG([KEY] key VALUE value [ { NULL | ABSENT } ON NULL ]) DescriptionThe JSON_OBJECTAGG function creates a JSON object string by aggregating key-value expressions into a single JSON object. The key expression must return a non-nullable character string. Value expressions can be arbitrary, including other JSON functions. Keys must be unique. If a key occurs multiple times, an error is thrown.
If a value is NULL, the ON NULL behavior defines what to do. If omitted, NULL ON NULL is the default. The JSON_OBJECTAGG function isn’t supported in OVER windows. Example JSON_QUERY¶ Gets values from a JSON string. SyntaxJSON_QUERY(jsonValue, path [ RETURNING <dataType> ] [ { WITHOUT | WITH CONDITIONAL | WITH UNCONDITIONAL } [ ARRAY ] WRAPPER ] [ { NULL | EMPTY ARRAY | EMPTY OBJECT | ERROR } ON EMPTY ] [ { NULL | EMPTY ARRAY | EMPTY OBJECT | ERROR } ON ERROR ]) DescriptionThe JSON_QUERY function extracts JSON values from the specified JSON string. The result is returned as a STRING or an ARRAY<STRING>. Use the RETURNING clause to control the return type. The WRAPPER clause specifies whether the extracted value should be wrapped into an array and whether to do so unconditionally or only if the value itself isn’t an array already. The ON EMPTY and ON ERROR clauses specify the behavior if the path expression is empty, or in case an error was raised, respectively. By default, in both cases NULL is returned. Other choices are to use an empty array, an empty object, or to raise an error. ExamplesThe following SELECT statements return the values indicated in the comment lines. -- returns '{ "b": 1 }' SELECT JSON_QUERY('{ "a": { "b": 1 } }', '$.a'); -- returns '[1, 2]' SELECT JSON_QUERY('[1, 2]', '$'); -- returns NULL SELECT JSON_QUERY(CAST(NULL AS STRING), '$'); -- returns array ['c1','c2'] SELECT JSON_QUERY('{"a":[{"c":"c1"},{"c":"c2"}]}', 'lax $.a[*].c' RETURNING ARRAY<STRING>); -- Wrap the result into an array. -- returns '[{}]' SELECT JSON_QUERY('{}', '$' WITH CONDITIONAL ARRAY WRAPPER); -- returns '[1, 2]' SELECT JSON_QUERY('[1, 2]', '$' WITH CONDITIONAL ARRAY WRAPPER); -- returns '[[1, 2]]' SELECT JSON_QUERY('[1, 2]', '$' WITH UNCONDITIONAL ARRAY WRAPPER); -- Scalars must be wrapped to be returned. -- returns NULL SELECT JSON_QUERY(1, '$'); -- returns '[1]' SELECT JSON_QUERY(1, '$' WITH CONDITIONAL ARRAY WRAPPER); -- Behavior if the path expression is empty.
-- returns '{}' SELECT JSON_QUERY('{}', 'lax $.invalid' EMPTY OBJECT ON EMPTY); -- Behavior if the path expression has an error. -- returns '[]' SELECT JSON_QUERY('{}', 'strict $.invalid' EMPTY ARRAY ON ERROR); JSON_QUOTE¶ Quotes a string as a JSON value by wrapping it with double-quote characters. SyntaxJSON_QUOTE(string) DescriptionThe JSON_QUOTE function quotes a string as a JSON value by wrapping it with double-quote characters, escaping interior quote and special characters ('"', '\', '/', 'b', 'f', 'n', 'r', 't'), and returning the result as a string. If string is NULL, the function returns NULL. Example -- returns "SQL string", including the double quotes SELECT JSON_QUOTE('SQL string'); JSON_STRING¶ Serializes a value to JSON. SyntaxJSON_STRING(value) DescriptionThe JSON_STRING function returns a JSON string containing the serialized value. If the value is NULL, the function returns NULL. ExamplesThe following SELECT statements return the values indicated in the comment lines. -- returns NULL SELECT JSON_STRING(CAST(NULL AS INT)); -- returns '1' SELECT JSON_STRING(1); -- returns 'true' SELECT JSON_STRING(TRUE); -- returns '"Hello, World!"' SELECT JSON_STRING('Hello, World!'); -- returns '[1,2]' SELECT JSON_STRING(ARRAY[1, 2]); JSON_UNQUOTE¶ Unquotes a JSON value. SyntaxJSON_UNQUOTE(string) DescriptionThe JSON_UNQUOTE function unquotes a JSON value, unescapes escaped special characters ('"', '\', '/', 'b', 'f', 'n', 'r', 't', 'u'), and returns the result as a string. If string is NULL, the function returns NULL. If string doesn’t start and end with double quotes, or if it starts and ends with double quotes but is not a valid JSON string literal, the value is passed through unmodified. Example -- returns SQL string, unmodified, because the input isn't wrapped in double quotes SELECT JSON_UNQUOTE('SQL string'); JSON_VALUE¶ Gets a value from a JSON string.
SyntaxJSON_VALUE(jsonValue, path [RETURNING <dataType>] [ { NULL | ERROR | DEFAULT <defaultExpr> } ON EMPTY ] [ { NULL | ERROR | DEFAULT <defaultExpr> } ON ERROR ]) DescriptionThe JSON_VALUE function extracts a scalar value from a JSON string. It searches a JSON string with the specified path expression and returns the value if the value at that path is scalar. Non-scalar values can’t be returned. By default, the value is returned as STRING. Use RETURNING to specify a different return type. The following return types are supported: BOOLEAN DOUBLE INTEGER VARCHAR / STRING For empty path expressions or errors, you can define a behavior to return NULL, raise an error, or return a defined default value instead. The defaults are NULL ON EMPTY and NULL ON ERROR, respectively. The default value may be a literal or an expression. If the default value itself raises an error, it falls through to the error behavior for ON EMPTY and raises an error for ON ERROR. For paths that contain special characters, like spaces, you can use ['property'] or ["property"] to select the specified property in a parent object. Be sure to put single or double quotes around the property name. When using JSON_VALUE in SQL, the path is a character parameter that’s already single-quoted, so you must escape the single quotes around the property name, for example, JSON_VALUE('{"a b": "true"}', '$.[''a b'']'). ExamplesThe following SELECT statements return the values indicated in the comment lines.
-- returns "true" SELECT JSON_VALUE('{"a": true}', '$.a'); -- returns TRUE SELECT JSON_VALUE('{"a": true}', '$.a' RETURNING BOOLEAN); -- returns "false" SELECT JSON_VALUE('{"a": true}', 'lax $.b' DEFAULT FALSE ON EMPTY); -- returns "false" SELECT JSON_VALUE('{"a": true}', 'strict $.b' DEFAULT FALSE ON ERROR); -- returns 0.998D SELECT JSON_VALUE('{"a.b": [0.998,0.996]}','$.["a.b"][0]' RETURNING DOUBLE); -- returns "right" SELECT JSON_VALUE('{"contains blank": "right"}', 'strict $.[''contains blank'']' NULL ON EMPTY DEFAULT 'wrong' ON ERROR);

#### Code Examples

```sql
IS JSON [ { VALUE | SCALAR | ARRAY | OBJECT } ]
```

```sql
-- The following statements return TRUE.
SELECT '1' IS JSON;
SELECT '[]' IS JSON;
SELECT '{}' IS JSON;
SELECT '"abc"' IS JSON;
SELECT '1' IS JSON SCALAR;
SELECT '{}' IS JSON OBJECT;
```

```sql
-- The following statements return FALSE.
SELECT 'abc' IS JSON;
SELECT '1' IS JSON ARRAY;
SELECT '1' IS JSON OBJECT;
SELECT '{}' IS JSON SCALAR;
SELECT '{}' IS JSON ARRAY;
```

```sql
JSON_ARRAY([value]* [ { NULL | ABSENT } ON NULL ])
```

```sql
-- returns '[]'
SELECT JSON_ARRAY();

-- returns '[1,"2"]'
SELECT JSON_ARRAY(1, '2');

-- Use an expression as a value.
SELECT JSON_ARRAY(orders.orderId);

-- ON NULL
-- returns '[null]'
SELECT JSON_ARRAY(CAST(NULL AS STRING) NULL ON NULL);

-- ON NULL
-- returns '[]'
SELECT JSON_ARRAY(CAST(NULL AS STRING) ABSENT ON NULL);

-- returns '[[1]]'
SELECT JSON_ARRAY(JSON_ARRAY(1));

-- returns '[{"nested_json":{"value":42}}]'
SELECT JSON_ARRAY(JSON('{"nested_json": {"value": 42}}'));
```

```sql
JSON_ARRAYAGG(items [ { NULL | ABSENT } ON NULL ])
```

```sql
-- '["Apple","Banana","Orange"]'
SELECT JSON_ARRAYAGG(product) FROM orders;
```

```sql
JSON_EXISTS(jsonValue, path [ { TRUE | FALSE | UNKNOWN | ERROR } ON ERROR ])
```

```sql
-- The following statements return TRUE.
SELECT JSON_EXISTS('{"a": true}', '$.a');
SELECT JSON_EXISTS('{"a": [{ "b": 1 }]}', '$.a[0].b');
SELECT JSON_EXISTS('{"a": true}', 'strict $.b' TRUE ON ERROR);
```

```sql
-- The following statements return FALSE.
SELECT JSON_EXISTS('{"a": true}', '$.b');
SELECT JSON_EXISTS('{"a": true}', 'strict $.b' FALSE ON ERROR);
```

```sql
JSON_OBJECT([[KEY] key VALUE value]* [ { NULL | ABSENT } ON NULL ])
```

```sql
-- returns '{}'
SELECT JSON_OBJECT();

-- returns '{"K1":"V1","K2":"V2"}'
SELECT JSON_OBJECT('K1' VALUE 'V1', 'K2' VALUE 'V2');

-- Use an expression as a value.
SELECT JSON_OBJECT('orderNo' VALUE orders.orderId);

-- ON NULL
-- '{"K1":null}'
SELECT JSON_OBJECT(KEY 'K1' VALUE CAST(NULL AS STRING) NULL ON NULL);

-- ON NULL
-- '{}'
SELECT JSON_OBJECT(KEY 'K1' VALUE CAST(NULL AS STRING) ABSENT ON NULL);

-- returns '{"K1":{"nested_json":{"value":42}}}'
SELECT JSON_OBJECT('K1' VALUE JSON('{"nested_json": {"value": 42}}'));

-- returns '{"K1":{"K2":"V"}}'
SELECT JSON_OBJECT(
  KEY 'K1'
  VALUE JSON_OBJECT(
    KEY 'K2'
    VALUE 'V'
  )
);
```

```sql
JSON_OBJECTAGG([KEY] key VALUE value [ { NULL | ABSENT } ON NULL ])
```

```sql
JSON_QUERY(jsonValue, path [ RETURNING <dataType> ] [ { WITHOUT | WITH CONDITIONAL | WITH UNCONDITIONAL } [ ARRAY ] WRAPPER ] [ { NULL | EMPTY ARRAY | EMPTY OBJECT | ERROR } ON EMPTY ] [ { NULL | EMPTY ARRAY | EMPTY OBJECT | ERROR } ON ERROR ])
```

```sql
-- returns '{ "b": 1 }'
SELECT JSON_QUERY('{ "a": { "b": 1 } }', '$.a');

-- returns '[1, 2]'
SELECT JSON_QUERY('[1, 2]', '$');

-- returns NULL
SELECT JSON_QUERY(CAST(NULL AS STRING), '$');

-- returns array ['c1','c2']
SELECT JSON_QUERY('{"a":[{"c":"c1"},{"c":"c2"}]}', 'lax $.a[*].c' RETURNING ARRAY<STRING>);

-- Wrap the result into an array.
-- returns '[{}]'
SELECT JSON_QUERY('{}', '$' WITH CONDITIONAL ARRAY WRAPPER);

-- returns '[1, 2]'
SELECT JSON_QUERY('[1, 2]', '$' WITH CONDITIONAL ARRAY WRAPPER);

-- returns '[[1, 2]]'
SELECT JSON_QUERY('[1, 2]', '$' WITH UNCONDITIONAL ARRAY WRAPPER);

-- Scalars must be wrapped to be returned.
-- returns NULL
SELECT JSON_QUERY(1, '$');

-- returns '[1]'
SELECT JSON_QUERY(1, '$' WITH CONDITIONAL ARRAY WRAPPER);

-- Behavior if the path expression is empty.
-- returns '{}'
SELECT JSON_QUERY('{}', 'lax $.invalid' EMPTY OBJECT ON EMPTY);

-- Behavior if the path expression has an error.
-- returns '[]'
SELECT JSON_QUERY('{}', 'strict $.invalid' EMPTY ARRAY ON ERROR);
```

```sql
JSON_QUOTE(string)
```

```sql
-- returns "SQL string", including the double quotes
SELECT JSON_QUOTE('SQL string');
```

```sql
JSON_STRING(value)
```

```sql
-- returns NULL
SELECT JSON_STRING(CAST(NULL AS INT));

-- returns '1'
SELECT JSON_STRING(1);

-- returns 'true'
SELECT JSON_STRING(TRUE);

-- returns '"Hello, World!"'
SELECT JSON_STRING('Hello, World!');

-- returns '[1,2]'
SELECT JSON_STRING(ARRAY[1, 2]);
```

```sql
JSON_UNQUOTE(string)
```

```sql
-- returns SQL string, unmodified, because the input isn't wrapped in double quotes
SELECT JSON_UNQUOTE('SQL string');
```

```sql
JSON_VALUE(jsonValue, path [RETURNING <dataType>] [ { NULL | ERROR | DEFAULT <defaultExpr> } ON EMPTY ] [ { NULL | ERROR | DEFAULT <defaultExpr> } ON ERROR ])
```

```sql
JSON_VALUE('{"a b": "true"}', '$.[''a b'']')
```

```sql
-- returns "true"
SELECT JSON_VALUE('{"a": true}', '$.a');

-- returns TRUE
SELECT JSON_VALUE('{"a": true}', '$.a' RETURNING BOOLEAN);

-- returns "false"
SELECT JSON_VALUE('{"a": true}', 'lax $.b' DEFAULT FALSE ON EMPTY);

-- returns "false"
SELECT JSON_VALUE('{"a": true}', 'strict $.b' DEFAULT FALSE ON ERROR);

-- returns 0.998D
SELECT JSON_VALUE('{"a.b": [0.998,0.996]}','$.["a.b"][0]' RETURNING DOUBLE);

-- returns "right"
SELECT JSON_VALUE('{"contains blank": "right"}', 'strict $.[''contains blank'']' NULL ON EMPTY DEFAULT 'wrong' ON ERROR);
```

---

### Machine-Learning Preprocessing Functions in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/reference/functions/ml-preprocessing-functions.html

Machine-Learning Preprocessing Functions in Confluent Cloud for Apache Flink¶ The following built-in functions are available for ML preprocessing in Confluent Cloud for Apache Flink®. These functions help transform features into representations more suitable for downstream processors.
- ML_BUCKETIZE
- ML_CHARACTER_TEXT_SPLITTER
- ML_FILE_FORMAT_TEXT_SPLITTER
- ML_LABEL_ENCODER
- ML_MAX_ABS_SCALER
- ML_MIN_MAX_SCALER
- ML_NGRAMS
- ML_NORMALIZER
- ML_ONE_HOT_ENCODER
- ML_RECURSIVE_TEXT_SPLITTER
- ML_ROBUST_SCALER
- ML_STANDARD_SCALER

ML_BUCKETIZE

Bucketizes numerical values into discrete bins based on split points.

Syntax: ML_BUCKETIZE(value, splitBucketPoints [, bucketNames])

Description: The ML_BUCKETIZE function divides numerical values into discrete buckets based on specified split points. Each bucket represents a range of values, and the function returns the bucket index or name for each input value.

Arguments:
- value: Numerical expression to be bucketized. If the input value is NaN or NULL, it is bucketized to the NULL bucket.
- splitBucketPoints: Array of numerical values that define the bucket boundaries, or split points. If the splitBucketPoints array is empty, an exception is thrown. Any split points that are NaN or NULL are removed from the splitBucketPoints array. splitBucketPoints must be in ascending order, or an exception is thrown. Duplicates are removed from splitBucketPoints.
- bucketNames: (Optional) Array of names of the buckets defined in splitBucketPoints. If the bucketNames array is not provided, buckets are named bin_NULL, bin_1, bin_2 … bin_n, with n being the total number of buckets in splitBucketPoints. If the bucketNames array is provided, names must be in the same order as in the splitBucketPoints array. Names for all of the buckets must be provided, including the NULL bucket, or an exception is thrown. If the bucketNames array is provided, the first name is the name for the NULL bucket.

Example:
-- returns 'bin_2'
SELECT ML_BUCKETIZE(2, ARRAY[1, 4, 7]);
-- returns 'b2'
SELECT ML_BUCKETIZE(2, ARRAY[1, 4, 7], ARRAY['b_null','b1','b2','b3','b4']);

ML_CHARACTER_TEXT_SPLITTER

Splits text into chunks based on character count and separators.
Syntax: ML_CHARACTER_TEXT_SPLITTER(text, chunkSize, chunkOverlap, separator, isSeparatorRegex [, trimWhitespace] [, keepSeparator] [, separatorPosition])

Description: The ML_CHARACTER_TEXT_SPLITTER function splits text into chunks based on character count and specified separators. This is useful for processing large text documents into smaller, manageable pieces. If any argument other than text is NULL, an exception is thrown. The returned array of chunks has the same order as the input. The function tries to keep every chunk within the chunkSize limit, but if a chunk exceeds the limit, it is returned as is.

Arguments:
- text: The input text to be split. If the input text is NULL, it is returned as is.
- chunkSize: The size of each chunk. If chunkSize < 0 or chunkOverlap > chunkSize, an exception is thrown.
- chunkOverlap: The number of overlapping characters between chunks. If chunkOverlap < 0, an exception is thrown.
- separator: The separator used for splitting.
- isSeparatorRegex: Whether the separator is a regex pattern.
- trimWhitespace: (Optional) Whether to trim whitespace from chunks. The default is TRUE.
- keepSeparator: (Optional) Whether to keep the separator in the chunks. The default is FALSE.
- separatorPosition: (Optional) The position of the separator. Valid values are START or END. The default is START. START means place the separator at the start of the following chunk, and END means place the separator at the end of the previous chunk.

Example:
-- returns ['This is the text I would like to ch', 'o chunk up. It is the example text ', 'ext for this exercise']
SELECT ML_CHARACTER_TEXT_SPLITTER('This is the text I would like to chunk up. It is the example text for this exercise', 35, 4, '', TRUE, FALSE, TRUE, 'END');

ML_FILE_FORMAT_TEXT_SPLITTER

Splits text into chunks based on specific file format patterns.
Syntax: ML_FILE_FORMAT_TEXT_SPLITTER(text, chunkSize, chunkOverlap, formatName [, trimWhitespace] [, keepSeparator] [, separatorPosition])

Description: The ML_FILE_FORMAT_TEXT_SPLITTER function splits text into chunks based on specific file format patterns. It uses format-specific separators to split code or structured text intelligently. The returned array of chunks has the same order as the input. The function starts splitting the chunks with the first separator in the separators list. If a chunk is bigger than chunkSize, the function splits the chunk recursively using the next separator in the separators list for the given file format. If the separators are exhausted and the remaining text is bigger than chunkSize, the function returns the smallest chunk possible, even though it is bigger than chunkSize.

Arguments:
- text: The input text to be split. If the input text is NULL, it is returned as is.
- chunkSize: The size of each chunk. If chunkSize < 0 or chunkOverlap > chunkSize, an exception is thrown.
- chunkOverlap: The number of overlapping characters between chunks. If chunkOverlap < 0, an exception is thrown.
- formatName: ENUM of the format names. Valid values are: C, CPP, CSHARP, ELIXIR, GO, HTML, JAVA, JAVASCRIPT, JSON, KOTLIN, LATEX, MARKDOWN, PHP, PYTHON, RUBY, RUST, SCALA, SQL, SWIFT, TYPESCRIPT, XML.
- trimWhitespace: (Optional) Whether to trim whitespace from chunks. The default is TRUE.
- keepSeparator: (Optional) Whether to keep the separator in the chunks. The default is FALSE.
- separatorPosition: (Optional) The position of the separator. Valid values are START or END. The default is START. START means place the separator at the start of the following chunk, and END means place the separator at the end of the previous chunk.
Example:
-- returns ['def hello_world():\n print("Hello, World!")', '# Call the function\nhello_world()']
SELECT ML_FILE_FORMAT_TEXT_SPLITTER('def hello_world():\n print("Hello, World!")\n\n# Call the function\nhello_world()\n', 50, 0, 'PYTHON');

ML_LABEL_ENCODER

Encodes categorical variables into numerical labels.

Syntax: ML_LABEL_ENCODER(input, categories [, includeZeroLabel])

Description: The ML_LABEL_ENCODER function encodes categorical variables into numerical labels. Each unique category is assigned a unique integer label.

Arguments:
- input: Input value to encode. If the input value is NULL, NaN, or Infinity, it is considered in the unknown category, which is given the 0 label. If the input value is not one of the categories, it is labeled as -1 or 0 depending on includeZeroLabel: -1 if includeZeroLabel is TRUE and 0 if includeZeroLabel is FALSE.
- categories: Array of category values to encode the input value to. Category values must be the same type as the input value. If the categories array is empty, all inputs are considered to be in the unknown category, which is given the 0 label. The categories array can’t be NULL, or an exception is thrown. The categories array can’t have NULL or duplicate values, or an exception is thrown. The categories array must be sorted in ascending lexicographical order, or an exception is thrown.
- includeZeroLabel: (Optional) Whether the index for valid categories starts at 0. The default is FALSE. If includeZeroLabel is TRUE, the valid categories index starts at 0, and unknown values are labeled as -1. If includeZeroLabel is FALSE, the valid categories index starts at 1, and unknown values are labeled as 0.

Example:
-- returns 1
SELECT ML_LABEL_ENCODER('abc', ARRAY['abc', 'def', 'efg', 'hikj']);
-- returns 0
SELECT ML_LABEL_ENCODER('abc', ARRAY['abc', 'def', 'efg', 'hikj'], TRUE );

ML_MAX_ABS_SCALER

Scales numerical values by their maximum absolute value.
Syntax: ML_MAX_ABS_SCALER(value, absoluteMax)

Description: The ML_MAX_ABS_SCALER function scales numerical values by dividing them by the maximum absolute value. This preserves zero entries in sparse data.

Arguments:
- value: Numerical expression to be scaled. If the input value is NULL, NaN, or Infinity, it is returned as is.
- absoluteMax: Maximum absolute value of the feature data seen in the dataset. If absoluteMax is NULL or NaN, an exception is thrown. If absoluteMax is Infinity, 0 is returned. If absoluteMax is 0, the scaled value is returned as is.

Example:
-- returns 0.2
SELECT ML_MAX_ABS_SCALER(1, 5);

ML_MIN_MAX_SCALER

Scales numerical values to a specified range using min-max normalization.

Syntax: ML_MIN_MAX_SCALER(value, min, max)

Description: The ML_MIN_MAX_SCALER function scales numerical values to a specified range using min-max normalization. The function transforms values to the range [0, 1] by default, or to a custom range if min and max are specified.

Arguments:
- value: Numerical expression to be scaled. If the input value is NULL, NaN, or Infinity, it is returned as is. If value > max, it is set to 1.0. If value < min, it is set to 0.0. If max == min, the range is set to 1.0 to avoid division by zero.
- min: Minimum value of the feature data seen in the dataset. If min is NULL, NaN, or Infinity, an exception is thrown.
- max: Maximum value of the feature data seen in the dataset. If max is NULL, NaN, or Infinity, an exception is thrown. If max < min, an exception is thrown.

Example:
-- returns 0.25
SELECT ML_MIN_MAX_SCALER(2, 1, 5);

ML_NGRAMS

Generates n-grams from an array of strings.

Syntax: ML_NGRAMS(input [, nValue] [, separator])

Description: The ML_NGRAMS function generates n-grams from an array of strings. N-grams are contiguous sequences of n items from a given sample of text. The ordering of the returned output is the same as the input array.

Arguments:
- input: Array of CHAR or VARCHAR to return n-grams for.
NULL elements in the input array are ignored when forming n-grams. If the input array is NULL or empty, an empty n-grams array is returned. Empty strings in the input array are treated as is. Strings with only whitespace are treated as empty strings.
- nValue: (Optional) N value of the n-gram function. The default is 2. If nValue < 1, an exception is thrown. If nValue > input.size(), an empty n-grams array is returned.
- separator: (Optional) Characters to join n-gram values with. The default is whitespace.

Example:
-- returns ['ab', 'cd', 'de', 'pwe']
SELECT ML_NGRAMS(ARRAY['ab', 'cd', 'de', 'pwe'], 1, '#');
-- returns ['ab#cd', 'cd#de']
SELECT ML_NGRAMS(ARRAY['ab','cd','de', NULL], 2, '#');

ML_NORMALIZER

Normalizes numerical values using p-norm normalization.

Syntax: ML_NORMALIZER(value, normValue)

Description: The ML_NORMALIZER function normalizes numerical values using p-norm normalization. This scales each sample to have unit norm.

Arguments:
- value: Numerical expression to be scaled. If the input value is NULL, NaN, or Infinity, it is returned as is.
- normValue: Calculated norm value of the feature data using p-norm. If normValue is NULL or NaN, an exception is thrown. If normValue is Infinity, 0 is returned. If normValue is 0, which is only possible when all the values are 0, the input value is returned as is.

Example:
-- returns 0.6
SELECT ML_NORMALIZER(3.0, 5.0);

ML_ONE_HOT_ENCODER

Encodes categorical variables into a binary vector representation.

Syntax: ML_ONE_HOT_ENCODER(input, categories [, dropLast] [, handleUnknown])

Description: The ML_ONE_HOT_ENCODER function encodes categorical variables into a binary vector representation. Each category is represented by a binary vector where only one element is 1 and the rest are 0.

Arguments:
- input: Input value to encode. If the input value is NULL, it is considered to be in the unknown category.
- categories: Array of category values to encode the input value to. The input argument must be of the same type as the categories array.
If the categories array is empty, an exception is thrown. The categories array can’t be NULL, or an exception is thrown. The categories array can’t have NULL or duplicate values, or an exception is thrown.
- dropLast: (Optional) Whether to drop the last category. The default is TRUE. By default, the last category is dropped, to prevent perfectly collinear features.
- handleUnknown: (Optional) ERROR, IGNORE, KEEP options to indicate how to handle unknown values. The default is IGNORE. If handleUnknown is ERROR, an exception is thrown when the input is an unknown value. If handleUnknown is IGNORE, unknown values are ignored and values of all the columns are 0. If handleUnknown is KEEP, the unknown category column has value 1. If handleUnknown is KEEP, the last column is for the unknown category.

Example:
-- returns [1, 0, 0, 0]
SELECT ML_ONE_HOT_ENCODER('abc', ARRAY['abc', 'def', 'efg', 'hikj']);
-- returns [0, 0, 0, 0, 1]
SELECT ML_ONE_HOT_ENCODER('abcd', ARRAY['abc', 'def', 'efg', 'hik'], TRUE, 'KEEP' );

ML_RECURSIVE_TEXT_SPLITTER

Splits text into chunks using multiple separators recursively.

Syntax: ML_RECURSIVE_TEXT_SPLITTER(text, chunkSize, chunkOverlap [, separators] [, isSeparatorRegex] [, trimWhitespace] [, keepSeparator] [, separatorPosition])

Description: The ML_RECURSIVE_TEXT_SPLITTER function splits text into chunks using multiple separators recursively. It starts with the first separator and recursively applies subsequent separators if chunks are still too large. If any argument other than text is NULL, an exception is thrown. The returned array of chunks has the same order as the input.

Arguments:
- text: The input text to be split. If the input text is NULL, it is returned as is.
- chunkSize: The size of each chunk. If chunkSize < 0 or chunkOverlap > chunkSize, an exception is thrown.
- chunkOverlap: The number of overlapping characters between chunks. If chunkOverlap < 0, an exception is thrown.
- separators: (Optional) The list of separators used for splitting.
The default is ["\n\n", "\n", " ", ""].
- isSeparatorRegex: (Optional) Whether the separator is a regex pattern. The default is FALSE.
- trimWhitespace: (Optional) Whether to trim whitespace from chunks. The default is TRUE.
- keepSeparator: (Optional) Whether to keep the separator in the chunks. The default is FALSE.
- separatorPosition: (Optional) The position of the separator. Valid values are START or END. The default is START. START means place the separator at the start of the following chunk, and END means place the separator at the end of the previous chunk.

Example:
-- returns ['Hello', '. world', '!']
SELECT ML_RECURSIVE_TEXT_SPLITTER('Hello. world!', 0, 0, ARRAY['[!]','[.]'], TRUE, TRUE, TRUE, 'START');

ML_ROBUST_SCALER

Scales numerical values using statistics that are robust to outliers.

Syntax: ML_ROBUST_SCALER(value, median, firstQuartile, thirdQuartile [, withCentering] [, withScaling])

Description: The ML_ROBUST_SCALER function scales numerical values using statistics that are robust to outliers. It removes the median and scales the data according to the quantile range.

Arguments:
- value: Numerical expression to be scaled. If the input value is NULL, NaN, or Infinity, it is returned as is.
- median: Median of the feature data seen in the training dataset. If median is NULL, NaN, or Infinity, an exception is thrown.
- firstQuartile: First quartile of the feature data seen in the dataset. If firstQuartile is NULL, NaN, or Infinity, an exception is thrown.
- thirdQuartile: Third quartile of the feature data seen in the dataset. If thirdQuartile is NULL, NaN, or Infinity, an exception is thrown. If thirdQuartile - firstQuartile = 0, the range is set to 1.0 to avoid division by zero.
- withCentering: (Optional) Boolean value indicating whether to center the numerical value using the median before scaling. The default is TRUE. If withCentering is FALSE, the median value is ignored.
- withScaling: (Optional) Boolean value indicating whether to scale the numerical value using the interquartile range (IQR) after centering. The default is TRUE.
If withScaling is FALSE, the firstQuartile and thirdQuartile values are ignored.

Example:
-- returns 0.3333333333333333
SELECT ML_ROBUST_SCALER(2, 1, 0, 3, TRUE, TRUE);

ML_STANDARD_SCALER

Standardizes numerical values by removing the mean and scaling to unit variance.

Syntax: ML_STANDARD_SCALER(value, mean, standardDeviation [, withCentering] [, withScaling])

Description: The ML_STANDARD_SCALER function standardizes numerical values by removing the mean and scaling to unit variance. This is useful for features that follow a normal distribution.

Arguments:
- value: Numerical expression to be scaled. If the input value is NULL, NaN, or Infinity, it is returned as is.
- mean: Mean of the feature data seen in the dataset. If mean is NULL, NaN, or Infinity, an exception is thrown.
- standardDeviation: Standard deviation of the feature data seen in the dataset. If standardDeviation is NULL or NaN, an exception is thrown. If standardDeviation is Infinity, 0 is returned. If standardDeviation is 0, the value does not need to be scaled, so it is returned as is.
- withCentering: (Optional) Boolean value indicating whether to center the numerical value using the mean before scaling. The default is TRUE. If withCentering is FALSE, the mean value is ignored.
- withScaling: (Optional) Boolean value indicating whether to scale the numerical value using the standard deviation after centering. The default is TRUE. If withScaling is FALSE, the standardDeviation value is ignored.

Example:
-- returns 0.2
SELECT ML_STANDARD_SCALER(2, 1, 5, TRUE, TRUE);
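The documented example results for the five scaling functions are consistent with the standard definitions written out below. This is a sketch inferred from those examples, assuming the default behavior (withCentering and withScaling both TRUE) and ordinary finite inputs; it is not a statement of the internal implementation, and the symbol x is shorthand introduced here for the value argument.

```latex
\begin{aligned}
\text{ML\_MAX\_ABS\_SCALER:}\quad  & \frac{x}{\text{absoluteMax}}
  && \frac{1}{5} = 0.2 \\[4pt]
\text{ML\_MIN\_MAX\_SCALER:}\quad  & \frac{x - \text{min}}{\text{max} - \text{min}}
  && \frac{2 - 1}{5 - 1} = 0.25 \\[4pt]
\text{ML\_NORMALIZER:}\quad        & \frac{x}{\text{normValue}}
  && \frac{3.0}{5.0} = 0.6 \\[4pt]
\text{ML\_ROBUST\_SCALER:}\quad    & \frac{x - \text{median}}{\text{thirdQuartile} - \text{firstQuartile}}
  && \frac{2 - 1}{3 - 0} \approx 0.3333 \\[4pt]
\text{ML\_STANDARD\_SCALER:}\quad  & \frac{x - \text{mean}}{\text{standardDeviation}}
  && \frac{2 - 1}{5} = 0.2
\end{aligned}
```

Each right-hand value matches the result shown in the corresponding example on this page.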
#### Code Examples

```sql
ML_BUCKETIZE(value, splitBucketPoints [, bucketNames])
```

```sql
-- returns 'bin_2'
SELECT ML_BUCKETIZE(2, ARRAY[1, 4, 7]);
-- returns 'b2'
SELECT ML_BUCKETIZE(2, ARRAY[1, 4, 7], ARRAY['b_null','b1','b2','b3','b4']);
```

```sql
ML_CHARACTER_TEXT_SPLITTER(text, chunkSize, chunkOverlap, separator, isSeparatorRegex [, trimWhitespace] [, keepSeparator] [, separatorPosition])
```

```sql
-- returns ['This is the text I would like to ch', 'o chunk up. It is the example text ', 'ext for this exercise']
SELECT ML_CHARACTER_TEXT_SPLITTER('This is the text I would like to chunk up. It is the example text for this exercise', 35, 4, '', TRUE, FALSE, TRUE, 'END');
```

```sql
ML_FILE_FORMAT_TEXT_SPLITTER(text, chunkSize, chunkOverlap, formatName [, trimWhitespace] [, keepSeparator] [, separatorPosition])
```

```sql
-- returns ['def hello_world():\n print("Hello, World!")', '# Call the function\nhello_world()']
SELECT ML_FILE_FORMAT_TEXT_SPLITTER('def hello_world():\n print("Hello, World!")\n\n# Call the function\nhello_world()\n', 50, 0, 'PYTHON');
```

```sql
ML_LABEL_ENCODER(input, categories [, includeZeroLabel])
```

```sql
-- returns 1
SELECT ML_LABEL_ENCODER('abc', ARRAY['abc', 'def', 'efg', 'hikj']);
-- returns 0
SELECT ML_LABEL_ENCODER('abc', ARRAY['abc', 'def', 'efg', 'hikj'], TRUE );
```

```sql
ML_MAX_ABS_SCALER(value, absoluteMax)
```

```sql
-- returns 0.2
SELECT ML_MAX_ABS_SCALER(1, 5);
```

```sql
ML_MIN_MAX_SCALER(value, min, max)
```

```sql
-- returns 0.25
SELECT ML_MIN_MAX_SCALER(2, 1, 5);
```

```sql
ML_NGRAMS(input [, nValue] [, separator])
```

```sql
-- returns ['ab', 'cd', 'de', 'pwe']
SELECT ML_NGRAMS(ARRAY['ab', 'cd', 'de', 'pwe'], 1, '#');
-- returns ['ab#cd', 'cd#de']
SELECT ML_NGRAMS(ARRAY['ab','cd','de', NULL], 2, '#');
```

```sql
ML_NORMALIZER(value, normValue)
```

```sql
-- returns 0.6
SELECT ML_NORMALIZER(3.0, 5.0);
```

```sql
ML_ONE_HOT_ENCODER(input, categories [, dropLast] [, handleUnknown])
```

```sql
-- returns [1, 0, 0, 0]
SELECT ML_ONE_HOT_ENCODER('abc', ARRAY['abc', 'def', 'efg', 'hikj']);
-- returns [0, 0, 0, 0, 1]
SELECT ML_ONE_HOT_ENCODER('abcd', ARRAY['abc', 'def', 'efg', 'hik'], TRUE, 'KEEP' );
```

```sql
ML_RECURSIVE_TEXT_SPLITTER(text, chunkSize, chunkOverlap [, separators] [, isSeparatorRegex] [, trimWhitespace] [, keepSeparator] [, separatorPosition])
```

```sql
-- returns ['Hello', '. world', '!']
SELECT ML_RECURSIVE_TEXT_SPLITTER('Hello. world!', 0, 0, ARRAY['[!]','[.]'], TRUE, TRUE, TRUE, 'START');
```

```sql
ML_ROBUST_SCALER(value, median, firstQuartile, thirdQuartile [, withCentering] [, withScaling])
```

```sql
-- returns 0.3333333333333333
SELECT ML_ROBUST_SCALER(2, 1, 0, 3, TRUE, TRUE);
```

```sql
ML_STANDARD_SCALER(value, mean, standardDeviation [, withCentering] [, withScaling])
```

```sql
-- returns 0.2
SELECT ML_STANDARD_SCALER(2, 1, 5, TRUE, TRUE);
```

---

### AI Model Inference and Machine Learning Functions in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/reference/functions/model-inference-functions.html

Confluent Cloud for Apache Flink® provides built-in functions for invoking remote AI/ML models in Flink SQL queries.
These functions simplify developing and deploying AI applications by providing a unified platform for both data processing and AI/ML tasks.

- AI_COMPLETE: Generate text completions.
- AI_EMBEDDING: Create embeddings.
- AI_FORECAST: Forecast trends.
- AI_TOOL_INVOKE: Invoke model context protocol (MCP) tools.
- ML_DETECT_ANOMALIES: Detect anomalies in your data.
- ML_EVALUATE: Evaluate the performance of an AI/ML model.
- ML_PREDICT: Run a remote AI/ML model for tasks like predicting outcomes, generating text, and classification.

Search Functions

Confluent Cloud for Apache Flink also supports read-only external tables to enable search with federated query execution on external databases.

- KEY_SEARCH_AGG: Perform exact key lookups in external databases like JDBC, REST APIs, MongoDB, and Couchbase.
- TEXT_SEARCH_AGG: Execute full-text searches in external databases like MongoDB, Couchbase, and Elasticsearch.
- VECTOR_SEARCH_AGG: Run semantic similarity searches using vector embeddings in databases like MongoDB, Pinecone, Elasticsearch, and Couchbase.

For machine-learning preprocessing utilities, see ML Preprocessing Functions.

ML_PREDICT

Run a remote AI/ML model for tasks like predicting outcomes, generating text, and classification.

Syntax:
ML_PREDICT(`model_name[$version_id]`, column);
-- map settings are optional
ML_PREDICT(`model_name[$version_id]`, column, map['async_enabled', [boolean], 'client_timeout', [int], 'max_parallelism', [int], 'retry_count', [int]]);

Description: The ML_PREDICT function performs predictions using pre-trained machine learning models. The first argument to the ML_PREDICT table function is the model name. The other arguments are the columns used for prediction. They are defined in the model resource INPUT for AI models and may vary in length or type. Before using ML_PREDICT, you must register the model by using the CREATE MODEL statement. For more information, see Run an AI Model.
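The register-then-call sequence described above can be sketched as follows. This is a minimal illustration rather than a definitive setup: the connection name `my_openai_connection` is hypothetical and assumed to have been created already with CREATE CONNECTION, the `ARRAY<FLOAT>` output type is an assumption, and the model and table names are borrowed from the ML_PREDICT example on this page.

```sql
-- Assumption: a connection named my_openai_connection already exists.
-- Register the model before it can be used with ML_PREDICT.
CREATE MODEL embeddingmodel
INPUT (text STRING)
OUTPUT (embedding ARRAY<FLOAT>)  -- output type is an assumption
WITH (
  'provider' = 'openai',
  'openai.connection' = 'my_openai_connection',
  'task' = 'embedding'
);

-- Call the registered model once per row of text_stream.
-- The column passed to ML_PREDICT corresponds to the model's INPUT definition.
SELECT id, text, embedding
FROM text_stream,
  LATERAL TABLE(ML_PREDICT('embeddingmodel', text));
```

The columns named in the model's OUTPUT clause become available in the SELECT list through the LATERAL TABLE join.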
Configuration: You can control how calls to the remote model execute with these optional parameters.

- async_enabled: Calls to remote models are asynchronous and don’t block. The default is true.
- client_timeout: Time, in seconds, after which the request to the model endpoint times out. The default is 30 seconds.
- debug: Return a detailed stack trace in the API response. The default is false. Confluent Cloud for Apache Flink implements data masking for error messages to remove any secrets or customer input, but the stack trace may contain the prompt itself or some part of the response string.
- retry_count: Maximum number of times the remote model request is retried if the request to the model fails. The default is 3.
- max_parallelism: Maximum number of parallel requests that the function can make. Can be used only when async_enabled is true. The default is 10.

Example: After you have registered the AI model by using the CREATE MODEL statement, run the model by using the ML_PREDICT function in a SQL query. The following example runs a model named embeddingmodel on the data in a table named text_stream.

SELECT id, text, embedding FROM text_stream, LATERAL TABLE(ML_PREDICT('embeddingmodel', text));

The following examples call the ML_PREDICT function with different configurations.

-- Specify the timeout.
SELECT * FROM `db1`.`tb1`, LATERAL TABLE(ML_PREDICT('md1', key, map['client_timeout', 60 ]));
-- Specify all configuration parameters.
SELECT * FROM `db1`.`tb1`, LATERAL TABLE(ML_PREDICT('md1', key, map['async_enabled', true, 'client_timeout', 60, 'max_parallelism', 20, 'retry_count', 5]));

ML_DETECT_ANOMALIES

Identify outliers in a data stream.

Syntax: ML_DETECT_ANOMALIES( data_column, timestamp_column, JSON_OBJECT('p' VALUE 1, 'q' VALUE 1, 'd' VALUE 1, 'minTrainingSize' VALUE 10));

Description: The ML_DETECT_ANOMALIES function uses an ARIMA model to identify outliers in time-series data. Your data must include: A timestamp column.
A target column representing some quantity of interest at each timestamp.

For more information, see Detect Anomalies in Data.

Parameters: For anomaly detection parameters, see ARIMA model parameters.

Example:
SELECT ML_DETECT_ANOMALIES( total_orderunits, summed_ts, JSON_OBJECT('p' VALUE 1, 'q' VALUE 1, 'd' VALUE 1, 'minTrainingSize' VALUE 10)) OVER ( ORDER BY summed_ts RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS anomalies FROM test_table;

ML_EVALUATE

Aggregate a table and return model evaluation metrics.

Syntax: ML_EVALUATE(`model_name`, label, col1, col2, ...) FROM 'eval_data_table';

Description

The ML_EVALUATE function is a table aggregation function that takes an entire table and returns a single row of model evaluation metrics. If run on all versions of a model, the function returns one row for each model version. After comparing the metrics for different versions, you can update the default version for deployment with the model that has the best evaluation metrics. Internally, the ML_EVALUATE function runs ML_PREDICT and processes the results. Before using ML_EVALUATE, you must register the model by using the CREATE MODEL statement.

The first argument to the ML_EVALUATE table function is the model name. The second argument is the true label that the output of the model should be evaluated against. Its type depends on the model OUTPUT type and the model task. The other arguments are the columns used for prediction. They are defined in the model resource INPUT for AI models and may vary in length or type. The return type of the ML_EVALUATE function is Map for all types of tasks. Each task type has different metric keys in the map.

Metrics

The metric columns returned by ML_EVALUATE depend on the task type of the specified model.

Classification

Classification models choose a group to place their inputs in and return one of N possible values.
A classification model that returns only 2 possible values is called a binary classifier. If it returns more than 2 values, it is referred to as multi-class. Classification models return these metrics:

- Accuracy: Total fraction of correct predictions across all classes.
- F1 Score: Harmonic mean of precision and recall.
- Precision: (Class X Correctly Predicted) / (# of Class X Predicted)
- Recall: (Class X Correctly Predicted) / (# of actual Class X)

Clustering

Clustering models group the model examples into K groups. Metrics are a measure of how compact the clusters are. Clustering models return these metrics:

- Davies Bouldin Index: A measure of how separated clusters are and how compact they are.
- Intra-Cluster Variance (Mean Squared Distance): Average squared distance of each training point to the centroid of the cluster it was assigned to.
- Silhouette Score: Compares how similar each point is to its own cluster with how dissimilar it is to other clusters.

Embedding

Embedding models return these metrics:

- Mean Cosine Similarity: A measure of how similar two vectors are.
- Mean Jaccard Similarity: A measure of how similar two sets are.
- Mean Euclidean Distance: A measure of how similar two vectors are.

Regression

Regression models predict a continuous output variable based on one or more input features. Regression models return these metrics:

- Mean Absolute Error: The average of the absolute differences between the predicted and actual values.
- Mean Squared Error: The average of the squared differences between the predicted and actual values.

Text generation

Text generation models generate text based on a prompt. Text generation models return these metrics:

- Mean BLEU: A measure of how similar two texts are.
- Mean ROUGE: A measure of how similar two texts are.
- Mean Semantic Similarity: A measure of how similar two texts are.

Example metrics

The following table shows example metrics for different task types.
| Task type | Example metrics |
| --- | --- |
| Classification | {Accuracy=0.9999991465990892, Precision=0.9996998081063332, Recall=0.0013025368892873059, F1=0.0013025368892873059} |
| Clustering | {Mean Davies-Bouldin Index=0.9999991465990892} |
| Embedding | {Mean Cosine Similarity=0.9999991465990892, Mean Jaccard Similarity=0.9996998081063332, Mean Euclidean Distance=0.0013025368892873059} |
| Regression | {MAE=0.9999991465990892, MSE=0.9996998081063332, RMSE=0.0013025368892873059, MAPE=0.0013025368892873059, R²=0.0043025368892873059} |
| Text generation | {Mean BLEU=0.9999991465990892, Mean ROUGE=0.9996998081063332, Mean Semantic Similarity=0.0013025368892873059} |

Example

After you have registered the AI model by using the CREATE MODEL statement, run the model by using the ML_EVALUATE function in a SQL query. The following example statement registers a remote OpenAI model for a classification task.

CREATE MODEL `my_remote_model`
INPUT (f1 INT, f2 STRING)
OUTPUT (output_label STRING)
WITH(
  'task' = 'classification',
  'type' = 'remote',
  'provider' = 'openai',
  'openai.endpoint' = 'https://api.openai.com/v1/llm/v1/chat',
  'openai.api_key' = ''
);

The following statements show how to run the ML_EVALUATE function on various versions of my_remote_model using data in a table named eval_data.

-- Model evaluation with all versions
SELECT ML_EVALUATE(`my_remote_model$all`, label, f1, f2) FROM `eval_data`;
-- Model evaluation with default version
SELECT ML_EVALUATE(`my_remote_model`, label, f1, f2) FROM `eval_data`;
-- Model evaluation with specific version 2
SELECT ML_EVALUATE(`my_remote_model$2`, label, f1, f2) FROM `eval_data`;

KEY_SEARCH_AGG

Run a key search over an external table.

Syntax: KEY_SEARCH_AGG(, DESCRIPTOR(), );

Description: Use the KEY_SEARCH_AGG function to run key searches over external databases in Confluent Cloud for Apache Flink. The KEY_SEARCH_AGG function uses a combination of serialized table properties and configuration settings to interact with external databases.
It’s designed to handle the deserialization of table properties and manage the runtime environment for executing search queries. The output of KEY_SEARCH_AGG is an array with all rows in the external table that have a matching key in the search column.

Search result: array[row1, row2, …]

ML_FORECAST

Perform continuous forecasting on a table.

Syntax: ML_FORECAST( data_column, timestamp_column, JSON_OBJECT('p' VALUE 1, 'q' VALUE 1, 'd' VALUE 1, 'minTrainingSize' VALUE 10));

Description: The ML_FORECAST function uses an ARIMA model to perform time-series forecasting. Your data must include:

- A timestamp column.
- A target column representing some quantity of interest at each timestamp.

For more information, see Forecast Data Trends.

Parameters: For forecasting parameters, see ARIMA model parameters.

Example:
SELECT ML_FORECAST( total_orderunits, summed_ts, JSON_OBJECT('p' VALUE 1, 'q' VALUE 1, 'd' VALUE 1, 'minTrainingSize' VALUE 10)) OVER ( ORDER BY summed_ts RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS forecast FROM test_table;

AI_COMPLETE

Invoke a large language model (LLM) to generate text completions, summaries, or answers.

Syntax: AI_COMPLETE(model_name, input_prompt [, invocation_config]);

Description: The AI_COMPLETE function provides a streamlined approach for generating text, taking a single string as input and returning a single string as output. This functionality enables you to leverage LLMs to produce text based on any given prompt.

Configuration:
- model_name: Name of the model entity to call for prediction [STRING].
- input_prompt: Input prompt to pass to the LLM for prediction [STRING].
- invocation_config [optional]: Map to pass the configuration to manage function behavior, for example, MAP['debug', true].

Example: The following example shows how to invoke an LLM to generate text completions.

-- Create an OpenAI connection.
CREATE CONNECTION openai_connection WITH ( 'type' = 'openai', 'endpoint' = 'https://api.openai.com/v1/chat/completions', 'api-key' = '' ); CREATE MODEL description_extractor INPUT (input STRING) OUTPUT (output_json STRING) WITH( 'provider' = 'openai', 'openai.connection' = 'openai_connection', 'openai.system_prompt' = 'Extract json from input free text', 'task' = 'text_generation' ); CREATE TABLE claims_with_structured_description(id INT, customer_id INT, output_json STRING); INSERT INTO claims_with_structured_description SELECT id, customer_id, output_json FROM claims_submitted, LATERAL TABLE(AI_COMPLETE('description_extractor', description)); AI_EMBEDDING¶ Generate vector embeddings for text or other data using a registered embedding model. SyntaxAI_EMBEDDING(model_name, input_text [, invocation_config]); DescriptionThe AI_EMBEDDING function provides a straightforward interface, accepting a single string input and returning an array of floats as the embedding response. This functionality enables you to leverage large language models (LLMs) to generate embeddings for text efficiently. Configuration model_name: Name of the model entity to call for embeddings [STRING]. input_text: Input text to pass to the LLM for embeddings [STRING]. invocation_config[optional]: Map to pass the configuration to manage function behavior, for example, MAP['debug', true]. ExampleThe following example shows how to generate vector embeddings for text or other data using a registered embedding model. -- Create an OpenAI connection.
CREATE CONNECTION openai_embedding_connection WITH ( 'type' = 'openai', 'endpoint' = 'https://api.openai.com/v1/embeddings', 'api-key' = '' ); CREATE MODEL description_embedding INPUT (input STRING) OUTPUT (embeddings ARRAY<FLOAT>) WITH( 'provider' = 'openai', 'openai.connection' = 'openai_embedding_connection', 'task' = 'embedding' ); CREATE TABLE claims_embeddings(id INT, customer_id INT, embeddings ARRAY<FLOAT>); INSERT INTO claims_embeddings SELECT id, customer_id, embeddings FROM claims_submitted, LATERAL TABLE(AI_EMBEDDING('description_embedding', description)); AI_TOOL_INVOKE¶ Invoke a registered tool, either externally by using an MCP server or locally by using a UDF, as part of an AI workflow. SyntaxAI_TOOL_INVOKE(model_name, input_prompt, remote_udf_descriptor, mcp_tool_descriptor [, invocation_config]); DescriptionThe AI_TOOL_INVOKE function enables large language models (LLMs) to access various tools. The LLM decides which tools should be accessed, then the AI_TOOL_INVOKE function invokes the tools, gets the responses, and returns the responses to the LLM. The function returns a map that includes all the tools that were accessed, along with their responses and the status of the call, indicating whether it was a SUCCESS or FAILURE. This function supports only SSE-based MCP servers. The following models are supported: Anthropic AzureOpenAI Gemini OpenAI Note The AI_TOOL_INVOKE function is available for preview. A Preview feature is a Confluent Cloud component that is being introduced to gain early feedback from developers. Preview features can be used for evaluation and non-production testing purposes or to provide feedback to Confluent. The warranty, SLA, and Support Services provisions of your agreement with Confluent do not apply to Preview features. Confluent may discontinue providing preview releases of the Preview features at any time in Confluent’s sole discretion. Configuration model_name: Name of the model entity to call [STRING].
input_prompt: Input prompt to pass to the LLM [STRING]. remote_udf_descriptor: Map to pass UDF names as key and function description as value [MAP]. A maximum of 3 UDFs can be passed. mcp_tool_descriptor: Map to pass MCP tool names as key and tool description as value [MAP]. A maximum of 5 tools can be passed. If the MCP server already has a description for a tool, the description that you provide is passed to the LLM as “Additional description”. If the server doesn’t have a description, mcp_tool_descriptor is used as the description. You can leave it empty, in which case no changes are made to the description provided by the server. invocation_config[optional]: Map to pass the config to manage function behavior, for example, MAP['debug', true, 'on_error', 'continue']. ExampleThe following example shows how to invoke a UDF and a registered external tool or API as part of an AI workflow. When you create an MCP server connection, specify the following options: endpoint: Defines the base URL for all non-SSE communications with the MCP server, including other HTTP calls and general data exchange. sse-endpoint: Specifies the explicit URL endpoint used to establish a Server-Sent Events (SSE) connection with the MCP server. If omitted, the client defaults to constructing the SSE endpoint by appending /sse to the domain specified in endpoint. -- Create an MCP server connection. CREATE CONNECTION claims_mcp_server WITH ( 'type' = 'mcp_server', 'endpoint' = 'https://mcp.deepwiki.com', 'sse-endpoint' = 'https://mcp.deepwiki.com/sse', 'api-key' = 'api_key' ); -- Create a model that uses the MCP server connection. CREATE MODEL tool_invoker INPUT (input_message STRING) OUTPUT (tool_calls STRING) WITH( 'provider' = 'openai', 'openai.connection' = 'openai_connection', 'openai.system_prompt' = 'Select the best tools to complete the task', 'mcp.connection' = 'claims_mcp_server' ); -- Create a table that contains the input prompts.
CREATE TABLE claims_verified ( id int, customer_id int ); -- Run the AI_TOOL_INVOKE function. SELECT id, customer_id, AI_TOOL_INVOKE( 'tool_invoker', customer_id, MAP['udf_1', 'udf_1 description', 'udf_2', 'udf_2 description'], MAP['tool_1', 'tool_1_description', 'tool_2', 'tool_2_description'] ) AS verified_result FROM claims_verified; TEXT_SEARCH_AGG¶ Run a text search over an external table. SyntaxSELECT * FROM key_input, LATERAL TABLE(TEXT_SEARCH_AGG(, DESCRIPTOR(), , )); DescriptionUse the TEXT_SEARCH_AGG function to run full-text searches over external databases in Confluent Cloud for Apache Flink. The TEXT_SEARCH_AGG function uses a combination of serialized table properties and configuration settings to interact with external databases. It’s designed to handle the deserialization of table properties and manage the runtime environment for executing search queries. The output of TEXT_SEARCH_AGG is an array with all rows in the external table that have matching text in the search column. Search result array[row1, row2, …] VECTOR_SEARCH_AGG¶ Run a vector search over an external table. SyntaxVECTOR_SEARCH_AGG(, DESCRIPTOR(), , ); Note Vector Search is an Open Preview feature in Confluent Cloud. A Preview feature is a Confluent Cloud component that is being introduced to gain early feedback from developers. Preview features can be used for evaluation and non-production testing purposes or to provide feedback to Confluent. The warranty, SLA, and Support Services provisions of your agreement with Confluent do not apply to Preview features. Confluent may discontinue providing preview releases of the Preview features at any time in Confluent’s sole discretion. DescriptionUse the VECTOR_SEARCH_AGG function in conjunction with AI model inference to enable LLM-RAG use cases on Confluent Cloud. The VECTOR_SEARCH_AGG function uses a combination of serialized table properties and configuration settings to interact with external databases.
It’s designed to handle the deserialization of table properties and manage the runtime environment for executing search queries. The output of VECTOR_SEARCH_AGG is an array with all rows in the external table that have a matching vector in the search column. Search result array[row1, row2, …] ExampleAfter you have registered the AI inference model by using the CREATE MODEL statement, you can start running vector searches. The following example assumes a vector search endpoint as shown in Elasticsearch Quick Start Guide and an API key as shown in Kibana API Keys. After your vector search endpoint is created, the following example shows these steps: Create a connection resource with the Elasticsearch endpoint and API key. Create an Elasticsearch external table. Create an input vector table. Run the vector search. Run the following statement to create a connection resource named elastic-connection that uses your Elasticsearch endpoint and API key. CREATE CONNECTION elastic-connection WITH ( 'type' = 'elastic', 'endpoint' = '', 'api-key' = '' ); Run the following statements to create the tables and run the vector search. -- Create the external table. CREATE TABLE elastic ( vector array, text string ) WITH ( 'connector' = 'elastic', 'elastic.connection' = 'elastic-connection', 'elastic.index' = 'vector-search-index' ); -- Create the embedding output table. CREATE TABLE embedding_output (text string, embedding array); -- Insert mock data. INSERT INTO embedding_output values ('hello world', ARRAY[1, 5, -20]); -- Run the vector search. SELECT * FROM embedding_output, LATERAL TABLE(VECTOR_SEARCH_AGG('elastic', DESCRIPTOR(embedding), embedding, 3)); For more examples, see Vector Search with Confluent Cloud for Apache Flink.
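To make the link between model inference and vector search more concrete, the following sketch chains AI_EMBEDDING into VECTOR_SEARCH_AGG, reusing the description_embedding model and the elastic external table from the examples above. The questions and question_embeddings tables, their columns, and the ARRAY<FLOAT> element type are assumptions made for illustration, and the VECTOR_SEARCH_AGG call only mirrors the argument shape of the preceding example. Treat it as a starting point under those assumptions, not as a verified pipeline.

```sql
-- Hypothetical input table that holds the text to search for.
CREATE TABLE questions (id INT, question STRING);

-- Hypothetical table for the embedded questions.
-- ARRAY<FLOAT> is an assumed element type, chosen because AI_EMBEDDING
-- is described as returning an array of floats.
CREATE TABLE question_embeddings (
  id INT,
  question STRING,
  embeddings ARRAY<FLOAT>
);

-- Step 1: generate an embedding for each question by using the
-- description_embedding model registered earlier.
INSERT INTO question_embeddings
SELECT id, question, embeddings
FROM questions,
  LATERAL TABLE(AI_EMBEDDING('description_embedding', question));

-- Step 2: look up the 3 closest rows in the external table for each
-- embedded question, following the call shape of the earlier example.
SELECT *
FROM question_embeddings,
  LATERAL TABLE(VECTOR_SEARCH_AGG('elastic', DESCRIPTOR(embeddings), embeddings, 3));
```

Splitting the work into two statements keeps the embeddings in their own table, so the search step can be rerun without calling the embedding model again.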
Other built-in functions¶ Aggregate Functions Collection Functions Comparison Functions Conditional Functions Datetime Functions Hash Functions JSON Functions ML Preprocessing Functions Model Inference Functions Numeric Functions String Functions Table API Functions Related content¶ Build AI with Flink SQL CREATE MODEL Flink SQL Queries ML Preprocessing Functions Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql ML_PREDICT(`model_name[$version_id]`, column); -- map settings are optional ML_PREDICT(`model_name[$version_id]`, column, map['async_enabled', [boolean], 'client_timeout', [int], 'max_parallelism', [int], 'retry_count', [int]]); ``` ```sql async_enabled ``` ```sql client_timeout ``` ```sql retry_count ``` ```sql max_parallelism ``` ```sql async_enabled ``` ```sql embeddingmodel ``` ```sql text_stream ``` ```sql SELECT id, text, embedding FROM text_stream, LATERAL TABLE(ML_PREDICT('embeddingmodel', text)); ``` ```sql -- Specify the timeout. SELECT * FROM `db1`.`tb1`, LATERAL TABLE(ML_PREDICT('md1', key, map['client_timeout', 60 ])); -- Specify all configuration parameters. SELECT * FROM `db1`.`tb1`, LATERAL TABLE(ML_PREDICT('md1', key, map['async_enabled', true, 'client_timeout', 60, 'max_parallelism', 20, 'retry_count', 5])); ``` ```sql ML_DETECT_ANOMALIES( data_column, timestamp_column, JSON_OBJECT('p' VALUE 1, 'q' VALUE 1, 'd' VALUE 1, 'minTrainingSize' VALUE 10)); ``` ```sql SELECT ML_DETECT_ANOMALIES( total_orderunits, summed_ts, JSON_OBJECT('p' VALUE 1, 'q' VALUE 1, 'd' VALUE 1, 'minTrainingSize' VALUE 10)) OVER ( ORDER BY summed_ts RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS anomalies FROM test_table; ``` ```sql ML_EVALUATE(`model_name`, label, col1, col2, ...) 
FROM 'eval_data_table'; ``` ```sql Map ``` ```sql CREATE MODEL `my_remote_model` INPUT (f1 INT, f2 STRING) OUTPUT (output_label STRING) WITH( 'task' = 'classification', 'type' = 'remote', 'provider' = 'openai', 'openai.endpoint' = 'https://api.openai.com/v1/llm/v1/chat', 'openai.api_key' = '' ); ``` ```sql my_remote_model ``` ```sql -- Model evaluation with all versions SELECT ML_EVALUATE(`my_remote_model$all`, label, f1, f2) FROM `eval_data`; -- Model evaluation with default version SELECT ML_EVALUATE(`my_remote_model`, label, f1, f2) FROM `eval_data`; -- Model evaluation with specific version 2 SELECT ML_EVALUATE(`my_remote_model$2`, label, f1, f2) FROM `eval_data`; ``` ```sql KEY_SEARCH_AGG(, DESCRIPTOR(), ); ``` ```sql ML_FORECAST( data_column, timestamp_column, JSON_OBJECT('p' VALUE 1, 'q' VALUE 1, 'd' VALUE 1, 'minTrainingSize' VALUE 10)); ``` ```sql SELECT ML_FORECAST( total_orderunits, summed_ts, JSON_OBJECT('p' VALUE 1, 'q' VALUE 1, 'd' VALUE 1, 'minTrainingSize' VALUE 10)) OVER ( ORDER BY summed_ts RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS forecast FROM test_table; ``` ```sql AI_COMPLETE(model_name, input_prompt [, invocation_config]); ``` ```sql input_prompt ``` ```sql invocation_config[optional] ``` ```sql MAP['debug', true] ``` ```sql -- Create an OpenAI connection.
CREATE CONNECTION openai_connection WITH ( 'type' = 'openai', 'endpoint' = 'https://api.openai.com/v1/chat/completions', 'api-key' = '' ); CREATE MODEL description_extractor INPUT (input STRING) OUTPUT (output_json STRING) WITH( 'provider' = 'openai', 'openai.connection' = 'openai_connection', 'openai.system_prompt' = 'Extract json from input free text', 'task' = 'text_generation' ); CREATE TABLE claims_with_structured_description(id INT, customer_id INT, output_json STRING); INSERT INTO claims_with_structured_description SELECT id, customer_id, output_json FROM claims_submitted, LATERAL TABLE(AI_COMPLETE('description_extractor', description)); ``` ```sql AI_EMBEDDING(model_name, input_text [, invocation_config]); ``` ```sql invocation_config[optional] ``` ```sql MAP['debug', true] ``` ```sql -- Create an OpenAI connection. CREATE CONNECTION openai_embedding_connection WITH ( 'type' = 'openai', 'endpoint' = 'https://api.openai.com/v1/embeddings', 'api-key' = '' ); CREATE MODEL description_embedding INPUT (input STRING) OUTPUT (embeddings ARRAY<FLOAT>) WITH( 'provider' = 'openai', 'openai.connection' = 'openai_embedding_connection', 'task' = 'embedding' ); CREATE TABLE claims_embeddings(id INT, customer_id INT, embeddings ARRAY<FLOAT>); INSERT INTO claims_embeddings SELECT id, customer_id, embeddings FROM claims_submitted, LATERAL TABLE(AI_EMBEDDING('description_embedding', description)); ``` ```sql AI_TOOL_INVOKE(model_name, input_prompt, remote_udf_descriptor, mcp_tool_descriptor [, invocation_config]); ``` ```sql input_prompt ``` ```sql remote_udf_descriptor ``` ```sql mcp_tool_descriptor ``` ```sql mcp_tool_descriptor ``` ```sql invocation_config[optional] ``` ```sql MAP['debug', true, 'on_error', 'continue'] ``` ```sql sse_endpoint ``` ```sql -- Create an MCP server connection.
CREATE CONNECTION claims_mcp_server WITH ( 'type' = 'mcp_server', 'endpoint' = 'https://mcp.deepwiki.com', 'sse-endpoint' = 'https://mcp.deepwiki.com/sse', 'api-key' = 'api_key' ); ``` ```sql -- Create a model that uses the MCP server connection. CREATE MODEL tool_invoker INPUT (input_message STRING) OUTPUT (tool_calls STRING) WITH( 'provider' = 'openai', 'openai.connection' = 'openai_connection', 'openai.system_prompt' = 'Select the best tools to complete the task', 'mcp.connection' = 'claims_mcp_server' ); -- Create a table that contains the input prompts. CREATE TABLE claims_verified ( id int, customer_id int ); -- Run the AI_TOOL_INVOKE function. SELECT id, customer_id, AI_TOOL_INVOKE( 'tool_invoker', customer_id, MAP['udf_1', 'udf_1 description', 'udf_2', 'udf_2 description'], MAP['tool_1', 'tool_1_description', 'tool_2', 'tool_2_description'] ) AS verified_result FROM claims_verified; ``` ```sql SELECT * FROM key_input, LATERAL TABLE(TEXT_SEARCH_AGG(, DESCRIPTOR(), , )); ``` ```sql VECTOR_SEARCH_AGG(, DESCRIPTOR(), , ); ``` ```sql CREATE CONNECTION elastic-connection WITH ( 'type' = 'elastic', 'endpoint' = '', 'api-key' = '' ); ``` ```sql -- Create the external table. CREATE TABLE elastic ( vector array, text string ) WITH ( 'connector' = 'elastic', 'elastic.connection' = 'elastic-connection', 'elastic.index' = 'vector-search-index' ); -- Create the embedding output table. CREATE TABLE embedding_output (text string, embedding array); -- Insert mock data. INSERT INTO embedding_output values ('hello world', ARRAY[1, 5, -20]); -- Run the vector search.
SELECT * FROM embedding_output, LATERAL TABLE(VECTOR_SEARCH_AGG('elastic', DESCRIPTOR(embedding), embedding, 3)); ``` --- ### SQL numeric functions in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/functions/numeric-functions.html Numeric Functions in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® provides these built-in numeric functions to use in SQL queries: Numeric: ABS, BIN, CEILING, E, EXP, FLOOR, LN, LOG, LOG10, LOG2, PERCENTILE, PI, POWER, ROUND, SIGN, SQRT, TRUNCATE. Trigonometry: ACOS, ASIN, ATAN, ATAN2, COS, COSH, COT, DEGREES, RADIANS, SIN, SINH, TAN, TANH. Random number generators: RAND, RAND(INT), RAND_INTEGER(INT), RAND_INTEGER(INT1, INT2). Utility: HEX, UUID, UNHEX. ABS¶ Gets the absolute value of a number. SyntaxABS(numeric) DescriptionThe ABS function returns the absolute value of the specified NUMERIC. Examples-- returns 23 SELECT ABS(-23); -- returns 23 SELECT ABS(23); ACOS¶ Computes the arccosine. SyntaxACOS(numeric) DescriptionThe ACOS function returns the arccosine of the specified NUMERIC. Examples-- returns 1.5707963267948966 -- (approximately PI/2) SELECT ACOS(0); -- returns 0.0 SELECT ACOS(1); ASIN¶ Computes the arcsine. SyntaxASIN(numeric) DescriptionThe ASIN function returns the arcsine of the specified NUMERIC. Examples-- returns 0.0 SELECT ASIN(0); -- returns 1.5707963267948966 -- (approximately PI/2) SELECT ASIN(1); ATAN¶ Computes the arctangent. SyntaxATAN(numeric) DescriptionThe ATAN function returns the arctangent of the specified NUMERIC. Examples-- returns 0.0 SELECT ATAN(0); -- returns 0.7853981633974483 -- (approximately PI/4) SELECT ATAN(1); ATAN2¶ Computes the arctangent of a 2D point. SyntaxATAN2(numeric1, numeric2) DescriptionReturns the arctangent of the coordinate specified by (numeric1, numeric2). Examples-- returns 0.0 SELECT ATAN2(0, 0); -- returns 0.7853981633974483 -- (approximately PI/4) SELECT ATAN2(1, 1); BIN¶ Converts an INTEGER number to binary.
SyntaxBIN(int) DescriptionThe BIN function returns a string representation of the specified INTEGER in binary format. Returns NULL if int is NULL. Examples-- returns "100" SELECT BIN(4); -- returns "1100" SELECT BIN(12); CEILING¶ Rounds a number up. SyntaxCEILING(numeric) DescriptionThe CEILING function rounds the specified NUMERIC up and returns the smallest integer that’s greater than or equal to the NUMERIC. This function can be abbreviated to CEIL(numeric). Examples-- returns 24 SELECT CEIL(23.55); -- returns -23 SELECT CEIL(-23.55); COS¶ Computes the cosine of an angle. SyntaxCOS(numeric) DescriptionReturns the cosine of the specified NUMERIC in radians. Examples-- returns 1.0 SELECT COS(0); -- returns 6.123233995736766E-17 -- (approximately 0) SELECT COS(PI()/2); COSH¶ Computes the hyperbolic cosine. SyntaxCOSH(numeric) DescriptionThe COSH function returns the hyperbolic cosine of the specified NUMERIC. The return value type is DOUBLE. Example-- returns 1.0 SELECT COSH(0); COT¶ Computes the cotangent of an angle. SyntaxCOT(numeric) DescriptionThe COT function returns the cotangent of the specified NUMERIC in radians. Example-- returns 6.123233995736766E-17 -- (approximately 0) SELECT COT(PI()/2); DEGREES¶ Converts an angle in radians to degrees. SyntaxDEGREES(numeric) DescriptionThe DEGREES function converts the specified NUMERIC value in radians to degrees. Examples-- returns 90.0 SELECT DEGREES(PI()/2); -- returns 180.0 SELECT DEGREES(PI()); -- returns -45.0 SELECT DEGREES(-PI()/4); E¶ Gets the approximate value of e. SyntaxE() DescriptionReturns a value that is closer than any other values to e, the base of the natural logarithm. Examples-- returns 2.718281828459045 -- which is the approximate value of e SELECT E(); -- returns 1.0 SELECT LN(E()); EXP¶ Computes e raised to a power. SyntaxEXP(numeric) DescriptionThe EXP function returns e, the base of the natural logarithm, raised to the power of the specified NUMERIC.
Examples-- returns 2.718281828459045 -- which is the approximate value of e SELECT EXP(1); -- returns 7.38905609893065 SELECT EXP(2); -- returns 0.36787944117144233 SELECT EXP(-1); FLOOR¶ Rounds a number down. SyntaxFLOOR(numeric) DescriptionThe FLOOR function rounds the specified NUMERIC down and returns the largest integer that is less than or equal to the NUMERIC. Examples-- returns 23 SELECT FLOOR(23.55); -- returns -24 SELECT FLOOR(-23.55); HEX¶ Converts an integer or string to hexadecimal. SyntaxHEX(numeric) HEX(string) DescriptionThe HEX function returns a string representation of an integer NUMERIC value or a STRING in hexadecimal format. Returns NULL if the argument is NULL. Examples-- returns "14" SELECT HEX(20); -- returns "64" SELECT HEX(100); -- returns "68656C6C6F2C776F726C64" SELECT HEX('hello,world'); Related functionUNHEX LN¶ Computes the natural log. SyntaxLN(numeric) DescriptionThe LN function returns the natural logarithm (base e) of the specified NUMERIC. Examples-- returns 1.0 SELECT LN(E()); -- returns 0.0 SELECT LN(1); LOG¶ Computes a logarithm. SyntaxLOG(numeric1, numeric2) DescriptionThe LOG function returns the logarithm of numeric2 to the base of numeric1. When called with one argument, as LOG(numeric2), it returns the natural logarithm of numeric2. numeric2 must be greater than 0, and numeric1 must be greater than 1. Examples-- returns 1.0 SELECT LOG(10, 10); -- returns 8.0 SELECT LOG(2, 256); -- returns 1.0 SELECT LOG(E()); LOG10¶ Computes the base-10 logarithm. SyntaxLOG10(numeric) DescriptionThe LOG10 function returns the base-10 logarithm of the specified NUMERIC. Examples-- returns 1.0 SELECT LOG10(10); -- returns 3.0 SELECT LOG10(1000); LOG2¶ Computes the base-2 logarithm. SyntaxLOG2(numeric) Description The LOG2 function returns the base-2 logarithm of the specified NUMERIC. Examples-- returns 1.0 SELECT LOG2(2); -- returns 10.0 SELECT LOG2(1024); PERCENTILE¶ Gets a percentile value based on a continuous distribution.
SyntaxPERCENTILE(expr, percentage[, frequency]) Arguments expr: A NUMERIC expression. percentage: A NUMERIC expression between 0 and 1, or an ARRAY of NUMERIC expressions, each between 0 and 1. frequency: An optional integral number greater than 0 that describes the number of times expr must be counted. The default is 1. ReturnsDOUBLE if percentage is numeric, or an ARRAY of DOUBLE if percentage is an ARRAY. DescriptionThe PERCENTILE function returns a percentile value based on a continuous distribution of the input column. If no input row lies exactly at the desired percentile, the result is calculated using linear interpolation of the two nearest input values. NULL values are ignored in the calculation. Examples-- returns 6.0 SELECT PERCENTILE(col, 0.3) FROM (VALUES (0), (10), (10)) AS col; -- returns 6.0 SELECT PERCENTILE(col, 0.3, freq) FROM ( VALUES (0, 1), (10, 2)) AS tab(col, freq); -- returns [2.5,7.5] SELECT PERCENTILE(col, ARRAY(0.25, 0.75)) FROM (VALUES (0), (10)) AS col; -- returns 50.0 SELECT PERCENTILE(age, 0.5) FROM (VALUES 0, 50, 100) AS age; PI¶ Gets the approximate value of pi. SyntaxPI() DescriptionThe PI function returns a value that is closer than any other values to pi. Examples-- returns 3.141592653589793 -- (approximately PI) SELECT PI(); -- returns -1.0 SELECT COS(PI()); POWER¶ Raises a number to a power. SyntaxPOWER(numeric1, numeric2) DescriptionThe POWER function returns numeric1 raised to the power of numeric2. Examples-- returns 1000.0 SELECT POWER(10, 3); -- returns 256.0 SELECT POWER(2, 8); -- returns 1.0 SELECT POWER(500, 0); RADIANS¶ Converts an angle in degrees to radians. SyntaxRADIANS(numeric) DescriptionThe RADIANS function converts the specified NUMERIC value in degrees to radians. Examples-- returns 3.141592653589793 -- (approximately PI) SELECT RADIANS(180); -- returns 0.7853981633974483 -- (approximately PI/4) SELECT RADIANS(45); RAND¶ Gets a random number. 
SyntaxRAND() DescriptionThe RAND function returns a pseudorandom DOUBLE value in the range [0.0, 1.0). Example-- an example return value is 0.9346105267662114 SELECT RAND(); RAND(INT)¶ Gets a random number from a seed. SyntaxRAND(seed INT) DescriptionThe RAND(INT) function returns a pseudorandom DOUBLE value in the range [0.0, 1.0) with the initial seed integer. Two RAND functions return identical sequences of numbers if they have the same initial seed value. Examples-- returns 0.7321323355141605 SELECT RAND(23); -- returns 0.7275636800328681 SELECT RAND(42); RAND_INTEGER(INT)¶ Gets a pseudorandom integer. SyntaxRAND_INTEGER(upper_bound INT) DescriptionThe RAND_INTEGER(INT) function returns a pseudorandom integer value in the range [0, upper_bound). Examples-- returns 20 SELECT RAND_INTEGER(23); -- returns 28 SELECT RAND_INTEGER(42); RAND_INTEGER(INT1, INT2)¶ Gets a random integer in a range. SyntaxRAND_INTEGER(seed INT, upper_bound INT) DescriptionThe RAND_INTEGER(INT1, INT2) function returns a pseudorandom integer value in the range [0, upper_bound) with the initial seed value seed. Two RAND_INTEGER functions return identical sequences of numbers if they have the same initial seed and bound. Examples-- returns 227 SELECT RAND_INTEGER(23, 1000); -- returns 1130 SELECT RAND_INTEGER(42, 10000); ROUND¶ Rounds a number to the specified precision. SyntaxROUND(numeric, int) DescriptionThe ROUND function returns a number rounded to int decimal places for the specified NUMERIC. Examples-- returns 23.6 SELECT ROUND(23.58, 1); -- returns 3.1416 SELECT ROUND(PI(), 4); SIGN¶ Gets the sign of a number. SyntaxSIGN(numeric) DescriptionThe SIGN function returns the signum of the specified NUMERIC. Examples-- returns -1.00 SELECT SIGN(-23.55); -- returns 1.000 SELECT SIGN(606.808); SIN¶ Computes the sine of an angle. SyntaxSIN(numeric) DescriptionThe SIN function returns the sine of the specified NUMERIC in radians.
Examples-- returns 1.0 SELECT SIN(PI()/2); -- returns -1.0 SELECT SIN(-PI()/2); SINH¶ Computes the hyperbolic sine. SyntaxSINH(numeric) DescriptionThe SINH function returns the hyperbolic sine of the specified NUMERIC. The return type is DOUBLE. Example-- returns 0.0 SELECT SINH(0); SQRT¶ Computes the square root of a number. SyntaxSQRT(numeric) DescriptionThe SQRT function returns the square root of the specified NUMERIC, which must be greater than or equal to 0. Examples-- returns 8.0 SELECT SQRT(64); -- returns 10.0 SELECT SQRT(100); -- returns 12.0 SELECT SQRT(144); TAN¶ Computes the tangent of an angle. SyntaxTAN(numeric) DescriptionThe TAN function returns the tangent of the specified NUMERIC in radians. Examples-- returns 0.0 SELECT TAN(0); -- returns 0.9999999999999999 SELECT TAN(PI()/4); TANH¶ Computes the hyperbolic tangent. SyntaxTANH(numeric) DescriptionThe TANH function returns the hyperbolic tangent of the specified NUMERIC. The return type is DOUBLE. Examples-- returns 0.0 SELECT TANH(0); -- returns 0.9999092042625951 SELECT TANH(5); TRUNCATE¶ Truncates a number to the specified precision. SyntaxTRUNCATE(numeric, integer) DescriptionThe TRUNCATE(numeric, integer) function returns the specified NUMERIC truncated to the number of decimal places specified by integer. Returns NULL if numeric or integer is NULL. If integer is 0, the result has no decimal point or fractional part. The integer value can be negative, which causes integer digits to the left of the decimal point to become zero. If integer is not set, the function truncates as if integer were 0. Examples-- returns 42.32 SELECT TRUNCATE(42.324, 2); -- returns 42.0 SELECT TRUNCATE(42.324); -- returns 40 SELECT TRUNCATE(42.324, -1); UNHEX¶ Converts a hexadecimal expression to BINARY. SyntaxUNHEX(str) Argumentsstr: a hexadecimal STRING. The characters in str must be legal hexadecimal digits: 0 - 9, A - F, and a - f. ReturnsA BINARY string.
If str contains any nonhexadecimal digits, or is NULL, the return value is NULL. DescriptionThe UNHEX function interprets each pair of characters in str as a hexadecimal number and converts it to the byte represented by the number. If the length of str is odd, the first character is discarded, and the result is left-padded with a NULL byte. Examples-- returns "Flink" SELECT DECODE(UNHEX('466C696E6B') , 'UTF-8'); -- returns NULL SELECT UNHEX('ZZ'); Related functions DECODE HEX UUID¶ Generates a UUID. SyntaxUUID() DescriptionThe UUID() function returns a Universally Unique Identifier (UUID) string that conforms to the RFC 4122 type 4 specification. The UUID is generated using a cryptographically strong pseudo-random number generator. Examples-- an example return value is -- 3d3c68f7-f608-473f-b60c-b0c44ad4cc4e SELECT UUID(); Other built-in functions¶ Aggregate Functions Collection Functions Comparison Functions Conditional Functions Datetime Functions Hash Functions JSON Functions ML Preprocessing Functions Model Inference Functions Numeric Functions String Functions Table API Functions Related content¶ User-defined Functions Create a User Defined Function Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. 
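As a worked illustration of the linear interpolation mentioned in the PERCENTILE description above, the following sketch annotates three of the documented examples with arithmetic that reproduces their results. The position formula, percentage * (n - 1) over the sorted input values, is an assumption inferred from those documented results, not a statement of how the engine is implemented. The tab(col) and tab(age) aliases are added here for explicit column naming.

```sql
-- Sorted input: 0, 10, 10 (n = 3).
-- Assumed position: 0.3 * (3 - 1) = 0.6, which lies between the
-- first value (0) and the second value (10).
-- Interpolated result: 0 + 0.6 * (10 - 0) = 6.0, the documented result.
SELECT PERCENTILE(col, 0.3)
FROM (VALUES (0), (10), (10)) AS tab(col);

-- With frequencies, the value 10 counts twice, so the effective input is
-- again 0, 10, 10 and the same arithmetic gives the documented 6.0.
SELECT PERCENTILE(col, 0.3, freq)
FROM (VALUES (0, 1), (10, 2)) AS tab(col, freq);

-- Sorted input: 0, 50, 100 (n = 3).
-- Assumed position: 0.5 * (3 - 1) = 1.0, which lands exactly on the
-- second value, so no interpolation is needed: the documented 50.0.
SELECT PERCENTILE(age, 0.5)
FROM (VALUES (0), (50), (100)) AS tab(age);
```

The second query shows why the frequency argument is convenient: it gives the same answer as repeating a row, without materializing the duplicates.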
#### Code Examples ```sql ABS(numeric) ``` ```sql -- returns 23 SELECT ABS(-23); -- returns 23 SELECT ABS(23); ``` ```sql ACOS(numeric) ``` ```sql -- returns 1.5707963267948966 -- (approximately PI/2) SELECT ACOS(0); -- returns 0.0 SELECT ACOS(1); ``` ```sql ASIN(numeric) ``` ```sql -- returns 0.0 SELECT ASIN(0); -- returns 1.5707963267948966 -- (approximately PI/2) SELECT ASIN(1); ``` ```sql ATAN(numeric) ``` ```sql -- returns 0.0 SELECT ATAN(0); -- returns 0.7853981633974483 -- (approximately PI/4) SELECT ATAN(1); ``` ```sql ATAN2(numeric1, numeric2) ``` ```sql (numeric1, numeric2) ``` ```sql -- returns 0.0 SELECT ATAN2(0, 0); -- returns 0.7853981633974483 -- (approximately PI/4) SELECT ATAN2(1, 1); ``` ```sql -- returns "100" SELECT BIN(4); -- returns "1100" SELECT BIN(12); ``` ```sql CEILING(numeric) ``` ```sql CEIL(numeric) ``` ```sql -- returns 24 SELECT CEIL(23.55); -- returns -23 SELECT CEIL(-23.55); ``` ```sql COS(numeric) ``` ```sql -- returns 1.0 SELECT COS(0); -- returns 6.123233995736766E-17 -- (approximately 0) SELECT COS(PI()/2); ``` ```sql COSH(numeric) ``` ```sql -- returns 1.0 SELECT COSH(0); ``` ```sql COT(numeric) ``` ```sql -- returns 6.123233995736766E-17 -- (approximately 0) SELECT COT(PI()/2); ``` ```sql DEGREES(numeric) ``` ```sql -- returns 90.0 SELECT DEGREES(PI()/2); -- returns 180.0 SELECT DEGREES(PI()); -- returns -45.0 SELECT DEGREES(-PI()/4); ``` ```sql -- returns 2.718281828459045 -- which is the approximate value of e SELECT E(); -- returns 1.0 SELECT LN(E()); ``` ```sql EXP(numeric) ``` ```sql -- returns 2.718281828459045 -- which is the approximate value of e SELECT EXP(1); -- returns 7.38905609893065 SELECT EXP(2); -- returns 0.36787944117144233 SELECT EXP(-1); ``` ```sql FLOOR(numeric) ``` ```sql -- returns 23 SELECT FLOOR(23.55); -- returns -24 SELECT FLOOR(-23.55); ``` ```sql HEX(numeric) HEX(string) ``` ```sql -- returns "14" SELECT HEX(20); -- returns "64" SELECT HEX(100); -- returns "68656C6C6F2C776F726C64" SELECT
HEX('hello,world'); ``` ```sql LN(numeric) ``` ```sql -- returns 1.0 SELECT LN(E()); -- returns 0.0 SELECT LN(1); ``` ```sql LOG(numeric1, numeric2) ``` ```sql -- returns 1.0 SELECT LOG(10, 10); -- returns 8.0 SELECT LOG(2, 256); -- returns 1.0 SELECT LOG(E()); ``` ```sql LOG10(numeric) ``` ```sql -- returns 1.0 SELECT LOG10(10); -- returns 3.0 SELECT LOG10(1000); ``` ```sql LOG2(numeric) ``` ```sql -- returns 1.0 SELECT LOG2(2); -- returns 10.0 SELECT LOG2(1024); ``` ```sql PERCENTILE(expr, percentage[, frequency]) ``` ```sql -- returns 6.0 SELECT PERCENTILE(col, 0.3) FROM (VALUES (0), (10), (10)) AS col; -- returns 6.0 SELECT PERCENTILE(col, 0.3, freq) FROM ( VALUES (0, 1), (10, 2)) AS tab(col, freq); -- returns [2.5,7.5] SELECT PERCENTILE(col, ARRAY(0.25, 0.75)) FROM (VALUES (0), (10)) AS col; -- returns 50.0 SELECT PERCENTILE(age, 0.5) FROM (VALUES 0, 50, 100) AS age; ``` ```sql -- returns 3.141592653589793 -- (approximately PI) SELECT PI(); -- returns -1.0 SELECT COS(PI()); ``` ```sql POWER(numeric1, numeric2) ``` ```sql -- returns 1000.0 SELECT POWER(10, 3); -- returns 256.0 SELECT POWER(2, 8); -- returns 1.0 SELECT POWER(500, 0); ``` ```sql RADIANS(numeric) ``` ```sql -- returns 3.141592653589793 -- (approximately PI) SELECT RADIANS(180); -- returns 0.7853981633974483 -- (approximately PI/4) SELECT RADIANS(45); ``` ```sql -- an example return value is 0.9346105267662114 SELECT RAND(); ``` ```sql RAND(seed INT) ``` ```sql -- returns 0.7321323355141605 SELECT RAND(23); -- returns 0.7275636800328681 SELECT RAND(42); ``` ```sql RAND_INTEGER(upper_bound INT) ``` ```sql RAND_INTEGER(INT) ``` ```sql -- returns 20 SELECT RAND_INTEGER(23); -- returns 28 SELECT RAND_INTEGER(42); ``` ```sql RAND_INTEGER(seed INT, upper_bound INT) ``` ```sql RAND_INTEGER(INT1, INT2) ``` ```sql RAND_INTEGER ``` ```sql -- returns 227 SELECT RAND_INTEGER(23, 1000); -- returns 1130 SELECT RAND_INTEGER(42, 10000); ``` ```sql ROUND(numeric, int) ``` ```sql -- returns 23.6 SELECT ROUND(23.58,
1); -- returns 3.1416 SELECT ROUND(PI(), 4); ``` ```sql SIGN(numeric) ``` ```sql -- returns -1.00 SELECT SIGN(-23.55); -- returns 1.000 SELECT SIGN(606.808); ``` ```sql SIN(numeric) ``` ```sql -- returns 1.0 SELECT SIN(PI()/2); -- returns -1.0 SELECT SIN(-PI()/2); ``` ```sql SINH(numeric) ``` ```sql -- returns 0.0 SELECT SINH(0); ``` ```sql SQRT(numeric) ``` ```sql -- returns 8.0 SELECT SQRT(64); -- returns 10.0 SELECT SQRT(100); -- returns 12.0 SELECT SQRT(144); ``` ```sql TAN(numeric) ``` ```sql -- returns 0.0 SELECT TAN(0); -- returns 0.9999999999999999 SELECT TAN(PI()/4); ``` ```sql TANH(numeric) ``` ```sql -- returns 0.0 SELECT TANH(0); -- returns 0.9999092042625951 SELECT TANH(5); ``` ```sql TRUNCATE(numeric, integer) ``` ```sql TRUNCATE(numeric, integer) ``` ```sql -- returns 42.32 SELECT TRUNCATE(42.324, 2); -- returns 42.0 SELECT TRUNCATE(42.324); -- returns 40 SELECT TRUNCATE(42.324, -1); ``` ```sql -- returns "Flink" SELECT DECODE(UNHEX('466C696E6B') , 'UTF-8'); -- returns NULL SELECT UNHEX('ZZ'); ``` ```sql -- an example return value is -- 3d3c68f7-f608-473f-b60c-b0c44ad4cc4e SELECT UUID(); ``` --- ### SQL Functions in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/functions/overview.html Flink SQL Functions in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables you to do data transformations and other operations with the following built-in functions. 
Aggregate Functions Collection Functions Comparison Functions Conditional Functions Datetime Functions Hash Functions JSON Functions ML Preprocessing Functions Model Inference Functions Numeric Functions String Functions Table API Functions Related content¶ User-defined Functions Create a User Defined Function Flink SQL Queries DDL Statements --- ### SQL string functions in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/functions/string-functions.html String Functions in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® provides these built-in string functions to use in SQL queries: ASCII BTRIM string1 || string2 CHARACTER_LENGTH CHR CONCAT CONCAT_WS DECODE ELT ENCODE FROM_BASE64 INITCAP INSTR LEFT LOCATE LOWER LPAD LTRIM OVERLAY PARSE_URL POSITION REGEXP REGEXP_EXTRACT REGEXP_REPLACE REPEAT REPLACE REVERSE RIGHT RPAD RTRIM SPLIT_INDEX STR_TO_MAP SUBSTRING TO_BASE64 TRANSLATE TRIM UPPER URL_DECODE URL_ENCODE ASCII¶ Gets the ASCII value of the first character of a string. SyntaxASCII(string) DescriptionThe ASCII function returns the numeric value of the first character of the specified string. Returns NULL if string is NULL. Examples-- returns 97 SELECT ASCII('abc'); -- returns NULL SELECT ASCII(CAST(NULL AS VARCHAR)); string1 || string2¶ Concatenates two strings. Syntaxstring1 || string2 DescriptionThe || function returns the concatenation of string1 and string2. Examples-- returns "FlinkSQL" SELECT 'Flink' || 'SQL'; Related functions CONCAT CONCAT_WS BTRIM¶ Trim both sides of a string. SyntaxBTRIM(str[, trimStr]) Arguments str: A source STRING expression. trimStr: An optional STRING expression that has characters to be trimmed. The default is the space character. ReturnsA trimmed STRING. DescriptionThe BTRIM function trims the leading and trailing characters from str. 
Examples-- returns 'www.apache.org' SELECT BTRIM(' www.apache.org '); -- returns 'www.apache.org' SELECT BTRIM('/www.apache.org/', '/'); -- returns 'www.apache.org' SELECT BTRIM('/*www.apache.org*/', '/*'); Related functions LTRIM RTRIM TRIM CHARACTER_LENGTH¶ Gets the length of a string. SyntaxCHARACTER_LENGTH(string) DescriptionThe CHARACTER_LENGTH function returns the number of characters in the specified string. This function can be abbreviated to CHAR_LENGTH(string). Examples-- returns 18 SELECT CHAR_LENGTH('Thomas A. Anderson'); CHR¶ Gets the character for an ASCII code. SyntaxCHR(integer) DescriptionThe CHR function returns the ASCII character that has the binary equivalent to the specified integer. Returns NULL if integer is NULL. If integer is larger than 255, the function computes the modulus of integer divided by 256 first and returns CHR of the modulus. Examples-- returns 'a' SELECT CHR(97); -- returns 'a' SELECT CHR(353); CONCAT¶ Concatenates a list of strings. SyntaxCONCAT(string1, string2, ...) DescriptionThe CONCAT function returns the concatenation of the specified strings. Returns NULL if any argument is NULL. Example-- returns "AABBCC" SELECT CONCAT('AA', 'BB', 'CC'); Related functions string1 || string2 CONCAT_WS CONCAT_WS¶ Concatenates a list of strings with a separator. SyntaxCONCAT_WS(string1, string2, string3, ...) DescriptionThe CONCAT_WS function returns a string that concatenates string2, string3, ... with the separator specified by string1. The separator is added between the strings to be concatenated. Returns NULL if string1 is NULL. Example-- returns "AA~BB~~CC" SELECT CONCAT_WS('~', 'AA', 'BB', '', 'CC'); Related functions string1 || string2 CONCAT DECODE¶ Decodes a binary into a string. SyntaxDECODE(binary, string) DescriptionThe DECODE function decodes the binary argument into a string using the specified character set. Returns NULL if either argument is null.
These are the supported character set strings: ‘ISO-8859-1’ ‘US-ASCII’ ‘UTF-8’ ‘UTF-16BE’ ‘UTF-16LE’ ‘UTF-16’ Related function ENCODE ELT¶ Gets the expression at the specified index. SyntaxELT(index, expr[, exprs]*) Arguments index: The 1-based index of the expression to get. index must be an integer between 1 and the number of expressions. expr: An expression that resolves to CHAR, VARCHAR, BINARY, or VARBINARY. ReturnsThe expression at the location in the argument list specified by index. The result has the type of the least common type of all expressions. Returns NULL if index is NULL or out of range. DescriptionReturns the index-th expression. Example-- returns java-2 SELECT ELT(2, 'scala-1', 'java-2', 'go-3'); ENCODE¶ Encodes a string to a BINARY. SyntaxENCODE(string1, string2) DescriptionThe ENCODE function encodes string1 into a BINARY using the specified string2 character set. Returns NULL if either argument is null. These are the supported character set strings: ‘ISO-8859-1’ ‘US-ASCII’ ‘UTF-8’ ‘UTF-16BE’ ‘UTF-16LE’ ‘UTF-16’ Related function DECODE FROM_BASE64¶ Decodes a base-64 encoded string. SyntaxFROM_BASE64(string) DescriptionThe FROM_BASE64 function returns the base64-decoded result from the specified string. Returns NULL if string is NULL. Example-- returns "hello world" SELECT FROM_BASE64('aGVsbG8gd29ybGQ='); Related function TO_BASE64 INITCAP¶ Titlecase a string. SyntaxINITCAP(string) DescriptionThe INITCAP function returns a string that has the first character of each word converted to uppercase and the other characters converted to lowercase. A “word” is assumed to be a sequence of alphanumeric characters. Example-- returns "Title Case This String" SELECT INITCAP('title case this string'); Related functions LOWER UPPER INSTR¶ Find a substring in a string. SyntaxINSTR(string1, string2) DescriptionThe INSTR function returns the position of the first occurrence of string2 in string1. Returns NULL if either argument is NULL. 
The search is case-sensitive. Example-- returns 33 SELECT INSTR('The quick brown fox jumped over the lazy dog.', 'the'); Related function LOCATE LEFT¶ Gets the leftmost characters in a string. SyntaxLEFT(string, integer) DescriptionThe LEFT function returns the leftmost integer characters from the specified string. Returns an empty string if integer is negative. Returns NULL if either argument is NULL. Example-- returns "Morph" SELECT LEFT('Morpheus', 5); Related function RIGHT LOCATE¶ Finds a substring in a string after a specified position. SyntaxLOCATE(string1, string2[, integer]) DescriptionThe LOCATE function returns the position of the first occurrence of string1 in string2 after position integer. Returns 0 if string1 isn’t found. Returns NULL if any of the arguments is NULL. Example-- returns 12 SELECT LOCATE('the', 'the play’s the thing', 10); LOWER¶ Lowercases a string. SyntaxLOWER(string) DescriptionThe LOWER function returns the specified string in lowercase. To uppercase a string, use the UPPER function. Example-- returns "the quick brown fox jumped over the lazy dog." SELECT LOWER('The Quick Brown Fox Jumped Over The Lazy Dog.'); Related functions INITCAP UPPER LPAD¶ Left-pad a string. SyntaxLPAD(string1, integer, string2) DescriptionThe LPAD function returns a new string from string1 that’s left-padded with string2 to a length of integer characters. If string1 is longer than integer characters, the LPAD function returns string1 shortened to integer characters. To right-pad a string, use the RPAD function. Examples-- returns "??hi" SELECT LPAD('hi', 4, '??'); -- returns "h" SELECT LPAD('hi', 1, '??'); Related function RPAD LTRIM¶ Removes leading whitespace from a string. SyntaxLTRIM(string) DescriptionThe LTRIM function removes the leading whitespace from the specified string. To remove the trailing whitespace from a string, use the RTRIM function. Example-- returns "This is a test string."
SELECT LTRIM(' This is a test string.'); Related functions BTRIM RTRIM TRIM OVERLAY¶ Replaces characters in a string with another string. SyntaxOVERLAY(string1 PLACING string2 FROM integer1 [ FOR integer2 ]) DescriptionThe OVERLAY function returns a string that replaces integer2 characters of string1 with string2, starting from position integer1. If integer2 isn’t specified, the default is the length of string2. Examples-- returns "xxxxxxxxx" SELECT OVERLAY('xxxxxtest' PLACING 'xxxx' FROM 6); -- returns "xxxxxxxxxst" SELECT OVERLAY('xxxxxtest' PLACING 'xxxx' FROM 6 FOR 2); Related functions REGEXP_REPLACE REPLACE TRANSLATE PARSE_URL¶ Gets parts of a URL. SyntaxPARSE_URL(string1, string2[, string3]) DescriptionThe PARSE_URL function returns the part specified by string2 from the URL in string1. For a URL that has a query, the optional string3 argument specifies the key to extract from the query string. Returns NULL if string1 or string2 is NULL. These are the valid values for string2: ‘AUTHORITY’ ‘FILE’ ‘HOST’ ‘PATH’ ‘PROTOCOL’ ‘QUERY’ ‘REF’ ‘USERINFO’ Example-- returns 'confluent.io' SELECT PARSE_URL('http://confluent.io/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST'); -- returns 'v1' SELECT PARSE_URL('http://confluent.io/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1'); POSITION¶ Finds a substring in a string. SyntaxPOSITION(string1 IN string2) DescriptionThe POSITION function returns the position of the first occurrence of string1 in string2. Returns 0 if string1 isn’t found in string2. The position is 1-based, so the index of the first character is 1. Examples-- returns 1 SELECT POSITION('the' IN 'the quick brown fox'); -- returns 17 SELECT POSITION('fox' IN 'the quick brown fox'); REGEXP¶ Matches a string against a regular expression. SyntaxREGEXP(string1, string2) DescriptionThe REGEXP function returns TRUE if any (possibly empty) substring of string1 matches the regular expression in string2; otherwise, FALSE. Returns NULL if either of the arguments is NULL. 
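Because REGEXP returns a BOOLEAN, it can also serve as a filter predicate. The following is a minimal sketch under that assumption; the inline `VALUES` rows and the `contacts(phone)` alias are hypothetical and exist only for illustration.

```sql
-- Hypothetical inline data: two phone-like strings and one placeholder value.
SELECT phone
FROM (VALUES ('800 439 3207'), ('555-010-9999'), ('n/a')) AS contacts(phone)
-- Keep rows that contain three digits, three digits, and four digits, in that order.
WHERE REGEXP(phone, '\d{3}.*\d{3}.*\d{4}');
```

With this pattern, the `'n/a'` row contains no digits, so it should be filtered out.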
Examples-- returns TRUE SELECT REGEXP('800 439 3207', '.?(\d{3}).*(\d{3}).*(\d{4})'); -- returns TRUE SELECT REGEXP('2023-05-04', '((\d{4}.\d{2}).(\d{2}))'); REGEXP_EXTRACT¶ Gets a string from a regular expression matching group. SyntaxREGEXP_EXTRACT(string1, string2[, integer]) DescriptionThe REGEXP_EXTRACT function returns a string from string1 that’s extracted with the regular expression specified in string2 and a regex match group index integer. The regex match group index starts from 1, and 0 specifies matching the whole regex. The regex match group index must not exceed the number of the defined groups. Example-- returns "bar" SELECT REGEXP_EXTRACT('foothebar', 'foo(.*?)(bar)', 2); REGEXP_REPLACE¶ Replaces substrings in a string that match a regular expression. SyntaxREGEXP_REPLACE(string1, string2, string3) DescriptionThe REGEXP_REPLACE function returns a string from string1 with all of the substrings that match the regular expression in string2 consecutively replaced with string3. Example-- returns "fb" SELECT REGEXP_REPLACE('foobar', 'oo|ar', ''); Related functions OVERLAY REPLACE TRANSLATE REPEAT¶ Concatenates copies of a string. SyntaxREPEAT(string, integer) DescriptionThe REPEAT function returns a string that repeats the base string integer times. Example-- returns "TestingTesting" SELECT REPEAT('Testing', 2); REPLACE¶ Replace substrings in a string. SyntaxREPLACE(string1, string2, string3) DescriptionThe REPLACE function returns a new string that replaces all occurrences of string2 with string3 (non-overlapping) from string1. Examples-- returns "hello flink" SELECT REPLACE('hello world', 'world', 'flink'); -- returns "zab" SELECT REPLACE('ababab', 'abab', 'z'); Related functions OVERLAY REGEXP_REPLACE TRANSLATE REVERSE¶ Reverses a string. SyntaxREVERSE(string) DescriptionThe REVERSE function returns the reversed string. Returns NULL if string is NULL. 
Example-- returns "xof nworb kciuq eht" SELECT REVERSE('the quick brown fox'); RIGHT¶ Gets the rightmost characters in a string. SyntaxRIGHT(string, integer) DescriptionThe RIGHT function returns the rightmost integer characters from the specified string. Returns an empty string if integer is negative. Returns NULL if either argument is NULL. Example-- returns "Anderson" SELECT RIGHT('Thomas A. Anderson', 8); Related function LEFT RPAD¶ Right-pad a string. SyntaxRPAD(string1, integer, string2) DescriptionThe RPAD function returns a new string from string1 that’s right-padded with string2 to a length of integer characters. If string1 is longer than integer characters, the RPAD function returns string1 shortened to integer characters. To left-pad a string, use the LPAD function. Examples-- returns "hi??" SELECT RPAD('hi', 4, '??'); -- returns "h" SELECT RPAD('hi', 1, '??'); Related function LPAD RTRIM¶ Removes trailing whitespace from a string. SyntaxRTRIM(string) DescriptionThe RTRIM function removes the trailing whitespace from the specified string. To remove the leading whitespace from a string, use the LTRIM function. Example-- returns "This is a test string." SELECT RTRIM('This is a test string. '); Related functions BTRIM LTRIM TRIM SPLIT_INDEX¶ Splits a string by a delimiter. SyntaxSPLIT_INDEX(string1, string2, integer1) DescriptionThe SPLIT_INDEX function splits string1 by the delimiter in string2 and returns the string at zero-based index integer1 of the split strings. Returns NULL if integer1 is negative. Returns NULL if any of the arguments is NULL. Example-- returns "fox" SELECT SPLIT_INDEX('The quick brown fox', ' ', 3); STR_TO_MAP¶ Creates a map from a list of key-value strings. SyntaxSTR_TO_MAP(string1[, string2, string3]) DescriptionThe STR_TO_MAP function returns a map after splitting string1 into key/value pairs using the pair delimiter specified in string2. The default is ','. The string3 argument specifies the key-value delimiter. The default is '='.
Both the pair delimiter and the key-value delimiter are treated as regular expressions, so special characters, like <([{\^-=$!|]})?*+.>), must be properly escaped before being used as a delimiter literal. Example-- returns {a=1, b=2, c=3} SELECT STR_TO_MAP('a=1,b=2,c=3'); -- returns {a=1, b=2, c=3} SELECT STR_TO_MAP('a:1;b:2;c:3', ';', ':'); SUBSTRING¶ Gets a substring of a string. SyntaxSUBSTRING(string FROM integer1 [ FOR integer2 ]) DescriptionThe SUBSTRING function returns a substring of the specified string, starting from position integer1 with length integer2. If integer2 isn’t specified, the substring runs to the end of string. This function can be abbreviated to SUBSTR(string, integer1[, integer2]), but SUBSTR doesn’t support the FROM and FOR keywords. Examples-- returns "fox" SELECT SUBSTR('The quick brown fox', 17); -- returns "The" SELECT SUBSTR('The quick brown fox', 1, 3); TO_BASE64¶ Encodes a string to base64. SyntaxTO_BASE64(string) DescriptionThe TO_BASE64 function returns the base64-encoded representation of the specified string. Returns NULL if string is NULL. Example-- returns "aGVsbG8gd29ybGQ=" SELECT TO_BASE64('hello world'); Related function FROM_BASE64 TRANSLATE¶ Substitutes characters in a string. SyntaxTRANSLATE(expr, from, to) Arguments expr: A source STRING expression. from: A STRING expression that specifies a set of characters to be replaced. to: A STRING expression that specifies a corresponding set of replacement characters. ReturnsA STRING that has the characters of expr replaced with the characters specified in the to string. DescriptionThe TRANSLATE function replaces the characters in the expr source string according to the replacement rules specified in the from and to strings. The replacement is case-sensitive.
Examples:-- returns A1B2C3 SELECT TRANSLATE('AaBbCc', 'abc', '123'); -- returns A1BC SELECT TRANSLATE('AaBbCc', 'abc', '1'); -- returns ABC SELECT TRANSLATE('AaBbCc', 'abc', ''); -- returns .APACHE.com SELECT TRANSLATE('www.apache.org', 'wapcheorg', ' APCHEcom'); Related functions OVERLAY REGEXP_REPLACE REPLACE TRIM¶ Removes leading and/or trailing characters from a string. SyntaxTRIM([ BOTH | LEADING | TRAILING ] string1 FROM string2) DescriptionThe TRIM function returns a string that removes the leading and/or trailing characters string1 from string2. Examples-- returns "The quick brown " SELECT TRIM(TRAILING 'fox' FROM 'The quick brown fox'); -- returns " quick brown fox" SELECT TRIM(LEADING 'The' FROM 'The quick brown fox'); -- returns " The quick brown fox " SELECT TRIM(BOTH 'yyy' FROM 'yyy The quick brown fox yyy'); Related functions BTRIM LTRIM RTRIM UPPER¶ Uppercases a string. SyntaxUPPER(string) DescriptionThe UPPER function returns the specified string in uppercase. To lowercase a string, use the LOWER function. Example-- returns "THE QUICK BROWN FOX" SELECT UPPER('The quick brown fox'); URL_DECODE¶ Decodes a URL string. SyntaxURL_DECODE(string) DescriptionThe URL_DECODE function decodes the specified string in application/x-www-form-urlencoded format using the UTF-8 encoding scheme. If the input string is NULL, or there is an issue with the decoding process, like encountering an illegal escape pattern, or the encoding scheme is not supported, the function returns NULL. Example-- returns "http://confluent.io" SELECT URL_DECODE('http%3A%2F%2Fconfluent.io'); URL_ENCODE¶ Encodes a URL string. SyntaxURL_ENCODE(string) DescriptionThe URL_ENCODE function translates the specified string into application/x-www-form-urlencoded format using the UTF-8 encoding scheme. If the input string is NULL, or there is an issue with the encoding process, or the encoding scheme is not supported, the function returns NULL.
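To show how the two URL functions relate, the sketch below round-trips a string through URL_ENCODE and URL_DECODE. It relies only on the behavior described for these functions; the `'%ZZ'` input is a hypothetical malformed value chosen to illustrate the NULL case.

```sql
-- Round trip: decoding an encoded string should give back the original,
-- here 'http://confluent.io'.
SELECT URL_DECODE(URL_ENCODE('http://confluent.io'));

-- '%ZZ' is not a valid percent-escape, so per the URL_DECODE description
-- this is expected to yield NULL rather than fail the statement.
SELECT URL_DECODE('%ZZ');
```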
Example-- returns "http%3A%2F%2Fconfluent.io" SELECT URL_ENCODE('http://confluent.io'); Other built-in functions¶ Aggregate Functions Collection Functions Comparison Functions Conditional Functions Datetime Functions Hash Functions JSON Functions ML Preprocessing Functions Model Inference Functions Numeric Functions String Functions Table API Functions Related content¶ User-defined Functions Create a User Defined Function Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.

#### Code Examples

```sql
ASCII(string)
```

```sql
-- returns 97
SELECT ASCII('abc');

-- returns NULL
SELECT ASCII(CAST(NULL AS VARCHAR));
```

```sql
string1 || string2
```

```sql
-- returns "FlinkSQL"
SELECT 'Flink' || 'SQL';
```

```sql
BTRIM(str[, trimStr])
```

```sql
-- returns 'www.apache.org'
SELECT BTRIM(' www.apache.org ');

-- returns 'www.apache.org'
SELECT BTRIM('/www.apache.org/', '/');

-- returns 'www.apache.org'
SELECT BTRIM('/*www.apache.org*/', '/*');
```

```sql
CHARACTER_LENGTH(string)
```

```sql
CHAR_LENGTH(string)
```

```sql
-- returns 18
SELECT CHAR_LENGTH('Thomas A. Anderson');
```

```sql
CHR(integer)
```

```sql
-- returns 'a'
SELECT CHR(97);

-- returns 'a'
SELECT CHR(353);
```

```sql
CONCAT(string1, string2, ...)
```

```sql
-- returns "AABBCC"
SELECT CONCAT('AA', 'BB', 'CC');
```

```sql
CONCAT_WS(string1, string2, string3, ...)
```

```sql
-- returns "AA~BB~~CC"
SELECT CONCAT_WS('~', 'AA', 'BB', '', 'CC');
```

```sql
DECODE(binary, string)
```

```sql
ELT(index, expr[, exprs]*)
```

```sql
-- returns java-2
SELECT ELT(2, 'scala-1', 'java-2', 'go-3');
```

```sql
ENCODE(string1, string2)
```

```sql
FROM_BASE64(string)
```

```sql
-- returns "hello world"
SELECT FROM_BASE64('aGVsbG8gd29ybGQ=');
```

```sql
INITCAP(string)
```

```sql
-- returns "Title Case This String"
SELECT INITCAP('title case this string');
```

```sql
INSTR(string1, string2)
```

```sql
-- returns 33
SELECT INSTR('The quick brown fox jumped over the lazy dog.', 'the');
```

```sql
LEFT(string, integer)
```

```sql
-- returns "Morph"
SELECT LEFT('Morpheus', 5);
```

```sql
LOCATE(string1, string2[, integer])
```

```sql
-- returns 12
SELECT LOCATE('the', 'the play’s the thing', 10);
```

```sql
LOWER(string)
```

```sql
-- returns "the quick brown fox jumped over the lazy dog."
SELECT LOWER('The Quick Brown Fox Jumped Over The Lazy Dog.');
```

```sql
LPAD(string1, integer, string2)
```

```sql
-- returns "??hi"
SELECT LPAD('hi', 4, '??');

-- returns "h"
SELECT LPAD('hi', 1, '??');
```

```sql
LTRIM(string)
```

```sql
-- returns "This is a test string."
SELECT LTRIM(' This is a test string.');
```

```sql
OVERLAY(string1 PLACING string2 FROM integer1 [ FOR integer2 ])
```

```sql
-- returns "xxxxxxxxx"
SELECT OVERLAY('xxxxxtest' PLACING 'xxxx' FROM 6);

-- returns "xxxxxxxxxst"
SELECT OVERLAY('xxxxxtest' PLACING 'xxxx' FROM 6 FOR 2);
```

```sql
PARSE_URL(string1, string2[, string3])
```

```sql
-- returns 'confluent.io'
SELECT PARSE_URL('http://confluent.io/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST');

-- returns 'v1'
SELECT PARSE_URL('http://confluent.io/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1');
```

```sql
POSITION(string1 IN string2)
```

```sql
-- returns 1
SELECT POSITION('the' IN 'the quick brown fox');

-- returns 17
SELECT POSITION('fox' IN 'the quick brown fox');
```

```sql
REGEXP(string1, string2)
```

```sql
-- returns TRUE
SELECT REGEXP('800 439 3207', '.?(\d{3}).*(\d{3}).*(\d{4})');

-- returns TRUE
SELECT REGEXP('2023-05-04', '((\d{4}.\d{2}).(\d{2}))');
```

```sql
REGEXP_EXTRACT(string1, string2[, integer])
```

```sql
-- returns "bar"
SELECT REGEXP_EXTRACT('foothebar', 'foo(.*?)(bar)', 2);
```

```sql
REGEXP_REPLACE(string1, string2, string3)
```

```sql
-- returns "fb"
SELECT REGEXP_REPLACE('foobar', 'oo|ar', '');
```

```sql
REPEAT(string, integer)
```

```sql
-- returns "TestingTesting"
SELECT REPEAT('Testing', 2);
```

```sql
REPLACE(string1, string2, string3)
```

```sql
-- returns "hello flink"
SELECT REPLACE('hello world', 'world', 'flink');

-- returns "zab"
SELECT REPLACE('ababab', 'abab', 'z');
```

```sql
REVERSE(string)
```

```sql
-- returns "xof nworb kciuq eht"
SELECT REVERSE('the quick brown fox');
```

```sql
RIGHT(string, integer)
```

```sql
-- returns "Anderson"
SELECT RIGHT('Thomas A. Anderson', 8);
```

```sql
RPAD(string1, integer, string2)
```

```sql
-- returns "hi??"
SELECT RPAD('hi', 4, '??');

-- returns "h"
SELECT RPAD('hi', 1, '??');
```

```sql
RTRIM(string)
```

```sql
-- returns "This is a test string."
SELECT RTRIM('This is a test string. ');
```

```sql
SPLIT_INDEX(string1, string2, integer1)
```

```sql
-- returns "fox"
SELECT SPLIT_INDEX('The quick brown fox', ' ', 3);
```

```sql
STR_TO_MAP(string1[, string2, string3])
```

```sql
-- returns {a=1, b=2, c=3}
SELECT STR_TO_MAP('a=1,b=2,c=3');

-- returns {a=1, b=2, c=3}
SELECT STR_TO_MAP('a:1;b:2;c:3', ';', ':');
```

```sql
SUBSTRING(string FROM integer1 [ FOR integer2 ])
```

```sql
SUBSTR(string, integer1[, integer2])
```

```sql
-- returns "fox"
SELECT SUBSTR('The quick brown fox', 17);

-- returns "The"
SELECT SUBSTR('The quick brown fox', 1, 3);
```

```sql
TO_BASE64(string)
```

```sql
-- returns "aGVsbG8gd29ybGQ="
SELECT TO_BASE64('hello world');
```

```sql
TRANSLATE(expr, from, to)
```

```sql
-- returns A1B2C3
SELECT TRANSLATE('AaBbCc', 'abc', '123');

-- returns A1BC
SELECT TRANSLATE('AaBbCc', 'abc', '1');

-- returns ABC
SELECT TRANSLATE('AaBbCc', 'abc', '');

-- returns .APACHE.com
SELECT TRANSLATE('www.apache.org', 'wapcheorg', ' APCHEcom');
```

```sql
TRIM([ BOTH | LEADING | TRAILING ] string1 FROM string2)
```

```sql
-- returns "The quick brown "
SELECT TRIM(TRAILING 'fox' FROM 'The quick brown fox');

-- returns " quick brown fox"
SELECT TRIM(LEADING 'The' FROM 'The quick brown fox');

-- returns " The quick brown fox "
SELECT TRIM(BOTH 'yyy' FROM 'yyy The quick brown fox yyy');
```

```sql
UPPER(string)
```

```sql
-- returns "THE QUICK BROWN FOX"
SELECT UPPER('The quick brown fox');
```

```sql
URL_DECODE(string)
```

```sql
-- returns "http://confluent.io"
SELECT URL_DECODE('http%3A%2F%2Fconfluent.io');
```

```sql
URL_ENCODE(string)
```

```sql
-- returns "http%3A%2F%2Fconfluent.io"
SELECT URL_ENCODE('http://confluent.io');
```

---

### Table API functions in Confluent Cloud for Apache Flink | Confluent Documentation

Source:
https://docs.confluent.io/cloud/current/flink/reference/functions/table-api-functions.html Table API in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® supports programming applications with the Table API. For more information, see the Table API Overview. To get started with programming a streaming data application with the Table API, see the Java Table API Quick Start. Confluent Cloud for Apache Flink supports the following Table API functions. TableEnvironment interface Table interface: SQL equivalents Table interface: API extensions TablePipeline interface StatementSet interface TableResult interface TableConfig class Expressions class Others Confluent TableEnvironment interface¶ TableEnvironment.createStatementSet() TableEnvironment.createTable(String, TableDescriptor) TableEnvironment.executeSql(String) TableEnvironment.explainSql(String) TableEnvironment.from(String) TableEnvironment.fromValues(…) TableEnvironment.getConfig() TableEnvironment.getCurrentCatalog() TableEnvironment.getCurrentDatabase() TableEnvironment.listCatalogs() TableEnvironment.listDatabases() TableEnvironment.listFunctions() TableEnvironment.listTables() TableEnvironment.listTables(String, String) TableEnvironment.listViews() TableEnvironment.sqlQuery(String) TableEnvironment.useCatalog(String) TableEnvironment.useDatabase(String) Table interface: SQL equivalents¶ Table.as(…) Table.distinct() Table.executeInsert(String) Table.fetch(…) Table.filter(…) Table.fullOuterJoin(…) Table.groupBy(…) Table.insertInto(String) Table.intersect(…) Table.intersectAll(…) Table.join(…) Table.leftOuterJoin(…) Table.limit(…) Table.minus(…) Table.minusAll(…) Table.offset(…) Table.orderBy(…) Table.rightOuterJoin(…) Table.select(…) Table.union(…) Table.unionAll(…) Table.where(…) Table.window(…) Table interface: API extensions¶ Table.addColumns(…) Table.addOrReplaceColumns(…) Table.dropColumns(…) Table.execute() Table.explain() Table.getResolvedSchema() Table.map(…) Table.printExplain()
Table.printSchema() Table.renameColumns(…) TablePipeline interface¶ TablePipeline.execute() TablePipeline.explain() TablePipeline.printExplain() StatementSet interface¶ StatementSet.add(TablePipeline) StatementSet.addInsert(String, Table) StatementSet.addInsertSql(String) StatementSet.execute() StatementSet.explain() TableResult interface¶ TableResult.await(…) TableResult.collect() TableResult.getJobClient().cancel() TableResult.getResolvedSchema() TableResult.print() TableConfig class¶ TableConfig.set(…) Expressions class¶ Expressions.* (except for call()) Others¶ FormatDescriptor.* TableDescriptor.* Over.* Session.* Slide.* Tumble.* Confluent¶ Confluent adds the following classes for more convenience: ConfluentSettings.* ConfluentTools.* ConfluentTableDescriptor.* Related content¶ Course: Apache Flink® Table API: Processing Data Streams in Java Table API Overview Java Table API Quick Start Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. --- ### Flink SQL Keywords in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/keywords.html Flink SQL Reserved Keywords in Confluent Cloud for Apache Flink¶ Keywords are words that have significance in Confluent Cloud for Apache Flink®. Some keywords, like AND, CHAR, and SELECT are reserved and require special treatment for use as identifiers like table names, column names, and the names of built-in functions. You can use reserved words as identifiers if you quote them with backtick characters. If you want to use one of the reserved words as a field name, enclose it with backticks, for example: `DATABASES` `RAW` You can use nonreserved keywords as identifiers without enclosing them with backticks. In the following tables, reserved keywords are shown in bold. Some string combinations are reserved as keywords for future use. 
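To make the quoting rule concrete, here is a minimal sketch. The table name `keyword_demo` and the `order_id` column are hypothetical and chosen only for illustration; `DATABASES` and `RAW` are the reserved words used in the example above.

```sql
-- Hypothetical table: the two reserved words need backticks
-- when they are used as column names.
CREATE TABLE keyword_demo (
  `DATABASES` STRING,
  `RAW` STRING,
  order_id INT
);

-- The same quoting applies wherever the identifiers are referenced.
SELECT `DATABASES`, `RAW`, order_id
FROM keyword_demo;
```

Quoting an identifier that isn't reserved, such as `order_id` here, should be harmless, so quoting consistently is one way to avoid checking each name against the keyword list.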
Index¶ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z A¶ A ABS ABSENT ABSOLUTE ACTION ADA ADD ADMIN AFTER ALL ALLOCATE ALLOW ALTER ALWAYS AND ANALYZE ANY APPLY ARE ARRAY ARRAY_AGG ARRAY_CONCAT_AGG ARRAY_MAX_CARDINALITY AS ASC ASENSITIVE ASSERTION ASSIGNMENT ASYMMETRIC AT ATOMIC ATTRIBUTE ATTRIBUTES AUTHORIZATION AVG B¶ BEFORE BEGIN BEGIN_FRAME BEGIN_PARTITION BERNOULLI BETWEEN BIGINT BINARY BIT BLOB BOOLEAN BOTH BREADTH BUCKETS BY BYTES C¶ C CALL CALLED CARDINALITY CASCADE CASCADED CASE CAST CATALOG CATALOG_NAME CATALOGS CEIL CEILING CENTURY CHAIN CHANGELOG_MODE CHAR CHARACTER CHARACTERISTICS CHARACTERS CHARACTER_LENGTH CHARACTER_SET_CATALOG CHARACTER_SET_NAME CHARACTER_SET_SCHEMA CHAR_LENGTH CHECK CLASS_ORIGIN CLASSIFIER CLOB CLOSE COALESCE COBOL COLLATE COLLATION COLLATION_CATALOG COLLATION_NAME COLLATION_SCHEMA COLLECT COLUMN COLUMNS COLUMN_NAME COMMAND_FUNCTION COMMAND_FUNCTION_CODE COMMENT COMMIT COMMITTED COMPACT COMPILE COMPUTE CONDITION CONDITION_NUMBER CONDITIONAL CONNECT CONNECTION CONNECTION_NAME CONSTRAINT CONSTRAINTS CONSTRAINT_CATALOG CONSTRAINT_NAME CONSTRAINT_SCHEMA CONSTRUCTOR CONTAINS CONTAINS_SUBSTR CONTINUE CONTINUOUS CONVERT CORR CORRESPONDING COUNT COVAR_POP COVAR_SAMP CREATE CROSS CUBE CUME_DIST CURRENT CURRENT_CATALOG CURRENT_DATE CURRENT_DEFAULT_TRANSFORM_GROUP CURRENT_PATH CURRENT_ROLE CURRENT_ROW CURRENT_SCHEMA CURRENT_TIME CURRENT_TIMESTAMP CURRENT_TRANSFORM_GROUP_FOR_TYPE CURRENT_USER CURSOR CURSOR_NAME CYCLE D¶ DATA DATABASE DATABASES DATE DATE_DIFF DATE_TRUNC DATETIME DATETIME_DIFF DATETIME_INTERVAL_CODE DATETIME_INTERVAL_PRECISION DAY DAYOFWEEK DAYS DAYOFYEAR DATETIME_TRUNC DEALLOCATE DEC DECADE DECIMAL DECLARE DEFAULT DEFAULTS DEFERRABLE DEFERRED DEFINE DEFINED DEFINER DEGREE DELETE DENSE_RANK DEPTH DEREF DERIVED DESC DESCRIBE DESCRIPTION DESCRIPTOR DETERMINISTIC DIAGNOSTICS DISALLOW DISCONNECT DISPATCH DISTINCT DISTRIBUTED DISTRIBUTION DOMAIN DOT DOUBLE DOW DOY DRAIN DROP DYNAMIC DYNAMIC_FUNCTION DYNAMIC_FUNCTION_CODE E¶ EACH 
ELEMENT ELSE EMPTY ENCODING END END-EXEC END_FRAME END_PARTITION ENFORCED EPOCH EQUALS ERROR ESCAPE ESTIMATED_COST EVERY EXCEPT EXCEPTION EXCLUDE EXCLUDING EXEC EXECUTE EXISTS EXP EXPLAIN EXTEND EXTENDED EXTERNAL EXTRACT F¶ FALSE FETCH FILTER FINAL FIRST FIRST_VALUE FLOAT FLOOR FOLLOWING FOR FOREIGN FORMAT FORTRAN FOUND FRAC_SECOND FRAME_ROW FREE FRESHNESS FRIDAY FROM FULL FUNCTION FUNCTIONS FUSION G¶ G GENERAL GENERATED GEOMETRY GET GLOBAL GO GOTO GRANT GRANTED GROUP GROUPING GROUPS GROUP_CONCAT H¶ HAVING HASH HIERARCHY HOLD HOP HOUR HOURS I¶ IDENTITY IF IGNORE IMMEDIATE IMMEDIATELY IMPLEMENTATION ILIKE IMPORT IN INCLUDE INCLUDING INCREMENT INDICATOR INITIAL INITIALLY INNER INOUT INPUT INSENSITIVE INSERT INSTANCE INSTANTIABLE INT INTEGER INTERSECT INTERSECTION INTERVAL INTO INVOKER IS ISODOW ISOLATION ISOYEAR J¶ JAR JARS JAVA JOB JOBS JOIN JSON JSON_ARRAY JSON_ARRAYAGG JSON_EXECUTION_PLAN JSON_EXISTS JSON_OBJECT JSON_OBJECTAGG JSON_QUERY JSON_SCOPE JSON_VALUE K¶ K KEY KEY_MEMBER KEY_TYPE L¶ LABEL LAG LANGUAGE LARGE LAST LAST_VALUE LATERAL LEAD LEADING LEFT LENGTH LEVEL LIBRARY LIKE LIKE_REGEX LIMIT LN LOAD LOCAL LOCALTIME LOCALTIMESTAMP LOCATOR LOWER M¶ M MAP MATCH MATCHED MATCHES MATCH_NUMBER MATCH_RECOGNIZE MATERIALIZED MAX MAXVALUE MEASURES MEMBER MERGE MESSAGE_LENGTH MESSAGE_OCTET_LENGTH MESSAGE_TEXT METADATA METHOD MICROSECOND MILLENNIUM MILLISECOND MIN MINUS MINUTE MINUTES MINUTE MINVALUE ML_PREDICT MOD MODEL MODELS MODIFIES MODIFY MODULE MODULES MONDAY MONTH MONTHS MORE MULTISET MUMPS N¶ NAME NAMES NANOSECOND NATIONAL NATURAL NCHAR NCLOB NESTING NEW NEXT NO NONE NORMALIZE NORMALIZED NOT NTH_VALUE NTILE NULL NULLABLE NULLIF NULLS NUMBER NUMERIC O¶ OBJECT OCCURRENCES_REGEX OCTETS OCTET_LENGTH OF OFFSET OLD OMIT ON ONE ONLY OPEN OPTION OPTIONS OR ORDER ORDERING ORDINAL ORDINALITY OTHERS OUT OUTER OUTPUT OVER OVERLAPS OVERLAY OVERRIDING OVERWRITE OVERWRITING P¶ PAD PARAMETER PARAMETER_MODE PARAMETER_NAME PARAMETER_ORDINAL_POSITION PARAMETER_SPECIFIC_CATALOG 
PARAMETER_SPECIFIC_NAME PARAMETER_SPECIFIC_SCHEMA PARTIAL PARTITION PARTITIONED PARTITIONS PASCAL PASSING PASSTHROUGH PAST PATH PATTERN PER PERCENT PERCENTILE_CONT PERCENTILE_DISC PERCENT_RANK PERIOD PERMUTE PIVOT PLACING PLAN PLAN_ADVICE PLI PORTION POSITION POSITION_REGEX POWER PRECEDES PRECEDING PRECISION PREPARE PRESERVE PREV PRIMARY PRIOR PRIVILEGES PROCEDURE PROCEDURES PUBLIC PYTHON Q¶ QUALIFY QUARTER QUARTERS R¶ RANGE RANK RAW READ READS REAL RECURSIVE REF REFERENCES REFERENCING REFRESH_MODE REGR_AVGX REGR_AVGY REGR_COUNT REGR_INTERCEPT REGR_R2 REGR_SLOPE REGR_SXX REGR_SXY REGR_SYY RELATIVE RELEASE REMOVE RENAME REPEATABLE REPLACE RESET RESPECT RESTART RESTRICT RESULT RETURN RETURNED_CARDINALITY RETURNED_LENGTH RETURNED_OCTET_LENGTH RETURNED_SQLSTATE RETURNING RETURNS REVOKE RIGHT RLIKE ROLE ROLLBACK ROLLUP ROUTINE ROUTINE_CATALOG ROUTINE_NAME ROUTINE_SCHEMA ROW ROWS ROW_COUNT ROW_NUMBER RUNNING S¶ SAFE_CAST SAFE_OFFSET SAFE_ORDINAL SATURDAY SAVEPOINT SCALA SCALAR SCALE SCHEMA SCHEMA_NAME SCOPE SCOPE_CATALOGS SCOPE_NAME SCOPE_SCHEMA SCROLL SEARCH SECOND SECONDS SECTION SECURITY SEEK SELECT SELF SENSITIVE SEPARATOR SEQUENCE SERIALIZABLE SERVER SERVER_NAME SESSION SESSION_USER SET SETS SHOW SIMILAR SIMPLE SIZE SKIP SMALLINT SOME SOURCE SPACE SPECIFIC SPECIFICTYPE SPECIFIC_NAME SQL SQLEXCEPTION SQLSTATE SQLWARNING SQL_BIGINT SQL_BINARY SQL_BIT SQL_BLOB SQL_BOOLEAN SQL_CHAR SQL_CLOB SQL_DATE SQL_DECIMAL SQL_DOUBLE SQL_FLOAT SQL_INTEGER SQL_INTERVAL_DAY SQL_INTERVAL_DAY_TO_HOUR SQL_INTERVAL_DAY_TO_MINUTE SQL_INTERVAL_DAY_TO_SECOND SQL_INTERVAL_HOUR SQL_INTERVAL_HOUR_TO_MINUTE SQL_INTERVAL_HOUR_TO_SECOND SQL_INTERVAL_MINUTE SQL_INTERVAL_MINUTE_TO_SECOND SQL_INTERVAL_MONTH SQL_INTERVAL_SECOND SQL_INTERVAL_YEAR SQL_INTERVAL_YEAR_TO_MONTH SQL_LONGVARBINARY SQL_LONGVARCHAR SQL_LONGVARNCHAR SQL_NCHAR SQL_NCLOB SQL_NUMERIC SQL_NVARCHAR SQL_REAL SQL_SMALLINT SQL_TIME SQL_TIMESTAMP SQL_TINYINT SQL_TSI_DAY SQL_TSI_FRAC_SECOND SQL_TSI_HOUR SQL_TSI_MICROSECOND SQL_TSI_MINUTE 
SQL_TSI_MONTH SQL_TSI_QUARTER SQL_TSI_SECOND SQL_TSI_WEEK SQL_TSI_YEAR SQL_VARBINARY SQL_VARCHAR SQRT START STATE STATEMENT STATIC STATISTICS STDDEV_POP STDDEV_SAMP STOP STREAM STRING STRING_AGG STRUCTURE STYLE SUBCLASS_ORIGIN SUBMULTISET SUBSET SUBSTITUTE SUBSTRING SUBSTRING_REGEX SUCCEEDS SUM SUNDAY SUSPEND SYMMETRIC SYSTEM SYSTEM_TIME SYSTEM_USER T¶ TABLE TABLES TABLESAMPLE TABLE_NAME TEMPORARY THEN THURSDAY TIES TIME TIMESTAMP TIMESTAMP_DIFF TIMESTAMP_LTZ TIMESTAMP_TRUNC TIMESTAMPADD TIMESTAMPDIFF TIMEZONE_HOUR TIMEZONE_MINUTE TIME_DIFF TIME_TRUNC TINYINT TO TOP_LEVEL_COUNT TRAILING TRANSACTION TRANSACTIONS_ACTIVE TRANSACTIONS_COMMITTED TRANSACTIONS_ROLLED_BACK TRANSFORM TRANSFORMS TRANSLATE TRANSLATE_REGEX TRANSLATION TREAT TRIGGER TRIGGER_CATALOG TRIGGER_NAME TRIGGER_SCHEMA TRIM TRIM_ARRAY TRUE TRUNCATE TRY_CAST TUESDAY TUMBLE TYPE U¶ UESCAPE UNBOUNDED UNCOMMITTED UNCONDITIONAL UNDER UNION UNIQUE UNKNOWN UNLOAD UNNAMED UNNEST UNPIVOT UPDATE UPPER UPSERT USAGE USE USER USER_DEFINED_TYPE_CATALOG USER_DEFINED_TYPE_CODE USER_DEFINED_TYPE_NAME USER_DEFINED_TYPE_SCHEMA USING UTF16 UTF32 UTF8 V¶ VALUE VALUES VALUE_OF VARBINARY VARCHAR VARYING VAR_POP VAR_SAMP VERSION VERSIONING VIEW VIEWS VIRTUAL W¶ WATERMARK WATERMARKS WEDNESDAY WEEK WEEKS WHEN WHENEVER WHERE WIDTH_BUCKET WINDOW WITH WITHIN WITHOUT WORK WRAPPER WRITE X¶ XML Y¶ YEAR YEARS Z¶ ZONE Related content¶ DDL Statements Flink SQL Queries Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. 
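A keyword from the list above can still be used as an identifier if it is enclosed in backticks. The following is a minimal sketch of that quoting. The table name `keyword_demo` and its columns are hypothetical, chosen only because `user`, `value`, and `timestamp` all appear in the keyword list.

```sql
-- Hypothetical table: every column name is a reserved keyword,
-- so each one is enclosed in backticks.
CREATE TABLE keyword_demo (
  `user` STRING,
  `value` BIGINT,
  `timestamp` TIMESTAMP_LTZ(3)
);

-- The same quoting is needed wherever the columns are referenced.
SELECT `user`, `value`, `timestamp`
FROM keyword_demo
WHERE `value` > 0;
```

Quoting every identifier is also valid, which can be a simpler habit than checking each name against the list.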
#### Code Examples ```sql `DATABASES` `RAW` ``` --- ### Flink SQL and Table API Reference in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/overview.html Flink SQL and Table API Reference in Confluent Cloud for Apache Flink¶ This section describes the SQL language support in Confluent Cloud for Apache Flink®, including Data Definition Language (DDL) statements, Data Manipulation Language (DML) statements, built-in functions, and the Table API. Apache Flink® SQL is based on Apache Calcite, which implements the SQL standard. Data Types¶ Flink SQL has a rich set of native data types that you can use in SQL statements and queries. Data Types Serialize and deserialize data¶ Data Type Mappings Reserved keywords¶ Some string combinations are reserved as keywords for future use. Flink SQL Reserved Keywords Related content¶ Stream Processing Concepts Time and Watermarks Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. --- ### SQL Deduplication Queries in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/queries/deduplication.html Deduplication Queries in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables removing duplicate rows over a set of columns in a Flink SQL table. Syntax¶ SELECT [column_list] FROM ( SELECT [column_list], ROW_NUMBER() OVER ([PARTITION BY column1[, column2...]] ORDER BY time_attr [asc|desc]) AS rownum FROM table_name) WHERE rownum = 1 Parameter Specification Note This query pattern must be followed exactly; otherwise, the optimizer can’t translate the query. ROW_NUMBER(): Assigns a unique, sequential number to each row, starting with one. PARTITION BY column1[, column2...]: Specifies the partition columns, which form the deduplication key.
ORDER BY time_attr [asc|desc]: Specifies the ordering column, which must be a time attribute. Flink SQL supports the event time attribute. Processing time is not supported in Confluent Cloud for Apache Flink. Ordering by ASC keeps the first row, and ordering by DESC keeps the last row. WHERE rownum = 1: The rownum = 1 condition is required for Flink SQL to recognize the query as a deduplication query. Description¶ Deduplication removes duplicate rows over a set of columns, keeping only the first or last row. Flink SQL uses the ROW_NUMBER() function to remove duplicates, similar to its usage in Top-N Queries in Confluent Cloud for Apache Flink. Deduplication is a special case of the Top-N query, in which N is 1 and row order is by event time. In some cases, an upstream ETL job isn’t end-to-end exactly-once, which may cause duplicate records in the sink after a failover. Duplicate records affect the correctness of downstream analytical jobs, like SUM and COUNT, so deduplication is required before further analysis can continue. See deduplication in action: Apply the Deduplicate Topic action to generate a table that contains only unique records from an input table. Example¶ In the Flink SQL shell or in a Cloud Console workspace, run the following statement to see an example of row deduplication. It returns the first URL that each customer has visited. The rows are partitioned by user_id and ordered by the $rowtime column, which is the system column mapped to the Kafka record timestamp and can be either LogAppendTime or CreateTime. Run the following statement to return the deduplicated rows.
SELECT user_id, url, $rowtime FROM ( SELECT *, $rowtime, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY $rowtime ASC) AS rownum FROM `examples`.`marketplace`.`clicks`) WHERE rownum = 1; Your output should resemble: user_id url $rowtime 3246 https://www.acme.com/product/upmtv 2024-04-16 08:04:47.365 4028 https://www.acme.com/product/jtahp 2024-04-16 08:04:47.367 4549 https://www.acme.com/product/ixsir 2024-04-16 08:04:47.367 Related content¶ Flink action: Deduplicate Rows in a Table Flink SQL Queries Flink SQL Functions Statements Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql SELECT [column_list] FROM ( SELECT [column_list], ROW_NUMBER() OVER ([PARTITION BY column1[, column2...]] ORDER BY time_attr [asc|desc]) AS rownum FROM table_name) WHERE rownum = 1 ``` ```sql ROW_NUMBER() ``` ```sql PARTITION BY column1[, column2...] ``` ```sql ORDER BY time_attr [asc|desc] ``` ```sql WHERE rownum = 1 ``` ```sql ROW_NUMBER() ``` ```sql LogAppendTime ``` ```sql SELECT user_id, url, $rowtime FROM ( SELECT *, $rowtime, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY $rowtime ASC) AS rownum FROM `examples`.`marketplace`.`clicks`) WHERE rownum = 1; ``` ```sql user_id url $rowtime 3246 https://www.acme.com/product/upmtv 2024-04-16 08:04:47.365 4028 https://www.acme.com/product/jtahp 2024-04-16 08:04:47.367 4549 https://www.acme.com/product/ixsir 2024-04-16 08:04:47.367 ``` --- ### SQL Group Aggregation Queries in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/queries/group-aggregation.html Group Aggregation Queries in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables computing a single result from multiple input rows in a Flink SQL table. Description¶ Compute a single result from multiple input rows in a table. Like most data systems, Apache Flink® supports aggregate functions. 
An aggregate function computes a single result from multiple input rows. For example, there are aggregates to compute the COUNT, SUM, AVG (average), MAX (maximum) and MIN (minimum) values over a set of rows. The following example shows how to count the number of rows in a table, by using the COUNT function. SELECT COUNT(*) FROM orders For streaming queries, Flink runs continuous queries that never terminate. A continuous query updates the result table according to the updates on its input tables. For the previous query, Flink outputs an updated count each time a new row is inserted into the orders table. GROUP BY clause¶ Flink SQL supports the standard GROUP BY clause for aggregating data. The following example shows how to count the number of rows in a table and group the results by a table column. SELECT COUNT(*) FROM orders GROUP BY order_id For streaming queries, the required state for computing the query result might grow indefinitely. State size depends on the number of groups and the number and type of aggregation functions. For example, MIN and MAX are heavy on state size while COUNT is inexpensive. DISTINCT Aggregation¶ Distinct aggregates remove duplicate values before applying an aggregation function. The following example counts the number of distinct order_ids instead of the total number of rows in an orders table. SELECT COUNT(DISTINCT order_id) FROM orders For streaming queries, the required state for computing the query result might grow indefinitely. State size depends primarily on the number of distinct rows and the time that a group is maintained. Short-lived GROUP BY windows are not a problem. GROUPING SETS¶ Grouping sets enable more complex grouping operations than those you can describe with a standard GROUP BY clause. Rows are grouped separately by each specified grouping set, and aggregates are computed for each group just as for simple GROUP BY clauses. 
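One way to read "grouped separately by each specified grouping set" is as several independent GROUP BY queries whose results are combined. The following sketch uses hypothetical inline data and is meant as a conceptual comparison only. In a streaming query, the two forms may emit their intermediate updates differently.

```sql
-- Hypothetical inline data, used only for illustration.
-- Aggregates once per supplier and once over all rows.
SELECT supplier_id, COUNT(*) AS total
FROM (VALUES
    ('supplier1', 4),
    ('supplier1', 3),
    ('supplier2', 3)) AS Products(supplier_id, rating)
GROUP BY GROUPING SETS ((supplier_id), ());

-- Conceptually the same final rows, written as two separate aggregations.
-- The grand total belongs to no supplier, so supplier_id is NULL in that row.
SELECT supplier_id, COUNT(*) AS total
FROM (VALUES
    ('supplier1', 4),
    ('supplier1', 3),
    ('supplier2', 3)) AS Products(supplier_id, rating)
GROUP BY supplier_id
UNION ALL
SELECT CAST(NULL AS STRING) AS supplier_id, COUNT(*) AS total
FROM (VALUES
    ('supplier1', 4),
    ('supplier1', 3),
    ('supplier2', 3)) AS Products(supplier_id, rating);
```

The GROUPING SETS form reads the input once and states the intent in a single clause, which is usually easier to maintain than the UNION ALL form.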
The following example shows how to use GROUPING SETS to count rows per supplier and rating, per supplier, and over all rows. SELECT supplier_id, rating, COUNT(*) AS total FROM (VALUES ('supplier1', 'product1', 4), ('supplier1', 'product2', 3), ('supplier2', 'product3', 3), ('supplier2', 'product4', 4)) AS Products(supplier_id, product_id, rating) GROUP BY GROUPING SETS ((supplier_id, rating), (supplier_id), ()) Results:
+-------------+--------+-------+
| supplier_id | rating | total |
+-------------+--------+-------+
| supplier1   | 4      | 1     |
| supplier1   | (NULL) | 2     |
| (NULL)      | (NULL) | 4     |
| supplier1   | 3      | 1     |
| supplier2   | 3      | 1     |
| supplier2   | (NULL) | 2     |
| supplier2   | 4      | 1     |
+-------------+--------+-------+
Each sublist of GROUPING SETS specifies zero or more columns or expressions and is interpreted as if it were used directly in the GROUP BY clause. An empty grouping set means that all rows are aggregated down to a single group, which is output even if no input rows were present. References to the grouping columns or expressions are replaced by null values in result rows for grouping sets in which those columns don’t appear. For streaming queries, the required state for computing the query result might grow indefinitely. State size depends on the number of grouping sets and the type of aggregation functions. ROLLUP¶ ROLLUP is a shorthand notation for specifying a common type of grouping set. It represents the given list of expressions and all prefixes of the list, including the empty list. For example, the following query is equivalent to the previous GROUP BY GROUPING SETS query. SELECT supplier_id, rating, COUNT(*) FROM (VALUES ('supplier1', 'product1', 4), ('supplier1', 'product2', 3), ('supplier2', 'product3', 3), ('supplier2', 'product4', 4)) AS Products(supplier_id, product_id, rating) GROUP BY ROLLUP (supplier_id, rating) CUBE¶ CUBE is a shorthand notation for specifying a common type of grouping set. It represents the given list and all of its possible subsets, which is also known as the power set.
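As a smaller case, the power set of two columns has four members. The following sketch reuses the inline data from the examples above and shows a two-column CUBE next to the grouping sets it stands for.

```sql
-- CUBE over two columns.
SELECT supplier_id, rating, COUNT(*) AS total
FROM (VALUES
    ('supplier1', 'product1', 4),
    ('supplier1', 'product2', 3),
    ('supplier2', 'product3', 3),
    ('supplier2', 'product4', 4)) AS Products(supplier_id, product_id, rating)
GROUP BY CUBE (supplier_id, rating);

-- The same grouping, with all four subsets of (supplier_id, rating) written out.
SELECT supplier_id, rating, COUNT(*) AS total
FROM (VALUES
    ('supplier1', 'product1', 4),
    ('supplier1', 'product2', 3),
    ('supplier2', 'product3', 3),
    ('supplier2', 'product4', 4)) AS Products(supplier_id, product_id, rating)
GROUP BY GROUPING SETS (
    (supplier_id, rating),
    (supplier_id),
    (rating),
    ());
```

ROLLUP (supplier_id, rating) would leave out the (rating) set, because rating alone is not a prefix of the list.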
For example, the following two queries are equivalent. SELECT supplier_id, rating, product_id, COUNT(*) FROM (VALUES ('supplier1', 'product1', 4), ('supplier1', 'product2', 3), ('supplier2', 'product3', 3), ('supplier2', 'product4', 4)) AS Products(supplier_id, product_id, rating) GROUP BY CUBE (supplier_id, rating, product_id); SELECT supplier_id, rating, product_id, COUNT(*) FROM (VALUES ('supplier1', 'product1', 4), ('supplier1', 'product2', 3), ('supplier2', 'product3', 3), ('supplier2', 'product4', 4)) AS Products(supplier_id, product_id, rating) GROUP BY GROUPING SETS ( ( supplier_id, product_id, rating ), ( supplier_id, product_id ), ( supplier_id, rating ), ( supplier_id ), ( product_id, rating ), ( product_id ), ( rating ), ( ) ); HAVING¶ The HAVING clause eliminates group rows that don’t satisfy the specified condition. HAVING is distinct from the WHERE clause, because WHERE filters individual rows before the GROUP BY, while HAVING filters group rows created by GROUP BY. Each column referenced in the condition must unambiguously reference a grouping column, unless it appears within an aggregate function. SELECT SUM(amount) FROM orders GROUP BY users HAVING SUM(amount) > 50 The presence of a HAVING clause turns a query into a grouped query, even if there is no GROUP BY clause. It’s the same as what happens when the query contains aggregate functions but no GROUP BY clause. The query considers all selected rows to form a single group, and the SELECT list and HAVING clause can reference only table columns from within aggregate functions. Such a query emits a single row if the HAVING condition is true, and zero rows if it’s not true. Related content¶ Flink SQL Queries Flink SQL Functions Statements Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.
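The difference between WHERE and HAVING, and the single-group behavior of HAVING without GROUP BY, can be sketched as follows. The sketch assumes the same orders table, with users and amount columns, as the HAVING example above. The thresholds are arbitrary.

```sql
-- WHERE filters individual rows before grouping.
-- HAVING filters the groups that GROUP BY produces.
SELECT users, SUM(amount) AS total_amount
FROM orders
WHERE amount > 0
GROUP BY users
HAVING SUM(amount) > 50;

-- No GROUP BY: all selected rows form a single group.
-- The query emits one row if the condition holds and no rows otherwise.
SELECT SUM(amount) AS total_amount
FROM orders
HAVING SUM(amount) > 50;
```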
#### Code Examples ```sql SELECT COUNT(*) FROM orders ``` ```sql SELECT COUNT(*) FROM orders GROUP BY order_id ``` ```sql SELECT COUNT(DISTINCT order_id) FROM orders ``` ```sql SELECT supplier_id, rating, COUNT(*) AS total FROM (VALUES ('supplier1', 'product1', 4), ('supplier1', 'product2', 3), ('supplier2', 'product3', 3), ('supplier2', 'product4', 4)) AS Products(supplier_id, product_id, rating) GROUP BY GROUPING SETS ((supplier_id, rating), (supplier_id), ()) ```
```sql
+-------------+--------+-------+
| supplier_id | rating | total |
+-------------+--------+-------+
| supplier1   | 4      | 1     |
| supplier1   | (NULL) | 2     |
| (NULL)      | (NULL) | 4     |
| supplier1   | 3      | 1     |
| supplier2   | 3      | 1     |
| supplier2   | (NULL) | 2     |
| supplier2   | 4      | 1     |
+-------------+--------+-------+
```
```sql GROUPING SETS ``` ```sql SELECT supplier_id, rating, COUNT(*) FROM (VALUES ('supplier1', 'product1', 4), ('supplier1', 'product2', 3), ('supplier2', 'product3', 3), ('supplier2', 'product4', 4)) AS Products(supplier_id, product_id, rating) GROUP BY ROLLUP (supplier_id, rating) ``` ```sql SELECT supplier_id, rating, product_id, COUNT(*) FROM (VALUES ('supplier1', 'product1', 4), ('supplier1', 'product2', 3), ('supplier2', 'product3', 3), ('supplier2', 'product4', 4)) AS Products(supplier_id, product_id, rating) GROUP BY CUBE (supplier_id, rating, product_id); SELECT supplier_id, rating, product_id, COUNT(*) FROM (VALUES ('supplier1', 'product1', 4), ('supplier1', 'product2', 3), ('supplier2', 'product3', 3), ('supplier2', 'product4', 4)) AS Products(supplier_id, product_id, rating) GROUP BY GROUPING SETS ( ( supplier_id, product_id, rating ), ( supplier_id, product_id ), ( supplier_id, rating ), ( supplier_id ), ( product_id, rating ), ( product_id ), ( rating ), ( ) ); ``` ```sql SELECT SUM(amount) FROM orders GROUP BY users HAVING SUM(amount) > 50 ``` --- ### SQL INSERT INTO FROM SELECT Statement in Confluent Cloud for Apache Flink | Confluent Documentation Source: 
https://docs.confluent.io/cloud/current/flink/reference/queries/insert-into-from-select.html INSERT INTO FROM SELECT Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables inserting SELECT query results directly into a Flink SQL table. Syntax¶ [EXECUTE] INSERT { INTO | OVERWRITE } [catalog_name.][database_name.]table_name [PARTITION (partition_column_name1=value1 [, partition_column_name2=value2, ...])] [(column_name1 [, column_name2, ...])] select_statement OVERWRITE: INSERT OVERWRITE overwrites all existing data in the table or partition. Otherwise, new records are appended. PARTITION: The PARTITION clause contains static partition columns for the insertion. COLUMN LIST: For a table T(a INT, b INT, c INT), Flink supports INSERT INTO T(c, b) SELECT x, y FROM S. The x result is written to column c, and the y result is written to column b. If column a is nullable, a is set to NULL. Description¶ Insert query results into a table. Use the INSERT INTO FROM SELECT statement to insert rows into a table from another table or query. For example, if you have a table T with columns a, b, and c, and another table S with columns x and y, the following query writes the values of x and y from S into c and b of T, respectively. INSERT INTO T (c, b) SELECT x, y FROM S If column a of T is nullable, Flink sets it to NULL. Examples¶ Insert rows into a simple table¶ In the Flink SQL shell or in a Cloud Console workspace, run the following commands to see an example of the INSERT INTO FROM SELECT statement. Create a table for web page click events. -- Create a table for web page click events. CREATE TABLE clicks ( ip_address VARCHAR, url VARCHAR, click_ts_raw BIGINT ); Populate the table with mock clickstream data. -- Populate the table with mock clickstream data. 
INSERT INTO clicks VALUES( '10.0.0.1', 'https://acme.com/index.html', 1692812175), ( '10.0.0.12', 'https://apache.org/index.html', 1692826575), ( '10.0.0.13', 'https://confluent.io/index.html', 1692826575), ( '10.0.0.1', 'https://acme.com/index.html', 1692812175), ( '10.0.0.12', 'https://apache.org/index.html', 1692819375), ( '10.0.0.13', 'https://confluent.io/index.html', 1692826575); Press ENTER to return to the SQL shell. Because INSERT INTO VALUES is a point-in-time statement, it exits after it completes inserting records. Create another table for filtered web page click events. CREATE TABLE filtered_clicks ( ip_address VARCHAR, url VARCHAR, click_ts_raw BIGINT ); Run the following statement to insert filtered rows into the filtered_clicks table. Only clicks that have an IP address of 10.0.0.1 are inserted. INSERT INTO filtered_clicks( ip_address, url, click_ts_raw ) SELECT * FROM clicks WHERE ip_address = '10.0.0.1'; View the rows in the filtered_clicks table. SELECT * FROM filtered_clicks; Your output should resemble:
ip_address url                         click_ts_raw
10.0.0.1   https://acme.com/index.html 1692812175
10.0.0.1   https://acme.com/index.html 1692812175
Fill a table without specifying all columns¶ CREATE TABLE t_insert_gaps (c1 STRING, c2 STRING, c3 STRING, c4 STRING); INSERT INTO t_insert_gaps (c3) SELECT 'Bob'; INSERT INTO t_insert_gaps (c3, c2) SELECT 'Bob', 'Alice'; SELECT * FROM t_insert_gaps; Properties A column list is defined between the table name and the SELECT in the INSERT INTO statement, so the SELECT statement uses a reduced schema. Columns that don’t appear in the column list are filled with NULLs. If one of the omitted columns is declared NOT NULL, an error occurs. Related content¶ Convert the Serialization Format of a Topic INSERT VALUES Flink SQL Queries Flink SQL Functions Statements Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.
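The SELECT part of the statement can also transform values on their way into the target table. The following sketch assumes the clicks table created above and a hypothetical target table named clicks_with_ts. It converts the raw epoch-seconds value into a timestamp with the built-in TO_TIMESTAMP_LTZ function while inserting.

```sql
-- Hypothetical target table that stores a timestamp instead of the raw epoch value.
CREATE TABLE clicks_with_ts (
  ip_address VARCHAR,
  url VARCHAR,
  click_timestamp TIMESTAMP_LTZ(3)
);

-- The column list names the target columns.
-- The SELECT supplies one value per listed column, in the same order.
-- TO_TIMESTAMP_LTZ(n, 0) interprets n as seconds since the epoch.
INSERT INTO clicks_with_ts (ip_address, url, click_timestamp)
SELECT ip_address, url, TO_TIMESTAMP_LTZ(click_ts_raw, 0)
FROM clicks;
```

Naming the columns explicitly in both the column list and the SELECT, rather than relying on SELECT *, keeps the mapping stable if a column is later added to either table.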
#### Code Examples ```sql [EXECUTE] INSERT { INTO | OVERWRITE } [catalog_name.][database_name.]table_name [PARTITION (partition_column_name1=value1 [, partition_column_name2=value2, ...])] [(column_name1 [, column_name2, ...])] select_statement ``` ```sql INSERT OVERWRITE ``` ```sql T(a INT, b INT, c INT) ``` ```sql INSERT INTO T(c, b) SELECT x, y FROM S ``` ```sql INSERT INTO T (c, b) SELECT x, y FROM S ```
```sql
-- Create a table for web page click events.
CREATE TABLE clicks ( ip_address VARCHAR, url VARCHAR, click_ts_raw BIGINT );
```
```sql
-- Populate the table with mock clickstream data.
INSERT INTO clicks VALUES( '10.0.0.1', 'https://acme.com/index.html', 1692812175), ( '10.0.0.12', 'https://apache.org/index.html', 1692826575), ( '10.0.0.13', 'https://confluent.io/index.html', 1692826575), ( '10.0.0.1', 'https://acme.com/index.html', 1692812175), ( '10.0.0.12', 'https://apache.org/index.html', 1692819375), ( '10.0.0.13', 'https://confluent.io/index.html', 1692826575);
```
```sql CREATE TABLE filtered_clicks ( ip_address VARCHAR, url VARCHAR, click_ts_raw BIGINT ); ``` ```sql filtered_clicks ``` ```sql INSERT INTO filtered_clicks( ip_address, url, click_ts_raw ) SELECT * FROM clicks WHERE ip_address = '10.0.0.1'; ``` ```sql filtered_clicks ``` ```sql SELECT * FROM filtered_clicks; ```
```sql
ip_address url                         click_ts_raw
10.0.0.1   https://acme.com/index.html 1692812175
10.0.0.1   https://acme.com/index.html 1692812175
```
```sql CREATE TABLE t_insert_gaps (c1 STRING, c2 STRING, c3 STRING, c4 STRING); INSERT INTO t_insert_gaps (c3) SELECT 'Bob'; INSERT INTO t_insert_gaps (c3, c2) SELECT 'Bob', 'Alice'; SELECT * FROM t_insert_gaps; ``` --- ### SQL INSERT VALUES Statement in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/queries/insert-values.html INSERT VALUES Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables inserting data 
directly into a Flink SQL table. Syntax¶ [EXECUTE] INSERT { INTO | OVERWRITE } [catalog_name.][database_name.]table_name VALUES (value1 [, value2, ...]) [, (value3 [, value4, ...])] Description¶ Insert data into a table. Use the INSERT VALUES statement to insert one or more rows into a table by specifying the value for each column. For example, the following statement inserts a single row into a table named orders that has four columns. INSERT INTO orders VALUES (1, 1001, '2023-02-24', 50.0); You can insert multiple rows by using a comma-separated list of values. INSERT INTO orders VALUES (1, 1001, '2023-02-24', 50.0), (2, 1002, '2023-02-25', 60.0), (3, 1003, '2023-02-26', 70.0); Example¶ In the Flink SQL shell or in a Cloud Console workspace, run the following commands to see an example of the INSERT VALUES statement. Create a users table. -- Create a users table. CREATE TABLE users ( user_id STRING, registertime BIGINT, gender STRING, regionid STRING ); Insert rows into the users table. -- Populate the table with mock users data. INSERT INTO users VALUES ('Thomas A. Anderson', 1677260724, 'male', 'Region_4'), ('Trinity', 1677260733, 'female', 'Region_4'), ('Morpheus', 1677260742, 'male', 'Region_8'), ('Dozer', 1677260823, 'male', 'Region_1'), ('Agent Smith', 1677260955, 'male', 'Region_0'), ('Persephone', 1677260901, 'female', 'Region_2'), ('Niobe', 1677260921, 'female', 'Region_3'), ('Zee', 1677260922, 'female', 'Region_5'); Inspect the inserted rows. SELECT * FROM users; Your output should resemble: user_id registertime gender regionid Thomas A. 
Anderson 1677260724 male Region_4 Trinity 1677260733 female Region_4 Morpheus 1677260742 male Region_8 Dozer 1677260823 male Region_1 Agent Smith 1677260955 male Region_0 Persephone 1677260901 female Region_2 Niobe 1677260921 female Region_3 Zee 1677260922 female Region_5 Related content¶ INSERT INTO FROM SELECT Flink SQL Queries Flink SQL Functions Statements Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql [EXECUTE] INSERT { INTO | OVERWRITE } [catalog_name.][database_name.]table_name VALUES (value1 [, value2, ...]) [, (value3 [, value4, ...])] ``` ```sql INSERT INTO orders VALUES (1, 1001, '2023-02-24', 50.0); ``` ```sql INSERT INTO orders VALUES (1, 1001, '2023-02-24', 50.0), (2, 1002, '2023-02-25', 60.0), (3, 1003, '2023-02-26', 70.0); ``` ```sql -- Create a users table. CREATE TABLE users ( user_id STRING, registertime BIGINT, gender STRING, regionid STRING ); ``` ```sql -- Populate the table with mock users data. INSERT INTO users VALUES ('Thomas A. Anderson', 1677260724, 'male', 'Region_4'), ('Trinity', 1677260733, 'female', 'Region_4'), ('Morpheus', 1677260742, 'male', 'Region_8'), ('Dozer', 1677260823, 'male', 'Region_1'), ('Agent Smith', 1677260955, 'male', 'Region_0'), ('Persephone', 1677260901, 'female', 'Region_2'), ('Niobe', 1677260921, 'female', 'Region_3'), ('Zee', 1677260922, 'female', 'Region_5'); ``` ```sql SELECT * FROM users; ``` ```sql user_id registertime gender regionid Thomas A. 
Anderson 1677260724 male Region_4 Trinity 1677260733 female Region_4 Morpheus 1677260742 male Region_8 Dozer 1677260823 male Region_1 Agent Smith 1677260955 male Region_0 Persephone 1677260901 female Region_2 Niobe 1677260921 female Region_3 Zee 1677260922 female Region_5 ``` --- ### SQL Join Queries in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/queries/joins.html Join Queries in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables joining data streams over Flink SQL dynamic tables. Description¶ Flink supports complex and flexible join operations over dynamic tables. There are a number of different types of joins to account for the wide variety of semantics that queries may require. By default, the order of joins is not optimized. Tables are joined in the order in which they are specified in the FROM clause. You can tweak the performance of your join queries by listing the tables with the lowest update frequency first and the tables with the highest update frequency last. Make sure to specify tables in an order that doesn’t yield a cross join (Cartesian product), which isn’t supported and would cause the query to fail. Regular joins¶ Regular joins are the most generic type of join in which any new row, or changes to either side of the join, are visible and affect the whole join result. For example, if there is a new record on the left side, it is joined with all of the previous and future records on the right side when the join fields are equal. SELECT * FROM orders INNER JOIN Product ON orders.product_id = Product.id For streaming queries, the grammar of regular joins is the most flexible and enables any kind of updates (insert, update, delete) on the input table.
But this operation has important implications: it requires keeping both sides of the join input in state forever, so the required state for computing the query result might grow indefinitely, depending on the number of distinct input rows of all input tables and intermediate join results. INNER Equi-JOIN¶ Returns a simple Cartesian product restricted by the join condition. Only equi-joins are supported, i.e., joins that have at least one conjunctive condition with an equality predicate. Arbitrary cross or theta joins aren’t supported. SELECT * FROM orders INNER JOIN Product ON orders.product_id = Product.id OUTER Equi-JOIN¶ Returns all rows in the qualified Cartesian product (i.e., all combined rows that pass its join condition), plus one copy of each row in an outer table for which the join condition did not match with any row of the other table. Flink supports LEFT, RIGHT, and FULL outer joins. Only equi-joins are supported, i.e., joins that have at least one conjunctive condition with an equality predicate. Arbitrary cross or theta joins aren’t supported. SELECT * FROM orders LEFT JOIN Product ON orders.product_id = Product.id SELECT * FROM orders RIGHT JOIN Product ON orders.product_id = Product.id SELECT * FROM orders FULL OUTER JOIN Product ON orders.product_id = Product.id Interval joins¶ An interval join returns a simple Cartesian product restricted by the join condition and a time constraint. An interval join requires at least one equi-join predicate and a join condition that bounds the time on both sides. Such a condition can be defined by two appropriate range predicates (<, <=, >=, >), a BETWEEN predicate, or a single equality predicate that compares time attributes of both input tables. For example, the following query joins all orders with their corresponding shipments if the order was shipped within four hours after the order was received.
SELECT * FROM orders o, Shipments s WHERE o.id = s.order_id AND o.order_time BETWEEN s.ship_time - INTERVAL '4' HOUR AND s.ship_time The following predicates are examples of valid interval join conditions: ltime = rtime ltime >= rtime AND ltime < rtime + INTERVAL '10' MINUTE ltime BETWEEN rtime - INTERVAL '10' SECOND AND rtime + INTERVAL '5' SECOND For streaming queries, compared to the regular join, interval join only supports append-only tables with time attributes. Because time attributes increase quasi-monotonically, Flink can remove old values from its state without affecting the correctness of the result. Temporal joins¶ A temporal join joins one table with another table that is updated over time. This join is made possible by linking both tables using a time attribute, which allows the join to consider the historical changes in the table. When viewing the table at a specific point in time, the join becomes a time-versioned join. In a temporal join, the join condition is based on a time attribute, and the join result includes all rows that satisfy the temporal relationship. A common use case for temporal joins is analyzing financial data, which often includes information that changes over time, such as stock prices, interest rates, and exchange rates. Event-time temporal joins¶ Event-time temporal joins are used to join two or more tables based on a common event time. Event time is the time at which an event occurred, which is typically embedded in the data itself. With Confluent Cloud for Apache Flink, you can use the $rowtime system column to get the timestamp from an Apache Kafka® record. This is also used for the default watermark strategy in Confluent Cloud. Temporal joins take an arbitrary table (left input/probe side) and correlate each row to the corresponding row’s relevant version in the versioned table (right input/build side). Flink uses the SQL syntax of FOR SYSTEM_TIME AS OF to perform this operation from the SQL:2011 standard. 
The syntax of a temporal join is as follows: SELECT [column_list] FROM table1 [AS <alias1>] [LEFT] JOIN table2 FOR SYSTEM_TIME AS OF table1.{ rowtime } [AS <alias2>] ON table1.column-name1 = table2.column-name1 With an event-time attribute, you can retrieve the value of a key as it was at some point in the past. This enables joining the two tables at a common point in time. The versioned table stores all versions, identified by time, since the last watermark. For example, suppose you have a table of orders, each with prices in different currencies. To properly normalize this table to a single currency, such as USD, each order needs to be joined with the proper currency conversion rate from the point in time when the order was placed. CREATE TABLE orders ( order_id STRING, price DECIMAL(32,2), currency STRING ); CREATE TABLE currency_rates ( currency STRING, conversion_rate DECIMAL(32, 2), PRIMARY KEY(currency) NOT ENFORCED ); SELECT orders.order_id, orders.price, orders.currency, currency_rates.conversion_rate FROM orders LEFT JOIN currency_rates FOR SYSTEM_TIME AS OF orders.`$rowtime` ON orders.currency = currency_rates.currency; The event-time temporal join requires that the primary key of the versioned table is contained in the equality condition of the join. In this example, the primary key currency_rates.currency of the currency_rates table is constrained by the condition orders.currency = currency_rates.currency. With temporal joins, there’s some indeterminate amount of latency involved. In the example with orders and currency_rates, when enriching a particular order, an event-time temporal join waits until the watermark on the currency-rate stream reaches the timestamp of that order, because only then is it reasonable to be confident that the result of the join is being produced with complete knowledge of the relevant exchange-rate data. Event-time temporal joins can’t guarantee perfectly correct results.
Despite having waited for the watermark, the most relevant exchange-rate record could still be late, in which case the join will be executed using an earlier version of the exchange rate. If the enrichment stream has infrequent updates, this will cause problems, because of the behavior of watermarking on idle streams. The operator, like any operator with two input streams, normally waits for the watermarks on both incoming streams to reach the desired timestamp before taking action. Array expansion¶ Returns a new row for each element in the given array. Unnesting WITH ORDINALITY is not yet supported. SELECT order_id, tag FROM orders CROSS JOIN UNNEST(tags) AS t (tag) Related content¶ Confluent Developer: Temporal Joins Explained Flink SQL Queries Flink SQL Functions Statements Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql SELECT * FROM orders INNER JOIN Product ON orders.productId = Product.id ``` ```sql SELECT * FROM orders INNER JOIN Product ON orders.product_id = Product.id ``` ```sql SELECT * FROM orders LEFT JOIN Product ON orders.product_id = Product.id SELECT * FROM orders RIGHT JOIN Product ON orders.product_id = Product.id SELECT * FROM orders FULL OUTER JOIN Product ON orders.product_id = Product.id ``` ```sql SELECT * FROM orders o, Shipments s WHERE o.id = s.order_id AND o.order_time BETWEEN s.ship_time - INTERVAL '4' HOUR AND s.ship_time ``` ```sql ltime = rtime ``` ```sql ltime >= rtime AND ltime < rtime + INTERVAL '10' MINUTE ``` ```sql ltime BETWEEN rtime - INTERVAL '10' SECOND AND rtime + INTERVAL '5' SECOND ``` ```sql SELECT [column_list] FROM table1 [AS ] [LEFT] JOIN table2 FOR SYSTEM_TIME AS OF table1.{ rowtime } [AS ] ON table1.column-name1 = table2.column-name1 ``` ```sql CREATE TABLE orders ( order_id STRING, price DECIMAL(32,2), currency STRING ); CREATE TABLE currency_rates ( currency STRING, conversion_rate DECIMAL(32, 2), PRIMARY 
KEY(currency) NOT ENFORCED ); SELECT orders.order_id, orders.price, orders.currency, currency_rates.conversion_rate FROM orders LEFT JOIN currency_rates FOR SYSTEM_TIME AS OF orders.`$rowtime` ON orders.currency = currency_rates.currency; ``` ```sql currency_rates.currency ``` ```sql currency_rates ``` ```sql condition orders.currency = currency_rates.currency ``` ```sql currency_rates ``` ```sql WITH ORDINALITY ``` ```sql SELECT order_id, tag FROM orders CROSS JOIN UNNEST(tags) AS t (tag) ``` --- ### SQL LIMIT clause in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/queries/limit.html LIMIT Clause in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables constraining the number of rows returned by a SELECT statement. Description¶ The LIMIT clause constrains the number of rows returned by a SELECT statement. Usually, this clause is used in conjunction with ORDER BY to ensure that the results are deterministic. Example¶ In the Flink SQL shell or in a Cloud Console workspace, run the following commands to see an example of the LIMIT clause. The following example selects the first 3 rows from a web page clicks table. Create a table for web page click events. -- Create a table for web page click events. CREATE TABLE clicks ( ip_address VARCHAR, url VARCHAR, click_ts_raw BIGINT ); Populate the table with mock clickstream data. -- Populate the table with mock clickstream data. INSERT INTO clicks VALUES( '10.0.0.1', 'https://acme.com/index.html', 1692812175), ( '10.0.0.12', 'https://apache.org/index.html', 1692826575), ( '10.0.0.13', 'https://confluent.io/index.html', 1692826575), ( '10.0.0.1', 'https://acme.com/index.html', 1692812175), ( '10.0.0.12', 'https://apache.org/index.html', 1692819375), ( '10.0.0.13', 'https://confluent.io/index.html', 1692826575); Press ENTER to return to the SQL shell. 
Because INSERT INTO VALUES is a point-in-time statement, it exits after it completes inserting records. View the rows in the clicks table and limit the result to 3 rows. SELECT * FROM clicks LIMIT 3; Your output should resemble: ip_address url click_ts_raw 10.0.0.1 https://acme.com/index.html 1692812175 10.0.0.12 https://apache.org/index.html 1692826575 10.0.0.13 https://confluent.io/index.html 1692826575 Related content¶ Flink SQL Queries Flink SQL Functions Statements Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql -- Create a table for web page click events. CREATE TABLE clicks ( ip_address VARCHAR, url VARCHAR, click_ts_raw BIGINT ); ``` ```sql -- Populate the table with mock clickstream data. INSERT INTO clicks VALUES( '10.0.0.1', 'https://acme.com/index.html', 1692812175), ( '10.0.0.12', 'https://apache.org/index.html', 1692826575), ( '10.0.0.13', 'https://confluent.io/index.html', 1692826575), ( '10.0.0.1', 'https://acme.com/index.html', 1692812175), ( '10.0.0.12', 'https://apache.org/index.html', 1692819375), ( '10.0.0.13', 'https://confluent.io/index.html', 1692826575); ``` ```sql SELECT * FROM clicks LIMIT 3; ``` ```sql ip_address url click_ts_raw 10.0.0.1 https://acme.com/index.html 1692812175 10.0.0.12 https://apache.org/index.html 1692826575 10.0.0.13 https://confluent.io/index.html 1692826575 ``` --- ### SQL Pattern Recognition Queries in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/queries/match_recognize.html Pattern Recognition Queries in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables pattern detection in event streams. 
Syntax¶ SELECT T.aid, T.bid, T.cid FROM MyTable MATCH_RECOGNIZE ( PARTITION BY userid ORDER BY $rowtime MEASURES A.id AS aid, B.id AS bid, C.id AS cid PATTERN (A B C) DEFINE A AS name = 'a', B AS name = 'b', C AS name = 'c' ) AS T Pattern recognition¶ It is a common use case to search for a set of event patterns, especially in case of data streams. Apache Flink® comes with a complex event processing (CEP) library, which enables pattern detection in event streams. Furthermore, the Flink SQL API provides a relational way of expressing queries with a large set of built-in functions and rule-based optimizations that you can use out of the box. In December 2016, the International Organization for Standardization (ISO) released a new version of the SQL standard which includes Row Pattern Recognition in SQL (ISO/IEC TR 19075-5:2016). It enables Flink to consolidate CEP and SQL API using the MATCH_RECOGNIZE clause for complex event processing in SQL. A MATCH_RECOGNIZE clause enables the following tasks: Logically partition and order the data that is used with the PARTITION BY and ORDER BY clauses. Define patterns of rows to seek using the PATTERN clause. These patterns use a syntax similar to that of regular expressions. The logical components of the row pattern variables are specified in the DEFINE clause. Define measures, which are expressions usable in other parts of the SQL query, in the MEASURES clause. This topic explains each keyword in more detail and illustrates more complex examples. Important The Flink implementation of the MATCH_RECOGNIZE clause is a subset of the full standard. Only the features documented in the following sections are supported. For more information, see Known limitations. Installation¶ To use the MATCH_RECOGNIZE clause in the Flink SQL CLI, no action is necessary, because all dependencies are included by default. 
#### SQL semantics

Every MATCH_RECOGNIZE query consists of the following clauses:

- PARTITION BY - defines the logical partitioning of the table, similar to a GROUP BY operation.
- ORDER BY - specifies how the incoming rows should be ordered, which is essential, because patterns depend on an order.
- MEASURES - defines the output of the clause, similar to a SELECT clause.
- ONE ROW PER MATCH - output mode that defines how many rows per match to produce.
- AFTER MATCH SKIP - specifies where the next match should start. This is also a way to control how many distinct matches a single event can belong to.
- PATTERN - enables constructing patterns that will be searched for using a syntax that’s similar to regular expressions.
- DEFINE - defines the conditions that the pattern variables must satisfy.

#### Examples

These examples assume that a table named Ticker has been registered. The table contains prices of stocks at a particular point in time and has the following schema:

| Column | Type | Description |
| --- | --- | --- |
| symbol | String | symbol of the stock |
| price | Long | price of the stock |
| tax | Long | tax liability of the stock |
| rowtime | TimeIndicatorTypeInfo(rowtime) | point in time when the change to those values happened |

For simplicity, only the incoming data for a single stock, named ACME, is considered. A ticker could look similar to the following table, where rows are continuously appended.

| symbol | rowtime | price | tax |
| --- | --- | --- | --- |
| 'ACME' | '01-Apr-11 10:00:00' | 12 | 1 |
| 'ACME' | '01-Apr-11 10:00:01' | 17 | 2 |
| 'ACME' | '01-Apr-11 10:00:02' | 19 | 1 |
| 'ACME' | '01-Apr-11 10:00:03' | 21 | 3 |
| 'ACME' | '01-Apr-11 10:00:04' | 25 | 2 |
| 'ACME' | '01-Apr-11 10:00:05' | 18 | 1 |
| 'ACME' | '01-Apr-11 10:00:06' | 15 | 1 |
| 'ACME' | '01-Apr-11 10:00:07' | 14 | 2 |
| 'ACME' | '01-Apr-11 10:00:08' | 24 | 2 |
| 'ACME' | '01-Apr-11 10:00:09' | 25 | 2 |
| 'ACME' | '01-Apr-11 10:00:10' | 19 | 1 |

The task is to find periods of a constantly decreasing price of a single ticker.
To accomplish this, you could write a query like the following:

`SELECT * FROM Ticker MATCH_RECOGNIZE ( PARTITION BY symbol ORDER BY $rowtime MEASURES START_ROW.rowtime AS start_tstamp, LAST(PRICE_DOWN.$rowtime) AS bottom_tstamp, LAST(PRICE_UP.$rowtime) AS end_tstamp ONE ROW PER MATCH AFTER MATCH SKIP TO LAST PRICE_UP PATTERN (START_ROW PRICE_DOWN+ PRICE_UP) DEFINE PRICE_DOWN AS (LAST(PRICE_DOWN.price, 1) IS NULL AND PRICE_DOWN.price < START_ROW.price) OR PRICE_DOWN.price < LAST(PRICE_DOWN.price, 1), PRICE_UP AS PRICE_UP.price > LAST(PRICE_DOWN.price, 1) ) MR;`

The query partitions the Ticker table by the symbol column and orders it by the $rowtime time attribute.

The PATTERN clause specifies a pattern with a starting event START_ROW that is followed by one or more PRICE_DOWN events and concluded with a PRICE_UP event. If such a pattern is found, the search for the next match resumes at the last PRICE_UP event, as indicated by the AFTER MATCH SKIP TO LAST clause.

The DEFINE clause specifies the conditions that need to be met for a PRICE_DOWN and PRICE_UP event. Although the START_ROW pattern variable isn’t listed in the DEFINE clause, it has an implicit condition that always evaluates to TRUE.

A pattern variable PRICE_DOWN is defined as a row with a price that is smaller than the price of the last row that met the PRICE_DOWN condition. For the initial case, or when there is no last row that met the PRICE_DOWN condition, the price of the row should be smaller than the price of the preceding row in the pattern (referenced by START_ROW).

A pattern variable PRICE_UP is defined as a row with a price that is larger than the price of the last row that met the PRICE_DOWN condition.

This query produces a summary row for each period in which the price of a stock was continuously decreasing. The exact representation of the output rows is defined in the MEASURES part of the query. The number of output rows is defined by the ONE ROW PER MATCH output mode.
| symbol | start_tstamp | bottom_tstamp | end_tstamp |
| --- | --- | --- | --- |
| ACME | 01-APR-11 10:00:04 | 01-APR-11 10:00:07 | 01-APR-11 10:00:08 |

The resulting row describes a period of falling prices that started at 01-APR-11 10:00:04, reached its lowest price at 01-APR-11 10:00:07, and rose again at 01-APR-11 10:00:08.

#### Partitioning

It is possible to look for patterns in partitioned data, for example, trends for a single ticker or a particular user. This can be expressed using the PARTITION BY clause. The clause is similar to using GROUP BY for aggregations.

It is highly advised to partition the incoming data, because otherwise the MATCH_RECOGNIZE clause will be translated into a non-parallel operator to ensure global ordering.

#### Order of events

Flink searches for patterns based on event time. Processing time is not supported in Confluent Cloud for Apache Flink.

With event time, the events are sorted before they are passed to the internal pattern state machine. As a consequence, the produced output will be correct regardless of the order in which rows are appended to the table. Instead, the pattern is evaluated in the order specified by the time contained in each row.

The MATCH_RECOGNIZE clause assumes a time attribute with ascending ordering as the first argument to the ORDER BY clause. For the example Ticker table, a definition like ORDER BY rowtime ASC, price DESC is valid, but ORDER BY price, rowtime or ORDER BY rowtime DESC, price ASC is not.

#### Define and measures

The DEFINE and MEASURES keywords have similar meanings to the WHERE and SELECT clauses in a simple SQL query.

The MEASURES clause defines what will be included in the output of a matching pattern. It can project columns and define expressions for evaluation. The number of produced rows depends on the output mode setting.
The DEFINE clause specifies conditions that rows have to fulfill in order to be classified to a corresponding pattern variable. If a condition isn’t defined for a pattern variable, a default condition is used, which evaluates to TRUE for every row.

For a more detailed explanation of the expressions that can be used in those clauses, see Pattern navigation.

#### Aggregations

Aggregations can be used in DEFINE and MEASURES clauses. Built-in functions are supported.

Aggregate functions are applied to each subset of rows mapped to a match. To understand how these subsets are evaluated, see the Pattern navigation section.

The task of the following example is to find the longest period of time for which the average price of a ticker stayed below a certain threshold. It shows how expressive MATCH_RECOGNIZE can become with aggregations. The following query performs this task.

`SELECT * FROM Ticker MATCH_RECOGNIZE ( PARTITION BY symbol ORDER BY rowtime MEASURES FIRST(A.rowtime) AS start_tstamp, LAST(A.rowtime) AS end_tstamp, AVG(A.price) AS avgPrice ONE ROW PER MATCH AFTER MATCH SKIP PAST LAST ROW PATTERN (A+ B) DEFINE A AS AVG(A.price) < 15 ) MR;`

Given this query and the following input values:

| symbol | rowtime | price | tax |
| --- | --- | --- | --- |
| 'ACME' | '01-Apr-11 10:00:00' | 12 | 1 |
| 'ACME' | '01-Apr-11 10:00:01' | 17 | 2 |
| 'ACME' | '01-Apr-11 10:00:02' | 13 | 1 |
| 'ACME' | '01-Apr-11 10:00:03' | 16 | 3 |
| 'ACME' | '01-Apr-11 10:00:04' | 25 | 2 |
| 'ACME' | '01-Apr-11 10:00:05' | 2 | 1 |
| 'ACME' | '01-Apr-11 10:00:06' | 4 | 1 |
| 'ACME' | '01-Apr-11 10:00:07' | 10 | 2 |
| 'ACME' | '01-Apr-11 10:00:08' | 15 | 2 |
| 'ACME' | '01-Apr-11 10:00:09' | 25 | 2 |
| 'ACME' | '01-Apr-11 10:00:10' | 25 | 1 |
| 'ACME' | '01-Apr-11 10:00:11' | 30 | 1 |

The query maps events to the pattern variable A as long as their average price stays below 15. For example, the row at 01-Apr-11 10:00:04 would raise the average to 15 or more, so it ends the first match. In the following period, the average would reach 15 or more again with the row at 01-Apr-11 10:00:11.
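The accumulation described above can be traced by hand. The following Python sketch is not part of Flink. It is a simplified simulation that assumes greedy matching of `A+` without backtracking and the `AFTER MATCH SKIP PAST LAST ROW` strategy. The function name `average_runs` is hypothetical, and rows are identified by their offset in seconds from 10:00:00.

```python
# Simplified simulation of PATTERN (A+ B) with DEFINE A AS AVG(A.price) < 15
# and AFTER MATCH SKIP PAST LAST ROW. Assumption: greedy matching without
# backtracking. This is an illustration, not Flink's implementation.

# Prices from the input table; the list index is the seconds offset from 10:00:00.
PRICES = [12, 17, 13, 16, 25, 2, 4, 10, 15, 25, 25, 30]


def average_runs(prices, threshold=15):
    """Return (first_A_index, last_A_index, average) for each match."""
    matches = []
    start = 0
    while start < len(prices):
        total = 0
        end = start
        # Map rows to A while the average, including the current row,
        # stays below the threshold.
        while end < len(prices) and (total + prices[end]) / (end - start + 1) < threshold:
            total += prices[end]
            end += 1
        count = end - start
        if count == 0 or end == len(prices):
            # Either no row qualified for A, or no row is left to map to B.
            start += 1
            continue
        matches.append((start, end - 1, total / count))
        start = end + 1  # SKIP PAST LAST ROW: resume after the row mapped to B
    return matches


for first, last, avg in average_runs(PRICES):
    print(f"10:00:{first:02d} to 10:00:{last:02d}, avgPrice {avg}")
# prints:
# 10:00:00 to 10:00:03, avgPrice 14.5
# 10:00:05 to 10:00:10, avgPrice 13.5
```

The first run ends because adding the price 25 would give an average of 83 / 5 = 16.6, and the second run ends because adding 30 would give 111 / 7, which is about 15.9.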
Here are results of the query: symbol start_tstamp end_tstamp avgPrice ========= ================== ================== ============ ACME 01-APR-11 10:00:00 01-APR-11 10:00:03 14.5 ACME 01-APR-11 10:00:05 01-APR-11 10:00:10 13.5 Aggregations can be applied to expressions, but only if they reference a single pattern variable. For example, SUM(A.price * A.tax) is valid, but AVG(A.price * B.tax) is not. Note DISTINCT aggregations aren’t supported. Define a pattern¶ The MATCH_RECOGNIZE clause enables you to search for patterns in event streams using a powerful and expressive syntax that is somewhat similar to the widely used regular expression syntax. Every pattern is constructed from basic building blocks, called pattern variables, to which operators (quantifiers and other modifiers) can be applied. The whole pattern must be enclosed in brackets. The following SQL shows an example pattern: PATTERN (A B+ C* D) You can use the following operators: Concatenation - a pattern like (A B) means that the contiguity is strict between A and B, so there can be no rows that weren’t mapped to A or B in between. Quantifiers - modify the number of rows that can be mapped to the pattern variable. * — 0 or more rows + — 1 or more rows ? — 0 or 1 rows { n } — exactly n rows (n > 0) { n, } — n or more rows (n ≥ 0) { n, m } — between n and m (inclusive) rows (0 ≤ n ≤ m, 0 < m) { , m } — between 0 and m (inclusive) rows (m > 0) Important Patterns that can potentially produce an empty match aren’t supported. For example, patterns like these produce an empty match: PATTERN (A*) PATTERN (A? B*) PATTERN (A{0,} B{0,} C*) Greedy and reluctant quantifiers¶ Each quantifier can be either greedy (default behavior) or reluctant. Greedy quantifiers try to match as many rows as possible, while reluctant quantifiers try to match as few as possible. 
To see the difference, the following example shows a query where a greedy quantifier is applied to the B variable: SELECT * FROM Ticker MATCH_RECOGNIZE( PARTITION BY symbol ORDER BY rowtime MEASURES C.price AS lastPrice ONE ROW PER MATCH AFTER MATCH SKIP PAST LAST ROW PATTERN (A B* C) DEFINE A AS A.price > 10, B AS B.price < 15, C AS C.price > 12 ) Given the following input: symbol tax price rowtime ======= ===== ======== ===================== XYZ 1 10 2018-09-17 10:00:02 XYZ 2 11 2018-09-17 10:00:03 XYZ 1 12 2018-09-17 10:00:04 XYZ 2 13 2018-09-17 10:00:05 XYZ 1 14 2018-09-17 10:00:06 XYZ 2 16 2018-09-17 10:00:07 The example pattern produces the following output: symbol lastPrice ======== =========== XYZ 16 If the query is modified to be reluctant, changing B* to B*?, it produces the following output: symbol lastPrice ======== =========== XYZ 13 XYZ 16 The pattern variable B matches only the row with price 12 instead of swallowing the rows with prices 12, 13, and 14. You can’t use a greedy quantifier for the last variable of a pattern. So a pattern like (A B*) isn’t valid. You can work around this limitation by introducing an artificial state, like C, that has a negated condition of B. The following query shows an example. PATTERN (A B* C) DEFINE A AS condA(), B AS condB(), C AS NOT condB() Note The optional-reluctant quantifier (A?? or A{0,1}?) isn’t supported. Time constraint¶ Especially for streaming use cases, it’s often required that a pattern finishes within a given period of time. This enables limiting the overall state size that Flink must maintain internally, even in the case of greedy quantifiers. For this reason, Flink SQL supports the additional (non-standard SQL) WITHIN clause for defining a time constraint for a pattern. The clause can be defined after the PATTERN clause and takes an interval of millisecond resolution. 
If the time between the first and last event of a potential match is longer than the given value, a match isn’t appended to the result table.

Note: It’s good practice to use the WITHIN clause, because it helps Flink with efficient memory management. Underlying state can be pruned once the threshold is reached. But the WITHIN clause isn’t part of the SQL standard, so the recommended way of dealing with time constraints might change in the future.

The following example query shows the WITHIN clause used with MATCH_RECOGNIZE.

`SELECT * FROM Ticker MATCH_RECOGNIZE( PARTITION BY symbol ORDER BY rowtime MEASURES C.rowtime AS dropTime, A.price - C.price AS dropDiff ONE ROW PER MATCH AFTER MATCH SKIP PAST LAST ROW PATTERN (A B* C) WITHIN INTERVAL '1' HOUR DEFINE B AS B.price > A.price - 10, C AS C.price < A.price - 10 )`

The query detects a price drop of more than 10 that happens within an interval of 1 hour.

Assume the query is used to analyze the following ticker data.

| symbol | rowtime | price | tax |
| --- | --- | --- | --- |
| 'ACME' | '01-Apr-11 10:00:00' | 20 | 1 |
| 'ACME' | '01-Apr-11 10:20:00' | 17 | 2 |
| 'ACME' | '01-Apr-11 10:40:00' | 18 | 1 |
| 'ACME' | '01-Apr-11 11:00:00' | 11 | 3 |
| 'ACME' | '01-Apr-11 11:20:00' | 14 | 2 |
| 'ACME' | '01-Apr-11 11:40:00' | 9 | 1 |
| 'ACME' | '01-Apr-11 12:00:00' | 15 | 1 |
| 'ACME' | '01-Apr-11 12:20:00' | 14 | 2 |
| 'ACME' | '01-Apr-11 12:40:00' | 24 | 2 |
| 'ACME' | '01-Apr-11 13:00:00' | 1 | 2 |
| 'ACME' | '01-Apr-11 13:20:00' | 19 | 1 |

The query produces the following results:

| symbol | dropTime | dropDiff |
| --- | --- | --- |
| 'ACME' | '01-Apr-11 13:00:00' | 14 |

The resulting row represents a price drop from 15 (at 01-Apr-11 12:00:00) to 1 (at 01-Apr-11 13:00:00). The dropDiff column contains the price difference.

Prices also drop by more than 10 elsewhere, for example, by 11 between 01-Apr-11 10:00:00 and 01-Apr-11 11:40:00. But the time between those two events is longer than 1 hour, so they don’t produce a match.

#### Output mode

The output mode describes how many rows should be emitted for every found match.
The SQL standard describes two modes:

- ALL ROWS PER MATCH
- ONE ROW PER MATCH

In Flink SQL, the only supported output mode is ONE ROW PER MATCH, and it always produces one output summary row for each found match.

The schema of the output row is a concatenation of [partitioning columns] + [measures columns], in that order.

The following example shows the output of a query defined as:

`SELECT * FROM Ticker MATCH_RECOGNIZE( PARTITION BY symbol ORDER BY rowtime MEASURES FIRST(A.price) AS startPrice, LAST(A.price) AS topPrice, B.price AS lastPrice ONE ROW PER MATCH PATTERN (A+ B) DEFINE A AS LAST(A.price, 1) IS NULL OR A.price > LAST(A.price, 1), B AS B.price < LAST(A.price) )`

For the following input rows:

| symbol | tax | price | rowtime |
| --- | --- | --- | --- |
| XYZ | 1 | 10 | 2018-09-17 10:00:02 |
| XYZ | 2 | 12 | 2018-09-17 10:00:03 |
| XYZ | 1 | 13 | 2018-09-17 10:00:04 |
| XYZ | 2 | 11 | 2018-09-17 10:00:05 |

The query produces the following output:

| symbol | startPrice | topPrice | lastPrice |
| --- | --- | --- | --- |
| XYZ | 10 | 13 | 11 |

The pattern recognition is partitioned by the symbol column. Even though it isn’t explicitly mentioned in the MEASURES clause, the partitioning column is added at the beginning of the result.

#### Pattern navigation

The DEFINE and MEASURES clauses enable navigating within the list of rows that (potentially) match a pattern.

This section discusses navigation for declaring conditions or producing output results.

#### Pattern variable referencing

A pattern variable reference enables referencing a set of rows mapped to a particular pattern variable in the DEFINE or MEASURES clauses.

For example, the expression A.price describes a set of rows mapped so far to A plus the current row, if the query tries to match the current row to A. If an expression in the DEFINE / MEASURES clause requires a single row, for example, A.price or A.price > 10, it selects the last value belonging to the corresponding set.
If no pattern variable is specified, for example, SUM(price), an expression references the default pattern variable *, which references all variables in the pattern. In other words, it creates a list of all the rows mapped so far to any variable plus the current row. Example¶ For a more thorough example, consider the following pattern and corresponding conditions. PATTERN (A B+) DEFINE A AS A.price >= 10, B AS B.price > A.price AND SUM(price) < 100 AND SUM(B.price) < 80 The following table describes how these conditions are evaluated for each incoming event. The table consists of the following columns: # - the row identifier that uniquely identifies an incoming row in the lists [A.price] / [B.price] / [price]. price - the price of the incoming row. [A.price]/ [B.price]/ [price] - describe lists of rows which are used in the DEFINE clause to evaluate conditions. Classifier - the classifier of the current row which indicates the pattern variable the row is mapped to. A.price/ B.price/ SUM(price)/ SUM(B.price) - describes the result after those expressions have been evaluated. == ===== ========== ========= ============== ================== ======= ======= ========== ============ # price Classifier [A.price] [B.price] [price] A.price B.price SUM(price) SUM(B.price) == ===== ========== ========= ============== ================== ======= ======= ========== ============ #1 10 -> A #1 - - 10 - - - #2 15 -> B #1 #2 #1, #2 10 15 25 15 #3 20 -> B #1 #2, #3 #1, #2, #3 10 20 45 35 #4 31 -> B #1 #2, #3, #4 #1, #2, #3, #4 10 31 76 66 #5 35 #1 #2, #3, #4, #5 #1, #2, #3, #4, #5 10 35 111 101 == ===== ========== ========= ============== ================== ======= ======= ========== ============ The table shows that the first row is mapped to pattern variable A, and subsequent rows are mapped to pattern variable B. But the last row doesn’t fulfill the B condition, because the sum over all mapped rows, SUM(price), and the sum over all rows in B exceed the specified thresholds. 
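The evaluation in the table above can be reproduced with a short trace. The following Python sketch is only an illustration under stated assumptions: it hard-codes the conditions from this example, the function name `trace_conditions` is hypothetical, and it is not how Flink evaluates patterns internally. For each row it reports the classifier and the values of SUM(price) and SUM(B.price), both including the current row.

```python
# Trace of PATTERN (A B+) with
#   DEFINE A AS A.price >= 10,
#          B AS B.price > A.price AND SUM(price) < 100 AND SUM(B.price) < 80
# Illustration only; the conditions are hard-coded for this example.


def trace_conditions(prices):
    """Return (price, classifier, SUM(price), SUM(B.price)) for each row.

    The sums are None for the row mapped to A, because the condition
    for A doesn't use them.
    """
    rows = []
    a_price = None  # price of the row mapped to A
    sum_all = 0     # SUM(price) over all rows mapped so far
    sum_b = 0       # SUM(B.price) over the rows mapped to B so far
    for price in prices:
        if a_price is None:
            # A AS A.price >= 10
            if price >= 10:
                a_price = price
                sum_all = price
                rows.append((price, "A", None, None))
            else:
                rows.append((price, None, None, None))
            continue
        # The candidate sums include the current row, as in the table.
        cand_all = sum_all + price
        cand_b = sum_b + price
        if price > a_price and cand_all < 100 and cand_b < 80:
            sum_all, sum_b = cand_all, cand_b
            rows.append((price, "B", cand_all, cand_b))
        else:
            rows.append((price, None, cand_all, cand_b))
            break  # the row can't be mapped, so the trace stops here
    return rows


for row in trace_conditions([10, 15, 20, 31, 35]):
    print(row)
# prints:
# (10, 'A', None, None)
# (15, 'B', 25, 15)
# (20, 'B', 45, 35)
# (31, 'B', 76, 66)
# (35, None, 111, 101)
```

The last row stays unclassified because both sums, 111 and 101, exceed their limits of 100 and 80.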
Logical offsets¶ Logical offsets enable navigation within the events that were mapped to a particular pattern variable. This can be expressed with two corresponding functions. Offset functions Description LAST(variable.field, n) Returns the value of the field from the event that was mapped to the n-th last element of the variable. The counting starts at the last element mapped. FIRST(variable.field, n) Returns the value of the field from the event that was mapped to the n-th element of the variable. The counting starts at the first element mapped. Examples¶ For a more thorough example, consider the following pattern and corresponding conditions: PATTERN (A B+) DEFINE A AS A.price >= 10, B AS (LAST(B.price, 1) IS NULL OR B.price > LAST(B.price, 1)) AND (LAST(B.price, 2) IS NULL OR B.price > 2 * LAST(B.price, 2)) The following table describes how these conditions are evaluated for each incoming event. The table consists of the following columns: price - the price of the incoming row. Classifier - the classifier of the current row which indicates the pattern variable the row is mapped to. LAST(B.price, 1)/ LAST(B.price, 2) - describes the result after these expressions have been evaluated. ===== ========== ================ ================ ======================================================================================== price Classifier LAST(B.price, 1) LAST(B.price, 2) Comment ===== ========== ================ ================ ======================================================================================== 10 -> A 15 -> B null null Notice that ``LAST(B.price, 1)`` is null because there is still nothing mapped to ``B``. 20 -> B 15 null 31 -> B 20 15 35 31 20 Not mapped because ``35 < 2 * 20``. ===== ========== ================ ================ ======================================================================================== It might also make sense to use the default pattern variable with logical offsets. 
In this case, an offset considers all the rows mapped so far:

`PATTERN (A B? C) DEFINE B AS B.price < 20, C AS LAST(price, 1) < C.price`

| price | Classifier | LAST(price, 1) | Comment |
| --- | --- | --- | --- |
| 10 | -> A | | |
| 15 | -> B | | |
| 20 | -> C | 15 | `LAST(price, 1)` is evaluated as the price of the row mapped to the `B` variable. |

If the second row didn’t map to the B variable, the query returns the following results:

| price | Classifier | LAST(price, 1) | Comment |
| --- | --- | --- | --- |
| 10 | -> A | | |
| 20 | -> C | 10 | `LAST(price, 1)` is evaluated as the price of the row mapped to the `A` variable. |

It’s also possible to use multiple pattern variable references in the first argument of the FIRST/LAST functions. This way, you can write an expression that accesses multiple columns, but all of them must use the same pattern variable. In other words, the value of the LAST/FIRST function must be computed in a single row.

This means that LAST(A.price * A.tax) is valid, but an expression like LAST(A.price * B.tax) is not.

#### After-match strategy

The AFTER MATCH SKIP clause specifies where to start a new matching procedure after a complete match was found.

There are four different strategies: SKIP PAST LAST ROW - resumes the pattern matching at the next row after the last row of the current match.
SKIP TO NEXT ROW - continues searching for a new match starting at the next row after the starting row of the match. SKIP TO LAST variable - resumes the pattern matching at the last row that is mapped to the specified pattern variable. SKIP TO FIRST variable - resumes the pattern matching at the first row that is mapped to the specified pattern variable. This is also a way to specify how many matches a single event can belong to. For example, with the SKIP PAST LAST ROW strategy, every event can belong to at most one match. Examples¶ To better understand the differences between these strategies consider the following example. For the following input rows: symbol tax price rowtime ======== ===== ======= ===================== XYZ 1 7 2018-09-17 10:00:01 XYZ 2 9 2018-09-17 10:00:02 XYZ 1 10 2018-09-17 10:00:03 XYZ 2 5 2018-09-17 10:00:04 XYZ 2 10 2018-09-17 10:00:05 XYZ 2 7 2018-09-17 10:00:06 XYZ 2 14 2018-09-17 10:00:07 Evaluate the following query with different strategies: SELECT * FROM Ticker MATCH_RECOGNIZE( PARTITION BY symbol ORDER BY rowtime MEASURES SUM(A.price) AS sumPrice, FIRST(rowtime) AS startTime, LAST(rowtime) AS endTime ONE ROW PER MATCH [AFTER MATCH STRATEGY] PATTERN (A+ C) DEFINE A AS SUM(A.price) < 30 ) The query returns the sum of the prices of all rows mapped to A and the first and last timestamp of the overall match. The query produces different results based on which AFTER MATCH strategy is used: AFTER MATCH SKIP PAST LAST ROW¶ symbol sumPrice startTime endTime ======== ========== ===================== ===================== XYZ 26 2018-09-17 10:00:01 2018-09-17 10:00:04 XYZ 17 2018-09-17 10:00:05 2018-09-17 10:00:07 The first result matched against the rows #1, #2, #3, #4. The second result matched against the rows #5, #6, #7. 
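The two matches listed above follow from simple running sums. The following Python sketch simulates this example under explicit assumptions: `A+` is matched greedily without backtracking, `C` accepts any row, and an incomplete match at the end of the input is not emitted. The function name `find_matches` and the strategy strings are hypothetical, and the sketch is not Flink's implementation. The same function also accepts the SKIP TO NEXT ROW and SKIP TO LAST A strategies described in this section.

```python
# Simplified simulation of PATTERN (A+ C) with DEFINE A AS SUM(A.price) < 30.
# Assumptions: greedy matching without backtracking, and an incomplete match
# at the end of the input is not emitted. Illustration only.

PRICES = [7, 9, 10, 5, 10, 7, 14]  # rows #1 to #7 from the input table


def find_matches(prices, strategy, limit=30):
    """Return (sumPrice, first_row, last_row) per match; rows are numbered from 1."""
    matches = []
    start = 0
    while start < len(prices):
        total = 0
        i = start
        # Greedy A+: map rows to A while SUM(A.price) stays below the limit.
        while i < len(prices) and total + prices[i] < limit:
            total += prices[i]
            i += 1
        if i == start or i == len(prices):
            # No row qualified for A, or no row is left to map to C.
            start += 1
            continue
        matches.append((total, start + 1, i + 1))  # the row at index i maps to C
        if strategy == "SKIP PAST LAST ROW":
            resume = i + 1      # the row after C
        elif strategy == "SKIP TO NEXT ROW":
            resume = start + 1  # the row after the first row of the match
        elif strategy == "SKIP TO LAST A":
            resume = i - 1      # the last row mapped to A
        else:
            raise ValueError(f"unsupported strategy: {strategy}")
        if resume <= start:
            # Matching would restart at the same row forever.
            raise ValueError("next match would start where the last one started")
        start = resume
    return matches


print(find_matches(PRICES, "SKIP PAST LAST ROW"))
# prints: [(26, 1, 4), (17, 5, 7)]
```

With SKIP PAST LAST ROW, the first match covers rows #1 to #4 with a sum of 7 + 9 + 10 = 26 for the rows mapped to A, and the second covers rows #5 to #7 with a sum of 10 + 7 = 17.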
#### AFTER MATCH SKIP TO NEXT ROW

| symbol | sumPrice | startTime | endTime |
| --- | --- | --- | --- |
| XYZ | 26 | 2018-09-17 10:00:01 | 2018-09-17 10:00:04 |
| XYZ | 24 | 2018-09-17 10:00:02 | 2018-09-17 10:00:05 |
| XYZ | 25 | 2018-09-17 10:00:03 | 2018-09-17 10:00:06 |
| XYZ | 22 | 2018-09-17 10:00:04 | 2018-09-17 10:00:07 |
| XYZ | 17 | 2018-09-17 10:00:05 | 2018-09-17 10:00:07 |

Again, the first result matched against the rows #1, #2, #3, #4.

Compared to the previous strategy, the next match starts at row #2, so it reuses rows from the first match. Therefore, the second result matched against the rows #2, #3, #4, #5.

The third result matched against the rows #3, #4, #5, #6.

The fourth result matched against the rows #4, #5, #6, #7.

The last result matched against the rows #5, #6, #7.

#### AFTER MATCH SKIP TO LAST A

| symbol | sumPrice | startTime | endTime |
| --- | --- | --- | --- |
| XYZ | 26 | 2018-09-17 10:00:01 | 2018-09-17 10:00:04 |
| XYZ | 25 | 2018-09-17 10:00:03 | 2018-09-17 10:00:06 |
| XYZ | 17 | 2018-09-17 10:00:05 | 2018-09-17 10:00:07 |

Again, the first result matched against the rows #1, #2, #3, #4.

Compared to the previous strategy, the next match starts at row #3, the last row mapped to A, so only that row is reused. Therefore, the second result matched against the rows #3, #4, #5, #6.

The last result matched against the rows #5, #6, #7.

#### AFTER MATCH SKIP TO FIRST A

This combination produces a runtime exception, because one would always try to start a new match where the last one started. This would produce an infinite loop, so it’s not valid.

In case of the SKIP TO FIRST/LAST variable strategy, it may be possible that there are no rows mapped to that variable, for example, for pattern A*. In such cases, a runtime exception is thrown, because the standard requires a valid row to continue the matching.

#### Time attributes

To apply subsequent queries on top of MATCH_RECOGNIZE, it may be necessary to use time attributes.
The following function selects them:

- `MATCH_ROWTIME([rowtime_field])`: Returns the timestamp of the last row that was mapped to the given pattern. The function accepts zero or one operand, which is a field reference with a rowtime attribute. If there is no operand, the function returns the rowtime attribute with TIMESTAMP type. Otherwise, the return type is the same as the operand type.

The resulting attribute is a rowtime attribute that you can use in subsequent time-based operations, like interval joins and group window or over-window aggregations.

#### Control memory consumption

Memory consumption is an important consideration when writing MATCH_RECOGNIZE queries, because the space of potential matches is built in a breadth-first-like manner. This means that you must ensure that the pattern can finish, preferably with a reasonable number of rows mapped to the match, as they have to fit into memory.

For example, the pattern must not have a quantifier without an upper limit that accepts every single row. Such a pattern could look like this:

`PATTERN (A B+ C) DEFINE A as A.price > 10, C as C.price > 20`

Because B has no condition, this query maps every incoming row to the B variable, so it never finishes. This query could be fixed, for example, by defining B as the negation of the condition for C:

`PATTERN (A B+ C) DEFINE A as A.price > 10, B as B.price <= 20, C as C.price > 20`

Also, the query could be fixed by using the reluctant quantifier:

`PATTERN (A B+? C) DEFINE A as A.price > 10, C as C.price > 20`

Note: The MATCH_RECOGNIZE clause doesn’t use a configured state retention time. You may want to use the WITHIN clause for this purpose.

#### Known limitations

The Flink SQL implementation of the MATCH_RECOGNIZE clause is an ongoing effort, and some features of the SQL standard are not yet supported. Unsupported features include:

- Pattern expressions
  - Pattern groups: quantifiers cannot be applied to a subsequence of the pattern. Thus, `(A (B C)+)` is not a valid pattern.
  - Alternation: patterns like `PATTERN((A B | C D) E)`, which means that either a subsequence A B or C D has to be found before looking for the E row.
  - PERMUTE operator: equivalent to all permutations of the variables that it is applied to, for example, `PATTERN (PERMUTE (A, B, C))` = `PATTERN (A B C | A C B | B A C | B C A | C A B | C B A)`.
  - Anchors: `^` and `$`, which denote the beginning and end of a partition. These do not make sense in the streaming context and will not be supported.
  - Exclusion: `PATTERN ({- A -} B)`, meaning that A is looked for but does not participate in the output. This works only for the ALL ROWS PER MATCH mode.
  - Reluctant optional quantifier: `PATTERN A??`. Only the greedy optional quantifier is supported.
- ALL ROWS PER MATCH output mode, which produces an output row for every row that participated in the creation of a found match. This also means:
  - The only supported semantic for the MEASURES clause is FINAL.
  - The CLASSIFIER function, which returns the pattern variable that a row was mapped to, is not yet supported.
- SUBSET, which allows creating logical groups of pattern variables and using those groups in the DEFINE and MEASURES clauses.
- Physical offsets: PREV/NEXT, which index all events seen rather than only those that were mapped to a pattern variable (as in the logical offsets case).
- MATCH_RECOGNIZE is supported only for SQL. There is no equivalent in the Table API.
- Aggregations: distinct aggregations are not supported.

#### Related content

- Time Attributes
- Flink SQL Queries
- Flink SQL Functions
- Statements

Note: This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.
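The memory guidance in this section can be sketched as a single query that combines the reluctant quantifier with a WITHIN bound and exposes a time attribute through MATCH_ROWTIME. This is an illustrative sketch, not a recommended configuration: it assumes the Ticker table used in the earlier examples, the one-hour bound is arbitrary, and the measure aliases are hypothetical names.

```sql
SELECT *
FROM Ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol
  ORDER BY rowtime
  MEASURES
    A.price AS startPrice,
    C.price AS endPrice,
    -- Rowtime attribute of the match, usable in later time-based operations.
    MATCH_ROWTIME() AS matchTime
  ONE ROW PER MATCH
  AFTER MATCH SKIP PAST LAST ROW
  -- B+? is reluctant, so B stops consuming rows as soon as C can match.
  -- WITHIN limits how long a partial match is kept.
  PATTERN (A B+? C) WITHIN INTERVAL '1' HOUR
  DEFINE
    A AS A.price > 10,
    C AS C.price > 20
) AS T;
```

Because matchTime is a rowtime attribute, the result can feed a subsequent windowed aggregation or interval join.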
#### Code Examples ```sql SELECT T.aid, T.bid, T.cid FROM MyTable MATCH_RECOGNIZE ( PARTITION BY userid ORDER BY $rowtime MEASURES A.id AS aid, B.id AS bid, C.id AS cid PATTERN (A B C) DEFINE A AS name = 'a', B AS name = 'b', C AS name = 'c' ) AS T ``` ```sql MATCH_RECOGNIZE ``` ```sql MATCH_RECOGNIZE ``` ```sql PARTITION BY ``` ```sql MATCH_RECOGNIZE ``` ```sql MATCH_RECOGNIZE ``` ```sql MATCH_RECOGNIZE ``` ```sql Ticker |-- symbol: String # symbol of the stock |-- price: Long # price of the stock |-- tax: Long # tax liability of the stock |-- rowtime: TimeIndicatorTypeInfo(rowtime) # point in time when the change to those values happened ``` ```sql symbol rowtime price tax ====== ==================== ======= ======= 'ACME' '01-Apr-11 10:00:00' 12 1 'ACME' '01-Apr-11 10:00:01' 17 2 'ACME' '01-Apr-11 10:00:02' 19 1 'ACME' '01-Apr-11 10:00:03' 21 3 'ACME' '01-Apr-11 10:00:04' 25 2 'ACME' '01-Apr-11 10:00:05' 18 1 'ACME' '01-Apr-11 10:00:06' 15 1 'ACME' '01-Apr-11 10:00:07' 14 2 'ACME' '01-Apr-11 10:00:08' 24 2 'ACME' '01-Apr-11 10:00:09' 25 2 'ACME' '01-Apr-11 10:00:10' 19 1 ``` ```sql SELECT * FROM Ticker MATCH_RECOGNIZE ( PARTITION BY symbol ORDER BY $rowtime MEASURES START_ROW.rowtime AS start_tstamp, LAST(PRICE_DOWN.$rowtime) AS bottom_tstamp, LAST(PRICE_UP.$rowtime) AS end_tstamp ONE ROW PER MATCH AFTER MATCH SKIP TO LAST PRICE_UP PATTERN (START_ROW PRICE_DOWN+ PRICE_UP) DEFINE PRICE_DOWN AS (LAST(PRICE_DOWN.price, 1) IS NULL AND PRICE_DOWN.price < START_ROW.price) OR PRICE_DOWN.price < LAST(PRICE_DOWN.price, 1), PRICE_UP AS PRICE_UP.price > LAST(PRICE_DOWN.price, 1) ) MR; ``` ```sql AFTER MATCH SKIP TO LAST ``` ```sql ONE ROW PER MATCH ``` ```sql symbol start_tstamp bottom_tstamp end_tstamp ========= ================== ================== ================== ACME 01-APR-11 10:00:04 01-APR-11 10:00:07 01-APR-11 10:00:08 ``` ```sql 01-APR-11 10:00:04 ``` ```sql 01-APR-11 10:00:07 ``` ```sql 01-APR-11 10:00:08 ``` ```sql PARTITION BY ``` ```sql MATCH_RECOGNIZE ``` 
```sql MATCH_RECOGNIZE ``` ```sql ORDER BY rowtime ASC, price DESC ``` ```sql ORDER BY price, rowtime ``` ```sql ORDER BY rowtime DESC, price ASC ``` ```sql MATCH_RECOGNIZE ``` ```sql SELECT * FROM Ticker MATCH_RECOGNIZE ( PARTITION BY symbol ORDER BY rowtime MEASURES FIRST(A.rowtime) AS start_tstamp, LAST(A.rowtime) AS end_tstamp, AVG(A.price) AS avgPrice ONE ROW PER MATCH AFTER MATCH SKIP PAST LAST ROW PATTERN (A+ B) DEFINE A AS AVG(A.price) < 15 ) MR; ``` ```sql symbol rowtime price tax ====== ==================== ======= ======= 'ACME' '01-Apr-11 10:00:00' 12 1 'ACME' '01-Apr-11 10:00:01' 17 2 'ACME' '01-Apr-11 10:00:02' 13 1 'ACME' '01-Apr-11 10:00:03' 16 3 'ACME' '01-Apr-11 10:00:04' 25 2 'ACME' '01-Apr-11 10:00:05' 2 1 'ACME' '01-Apr-11 10:00:06' 4 1 'ACME' '01-Apr-11 10:00:07' 10 2 'ACME' '01-Apr-11 10:00:08' 15 2 'ACME' '01-Apr-11 10:00:09' 25 2 'ACME' '01-Apr-11 10:00:10' 25 1 'ACME' '01-Apr-11 10:00:11' 30 1 ``` ```sql 01-Apr-11 10:00:04 ``` ```sql 01-Apr-11 10:00:11 ``` ```sql symbol start_tstamp end_tstamp avgPrice ========= ================== ================== ============ ACME 01-APR-11 10:00:00 01-APR-11 10:00:03 14.5 ACME 01-APR-11 10:00:05 01-APR-11 10:00:10 13.5 ``` ```sql SUM(A.price * A.tax) ``` ```sql AVG(A.price * B.tax) ``` ```sql MATCH_RECOGNIZE ``` ```sql PATTERN (A B+ C* D) ``` ```sql PATTERN (A*) PATTERN (A? 
B*) PATTERN (A{0,} B{0,} C*) ``` ```sql SELECT * FROM Ticker MATCH_RECOGNIZE( PARTITION BY symbol ORDER BY rowtime MEASURES C.price AS lastPrice ONE ROW PER MATCH AFTER MATCH SKIP PAST LAST ROW PATTERN (A B* C) DEFINE A AS A.price > 10, B AS B.price < 15, C AS C.price > 12 ) ``` ```sql symbol tax price rowtime ======= ===== ======== ===================== XYZ 1 10 2018-09-17 10:00:02 XYZ 2 11 2018-09-17 10:00:03 XYZ 1 12 2018-09-17 10:00:04 XYZ 2 13 2018-09-17 10:00:05 XYZ 1 14 2018-09-17 10:00:06 XYZ 2 16 2018-09-17 10:00:07 ``` ```sql symbol lastPrice ======== =========== XYZ 16 ``` ```sql symbol lastPrice ======== =========== XYZ 13 XYZ 16 ``` ```sql PATTERN (A B* C) DEFINE A AS condA(), B AS condB(), C AS NOT condB() ``` ```sql MATCH_RECOGNIZE ``` ```sql SELECT * FROM Ticker MATCH_RECOGNIZE( PARTITION BY symbol ORDER BY rowtime MEASURES C.rowtime AS dropTime, A.price - C.price AS dropDiff ONE ROW PER MATCH AFTER MATCH SKIP PAST LAST ROW PATTERN (A B* C) WITHIN INTERVAL '1' HOUR DEFINE B AS B.price > A.price - 10, C AS C.price < A.price - 10 ) ``` ```sql symbol rowtime price tax ====== ==================== ======= ======= 'ACME' '01-Apr-11 10:00:00' 20 1 'ACME' '01-Apr-11 10:20:00' 17 2 'ACME' '01-Apr-11 10:40:00' 18 1 'ACME' '01-Apr-11 11:00:00' 11 3 'ACME' '01-Apr-11 11:20:00' 14 2 'ACME' '01-Apr-11 11:40:00' 9 1 'ACME' '01-Apr-11 12:00:00' 15 1 'ACME' '01-Apr-11 12:20:00' 14 2 'ACME' '01-Apr-11 12:40:00' 24 2 'ACME' '01-Apr-11 13:00:00' 1 2 'ACME' '01-Apr-11 13:20:00' 19 1 ``` ```sql symbol dropTime dropDiff ====== ==================== ============= 'ACME' '01-Apr-11 13:00:00' 14 ``` ```sql 01-Apr-11 12:00:00 ``` ```sql 01-Apr-11 13:00:00 ``` ```sql 01-Apr-11 10:00:00 ``` ```sql 01-Apr-11 11:40:00 ``` ```sql ALL ROWS PER MATCH ``` ```sql ONE ROW PER MATCH ``` ```sql ONE ROW PER MATCH ``` ```sql [partitioning columns] + [measures columns] ``` ```sql SELECT * FROM Ticker MATCH_RECOGNIZE( PARTITION BY symbol ORDER BY rowtime MEASURES FIRST(A.price) AS startPrice, 
LAST(A.price) AS topPrice, B.price AS lastPrice ONE ROW PER MATCH PATTERN (A+ B) DEFINE A AS LAST(A.price, 1) IS NULL OR A.price > LAST(A.price, 1), B AS B.price < LAST(A.price) ) ``` ```sql symbol tax price rowtime ======== ===== ======== ===================== XYZ 1 10 2018-09-17 10:00:02 XYZ 2 12 2018-09-17 10:00:03 XYZ 1 13 2018-09-17 10:00:04 XYZ 2 11 2018-09-17 10:00:05 ``` ```sql symbol startPrice topPrice lastPrice ======== ============ ========== =========== XYZ 10 13 11 ``` ```sql A.price > 10 ``` ```sql PATTERN (A B+) DEFINE A AS A.price >= 10, B AS B.price > A.price AND SUM(price) < 100 AND SUM(B.price) < 80 ``` ```sql SUM(B.price) ``` ```sql == ===== ========== ========= ============== ================== ======= ======= ========== ============ # price Classifier [A.price] [B.price] [price] A.price B.price SUM(price) SUM(B.price) == ===== ========== ========= ============== ================== ======= ======= ========== ============ #1 10 -> A #1 - - 10 - - - #2 15 -> B #1 #2 #1, #2 10 15 25 15 #3 20 -> B #1 #2, #3 #1, #2, #3 10 20 45 35 #4 31 -> B #1 #2, #3, #4 #1, #2, #3, #4 10 31 76 66 #5 35 #1 #2, #3, #4, #5 #1, #2, #3, #4, #5 10 35 111 101 == ===== ========== ========= ============== ================== ======= ======= ========== ============ ``` ```sql LAST(variable.field, n) ``` ```sql FIRST(variable.field, n) ``` ```sql PATTERN (A B+) DEFINE A AS A.price >= 10, B AS (LAST(B.price, 1) IS NULL OR B.price > LAST(B.price, 1)) AND (LAST(B.price, 2) IS NULL OR B.price > 2 * LAST(B.price, 2)) ``` ```sql LAST(B.price, 1) ``` ```sql LAST(B.price, 2) ``` ```sql ===== ========== ================ ================ ======================================================================================== price Classifier LAST(B.price, 1) LAST(B.price, 2) Comment ===== ========== ================ ================ ======================================================================================== 10 -> A 15 -> B null null Notice that ``LAST(B.price, 1)`` is 
null because there is still nothing mapped to ``B``. 20 -> B 15 null 31 -> B 20 15 35 31 20 Not mapped because ``35 < 2 * 20``. ===== ========== ================ ================ ======================================================================================== ``` ```sql PATTERN (A B? C) DEFINE B AS B.price < 20, C AS LAST(price, 1) < C.price ``` ```sql ===== ========== ============== ===================================================================================== price Classifier LAST(price, 1) Comment ===== ========== ============== ===================================================================================== 10 -> A 15 -> B 20 -> C 15 ``LAST(price, 1)`` is evaluated as the price of the row mapped to the ``B`` variable. ===== ========== ============== ===================================================================================== ``` ```sql ===== ========== ============== ===================================================================================== price Classifier LAST(price, 1) Comment ===== ========== ============== ===================================================================================== 10 -> A 20 -> C 10 ``LAST(price, 1)`` is evaluated as the price of the row mapped to the ``A`` variable. 
===== ========== ============== ===================================================================================== ``` ```sql LAST(A.price * A.tax) ``` ```sql LAST(A.price * B.tax) ``` ```sql AFTER MATCH SKIP ``` ```sql SKIP PAST LAST ROW ``` ```sql SKIP TO NEXT ROW ``` ```sql SKIP TO LAST variable ``` ```sql SKIP TO FIRST variable ``` ```sql SKIP PAST LAST ROW ``` ```sql symbol tax price rowtime ======== ===== ======= ===================== XYZ 1 7 2018-09-17 10:00:01 XYZ 2 9 2018-09-17 10:00:02 XYZ 1 10 2018-09-17 10:00:03 XYZ 2 5 2018-09-17 10:00:04 XYZ 2 10 2018-09-17 10:00:05 XYZ 2 7 2018-09-17 10:00:06 XYZ 2 14 2018-09-17 10:00:07 ``` ```sql SELECT * FROM Ticker MATCH_RECOGNIZE( PARTITION BY symbol ORDER BY rowtime MEASURES SUM(A.price) AS sumPrice, FIRST(rowtime) AS startTime, LAST(rowtime) AS endTime ONE ROW PER MATCH [AFTER MATCH STRATEGY] PATTERN (A+ C) DEFINE A AS SUM(A.price) < 30 ) ``` ```sql AFTER MATCH ``` ```sql AFTER MATCH SKIP PAST LAST ROW ``` ```sql symbol sumPrice startTime endTime ======== ========== ===================== ===================== XYZ 26 2018-09-17 10:00:01 2018-09-17 10:00:04 XYZ 17 2018-09-17 10:00:05 2018-09-17 10:00:07 ``` ```sql AFTER MATCH SKIP TO NEXT ROW ``` ```sql symbol sumPrice startTime endTime ======== ========== ===================== ===================== XYZ 26 2018-09-17 10:00:01 2018-09-17 10:00:04 XYZ 24 2018-09-17 10:00:02 2018-09-17 10:00:05 XYZ 25 2018-09-17 10:00:03 2018-09-17 10:00:06 XYZ 22 2018-09-17 10:00:04 2018-09-17 10:00:07 XYZ 17 2018-09-17 10:00:05 2018-09-17 10:00:07 ``` ```sql AFTER MATCH SKIP TO LAST A ``` ```sql symbol sumPrice startTime endTime ======== ========== ===================== ===================== XYZ 26 2018-09-17 10:00:01 2018-09-17 10:00:04 XYZ 25 2018-09-17 10:00:03 2018-09-17 10:00:06 XYZ 17 2018-09-17 10:00:05 2018-09-17 10:00:07 ``` ```sql AFTER MATCH SKIP TO FIRST A ``` ```sql SKIP TO FIRST/LAST variable ``` ```sql MATCH_RECOGNIZE ``` ```sql MATCH_ROWTIME([rowtime_field]) 
``` ```sql MATCH_RECOGNIZE ``` ```sql PATTERN (A B+ C) DEFINE A as A.price > 10, C as C.price > 20 ``` ```sql PATTERN (A B+ C) DEFINE A as A.price > 10, B as B.price <= 20, C as C.price > 20 ``` ```sql PATTERN (A B+? C) DEFINE A as A.price > 10, C as C.price > 20 ``` ```sql MATCH_RECOGNIZE ``` ```sql MATCH_RECOGNIZE ``` ```sql PATTERN((A B | C D) E) ``` ```sql PATTERN (PERMUTE (A, B, C)) ``` ```sql PATTERN (A B C | A C B | B A C | B C A | C A B | C B A) ``` ```sql PATTERN ({- A -} B) ``` ```sql ALL ROWS PER MATCH ``` ```sql PATTERN A?? ``` ```sql ALL ROWS PER MATCH ``` ```sql MATCH_RECOGNIZE ``` --- ### SQL ORDER BY Clause in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/queries/orderby.html ORDER BY Clause in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables sorting rows from a SELECT statement. Description¶ The ORDER BY clause causes the result rows to be sorted according to the specified expression(s). If two rows are equal according to the leftmost expression, they are compared according to the next expression, and so on. If they are equal according to all specified expressions, they are returned in an implementation-dependent order. When running in streaming mode, the primary sort order of a table must be ascending on a time attribute. All subsequent sort orders can be freely chosen. When running in batch mode, there is no sort-order limitation. Example¶ SELECT * FROM orders ORDER BY order_time, order_id Related content¶ Flink SQL Queries Flink SQL Functions Statements Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. 
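The streaming-mode rule described above (ascending time attribute first, any ordering afterwards) can be illustrated with a small variation of the example. This sketch assumes that order_time is a time attribute of the orders table. The descending secondary sort is a choice made here for illustration only.

```sql
SELECT *
FROM orders
-- The primary sort key must be ascending on a time attribute in streaming mode.
-- The secondary key can use any direction.
ORDER BY order_time ASC, order_id DESC;
```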
#### Code Examples

```sql
SELECT *
FROM orders
ORDER BY order_time, order_id
```

---

### SQL OVER Aggregation Queries in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/reference/queries/over-aggregation.html

Confluent Cloud for Apache Flink® enables computing an aggregated value for every row over a range of ordered rows.

#### Syntax

`SELECT agg_func(agg_col) OVER ( [PARTITION BY column1[, column2, ...]] ORDER BY time_column range_definition), ... FROM ...`

#### Description

OVER aggregates compute an aggregated value for every input row over a range of ordered rows. In contrast to a GROUP BY aggregate, an OVER aggregate doesn’t reduce the number of result rows to a single row for every group. Instead, an OVER aggregate produces an aggregated value for every input row.

You can define multiple OVER window aggregates in a SELECT clause. However, for streaming queries, the OVER windows for all aggregates must be identical due to a current limitation.

##### ORDER BY

OVER windows are defined on an ordered sequence of rows. Since tables do not have an inherent order, the ORDER BY clause is mandatory. For streaming queries, Flink currently supports only OVER windows that are defined with an ascending time attribute order. Additional orderings are not supported.

##### PARTITION BY

OVER windows can be defined on a partitioned table. In the presence of a PARTITION BY clause, the aggregate is computed for each input row only over the rows of its partition.

##### Range Definitions

The range definition specifies how many rows are included in the aggregate. The range is defined with a BETWEEN clause that defines a lower and an upper boundary. All rows between these boundaries are included in the aggregate. Flink supports only CURRENT ROW as the upper boundary.

There are two options to define the range: ROWS intervals and RANGE intervals.

###### RANGE intervals

A RANGE interval is defined on the values of the ORDER BY column, which in Flink is always a time attribute. The following RANGE interval includes in the aggregate all rows whose time attribute is at most 30 minutes earlier than that of the current row.

`RANGE BETWEEN INTERVAL '30' MINUTE PRECEDING AND CURRENT ROW`

###### ROWS intervals

A ROWS interval is a count-based interval. It defines exactly how many rows are included in the aggregate. The following ROWS interval includes the 10 rows preceding the current row and the current row itself (11 rows in total) in the aggregate.

`ROWS BETWEEN 10 PRECEDING AND CURRENT ROW`

The WINDOW clause can be used to define an OVER window outside of the SELECT clause. It can make queries more readable and lets you reuse the window definition for multiple aggregates.

`SELECT order_id, order_time, amount, SUM(amount) OVER w AS sum_amount, AVG(amount) OVER w AS avg_amount FROM orders WINDOW w AS ( PARTITION BY product ORDER BY order_time RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW)`

#### Example

The following query computes, for every order, the sum of amounts of all orders for the same product that were received within one hour before the current order.

`SELECT order_id, order_time, amount, SUM(amount) OVER ( PARTITION BY product ORDER BY order_time RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW ) AS one_hour_prod_amount_sum FROM orders`

#### Related content

- Confluent Developer: OVER aggregations
- Time Attributes
- Flink SQL Queries
- Flink SQL Functions
- Statements

Note: This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.

#### Code Examples

```sql
SELECT agg_func(agg_col) OVER (
  [PARTITION BY column1[, column2, ...]]
  ORDER BY time_column
  range_definition),
  ...
FROM ...
``` ```sql PARTITION BY ``` ```sql CURRENT ROW ``` ```sql RANGE BETWEEN INTERVAL '30' MINUTE PRECEDING AND CURRENT ROW ``` ```sql ROWS BETWEEN 10 PRECEDING AND CURRENT ROW ``` ```sql SELECT order_id, order_time, amount, SUM(amount) OVER w AS sum_amount, AVG(amount) OVER w AS avg_amount FROM orders WINDOW w AS ( PARTITION BY product ORDER BY order_time RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW) ``` ```sql SELECT order_id, order_time, amount, SUM(amount) OVER ( PARTITION BY product ORDER BY order_time RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW ) AS one_hour_prod_amount_sum FROM orders ``` --- ### SQL Queries in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/queries/overview.html Flink SQL Queries in Confluent Cloud for Apache Flink¶ In Confluent Cloud for Apache Flink®, Data Manipulation Language (DML) statements, also known as queries, are declarative verbs that read and modify data in Apache Flink® tables. Unlike Data Definition Language (DDL) statements, DML statements modify only data and don’t change metadata. When you want to change metadata, use DDL statements. These are the available DML statements in Confluent Cloud for Flink SQL. 
- Deduplication Queries in Confluent Cloud for Apache Flink
- Group Aggregation Queries in Confluent Cloud for Apache Flink
- INSERT INTO FROM SELECT Statement in Confluent Cloud for Apache Flink
- INSERT VALUES Statement in Confluent Cloud for Apache Flink
- Interval joins
- LIMIT Clause in Confluent Cloud for Apache Flink
- EXECUTE STATEMENT SET in Confluent Cloud for Apache Flink
- ORDER BY Clause in Confluent Cloud for Apache Flink
- Pattern Recognition Queries in Confluent Cloud for Apache Flink
- Regular joins
- SELECT Statement in Confluent Cloud for Apache Flink
- Set Logic in Confluent Cloud for Apache Flink
- Temporal joins
- Top-N Queries in Confluent Cloud for Apache Flink
- Window Aggregation Queries in Confluent Cloud for Apache Flink
- Window Deduplication Queries in Confluent Cloud for Apache Flink
- Window Join Queries in Confluent Cloud for Apache Flink
- Window Top-N Queries in Confluent Cloud for Apache Flink
- Windowing Table-Valued Functions (Windowing TVFs) in Confluent Cloud for Apache Flink
- WITH Clause in Confluent Cloud for Apache Flink

#### Prerequisites

You need the following prerequisites to use Confluent Cloud for Apache Flink.

- Access to Confluent Cloud.
- The organization ID, environment ID, and compute pool ID for your organization.
- The OrganizationAdmin, EnvironmentAdmin, or FlinkAdmin role for creating compute pools, or the FlinkDeveloper role if you already have a compute pool. If you don’t have the appropriate role, reach out to your OrganizationAdmin or EnvironmentAdmin.
- The Confluent CLI. To use the Flink SQL shell, update to the latest version of the Confluent CLI by running the following command: `confluent update --yes`. If you used Homebrew to install the Confluent CLI, update the CLI by using the `brew upgrade` command instead of `confluent update`. For more information, see Confluent CLI.

#### Use a workspace or the Flink SQL shell

You can run queries and statements either in a Confluent Cloud Console workspace or in the Flink SQL shell.
To run queries in the Confluent Cloud Console, follow these steps. Log in to the Confluent Cloud Console. Navigate to the Environments page. Click the tile that has the environment where your Flink compute pools are provisioned. Click Flink. The Compute Pools list opens. In the compute pool where you want to run statements, click Open SQL workspace. The workspace opens with a cell for editing SQL statements. To run queries in the Flink SQL shell, run the following command: confluent flink shell --compute-pool --environment You’re ready to run your first Flink SQL query. Hello SQL¶ Run the following simple query to print “Hello SQL”. SELECT 'Hello SQL'; Your output should resemble: EXPR$0 Hello SQL Run the following query to aggregate values in a table. SELECT Name, COUNT(*) AS Num FROM (VALUES ('Neo'), ('Trinity'), ('Morpheus'), ('Trinity')) AS NameTable(Name) GROUP BY Name; Your output should resemble: Name Num Neo 1 Morpheus 1 Trinity 2 Functions¶ Flink supports many built-in functions that help you build sophisticated SQL queries. Run the SHOW FUNCTIONS statement to see the full list of built-in functions. SHOW FUNCTIONS; Your output should resemble: +------------------------+ | function name | +------------------------+ | % | | * | | + | | - | | / | | < | | <= | | <> | | = | | > | | >= | | ABS | | ACOS | | AND | | ARRAY | | ARRAY_CONTAINS | | ... Run the following statement to execute the built-in CURRENT_TIMESTAMP function, which returns the local machine’s current system time. SELECT CURRENT_TIMESTAMP; Your output should resemble: CURRENT_TIMESTAMP 2024-01-17 13:07:43.537 Run the following statement to compute the cosine of 0. SELECT COS(0) AS cosine; Your output should resemble: cosine 1.0 Source Tables¶ As with all SQL engines, Flink SQL queries operate on rows in tables. But unlike traditional databases, Flink doesn’t manage data-at-rest in a local store. Instead, Flink SQL queries operate continuously over external tables. 
Flink data processing pipelines begin with source tables. Source tables produce rows operated over during the query’s execution; they are the tables referenced in the FROM clause of a query. Tables are created automatically in Confluent Cloud from all the Apache Kafka® topics. Also, you can create tables by using the SQL shell. The Flink SQL shell supports SQL DDL commands similar to traditional SQL. Standard SQL DDL is used to create and alter tables. The following statement creates an employee_information table. CREATE TABLE employee_information( emp_id INT, name VARCHAR, dept_id INT); Confluent Cloud creates the corresponding employee_information topic automatically. Continuous Queries¶ You can define a continuous foreground query from the employee_information table that reads new rows as they are made available and immediately outputs their results. For example, you can filter for the employees who work in department 1. SELECT * from employee_information WHERE dept_id = 1; Although SQL wasn’t designed initially with streaming semantics in mind, it’s a powerful tool for building continuous data pipelines. A Flink query differs from a traditional database query by consuming rows continuously as they arrive and producing updates to the query results. A continuous query never terminates and produces a dynamic table as a result. Dynamic tables are the core concept of Flink’s SQL support for streaming data. Aggregations on continuous streams must store aggregated results continuously during the execution of the query. For example, suppose you need to count the number of employees for each department from an incoming data stream. To output timely results as new rows are processed, the query must maintain the most up-to-date count for each department. SELECT dept_id, COUNT(*) as emp_count FROM employee_information GROUP BY dept_id; Such queries are considered stateful. 
Flink’s advanced fault-tolerance mechanism maintains internal state and consistency, so queries always return the correct result, even in the face of hardware failure.

#### Sink Tables

When you run the previous query, Flink SQL provides output in real time but in a read-only fashion. Storing results, for example to power a report or dashboard, requires writing them out to another table. You can achieve this by using an INSERT INTO statement. The table referenced in this clause is known as a sink table. An INSERT INTO statement is submitted as a detached query to Flink.

`INSERT INTO department_counts SELECT dept_id, COUNT(*) as emp_count FROM employee_information GROUP BY dept_id;`

Once submitted, this query runs and stores the results directly in the sink table, instead of loading the results into system memory.

#### Syntax

Flink parses SQL using Apache Calcite, which supports standard ANSI SQL. The following BNF grammar describes the superset of supported SQL features.

query: values | WITH withItem [ , withItem ]* query | { select | selectWithoutFrom | query UNION [ ALL ] query | query EXCEPT query | query INTERSECT query } [ ORDER BY orderItem [, orderItem ]* ] [ LIMIT { count | ALL } ] [ OFFSET start { ROW | ROWS } ] [ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY] withItem: name [ '(' column [, column ]* ')' ] AS '(' query ')' orderItem: expression [ ASC | DESC ] select: SELECT [ ALL | DISTINCT ] { * | projectItem [, projectItem ]* } FROM tableExpression [ WHERE booleanExpression ] [ GROUP BY { groupItem [, groupItem ]* } ] [ HAVING booleanExpression ] [ WINDOW windowName AS windowSpec [, windowName AS windowSpec ]* ] selectWithoutFrom: SELECT [ ALL | DISTINCT ] { * | projectItem [, projectItem ]* } projectItem: expression [ [ AS ] columnAlias ] | tableAlias .
* tableExpression: tableReference [, tableReference ]* | tableExpression [ NATURAL ] [ LEFT | RIGHT | FULL ] JOIN tableExpression [ joinCondition ] joinCondition: ON booleanExpression | USING '(' column [, column ]* ')' tableReference: tablePrimary [ matchRecognize ] [ [ AS ] alias [ '(' columnAlias [, columnAlias ]* ')' ] ] tablePrimary: [ TABLE ] tablePath [ dynamicTableOptions ] [systemTimePeriod] [[AS] correlationName] | LATERAL TABLE '(' functionName '(' expression [, expression ]* ')' ')' | [ LATERAL ] '(' query ')' | UNNEST '(' expression ')' tablePath: [ [ catalogName . ] databaseName . ] tableName systemTimePeriod: FOR SYSTEM_TIME AS OF dateTimeExpression dynamicTableOptions: /*+ OPTIONS(key=val [, key=val]*) */ key: stringLiteral val: stringLiteral values: VALUES expression [, expression ]* groupItem: expression | '(' ')' | '(' expression [, expression ]* ')' | CUBE '(' expression [, expression ]* ')' | ROLLUP '(' expression [, expression ]* ')' | GROUPING SETS '(' groupItem [, groupItem ]* ')' windowRef: windowName | windowSpec windowSpec: [ windowName ] '(' [ ORDER BY orderItem [, orderItem ]* ] [ PARTITION BY expression [, expression ]* ] [ RANGE numericOrIntervalExpression {PRECEDING} | ROWS numericExpression {PRECEDING} ] ')' matchRecognize: MATCH_RECOGNIZE '(' [ PARTITION BY expression [, expression ]* ] [ ORDER BY orderItem [, orderItem ]* ] [ MEASURES measureColumn [, measureColumn ]* ] [ ONE ROW PER MATCH ] [ AFTER MATCH ( SKIP TO NEXT ROW | SKIP PAST LAST ROW | SKIP TO FIRST variable | SKIP TO LAST variable | SKIP TO variable ) ] PATTERN '(' pattern ')' [ WITHIN intervalLiteral ] DEFINE variable AS condition [, variable AS condition ]* ')' measureColumn: expression AS alias pattern: patternTerm [ '|' patternTerm ]* patternTerm: patternFactor [ patternFactor ]* patternFactor: variable [ patternQuantifier ] patternQuantifier: '*' | '*?' | '+' | '+?' | '?' | '??' 
| '{' { [ minRepeat ], [ maxRepeat ] } '}' ['?'] | '{' repeat '}' statementSet: EXECUTE STATEMENT SET BEGIN { insertStatement ';' }+ END ';'

Flink uses a lexical policy for identifiers (table, attribute, and function names) that’s similar to Java. The case of identifiers is preserved whether or not they are quoted. After that, identifiers are matched case-sensitively. Unlike Java, backticks enable identifiers to contain non-alphanumeric characters, for example: ``SELECT a AS `my field` FROM t;``

String literals must be enclosed in single quotes, for example, `SELECT 'Hello World'`. Duplicate a single quote for escaping, for example, `SELECT 'It''s me'`.

`SELECT 'Hello World', 'It''s me';`

Your output should resemble:

| EXPR$0 | EXPR$1 |
|---|---|
| Hello World | It's me |

Unicode characters are supported in string literals. If explicit Unicode code points are required, use the following syntax.

Use the backslash (\) as the escaping character (default), for example:

`SELECT U&'\263A';`

Your output should resemble:

| EXPR$0 |
|---|
| ☺ |

Also, you can use a custom escaping character with UESCAPE, for example:

`SELECT U&'#2713' UESCAPE '#';`

Your output should resemble:

| EXPR$0 |
|---|
| ✓ |

#### Related content

- DDL Statements
- Stream Processing Concepts
- Built-in Functions

Note: This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.
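The identifier and literal rules above can be combined in one self-contained statement. The following is a minimal sketch that reuses the inline VALUES table from the Hello SQL example. The column aliases are hypothetical names chosen for illustration.

```sql
SELECT
  -- Backticks let the alias contain a space.
  Name AS `employee name`,
  -- A doubled single quote escapes the apostrophe inside the literal.
  'It''s ' || Name AS greeting,
  -- Unicode code point with the default backslash escape character.
  U&'\2713' AS check_mark
FROM (VALUES ('Neo'), ('Trinity')) AS NameTable(Name);
```

Because identifiers are matched case-sensitively, a later reference to the first column has to use `employee name` exactly as written.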
#### Code Examples

```shell
confluent update --yes
```

```shell
confluent flink shell --compute-pool --environment
```

```sql
SELECT 'Hello SQL';
```

```text
EXPR$0
Hello SQL
```

```sql
SELECT Name, COUNT(*) AS Num
FROM (VALUES ('Neo'), ('Trinity'), ('Morpheus'), ('Trinity')) AS NameTable(Name)
GROUP BY Name;
```

```text
Name      Num
Neo       1
Morpheus  1
Trinity   2
```

```sql
SHOW FUNCTIONS;
```

```text
+------------------------+
| function name          |
+------------------------+
| %                      |
| *                      |
| +                      |
| -                      |
| /                      |
| <                      |
| <=                     |
| <>                     |
| =                      |
| >                      |
| >=                     |
| ABS                    |
| ACOS                   |
| AND                    |
| ARRAY                  |
| ARRAY_CONTAINS         |
| ...
```

```sql
SELECT CURRENT_TIMESTAMP;
```

```text
CURRENT_TIMESTAMP
2024-01-17 13:07:43.537
```

```sql
SELECT COS(0) AS cosine;
```

```sql
CREATE TABLE employee_information(
  emp_id INT,
  name VARCHAR,
  dept_id INT);
```

```sql
SELECT * FROM employee_information WHERE dept_id = 1;
```

```sql
SELECT dept_id, COUNT(*) AS emp_count
FROM employee_information
GROUP BY dept_id;
```

```sql
INSERT INTO department_counts
SELECT dept_id, COUNT(*) AS emp_count
FROM employee_information
GROUP BY dept_id;
```

```sql
query:
    values
  | WITH withItem [ , withItem ]* query
  | {
      select
    | selectWithoutFrom
    | query UNION [ ALL ] query
    | query EXCEPT query
    | query INTERSECT query
    }
    [ ORDER BY orderItem [, orderItem ]* ]
    [ LIMIT { count | ALL } ]
    [ OFFSET start { ROW | ROWS } ]
    [ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY]

withItem:
    name
    [ '(' column [, column ]* ')' ]
    AS '(' query ')'

orderItem:
    expression [ ASC | DESC ]

select:
    SELECT [ ALL | DISTINCT ]
    { * | projectItem [, projectItem ]* }
    FROM tableExpression
    [ WHERE booleanExpression ]
    [ GROUP BY { groupItem [, groupItem ]* } ]
    [ HAVING booleanExpression ]
    [ WINDOW windowName AS windowSpec [, windowName AS windowSpec ]* ]
selectWithoutFrom: SELECT [ ALL | DISTINCT ] { * | projectItem [, projectItem ]* } projectItem: expression [ [ AS ] columnAlias ] | tableAlias . * tableExpression: tableReference [, tableReference ]* | tableExpression [ NATURAL ] [ LEFT | RIGHT | FULL ] JOIN tableExpression [ joinCondition ] joinCondition: ON booleanExpression | USING '(' column [, column ]* ')' tableReference: tablePrimary [ matchRecognize ] [ [ AS ] alias [ '(' columnAlias [, columnAlias ]* ')' ] ] tablePrimary: [ TABLE ] tablePath [ dynamicTableOptions ] [systemTimePeriod] [[AS] correlationName] | LATERAL TABLE '(' functionName '(' expression [, expression ]* ')' ')' | [ LATERAL ] '(' query ')' | UNNEST '(' expression ')' tablePath: [ [ catalogName . ] databaseName . ] tableName systemTimePeriod: FOR SYSTEM_TIME AS OF dateTimeExpression dynamicTableOptions: /*+ OPTIONS(key=val [, key=val]*) */ key: stringLiteral val: stringLiteral values: VALUES expression [, expression ]* groupItem: expression | '(' ')' | '(' expression [, expression ]* ')' | CUBE '(' expression [, expression ]* ')' | ROLLUP '(' expression [, expression ]* ')' | GROUPING SETS '(' groupItem [, groupItem ]* ')' windowRef: windowName | windowSpec windowSpec: [ windowName ] '(' [ ORDER BY orderItem [, orderItem ]* ] [ PARTITION BY expression [, expression ]* ] [ RANGE numericOrIntervalExpression {PRECEDING} | ROWS numericExpression {PRECEDING} ] ')' matchRecognize: MATCH_RECOGNIZE '(' [ PARTITION BY expression [, expression ]* ] [ ORDER BY orderItem [, orderItem ]* ] [ MEASURES measureColumn [, measureColumn ]* ] [ ONE ROW PER MATCH ] [ AFTER MATCH ( SKIP TO NEXT ROW | SKIP PAST LAST ROW | SKIP TO FIRST variable | SKIP TO LAST variable | SKIP TO variable ) ] PATTERN '(' pattern ')' [ WITHIN intervalLiteral ] DEFINE variable AS condition [, variable AS condition ]* ')' measureColumn: expression AS alias pattern: patternTerm [ '|' patternTerm ]* patternTerm: patternFactor [ patternFactor ]* patternFactor: variable [ patternQuantifier 
] patternQuantifier: '*' | '*?' | '+' | '+?' | '?' | '??' | '{' { [ minRepeat ], [ maxRepeat ] } '}' ['?'] | '{' repeat '}' statementSet: EXECUTE STATEMENT SET BEGIN { insertStatement ';' }+ END ';' ``` ```sql SELECT a AS `my field` FROM t; ``` ```sql SELECT 'Hello World' ``` ```sql SELECT 'It''s me' ``` ```sql SELECT 'Hello World', 'It''s me'; ``` ```sql EXPR$0 EXPR$1 Hello World It's me ``` ```sql SELECT U&'\263A' ``` ```sql SELECT U&'\263A'; ``` ```sql SELECT U&'#2713' UESCAPE '#' ``` ```sql SELECT U&'#2713' UESCAPE '#'; ``` --- ### SQL SELECT statement in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/queries/select.html SELECT Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables querying the content of your tables by using familiar SELECT syntax. Syntax¶ SELECT [DISTINCT] select_list FROM table_expression [ WHERE boolean_expression ] [ LIMIT row_limit ] Description¶ The SELECT statement in Flink does what the SQL standard says it must do. You needn’t look further than standard SQL itself to understand the behavior. For example, UNION without ALL means that duplicate rows must be removed. Flink maintains the relation, called a dynamic table, specified by the SQL query. Its behavior is always the same as if you ran the SQL query again, over the current snapshot of the data, each time a new row arrives for any table in the relation. This formalism is what enables you to reason about exactly what Flink will do just by understanding what any SQL system, like MySQL, Snowflake, or Oracle, would do. Another way to understand what Flink SQL does is to consider the following statement: SELECT * FROM clicks ORDER BY clickTime LIMIT 10; This statement doesn’t only look at 10 rows, sort them, and terminate. It maintains this relation, and as new clicks arrive, the relation changes, always showing the current top 10 rows ordered by clickTime. 
This is exactly as if you re-ran the query each time a new row was written to the clicks table. You’ll get the same result. Select list¶ The select_list specification * means the query resolves all columns. But in production, using * is not recommended, because it makes queries less robust to catalog changes. Instead, use a select_list to specify a subset of available columns or make calculations using the columns. For example, if an orders table has columns named order_id, price, and tax you could write the following query: SELECT order_id, price + tax FROM orders Table expression¶ The table_expression can be any source of data, including a table, view, or VALUES clause, the joined results of multiple existing tables, or a subquery. Assuming that an orders table is available in the catalog, the following statement reads all rows from orders. SELECT * FROM orders; VALUES clause¶ Queries can consume from inline data by using the VALUES clause. Each tuple corresponds to one row. You can provide an alias to assign a name to each column. SELECT order_id, price FROM (VALUES (1, 2.0), (2, 3.1)) AS t (order_id, price); Your output should resemble: order_id price 1 2.0 2 3.1 WHERE clause¶ Filter rows by using the WHERE clause. SELECT price + tax FROM orders WHERE id = 10; Functions¶ You can invoke built-in scalar functions on the columns of a single row. SELECT PRETTY_PRINT(order_id) FROM orders; DISTINCT¶ If SELECT DISTINCT is specified, all duplicate rows are removed from the result set, which means that one row is kept from each group of duplicates. For streaming queries, the required state for computing the query result might grow infinitely. State size depends on the number of distinct rows. SELECT DISTINCT id FROM orders; Usage¶ In the Flink SQL shell or in a Cloud Console workspace, run the following commands to see examples of the SELECT statement. Create a table for web page click events. -- Create a table for web page click events. 
CREATE TABLE clicks ( ip_address VARCHAR, url VARCHAR, click_ts_raw BIGINT ); Populate the table with mock clickstream data. -- Populate the table with mock clickstream data. INSERT INTO clicks VALUES( '10.0.0.1', 'https://acme.com/index.html', 1692812175), ( '10.0.0.12', 'https://apache.org/index.html', 1692826575), ( '10.0.0.13', 'https://confluent.io/index.html', 1692826575), ( '10.0.0.1', 'https://acme.com/index.html', 1692812175), ( '10.0.0.12', 'https://apache.org/index.html', 1692819375), ( '10.0.0.13', 'https://confluent.io/index.html', 1692826575); Press ENTER to return to the SQL shell. Because INSERT INTO VALUES is a point-in-time statement, it exits after it completes inserting records. View all rows in the clicks table by using a SELECT statement. SELECT * FROM clicks; Your output should resemble: ip_address url click_ts_raw 10.0.0.1 https://acme.com/index.html 1692812175 10.0.0.12 https://apache.org/index.html 1692826575 10.0.0.13 https://confluent.io/index.html 1692826575 10.0.0.1 https://acme.com/index.html 1692812175 10.0.0.12 https://apache.org/index.html 1692819375 10.0.0.13 https://confluent.io/index.html 1692826575 View only unique rows in the clicks table by using a SELECT DISTINCT statement. SELECT DISTINCT * FROM clicks; Your output should resemble: ip_address url click_ts_raw 10.0.0.1 https://acme.com/index.html 1692812175 10.0.0.12 https://apache.org/index.html 1692826575 10.0.0.13 https://confluent.io/index.html 1692826575 10.0.0.12 https://apache.org/index.html 1692819375 View only records that have the ip_address of 10.0.0.1 by using a SELECT WHERE statement. SELECT * FROM clicks WHERE ip_address='10.0.0.1'; Your output should resemble: ip_address url click_ts_raw 10.0.0.1 https://acme.com/index.html 1692812175 10.0.0.1 https://acme.com/index.html 1692812175 Examples¶ The following examples show frequently encountered scenarios with SELECT. 
Most minimal statement¶ Syntax SELECT 1; Properties Statement is bounded Check local time zone is configured correctly¶ Syntax SELECT NOW(); Properties Statement is bounded NOW() returns a TIMESTAMP_LTZ(3), so if the client is configured correctly, it should show a timestamp in your local time zone. Combine multiple tables into one¶ Syntax CREATE TABLE t_union_1 (i INT); CREATE TABLE t_union_2 (i INT); TABLE t_union_1 UNION ALL TABLE t_union_2; -- alternate syntax SELECT * FROM t_union_1 UNION ALL SELECT * FROM t_union_2; Get insights into the current watermark¶ Syntax CREATE TABLE t_watermarked_insight (s STRING) DISTRIBUTED INTO 1 BUCKETS; INSERT INTO t_watermarked_insight VALUES ('Bob'), ('Alice'), ('Charly'); SELECT $rowtime, CURRENT_WATERMARK($rowtime) FROM t_watermarked_insight; The output resembles: $rowtime EXPR$1 2024-04-29 11:59:01.080 NULL 2024-04-29 11:59:01.093 2024-04-04 15:27:37.433 2024-04-29 11:59:01.094 2024-04-04 15:27:37.433 Properties The CURRENT_WATERMARK function returns the watermark that arrived at the operator evaluating the SELECT statement. The returned watermark is the minimum of all inputs, across all tables/topics and their partitions. If a common watermark was not received from all inputs, the function returns NULL. The CURRENT_WATERMARK function takes a time attribute, which is a column that has WATERMARK FOR defined. A watermark is always emitted after the row has been processed, so the first row always has a NULL watermark. Because the default watermark algorithm requires at least 250 records, initially it assumes the maximum lag of 7 days plus a safety margin of 7 days. The watermark quickly (exponentially) goes down as more data arrives. Sources emit watermarks every 200 ms, but within the first 200 ms they emit per row for powering examples like this. 
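As a hedged illustration of the watermark behavior described above, the following sketch reuses the t_watermarked_insight table and labels each row relative to the current watermark. The NULL branch covers the case noted above, where no watermark has been emitted yet. The column aliases current_wm and lateness are illustrative names, not part of the original example.

```sql
-- Sketch: classify each row against the watermark seen by the operator.
-- Assumes the t_watermarked_insight table from the example above.
SELECT
  s,
  $rowtime,
  CURRENT_WATERMARK($rowtime) AS current_wm,
  CASE
    -- No common watermark has been received yet.
    WHEN CURRENT_WATERMARK($rowtime) IS NULL THEN 'no watermark yet'
    -- The row's timestamp is ahead of the watermark.
    WHEN $rowtime > CURRENT_WATERMARK($rowtime) THEN 'on time'
    -- The row's timestamp is at or behind the watermark.
    ELSE 'late'
  END AS lateness
FROM t_watermarked_insight;
```

With only a few rows in the table, most rows are expected to be reported as on time, because the initial watermark lags far behind the row timestamps.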
Flatten fields into columns¶ Syntax CREATE TABLE t_flattening (i INT, r1 ROW, r2 ROW); SELECT r1.*, r2.* FROM t_flattening; Properties You can apply the * operator on nested data, which enables flattening fields into columns of the table. Related content¶ Flink SQL Queries Flink SQL Functions Statements Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql SELECT [DISTINCT] select_list FROM table_expression [ WHERE boolean_expression ] [ LIMIT row_limit ] ``` ```sql SELECT * FROM clicks ORDER BY clickTime LIMIT 10; ``` ```sql select_list ``` ```sql select_list ``` ```sql SELECT order_id, price + tax FROM orders ``` ```sql table_expression ``` ```sql SELECT * FROM orders; ``` ```sql SELECT order_id, price FROM (VALUES (1, 2.0), (2, 3.1)) AS t (order_id, price); ``` ```sql order_id price 1 2.0 2 3.1 ``` ```sql SELECT price + tax FROM orders WHERE id = 10; ``` ```sql SELECT PRETTY_PRINT(order_id) FROM orders; ``` ```sql SELECT DISTINCT ``` ```sql SELECT DISTINCT id FROM orders; ``` ```sql -- Create a table for web page click events. CREATE TABLE clicks ( ip_address VARCHAR, url VARCHAR, click_ts_raw BIGINT ); ``` ```sql -- Populate the table with mock clickstream data. 
INSERT INTO clicks VALUES( '10.0.0.1', 'https://acme.com/index.html', 1692812175), ( '10.0.0.12', 'https://apache.org/index.html', 1692826575), ( '10.0.0.13', 'https://confluent.io/index.html', 1692826575), ( '10.0.0.1', 'https://acme.com/index.html', 1692812175), ( '10.0.0.12', 'https://apache.org/index.html', 1692819375), ( '10.0.0.13', 'https://confluent.io/index.html', 1692826575); ``` ```sql SELECT * FROM clicks; ``` ```sql ip_address url click_ts_raw 10.0.0.1 https://acme.com/index.html 1692812175 10.0.0.12 https://apache.org/index.html 1692826575 10.0.0.13 https://confluent.io/index.html 1692826575 10.0.0.1 https://acme.com/index.html 1692812175 10.0.0.12 https://apache.org/index.html 1692819375 10.0.0.13 https://confluent.io/index.html 1692826575 ``` ```sql SELECT DISTINCT * FROM clicks; ``` ```sql ip_address url click_ts_raw 10.0.0.1 https://acme.com/index.html 1692812175 10.0.0.12 https://apache.org/index.html 1692826575 10.0.0.13 https://confluent.io/index.html 1692826575 10.0.0.12 https://apache.org/index.html 1692819375 ``` ```sql SELECT * FROM clicks WHERE ip_address='10.0.0.1'; ``` ```sql ip_address url click_ts_raw 10.0.0.1 https://acme.com/index.html 1692812175 10.0.0.1 https://acme.com/index.html 1692812175 ``` ```sql SELECT NOW(); ``` ```sql CREATE TABLE t_union_1 (i INT); CREATE TABLE t_union_2 (i INT); TABLE t_union_1 UNION ALL TABLE t_union_2; -- alternate syntax SELECT * FROM t_union_1 UNION ALL SELECT * FROM t_union_2; ``` ```sql CREATE TABLE t_watermarked_insight (s STRING) DISTRIBUTED INTO 1 BUCKETS; INSERT INTO t_watermarked_insight VALUES ('Bob'), ('Alice'), ('Charly'); SELECT $rowtime, CURRENT_WATERMARK($rowtime) FROM t_watermarked_insight; ``` ```sql $rowtime EXPR$1 2024-04-29 11:59:01.080 NULL 2024-04-29 11:59:01.093 2024-04-04 15:27:37.433 2024-04-29 11:59:01.094 2024-04-04 15:27:37.433 ``` ```sql CREATE TABLE t_flattening (i INT, r1 ROW, r2 ROW); SELECT r1.*, r2.* FROM t_flattening; ``` --- ### SQL Set Logic in Confluent Cloud for 
Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/queries/set-logic.html Set Logic in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables set logic operations on tables in SQL statements. EXCEPT EXISTS IN INTERSECT UNION Example data¶ The following examples use these tables to show how the different logical operators work. -- Create tables for the set logic operations. CREATE TABLE t1(chr CHAR); INSERT INTO t1 VALUES('c'), ('a'), ('b'), ('b'), ('c'); CREATE TABLE t2(chr CHAR); INSERT INTO t2 VALUES('d'), ('e'), ('a'), ('b'), ('b'); EXCEPT¶ EXCEPT and EXCEPT ALL return the rows that are found in one table but not the other. EXCEPT returns only distinct rows. EXCEPT ALL doesn’t remove duplicates from the result rows. The following code example shows output from the EXCEPT operator on tables t1 and t2. (SELECT chr FROM t1) EXCEPT (SELECT chr FROM t2); Your output should resemble: chr c The following code example shows output from the EXCEPT ALL operator on tables t1 and t2. (SELECT chr FROM t1) EXCEPT ALL (SELECT chr FROM t2); Your output should resemble: +----+ | chr| +----+ | c| | c| +----+ EXISTS¶ SELECT user, amount FROM orders WHERE EXISTS ( SELECT product FROM NewProducts ) Returns TRUE if the sub-query returns at least one row. Only supported if the operation can be rewritten in a join and group operation. The optimizer rewrites the EXISTS operation into a join and group operation. For streaming queries, the required state for computing the query result might grow infinitely depending on the number of distinct input rows. IN¶ Returns TRUE if an expression exists in a table sub-query. The sub-query table must consist of one column. This column must have the same data type as the expression. SELECT user, amount FROM orders WHERE product IN ( SELECT product FROM NewProducts ) The optimizer rewrites the IN condition into a join and group operation. 
For streaming queries, the required state for computing the query result might grow infinitely depending on the number of distinct input rows. INTERSECT¶ INTERSECT and INTERSECT ALL return the rows that are found in both tables. INTERSECT returns only distinct rows. INTERSECT ALL doesn’t remove duplicates from the result rows. The following code example shows output from the INTERSECT operator on tables t1 and t2. (SELECT chr FROM t1) INTERSECT (SELECT chr FROM t2); Your output should resemble: chr a b The following code example shows output from the INTERSECT ALL operator on tables t1 and t2. (SELECT chr FROM t1) INTERSECT ALL (SELECT chr FROM t2); Your output should resemble: chr a b b UNION¶ UNION and UNION ALL return the rows that are found in either table. UNION returns only distinct rows. UNION ALL doesn’t remove duplicates from the result rows. The following code example shows output from the UNION operator on tables t1 and t2. (SELECT chr FROM t1) UNION (SELECT chr FROM t2); Your output should resemble: chr c a b d e The following code example shows output from the UNION ALL operator on tables t1 and t2. (SELECT chr FROM t1) UNION ALL (SELECT chr FROM t2); Your output should resemble: chr c a b b c d e a b b Related content¶ Flink SQL Queries Flink SQL Functions Statements Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql -- Create tables for the set logic operations. 
CREATE TABLE t1(chr CHAR); INSERT INTO t1 VALUES('c'), ('a'), ('b'), ('b'), ('c'); CREATE TABLE t2(chr CHAR); INSERT INTO t2 VALUES('d'), ('e'), ('a'), ('b'), ('b'); ``` ```sql (SELECT chr FROM t1) EXCEPT (SELECT chr FROM t2); ``` ```sql (SELECT chr FROM t1) EXCEPT ALL (SELECT chr FROM t2); ``` ```sql +----+ | chr| +----+ | c| | c| +----+ ``` ```sql SELECT user, amount FROM orders WHERE EXISTS ( SELECT product FROM NewProducts ) ``` ```sql SELECT user, amount FROM orders WHERE product IN ( SELECT product FROM NewProducts ) ``` ```sql INTERSECT ALL ``` ```sql INTERSECT ALL ``` ```sql (SELECT chr FROM t1) INTERSECT (SELECT chr FROM t2); ``` ```sql INTERSECT ALL ``` ```sql (SELECT chr FROM t1) INTERSECT ALL (SELECT chr FROM t2); ``` ```sql (SELECT chr FROM t1) UNION (SELECT chr FROM t2); ``` ```sql chr c a b d e ``` ```sql (SELECT chr FROM t1) UNION ALL (SELECT chr FROM t2); ``` ```sql chr c a b b c d e a b b ``` --- ### SQL Statement Sets in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/queries/statement-set.html EXECUTE STATEMENT SET in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables executing multiple SQL statements as a single, optimized statement by using statement sets. Syntax¶ EXECUTE STATEMENT SET BEGIN -- one or more INSERT INTO statements { INSERT INTO ; }+ END; Description¶ Statement sets are a feature of Confluent Cloud for Apache Flink® that enables executing a set of SQL statements as a single, optimized statement. This is useful when you have multiple SQL statements that share common intermediate results, as it enables you to reuse those results and avoid unnecessary computation. To use statement sets, you enclose one or more SQL statements in a block and execute them as a single unit. All statements in the block are optimized and executed together as a single Flink statement. 
Statement sets are particularly useful when you have multiple INSERT INTO statements that read from the same table or share intermediate results. By executing these statements together as a single statement, you can avoid redundant computation and improve performance. Example¶ The following query results in a single statement being executed which reads from an orders table. If the status is completed, the product and quantity values are written to the sales table. If the status is returned, the product and quantity values are written to the returns table. EXECUTE STATEMENT SET BEGIN INSERT INTO `sales` (product, quantity) SELECT product, quantity FROM orders WHERE status = 'completed'; INSERT INTO `returns` (product, quantity) SELECT product, quantity FROM orders WHERE status = 'returned'; END; Related content¶ INSERT INTO FROM SELECT INSERT VALUES Flink SQL Queries Flink SQL Functions Statements Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql EXECUTE STATEMENT SET BEGIN -- one or more INSERT INTO statements { INSERT INTO ; }+ END; ``` ```sql EXECUTE STATEMENT SET BEGIN INSERT INTO `sales` (product, quantity) SELECT product, quantity FROM orders WHERE status = 'completed'; INSERT INTO `returns` (product, quantity) SELECT product, quantity FROM orders WHERE status = 'returned'; END; ``` --- ### SQL Top-N queries in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/queries/topn.html Top-N Queries in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables finding the smallest or largest values, ordered by columns, in a table. 
Syntax¶ SELECT [column_list] FROM ( SELECT [column_list], ROW_NUMBER() OVER ([PARTITION BY column1[, column2...]] ORDER BY column1 [asc|desc][, column2 [asc|desc]...]) AS rownum FROM table_name) WHERE rownum <= N [AND conditions] Parameter Specification Note This query pattern must be followed exactly, otherwise, the optimizer can’t translate the query. ROW_NUMBER(): Assigns a unique, sequential number to each row, starting with one, according to the ordering of rows within the partition. Currently, Flink supports only ROW_NUMBER as the over window function. In the future, Flink may support RANK() and DENSE_RANK(). PARTITION BY column1[, column2...]: Specifies the partition columns. Each partition has a Top-N result. ORDER BY column1 [asc|desc][, column2 [asc|desc]...]: Specifies the ordering columns. The ordering directions can be different on different columns. WHERE rownum <= N: The rownum <= N condition is required for Flink to recognize that this query is a Top-N query. The N represents the number of smallest or largest records to retain. [AND conditions]: You can add other conditions in the WHERE clause, but the other conditions can only be combined with rownum <= N using the AND conjunction. Description¶ Find the smallest or largest values, ordered by columns, in a table. Top-N queries return the N smallest or largest values in a table, ordered by columns. Both smallest-value and largest-value sets are considered Top-N queries. Top-N queries are useful when you need to display only the N bottom-most or the N top-most records from a batch or streaming table, based on a condition. This result set can be used for further analysis. Flink uses the combination of an OVER window clause and a filter condition to express a Top-N query. With the OVER window PARTITION BY clause, Flink also supports per-group Top-N. For example, the top five products per category that have the maximum sales in realtime. Top-N queries are supported for SQL on batch and streaming tables. 
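Because the smallest values also count as a Top-N result, a bottom-N query follows the same pattern with an ascending sort. The following is a minimal sketch, assuming the `examples`.`marketplace`.`orders` example table with order_id, customer_id, and price columns. The choice of N = 3 is arbitrary.

```sql
-- Sketch: the three lowest-priced orders per customer (a "bottom-N" query).
-- Follows the required Top-N pattern: ROW_NUMBER() in a subquery,
-- filtered by rownum <= N in the outer query.
SELECT order_id, customer_id, price
FROM (
  SELECT
    order_id,
    customer_id,
    price,
    ROW_NUMBER() OVER (
      PARTITION BY customer_id   -- one result set per customer
      ORDER BY price ASC         -- ascending order keeps the smallest values
    ) AS rownum
  FROM `examples`.`marketplace`.`orders`
)
WHERE rownum <= 3;
```

Switching ASC to DESC turns the same statement into a query for the three highest-priced orders per customer.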
The Top-N query is Result Updating, which means that Flink sorts the input stream according to the order key. If the top N rows have changed, the changed rows are sent downstream as retraction/update records. Examples¶ The following examples show how to specify Top-N queries on streaming tables. The unique key of a Top-N query is the combination of partition columns and the rownum column. A Top-N query can also derive the unique key of the upstream table. The following example shows how to get “the top five products per category that have the maximum sales in realtime”. If product_id is the unique key of the ShopSales table, the unique keys of the Top-N query are [category, rownum] and [product_id]. CREATE TABLE ShopSales ( product_id STRING, category STRING, product_name STRING, sales BIGINT ) WITH (...); SELECT * FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC) AS row_num FROM ShopSales) WHERE row_num <= 5 No ranking output optimization¶ As described in the previous example, the rownum field is written into the result table as one field of the unique key, which may cause many records to be written to the result table. For example, when a record such as product-1001, with a ranking of 9, is updated and its rank rises to 1, all the records ranked 1 through 9 are output to the result table as update messages. If the result table receives too many rows, it may slow the SQL job execution. To optimize the query, omit the rownum field in the outer SELECT clause of the Top-N query. This approach is reasonable, because the number of Top-N rows usually isn’t large, so consumers can sort the rows themselves quickly. Without the rownum field, only the changed record (product-1001) must be sent downstream, which can greatly reduce the IO to the result table. 
The following example shows how to optimize the previous Top-N example by omitting the row_num field from the output: CREATE TABLE ShopSales ( product_id STRING, category STRING, product_name STRING, sales BIGINT ) WITH (...); -- omit row_num field from the output SELECT product_id, category, product_name, sales FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC) AS row_num FROM ShopSales) WHERE row_num <= 5 Note In Streaming Mode, to output the above query to an external storage and have a correct result, the external storage must have the same unique key as the Top-N query. In the above example query, if product_id is the unique key of the query, then the external table should also have product_id as the unique key. Related content¶ Window Aggregation Queries Windowing Table-Valued Functions (Windowing TVFs) Top-N Queries Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql SELECT [column_list] FROM ( SELECT [column_list], ROW_NUMBER() OVER ([PARTITION BY column1[, column2...]] ORDER BY column1 [asc|desc][, column2 [asc|desc]...]) AS rownum FROM table_name) WHERE rownum <= N [AND conditions] ``` ```sql ROW_NUMBER() ``` ```sql DENSE_RANK() ``` ```sql PARTITION BY column1[, column2...] ``` ```sql ORDER BY column1 [asc|desc][, column2 [asc|desc]...] 
``` ```sql WHERE rownum <= N ``` ```sql rownum <= N ``` ```sql [AND conditions] ``` ```sql rownum <= N ``` ```sql PARTITION BY ``` ```sql CREATE TABLE ShopSales ( product_id STRING, category STRING, product_name STRING, sales BIGINT ) WITH (...); SELECT * FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC) AS row_num FROM ShopSales) WHERE row_num <= 5 ``` ```sql product-1001 ``` ```sql product-1001 ``` ```sql CREATE TABLE ShopSales ( product_id STRING, category STRING, product_name STRING, sales BIGINT ) WITH (...); -- omit row_num field from the output SELECT product_id, category, product_name, sales FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC) AS row_num FROM ShopSales) WHERE row_num <= 5 ``` --- ### SQL Window Aggregation Queries in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/queries/window-aggregation.html Window Aggregation Queries in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables aggregating data over windows in a table. Syntax¶ SELECT ... FROM -- relation applied windowing TVF GROUP BY window_start, window_end, ... Description¶ Window TVF Aggregation¶ Window aggregations are defined in the GROUP BY clause containing “window_start” and “window_end” columns of the relation applied Windowing TVF. Just like queries with regular GROUP BY clauses, queries with a group by window aggregation compute a single result row per group. Unlike other aggregations on continuous tables, window aggregations do not emit intermediate results but only a final result: the total aggregation at the end of the window. Moreover, window aggregations purge all intermediate state when they’re no longer needed. Windowing TVFs¶ Flink supports TUMBLE, HOP, CUMULATE and SESSION types of window aggregations. The time attribute field of a window table-valued function must be event time attributes. 
For more information, see Windowing TVF. In batch mode, the time attribute field of a window table-valued function must be an attribute of type TIMESTAMP or TIMESTAMP_LTZ. SESSION window aggregation is not supported in batch mode. Examples¶ The following examples show Window aggregations over example data streams that you can experiment with. Note To show the behavior of windowing more clearly in the following examples, TIMESTAMP(3) values may be simplified so that trailing zeroes aren’t shown. For example, 2020-04-15 08:05:00.000 may be shown as 2020-04-15 08:05. Columns may be hidden intentionally to enhance the readability of the content. Here are some examples for TUMBLE, HOP, CUMULATE and SESSION window aggregations. DESCRIBE `examples`.`marketplace`.`orders`; +--------------+-----------+----------+---------------+ | Column Name | Data Type | Nullable | Extras | +--------------+-----------+----------+---------------+ | order_id | STRING | NOT NULL | | | customer_id | INT | NOT NULL | | | product_id | STRING | NOT NULL | | | price | DOUBLE | NOT NULL | | +--------------+-----------+----------+---------------+ SELECT * FROM `examples`.`marketplace`.`orders`; order_id customer_id product_id price d770a538-a70c-4de6-9d06-e6c16c5bef5a 3075 1379 32.21 787ee1f4-d0d0-4c39-bdb9-44dc2d203d55 3028 1335 34.74 7ab7ce23-5f61-4398-afad-b1e3f548fee3 3148 1045 69.26 6fea712c-9454-497e-8038-ebaf6dfc7a17 3247 1390 67.26 dc9daf5e-98d5-4bcd-8839-251fed13b75e 3167 1309 12.04 ab3151d0-2950-49cd-9783-016ccc6a3281 3105 1094 21.52 d27ca945-3cff-48a4-afcc-7b17446aa95d 3168 1250 99.95 -- apply aggregation on the tumbling windowed table SELECT window_start, window_end, SUM(price) as `sum` FROM TABLE( TUMBLE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES)) GROUP BY window_start, window_end; window_start window_end sum 2023-11-02 10:40:00 2023-11-02 10:50:00 258484.93 2023-11-02 10:50:00 2023-11-02 11:00:00 287632.15 2023-11-02 11:00:00 2023-11-02 
11:10:00 271945.78 2023-11-02 11:10:00 2023-11-02 11:20:00 315207.46 2023-11-02 11:20:00 2023-11-02 11:30:00 342618.92 2023-11-02 11:30:00 2023-11-02 11:40:00 329754.31 -- apply aggregation on the hopping windowed table SELECT window_start, window_end, SUM(price) as `sum` FROM TABLE( HOP(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES, INTERVAL '10' MINUTES)) GROUP BY window_start, window_end; window_start window_end sum 2023-11-02 11:10:00 2023-11-02 11:20:00 296049.38 2023-11-02 11:15:00 2023-11-02 11:25:00 1122455.07 2023-11-02 11:20:00 2023-11-02 11:30:00 1648270.20 2023-11-02 11:25:00 2023-11-02 11:35:00 2143271.00 2023-11-02 11:30:00 2023-11-02 11:40:00 2701592.45 2023-11-02 11:35:00 2023-11-02 11:45:00 3214376.78 -- apply aggregation on the cumulating windowed table SELECT window_start, window_end, SUM(price) as `sum` FROM TABLE( CUMULATE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '2' MINUTES, INTERVAL '10' MINUTES)) GROUP BY window_start, window_end; window_start window_end sum 2023-11-02 12:40:00.000 2023-11-02 12:46:00.000 327376.23 2023-11-02 12:40:00.000 2023-11-02 12:48:00.000 661272.70 2023-11-02 12:40:00.000 2023-11-02 12:50:00.000 989294.13 2023-11-02 12:50:00.000 2023-11-02 12:52:00.000 1316596.58 2023-11-02 12:50:00.000 2023-11-02 12:54:00.000 1648097.20 2023-11-02 12:50:00.000 2023-11-02 12:56:00.000 1977881.53 2023-11-02 12:50:00.000 2023-11-02 12:58:00.000 2304080.32 2023-11-02 12:50:00.000 2023-11-02 13:00:00.000 2636795.56 -- apply aggregation on the session windowed table SELECT window_start, window_end, customer_id, SUM(price) as `sum` FROM TABLE( SESSION(TABLE `examples`.`marketplace`.`orders` PARTITION BY customer_id, DESCRIPTOR($rowtime), INTERVAL '1' MINUTES)) GROUP BY window_start, window_end, customer_id; window_start window_end sum 2023-11-02 12:40:00 2023-11-02 12:46:00 327376.23 2023-11-02 12:40:00 2023-11-02 12:48:00 661272.70 2023-11-02 12:40:00 2023-11-02 12:50:00 
989294.13 2023-11-02 12:50:00 2023-11-02 12:52:00 1316596.58 2023-11-02 12:50:00 2023-11-02 12:54:00 1648097.20 2023-11-02 12:50:00 2023-11-02 12:56:00 1977881.53 2023-11-02 12:50:00 2023-11-02 12:58:00 2304080.32 2023-11-02 12:50:00 2023-11-02 13:00:00 2636795.56 GROUPING SETS¶ Window aggregations also support GROUPING SETS syntax. Grouping sets allow for more complex grouping operations than those describable by a standard GROUP BY. Rows are grouped separately by each specified grouping set and aggregates are computed for each group just as for simple GROUP BY clauses. Window aggregations with GROUPING SETS require that both the window_start and window_end columns appear in the GROUP BY clause, but not in the GROUPING SETS clause. SELECT window_start, window_end, player_id, SUM(points) as `sum` FROM TABLE( TUMBLE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES)) GROUP BY window_start, window_end, GROUPING SETS ((player_id), ()); window_start window_end player_id sum 2023-11-03 11:20 2023-11-03 11:30 (NULL) 6596 2023-11-03 11:20 2023-11-03 11:30 1025 6232 2023-11-03 11:20 2023-11-03 11:30 1007 4486 2023-11-03 11:30 2023-11-03 11:40 (NULL) 6073 2023-11-03 11:30 2023-11-03 11:40 1025 6953 2023-11-03 11:30 2023-11-03 11:40 1007 3723 Each sublist of GROUPING SETS may specify zero or more columns or expressions and is interpreted the same way as though used directly in the GROUP BY clause. An empty grouping set means that all rows are aggregated down to a single group, which is output even if no input rows were present. References to the grouping columns or expressions are replaced by null values in result rows for grouping sets in which those columns do not appear. ROLLUP¶ ROLLUP is a shorthand notation for specifying a common type of grouping set. It represents the given list of expressions and all prefixes of the list, including the empty list. 
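To make the prefix rule concrete, here is a hedged sketch with two grouping columns. It assumes the `examples`.`marketplace`.`orders` example table and groups by its customer_id and product_id columns. The alias total_price is illustrative. ROLLUP (customer_id, product_id) expands to the grouping sets (customer_id, product_id), (customer_id), and (), so the two statements below are expected to return the same rows.

```sql
-- Sketch: ROLLUP over two columns in a 10-minute tumbling window.
SELECT window_start, window_end, customer_id, product_id, SUM(price) AS total_price
FROM TABLE(
  TUMBLE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES))
GROUP BY window_start, window_end, ROLLUP (customer_id, product_id);

-- The same grouping spelled out: the full list, each prefix, and the empty list.
SELECT window_start, window_end, customer_id, product_id, SUM(price) AS total_price
FROM TABLE(
  TUMBLE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES))
GROUP BY window_start, window_end,
  GROUPING SETS ((customer_id, product_id), (customer_id), ());
```

Note that (product_id) alone is not part of the expansion, because it is not a prefix of the list. That is the difference between ROLLUP and CUBE.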
Window aggregations with ROLLUP require that both the window_start and window_end columns be in the GROUP BY clause, but not in the ROLLUP clause. For example, the following query is equivalent to the preceding GROUPING SETS ((player_id), ()) query. SELECT window_start, window_end, player_id, SUM(points) as `sum` FROM TABLE( TUMBLE(TABLE gaming_player_activity_source, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES)) GROUP BY window_start, window_end, ROLLUP (player_id); CUBE¶ CUBE is a shorthand notation for specifying a common type of grouping set. It represents the given list and all of its possible subsets - the power set. Window aggregations with CUBE require that both the window_start and window_end columns be in the GROUP BY clause, but not in the CUBE clause. For example, the following two queries are equivalent. SELECT window_start, window_end, game_room_id, player_id, SUM(points) as `sum` FROM TABLE( TUMBLE(TABLE gaming_player_activity_source, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES)) GROUP BY window_start, window_end, CUBE (player_id, game_room_id); SELECT window_start, window_end, game_room_id, player_id, SUM(points) as `sum` FROM TABLE( TUMBLE(TABLE gaming_player_activity_source, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES)) GROUP BY window_start, window_end, GROUPING SETS ( (player_id, game_room_id), (player_id ), ( game_room_id), ( ) ); Selecting Group Window Start and End Timestamps¶ The start and end timestamps of group windows can be selected with the grouped window_start and window_end columns. Cascading Window Aggregation¶ The window_start and window_end columns are regular timestamp columns, not time attributes, so they can’t be used as time attributes in subsequent time-based operations. To propagate time attributes, you also need to add the window_time column to the GROUP BY clause. The window_time column is the third column produced by Windowing TVFs, and it is a time attribute of the assigned window. 
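To make the three window columns concrete, here is a minimal Python sketch of how a tumbling window could be assigned to a timestamp. It rests on stated assumptions: timestamps are naive, windows are aligned to the Unix epoch with no offset, and window_time is window_end minus one millisecond, which is consistent with sample output elsewhere in this documentation (window_end 13:45 paired with window_time 13:44:59.999). The `tumble` function is hypothetical and only mirrors the name of the SQL TVF.

```python
from datetime import datetime, timedelta
from typing import Tuple

EPOCH = datetime(1970, 1, 1)


def tumble(ts: datetime, size: timedelta) -> Tuple[datetime, datetime, datetime]:
    """Return (window_start, window_end, window_time) for a timestamp.

    Assumes epoch-aligned windows with no offset and naive timestamps.
    """
    window_start = ts - ((ts - EPOCH) % size)
    window_end = window_start + size
    # Assumption: window_time is the last millisecond inside the window.
    window_time = window_end - timedelta(milliseconds=1)
    return window_start, window_end, window_time


five = tumble(datetime(2023, 11, 3, 13, 42, 10, 500000), timedelta(minutes=5))
print(five)

# Cascading: feeding window_time of the 5-minute window into a 10-minute
# TUMBLE assigns the partial result to the enclosing 10-minute window.
ten = tumble(five[2], timedelta(minutes=10))
print(ten)
```

Because window_time lies inside the window it came from, a second, larger window assigns the partial result to the correct enclosing window. The exclusive window_end would instead fall into the next window whenever the two window boundaries coincide.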
Adding window_time to the GROUP BY clause also makes window_time a group key that can be selected. Subsequent queries can use this column for time-based operations, like cascading window aggregations and Window Top-N. The following code shows a cascading window aggregation in which the first window aggregation propagates the time attribute for the second window aggregation. -- tumbling 5 minutes for each player_id WITH fiveminutewindow AS ( -- Note: The window start and window end fields of inner Window TVF -- are optional in the SELECT clause. But if they appear in the clause, -- they must be aliased to prevent name conflicts with the window start -- and window end of the outer Window TVF. SELECT window_start AS window_5mintumble_start, window_end as window_5mintumble_end, window_time AS rowtime, SUM(points) as `partial_sum` FROM TABLE( TUMBLE(TABLE gaming_player_activity_source, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) GROUP BY player_id, window_start, window_end, window_time ) -- tumbling 10 minutes on the first window SELECT window_start, window_end, SUM(partial_sum) as total_sum FROM TABLE( TUMBLE(TABLE fiveminutewindow, DESCRIPTOR(rowtime), INTERVAL '10' MINUTES)) GROUP BY window_start, window_end; Related content¶ Course: Window Aggregations Top-N Queries Window Top-N Queries Windowing Table-Valued Functions (Windowing TVFs) Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql SELECT ... FROM -- relation applied windowing TVF GROUP BY window_start, window_end, ... 
``` ```sql TIMESTAMP_LTZ ``` ```sql TIMESTAMP(3) ``` ```sql 2020-04-15 08:05:00.000 ``` ```sql 2020-04-15 08:05 ``` ```sql DESCRIBE `examples`.`marketplace`.`orders`; ``` ```sql +--------------+-----------+----------+---------------+ | Column Name | Data Type | Nullable | Extras | +--------------+-----------+----------+---------------+ | order_id | STRING | NOT NULL | | | customer_id | INT | NOT NULL | | | product_id | STRING | NOT NULL | | | price | DOUBLE | NOT NULL | | +--------------+-----------+----------+---------------+ ``` ```sql SELECT * FROM `examples`.`marketplace`.`orders`; ``` ```sql order_id customer_id product_id price d770a538-a70c-4de6-9d06-e6c16c5bef5a 3075 1379 32.21 787ee1f4-d0d0-4c39-bdb9-44dc2d203d55 3028 1335 34.74 7ab7ce23-5f61-4398-afad-b1e3f548fee3 3148 1045 69.26 6fea712c-9454-497e-8038-ebaf6dfc7a17 3247 1390 67.26 dc9daf5e-98d5-4bcd-8839-251fed13b75e 3167 1309 12.04 ab3151d0-2950-49cd-9783-016ccc6a3281 3105 1094 21.52 d27ca945-3cff-48a4-afcc-7b17446aa95d 3168 1250 99.95 ``` ```sql -- apply aggregation on the tumbling windowed table SELECT window_start, window_end, SUM(price) as `sum` FROM TABLE( TUMBLE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES)) GROUP BY window_start, window_end; ``` ```sql window_start window_end sum 2023-11-02 10:40:00 2023-11-02 10:50:00 258484.93 2023-11-02 10:50:00 2023-11-02 11:00:00 287632.15 2023-11-02 11:00:00 2023-11-02 11:10:00 271945.78 2023-11-02 11:10:00 2023-11-02 11:20:00 315207.46 2023-11-02 11:20:00 2023-11-02 11:30:00 342618.92 2023-11-02 11:30:00 2023-11-02 11:40:00 329754.31 ``` ```sql -- apply aggregation on the hopping windowed table SELECT window_start, window_end, SUM(price) as `sum` FROM TABLE( HOP(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES, INTERVAL '10' MINUTES)) GROUP BY window_start, window_end; ``` ```sql window_start window_end sum 2023-11-02 11:10:00 2023-11-02 11:20:00 296049.38 2023-11-02 11:15:00 
2023-11-02 11:25:00 1122455.07 2023-11-02 11:20:00 2023-11-02 11:30:00 1648270.20 2023-11-02 11:25:00 2023-11-02 11:35:00 2143271.00 2023-11-02 11:30:00 2023-11-02 11:40:00 2701592.45 2023-11-02 11:35:00 2023-11-02 11:45:00 3214376.78 ``` ```sql -- apply aggregation on the cumulating windowed table SELECT window_start, window_end, SUM(price) as `sum` FROM TABLE( CUMULATE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '2' MINUTES, INTERVAL '10' MINUTES)) GROUP BY window_start, window_end; ``` ```sql window_start window_end sum 2023-11-02 12:40:00.000 2023-11-02 12:46:00.000 327376.23 2023-11-02 12:40:00.000 2023-11-02 12:48:00.000 661272.70 2023-11-02 12:40:00.000 2023-11-02 12:50:00.000 989294.13 2023-11-02 12:50:00.000 2023-11-02 12:52:00.000 1316596.58 2023-11-02 12:50:00.000 2023-11-02 12:54:00.000 1648097.20 2023-11-02 12:50:00.000 2023-11-02 12:56:00.000 1977881.53 2023-11-02 12:50:00.000 2023-11-02 12:58:00.000 2304080.32 2023-11-02 12:50:00.000 2023-11-02 13:00:00.000 2636795.56 ``` ```sql -- apply aggregation on the session windowed table SELECT window_start, window_end, customer_id, SUM(price) as `sum` FROM TABLE( SESSION(TABLE `examples`.`marketplace`.`orders` PARTITION BY customer_id, DESCRIPTOR($rowtime), INTERVAL '1' MINUTES)) GROUP BY window_start, window_end, customer_id; ``` ```sql window_start window_end sum 2023-11-02 12:40:00 2023-11-02 12:46:00 327376.23 2023-11-02 12:40:00 2023-11-02 12:48:00 661272.70 2023-11-02 12:40:00 2023-11-02 12:50:00 989294.13 2023-11-02 12:50:00 2023-11-02 12:52:00 1316596.58 2023-11-02 12:50:00 2023-11-02 12:54:00 1648097.20 2023-11-02 12:50:00 2023-11-02 12:56:00 1977881.53 2023-11-02 12:50:00 2023-11-02 12:58:00 2304080.32 2023-11-02 12:50:00 2023-11-02 13:00:00 2636795.56 ``` ```sql GROUPING SETS ``` ```sql GROUPING SETS ``` ```sql window_start ``` ```sql GROUPING SETS ``` ```sql SELECT window_start, window_end, player_id, SUM(points) as `sum` FROM TABLE( TUMBLE(TABLE 
gaming_player_activity_source, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES)) GROUP BY window_start, window_end, GROUPING SETS ((player_id), ()); ``` ```sql window_start window_end player_id sum 2023-11-03 11:20 2023-11-03 11:30 (NULL) 6596 2023-11-03 11:20 2023-11-03 11:30 1025 6232 2023-11-03 11:20 2023-11-03 11:30 1007 4486 2023-11-03 11:30 2023-11-03 11:40 (NULL) 6073 2023-11-03 11:30 2023-11-03 11:40 1025 6953 2023-11-03 11:30 2023-11-03 11:40 1007 3723 ``` ```sql GROUPING SETS ``` ```sql window_start ``` ```sql SELECT window_start, window_end, player_id, SUM(points) as `sum` FROM TABLE( TUMBLE(TABLE gaming_player_activity_source, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES)) GROUP BY window_start, window_end, ROLLUP (player_id); ``` ```sql window_start ``` ```sql SELECT window_start, window_end, game_room_id, player_id, SUM(points) as `sum` FROM TABLE( TUMBLE(TABLE gaming_player_activity_source, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES)) GROUP BY window_start, window_end, CUBE (player_id, game_room_id); SELECT window_start, window_end, game_room_id, player_id, SUM(points) as `sum` FROM TABLE( TUMBLE(TABLE gaming_player_activity_source, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES)) GROUP BY window_start, window_end, GROUPING SETS ( (player_id, game_room_id), (player_id ), ( game_room_id), ( ) ); ``` ```sql window_start ``` ```sql window_start ``` ```sql window_time ``` ```sql window_time ``` ```sql window_time ``` ```sql window_time ``` ```sql -- tumbling 5 minutes for each player_id WITH fiveminutewindow AS ( -- Note: The window start and window end fields of inner Window TVF -- are optional in the SELECT clause. But if they appear in the clause, -- they must be aliased to prevent name conflicts with the window start -- and window end of the outer Window TVF. 
SELECT window_start AS window_5mintumble_start, window_end as window_5mintumble_end, window_time AS rowtime, SUM(points) as `partial_sum` FROM TABLE( TUMBLE(TABLE gaming_player_activity_source, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) GROUP BY player_id, window_start, window_end, window_time ) -- tumbling 10 minutes on the first window SELECT window_start, window_end, SUM(partial_sum) as total_sum FROM TABLE( TUMBLE(TABLE fiveminutewindow, DESCRIPTOR(rowtime), INTERVAL '10' MINUTES)) GROUP BY window_start, window_end; ``` --- ### SQL Window Deduplication Queries in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/queries/window-deduplication.html Window Deduplication Queries in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables removing duplicate rows over a set of columns in a windowed table. Syntax¶ SELECT [column_list] FROM ( SELECT [column_list], ROW_NUMBER() OVER (PARTITION BY window_start, window_end [, column_key1...] ORDER BY time_attr [asc|desc]) AS rownum FROM table_name) -- relation applied windowing TVF WHERE (rownum = 1 | rownum <=1 | rownum < 2) [AND conditions] Parameter Specification Note This query pattern must be followed exactly; otherwise, the optimizer won’t translate the query to Window Deduplication. ROW_NUMBER(): Assigns a unique, sequential number to each row, starting with one. PARTITION BY window_start, window_end [, column_key1...]: Specifies the partition columns, which must include window_start and window_end and can include other partition keys. ORDER BY time_attr [asc|desc]: Specifies the ordering column, which must be a time attribute. Flink SQL supports the event time attribute. Processing time is not supported in Confluent Cloud for Apache Flink. Ordering by ASC means keeping the first row, and ordering by DESC means keeping the last row. 
WHERE (rownum = 1 | rownum <=1 | rownum < 2): The condition rownum = 1 | rownum <=1 | rownum < 2 is required for the optimizer to recognize that the query should be translated to Window Deduplication. Description¶ Window Deduplication is a special deduplication that removes duplicate rows over a set of columns, keeping the first row or the last row for each window and partition key. For streaming queries, unlike regular deduplication on continuous tables, Window Deduplication doesn’t emit intermediate results, instead emitting only a final result at the end of the window. Also, Window Deduplication purges all intermediate state when it’s no longer needed. As a result, Window Deduplication queries have better performance if you don’t need results updated per row. Usually, Window Deduplication is used with Windowing TVF directly. Window Deduplication can be used with other operations based on Windowing TVF, like Window Aggregation, Window TopN, and Window Join. Window Deduplication can be defined with the same syntax as regular Deduplication. For more information, see Deduplication Queries in Confluent Cloud for Apache Flink. Window Deduplication requires that the PARTITION BY clause contains the window_start and window_end columns of the relation; otherwise, the optimizer can’t translate the query. Flink uses ROW_NUMBER() to remove duplicates, similar to its usage in Top-N Queries in Confluent Cloud for Apache Flink. Deduplication is a special case of the Top-N query, in which N is one and the order is by event time. Example¶ The following example shows how to keep the last record for every 10-minute tumbling window. The mock data is produced by the Datagen Source Connector configured with the Gaming Player Activity quickstart. 
DESCRIBE gaming_player_activity_source; +--------------+-----------+----------+---------------+ | Column Name | Data Type | Nullable | Extras | +--------------+-----------+----------+---------------+ | key | BYTES | NULL | PARTITION KEY | | player_id | INT | NOT NULL | | | game_room_id | INT | NOT NULL | | | points | INT | NOT NULL | | | coordinates | STRING | NOT NULL | | +--------------+-----------+----------+---------------+ SELECT * FROM gaming_player_activity_source; player_id game_room_id points coordinates 1051 1144 371 [65,36] 1079 3451 38 [20,71] 1017 4177 419 [63,05] 1092 1801 209 [31,67] 1074 3013 401 [32,69] 1003 1038 284 [18,32] 1081 2265 196 [78,68] SELECT * FROM ( SELECT $rowtime, points, game_room_id, player_id, window_start, window_end, ROW_NUMBER() OVER (PARTITION BY window_start, window_end ORDER BY $rowtime DESC) AS rownum FROM TABLE( TUMBLE(TABLE gaming_player_activity_source, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES)) ) WHERE rownum <= 1; $rowtime points game_room_id player_id window_start window_end rownum 2023-11-03 19:59:59.407 371 2504 1094 2023-11-03 19:50 2023-11-03 20:00 1 2023-11-03 20:09:59.921 188 4342 1036 2023-11-03 20:00 2023-11-03 20:10 1 2023-11-03 20:19:59.741 128 3427 1046 2023-11-03 20:10 2023-11-03 20:20 1 2023-11-03 20:29:59.992 311 1000 1049 2023-11-03 20:20 2023-11-03 20:30 1 2023-11-03 20:39:59.569 429 1217 1062 2023-11-03 20:30 2023-11-03 20:40 1 Related content¶ Top-N Queries Window Top-N Queries Windowing Table-Valued Functions (Windowing TVFs) Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql SELECT [column_list] FROM ( SELECT [column_list], ROW_NUMBER() OVER (PARTITION BY window_start, window_end [, column_key1...] 
ORDER BY time_attr [asc|desc]) AS rownum FROM table_name) -- relation applied windowing TVF WHERE (rownum = 1 | rownum <=1 | rownum < 2) [AND conditions] ``` ```sql ROW_NUMBER() ``` ```sql PARTITION BY window_start, window_end [, column_key1...] ``` ```sql window_start ``` ```sql ORDER BY time_attr [asc|desc] ``` ```sql WHERE (rownum = 1 | rownum <=1 | rownum < 2) ``` ```sql rownum = 1 | rownum <=1 | rownum < 2 ``` ```sql PARTITION BY ``` ```sql window_start ``` ```sql ROW_NUMBER() ``` ```sql DESCRIBE gaming_player_activity_source; ``` ```sql +--------------+-----------+----------+---------------+ | Column Name | Data Type | Nullable | Extras | +--------------+-----------+----------+---------------+ | key | BYTES | NULL | PARTITION KEY | | player_id | INT | NOT NULL | | | game_room_id | INT | NOT NULL | | | points | INT | NOT NULL | | | coordinates | STRING | NOT NULL | | +--------------+-----------+----------+---------------+ ``` ```sql SELECT * FROM gaming_player_activity_source; ``` ```sql player_id game_room_id points coordinates 1051 1144 371 [65,36] 1079 3451 38 [20,71] 1017 4177 419 [63,05] 1092 1801 209 [31,67] 1074 3013 401 [32,69] 1003 1038 284 [18,32] 1081 2265 196 [78,68] ``` ```sql SELECT * FROM ( SELECT $rowtime, points, game_room_id, player_id, window_start, window_end, ROW_NUMBER() OVER (PARTITION BY window_start, window_end ORDER BY $rowtime DESC) AS rownum FROM TABLE( TUMBLE(TABLE gaming_player_activity_source, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES)) ) WHERE rownum <= 1; ``` ```sql $rowtime points game_room_id player_id window_start window_end rownum 2023-11-03 19:59:59.407 371 2504 1094 2023-11-03 19:50 2023-11-03 20:00 1 2023-11-03 20:09:59.921 188 4342 1036 2023-11-03 20:00 2023-11-03 20:10 1 2023-11-03 20:19:59.741 128 3427 1046 2023-11-03 20:10 2023-11-03 20:20 1 2023-11-03 20:29:59.992 311 1000 1049 2023-11-03 20:20 2023-11-03 20:30 1 2023-11-03 20:39:59.569 429 1217 1062 2023-11-03 20:30 2023-11-03 20:40 1 ``` --- ### SQL Window Join 
Queries in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/queries/window-join.html Window Join Queries in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables joining data over time windows in dynamic tables. Syntax¶ The following shows the syntax of the INNER/LEFT/RIGHT/FULL OUTER Window Join statement. SELECT ... FROM L [LEFT|RIGHT|FULL OUTER] JOIN R -- L and R are relations applied windowing TVF ON L.window_start = R.window_start AND L.window_end = R.window_end AND ... Description¶ A window join adds the dimension of time into the join criteria themselves. In doing so, the window join joins the elements of two streams that share a common key and are in the same window. For streaming queries, unlike other joins on continuous tables, a window join does not emit intermediate results but only emits final results at the end of the window. Moreover, a window join purges all intermediate state when it’s no longer needed. Usually, Window Join is used with Windowing TVF. Also, Window Join can follow other operations based on Windowing TVF, like Window Aggregation and Window TopN. Window Join requires that the join condition contains an equality on the window_start columns and an equality on the window_end columns of the input tables. Window Join supports INNER/LEFT/RIGHT/FULL OUTER/ANTI/SEMI JOIN. The syntax is very similar for all of the different joins. Examples¶ The following examples show Window joins over mock data produced by the Datagen Source Connector configured with the Gaming Player Activity quickstart. Note To show the behavior of windowing more clearly in the following examples, TIMESTAMP(3) values may be simplified so that trailing zeroes aren’t shown. For example, 2020-04-15 08:05:00.000 may be shown as 2020-04-15 08:05. Columns may be hidden intentionally to enhance the readability of the content. 
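The join condition described above (equal key, equal window_start, equal window_end) can be sketched in Python as a full outer join over tumbling windows. This is only a model of the matching rule under stated assumptions: windows are epoch-aligned, keys are never NULL, and the emit-at-window-end timing and state purging are not modelled. The function names and all sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def window_of(ts, size):
    """Return (window_start, window_end) of the tumbling window containing ts."""
    start = ts - ((ts - EPOCH) % size)
    return start, start + size


def full_outer_window_join(left, right, size):
    """Join (ts, num, id) rows on equal num within the same tumbling window."""
    # One bucket per (window, key); each bucket holds left ids and right ids.
    buckets = defaultdict(lambda: ([], []))
    for side, rows in enumerate((left, right)):
        for ts, num, row_id in rows:
            buckets[(window_of(ts, size), num)][side].append(row_id)

    result = []
    for (window, _num), (l_ids, r_ids) in sorted(buckets.items()):
        if l_ids and r_ids:
            result += [(l, r, window) for l in l_ids for r in r_ids]
        else:
            # Unmatched rows are padded with None, like NULL in SQL.
            result += [(l, None, window) for l in l_ids]
            result += [(None, r, window) for r in r_ids]
    return result


def at(minute):
    return datetime(2023, 11, 3, 12, minute)


# Hypothetical rows: (event time, num, id)
left = [(at(2), 1, "L1"), (at(6), 2, "L2"), (at(4), 3, "L3")]
right = [(at(1), 2, "R2"), (at(4), 3, "R3"), (at(5), 4, "R4")]

joined = full_outer_window_join(left, right, timedelta(minutes=5))
for l_id, r_id, (w_start, w_end) in joined:
    print(l_id, r_id, w_start.time(), w_end.time())
```

In this sample, L2 and R2 share the key 2 but land in different five-minute windows, so each appears as an unmatched row.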
FULL OUTER JOIN¶ The following example shows a FULL OUTER JOIN, with a Window Join that works on a Tumble Window TVF. When performing a window join, all elements with a common key and a common tumbling window are joined together. By scoping the region of time for the join into fixed five-minute intervals, the datasets are chopped into two distinct windows of time: [12:00, 12:05) and [12:05, 12:10). The L2 and R2 rows don’t join together because they fall into separate windows. describe LeftTable; +-------------+--------------+----------+--------+ | Column Name | Data Type | Nullable | Extras | +-------------+--------------+----------+--------+ | row_time | TIMESTAMP(3) | NULL | | | num | INT | NULL | | | id | STRING | NULL | | +-------------+--------------+----------+--------+ SELECT * FROM LeftTable; row_time num id 2023-11-03 12:22:47.268 1 L1 2023-11-03 12:22:43.189 2 L2 2023-11-03 12:22:47.486 3 L3 describe RightTable; +-------------+--------------+----------+--------+ | Column Name | Data Type | Nullable | Extras | +-------------+--------------+----------+--------+ | row_time | TIMESTAMP(3) | NULL | | | num | INT | NULL | | | id | STRING | NULL | | +-------------+--------------+----------+--------+ SELECT * FROM RightTable; row_time num id 2023-11-03 12:23:22.045 2 R2 2023-11-03 12:23:16.437 3 R3 2023-11-03 12:23:18.349 4 R4 SELECT L.num as L_Num, L.id as L_Id, R.num as R_Num, R.id as R_Id, COALESCE(L.window_start, R.window_start) as window_start, COALESCE(L.window_end, R.window_end) as window_end FROM ( SELECT * FROM TABLE(TUMBLE(TABLE LeftTable, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) ) L FULL JOIN ( SELECT * FROM TABLE(TUMBLE(TABLE RightTable, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) ) R ON L.num = R.num AND L.window_start = R.window_start AND L.window_end = R.window_end; The output resembles: L_Num L_Id R_Num R_Id window_start window_end 1 L1 NULL NULL 2023-11-03 13:20 2023-11-03 13:25 NULL NULL 2 R2 2023-11-03 13:20 2023-11-03 13:25 3 L3 3 R3 
2023-11-03 13:20 2023-11-03 13:25 2 L2 NULL NULL 2023-11-03 13:25 2023-11-03 13:30 NULL NULL 4 R4 2023-11-03 13:25 2023-11-03 13:30 SEMI¶ Semi Window Joins return a row from one left record if there is at least one matching row on the right side within the common window. SELECT * FROM ( SELECT * FROM TABLE(TUMBLE(TABLE LeftTable, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) ) L WHERE L.num IN ( SELECT num FROM ( SELECT * FROM TABLE(TUMBLE(TABLE RightTable, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) ) R WHERE L.window_start = R.window_start AND L.window_end = R.window_end); row_time num id window_start window_end window_time 2023-11-03 12:43:57.095 1 L3 2023-11-03 13:40 2023-11-03 13:45 2023-11-03 13:44:59.999 2023-11-03 12:43:54.914 1 L2 2023-11-03 13:40 2023-11-03 13:45 2023-11-03 13:44:59.999 2023-11-03 12:43:56.898 1 L1 2023-11-03 13:40 2023-11-03 13:45 2023-11-03 13:44:59.999 2023-11-03 12:43:59.112 1 L1 2023-11-03 13:40 2023-11-03 13:45 2023-11-03 13:44:59.999 2023-11-03 12:43:59.626 1 L5 2023-11-03 13:40 2023-11-03 13:45 2023-11-03 13:44:59.999 SELECT * FROM ( SELECT * FROM TABLE(TUMBLE(TABLE LeftTable, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) ) L WHERE EXISTS ( SELECT * FROM ( SELECT * FROM TABLE(TUMBLE(TABLE RightTable, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) ) R WHERE L.num = R.num AND L.window_start = R.window_start AND L.window_end = R.window_end); row_time num id window_start window_end window_time 2023-11-03 12:45:08.329 2 L4 2023-11-03 13:45 2023-11-03 13:50 2023-11-03 13:49:59.999 2023-11-03 12:45:06.702 2 L3 2023-11-03 13:45 2023-11-03 13:50 2023-11-03 13:49:59.999 2023-11-03 12:45:07.024 2 L4 2023-11-03 13:45 2023-11-03 13:50 2023-11-03 13:49:59.999 2023-11-03 12:45:05.581 2 L3 2023-11-03 13:45 2023-11-03 13:50 2023-11-03 13:49:59.999 ANTI¶ Anti Window Joins are the obverse of the Inner Window Join: they contain all of the unjoined rows within each common window. 
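As a hedged illustration of the semi and anti variants, the Python sketch below splits the left rows into those that have a match on the right in the same tumbling window (semi) and those that do not (anti). It models the EXISTS and NOT EXISTS forms only, because NOT IN behaves differently in SQL when NULL values are involved. Windows are assumed to be epoch-aligned, and the function names and sample rows are hypothetical.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def window_start(ts, size):
    """Start of the tumbling window containing ts.

    For fixed-size tumbling windows, equal window_start implies equal
    window_end, so the start alone identifies the window in this sketch.
    """
    return ts - ((ts - EPOCH) % size)


def split_semi_anti(left, right, size):
    """Split left (ts, num, id) rows into (semi, anti) id lists."""
    right_keys = {(window_start(ts, size), num) for ts, num, _ in right}
    semi, anti = [], []
    for ts, num, row_id in left:
        matched = (window_start(ts, size), num) in right_keys
        (semi if matched else anti).append(row_id)
    return semi, anti


def at(minute):
    return datetime(2023, 11, 3, 12, minute)


# Hypothetical rows: (event time, num, id)
left = [(at(2), 1, "L1"), (at(6), 2, "L2"), (at(4), 3, "L3")]
right = [(at(1), 2, "R2"), (at(4), 3, "R3"), (at(5), 4, "R4")]

semi, anti = split_semi_anti(left, right, timedelta(minutes=5))
print("semi:", semi)
print("anti:", anti)
```

Each left row appears at most once in the semi result, no matter how many right rows match it, which is the main difference from an inner join.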
SELECT * FROM ( SELECT * FROM TABLE(TUMBLE(TABLE LeftTable, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) ) L WHERE L.num NOT IN ( SELECT num FROM ( SELECT * FROM TABLE(TUMBLE(TABLE RightTable, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) ) R WHERE L.window_start = R.window_start AND L.window_end = R.window_end); row_time num id window_start window_end window_time 2023-11-03 12:23:42.865 1 L1 2023-11-03 13:20 2023-11-03 13:25 2023-11-03 13:24:59.999 2023-11-03 12:23:42.956 1 L5 2023-11-03 13:20 2023-11-03 13:25 2023-11-03 13:24:59.999 2023-11-03 12:23:41.029 2 L1 2023-11-03 13:20 2023-11-03 13:25 2023-11-03 13:24:59.999 2023-11-03 12:23:36.826 1 L1 2023-11-03 13:20 2023-11-03 13:25 2023-11-03 13:24:59.999 2023-11-03 12:23:36.435 1 L4 2023-11-03 13:20 2023-11-03 13:25 2023-11-03 13:24:59.999 SELECT * FROM ( SELECT * FROM TABLE(TUMBLE(TABLE LeftTable, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) ) L WHERE NOT EXISTS ( SELECT * FROM ( SELECT * FROM TABLE(TUMBLE(TABLE RightTable, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) ) R WHERE L.num = R.num AND L.window_start = R.window_start AND L.window_end = R.window_end); row_time num id window_start window_end window_time 2023-11-03 12:23:14.693 2 L1 2023-11-03 13:20 2023-11-03 13:25 2023-11-03 13:24:59.999 2023-11-03 12:23:19.174 2 L1 2023-11-03 13:20 2023-11-03 13:25 2023-11-03 13:24:59.999 2023-11-03 12:23:11.035 2 L1 2023-11-03 13:20 2023-11-03 13:25 2023-11-03 13:24:59.999 2023-11-03 12:23:11.764 2 L3 2023-11-03 13:20 2023-11-03 13:25 2023-11-03 13:24:59.999 2023-11-03 12:23:16.240 2 L5 2023-11-03 13:20 2023-11-03 13:25 2023-11-03 13:24:59.999 Limitations¶ Limitation on Join clause¶ Currently, the window join requires that the join-on condition contains window-starts equality of input tables and window-ends equality of input tables. In the future, the join on clause could be simplified to include only the window-start equality if the windowing TVF is TUMBLE or HOP. 
Limitation on Windowing TVFs of inputs¶ Currently, the windowing TVFs must be the same for left and right inputs. This could be extended in the future, for example, tumbling windows join sliding windows with the same window size. Related content¶ Top-N Queries Window Top-N Queries Windowing Table-Valued Functions (Windowing TVFs) Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql SELECT ... FROM L [LEFT|RIGHT|FULL OUTER] JOIN R -- L and R are relations applied windowing TVF ON L.window_start = R.window_start AND L.window_end = R.window_end AND ... ``` ```sql window_starts ``` ```sql window_ends ``` ```sql TIMESTAMP(3) ``` ```sql 2020-04-15 08:05:00.000 ``` ```sql 2020-04-15 08:05 ``` ```sql [12:00, 12:05) ``` ```sql [12:05, 12:10) ``` ```sql describe LeftTable; ``` ```sql +-------------+--------------+----------+--------+ | Column Name | Data Type | Nullable | Extras | +-------------+--------------+----------+--------+ | row_time | TIMESTAMP(3) | NULL | | | num | INT | NULL | | | id | STRING | NULL | | +-------------+--------------+----------+--------+ ``` ```sql SELECT * FROM LeftTable; ``` ```sql row_time num id 2023-11-03 12:22:47.268 1 L1 2023-11-03 12:22:43.189 2 L2 2023-11-03 12:22:47.486 3 L3 ``` ```sql describe RightTable; ``` ```sql +-------------+--------------+----------+--------+ | Column Name | Data Type | Nullable | Extras | +-------------+--------------+----------+--------+ | row_time | TIMESTAMP(3) | NULL | | | num | INT | NULL | | | id | STRING | NULL | | +-------------+--------------+----------+--------+ ``` ```sql SELECT * FROM RightTable; ``` ```sql row_time num id 2023-11-03 12:23:22.045 2 R2 2023-11-03 12:23:16.437 3 R3 2023-11-03 12:23:18.349 4 R4 ``` ```sql SELECT L.num as L_Num, L.id as L_Id, R.num as R_Num, R.id as R_Id, COALESCE(L.window_start, R.window_start) as window_start, COALESCE(L.window_end, R.window_end) as window_end FROM ( SELECT * 
FROM TABLE(TUMBLE(TABLE LeftTable, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) ) L FULL JOIN ( SELECT * FROM TABLE(TUMBLE(TABLE RightTable, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) ) R ON L.num = R.num AND L.window_start = R.window_start AND L.window_end = R.window_end; ``` ```sql L_Num L_Id R_Num R_Id window_start window_end 1 L1 NULL NULL 2023-11-03 13:20 2023-11-03 13:25 NULL NULL 2 R2 2023-11-03 13:20 2023-11-03 13:25 3 L3 3 R3 2023-11-03 13:20 2023-11-03 13:25 2 L2 NULL NULL 2023-11-03 13:25 2023-11-03 13:30 NULL NULL 4 R4 2023-11-03 13:25 2023-11-03 13:30 ``` ```sql SELECT * FROM ( SELECT * FROM TABLE(TUMBLE(TABLE LeftTable, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) ) L WHERE L.num IN ( SELECT num FROM ( SELECT * FROM TABLE(TUMBLE(TABLE RightTable, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) ) R WHERE L.window_start = R.window_start AND L.window_end = R.window_end); ``` ```sql row_time num id window_start window_end window_time 2023-11-03 12:43:57.095 1 L3 2023-11-03 13:40 2023-11-03 13:45 2023-11-03 13:44:59.999 2023-11-03 12:43:54.914 1 L2 2023-11-03 13:40 2023-11-03 13:45 2023-11-03 13:44:59.999 2023-11-03 12:43:56.898 1 L1 2023-11-03 13:40 2023-11-03 13:45 2023-11-03 13:44:59.999 2023-11-03 12:43:59.112 1 L1 2023-11-03 13:40 2023-11-03 13:45 2023-11-03 13:44:59.999 2023-11-03 12:43:59.626 1 L5 2023-11-03 13:40 2023-11-03 13:45 2023-11-03 13:44:59.999 ``` ```sql SELECT * FROM ( SELECT * FROM TABLE(TUMBLE(TABLE LeftTable, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) ) L WHERE EXISTS ( SELECT * FROM ( SELECT * FROM TABLE(TUMBLE(TABLE RightTable, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) ) R WHERE L.num = R.num AND L.window_start = R.window_start AND L.window_end = R.window_end); ``` ```sql row_time num id window_start window_end window_time 2023-11-03 12:45:08.329 2 L4 2023-11-03 13:45 2023-11-03 13:50 2023-11-03 13:49:59.999 2023-11-03 12:45:06.702 2 L3 2023-11-03 13:45 2023-11-03 13:50 2023-11-03 13:49:59.999 2023-11-03 12:45:07.024 2 L4 
2023-11-03 13:45 2023-11-03 13:50 2023-11-03 13:49:59.999 2023-11-03 12:45:05.581 2 L3 2023-11-03 13:45 2023-11-03 13:50 2023-11-03 13:49:59.999 ``` ```sql SELECT * FROM ( SELECT * FROM TABLE(TUMBLE(TABLE LeftTable, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) ) L WHERE L.num NOT IN ( SELECT num FROM ( SELECT * FROM TABLE(TUMBLE(TABLE RightTable, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) ) R WHERE L.window_start = R.window_start AND L.window_end = R.window_end); ``` ```sql row_time num id window_start window_end window_time 2023-11-03 12:23:42.865 1 L1 2023-11-03 13:20 2023-11-03 13:25 2023-11-03 13:24:59.999 2023-11-03 12:23:42.956 1 L5 2023-11-03 13:20 2023-11-03 13:25 2023-11-03 13:24:59.999 2023-11-03 12:23:41.029 2 L1 2023-11-03 13:20 2023-11-03 13:25 2023-11-03 13:24:59.999 2023-11-03 12:23:36.826 1 L1 2023-11-03 13:20 2023-11-03 13:25 2023-11-03 13:24:59.999 2023-11-03 12:23:36.435 1 L4 2023-11-03 13:20 2023-11-03 13:25 2023-11-03 13:24:59.999 ``` ```sql SELECT * FROM ( SELECT * FROM TABLE(TUMBLE(TABLE LeftTable, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) ) L WHERE NOT EXISTS ( SELECT * FROM ( SELECT * FROM TABLE(TUMBLE(TABLE RightTable, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) ) R WHERE L.num = R.num AND L.window_start = R.window_start AND L.window_end = R.window_end); ``` ```sql row_time num id window_start window_end window_time 2023-11-03 12:23:14.693 2 L1 2023-11-03 13:20 2023-11-03 13:25 2023-11-03 13:24:59.999 2023-11-03 12:23:19.174 2 L1 2023-11-03 13:20 2023-11-03 13:25 2023-11-03 13:24:59.999 2023-11-03 12:23:11.035 2 L1 2023-11-03 13:20 2023-11-03 13:25 2023-11-03 13:24:59.999 2023-11-03 12:23:11.764 2 L3 2023-11-03 13:20 2023-11-03 13:25 2023-11-03 13:24:59.999 2023-11-03 12:23:16.240 2 L5 2023-11-03 13:20 2023-11-03 13:25 2023-11-03 13:24:59.999 ``` --- ### SQL Window Top-N Queries in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/queries/window-topn.html Window 
Top-N Queries in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables Window Top-N queries in dynamic tables. Syntax¶ SELECT [column_list] FROM ( SELECT [column_list], ROW_NUMBER() OVER (PARTITION BY window_start, window_end [, col_key1...] ORDER BY col1 [asc|desc][, col2 [asc|desc]...]) AS rownum FROM table_name) -- relation applied windowing TVF WHERE rownum <= N [AND conditions] Description¶ Window Top-N is a special Top-N that returns the N smallest or largest values for each window and other partitioned keys. For streaming queries, unlike regular Top-N on continuous tables, Window Top-N doesn’t emit intermediate results, but only a final result, the total Top N records at the end of the window. Moreover, Window Top-N purges all intermediate state when no longer needed, so Window Top-N queries have better performance if you don’t need results updated per record. Usually, Window Top-N is used with Windowing TVF directly, but Window Top-N can be used with other operations based on Windowing TVF, like Window Aggregation, and Window Join. You can define Window Top-N with the same syntax as regular Top-N. For more information, see Top-N. In addition, Window Top-N requires that the PARTITION BY clause contains window_start and window_end columns of the relation applied by Windowing TVF or Window Aggregation. Otherwise, the optimizer can’t translate the query. Examples¶ The following examples show Window Top-N aggregations over example data streams that you can experiment with. Note To show the behavior of windowing more clearly in the following examples, TIMESTAMP(3) values may be simplified so that trailing zeroes aren’t shown. For example, 2020-04-15 08:05:00.000 may be shown as 2020-04-15 08:05. Columns may be hidden intentionally to enhance the readability of the content. 
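The ranking that the ROW_NUMBER() pattern performs can be sketched in Python as follows: rows are bucketed by tumbling window, sorted by price in descending order inside each bucket, and cut off after N rows. The sketch assumes epoch-aligned windows and computes the result in one pass over finite input, so it does not model the streaming behavior of emitting only at the end of each window. The function name and the sample orders are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def window_top_n(rows, size, n):
    """Return the n highest-priced (ts, price, order_id) rows per window.

    Each result row is (window_start, order_id, price, rownum).
    """
    per_window = defaultdict(list)
    for ts, price, order_id in rows:
        start = ts - ((ts - EPOCH) % size)
        per_window[start].append((price, order_id))

    result = []
    for start in sorted(per_window):
        # ORDER BY price DESC, then keep rows with rownum <= n.
        ranked = sorted(per_window[start], key=lambda r: r[0], reverse=True)
        for rownum, (price, order_id) in enumerate(ranked[:n], start=1):
            result.append((start, order_id, price, rownum))
    return result


def at(minute):
    return datetime(2023, 11, 5, 19, minute)


# Hypothetical orders: (event time, price, order id)
orders = [
    (at(31), 10.0, "o1"),
    (at(35), 99.5, "o2"),
    (at(36), 50.0, "o3"),
    (at(38), 70.0, "o4"),
    (at(42), 20.0, "o5"),
]

top2 = window_top_n(orders, timedelta(minutes=10), 2)
for row in top2:
    print(row)
```

Restarting the row number in every window is what PARTITION BY window_start, window_end achieves in the SQL pattern.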
Window Top-N follows after Window Aggregation¶ The following example shows how to calculate the top 3 customers with the highest total order value for every 10-minute tumbling window. DESCRIBE `examples`.`marketplace`.`orders`; +--------------+-----------+----------+---------------+ | Column Name | Data Type | Nullable | Extras | +--------------+-----------+----------+---------------+ | order_id | STRING | NOT NULL | | | customer_id | INT | NOT NULL | | | product_id | STRING | NOT NULL | | | price | DOUBLE | NOT NULL | | +--------------+-----------+----------+---------------+ SELECT * FROM `examples`.`marketplace`.`orders`; order_id customer_id product_id price d770a538-a70c-4de6-9d06-e6c16c5bef5a 3075 1379 32.21 787ee1f4-d0d0-4c39-bdb9-44dc2d203d55 3028 1335 34.74 7ab7ce23-5f61-4398-afad-b1e3f548fee3 3148 1045 69.26 6fea712c-9454-497e-8038-ebaf6dfc7a17 3247 1390 67.26 dc9daf5e-98d5-4bcd-8839-251fed13b75e 3167 1309 12.04 ab3151d0-2950-49cd-9783-016ccc6a3281 3105 1094 21.52 d27ca945-3cff-48a4-afcc-7b17446aa95d 3168 1250 99.95 SELECT * FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY window_start, window_end ORDER BY price DESC) as rownum FROM ( SELECT window_start, window_end, customer_id, SUM(price) as price, COUNT(*) as cnt FROM TABLE( TUMBLE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES)) GROUP BY window_start, window_end, customer_id ) ) WHERE rownum <= 3; window_start window_end customer_id price cnt rownum 2023-11-02 17:50 2023-11-02 18:00 3084 1523.75 18 1 2023-11-02 17:50 2023-11-02 18:00 3092 1487.32 15 2 2023-11-02 17:50 2023-11-02 18:00 3082 1452.18 17 3 2023-11-02 18:00 2023-11-02 18:10 3095 1698.50 20 1 2023-11-02 18:00 2023-11-02 18:10 3088 1645.23 19 2 2023-11-02 18:00 2023-11-02 18:10 3079 1589.75 16 3 Window Top-N follows after Windowing TVF¶ The following example shows how to calculate the top 3 orders with the highest price for every 10-minute tumbling window.
SELECT * FROM ( SELECT $rowtime, price, product_id, customer_id, window_start, window_end, ROW_NUMBER() OVER (PARTITION BY window_start, window_end ORDER BY price DESC) as rownum FROM TABLE( TUMBLE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES)) ) WHERE rownum <= 3; $rowtime price product_id customer_id window_start window_end rownum 2023-11-05 19:35:38 99.53 1382 3120 2023-11-05 19:30 2023-11-05 19:40 1 2023-11-05 19:35:39 99.04 1216 3204 2023-11-05 19:30 2023-11-05 19:40 2 2023-11-05 19:35:32 98.95 1364 3114 2023-11-05 19:30 2023-11-05 19:40 3 2023-11-05 19:42:41 97.75 1295 3187 2023-11-05 19:40 2023-11-05 19:50 1 2023-11-05 19:41:53 97.30 1428 3256 2023-11-05 19:40 2023-11-05 19:50 2 2023-11-05 19:43:17 96.80 1173 3092 2023-11-05 19:40 2023-11-05 19:50 3 Related content¶ Top-N Queries Windowing Table-Valued Functions (Windowing TVFs) Window Aggregation Queries Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql SELECT [column_list] FROM ( SELECT [column_list], ROW_NUMBER() OVER (PARTITION BY window_start, window_end [, col_key1...] 
ORDER BY col1 [asc|desc][, col2 [asc|desc]...]) AS rownum FROM table_name) -- relation applied windowing TVF WHERE rownum <= N [AND conditions] ``` ```sql PARTITION BY ``` ```sql window_start ``` ```sql TIMESTAMP(3) ``` ```sql 2020-04-15 08:05:00.000 ``` ```sql 2020-04-15 08:05 ``` ```sql DESCRIBE `examples`.`marketplace`.`orders`; ``` ```sql +--------------+-----------+----------+---------------+ | Column Name | Data Type | Nullable | Extras | +--------------+-----------+----------+---------------+ | order_id | STRING | NOT NULL | | | customer_id | INT | NOT NULL | | | product_id | STRING | NOT NULL | | | price | DOUBLE | NOT NULL | | +--------------+-----------+----------+---------------+ ``` ```sql SELECT * FROM `examples`.`marketplace`.`orders`; ``` ```sql order_id customer_id product_id price d770a538-a70c-4de6-9d06-e6c16c5bef5a 3075 1379 32.21 787ee1f4-d0d0-4c39-bdb9-44dc2d203d55 3028 1335 34.74 7ab7ce23-5f61-4398-afad-b1e3f548fee3 3148 1045 69.26 6fea712c-9454-497e-8038-ebaf6dfc7a17 3247 1390 67.26 dc9daf5e-98d5-4bcd-8839-251fed13b75e 3167 1309 12.04 ab3151d0-2950-49cd-9783-016ccc6a3281 3105 1094 21.52 d27ca945-3cff-48a4-afcc-7b17446aa95d 3168 1250 99.95 ``` ```sql SELECT * FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY window_start, window_end ORDER BY price DESC) as rownum FROM ( SELECT window_start, window_end, customer_id, SUM(price) as price, COUNT(*) as cnt FROM TABLE( TUMBLE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES)) GROUP BY window_start, window_end, customer_id ) ) WHERE rownum <= 3; ``` ```sql window_start window_end customer_id price cnt rownum 2023-11-02 17:50 2023-11-02 18:00 3084 1523.75 18 1 2023-11-02 17:50 2023-11-02 18:00 3092 1487.32 15 2 2023-11-02 17:50 2023-11-02 18:00 3082 1452.18 17 3 2023-11-02 18:00 2023-11-02 18:10 3095 1698.50 20 1 2023-11-02 18:00 2023-11-02 18:10 3088 1645.23 19 2 2023-11-02 18:00 2023-11-02 18:10 3079 1589.75 16 3 ``` ```sql SELECT * FROM ( SELECT $rowtime, price, 
product_id, customer_id, window_start, window_end, ROW_NUMBER() OVER (PARTITION BY window_start, window_end ORDER BY price DESC) as rownum FROM TABLE( TUMBLE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES)) ) WHERE rownum <= 3; ``` ```sql $rowtime price product_id customer_id window_start window_end rownum 2023-11-05 19:35:38 99.53 1382 3120 2023-11-05 19:30 2023-11-05 19:40 1 2023-11-05 19:35:39 99.04 1216 3204 2023-11-05 19:30 2023-11-05 19:40 2 2023-11-05 19:35:32 98.95 1364 3114 2023-11-05 19:30 2023-11-05 19:40 3 2023-11-05 19:42:41 97.75 1295 3187 2023-11-05 19:40 2023-11-05 19:50 1 2023-11-05 19:41:53 97.30 1428 3256 2023-11-05 19:40 2023-11-05 19:50 2 2023-11-05 19:43:17 96.80 1173 3092 2023-11-05 19:40 2023-11-05 19:50 3 ``` --- ### SQL Windowing Table-Valued Functions in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/queries/window-tvf.html Windowing Table-Valued Functions (Windowing TVFs) in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® provides several window table-valued functions (TVFs) for dividing the elements of a table into windows. Description¶ Windows are central to processing infinite streams. Windows split the stream into “buckets” of finite size, over which you can apply computations. This document focuses on how windowing is performed in Confluent Cloud for Apache Flink and how you can benefit from windowed functions. Flink provides several window table-valued functions (TVF) to divide the elements of your table into windows, including: Tumble Windows Hop Windows Cumulate Windows Session Windows (not supported in batch mode) Note that each element can logically belong to more than one window, depending on the windowing table-valued function you use. For example, HOP windowing creates overlapping windows in which a single element can be assigned to multiple windows. 
Windowing TVFs are Flink-defined Polymorphic Table Functions (PTFs). PTFs are part of the SQL:2016 standard. A PTF is a special table function that can take a table as a parameter, which makes it a powerful feature for changing the shape of a table. Because PTFs are used semantically like tables, they are invoked in the FROM clause of a SELECT statement. These are frequently used computations based on windowing TVFs: Window Aggregation, Window Top-N, Window Join, and Window Deduplication. Window functions¶ Flink provides 4 built-in windowing TVFs: TUMBLE, HOP, CUMULATE, and SESSION. The return value of a windowing TVF is a new relation that includes all columns of the original relation, as well as 3 additional columns named “window_start”, “window_end”, and “window_time” to indicate the assigned window. In streaming mode, the “window_time” field is a time attribute of the window. In batch mode, the “window_time” field is an attribute of type TIMESTAMP or TIMESTAMP_LTZ, based on the input time field type. The “window_time” field can be used in subsequent time-based operations, for example, another windowing TVF, an interval join, or an over aggregation. The value of window_time always equals window_end - 1ms. Window alignment¶ Time-based window boundaries align with clock seconds, minutes, hours, and days. For example, assume that you have events with these timestamps (in UTC): 00:59:00.000 00:59:30.000 01:00:15.000 If you put these events into hour-long tumbling windows, the first two land in the window for 00:00:00-00:59:59.999, and the third event lands in the following hour. Supported time units¶ Window TVFs support the following time units: SECOND, MINUTE, HOUR, and DAY. MONTH and YEAR time units are not currently supported. Examples¶ The following examples show Window TVFs over example data streams that you can experiment with. Note To show the behavior of windowing more clearly in the following examples, TIMESTAMP(3) values may be simplified so that trailing zeroes aren’t shown.
For example, 2020-04-15 08:05:00.000 may be shown as 2020-04-15 08:05. Columns may be hidden intentionally to enhance the readability of the content. TUMBLE¶ The TUMBLE function assigns each element to a window of a specified size. Tumbling windows have a fixed size and do not overlap. For example, suppose you specify a tumbling window with a size of 5 minutes. In that case, Flink evaluates the current window and starts a new window every five minutes, as illustrated by the following figure. The TUMBLE function assigns a window to each row of a relation based on a time attribute field. In streaming mode, the time attribute field must be an event time attribute. In batch mode, the time attribute field of the window table function must be an attribute of type TIMESTAMP or TIMESTAMP_LTZ. The return value of TUMBLE is a new relation that includes all columns of the original relation, as well as 3 additional columns named window_start, window_end, and window_time to indicate the assigned window. The original time attribute, timecol, is a regular timestamp column after the windowing TVF. The TUMBLE function takes three required parameters and one optional parameter: TUMBLE(TABLE data, DESCRIPTOR(timecol), size [, offset ]) data: is a table parameter that can be any relation with a time attribute column. timecol: is a column descriptor indicating which time attribute column of data should be mapped to tumbling windows. size: is a duration specifying the width of the tumbling windows. offset: is an optional parameter that specifies the offset by which the window start is shifted.
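To make the assignment rule concrete, the following minimal Python sketch (an emulation, not Flink SQL) computes the three window columns for a single row timestamp. It assumes that windows align to the Unix epoch in UTC, which matches the clock alignment described earlier; the function name `tumble` is hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: windows align to the Unix epoch, and naive datetimes are read as UTC.
EPOCH = datetime(1970, 1, 1)


def tumble(ts, size):
    """Return (window_start, window_end, window_time) for one row timestamp."""
    # Strip the part of the timestamp that extends past the last window boundary.
    window_start = ts - ((ts - EPOCH) % size)
    window_end = window_start + size
    # window_time is always window_end - 1 ms.
    window_time = window_end - timedelta(milliseconds=1)
    return window_start, window_end, window_time


row_time = datetime(2023, 11, 2, 13, 20, 27)
start, end, wtime = tumble(row_time, timedelta(minutes=10))
print(start, end, wtime)
```

For a row time of 2023-11-02 13:20:27 and a 10-minute size, the sketch yields a window that starts at 13:20:00 and ends at 13:30:00, with a window_time of 13:29:59.999.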
Here is an example invocation on the orders table: DESCRIBE `examples`.`marketplace`.`orders`; The output resembles: +--------------+-----------+----------+---------------+ | Column Name | Data Type | Nullable | Extras | +--------------+-----------+----------+---------------+ | order_id | STRING | NOT NULL | | | customer_id | INT | NOT NULL | | | product_id | STRING | NOT NULL | | | price | DOUBLE | NOT NULL | | +--------------+-----------+----------+---------------+ The following query returns all rows in the orders table. SELECT * FROM `examples`.`marketplace`.`orders`; The output resembles: order_id customer_id product_id price d770a538-a70c-4de6-9d06-e6c16c5bef5a 3075 1379 32.21 787ee1f4-d0d0-4c39-bdb9-44dc2d203d55 3028 1335 34.74 7ab7ce23-5f61-4398-afad-b1e3f548fee3 3148 1045 69.26 6fea712c-9454-497e-8038-ebaf6dfc7a17 3247 1390 67.26 dc9daf5e-98d5-4bcd-8839-251fed13b75e 3167 1309 12.04 ab3151d0-2950-49cd-9783-016ccc6a3281 3105 1094 21.52 d27ca945-3cff-48a4-afcc-7b17446aa95d 3168 1250 99.95 The following queries return all rows in the orders table in 10-minute tumbling windows. 
SELECT * FROM TABLE( TUMBLE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES)) -- or with the named params -- note: the DATA param must be the first SELECT * FROM TABLE( TUMBLE( DATA => TABLE `examples`.`marketplace`.`orders`, TIMECOL => DESCRIPTOR($rowtime), SIZE => INTERVAL '10' MINUTES)); The output resembles: order_id customer_id product_id price $rowtime window_start window_end window_time e69058b5-7ed9-44fa-86ff-4d6f8baff028 3145 1488 63.94 2023-11-02 13:20:27 2023-11-02 13:20:00 2023-11-02 13:30:00 2023-11-02 13:29:59.999 92e81cc4-93c4-488b-9386-ae9300d7cd21 3223 1328 29.37 2023-11-02 13:20:27 2023-11-02 13:20:00 2023-11-02 13:30:00 2023-11-02 13:29:59.999 7ca2ddaa-dd5e-41dc-ac47-c9aa7477d913 3223 1402 49.78 2023-11-02 13:20:27 2023-11-02 13:20:00 2023-11-02 13:30:00 2023-11-02 13:29:59.999 84efa0d0-7157-4cd3-a893-e7d2780cefdd 3076 1321 47.38 2023-11-02 13:20:27 2023-11-02 13:20:00 2023-11-02 13:30:00 2023-11-02 13:29:59.999 d72a37d2-ef15-4740-8ae8-1199ddf84ea9 3211 1234 56.27 2023-11-02 13:20:27 2023-11-02 13:20:00 2023-11-02 13:30:00 2023-11-02 13:29:59.999 4d57c754-63e1-413a-8af8-768d54d128ee 3126 1223 21.52 2023-11-02 13:20:27 2023-11-02 13:20:00 2023-11-02 13:30:00 2023-11-02 13:29:59.999 80f9fe0b-3e5d-4c25-aa6e-0b3dacfa36de 3087 1393 70.26 2023-11-02 13:20:27 2023-11-02 13:20:00 2023-11-02 13:30:00 2023-11-02 13:29:59.999 ea733533-1516-41b6-b5e3-cadcb6f71529 3079 1488 17.55 2023-11-02 13:20:27 2023-11-02 13:20:00 2023-11-02 13:30:00 2023-11-02 13:29:59.999 cef1cd9f-379e-4791-8a0d-69eec8adae35 3211 1293 91.20 2023-11-02 13:20:27 2023-11-02 13:20:00 2023-11-02 13:30:00 2023-11-02 13:29:59.999 The following query computes the sum of the price column in the orders table within 10-minute tumbling windows. 
-- apply aggregation on the tumbling windowed table SELECT window_start, window_end, SUM(price) as `sum` FROM TABLE( TUMBLE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES)) GROUP BY window_start, window_end; The output resembles: window_start window_end sum 2023-11-02 10:40:00 2023-11-02 10:50:00 258484.93 2023-11-02 10:50:00 2023-11-02 11:00:00 287632.15 2023-11-02 11:00:00 2023-11-02 11:10:00 271945.78 2023-11-02 11:10:00 2023-11-02 11:20:00 315207.46 2023-11-02 11:20:00 2023-11-02 11:30:00 342618.92 2023-11-02 11:30:00 2023-11-02 11:40:00 329754.31 HOP¶ The HOP function assigns elements to windows of fixed length. As with the TUMBLE windowing function, the size of the windows is configured by the window size parameter. An additional window slide parameter controls how frequently a hopping window is started. Hence, hopping windows overlap if the slide is smaller than the window size. In this case, elements are assigned to multiple windows. Hopping windows are also known as “sliding windows”. For example, you could have windows of size 10 minutes that slide by 5 minutes. With this configuration, every 5 minutes you get a window that contains the events that arrived during the last 10 minutes, as depicted by the following figure. The HOP function assigns windows that cover rows within the interval of size, shifting by slide each time, based on a time attribute field. In streaming mode, the time attribute field must be an event time attribute. In batch mode, the time attribute field of the window table function must be an attribute of type TIMESTAMP or TIMESTAMP_LTZ. The return value of HOP is a new relation that includes all columns of the original relation, as well as 3 additional columns named window_start, window_end, and window_time to indicate the assigned window. The original time attribute, timecol, is a regular timestamp column after the windowing TVF.
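The overlap behavior can be illustrated with a minimal Python sketch (an emulation, not Flink SQL) that lists every hopping window containing a given row timestamp. It assumes epoch-aligned windows in UTC; the function name `hop_windows` is hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: windows align to the Unix epoch, and naive datetimes are read as UTC.
EPOCH = datetime(1970, 1, 1)


def hop_windows(ts, slide, size):
    """Return every (window_start, window_end) hopping window that contains ts,
    latest window first."""
    # The most recent window start at or before ts, aligned to the slide.
    start = ts - ((ts - EPOCH) % slide)
    windows = []
    # Walk backwards one slide at a time while the window still covers ts.
    while start > ts - size:
        windows.append((start, start + size))
        start -= slide
    return windows


for window_start, window_end in hop_windows(
    datetime(2023, 11, 2, 19, 24, 46),
    slide=timedelta(minutes=5),
    size=timedelta(minutes=10),
):
    print(window_start, window_end)
```

With a 5-minute slide and a 10-minute size, a row at 19:24:46 falls into two windows, one starting at 19:20 and one starting at 19:15, so the row appears twice in the windowed relation. When the slide equals the size, the sketch returns a single window, which is the tumbling case.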
The HOP function takes four required parameters and one optional parameter: HOP(TABLE data, DESCRIPTOR(timecol), slide, size [, offset ]) data: is a table parameter that can be any relation with a time attribute column. timecol: is a column descriptor indicating which time attribute column of data should be mapped to hopping windows. slide: is a duration specifying the time between the starts of sequential hopping windows. size: is a duration specifying the width of the hopping windows. offset: is an optional parameter that specifies the offset by which the window start is shifted. The following queries return all rows in the orders table in hopping windows with a 5-minute slide and 10-minute size. SELECT * FROM TABLE( HOP(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES, INTERVAL '10' MINUTES)); -- or with the named params -- note: the DATA param must be the first SELECT * FROM TABLE( HOP( DATA => TABLE `examples`.`marketplace`.`orders`, TIMECOL => DESCRIPTOR($rowtime), SLIDE => INTERVAL '5' MINUTES, SIZE => INTERVAL '10' MINUTES)); The output resembles: order_id customer_id product_id price $rowtime window_start window_end window_time 10ae1386-496e-4c6c-9436-7f7e2e7a59f9 3160 1015 26.20 2023-11-02 19:24:46 2023-11-02 19:20:00 2023-11-02 19:30:00 2023-11-02 19:29:59.999 10ae1386-496e-4c6c-9436-7f7e2e7a59f9 3160 1015 26.20 2023-11-02 19:24:46 2023-11-02 19:15:00 2023-11-02 19:25:00 2023-11-02 19:24:59.999 66ecb3b3-7a3d-43ac-b3a2-4c35e06a8d7c 3046 1081 20.24 2023-11-02 19:24:46 2023-11-02 19:20:00 2023-11-02 19:30:00 2023-11-02 19:29:59.999 66ecb3b3-7a3d-43ac-b3a2-4c35e06a8d7c 3046 1081 20.24 2023-11-02 19:24:46 2023-11-02 19:15:00 2023-11-02 19:25:00 2023-11-02 19:24:59.999 4d86db03-a573-4fc2-9699-85455331a7c4 3023 1346 85.45 2023-11-02 19:24:46 2023-11-02 19:20:00 2023-11-02 19:30:00 2023-11-02 19:29:59.999 4d86db03-a573-4fc2-9699-85455331a7c4 3023 1346 85.45 2023-11-02 19:24:46 2023-11-02 19:15:00 2023-11-02 19:25:00 2023-11-02
19:24:59.999 d1460cf7-9472-45e0-9c2d-40537c9f34c0 3114 1333 49.56 2023-11-02 19:24:47 2023-11-02 19:20:00 2023-11-02 19:30:00 2023-11-02 19:29:59.999 d1460cf7-9472-45e0-9c2d-40537c9f34c0 3114 1333 49.56 2023-11-02 19:24:47 2023-11-02 19:15:00 2023-11-02 19:25:00 2023-11-02 19:24:59.999 e38984d8-5683-4e55-9f7a-e43350de7c3d 3024 1402 90.75 2023-11-02 19:24:47 2023-11-02 19:20:00 2023-11-02 19:30:00 2023-11-02 19:29:59.999 e38984d8-5683-4e55-9f7a-e43350de7c3d 3024 1402 90.75 2023-11-02 19:24:47 2023-11-02 19:15:00 2023-11-02 19:25:00 2023-11-02 19:24:59.999 The following query computes the sum of the price column in the orders table within hopping windows that have a 5-minute slide and 10-minute size. -- apply aggregation on the hopping windowed table SELECT window_start, window_end, SUM(price) as `sum` FROM TABLE( HOP(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES, INTERVAL '10' MINUTES)) GROUP BY window_start, window_end; The output resembles: window_start window_end sum 2023-11-02 11:10:00 2023-11-02 11:20:00 296049.38 2023-11-02 11:15:00 2023-11-02 11:25:00 1122455.07 2023-11-02 11:20:00 2023-11-02 11:30:00 1648270.20 2023-11-02 11:25:00 2023-11-02 11:35:00 2143271.00 2023-11-02 11:30:00 2023-11-02 11:40:00 2701592.45 2023-11-02 11:35:00 2023-11-02 11:45:00 3214376.78 CUMULATE¶ Cumulating windows are useful in some scenarios, such as tumbling windows with early firing in a fixed window interval. For example, a daily dashboard might display cumulative unique views (UVs) from 00:00 to every minute, and the UV at 10:00 might represent the total number of UVs from 00:00 to 10:00. This can be implemented easily and efficiently by CUMULATE windowing. The CUMULATE function assigns elements to windows that cover rows within an initial interval of a specified step size, and it expands by one more step size, keeping the window start fixed, for every step, until the maximum window size is reached. 
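As a rough illustration of this expansion, the following Python sketch (an emulation, not Flink SQL) lists every cumulating window that contains a given row timestamp. It assumes epoch-aligned windows in UTC and requires the maximum size to be a whole multiple of the step; the function name `cumulate_windows` is hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: windows align to the Unix epoch, and naive datetimes are read as UTC.
EPOCH = datetime(1970, 1, 1)


def cumulate_windows(ts, step, size):
    """Return every (window_start, window_end) cumulating window that contains ts."""
    if size % step != timedelta(0):
        raise ValueError("size must be an integral multiple of step")
    # All windows in one cycle share the start of the enclosing max-size window.
    window_start = ts - ((ts - EPOCH) % size)
    windows = []
    window_end = window_start + step
    while window_end <= window_start + size:
        # Keep only the windows whose end lies after the row timestamp.
        if window_end > ts:
            windows.append((window_start, window_end))
        window_end += step
    return windows


for window_start, window_end in cumulate_windows(
    datetime(2023, 11, 2, 19, 27, 39),
    step=timedelta(minutes=2),
    size=timedelta(minutes=10),
):
    print(window_start, window_end)
```

With a 2-minute step and a 10-minute maximum size, a row at 19:27:39 belongs to the windows that start at 19:20 and end at 19:28 and 19:30. Earlier windows in the same cycle have already ended before the row's timestamp.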
CUMULATE windows all share the same window start, and each successive window is one step larger than the previous one until the maximum size is reached, so the window size keeps changing and the windows overlap. When the maximum size is reached, the window start advances to the end of the last window, and the size resets to the step size. In comparison, TUMBLE windows all have the same size and do not overlap. For example, you could have a cumulating window with a 1-hour step and 1-day maximum size, and you will get these windows for every day: [00:00, 01:00) [00:00, 02:00) [00:00, 03:00) … [00:00, 24:00) The CUMULATE function assigns windows based on a time attribute column. In streaming mode, the time attribute field must be an event time attribute. In batch mode, the time attribute field of the window table function must be an attribute of type TIMESTAMP or TIMESTAMP_LTZ. The return value of CUMULATE is a new relation that includes all columns of the original relation, as well as 3 additional columns named window_start, window_end, and window_time to indicate the assigned window. The original time attribute, timecol, is a regular timestamp column after the windowing TVF. The CUMULATE function takes four required parameters and one optional parameter: CUMULATE(TABLE data, DESCRIPTOR(timecol), step, size [, offset ]) data: is a table parameter that can be any relation with a time attribute column. timecol: is a column descriptor indicating which time attribute column of data should be mapped to cumulating windows. step: is a duration specifying the increase in window size between the ends of sequential cumulating windows. size: is a duration specifying the maximum width of the cumulating windows. size must be an integral multiple of step. offset: is an optional parameter that specifies the offset by which the window start is shifted. The following queries return all rows in the orders table in CUMULATE windows that have a 2-minute step and 10-minute size.
SELECT * FROM TABLE( CUMULATE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '2' MINUTES, INTERVAL '10' MINUTES)); -- or with the named params -- note: the DATA param must be the first SELECT * FROM TABLE( CUMULATE( DATA => TABLE `examples`.`marketplace`.`orders`, TIMECOL => DESCRIPTOR($rowtime), STEP => INTERVAL '2' MINUTES, SIZE => INTERVAL '10' MINUTES)); The output resembles: order_id customer_id product_id price $rowtime window_start window_end window_time 2572a2e0-2ba2-4947-8926-e70e31b68df3 3239 1015 13.59 2023-11-02 19:27:39 2023-11-02 19:20:00 2023-11-02 19:28:00 2023-11-02 19:27:59.999 2572a2e0-2ba2-4947-8926-e70e31b68df3 3239 1015 13.59 2023-11-02 19:27:39 2023-11-02 19:20:00 2023-11-02 19:30:00 2023-11-02 19:29:59.999 7f791e40-a524-4a9b-bb0d-35a2c1b5a7c4 3102 1374 93.59 2023-11-02 19:27:39 2023-11-02 19:20:00 2023-11-02 19:28:00 2023-11-02 19:27:59.999 7f791e40-a524-4a9b-bb0d-35a2c1b5a7c4 3102 1374 93.59 2023-11-02 19:27:39 2023-11-02 19:20:00 2023-11-02 19:30:00 2023-11-02 19:29:59.999 47e70310-8fa4-4568-b521-7e2b68b06634 3026 1142 58.26 2023-11-02 19:27:39 2023-11-02 19:20:00 2023-11-02 19:28:00 2023-11-02 19:27:59.999 47e70310-8fa4-4568-b521-7e2b68b06634 3026 1142 58.26 2023-11-02 19:27:39 2023-11-02 19:20:00 2023-11-02 19:30:00 2023-11-02 19:29:59.999 fe1b440e-dc75-4092-be11-8e1c3afe55c7 3106 1057 11.37 2023-11-02 19:27:39 2023-11-02 19:20:00 2023-11-02 19:28:00 2023-11-02 19:27:59.999 fe1b440e-dc75-4092-be11-8e1c3afe55c7 3106 1057 11.37 2023-11-02 19:27:39 2023-11-02 19:20:00 2023-11-02 19:30:00 2023-11-02 19:29:59.999 6668e4dc-d574-44db-8f0f-2b8e1b1f3c2e 3061 1049 26.20 2023-11-02 19:27:39 2023-11-02 19:20:00 2023-11-02 19:28:00 2023-11-02 19:27:59.999 6668e4dc-d574-44db-8f0f-2b8e1b1f3c2e 3061 1049 26.20 2023-11-02 19:27:39 2023-11-02 19:20:00 2023-11-02 19:30:00 2023-11-02 19:29:59.999 The following query computes the sum of the price column in the orders table within CUMULATE windows that have a 2-minute step and 
10-minute size. -- apply aggregation on the cumulating windowed table SELECT window_start, window_end, SUM(price) as `sum` FROM TABLE( CUMULATE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '2' MINUTES, INTERVAL '10' MINUTES)) GROUP BY window_start, window_end; The output resembles: window_start window_end sum 2023-11-02 12:40:00.000 2023-11-02 12:46:00.000 327376.23 2023-11-02 12:40:00.000 2023-11-02 12:48:00.000 661272.70 2023-11-02 12:40:00.000 2023-11-02 12:50:00.000 989294.13 2023-11-02 12:50:00.000 2023-11-02 12:52:00.000 1316596.58 2023-11-02 12:50:00.000 2023-11-02 12:54:00.000 1648097.20 2023-11-02 12:50:00.000 2023-11-02 12:56:00.000 1977881.53 2023-11-02 12:50:00.000 2023-11-02 12:58:00.000 2304080.32 2023-11-02 12:50:00.000 2023-11-02 13:00:00.000 2636795.56 SESSION¶ The SESSION function groups elements by sessions of activity. Unlike TUMBLE and HOP windows, session windows do not overlap and do not have a fixed start and end time. Instead, a session window closes when it doesn’t receive elements for a certain period of time, that is, when a gap of inactivity occurs. A session window is configured with a static session gap that defines the duration of inactivity. When this period expires, the current session closes and subsequent elements are assigned to a new session window. For example, you could have windows with a gap of 1 minute. With this configuration, when the interval between two events is less than 1 minute, these events are grouped into the same session window. If there is no data for 1 minute following the latest event, then this session window closes and is sent downstream. Subsequent events are assigned to a new session window. The SESSION function assigns windows that cover rows based on a time attribute. In streaming mode, the time attribute field must be an event time attribute. SESSION Window TVF is not supported in batch mode. 
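The gap-based grouping can be sketched in Python (an emulation, not Flink SQL). The sketch partitions events by key, sorts them by timestamp, and starts a new session whenever the distance to the previous event is not less than the gap. The function name `session_windows` and the sample events are hypothetical, and the treatment of an interval exactly equal to the gap is an assumption of this sketch rather than a documented guarantee.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def session_windows(events, gap):
    """Group (key, timestamp) events into per-key session windows.

    Returns {key: [(window_start, window_end, [timestamps]), ...]} where
    window_end is the last event time in the session plus the gap.
    """
    by_key = defaultdict(list)
    for key, ts in events:
        by_key[key].append(ts)

    sessions = {}
    for key, stamps in by_key.items():
        stamps.sort()
        windows = []
        current = [stamps[0]]
        for ts in stamps[1:]:
            if ts - current[-1] < gap:
                # Still within the gap: the event extends the current session.
                current.append(ts)
            else:
                # Inactivity of at least `gap`: close the session, open a new one.
                windows.append((current[0], current[-1] + gap, current))
                current = [ts]
        windows.append((current[0], current[-1] + gap, current))
        sessions[key] = windows
    return sessions


# Made-up events for one partition key.
events = [
    ("p1", datetime(2023, 11, 2, 12, 0, 0)),
    ("p1", datetime(2023, 11, 2, 12, 0, 30)),
    ("p1", datetime(2023, 11, 2, 12, 1, 10)),
    ("p1", datetime(2023, 11, 2, 12, 5, 0)),
]
for start, end, members in session_windows(events, timedelta(minutes=1))["p1"]:
    print(start, end, len(members))
```

In this sample, the first three events are each less than one minute apart, so they share one session; the fourth event arrives after a longer pause and opens a second session.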
The return value of SESSION is a new relation that includes all columns of the original relation, as well as three additional columns named window_start, window_end, and window_time to indicate the assigned window. The original time attribute timecol becomes a regular timestamp column after the windowing TVF. The SESSION function takes three required parameters and one optional parameter: SESSION(TABLE data [PARTITION BY(keycols, ...)], DESCRIPTOR(timecol), gap) data: is a table parameter that can be any relation with a time attribute column. keycols: is a column or set of columns indicating which columns should be used to partition the data prior to session windows. timecol: is a column descriptor indicating which time attribute column of data should be mapped to session windows. gap: is the maximum interval in timestamp for two events to be considered part of the same session window. The following query returns all columns from the orders table within SESSION windows that have a 1-minute gap, partitioned by product_id: SELECT * FROM TABLE( SESSION(TABLE `examples`.`marketplace`.`orders` PARTITION BY product_id, DESCRIPTOR($rowtime), INTERVAL '1' MINUTES)); -- or with the named params -- note: the DATA param must be the first SELECT * FROM TABLE( SESSION( DATA => TABLE `examples`.`marketplace`.`orders` PARTITION BY product_id, TIMECOL => DESCRIPTOR($rowtime), GAP => INTERVAL '1' MINUTES)); The output resembles: order_id customer_id product_id price $rowtime window_start window_end window_time d7ef1f9a-4f5f-406e-bbad-25db521c38bf 3068 1234 17.08 2023-11-02T19:43:58.626Z 2023-11-02 21:43:58.626 2023-11-02 21:44:58.626 2023-11-02T19:44:58.625Z 804f0c86-a59a-4425-a293-b28bafaa9674 3071 1332 48.12 2023-11-02T19:44:00.506Z 2023-11-02 21:44:00.506 2023-11-02 21:45:00.506 2023-11-02T19:45:00.505Z 61ea63e3-f040-4501-b78e-8db1fdcf45fc 3179 1267 12.35 2023-11-02T19:43:58.405Z 2023-11-02 21:43:58.405 2023-11-02 21:45:07.925 2023-11-02T19:45:07.924Z 
b70ba5bc-428c-41d7-b8fc-8014dd3fd429 3234 1267 40.81 2023-11-02T19:44:00.365Z 2023-11-02 21:43:58.405 2023-11-02 21:45:07.925 2023-11-02T19:45:07.924Z 37688f8c-65ee-4e27-a567-4890e6c7663b 3179 1267 98.17 2023-11-02T19:44:07.925Z 2023-11-02 21:43:58.405 2023-11-02 21:45:07.925 2023-11-02T19:45:07.924Z 4cfa0cc6-881a-43b3-bb34-1746c3b93094 3077 1047 16.78 2023-11-02T19:44:01.985Z 2023-11-02 21:44:01.985 2023-11-02 21:45:23.285 2023-11-02T19:45:23.284Z e007ce6e-5a76-4390-8fb3-50f46025b965 3095 1047 77.48 2023-11-02T19:44:11.365Z 2023-11-02 21:44:01.985 2023-11-02 21:45:23.285 2023-11-02T19:45:23.284Z 487a0248-a534-489e-bbc5-733e87d19cc7 3200 1047 47.86 2023-11-02T19:44:23.285Z 2023-11-02 21:44:01.985 2023-11-02 21:45:23.285 2023-11-02T19:45:23.284Z 4dd1ab51-8ca4-4de6-9f79-bb2ad7ab2498 3043 1235 36.5 2023-11-02T19:43:57.785Z 2023-11-02 21:43:57.785 2023-11-02 21:45:24.625 2023-11-02T19:45:24.624Z bb524ec6-1b21-40f1-8c54-3aac7b454c5b 3232 1235 36.98 2023-11-02T19:44:07.265Z 2023-11-02 21:43:57.785 2023-11-02 21:45:24.625 2023-11-02T19:45:24.624Z 9c218c8a-1566-4982-9640-a0deb9ac203c 3065 1235 30.17 2023-11-02T19:44:16.966Z 2023-11-02 21:43:57.785 2023-11-02 21:45:24.625 2023-11-02T19:45:24.624Z 6623c41b-04fa-4df0-a312-45b6dfcdc639 3143 1235 12.2 2023-11-02T19:44:24.625Z 2023-11-02 21:43:57.785 2023-11-02 21:45:24.625 2023-11-02T19:45:24.624Z The following query computes the sum of the price column in the orders table within SESSION windows that have a 1-minute gap, partitioned by customer_id.
SELECT window_start, window_end, customer_id, SUM(price) as `sum` FROM TABLE( SESSION(TABLE `examples`.`marketplace`.`orders` PARTITION BY customer_id, DESCRIPTOR($rowtime), INTERVAL '1' MINUTES)) GROUP BY window_start, window_end, customer_id; The output resembles: window_start window_end sum 2023-11-02 12:40:00 2023-11-02 12:46:00 327376.23 2023-11-02 12:40:00 2023-11-02 12:48:00 661272.70 2023-11-02 12:40:00 2023-11-02 12:50:00 989294.13 2023-11-02 12:50:00 2023-11-02 12:52:00 1316596.58 2023-11-02 12:50:00 2023-11-02 12:54:00 1648097.20 2023-11-02 12:50:00 2023-11-02 12:56:00 1977881.53 2023-11-02 12:50:00 2023-11-02 12:58:00 2304080.32 2023-11-02 12:50:00 2023-11-02 13:00:00 2636795.56 Window Offset¶ Offset is an optional parameter that you can use to change the window assignment. It can be a positive or a negative duration. The default value for a window offset is 0. The same record may be assigned to a different window if the offset is set to a different value. For example, to which window is a record with a timestamp of 2021-06-30 00:00:00 assigned, for a tumbling window with a size of 10 MINUTE? If the offset is -16 MINUTE, the record is assigned to window [2021-06-29 23:54:00, 2021-06-30 00:04:00). If the offset is -6 MINUTE, the record is assigned to window [2021-06-29 23:54:00, 2021-06-30 00:04:00). If the offset is -4 MINUTE, the record is assigned to window [2021-06-29 23:56:00, 2021-06-30 00:06:00). If the offset is 0, the record is assigned to window [2021-06-30 00:00:00, 2021-06-30 00:10:00). If the offset is 4 MINUTE, the record is assigned to window [2021-06-29 23:54:00, 2021-06-30 00:04:00). If the offset is 6 MINUTE, the record is assigned to window [2021-06-29 23:56:00, 2021-06-30 00:06:00). If the offset is 16 MINUTE, the record is assigned to window [2021-06-29 23:56:00, 2021-06-30 00:06:00). Note The window offset affects only the window assignment. It has no effect on watermarks.
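The effect of an offset on a tumbling window can be expressed as a small change to the window-start arithmetic. The following Python sketch is an emulation, not Flink SQL. It assumes epoch-aligned windows in UTC, and the function name `tumble_with_offset` is hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: windows align to the Unix epoch, and naive datetimes are read as UTC.
EPOCH = datetime(1970, 1, 1)


def tumble_with_offset(ts, size, offset=timedelta(0)):
    """Return (window_start, window_end) of the tumbling window that contains ts
    when every window boundary is shifted by `offset`."""
    # Python's % on timedelta yields a non-negative remainder for a positive
    # divisor, so negative offsets need no special handling.
    window_start = ts - ((ts - EPOCH - offset) % size)
    return window_start, window_start + size


row_time = datetime(2023, 11, 2, 19, 29, 51)
print(tumble_with_offset(row_time, timedelta(minutes=10)))
print(tumble_with_offset(row_time, timedelta(minutes=10), timedelta(minutes=1)))
```

For a row at 19:29:51 and a 10-minute size, the window without an offset starts at 19:20:00, and an offset of 1 minute moves the window to 19:21:00 through 19:31:00. Because only the remainder of the offset relative to the window size matters, offsets that differ by a whole multiple of the size select the same window.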
Examples¶ The following SQL examples show how to use offset in a tumbling window. SELECT * FROM TABLE( TUMBLE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES, INTERVAL '1' MINUTES)); -- or with the named params -- note: the DATA param must be the first SELECT * FROM TABLE( TUMBLE( DATA => TABLE `examples`.`marketplace`.`orders`, TIMECOL => DESCRIPTOR($rowtime), SIZE => INTERVAL '10' MINUTES, OFFSET => INTERVAL '1' MINUTES)); The output resembles: order_id customer_id product_id price $rowtime window_start window_end window_time 0932497b-a3c2-4f80-9b1f-9d099b091696 3063 1035 75.85 2023-11-02 19:29:51 2023-11-02 19:21:00 2023-11-02 19:31:00 2023-11-02 19:30:59.999 20f4529c-9c86-4a54-8c38-f6c3caa1d7b8 3131 1207 89.00 2023-11-02 19:29:51 2023-11-02 19:21:00 2023-11-02 19:31:00 2023-11-02 19:30:59.999 cbda6c08-e0c7-41cb-ae04-c50f5b1f5e3c 3074 1312 63.71 2023-11-02 19:29:51 2023-11-02 19:21:00 2023-11-02 19:31:00 2023-11-02 19:30:59.999 d049ed28-cbbb-479b-8df6-8c637c1b68f5 3006 1201 72.14 2023-11-02 19:29:51 2023-11-02 19:21:00 2023-11-02 19:31:00 2023-11-02 19:30:59.999 63b6f2ef-c0e9-4737-ab81-f5acb93e4a64 3182 1346 76.18 2023-11-02 19:29:51 2023-11-02 19:21:00 2023-11-02 19:31:00 2023-11-02 19:30:59.999 00c088db-9cb7-4128-a4fd-4e06c0e95f7a 3198 1166 63.49 2023-11-02 19:29:51 2023-11-02 19:21:00 2023-11-02 19:31:00 2023-11-02 19:30:59.999 b9ca292e-635a-4ef7-a6ee-bcf099df7c1b 3236 1462 69.13 2023-11-02 19:29:51 2023-11-02 19:21:00 2023-11-02 19:31:00 2023-11-02 19:30:59.999 3299fd08-264e-4e49-8bb9-82cae18c5d7c 3058 1226 59.53 2023-11-02 19:29:51 2023-11-02 19:21:00 2023-11-02 19:31:00 2023-11-02 19:30:59.999 45878388-7cb3-409d-91a4-8ef1f02c8576 3028 1228 16.63 2023-11-02 19:29:51 2023-11-02 19:21:00 2023-11-02 19:31:00 2023-11-02 19:30:59.999 c2fef024-c0c2-4c0f-9880-bc423d1c2db6 3219 1071 80.66 2023-11-02 19:29:51 2023-11-02 19:21:00 2023-11-02 19:31:00 2023-11-02 19:30:59.999 The following query computes the sum of the price column 
in the orders table within 10-minute tumbling windows that have an offset of 1 minute. -- apply aggregation on the tumbling windowed table SELECT window_start, window_end, SUM(price) as `sum` FROM TABLE( TUMBLE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES, INTERVAL '1' MINUTES)) GROUP BY window_start, window_end; The output resembles: window_start window_end sum 2023-11-02 19:21:00 2023-11-02 19:31:00 7285.64 2023-11-02 19:22:00 2023-11-02 19:32:00 6932.18 2023-11-02 19:23:00 2023-11-02 19:33:00 7104.53 2023-11-02 19:24:00 2023-11-02 19:34:00 7456.92 2023-11-02 19:25:00 2023-11-02 19:35:00 7198.75 2023-11-02 19:26:00 2023-11-02 19:36:00 6875.39 2023-11-02 19:27:00 2023-11-02 19:37:00 7312.87 2023-11-02 19:28:00 2023-11-02 19:38:00 7089.26 2023-11-02 19:29:00 2023-11-02 19:39:00 7401.58 2023-11-02 19:30:00 2023-11-02 19:40:00 7156.43 Related content¶ Course: Window Aggregations Confluent Developer: How to create cumulating windows Top-N Queries Window Top-N Queries Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. 
#### Code Examples ```sql TIMESTAMP_LTZ ``` ```sql window_time ``` ```sql window_end - 1ms ``` ```sql 00:00:00-00:59:59.999 ``` ```sql TIMESTAMP(3) ``` ```sql 2020-04-15 08:05:00.000 ``` ```sql 2020-04-15 08:05 ``` ```sql TIMESTAMP_LTZ ``` ```sql window_start ``` ```sql window_time ``` ```sql TUMBLE(TABLE data, DESCRIPTOR(timecol), size [, offset ]) ``` ```sql DESCRIBE `examples`.`marketplace`.`orders`; ``` ```sql +--------------+-----------+----------+---------------+ | Column Name | Data Type | Nullable | Extras | +--------------+-----------+----------+---------------+ | order_id | STRING | NOT NULL | | | customer_id | INT | NOT NULL | | | product_id | STRING | NOT NULL | | | price | DOUBLE | NOT NULL | | +--------------+-----------+----------+---------------+ ``` ```sql SELECT * FROM `examples`.`marketplace`.`orders`; ``` ```sql order_id customer_id product_id price d770a538-a70c-4de6-9d06-e6c16c5bef5a 3075 1379 32.21 787ee1f4-d0d0-4c39-bdb9-44dc2d203d55 3028 1335 34.74 7ab7ce23-5f61-4398-afad-b1e3f548fee3 3148 1045 69.26 6fea712c-9454-497e-8038-ebaf6dfc7a17 3247 1390 67.26 dc9daf5e-98d5-4bcd-8839-251fed13b75e 3167 1309 12.04 ab3151d0-2950-49cd-9783-016ccc6a3281 3105 1094 21.52 d27ca945-3cff-48a4-afcc-7b17446aa95d 3168 1250 99.95 ``` ```sql SELECT * FROM TABLE( TUMBLE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES)) -- or with the named params -- note: the DATA param must be the first SELECT * FROM TABLE( TUMBLE( DATA => TABLE `examples`.`marketplace`.`orders`, TIMECOL => DESCRIPTOR($rowtime), SIZE => INTERVAL '10' MINUTES)); ``` ```sql order_id customer_id product_id price $rowtime window_start window_end window_time e69058b5-7ed9-44fa-86ff-4d6f8baff028 3145 1488 63.94 2023-11-02 13:20:27 2023-11-02 13:20:00 2023-11-02 13:30:00 2023-11-02 13:29:59.999 92e81cc4-93c4-488b-9386-ae9300d7cd21 3223 1328 29.37 2023-11-02 13:20:27 2023-11-02 13:20:00 2023-11-02 13:30:00 2023-11-02 13:29:59.999 7ca2ddaa-dd5e-41dc-ac47-c9aa7477d913 
3223 1402 49.78 2023-11-02 13:20:27 2023-11-02 13:20:00 2023-11-02 13:30:00 2023-11-02 13:29:59.999 84efa0d0-7157-4cd3-a893-e7d2780cefdd 3076 1321 47.38 2023-11-02 13:20:27 2023-11-02 13:20:00 2023-11-02 13:30:00 2023-11-02 13:29:59.999 d72a37d2-ef15-4740-8ae8-1199ddf84ea9 3211 1234 56.27 2023-11-02 13:20:27 2023-11-02 13:20:00 2023-11-02 13:30:00 2023-11-02 13:29:59.999 4d57c754-63e1-413a-8af8-768d54d128ee 3126 1223 21.52 2023-11-02 13:20:27 2023-11-02 13:20:00 2023-11-02 13:30:00 2023-11-02 13:29:59.999 80f9fe0b-3e5d-4c25-aa6e-0b3dacfa36de 3087 1393 70.26 2023-11-02 13:20:27 2023-11-02 13:20:00 2023-11-02 13:30:00 2023-11-02 13:29:59.999 ea733533-1516-41b6-b5e3-cadcb6f71529 3079 1488 17.55 2023-11-02 13:20:27 2023-11-02 13:20:00 2023-11-02 13:30:00 2023-11-02 13:29:59.999 cef1cd9f-379e-4791-8a0d-69eec8adae35 3211 1293 91.20 2023-11-02 13:20:27 2023-11-02 13:20:00 2023-11-02 13:30:00 2023-11-02 13:29:59.999 ``` ```sql -- apply aggregation on the tumbling windowed table SELECT window_start, window_end, SUM(price) as `sum` FROM TABLE( TUMBLE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES)) GROUP BY window_start, window_end; ``` ```sql window_start window_end sum 2023-11-02 10:40:00 2023-11-02 10:50:00 258484.93 2023-11-02 10:50:00 2023-11-02 11:00:00 287632.15 2023-11-02 11:00:00 2023-11-02 11:10:00 271945.78 2023-11-02 11:10:00 2023-11-02 11:20:00 315207.46 2023-11-02 11:20:00 2023-11-02 11:30:00 342618.92 2023-11-02 11:30:00 2023-11-02 11:40:00 329754.31 ``` ```sql TIMESTAMP_LTZ ``` ```sql window_start ``` ```sql window_time ``` ```sql HOP(TABLE data, DESCRIPTOR(timecol), slide, size [, offset ]) ``` ```sql SELECT * FROM TABLE( HOP(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES, INTERVAL '10' MINUTES)) -- or with the named params -- note: the DATA param must be the first SELECT * FROM TABLE( HOP( DATA => TABLE `examples`.`marketplace`.`orders`, TIMECOL => DESCRIPTOR($rowtime), SLIDE => 
INTERVAL '5' MINUTES, SIZE => INTERVAL '10' MINUTES)); ``` ```sql order_id customer_id product_id price $rowtime window_start window_end window_time 10ae1386-496e-4c6c-9436-7f7e2e7a59f9 3160 1015 26.20 2023-11-02 19:24:46 2023-11-02 19:20:00 2023-11-02 19:30:00 2023-11-02 19:29:59.999 10ae1386-496e-4c6c-9436-7f7e2e7a59f9 3160 1015 26.20 2023-11-02 19:24:46 2023-11-02 19:15:00 2023-11-02 19:25:00 2023-11-02 19:24:59.999 66ecb3b3-7a3d-43ac-b3a2-4c35e06a8d7c 3046 1081 20.24 2023-11-02 19:24:46 2023-11-02 19:20:00 2023-11-02 19:30:00 2023-11-02 19:29:59.999 66ecb3b3-7a3d-43ac-b3a2-4c35e06a8d7c 3046 1081 20.24 2023-11-02 19:24:46 2023-11-02 19:15:00 2023-11-02 19:25:00 2023-11-02 19:24:59.999 4d86db03-a573-4fc2-9699-85455331a7c4 3023 1346 85.45 2023-11-02 19:24:46 2023-11-02 19:20:00 2023-11-02 19:30:00 2023-11-02 19:29:59.999 4d86db03-a573-4fc2-9699-85455331a7c4 3023 1346 85.45 2023-11-02 19:24:46 2023-11-02 19:15:00 2023-11-02 19:25:00 2023-11-02 19:24:59.999 d1460cf7-9472-45e0-9c2d-40537c9f34c0 3114 1333 49.56 2023-11-02 19:24:47 2023-11-02 19:20:00 2023-11-02 19:30:00 2023-11-02 19:29:59.999 d1460cf7-9472-45e0-9c2d-40537c9f34c0 3114 1333 49.56 2023-11-02 19:24:47 2023-11-02 19:15:00 2023-11-02 19:25:00 2023-11-02 19:24:59.999 e38984d8-5683-4e55-9f7a-e43350de7c3d 3024 1402 90.75 2023-11-02 19:24:47 2023-11-02 19:20:00 2023-11-02 19:30:00 2023-11-02 19:29:59.999 e38984d8-5683-4e55-9f7a-e43350de7c3d 3024 1402 90.75 2023-11-02 19:24:47 2023-11-02 19:15:00 2023-11-02 19:25:00 2023-11-02 19:24:59.999 ``` ```sql -- apply aggregation on the hopping windowed table SELECT window_start, window_end, SUM(price) as `sum` FROM TABLE( HOP(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES, INTERVAL '10' MINUTES)) GROUP BY window_start, window_end; ``` ```sql window_start window_end sum 2023-11-02 11:10:00 2023-11-02 11:20:00 296049.38 2023-11-02 11:15:00 2023-11-02 11:25:00 1122455.07 2023-11-02 11:20:00 2023-11-02 11:30:00 1648270.20 2023-11-02 
11:25:00 2023-11-02 11:35:00 2143271.00 2023-11-02 11:30:00 2023-11-02 11:40:00 2701592.45 2023-11-02 11:35:00 2023-11-02 11:45:00 3214376.78 ``` ```sql [00:00, 01:00) ``` ```sql [00:00, 02:00) ``` ```sql [00:00, 03:00) ``` ```sql [00:00, 24:00) ``` ```sql TIMESTAMP_LTZ ``` ```sql window_start ``` ```sql window_time ``` ```sql CUMULATE(TABLE data, DESCRIPTOR(timecol), step, size) ``` ```sql SELECT * FROM TABLE( CUMULATE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '2' MINUTES, INTERVAL '10' MINUTES)); -- or with the named params -- note: the DATA param must be the first SELECT * FROM TABLE( CUMULATE( DATA => TABLE `examples`.`marketplace`.`orders`, TIMECOL => DESCRIPTOR($rowtime), STEP => INTERVAL '2' MINUTES, SIZE => INTERVAL '10' MINUTES)); ``` ```sql order_id customer_id product_id price $rowtime window_start window_end window_time 2572a2e0-2ba2-4947-8926-e70e31b68df3 3239 1015 13.59 2023-11-02 19:27:39 2023-11-02 19:20:00 2023-11-02 19:28:00 2023-11-02 19:27:59.999 2572a2e0-2ba2-4947-8926-e70e31b68df3 3239 1015 13.59 2023-11-02 19:27:39 2023-11-02 19:20:00 2023-11-02 19:30:00 2023-11-02 19:29:59.999 7f791e40-a524-4a9b-bb0d-35a2c1b5a7c4 3102 1374 93.59 2023-11-02 19:27:39 2023-11-02 19:20:00 2023-11-02 19:28:00 2023-11-02 19:27:59.999 7f791e40-a524-4a9b-bb0d-35a2c1b5a7c4 3102 1374 93.59 2023-11-02 19:27:39 2023-11-02 19:20:00 2023-11-02 19:30:00 2023-11-02 19:29:59.999 47e70310-8fa4-4568-b521-7e2b68b06634 3026 1142 58.26 2023-11-02 19:27:39 2023-11-02 19:20:00 2023-11-02 19:28:00 2023-11-02 19:27:59.999 47e70310-8fa4-4568-b521-7e2b68b06634 3026 1142 58.26 2023-11-02 19:27:39 2023-11-02 19:20:00 2023-11-02 19:30:00 2023-11-02 19:29:59.999 fe1b440e-dc75-4092-be11-8e1c3afe55c7 3106 1057 11.37 2023-11-02 19:27:39 2023-11-02 19:20:00 2023-11-02 19:28:00 2023-11-02 19:27:59.999 fe1b440e-dc75-4092-be11-8e1c3afe55c7 3106 1057 11.37 2023-11-02 19:27:39 2023-11-02 19:20:00 2023-11-02 19:30:00 2023-11-02 19:29:59.999 
6668e4dc-d574-44db-8f0f-2b8e1b1f3c2e 3061 1049 26.20 2023-11-02 19:27:39 2023-11-02 19:20:00 2023-11-02 19:28:00 2023-11-02 19:27:59.999 6668e4dc-d574-44db-8f0f-2b8e1b1f3c2e 3061 1049 26.20 2023-11-02 19:27:39 2023-11-02 19:20:00 2023-11-02 19:30:00 2023-11-02 19:29:59.999 ``` ```sql -- apply aggregation on the cumulating windowed table SELECT window_start, window_end, SUM(price) as `sum` FROM TABLE( CUMULATE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '2' MINUTES, INTERVAL '10' MINUTES)) GROUP BY window_start, window_end; ``` ```sql window_start window_end sum 2023-11-02 12:40:00.000 2023-11-02 12:46:00.000 327376.23 2023-11-02 12:40:00.000 2023-11-02 12:48:00.000 661272.70 2023-11-02 12:40:00.000 2023-11-02 12:50:00.000 989294.13 2023-11-02 12:50:00.000 2023-11-02 12:52:00.000 1316596.58 2023-11-02 12:50:00.000 2023-11-02 12:54:00.000 1648097.20 2023-11-02 12:50:00.000 2023-11-02 12:56:00.000 1977881.53 2023-11-02 12:50:00.000 2023-11-02 12:58:00.000 2304080.32 2023-11-02 12:50:00.000 2023-11-02 13:00:00.000 2636795.56 ``` ```sql window_start ``` ```sql window_time ``` ```sql SESSION(TABLE data [PARTITION BY(keycols, ...)], DESCRIPTOR(timecol), gap) ``` ```sql SELECT * FROM TABLE( SESSION(TABLE `examples`.`marketplace`.`orders` PARTITION BY product_id, DESCRIPTOR($rowtime), INTERVAL '1' MINUTES)); -- or with the named params -- note: the DATA param must be the first SELECT * FROM TABLE( SESSION( DATA => TABLE `examples`.`marketplace`.`orders` PARTITION BY product_id, TIMECOL => DESCRIPTOR($rowtime), GAP => INTERVAL '1' MINUTES)); ``` ```sql order_id customer_id product_id price $rowtime window_start window_end window_time d7ef1f9a-4f5f-406e-bbad-25db521c38bf 3068 1234 17.08 2023-11-02T19:43:58.626Z 2023-11-02 21:43:58.626 2023-11-02 21:44:58.626 2023-11-02T19:44:58.625Z 804f0c86-a59a-4425-a293-b28bafaa9674 3071 1332 48.12 2023-11-02T19:44:00.506Z 2023-11-02 21:44:00.506 2023-11-02 21:45:00.506 2023-11-02T19:45:00.505Z 
61ea63e3-f040-4501-b78e-8db1fdcf45fc 3179 1267 12.35 2023-11-02T19:43:58.405Z 2023-11-02 21:43:58.405 2023-11-02 21:45:07.925 2023-11-02T19:45:07.924Z b70ba5bc-428c-41d7-b8fc-8014dd3fd429 3234 1267 40.81 2023-11-02T19:44:00.365Z 2023-11-02 21:43:58.405 2023-11-02 21:45:07.925 2023-11-02T19:45:07.924Z 37688f8c-65ee-4e27-a567-4890e6c7663b 3179 1267 98.17 2023-11-02T19:44:07.925Z 2023-11-02 21:43:58.405 2023-11-02 21:45:07.925 2023-11-02T19:45:07.924Z 4cfa0cc6-881a-43b3-bb34-1746c3b93094 3077 1047 16.78 2023-11-02T19:44:01.985Z 2023-11-02 21:44:01.985 2023-11-02 21:45:23.285 2023-11-02T19:45:23.284Z e007ce6e-5a76-4390-8fb3-50f46025b965 3095 1047 77.48 2023-11-02T19:44:11.365Z 2023-11-02 21:44:01.985 2023-11-02 21:45:23.285 2023-11-02T19:45:23.284Z 487a0248-a534-489e-bbc5-733e87d19cc7 3200 1047 47.86 2023-11-02T19:44:23.285Z 2023-11-02 21:44:01.985 2023-11-02 21:45:23.285 2023-11-02T19:45:23.284Z 4dd1ab51-8ca4-4de6-9f79-bb2ad7ab2498 3043 1235 36.5 2023-11-02T19:43:57.785Z 2023-11-02 21:43:57.785 2023-11-02 21:45:24.625 2023-11-02T19:45:24.624Z bb524ec6-1b21-40f1-8c54-3aac7b454c5b 3232 1235 36.98 2023-11-02T19:44:07.265Z 2023-11-02 21:43:57.785 2023-11-02 21:45:24.625 2023-11-02T19:45:24.624Z 9c218c8a-1566-4982-9640-a0deb9ac203c 3065 1235 30.17 2023-11-02T19:44:16.966Z 2023-11-02 21:43:57.785 2023-11-02 21:45:24.625 2023-11-02T19:45:24.624Z 6623c41b-04fa-4df0-a312-45b6dfcdc639 3143 1235 12.2 2023-11-02T19:44:24.625Z 2023-11-02 21:43:57.785 2023-11-02 21:45:24.625 2023-11-02T19:45:24.624Z ``` ```sql SELECT window_start, window_end, customer_id, SUM(price) as `sum` FROM TABLE( SESSION(TABLE `examples`.`marketplace`.`orders` PARTITION BY customer_id, DESCRIPTOR($rowtime), INTERVAL '1' MINUTES)) GROUP BY window_start, window_end, customer_id; ``` ```sql window_start window_end sum 2023-11-02 12:40:00 2023-11-02 12:46:00 327376.23 2023-11-02 12:40:00 2023-11-02 12:48:00 661272.70 2023-11-02 12:40:00 2023-11-02 12:50:00 989294.13 2023-11-02 12:50:00 2023-11-02 12:52:00 
1316596.58 2023-11-02 12:50:00 2023-11-02 12:54:00 1648097.20 2023-11-02 12:50:00 2023-11-02 12:56:00 1977881.53 2023-11-02 12:50:00 2023-11-02 12:58:00 2304080.32 2023-11-02 12:50:00 2023-11-02 13:00:00 2636795.56 ``` ```sql 2021-06-30 00:00:00 ``` ```sql 2021-06-29 23:44:00 ``` ```sql 2021-06-29 23:54:00 ``` ```sql 2021-06-29 23:54:00 ``` ```sql 2021-06-30 00:04:00 ``` ```sql 2021-06-29 23:56:00 ``` ```sql 2021-06-30 00:06:00 ``` ```sql 2021-06-30 00:00:00 ``` ```sql 2021-06-30 00:10:00 ``` ```sql 2021-06-30 00:04:00 ``` ```sql 2021-06-30 00:14:00 ``` ```sql 2021-06-30 00:06:00 ``` ```sql 2021-06-30 00:16:00 ``` ```sql 2021-06-30 00:16:00 ``` ```sql 2021-06-30 00:26:00 ``` ```sql SELECT * FROM TABLE( TUMBLE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES, INTERVAL '1' MINUTES)); -- or with the named params -- note: the DATA param must be the first SELECT * FROM TABLE( TUMBLE( DATA => TABLE `examples`.`marketplace`.`orders`, TIMECOL => DESCRIPTOR($rowtime), SIZE => INTERVAL '10' MINUTES, OFFSET => INTERVAL '1' MINUTES)); ``` ```sql order_id customer_id product_id price $rowtime window_start window_end window_time 0932497b-a3c2-4f80-9b1f-9d099b091696 3063 1035 75.85 2023-11-02 19:29:51 2023-11-02 19:21:00 2023-11-02 19:31:00 2023-11-02 19:30:59.999 20f4529c-9c86-4a54-8c38-f6c3caa1d7b8 3131 1207 89.00 2023-11-02 19:29:51 2023-11-02 19:21:00 2023-11-02 19:31:00 2023-11-02 19:30:59.999 cbda6c08-e0c7-41cb-ae04-c50f5b1f5e3c 3074 1312 63.71 2023-11-02 19:29:51 2023-11-02 19:21:00 2023-11-02 19:31:00 2023-11-02 19:30:59.999 d049ed28-cbbb-479b-8df6-8c637c1b68f5 3006 1201 72.14 2023-11-02 19:29:51 2023-11-02 19:21:00 2023-11-02 19:31:00 2023-11-02 19:30:59.999 63b6f2ef-c0e9-4737-ab81-f5acb93e4a64 3182 1346 76.18 2023-11-02 19:29:51 2023-11-02 19:21:00 2023-11-02 19:31:00 2023-11-02 19:30:59.999 00c088db-9cb7-4128-a4fd-4e06c0e95f7a 3198 1166 63.49 2023-11-02 19:29:51 2023-11-02 19:21:00 2023-11-02 19:31:00 2023-11-02 19:30:59.999 
b9ca292e-635a-4ef7-a6ee-bcf099df7c1b 3236 1462 69.13 2023-11-02 19:29:51 2023-11-02 19:21:00 2023-11-02 19:31:00 2023-11-02 19:30:59.999 3299fd08-264e-4e49-8bb9-82cae18c5d7c 3058 1226 59.53 2023-11-02 19:29:51 2023-11-02 19:21:00 2023-11-02 19:31:00 2023-11-02 19:30:59.999 45878388-7cb3-409d-91a4-8ef1f02c8576 3028 1228 16.63 2023-11-02 19:29:51 2023-11-02 19:21:00 2023-11-02 19:31:00 2023-11-02 19:30:59.999 c2fef024-c0c2-4c0f-9880-bc423d1c2db6 3219 1071 80.66 2023-11-02 19:29:51 2023-11-02 19:21:00 2023-11-02 19:31:00 2023-11-02 19:30:59.999 ``` ```sql -- apply aggregation on the tumbling windowed table SELECT window_start, window_end, SUM(price) as `sum` FROM TABLE( TUMBLE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES, INTERVAL '1' MINUTES)) GROUP BY window_start, window_end; ``` ```sql window_start window_end sum 2023-11-02 19:21:00 2023-11-02 19:31:00 7285.64 2023-11-02 19:22:00 2023-11-02 19:32:00 6932.18 2023-11-02 19:23:00 2023-11-02 19:33:00 7104.53 2023-11-02 19:24:00 2023-11-02 19:34:00 7456.92 2023-11-02 19:25:00 2023-11-02 19:35:00 7198.75 2023-11-02 19:26:00 2023-11-02 19:36:00 6875.39 2023-11-02 19:27:00 2023-11-02 19:37:00 7312.87 2023-11-02 19:28:00 2023-11-02 19:38:00 7089.26 2023-11-02 19:29:00 2023-11-02 19:39:00 7401.58 2023-11-02 19:30:00 2023-11-02 19:40:00 7156.43 ``` --- ### SQL WITH Clause in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/queries/with.html WITH Clause in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables writing auxiliary statements to use in larger SQL queries. Syntax¶ WITH [ , ... ] SELECT ... FROM ...; : with_item_name (column_name[, ...n]) AS ( ) Description¶ The WITH clause provides a way to write auxiliary statements for use in a larger query. 
These statements, which are often referred to as Common Table Expressions (CTEs), can be thought of as defining temporary views that exist only for a single query. Example¶ The following example defines a common table expression named orders_with_total and uses it in a GROUP BY query. WITH orders_with_total AS ( SELECT order_id, price + tax AS total FROM orders ) SELECT order_id, SUM(total) FROM orders_with_total GROUP BY order_id; Related content¶ Flink SQL Queries Flink SQL Functions Statements Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql WITH <with_item_definition> [ , ... ] SELECT ... FROM ...; <with_item_definition>: with_item_name (column_name[, ...n]) AS ( <select_query> ) ``` ```sql orders_with_total ``` ```sql WITH orders_with_total AS ( SELECT order_id, price + tax AS total FROM orders ) SELECT order_id, SUM(total) FROM orders_with_total GROUP BY order_id; ``` --- ### Data Type Mappings with Flink SQL Statements in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/serialization.html Data Type Mappings in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® supports records in the Avro Schema Registry, JSON_SR, and Protobuf Schema Registry formats. Avro schemas¶ Known limitations¶ Avro enums have limited support. Flink supports reading and writing enums but treats them as a STRING type. From Flink’s perspective, enums are not distinguishable from the STRING type. You can’t create an Avro schema from Flink that has an enum field. Flink doesn’t support reading Avro time-micros as a TIME type. Flink supports TIME with precision up to 3. time-micros is read and written as BIGINT. Field names must match Avro criteria. Avro expects field names to start with [A-Za-z_] and subsequently contain only [A-Za-z0-9_].
These Flink types are not supported: INTERVAL_DAY_TIME INTERVAL_YEAR_MONTH TIMESTAMP_WITH_TIMEZONE Flink SQL types to Avro types¶ The following table shows the mapping of Flink SQL types to Avro physical types. This mapping is important for creating tables, because it defines the Avro schema that’s produced by a CREATE TABLE statement. ARRAY¶ Avro type: array Avro logical type: – Additional properties: – Example: { "type" : "array", "items" : "long" } BIGINT¶ Avro type: long Avro logical type: – Additional properties: – Example: long BINARY¶ Avro type: fixed Avro logical type: – Additional properties: flink.maxLength (MAX_LENGTH if not set) Example: { "type" : "fixed", "name" : "row", "namespace" : "io.confluent", "size" : 123 } BOOLEAN¶ Avro type: boolean Avro logical type: – Additional properties: – Example: boolean CHAR¶ Avro type: string Avro logical type: – Additional properties: flink.maxLength (MAX_LENGTH if not set) Example: { "type" : "string", "flink.maxLength" : 123, "flink.minLength" : 123, "flink.version" : "1" } DATE¶ Avro type: int Avro logical type: date Additional properties: – Example: { "type" : "int", "logicalType" : "date" } DECIMAL¶ Avro type: bytes Avro logical type: decimal Additional properties: – Example: { "type" : "bytes", "logicalType" : "decimal", "precision" : 6, "scale" : 3 } DOUBLE¶ Avro type: double Avro logical type: – Additional properties: – Example: double FLOAT¶ Avro type: float Avro logical type: – Additional properties: – Example: float INT¶ Avro type: int Avro logical type: – Additional properties: – Example: int MAP (character key)¶ Avro type: map Avro logical type: – Additional properties: – Example: { "type" : "map", "values" : "boolean" } MAP (non-character key)¶ Avro type: array Avro logical type: – Additional properties: array of io.confluent.connect.avro.MapEntry(key, value) Example: { "type" : "array", "items" : { "type" : "record", "name" : "MapEntry", "namespace" : "io.confluent.connect.avro", "fields" : [ { 
"name" : "key", "type" : "int" }, { "name" : "value", "type" : "bytes" } ] } } MULTISET (character element)¶ Avro type: map Avro logical type: – Additional properties: flink.type : multiset Example: { "type" : "map", "values" : "int", "flink.type" : "multiset", "flink.version" : "1" } MULTISET (non-character key)¶ Avro type: array Avro logical type: – Additional properties: array of io.confluent.connect.avro.MapEntry(key, value), flink.type : multiset Example: { "type" : "array", "items" : { "type" : "record", "name" : "MapEntry", "namespace" : "io.confluent.connect.avro", "fields" : [ { "name" : "key", "type" : "long" }, { "name" : "value", "type" : "int" } ] }, "flink.type" : "multiset", "flink.version" : "1" } ROW¶ Avro type: record Avro logical type: – Additional properties: connect.type=int16 Name: org.apache.flink.avro.generated.record Nested records name: org.apache.flink.avro.generated.record_$fieldName Example: { "type" : "record", "name" : "row", "namespace" : "io.confluent", "fields" : [ { "name" : "f0", "type" : "long", "doc" : "field comment" } ] } SMALLINT¶ Avro type: int Avro logical type: – Additional properties: connect.type=int16 Example: { "type" : "int", "connect.type" : "int16" } STRING / VARCHAR¶ Avro type: string Avro logical type: – Additional properties: flink.maxLength = flink.minLength (MAX_LENGTH if not set) Example: { "type" : "string", "flink.maxLength" : 123, "flink.version" : "1" } TIME¶ Avro type: int Avro logical type: time-millis Additional properties: flink.precision (default: 3, max supported: 3) Example: { "type" : "int", "flink.precision" : 2, "flink.version" : "1", "logicalType" : "time-millis" } TIMESTAMP¶ Avro type: long Avro logical type: local-timestamp-millis / local-timestamp-micros Additional properties: flink.precision (default: 3/6, max supported: 3/9) Example: { "type" : "long", "flink.precision" : 2, "flink.version" : "1", "logicalType" : "local-timestamp-millis" } TIMESTAMP_LTZ¶ Avro type: long Avro logical type: 
timestamp-millis / timestamp-micros Additional properties: flink.precision (default: 3/6, max supported: 3/9) Example: { "type" : "long", "flink.precision" : 2, "flink.version" : "1", "logicalType" : "timestamp-millis" } TINYINT¶ Avro type: int Avro logical type: – Additional properties: connect.type=int8 Example: { "type" : "int", "connect.type" : "int8" } VARBINARY¶ Avro type: bytes Avro logical type: – Additional properties: flink.maxLength (MAX_LENGTH if not set) Example: { "type" : "bytes", "flink.maxLength" : 123, "flink.version" : "1" }

Avro types to Flink SQL types¶ The following table shows the mapping of Avro types to Flink SQL types. It shows only mappings that are not covered by the previous table. These types can’t originate from Flink SQL. This mapping is important when reading records with a schema that was created outside of Flink. The mapping defines the Flink table’s schema inferred from an Avro schema. Flink SQL supports reading and writing nullable types. A nullable type is mapped to an Avro union(avro_type, null), with the avro_type converted from the corresponding Flink type.

| Avro type | Avro logical type | Flink SQL type | Example |
|---|---|---|---|
| long | time-micros | BIGINT | – |
| enum | – | STRING | – |
| union with null type (null + one other type) | – | NULLABLE(type) | – |
| union (other unions) | – | ROW(type_name Type0, …) | `[ "long", "string", { "type": "record", "name": "User", "namespace": "io.test1", "fields": [ { "name": "f0", "type": "long" } ] } ]` |
| string (uuid) | – | STRING | – |
| fixed (duration) | – | BINARY(size) | – |

JSON Schema¶ Flink SQL types to JSON Schema types¶ The following table shows the mapping of Flink SQL types to JSON Schema types. This mapping is important for creating tables, because it defines the JSON Schema that’s produced by a CREATE TABLE statement. Nullable types are expressed as oneOf(Null, T). Each object that represents a MAP or MULTISET entry must have two fields, key and value. MULTISET is equivalent to MAP[K, INT] and is serialized accordingly.
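The two conventions described above, nullable types as oneOf(Null, T) and a MULTISET treated as MAP[K, INT], can be pictured with a small Python sketch. This is only an illustration of the shapes involved, not the serializer that Confluent Cloud uses. The helper names `nullable` and `multiset_to_entries` are hypothetical, and the exact JSON layout that the service emits is an assumption.

```python
import json
from collections import Counter


def nullable(schema):
    """Wrap a JSON Schema fragment so that null is also accepted (oneOf(Null, T))."""
    return {"oneOf": [{"type": "null"}, schema]}


def multiset_to_entries(items):
    """Represent a multiset as MAP[K, INT]: one {key, value} object per distinct
    element, where value is the number of occurrences."""
    counts = Counter(items)
    return [{"key": key, "value": count} for key, count in sorted(counts.items())]


# A nullable INT column, using the connect.type annotation listed for INT.
print(json.dumps(nullable({"type": "number", "connect.type": "int32"})))

# The multiset {3, 1, 3} becomes two entries, with the count stored in "value".
print(json.dumps(multiset_to_entries([3, 1, 3])))
```

Storing the count in the value field is what makes the multiset interchangeable with a map whose values are INT.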
ARRAY¶ JSON Schema type: Array Additional properties: – JSON type title: – Example: { "type": "array", "items": { "type": "number", "title": "org.apache.kafka.connect.data.Time", "flink.precision": 2, "connect.type": "int32", "flink.version": "1" } } BIGINT¶ JSON Schema type: Number Additional properties: connect.type=int64 JSON type title: – Example: { "type": "number", "connect.type": "int64" } BINARY¶ JSON Schema type: String Additional properties: connect.type=bytes flink.minLength=flink.maxLength: Different from JSON’s minLength/maxLength, because this property describes the length in bytes, not the string length. JSON type title: – Example: { "type": "string", "flink.maxLength": 123, "flink.minLength": 123, "flink.version": "1", "connect.type": "bytes" } BOOLEAN¶ JSON Schema type: Boolean Additional properties: – JSON type title: – Example: { "type": "boolean" } CHAR¶ JSON Schema type: String Additional properties: minLength=maxLength JSON type title: – Example: { "type": "string", "minLength": 123, "maxLength": 123 } DATE¶ JSON Schema type: Number Additional properties: connect.type=int32 JSON type title: org.apache.kafka.connect.data.Date Example: – DECIMAL¶ JSON Schema type: Number Additional properties: connect.type=bytes JSON type title: org.apache.kafka.connect.data.Decimal Example: – DOUBLE¶ JSON Schema type: Number Additional properties: connect.type=float64 JSON type title: – Example: { "type": "number", "connect.type": "float64" } FLOAT¶ JSON Schema type: Number Additional properties: connect.type=float32 JSON type title: – Example: { "type": "number", "connect.type": "float32" } INT¶ JSON Schema type: Number Additional properties: connect.type=int32 JSON type title: – Example: { "type": "number", "connect.type": "int32" } MAP[K, V]¶ JSON Schema type: Array[Object] Additional properties: connect.type=map JSON type
title: – Example: { "type": "array", "connect.type": "map", "items": { "type": "object", "properties": { "value": { "type": "number", "connect.type": "int64" }, "key": { "type": "number", "connect.type": "int32" } } } } MAP[VARCHAR, V]¶ JSON Schema type: Object Additional properties: connect.type=map JSON type title: – Example: { "type":"object", "connect.type":"map", "additionalProperties": { "type":"number", "connect.type":"int64" } } MULTISET[K]¶ JSON Schema type: Array[Object] Additional properties: connect.type=map flink.type=multiset JSON type title: – Note: For MULTISET types, the count (value) in the JSON Schema must map to a Flink INT type, which corresponds to connect.type: int32 in the JSON Schema. Using connect.type: int64 causes a validation error. Example: { "type": "array", "connect.type": "map", "flink.type": "multiset", "items": { "type": "object", "properties": { "value": { "type": "number", "connect.type": "int32" }, "key": { "type": "number", "connect.type": "int32" } } } } MULTISET[VARCHAR]¶ JSON Schema type: Object Additional properties: connect.type=map flink.type=multiset JSON type title: – Note: For MULTISET types, the count (value) in the JSON Schema must map to a Flink INT type, which corresponds to connect.type: int32 in the JSON Schema. Using connect.type: int64 causes a validation error.
Example: { "type": "object", "connect.type": "map", "flink.type": "multiset", "additionalProperties": { "type": "number", "connect.type": "int32" } } ROW¶ JSON Schema type: Object Additional properties: – JSON type title: – Example: – SMALLINT¶ JSON Schema type: Number Additional properties: connect.type=int16 JSON type title: – Example: { "type": "number", "connect.type": "int16" } TIME¶ JSON Schema type: Number Additional properties: connect.type=int32 flink.precision JSON type title: org.apache.kafka.connect.data.Time Example: { "type":"number", "title":"org.apache.kafka.connect.data.Time", "flink.precision":2, "connect.type":"int32", "flink.version":"1" } TIMESTAMP¶ JSON Schema type: Number Additional properties: connect.type=int64 flink.precision flink.type=timestamp JSON type title: org.apache.kafka.connect.data.Timestamp Example: { "type":"number", "title":"org.apache.kafka.connect.data.Timestamp", "flink.precision":2, "flink.type":"timestamp", "connect.type":"int64", "flink.version":"1" } TIMESTAMP_LTZ¶ JSON Schema type: Number Additional properties: connect.type=int64 flink.precision JSON type title: org.apache.kafka.connect.data.Timestamp Example: { "type":"number", "title":"org.apache.kafka.connect.data.Timestamp", "flink.precision":2, "connect.type":"int64", "flink.version":"1" } TINYINT¶ JSON Schema type: Number Additional properties: connect.type=int8 JSON type title: – Example: { "type": "number", "connect.type": "int8" } VARBINARY¶ JSON Schema type: String Additional properties: connect.type=bytes flink.maxLength: Different from JSON’s maxLength, because this property describes bytes length, not string length. 
JSON type title: – Example: { "type": "string", "flink.maxLength": 123, "flink.version": "1", "connect.type": "bytes" } VARCHAR¶ JSON Schema type: String Additional properties: maxLength JSON type title: – Example: { "type": "string", "maxLength": 123 } JSON types to Flink SQL types¶ The following table shows the mapping of JSON types to Flink SQL types. It shows only mappings that are not covered by the previous table. These types can’t originate from Flink SQL. This mapping is important when consuming/reading records with a schema that was created outside of Flink. The mapping defines the Flink table’s schema inferred from JSON Schema. JSON type Flink SQL type Combined ROW Enum VARCHAR Number(requiresInteger=true) BIGINT Number(requiresInteger=false) DOUBLE Protobuf schema¶ Flink SQL types to Protobuf types¶ The following table shows the mapping of Flink SQL types to Protobuf types. This mapping is important for creating tables, because it defines the Protobuf schema that’s produced by a CREATE TABLE statement. ARRAY[T]¶ Protobuf type: repeated T Message type: – Additional properties: flink.wrapped, which indicates that Flink wrappers are used to represent nullability, because Protobuf doesn’t support nullable repeated natively. 
Example: repeated int64 value = 1; Nullable array: arrayNullableRepeatedWrapper arrayNullable = 1 [(confluent.field_meta) = { params: [ { key: "flink.wrapped", value: "true" }, { key: "flink.version", value: "1" } ] }]; message arrayNullableRepeatedWrapper { repeated int64 value = 1; } Nullable elements: repeated elementNullableElementWrapper elementNullable = 2 [(confluent.field_meta) = { params: [ { key: "flink.wrapped", value: "true" }, { key: "flink.version", value: "1" } ] }]; message elementNullableElementWrapper { optional int64 value = 1; } BIGINT¶ Protobuf type: INT64 Message type: – Additional properties: – Example: optional int64 bigint = 8; BINARY¶ Protobuf type: BYTES Message type: – Additional properties: flink.maxLength=flink.minLength Example: optional bytes binary = 13 [(confluent.field_meta) = { params: [ { key: "flink.maxLength", value: "123" }, { key: "flink.minLength", value: "123" }, { key: "flink.version", value: "1" } ] }]; BOOLEAN¶ Protobuf type: BOOL Message type: – Additional properties: – Example: optional bool boolean = 2; CHAR¶ Protobuf type: STRING Message type: – Additional properties: flink.maxLength=flink.minLength Example: optional string char = 11 [(confluent.field_meta) = { params: [ { key: "flink.maxLength", value: "123" }, { key: "flink.minLength", value: "123" }, { key: "flink.version", value: "1" } ] }]; DATE¶ Protobuf type: MESSAGE Message type: google.type.Date Additional properties: – Example: optional .google.type.Date date = 17; DECIMAL¶ Protobuf type: MESSAGE Message type: confluent.type.Decimal Additional properties: – Example: optional .confluent.type.Decimal decimal = 19 [(confluent.field_meta) = { params: [ { value: "5", key: "precision" }, { value: "1", key: "scale" }, { key: "flink.version", value: "1" } ] }]; DOUBLE¶ Protobuf type: DOUBLE Message type: – Additional properties: – Example: optional double double = 10; FLOAT¶ Protobuf type: FLOAT Message type: – Additional properties: – Example: optional float 
float = 9; INT¶ Protobuf type: INT32 Message type: – Additional properties: – Example: optional int32 int = 7; MAP[K, V]¶ Protobuf type: repeated MESSAGE Message type: XXEntry(K key, V value) Additional properties: flink.wrapped, which indicates that Flink wrappers are used to represent nullability, because Protobuf doesn’t support nullable repeated natively. For examples, see the ARRAY type. Example: repeated MapEntry map = 20; message MapEntry { optional string key = 1; optional int64 value = 2; } MULTISET[V]¶ Protobuf type: repeated MESSAGE Message type: XXEntry(V key, int32 value) Additional properties: flink.wrapped, which indicates that Flink wrappers are used to represent nullability, because Protobuf doesn’t support nullable repeated natively. For examples, see the ARRAY type. flink.type=multiset Example: repeated MultisetEntry multiset = 1 [(confluent.field_meta) = { params: [ { key: "flink.type", value: "multiset" }, { key: "flink.version", value: "1" } ] }]; message MultisetEntry { optional string key = 1; int32 value = 2; } ROW¶ Protobuf type: MESSAGE Message type: fieldName Additional properties: – Example: meta_Row meta = 1; message meta_Row { float a = 1; float b = 2; } SMALLINT¶ Protobuf type: INT32 Message type: – Additional properties: MetaProto extension: connect.type = int16 Example: optional int32 smallInt = 6 [(confluent.field_meta) = { doc: "smallInt comment", params: [ { key: "flink.version", value: "1" }, { key: "connect.type", value: "int16" } ] }]; TIMESTAMP¶ Protobuf type: MESSAGE Message type: google.protobuf.Timestamp Additional properties: flink.precision flink.type=timestamp Example: optional .google.protobuf.Timestamp timestamp_ltz_3 = 16 [(confluent.field_meta) = { params: [ { key: "flink.type", value: "timestamp" }, { key: "flink.precision", value: "3" }, { key: "flink.version", value: "1" } ] }]; TIMESTAMP_LTZ¶ Protobuf type: MESSAGE Message type: google.protobuf.Timestamp Additional properties: flink.precision Example: optional 
.google.protobuf.Timestamp timestamp_ltz_3 = 15 [(confluent.field_meta) = { params: [ { key: "flink.precision", value: "3" }, { key: "flink.version", value: "1" } ] }]; TIME_WITHOUT_TIME_ZONE¶ Protobuf type: MESSAGE Message type: google.type.TimeOfDay Additional properties: – Example: optional .google.type.TimeOfDay time = 18 [(confluent.field_meta) = { params: [ { key: "flink.precision", value: "3" }, { key: "flink.version", value: "1" } ] }]; TINYINT¶ Protobuf type: INT32 Message type: – Additional properties: MetaProto extension: connect.type = int8 Example: optional int32 tinyInt = 4 [(confluent.field_meta) = { doc: "tinyInt comment", params: [ { key: "flink.version", value: "1" }, { key: "connect.type", value: "int8" } ] }]; VARBINARY¶ Protobuf type: BYTES Message type: – Additional properties: flink.maxLength (default = MAX_LENGTH) Example: optional bytes varbinary = 14 [(confluent.field_meta) = { params: [ { key: "flink.maxLength", value: "123" }, { key: "flink.version", value: "1" } ] }]; VARCHAR¶ Protobuf type: STRING Message type: – Additional properties: flink.maxLength (default = MAX_LENGTH) Example: optional string varchar = 12 [(confluent.field_meta) = { params: [ { key: "flink.maxLength", value: "123" }, { key: "flink.version", value: "1" } ] }]; Protobuf types to Flink SQL types¶ The following table shows the mapping of Protobuf types to Flink SQL and Connect types. It shows only mappings that are not covered by the previous table. These types can’t originate from Flink SQL. This mapping is important when consuming/reading records with a schema that was created outside of Flink. The mapping defines the Flink table’s schema inferred from a Protobuf schema. 
| Protobuf type | Flink SQL type | Message type | Connect type annotation |
| --- | --- | --- | --- |
| FIXED32 \| FIXED64 \| SFIXED64 | BIGINT | – | – |
| INT32 \| SINT32 \| SFIXED32 | INT | – | – |
| INT32 \| SINT32 \| SFIXED32 | SMALLINT | – | int16 |
| INT32 \| SINT32 \| SFIXED32 | TINYINT | – | int8 |
| INT64 \| SINT64 | BIGINT | – | – |
| UINT32 \| UINT64 | BIGINT | – | – |
| MESSAGE | BIGINT | google.protobuf.Int64Value | – |
| MESSAGE | BIGINT | google.protobuf.UInt64Value | – |
| MESSAGE | BIGINT | google.protobuf.UInt32Value | – |
| MESSAGE | BOOLEAN | google.protobuf.BoolValue | – |
| MESSAGE | DOUBLE | google.protobuf.DoubleValue | – |
| MESSAGE | FLOAT | google.protobuf.FloatValue | – |
| MESSAGE | INT | google.protobuf.Int32Value | – |
| MESSAGE | VARBINARY | google.protobuf.BytesValue | – |
| MESSAGE | VARCHAR | google.protobuf.StringValue | – |
| oneOf | ROW | – | – |

Protobuf 3 nullable field behavior¶ When working with Protobuf 3 schemas in Confluent Cloud for Apache Flink, it’s important to understand how nullable fields are handled. When converting to a Protobuf schema, Flink marks all NULLABLE fields as optional. In Protobuf, expressing something as NULLABLE or NOT NULL is not straightforward. All non-MESSAGE types are NOT NULL. If not set explicitly, the default value is assigned. Non-MESSAGE types marked with optional can be checked if they were set. If not set, Flink assumes NULL. MESSAGE types are all NULLABLE, which means that all fields of MESSAGE type are optional, and there is no way to ensure on a format level they are NOT NULL. To store this information, Flink uses the flink.notNull property, for example: message Row { .google.type.Date date = 1 [(confluent.field_meta) = { params: [ { key: "flink.version", value: "1" }, { key: "flink.notNull", value: "true" } ] }]; }

Fields without the optional keyword: In Protobuf 3, fields without the optional keyword are treated as NOT NULL by Flink. This is because Protobuf 3 doesn’t support nullable getters/setters by default. If a field is omitted in the data, Protobuf 3 assigns the default value, which is 0 for numbers, the empty string for strings, and false for booleans.
Fields with the optional keyword: Fields marked with optional in Protobuf 3 are treated as nullable by Flink. When such a field is not set in the data, Flink interprets it as NULL.

Fields with the repeated keyword: Fields marked with repeated in Protobuf 3 are treated as arrays by Flink. The array itself is NOT NULL, but individual elements within the array can be nullable depending on their type. For MESSAGE types, elements are nullable by default. For primitive types, elements are NOT NULL.

This behavior is consistent across all streaming platforms that work with Protobuf 3, including Kafka Streams and other Confluent products, and is not specific to Flink. It’s a fundamental characteristic of the Protobuf 3 specification itself.

In a Protobuf 3 schema, if you want a field to be nullable in Flink, you must explicitly mark it as optional, for example:
message Example {
string required_field = 1; // NOT NULL in Flink
optional string nullable_field = 2; // NULLABLE in Flink
repeated string array_field = 3; // NOT NULL array in Flink
repeated optional string nullable_array_field = 4; // NOT NULL array with nullable elements
}
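As a rough sketch of how these nullability rules look from the Flink side, the following statement declares a table that uses the Protobuf format. The table name t_proto_nullability and its columns are hypothetical, the 'value.format' = 'proto-registry' option is assumed to be the way the Protobuf format is selected, and the correspondence noted in the comments is what the rules above suggest rather than verified output.

```sql
-- Hypothetical table: the name and columns are illustrative only.
CREATE TABLE t_proto_nullability (
  -- NOT NULL column: expected to be written without the `optional` keyword.
  required_field STRING NOT NULL,
  -- Nullable column: expected to be marked `optional`.
  nullable_field STRING,
  -- NOT NULL array of NOT NULL elements: expected to be a plain `repeated` field.
  array_field ARRAY<STRING NOT NULL> NOT NULL
) WITH ('value.format' = 'proto-registry');
```

Inspecting the registered value schema in Schema Registry afterward is one way to check which fields were actually marked optional.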
---

### Flink SQL Examples in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/reference/sql-examples.html

Flink SQL Examples in Confluent Cloud for Apache Flink¶ The following code examples show common Flink SQL use cases with Confluent Cloud for Apache Flink®.

CREATE TABLE examples¶ The following examples show how to create Flink tables with various options.

Minimal table¶ CREATE TABLE t_minimal (s STRING); Properties Append changelog mode. No Schema Registry key. Round robin distribution. 6 Kafka partitions. The $rowtime column and system watermark are added implicitly.

Table with a primary key¶ Syntax CREATE TABLE t_pk (k INT PRIMARY KEY NOT ENFORCED, s STRING); Properties Upsert changelog mode. The primary key defines an implicit DISTRIBUTED BY(k). k is the Schema Registry key. Hash distribution on k. The table has 6 Kafka partitions. k is declared as being unique, meaning no duplicate rows. k must not contain NULLs, so an implicit NOT NULL is added. The $rowtime column and system watermark are added implicitly.

Table with a primary key in append mode¶ Syntax CREATE TABLE t_pk_append (k INT PRIMARY KEY NOT ENFORCED, s STRING) DISTRIBUTED INTO 4 BUCKETS WITH ('changelog.mode' = 'append'); Properties Append changelog mode. k is the Schema Registry key.
Hash distribution on k. The table has 4 Kafka partitions. k is declared as being unique, meaning no duplicate rows. k must not contain NULLs, meaning implicit NOT NULL. The $rowtime column and system watermark are added implicitly.

Table with hash distribution¶ Syntax CREATE TABLE t_dist (k INT, s STRING) DISTRIBUTED BY (k) INTO 4 BUCKETS; Properties Append changelog mode. k is the Schema Registry key. Hash distribution on k. The table has 4 Kafka partitions. The $rowtime column and system watermark are added implicitly.

Complex table with all concepts combined¶ Syntax CREATE TABLE t_complex (k1 INT, k2 INT, PRIMARY KEY (k1, k2) NOT ENFORCED, s STRING) COMMENT 'My complex table' DISTRIBUTED BY HASH(k1) INTO 4 BUCKETS WITH ('changelog.mode' = 'append'); Properties Append changelog mode. k1 is the Schema Registry key. Hash distribution on k1. k2 is treated as a value column and is stored in the value part of Schema Registry. The table has 4 Kafka partitions. k1 and k2 are declared as being unique, meaning no duplicates. k1 and k2 must not contain NULLs, meaning implicit NOT NULL. The $rowtime column and system watermark are added implicitly. An additional comment is added.

Table with overlapping names in key/value of Schema Registry but disjoint data¶ Syntax CREATE TABLE t_disjoint (from_key_k INT, k STRING) DISTRIBUTED BY (from_key_k) WITH ('key.fields-prefix' = 'from_key_'); Properties Append changelog mode. Hash distribution on from_key_k. The key prefix from_key_ is defined and is stripped before storing the schema in Schema Registry. Therefore, k is the Schema Registry key of type INT. Also, k is the Schema Registry value of type STRING. Both key and value store disjoint data, so they can have different data types.

Create with overlapping names in key/value of Schema Registry but joint data¶ Syntax CREATE TABLE t_joint (k INT, v STRING) DISTRIBUTED BY (k) WITH ('value.fields-include' = 'all'); Properties Append changelog mode. Hash distribution on k.
By default, the key is never included in the value in Schema Registry. By setting 'value.fields-include' = 'all', the value contains the full table schema. Therefore, k is the Schema Registry key. Also, (k, v) is the Schema Registry value. The payload of k is stored twice in the Kafka message, because key and value store joint data and they have the same data type for k.

Table with metadata columns for writing a Kafka message timestamp¶ Syntax CREATE TABLE t_metadata_write (name STRING, ts TIMESTAMP_LTZ(3) NOT NULL METADATA FROM 'timestamp') DISTRIBUTED INTO 1 BUCKETS; Properties Adds the ts metadata column, which isn’t part of Schema Registry but instead is a pure Flink concept. In contrast with $rowtime, which is declared as a METADATA VIRTUAL column, ts is selected in a SELECT * statement and is writable. The following examples show how to fill Kafka messages with an instant. INSERT INTO t_metadata_write (ts, name) SELECT NOW(), 'Alice'; INSERT INTO t_metadata_write (ts, name) SELECT TO_TIMESTAMP_LTZ(0, 3), 'Bob'; SELECT $rowtime, * FROM t_metadata_write; The Schema Registry subject compatibility mode must be FULL or FULL_TRANSITIVE. For more information, see Schema Evolution and Compatibility for Schema Registry on Confluent Cloud.

Table with string key and value in Schema Registry¶ Syntax CREATE TABLE t_raw_string_key (key STRING, i INT) DISTRIBUTED BY (key) WITH ('key.format' = 'raw'); Properties Schema Registry is filled with a value subject containing i. The key columns are determined by the DISTRIBUTED BY clause. By default, Avro in Schema Registry would be used for the key, but the WITH clause overrides this to the raw format.

Tables with cross-region schema sharing¶ Create two Kafka clusters in different regions, for example, eu-west-1 and us-west-2. Create two Flink compute pools in different regions, for example, eu-west-1 and us-west-2. In the first region, run the following statement.
CREATE TABLE t_shared_schema (key STRING, s STRING) DISTRIBUTED BY (key); In the second region, run the same statement. CREATE TABLE t_shared_schema (key STRING, s STRING) DISTRIBUTED BY (key); Properties Schema Registry is shared across regions. The SQL metastore, Flink compute pools, and Kafka clusters are regional. Both tables in either region share the Schema Registry subjects t_shared_schema-key and t_shared_schema-value.

Create with different changelog modes¶ There are three ways of storing events in a table’s log, that is, in the underlying Kafka topic.

append: Every insertion event is an immutable fact. Every event is insert-only. Events can be distributed in a round-robin fashion across workers/shards because they are unrelated.

upsert: Events are related using a primary key. Every event is either an upsert or delete event for a primary key. Events for the same primary key should land at the same worker/shard.

retract: Every insertion event is a fact that can be “undone”. This means that every event is either an insertion or its retraction. So, two events are related by all columns. In other words, the entire row is the key. For example, +I['Bob', 42] is related to -D['Bob', 42] and +U['Alice', 13] is related to -U['Alice', 13].

The retract mode is intermediate between the append and upsert modes. The append and upsert modes are natural to existing Kafka consumers and producers. Kafka compaction is a kind of upsert.

Start with a table created by the following statement. CREATE TABLE t_changelog_modes (i BIGINT); Properties Confluent Cloud for Apache Flink always derives an appropriate changelog mode for the preceding declaration. If there is no primary key, append is the safest option, because it prevents users from pushing updates into a topic accidentally, and it has the best support of downstream consumers.
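One way to see which mode was derived is to inspect the table definition. This is a minimal sketch: judging by the SHOW CREATE TABLE outputs shown for inferred tables on this page, the WITH clause would be expected to list 'changelog.mode' = 'append' for this table, but the exact output is not reproduced here.

```sql
-- Inspect the derived table options, including the changelog mode.
SHOW CREATE TABLE t_changelog_modes;
```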
-- works because the query is non-updating
INSERT INTO t_changelog_modes SELECT 1;
-- does not work because the query is updating, causing an error
INSERT INTO t_changelog_modes SELECT COUNT(*) FROM (VALUES (1), (2), (3));

If you need updates, and if downstream consumers support it, for example, when the consumer is another Flink job, you can set the changelog mode to retract. ALTER TABLE t_changelog_modes SET ('changelog.mode' = 'retract'); Properties The table starts accepting retractions during INSERT INTO. Already existing records in the Kafka topic are treated as insertions. Newly added records receive a changeflag (+I, +U, -U, -D) in the Kafka message header.

Going back to append mode is possible, but retractions (-U, -D) appear as insertions, and the Kafka header metadata column reveals the changeflag.
ALTER TABLE t_changelog_modes SET ('changelog.mode' = 'append');
ALTER TABLE t_changelog_modes ADD headers MAP<BYTES, BYTES> METADATA VIRTUAL;
-- Shows what is serialized internally
SELECT i, headers FROM t_changelog_modes;

Table with infinite retention time¶ CREATE TABLE t_infinite_retention (i INT) WITH ('kafka.retention.time' = '0'); Properties By default, the retention time is 7 days, as in all other APIs. Flink doesn’t support -1 for durations, so 0 means infinite retention time. Durations in Flink support 2 day or 2 d syntax, so the value doesn’t need to be in milliseconds. If no unit is specified, the unit is milliseconds. The following units are supported: "d", "day", "h", "hour", "m", "min", "minute", "ms", "milli", "millisecond", "µs", "micro", "microsecond", "ns", "nano", "nanosecond"

Inferred table examples¶ Inferred tables are tables that have not been created by using a CREATE TABLE statement, but instead are automatically detected from information about existing Kafka topics and Schema Registry entries. You can use the ALTER TABLE statement to evolve schemas for inferred tables.
The following examples show output from the SHOW CREATE TABLE statement called on the resulting table.

No key or value in Schema Registry¶ For an inferred table with no registered key or value schemas, SHOW CREATE TABLE returns the following output: CREATE TABLE `t_raw` ( `key` VARBINARY(2147483647), `val` VARBINARY(2147483647) ) DISTRIBUTED BY HASH(`key`) INTO 2 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'raw' ... ) Properties Key and value formats are raw (binary format) with BYTES. Following Kafka message semantics, both key and value support NULL as well, so the following code is valid: INSERT INTO t_raw (key, val) SELECT CAST(NULL AS BYTES), CAST(NULL AS BYTES);

No key but record value in Schema Registry¶ For the following value schema in Schema Registry: { "type": "record", "name": "TestRecord", "fields": [ { "name": "i", "type": "int" }, { "name": "s", "type": "string" } ] } SHOW CREATE TABLE returns the following output: CREATE TABLE `t_raw_key` ( `key` VARBINARY(2147483647), `i` INT NOT NULL, `s` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'avro-registry' ... ) Properties The key format is raw (binary format) with BYTES.
Following Kafka message semantics, the key supports NULL as well, so the following code is valid: INSERT INTO t_raw_key SELECT CAST(NULL AS BYTES), 12, 'Bob'; Atomic key and record value in Schema Registry¶ For the following key schema in Schema Registry: "int" And for the following value schema in Schema Registry: { "type": "record", "name": "TestRecord", "fields": [ { "name": "i", "type": "int" }, { "name": "s", "type": "string" } ] } SHOW CREATE TABLE returns the following output: CREATE TABLE `t_atomic_key` ( `key` INT NOT NULL, `i` INT NOT NULL, `s` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 2 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'avro-registry', 'value.format' = 'avro-registry' ... ) Properties Schema Registry defines the column data type as INT NOT NULL. The column name, key, is used as the default, because Schema Registry doesn’t provide a column name. Overlapping names in key/value, no key in Schema Registry¶ For the following value schema in Schema Registry: { "type": "record", "name": "TestRecord", "fields": [ { "name": "i", "type": "int" }, { "name": "key", "type": "string" } ] } SHOW CREATE TABLE returns the following output: CREATE TABLE `t_raw_disjoint` ( `key_key` VARBINARY(2147483647), `i` INT NOT NULL, `key` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`key_key`) INTO 1 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.fields-prefix' = 'key_', 'key.format' = 'raw', 'value.format' = 'avro-registry' ... ) Properties The Schema Registry value schema defines columns i INT NOT NULL and key STRING. The column name key BYTES is used as the default if no key is in Schema Registry. Because key would collide with value schema column, the key_ prefix is added. 
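To make the effect of the key_ prefix concrete, the following sketch reads from and writes to the inferred t_raw_disjoint table. The inserted values are made up for illustration, and the INSERT follows the same pattern as the t_raw_key example earlier in this section, on the assumption that the raw key accepts NULL here as well.

```sql
-- key_key holds the raw Kafka message key; `key` is the STRING field from the value schema.
SELECT `key_key`, `i`, `key` FROM `t_raw_disjoint`;

-- Illustrative values only. Column order follows the inferred schema: key_key, i, key.
INSERT INTO `t_raw_disjoint` SELECT CAST(NULL AS BYTES), 1, 'value-side key';
```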
Record key and record value in Schema Registry¶ For the following key schema in Schema Registry: { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" } ] } And for the following value schema in Schema Registry: { "type": "record", "name": "TestRecord", "fields": [ { "name": "name", "type": "string" }, { "name": "zip_code", "type": "string" } ] } SHOW CREATE TABLE returns the following output: CREATE TABLE `t_sr_disjoint` ( `uid` INT NOT NULL, `name` VARCHAR(2147483647) NOT NULL, `zip_code` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`uid`) INTO 1 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'value.format' = 'avro-registry' ... ) Properties Schema Registry defines columns for both key and value. The column names of key and value are disjoint sets and don’t overlap. Record key and record value with overlap in Schema Registry¶ For the following key schema in Schema Registry: { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" } ] } And for the following value schema in Schema Registry: { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" },{ "name": "name", "type": "string" }, { "name": "zip_code", "type": "string" } ] } SHOW CREATE TABLE returns the following output: CREATE TABLE `t_sr_joint` ( `uid` INT NOT NULL, `name` VARCHAR(2147483647) NOT NULL, `zip_code` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`uid`) INTO 1 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'value.fields-include' = 'all', 'value.format' = 'avro-registry' ... ) Properties Schema Registry defines columns for both key and value. The column names of key and value overlap on uid. 'value.fields-include' = 'all' is set to exclude the key, because it is fully contained in the value. 
Detecting that the key is fully contained in the value requires that both field name and data type match completely, including nullability, and that all fields of the key are included in the value.

Union types in Schema Registry¶ For the following value schema in Schema Registry: ["int", "string"] SHOW CREATE TABLE returns the following output: CREATE TABLE `t_union` ( `key` VARBINARY(2147483647), `int` INT, `string` VARCHAR(2147483647) ) ... For the following value schema in Schema Registry: [ "string", { "type": "record", "name": "User", "fields": [ { "name": "uid", "type": "int" },{ "name": "name", "type": "string" } ] }, { "type": "record", "name": "Address", "fields": [ { "name": "zip_code", "type": "string" } ] } ] SHOW CREATE TABLE returns the following output: CREATE TABLE `t_union` ( `key` VARBINARY(2147483647), `string` VARCHAR(2147483647), `User` ROW<`uid` INT NOT NULL, `name` VARCHAR(2147483647) NOT NULL>, `Address` ROW<`zip_code` VARCHAR(2147483647) NOT NULL> ) ... Properties NULL and NOT NULL are inferred depending on whether a union contains NULL. Elements of a union are always nullable, because they need to be set to NULL when a different element is set. If a record defines a namespace, the field is prefixed with it, for example, org.myorg.avro.User.

Multi-message protobuf schema in Schema Registry¶ For the following value schema in Schema Registry: syntax = "proto3"; message Purchase { string item = 1; double amount = 2; string customer_id = 3; } message Pageview { string url = 1; bool is_special = 2; string customer_id = 3; } SHOW CREATE TABLE returns the following output: CREATE TABLE `t` ( `key` VARBINARY(2147483647), `Purchase` ROW< `item` VARCHAR(2147483647) NOT NULL, `amount` DOUBLE NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL >, `Pageview` ROW< `url` VARCHAR(2147483647) NOT NULL, `is_special` BOOLEAN NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL > ) ...
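Because each message type becomes its own ROW column without a NOT NULL constraint, a query can pick out one message type by filtering on which column is set. The following is a minimal sketch against the table t shown above. The table-qualified nested field access and the assumption that the other message column is NULL for these rows are inferences from the output above, not confirmed behavior.

```sql
-- Keep only records that carry a Purchase message.
SELECT
  t.Purchase.item,
  t.Purchase.amount,
  t.Purchase.customer_id
FROM t
WHERE t.Purchase IS NOT NULL;
```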
For the following value schema in Schema Registry: syntax = "proto3"; message Purchase { string item = 1; double amount = 2; string customer_id = 3; Pageview pageview = 4; } message Pageview { string url = 1; bool is_special = 2; string customer_id = 3; } SHOW CREATE TABLE returns the following output: CREATE TABLE `t` ( `key` VARBINARY(2147483647), `Purchase` ROW< `item` VARCHAR(2147483647) NOT NULL, `amount` DOUBLE NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL, `pageview` ROW< `url` VARCHAR(2147483647) NOT NULL, `is_special` BOOLEAN NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL > >, `Pageview` ROW< `url` VARCHAR(2147483647) NOT NULL, `is_special` BOOLEAN NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL > ) ... For the following value schema in Schema Registry: syntax = "proto3"; message Purchase { string item = 1; double amount = 2; string customer_id = 3; Pageview pageview = 4; message Pageview { string url = 1; bool is_special = 2; string customer_id = 3; } } SHOW CREATE TABLE returns the following output: CREATE TABLE `t` ( `key` VARBINARY(2147483647), `item` VARCHAR(2147483647) NOT NULL, `amount` DOUBLE NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL, `pageview` ROW< `url` VARCHAR(2147483647) NOT NULL, `is_special` BOOLEAN NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL > ) ... 
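The inferred tables above can be queried like any other table with ROW columns. As a minimal sketch, assuming the table `t` inferred from the first schema in this section (two top-level messages, `Purchase` and `Pageview`) and assuming that only one of the two ROW columns is populated per record, the following queries read each message type separately. The column aliases are illustrative and not part of the inferred schema.

```sql
-- Read only Purchase messages; records that carry a Pageview
-- are expected to have a NULL Purchase column.
SELECT
  `Purchase`.`item` AS item,
  `Purchase`.`amount` AS amount,
  `Purchase`.`customer_id` AS customer_id
FROM `t`
WHERE `Purchase` IS NOT NULL;

-- Read only Pageview messages.
SELECT
  `Pageview`.`url` AS url,
  `Pageview`.`is_special` AS is_special,
  `Pageview`.`customer_id` AS customer_id
FROM `t`
WHERE `Pageview` IS NOT NULL;
```

For the second schema, where `Purchase` contains a `pageview` field, the nested row should be reachable with another level of field access, for example `Purchase`.`pageview`.`url`.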
Debezium CDC format in Schema Registry¶ For a Debezium CDC format with the following value schema in Schema Registry: { "type": "record", "name": "Customer", "namespace": "io.debezium.data", "fields": [ { "name": "before", "type": ["null", { "type": "record", "name": "Value", "fields": [ {"name": "id", "type": "int"}, {"name": "name", "type": "string"}, {"name": "email", "type": "string"} ] }], "default": null }, { "name": "after", "type": ["null", "Value"], "default": null }, { "name": "source", "type": { "type": "record", "name": "Source", "fields": [ {"name": "version", "type": "string"}, {"name": "connector", "type": "string"}, {"name": "name", "type": "string"}, {"name": "ts_ms", "type": "long"}, {"name": "db", "type": "string"}, {"name": "schema", "type": "string"}, {"name": "table", "type": "string"} ] } }, {"name": "op", "type": "string"}, {"name": "ts_ms", "type": ["null", "long"], "default": null}, {"name": "transaction", "type": ["null", { "type": "record", "name": "Transaction", "fields": [ {"name": "id", "type": "string"}, {"name": "total_order", "type": "long"}, {"name": "data_collection_order", "type": "long"} ] }], "default": null} ] } SHOW CREATE TABLE returns the following output: CREATE TABLE `customer_changes` ( `key` VARBINARY(2147483647), `id` INT NOT NULL, `name` VARCHAR(2147483647) NOT NULL, `email` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( 'changelog.mode' = 'retract', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'avro-debezium-registry' ... ) Properties Flink detects the Debezium format automatically, based on the schema structure with after, before, and op fields. The table schema is inferred from the after schema, exposing only the actual data fields. 
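Because the inferred table exposes only the fields of the after record, queries do not need to unwrap the Debezium envelope. The following sketch assumes the `customer_changes` table shown above with 'changelog.mode' = 'retract'. The alias `customer_count` is illustrative.

```sql
-- Query the flattened CDC data directly; the before, after, and op
-- envelope fields are not exposed as columns.
SELECT `id`, `name`, `email`
FROM `customer_changes`;

-- An updating aggregation over the changelog. In retract mode, a Debezium
-- delete event retracts the row, so the count is expected to decrease.
SELECT COUNT(*) AS customer_count
FROM `customer_changes`;
```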
Automatic Debezium Envelope Detection: For schemas created after May 19, 2025 at 09:00 UTC, Flink automatically detects Debezium envelopes and sets appropriate defaults: value.format defaults to *-debezium-registry (instead of *-registry) changelog.mode defaults to retract (instead of append) Exception: If Kafka cleanup.policy is compact, changelog.mode is set to upsert The default changelog.mode is retract, which properly handles all CDC operations, including inserts, updates, and deletes. You can manually override the changelog mode if necessary: -- Change to upsert mode for primary key-based operations ALTER TABLE customer_changes SET ('changelog.mode' = 'upsert'); -- Change to append mode (processes only inserts and updates) ALTER TABLE customer_changes SET ('changelog.mode' = 'append'); ALTER TABLE examples¶ The following examples show frequently used scenarios for ALTER TABLE. Define a watermark for perfectly ordered data¶ Flink guarantees that rows are always emitted before the watermark is generated. The following statements ensure that for perfectly ordered events, meaning events without time-skew, a watermark can be equal to the timestamp or 1 ms less than the timestamp. CREATE TABLE t_perfect_watermark (i INT); -- If multiple events can have the same timestamp. ALTER TABLE t_perfect_watermark MODIFY WATERMARK FOR $rowtime AS $rowtime - INTERVAL '0.001' SECOND; -- If a single event can have the timestamp. ALTER TABLE t_perfect_watermark MODIFY WATERMARK FOR $rowtime AS $rowtime; Drop your custom watermark strategy¶ Remove the custom watermark strategy to restore the default watermark strategy. View the current table schema and metadata. 
DESCRIBE `orders`; Your output should resemble: +-------------+------------------------+----------+-------------------+ | Column Name | Data Type | Nullable | Extras | +-------------+------------------------+----------+-------------------+ | user | BIGINT | NOT NULL | PRIMARY KEY | | product | STRING | NULL | | | amount | INT | NULL | | | ts | TIMESTAMP(3) *ROWTIME* | NULL | WATERMARK AS `ts` | +-------------+------------------------+----------+-------------------+ Remove the watermark strategy of the table. ALTER TABLE `orders` DROP WATERMARK; Your output should resemble: Statement phase is COMPLETED. Check the new table schema and metadata. DESCRIBE `orders`; Your output should resemble: +-------------+--------------+----------+-------------+ | Column Name | Data Type | Nullable | Extras | +-------------+--------------+----------+-------------+ | user | BIGINT | NOT NULL | PRIMARY KEY | | product | STRING | NULL | | | amount | INT | NULL | | | ts | TIMESTAMP(3) | NULL | | +-------------+--------------+----------+-------------+ Configure Debezium format for CDC data¶ Change regular format to Debezium format¶ Note For schemas created after May 19, 2025 at 09:00 UTC, Flink automatically detects Debezium envelopes and configures the appropriate format and changelog mode. Manual conversion is necessary only for older schemas or when you want to override the default behavior. 
For tables that have been inferred with regular formats but contain Debezium CDC (Change Data Capture) data: -- Convert from regular Avro format to Debezium CDC format -- and configure the appropriate Flink changelog interpretation mode: -- * append: Treats each record as an INSERT operation with no relationship between records -- * retract: Handles paired operations (INSERT/UPDATE/DELETE) where changes to the same row -- are represented as a retraction of the old value followed by an addition of the new value -- * upsert: Groups all operations for the primary key (derived from the Kafka message key), -- with each operation effectively merging with or replacing previous state -- (INSERT creates, UPDATE modifies, DELETE removes) ALTER TABLE customer_data SET ( 'value.format' = 'avro-debezium-registry', 'changelog.mode' = 'retract' ); -- Convert from regular JSON format to Debezium CDC format -- and configure the appropriate Flink changelog interpretation mode: -- * append: Treats each record as an INSERT operation with no relationship between records -- * retract: Handles paired operations (INSERT/UPDATE/DELETE) where changes to the same row -- are represented as a retraction of the old value followed by an addition of the new value -- * upsert: Groups all operations for the primary key (derived from the Kafka message key), -- with each operation effectively merging with or replacing previous state -- (INSERT creates, UPDATE modifies, DELETE removes) ALTER TABLE customer_data_json SET ( 'value.format' = 'json-debezium-registry', 'changelog.mode' = 'retract' ); -- Convert from regular Protobuf format to Debezium CDC format -- and configure the appropriate Flink changelog interpretation mode: -- * append: Treats each record as an INSERT operation with no relationship between records -- * retract: Handles paired operations (INSERT/UPDATE/DELETE) where changes to the same row -- are represented as a retraction of the old value followed by an
addition of the new value -- * upsert: Groups all operations for the primary key (derived from the Kafka message key), -- with each operation effectively merging with or replacing previous state -- (INSERT creates, UPDATE modifies, DELETE removes) ALTER TABLE customer_data_proto SET ( 'value.format' = 'proto-debezium-registry', 'changelog.mode' = 'retract' ); Modify Changelog Processing Mode¶ For tables with any type of data that need a different processing mode for handling changes: -- Change to append mode (default) -- Best for event streams where each record is independent ALTER TABLE customer_changes SET ( 'changelog.mode' = 'append' ); -- Change to retract mode -- Useful when changes to the same row are represented as paired operations ALTER TABLE customer_changes SET ( 'changelog.mode' = 'retract' ); -- Change to upsert mode when working with primary keys -- Best when tracking state changes using a primary key (derived from Kafka message key) ALTER TABLE customer_changes SET ( 'changelog.mode' = 'upsert' ); Read and/or write Kafka headers¶ -- Create example topic CREATE TABLE t_headers (i INT); -- For read-only (virtual) ALTER TABLE t_headers ADD headers MAP<BYTES, BYTES> METADATA VIRTUAL; -- For read and write (persisted). Column becomes mandatory in INSERT INTO. ALTER TABLE t_headers MODIFY headers MAP<BYTES, BYTES> METADATA; -- Use implicit casting (origin is always MAP<BYTES, BYTES>) ALTER TABLE t_headers MODIFY headers MAP<STRING, STRING> METADATA; -- Insert and read INSERT INTO t_headers SELECT 42, MAP['k1', 'v1', 'k2', 'v2']; SELECT * FROM t_headers; Properties The metadata key is headers. If you don’t want to name the column this way, use: other_name MAP<BYTES, BYTES> METADATA FROM 'headers' VIRTUAL. Keys of headers must be unique. Multi-key headers are not supported. Add headers as a metadata column¶ You can get the headers of a Kafka record as a map of raw bytes by adding a headers virtual metadata column.
Run the following statement to add the Kafka headers as a metadata column: ALTER TABLE `orders` ADD ( `headers` MAP<BYTES, BYTES> METADATA VIRTUAL); View the new schema. DESCRIBE `orders`; Your output should resemble: +-------------+-------------------+----------+-------------------------+ | Column Name | Data Type | Nullable | Extras | +-------------+-------------------+----------+-------------------------+ | user | BIGINT | NOT NULL | PRIMARY KEY, BUCKET KEY | | product | STRING | NULL | | | amount | INT | NULL | | | ts | TIMESTAMP(3) | NULL | | | headers | MAP<BYTES, BYTES> | NULL | METADATA VIRTUAL | +-------------+-------------------+----------+-------------------------+ Read topic from specific offsets¶ -- Create example topic with 1 partition filled with values CREATE TABLE t_specific_offsets (i INT) DISTRIBUTED INTO 1 BUCKETS; INSERT INTO t_specific_offsets VALUES (1), (2), (3), (4), (5); -- Returns 1, 2, 3, 4, 5 SELECT * FROM t_specific_offsets; -- Changes the scan range ALTER TABLE t_specific_offsets SET ( 'scan.startup.mode' = 'specific-offsets', 'scan.startup.specific-offsets' = 'partition:0,offset:3' ); -- Returns 4, 5 SELECT * FROM t_specific_offsets; Properties scan.startup.mode and scan.bounded.mode control which range in the changelog (Kafka topic) to read. scan.startup.specific-offsets and scan.bounded.specific-offsets define offsets per partition. In the example, only 1 partition is used. For multiple partitions, use the following syntax: 'scan.startup.specific-offsets' = 'partition:0,offset:3; partition:1,offset:42; partition:2,offset:0' Debug “no output” and no watermark cases¶ The root cause for most “no output” cases is that a time-based operation, such as TUMBLE, MATCH_RECOGNIZE, or FOR SYSTEM_TIME AS OF, did not receive recent enough watermarks. The current time of an operator is calculated as the minimum watermark of all inputs, meaning across all tables/topics and their partitions.
If one partition does not emit a watermark, it can affect the entire pipeline. The following statements may be helpful for debugging issues related to watermarks. -- example table CREATE TABLE t_watermark_debugging (k INT, s STRING) DISTRIBUTED BY (k) INTO 4 BUCKETS; -- Each value lands in a separate Kafka partition (out of 4). -- Leave out values to see missing watermarks. INSERT INTO t_watermark_debugging VALUES (1, 'Bob'), (2, 'Alice'), (8, 'John'), (15, 'David'); -- If ROW_NUMBER doesn't show results, it's clearly a watermark issue. SELECT ROW_NUMBER() OVER (ORDER BY $rowtime ASC) AS `number`, * FROM t_watermark_debugging; -- Add partition information as metadata column ALTER TABLE t_watermark_debugging ADD part INT METADATA FROM 'partition' VIRTUAL; -- Use the CURRENT_WATERMARK() function to check which watermark is calculated SELECT *, part AS `Row Partition`, $rowtime AS `Row Timestamp`, CURRENT_WATERMARK($rowtime) AS `Operator Watermark` FROM t_watermark_debugging; -- Visualize the highest timestamp per Kafka partition -- Due to the table declaration (with 4 buckets), this query should show 4 rows. -- If not, the missing partitions might be the cause for watermark issues. SELECT part AS `Partition`, MAX($rowtime) AS `Max Timestamp in Partition` FROM t_watermark_debugging GROUP BY part; -- A workaround could be to not use the system watermark: ALTER TABLE t_watermark_debugging MODIFY WATERMARK FOR $rowtime AS $rowtime - INTERVAL '2' SECOND; -- Or for perfect input data: ALTER TABLE t_watermark_debugging MODIFY WATERMARK FOR $rowtime AS $rowtime - INTERVAL '0.001' SECOND; -- Add "fresh" data while the above statements with -- ROW_NUMBER() or CURRENT_WATERMARK() are running. INSERT INTO t_watermark_debugging VALUES (1, 'Fresh Bob'), (2, 'Fresh Alice'), (8, 'Fresh John'), (15, 'Fresh David'); The debugging examples above won’t solve everything but may help in finding the root cause. 
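To confirm that watermarks progress after a change, one option is to run a small time-based query against the same table. The following is a sketch, assuming the t_watermark_debugging table from the statements above. The 10-second window size and the alias row_count are arbitrary choices for illustration. A tumbling window emits a result only after the watermark passes the end of the window, so a query that returns no rows while data keeps arriving points to a stalled watermark.

```sql
-- Count rows per 10-second tumbling window on the system time attribute.
-- Results appear only once the watermark has passed window_end.
SELECT window_start, window_end, COUNT(*) AS row_count
FROM TABLE(
  TUMBLE(TABLE t_watermark_debugging, DESCRIPTOR($rowtime), INTERVAL '10' SECOND))
GROUP BY window_start, window_end;
```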
The system watermark strategy excludes idle Kafka partitions from the watermark calculation after some time, but at least one partition must produce new data for the “logical clock” with watermarks. Typically, root causes are: idle Kafka partitions; no data in Kafka partitions; not enough data in Kafka partitions; a watermark strategy that is too conservative; no fresh data after warm up with historical data for progressing the logical clock. Handle idle partitions for missing watermarks¶ Idle partitions often cause missing watermarks. Also, no data in a partition or infrequent data can be a root cause. -- Create a topic with 4 partitions. CREATE TABLE t_watermark_idle (k INT, s STRING) DISTRIBUTED BY (k) INTO 4 BUCKETS; -- Avoid the "not enough data" problem by using a custom watermark. -- The watermark strategy is still coarse-grained enough for this example. ALTER TABLE t_watermark_idle MODIFY WATERMARK FOR $rowtime AS $rowtime - INTERVAL '2' SECONDS; -- Each value lands in a separate Kafka partition, and partition 1 is empty. INSERT INTO t_watermark_idle VALUES (1, 'Bob in partition 0'), (2, 'Alice in partition 3'), (8, 'John in partition 2'); -- Thread 1: Start a streaming job. SELECT ROW_NUMBER() OVER (ORDER BY $rowtime ASC) AS `number`, * FROM t_watermark_idle; -- Thread 2: Insert some data immediately -> Thread 1 still without results. INSERT INTO t_watermark_idle VALUES (1, 'Another Bob in partition 0 shortly after'); -- Thread 2: Insert some data after 15s -> Thread 1 should show results. INSERT INTO t_watermark_idle VALUES (1, 'Another Bob in partition 0 after 15s'); Within the first 15 seconds, all partitions contribute to the watermark calculation, so the first INSERT INTO has no effect because partition 1 is still empty. After 15 seconds, all partitions are marked as idle, and no partition contributes to the watermark calculation. But when the second INSERT INTO is executed, partition 0 becomes the main driving partition for the logical clock.
The global watermark jumps to “second INSERT INTO - 2 seconds”. In the following code, the sql.tables.scan.idle-timeout configuration overrides the default idle-detection algorithm, so even an immediate INSERT INTO can be the main driving partition for the logical clock, because all other partitions are marked as idle after 1 second. -- Thread 1: Start a streaming job. -- Lower the idle timeout further. SET 'sql.tables.scan.idle-timeout' = '1s'; SELECT ROW_NUMBER() OVER (ORDER BY $rowtime ASC) AS `number`, * FROM t_watermark_idle; -- Thread 2: Insert some data immediately -> Thread 1 should show results. INSERT INTO t_watermark_idle VALUES (1, 'Another Bob in partition 0 shortly after'); Change the schema context property¶ You can set the schema context for key and value formats to control the namespace for your schema resolution in Schema Registry. Set the schema context for the value format ALTER TABLE `orders` SET ('value.format.schema-context' = '.lsrc-newcontext'); Your output should resemble: Statement phase is COMPLETED. Check the new table properties. 
SHOW CREATE TABLE `orders`; Your output should resemble: +----------------------------------------------------------------------+ | SHOW CREATE TABLE | +----------------------------------------------------------------------+ | CREATE TABLE `catalog`.`database`.`orders` ( | | `user` BIGINT NOT NULL, | | `product` VARCHAR(2147483647), | | `amount` INT, | | `ts` TIMESTAMP(3) | | ) | | DISTRIBUTED BY HASH(`user`) INTO 6 BUCKETS | | WITH ( | | 'changelog.mode' = 'upsert', | | 'connector' = 'confluent', | | 'kafka.cleanup-policy' = 'delete', | | 'kafka.max-message-size' = '2097164 bytes', | | 'kafka.retention.size' = '0 bytes', | | 'kafka.retention.time' = '604800000 ms', | | 'key.format' = 'avro-registry', | | 'scan.bounded.mode' = 'unbounded', | | 'scan.startup.mode' = 'latest-offset', | | 'value.format' = 'avro-registry', | | 'value.format.schema-context' = '.lsrc-newcontext' | | ) | | | +----------------------------------------------------------------------+ Inferred tables schema evolution¶ You can use the ALTER TABLE statement to evolve schemas for inferred tables. The following examples show output from the SHOW CREATE TABLE statement called on the resulting table. Schema Registry columns overlap with computed/metadata columns¶ For the following value schema in Schema Registry: { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" } ] } Evolve a table by adding metadata: ALTER TABLE t_metadata_overlap ADD `timestamp` TIMESTAMP_LTZ(3) NOT NULL METADATA; SHOW CREATE TABLE returns the following output: CREATE TABLE `t_metadata_overlap` ( `key` VARBINARY(2147483647), `uid` INT NOT NULL, `timestamp` TIMESTAMP(3) WITH LOCAL TIME ZONE NOT NULL METADATA ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( ... ) Properties Schema Registry says there is a timestamp physical column, but Flink says there is a timestamp metadata column.
In this case, metadata columns and computed columns have precedence, and Confluent Cloud for Apache Flink removes the physical column from the schema. Because Confluent Cloud for Apache Flink advertises FULL_TRANSITIVE mode, queries still work, and the physical column is set to NULL in the payload: INSERT INTO t_metadata_overlap SELECT CAST(NULL AS BYTES), 42, TO_TIMESTAMP_LTZ(0, 3); Evolve the table by renaming metadata: ALTER TABLE t_metadata_overlap DROP `timestamp`; ALTER TABLE t_metadata_overlap ADD message_timestamp TIMESTAMP_LTZ(3) METADATA FROM 'timestamp'; SELECT * FROM t_metadata_overlap; SHOW CREATE TABLE returns the following output: CREATE TABLE `t_metadata_overlap` ( `key` VARBINARY(2147483647), `uid` INT NOT NULL, `timestamp` VARCHAR(2147483647), `message_timestamp` TIMESTAMP(3) WITH LOCAL TIME ZONE METADATA FROM 'timestamp' ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( ... ) Properties Now, both physical and metadata columns appear and can be accessed for reading and writing. Enrich a column that has no Schema Registry information¶ For the following value schema in Schema Registry: { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" } ] } SHOW CREATE TABLE returns the following output: CREATE TABLE `t_enrich_raw_key` ( `key` VARBINARY(2147483647), `uid` INT NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'avro-registry' ... ) Properties Schema Registry provides only information for the value part. Because the key part is not backed by Schema Registry, the key.format is raw. The default data type of raw is BYTES, but you can change this by using the ALTER TABLE statement. 
Evolve the table by giving a raw format column a specific type: ALTER TABLE t_enrich_raw_key MODIFY key STRING; SHOW CREATE TABLE returns the following output: CREATE TABLE `t_enrich_raw_key` ( `key` STRING, `uid` INT NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'avro-registry' ... ) Properties Only changes to simple, atomic types, like INT, BYTES, and STRING are supported, where the binary representation is clear. For more complex modifications, use Schema Registry. In multi-cluster scenarios, the ALTER TABLE statement must be executed for every cluster, because the data type for key is stored in the Flink regional metastore. Configure Schema Registry subject names¶ When working with topics that use RecordNameStrategy or TopicRecordNameStrategy, you can configure the subject names for the schema resolution in Schema Registry. This is particularly useful when handling multiple event types in a single topic. 
For topics using these strategies, Flink initially infers a raw binary table: SHOW CREATE TABLE events; Your output will show a raw binary structure: CREATE TABLE `events` ( `key` VARBINARY(2147483647), `value` VARBINARY(2147483647) ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'raw' ) Configure value schema subject names for each format: AvroJSON SchemaProtobufALTER TABLE events SET ( 'value.format' = 'avro-registry', 'value.avro-registry.subject-names' = 'com.example.Order;com.example.Shipment' ); ALTER TABLE events SET ( 'value.format' = 'json-registry', 'value.json-registry.subject-names' = 'com.example.Order;com.example.Shipment' ); ALTER TABLE events SET ( 'value.format' = 'proto-registry', 'value.proto-registry.subject-names' = 'com.example.Order;com.example.Shipment' ); If your topic uses keyed messages, you can also configure the key format: ALTER TABLE events SET ( 'key.format' = 'avro-registry', 'key.avro-registry.subject-names' = 'com.example.OrderKey' ); You can configure both key and value schema subject names in a single statement: ALTER TABLE events SET ( 'key.format' = 'avro-registry', 'key.avro-registry.subject-names' = 'com.example.OrderKey', 'value.format' = 'avro-registry', 'value.avro-registry.subject-names' = 'com.example.Order;com.example.Shipment' ); Properties: Use semicolons (;) to separate multiple subject names Subject names must match exactly with the names registered in Schema Registry The format prefix (avro-registry, json-registry, or proto-registry) must match the schema format in Schema Registry Reset a key value¶ You can use the RESET option to set any key to its default value. The following example shows how to reset a table that has a JSON Schema back to raw format. 
ALTER TABLE json_table RESET ( 'value.json-registry.wire-encoding', 'value.json-registry.subject-names' ); Custom error handling¶ You can use ALTER TABLE with the error-handling.mode and error-handling.log.target table properties to set custom error handling for deserialization errors. The following code example shows how to log errors to the specified Dead Letter Queue (DLQ) table and enable processing to continue. ALTER TABLE my_table SET ( 'error-handling.mode' = 'log', 'error-handling.log.target' = 'my_error_table' ); Related content¶ Video: How to Set Idle Timeouts SELECT examples¶ The following examples show frequently used scenarios for SELECT. Most minimal statement¶ Syntax SELECT 1; Properties Statement is bounded. Check local time zone is configured correctly¶ Syntax SELECT NOW(); Properties Statement is bounded. NOW() returns a TIMESTAMP_LTZ(3), so if the client is configured correctly, it should show a timestamp in your local time zone. Combine multiple tables into one¶ Syntax CREATE TABLE t_union_1 (i INT); CREATE TABLE t_union_2 (i INT); TABLE t_union_1 UNION ALL TABLE t_union_2; -- alternate syntax SELECT * FROM t_union_1 UNION ALL SELECT * FROM t_union_2; Get insights into the current watermark¶ Syntax CREATE TABLE t_watermarked_insight (s STRING) DISTRIBUTED INTO 1 BUCKETS; INSERT INTO t_watermarked_insight VALUES ('Bob'), ('Alice'), ('Charly'); SELECT $rowtime, CURRENT_WATERMARK($rowtime) FROM t_watermarked_insight; The output resembles: $rowtime EXPR$1 2024-04-29 11:59:01.080 NULL 2024-04-29 11:59:01.093 2024-04-04 15:27:37.433 2024-04-29 11:59:01.094 2024-04-04 15:27:37.433 Properties The CURRENT_WATERMARK function returns the watermark that arrived at the operator evaluating the SELECT statement. The returned watermark is the minimum of all inputs, across all tables/topics and their partitions. If a common watermark was not received from all inputs, the function returns NULL.
The CURRENT_WATERMARK function takes a time attribute, which is a column that has WATERMARK FOR defined. A watermark is always emitted after the row has been processed, so the first row always has a NULL watermark. Because the default watermark algorithm requires at least 250 records, initially it assumes the maximum lag of 7 days plus a safety margin of 7 days. The watermark quickly (exponentially) goes down as more data arrives. Sources emit watermarks every 200 ms, but within the first 200 ms they emit per row for powering examples like this. Flatten fields into columns¶ SyntaxCREATE TABLE t_flattening (i INT, r1 ROW, r2 ROW); SELECT r1.*, r2.* FROM t_flattening; PropertiesYou can apply the * operator on nested data, which enables flattening fields into columns of the table. Schema reference examples¶ The following examples show how to use schema references in Flink SQL. For the following schemas in Schema Registry: AvroProtobufJSON{ "type":"record", "namespace": "io.confluent.developer.avro", "name":"Purchase", "fields": [ {"name": "item", "type":"string"}, {"name": "amount", "type": "double"}, {"name": "customer_id", "type": "string"} ] } syntax = "proto3"; package io.confluent.developer.proto; message Purchase { string item = 1; double amount = 2; string customer_id = 3; } { "$schema": "http://json-schema.org/draft-07/schema#", "title": "Purchase", "type": "object", "properties": { "item": { "type": "string" }, "amount": { "type": "number" }, "customer_id": { "type": "string" } }, "required": ["item", "amount", "customer_id"] } AvroProtobufJSON{ "type":"record", "namespace": "io.confluent.developer.avro", "name":"Pageview", "fields": [ {"name": "url", "type":"string"}, {"name": "is_special", "type": "boolean"}, {"name": "customer_id", "type": "string"} ] } syntax = "proto3"; package io.confluent.developer.proto; message Pageview { string url = 1; bool is_special = 2; string customer_id = 3; } { "$schema": "http://json-schema.org/draft-07/schema#", "title": 
"Pageview", "type": "object", "properties": { "url": { "type": "string" }, "is_special": { "type": "boolean" }, "customer_id": { "type": "string" } }, "required": ["url", "is_special", "customer_id"] } AvroProtobufJSON[ "io.confluent.developer.avro.Purchase", "io.confluent.developer.avro.Pageview" ] syntax = "proto3"; package io.confluent.developer.proto; import "purchase.proto"; import "pageview.proto"; message CustomerEvent { oneof action { Purchase purchase = 1; Pageview pageview = 2; } } { "$schema": "http://json-schema.org/draft-07/schema#", "title": "CustomerEvent", "type": "object", "oneOf": [ { "$ref": "io.confluent.developer.json.Purchase" }, { "$ref": "io.confluent.developer.json.Pageview" } ] } and references: AvroProtobufJSON[ { "name": "io.confluent.developer.avro.Purchase", "subject": "purchase", "version": 1 }, { "name": "io.confluent.developer.avro.Pageview", "subject": "pageview", "version": 1 } ] [ { "name": "purchase.proto", "subject": "purchase", "version": 1 }, { "name": "pageview.proto", "subject": "pageview", "version": 1 } ] [ { "name": "io.confluent.developer.json.Purchase", "subject": "purchase", "version": 1 }, { "name": "io.confluent.developer.json.Pageview", "subject": "pageview", "version": 1 } ] SHOW CREATE TABLE customer-events; returns the following output: CREATE TABLE `customer-events` ( `key` VARBINARY(2147483647), `Purchase` ROW<`item` VARCHAR(2147483647) NOT NULL, `amount` DOUBLE NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL>, `Pageview` ROW<`url` VARCHAR(2147483647) NOT NULL, `is_special` BOOLEAN NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL> ) DISTRIBUTED BY HASH(`key`) INTO 2 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'kafka.cleanup-policy' = 'delete', 'kafka.max-message-size' = '2097164 bytes', 'kafka.retention.size' = '0 bytes', 'kafka.retention.time' = '7 d', 'key.format' = 'raw', 'scan.bounded.mode' = 'unbounded', 'scan.startup.mode' = 'earliest-offset', 'value.format' = 
'[VALUE_FORMAT]' ) Split into tables for each type¶ Syntax CREATE TABLE purchase AS SELECT Purchase.* FROM `customer-events` WHERE Purchase IS NOT NULL; SELECT * FROM purchase; CREATE TABLE pageview AS SELECT Pageview.* FROM `customer-events` WHERE Pageview IS NOT NULL; SELECT * FROM pageview; Output: item amount customer_id apple 9.99 u-21 jam 4.29 u-67 mango 13.99 u-67 socks 7.99 u-123 url is_special customer_id https://www.confluent.io TRUE u-67 http://www.cflt.io FALSE u-12 Related content¶ Flink SQL Queries Flink SQL Functions DDL Statements in Confluent Cloud for Apache Flink Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql CREATE TABLE t_minimal (s STRING); ``` ```sql CREATE TABLE t_pk (k INT PRIMARY KEY NOT ENFORCED, s STRING); ``` ```sql CREATE TABLE t_pk_append (k INT PRIMARY KEY NOT ENFORCED, s STRING) DISTRIBUTED INTO 4 BUCKETS WITH ('changelog.mode' = 'append'); ``` ```sql CREATE TABLE t_dist (k INT, s STRING) DISTRIBUTED BY (k) INTO 4 BUCKETS; ``` ```sql CREATE TABLE t_complex (k1 INT, k2 INT, PRIMARY KEY (k1, k2) NOT ENFORCED, s STRING) COMMENT 'My complex table' DISTRIBUTED BY HASH(k1) INTO 4 BUCKETS WITH ('changelog.mode' = 'append'); ``` ```sql CREATE TABLE t_disjoint (from_key_k INT, k STRING) DISTRIBUTED BY (from_key_k) WITH ('key.fields-prefix' = 'from_key_'); ``` ```sql CREATE TABLE t_joint (k INT, v STRING) DISTRIBUTED BY (k) WITH ('value.fields-include' = 'all'); ``` ```sql 'value.fields-include' = 'all' ``` ```sql CREATE TABLE t_metadata_write (name STRING, ts TIMESTAMP_LTZ(3) NOT NULL METADATA FROM 'timestamp') DISTRIBUTED INTO 1 BUCKETS; ``` ```sql INSERT INTO t (ts, name) SELECT NOW(), 'Alice'; INSERT INTO t (ts, name) SELECT TO_TIMESTAMP_LTZ(0, 3), 'Bob'; SELECT $rowtime, * FROM t; ``` ```sql CREATE TABLE t_raw_string_key (key STRING, i INT) DISTRIBUTED BY (key) WITH ('key.format' = 'raw'); ``` ```sql CREATE TABLE t_shared_schema (key 
STRING, s STRING) DISTRIBUTED BY (key); ``` ```sql CREATE TABLE t_shared_schema (key STRING, s STRING) DISTRIBUTED BY (key); ``` ```sql t_shared_schema-key ``` ```sql t_shared_schema-value ``` ```sql +I['Bob', 42] ``` ```sql -D['Bob', 42] ``` ```sql +U['Alice', 13] ``` ```sql -U['Alice', 13] ``` ```sql CREATE TABLE t_changelog_modes (i BIGINT); ``` ```sql -- works because the query is non-updating INSERT INTO t_changelog_modes SELECT 1; -- does not work because the query is updating, causing an error INSERT INTO t_changelog_modes SELECT COUNT(*) FROM (VALUES (1), (2), (3)); ``` ```sql ALTER TABLE t_changelog_modes SET ('changelog.mode' = 'retract'); ``` ```sql ALTER TABLE t_changelog_modes SET ('changelog.mode' = 'append'); ALTER TABLE t_changelog_modes ADD headers MAP METADATA VIRTUAL; -- Shows what is serialized internally SELECT i, headers FROM t_changelog_modes; ``` ```sql CREATE TABLE t_infinite_retention (i INT) WITH ('kafka.retention.time' = '0'); ``` ```sql "d", "day", "h", "hour", "m", "min", "minute", "ms", "milli", "millisecond", "µs", "micro", "microsecond", "ns", "nano", "nanosecond" ``` ```sql CREATE TABLE `t_raw` ( `key` VARBINARY(2147483647), `val` VARBINARY(2147483647) ) DISTRIBUTED BY HASH(`key`) INTO 2 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'raw' ... ) ``` ```sql INSERT INTO t_raw (key, val) SELECT CAST(NULL AS BYTES), CAST(NULL AS BYTES); ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "i", "type": "int" }, { "name": "s", "type": "string" } ] } ``` ```sql CREATE TABLE `t_raw_key` ( `key` VARBINARY(2147483647), `i` INT NOT NULL, `s` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'avro-registry' ... 
) ``` ```sql INSERT INTO t_raw_key SELECT CAST(NULL AS BYTES), 12, 'Bob'; ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "i", "type": "int" }, { "name": "s", "type": "string" } ] } ``` ```sql CREATE TABLE `t_atomic_key` ( `key` INT NOT NULL, `i` INT NOT NULL, `s` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 2 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'avro-registry', 'value.format' = 'avro-registry' ... ) ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "i", "type": "int" }, { "name": "key", "type": "string" } ] } ``` ```sql CREATE TABLE `t_raw_disjoint` ( `key_key` VARBINARY(2147483647), `i` INT NOT NULL, `key` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`key_key`) INTO 1 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.fields-prefix' = 'key_', 'key.format' = 'raw', 'value.format' = 'avro-registry' ... ) ``` ```sql i INT NOT NULL ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" } ] } ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "name", "type": "string" }, { "name": "zip_code", "type": "string" } ] } ``` ```sql CREATE TABLE `t_sr_disjoint` ( `uid` INT NOT NULL, `name` VARCHAR(2147483647) NOT NULL, `zip_code` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`uid`) INTO 1 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'value.format' = 'avro-registry' ... 
) ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" } ] } ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" },{ "name": "name", "type": "string" }, { "name": "zip_code", "type": "string" } ] } ``` ```sql CREATE TABLE `t_sr_joint` ( `uid` INT NOT NULL, `name` VARCHAR(2147483647) NOT NULL, `zip_code` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`uid`) INTO 1 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'value.fields-include' = 'all', 'value.format' = 'avro-registry' ... ) ``` ```sql 'value.fields-include' = 'all' ``` ```sql ["int", "string"] ``` ```sql CREATE TABLE `t_union` ( `key` VARBINARY(2147483647), `int` INT, `string` VARCHAR(2147483647) ) ... ``` ```sql [ "string", { "type": "record", "name": "User", "fields": [ { "name": "uid", "type": "int" },{ "name": "name", "type": "string" } ] }, { "type": "record", "name": "Address", "fields": [ { "name": "zip_code", "type": "string" } ] } ] ``` ```sql CREATE TABLE `t_union` ( `key` VARBINARY(2147483647), `string` VARCHAR(2147483647), `User` ROW<`uid` INT NOT NULL, `name` VARCHAR(2147483647) NOT NULL>, `Address` ROW<`zip_code` VARCHAR(2147483647) NOT NULL> ) ... ``` ```sql org.myorg.avro.User ``` ```sql syntax = "proto3"; message Purchase { string item = 1; double amount = 2; string customer_id = 3; } message Pageview { string url = 1; bool is_special = 2; string customer_id = 3; } ``` ```sql CREATE TABLE `t` ( `key` VARBINARY(2147483647), `Purchase` ROW< `item` VARCHAR(2147483647) NOT NULL, `amount` DOUBLE NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL >, `Pageview` ROW< `url` VARCHAR(2147483647) NOT NULL, `is_special` BOOLEAN NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL > ) ... 
``` ```sql syntax = "proto3"; message Purchase { string item = 1; double amount = 2; string customer_id = 3; Pageview pageview = 4; } message Pageview { string url = 1; bool is_special = 2; string customer_id = 3; } ``` ```sql CREATE TABLE `t` ( `key` VARBINARY(2147483647), `Purchase` ROW< `item` VARCHAR(2147483647) NOT NULL, `amount` DOUBLE NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL, `pageview` ROW< `url` VARCHAR(2147483647) NOT NULL, `is_special` BOOLEAN NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL > >, `Pageview` ROW< `url` VARCHAR(2147483647) NOT NULL, `is_special` BOOLEAN NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL > ) ... ``` ```sql syntax = "proto3"; message Purchase { string item = 1; double amount = 2; string customer_id = 3; Pageview pageview = 4; message Pageview { string url = 1; bool is_special = 2; string customer_id = 3; } } ``` ```sql CREATE TABLE `t` ( `key` VARBINARY(2147483647), `item` VARCHAR(2147483647) NOT NULL, `amount` DOUBLE NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL, `pageview` ROW< `url` VARCHAR(2147483647) NOT NULL, `is_special` BOOLEAN NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL > ) ... 
``` ```sql { "type": "record", "name": "Customer", "namespace": "io.debezium.data", "fields": [ { "name": "before", "type": ["null", { "type": "record", "name": "Value", "fields": [ {"name": "id", "type": "int"}, {"name": "name", "type": "string"}, {"name": "email", "type": "string"} ] }], "default": null }, { "name": "after", "type": ["null", "Value"], "default": null }, { "name": "source", "type": { "type": "record", "name": "Source", "fields": [ {"name": "version", "type": "string"}, {"name": "connector", "type": "string"}, {"name": "name", "type": "string"}, {"name": "ts_ms", "type": "long"}, {"name": "db", "type": "string"}, {"name": "schema", "type": "string"}, {"name": "table", "type": "string"} ] } }, {"name": "op", "type": "string"}, {"name": "ts_ms", "type": ["null", "long"], "default": null}, {"name": "transaction", "type": ["null", { "type": "record", "name": "Transaction", "fields": [ {"name": "id", "type": "string"}, {"name": "total_order", "type": "long"}, {"name": "data_collection_order", "type": "long"} ] }], "default": null} ] } ``` ```sql CREATE TABLE `customer_changes` ( `key` VARBINARY(2147483647), `id` INT NOT NULL, `name` VARCHAR(2147483647) NOT NULL, `email` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( 'changelog.mode' = 'retract', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'avro-debezium-registry' ... ) ``` ```sql value.format ``` ```sql *-debezium-registry ``` ```sql changelog.mode ``` ```sql cleanup.policy ``` ```sql changelog.mode ``` ```sql changelog.mode ``` ```sql -- Change to upsert mode for primary key-based operations ALTER TABLE customer_changes SET ('changelog.mode' = 'upsert'); -- Change to append mode (processes only inserts and updates) ALTER TABLE customer_changes SET ('changelog.mode' = 'append'); ``` ```sql CREATE TABLE t_perfect_watermark (i INT); -- If multiple events can have the same timestamp. 
ALTER TABLE t_perfect_watermark MODIFY WATERMARK FOR $rowtime AS $rowtime - INTERVAL '0.001' SECOND; -- If a single event can have the timestamp. ALTER TABLE t_perfect_watermark MODIFY WATERMARK FOR $rowtime AS $rowtime; ``` ```sql DESCRIBE `orders`; ``` ```sql +-------------+------------------------+----------+-------------------+ | Column Name | Data Type | Nullable | Extras | +-------------+------------------------+----------+-------------------+ | user | BIGINT | NOT NULL | PRIMARY KEY | | product | STRING | NULL | | | amount | INT | NULL | | | ts | TIMESTAMP(3) *ROWTIME* | NULL | WATERMARK AS `ts` | +-------------+------------------------+----------+-------------------+ ``` ```sql ALTER TABLE `orders` DROP WATERMARK; ``` ```sql Statement phase is COMPLETED. ``` ```sql DESCRIBE `orders`; ``` ```sql +-------------+--------------+----------+-------------+ | Column Name | Data Type | Nullable | Extras | +-------------+--------------+----------+-------------+ | user | BIGINT | NOT NULL | PRIMARY KEY | | product | STRING | NULL | | | amount | INT | NULL | | | ts | TIMESTAMP(3) | NULL | | +-------------+--------------+----------+-------------+ ``` ```sql -- Convert from regular Avro format to Debezium CDC format -- and configure the appropriate Flink changelog interpretation mode: -- * append: Treats each record as an INSERT operation with no relationship between records -- * retract: Handles paired operations (INSERT/UPDATE/DELETE) where changes to the same row -- are represented as a retraction of the old value followed by an addition of the new value -- * upsert: Groups all operations for the primary key (derived from the Kafka message key), -- with each operation effectively merging with or replacing previous state -- (INSERT creates, UPDATE modifies, DELETE removes) ALTER TABLE customer_data SET ( 'value.format' = 'avro-debezium-registry', 'changelog.mode' = 'retract' ); ``` ```sql -- Convert from regular JSON format to Debezium CDC format -- and configure the 
appropriate Flink changelog interpretation mode: -- * append: Treats each record as an INSERT operation with no relationship between records -- * retract: Handles paired operations (INSERT/UPDATE/DELETE) where changes to the same row -- are represented as a retraction of the old value followed by an addition of the new value -- * upsert: Groups all operations for the primary key (derived from the Kafka message key), -- with each operation effectively merging with or replacing previous state -- (INSERT creates, UPDATE modifies, DELETE removes) ALTER TABLE customer_data_json SET ( 'value.format' = 'json-debezium-registry', 'changelog.mode' = 'retract' ); ``` ```sql -- Convert from regular Protobuf format to Debezium CDC format -- and configure the appropriate Flink changelog interpretation mode: -- * append: Treats each record as an INSERT operation with no relationship between records -- * retract: Handles paired operations (INSERT/UPDATE/DELETE) where changes to the same row -- are represented as a retraction of the old value followed by an addition of the new value -- * upsert: Groups all operations for the primary key (derived from the Kafka message key), -- with each operation effectively merging with or replacing previous state -- (INSERT creates, UPDATE modifies, DELETE removes) ALTER TABLE customer_data_proto SET ( 'value.format' = 'proto-debezium-registry', 'changelog.mode' = 'retract' ); ``` ```sql -- Change to append mode (default) -- Best for event streams where each record is independent ALTER TABLE customer_changes SET ( 'changelog.mode' = 'append' ); -- Change to retract mode -- Useful when changes to the same row are represented as paired operations ALTER TABLE customer_changes SET ( 'changelog.mode' = 'retract' ); -- Change upsert mode when working with primary keys -- Best when tracking state changes using a primary key (derived from Kafka message key) ALTER TABLE customer_changes SET ( 'changelog.mode' = 'upsert' ); ``` ```sql -- Create example 
topic CREATE TABLE t_headers (i INT); -- For read-only (virtual) ALTER TABLE t_headers ADD headers MAP<BYTES, BYTES> METADATA VIRTUAL; -- For read and write (persisted). Column becomes mandatory in INSERT INTO. ALTER TABLE t_headers MODIFY headers MAP<BYTES, BYTES> METADATA; -- Use implicit casting (origin is always MAP<BYTES, BYTES>) ALTER TABLE t_headers MODIFY headers MAP<STRING, STRING> METADATA; -- Insert and read INSERT INTO t_headers SELECT 42, MAP['k1', 'v1', 'k2', 'v2']; SELECT * FROM t_headers; ``` ```sql other_name MAP<BYTES, BYTES> METADATA FROM 'headers' VIRTUAL ``` ```sql ALTER TABLE `orders` ADD ( `headers` MAP<BYTES, BYTES> METADATA VIRTUAL); ``` ```sql DESCRIBE `orders`; ``` ```sql +-------------+-------------------+----------+-------------------------+ | Column Name | Data Type | Nullable | Extras | +-------------+-------------------+----------+-------------------------+ | user | BIGINT | NOT NULL | PRIMARY KEY, BUCKET KEY | | product | STRING | NULL | | | amount | INT | NULL | | | ts | TIMESTAMP(3) | NULL | | | headers | MAP<BYTES, BYTES> | NULL | METADATA VIRTUAL | +-------------+-------------------+----------+-------------------------+ ``` ```sql -- Create example topic with 1 partition filled with values CREATE TABLE t_specific_offsets (i INT) DISTRIBUTED INTO 1 BUCKETS; INSERT INTO t_specific_offsets VALUES (1), (2), (3), (4), (5); -- Returns 1, 2, 3, 4, 5 SELECT * FROM t_specific_offsets; -- Changes the scan range ALTER TABLE t_specific_offsets SET ( 'scan.startup.mode' = 'specific-offsets', 'scan.startup.specific-offsets' = 'partition:0,offset:3' ); -- Returns 4, 5 SELECT * FROM t_specific_offsets; ``` ```sql scan.startup.mode ``` ```sql scan.bounded.mode ``` ```sql scan.startup.specific-offsets ``` ```sql scan.bounded.specific-offsets ``` ```sql 'scan.startup.specific-offsets' = 'partition:0,offset:3; partition:1,offset:42; partition:2,offset:0' ``` ```sql -- example table CREATE TABLE t_watermark_debugging (k INT, s STRING) DISTRIBUTED BY (k) INTO 4 BUCKETS; -- Each value lands in a separate Kafka partition (out of 4). 
-- Leave out values to see missing watermarks. INSERT INTO t_watermark_debugging VALUES (1, 'Bob'), (2, 'Alice'), (8, 'John'), (15, 'David'); -- If ROW_NUMBER doesn't show results, it's clearly a watermark issue. SELECT ROW_NUMBER() OVER (ORDER BY $rowtime ASC) AS `number`, * FROM t_watermark_debugging; -- Add partition information as metadata column ALTER TABLE t_watermark_debugging ADD part INT METADATA FROM 'partition' VIRTUAL; -- Use the CURRENT_WATERMARK() function to check which watermark is calculated SELECT *, part AS `Row Partition`, $rowtime AS `Row Timestamp`, CURRENT_WATERMARK($rowtime) AS `Operator Watermark` FROM t_watermark_debugging; -- Visualize the highest timestamp per Kafka partition -- Due to the table declaration (with 4 buckets), this query should show 4 rows. -- If not, the missing partitions might be the cause for watermark issues. SELECT part AS `Partition`, MAX($rowtime) AS `Max Timestamp in Partition` FROM t_watermark_debugging GROUP BY part; -- A workaround could be to not use the system watermark: ALTER TABLE t_watermark_debugging MODIFY WATERMARK FOR $rowtime AS $rowtime - INTERVAL '2' SECOND; -- Or for perfect input data: ALTER TABLE t_watermark_debugging MODIFY WATERMARK FOR $rowtime AS $rowtime - INTERVAL '0.001' SECOND; -- Add "fresh" data while the above statements with -- ROW_NUMBER() or CURRENT_WATERMARK() are running. INSERT INTO t_watermark_debugging VALUES (1, 'Fresh Bob'), (2, 'Fresh Alice'), (8, 'Fresh John'), (15, 'Fresh David'); ``` ```sql -- Create a topic with 4 partitions. CREATE TABLE t_watermark_idle (k INT, s STRING) DISTRIBUTED BY (k) INTO 4 BUCKETS; -- Avoid the "not enough data" problem by using a custom watermark. -- The watermark strategy is still coarse-grained enough for this example. ALTER TABLE t_watermark_idle MODIFY WATERMARK FOR $rowtime AS $rowtime - INTERVAL '2' SECONDS; -- Each value lands in a separate Kafka partition, and partition 1 is empty. 
INSERT INTO t_watermark_idle VALUES (1, 'Bob in partition 0'), (2, 'Alice in partition 3'), (8, 'John in partition 2'); -- Thread 1: Start a streaming job. SELECT ROW_NUMBER() OVER (ORDER BY $rowtime ASC) AS `number`, * FROM t_watermark_idle; -- Thread 2: Insert some data immediately -> Thread 1 still without results. INSERT INTO t_watermark_idle VALUES (1, 'Another Bob in partition 0 shortly after'); -- Thread 2: Insert some data after 15s -> Thread 1 should show results. INSERT INTO t_watermark_idle VALUES (1, 'Another Bob in partition 0 after 15s') ``` ```sql sql.tables.scan.idle-timeout ``` ```sql -- Thread 1: Start a streaming job. -- Lower the idle timeout further. SET 'sql.tables.scan.idle-timeout' = '1s'; SELECT ROW_NUMBER() OVER (ORDER BY $rowtime ASC) AS `number`, * FROM t_watermark_idle; -- Thread 2: Insert some data immediately -> Thread 1 should show results. INSERT INTO t_watermark_idle VALUES (1, 'Another Bob in partition 0 shortly after'); ``` ```sql ALTER TABLE `orders` SET ('value.format.schema-context' = '.lsrc-newcontext'); ``` ```sql Statement phase is COMPLETED. 
``` ```sql SHOW CREATE TABLE `orders`; ``` ```sql +----------------------------------------------------------------------+ | SHOW CREATE TABLE | +----------------------------------------------------------------------+ | CREATE TABLE `catalog`.`database`.`orders` ( | | `user` BIGINT NOT NULL, | | `product` VARCHAR(2147483647), | | `amount` INT, | | `ts` TIMESTAMP(3) | | ) | | DISTRIBUTED BY HASH(`user`) INTO 6 BUCKETS | | WITH ( | | 'changelog.mode' = 'upsert', | | 'connector' = 'confluent', | | 'kafka.cleanup-policy' = 'delete', | | 'kafka.max-message-size' = '2097164 bytes', | | 'kafka.retention.size' = '0 bytes', | | 'kafka.retention.time' = '604800000 ms', | | 'key.format' = 'avro-registry', | | 'scan.bounded.mode' = 'unbounded', | | 'scan.startup.mode' = 'latest-offset', | | 'value.format' = 'avro-registry', | | 'value.format.schema-context' = '.lsrc-newcontext' | | ) | | | +----------------------------------------------------------------------+ ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" } ] } ``` ```sql ALTER TABLE t_metadata_overlap ADD `timestamp` TIMESTAMP_LTZ(3) NOT NULL METADATA; ``` ```sql CREATE TABLE `t_metadata_overlap` ( `key` VARBINARY(2147483647), `uid` INT NOT NULL, `timestamp` TIMESTAMP(3) WITH LOCAL TIME ZONE NOT NULL METADATA ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( ... ) ``` ```sql INSERT INTO t_metadata_overlap SELECT CAST(NULL AS BYTES), 42, TO_TIMESTAMP_LTZ(0, 3); ``` ```sql ALTER TABLE t_metadata_overlap DROP `timestamp`; ALTER TABLE t_metadata_overlap ADD message_timestamp TIMESTAMP_LTZ(3) METADATA FROM 'timestamp'; SELECT * FROM t_metadata_overlap; ``` ```sql CREATE TABLE `t_metadata_overlap` ( `key` VARBINARY(2147483647), `uid` INT NOT NULL, `timestamp` VARCHAR(2147483647), `message_timestamp` TIMESTAMP(3) WITH LOCAL TIME ZONE METADATA FROM 'timestamp' ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( ... 
) ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" } ] } ``` ```sql CREATE TABLE `t_enrich_raw_key` ( `key` VARBINARY(2147483647), `uid` INT NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'avro-registry' ... ) ``` ```sql ALTER TABLE t_enrich_raw_key MODIFY key STRING; ``` ```sql CREATE TABLE `t_enrich_raw_key` ( `key` STRING, `uid` INT NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'avro-registry' ... ) ``` ```sql SHOW CREATE TABLE events; ``` ```sql CREATE TABLE `events` ( `key` VARBINARY(2147483647), `value` VARBINARY(2147483647) ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'raw' ) ``` ```sql ALTER TABLE events SET ( 'value.format' = 'avro-registry', 'value.avro-registry.subject-names' = 'com.example.Order;com.example.Shipment' ); ``` ```sql ALTER TABLE events SET ( 'value.format' = 'json-registry', 'value.json-registry.subject-names' = 'com.example.Order;com.example.Shipment' ); ``` ```sql ALTER TABLE events SET ( 'value.format' = 'proto-registry', 'value.proto-registry.subject-names' = 'com.example.Order;com.example.Shipment' ); ``` ```sql ALTER TABLE events SET ( 'key.format' = 'avro-registry', 'key.avro-registry.subject-names' = 'com.example.OrderKey' ); ``` ```sql ALTER TABLE events SET ( 'key.format' = 'avro-registry', 'key.avro-registry.subject-names' = 'com.example.OrderKey', 'value.format' = 'avro-registry', 'value.avro-registry.subject-names' = 'com.example.Order;com.example.Shipment' ); ``` ```sql avro-registry ``` ```sql json-registry ``` ```sql proto-registry ``` ```sql ALTER TABLE json_table RESET ( 'value.json-registry.wire-encoding', 'value.json-registry.subject-names' ); ``` ```sql ALTER 
TABLE my_table SET ( 'error-handling.mode' = 'log', 'error-handling.log.target' = 'my_error_table' ); ``` ```sql SELECT NOW(); ``` ```sql CREATE TABLE t_union_1 (i INT); CREATE TABLE t_union_2 (i INT); TABLE t_union_1 UNION ALL TABLE t_union_2; -- alternate syntax SELECT * FROM t_union_1 UNION ALL SELECT * FROM t_union_2; ``` ```sql CREATE TABLE t_watermarked_insight (s STRING) DISTRIBUTED INTO 1 BUCKETS; INSERT INTO t_watermarked_insight VALUES ('Bob'), ('Alice'), ('Charly'); SELECT $rowtime, CURRENT_WATERMARK($rowtime) FROM t_watermarked_insight; ``` ```sql $rowtime EXPR$1 2024-04-29 11:59:01.080 NULL 2024-04-29 11:59:01.093 2024-04-04 15:27:37.433 2024-04-29 11:59:01.094 2024-04-04 15:27:37.433 ``` ```sql CREATE TABLE t_flattening (i INT, r1 ROW, r2 ROW); SELECT r1.*, r2.* FROM t_flattening; ``` ```sql { "type":"record", "namespace": "io.confluent.developer.avro", "name":"Purchase", "fields": [ {"name": "item", "type":"string"}, {"name": "amount", "type": "double"}, {"name": "customer_id", "type": "string"} ] } ``` ```sql syntax = "proto3"; package io.confluent.developer.proto; message Purchase { string item = 1; double amount = 2; string customer_id = 3; } ``` ```sql { "$schema": "http://json-schema.org/draft-07/schema#", "title": "Purchase", "type": "object", "properties": { "item": { "type": "string" }, "amount": { "type": "number" }, "customer_id": { "type": "string" } }, "required": ["item", "amount", "customer_id"] } ``` ```sql { "type":"record", "namespace": "io.confluent.developer.avro", "name":"Pageview", "fields": [ {"name": "url", "type":"string"}, {"name": "is_special", "type": "boolean"}, {"name": "customer_id", "type": "string"} ] } ``` ```sql syntax = "proto3"; package io.confluent.developer.proto; message Pageview { string url = 1; bool is_special = 2; string customer_id = 3; } ``` ```sql { "$schema": "http://json-schema.org/draft-07/schema#", "title": "Pageview", "type": "object", "properties": { "url": { "type": "string" }, "is_special": { 
"type": "boolean" }, "customer_id": { "type": "string" } }, "required": ["url", "is_special", "customer_id"] } ``` ```sql [ "io.confluent.developer.avro.Purchase", "io.confluent.developer.avro.Pageview" ] ``` ```sql syntax = "proto3"; package io.confluent.developer.proto; import "purchase.proto"; import "pageview.proto"; message CustomerEvent { oneof action { Purchase purchase = 1; Pageview pageview = 2; } } ``` ```sql { "$schema": "http://json-schema.org/draft-07/schema#", "title": "CustomerEvent", "type": "object", "oneOf": [ { "$ref": "io.confluent.developer.json.Purchase" }, { "$ref": "io.confluent.developer.json.Pageview" } ] } ``` ```sql [ { "name": "io.confluent.developer.avro.Purchase", "subject": "purchase", "version": 1 }, { "name": "io.confluent.developer.avro.Pageview", "subject": "pageview", "version": 1 } ] ``` ```sql [ { "name": "purchase.proto", "subject": "purchase", "version": 1 }, { "name": "pageview.proto", "subject": "pageview", "version": 1 } ] ``` ```sql [ { "name": "io.confluent.developer.json.Purchase", "subject": "purchase", "version": 1 }, { "name": "io.confluent.developer.json.Pageview", "subject": "pageview", "version": 1 } ] ``` ```sql SHOW CREATE TABLE `customer-events`; ``` ```sql CREATE TABLE `customer-events` ( `key` VARBINARY(2147483647), `Purchase` ROW<`item` VARCHAR(2147483647) NOT NULL, `amount` DOUBLE NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL>, `Pageview` ROW<`url` VARCHAR(2147483647) NOT NULL, `is_special` BOOLEAN NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL> ) DISTRIBUTED BY HASH(`key`) INTO 2 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'kafka.cleanup-policy' = 'delete', 'kafka.max-message-size' = '2097164 bytes', 'kafka.retention.size' = '0 bytes', 'kafka.retention.time' = '7 d', 'key.format' = 'raw', 'scan.bounded.mode' = 'unbounded', 'scan.startup.mode' = 'earliest-offset', 'value.format' = '[VALUE_FORMAT]' ) ``` ```sql CREATE TABLE purchase AS SELECT Purchase.* FROM 
`customer-events` WHERE Purchase IS NOT NULL; SELECT * FROM purchase; ``` ```sql CREATE TABLE pageview AS SELECT Pageview.* FROM `customer-events` WHERE Pageview IS NOT NULL; SELECT * FROM pageview; ```

---

### Flink SQL Syntax in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/reference/sql-syntax.html

SQL is a domain-specific language for managing and manipulating data. It’s used primarily to work with structured data, where the types and relationships across entities are well-defined. Originally adopted for relational databases, SQL is rapidly becoming the language of choice for stream processing. It’s declarative, expressive, and ubiquitous.

The American National Standards Institute (ANSI) maintains a standard for the specification of SQL. Flink SQL is compliant with ANSI SQL 2011. Beyond the standard, there are many flavors and extensions to SQL so that it can express programs beyond what’s possible with the SQL 2011 grammar.

#### Lexical structure

Apache Flink® parses SQL using Apache Calcite, which supports standard ANSI SQL.

#### Syntax

Flink SQL inputs are made up of a series of statements. Each statement is made up of a series of tokens and ends in a semicolon (;). The tokens that apply depend on the statement being invoked.

A token is any keyword, identifier, backticked identifier, literal, or special character. By convention, tokens are separated by whitespace, unless there is no ambiguity in the grammar, which is the case when tokens flank a special character.

The following example statements are syntactically valid Flink SQL input:

-- Create a users table.
CREATE TABLE users ( user_id STRING, registertime BIGINT, gender STRING, regionid STRING );
-- Populate the table with mock users data.
INSERT INTO users VALUES ('Thomas A. 
Anderson', 1677260724, 'male', 'Region_4'), ('Trinity', 1677260733, 'female', 'Region_4'), ('Morpheus', 1677260742, 'male', 'Region_8'); SELECT * FROM users;

#### Keywords

Some tokens, such as SELECT, INSERT, and CREATE, are keywords. Keywords are reserved tokens that have a specific meaning in Flink’s syntax. They determine which tokens may surround them and how the statement executes. Keywords are case-insensitive, meaning SELECT and select are equivalent. You can’t use a reserved word as an identifier unless you enclose it in backticks, for example, `table`.

For a complete list of keywords, see Flink SQL Reserved Keywords.

#### Identifiers

Identifiers are symbols that represent user-defined entities, like tables, columns, and other objects. For example, if you have a table named t1, t1 is an identifier for that table. By default, identifiers are case-sensitive, meaning t1 and T1 refer to different tables. Unless an identifier is backticked, it may be composed only of letters, numbers, and underscores. There is no imposed limit on the number of characters.

To make it possible to use any character in an identifier, you can enclose it in backtick characters (`) when you declare and use it. A backticked identifier is useful when you don’t control the data, so it might have special characters, or even keywords. If you want to use a keyword as an identifier, enclose it in backticks, for example: `value` `count`

When you use backticked identifiers, Flink SQL captures the case exactly, and any future references to the identifier are case-sensitive. For example, if you declare the following table:

CREATE TABLE `t1` ( id VARCHAR, `@MY-identifier-table-column!` INT);

You must select from it by backticking the table name and column name and using the original casing:

SELECT `@MY-identifier-table-column!` FROM `t1`;

If you use an invalid identifier without enclosing it in backticks, you receive a SQL parse failed error.
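
As a minimal sketch of the keyword and identifier rules described above, the following statements show case-insensitive keywords and reserved words used as backticked identifiers. The table name `keyword_demo` and its columns are hypothetical, chosen only for illustration; they are not taken from the Confluent documentation.

```sql
-- Keywords are case-insensitive, so these two queries are equivalent.
SELECT 'hello' AS greeting;
select 'hello' as greeting;

-- VALUE and COUNT are reserved words, so they need backticks to serve
-- as column names. `keyword_demo` is a hypothetical table name.
CREATE TABLE `keyword_demo` (
  `value` STRING,
  `count` INT
);

-- Backticked identifiers keep their exact casing, so reference them
-- exactly as they were declared.
SELECT `value`, `count` FROM `keyword_demo`;
```

Without the backticks around value and count, the CREATE TABLE statement would be expected to fail at parse time, because both names are reserved words.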
The following SQL query tries to read records from a table named table-with-dashes and fails to parse, because the dash character (-) is not valid in an identifier that isn’t backticked.

SELECT * FROM table-with-dashes;

The error output resembles:

SQL parse failed. Encountered "-" at line 1, column 20.

You can fix the error by enclosing the identifier with backticks:

SELECT * FROM `table-with-dashes`;

#### Constants

There are three implicitly typed constants, or literals, in Flink SQL: strings, numbers, and booleans.

##### String constants

A string constant is an arbitrary series of characters surrounded by single quotes ('), like 'Hello world'. To include a quote inside of a string literal, escape the quote by prefixing it with another quote, for example, 'You can call me ''Stuart'', or Stu.'

##### Numeric constants

Numeric constants are accepted in the following forms:

- digits
- digits.[digits][e[+-]digits]
- [digits].digits[e[+-]digits]
- digitse[+-]digits

where digits is one or more decimal digits (0 through 9). At least one digit must be present before or after the decimal point, if there is one. At least one digit must follow the exponent symbol e, if there is one. No spaces, underscores, or any other characters are allowed in the constant.

Numeric constants may also have a + or - prefix, but this is considered to be a function applied to the constant, not the constant itself.

Here are some examples of valid numeric constants:

- 5
- 7.2
- 0.0087
- 1.
- .5
- 1e-3
- 1.332434e+2
- +100
- -250

##### Boolean constants

A boolean constant is represented as either the identifier true or false. Boolean constants are not case-sensitive, which means that true evaluates to the same value as TRUE.

#### Operators

Operators are infix functions composed of special characters. Flink SQL doesn’t allow you to define your own operators.

For a complete list of operators, see Comparison Functions in Confluent Cloud for Apache Flink.

#### Special characters

Some characters have a particular meaning that doesn’t correspond to an operator.
The following list describes the special characters and their purposes.

- Parentheses (()) retain their usual meaning in programming languages for grouping expressions and controlling the order of evaluation.
- Brackets ([]) are used to work with arrays, both in their construction and subscript access. They also allow you to key into maps.
- Commas (,) delineate a discrete list of entities.
- The semicolon (;) terminates a SQL statement.
- The asterisk (*), in certain syntax, acts as an “all” qualifier. This is seen most commonly in a SELECT command to retrieve all columns.
- The period (.) accesses a column in a table or a field in a struct data type.

#### Comments

A comment is a string beginning with two dashes. It includes all of the content from the dashes to the end of the line:

-- Here is a comment.

You can also span a comment over multiple lines by using C-style syntax:

/* Here is another comment. */

#### Lexical precedence

Operators are evaluated using the following order of precedence, listed from highest to lowest:

- *, /, %
- +, -
- =, >, <, >=, <=, <>, !=
- NOT
- AND
- BETWEEN, LIKE, OR

In an expression, when two operators have the same precedence level, they’re evaluated left-to-right, based on their position.

You can enclose an expression in parentheses to force precedence or clarify precedence, for example, (5 + 2) * 3.

#### Related content

- Flink SQL Reserved Keywords
- Data Types
- Flink SQL Queries
- DDL Statements in Confluent Cloud for Apache Flink

#### Code Examples ```sql -- Create a users table. CREATE TABLE users ( user_id STRING, registertime BIGINT, gender STRING, regionid STRING ); -- Populate the table with mock users data. INSERT INTO users VALUES ('Thomas A. 
Anderson', 1677260724, 'male', 'Region_4'), ('Trinity', 1677260733, 'female', 'Region_4'), ('Morpheus', 1677260742, 'male', 'Region_8'); SELECT * FROM users; ``` ```sql CREATE TABLE `t1` ( id VARCHAR, `@MY-identifier-table-column!` INT); ``` ```sql SELECT `@MY-identifier-table-column!` FROM `t1`; ``` ```sql SQL parse failed ``` ```sql table-with-dashes ``` ```sql SELECT * FROM table-with-dashes; ``` ```sql SQL parse failed. Encountered "-" at line 1, column 20. ``` ```sql SELECT * FROM `table-with-dashes`; ``` ```sql 'Hello world' ``` ```sql 'You can call me ''Stuart'', or Stu.' ``` ```sql 1.332434e+2 ``` ```sql -- Here is a comment. ``` ```sql /* Here is another comment. */ ``` ```sql (5 + 2) * 3 ```

---

### SQL ALTER CONNECTION Statement in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/reference/statements/alter-connection.html

Confluent Cloud for Apache Flink® supports creating secure connections to external services and data sources. You can use these connections in your Flink statements. Use the ALTER CONNECTION statement to change the API key or credentials of an existing connection.

#### Syntax

ALTER CONNECTION [IF EXISTS] [catalog_name.][db_name.]connection_name SET (key1=val1[, key2=val2]...)

#### Description

Change the API key or credentials of a connection.

Secrets are extracted to the secret store and aren’t displayed in subsequent DESCRIBE CONNECTION statements, the Flink SQL shell, or the Confluent Cloud Console.

Confluent Cloud for Apache Flink makes a best-effort attempt to redact sensitive values from the CREATE CONNECTION and ALTER CONNECTION statements by masking the values for the known sensitive keys. In Confluent Cloud Console, the sensitive values are redacted in the Flink SQL workspace if you navigate away from the workspace and return, or if you reload the page in the browser.
Alternatively, you can use the Confluent CLI commands to create and manage connections. In addition, if syntax in the CREATE CONNECTION statement is incorrect, Confluent Cloud for Apache Flink may not detect the secrets. For example, if you type CREATE CONNECTION my_conn WITH ('ap-key' = 'x'), Flink won’t redact the x, because api-key is misspelled. Examples¶ -- Update the API key for a connection. ALTER CONNECTION `conn-one` SET ('api-key' = ''); -- Update the credentials for a connection. ALTER CONNECTION `my-couchbase-conn` SET ( 'username' = '', 'password' = '' ); Related content¶ CREATE CONNECTION DESCRIBE CONNECTION DROP CONNECTION SHOW CONNECTIONS Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql ALTER CONNECTION [IF EXISTS] [catalog_name.][db_name.]connection_name SET (key1=val1[, key2=val2]...) ``` ```sql CREATE CONNECTION my_conn WITH ('ap-key' = 'x') ``` ```sql -- Update the API key for a connection. ALTER CONNECTION `conn-one` SET ('api-key' = ''); -- Update the credentials for a connection. ALTER CONNECTION `my-couchbase-conn` SET ( 'username' = '', 'password' = '' ); ``` --- ### SQL ALTER MODEL Statement in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/statements/alter-model.html ALTER MODEL Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables real-time inference and prediction with AI models. Use the CREATE MODEL statement to register an AI model. Syntax¶ -- Rename a model. ALTER MODEL [IF EXISTS][catalog_name.][database_name.]model_name RENAME TO [catalog_name.][database_name.]new_model_name -- Alter model options. ALTER MODEL [IF EXISTS] [catalog_name.][database_name.]model_name[$version_id] SET (key1=val1[, key2=val2]...) -- Reset model options. 
ALTER MODEL [IF EXISTS] [catalog_name.][database_name.]model_name[$version_id] RESET (key1[, key2]...) Description¶ Rename an AI model or change model options. Use the $ syntax to change a specific version of a model. For more information, see Model versioning. ALTER MODEL options apply only to model metadata, not model data. If the IF EXISTS clause is provided, and the model or the specified model version doesn’t exist, nothing happens. For RESET, the specified model option keys are reset to their default values. Examples¶ -- Rename a model. ALTER MODEL `my_model` RENAME TO `my_new_model`; -- Check for model existence and rename if it exists. ALTER MODEL IF EXISTS `my_model` RENAME TO `my_new_model`; -- Change options for version 2. ALTER MODEL `my_model$2` SET ( tag = 'prod', description = 'new_description' ); -- Reset the tag option. ALTER MODEL `my_model` RESET (tag); Related content¶ CREATE MODEL DROP MODEL Run an AI Model Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql -- Rename a model. ALTER MODEL [IF EXISTS][catalog_name.][database_name.]model_name RENAME TO [catalog_name.][database_name.]new_model_name -- Alter model options. ALTER MODEL [IF EXISTS] [catalog_name.][database_name.]model_name[$version_id] SET (key1=val1[, key2=val2]...) -- Reset model options. ALTER MODEL [IF EXISTS] [catalog_name.][database_name.]model_name[$version_id] RESET (key1[, key2]...) ``` ```sql -- Rename a model. ALTER MODEL `my_model` RENAME TO `my_new_model`; -- Check for model existence and rename if it exists. ALTER MODEL IF EXISTS `my_model` RENAME TO `my_new_model`; -- Change options for version 2. ALTER MODEL `my_model$2` SET ( tag = 'prod', description = 'new_description' ); -- Reset the tag option.
ALTER MODEL `my_model` RESET (tag) ``` --- ### SQL ALTER TABLE Statement in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/statements/alter-table.html ALTER TABLE Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables changing properties of an existing table. Syntax¶ ALTER TABLE [catalog_name.][db_name.]table_name { ADD (metadata_column_name metadata_column_type METADATA [FROM metadata_key] VIRTUAL [COMMENT column_comment]) | ADD (computed_column_name AS computed_column_expression [COMMENT column_comment]) | MODIFY WATERMARK FOR rowtime_column_name AS watermark_strategy_expression | DROP WATERMARK | SET (key1='value1' [, key2='value2', ...]) | RESET (key1 [, key2, ...]) } Description¶ ALTER TABLE allows you to add metadata columns, computed columns, change or remove the watermark, and modify table properties. Physical columns cannot be added, modified, or dropped within Confluent Cloud for Apache Flink directly, but schemas can be evolved in Schema Registry. Examples¶ The following examples show frequently encountered scenarios with ALTER TABLE. Define a watermark for perfectly ordered data¶ Flink guarantees that rows are always emitted before the watermark is generated. The following statements ensure that for perfectly ordered events, meaning events without time-skew, a watermark can be equal to the timestamp or 1 ms less than the timestamp. CREATE TABLE t_perfect_watermark (i INT); -- If multiple events can have the same timestamp. ALTER TABLE t_perfect_watermark MODIFY WATERMARK FOR $rowtime AS $rowtime - INTERVAL '0.001' SECOND; -- If a single event can have the timestamp. ALTER TABLE t_perfect_watermark MODIFY WATERMARK FOR $rowtime AS $rowtime; Drop your custom watermark strategy¶ Remove the custom watermark strategy to restore the default watermark strategy. View the current table schema and metadata. 
DESCRIBE `orders`; Your output should resemble:
+-------------+------------------------+----------+-------------------+
| Column Name | Data Type              | Nullable | Extras            |
+-------------+------------------------+----------+-------------------+
| user        | BIGINT                 | NOT NULL | PRIMARY KEY       |
| product     | STRING                 | NULL     |                   |
| amount      | INT                    | NULL     |                   |
| ts          | TIMESTAMP(3) *ROWTIME* | NULL     | WATERMARK AS `ts` |
+-------------+------------------------+----------+-------------------+
Remove the watermark strategy of the table. ALTER TABLE `orders` DROP WATERMARK; Your output should resemble: Statement phase is COMPLETED. Check the new table schema and metadata. DESCRIBE `orders`; Your output should resemble:
+-------------+--------------+----------+-------------+
| Column Name | Data Type    | Nullable | Extras      |
+-------------+--------------+----------+-------------+
| user        | BIGINT       | NOT NULL | PRIMARY KEY |
| product     | STRING       | NULL     |             |
| amount      | INT          | NULL     |             |
| ts          | TIMESTAMP(3) | NULL     |             |
+-------------+--------------+----------+-------------+
Configure Debezium format for CDC data¶ Change regular format to Debezium format¶ Note For schemas created after May 19, 2025 at 09:00 UTC, Flink automatically detects Debezium envelopes and configures the appropriate format and changelog mode. Manual conversion is necessary only for older schemas or when you want to override the default behavior.
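Before converting a table by hand, it can help to check which format and changelog mode Flink inferred for it. The following is a minimal sketch, assuming a table named `customer_data` (the name used in the conversion example in this section) exists in the current catalog and database:

```sql
-- Show the inferred table definition, including the
-- 'value.format' and 'changelog.mode' options in the WITH clause.
SHOW CREATE TABLE `customer_data`;
```

If the output already lists a Debezium format, such as 'avro-debezium-registry', manual conversion is probably unnecessary unless you want to override the detected changelog mode.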
For tables that have been inferred with regular formats but contain Debezium CDC (Change Data Capture) data, use the statement that matches the schema format (Avro, JSON Schema, or Protobuf): -- Convert from regular Avro format to Debezium CDC format -- and configure the appropriate Flink changelog interpretation mode: -- * append: Treats each record as an INSERT operation with no relationship between records -- * retract: Handles paired operations (INSERT/UPDATE/DELETE) where changes to the same row -- are represented as a retraction of the old value followed by an addition of the new value -- * upsert: Groups all operations for the primary key (derived from the Kafka message key), -- with each operation effectively merging with or replacing previous state -- (INSERT creates, UPDATE modifies, DELETE removes) ALTER TABLE customer_data SET ( 'value.format' = 'avro-debezium-registry', 'changelog.mode' = 'retract' ); -- Convert from regular JSON format to Debezium CDC format -- and configure the appropriate Flink changelog interpretation mode: -- * append: Treats each record as an INSERT operation with no relationship between records -- * retract: Handles paired operations (INSERT/UPDATE/DELETE) where changes to the same row -- are represented as a retraction of the old value followed by an addition of the new value -- * upsert: Groups all operations for the primary key (derived from the Kafka message key), -- with each operation effectively merging with or replacing previous state -- (INSERT creates, UPDATE modifies, DELETE removes) ALTER TABLE customer_data_json SET ( 'value.format' = 'json-debezium-registry', 'changelog.mode' = 'retract' ); -- Convert from regular Protobuf format to Debezium CDC format -- and configure the appropriate Flink changelog interpretation mode: -- * append: Treats each record as an INSERT operation with no relationship between records -- * retract: Handles paired operations (INSERT/UPDATE/DELETE) where changes to the same row -- are represented as a retraction of the old value followed by an
addition of the new value -- * upsert: Groups all operations for the primary key (derived from the Kafka message key), -- with each operation effectively merging with or replacing previous state -- (INSERT creates, UPDATE modifies, DELETE removes) ALTER TABLE customer_data_proto SET ( 'value.format' = 'proto-debezium-registry', 'changelog.mode' = 'retract' ); Modify Changelog Processing Mode¶ For tables with any type of data that need a different processing mode for handling changes: -- Change to append mode (default) -- Best for event streams where each record is independent ALTER TABLE customer_changes SET ( 'changelog.mode' = 'append' ); -- Change to retract mode -- Useful when changes to the same row are represented as paired operations ALTER TABLE customer_changes SET ( 'changelog.mode' = 'retract' ); -- Change to upsert mode when working with primary keys -- Best when tracking state changes using a primary key (derived from Kafka message key) ALTER TABLE customer_changes SET ( 'changelog.mode' = 'upsert' ); Read and/or write Kafka headers¶ -- Create example topic CREATE TABLE t_headers (i INT); -- For read-only (virtual) ALTER TABLE t_headers ADD headers MAP<BYTES, BYTES> METADATA VIRTUAL; -- For read and write (persisted). Column becomes mandatory in INSERT INTO. ALTER TABLE t_headers MODIFY headers MAP<BYTES, BYTES> METADATA; -- Use implicit casting (origin is always MAP<BYTES, BYTES>) ALTER TABLE t_headers MODIFY headers MAP<STRING, STRING> METADATA; -- Insert and read INSERT INTO t_headers SELECT 42, MAP['k1', 'v1', 'k2', 'v2']; SELECT * FROM t_headers; Properties The metadata key is headers. If you don’t want to name the column this way, use: other_name MAP<BYTES, BYTES> METADATA FROM 'headers' VIRTUAL. Keys of headers must be unique. Multi-key headers are not supported. Add headers as a metadata column¶ You can get the headers of a Kafka record as a map of raw bytes by adding a headers virtual metadata column.
Run the following statement to add the Kafka headers as a metadata column: ALTER TABLE `orders` ADD ( `headers` MAP<BYTES, BYTES> METADATA VIRTUAL); View the new schema. DESCRIBE `orders`; Your output should resemble:
+-------------+-------------------+----------+-------------------------+
| Column Name | Data Type         | Nullable | Extras                  |
+-------------+-------------------+----------+-------------------------+
| user        | BIGINT            | NOT NULL | PRIMARY KEY, BUCKET KEY |
| product     | STRING            | NULL     |                         |
| amount      | INT               | NULL     |                         |
| ts          | TIMESTAMP(3)      | NULL     |                         |
| headers     | MAP<BYTES, BYTES> | NULL     | METADATA VIRTUAL        |
+-------------+-------------------+----------+-------------------------+
Read topic from specific offsets¶ -- Create example topic with 1 partition filled with values CREATE TABLE t_specific_offsets (i INT) DISTRIBUTED INTO 1 BUCKETS; INSERT INTO t_specific_offsets VALUES (1), (2), (3), (4), (5); -- Returns 1, 2, 3, 4, 5 SELECT * FROM t_specific_offsets; -- Changes the scan range ALTER TABLE t_specific_offsets SET ( 'scan.startup.mode' = 'specific-offsets', 'scan.startup.specific-offsets' = 'partition:0,offset:3' ); -- Returns 4, 5 SELECT * FROM t_specific_offsets; Properties scan.startup.mode and scan.bounded.mode control which range in the changelog (Kafka topic) to read. scan.startup.specific-offsets and scan.bounded.specific-offsets define offsets per partition. In the example, only 1 partition is used. For multiple partitions, use the following syntax: 'scan.startup.specific-offsets' = 'partition:0,offset:3; partition:1,offset:42; partition:2,offset:0' Debug “no output” and no watermark cases¶ The root cause for most “no output” cases is that a time-based operation, such as TUMBLE, MATCH_RECOGNIZE, or FOR SYSTEM_TIME AS OF, did not receive recent enough watermarks. The current time of an operator is the minimum watermark of all its inputs, meaning across all tables/topics and their partitions.
If one partition does not emit a watermark, it can affect the entire pipeline. The following statements may be helpful for debugging issues related to watermarks. -- example table CREATE TABLE t_watermark_debugging (k INT, s STRING) DISTRIBUTED BY (k) INTO 4 BUCKETS; -- Each value lands in a separate Kafka partition (out of 4). -- Leave out values to see missing watermarks. INSERT INTO t_watermark_debugging VALUES (1, 'Bob'), (2, 'Alice'), (8, 'John'), (15, 'David'); -- If ROW_NUMBER doesn't show results, it's clearly a watermark issue. SELECT ROW_NUMBER() OVER (ORDER BY $rowtime ASC) AS `number`, * FROM t_watermark_debugging; -- Add partition information as metadata column ALTER TABLE t_watermark_debugging ADD part INT METADATA FROM 'partition' VIRTUAL; -- Use the CURRENT_WATERMARK() function to check which watermark is calculated SELECT *, part AS `Row Partition`, $rowtime AS `Row Timestamp`, CURRENT_WATERMARK($rowtime) AS `Operator Watermark` FROM t_watermark_debugging; -- Visualize the highest timestamp per Kafka partition -- Due to the table declaration (with 4 buckets), this query should show 4 rows. -- If not, the missing partitions might be the cause for watermark issues. SELECT part AS `Partition`, MAX($rowtime) AS `Max Timestamp in Partition` FROM t_watermark_debugging GROUP BY part; -- A workaround could be to not use the system watermark: ALTER TABLE t_watermark_debugging MODIFY WATERMARK FOR $rowtime AS $rowtime - INTERVAL '2' SECOND; -- Or for perfect input data: ALTER TABLE t_watermark_debugging MODIFY WATERMARK FOR $rowtime AS $rowtime - INTERVAL '0.001' SECOND; -- Add "fresh" data while the above statements with -- ROW_NUMBER() or CURRENT_WATERMARK() are running. INSERT INTO t_watermark_debugging VALUES (1, 'Fresh Bob'), (2, 'Fresh Alice'), (8, 'Fresh John'), (15, 'Fresh David'); The debugging examples above won’t solve everything but may help in finding the root cause. 
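As a rough illustration of a time-based operation that stays silent until the watermark advances, the following sketch counts rows in 10-second tumbling windows over the t_watermark_debugging table from the statements above. The window size and the cnt alias are arbitrary choices for illustration, not values from the reference.

```sql
-- A window result is emitted only after the watermark passes the window end,
-- so this query shows no rows while the watermark is stalled.
SELECT window_start, window_end, COUNT(*) AS cnt
FROM TABLE(
  TUMBLE(TABLE t_watermark_debugging, DESCRIPTOR($rowtime), INTERVAL '10' SECOND))
GROUP BY window_start, window_end;
```

If this query returns nothing while a plain SELECT on the same table returns rows, a missing or lagging watermark is a likely cause.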
The system watermark strategy is smart and excludes idle Kafka partitions from the watermark calculation after some time, but at least one partition must produce new data for the “logical clock” with watermarks. Typically, root causes are: Idle Kafka partitions No data in Kafka partitions Not enough data in Kafka partitions Watermark strategy is too conservative No fresh data after warm up with historical data for progressing the logical clock Handle idle partitions for missing watermarks¶ Idle partitions often cause missing watermarks. Also, no data in a partition or infrequent data can be a root cause. -- Create a topic with 4 partitions. CREATE TABLE t_watermark_idle (k INT, s STRING) DISTRIBUTED BY (k) INTO 4 BUCKETS; -- Avoid the "not enough data" problem by using a custom watermark. -- The watermark strategy is still coarse-grained enough for this example. ALTER TABLE t_watermark_idle MODIFY WATERMARK FOR $rowtime AS $rowtime - INTERVAL '2' SECONDS; -- Each value lands in a separate Kafka partition, and partition 1 is empty. INSERT INTO t_watermark_idle VALUES (1, 'Bob in partition 0'), (2, 'Alice in partition 3'), (8, 'John in partition 2'); -- Thread 1: Start a streaming job. SELECT ROW_NUMBER() OVER (ORDER BY $rowtime ASC) AS `number`, * FROM t_watermark_idle; -- Thread 2: Insert some data immediately -> Thread 1 still without results. INSERT INTO t_watermark_idle VALUES (1, 'Another Bob in partition 0 shortly after'); -- Thread 2: Insert some data after 15s -> Thread 1 should show results. INSERT INTO t_watermark_idle VALUES (1, 'Another Bob in partition 0 after 15s') Within the first 15 seconds, all partitions contribute to the watermark calculation, so the first INSERT INTO has no effect because partition 1 is still empty. After 15 seconds, all partitions are marked as idle. No partition contributes to the watermark calculation. But when the second INSERT INTO is executed, it becomes the main driving partition for the logical clock. 
The global watermark jumps to “second INSERT INTO - 2 seconds”. In the following code, the sql.tables.scan.idle-timeout configuration overrides the default idle-detection algorithm, so even an immediate INSERT INTO can be the main driving partition for the logical clock, because all other partitions are marked as idle after 1 second. -- Thread 1: Start a streaming job. -- Lower the idle timeout further. SET 'sql.tables.scan.idle-timeout' = '1s'; SELECT ROW_NUMBER() OVER (ORDER BY $rowtime ASC) AS `number`, * FROM t_watermark_idle; -- Thread 2: Insert some data immediately -> Thread 1 should show results. INSERT INTO t_watermark_idle VALUES (1, 'Another Bob in partition 0 shortly after'); Change the schema context property¶ You can set the schema context for key and value formats to control the namespace for your schema resolution in Schema Registry. Set the schema context for the value format ALTER TABLE `orders` SET ('value.format.schema-context' = '.lsrc-newcontext'); Your output should resemble: Statement phase is COMPLETED. Check the new table properties. 
SHOW CREATE TABLE `orders`; Your output should resemble: +----------------------------------------------------------------------+ | SHOW CREATE TABLE | +----------------------------------------------------------------------+ | CREATE TABLE `catalog`.`database`.`orders` ( | | `user` BIGINT NOT NULL, | | `product` VARCHAR(2147483647), | | `amount` INT, | | `ts` TIMESTAMP(3) | | ) | | DISTRIBUTED BY HASH(`user`) INTO 6 BUCKETS | | WITH ( | | 'changelog.mode' = 'upsert', | | 'connector' = 'confluent', | | 'kafka.cleanup-policy' = 'delete', | | 'kafka.max-message-size' = '2097164 bytes', | | 'kafka.retention.size' = '0 bytes', | | 'kafka.retention.time' = '604800000 ms', | | 'key.format' = 'avro-registry', | | 'scan.bounded.mode' = 'unbounded', | | 'scan.startup.mode' = 'latest-offset', | | 'value.format' = 'avro-registry', | | 'value.format.schema-context' = '.lsrc-newcontext' | | ) | | | +----------------------------------------------------------------------+ Inferred tables schema evolution¶ You can use the ALTER TABLE statement to evolve schemas for inferred tables. The following examples show output from the SHOW CREATE TABLE statement called on the resulting table. Schema Registry columns overlap with computed/metadata columns¶ For the following value schema in Schema Registry: { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" } ] } Evolve a table by adding metadata: ALTER TABLE t_metadata_overlap ADD `timestamp` TIMESTAMP_LTZ(3) NOT NULL METADATA; SHOW CREATE TABLE returns the following output: CREATE TABLE `t_metadata_overlap` ( `key` VARBINARY(2147483647), `uid` INT NOT NULL, `timestamp` TIMESTAMP(3) WITH LOCAL TIME ZONE NOT NULL METADATA ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( ... ) Properties Schema Registry says there is a timestamp physical column, but Flink says there is a timestamp metadata column.
In this case, metadata columns and computed columns have precedence, and Confluent Cloud for Apache Flink removes the physical column from the schema. Because Confluent Cloud for Apache Flink advertises FULL_TRANSITIVE mode, queries still work, and the physical column is set to NULL in the payload: INSERT INTO t_metadata_overlap SELECT CAST(NULL AS BYTES), 42, TO_TIMESTAMP_LTZ(0, 3); Evolve the table by renaming metadata: ALTER TABLE t_metadata_overlap DROP `timestamp`; ALTER TABLE t_metadata_overlap ADD message_timestamp TIMESTAMP_LTZ(3) METADATA FROM 'timestamp'; SELECT * FROM t_metadata_overlap; SHOW CREATE TABLE returns the following output: CREATE TABLE `t_metadata_overlap` ( `key` VARBINARY(2147483647), `uid` INT NOT NULL, `timestamp` VARCHAR(2147483647), `message_timestamp` TIMESTAMP(3) WITH LOCAL TIME ZONE METADATA FROM 'timestamp' ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( ... ) Properties Now, both physical and metadata columns appear and can be accessed for reading and writing. Enrich a column that has no Schema Registry information¶ For the following value schema in Schema Registry: { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" } ] } SHOW CREATE TABLE returns the following output: CREATE TABLE `t_enrich_raw_key` ( `key` VARBINARY(2147483647), `uid` INT NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'avro-registry' ... ) Properties Schema Registry provides only information for the value part. Because the key part is not backed by Schema Registry, the key.format is raw. The default data type of raw is BYTES, but you can change this by using the ALTER TABLE statement. 
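If you only need to inspect the key, casting it in a query may be enough. This is a sketch and not taken from the reference: it assumes the key bytes hold UTF-8 text and that an explicit CAST from BYTES to STRING suits your data. The key_text alias is illustrative.

```sql
-- Read the raw key as text without changing the table definition.
-- Assumes the key bytes are UTF-8 encoded strings.
SELECT CAST(`key` AS STRING) AS key_text, uid
FROM t_enrich_raw_key;
```

Unlike ALTER TABLE, this leaves the stored data type of the key column untouched, so other statements that read the table still see BYTES.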
Evolve the table by giving a raw format column a specific type: ALTER TABLE t_enrich_raw_key MODIFY key STRING; SHOW CREATE TABLE returns the following output: CREATE TABLE `t_enrich_raw_key` ( `key` STRING, `uid` INT NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'avro-registry' ... ) Properties Only changes to simple, atomic types, like INT, BYTES, and STRING are supported, where the binary representation is clear. For more complex modifications, use Schema Registry. In multi-cluster scenarios, the ALTER TABLE statement must be executed for every cluster, because the data type for key is stored in the Flink regional metastore. Configure Schema Registry subject names¶ When working with topics that use RecordNameStrategy or TopicRecordNameStrategy, you can configure the subject names for the schema resolution in Schema Registry. This is particularly useful when handling multiple event types in a single topic. 
For topics using these strategies, Flink initially infers a raw binary table: SHOW CREATE TABLE events; Your output will show a raw binary structure: CREATE TABLE `events` ( `key` VARBINARY(2147483647), `value` VARBINARY(2147483647) ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'raw' ) Configure value schema subject names by using the statement that matches your schema format. Avro: ALTER TABLE events SET ( 'value.format' = 'avro-registry', 'value.avro-registry.subject-names' = 'com.example.Order;com.example.Shipment' ); JSON Schema: ALTER TABLE events SET ( 'value.format' = 'json-registry', 'value.json-registry.subject-names' = 'com.example.Order;com.example.Shipment' ); Protobuf: ALTER TABLE events SET ( 'value.format' = 'proto-registry', 'value.proto-registry.subject-names' = 'com.example.Order;com.example.Shipment' ); If your topic uses keyed messages, you can also configure the key format: ALTER TABLE events SET ( 'key.format' = 'avro-registry', 'key.avro-registry.subject-names' = 'com.example.OrderKey' ); You can configure both key and value schema subject names in a single statement: ALTER TABLE events SET ( 'key.format' = 'avro-registry', 'key.avro-registry.subject-names' = 'com.example.OrderKey', 'value.format' = 'avro-registry', 'value.avro-registry.subject-names' = 'com.example.Order;com.example.Shipment' ); Properties: Use semicolons (;) to separate multiple subject names. Subject names must exactly match the names registered in Schema Registry. The format prefix (avro-registry, json-registry, or proto-registry) must match the schema format in Schema Registry. Reset a key value¶ You can use the RESET option to set any key to its default value. The following example shows how to reset a table that has a JSON Schema back to raw format.
ALTER TABLE json_table RESET ( 'value.json-registry.wire-encoding', 'value.json-registry.subject-names' ); Custom error handling¶ You can use ALTER TABLE with the error-handling.mode and error-handling.log.target table properties to set custom error handling for deserialization errors. The following code example shows how to log errors to the specified Dead Letter Queue (DLQ) table and enable processing to continue. ALTER TABLE my_table SET ( 'error-handling.mode' = 'log', 'error-handling.log.target' = 'my_error_table' ); Related content¶ Video: How to Set Idle Timeouts #### Code Examples ```sql ALTER TABLE [catalog_name.][db_name.]table_name { ADD (metadata_column_name metadata_column_type METADATA [FROM metadata_key] VIRTUAL [COMMENT column_comment]) | ADD (computed_column_name AS computed_column_expression [COMMENT column_comment]) | MODIFY WATERMARK FOR rowtime_column_name AS watermark_strategy_expression | DROP WATERMARK | SET (key1='value1' [, key2='value2', ...]) | RESET (key1 [, key2, ...]) } ``` ```sql CREATE TABLE t_perfect_watermark (i INT); -- If multiple events can have the same timestamp. ALTER TABLE t_perfect_watermark MODIFY WATERMARK FOR $rowtime AS $rowtime - INTERVAL '0.001' SECOND; -- If a single event can have the timestamp. ALTER TABLE t_perfect_watermark MODIFY WATERMARK FOR $rowtime AS $rowtime; ``` ```sql DESCRIBE `orders`; ``` ```sql +-------------+------------------------+----------+-------------------+ | Column Name | Data Type | Nullable | Extras | +-------------+------------------------+----------+-------------------+ | user | BIGINT | NOT NULL | PRIMARY KEY | | product | STRING | NULL | | | amount | INT | NULL | | | ts | TIMESTAMP(3) *ROWTIME* | NULL | WATERMARK AS `ts` | +-------------+------------------------+----------+-------------------+ ``` ```sql ALTER TABLE `orders` DROP WATERMARK; ``` ```sql Statement phase is COMPLETED. 
``` ```sql DESCRIBE `orders`; ``` ```sql +-------------+--------------+----------+-------------+ | Column Name | Data Type | Nullable | Extras | +-------------+--------------+----------+-------------+ | user | BIGINT | NOT NULL | PRIMARY KEY | | product | STRING | NULL | | | amount | INT | NULL | | | ts | TIMESTAMP(3) | NULL | | +-------------+--------------+----------+-------------+ ``` ```sql -- Convert from regular Avro format to Debezium CDC format -- and configure the appropriate Flink changelog interpretation mode: -- * append: Treats each record as an INSERT operation with no relationship between records -- * retract: Handles paired operations (INSERT/UPDATE/DELETE) where changes to the same row -- are represented as a retraction of the old value followed by an addition of the new value -- * upsert: Groups all operations for the primary key (derived from the Kafka message key), -- with each operation effectively merging with or replacing previous state -- (INSERT creates, UPDATE modifies, DELETE removes) ALTER TABLE customer_data SET ( 'value.format' = 'avro-debezium-registry', 'changelog.mode' = 'retract' ); ``` ```sql -- Convert from regular JSON format to Debezium CDC format -- and configure the appropriate Flink changelog interpretation mode: -- * append: Treats each record as an INSERT operation with no relationship between records -- * retract: Handles paired operations (INSERT/UPDATE/DELETE) where changes to the same row -- are represented as a retraction of the old value followed by an addition of the new value -- * upsert: Groups all operations for the primary key (derived from the Kafka message key), -- with each operation effectively merging with or replacing previous state -- (INSERT creates, UPDATE modifies, DELETE removes) ALTER TABLE customer_data_json SET ( 'value.format' = 'json-debezium-registry', 'changelog.mode' = 'retract' ); ``` ```sql -- Convert from regular Protobuf format to Debezium CDC format -- and configure the appropriate Flink 
changelog interpretation mode: -- * append: Treats each record as an INSERT operation with no relationship between records -- * retract: Handles paired operations (INSERT/UPDATE/DELETE) where changes to the same row -- are represented as a retraction of the old value followed by an addition of the new value -- * upsert: Groups all operations for the primary key (derived from the Kafka message key), -- with each operation effectively merging with or replacing previous state -- (INSERT creates, UPDATE modifies, DELETE removes) ALTER TABLE customer_data_proto SET ( 'value.format' = 'proto-debezium-registry', 'changelog.mode' = 'retract' ); ``` ```sql -- Change to append mode (default) -- Best for event streams where each record is independent ALTER TABLE customer_changes SET ( 'changelog.mode' = 'append' ); -- Change to retract mode -- Useful when changes to the same row are represented as paired operations ALTER TABLE customer_changes SET ( 'changelog.mode' = 'retract' ); -- Change to upsert mode when working with primary keys -- Best when tracking state changes using a primary key (derived from Kafka message key) ALTER TABLE customer_changes SET ( 'changelog.mode' = 'upsert' ); ``` ```sql -- Create example topic CREATE TABLE t_headers (i INT); -- For read-only (virtual) ALTER TABLE t_headers ADD headers MAP<BYTES, BYTES> METADATA VIRTUAL; -- For read and write (persisted). Column becomes mandatory in INSERT INTO.
ALTER TABLE t_headers MODIFY headers MAP<BYTES, BYTES> METADATA; -- Use implicit casting (origin is always MAP<BYTES, BYTES>) ALTER TABLE t_headers MODIFY headers MAP<STRING, STRING> METADATA; -- Insert and read INSERT INTO t_headers SELECT 42, MAP['k1', 'v1', 'k2', 'v2']; SELECT * FROM t_headers; ``` ```sql other_name MAP<BYTES, BYTES> METADATA FROM 'headers' VIRTUAL ``` ```sql ALTER TABLE `orders` ADD ( `headers` MAP<BYTES, BYTES> METADATA VIRTUAL); ``` ```sql DESCRIBE `orders`; ``` ```sql
+-------------+-------------------+----------+-------------------------+
| Column Name | Data Type         | Nullable | Extras                  |
+-------------+-------------------+----------+-------------------------+
| user        | BIGINT            | NOT NULL | PRIMARY KEY, BUCKET KEY |
| product     | STRING            | NULL     |                         |
| amount      | INT               | NULL     |                         |
| ts          | TIMESTAMP(3)      | NULL     |                         |
| headers     | MAP<BYTES, BYTES> | NULL     | METADATA VIRTUAL        |
+-------------+-------------------+----------+-------------------------+
``` ```sql -- Create example topic with 1 partition filled with values CREATE TABLE t_specific_offsets (i INT) DISTRIBUTED INTO 1 BUCKETS; INSERT INTO t_specific_offsets VALUES (1), (2), (3), (4), (5); -- Returns 1, 2, 3, 4, 5 SELECT * FROM t_specific_offsets; -- Changes the scan range ALTER TABLE t_specific_offsets SET ( 'scan.startup.mode' = 'specific-offsets', 'scan.startup.specific-offsets' = 'partition:0,offset:3' ); -- Returns 4, 5 SELECT * FROM t_specific_offsets; ``` ```sql scan.startup.mode ``` ```sql scan.bounded.mode ``` ```sql scan.startup.specific-offsets ``` ```sql scan.bounded.specific-offsets ``` ```sql 'scan.startup.specific-offsets' = 'partition:0,offset:3; partition:1,offset:42; partition:2,offset:0' ``` ```sql -- example table CREATE TABLE t_watermark_debugging (k INT, s STRING) DISTRIBUTED BY (k) INTO 4 BUCKETS; -- Each value lands in a separate Kafka partition (out of 4). -- Leave out values to see missing watermarks. INSERT INTO t_watermark_debugging VALUES (1, 'Bob'), (2, 'Alice'), (8, 'John'), (15, 'David'); -- If ROW_NUMBER doesn't show results, it's clearly a watermark issue.
SELECT ROW_NUMBER() OVER (ORDER BY $rowtime ASC) AS `number`, * FROM t_watermark_debugging; -- Add partition information as metadata column ALTER TABLE t_watermark_debugging ADD part INT METADATA FROM 'partition' VIRTUAL; -- Use the CURRENT_WATERMARK() function to check which watermark is calculated SELECT *, part AS `Row Partition`, $rowtime AS `Row Timestamp`, CURRENT_WATERMARK($rowtime) AS `Operator Watermark` FROM t_watermark_debugging; -- Visualize the highest timestamp per Kafka partition -- Due to the table declaration (with 4 buckets), this query should show 4 rows. -- If not, the missing partitions might be the cause for watermark issues. SELECT part AS `Partition`, MAX($rowtime) AS `Max Timestamp in Partition` FROM t_watermark_debugging GROUP BY part; -- A workaround could be to not use the system watermark: ALTER TABLE t_watermark_debugging MODIFY WATERMARK FOR $rowtime AS $rowtime - INTERVAL '2' SECOND; -- Or for perfect input data: ALTER TABLE t_watermark_debugging MODIFY WATERMARK FOR $rowtime AS $rowtime - INTERVAL '0.001' SECOND; -- Add "fresh" data while the above statements with -- ROW_NUMBER() or CURRENT_WATERMARK() are running. INSERT INTO t_watermark_debugging VALUES (1, 'Fresh Bob'), (2, 'Fresh Alice'), (8, 'Fresh John'), (15, 'Fresh David'); ``` ```sql -- Create a topic with 4 partitions. CREATE TABLE t_watermark_idle (k INT, s STRING) DISTRIBUTED BY (k) INTO 4 BUCKETS; -- Avoid the "not enough data" problem by using a custom watermark. -- The watermark strategy is still coarse-grained enough for this example. ALTER TABLE t_watermark_idle MODIFY WATERMARK FOR $rowtime AS $rowtime - INTERVAL '2' SECONDS; -- Each value lands in a separate Kafka partition, and partition 1 is empty. INSERT INTO t_watermark_idle VALUES (1, 'Bob in partition 0'), (2, 'Alice in partition 3'), (8, 'John in partition 2'); -- Thread 1: Start a streaming job. 
SELECT ROW_NUMBER() OVER (ORDER BY $rowtime ASC) AS `number`, * FROM t_watermark_idle; -- Thread 2: Insert some data immediately -> Thread 1 still without results. INSERT INTO t_watermark_idle VALUES (1, 'Another Bob in partition 0 shortly after'); -- Thread 2: Insert some data after 15s -> Thread 1 should show results. INSERT INTO t_watermark_idle VALUES (1, 'Another Bob in partition 0 after 15s'); ``` ```sql sql.tables.scan.idle-timeout ``` ```sql -- Thread 1: Start a streaming job. -- Lower the idle timeout further. SET 'sql.tables.scan.idle-timeout' = '1s'; SELECT ROW_NUMBER() OVER (ORDER BY $rowtime ASC) AS `number`, * FROM t_watermark_idle; -- Thread 2: Insert some data immediately -> Thread 1 should show results. INSERT INTO t_watermark_idle VALUES (1, 'Another Bob in partition 0 shortly after'); ``` ```sql ALTER TABLE `orders` SET ('value.format.schema-context' = '.lsrc-newcontext'); ``` ```sql Statement phase is COMPLETED. ``` ```sql SHOW CREATE TABLE `orders`; ``` ```sql +----------------------------------------------------------------------+ | SHOW CREATE TABLE | +----------------------------------------------------------------------+ | CREATE TABLE `catalog`.`database`.`orders` ( | | `user` BIGINT NOT NULL, | | `product` VARCHAR(2147483647), | | `amount` INT, | | `ts` TIMESTAMP(3) | | ) | | DISTRIBUTED BY HASH(`user`) INTO 6 BUCKETS | | WITH ( | | 'changelog.mode' = 'upsert', | | 'connector' = 'confluent', | | 'kafka.cleanup-policy' = 'delete', | | 'kafka.max-message-size' = '2097164 bytes', | | 'kafka.retention.size' = '0 bytes', | | 'kafka.retention.time' = '604800000 ms', | | 'key.format' = 'avro-registry', | | 'scan.bounded.mode' = 'unbounded', | | 'scan.startup.mode' = 'latest-offset', | | 'value.format' = 'avro-registry', | | 'value.format.schema-context' = '.lsrc-newcontext' | | ) | | | +----------------------------------------------------------------------+ ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type":
"int" } ] } ``` ```sql ALTER TABLE t_metadata_overlap ADD `timestamp` TIMESTAMP_LTZ(3) NOT NULL METADATA; ``` ```sql CREATE TABLE `t_metadata_overlap` ( `key` VARBINARY(2147483647), `uid` INT NOT NULL, `timestamp` TIMESTAMP(3) WITH LOCAL TIME ZONE NOT NULL METADATA ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( ... ) ``` ```sql INSERT INTO t_metadata_overlap SELECT CAST(NULL AS BYTES), 42, TO_TIMESTAMP_LTZ(0, 3); ``` ```sql ALTER TABLE t_metadata_overlap DROP `timestamp`; ALTER TABLE t_metadata_overlap ADD message_timestamp TIMESTAMP_LTZ(3) METADATA FROM 'timestamp'; SELECT * FROM t_metadata_overlap; ``` ```sql CREATE TABLE `t_metadata_overlap` ( `key` VARBINARY(2147483647), `uid` INT NOT NULL, `timestamp` VARCHAR(2147483647), `message_timestamp` TIMESTAMP(3) WITH LOCAL TIME ZONE METADATA FROM 'timestamp' ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( ... ) ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" } ] } ``` ```sql CREATE TABLE `t_enrich_raw_key` ( `key` VARBINARY(2147483647), `uid` INT NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'avro-registry' ... ) ``` ```sql ALTER TABLE t_enrich_raw_key MODIFY key STRING; ``` ```sql CREATE TABLE `t_enrich_raw_key` ( `key` STRING, `uid` INT NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'avro-registry' ... 
) ``` ```sql SHOW CREATE TABLE events; ``` ```sql CREATE TABLE `events` ( `key` VARBINARY(2147483647), `value` VARBINARY(2147483647) ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'raw' ) ``` ```sql ALTER TABLE events SET ( 'value.format' = 'avro-registry', 'value.avro-registry.subject-names' = 'com.example.Order;com.example.Shipment' ); ``` ```sql ALTER TABLE events SET ( 'value.format' = 'json-registry', 'value.json-registry.subject-names' = 'com.example.Order;com.example.Shipment' ); ``` ```sql ALTER TABLE events SET ( 'value.format' = 'proto-registry', 'value.proto-registry.subject-names' = 'com.example.Order;com.example.Shipment' ); ``` ```sql ALTER TABLE events SET ( 'key.format' = 'avro-registry', 'key.avro-registry.subject-names' = 'com.example.OrderKey' ); ``` ```sql ALTER TABLE events SET ( 'key.format' = 'avro-registry', 'key.avro-registry.subject-names' = 'com.example.OrderKey', 'value.format' = 'avro-registry', 'value.avro-registry.subject-names' = 'com.example.Order;com.example.Shipment' ); ``` ```sql avro-registry ``` ```sql json-registry ``` ```sql proto-registry ``` ```sql ALTER TABLE json_table RESET ( 'value.json-registry.wire-encoding', 'value.json-registry.subject-names' ); ``` ```sql ALTER TABLE my_table SET ( 'error-handling.mode' = 'log', 'error-handling.log.target' = 'my_error_table' ); ``` --- ### SQL ALTER VIEW Statement in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/statements/alter-view.html ALTER VIEW Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables modifying properties of an existing view. 
Syntax¶ ALTER VIEW [catalog_name.][db_name.]view_name RENAME TO new_view_name ALTER VIEW [catalog_name.][db_name.]view_name AS new_statement_expression Description¶ ALTER VIEW enables you to change the name of a view or modify the statement expression that defines the view. The first syntax enables renaming a view within the same catalog and database. The new view name must not already exist in the catalog and database. The second syntax enables changing the underlying statement that defines the view. The new statement expression must be a valid SELECT statement supported by Flink SQL. The schema of the new statement expression must be compatible with the schema of the existing view. Examples¶ The following examples show frequently encountered scenarios with ALTER VIEW. Rename a view¶ In the Confluent CLI or in a Cloud Console workspace, run the following commands to rename a view. Create a view. CREATE VIEW customer_orders AS SELECT customer_id, SUM(price) AS total_spent FROM `examples`.`marketplace`.`orders` GROUP BY customer_id; Rename the view. ALTER VIEW customer_orders RENAME TO vip_customers; Your output should resemble: Statement phase is COMPLETED. Query the renamed view. SELECT * FROM vip_customers; The statement now references the view by its new name. Change the statement expression of a view¶ View the current definition of the view. SHOW CREATE VIEW vip_customers; Your output should resemble: +------------------------------------------------------------------------------+ | SHOW CREATE VIEW | +------------------------------------------------------------------------------+ | CREATE VIEW vip_customers AS SELECT customer_id, SUM(price) AS total_spent | | FROM orders | | GROUP BY customer_id; | +------------------------------------------------------------------------------+ Change the statement expression of the view. 
ALTER VIEW vip_customers AS SELECT customer_id, SUM(price) AS total_spent, COUNT(*) AS order_count FROM `examples`.`marketplace`.`orders` GROUP BY customer_id HAVING SUM(price) > 1000; Your output should resemble: Statement phase is COMPLETED. View the updated definition of the view. SHOW CREATE VIEW vip_customers; Your output should resemble: +-----------------------------------------------------------------------------------------------------+ | SHOW CREATE VIEW | +-----------------------------------------------------------------------------------------------------+ | CREATE VIEW vip_customers AS SELECT customer_id, SUM(price) AS total_spent, COUNT(*) AS order_count | | FROM orders | | GROUP BY customer_id | | HAVING SUM(price) > 1000; | +-----------------------------------------------------------------------------------------------------+ The view now includes an additional order_count column representing the number of orders per customer, and filters for only those customers who have spent more than 1000. Related content¶ CREATE VIEW statement SELECT statement Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql ALTER VIEW [catalog_name.][db_name.]view_name RENAME TO new_view_name ALTER VIEW [catalog_name.][db_name.]view_name AS new_statement_expression ``` ```sql CREATE VIEW customer_orders AS SELECT customer_id, SUM(price) AS total_spent FROM `examples`.`marketplace`.`orders` GROUP BY customer_id; ``` ```sql ALTER VIEW customer_orders RENAME TO vip_customers; ``` ```sql Statement phase is COMPLETED. 
``` ```sql SELECT * FROM vip_customers; ``` ```sql SHOW CREATE VIEW vip_customers; ``` ```sql +------------------------------------------------------------------------------+ | SHOW CREATE VIEW | +------------------------------------------------------------------------------+ | CREATE VIEW vip_customers AS SELECT customer_id, SUM(price) AS total_spent | | FROM orders | | GROUP BY customer_id; | +------------------------------------------------------------------------------+ ``` ```sql ALTER VIEW vip_customers AS SELECT customer_id, SUM(price) AS total_spent, COUNT(*) AS order_count FROM `examples`.`marketplace`.`orders` GROUP BY customer_id HAVING SUM(price) > 1000; ``` ```sql Statement phase is COMPLETED. ``` ```sql SHOW CREATE VIEW vip_customers; ``` ```sql +-----------------------------------------------------------------------------------------------------+ | SHOW CREATE VIEW | +-----------------------------------------------------------------------------------------------------+ | CREATE VIEW vip_customers AS SELECT customer_id, SUM(price) AS total_spent, COUNT(*) AS order_count | | FROM orders | | GROUP BY customer_id | | HAVING SUM(price) > 1000; | +-----------------------------------------------------------------------------------------------------+ ``` ```sql order_count ``` --- ### SQL CREATE CONNECTION Statement in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/statements/create-connection.html CREATE CONNECTION Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® supports creating secure connections to external services and data sources. You can use these connections in your Flink statements. Connections are resources that you define to configure parameters needed for connecting to third-party services. Connections include endpoint and authentication information. 
They provide a way to handle sensitive information such as credentials while ensuring security. Connections are essential for Confluent AI features and Flink UDFs that make secure calls to external services. For more information, see Reuse Confluent Cloud Connections With External Services. A connection has its own lifecycle and can be created, managed, updated, or deleted by users with appropriate permissions. For more information, see Manage Connections. Confluent Cloud for Apache Flink makes a best-effort attempt to redact sensitive values from the CREATE CONNECTION and ALTER CONNECTION statements by masking the values for the known sensitive keys. In Confluent Cloud Console, the sensitive values are redacted in the Flink SQL workspace if you navigate away from the workspace and return, or if you reload the page in the browser. Alternatively, you can use the Confluent CLI commands to create and manage connections. If the syntax of the CREATE CONNECTION statement is incorrect, Confluent Cloud for Apache Flink may not detect the secrets. For example, if you type CREATE CONNECTION my_conn WITH ('ap-key' = 'x'), Flink won’t redact the x, because api-key is misspelled. Note Connection resources are an Open Preview feature in Confluent Cloud. A Preview feature is a Confluent Cloud component that is being introduced to gain early feedback from developers. Preview features can be used for evaluation and non-production testing purposes or to provide feedback to Confluent. The warranty, SLA, and Support Services provisions of your agreement with Confluent do not apply to Preview features. Confluent may discontinue providing preview releases of the Preview features at any time in Confluent’s sole discretion.
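The redaction caveat described earlier in this section depends on the option keys being spelled exactly as the syntax expects. As a minimal sketch, assuming an OpenAI connection, the following statement spells the key as `api-key`, so its value is a candidate for masking. The connection name `my-openai-connection` and the placeholder secret are hypothetical.

```sql
-- Hypothetical connection name and placeholder secret, for illustration only.
-- The key is spelled 'api-key', so it can be recognized as a known sensitive key.
CREATE CONNECTION `my-openai-connection`
  WITH (
    'type' = 'openai',
    'endpoint' = 'https://api.openai.com/v1/chat/completions',
    'api-key' = '<your-api-key>'
  );
```

Because redaction is best-effort, it may still be safer to review a statement for typos in key names before running it.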
Syntax¶ CREATE [OR REPLACE] CONNECTION [IF NOT EXISTS] [catalog_name.][db_name.]connection_name [COMMENT connection_comment] WITH ( 'type' = '', 'endpoint' = '', ['sse-endpoint' = ''], ['api-key' = 'api_key'] | ['username' = 'user_name', 'password' = 'user_password'] | ['aws-access-key' = '', 'aws-secret-key' = '', 'aws-session-token' = ''] | ); Description¶ Create a new secure connection to an external service or data source. Change the authorization settings of an existing connection by using the ALTER CONNECTION statement. To remove a connection from the current database, use the DROP CONNECTION statement. Confluent Cloud for Apache Flink supports these authentication methods: Basic: username and password. The credentials are added to the HTTP request as a BASIC header. Bearer: token. The credentials are added to the HTTP request as a BEARER header. OAuth: token-endpoint, client-id, client-secret, and scope. The provided options are used to retrieve the OAuth token from the token endpoint and add the token to the HTTP request as a BEARER token. Connection types¶ The following connection types are supported: azureml azureopenai bedrock confluent_jdbc couchbase elastic googleai mcp_server mongodb openai pinecone rest sagemaker vertexai Authorization¶ Depending on the connection type, the following authorization methods are supported: API key: azureml, azureopenai, elastic, googleai, mcp_server, openai, pinecone basic: mongodb, couchbase, confluent_jdbc, or rest bearer: rest or mcp_server connections oauth: rest or mcp_server connections Secrets are extracted to the secret store and aren’t displayed in subsequent DESCRIBE CONNECTION statements, the Flink SQL shell, or the Confluent Cloud Console. The maximum secret length is 4000 bytes, which is checked after the string is converted to bytes. 
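As a minimal sketch of the basic authorization method listed above, the following statements create a REST connection with a username and password and then inspect it. The connection name `my-rest-basic-connection` and the placeholder credentials are hypothetical, and the exact DESCRIBE CONNECTION output isn't shown here because it isn't part of this section.

```sql
-- Hypothetical connection name and placeholder credentials, for illustration only.
CREATE CONNECTION `my-rest-basic-connection`
  WITH (
    'type' = 'REST',
    'endpoint' = 'https://myrest.connection.com',
    'username' = '<your-username>',
    'password' = '<your-password>'
  );

-- Inspect the connection. Secrets are held in the secret store,
-- so the password isn't expected to appear in the result.
DESCRIBE CONNECTION `my-rest-basic-connection`;
```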
Examples¶ -- example AzureML connection with API key CREATE CONNECTION `my-azureml-connection` WITH ( 'type' = 'AZUREML', 'endpoint' = 'https://myworkspace.myregion.inference.ml.azure.com/test', 'api_key' = '' ); -- example AzureML connection with comment CREATE CONNECTION `my-azureml-connection` COMMENT 'Connection Comment' WITH ( 'type' = 'AZUREML', 'endpoint' = 'https://myworkspace.myregion.inference.ml.azure.com/test', 'api_key' = '' ); -- example Couchbase connection with basic authorization CREATE CONNECTION `my-couchbase-connection` WITH ( 'type' = 'COUCHBASE', 'endpoint' = 'couchbases://my-cluster.cloud.couchbase.com', 'username' = '', 'password' = '' ); -- example Bedrock connection with AWS authentication CREATE CONNECTION `my-bedrock-connection` WITH ( 'type' = 'BEDROCK', 'endpoint' = 'https://bedrock-runtime.us-east-1.amazonaws.com/model/my-model/invoke', 'aws-access-key' = '', 'aws-secret-key' = '', 'aws-session-token' = '' ); -- example REST connection with bearer token CREATE CONNECTION `my-rest-connection` WITH ( 'type' = 'REST', 'endpoint' = 'https://myrest.connection.com', 'token' = '' ); -- example MCP server connection with OAuth CREATE CONNECTION `my-mcp-connection` WITH ( 'type' = 'MCP_SERVER', 'endpoint' = 'https://mymcp.connection.com', 'scope' = '', 'token-endpoint' = '', 'client-id' = '', 'client-secret' = '' ); MongoDB external table¶ -- Create a MongoDB connection with basic authorization. CREATE CONNECTION `my-mongodb-connection` WITH ( 'type' = 'MONGODB', 'endpoint' = 'mongodb+srv://myCluster.mongodb.net/myDatabase', 'username' = '', 'password' = '' ); -- Use the MongoDB connection to create a MongoDB external table. 
CREATE TABLE mongodb_movies_full_text_search ( title STRING, plot STRING ) WITH ( 'connector' = 'mongodb', 'mongodb.connection' = 'my-mongodb-connection', 'mongodb.database' = 'sample_mflix', 'mongodb.collection' = 'movies', 'mongodb.index' = 'default' ); Confluent JDBC¶ -- Create a Confluent JDBC connection with basic authorization. CREATE CONNECTION `jdbc-postgres-connection` WITH ( 'type' = 'confluent_jdbc', 'endpoint' = 'jdbc:postgresql://my.example.com:5432/mydatabase', 'username' = '', 'password' = ''); -- Use the Confluent JDBC connection to create a table. CREATE TABLE jdbc_postgres ( show_id STRING, type STRING, title STRING, cast_members STRING, country STRING, date_added DATE, release_year INT, rating STRING, duration STRING, listed_in STRING, description STRING ) WITH ( 'connector' = 'confluent-jdbc', 'confluent-jdbc.connection' = 'jdbc-postgres-connection', 'confluent-jdbc.table-name' = 'netflix_shows' ); Related content¶ ALTER CONNECTION DESCRIBE CONNECTION DROP CONNECTION SHOW CONNECTIONS Reuse Confluent Cloud Connections With External Services Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. 
#### Code Examples ```sql CREATE CONNECTION my_conn WITH ('ap-key' = 'x') ``` ```sql CREATE [OR REPLACE] CONNECTION [IF NOT EXISTS] [catalog_name.][db_name.]connection_name [COMMENT connection_comment] WITH ( 'type' = '', 'endpoint' = '', ['sse-endpoint' = ''], ['api-key' = 'api_key'] | ['username' = 'user_name', 'password' = 'user_password'] | ['aws-access-key' = '', 'aws-secret-key' = '', 'aws-session-token' = ''] | ); ``` ```sql token-endpoint ``` ```sql client-secret ``` ```sql -- example AzureML connection with API key CREATE CONNECTION `my-azureml-connection` WITH ( 'type' = 'AZUREML', 'endpoint' = 'https://myworkspace.myregion.inference.ml.azure.com/test', 'api_key' = '' ); -- example AzureML connection with comment CREATE CONNECTION `my-azureml-connection` COMMENT 'Connection Comment' WITH ( 'type' = 'AZUREML', 'endpoint' = 'https://myworkspace.myregion.inference.ml.azure.com/test', 'api_key' = '' ); -- example Couchbase connection with basic authorization CREATE CONNECTION `my-couchbase-connection` WITH ( 'type' = 'COUCHBASE', 'endpoint' = 'couchbases://my-cluster.cloud.couchbase.com', 'username' = '', 'password' = '' ); -- example Bedrock connection with AWS authentication CREATE CONNECTION `my-bedrock-connection` WITH ( 'type' = 'BEDROCK', 'endpoint' = 'https://bedrock-runtime.us-east-1.amazonaws.com/model/my-model/invoke', 'aws-access-key' = '', 'aws-secret-key' = '', 'aws-session-token' = '' ); -- example REST connection with bearer token CREATE CONNECTION `my-rest-connection` WITH ( 'type' = 'REST', 'endpoint' = 'https://myrest.connection.com', 'token' = '' ); -- example MCP server connection with OAuth CREATE CONNECTION `my-mcp-connection` WITH ( 'type' = 'MCP_SERVER', 'endpoint' = 'https://mymcp.connection.com', 'scope' = '', 'token-endpoint' = '', 'client-id' = '', 'client-secret' = '' ); ``` ```sql -- Create a MongoDB connection with basic authorization. 
CREATE CONNECTION `my-mongodb-connection` WITH ( 'type' = 'MONGODB', 'endpoint' = 'mongodb+srv://myCluster.mongodb.net/myDatabase', 'username' = '', 'password' = '' ); -- Use the MongoDB connection to create a MongoDB external table. CREATE TABLE mongodb_movies_full_text_search ( title STRING, plot STRING ) WITH ( 'connector' = 'mongodb', 'mongodb.connection' = 'my-mongodb-connection', 'mongodb.database' = 'sample_mflix', 'mongodb.collection' = 'movies', 'mongodb.index' = 'default' ); ``` ```sql -- Create a Confluent JDBC connection with basic authorization. CREATE CONNECTION `jdbc-postgres-connection` WITH ( 'type' = 'confluent_jdbc', 'endpoint' = 'jdbc:postgresql://my.example.com:5432/mydatabase', 'username' = '', 'password' = ''); -- Use the Confluent JDBC connection to create a table. CREATE TABLE jdbc_postgres ( show_id STRING, type STRING, title STRING, cast_members STRING, country STRING, date_added DATE, release_year INT, rating STRING, duration STRING, listed_in STRING, description STRING ) WITH ( 'connector' = 'confluent-jdbc', 'confluent-jdbc.connection' = 'jdbc-postgres-connection', 'confluent-jdbc.table-name' = 'netflix_shows' ); ``` --- ### Flink SQL CREATE FUNCTION Statement in Confluent Cloud | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/statements/create-function.html CREATE FUNCTION Statement¶ Confluent Cloud for Apache Flink® enables registering custom user-defined functions (UDFs) by using the CREATE FUNCTION statement. After your UDFs are registered in a Flink database, you can use them in your SQL queries. Syntax¶ CREATE FUNCTION AS USING JAR 'confluent-artifact:///'; Description¶ Register a user-defined function (UDF) in the current database. To remove a UDF from the current database, use the DROP FUNCTION statement.
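The statement can be illustrated with a short sketch. It assumes that CREATE FUNCTION takes a function name, the fully qualified name of the implementing class as a string, and an artifact reference made of an artifact ID and a version ID. The function name `is_smaller`, the class name, the table `shirts`, and the angle-bracket placeholders are all hypothetical and aren't taken from this documentation.

```sql
-- Assumption: replace the angle-bracket placeholders with your own artifact values.
-- `is_smaller` and the class name are hypothetical examples.
CREATE FUNCTION is_smaller
  AS 'com.example.udf.IsSmaller'
  USING JAR 'confluent-artifact://<artifact-id>/<version-id>';

-- After registration, call the UDF like a built-in function.
-- The table `shirts` and its columns are hypothetical.
SELECT is_smaller(size_a, size_b) AS smaller FROM shirts;

-- Remove the UDF from the current database when it's no longer needed.
DROP FUNCTION is_smaller;
```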
Related content¶ Create a User-Defined Function with Confluent Cloud for Apache Flink confluent flink artifact create Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql CREATE FUNCTION AS USING JAR 'confluent-artifact:///'; ``` --- ### SQL CREATE MODEL Statement in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/statements/create-model.html CREATE MODEL Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables real-time inference and prediction with AI and ML models. The Flink SQL interface is available in Cloud Console and the Flink SQL shell. Get started using AI models with Run an AI Model. The following providers are supported: AWS Bedrock AWS Sagemaker Azure Machine Learning (Azure ML) Azure OpenAI Google AI OpenAI Vertex AI Syntax¶ CREATE MODEL [IF NOT EXISTS] [[catalogname].[database_name]].model_name [INPUT (input_column_list)] [OUTPUT (output_column_list)] [COMMENT model_comment] WITH(model_option_list) Description¶ Create a new AI model. If a model with the same name exists already, a new version of the model is created. For more information, see Model versioning. If the IF NOT EXISTS option is specified and a model with the same name exists already, the statement is ignored. To view the currently registered models, use the SHOW MODELS statement. To view the WITH options that were used to create the model, run the SHOW CREATE MODEL statement. To view the versions, inputs, and outputs of the model, run the DESCRIBE MODEL statement. To change the name or options of an existing model, use the ALTER MODEL statement. To delete a model from the current environment, use the DROP MODEL statement. Tip If you get a 429 error when you run a CREATE MODEL statement, the most likely cause is rate limiting by the model provider.
Some providers, like Azure OpenAI, support increasing the default limit of tokens per minute. Increasing this limit to match your throughput may fix 429 errors. Task types¶ Confluent Cloud for Apache Flink supports these types of analysis for AI model inference: Classification: Categorize input data into predefined classes or labels. This task is used in applications like spam detection, where emails are classified as “spam” or “not spam”, and image recognition. Clustering: Group a set of objects so that objects in the same group, called a “cluster”, are more similar to each other than to those in other groups. This task is a form of unsupervised learning, because it doesn’t rely on predefined categories. Applications include customer segmentation in marketing and gene sequence analysis in biology. Embedding: Transform high-dimensional data into lower-dimensional vectors while preserving the relative distances between data points. This is crucial for tasks like natural language processing (NLP), where words or sentences are converted into vectors, enabling models to understand semantic similarities. Embeddings are used in recommendation systems, search engines, and more. Regression: Regression models predict a continuous output variable based on one or more input features. This task is used in scenarios like predicting house prices based on features like size, location, and number of bedrooms, or forecasting stock prices. Regression analysis helps in understanding the relationships between variables and forecasting. Text generation: Generate human-like text based on input data. Applications include chatbots, content creation, and language translation. When you register an AI or ML model, you specify the task type by using the task property. task is a required property, but it applies only when using the ML_EVALUATE function. Examples¶ The following code example shows how to run an AI model. 
The model must be created with the model provider and registered by using the CREATE MODEL statement with . SELECT * FROM my_table, LATERAL TABLE(ML_PREDICT('', column1, column2)); All of the CREATE MODEL statements require a connection resource that you create by using the CREATE CONNECTION statement. For example, the following code example shows how to create a connection for AWS Bedrock. -- Example command to create a connection for AWS Bedrock. CREATE CONNECTION `bedrock-connection` WITH ( 'type' = 'bedrock', 'endpoint' = 'https://bedrock-runtime.us-west-2.amazonaws.com/model/amazon.titan-embed-text-v1/invoke', 'aws-access-key' = '', 'aws-secret-key' = '', 'aws-session-token' = '' ); Classification task¶ The following example shows how to create an OpenAI classification model. For more information, see Sentiment analysis with OpenAI LLM. CREATE MODEL sentimentmodel INPUT(text STRING) OUTPUT(sentiment STRING) COMMENT 'sentiment analysis model' WITH ( 'provider' = 'openai', 'task' = 'classification', 'openai.connection' = '', 'openai.model_version' = 'gpt-3.5-turbo', 'openai.system_prompt' = 'Analyze the sentiment of the text and return only POSITIVE, NEGATIVE, or NEUTRAL.' ); Clustering task¶ The following example shows how to create an Azure ML clustering model. It requires that a K-Means model has been trained and deployed on Azure. Replace and with your values. CREATE MODEL clusteringmodel INPUT (vectors ARRAY, other_feature INT, other_feature2 STRING) OUTPUT (cluster_num INT) WITH ( 'task' = 'clustering', 'provider' = 'azureml', 'azureml.connection' = '' ); Embedding task¶ The following example shows how to create an AWS Bedrock text embedding model. Replace with your value. For more information, see Text embedding with AWS Bedrock and Azure OpenAI.
CREATE MODEL embeddingmodel INPUT (text STRING) OUTPUT (embedding ARRAY) WITH ( 'task' = 'embedding', 'provider' = 'bedrock', 'bedrock.connection' = '' ); Text generation task¶ The following example shows how to create an OpenAI text generation model for translating from English to Spanish. CREATE MODEL translatemodel INPUT(english STRING) OUTPUT(spanish STRING) COMMENT 'spanish translation model' WITH ( 'provider' = 'openai', 'task' = 'text_generation', 'openai.connection' = '', 'openai.model_version' = 'gpt-3.5-turbo', 'openai.system_prompt' = 'Translate to spanish' ); For more examples, see Run an AI Model. Model versioning¶ A model can have multiple versions. A version is an integer number that starts at 1. The default version for a new model is 1. Currently, the maximum number of supported versions is 10. New versions are created by the CREATE MODEL statement for the same model name. A new version increments the current maximum version by 1. To view the versions of a model, use the DESCRIBE MODEL statement. Only model options are versioned, which means that input/output format and comments don’t change across versions. The statement fails if input format, output format, or comments change. Among the model options, the model task can't be changed. The following code example shows the result of running CREATE MODEL twice with the same model name. CREATE MODEL `my-model` ... -- Output `my-model` with version 1 created. Default version: 1 CREATE MODEL `my-model` ... -- Output `my-model` with version 2 created. Default version: 1 Version 1 is the default version when a model is first created. As more versions are created by the CREATE MODEL statement, you can change the default version by using the ALTER MODEL statement. The following example shows how to change the default version of an existing model. ALTER MODEL SET ('default_version'=''); You can access a specific version of a model in queries by using the $ syntax.
If no version is specified, the default version is used. The following code examples show how to use a specific version of a model in a query. -- Use version 2 of the model. SELECT * FROM `my-table`, LATERAL TABLE (ML_PREDICT('my-model$2', col1, col2)); -- Use the default version of the model. SELECT * FROM `my-table`, LATERAL TABLE (ML_PREDICT('my-model', col1, col2)); Use the $ syntax to delete a specific version of a model: -- Delete a specific version of the model. DROP MODEL `$`; -- Delete all versions and the model. DROP MODEL `$all`; The maximum version number is the next default version. If all versions are dropped, the whole model is deleted. To change the version of an existing model, use the ALTER MODEL statement. If no version is specified, the default version is changed. ALTER MODEL `$` SET ('k1'='v1', 'k2'='v2'); WITH options¶ Specify the details of your AI inference model by using the WITH clause. The following table shows the supported properties in the WITH clause.

| Model Provider | Property |
|---|---|
| Common | {PROVIDER}.client_timeout, {PROVIDER}.connection, {PROVIDER}.input_format, {PROVIDER}.input_content_type, {PROVIDER}.output_format, {PROVIDER}.output_content_type, {PROVIDER}.PARAMS.*, {PROVIDER}.system_prompt |
| OpenAI | openai.input_format, openai.model_version |
| Azure OpenAI | azureopenai.input_format, azureopenai.model_version |
| Azure ML | azureml.input_format, azureml.deployment_name |
| Google AI | googleai.input_format |
| Sagemaker | sagemaker.custom_attributes, sagemaker.enable_explanations, sagemaker.inference_component_name, sagemaker.inference_id, sagemaker.input_content_type, sagemaker.output_content_type, sagemaker.target_container_hostname, sagemaker.target_model, sagemaker.target_variant |
| Vertex AI | vertexai.service_key, vertexai.input_format |

Connection resource¶ Secrets must be set by using a connection resource that you create by using the CREATE CONNECTION statement. The connection resource securely contains the provider endpoint and secrets like the API key.
For example, the following code shows how to create a connection to OpenAI, named openai-connection. CREATE CONNECTION `openai-connection` WITH ( 'type' = 'openai', 'endpoint' = 'https://api.openai.com/v1/chat/completions', 'api-key' = '' ); Specify the connection by name in the {PROVIDER}.connection property of the WITH clause. The environment, cloud, and region options in the CREATE CONNECTION statement must be the same as those of the compute pool that uses the connection. The following code example shows how to refer to the connection named openai-connection in the WITH clause: 'openai.connection' = 'openai-connection' The maximum secret length is 4000 bytes, which is checked after the string is converted to bytes. Common properties¶ The following properties are common to all of the model providers. {PROVIDER}.client_timeout¶ Set the request timeout to the client endpoint. {PROVIDER}.connection¶ Set the credentials for connecting to a model provider. Create the connection resource by using the CREATE CONNECTION statement. This property is required. {PROVIDER}.input_format¶ Set the json, text, or binary input format used by the model. Each provider has a default value. This property is optional. For supported input formats, see Text generation and LLM model formats and Other formats. {PROVIDER}.input_content_type¶ The HTTP content media type header to set when calling the model. The value is a Media/MIME type. The default is chosen based on input_format. Usually, this property is required only for Sagemaker and Bedrock models. {PROVIDER}.output_format¶ Set the json, text, or binary output format used by the model. The default is chosen based on input_format. This property is optional. For supported output formats, see Text generation and LLM model formats and Other formats. {PROVIDER}.output_content_type¶ The HTTP Accept media type header to set when calling the model. The value is a Media/MIME type. The default is chosen based on output_format.
Usually, this property is required only for Sagemaker and Bedrock models. {PROVIDER}.PARAMS.*¶ Provide parameters based on the input_format. The maximum number of parameters you can set is 32. This property is optional. For more information, see Parameters. {PROVIDER}.system_prompt¶ A system prompt passed to an LLM model to give it general behavioral instructions. The value is a string. Not all models support a system prompt. This property is optional. task¶ Specify the kind of analysis to perform. Supported values are: “classification” “clustering” “embedding” “regression” “text_generation” This property is required, but it applies only when using the ML_EVALUATE function. OpenAI properties¶ openai.input_format¶ Set the input format used by the model. The default is OPENAI-CHAT. This property is optional. openai.model_version¶ Set the version string of the requested model. The default is gpt-3.5-turbo. This property is optional. Azure OpenAI properties¶ Properties for OpenAI models deployed in Azure AI Studio. Azure OpenAI accepts all of the OpenAI parameters, but with a different endpoint. azureopenai.input_format¶ Set the input format used by the model. The default is OPENAI-CHAT. This property is optional. azureopenai.model_version¶ Set the version string of the requested model. The default is gpt-3.5-turbo. This property is optional. Azure ML properties¶ Properties for both Azure Machine Learning and LLM models from Azure AI Studio can use this provider. azureml.input_format¶ Set the input format used by the model. The default is AZUREML-PANDAS-DATAFRAME. For AI Studio LLMs, OPENAI-CHAT is usually the correct format, even for non-OpenAI models. This property is optional. azureml.deployment_name¶ Set the model name. Bedrock properties¶ The default input_format for Bedrock is determined automatically based on the model endpoint, or AMAZON-TITAN-TEXT if there is no match. If necessary, change it to match the model for your endpoint. 
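The Bedrock default can be overridden through the common {PROVIDER}.input_format property. The following is a minimal sketch, not a definitive configuration: the model name bedrock_claude_model and the column names are hypothetical, and it assumes that a connection named bedrock-connection already exists and that its endpoint serves an Anthropic model that accepts the ANTHROPIC-MESSAGES format.

```sql
-- Sketch only: bedrock_claude_model, prompt, and response are hypothetical names.
-- Assumes an existing connection resource named `bedrock-connection`.
CREATE MODEL bedrock_claude_model
INPUT (prompt STRING)
OUTPUT (response STRING)
WITH (
  'provider' = 'bedrock',
  'task' = 'text_generation',
  'bedrock.connection' = 'bedrock-connection',
  -- Override the automatically detected format to match the endpoint's model.
  'bedrock.input_format' = 'ANTHROPIC-MESSAGES'
);
```

If the format detected from the endpoint already matches the model, the bedrock.input_format line can be omitted.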
Google AI properties¶ googleai.input_format¶ Set the input format used by the model. The default is GEMINI-GENERATE. This property is optional. Sagemaker properties¶ sagemaker.custom_attributes¶ Set a model-dependent value that is passed through to Sagemaker in the header of the same name. This property is optional. sagemaker.enable_explanations¶ Enable writing explanations, if your model supports them. Passed through to Sagemaker in the header of the same name. If your model supports writing explanations, they should be disabled, because Confluent Cloud for Apache Flink currently doesn’t support reading them. Don’t set enable_explanations if the model doesn’t support explanations, because this causes Sagemaker to return an error. This property is optional. sagemaker.inference_component_name¶ Specify which inference component to use in the endpoint. Passed through to Sagemaker in the header of the same name. This property is optional. sagemaker.inference_id¶ Set an ID that is passed through to Sagemaker in the header of the same name. Used for tracking request origins. This property is optional. sagemaker.input_content_type¶ The HTTP content media type header to set when calling the model. Setting this property overrides the Content-type header for the model request. Many Sagemaker models use this header to determine their behavior, but set it only if choosing an appropriate input_format is not sufficient. This property is optional. sagemaker.output_content_type¶ The HTTP Accept media type header to set when calling the model. Setting this property overrides the Accept header for the model request. Some Sagemaker models use this header to determine their outputs, but set it only if choosing an appropriate output_format is not sufficient. This property is optional. sagemaker.target_container_hostname¶ Allows calling a specific container when the endpoint has multiple containers. Passed through to Sagemaker in the header of the same name. This property is optional. 
sagemaker.target_model¶ Enables calling a specific model from multiple models deployed to the same endpoint. Passed through to Sagemaker in the header of the same name. This property is optional. sagemaker.target_variant¶ Enables calling a specific version of the model from multiple deployed variants. Passed through to Sagemaker in the header of the same name. This property is optional. Vertex AI properties¶ vertexai.service_key¶ Set the Service Account Key of a service account with permission to call the inference endpoint. This value is a secret. This property is required. vertexai.input_format¶ Set the input format used by the model. The default is TF-SERVING, or GEMINI-GENERATE if the endpoint is for a published Gemini model. This property is optional. Supported input/output formats¶ The following input/output formats for text generation and LLM models are supported. AI-21-COMPLETE AMAZON-TITAN-EMBED AMAZON-TITAN-TEXT ANTHROPIC-COMPLETIONS ANTHROPIC-MESSAGES AZURE-EMBED BEDROCK-LLAMA COHERE-CHAT COHERE-EMBED COHERE-GENERATE GEMINI-GENERATE GEMINI-CHAT MISTRAL-CHAT MISTRAL-COMPLETIONS OPENAI-CHAT OPENAI-EMBED VERTEX-EMBED The following additional input/output formats are supported. AZUREML-PANDAS-DATAFRAME AZUREML-TENSOR BINARY CSV JSON JSON-ARRAY JSON:wrapper KSERVE-V1 KSERVE-V2 MLFLOW-TENSOR PANDAS-DATAFRAME TEXT TF-SERVING TF-SERVING-COLUMN TRITON VERTEXAI-PYTORCH Parameters¶ The text generation and LLM formats support some or all of the following parameters. {PROVIDER}.PARAMS.temperature¶ Controls the randomness or “creativity” of the output. Typical values are between 0.0 and 1.0. This parameter is model-dependent. Its type is Float. {PROVIDER}.PARAMS.top_p¶ The probability cutoff for token selection. Usually, either temperature or top_p is specified, but not both. This parameter is model-dependent. Its type is Float. {PROVIDER}.PARAMS.top_k¶ The number of possible tokens to sample from at each step. This parameter is model-dependent.
Its type is Float. {PROVIDER}.PARAMS.stop¶ A CSV list of strings to pass as stop sequences to the model. {PROVIDER}.PARAMS.max_tokens¶ The maximum number of tokens for the model to return. Its type is Int. Text generation and LLM model formats¶ The following formats are intended for text generation models and LLMs. They require that the model has a single STRING input and a single STRING output. AI-21-COMPLETE¶ This format is for models using the AI21 Labs J2 Complete API, including the AI21 Labs Foundation models on AWS Bedrock. This format does not support the top_k parameter. AMAZON-TITAN-EMBED¶ This format is for Amazon Titan Text Embedding models. AMAZON-TITAN-TEXT¶ The format is for Amazon’s Titan Text models. This is the default format for the AWS Bedrock provider. This format does not support the top_k parameter. ANTHROPIC-COMPLETIONS¶ This format is for models using the Anthropic Claude Text Completions API, including some Anthropic models on AWS Bedrock. ANTHROPIC-MESSAGES¶ This format is for models using the Anthropic Claude Messages API, including some Anthropic models on AWS Bedrock. Some Anthropic models accept both this and the Completions API format. AZURE-EMBED¶ The embedding format used by other foundation models on Azure. This format is the same as OPENAI-EMBED. BEDROCK-LLAMA¶ The format used by Llama models on AWS Bedrock. This format does not support the top_k or stop parameters. COHERE-CHAT¶ The Cohere Chat API format. COHERE-EMBED¶ Cohere’s Embedding API format. COHERE-GENERATE¶ The legacy Cohere Chat API format. This format is used by AWS Bedrock Cohere Command models. GEMINI-GENERATE¶ The Google Gemini API format. This is the default format for the Google AI provider, but you can also use it with Gemini models on the Google Vertex AI. GEMINI-CHAT¶ Same as the GEMINI-GENERATE format. MISTRAL-CHAT¶ The standard Mistral API format. MISTRAL-COMPLETIONS¶ The legacy Mistral Completions API format used by AWS Bedrock. 
OPENAI-CHAT¶ The OpenAI Chat API format. This is the default for the OpenAI and Azure OpenAI providers. It is also generally used by most non-OpenAI LLM models deployed in Azure AI Studio using the Azure ML provider. OPENAI-EMBED¶ The OpenAI Embedding model format. VERTEX-EMBED¶ The Embedding format for Vertex AI Gemini models. Other formats¶ The following formats are intended for predictive models running on providers like Sagemaker, Vertex AI, and Azure ML. Usually, these models are used for tasks like classification, regression, and clustering. Currently, none of these formats support PARAMS. Unless specified, each input format defaults to the associated output format with the same name. AZUREML-PANDAS-DATAFRAME¶ Azure ML’s version of the Pandas Dataframe Split format. The only difference is that this version has “input_data” as the top-level field, instead of “dataframe_split”. This is the default format for Azure ML models. The output format defaults to JSON-ARRAY. AZUREML-TENSOR¶ Azure ML’s version of named input tensors. Equivalent to the “JSON:input_data” input format. The output format defaults to “JSON:outputs”. BINARY¶ Raw binary inputs, serialized in little-endian byte order. This input format accepts multiple input columns, which are packed in order. CSV¶ Comma-separated text. This is the default format for Sagemaker models, but Sagemaker models vary widely, and most models must choose a different format. JSON¶ The inputs are formatted as a JSON object, with field names equal to the column names of the model input schema. The JSON format supports user-defined parameters. If you specify '{provider}.params.some_key'='value' in the WITH options, the key and value are used in the JSON input as {"some_key": "value"}. Example: { "column1": "String Data", "column2": [1,2,3,4] } JSON-ARRAY¶ The inputs are formatted as a JSON array, including [] brackets, but without the {} braces of a top-level JSON object. Column names are not included in the format.
If the model takes a single input array column, it is output as the top-level array. Models with multiple inputs have their arrays nested in JSON fashion. This format is usually appropriate for models that expect NumPy arrays. Example: [1,2,3,"String Data"] JSON:wrapper¶ Similar to the default JSON behavior, but all fields are wrapped in a named top-level object. The wrapper may be any valid JSON string. Example: { "wrapper": { "column1": "String Data", "column2": [1,2,3,4] } } KSERVE-V1¶ Same as the TF-SERVING format. KSERVE-V2¶ Same as the TRITON format. MLFLOW-TENSOR¶ The format used by some MLflow models. It is the same format as TF-SERVING-COLUMN. PANDAS-DATAFRAME¶ The Pandas Dataframe Split format used by most MLflow models. The output format defaults to JSON-ARRAY. TEXT¶ Model input values formatted as raw text. Use newlines to separate multiple inputs. TF-SERVING¶ The TensorFlow Serving Row format. This is the default format for Vertex AI models. It is generally the correct format to use for most predictive models trained in Vertex AI. TF-SERVING-COLUMN¶ The TensorFlow Serving Column format. It is exactly equivalent to “JSON:inputs”. The output format defaults to “JSON:outputs”. TRITON¶ The Triton/KServeV2 format used by NVIDIA Triton Inference Servers. When possible, this format serializes data in the protocol’s mixed JSON+binary format. Note that some Tensor datatypes, like 16-bit floats, do not have an exact equivalent in Flink SQL, but they are converted, when possible. VERTEXAI-PYTORCH¶ Vertex AI’s format for PyTorch models. This format is the TF-SERVING format with an extra wrapper around the data. The output format defaults to TF-SERVING. Related content¶ ALTER MODEL DROP MODEL Run an AI Model Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.
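The PARAMS properties described in the Parameters section are set in the WITH clause like any other model option. The following is a minimal sketch under stated assumptions: the model name summarize_model, the table my_table, and the column names are hypothetical, and it assumes an existing connection named openai-connection and a model that accepts the temperature and max_tokens parameters.

```sql
-- Sketch only: summarize_model, my_table, text, and summary are hypothetical names.
-- Assumes an existing connection resource named `openai-connection`.
CREATE MODEL summarize_model
INPUT (text STRING)
OUTPUT (summary STRING)
WITH (
  'provider' = 'openai',
  'task' = 'text_generation',
  'openai.connection' = 'openai-connection',
  'openai.input_format' = 'OPENAI-CHAT',
  'openai.system_prompt' = 'Summarize the text in one sentence.',
  -- Model-dependent parameters; a low temperature reduces randomness.
  'openai.params.temperature' = '0.2',
  'openai.params.max_tokens' = '100'
);

-- Call the default version of the model for each row of the hypothetical table.
SELECT text, summary
FROM my_table, LATERAL TABLE(ML_PREDICT('summarize_model', text));
```

Because the text generation formats require a single STRING input and a single STRING output, the sketch declares exactly one column on each side.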
#### Code Examples

```sql
CREATE MODEL [IF NOT EXISTS] [[catalogname].[database_name]].model_name
[INPUT (input_column_list)]
[OUTPUT (output_column_list)]
[COMMENT model_comment]
WITH(model_option_list)
```

```sql
SELECT * FROM my_table, LATERAL TABLE(ML_PREDICT('', column1, column2));
```

```sql
-- Example command to create a connection for AWS Bedrock.
CREATE CONNECTION `bedrock-connection` WITH (
  'type' = 'bedrock',
  'endpoint' = 'https://bedrock-runtime.us-west-2.amazonaws.com/model/amazon.titan-embed-text-v1/invoke',
  'aws-access-key' = '',
  'aws-secret-key' = '',
  'aws-session-token' = ''
);
```

```sql
CREATE MODEL sentimentmodel
INPUT(text STRING)
OUTPUT(sentiment STRING)
COMMENT 'sentiment analysis model'
WITH (
  'provider' = 'openai',
  'task' = 'classification',
  'openai.connection' = '',
  'openai.model_version' = 'gpt-3.5-turbo',
  'openai.system_prompt' = 'Analyze the sentiment of the text and return only POSITIVE, NEGATIVE, or NEUTRAL.'
);
```

```sql
CREATE MODEL clusteringmodel
INPUT (vectors ARRAY, other_feature INT, other_feature2 STRING)
OUTPUT (cluster_num INT)
WITH (
  'task' = 'clustering',
  'provider' = 'azureml',
  'azureml.connection' = ''
);
```

```sql
CREATE MODEL embeddingmodel
INPUT (text STRING)
OUTPUT (embedding ARRAY)
WITH (
  'task' = 'embedding',
  'provider' = 'bedrock',
  'bedrock.connection' = ''
);
```

```sql
CREATE MODEL translatemodel
INPUT(english STRING)
OUTPUT(spanish STRING)
COMMENT 'spanish translation model'
WITH (
  'provider' = 'openai',
  'task' = 'text_generation',
  'openai.connection' = '',
  'openai.model_version' = 'gpt-3.5-turbo',
  'openai.system_prompt' = 'Translate to spanish'
);
```

```sql
CREATE MODEL `my-model` ...
-- Output `my-model` with version 1 created. Default version: 1
CREATE MODEL `my-model` ...
-- Output `my-model` with version 2 created. Default version: 1
```

```sql
ALTER MODEL SET ('default_version'='');
```

```sql
-- Use version 2 of the model.
SELECT * FROM `my-table`, LATERAL TABLE (ML_PREDICT('my-model$2', col1, col2));
-- Use the default version of the model.
SELECT * FROM `my-table`, LATERAL TABLE (ML_PREDICT('my-model', col1, col2));
```

```sql
-- Delete a specific version of the model.
DROP MODEL `$`;
-- Delete all versions and the model.
DROP MODEL `$all`;
```

```sql
ALTER MODEL `$` SET ('k1'='v1', 'k2'='v2');
```

```sql
CREATE CONNECTION `openai-connection` WITH (
  'type' = 'openai',
  'endpoint' = 'https://api.openai.com/v1/chat/completions',
  'api-key' = ''
);
```

```sql
'openai.connection' = 'openai-connection'
```

```sql
'{provider}.params.some_key'='value'
```

```json
{"some_key": "value"}
```

```json
{ "column1": "String Data", "column2": [1,2,3,4] }
```

```json
[1,2,3,"String Data"]
```

```json
{ "wrapper": { "column1": "String Data", "column2": [1,2,3,4] } }
```

---

### SQL CREATE TABLE Statement in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/reference/statements/create-table.html

CREATE TABLE Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables creating tables backed by Apache Kafka® topics by using the CREATE TABLE statement. With Flink tables, you can run SQL queries on streaming data in Kafka topics.
Syntax¶ CREATE TABLE [IF NOT EXISTS] [catalog_name.][db_name.]table_name ( { | | | }[ , ...n] [ ] [ ][ , ...n] ) [COMMENT table_comment] [DISTRIBUTED BY (distribution_column_name1, distribution_column_name2, ...) INTO n BUCKETS] WITH (key1=value1, key2=value2, ...) [ LIKE source_table [( )] | AS select_query ] : column_name column_type [ ] [COMMENT column_comment] : column_name column_type METADATA [ FROM metadata_key ] [ VIRTUAL ] : column_name AS computed_column_expression [COMMENT column_comment] column_name column_type : WATERMARK FOR rowtime_column_name AS watermark_strategy_expression : [CONSTRAINT constraint_name] PRIMARY KEY (column_name, ...) NOT ENFORCED : { { INCLUDING | EXCLUDING } { ALL | CONSTRAINTS | PARTITIONS } | { INCLUDING | EXCLUDING | OVERWRITING } { GENERATED | OPTIONS | WATERMARKS } } Description¶ Register a table into the current or specified catalog. When a table is registered, you can use it in SQL queries. The CREATE TABLE statement always creates a backing Kafka topic as well as the corresponding schema subjects for key and value. Trying to create a table with a name that exists in the catalog causes an exception. The table name can be in these formats: catalog_name.db_name.table_name: The table is registered with the catalog named “catalog_name” and the database named “db_name”. db_name.table_name: The table is registered into the current catalog of the execution table environment and the database named “db_name”. table_name: The table is registered into the current catalog and the database of the execution table environment. A table registered with the CREATE TABLE statement can be used as both table source and table sink. Flink can’t determine whether the table is used as a source or a sink until it’s referenced in a DML query. The following sections show the options and clauses that are available with the CREATE TABLE statement. 
- Physical / Regular Columns
- Metadata columns
- Computed columns
- System columns
- Watermark clause
- PRIMARY KEY constraint
- DISTRIBUTED BY clause
- CREATE TABLE AS SELECT (CTAS)
- LIKE
- WITH options

Usage¶ The following CREATE TABLE statement registers a table named t1 in the current catalog. Also, it creates a backing Kafka topic and corresponding value-schema. By default, the table is registered as append-only, uses AVRO serializers, and reads from the earliest offset. CREATE TABLE t1 ( `id` BIGINT, `name` STRING, `age` INT, `salary` DECIMAL(10,2), `active` BOOLEAN, `created_at` TIMESTAMP_LTZ(3) ); You can override defaults by specifying WITH options. The following SQL registers the table in retraction mode, so you can use the table to sink the results of a streaming join. CREATE TABLE t2 ( `id` BIGINT, `name` STRING, `age` INT, `salary` DECIMAL(10,2), `active` BOOLEAN, `created_at` TIMESTAMP_LTZ(3) ) WITH ( 'changelog.mode' = 'retract' ); Physical / Regular Columns¶ Physical or regular columns are the columns that define the structure of the table and the data types of its fields. Each physical column is defined by a name and a data type, and optionally, a column constraint. You can use the column constraint to specify additional properties of the column, such as whether it is a unique key. Example: The following SQL shows how to declare physical columns of various types in a table named t1. For available column types, see Data Types. CREATE TABLE t1 ( `id` BIGINT, `name` STRING, `age` INT, `salary` DECIMAL(10,2), `active` BOOLEAN, `created_at` TIMESTAMP_LTZ(3) ); Metadata columns¶ You can access the following table metadata as metadata columns in a table definition. Available metadata: headers, leader-epoch, offset, partition, raw-key, raw-value, timestamp, timestamp-type, topic. Use the METADATA keyword to declare a metadata column. Metadata fields are readable or readable/writable. Read-only columns must be declared VIRTUAL to exclude them during INSERT INTO operations.
Metadata columns are not registered in Schema Registry. ExampleThe following CREATE TABLE statement shows the syntax for exposing metadata fields. CREATE TABLE t ( `user_id` BIGINT, `item_id` BIGINT, `behavior` STRING, `event_time` TIMESTAMP_LTZ(3) METADATA FROM 'timestamp', `partition` BIGINT METADATA VIRTUAL, `offset` BIGINT METADATA VIRTUAL ); Available metadata¶ headers¶ Type: MAP NOT NULL Access: readable/writable Headers of the Kafka record as a map of raw bytes. leader-epoch¶ Type: INT NULL Access: readable Leader epoch of the Kafka record, if available. offset¶ Type: BIGINT NOT NULL Access: readable Offset of the Kafka record in the partition. partition¶ Type: INT NOT NULL Access: readable Partition ID of the Kafka record. raw-key¶ Type: BYTES NOT NULL Access: readable The unique identifier or key of the Kafka record as raw bytes. The type may vary based on the serializer used, for example, STRING for StringSerializer. raw-value¶ Type: BYTES NOT NULL Access: readable The actual message content or payload of the Kafka record as raw bytes. Contains the main data being transmitted. The type may vary based on the serializer used, for example, STRING for StringSerializer. timestamp¶ Type: TIMESTAMP_LTZ(3) NOT NULL Access: readable/writable Timestamp of the Kafka record. With timestamp, you can pass event time end-to-end. Otherwise, the sink uses the ingestion time by default. timestamp-type¶ Type: STRING NOT NULL Access: readable Timestamp type of the Kafka record. Valid values are: “NoTimestampType” “CreateTime” (also set when writing metadata) “LogAppendTime” topic¶ Type: STRING NOT NULL Access: readable Topic name of the Kafka record. Computed columns¶ Computed columns are virtual columns that are not stored in the table but are computed on the fly based on the values of other columns. These virtual columns are not registered in Schema Registry. 
A computed column is defined by using an expression that references one or more physical or metadata columns in the table. The expression can use arithmetic operators, functions, and other SQL constructs to manipulate the values of the physical and metadata columns and compute the value of the computed column. Example: The following CREATE TABLE statement shows the syntax for declaring a full_name computed column by concatenating a first_name column and a last_name column. CREATE TABLE t ( `id` BIGINT, `first_name` STRING, `last_name` STRING, `full_name` AS CONCAT(first_name, ' ', last_name) ); Vector database columns¶ Confluent Cloud for Apache Flink supports read-only external tables to enable search with federated query execution on external vector databases, like MongoDB, Pinecone, and ElasticSearch. Note Vector Search is an Open Preview feature in Confluent Cloud. A Preview feature is a Confluent Cloud component that is being introduced to gain early feedback from developers. Preview features can be used for evaluation and non-production testing purposes or to provide feedback to Confluent. The warranty, SLA, and Support Services provisions of your agreement with Confluent do not apply to Preview features. Confluent may discontinue providing preview releases of the Preview features at any time in Confluent’s sole discretion. For more information, see Vector Search. System columns¶ Confluent Cloud for Apache Flink introduces system columns for Flink tables. System columns build on the metadata columns. System columns can only be read and are not part of the query-to-sink schema. System columns aren’t selected in a SELECT * statement, and they’re not shown in DESCRIBE or SHOW CREATE TABLE statements. The result from the DESCRIBE EXTENDED statement does include system columns. Both inferred and manual tables are provisioned with a set of default system columns. $rowtime¶ Currently, $rowtime TIMESTAMP_LTZ(3) NOT NULL is provided as a system column.
You can use the $rowtime system column to get the timestamp from a Kafka record, because $rowtime is exactly the Kafka record timestamp. If you want to write out $rowtime, you must use the timestamp metadata key. PRIMARY KEY constraint¶ A primary key constraint is a hint that Flink SQL can use for optimizations. It specifies that a column or a set of columns in a table or a view is unique and doesn’t contain null values. A primary key uniquely identifies a row in a table. No columns in a primary key can be nullable. You can declare a primary key constraint together with a column definition (a column constraint) or as a single line (a table constraint). In both cases, it must be declared as a singleton. If you define more than one primary key constraint in the same statement, Flink SQL throws an exception. The SQL standard specifies that a constraint can be ENFORCED or NOT ENFORCED, which controls whether the constraint checks are performed on the incoming/outgoing data. Flink SQL doesn’t own the data, so the only mode it supports is NOT ENFORCED. It’s your responsibility to ensure that the query enforces key integrity. Flink SQL assumes that the primary key is correct and that each column’s nullability is aligned with the columns in the primary key. Connectors must ensure that these are aligned. The PRIMARY KEY constraint distributes the table implicitly by the key column. A Kafka message key is defined either by an implicit DISTRIBUTED BY clause from a PRIMARY KEY constraint or an explicit DISTRIBUTED BY. Note In a CREATE TABLE statement, a primary key constraint alters the column’s nullability, which means that a column with a primary key constraint isn’t nullable. Example: The following SQL statement creates a table named latest_page_per_ip with a primary key defined on ip. This statement creates a Kafka topic, a value-schema, and a key-schema.
The value-schema contains the definitions for page_url and ts, while the key-schema contains the definition for ip. CREATE TABLE latest_page_per_ip ( `ip` STRING, `page_url` STRING, `ts` TIMESTAMP_LTZ(3), PRIMARY KEY(`ip`) NOT ENFORCED ); DISTRIBUTED BY clause¶ The DISTRIBUTED BY clause buckets the created table by the specified columns. Bucketing enables a file-like structure with a small, human-enumerable key space. It groups rows that have “infinite” key space, like user_id, usually by using a hash function, for example: bucket = hash(user_id) % number_of_buckets Kafka partitions map 1:1 to SQL buckets. The n BUCKETS are used for the number of partitions when creating a topic. If n is not defined, the default is 6. The number of buckets is fixed. A bucket is identifiable regardless of partition. Bucketing is good in long-term storage for reading across partitions based on a large key space, for example, user_id. Also, bucketing is good for short-term storage for load balancing. Every mode comes with a default distribution, so DISTRIBUTED BY is required only by power users. In most cases, a simple CREATE TABLE t (schema); is sufficient. For upsert mode, the bucket key must be equal to primary key. For append/retract mode, the bucket key can be a subset of the primary key. The bucket key can be undefined, which corresponds to a “connector defined” distribution: round robin for append, and hash-by-row for retract. Custom distributions are possible, but currently only custom hash distributions are supported. ExampleThe following SQL declares a table named t_dist that has one key column named k and 4 Kafka partitions. CREATE TABLE t_dist (k INT, s STRING) DISTRIBUTED BY (k) INTO 4 BUCKETS; PARTITIONED BY clause¶ Deprecated Use the DISTRIBUTED BY clause instead. The PARTITIONED BY clause partitions the created table by the specified columns. Use PARTITIONED BY to declare key columns in a table explicitly. 
A Kafka message key is defined either by an explicit PARTITIONED BY clause or an implicit PARTITIONED BY clause from a PRIMARY KEY constraint. If compaction is enabled, the Kafka message key is overloaded with another semantic used for compaction, which influences constraints on the Kafka message key for partitioning. Example: The following SQL declares a table named t that has one key column named partition_key of type INT. CREATE TABLE t (partition_key INT, example_value STRING) PARTITIONED BY (partition_key); Watermark clause¶ The WATERMARK clause defines the event-time attributes of a table. A watermark in Flink is used to track the progress of event time and provide a way to trigger time-based operations. Default watermark strategy¶ Confluent Cloud for Apache Flink provides a default watermark strategy for all tables, whether created automatically from a Kafka topic or from a CREATE TABLE statement. The default watermark strategy is applied on the $rowtime system column. Watermarks are calculated per Kafka partition, and at least 250 events are required per partition. If a delay of longer than 7 days can occur, choose a custom watermark strategy. Because the concrete implementation is provided by Confluent, you see only WATERMARK FOR $rowtime AS SOURCE_WATERMARK() in the declaration. Custom watermark strategies¶ You can replace the default strategy with a custom strategy at any time by using ALTER TABLE. Watermark strategy reference¶ WATERMARK FOR rowtime_column_name AS watermark_strategy_expression The rowtime_column_name defines an existing column that is marked as the event-time attribute of the table. The column must be of type TIMESTAMP(3), and it must be a top-level column in the schema. The watermark_strategy_expression defines the watermark generation strategy. It allows arbitrary non-query expressions, including computed columns, to calculate the watermark. The expression return type must be TIMESTAMP(3), which represents the timestamp since the Unix Epoch.
The returned watermark is emitted only if it’s non-null and its value is larger than the previously emitted local watermark, to respect the contract of ascending watermarks. The watermark generation expression is evaluated by Flink SQL for every record. The framework emits the largest generated watermark periodically. No new watermark is emitted if any of the following conditions apply. The current watermark is null. The current watermark is identical to the previous watermark. The value of the returned watermark is smaller than the value of the last emitted watermark. When you use event-time semantics, your tables must contain an event-time attribute and watermarking strategy. Flink SQL provides these watermark strategies. Strictly ascending timestamps: Emit a watermark of the maximum observed timestamp so far. Rows that have a timestamp larger than the max timestamp are not late. WATERMARK FOR rowtime_column AS rowtime_column Ascending timestamps: Emit a watermark of the maximum observed timestamp so far, minus 1. Rows that have a timestamp larger than or equal to the max timestamp are not late. WATERMARK FOR rowtime_column AS rowtime_column - INTERVAL '0.001' SECOND Bounded out-of-orderness timestamps: Emit watermarks which are the maximum observed timestamp minus the specified delay. WATERMARK FOR rowtime_column AS rowtime_column - INTERVAL 'string' timeUnit The following example shows a “5-seconds delayed” watermark strategy. WATERMARK FOR rowtime_column AS rowtime_column - INTERVAL '5' SECOND ExampleThe following CREATE TABLE statement defines an orders table that has a rowtime column named order_time and a watermark strategy with a 5-second delay. 
CREATE TABLE orders ( `user` BIGINT, `product` STRING, `order_time` TIMESTAMP(3), WATERMARK FOR `order_time` AS `order_time` - INTERVAL '5' SECOND ); Progressive idleness detection¶ When a source does not receive any elements for a timeout time, which is specified by the sql.tables.scan.idle-timeout property, the source is marked as temporarily idle. This enables each downstream task to advance its watermark without the need to wait for watermarks from this source while it’s idle. By default, Confluent Cloud for Apache Flink has progressive idleness detection that starts with an idle-timeout of 15 seconds, and increases to a maximum of 5 minutes over time. You can disable idleness detection by setting the sql.tables.scan.idle-timeout property to 0, or you can set a fixed idleness timeout with your desired value. When idleness detection is disabled, a single idle partition on any of the sources causes the watermarks to stop advancing. In turn, this causes operations that rely on watermarks to stop producing results. On the other hand, with idleness detection enabled, with either progressive idleness or a fixed value, the watermark advances unless all partitions of all sources are idle. For more information, see the video, How to Set Idle Timeouts. CREATE TABLE AS SELECT (CTAS)¶ Tables can also be created and populated by the results of a query in one create-table-as-select (CTAS) statement. CTAS is the simplest and fastest way to create and insert data into a table with a single command. The CTAS statement consists of two parts: The SELECT part can be any SELECT query supported by Flink SQL. The CREATE part takes the resulting schema from the SELECT part and creates the target table. The following two code examples are equivalent. -- Equivalent to the following CREATE TABLE and INSERT INTO statements. 
CREATE TABLE my_ctas_table AS SELECT id, name, age FROM source_table WHERE mod(id, 10) = 0; -- These two statements are equivalent to the preceding CREATE TABLE AS statement. CREATE TABLE my_ctas_table ( id BIGINT, name STRING, age INT ); INSERT INTO my_ctas_table SELECT id, name, age FROM source_table WHERE mod(id, 10) = 0; Similar to CREATE TABLE, CTAS requires all options of the target table to be specified in the WITH clause. The syntax is CREATE TABLE t WITH (…) AS SELECT …, for example: CREATE TABLE t WITH ('scan.startup.mode' = 'latest-offset') AS SELECT * FROM b; Specifying explicit columns¶ The CREATE part enables you to specify explicit columns. The resulting table schema contains the columns defined in the CREATE part first, followed by the columns from the SELECT part. Columns named in both parts retain the same column position as defined in the SELECT part. You can also override the data type of SELECT columns if you specify it in the CREATE part. CREATE TABLE my_ctas_table ( `desc` STRING, quantity DOUBLE, cost AS price * quantity, WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND ) AS SELECT id, price, quantity, order_time FROM source_table; Primary keys and distribution strategies¶ The CREATE part enables you to specify primary keys and distribution strategies. Primary keys work only on NOT NULL columns. Currently, primary keys only allow you to define columns from the SELECT part, which may be NOT NULL. The following two code examples are equivalent. -- Equivalent to the following CREATE TABLE and INSERT INTO statements. CREATE TABLE my_ctas_table ( PRIMARY KEY (id) NOT ENFORCED ) DISTRIBUTED BY HASH(id) INTO 4 BUCKETS AS SELECT id, name FROM source_table; -- These two statements are equivalent to the preceding CREATE TABLE AS statement.
CREATE TABLE my_ctas_table ( id BIGINT NOT NULL PRIMARY KEY NOT ENFORCED, name STRING ) DISTRIBUTED BY HASH(id) INTO 4 BUCKETS; INSERT INTO my_ctas_table SELECT id, name FROM source_table; LIKE¶ The CREATE TABLE LIKE clause enables creating a new table with the same schema as an existing table. It is a combination of SQL features and can be used to extend or exclude certain parts of the original table. The clause must be defined at the top-level of a CREATE statement and applies to multiple parts of the table definition. Use the LIKE options to control the merging logic of table features. You can control the merging behavior of: CONSTRAINTS - Constraints such as primary keys and unique keys. GENERATED - Computed columns. METADATA - Metadata columns. OPTIONS - Table options. PARTITIONS - Partition options. WATERMARKS - Watermark strategies. with three different merging strategies: INCLUDING - Includes the feature of the source table and fails on duplicate entries, for example, if an option with the same key exists in both tables. EXCLUDING - Does not include the given feature of the source table. OVERWRITING - Includes the feature of the source table, overwrites duplicate entries of the source table with properties of the new table. For example, if an option with the same key exists in both tables, the option from the current statement is used. Additionally, you can use the INCLUDING/EXCLUDING ALL option to specify what should be the strategy if no specific strategy is defined. For example, if you use EXCLUDING ALL INCLUDING WATERMARKS, only the watermarks are included from the source table. If you provide no LIKE options, INCLUDING ALL OVERWRITING OPTIONS is used as a default. Example¶ The following CREATE TABLE statement defines a table named t that has five physical columns, one computed column, and three metadata columns.
CREATE TABLE t ( `user_id` BIGINT, `item_id` BIGINT, `price` DOUBLE, `behavior` STRING, `created_at` TIMESTAMP(3), `price_with_tax` AS `price` * 1.19, `event_time` TIMESTAMP_LTZ(3) METADATA FROM 'timestamp', `partition` BIGINT METADATA VIRTUAL, `offset` BIGINT METADATA VIRTUAL ); You can run the following CREATE TABLE LIKE statement to define table t_derived, which contains the physical and computed columns of t, drops the metadata columns and the default watermark strategy, and applies a custom watermark strategy on created_at. CREATE TABLE t_derived ( WATERMARK FOR `created_at` AS `created_at` - INTERVAL '5' SECOND ) LIKE t ( EXCLUDING WATERMARKS EXCLUDING METADATA ); WITH options¶ Use the WITH clause to set the table properties used to create a table source or sink. Both the key and value of the expression key1=val1 are string literals. You can change an existing table’s property values by using the ALTER TABLE Statement in Confluent Cloud for Apache Flink. You can set the following properties when you create a table. changelog.mode error-handling.log.target error-handling.mode kafka.cleanup-policy kafka.max-message-size kafka.retention.size kafka.retention.time key.fields-prefix key.format key.format.schema-context scan.bounded.mode scan.bounded.timestamp-millis scan.startup.mode value.fields-include value.format value.format.schema-context changelog.mode¶ Set the changelog mode of the connector. For more information on changelog modes, see dynamic tables. 'changelog.mode' = [append | upsert | retract] These are the changelog modes for an inferred table: append (if uncompacted and not a Debezium envelope) upsert (if compacted) retract (if a Debezium envelope is detected and uncompacted) These are the changelog modes for a manually created table: append retract upsert Primary key interaction¶ With a primary key declared, the changelog modes have these properties: append means that every row can be treated as an independent fact.
retract means that the combination of +U and -U are related and must be partitioned together. upsert means that all rows with the same primary key are related and must be partitioned together. To build indices, primary keys must be partitioned together.

| Changelog mode | Encoding of changes | Default partitioning without PK | Default partitioning with PK | Custom partitioning without PK | Custom partitioning with PK |
| --- | --- | --- | --- | --- | --- |
| append | Each value is an insertion (+I). | round robin | hash by PK | hash by specified column(s) | hash by subset of PK |
| retract | A special op header represents the change (+I, -U, +U, -D). The header is omitted for insertions. Append queries encoding is the same for all modes. | hash by entire value | hash by PK | hash by specified column(s) | hash by subset of PK |
| upsert | If value is null, it represents a deletion (-D). Other values are +U and the engine will normalize the changelog internally. | unsupported, PK is mandatory | hash by PK | unsupported, PK is mandatory | unsupported |

Change type header¶ Changes for an updating table have the change type encoded in the Kafka record as a special op header that represents the change (+I, -U, +U, -D). The value of the op header, if present, represents the kind of change that a row can describe in a changelog: 0: represents INSERT (+I), an insertion operation. 1: represents UPDATE_BEFORE (-U), an update operation with the previous content of the updated row. 2: represents UPDATE_AFTER (+U), an update operation with new content for the updated row. 3: represents DELETE (-D), a deletion operation. The default is 0. For more information, see Changelog entries. error-handling.log.target¶ Type: string Default: error_log 'error-handling.log.target' = '' Specify the destination Dead Letter Queue (DLQ) table for error logs when error-handling.mode is set to log. If error-handling.log.target isn’t set, the default is error_log. If the DLQ table doesn’t exist and can’t be created, the job fails.
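As a minimal sketch of routing deserialization errors to a custom DLQ table, the following statement sets both error-handling properties at creation time. The table name orders_raw, its columns, and the DLQ table name my_error_log are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table and DLQ names, used only for illustration.
CREATE TABLE orders_raw (
  order_id BIGINT,
  payload STRING
) WITH (
  -- Log records that fail deserialization instead of failing the statement.
  'error-handling.mode' = 'log',
  -- Send those records to a custom DLQ table instead of the default error_log.
  'error-handling.log.target' = 'my_error_log'
);
```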
The principal running the CREATE TABLE or ALTER TABLE statement must have permissions to create the DLQ topic and schema. If permissions are missing, the statement fails. If a principal runs a SELECT or any other query, it needs permissions to write into the defined DLQ table. If permissions are missing, the statement fails. For more information, see Grant Role-Based Access in Confluent Cloud for Apache Flink. error-handling.mode¶ Type: enum Default: fail 'error-handling.mode' = [fail | ignore | log] Control how Flink handles deserialization errors for a table. The following values are supported. fail: The statement fails on error (default). ignore: The error is skipped and processing continues. log: The error is logged to a Dead Letter Queue (DLQ) table and processing continues. When a statement reads from the table, for example, SELECT * FROM my_table, and a deserialization error occurs, as with a poison pill, Flink handles the error based on the error-handling.mode setting. fail: Flink fails the statement. ignore: Flink ignores the error and continues processing with the next row. log: Flink sends the poison pill to the DLQ table and continues processing with the next row. All Flink tables receive the error-handling.mode setting. If you don’t specify a value, the default is fail. You can override the setting for an existing table by using the ALTER TABLE statement. Only table-level overrides are supported. Per-statement overrides are not supported. The following limitations apply: Only deserialization errors at the source are supported. Errors outside the source, for example, in windowed aggregations, are not handled. kafka.cleanup-policy¶ Type: enum Default: delete 'kafka.cleanup-policy' = [delete | compact | delete-compact] Set the default cleanup policy for Kafka topic log segments beyond the retention window. Translates to the Kafka log.cleanup.policy property. For more information, see Log Compaction. 
compact: topic log is compacted periodically in the background by the log cleaner. delete: old log segments are discarded when their retention time or size limit is reached. delete-compact: compact the log and follow the retention time or size limit settings. kafka.consumer.isolation-level¶ Type: enum Default: read-committed 'kafka.consumer.isolation-level' = [read-committed | read-uncommitted] Controls which transactional messages to read: read-committed: Only return messages from committed transactions. Any transactional messages from aborted or in-progress transactions are filtered out. read-uncommitted: Return all messages, including those from transactional messages that were aborted or are still in progress. For more information, see delivery guarantees and latency. kafka.max-message-size¶ 'kafka.max-message-size' = MemorySize Translates to the Kafka max.message.bytes property. The default is 2097164 bytes. kafka.producer.compression.type¶ Type: enum Default: none 'kafka.producer.compression.type' = [none | gzip | snappy | lz4 | zstd] Translates to the Kafka compression.type property. kafka.retention.size¶ Type: Integer Default: 0 'kafka.retention.size' = MemorySize Translates to the Kafka log.retention.bytes property. kafka.retention.time¶ Type: Duration Default: 7 days 'kafka.retention.time' = '' Translates to the Kafka log.retention.ms property. key.fields-prefix¶ Type: String Default: “” Specify a custom prefix for all fields of the key format. 'key.fields-prefix' = '' The key.fields-prefix property defines a custom prefix for all fields of the key format, which avoids name clashes with fields of the value format. By default, the prefix is empty. If a custom prefix is defined, the table schema property works with prefixed names. When constructing the data type of the key format, the prefix is removed, and the non-prefixed names are used within the key format. This option requires that the value.fields-include property is set to EXCEPT_KEY. 
The prefix for an inferred table is key_, for non-atomic Schema Registry types and fields that have a name. key.format¶ Type: String Default: “avro-registry” Specify the serialization format of the table’s key fields. 'key.format' = '' These are the key formats for an inferred table: raw (if no Schema Registry entry) avro-registry (for AVRO Schema Registry entry) json-registry (for JSON Schema Registry entry) proto-registry (for Protobuf Schema Registry entry) These are the key formats for a manually created table: avro-registry (for Avro Schema Registry entry) json-registry (for JSON Schema Registry entry) proto-registry (for Protobuf Schema Registry entry) If no format is specified, Avro Schema Registry is used by default. This applies only if a primary or distribution key is defined. The Schema Registry subject compatibility mode must be FULL or FULL_TRANSITIVE. For more information, see Schema Evolution and Compatibility for Schema Registry on Confluent Cloud. key.format.schema-context¶ Type: String Default: (none) Specify the Confluent Schema Registry Schema Context for the key format. 'key..schema-context' = '' Similar to value.format.schema-context, this option enables you to specify a schema context for the key format. It provides an independent scope in Schema Registry for key schemas. scan.bounded.mode¶ Type: Enum Default: unbounded Specify the bounded mode for the Kafka consumer. scan.bounded.mode = [latest-offset | timestamp | unbounded] The following list shows the valid bounded mode values. latest-offset: bounded by latest offsets. This is evaluated at the start of consumption from a given partition. timestamp: bounded by a user-supplied timestamp. unbounded: table is unbounded. If scan.bounded.mode isn’t set, the default is an unbounded table. For more information, see Bounded and unbounded tables. 
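The following sketch shows one way to declare a bounded table, assuming a hypothetical table named t_bounded. With latest-offset, a scan is expected to stop at the offsets that are latest when consumption of each partition starts.

```sql
-- Hypothetical table name and columns, used only for illustration.
CREATE TABLE t_bounded (
  id BIGINT,
  name STRING
) WITH (
  -- Stop reading at the latest offsets observed when consumption starts.
  'scan.bounded.mode' = 'latest-offset'
);
```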
If timestamp is specified, the scan.bounded.timestamp-millis config option is required to specify a specific bounded timestamp in milliseconds since the Unix epoch, January 1, 1970 00:00:00.000 GMT. scan.bounded.timestamp-millis¶ Type: Long Default: (none) End at the specified epoch timestamp (milliseconds) when the timestamp bounded mode is set in the scan.bounded.mode property. 'scan.bounded.mode' = 'timestamp', 'scan.bounded.timestamp-millis' = '' scan.startup.mode¶ Type: Enum Default: earliest-offset The startup mode for Kafka consumers. 'scan.startup.mode' = '' The following list shows the valid startup mode values. earliest-offset: start from the earliest offset possible. latest-offset: start from the latest offset. timestamp: start from the user-supplied timestamp for each partition. The default is earliest-offset. This differs from the default in Apache Flink, which is group-offsets. If timestamp is specified, the scan.startup.timestamp-millis config option is required, to define a specific startup timestamp in milliseconds since the Unix epoch, January 1, 1970 00:00:00.000 GMT. scan.startup.timestamp-millis¶ Type: Long Default: (none) Start from the specified Unix epoch timestamp (milliseconds) when the timestamp mode is set in the scan.startup.mode property. 'scan.startup.mode' = 'timestamp', 'scan.startup.timestamp-millis' = '' value.fields-include¶ Type: Enum Default: except-key Specify a strategy for handling key columns in the data type of the value format. 'value.fields-include' = [all, except-key] If all is specified, all physical columns of the table schema are included in the value format, which means that key columns appear in the data type for both the key and value format. value.format¶ Type: String Default: “avro-registry” Specify the format for serializing and deserializing the value part of Kafka messages. 
'value.format' = '' These are the value formats for an inferred table: raw (if no Schema Registry entry) avro-registry (for Avro Schema Registry entry) json-registry (for JSON Schema Registry entry) proto-registry (for Protobuf Schema Registry entry) avro-debezium-registry (for Avro Debezium Schema Registry entry) json-debezium-registry (for JSON Debezium Schema Registry entry) proto-debezium-registry (for Protobuf Debezium Schema Registry entry) These are the value formats for a manually created table: avro-registry (for Avro Schema Registry entry) json-registry (for JSON Schema Registry entry) proto-registry (for Protobuf Schema Registry entry) If no format is specified, Avro Schema Registry is used by default. value.format.schema-context¶ Type: String Default: (none) Specify the Confluent Schema Registry Schema Context for the value format. 'value..schema-context' = '' A schema context represents an independent scope in Schema Registry and can be used to create separate “sub-registries” within one Schema Registry. Each schema context is an independent grouping of schema IDs and subject names, enabling the same schema ID in different contexts to represent completely different schemas. Inferred tables¶ Inferred tables are tables that have not been created by using a CREATE TABLE statement, but instead are automatically detected from information about existing Kafka topics and Schema Registry entries. You can use the ALTER TABLE statement to evolve schemas for inferred tables. The following examples show output from the SHOW CREATE TABLE statement called on the resulting table. No key or value in Schema Registry¶ For an inferred table with no registered key or value schemas, SHOW CREATE TABLE returns the following output: CREATE TABLE `t_raw` ( `key` VARBINARY(2147483647), `val` VARBINARY(2147483647) ) DISTRIBUTED BY HASH(`key`) INTO 2 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'raw' ... 
) Properties Key and value formats are raw (binary format) with BYTES. Following Kafka message semantics, both key and value support NULL as well, so the following code is valid: INSERT INTO t_raw (key, val) SELECT CAST(NULL AS BYTES), CAST(NULL AS BYTES); No key but record value in Schema Registry¶ For the following value schema in Schema Registry: { "type": "record", "name": "TestRecord", "fields": [ { "name": "i", "type": "int" }, { "name": "s", "type": "string" } ] } SHOW CREATE TABLE returns the following output: CREATE TABLE `t_raw_key` ( `key` VARBINARY(2147483647), `i` INT NOT NULL, `s` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'avro-registry' ... ) Properties The key format is raw (binary format) with BYTES. Following Kafka message semantics, the key supports NULL as well, so the following code is valid: INSERT INTO t_raw_key SELECT CAST(NULL AS BYTES), 12, 'Bob'; Atomic key and record value in Schema Registry¶ For the following key schema in Schema Registry: "int" And for the following value schema in Schema Registry: { "type": "record", "name": "TestRecord", "fields": [ { "name": "i", "type": "int" }, { "name": "s", "type": "string" } ] } SHOW CREATE TABLE returns the following output: CREATE TABLE `t_atomic_key` ( `key` INT NOT NULL, `i` INT NOT NULL, `s` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 2 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'avro-registry', 'value.format' = 'avro-registry' ... ) Properties Schema Registry defines the column data type as INT NOT NULL. The column name, key, is used as the default, because Schema Registry doesn’t provide a column name.
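As a sketch of what the INT NOT NULL key implies for writes, the following statement inserts a row into t_atomic_key with a non-null integer key. The values are arbitrary. Unlike the raw-key tables above, a NULL key is not expected to be accepted here.

```sql
-- Column order follows the inferred schema: `key`, `i`, `s`.
-- The key column is INT NOT NULL, so a concrete integer is supplied.
INSERT INTO t_atomic_key
SELECT 1, 12, 'Bob';
```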
Overlapping names in key/value, no key in Schema Registry¶ For the following value schema in Schema Registry: { "type": "record", "name": "TestRecord", "fields": [ { "name": "i", "type": "int" }, { "name": "key", "type": "string" } ] } SHOW CREATE TABLE returns the following output: CREATE TABLE `t_raw_disjoint` ( `key_key` VARBINARY(2147483647), `i` INT NOT NULL, `key` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`key_key`) INTO 1 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.fields-prefix' = 'key_', 'key.format' = 'raw', 'value.format' = 'avro-registry' ... ) Properties The Schema Registry value schema defines columns i INT NOT NULL and key STRING. The column name key BYTES is used as the default if no key is in Schema Registry. Because key would collide with value schema column, the key_ prefix is added. Record key and record value in Schema Registry¶ For the following key schema in Schema Registry: { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" } ] } And for the following value schema in Schema Registry: { "type": "record", "name": "TestRecord", "fields": [ { "name": "name", "type": "string" }, { "name": "zip_code", "type": "string" } ] } SHOW CREATE TABLE returns the following output: CREATE TABLE `t_sr_disjoint` ( `uid` INT NOT NULL, `name` VARCHAR(2147483647) NOT NULL, `zip_code` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`uid`) INTO 1 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'value.format' = 'avro-registry' ... ) Properties Schema Registry defines columns for both key and value. The column names of key and value are disjoint sets and don’t overlap. 
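Because the key and value columns are disjoint, the key field uid can be read and written like any other column. The following is a minimal sketch against the inferred t_sr_disjoint table. The inserted values are arbitrary.

```sql
-- uid comes from the key schema; name and zip_code come from the value schema.
INSERT INTO t_sr_disjoint (uid, name, zip_code)
SELECT 42, 'Alice', '94043';

-- Key and value columns are selected together without any prefix.
SELECT uid, name, zip_code
FROM t_sr_disjoint;
```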
Record key and record value with overlap in Schema Registry¶ For the following key schema in Schema Registry: { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" } ] } And for the following value schema in Schema Registry: { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" },{ "name": "name", "type": "string" }, { "name": "zip_code", "type": "string" } ] } SHOW CREATE TABLE returns the following output: CREATE TABLE `t_sr_joint` ( `uid` INT NOT NULL, `name` VARCHAR(2147483647) NOT NULL, `zip_code` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`uid`) INTO 1 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'value.fields-include' = 'all', 'value.format' = 'avro-registry' ... ) Properties Schema Registry defines columns for both key and value. The column names of key and value overlap on uid. 'value.fields-include' = 'all' is set to exclude the key, because it is fully contained in the value. Detecting that key is fully contained in the value requires that both field name and data type match completely, including nullability, and all fields of the key are included in the value. Union types in Schema Registry¶ For the following value schema in Schema Registry: ["int", "string"] SHOW CREATE TABLE returns the following output: CREATE TABLE `t_union` ( `key` VARBINARY(2147483647), `int` INT, `string` VARCHAR(2147483647) ) ... For the following value schema in Schema Registry: [ "string", { "type": "record", "name": "User", "fields": [ { "name": "uid", "type": "int" },{ "name": "name", "type": "string" } ] }, { "type": "record", "name": "Address", "fields": [ { "name": "zip_code", "type": "string" } ] } ] SHOW CREATE TABLE returns the following output: CREATE TABLE `t_union` ( `key` VARBINARY(2147483647), `string` VARCHAR(2147483647), `User` ROW<`uid` INT NOT NULL, `name` VARCHAR(2147483647) NOT NULL>, `Address` ROW<`zip_code` VARCHAR(2147483647) NOT NULL> ) ... 
Properties NULL and NOT NULL are inferred depending on whether a union contains NULL. Elements of a union are always nullable, because they need to be set to NULL when a different element is set. If a record defines a namespace, the field is prefixed with it, for example, org.myorg.avro.User. Multi-message protobuf schema in Schema Registry¶ For the following value schema in Schema Registry: syntax = "proto3"; message Purchase { string item = 1; double amount = 2; string customer_id = 3; } message Pageview { string url = 1; bool is_special = 2; string customer_id = 3; } SHOW CREATE TABLE returns the following output: CREATE TABLE `t` ( `key` VARBINARY(2147483647), `Purchase` ROW< `item` VARCHAR(2147483647) NOT NULL, `amount` DOUBLE NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL >, `Pageview` ROW< `url` VARCHAR(2147483647) NOT NULL, `is_special` BOOLEAN NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL > ) ... For the following value schema in Schema Registry: syntax = "proto3"; message Purchase { string item = 1; double amount = 2; string customer_id = 3; Pageview pageview = 4; } message Pageview { string url = 1; bool is_special = 2; string customer_id = 3; } SHOW CREATE TABLE returns the following output: CREATE TABLE `t` ( `key` VARBINARY(2147483647), `Purchase` ROW< `item` VARCHAR(2147483647) NOT NULL, `amount` DOUBLE NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL, `pageview` ROW< `url` VARCHAR(2147483647) NOT NULL, `is_special` BOOLEAN NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL > >, `Pageview` ROW< `url` VARCHAR(2147483647) NOT NULL, `is_special` BOOLEAN NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL > ) ...
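The nested ROW columns in the preceding output can be read with dot notation. The following sketch assumes the table t from the preceding output, in which Purchase contains a nested pageview field. The column aliases are illustrative.

```sql
-- Only one top-level message column is set per record,
-- so filter out records where Purchase is NULL.
SELECT
  `Purchase`.`item` AS item,
  `Purchase`.`amount` AS amount,
  `Purchase`.`pageview`.`url` AS pageview_url
FROM `t`
WHERE `Purchase` IS NOT NULL;
```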
For the following value schema in Schema Registry: syntax = "proto3"; message Purchase { string item = 1; double amount = 2; string customer_id = 3; Pageview pageview = 4; message Pageview { string url = 1; bool is_special = 2; string customer_id = 3; } } SHOW CREATE TABLE returns the following output: CREATE TABLE `t` ( `key` VARBINARY(2147483647), `item` VARCHAR(2147483647) NOT NULL, `amount` DOUBLE NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL, `pageview` ROW< `url` VARCHAR(2147483647) NOT NULL, `is_special` BOOLEAN NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL > ) ... Debezium CDC format in Schema Registry¶ For a Debezium CDC format with the following value schema in Schema Registry: { "type": "record", "name": "Customer", "namespace": "io.debezium.data", "fields": [ { "name": "before", "type": ["null", { "type": "record", "name": "Value", "fields": [ {"name": "id", "type": "int"}, {"name": "name", "type": "string"}, {"name": "email", "type": "string"} ] }], "default": null }, { "name": "after", "type": ["null", "Value"], "default": null }, { "name": "source", "type": { "type": "record", "name": "Source", "fields": [ {"name": "version", "type": "string"}, {"name": "connector", "type": "string"}, {"name": "name", "type": "string"}, {"name": "ts_ms", "type": "long"}, {"name": "db", "type": "string"}, {"name": "schema", "type": "string"}, {"name": "table", "type": "string"} ] } }, {"name": "op", "type": "string"}, {"name": "ts_ms", "type": ["null", "long"], "default": null}, {"name": "transaction", "type": ["null", { "type": "record", "name": "Transaction", "fields": [ {"name": "id", "type": "string"}, {"name": "total_order", "type": "long"}, {"name": "data_collection_order", "type": "long"} ] }], "default": null} ] } SHOW CREATE TABLE returns the following output: CREATE TABLE `customer_changes` ( `key` VARBINARY(2147483647), `id` INT NOT NULL, `name` VARCHAR(2147483647) NOT NULL, `email` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`key`) 
INTO 6 BUCKETS WITH ( 'changelog.mode' = 'retract', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'avro-debezium-registry' ... ) Properties Flink detects the Debezium format automatically, based on the schema structure with after, before, and op fields. The table schema is inferred from the after schema, exposing only the actual data fields. Automatic Debezium Envelope Detection: For schemas created after May 19, 2025 at 09:00 UTC, Flink automatically detects Debezium envelopes and sets appropriate defaults: value.format defaults to *-debezium-registry (instead of *-registry). changelog.mode defaults to retract (instead of append). Exception: If Kafka cleanup.policy is compact, changelog.mode is set to upsert. The default changelog.mode is retract, which properly handles all CDC operations, including inserts, updates, and deletes. You can manually override the changelog mode if necessary: -- Change to upsert mode for primary key-based operations ALTER TABLE customer_changes SET ('changelog.mode' = 'upsert'); -- Change to append mode (processes only inserts and updates) ALTER TABLE customer_changes SET ('changelog.mode' = 'append'); Examples¶ The following examples show how to create Flink tables for frequently encountered scenarios. Minimal table¶ CREATE TABLE t_minimal (s STRING); Properties Append changelog mode. No Schema Registry key. Round robin distribution. 6 Kafka partitions. The $rowtime column and system watermark are added implicitly. Table with a primary key¶ CREATE TABLE t_pk (k INT PRIMARY KEY NOT ENFORCED, s STRING); Properties Upsert changelog mode. The primary key defines an implicit DISTRIBUTED BY(k). k is the Schema Registry key. Hash distribution on k. The table has 6 Kafka partitions. k is declared as being unique, meaning no duplicate rows. k must not contain NULLs, so an implicit NOT NULL is added. The $rowtime column and system watermark are added implicitly.
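To illustrate the upsert behavior of t_pk, the following sketch writes two rows that share the key 1 in separate statements. The values are arbitrary. Because rows with the same primary key are related in upsert mode, a query that materializes the table is expected to keep a single row per key rather than both versions.

```sql
-- First version of the row for key 1.
INSERT INTO t_pk VALUES (1, 'first');

-- A later write for the same key is treated as an update, not a new fact.
INSERT INTO t_pk VALUES (1, 'second');

-- A different key produces an independent row.
INSERT INTO t_pk VALUES (2, 'other');

SELECT * FROM t_pk;
```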
Table with a primary key in append mode¶ CREATE TABLE t_pk_append (k INT PRIMARY KEY NOT ENFORCED, s STRING) DISTRIBUTED INTO 4 BUCKETS WITH ('changelog.mode' = 'append'); Properties Append changelog mode. k is the Schema Registry key. Hash distribution on k. The table has 4 Kafka partitions. k is declared as being unique, meaning no duplicate rows. k must not contain NULLs, meaning implicit NOT NULL. The $rowtime column and system watermark are added implicitly. Table with hash distribution¶ CREATE TABLE t_dist (k INT, s STRING) DISTRIBUTED BY (k) INTO 4 BUCKETS; Properties Append changelog mode. k is the Schema Registry key. Hash distribution on k. The table has 4 Kafka partitions. The $rowtime column and system watermark are added implicitly. Complex table with all concepts combined¶ CREATE TABLE t_complex (k1 INT, k2 INT, PRIMARY KEY (k1, k2) NOT ENFORCED, s STRING) COMMENT 'My complex table' DISTRIBUTED BY HASH(k1) INTO 4 BUCKETS WITH ('changelog.mode' = 'append'); Properties Append changelog mode. k1 is the Schema Registry key. Hash distribution on k1. k2 is treated as a value column and is stored in the value part of Schema Registry. The table has 4 Kafka partitions. k1 and k2 are declared as being unique, meaning no duplicates. k1 and k2 must not contain NULLs, meaning implicit NOT NULL. The $rowtime column and system watermark are added implicitly. An additional comment is added. Table with overlapping names in key/value of Schema Registry but disjoint data¶ CREATE TABLE t_disjoint (from_key_k INT, k STRING) DISTRIBUTED BY (from_key_k) WITH ('key.fields-prefix' = 'from_key_'); Properties Append changelog mode. Hash distribution on from_key_k. The key prefix from_key_ is defined and is stripped before storing the schema in Schema Registry. Therefore, k is the Schema Registry key of type INT. Also, k is the Schema Registry value of type STRING.
Both key and value store disjoint data, so they can have different data types. Create with overlapping names in key/value of Schema Registry but joint data¶ CREATE TABLE t_joint (k INT, v STRING) DISTRIBUTED BY (k) WITH ('value.fields-include' = 'all'); Properties Append changelog mode. Hash distribution on k. By default, the key is never included in the value in Schema Registry. By setting 'value.fields-include' = 'all', the value contains the full table schema. Therefore, k is the Schema Registry key. Also, k and v make up the Schema Registry value. The payload of k is stored twice in the Kafka message, because key and value store joint data and they have the same data type for k. Table with metadata columns for writing a Kafka message timestamp¶ CREATE TABLE t_metadata_write (name STRING, ts TIMESTAMP_LTZ(3) NOT NULL METADATA FROM 'timestamp') DISTRIBUTED INTO 1 BUCKETS; Properties Adds the ts metadata column, which isn’t part of Schema Registry but instead is a pure Flink concept. In contrast with $rowtime, which is declared as a METADATA VIRTUAL column, ts is selected in a SELECT * statement and is writable. The following examples show how to fill Kafka messages with an instant. INSERT INTO t_metadata_write (ts, name) SELECT NOW(), 'Alice'; INSERT INTO t_metadata_write (ts, name) SELECT TO_TIMESTAMP_LTZ(0, 3), 'Bob'; SELECT $rowtime, * FROM t_metadata_write; The Schema Registry subject compatibility mode must be FULL or FULL_TRANSITIVE. For more information, see Schema Evolution and Compatibility for Schema Registry on Confluent Cloud. Table with string key and value in Schema Registry¶ CREATE TABLE t_raw_string_key (key STRING, i INT) DISTRIBUTED BY (key) WITH ('key.format' = 'raw'); Properties Schema Registry is filled with a value subject containing i. The key columns are determined by the DISTRIBUTED BY clause. By default, Avro in Schema Registry would be used for the key, but the WITH clause overrides this to the raw format.
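As a short sketch of writing to t_raw_string_key, the following statements insert a row and read it back. The values are arbitrary. The key column is quoted with backticks here as a precaution, on the assumption that key may be parsed as a keyword in some positions.

```sql
-- The string key is written in raw format; i goes to the value subject.
INSERT INTO t_raw_string_key (`key`, i)
VALUES ('user-1', 1);

SELECT `key`, i
FROM t_raw_string_key;
```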
Tables with cross-region schema sharing

- Create two Kafka clusters in different regions, for example, eu-west-1 and us-west-2.
- Create two Flink compute pools in different regions, for example, eu-west-1 and us-west-2.
- In the first region, run the following statement: `CREATE TABLE t_shared_schema (key STRING, s STRING) DISTRIBUTED BY (key);`
- In the second region, run the same statement: `CREATE TABLE t_shared_schema (key STRING, s STRING) DISTRIBUTED BY (key);`

Properties:

- Schema Registry is shared across regions.
- The SQL metastore, Flink compute pools, and Kafka clusters are regional.
- Both tables in either region share the Schema Registry subjects t_shared_schema-key and t_shared_schema-value.

Create with different changelog modes

There are three ways of storing events in a table’s log, that is, in the underlying Kafka topic.

- append: Every insertion event is an immutable fact. Every event is insert-only. Events can be distributed in a round-robin fashion across workers/shards because they are unrelated.
- upsert: Events are related using a primary key. Every event is either an upsert or delete event for a primary key. Events for the same primary key should land at the same worker/shard.
- retract: Every upsert event is a fact that can be “undone”. This means that every event is either an insertion or its retraction. So, two events are related by all columns. In other words, the entire row is the key. For example, +I['Bob', 42] is related to -D['Bob', 42] and +U['Alice', 13] is related to -U['Alice', 13].

The retract mode is intermediate between the append and upsert modes. The append and upsert modes are natural to existing Kafka consumers and producers. Kafka compaction is a kind of upsert.

Start with a table created by the following statement: `CREATE TABLE t_changelog_modes (i BIGINT);`

Properties:

- Confluent Cloud for Apache Flink always derives an appropriate changelog mode for the preceding declaration.
- If there is no primary key, append is the safest option, because it prevents users from pushing updates into a topic accidentally, and it has the best support of downstream consumers.

The following statement works because the query is non-updating: `INSERT INTO t_changelog_modes SELECT 1;`

The following statement does not work because the query is updating, which causes an error: `INSERT INTO t_changelog_modes SELECT COUNT(*) FROM (VALUES (1), (2), (3));`

If you need updates, and if downstream consumers support it, for example, when the consumer is another Flink job, you can set the changelog mode to retract: `ALTER TABLE t_changelog_modes SET ('changelog.mode' = 'retract');`

Properties:

- The table starts accepting retractions during INSERT INTO.
- Already existing records in the Kafka topic are treated as insertions.
- Newly added records receive a changeflag (+I, +U, -U, -D) in the Kafka message header.

Going back to append mode is possible, but retractions (-U, -D) appear as insertions, and the Kafka header metadata column reveals the changeflag. Run the following statements in order. The final SELECT shows what is serialized internally.

- `ALTER TABLE t_changelog_modes SET ('changelog.mode' = 'append');`
- `ALTER TABLE t_changelog_modes ADD headers MAP METADATA VIRTUAL;`
- `SELECT i, headers FROM t_changelog_modes;`

Table with infinite retention time

Syntax: `CREATE TABLE t_infinite_retention (i INT) WITH ('kafka.retention.time' = '0');`

Properties:

- By default, the retention time is 7 days, as in all other APIs.
- Flink doesn’t support -1 for durations, so 0 means infinite retention time.
- Durations in Flink support 2 day or 2 d syntax, so it doesn’t need to be in milliseconds.
- If no unit is specified, the unit is milliseconds.
The following units are supported: "d", "day", "h", "hour", "m", "min", "minute", "ms", "milli", "millisecond", "µs", "micro", "microsecond", "ns", "nano", "nanosecond" Related content¶ Video: How to Set Idle Timeouts ALTER TABLE statement INSERT INTO FROM SELECT Statement Join Queries Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql CREATE TABLE [IF NOT EXISTS] [catalog_name.][db_name.]table_name ( { | | | }[ , ...n] [ ] [ ][ , ...n] ) [COMMENT table_comment] [DISTRIBUTED BY (distribution_column_name1, distribution_column_name2, ...) INTO n BUCKETS] WITH (key1=value1, key2=value2, ...) [ LIKE source_table [( )] | AS select_query ] : column_name column_type [ ] [COMMENT column_comment] : column_name column_type METADATA [ FROM metadata_key ] [ VIRTUAL ] : column_name AS computed_column_expression [COMMENT column_comment] column_name column_type : WATERMARK FOR rowtime_column_name AS watermark_strategy_expression : [CONSTRAINT constraint_name] PRIMARY KEY (column_name, ...) 
NOT ENFORCED : { { INCLUDING | EXCLUDING } { ALL | CONSTRAINTS | PARTITIONS } | { INCLUDING | EXCLUDING | OVERWRITING } { GENERATED | OPTIONS | WATERMARKS } } ``` ```sql catalog_name.db_name.table_name ``` ```sql db_name.table_name ``` ```sql CREATE TABLE t1 ( `id` BIGINT, `name` STRING, `age` INT, `salary` DECIMAL(10,2), `active` BOOLEAN, `created_at` TIMESTAMP_LTZ(3) ); ``` ```sql CREATE TABLE t2 ( `id` BIGINT, `name` STRING, `age` INT, `salary` DECIMAL(10,2), `active` BOOLEAN, `created_at` TIMESTAMP_LTZ(3) ) WITH ( 'changelog.mode' = 'retract' ); ``` ```sql CREATE TABLE t1 ( `id` BIGINT, `name` STRING, `age` INT, `salary` DECIMAL(10,2), `active` BOOLEAN, `created_at` TIMESTAMP_LTZ(3) ); ``` ```sql CREATE TABLE t ( `user_id` BIGINT, `item_id` BIGINT, `behavior` STRING, `event_time` TIMESTAMP_LTZ(3) METADATA FROM 'timestamp', `partition` BIGINT METADATA VIRTUAL, `offset` BIGINT METADATA VIRTUAL ); ``` ```sql StringSerializer ``` ```sql StringSerializer ``` ```sql CREATE TABLE t ( `id` BIGINT, `first_name` STRING, `last_name` STRING, `full_name` AS CONCAT(first_name, ' ', last_name) ); ``` ```sql SHOW CREATE TABLE ``` ```sql DESCRIBE EXTENDED ``` ```sql $rowtime TIMESTAMP_LTZ(3) NOT NULL ``` ```sql NOT ENFORCED ``` ```sql NOT ENFORCED ``` ```sql PRIMARY KEY ``` ```sql DISTRIBUTED BY ``` ```sql latest_page_per_ip ``` ```sql CREATE TABLE latest_page_per_ip ( `ip` STRING, `page_url` STRING, `ts` TIMESTAMP_LTZ(3), PRIMARY KEY(`ip`) NOT ENFORCED ); ``` ```sql DISTRIBUTED BY ``` ```sql bucket = hash(user_id) % number_of_buckets ``` ```sql CREATE TABLE t (schema); ``` ```sql CREATE TABLE t_dist (k INT, s STRING) DISTRIBUTED BY (k) INTO 4 BUCKETS; ``` ```sql PARTITIONED BY ``` ```sql PARTITIONED BY ``` ```sql PARTITIONED BY ``` ```sql PARTITIONED BY ``` ```sql CREATE TABLE t (partition_key INT, example_value STRING) PARTITIONED BY (partition_key); ``` ```sql WATERMARK FOR $rowtime AS SOURCE_WATERMARK() ``` ```sql WATERMARK FOR rowtime_column_name AS 
watermark_strategy_expression ``` ```sql rowtime_column_name ``` ```sql TIMESTAMP(3) ``` ```sql watermark_strategy_expression ``` ```sql TIMESTAMP(3) ``` ```sql WATERMARK FOR rowtime_column AS rowtime_column ``` ```sql WATERMARK FOR rowtime_column AS rowtime_column - INTERVAL '0.001' SECOND ``` ```sql WATERMARK FOR rowtime_column AS rowtime_column - INTERVAL 'string' timeUnit ``` ```sql WATERMARK FOR rowtime_column AS rowtime_column - INTERVAL '5' SECOND ``` ```sql CREATE TABLE orders ( `user` BIGINT, `product` STRING, `order_time` TIMESTAMP(3), WATERMARK FOR `order_time` AS `order_time` - INTERVAL '5' SECOND ); ``` ```sql sql.tables.scan.idle-timeout ``` ```sql sql.tables.scan.idle-timeout ``` ```sql -- Equivalent to the following CREATE TABLE and INSERT INTO statements. CREATE TABLE my_ctas_table AS SELECT id, name, age FROM source_table WHERE mod(id, 10) = 0; ``` ```sql -- These two statements are equivalent to the preceding CREATE TABLE AS statement. CREATE TABLE my_ctas_table ( id BIGINT, name STRING, age INT ); INSERT INTO my_ctas_table SELECT id, name, age FROM source_table WHERE mod(id, 10) = 0; ``` ```sql CREATE TABLE t WITH (…) AS SELECT … ``` ```sql CREATE TABLE t WITH ('scan.startup.mode' = 'latest-offset') AS SELECT * FROM b; ``` ```sql CREATE TABLE my_ctas_table ( desc STRING, quantity DOUBLE, cost AS price * quantity, WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND, ) AS SELECT id, price, quantity, order_time FROM source_table; ``` ```sql -- Equivalent to the following CREATE TABLE and INSERT INTO statements. CREATE TABLE my_ctas_table ( PRIMARY KEY (id) NOT ENFORCED ) DISTRIBUTED BY HASH(id) INTO 4 BUCKETS AS SELECT id, name FROM source_table; ``` ```sql -- These two statements are equivalent to the preceding CREATE TABLE AS statement. 
CREATE TABLE my_ctas_table ( id BIGINT NOT NULL PRIMARY KEY NOT ENFORCED, name STRING ) DISTRIBUTED BY HASH(id) INTO 4 BUCKETS; INSERT INTO my_ctas_table SELECT id, name FROM source_table; ``` ```sql CREATE TABLE t ( `user_id` BIGINT, `item_id` BIGINT, `price` DOUBLE, `behavior` STRING, `created_at` TIMESTAMP(3), `price_with_tax` AS `price` * 1.19, `event_time` TIMESTAMP_LTZ(3) METADATA FROM 'timestamp', `partition` BIGINT METADATA VIRTUAL, `offset` BIGINT METADATA VIRTUAL ); ``` ```sql CREATE TABLE t_derived ( WATERMARK FOR `created_at` AS `created_at` - INTERVAL '5' SECOND ) LIKE t ( EXCLUDING WATERMARKS EXCLUDING METADATA ); ``` ```sql 'changelog.mode' = [append | upsert | retract] ``` ```sql 'error-handling.log.target' = '' ``` ```sql error-handling.log.target ``` ```sql 'error-handling.mode' = [fail | ignore | log] ``` ```sql SELECT * FROM my_table ``` ```sql error-handling.mode ``` ```sql error-handling.mode ``` ```sql 'kafka.cleanup-policy' = [delete | compact | delete-compact] ``` ```sql log.cleanup.policy ``` ```sql delete-compact ``` ```sql read-committed ``` ```sql 'kafka.consumer.isolation-level' = [read-committed | read-uncommitted] ``` ```sql read-committed ``` ```sql read-uncommitted ``` ```sql 'kafka.max-message-size' = MemorySize ``` ```sql max.message.bytes ``` ```sql 'kafka.producer.compression.type' = [none | gzip | snappy | lz4 | zstd] ``` ```sql compression.type ``` ```sql 'kafka.retention.size' = MemorySize ``` ```sql log.retention.bytes ``` ```sql 'kafka.retention.time' = '' ``` ```sql log.retention.ms ``` ```sql 'key.fields-prefix' = '' ``` ```sql key.fields-prefix ``` ```sql 'key.format' = '' ``` ```sql avro-registry ``` ```sql json-registry ``` ```sql proto-registry ``` ```sql avro-registry ``` ```sql json-registry ``` ```sql proto-registry ``` ```sql 'key..schema-context' = '' ``` ```sql scan.bounded.mode = [latest-offset | timestamp | unbounded] ``` ```sql latest-offset ``` ```sql scan.bounded.mode ``` ```sql January 1, 1970 
00:00:00.000 GMT ``` ```sql 'scan.bounded.mode' = 'timestamp', 'scan.bounded.timestamp-millis' = '' ``` ```sql earliest-offset ``` ```sql 'scan.startup.mode' = '' ``` ```sql earliest-offset ``` ```sql latest-offset ``` ```sql earliest-offset ``` ```sql group-offsets ``` ```sql 'scan.startup.mode' = 'timestamp', 'scan.startup.timestamp-millis' = '' ``` ```sql 'value.fields-include' = [all, except-key] ``` ```sql 'value.format' = '' ``` ```sql avro-registry ``` ```sql json-registry ``` ```sql proto-registry ``` ```sql avro-debezium-registry ``` ```sql json-debezium-registry ``` ```sql proto-debezium-registry ``` ```sql avro-registry ``` ```sql json-registry ``` ```sql proto-registry ``` ```sql 'value..schema-context' = '' ``` ```sql CREATE TABLE `t_raw` ( `key` VARBINARY(2147483647), `val` VARBINARY(2147483647) ) DISTRIBUTED BY HASH(`key`) INTO 2 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'raw' ... ) ``` ```sql INSERT INTO t_raw (key, val) SELECT CAST(NULL AS BYTES), CAST(NULL AS BYTES); ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "i", "type": "int" }, { "name": "s", "type": "string" } ] } ``` ```sql CREATE TABLE `t_raw_key` ( `key` VARBINARY(2147483647), `i` INT NOT NULL, `s` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'avro-registry' ... ) ``` ```sql INSERT INTO t_raw_key SELECT CAST(NULL AS BYTES), 12, 'Bob'; ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "i", "type": "int" }, { "name": "s", "type": "string" } ] } ``` ```sql CREATE TABLE `t_atomic_key` ( `key` INT NOT NULL, `i` INT NOT NULL, `s` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 2 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'avro-registry', 'value.format' = 'avro-registry' ... 
) ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "i", "type": "int" }, { "name": "key", "type": "string" } ] } ``` ```sql CREATE TABLE `t_raw_disjoint` ( `key_key` VARBINARY(2147483647), `i` INT NOT NULL, `key` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`key_key`) INTO 1 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.fields-prefix' = 'key_', 'key.format' = 'raw', 'value.format' = 'avro-registry' ... ) ``` ```sql i INT NOT NULL ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" } ] } ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "name", "type": "string" }, { "name": "zip_code", "type": "string" } ] } ``` ```sql CREATE TABLE `t_sr_disjoint` ( `uid` INT NOT NULL, `name` VARCHAR(2147483647) NOT NULL, `zip_code` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`uid`) INTO 1 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'value.format' = 'avro-registry' ... ) ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" } ] } ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" },{ "name": "name", "type": "string" }, { "name": "zip_code", "type": "string" } ] } ``` ```sql CREATE TABLE `t_sr_joint` ( `uid` INT NOT NULL, `name` VARCHAR(2147483647) NOT NULL, `zip_code` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`uid`) INTO 1 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'value.fields-include' = 'all', 'value.format' = 'avro-registry' ... ) ``` ```sql 'value.fields-include' = 'all' ``` ```sql ["int", "string"] ``` ```sql CREATE TABLE `t_union` ( `key` VARBINARY(2147483647), `int` INT, `string` VARCHAR(2147483647) ) ... 
``` ```sql [ "string", { "type": "record", "name": "User", "fields": [ { "name": "uid", "type": "int" },{ "name": "name", "type": "string" } ] }, { "type": "record", "name": "Address", "fields": [ { "name": "zip_code", "type": "string" } ] } ] ``` ```sql CREATE TABLE `t_union` ( `key` VARBINARY(2147483647), `string` VARCHAR(2147483647), `User` ROW<`uid` INT NOT NULL, `name` VARCHAR(2147483647) NOT NULL>, `Address` ROW<`zip_code` VARCHAR(2147483647) NOT NULL> ) ... ``` ```sql org.myorg.avro.User ``` ```sql syntax = "proto3"; message Purchase { string item = 1; double amount = 2; string customer_id = 3; } message Pageview { string url = 1; bool is_special = 2; string customer_id = 3; } ``` ```sql CREATE TABLE `t` ( `key` VARBINARY(2147483647), `Purchase` ROW< `item` VARCHAR(2147483647) NOT NULL, `amount` DOUBLE NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL >, `Pageview` ROW< `url` VARCHAR(2147483647) NOT NULL, `is_special` BOOLEAN NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL > ) ... ``` ```sql syntax = "proto3"; message Purchase { string item = 1; double amount = 2; string customer_id = 3; Pageview pageview = 4; } message Pageview { string url = 1; bool is_special = 2; string customer_id = 3; } ``` ```sql CREATE TABLE `t` ( `key` VARBINARY(2147483647), `Purchase` ROW< `item` VARCHAR(2147483647) NOT NULL, `amount` DOUBLE NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL, `pageview` ROW< `url` VARCHAR(2147483647) NOT NULL, `is_special` BOOLEAN NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL > >, `Pageview` ROW< `url` VARCHAR(2147483647) NOT NULL, `is_special` BOOLEAN NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL > ) ... 
``` ```sql syntax = "proto3"; message Purchase { string item = 1; double amount = 2; string customer_id = 3; Pageview pageview = 4; message Pageview { string url = 1; bool is_special = 2; string customer_id = 3; } } ``` ```sql CREATE TABLE `t` ( `key` VARBINARY(2147483647), `item` VARCHAR(2147483647) NOT NULL, `amount` DOUBLE NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL, `pageview` ROW< `url` VARCHAR(2147483647) NOT NULL, `is_special` BOOLEAN NOT NULL, `customer_id` VARCHAR(2147483647) NOT NULL > ) ... ``` ```sql { "type": "record", "name": "Customer", "namespace": "io.debezium.data", "fields": [ { "name": "before", "type": ["null", { "type": "record", "name": "Value", "fields": [ {"name": "id", "type": "int"}, {"name": "name", "type": "string"}, {"name": "email", "type": "string"} ] }], "default": null }, { "name": "after", "type": ["null", "Value"], "default": null }, { "name": "source", "type": { "type": "record", "name": "Source", "fields": [ {"name": "version", "type": "string"}, {"name": "connector", "type": "string"}, {"name": "name", "type": "string"}, {"name": "ts_ms", "type": "long"}, {"name": "db", "type": "string"}, {"name": "schema", "type": "string"}, {"name": "table", "type": "string"} ] } }, {"name": "op", "type": "string"}, {"name": "ts_ms", "type": ["null", "long"], "default": null}, {"name": "transaction", "type": ["null", { "type": "record", "name": "Transaction", "fields": [ {"name": "id", "type": "string"}, {"name": "total_order", "type": "long"}, {"name": "data_collection_order", "type": "long"} ] }], "default": null} ] } ``` ```sql CREATE TABLE `customer_changes` ( `key` VARBINARY(2147483647), `id` INT NOT NULL, `name` VARCHAR(2147483647) NOT NULL, `email` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( 'changelog.mode' = 'retract', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'avro-debezium-registry' ... 
) ``` ```sql value.format ``` ```sql *-debezium-registry ``` ```sql changelog.mode ``` ```sql cleanup.policy ``` ```sql changelog.mode ``` ```sql changelog.mode ``` ```sql -- Change to upsert mode for primary key-based operations ALTER TABLE customer_changes SET ('changelog.mode' = 'upsert'); -- Change to append mode (processes only inserts and updates) ALTER TABLE customer_changes SET ('changelog.mode' = 'append'); ``` ```sql CREATE TABLE t_minimal (s STRING); ``` ```sql CREATE TABLE t_pk (k INT PRIMARY KEY NOT ENFORCED, s STRING); ``` ```sql CREATE TABLE t_pk_append (k INT PRIMARY KEY NOT ENFORCED, s STRING) DISTRIBUTED INTO 4 BUCKETS WITH ('changelog.mode' = 'append'); ``` ```sql CREATE TABLE t_dist (k INT, s STRING) DISTRIBUTED BY (k) INTO 4 BUCKETS; ``` ```sql CREATE TABLE t_complex (k1 INT, k2 INT, PRIMARY KEY (k1, k2) NOT ENFORCED, s STRING) COMMENT 'My complex table' DISTRIBUTED BY HASH(k1) INTO 4 BUCKETS WITH ('changelog.mode' = 'append'); ``` ```sql CREATE TABLE t_disjoint (from_key_k INT, k STRING) DISTRIBUTED BY (from_key_k) WITH ('key.fields-prefix' = 'from_key_'); ``` ```sql CREATE TABLE t_joint (k INT, v STRING) DISTRIBUTED BY (k) WITH ('value.fields-include' = 'all'); ``` ```sql 'value.fields-include' = 'all' ``` ```sql CREATE TABLE t_metadata_write (name STRING, ts TIMESTAMP_LTZ(3) NOT NULL METADATA FROM 'timestamp') DISTRIBUTED INTO 1 BUCKETS; ``` ```sql INSERT INTO t (ts, name) SELECT NOW(), 'Alice'; INSERT INTO t (ts, name) SELECT TO_TIMESTAMP_LTZ(0, 3), 'Bob'; SELECT $rowtime, * FROM t; ``` ```sql CREATE TABLE t_raw_string_key (key STRING, i INT) DISTRIBUTED BY (key) WITH ('key.format' = 'raw'); ``` ```sql CREATE TABLE t_shared_schema (key STRING, s STRING) DISTRIBUTED BY (key); ``` ```sql CREATE TABLE t_shared_schema (key STRING, s STRING) DISTRIBUTED BY (key); ``` ```sql t_shared_schema-key ``` ```sql t_shared_schema-value ``` ```sql +I['Bob', 42] ``` ```sql -D['Bob', 42] ``` ```sql +U['Alice', 13] ``` ```sql -U['Alice', 13] ``` ```sql CREATE 
TABLE t_changelog_modes (i BIGINT); ``` ```sql -- works because the query is non-updating INSERT INTO t_changelog_modes SELECT 1; -- does not work because the query is updating, causing an error INSERT INTO t_changelog_modes SELECT COUNT(*) FROM (VALUES (1), (2), (3)); ``` ```sql ALTER TABLE t_changelog_modes SET ('changelog.mode' = 'retract'); ``` ```sql ALTER TABLE t_changelog_modes SET ('changelog.mode' = 'append'); ALTER TABLE t_changelog_modes ADD headers MAP METADATA VIRTUAL; -- Shows what is serialized internally SELECT i, headers FROM t_changelog_modes; ``` ```sql CREATE TABLE t_infinite_retention (i INT) WITH ('kafka.retention.time' = '0'); ``` ```sql "d", "day", "h", "hour", "m", "min", "minute", "ms", "milli", "millisecond", "µs", "micro", "microsecond", "ns", "nano", "nanosecond" ``` --- ### SQL CREATE VIEW Statement in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/statements/create-view.html CREATE VIEW Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables creating views based on statement expressions by using the CREATE VIEW statement. With Flink views, you can encapsulate complex queries and reference them like regular tables. Syntax¶ CREATE VIEW [IF NOT EXISTS] [catalog_name.][db_name.]view_name [( columnName [, columnName ]* )] [COMMENT view_comment] AS statement_expression Description¶ Create a view with the given statement expression. If a view with the same name already exists in the catalog, an exception is thrown. If you specify IF NOT EXISTS, nothing happens if the view exists already. The view name can be in these formats: catalog_name.db_name.view_name: The view is registered with the catalog named “catalog_name” and the database named “db_name”. db_name.view_name: The view is registered into the current catalog of the execution table environment and the database named “db_name”. 
view_name: The view is registered into the current catalog and the database of the execution table environment.

A view created with the CREATE VIEW statement acts as a virtual table that refers to the result of the specified statement expression. The statement expression can be any valid SELECT statement supported by Flink SQL.

Views vs. tables

Views in Flink are similar to tables in that they can be referenced in SQL queries just like regular tables. But there are some key differences:

- Views are read-only and can’t be used as sinks in INSERT statements. Tables support both read and write operations.
- Views don’t have a physical representation and are computed on-the-fly when referenced in a statement. Creating a view results in creating a special Kafka topic. This Flink resource only reserves the name and doesn’t store data. Creating a table results in creating a regular Kafka topic that stores data and corresponding key and value schemas in Confluent Schema Registry.
- Views are lightweight and store only the statement expression.

Despite these differences, views and tables share the same namespace in Flink. This means a view can’t have the same fully qualified name as an existing table in the same catalog and database.

Usage

The following CREATE VIEW statement defines a view named customer_orders that computes the total order value per customer from an orders table: `` CREATE VIEW customer_orders AS SELECT customer_id, SUM(price) AS total_spent FROM `examples`.`marketplace`.`orders` GROUP BY customer_id; ``

You can then use this view in queries as if it were a table: `SELECT customer_id, total_spent FROM customer_orders WHERE total_spent > 1000;`

This statement retrieves all customers with a total order value greater than 1000, leveraging the aggregation already defined in the customer_orders view.
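The optional clauses from the syntax above (IF NOT EXISTS, an explicit column list, and COMMENT) can be sketched as follows. The view name `customer_order_counts`, the column names `customer` and `order_count`, and the comment text are hypothetical and chosen only for illustration. The source table and its customer_id column are taken from the usage example.

```sql
-- Hypothetical view name and column names, for illustration only.
-- IF NOT EXISTS makes the statement a no-op if the view already exists.
CREATE VIEW IF NOT EXISTS customer_order_counts (customer, order_count)
COMMENT 'Number of orders per customer'
AS SELECT customer_id, COUNT(*)
   FROM `examples`.`marketplace`.`orders`
   GROUP BY customer_id;

-- The explicit column list renames the query's output columns,
-- so queries against the view use the new names.
SELECT customer, order_count
FROM customer_order_counts
WHERE order_count > 10;
```

Naming the columns in the view definition avoids relying on generated names for expressions like COUNT(*), which would otherwise need an AS alias inside the query.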
Related content¶ CREATE TABLE statement SELECT statement Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql CREATE VIEW [IF NOT EXISTS] [catalog_name.][db_name.]view_name [( columnName [, columnName ]* )] [COMMENT view_comment] AS statement_expression ``` ```sql catalog_name.db_name.view_name ``` ```sql db_name.view_name ``` ```sql orders_by_customer ``` ```sql CREATE VIEW customer_orders AS SELECT customer_id, SUM(price) AS total_spent FROM `examples`.`marketplace`.`orders` GROUP BY customer_id; ``` ```sql SELECT customer_id, total_spent FROM customer_orders WHERE total_spent > 1000; ``` ```sql orders_by_customer ``` --- ### SQL DESCRIBE Statement in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/statements/describe.html DESCRIBE Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables viewing the schema of an Apache Kafka® topic. Also, you can view details of an AI model, function, or connection. Syntax¶ -- View table details. { DESCRIBE | DESC } [EXTENDED] [catalog_name.][db_name.]table_name -- View model details. { DESCRIBE | DESC } MODEL [[catalogname].[database_name]].model_name -- View function details. { DESCRIBE | DESC } FUNCTION [EXTENDED] [catalog_name.][db_name.]function_name -- View connection details. { DESCRIBE | DESC } CONNECTION [catalog_name.][db_name.]connection_name Description¶ The DESCRIBE statement shows the following properties of a table: Columns and their data type, including nullability constraints Primary keys Bucket keys, i.e., keys of distribution Implicit NOT NULL for primary key columns Custom watermark The DESCRIBE EXTENDED statement shows all of the properties from the DESCRIBE statement and also shows system columns, like $rowtime, including the system watermark. 
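As a small sketch of the table forms of this syntax, the statements below use the DESC shorthand and a fully qualified table name. The catalog and database names `my_catalog` and `my_database` are hypothetical placeholders. The `orders` table is assumed to exist in that database.

```sql
-- DESC is shorthand for DESCRIBE.
DESC orders;

-- Fully qualified form; my_catalog and my_database are placeholder names.
DESCRIBE `my_catalog`.`my_database`.`orders`;

-- EXTENDED additionally lists system columns such as $rowtime
-- and the system watermark.
DESCRIBE EXTENDED `my_catalog`.`my_database`.`orders`;
```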
The DESCRIBE MODEL statement shows the following properties of an AI model: Input format Output format Model version isDefault version (yes or no) The DESCRIBE FUNCTION statement shows the following properties of a function: System function (yes or no) Temporary (yes or no) Class name Function language Plugin ID Version ID Argument types Return type The DESCRIBE FUNCTION EXTENDED statement shows all of the properties from the DESCRIBE FUNCTION statement and also shows the following properties: Kind, i.e., SCALAR, TABLE, or AGGREGATE Requirements, e.g., an aggregate function that can only be applied in an OVER window Deterministic (yes or no) Constant folding (yes or no) Signature The DESCRIBE CONNECTION statement shows the following properties of a connection: Name Type Endpoint Comment Examples¶ Tables¶ In the Flink SQL shell or in a Cloud Console workspace, run the following commands to see an example of the DESCRIBE statement. Create a table. CREATE TABLE orders ( `user` BIGINT NOT NULL, product STRING, amount INT, ts TIMESTAMP(3), PRIMARY KEY(`user`) NOT ENFORCED ); Your output should resemble: [INFO] Execute statement succeed. View the table’s schema. DESCRIBE orders; Your output should resemble: +-------------+--------------+----------+-------------------------+ | Column Name | Data Type | Nullable | Extras | +-------------+--------------+----------+-------------------------+ | user | BIGINT | NOT NULL | PRIMARY KEY, BUCKET KEY | | product | STRING | NULL | | | amount | INT | NULL | | | ts | TIMESTAMP(3) | NULL | | +-------------+--------------+----------+-------------------------+ View the table’s schema and system columns. 
DESCRIBE EXTENDED orders; Your output should resemble: +-------------+----------------------------+----------+-----------------------------------------------------+---------+ | Column Name | Data Type | Nullable | Extras | Comment | +-------------+----------------------------+----------+-----------------------------------------------------+---------+ | user | BIGINT | NOT NULL | PRIMARY KEY, BUCKET KEY | | | product | STRING | NULL | | | | amount | INT | NULL | | | | ts | TIMESTAMP(3) | NULL | | | | $rowtime | TIMESTAMP_LTZ(3) *ROWTIME* | NOT NULL | METADATA VIRTUAL, WATERMARK AS `SOURCE_WATERMARK`() | SYSTEM | +-------------+----------------------------+----------+-----------------------------------------------------+---------+ Models¶ If you have an AI model registered in the Flink environment, you can view its details and creation options by using the DESCRIBE MODEL statement. The following code example shows how to view the default model version: DESCRIBE MODEL `my-model`; Your output should resemble: +-----------------------+---------------------------+---------------------------+---------+ | Inputs | Outputs | Options | Comment | +-----------------------+---------------------------+---------------------------+---------+ | ( | ( | { | | | `credit_limit` INT, | `predicted_default` INT | AZUREML.API_KEY=******, | | | `age` INT | ) | AZUREML.ENDPOINT=h... 
| | | ) | | | | +-----------------------+---------------------------+---------------------------+---------+ The following code example shows how to view a specific model version: DESCRIBE MODEL `my-model$2`; Your output should resemble: +-----------+------------------+-----------------------+---------------------------+--------------------+---------+ | VersionId | IsDefaultVersion | Inputs | Outputs | Options | Comment | +-----------+------------------+-----------------------+---------------------------+--------------------+---------+ | 2 | true | ( | ( | { | | | | | `credit_limit` INT, | `predicted_default` INT | AZUREML.API_K... | | | | | `age` INT | ) | | | | | | ) | | | | +-----------+------------------+-----------------------+---------------------------+--------------------+---------+ The following code example shows how to view all model versions: DESCRIBE MODEL `my-model$all`; Your output should resemble: +-----------+------------------+-----------------------+---------------------------+--------------------+---------+ | VersionId | IsDefaultVersion | Inputs | Outputs | Options | Comment | +-----------+------------------+-----------------------+---------------------------+--------------------+---------+ | 1 | true | ( | ( | { | | | | | `credit_limit` INT, | `predicted_default` INT | AZUREML.API_K... | | | | | `age` INT | ) | | | | | | ) | | | | | 2 | false | ( | ( | { | | | | | `credit_limit` INT, | `predicted_default` INT | AZUREML.API_K... | | | | | `age` INT | ) | | | | | | ) | | | | +-----------+------------------+-----------------------+---------------------------+--------------------+---------+ For more information, see Model versioning. Functions¶ You can view the details of any system functions or registered user-defined functions in the Flink environment, by using the DESCRIBE FUNCTION statement. 
The following code example shows how to describe a system function: DESCRIBE FUNCTION `SUM`; Your output should resemble: +-----------------+------------+ | info name | info value | +-----------------+------------+ | system function | true | | temporary | false | +-----------------+------------+ View more details about the system function definition. DESCRIBE FUNCTION EXTENDED `SUM`; Your output should resemble: +------------------+----------------+ | info name | info value | +------------------+----------------+ | system function | true | | temporary | false | | kind | AGGREGATE | | requirements | [] | | deterministic | true | | constant folding | true | | signature | SUM() | +------------------+----------------+ The following code example shows how to describe a user-defined function: DESCRIBE FUNCTION `MyUpperCaseUdf`; Your output should resemble: +-------------------+----------------------+ | info name | info value | +-------------------+----------------------+ | system function | false | | temporary | true | | class name | org.example.UpperUDF | | function language | JAVA | | plugin id | ccp-xyz | | version id | ver-123 | | argument types | [str] | | return type | str | +-------------------+----------------------+ View more details about the user-defined function definition. 
DESCRIBE FUNCTION EXTENDED `MyUpperCaseUdf`; Your output should resemble: +-------------------+-------------------------------+ | info name | info value | +-------------------+-------------------------------+ | system function | false | | temporary | true | | class name | org.example.UpperUDF | | function language | JAVA | | kind | SCALAR | | requirements | [] | | deterministic | true | | constant folding | true | | signature | cat.db.MyUpperCaseUdf(STRING) | | plugin id | ccp-xyz | | version id | ver-123 | | argument types | [str] | | return type | str | +-------------------+-------------------------------+ Connections¶ You can view the details of any connection in the Flink environment by using the DESCRIBE CONNECTION statement. The following code example shows how to describe an example connection named azure-openai-connection. DESCRIBE CONNECTION `azure-openai-connection`; Your output should resemble: +-------------------------+-------------+-----------------------------------------------------------------------+---------+ | Name | Type | Endpoint | Comment | +-------------------------+-------------+-----------------------------------------------------------------------+---------+ | azure-openai-connection | AZUREOPENAI | https://.openai.azure.com/openai/deployments/matrix-... | | +-------------------------+-------------+-----------------------------------------------------------------------+---------+ Related content¶ CREATE TABLE CREATE MODEL CREATE FUNCTION CREATE CONNECTION USE CATALOG Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql -- View table details. { DESCRIBE | DESC } [EXTENDED] [catalog_name.][db_name.]table_name -- View model details. { DESCRIBE | DESC } MODEL [[catalogname].[database_name]].model_name -- View function details. { DESCRIBE | DESC } FUNCTION [EXTENDED] [catalog_name.][db_name.]function_name -- View connection details. 
{ DESCRIBE | DESC } CONNECTION [catalog_name.][db_name.]connection_name ``` ```sql CREATE TABLE orders ( `user` BIGINT NOT NULL, product STRING, amount INT, ts TIMESTAMP(3), PRIMARY KEY(`user`) NOT ENFORCED ); ``` ```sql [INFO] Execute statement succeed. ``` ```sql DESCRIBE orders; ``` ```sql +-------------+--------------+----------+-------------------------+ | Column Name | Data Type | Nullable | Extras | +-------------+--------------+----------+-------------------------+ | user | BIGINT | NOT NULL | PRIMARY KEY, BUCKET KEY | | product | STRING | NULL | | | amount | INT | NULL | | | ts | TIMESTAMP(3) | NULL | | +-------------+--------------+----------+-------------------------+ ``` ```sql DESCRIBE EXTENDED orders; ``` ```sql +-------------+----------------------------+----------+-----------------------------------------------------+---------+ | Column Name | Data Type | Nullable | Extras | Comment | +-------------+----------------------------+----------+-----------------------------------------------------+---------+ | user | BIGINT | NOT NULL | PRIMARY KEY, BUCKET KEY | | | product | STRING | NULL | | | | amount | INT | NULL | | | | ts | TIMESTAMP(3) | NULL | | | | $rowtime | TIMESTAMP_LTZ(3) *ROWTIME* | NOT NULL | METADATA VIRTUAL, WATERMARK AS `SOURCE_WATERMARK`() | SYSTEM | +-------------+----------------------------+----------+-----------------------------------------------------+---------+ ``` ```sql DESCRIBE MODEL `my-model`; ``` ```sql +-----------------------+---------------------------+---------------------------+---------+ | Inputs | Outputs | Options | Comment | +-----------------------+---------------------------+---------------------------+---------+ | ( | ( | { | | | `credit_limit` INT, | `predicted_default` INT | AZUREML.API_KEY=******, | | | `age` INT | ) | AZUREML.ENDPOINT=h... 
| | | ) | | | | +-----------------------+---------------------------+---------------------------+---------+ ``` ```sql DESCRIBE MODEL `my-model$2`; ``` ```sql +-----------+------------------+-----------------------+---------------------------+--------------------+---------+ | VersionId | IsDefaultVersion | Inputs | Outputs | Options | Comment | +-----------+------------------+-----------------------+---------------------------+--------------------+---------+ | 2 | true | ( | ( | { | | | | | `credit_limit` INT, | `predicted_default` INT | AZUREML.API_K... | | | | | `age` INT | ) | | | | | | ) | | | | +-----------+------------------+-----------------------+---------------------------+--------------------+---------+ ``` ```sql DESCRIBE MODEL `my-model$all`; ``` ```sql +-----------+------------------+-----------------------+---------------------------+--------------------+---------+ | VersionId | IsDefaultVersion | Inputs | Outputs | Options | Comment | +-----------+------------------+-----------------------+---------------------------+--------------------+---------+ | 1 | true | ( | ( | { | | | | | `credit_limit` INT, | `predicted_default` INT | AZUREML.API_K... | | | | | `age` INT | ) | | | | | | ) | | | | | 2 | false | ( | ( | { | | | | | `credit_limit` INT, | `predicted_default` INT | AZUREML.API_K... 
| | | | | `age` INT | ) | | | | | | ) | | | | +-----------+------------------+-----------------------+---------------------------+--------------------+---------+ ``` ```sql DESCRIBE FUNCTION `SUM`; ``` ```sql +-----------------+------------+ | info name | info value | +-----------------+------------+ | system function | true | | temporary | false | +-----------------+------------+ ``` ```sql DESCRIBE FUNCTION EXTENDED `SUM`; ``` ```sql +------------------+----------------+ | info name | info value | +------------------+----------------+ | system function | true | | temporary | false | | kind | AGGREGATE | | requirements | [] | | deterministic | true | | constant folding | true | | signature | SUM() | +------------------+----------------+ ``` ```sql DESCRIBE FUNCTION `MyUpperCaseUdf`; ``` ```sql +-------------------+----------------------+ | info name | info value | +-------------------+----------------------+ | system function | false | | temporary | true | | class name | org.example.UpperUDF | | function language | JAVA | | plugin id | ccp-xyz | | version id | ver-123 | | argument types | [str] | | return type | str | +-------------------+----------------------+ ``` ```sql DESCRIBE FUNCTION EXTENDED `MyUpperCaseUdf`; ``` ```sql +-------------------+-------------------------------+ | info name | info value | +-------------------+-------------------------------+ | system function | false | | temporary | true | | class name | org.example.UpperUDF | | function language | JAVA | | kind | SCALAR | | requirements | [] | | deterministic | true | | constant folding | true | | signature | cat.db.MyUpperCaseUdf(STRING) | | plugin id | ccp-xyz | | version id | ver-123 | | argument types | [str] | | return type | str | +-------------------+-------------------------------+ ``` ```sql azure-openai-connection ``` ```sql DESCRIBE CONNECTION `azure-openai-connection`; ``` ```sql 
+-------------------------+-------------+-----------------------------------------------------------------------+---------+ | Name | Type | Endpoint | Comment | +-------------------------+-------------+-----------------------------------------------------------------------+---------+ | azure-openai-connection | AZUREOPENAI | https://.openai.azure.com/openai/deployments/matrix-... | | +-------------------------+-------------+-----------------------------------------------------------------------+---------+ ``` --- ### SQL DROP CONNECTION Statement in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/statements/drop-connection.html DROP CONNECTION Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® supports creating secure connections to external services and data sources. You can use these connections in your Flink statements. You remove these connections by using the DROP CONNECTION statement. Syntax¶ DROP CONNECTION [IF EXISTS] [catalog_name.][db_name.]connection_name Description¶ Delete a connection from the Flink environment. Dropping a connection deletes the corresponding credentials stored in the SecretStore. Example¶ DROP CONNECTION `azure-openai-connection`; Related content¶ ALTER CONNECTION CREATE CONNECTION DESCRIBE CONNECTION SHOW CONNECTIONS Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. 
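The syntax above accepts an optional IF EXISTS clause that the example doesn't exercise. The following is a minimal sketch, reusing the connection name from the example. It assumes, by analogy with the other DROP statements in this reference, that IF EXISTS turns the statement into a no-op when the connection is absent.

```sql
-- Drop the connection only if it is present, instead of failing when it is missing.
DROP CONNECTION IF EXISTS `azure-openai-connection`;

-- List the remaining connections to confirm the removal.
SHOW CONNECTIONS;
```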
#### Code Examples

```sql
DROP CONNECTION [IF EXISTS] [catalog_name.][db_name.]connection_name
```

```sql
DROP CONNECTION `azure-openai-connection`;
```

---

### SQL DROP MODEL Statement in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/reference/statements/drop-model.html

DROP MODEL Statement in Confluent Cloud for Apache Flink¶

Confluent Cloud for Apache Flink® enables real-time inference and prediction with AI models. Use the CREATE MODEL statement to register an AI model, and the DROP MODEL statement to delete a model or one of its versions.

Syntax¶

-- Delete the default version. The min version becomes the new default.
DROP MODEL [IF EXISTS] [[catalog_name].[database_name]].model_name

-- Delete the specified version.
DROP MODEL [IF EXISTS] [[catalog_name].[database_name]].model_name[$version-id]

-- Delete all versions and the model.
DROP MODEL [IF EXISTS] [[catalog_name].[database_name]].model_name[$all]

Description¶

Delete an AI model in Confluent Cloud for Apache Flink.

Use the $ syntax to delete a specific version of a model. For more information, see Model versioning.

If no version is specified, DROP MODEL deletes the default version, and the minimum remaining version becomes the new default. DROP MODEL with the $all suffix deletes all versions and the model itself.

When the IF EXISTS clause is provided and the model or version doesn’t exist, no action is taken.

Examples¶

-- Delete the default version. The min version becomes the new default.
DROP MODEL `<model-name>`;

-- Delete a specific version of the model.
DROP MODEL `<model-name>$<version-id>`;

-- Delete all versions and the model.
DROP MODEL `<model-name>$all`;

Related content¶

- CREATE MODEL
- ALTER MODEL
- Run an AI Model

Note: This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.

#### Code Examples

```sql
-- Delete the default version. The min version becomes the new default.
DROP MODEL [IF EXISTS] [[catalog_name].[database_name]].model_name

-- Delete the specified version.
DROP MODEL [IF EXISTS] [[catalog_name].[database_name]].model_name[$version-id]

-- Delete all versions and the model.
DROP MODEL [IF EXISTS] [[catalog_name].[database_name]].model_name[$all]
```

```sql
-- Delete the default version. The min version becomes the new default.
DROP MODEL `<model-name>`;

-- Delete a specific version of the model.
DROP MODEL `<model-name>$<version-id>`;

-- Delete all versions and the model.
DROP MODEL `<model-name>$all`;
```

---

### SQL DROP TABLE Statement in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/reference/statements/drop-table.html

DROP TABLE Statement in Confluent Cloud for Apache Flink¶

The DROP TABLE statement removes a table definition from Confluent Cloud for Apache Flink® and, depending on the table type, also deletes associated resources like the Kafka topic and the schemas in Schema Registry.

Syntax¶

DROP TABLE [IF EXISTS] table_name

Parameters¶

- IF EXISTS: Optional clause that prevents an error if the table does not exist.
- table_name: The name of the table to drop.

Description¶

The behavior of the DROP TABLE statement depends on the table type.

Regular Tables¶

For tables backed by Kafka topics, which are created by using CREATE TABLE or inferred from existing topics, DROP TABLE:

- Deletes the underlying Kafka topic permanently.
- When using TopicNameStrategy (default): deletes all versions of the associated schemas from Schema Registry.
- When using RecordNameStrategy or TopicRecordNameStrategy: deletes the Kafka topic but preserves the schemas in Schema Registry.

External Tables¶

Note: External tables are an Open Preview feature in Confluent Cloud. A Preview feature is a Confluent Cloud component that is being introduced to gain early feedback from developers. Preview features can be used for evaluation and non-production testing purposes or to provide feedback to Confluent.
The warranty, SLA, and Support Services provisions of your agreement with Confluent do not apply to Preview features. Confluent may discontinue providing preview releases of the Preview features at any time in Confluent’s sole discretion.

Confluent Cloud for Apache Flink enables vector search with external tables. Use the CREATE TABLE statement to register an external table.

For external tables, like vector databases and lookup tables, DROP TABLE:

- Removes the table definition from Flink metadata.
- Does not delete data from the external system.

Examples include vector search tables and federated search tables.

Permissions¶

To execute DROP TABLE, you need an RBAC role that enables you to delete the Kafka topics and Schema Registry schema subjects.

Important considerations¶

- The DROP TABLE operation is not atomic. If either the Kafka topic deletion or the schema deletion fails, the operation may complete only partially.
- Dropping a table permanently deletes the Kafka topic data.
- Statements that are running and depend on a dropped table transition to DEGRADED status. Stop dependent statements before dropping a table.
- When using TopicNameStrategy, dropping a table deletes its schemas, even if they are used by other topics.

Examples¶

-- Drop a Kafka-backed table.
DROP TABLE my_table;

-- Drop a table if it exists.
DROP TABLE IF EXISTS my_table;

-- Drop an external table.
DROP TABLE ``;

Related Content¶

- ALTER TABLE
- CREATE TABLE
- Search External Tables

Note: This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.

#### Code Examples

```sql
DROP TABLE [IF EXISTS] table_name
```

```sql
-- Drop a Kafka-backed table.
DROP TABLE my_table;

-- Drop a table if it exists.
DROP TABLE IF EXISTS my_table;

-- Drop an external table.
DROP TABLE ``; ``` --- ### SQL DROP VIEW Statement in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/statements/drop-view.html DROP VIEW Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables dropping views using the DROP VIEW statement. When a view is dropped, its definition is removed from the catalog. The corresponding Kafka topic Flink resource reservation is removed. Any new statement referencing the dropped view will fail. Syntax¶ DROP VIEW [IF EXISTS] [catalog_name.][db_name.]view_name Description¶ DROP VIEW removes a view from the catalog. If the view does not exist, an exception is thrown unless IF EXISTS is specified. The view name can be in these formats: catalog_name.db_name.view_name: The view with the given name is dropped from the catalog named “catalog_name” and the database named “db_name”. db_name.view_name: The view with the given name is dropped from the current catalog of the execution table environment and the database named “db_name”. view_name: The view with the given name is dropped from the current catalog and the current database of the execution table environment. Examples¶ The following example drops the vip_customers view. In the Confluent CLI or in a Cloud Console workspace, run the following command: DROP VIEW vip_customers; Your output should resemble: Statement phase is COMPLETED. If you try to query the dropped view: SELECT * FROM vip_customers; You will get an error message indicating that the view does not exist: [Code: 1, SQL State: 42000]: Object 'default_catalog.default_database.vip_customers' does not exist. To avoid the error when dropping a view that may not exist, use the IF EXISTS clause: DROP VIEW IF EXISTS vip_customers; This statement will not throw an error if the vip_customers view does not exist. 
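The three name formats listed in the description can be sketched as follows. The catalog and database names (`my_catalog`, `my_database`) are hypothetical placeholders, not names from the original documentation; substitute your own. IF EXISTS is used throughout so that each statement can run without error regardless of whether the view is present.

```sql
-- Fully qualified: drops the view from the named catalog and database.
DROP VIEW IF EXISTS `my_catalog`.`my_database`.`vip_customers`;

-- Database-qualified: the catalog is taken from the current session.
DROP VIEW IF EXISTS `my_database`.`vip_customers`;

-- Unqualified: both the catalog and the database come from the current session.
DROP VIEW IF EXISTS vip_customers;
```

Fully qualified names are the safer choice in scripts, because they don't depend on which catalog and database happen to be active when the statement runs.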
Related content¶

- CREATE VIEW
- ALTER VIEW

Note: This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.

#### Code Examples

```sql
DROP VIEW [IF EXISTS] [catalog_name.][db_name.]view_name
```

```sql
DROP VIEW vip_customers;
```

```sql
Statement phase is COMPLETED.
```

```sql
SELECT * FROM vip_customers;
```

```sql
[Code: 1, SQL State: 42000]: Object 'default_catalog.default_database.vip_customers' does not exist.
```

```sql
DROP VIEW IF EXISTS vip_customers;
```

---

### SQL EXPLAIN Statement in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/reference/statements/explain.html

EXPLAIN Statement in Confluent Cloud for Apache Flink¶

Confluent Cloud for Apache Flink® enables viewing and analyzing the query plans of Flink SQL statements.

Syntax¶

EXPLAIN { <query_statement> | <insert_statement> | <statement_set> | CREATE TABLE ... AS SELECT ... }

<statement_set>:
STATEMENT SET
BEGIN
  -- one or more INSERT INTO statements
  { INSERT INTO <select_statement>; }+
END;

Description¶

The EXPLAIN statement provides detailed information about how Flink executes a specified query or INSERT statement. EXPLAIN shows:

- The optimized physical execution plan
- If the changelog mode is not append-only, details about the changelog mode per operator
- Upsert keys and primary keys where applicable
- Table source and sink details

This information is valuable for understanding query performance, optimizing complex queries, and debugging unexpected results.

Use the EXPLAIN statement in conjunction with the Flink SQL Query Profiler to understand the physical plan of your query.
Example queries¶ Basic query analysis¶ This example analyzes a query finding users who clicked but never placed an order: EXPLAIN SELECT c.* FROM `examples`.`marketplace`.`clicks` c LEFT JOIN ( SELECT DISTINCT customer_id FROM `examples`.`marketplace`.`orders` ) o ON c.user_id = o.customer_id WHERE o.customer_id IS NULL; The output shows the physical plan and operator details: == Physical Plan == StreamSink [11] +- StreamCalc [10] +- StreamJoin [9] +- StreamExchange [3] : +- StreamCalc [2] : +- StreamTableSourceScan [1] +- StreamExchange [8] +- StreamGroupAggregate [7] +- StreamExchange [6] +- StreamCalc [5] +- StreamTableSourceScan [4] == Physical Details == [1] StreamTableSourceScan Table: `examples`.`marketplace`.`clicks` Changelog mode: append State size: low [4] StreamTableSourceScan Table: `examples`.`marketplace`.`orders` Changelog mode: append State size: low [7] StreamGroupAggregate Changelog mode: retract Upsert key: (customer_id) State size: medium [8] StreamExchange Changelog mode: retract Upsert key: (customer_id) [9] StreamJoin Changelog mode: retract State size: medium [10] StreamCalc Changelog mode: retract [11] StreamSink Table: Foreground Changelog mode: retract State size: low Note that the [11] StreamSink Table: Foreground in the output indicates this is a preview execution plan. For more accurate optimization analysis, it’s recommended to test queries using either the final target table or CREATE TABLE AS statements, which will determine the optimal primary key and changelog mode for your specific use case. 
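The recommendation to test against the final target table can be sketched as follows. This example assumes a hypothetical table named `clicks_without_orders_sink` that already exists with a schema matching the `clicks` table; that name is not part of the original documentation. With a real target table, the sink entry in the plan should name that table rather than Foreground.

```sql
-- Hypothetical target table: replace clicks_without_orders_sink with your own sink.
EXPLAIN
INSERT INTO clicks_without_orders_sink
SELECT c.*
FROM `examples`.`marketplace`.`clicks` c
LEFT JOIN (
  SELECT DISTINCT customer_id
  FROM `examples`.`marketplace`.`orders`
) o ON c.user_id = o.customer_id
WHERE o.customer_id IS NULL;
```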
Creating tables¶ This example shows creating a new table from a query: EXPLAIN CREATE TABLE clicks_without_orders AS SELECT c.* FROM `examples`.`marketplace`.`clicks` c LEFT JOIN ( SELECT DISTINCT customer_id FROM `examples`.`marketplace`.`orders` ) o ON c.user_id = o.customer_id WHERE o.customer_id IS NULL; The output includes sink information for the new table: == Physical Plan == StreamSink [11] +- StreamCalc [10] +- StreamJoin [9] +- StreamExchange [3] : +- StreamCalc [2] : +- StreamTableSourceScan [1] +- StreamExchange [8] +- StreamGroupAggregate [7] +- StreamExchange [6] +- StreamCalc [5] +- StreamTableSourceScan [4] == Physical Details == [1] StreamTableSourceScan Table: `examples`.`marketplace`.`clicks` Changelog mode: append State size: low [4] StreamTableSourceScan Table: `examples`.`marketplace`.`orders` Changelog mode: append State size: low [7] StreamGroupAggregate Changelog mode: retract Upsert key: (customer_id) State size: medium [8] StreamExchange Changelog mode: retract Upsert key: (customer_id) [9] StreamJoin Changelog mode: retract State size: medium [10] StreamCalc Changelog mode: retract [11] StreamSink Table: `catalog`.`database`.`clicks_without_orders` Changelog mode: retract State size: low Inserting values¶ This example shows inserting static values: EXPLAIN INSERT INTO orders VALUES (1, 1001, '2023-02-24', 50.0), (2, 1002, '2023-02-25', 60.0), (3, 1003, '2023-02-26', 70.0); The output shows a simple insertion plan: == Physical Plan == StreamSink [3] +- StreamCalc [2] +- StreamValues [1] == Physical Details == [1] StreamValues Changelog mode: append State size: low [3] StreamSink Table: `catalogs`.`database`.`orders` Changelog mode: append State size: low Multiple operations¶ This example demonstrates operation reuse across multiple inserts: EXPLAIN STATEMENT SET BEGIN INSERT INTO low_orders SELECT * from `orders` where price < 100; INSERT INTO high_orders SELECT * from `orders` where price > 100; END; The output shows table scan reuse: == 
Physical Plan == StreamSink [3] +- StreamCalc [2] +- StreamTableSourceScan [1] StreamSink [5] +- StreamCalc [4] +- (reused) [1] == Physical Details == [1] StreamTableSourceScan Table: `examples`.`marketplace`.`orders` Changelog mode: append State size: low [3] StreamSink Table: `catalog`.`database`.`low_orders` Changelog mode: append State size: low [5] StreamSink Table: `catalog`.`database`.`high_orders` Changelog mode: append State size: low Window functions¶ This example shows window functions and self-joins: EXPLAIN WITH windowed_customers AS ( SELECT * FROM TABLE( TUMBLE(TABLE `examples`.`marketplace`.`customers`, DESCRIPTOR($rowtime), INTERVAL '1' MINUTE) ) ) SELECT c1.window_start, c1.city, COUNT(DISTINCT c1.customer_id) as unique_customers, COUNT(c2.customer_id) as total_connections FROM windowed_customers c1 JOIN windowed_customers c2 ON c1.city = c2.city AND c1.customer_id < c2.customer_id AND c1.window_start = c2.window_start GROUP BY c1.window_start, c1.city HAVING COUNT(DISTINCT c1.customer_id) > 5; The output shows the complex processing required for windowed aggregations: == Physical Plan == StreamSink [14] +- StreamCalc [13] +- StreamGroupAggregate [12] +- StreamExchange [11] +- StreamCalc [10] +- StreamJoin [9] +- StreamExchange [8] : +- StreamCalc [7] : +- StreamWindowTableFunction [6] : +- StreamCalc [5] : +- StreamChangelogNormalize [4] : +- StreamExchange [3] : +- StreamCalc [2] : +- StreamTableSourceScan [1] +- (reused) [8] == Physical Details == [1] StreamTableSourceScan Table: `examples`.`marketplace`.`customers` Primary key: (customer_id) Changelog mode: upsert Upsert key: (customer_id) State size: low [2] StreamCalc Changelog mode: upsert Upsert key: (customer_id) [3] StreamExchange Changelog mode: upsert Upsert key: (customer_id) [4] StreamChangelogNormalize Changelog mode: retract Upsert key: (customer_id) State size: medium [5] StreamCalc Changelog mode: retract Upsert key: (customer_id) [6] StreamWindowTableFunction Changelog mode: 
retract State size: low [7] StreamCalc Changelog mode: retract [8] StreamExchange Changelog mode: retract [9] StreamJoin Changelog mode: retract State size: medium [10] StreamCalc Changelog mode: retract [11] StreamExchange Changelog mode: retract [12] StreamGroupAggregate Changelog mode: retract Upsert key: (window_start,city) State size: medium [13] StreamCalc Changelog mode: retract Upsert key: (window_start,city) [14] StreamSink Table: Foreground Changelog mode: retract Upsert key: (window_start,city) State size: low Understanding the output¶ Reading physical plans¶ The physical plan shows how Flink executes your query. Each operation is numbered and indented to show its position in the execution flow. Indentation indicates data flow, with each operator passing results to its parent. Changelog modes¶ Changelog modes describe how operators handle data modifications: Append: The operator processes only insert operations. New rows are simply added. Upsert: The operator handles both inserts and updates. It uses an “upsert key” to identify rows. If a row with a given key exists already, the operator updates it; otherwise, it inserts a new row. Retract: The operator handles inserts, updates, and deletes. Updates are typically represented as a retraction (deletion) of the old row followed by an insertion of the new row. Deletes are represented as retractions. Operators change changelog modes when different update patterns are needed, such as when moving from streaming reads to aggregations. Data movement¶ The physical details section shows how data moves between operators. 
Watch for:

- Exchange operators indicating data redistribution
- Changes in upsert keys showing where data must be reshuffled
- Operator reuse marked by “(reused)” references

State size¶

Each operator in the physical plan includes a “State Size” property indicating its memory requirements during execution:

- LOW: Minimal state maintenance, typically efficient memory usage
- MEDIUM: Moderate state requirements, may need attention with high cardinality
- HIGH: Significant state maintenance that requires careful management

When operators show HIGH state size, you should configure a state TTL to prevent unbounded state growth. Without a TTL configuration, these operators can accumulate unlimited state over time, potentially leading to resource exhaustion and the statement ending up in a DEGRADED state.

SET 'sql.state-ttl' = '12 hours';

For MEDIUM state size, consider TTL settings if your data has high cardinality or frequent updates per key.

Physical operators¶

Below is a reference of common operators you may see in EXPLAIN output, along with examples of SQL that typically produce them.

Basic operations¶

StreamTableSourceScan: Reads data from a source table. The foundation of any query reading from a table.

SELECT * FROM orders;

StreamCalc: Performs row-level computations and filtering. Appears when using WHERE clauses or expressions in SELECT.

SELECT amount * 1.1 as amount_with_tax FROM orders WHERE status = 'completed';

StreamValues: Generates literal row values. Commonly seen with INSERT statements.

INSERT INTO orders VALUES (1, 'pending', 100);

StreamSink: Writes results to a destination. Present in any INSERT or when displaying query results. Supports two modes of operation:

- Append-only: Each record is treated as a new event, which displays as State size: Low.
- Upsert-materialize: Maintains state to handle updates and deletes based on key fields, which displays as State size: High.
INSERT INTO order_summaries SELECT status, COUNT(*) FROM orders GROUP BY status;

Aggregation operations¶

StreamGroupAggregate: Performs grouping and aggregation. Created by GROUP BY clauses.

SELECT customer_id, SUM(price) FROM orders GROUP BY customer_id;

StreamLocalWindowAggregate and StreamGlobalWindowAggregate: These operators implement Flink's two-phase aggregation strategy for distributed stream processing. They work together to compute aggregations efficiently across multiple parallel instances while maintaining exactly-once processing semantics. The LocalGroupAggregate performs initial aggregation within each parallel task, maintaining partial results in its state. The GlobalGroupAggregate then combines these partial results to produce final aggregations.

This two-phase approach appears in both regular GROUP BY operations and windowed aggregations. For window operations, these operators appear as StreamLocalWindowAggregate and StreamGlobalWindowAggregate. Here’s an example that triggers their use:

SELECT window_start, window_end, SUM(price) as total_price FROM TABLE( TUMBLE(TABLE orders, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES)) GROUP BY window_start, window_end;

Join operations¶

StreamJoin: Performs standard stream-to-stream joins.

SELECT o.*, c.name FROM orders o JOIN customers c ON o.customer_id = c.id;

StreamTemporalJoin: Joins streams using temporal (time-versioned) semantics.

SELECT orders.*, customers.* FROM orders LEFT JOIN customers FOR SYSTEM_TIME AS OF orders.`$rowtime` ON orders.customer_id = customers.customer_id;

StreamIntervalJoin: Joins streams within a time interval.

SELECT * FROM orders o, clicks c WHERE o.customer_id = c.user_id AND o.`$rowtime` BETWEEN c.`$rowtime` - INTERVAL '1' MINUTE AND c.`$rowtime`;

StreamWindowJoin: Joins streams within defined windows.
SELECT * FROM ( SELECT * FROM TABLE(TUMBLE(TABLE clicks, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) ) c JOIN ( SELECT * FROM TABLE(TUMBLE(TABLE orders, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) ) o ON c.user_id = o.customer_id AND c.window_start = o.window_start AND c.window_end = o.window_end;

Ordering and ranking¶

StreamRank: Computes the smallest or largest values (Top-N queries).

SELECT product_id, price FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY price DESC) AS row_num FROM orders) WHERE row_num <= 5;

StreamLimit: Limits the number of returned rows.

SELECT * FROM orders LIMIT 10;

StreamSortLimit: Combines sorting with row limiting.

SELECT * FROM orders ORDER BY $rowtime LIMIT 10;

StreamWindowRank: Computes the smallest or largest values within window boundaries (Window Top-N queries).

SELECT * FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY window_start, window_end ORDER BY price DESC) as rownum FROM ( SELECT window_start, window_end, customer_id, SUM(price) as price, COUNT(*) as cnt FROM TABLE( TUMBLE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES)) GROUP BY window_start, window_end, customer_id ) ) WHERE rownum <= 3;

Data movement and distribution¶

StreamExchange: Redistributes/exchanges data between parallel instances. For example, when you write a query with a GROUP BY clause, Flink might use a HASH exchange to ensure all records with the same key are processed by the same task:

-- Appears in plans with GROUP BY on a different key than the source distribution
SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id;

StreamUnion: Combines results from multiple queries.

SELECT * FROM european_orders UNION ALL SELECT * FROM american_orders;

StreamExpand: Generates multiple rows from a single row for CUBE, ROLLUP, and GROUPING SETS.
SELECT department, brand, COUNT(*) as product_count, COUNT(DISTINCT vendor) as vendor_count FROM products GROUP BY CUBE(department, brand) HAVING COUNT(*) > 1;

Specialized operations¶

StreamChangelogNormalize: Converts upsert-based changelog streams (based on primary key) into retract-based streams (with explicit +/- records) to support correct aggregation results in streaming queries.

-- Appears when processing versioned data, like a table that uses upsert semantics
SELECT COUNT(*) as cnt FROM products;

StreamAsyncCalc: Executes user-defined functions (UDFs). This operator allows UDFs to run without blocking.

SELECT my_udf(name) FROM customers;

StreamWindowTableFunction: Applies windowing operations as table functions.

SELECT * FROM TABLE( TUMBLE(TABLE orders, DESCRIPTOR($rowtime), INTERVAL '1' HOUR) );

StreamCorrelate: Handles correlated subqueries (UNNEST) and table function calls.

EXPLAIN SELECT product_id, product_name, tag FROM ( VALUES (1, 'Laptop', ARRAY['electronics', 'computers']), (2, 'Phone', ARRAY['electronics', 'mobile']) ) AS products (product_id, product_name, tags) CROSS JOIN UNNEST(tags) AS t (tag);

StreamMatch: Executes pattern-matching operations using MATCH_RECOGNIZE.

SELECT * FROM orders MATCH_RECOGNIZE ( PARTITION BY customer_id ORDER BY $rowtime MEASURES COUNT(*) as order_count PATTERN (A B+) DEFINE A as price > 100, B as price <= 100 );

Optimizing query performance¶

Minimizing data movement¶

Data shuffling impacts performance. When examining EXPLAIN output:

- Look for exchange operators and upsert key changes.
- Consider keeping compatible partitioning keys throughout your query.
- Watch for opportunities to reduce data redistribution.

Pay special attention to data skew when designing your queries. If a particular key value appears much more frequently than others, it can lead to uneven processing where a single parallel instance becomes overwhelmed handling that key’s data.
Consider strategies like adding additional dimensions to your keys or pre-aggregating hot keys to distribute the workload more evenly. Using operator reuse¶ Flink automatically reuses operators when possible. In EXPLAIN output: Look for “(reused)” references showing optimization. Consider restructuring queries to enable more reuse. Verify that similar operations share scan results. Optimizing sink configuration¶ When working with sinks in upsert mode, it’s crucial to align your primary and upsert keys for optimal performance: Whenever possible, configure the primary key to be identical to the upsert key. Having different primary and upsert keys in upsert mode can lead to significant performance degradation. If you must use different keys, carefully evaluate the performance impact and consider restructuring your query to align these keys. Related content¶ Flink SQL Query Profiler Profile a Query SELECT INSERT VALUES INSERT INTO FROM SELECT CREATE TABLE AS Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql EXPLAIN { | | | CREATE TABLE ... AS SELECT ... 
} : STATEMENT SET BEGIN -- one or more INSERT INTO statements { INSERT INTO ; }+ END; ``` ```sql EXPLAIN SELECT c.* FROM `examples`.`marketplace`.`clicks` c LEFT JOIN ( SELECT DISTINCT customer_id FROM `examples`.`marketplace`.`orders` ) o ON c.user_id = o.customer_id WHERE o.customer_id IS NULL; ``` ```sql == Physical Plan == StreamSink [11] +- StreamCalc [10] +- StreamJoin [9] +- StreamExchange [3] : +- StreamCalc [2] : +- StreamTableSourceScan [1] +- StreamExchange [8] +- StreamGroupAggregate [7] +- StreamExchange [6] +- StreamCalc [5] +- StreamTableSourceScan [4] == Physical Details == [1] StreamTableSourceScan Table: `examples`.`marketplace`.`clicks` Changelog mode: append State size: low [4] StreamTableSourceScan Table: `examples`.`marketplace`.`orders` Changelog mode: append State size: low [7] StreamGroupAggregate Changelog mode: retract Upsert key: (customer_id) State size: medium [8] StreamExchange Changelog mode: retract Upsert key: (customer_id) [9] StreamJoin Changelog mode: retract State size: medium [10] StreamCalc Changelog mode: retract [11] StreamSink Table: Foreground Changelog mode: retract State size: low ``` ```sql [11] StreamSink Table: Foreground ``` ```sql EXPLAIN CREATE TABLE clicks_without_orders AS SELECT c.* FROM `examples`.`marketplace`.`clicks` c LEFT JOIN ( SELECT DISTINCT customer_id FROM `examples`.`marketplace`.`orders` ) o ON c.user_id = o.customer_id WHERE o.customer_id IS NULL; ``` ```sql == Physical Plan == StreamSink [11] +- StreamCalc [10] +- StreamJoin [9] +- StreamExchange [3] : +- StreamCalc [2] : +- StreamTableSourceScan [1] +- StreamExchange [8] +- StreamGroupAggregate [7] +- StreamExchange [6] +- StreamCalc [5] +- StreamTableSourceScan [4] == Physical Details == [1] StreamTableSourceScan Table: `examples`.`marketplace`.`clicks` Changelog mode: append State size: low [4] StreamTableSourceScan Table: `examples`.`marketplace`.`orders` Changelog mode: append State size: low [7] StreamGroupAggregate Changelog mode: retract 
Upsert key: (customer_id) State size: medium [8] StreamExchange Changelog mode: retract Upsert key: (customer_id) [9] StreamJoin Changelog mode: retract State size: medium [10] StreamCalc Changelog mode: retract [11] StreamSink Table: `catalog`.`database`.`clicks_without_orders` Changelog mode: retract State size: low ``` ```sql EXPLAIN INSERT INTO orders VALUES (1, 1001, '2023-02-24', 50.0), (2, 1002, '2023-02-25', 60.0), (3, 1003, '2023-02-26', 70.0); ``` ```sql == Physical Plan == StreamSink [3] +- StreamCalc [2] +- StreamValues [1] == Physical Details == [1] StreamValues Changelog mode: append State size: low [3] StreamSink Table: `catalogs`.`database`.`orders` Changelog mode: append State size: low ``` ```sql EXPLAIN STATEMENT SET BEGIN INSERT INTO low_orders SELECT * from `orders` where price < 100; INSERT INTO high_orders SELECT * from `orders` where price > 100; END; ``` ```sql == Physical Plan == StreamSink [3] +- StreamCalc [2] +- StreamTableSourceScan [1] StreamSink [5] +- StreamCalc [4] +- (reused) [1] == Physical Details == [1] StreamTableSourceScan Table: `examples`.`marketplace`.`orders` Changelog mode: append State size: low [3] StreamSink Table: `catalog`.`database`.`low_orders` Changelog mode: append State size: low [5] StreamSink Table: `catalog`.`database`.`high_orders` Changelog mode: append State size: low ``` ```sql EXPLAIN WITH windowed_customers AS ( SELECT * FROM TABLE( TUMBLE(TABLE `examples`.`marketplace`.`customers`, DESCRIPTOR($rowtime), INTERVAL '1' MINUTE) ) ) SELECT c1.window_start, c1.city, COUNT(DISTINCT c1.customer_id) as unique_customers, COUNT(c2.customer_id) as total_connections FROM windowed_customers c1 JOIN windowed_customers c2 ON c1.city = c2.city AND c1.customer_id < c2.customer_id AND c1.window_start = c2.window_start GROUP BY c1.window_start, c1.city HAVING COUNT(DISTINCT c1.customer_id) > 5; ``` ```sql == Physical Plan == StreamSink [14] +- StreamCalc [13] +- StreamGroupAggregate [12] +- StreamExchange [11] +- 
StreamCalc [10] +- StreamJoin [9] +- StreamExchange [8] : +- StreamCalc [7] : +- StreamWindowTableFunction [6] : +- StreamCalc [5] : +- StreamChangelogNormalize [4] : +- StreamExchange [3] : +- StreamCalc [2] : +- StreamTableSourceScan [1] +- (reused) [8] == Physical Details == [1] StreamTableSourceScan Table: `examples`.`marketplace`.`customers` Primary key: (customer_id) Changelog mode: upsert Upsert key: (customer_id) State size: low [2] StreamCalc Changelog mode: upsert Upsert key: (customer_id) [3] StreamExchange Changelog mode: upsert Upsert key: (customer_id) [4] StreamChangelogNormalize Changelog mode: retract Upsert key: (customer_id) State size: medium [5] StreamCalc Changelog mode: retract Upsert key: (customer_id) [6] StreamWindowTableFunction Changelog mode: retract State size: low [7] StreamCalc Changelog mode: retract [8] StreamExchange Changelog mode: retract [9] StreamJoin Changelog mode: retract State size: medium [10] StreamCalc Changelog mode: retract [11] StreamExchange Changelog mode: retract [12] StreamGroupAggregate Changelog mode: retract Upsert key: (window_start,city) State size: medium [13] StreamCalc Changelog mode: retract Upsert key: (window_start,city) [14] StreamSink Table: Foreground Changelog mode: retract Upsert key: (window_start,city) State size: low ``` ```sql SET 'sql.state-ttl' = '12 hours'; ``` ```sql SELECT * FROM orders; ``` ```sql SELECT amount * 1.1 as amount_with_tax FROM orders WHERE status = 'completed'; ``` ```sql INSERT INTO orders VALUES (1, 'pending', 100); ``` ```sql INSERT INTO order_summaries SELECT status, COUNT(*) FROM orders GROUP BY status; ``` ```sql SELECT customer_id, SUM(price) FROM orders GROUP BY customer_id; ``` ```sql SELECT window_start, window_end, SUM(price) as total_price FROM TABLE( TUMBLE(TABLE orders, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES)) GROUP BY window_start, window_end; ``` ```sql SELECT o.*, c.name FROM orders o JOIN customers c ON o.customer_id = c.id; ``` ```sql SELECT 
orders.*, customers.* FROM orders LEFT JOIN customers FOR SYSTEM_TIME AS OF orders.`$rowtime` ON orders.customer_id = customers.customer_id; ``` ```sql SELECT * FROM orders o, clicks c WHERE o.customer_id = c.user_id AND o.`$rowtime` BETWEEN c.`$rowtime` - INTERVAL '1' MINUTE AND c.`$rowtime`; ``` ```sql SELECT * FROM ( SELECT * FROM TABLE(TUMBLE(TABLE clicks, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) ) c JOIN ( SELECT * FROM TABLE(TUMBLE(TABLE orders, DESCRIPTOR($rowtime), INTERVAL '5' MINUTES)) ) o ON c.user_id = o.customer_id AND c.window_start = o.window_start AND c.window_end = o.window_end; ``` ```sql SELECT product_id, price FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY price DESC) AS row_num FROM orders) WHERE row_num <= 5; ``` ```sql SELECT * FROM orders LIMIT 10; ``` ```sql SELECT * FROM orders ORDER BY $rowtime LIMIT 10; ``` ```sql SELECT * FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY window_start, window_end ORDER BY price DESC) as rownum FROM ( SELECT window_start, window_end, customer_id, SUM(price) as price, COUNT(*) as cnt FROM TABLE( TUMBLE(TABLE `examples`.`marketplace`.`orders`, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES)) GROUP BY window_start, window_end, customer_id ) ) WHERE rownum <= 3; ``` ```sql -- Appears in plans with GROUP BY on a different key than the source distribution SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id; ``` ```sql SELECT * FROM european_orders UNION ALL SELECT * FROM american_orders; ``` ```sql SELECT department, brand, COUNT(*) as product_count, COUNT(DISTINCT vendor) as vendor_count FROM products GROUP BY CUBE(department, brand) HAVING COUNT(*) > 1; ``` ```sql -- Appears when processing versioned data, like a table that uses upsert semantics SELECT COUNT(*) as cnt FROM products; ``` ```sql SELECT my_udf(name) FROM customers; ``` ```sql SELECT * FROM TABLE( TUMBLE(TABLE orders, DESCRIPTOR($rowtime), INTERVAL '1' HOUR) ); ``` ```sql EXPLAIN SELECT product_id, product_name, 
tag FROM ( VALUES (1, 'Laptop', ARRAY['electronics', 'computers']), (2, 'Phone', ARRAY['electronics', 'mobile']) ) AS products (product_id, product_name, tags) CROSS JOIN UNNEST(tags) AS t (tag); ``` ```sql SELECT * FROM orders MATCH_RECOGNIZE ( PARTITION BY customer_id ORDER BY $rowtime MEASURES COUNT(*) as order_count PATTERN (A B+) DEFINE A as price > 100, B as price <= 100 ); ``` --- ### SQL HINTS in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/statements/hints.html Dynamic Table Options in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® supports dynamic table options, or SQL hints, which enable you to specify or override table options dynamically. Syntax¶ To use dynamic table options, employ the following Oracle-style SQL hint syntax: table_path /*+ OPTIONS(key=val [, key=val]*) */ key: stringLiteral val: stringLiteral The dynamic options must be placed next to the table and not by any aliases, for example: SELECT * FROM t /*+ OPTIONS(...) */ AS alias; Description¶ Dynamic Table Options in Confluent Cloud for Apache Flink offer the following benefits: Flexible configuration: Specify table options on a per-statement basis, providing more flexibility than static options as stored in the table definition. Query-specific adjustments: Customize table behavior for individual queries without altering the permanent table definition. 
Examples¶ Here are some examples of using dynamic table options in Confluent Cloud for Apache Flink:

Override the scan startup mode for a table: SELECT id, name FROM `table` /*+ OPTIONS('scan.startup.mode'='earliest-offset') */;

Set options for multiple tables in a join: SELECT * FROM table1 /*+ OPTIONS('scan.startup.mode'='earliest-offset') */ t1 JOIN table2 /*+ OPTIONS('scan.startup.mode'='earliest-offset') */ t2 ON t1.id = t2.id;

Set the scan startup mode to use the latest offset: SELECT * FROM orders /*+ OPTIONS('scan.startup.mode'='latest-offset') */;

Set the scan startup mode to use specific offsets, for example, by using the latest_offsets attribute from a previous statement: INSERT INTO customers_sink (customer_id, name, address, postcode, city, email) SELECT customer_id, name, address, postcode, city, email FROM customers_source /*+ OPTIONS( 'scan.startup.mode' = 'specific-offsets', 'scan.startup.specific-offsets' = 'partition:0,offset:10;partition:1,offset:123' ) */;

Note: For a statement that reads from multiple topics, use OPTIONS for each table: SELECT * FROM table1 /*+ OPTIONS('scan.startup.mode'='specific-offsets', 'scan.startup.specific-offsets' = '...') */ t1 JOIN table2 /*+ OPTIONS('scan.startup.mode'='specific-offsets', 'scan.startup.specific-offsets' = '...') */ t2 ON t1.id = t2.id;

State TTL Hints¶ For stateful computations such as Regular Joins and Group Aggregations, Confluent Cloud for Apache Flink supports the STATE_TTL hint. This hint allows you to specify operator-level Idle State Retention Time, enabling these operators to have a different TTL from the pipeline-level configuration set by sql.state-ttl.
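As a minimal sketch of how the pipeline-level setting and an operator-level hint might be combined, the following statement sets a default TTL and then overrides it for one join. The `orders` and `customers` tables, their columns, and the chosen TTL values are hypothetical and are used only for illustration.

```sql
-- Pipeline-level default: idle state of stateful operators is kept for at least 12 hours.
SET 'sql.state-ttl' = '12 hours';

-- Operator-level override for this join only (hypothetical tables and columns):
-- state for the `orders` side (alias o) is kept for 1 day,
-- state for the `customers` side (alias c) is kept for 7 days.
SELECT /*+ STATE_TTL('o'='1d', 'c'='7d') */
  o.order_id,
  c.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
```

Because both tables are aliased, the hint keys use the aliases rather than the table names. Stateful operators in the same statement that are not covered by a hint would presumably keep using the `sql.state-ttl` value.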
Syntax¶ The syntax for using State TTL hints is as follows: table_path /*+ STATE_TTL('table_name_or_alias'='ttl_value') */ ttl_value: stringLiteral (e.g., '6h', '2d', '10800s')

Examples¶ Here are some examples of using State TTL hints in Confluent Cloud for Apache Flink for social media analytics:

Set State TTL for a Regular Join of posts and users: SELECT /*+ STATE_TTL('posts'='6h', 'users'='2d') */ * FROM posts JOIN users ON posts.user_id = users.id;

Use table aliases with State TTL hints for analyzing engagement: SELECT /*+ STATE_TTL('p'='4h', 'e'='12h') */ * FROM posts p JOIN engagement e ON p.post_id = e.post_id;

Apply State TTL hints in a Group Aggregation for trending hashtags: SELECT /*+ STATE_TTL('hashtags' = '1h') */ hashtag, COUNT(*) AS usage_count FROM hashtags GROUP BY hashtag;

Important Considerations¶ When using State TTL hints, keep the following in mind:

- You can use either the table name or table alias as the hint key. If you specify an alias for a table, you must use that alias in the STATE_TTL hint.
- For queries with multiple joins, the specified TTLs are applied in a bottom-up order.
- The STATE_TTL hint only affects the query block where it’s applied.
- If a hint key is duplicated within a single STATE_TTL hint, the last occurrence takes precedence.
- When multiple STATE_TTL hints are used with the same hint key, the first occurrence is applied.

Related content¶ CREATE TABLE ALTER TABLE Table Options Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.

#### Code Examples ```sql table_path /*+ OPTIONS(key=val [, key=val]*) */ key: stringLiteral val: stringLiteral ``` ```sql SELECT * FROM t /*+ OPTIONS(...) 
*/ AS alias;
```

```sql
-- TABLE is a reserved keyword, so the table name is quoted with backticks.
SELECT id, name FROM `table` /*+ OPTIONS('scan.startup.mode'='earliest-offset') */;
```

```sql
SELECT *
FROM table1 /*+ OPTIONS('scan.startup.mode'='earliest-offset') */ t1
JOIN table2 /*+ OPTIONS('scan.startup.mode'='earliest-offset') */ t2
  ON t1.id = t2.id;
```

```sql
SELECT * FROM orders /*+ OPTIONS('scan.startup.mode'='latest-offset') */;
```

```sql
INSERT INTO customers_sink (customer_id, name, address, postcode, city, email)
SELECT customer_id, name, address, postcode, city, email
FROM customers_source /*+ OPTIONS(
  'scan.startup.mode' = 'specific-offsets',
  'scan.startup.specific-offsets' = 'partition:0,offset:10;partition:1,offset:123'
) */;

-- Note: for a statement with multiple topics, use OPTIONS for each table
SELECT *
FROM table1 /*+ OPTIONS('scan.startup.mode'='specific-offsets', 'scan.startup.specific-offsets' = '...') */ t1
JOIN table2 /*+ OPTIONS('scan.startup.mode'='specific-offsets', 'scan.startup.specific-offsets' = '...') */ t2
  ON t1.id = t2.id;
```

```sql
table_path /*+ STATE_TTL('table_name_or_alias'='ttl_value') */

ttl_value: stringLiteral (e.g., '6h', '2d', '10800s')
```

```sql
SELECT /*+ STATE_TTL('posts'='6h', 'users'='2d') */ *
FROM posts
JOIN users ON posts.user_id = users.id;
```

```sql
SELECT /*+ STATE_TTL('p'='4h', 'e'='12h') */ *
FROM posts p
JOIN engagement e ON p.post_id = e.post_id;
```

```sql
SELECT /*+ STATE_TTL('hashtags' = '1h') */ hashtag, COUNT(*) AS usage_count
FROM hashtags
GROUP BY hashtag;
```

---

### SQL Statements in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/reference/statements/overview.html

DDL Statements in Confluent Cloud for Apache Flink¶ In Confluent Cloud for Apache Flink®, a statement is a high-level resource that’s created when you enter a SQL query. Data Definition Language (DDL) statements are imperative verbs that define metadata in Flink SQL by adding, changing, or deleting tables.
Unlike Data Manipulation Language (DML) statements, DDL statements modify only metadata and don’t change data. When you want to change data, use DML statements. For valid lexical structure of statements, see Flink SQL Syntax in Confluent Cloud for Apache Flink.

Available DDL statements¶ These are the available DDL statements in Confluent Cloud for Flink SQL.

ALTER

- ALTER TABLE: Change properties of an existing table.
- ALTER MODEL: Rename an AI model or change model options.

CREATE

- CREATE TABLE: Register a table into the current or specified catalog (Confluent Cloud environment).
- CREATE FUNCTION: Register a user-defined function (UDF) in the current database (Apache Kafka® cluster).
- CREATE MODEL: Create a new AI model.

DESCRIBE

- DESCRIBE: Show properties of a table, AI model, or UDF.

DROP

- DROP MODEL: Remove an AI model.
- DROP TABLE: Remove a table.
- DROP VIEW: Remove a view from a catalog.

EXPLAIN

- EXPLAIN: View the query plan of a Flink SQL statement.

RESET

- RESET: Reset the Flink SQL shell configuration to default settings.

SET

- SET: Modify or list the Flink SQL shell configuration.

SHOW

- SHOW CATALOGS: List all catalogs.
- SHOW CREATE MODEL: Show details about an AI inference model.
- SHOW CREATE TABLE: Show details about a table.
- SHOW CURRENT CATALOG: Show the current catalog.
- SHOW CURRENT DATABASE: Show the current database.
- SHOW DATABASES: List all databases in the current catalog.
- SHOW FUNCTIONS: List all functions in the current catalog and database.
- SHOW JOBS: List the status of all statements in the current catalog.
- SHOW MODELS: List all AI models that are registered in the current catalog.
- SHOW TABLES: List all tables for the current database.

USE

- USE CATALOG: Set the current catalog.
- USE [database_name]: Set the current database.

Related content¶ Flink SQL Syntax Flink SQL Queries Stream Processing Concepts Built-in Functions Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.
--- ### SQL RESET Statement in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/statements/reset.html RESET Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables resetting Flink SQL shell properties to default values. Syntax¶ RESET 'key'; Description¶ Reset the Flink SQL shell configuration to the default settings. If no key is specified, all properties are set to their default values. To assign a session property, use the SET Statement in Confluent Cloud for Apache Flink. Example¶ The following examples show how to run a RESET statement in the Flink SQL shell. RESET 'table.local-time-zone'; Your output should resemble: configuration key "table.local-time-zone" has been reset successfully. +------------------------+---------------------+ | Key | Value | +------------------------+---------------------+ | client.service-account | (default) | | sql.local-time-zone | GMT+02:00 (default) | +------------------------+---------------------+ RESET; configuration has been reset successfully. +------------------------+---------------------+ | Key | Value | +------------------------+---------------------+ | client.service-account | (default) | | sql.local-time-zone | GMT+02:00 (default) | +------------------------+---------------------+ Related content¶ SET Statement in Confluent Cloud for Apache Flink Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql RESET 'key'; ``` ```sql RESET 'table.local-time-zone'; ``` ```sql configuration key "table.local-time-zone" has been reset successfully. 
+------------------------+---------------------+ | Key | Value | +------------------------+---------------------+ | client.service-account | (default) | | sql.local-time-zone | GMT+02:00 (default) | +------------------------+---------------------+ ``` ```sql configuration has been reset successfully. +------------------------+---------------------+ | Key | Value | +------------------------+---------------------+ | client.service-account | (default) | | sql.local-time-zone | GMT+02:00 (default) | +------------------------+---------------------+ ``` --- ### SQL SET Statement in Confluent Cloud for Apache Flink | Confluent Documentation Source: https://docs.confluent.io/cloud/current/flink/reference/statements/set.html SET Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables setting Flink SQL shell properties to different values. Syntax¶ SET 'key' = 'value'; Description¶ Modify or list the Flink SQL shell configuration. If no key and value are specified, SET prints all of the properties that you have assigned for the session. To reset a session property to its default value, use the RESET Statement in Confluent Cloud for Apache Flink. Note In a Cloud Console workspace, the SET statement can’t be run separately and must be submitted along with another Flink SQL statement, like SELECT, CREATE, or INSERT, for example: SET 'sql.current-catalog' = 'default'; SET 'sql.current-database' = 'cluster_0'; SELECT * FROM pageviews; Example¶ The following examples show how to run a SET statement in the Flink SQL shell. SET 'table.local-time-zone' = 'America/Los_Angeles'; Your output should resemble: Statement successfully submitted. Statement phase is COMPLETED. configuration updated successfully. To list the current session settings, run the SET command with no parameters. SET; Your output should resemble: Statement successfully submitted. Statement phase is COMPLETED. 
+-----------------------+--------------------------+ | Key | Value | +-----------------------+--------------------------+ | catalog | default (default) | | default_database | (default) | | table.local-time-zone | America/Los_Angeles | +-----------------------+--------------------------+ The SET; operation is not supported in Cloud Console workspaces.

Available SET Options¶ These are the configuration options that you can set by using the SET statement in Confluent Cloud for Apache Flink. For a comparison of option names with corresponding options in Apache Flink, see Configuration options.

Table options¶

| Key | Default | Type | Description |
| --- | --- | --- | --- |
| sql.current-catalog | (None) | String | Defines the current catalog. Semantically equivalent to USE CATALOG [catalog_name]. Required if object identifiers are not fully qualified. |
| sql.current-database | (None) | String | Defines the current database. Semantically equivalent to USE [database_id]. Required if object identifiers are not fully qualified. |
| sql.dry-run | false | Boolean | If true, the statement is parsed and validated but not executed. |
| sql.inline-result | false | Boolean | If true, query results are returned inline. |
| sql.local-time-zone | “UTC” | String | Specifies the local time zone offset for TIMESTAMP_LTZ conversions. When converting to data types that don’t include a time zone (for example, TIMESTAMP, TIME, or simply STRING), this time zone is used. The input for this option is either a Time Zone Database (TZDB) ID, like “America/Los_Angeles”, or a fixed offset, like “GMT+03:00”. |
| sql.snapshot.mode | “off” | String | Specifies the mode for snapshot queries. Valid values are “now” and “off”. If not specified, the default value is “off”. For more information, see Snapshot Queries in Confluent Cloud for Apache Flink. |
| sql.state-ttl | 0 ms | Duration | Specifies a minimum time interval for how long idle state, which is state that hasn’t been updated, is retained. The system decides on actual clearance after this interval. If set to the default value of 0, no clearance is performed. |
| sql.tables.initial-offset-from | (None) | String | Specifies the name of a reference statement from which to carry over topic offsets when creating a new statement. Applies only when replacing an existing statement in the same organization, environment, and region. For details, see Carry Over Offsets. |
| sql.tables.scan.bounded.timestamp-millis | (None) | Long | Overwrites scan.bounded.timestamp-millis for Confluent-native tables used in newly created queries. This option is not applied if the table uses a value that differs from the default value. |
| sql.tables.scan.bounded.mode | (None) | GlobalScanBoundedMode | Overwrites scan.bounded.mode for Confluent-native tables used in newly created queries. This option is not applied if the table uses a value that differs from the default value. |
| sql.tables.scan.idle-timeout | (None) | Duration | Specifies the timeout interval for progressive idleness detection. Setting this value to 0 disables idleness detection. For more information, see Progressive idleness detection. |
| sql.tables.scan.watermark-alignment.max-allowed-drift | 5 min | Duration | Specifies the maximum allowed drift for watermark alignment across different splits or partitions to ensure even processing. Setting to 0 disables watermark alignment, which can prevent performance bottlenecks and latency for queries that don’t require event-time semantics, like regular joins, non-windowed aggregations, and ETL. Intended for advanced use cases, because incorrect use can cause issues, for example, state growth, in queries that depend on event time. For more information, see Watermark alignment. |
| sql.tables.scan.startup.timestamp-millis | (None) | Long | Overwrites scan.startup.timestamp-millis for Confluent-native tables used in newly created queries. This option is not applied if the table uses a value that differs from the default value. |
| sql.tables.scan.startup.mode | (None) | GlobalScanStartupMode | Overwrites scan.startup.mode for Confluent-native tables used in newly created queries. This option is not applied if the table uses a value that differs from the default value. |

Flink SQL shell options¶ The following SET options are available only in the Flink SQL shell. In a Cloud Console workspace, the only client option you can set is client.statement-name.

| Key | Default | Type | Description |
| --- | --- | --- | --- |
| client.output-format | standard | String | Output format. Valid values are “standard” or “plain-text”. |
| client.results-timeout | 600000 | Long | Total amount of time, in milliseconds, to wait before timing out the request waiting for results to be ready. |
| client.service-account | (None) | String | Service account to use instead of running statements attached to your user account. For more information, see Production workloads (service accounts). |
| client.statement-name | (None) | String | Give your Flink statement a meaningful name that can help you identify it more easily. Instead of an autogenerated name, like 123e4567-e89b-12d3, this sets the statement name to the given value. To avoid naming conflicts, the name resets itself after successful submission. The underscore character (_) and period character (.) are not supported. |

Related content¶ RESET Statement in Confluent Cloud for Apache Flink Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.

#### Code Examples ```sql SET 'key' = 'value'; ``` ```sql SET 'sql.current-catalog' = 'default'; SET 'sql.current-database' = 'cluster_0'; SELECT * FROM pageviews; ``` ```sql SET 'table.local-time-zone' = 'America/Los_Angeles'; ``` ```sql Statement successfully submitted. Statement phase is COMPLETED. configuration updated successfully. ``` ```sql Statement successfully submitted. Statement phase is COMPLETED.
+-----------------------+--------------------------+ | Key | Value | +-----------------------+--------------------------+ | catalog | default (default) | | default_database | (default) | | table.local-time-zone | America/Los_Angeles | +-----------------------+--------------------------+ ```

---

### SQL SHOW Statements in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/reference/statements/show.html

SHOW Statements in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables listing catalogs, which map to Confluent Cloud environments, databases, which map to Apache Kafka® clusters, and other available Flink resources, like AI models, UDFs, connections, and tables. Confluent Cloud for Apache Flink supports these SHOW statements.

- SHOW CATALOGS
- SHOW CONNECTIONS
- SHOW CREATE MODEL
- SHOW CURRENT CATALOG
- SHOW CREATE TABLE
- SHOW CURRENT DATABASE
- SHOW DATABASES
- SHOW JOBS
- SHOW FUNCTIONS
- SHOW MODELS
- SHOW TABLES

SHOW CATALOGS¶

Syntax: SHOW CATALOGS;

Description: Show all catalogs. Confluent Cloud for Apache Flink maps Flink catalogs to environments.

Example: SHOW CATALOGS; Your output should resemble: +-------------------------+------------+ | catalog name | catalog id | +-------------------------+------------+ | my_environment | env-12abcz | | example-streams-env | env-23xjoo | | quickstart-env | env-9wg8ny | | default | env-t12345 | +-------------------------+------------+

Run the USE CATALOG statement to set the current Flink catalog (Confluent Cloud environment).
USE CATALOG my_environment; Your output should resemble: +---------------------+----------------+ | Key | Value | +---------------------+----------------+ | sql.current-catalog | my_environment | +---------------------+----------------+

SHOW CONNECTIONS¶

Syntax: SHOW CONNECTIONS [LIKE ];

Description: Show all connections.

Example: SHOW CONNECTIONS; -- with name filter SHOW CONNECTIONS LIKE 'sql%'; Your output should resemble: +-------------------------+ | Name | +-------------------------+ | azure-openai-connection | | deepwiki-mcp-connection | | demo-day-mcp-connection | | mcp-connection | +-------------------------+

SHOW CURRENT CATALOG¶

Syntax: SHOW CURRENT CATALOG;

Description: Show the current catalog.

Example: SHOW CURRENT CATALOG; Your output should resemble: +----------------------+ | current catalog name | +----------------------+ | my_environment | +----------------------+

SHOW DATABASES¶

Syntax: SHOW DATABASES;

Description: Show all databases in the current catalog. Confluent Cloud for Apache Flink maps Flink databases to Kafka clusters.

Example: SHOW DATABASES; Your output should resemble: +---------------+-------------+ | database name | database id | +---------------+-------------+ | cluster_0 | lkc-r289m7 | +---------------+-------------+

Run the USE statement to set the current database (Kafka cluster). USE cluster_0; Your output should resemble: +----------------------+-----------+ | Key | Value | +----------------------+-----------+ | sql.current-database | cluster_0 | +----------------------+-----------+

SHOW CURRENT DATABASE¶

Syntax: SHOW CURRENT DATABASE;

Description: Show the current database. Confluent Cloud for Apache Flink maps Flink databases to Kafka clusters.
Example: SHOW CURRENT DATABASE; Your output should resemble: +-----------------------+ | current database name | +-----------------------+ | cluster_0 | +-----------------------+

SHOW TABLES¶

Syntax: SHOW TABLES [ [catalog_name.]database_name ] [ [NOT] LIKE ]

Description: Show all tables for the current database. You can filter the output of SHOW TABLES by using the LIKE clause with an optional matching pattern. The optional LIKE clause shows all tables with names that match the specified pattern. The syntax of the SQL pattern in a LIKE clause is the same as in the MySQL dialect.

- % matches any number of characters, including zero characters. Use the backslash character to escape the % character: \% matches one % character.
- _ matches exactly one character. Use the backslash character to escape the _ character: \_ matches one _ character.

Example: Create two tables in the current catalog: flights and orders. -- Create a flights table. CREATE TABLE flights ( flight_id STRING, origin STRING, destination STRING ); -- Create an orders table. CREATE TABLE orders ( user_id BIGINT NOT NULL, product_id STRING, amount INT );

Show all tables in the current database that are similar to the specified SQL pattern. SHOW TABLES LIKE 'f%'; Your output should resemble: +------------+ | table name | +------------+ | flights | +------------+

Show all tables in the current database that are not similar to the specified SQL pattern. SHOW TABLES NOT LIKE 'f%'; Your output should resemble: +------------+ | table name | +------------+ | orders | +------------+

Show all tables in the current database. SHOW TABLES; Your output should resemble: +------------+ | table name | +------------+ | flights | | orders | +------------+

SHOW CREATE TABLE¶

Syntax: SHOW CREATE TABLE [catalog_name.][db_name.]table_name;

Description: Show details about the specified table.
Example: SHOW CREATE TABLE flights; Your output should resemble: +-----------------------------------------------------------+ | SHOW CREATE TABLE | +-----------------------------------------------------------+ | CREATE TABLE `my_environment`.`cluster_0`.`flights` ( | | `flight_id` VARCHAR(2147483647), | | `origin` VARCHAR(2147483647), | | `destination` VARCHAR(2147483647) | | ) WITH ( | | 'changelog.mode' = 'append', | | 'connector' = 'confluent', | | 'kafka.cleanup-policy' = 'delete', | | 'kafka.max-message-size' = '2097164 bytes', | | 'kafka.partitions' = '6', | | 'kafka.retention.size' = '0 bytes', | | 'kafka.retention.time' = '604800000 ms', | | 'scan.bounded.mode' = 'unbounded', | | 'scan.startup.mode' = 'earliest-offset', | | 'value.format' = 'avro-registry' | | ) | | | +-----------------------------------------------------------+

Inferred Tables¶ Inferred tables are tables that have not been created with CREATE TABLE but are detected automatically by using information about existing topics and Schema Registry entries. The following examples show SHOW CREATE TABLE called on the resulting table.

No key and no value in Schema Registry¶ SHOW CREATE TABLE returns: CREATE TABLE `t_raw` ( `key` VARBINARY(2147483647), `val` VARBINARY(2147483647) ) DISTRIBUTED BY HASH(`key`) INTO 2 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'raw' ...
);

Properties

- Key and value formats are raw (binary format) with BYTES.
- Following Kafka message semantics, both key and value support NULL as well, so the following statement is supported: INSERT INTO t_raw (key, val) SELECT CAST(NULL AS BYTES), CAST(NULL AS BYTES);

No key but record value in Schema Registry¶ Given the following value in Schema Registry: { "type": "record", "name": "TestRecord", "fields": [ { "name": "i", "type": "int" }, { "name": "s", "type": "string" } ] } SHOW CREATE TABLE returns: CREATE TABLE `t_raw_key` ( `key` VARBINARY(2147483647), `i` INT NOT NULL, `s` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'avro-registry' ... )

Properties

- Key format is raw (binary format) with BYTES.
- Following Kafka message semantics, the key supports NULL as well, so the following statement is supported: INSERT INTO t_raw_key SELECT CAST(NULL AS BYTES), 12, 'Bob';

Atomic key and record value in Schema Registry¶ Given the following key and value in Schema Registry: "int" { "type": "record", "name": "TestRecord", "fields": [ { "name": "i", "type": "int" }, { "name": "s", "type": "string" } ] } SHOW CREATE TABLE returns: CREATE TABLE `t_atomic_key` ( `key` INT NOT NULL, `i` INT NOT NULL, `s` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 2 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'avro-registry', 'value.format' = 'avro-registry' ... )

Properties

- Schema Registry defines column data type INT NOT NULL.
- The column name key is used as a default, because Schema Registry doesn’t provide a column name.
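By analogy with the INSERT statements shown for `t_raw` and `t_raw_key`, a write to the atomic-key table might look like the following sketch. It assumes the inferred `t_atomic_key` table above with columns in the order (key, i, s). The literal values are illustrative only.

```sql
-- Assumes the inferred table `t_atomic_key` (key INT NOT NULL, i INT NOT NULL, s STRING NOT NULL).
-- The first value populates the Kafka message key. The remaining values populate the record value.
INSERT INTO t_atomic_key SELECT 1, 12, 'Bob';

-- Read the key back alongside the value columns.
-- `key` is quoted with backticks here as a precaution against keyword conflicts.
SELECT `key`, i, s FROM t_atomic_key;
```

Unlike the raw-key cases, the key column here is declared NOT NULL, so a NULL key would presumably be rejected rather than written.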
Overlapping names in key/value, no key in Schema Registry¶ Given the following value in Schema Registry: { "type": "record", "name": "TestRecord", "fields": [ { "name": "i", "type": "int" }, { "name": "key", "type": "string" } ] } SHOW CREATE TABLE returns: CREATE TABLE `t_raw_disjoint` ( `key_key` VARBINARY(2147483647), `i` INT NOT NULL, `key` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`key_key`) INTO 1 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.fields-prefix' = 'key_', 'key.format' = 'raw', 'value.format' = 'avro-registry' ... ) Properties: The Schema Registry value defines the columns i INT NOT NULL and key STRING. The column name key, with type BYTES, is used as a default if no key is in Schema Registry. Because this default key column would collide with the value column named key, the key_ prefix is added.

Record key and record value in Schema Registry¶ Given the following key and value in Schema Registry: { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" } ] } { "type": "record", "name": "TestRecord", "fields": [ { "name": "name", "type": "string" }, { "name": "zip_code", "type": "string" } ] } SHOW CREATE TABLE returns: CREATE TABLE `t_sr_disjoint` ( `uid` INT NOT NULL, `name` VARCHAR(2147483647) NOT NULL, `zip_code` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`uid`) INTO 1 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'value.format' = 'avro-registry' ... ) Properties: Schema Registry defines columns for both key and value. The column names of key and value are disjoint sets and don’t overlap.
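As a rough illustration of the disjoint record-key case, the sketch below inserts one row and reads it back. It assumes the inferred table `t_sr_disjoint` shown above exists; the values are placeholders, and the comments reflect the distribution shown in the SHOW CREATE TABLE output rather than any behavior verified here.

```sql
-- Assumes the inferred table t_sr_disjoint (uid, name, zip_code) shown above.
-- uid comes from the key schema; name and zip_code come from the value schema.
INSERT INTO t_sr_disjoint (uid, name, zip_code)
VALUES (1, 'Bob', '12345');

-- Key and value columns are queried together like ordinary columns.
SELECT uid, name, zip_code
FROM t_sr_disjoint
WHERE uid = 1;
```

Naming the target columns explicitly in the INSERT keeps the statement readable even if the table later gains columns through schema evolution.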
Record key and record value with overlap in Schema Registry¶ Given the following key and value in Schema Registry: { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" } ] } { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" }, { "name": "name", "type": "string" }, { "name": "zip_code", "type": "string" } ] } SHOW CREATE TABLE returns: CREATE TABLE `t_sr_joint` ( `uid` INT NOT NULL, `name` VARCHAR(2147483647) NOT NULL, `zip_code` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`uid`) INTO 1 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'value.fields-include' = 'all', 'value.format' = 'avro-registry' ... ) Properties: Schema Registry defines columns for both key and value. The column names of key and value overlap on uid. 'value.fields-include' = 'all' is set to exclude the key because it is fully contained in the value.

Inferred tables schema evolution¶

Schema Registry columns overlap with computed/metadata columns¶ Given the following value in Schema Registry: { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" } ] } Evolve the table by adding metadata: ALTER TABLE t_metadata_overlap ADD `timestamp` TIMESTAMP_LTZ(3) NOT NULL METADATA; Evolve the table by adding an optional schema column: { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" }, { "name": "timestamp", "type": ["null", "string"], "default": null } ] } SHOW CREATE TABLE shows: CREATE TABLE `t_metadata_overlap` ( `key` VARBINARY(2147483647), `uid` INT NOT NULL, `timestamp` TIMESTAMP(3) WITH LOCAL TIME ZONE NOT NULL METADATA ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( ... ) Properties: Schema Registry says there is a timestamp physical column, but Flink says there is a timestamp metadata column. In this case, metadata columns and computed columns have precedence, so Flink removes the physical column from the schema.
Given that Flink advertises FULL_TRANSITIVE mode, queries still work, and the physical column is set to NULL in the payload: INSERT INTO t_metadata_overlap SELECT CAST(NULL AS BYTES), 42, TO_TIMESTAMP_LTZ(0, 3); SELECT * FROM t_metadata_overlap; Evolve the table by renaming metadata: ALTER TABLE t_metadata_overlap DROP `timestamp`; ALTER TABLE t_metadata_overlap ADD message_timestamp TIMESTAMP_LTZ(3) METADATA FROM 'timestamp'; SHOW CREATE TABLE shows: CREATE TABLE `t_metadata_overlap` ( `key` VARBINARY(2147483647), `uid` INT NOT NULL, `timestamp` VARCHAR(2147483647), `message_timestamp` TIMESTAMP(3) WITH LOCAL TIME ZONE METADATA FROM 'timestamp' ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( ... ) Properties Now, both physical and metadata column show up and can be accessed both for reading and writing. SHOW JOBS¶ SyntaxSHOW JOBS; DescriptionShow the status of all statements in the current catalog/environment. ExampleSHOW JOBS; Your output should resemble: +----------------------------------+-----------+------------------+--------------+------------------+------------------+ | Name | Phase | Statement | Compute Pool | Creation Time | Detail | +----------------------------------+-----------+------------------+--------------+------------------+------------------+ | 0fb72c57-8e3d-4614 | COMPLETED | CREATE TABLE ... | lfcp-8m03rm | 2024-01-23 13... | Table 'flight... | | 8567b0eb-fabd-4cb8 | COMPLETED | CREATE TABLE ... | lfcp-8m03rm | 2024-01-23 13... | Table 'orders... | | 4cd171ca-77db-48ce | COMPLETED | SHOW TABLES L... | lfcp-8m03rm | 2024-01-23 13... | | | 291eb50b-965c-4a53 | COMPLETED | SHOW TABLES N... | lfcp-8m03rm | 2024-01-23 13... | | | 7a30e70a-36af-41f4 | COMPLETED | SHOW TABLES; | lfcp-8m03rm | 2024-01-23 13... 
| | +----------------------------------+-----------+------------------+--------------+------------------+------------------+

SHOW FUNCTIONS¶ Syntax: SHOW [USER] FUNCTIONS; Description: Show all functions, including system functions and user-defined functions, in the current catalog and current database. Both system and catalog functions are returned. The USER option shows only user-defined functions in the current catalog and current database. Functions of internal modules are shown if your organization is in the allow-list, for example, OLTP functions. For convenience, SHOW FUNCTIONS also shows functions with special syntax or keywords that don’t follow a traditional functional-style syntax, like FUNC(arg0). For example, || (string concatenation) or IS BETWEEN. Example: SHOW FUNCTIONS; Your output should resemble: +------------------------+ | function name | +------------------------+ | % | | * | | + | | - | | / | | < | | <= | | <> | | = | | > | | >= | | ABS | | ACOS | | AND | | ARRAY | | ARRAY_CONTAINS | | ASCII | | ASIN | | ATAN | | ATAN2 | | AVG | ...

SHOW MODELS¶ Syntax: SHOW MODELS [ ( FROM | IN ) [catalog_name.]database_name ] [ [NOT] LIKE ]; Description: Show all AI models that are registered in the current Flink environment. To register an AI model, run the CREATE MODEL statement. Example: SHOW MODELS; Your output should resemble: +----------------+ | Model Name | +----------------+ | demo_model | +----------------+

SHOW CREATE MODEL¶ Syntax: SHOW CREATE MODEL ; Description: Show details about the specified AI inference model. This command is useful for understanding the configuration and options that were set when the model was created with the CREATE MODEL statement. Example: For an AWS Bedrock model named “bedrock_embed”, the following statement might display output like the following.
SHOW CREATE MODEL bedrock_embed; -- Example SHOW CREATE MODEL output: CREATE MODEL `model-testing`.`virtual_topic_GCP`.`bedrock_embed` INPUT (`text` VARCHAR(2147483647)) OUTPUT (`response` ARRAY) WITH ( 'BEDROCK.CONNECTION' = 'bedrock-connection-hao', 'BEDROCK.INPUT_FORMAT' = 'AMAZON-TITAN-EMBED', 'PROVIDER' = 'bedrock', 'TASK' = 'text_generation' ); Related content¶ DESCRIBE USE CATALOG Note This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2. #### Code Examples ```sql SHOW CATALOGS; ``` ```sql SHOW CATALOGS; ``` ```sql +-------------------------+------------+ | catalog name | catalog id | +-------------------------+------------+ | my_environment | env-12abcz | | example-streams-env | env-23xjoo | | quickstart-env | env-9wg8ny | | default | env-t12345 | +-------------------------+------------+ ``` ```sql USE CATALOG my_environment; ``` ```sql +---------------------+----------------+ | Key | Value | +---------------------+----------------+ | sql.current-catalog | my_environment | +---------------------+----------------+ ``` ```sql SHOW CONNECTIONS [LIKE ]; ``` ```sql SHOW CONNECTIONS; -- with name filter SHOW CONNECTIONS LIKE 'sql%'; ``` ```sql +-------------------------+ | Name | +-------------------------+ | azure-openai-connection | | deepwiki-mcp-connection | | demo-day-mcp-connection | | mcp-connection | +-------------------------+ ``` ```sql SHOW CURRENT CATALOG; ``` ```sql SHOW CURRENT CATALOG; ``` ```sql +----------------------+ | current catalog name | +----------------------+ | my_environment | +----------------------+ ``` ```sql SHOW DATABASES; ``` ```sql SHOW DATABASES; ``` ```sql +---------------+-------------+ | database name | database id | +---------------+-------------+ | cluster_0 | lkc-r289m7 | +---------------+-------------+ ``` ```sql USE cluster_0; ``` ```sql +----------------------+-----------+ | Key | Value | +----------------------+-----------+ | sql.current-database | cluster_0 
| +----------------------+-----------+ ``` ```sql SHOW CURRENT DATABASE; ``` ```sql SHOW CURRENT DATABASE; ``` ```sql +-----------------------+ | current database name | +-----------------------+ | cluster_0 | +-----------------------+ ``` ```sql SHOW TABLES [ [catalog_name.]database_name ] [ [NOT] LIKE ] ``` ```sql ``` ```sql -- Create a flights table. CREATE TABLE flights ( flight_id STRING, origin STRING, destination STRING ); ``` ```sql -- Create an orders table. CREATE TABLE orders ( user_id BIGINT NOT NULL, product_id STRING, amount INT ); ``` ```sql SHOW TABLES LIKE 'f%'; ``` ```sql +------------+ | table name | +------------+ | flights | +------------+ ``` ```sql SHOW TABLES NOT LIKE 'f%'; ``` ```sql +------------+ | table name | +------------+ | orders | +------------+ ``` ```sql SHOW TABLES; ``` ```sql +------------+ | table name | +------------+ | flights | | orders | +------------+ ``` ```sql SHOW CREATE TABLE [catalog_name.][db_name.]table_name; ``` ```sql SHOW CREATE TABLE flights; ``` ```sql +-----------------------------------------------------------+ | SHOW CREATE TABLE | +-----------------------------------------------------------+ | CREATE TABLE `my_environment`.`cluster_0`.`flights` ( | | `flight_id` VARCHAR(2147483647), | | `origin` VARCHAR(2147483647), | | `destination` VARCHAR(2147483647) | | ) WITH ( | | 'changelog.mode' = 'append', | | 'connector' = 'confluent', | | 'kafka.cleanup-policy' = 'delete', | | 'kafka.max-message-size' = '2097164 bytes', | | 'kafka.partitions' = '6', | | 'kafka.retention.size' = '0 bytes', | | 'kafka.retention.time' = '604800000 ms', | | 'scan.bounded.mode' = 'unbounded', | | 'scan.startup.mode' = 'earliest-offset', | | 'value.format' = 'avro-registry' | | ) | | | +-----------------------------------------------------------+ ``` ```sql CREATE TABLE `t_raw` ( `key` VARBINARY(2147483647), `val` VARBINARY(2147483647) ) DISTRIBUTED BY HASH(`key`) INTO 2 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 
'confluent', 'key.format' = 'raw', 'value.format' = 'raw' ... ); ``` ```sql INSERT INTO t_raw (key, val) SELECT CAST(NULL AS BYTES), CAST(NULL AS BYTES); ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "i", "type": "int" }, { "name": "s", "type": "string" } ] } ``` ```sql CREATE TABLE `t_raw_key` ( `key` VARBINARY(2147483647), `i` INT NOT NULL, `s` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'raw', 'value.format' = 'avro-registry' ... ) ``` ```sql INSERT INTO t_raw_key SELECT CAST(NULL AS BYTES), 12, 'Bob'; ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "i", "type": "int" }, { "name": "s", "type": "string" } ] } ``` ```sql CREATE TABLE `t_atomic_key` ( `key` INT NOT NULL, `i` INT NOT NULL, `s` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 2 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.format' = 'avro-registry', 'value.format' = 'avro-registry' ... ) ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "i", "type": "int" }, { "name": "s", "type": "string" } ] } ``` ```sql CREATE TABLE `t_raw_disjoint` ( `key_key` VARBINARY(2147483647), `i` INT NOT NULL, `key` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`key_key`) INTO 1 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'key.fields-prefix' = 'key_', 'key.format' = 'raw', 'value.format' = 'avro-registry' ... 
) ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" } ] } ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "name", "type": "string" }, { "name": "zip_code", "type": "string" } ] } ``` ```sql CREATE TABLE `t_sr_disjoint` ( `uid` INT NOT NULL, `name` VARCHAR(2147483647) NOT NULL, `zip_code` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`uid`) INTO 1 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'value.format' = 'avro-registry' ... ) ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" } ] } ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" },{ "name": "name", "type": "string" }, { "name": "zip_code", "type": "string" } ] } ``` ```sql CREATE TABLE `t_sr_joint` ( `uid` INT NOT NULL, `name` VARCHAR(2147483647) NOT NULL, `zip_code` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`uid`) INTO 1 BUCKETS WITH ( 'changelog.mode' = 'append', 'connector' = 'confluent', 'value.fields-include' = 'all', 'value.format' = 'avro-registry' ... ) ``` ```sql 'value.fields-include' = 'all' ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" } ] } ``` ```sql ALTER TABLE t_metadata_overlap ADD `timestamp` TIMESTAMP_LTZ(3) NOT NULL METADATA; ``` ```sql { "type": "record", "name": "TestRecord", "fields": [ { "name": "uid", "type": "int" }, { "name": "timestamp", "type": ["null", "string"], "default": null } ] } ``` ```sql CREATE TABLE t_metadata_overlap` ( `key` VARBINARY(2147483647), `uid` INT NOT NULL, `timestamp` TIMESTAMP(3) WITH LOCAL TIME ZONE NOT NULL METADATA ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( ... 
) ``` ```sql INSERT INTO t_metadata_overlap SELECT CAST(NULL AS BYTES), 42, TO_TIMESTAMP_LTZ(0, 3); SELECT * FROM t_metadata_overlap; ``` ```sql ALTER TABLE t_metadata_overlap DROP `timestamp`; ALTER TABLE t_metadata_overlap ADD message_timestamp TIMESTAMP_LTZ(3) METADATA FROM 'timestamp'; ``` ```sql CREATE TABLE `t_metadata_overlap` ( `key` VARBINARY(2147483647), `uid` INT NOT NULL, `timestamp` VARCHAR(2147483647), `message_timestamp` TIMESTAMP(3) WITH LOCAL TIME ZONE METADATA FROM 'timestamp' ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( ... ) ``` ```sql +----------------------------------+-----------+------------------+--------------+------------------+------------------+ | Name | Phase | Statement | Compute Pool | Creation Time | Detail | +----------------------------------+-----------+------------------+--------------+------------------+------------------+ | 0fb72c57-8e3d-4614 | COMPLETED | CREATE TABLE ... | lfcp-8m03rm | 2024-01-23 13... | Table 'flight... | | 8567b0eb-fabd-4cb8 | COMPLETED | CREATE TABLE ... | lfcp-8m03rm | 2024-01-23 13... | Table 'orders... | | 4cd171ca-77db-48ce | COMPLETED | SHOW TABLES L... | lfcp-8m03rm | 2024-01-23 13... | | | 291eb50b-965c-4a53 | COMPLETED | SHOW TABLES N... | lfcp-8m03rm | 2024-01-23 13... | | | 7a30e70a-36af-41f4 | COMPLETED | SHOW TABLES; | lfcp-8m03rm | 2024-01-23 13... | | +----------------------------------+-----------+------------------+--------------+------------------+------------------+ ``` ```sql SHOW [USER] FUNCTIONS; ``` ```sql SHOW FUNCTIONS; ``` ```sql +------------------------+ | function name | +------------------------+ | % | | * | | + | | - | | / | | < | | <= | | <> | | = | | > | | >= | | ABS | | ACOS | | AND | | ARRAY | | ARRAY_CONTAINS | | ASCII | | ASIN | | ATAN | | ATAN2 | | AVG | ... 
```

```sql
SHOW MODELS [ ( FROM | IN ) [catalog_name.]database_name ] [ [NOT] LIKE ];
```

```sql
SHOW MODELS;
```

```sql
+----------------+
| Model Name     |
+----------------+
| demo_model     |
+----------------+
```

```sql
SHOW CREATE MODEL ;
```

```sql
SHOW CREATE MODEL bedrock_embed;

-- Example SHOW CREATE MODEL output:
CREATE MODEL `model-testing`.`virtual_topic_GCP`.`bedrock_embed`
INPUT (`text` VARCHAR(2147483647))
OUTPUT (`response` ARRAY)
WITH (
  'BEDROCK.CONNECTION' = 'bedrock-connection-hao',
  'BEDROCK.INPUT_FORMAT' = 'AMAZON-TITAN-EMBED',
  'PROVIDER' = 'bedrock',
  'TASK' = 'text_generation'
);
```

---

### SQL USE CATALOG Statement in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/reference/statements/use-catalog.html

USE CATALOG Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables setting the active environment with the USE CATALOG statement.

Syntax¶ USE CATALOG catalog_name;

Description¶ Set the current catalog (Confluent Cloud environment). All subsequent commands that don’t specify a catalog use catalog_name. Confluent Cloud for Apache Flink interprets your Confluent Cloud environments as catalogs. Flink can access various databases (Apache Kafka® clusters) in a catalog. The catalog_name parameter is case-sensitive. The default current catalog is named default. If catalog_name doesn’t exist, Flink throws an exception on the next DML or DDL statement.

Important: USE CATALOG is a client-side setting statement and sets corresponding properties that are attached to future requests. By itself, a USE CATALOG statement is a no-op. To see its effect, you must follow it with one or more DML or DDL statements, for example: -- Set the current catalog (environment). USE CATALOG my_env; -- Set the current database (Kafka cluster). USE cluster_0; -- Submit a query that uses the current catalog and database. SELECT * FROM my_table;

Use the USE DATABASE statement to set the current Flink database (Kafka cluster).
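The relationship between the session defaults and fully qualified names can be sketched as follows. The names `my_env`, `cluster_0`, and `my_table` are placeholders carried over from the example above, and the fully qualified form assumes the catalog.database.table path style that SHOW CREATE TABLE prints in its output.

```sql
-- Placeholder names: my_env, cluster_0, and my_table are illustrative only.

-- Option 1: set the session defaults, then use a short table name.
USE CATALOG my_env;
USE cluster_0;
SELECT * FROM my_table;

-- Option 2: skip the USE statements and spell out the full path.
SELECT * FROM `my_env`.`cluster_0`.`my_table`;
```

The second form can be handy for a one-off query against another environment, because it leaves the current catalog and database settings untouched.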
USE CATALOG in Cloud Console workspaces¶ When you run the USE CATALOG statement in a Cloud Console workspace, it sets the catalog that is used in any subsequent CREATE statement requests for the specific editor cell. Different cells can use different catalogs within the same workspace. The catalog parameter is unquoted, for example, USE CATALOG catalog1;. Any USE statements within an editor cell take precedence over the settings in the workspace’s global catalog and database dropdown controls.

Related content¶ USE database

Note: This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.

#### Code Examples

```sql
USE CATALOG catalog_name;
```

```sql
-- Set the current catalog (environment).
USE CATALOG my_env;

-- Set the current database (Kafka cluster).
USE cluster_0;

-- Submit a query that uses the current catalog and database.
SELECT * FROM my_table;
```

```sql
USE CATALOG catalog1;
```

---

### SQL USE database_name Statement in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/reference/statements/use-database.html

USE Statement in Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® enables setting the current Apache Kafka® cluster with the USE statement.

Syntax¶ USE database_name;

Description¶ Set the current database (Kafka cluster). The USE statement enables you to access tables in various databases without specifying the full paths. In Confluent Cloud, Apache Flink® databases are equivalent to Kafka clusters. All Kafka clusters in the region where Flink is running are registered automatically as databases and can be accessed by Flink when you use the correct catalog (environment). All subsequent commands that don’t specify a database use database_name. If database_name doesn’t exist, Flink throws an exception on the next DML or DDL statement.
Important: USE is a client-side setting statement and sets corresponding properties that are attached to future requests. By itself, a USE statement is a no-op. To see its effect, you must follow it with one or more DML or DDL statements, for example: -- Set the current catalog (environment). USE CATALOG my_env; -- Set the current database (Kafka cluster). USE cluster_0; -- Submit a query that uses the current catalog and database. SELECT * FROM my_table;

Run the USE CATALOG statement to set the current Flink catalog (Confluent Cloud environment).

USE database_name in Cloud Console workspaces¶ When you run the USE statement in a Cloud Console workspace, it sets the database that is used in any subsequent CREATE statement requests for the specific editor cell. Different cells can use different databases within the same workspace. The database_name parameter is unquoted, for example, USE database1;. Any USE statements within an editor cell take precedence over the settings in the workspace’s global catalog and database dropdown controls.

Example¶ In the Flink SQL shell, run the following commands to see an example of the USE statement.

View the existing databases. SHOW DATABASES; Your output should resemble: +---------------+-------------+ | database name | database id | +---------------+-------------+ | cluster_0 | lkc-a123c4 | +---------------+-------------+

Set the current database to cluster_0. USE cluster_0; Your output should resemble: +----------------------+-----------+ | Key | Value | +----------------------+-----------+ | sql.current-database | cluster_0 | +----------------------+-----------+

Run the SHOW CURRENT DATABASE statement to confirm the change. SHOW CURRENT DATABASE; Your output should resemble: +-----------------------+ | current database name | +-----------------------+ | cluster_0 | +-----------------------+

Related content¶ USE CATALOG

Note: This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.
#### Code Examples

```sql
USE database_name;
```

```sql
-- Set the current catalog (environment).
USE CATALOG my_env;

-- Set the current database (Kafka cluster).
USE cluster_0;

-- Submit a query that uses the current catalog and database.
SELECT * FROM my_table;
```

```sql
USE database1;
```

```sql
SHOW DATABASES;
```

```sql
+---------------+-------------+
| database name | database id |
+---------------+-------------+
| cluster_0     | lkc-a123c4  |
+---------------+-------------+
```

```sql
USE cluster_0;
```

```sql
+----------------------+-----------+
| Key                  | Value     |
+----------------------+-----------+
| sql.current-database | cluster_0 |
+----------------------+-----------+
```

```sql
SHOW CURRENT DATABASE;
```

```sql
+-----------------------+
| current database name |
+-----------------------+
| cluster_0             |
+-----------------------+
```

---

### Table API on Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/reference/table-api.html

Table API on Confluent Cloud for Apache Flink¶ Confluent Cloud for Apache Flink® supports programming applications with the Table API in Java and Python. Confluent provides a plugin for running applications that use the Table API on Confluent Cloud. The Table API enables a programmatic way of developing, testing, and submitting Flink pipelines for processing data streams. Streams can be finite or infinite, with insert-only or changelog data. Changelog data enables handling Change Data Capture (CDC) events.

To use the Table API, you work with tables that change over time, a concept inspired by relational databases. A Table program is a declarative and structured graph of transformations. The Table API is inspired by SQL and complements it with additional tools for manipulating real-time data. You can use both Flink SQL and the Table API in your applications.
A table program has these characteristics:

- Runs in a regular main() method (Java).
- Uses Flink APIs.
- Communicates with Confluent Cloud by using REST requests, for example, through the Statements endpoint.

For a list of Table API functions supported by Confluent Cloud for Apache Flink, see Table API functions. For a list of Table API limitations in Confluent Cloud for Apache Flink, see Known limitations.

Use the Confluent for VS Code extension to generate a new Flink Table API project that interacts with your Confluent Cloud resources. This option is ideal if you’re learning about the Table API. For more information, see Confluent for VS Code for Confluent Cloud.

Note: The Flink Table API is available for preview. A Preview feature is a Confluent Cloud component that is being introduced to gain early feedback from developers. Preview features can be used for evaluation and non-production testing purposes or to provide feedback to Confluent. The warranty, SLA, and Support Services provisions of your agreement with Confluent do not apply to Preview features. Confluent may discontinue providing preview releases of the Preview features at any time in Confluent’s sole discretion. Comments, questions, and suggestions related to the Table API are encouraged and can be submitted through the established channels.

Add the Table API to an existing Java project¶ To add the Table API to an existing project, include the following dependencies in the dependencies section of your pom.xml file:

- groupId org.apache.flink, artifactId flink-table-api-java, version ${flink.version}
- groupId io.confluent.flink, artifactId confluent-flink-table-api-java-plugin, version ${confluent-plugin.version}

Configure the plugin¶ The plugin requires a set of configuration options for establishing a connection to Confluent Cloud. The following configuration options are required.

| Property key | Command-line argument | Environment variable | Notes |
|---|---|---|---|
| client.cloud | --cloud | CLOUD_PROVIDER | Confluent identifier for a cloud provider. Valid values are aws, azure, and gcp. |
| client.compute-pool-id | --compute-pool-id | COMPUTE_POOL_ID | ID of the compute pool, for example, lfcp-8m03rm. |
| client.environment-id | --environment-id | ENV_ID | ID of the environment, for example, env-z3y2x1. |
| client.flink-api-key | --flink-api-key | FLINK_API_KEY | API key for Flink access. For more information, see Generate an API Key. |
| client.flink-api-secret | --flink-api-secret | FLINK_API_SECRET | API secret for Flink access. For more information, see Generate an API Key. |
| client.organization-id | --organization-id | ORG_ID | ID of the organization, for example, b0b21724-4586-4a07-b787-d0bb5aacbf87. |
| client.region | --region | CLOUD_REGION | Confluent identifier for a cloud provider’s region, for example, us-east-1. For available regions, see Supported Regions or run confluent flink region list. |

The following configuration options are required for supporting UDF uploads. For more information, see Upload the jar as a Flink artifact.

Note: Create the artifact API key and secret as a Confluent Cloud API key in Confluent Cloud Console. For more information, see Manage API Keys in Confluent Cloud.

| Property key | Command-line argument | Environment variable | Notes |
|---|---|---|---|
| client.artifact-api-key | --artifact-api-key | ARTIFACT_API_KEY | API key for artifact creation. |
| client.artifact-api-secret | --artifact-api-secret | ARTIFACT_API_SECRET | API secret for artifact creation. |

The following configuration options are optional.

| Property key | Command-line argument | Environment variable | Notes |
|---|---|---|---|
| client.artifact-endpoint-template | --artifact-endpoint-template | ARTIFACT_ENDPOINT_TEMPLATE | A template for the artifact endpoint URL, for example, https://api.{region}.{cloud}.confluent.cloud. |
| client.catalog-cache | | | Expiration time for catalog objects, for example, '5 min'. The default is '1 min'. '0' disables caching. |
| client.context | --context | | A name for the current Table API session, for example, my_table_program. |
| client.endpoint-template | --endpoint-template | ENDPOINT_TEMPLATE | A template for the endpoint URL, for example, https://flinkpls-dom123.{region}.{cloud}.confluent.cloud. |
| client.principal-id | --principal-id | PRINCIPAL_ID | Principal that runs submitted statements, for example, sa-23kgz4 for a service account. |
| client.rest-endpoint | --rest-endpoint | REST_ENDPOINT | URL to the REST endpoint, for example, proxyto.confluent.cloud. |
| client.statement-name | --statement-name | | Unique name for statement submission. By default, generated using a UUID. |
| client.tmp-dir | --tmp-dir | | Directory for temporary files created by the plugin, like UDF jars, for example, /tmp. The default is java.io.tmpdir. |

Endpoint configuration¶ The Confluent Flink plugin provides options to configure endpoints for connecting to Confluent Cloud services. The template-based approach is the recommended method.

client.endpoint-template¶ This option provides a template for constructing the Flink statement API endpoint URL. Default value: https://flink.{region}.{cloud}.confluent.cloud Example: https://flinkpls-dom123.{region}.{cloud}.confluent.cloud Usage: The template supports the placeholders {region} and {cloud}, which are replaced with the configured region and cloud provider values. Environment Variable: ENDPOINT_TEMPLATE

client.artifact-endpoint-template¶ This option provides a template for constructing the URL used for uploading artifacts, like UDF JARs. Default value: https://api.confluent.cloud Example: https://api.{region}.{cloud}.confluent.cloud Usage: Similar to the endpoint template, this supports the placeholders {region} and {cloud}. Environment Variable: ARTIFACT_ENDPOINT_TEMPLATE

client.rest-endpoint (Deprecated)¶ This option specifies the base domain for REST API calls to Confluent Cloud. While still supported, using the template-based configuration is preferred. Default value: No default value Example: proxy.confluent.cloud Usage: When specified, the plugin constructs the full Flink statement API endpoint URL as https://flink.{region}.{cloud}.{rest-endpoint} where {region} and {cloud} are replaced with the configured region and cloud provider values.
Environment Variable: REST_ENDPOINT Important client.endpoint-template and client.rest-endpoint are mutually exclusive. If both are set, an exception is thrown. Relationship and default behavior¶ The following rules control the relationship between the configuration options. The client.endpoint-template and client.rest-endpoint configuration options can’t be set simultaneously The client.artifact-endpoint-template and client.rest-endpoint configuration options can’t be set simultaneously. The following rules control the default behavior. If neither client.rest-endpoint nor client.endpoint-template is configured, the default template, https://flink.{region}.{cloud}.confluent.cloud is used for statement API If neither client.rest-endpoint nor client.artifact-endpoint-template is specified, the default artifact endpoint, https://api.confluent.cloud is used If endpoint templates are used, each endpoint is constructed independently with the provided templates. The following simple example shows different ways to configure endpoints. // Option 1 (RECOMMENDED): Using endpoint templates // Resolved endpoints: // - Statement API: https://flinkpls-dom123.us-east-1.aws.confluent.cloud ConfluentSettings settings1 = ConfluentSettings.newBuilder() .setRegion("us-east-1") .setCloud("aws") .setEndpointTemplate("https://flinkpls-dom123.{region}.{cloud}.confluent.cloud") .setArtifactEndpointTemplate("https://artifacts.{region}.{cloud}.custom-domain.com") // Other required settings... 
.build(); // Option 2: Using properties file with endpoint templates // cloud.properties: // client.region=us-east-1 // client.cloud=aws // client.endpoint-template=https://flinkpls-dom123.{region}.{cloud}.confluent.cloud // Resolved endpoints: // - Statement API: https://flinkpls-dom123.us-east-1.aws.confluent.cloud // - Artifact API: https://api.confluent.cloud (default) ConfluentSettings settings2 = ConfluentSettings.fromResource("/cloud.properties"); // Option 3 (DISCOURAGED): Using rest-endpoint (both the statement and artifact endpoints are derived from this) // Resolved endpoints: // - Statement API: https://flink.us-east-1.aws.proxy.confluent.cloud // - Artifact API: https://api.proxy.confluent.cloud ConfluentSettings settings3 = ConfluentSettings.newBuilder() .setRegion("us-east-1") .setCloud("aws") .setRestEndpoint("proxy.confluent.cloud") // Other required settings... .build();

ConfluentSettings class¶ The ConfluentSettings class provides configuration options from various sources, so you can combine external input, code, and environment variables to set up your applications. The following precedence order applies to configuration sources, from highest to lowest: (1) CLI arguments or properties file, (2) code, (3) environment variables. The following code example shows a TableEnvironment that’s configured by a combination of command-line arguments and code.

Java: public static void main(String[] args) { // Args might set cloud, region, org, env, and compute pool. // Environment variables might pass key and secret. // Code sets the session name and SQL-specific options. ConfluentSettings settings = ConfluentSettings.newBuilder(args) .setContextName("MyTableProgram") .setOption("sql.local-time-zone", "UTC") .build(); TableEnvironment env = TableEnvironment.create(settings); }

Python: from pyflink.table.confluent import ConfluentSettings from pyflink.table import TableEnvironment def run(): # Properties file might set cloud, region, org, env, and compute pool.
# Environment variables might pass key and secret. # Code sets the session name and SQL-specific options. settings = ConfluentSettings.new_builder_from_file(...) \ .set_context_name("MyTableProgram") \ .set_option("sql.local-time-zone", "UTC") \ .build() env = TableEnvironment.create(settings) Properties file¶ You can store options in a cloud.properties file and reference the file in code. # Cloud region client.cloud=aws client.region=eu-west-1 # Access & compute resources client.flink-api-key=XXXXXXXXXXXXXXXX client.flink-api-secret=XxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXx client.organization-id=00000000-0000-0000-0000-000000000000 client.environment-id=env-xxxxx client.compute-pool-id=lfcp-xxxxxxxxxx Reference the cloud.properties file in code: JavaPython// Arbitrary file location in file system ConfluentSettings settings = ConfluentSettings.fromPropertiesFile("/path/to/cloud.properties"); // Part of the JAR package (in src/main/resources) ConfluentSettings settings = ConfluentSettings.fromPropertiesResource("/cloud.properties"); from pyflink.table.confluent import ConfluentSettings # Arbitrary file location in file system settings = ConfluentSettings.from_file("/path/to/cloud.properties") Command-line arguments¶ You can pass the configuration settings as command-line options when you run your application’s jar: java -jar my-table-program.jar \ --cloud aws \ --region us-east-1 \ --flink-api-key key \ --flink-api-secret secret \ --organization-id b0b21724-4586-4a07-b787-d0bb5aacbf87 \ --environment-id env-z3y2x1 \ --compute-pool-id lfcp-8m03rm Access the configuration settings from the command-line arguments by using the ConfluentSettings.fromArgs method: JavaPythonpublic static void main(String[] args) { ConfluentSettings settings = ConfluentSettings.fromArgs(args); } from pyflink.table.confluent import ConfluentSettings settings = ConfluentSettings.from_global_variables() Code¶ You can assign the configuration settings in code by using the builder provided with 
the ConfluentSettings class: JavaPythonConfluentSettings settings = ConfluentSettings.newBuilder() .setCloud("aws") .setRegion("us-east-1") .setFlinkApiKey("key") .setFlinkApiSecret("secret") .setOrganizationId("b0b21724-4586-4a07-b787-d0bb5aacbf87") .setEnvironmentId("env-z3y2x1") .setComputePoolId("lfcp-8m03rm") .build(); from pyflink.table.confluent import ConfluentSettings settings = ConfluentSettings.new_builder() \ .set_cloud("aws") \ .set_region("us-east-1") \ .set_flink_api_key("key") \ .set_flink_api_secret("secret") \ .set_organization_id("b0b21724-4586-4a07-b787-d0bb5aacbf87") \ .set_environment_id("env-z3y2x1") \ .set_compute_pool_id("lfcp-8m03rm") \ .build() Environment variables¶ Set the following environment variables to provide configuration settings. export CLOUD_PROVIDER="aws" export CLOUD_REGION="us-east-1" export FLINK_API_KEY="key" export FLINK_API_SECRET="secret" export ORG_ID="b0b21724-4586-4a07-b787-d0bb5aacbf87" export ENV_ID="env-z3y2x1" export COMPUTE_POOL_ID="lfcp-8m03rm" java -jar my-table-program.jar In code, call: JavaPythonConfluentSettings settings = ConfluentSettings.fromGlobalVariables(); from pyflink.table.confluent import ConfluentSettings settings = ConfluentSettings.from_global_variables() Confluent utilities¶ The ConfluentTools class provides more methods that you can use for developing and testing Table API programs. ConfluentTools.collectChangelog and ConfluentTools.printChangelog¶ Runs the specified table transformations on Confluent Cloud and returns the results locally as a list of changelog rows or prints to the console in a table style. These methods run table.execute().collect() and consume a fixed number of rows from the returned iterator. These methods can work on both finite and infinite input tables. If the pipeline is potentially unbounded, they stop fetching after the desired number of rows has been reached. 
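The "consume a fixed number of rows, then stop" behavior described above can be pictured with plain Python iterators. This is only a conceptual sketch and not the ConfluentTools implementation: `unbounded_rows` and `collect_limited` are hypothetical names, and the rows are made-up `(row kind, value)` pairs rather than real query results.

```python
from itertools import count, islice
from typing import Iterable, Iterator, List, Tuple

ChangelogRow = Tuple[str, int]  # (row kind, value); illustrative shape only


def unbounded_rows() -> Iterator[ChangelogRow]:
    """Hypothetical stand-in for the iterator of a never-ending streaming query."""
    for i in count(1):
        yield ("+I", i)


def collect_limited(rows: Iterable[ChangelogRow], limit: int) -> List[ChangelogRow]:
    """Fetch at most `limit` rows and then stop pulling from the source."""
    return list(islice(rows, limit))


# An unbounded source still returns, because fetching stops at the limit.
print(collect_limited(unbounded_rows(), 3))

# A bounded source that ends early simply yields fewer rows.
print(collect_limited(iter([("+I", 1)]), 100))
```

Because the limit is applied while iterating, the caller never has to wait for an unbounded pipeline to finish.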
Java:

    // On a Table object
    Table table = env.from("examples.marketplace.customers");
    List<Row> rows = ConfluentTools.collectChangelog(table, 100);
    ConfluentTools.printChangelog(table, 100);

    // On a TableResult object
    TableResult tableResult = env.executeSql("SELECT * FROM examples.marketplace.customers");
    List<Row> rowsFromResult = ConfluentTools.collectChangelog(tableResult, 100);
    ConfluentTools.printChangelog(tableResult, 100);

    // For finite (i.e. bounded) tables
    ConfluentTools.collectChangelog(table);
    ConfluentTools.printChangelog(table);

Python:

    from pyflink.table.confluent import ConfluentSettings, ConfluentTools
    from pyflink.table import TableEnvironment

    settings = ConfluentSettings.from_global_variables()
    env = TableEnvironment.create(settings)

    # On a Table object
    table = env.from_path("examples.marketplace.customers")
    rows = ConfluentTools.collect_changelog_limit(table, 100)
    ConfluentTools.print_changelog_limit(table, 100)

    # On a TableResult object
    tableResult = env.execute_sql("SELECT * FROM examples.marketplace.customers")
    rows = ConfluentTools.collect_changelog_limit(tableResult, 100)
    ConfluentTools.print_changelog_limit(tableResult, 100)

    # For finite (i.e. bounded) tables
    ConfluentTools.collect_changelog(table)
    ConfluentTools.print_changelog(table)

##### ConfluentTools.collect_materialized and ConfluentTools.print_materialized

Runs the specified table transformations on Confluent Cloud and returns the results locally as a materialized changelog. Changes are applied to an in-memory table and returned as a list of insert-only rows or printed to the console in a table style.

These methods run table.execute().collect() and consume a fixed number of rows from the returned iterator.

These methods can work on both finite and infinite input tables. If the pipeline is potentially unbounded, they stop fetching after the desired number of rows has been reached.
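To make "changes are applied to an in-memory table" concrete, the following minimal sketch materializes a changelog in plain Python. It assumes Flink's usual row kinds (`+I` insert, `-U` update-before, `+U` update-after, `-D` delete). The `materialize` function and the sample rows are hypothetical and only illustrate the idea; they are not how ConfluentTools is implemented.

```python
from typing import Iterable, List, Tuple

Row = Tuple  # a row is modeled as a plain tuple of column values
Change = Tuple[str, Row]  # (row kind, row)


def materialize(changelog: Iterable[Change]) -> List[Row]:
    """Apply changelog entries to an in-memory table and return insert-only rows."""
    table: List[Row] = []
    for kind, row in changelog:
        if kind in ("+I", "+U"):
            # Inserts and update-after images add a row.
            table.append(row)
        elif kind in ("-U", "-D"):
            # Update-before images and deletes retract one matching row.
            table.remove(row)
        else:
            raise ValueError(f"unknown row kind: {kind}")
    return table


changelog = [
    ("+I", ("alice", 1)),
    ("+I", ("bob", 5)),
    ("-U", ("alice", 1)),  # retract the old value for alice
    ("+U", ("alice", 2)),  # emit the new value for alice
    ("-D", ("bob", 5)),    # bob is deleted
]
print(materialize(changelog))  # [('alice', 2)]
```

The changelog view keeps all five entries, while the materialized view keeps only the rows that are still valid after every change has been applied.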
Java:

    // On a Table object
    Table table = env.from("examples.marketplace.customers");
    List<Row> rows = ConfluentTools.collectMaterialized(table, 100);
    ConfluentTools.printMaterialized(table, 100);

    // On a TableResult object
    TableResult tableResult = env.executeSql("SELECT * FROM examples.marketplace.customers");
    List<Row> rowsFromResult = ConfluentTools.collectMaterialized(tableResult, 100);
    ConfluentTools.printMaterialized(tableResult, 100);

    // For finite (i.e. bounded) tables
    ConfluentTools.collectMaterialized(table);
    ConfluentTools.printMaterialized(table);

Python:

    from pyflink.table.confluent import ConfluentSettings, ConfluentTools
    from pyflink.table import TableEnvironment

    settings = ConfluentSettings.from_global_variables()
    env = TableEnvironment.create(settings)

    # On Table object
    table = env.from_path("examples.marketplace.customers")
    rows = ConfluentTools.collect_materialized_limit(table, 100)
    ConfluentTools.print_materialized_limit(table, 100)

    # On TableResult object
    tableResult = env.execute_sql("SELECT * FROM examples.marketplace.customers")
    rows = ConfluentTools.collect_materialized_limit(tableResult, 100)
    ConfluentTools.print_materialized_limit(tableResult, 100)

    # For finite (i.e. bounded) tables
    ConfluentTools.collect_materialized(table)
    ConfluentTools.print_materialized(table)

##### ConfluentTools.getStatementName and ConfluentTools.stopStatement

Additional lifecycle methods for controlling statements on Confluent Cloud after they have been submitted.
Java:

    // On TableResult object
    TableResult tableResult = env.executeSql("SELECT * FROM examples.marketplace.customers");
    String statementName = ConfluentTools.getStatementName(tableResult);
    ConfluentTools.stopStatement(tableResult);

    // Based on statement name
    ConfluentTools.stopStatement(env, "table-api-2024-03-21-150457-36e0dbb2e366-sql");

Python:

    # On TableResult object
    table_result = env.execute_sql("SELECT * FROM examples.marketplace.customers")
    statement_name = ConfluentTools.get_statement_name(table_result)
    ConfluentTools.stop_statement(table_result)

    # Based on statement name
    ConfluentTools.stop_statement_by_name(env, "table-api-2024-03-21-150457-36e0dbb2e366-sql")

#### Confluent table descriptor

A table descriptor for creating tables located in Confluent Cloud programmatically.

Compared to the regular Flink class, the ConfluentTableDescriptor class adds support for Confluent’s system columns and convenience methods for working with Confluent tables.

The for_managed() method corresponds to TableDescriptor.for_connector("confluent").

Java:

    TableDescriptor descriptor = ConfluentTableDescriptor.forManaged()
      .schema(
        Schema.newBuilder()
          .column("i", DataTypes.INT())
          .column("s", DataTypes.INT())
          .watermark("$rowtime", $("$rowtime").minus(lit(5).seconds())) // Access $rowtime system column
          .build())
      .build();

    env.createTable("t1", descriptor);

Python:

    from pyflink.table.confluent import ConfluentTableDescriptor
    from pyflink.table import Schema, DataTypes
    from pyflink.table.expressions import col, lit

    descriptor = ConfluentTableDescriptor.for_managed() \
      .schema(
        Schema.new_builder()
          .column("i", DataTypes.INT())
          .column("s", DataTypes.INT())
          .watermark("$rowtime", col("$rowtime").minus(lit(5).seconds))  # Access $rowtime system column
          .build()) \
      .build()

    env.create_table("t1", descriptor)

#### Known limitations

The Table API plugin is in Open Preview stage.

##### Unsupported by Table API Plugin

The following features are not supported.
- Temporary catalog objects (including tables, views, functions)
- Custom modules
- Custom catalogs
- User-defined functions (including system functions)
- Anonymous, inline objects (including functions, data types)
- CompiledPlan features
- Batch mode
- Restrictions from Confluent Cloud:
  - custom connectors/formats
  - processing time operations
  - structured data types
  - many configuration options
  - limited SQL syntax
  - batch execution mode

##### Issues in Apache Flink

- Both catalog and database must be set, or identifiers must be fully qualified. A mixture of setting a current catalog and using two-part identifiers can cause errors.
- String concatenation with .plus causes errors. Instead, use Expressions.concat.
- Selecting .rowtime in windows causes errors.
- Using .limit() can cause errors.

#### Next steps

- Java Table API Quick Start on Confluent Cloud for Apache Flink
- Python Table API Quick Start on Confluent Cloud for Apache Flink

#### Related content

- Course: Apache Flink® Table API: Processing Data Streams in Java
- Table API functions
- Built-in Functions

Note: This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.
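As a recap of the endpoint rules described earlier on this page (mutual exclusivity, default templates, and independent template expansion), the following sketch models the resolution logic in plain Python. It is an illustration under stated assumptions, not the plugin's implementation: `resolve_endpoints` is a hypothetical helper, and the derivation for `client.rest-endpoint` is inferred from the resolved URLs shown in the Option 3 example.

```python
from typing import Optional, Tuple

DEFAULT_STATEMENT_TEMPLATE = "https://flink.{region}.{cloud}.confluent.cloud"
DEFAULT_ARTIFACT_ENDPOINT = "https://api.confluent.cloud"


def resolve_endpoints(
    region: str,
    cloud: str,
    endpoint_template: Optional[str] = None,
    artifact_endpoint_template: Optional[str] = None,
    rest_endpoint: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (statement endpoint, artifact endpoint) for the given options."""
    has_template = endpoint_template is not None or artifact_endpoint_template is not None
    if rest_endpoint is not None and has_template:
        # Mirrors the rule that rest-endpoint and the templates are mutually exclusive.
        raise ValueError("client.rest-endpoint can't be combined with an endpoint template")

    if rest_endpoint is not None:
        # Assumed derivation, based on the Option 3 example's resolved URLs.
        return (
            f"https://flink.{region}.{cloud}.{rest_endpoint}",
            f"https://api.{rest_endpoint}",
        )

    # Each endpoint is built independently; unset templates fall back to defaults.
    statement = (endpoint_template or DEFAULT_STATEMENT_TEMPLATE).format(region=region, cloud=cloud)
    artifact = (artifact_endpoint_template or DEFAULT_ARTIFACT_ENDPOINT).format(region=region, cloud=cloud)
    return statement, artifact


print(resolve_endpoints(
    "us-east-1", "aws",
    endpoint_template="https://flinkpls-dom123.{region}.{cloud}.confluent.cloud",
))
print(resolve_endpoints("us-east-1", "aws", rest_endpoint="proxy.confluent.cloud"))
```

The first call reproduces the Option 2 result (custom statement endpoint, default artifact endpoint), and the second reproduces Option 3.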
#### Code Examples ```sql ``` ```sql org.apache.flink flink-table-api-java ${flink.version} io.confluent.flink confluent-flink-table-api-java-plugin ${confluent-plugin.version} ``` ```sql lfcp-8m03rm ``` ```sql b0b21724-4586-4a07-b787-d0bb5aacbf87 ``` ```sql confluent flink region list ``` ```sql https://api.{region}.{cloud}.confluent.cloud ``` ```sql https://flinkpls-dom123.{region}.{cloud}.confluent.cloud ``` ```sql proxyto.confluent.cloud ``` ```sql java.io.tmpdir ``` ```sql https://flink.{region}.{cloud}.confluent.cloud ``` ```sql https://flinkpls-dom123.{region}.{cloud}.confluent.cloud ``` ```sql ENDPOINT_TEMPLATE ``` ```sql https://api.confluent.cloud ``` ```sql https://api.{region}.{cloud}.confluent.cloud ``` ```sql ARTIFACT_ENDPOINT_TEMPLATE ``` ```sql proxy.confluent.cloud ``` ```sql https://flink.{region}.{cloud}.{rest-endpoint} ``` ```sql REST_ENDPOINT ``` ```sql client.endpoint-template ``` ```sql client.rest-endpoint ``` ```sql client.endpoint-template ``` ```sql client.rest-endpoint ``` ```sql client.artifact-endpoint-template ``` ```sql client.rest-endpoint ``` ```sql client.rest-endpoint ``` ```sql client.endpoint-template ``` ```sql https://flink.{region}.{cloud}.confluent.cloud ``` ```sql client.rest-endpoint ``` ```sql client.artifact-endpoint-template ``` ```sql https://api.confluent.cloud ``` ```sql // Option 1 (RECOMMENDED): Using endpoint templates // Resolved endpoints: // - Statement API: https://flinkpls-dom123.us-east-1.aws.confluent.cloud ConfluentSettings settings1 = ConfluentSettings.newBuilder() .setRegion("us-east-1") .setCloud("aws") .setEndpointTemplate("https://flinkpls-dom123.{region}.{cloud}.confluent.cloud") .setArtifactEndpointTemplate("https://artifacts.{region}.{cloud}.custom-domain.com") // Other required settings... 
.build(); // Option 2: Using properties file with endpoint templates // cloud.properties: // client.region=us-east-1 // client.cloud=aws // client.endpoint-template=https://flinkpls-dom123.{region}.{cloud}.confluent.cloud // Resolved endpoints: // - Statement API: https://flinkpls-dom123.us-east-1.aws.confluent.cloud // - Artifact API: https://api.confluent.cloud (default) ConfluentSettings settings2 = ConfluentSettings.fromResource("/cloud.properties"); // Option 3 (DISCOURAGED): Using rest-endpoint (both statement endpoint will be derived from this) // Resolved endpoints: // - Statement API: https://flink.us-east-1.aws.proxy.confluent.cloud // - Artifact API: https://api.proxy.confluent.cloud ConfluentSettings settings3 = ConfluentSettings.newBuilder() .setRegion("us-east-1") .setCloud("aws") .setRestEndpoint("proxy.confluent.cloud") // Other required settings... .build(); ``` ```sql ConfluentSettings ``` ```sql ConfluentSettings ``` ```sql TableEnvironment ``` ```sql public static void main(String[] args) { // Args might set cloud, region, org, env, and compute pool. // Environment variables might pass key and secret. // Code sets the session name and SQL-specific options. ConfluentSettings settings = ConfluentSettings.newBuilder(args) .setContextName("MyTableProgram") .setOption("sql.local-time-zone", "UTC") .build(); TableEnvironment env = TableEnvironment.create(settings); } ``` ```sql from pyflink.table.confluent import ConfluentSettings from pyflink.table import TableEnvironment def run(): # Properties file might set cloud, region, org, env, and compute pool. # Environment variables might pass key and secret. # Code sets the session name and SQL-specific options. settings = ConfluentSettings.new_builder_from_file(...) 
\ .set_context_name("MyTableProgram") \ .set_option("sql.local-time-zone", "UTC") \ .build() env = TableEnvironment.create(settings) ``` ```sql cloud.properties ``` ```sql # Cloud region client.cloud=aws client.region=eu-west-1 # Access & compute resources client.flink-api-key=XXXXXXXXXXXXXXXX client.flink-api-secret=XxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXx client.organization-id=00000000-0000-0000-0000-000000000000 client.environment-id=env-xxxxx client.compute-pool-id=lfcp-xxxxxxxxxx ``` ```sql cloud.properties ``` ```sql // Arbitrary file location in file system ConfluentSettings settings = ConfluentSettings.fromPropertiesFile("/path/to/cloud.properties"); // Part of the JAR package (in src/main/resources) ConfluentSettings settings = ConfluentSettings.fromPropertiesResource("/cloud.properties"); ``` ```sql from pyflink.table.confluent import ConfluentSettings # Arbitrary file location in file system settings = ConfluentSettings.from_file("/path/to/cloud.properties") ``` ```sql java -jar my-table-program.jar \ --cloud aws \ --region us-east-1 \ --flink-api-key key \ --flink-api-secret secret \ --organization-id b0b21724-4586-4a07-b787-d0bb5aacbf87 \ --environment-id env-z3y2x1 \ --compute-pool-id lfcp-8m03rm ``` ```sql ConfluentSettings.fromArgs ``` ```sql public static void main(String[] args) { ConfluentSettings settings = ConfluentSettings.fromArgs(args); } ``` ```sql from pyflink.table.confluent import ConfluentSettings settings = ConfluentSettings.from_global_variables() ``` ```sql ConfluentSettings ``` ```sql ConfluentSettings settings = ConfluentSettings.newBuilder() .setCloud("aws") .setRegion("us-east-1") .setFlinkApiKey("key") .setFlinkApiSecret("secret") .setOrganizationId("b0b21724-4586-4a07-b787-d0bb5aacbf87") .setEnvironmentId("env-z3y2x1") .setComputePoolId("lfcp-8m03rm") .build(); ``` ```sql from pyflink.table.confluent import ConfluentSettings settings = ConfluentSettings.new_builder() \ .set_cloud("aws") \ .set_region("us-east-1") \ 
.set_flink_api_key("key") \ .set_flink_api_secret("secret") \ .set_organization_id("b0b21724-4586-4a07-b787-d0bb5aacbf87") \ .set_environment_id("env-z3y2x1") \ .set_compute_pool_id("lfcp-8m03rm") \ .build() ``` ```sql export CLOUD_PROVIDER="aws" export CLOUD_REGION="us-east-1" export FLINK_API_KEY="key" export FLINK_API_SECRET="secret" export ORG_ID="b0b21724-4586-4a07-b787-d0bb5aacbf87" export ENV_ID="env-z3y2x1" export COMPUTE_POOL_ID="lfcp-8m03rm" java -jar my-table-program.jar ``` ```sql ConfluentSettings settings = ConfluentSettings.fromGlobalVariables(); ``` ```sql from pyflink.table.confluent import ConfluentSettings settings = ConfluentSettings.from_global_variables() ``` ```sql ConfluentTools ``` ```sql ConfluentTools.collectChangelog ``` ```sql ConfluentTools.printChangelog ``` ```sql table.execute().collect() ``` ```sql // On a Table object Table table = env.from("examples.marketplace.customers"); List<Row> rows = ConfluentTools.collectChangelog(table, 100); ConfluentTools.printChangelog(table, 100); // On a TableResult object TableResult tableResult = env.executeSql("SELECT * FROM examples.marketplace.customers"); List<Row> rowsFromResult = ConfluentTools.collectChangelog(tableResult, 100); ConfluentTools.printChangelog(tableResult, 100); // For finite (i.e. 
bounded) tables ConfluentTools.collectChangelog(table); ConfluentTools.printChangelog(table); ``` ```sql from pyflink.table.confluent import ConfluentSettings, ConfluentTools from pyflink.table import TableEnvironment settings = ConfluentSettings.from_global_variables() env = TableEnvironment.create(settings) # On a Table object table = env.from_path("examples.marketplace.customers") rows = ConfluentTools.collect_changelog_limit(table, 100) ConfluentTools.print_changelog_limit(table, 100) # On a TableResult object tableResult = env.execute_sql("SELECT * FROM examples.marketplace.customers") rows = ConfluentTools.collect_changelog_limit(tableResult, 100) ConfluentTools.print_changelog_limit(tableResult, 100) # For finite (i.e. bounded) tables ConfluentTools.collect_changelog(table) ConfluentTools.print_changelog(table) ``` ```sql ConfluentTools.collect_materialized ``` ```sql ConfluentTools.print_materialized ``` ```sql table.execute().collect() ``` ```sql // On a Table object Table table = env.from("examples.marketplace.customers"); List<Row> rows = ConfluentTools.collectMaterialized(table, 100); ConfluentTools.printMaterialized(table, 100); // On a TableResult object TableResult tableResult = env.executeSql("SELECT * FROM examples.marketplace.customers"); List<Row> rowsFromResult = ConfluentTools.collectMaterialized(tableResult, 100); ConfluentTools.printMaterialized(tableResult, 100); // For finite (i.e. 
bounded) tables ConfluentTools.collectMaterialized(table); ConfluentTools.printMaterialized(table); ``` ```sql from pyflink.table.confluent import ConfluentSettings, ConfluentTools from pyflink.table import TableEnvironment settings = ConfluentSettings.from_global_variables() env = TableEnvironment.create(settings) # On Table object table = env.from_path("examples.marketplace.customers") rows = ConfluentTools.collect_materialized_limit(table, 100) ConfluentTools.print_materialized_limit(table, 100) # On TableResult object tableResult = env.execute_sql("SELECT * FROM examples.marketplace.customers") rows = ConfluentTools.collect_materialized_limit(tableResult, 100) ConfluentTools.print_materialized_limit(tableResult, 100) # For finite (i.e. bounded) tables ConfluentTools.collect_materialized(table) ConfluentTools.print_materialized(table) ``` ```sql ConfluentTools.getStatementName ``` ```sql ConfluentTools.stopStatement ``` ```sql // On TableResult object TableResult tableResult = env.executeSql("SELECT * FROM examples.marketplace.customers"); String statementName = ConfluentTools.getStatementName(tableResult); ConfluentTools.stopStatement(tableResult); // Based on statement name ConfluentTools.stopStatement(env, "table-api-2024-03-21-150457-36e0dbb2e366-sql"); ``` ```sql # On TableResult object table_result = env.execute_sql("SELECT * FROM examples.marketplace.customers") statement_name = ConfluentTools.get_statement_name(table_result) ConfluentTools.stop_statement(table_result) # Based on statement name ConfluentTools.stop_statement_by_name(env, "table-api-2024-03-21-150457-36e0dbb2e366-sql") ``` ```sql ConfluentTableDescriptor ``` ```sql for_managed() ``` ```sql TableDescriptor.for_connector("confluent") ``` ```sql TableDescriptor descriptor = ConfluentTableDescriptor.forManaged() .schema( Schema.newBuilder() .column("i", DataTypes.INT()) .column("s", DataTypes.INT()) .watermark("$rowtime", $("$rowtime").minus(lit(5).seconds())) // Access $rowtime system column 
.build()) .build(); env.createTable("t1", descriptor); ``` ```sql from pyflink.table.confluent import ConfluentTableDescriptor from pyflink.table import Schema, DataTypes from pyflink.table.expressions import col, lit descriptor = ConfluentTableDescriptor.for_managed() \ .schema( Schema.new_builder() .column("i", DataTypes.INT()) .column("s", DataTypes.INT()) .watermark("$rowtime", col("$rowtime").minus(lit(5).seconds)) # Access $rowtime system column .build()) \ .build() env.create_table("t1", descriptor) ``` ```sql Expressions.concat ```

---

### SQL Timezone Types in Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/flink/reference/timezone.html

Confluent Cloud for Apache Flink® provides rich data types for date and time, including these:

- DATE
- TIME
- TIMESTAMP
- TIMESTAMP_LTZ
- INTERVAL YEAR TO MONTH
- INTERVAL DAY TO SECOND

These datetime types and the related datetime functions enable processing business data across timezones.

#### TIMESTAMP vs TIMESTAMP_LTZ

##### TIMESTAMP type

TIMESTAMP(p) is an abbreviation for TIMESTAMP(p) WITHOUT TIME ZONE. The precision p supports a range from 0 to 9. The default is 6.

TIMESTAMP describes a timestamp that represents year, month, day, hour, minute, second, and fractional seconds.

TIMESTAMP can be specified from a string literal. The following code example shows a SELECT statement that creates a timestamp from a string.

    SELECT TIMESTAMP '1970-01-01 00:00:04.001';

Your output should resemble:

    EXPR$0
    1970-01-01 00:00:04.001

##### TIMESTAMP_LTZ type

TIMESTAMP_LTZ(p) is an abbreviation for TIMESTAMP(p) WITH LOCAL TIME ZONE. The precision p supports a range from 0 to 9. The default is 6.

TIMESTAMP_LTZ describes an absolute time point on the time-line. It stores a LONG value representing epoch-milliseconds and an INT representing nanosecond-of-millisecond. The epoch time is measured from the standard Java epoch of 1970-01-01T00:00:00Z.
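The storage layout described above (epoch-milliseconds plus nanosecond-of-millisecond) can be pictured with a few lines of Python. This is a conceptual sketch only and says nothing about Flink's internal classes: `split_epoch_nanos` and `to_utc_datetime` are hypothetical helper names, and the input value is made up.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

NANOS_PER_MILLI = 1_000_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # 1970-01-01T00:00:00Z


def split_epoch_nanos(epoch_nanos: int) -> Tuple[int, int]:
    """Split nanoseconds since the epoch into (epoch_millis, nanos_of_milli)."""
    return divmod(epoch_nanos, NANOS_PER_MILLI)


def to_utc_datetime(epoch_millis: int) -> datetime:
    """Interpret epoch milliseconds as an absolute instant, shown here in UTC."""
    return EPOCH + timedelta(milliseconds=epoch_millis)


epoch_millis, nanos_of_milli = split_epoch_nanos(4_001_000_123)
print(epoch_millis, nanos_of_milli)               # 4001 123
print(to_utc_datetime(epoch_millis).isoformat())  # 1970-01-01T00:00:04.001000+00:00
```

The pair of numbers identifies one instant on the time-line; no timezone is stored alongside it.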
Every datum of TIMESTAMP_LTZ type is interpreted in the local timezone configured in the current session. Typically, the local timezone is used for computation and visualization.

TIMESTAMP_LTZ is suitable for business logic that spans timezones because it represents an absolute point in time. For example, an epoch value of 4001 milliseconds denotes the same instant in every timezone. Put differently, if the system clocks of all machines in the world report the same epoch value at the same moment, for example 4001 milliseconds, that shared value is what “absolute time point” means.

TIMESTAMP_LTZ has no literal representation, so you can’t create it from a literal. It can be derived from a LONG epoch time, as shown in the following code example.

    SET 'sql.local-time-zone' = 'UTC';

Your output should resemble:

    +---------------------+-------+
    | Key                 | Value |
    +---------------------+-------+
    | sql.local-time-zone | UTC   |
    +---------------------+-------+

Query the TO_TIMESTAMP_LTZ function to convert a Unix time to a TIMESTAMP_LTZ.

    SELECT TO_TIMESTAMP_LTZ(4001, 3);

Your output should resemble:

    EXPR$0
    1970-01-01 00:00:04.001

Change the timezone:

    SET 'sql.local-time-zone' = 'Asia/Shanghai';

Your output should resemble:

    +---------------------+---------------+
    | Key                 | Value         |
    +---------------------+---------------+
    | sql.local-time-zone | Asia/Shanghai |
    +---------------------+---------------+

Query the time again:

    SELECT TO_TIMESTAMP_LTZ(4001, 3);

Your output should resemble:

    EXPR$0
    1970-01-01 08:00:04.001

#### Set the timezone

The local timezone defines the current session timezone id. You can configure the timezone in the Flink SQL shell or in your applications.

    -- set to UTC timezone
    SET 'sql.local-time-zone' = 'UTC';

    -- set to Shanghai timezone
    SET 'sql.local-time-zone' = 'Asia/Shanghai';

    -- set to Los_Angeles timezone
    SET 'sql.local-time-zone' = 'America/Los_Angeles';

#### Datetime functions and timezones

The return values of the following datetime functions depend on the configured timezone.
- LOCALTIME
- LOCALTIMESTAMP
- CURRENT_DATE
- CURRENT_TIME
- CURRENT_TIMESTAMP
- CURRENT_ROW_TIMESTAMP
- NOW

The following example code shows the return types of these datetime functions.

    CREATE TABLE timeview AS SELECT
      LOCALTIME,
      LOCALTIMESTAMP,
      CURRENT_DATE,
      CURRENT_TIME,
      CURRENT_TIMESTAMP,
      CURRENT_ROW_TIMESTAMP() as current_row_ts,
      NOW() as now;

    DESC timeview;

Your output should resemble:

    +-------------------+------------------+----------+--------+
    | Column Name       | Data Type        | Nullable | Extras |
    +-------------------+------------------+----------+--------+
    | LOCALTIME         | TIME(0)          | NOT NULL |        |
    | LOCALTIMESTAMP    | TIMESTAMP(3)     | NOT NULL |        |
    | CURRENT_DATE      | DATE             | NOT NULL |        |
    | CURRENT_TIME      | TIME(0)          | NOT NULL |        |
    | CURRENT_TIMESTAMP | TIMESTAMP_LTZ(3) | NOT NULL |        |
    | current_row_ts    | TIMESTAMP_LTZ(3) | NOT NULL |        |
    | now               | TIMESTAMP_LTZ(3) | NOT NULL |        |
    +-------------------+------------------+----------+--------+

Set the timezone to UTC and query the table.

    SET 'sql.local-time-zone' = 'UTC';
    SELECT * FROM timeview;

Your output should resemble:

    LOCALTIME LOCALTIMESTAMP          CURRENT_DATE CURRENT_TIME CURRENT_TIMESTAMP       current_row_ts          now
    04:33:01  2024-09-26 04:33:01.822 2024-09-26   04:33:01     2024-09-25 20:33:01.822 2024-09-25 20:33:01.822 2024-09-25 20:33:01.822

Change the timezone and query the table again.
    SET 'sql.local-time-zone' = 'Asia/Shanghai';
    SELECT * FROM timeview;

Your output should resemble:

    LOCALTIME LOCALTIMESTAMP          CURRENT_DATE CURRENT_TIME CURRENT_TIMESTAMP       current_row_ts          now
    04:33:01  2024-09-26 04:33:01.822 2024-09-26   04:33:01     2024-09-26 04:33:01.822 2024-09-26 04:33:01.822 2024-09-26 04:33:01.822

#### TIMESTAMP_LTZ string representation

The session timezone is used whenever a TIMESTAMP_LTZ value is converted to or from a zone-less representation, that is, when you print the value, cast it to STRING, cast it to TIMESTAMP, or cast a TIMESTAMP value to TIMESTAMP_LTZ:

    CREATE TABLE timeview2 AS SELECT
      TO_TIMESTAMP_LTZ(4001, 3) AS ltz,
      TIMESTAMP '1970-01-01 00:00:01.001' AS ntz;

    DESC timeview2;

Your output should resemble:

    +-------------+------------------+----------+--------+
    | Column Name | Data Type        | Nullable | Extras |
    +-------------+------------------+----------+--------+
    | ltz         | TIMESTAMP_LTZ(3) | NULL     |        |
    | ntz         | TIMESTAMP(3)     | NOT NULL |        |
    +-------------+------------------+----------+--------+

Set the timezone to UTC and query the table.

    SET 'sql.local-time-zone' = 'UTC';
    SELECT * FROM timeview2;

Your output should resemble:

    ltz                     ntz
    1970-01-01 00:00:04.001 1970-01-01 00:00:01.001

Change the timezone and query the table again.

    SET 'sql.local-time-zone' = 'Asia/Shanghai';
    SELECT * FROM timeview2;

Your output should resemble:

    ltz                     ntz
    1970-01-01 08:00:04.001 1970-01-01 00:00:01.001

The following example creates a table whose columns show the data types that result from casting.
    CREATE TABLE timeview3 AS SELECT
      ltz,
      CAST(ltz AS TIMESTAMP(3)) AS ts3,
      CAST(ltz AS STRING) AS string_rep,
      ntz,
      CAST(ntz AS TIMESTAMP_LTZ(3)) AS ts_ltz3
    FROM timeview2;

    DESC timeview3;

Your output should resemble:

    +-------------+------------------+----------+--------+
    | Column Name | Data Type        | Nullable | Extras |
    +-------------+------------------+----------+--------+
    | ltz         | TIMESTAMP_LTZ(3) | NULL     |        |
    | ts3         | TIMESTAMP(3)     | NULL     |        |
    | string_rep  | STRING           | NULL     |        |
    | ntz         | TIMESTAMP(3)     | NOT NULL |        |
    | ts_ltz3     | TIMESTAMP_LTZ(3) | NOT NULL |        |
    +-------------+------------------+----------+--------+

Query the table.

    SELECT * FROM timeview3;

Your output should resemble:

    ltz                     ts3                     string_rep              ntz                     ts_ltz3
    1970-01-01 08:00:04.001 1970-01-01 08:00:04.001 1970-01-01 08:00:04.001 1970-01-01 00:00:01.001 1970-01-01 00:00:01.001

#### Time attribute and timezone

For more information about time attributes, see Time attributes.

##### Event time and timezone

Flink SQL supports defining an event-time attribute on TIMESTAMP and TIMESTAMP_LTZ columns.

###### Event-time attribute on TIMESTAMP

If the timestamp data in the source is represented as year-month-day-hour-minute-second, usually a string value without timezone information, for example, 2020-04-15 20:13:40.564, you can define the event-time attribute as a TIMESTAMP column.

###### Event-time attribute on TIMESTAMP_LTZ

If the timestamp data in the source is represented as an epoch time, usually as a LONG value, for example, 1618989564564, you can define an event-time attribute as a TIMESTAMP_LTZ column.

#### Daylight Saving Time support

Flink SQL supports defining time attributes on a TIMESTAMP_LTZ column, and Flink SQL uses the TIMESTAMP and TIMESTAMP_LTZ types in window processing to support Daylight Saving Time.

Flink SQL uses a timestamp literal to split the window and assigns windows to data according to the epoch time of each row.
This means that Flink SQL uses the TIMESTAMP type for window start and window end, like TUMBLE_START and TUMBLE_END, and it uses TIMESTAMP_LTZ for window-time attributes, like TUMBLE_ROWTIME.

Consider an example tumble window. Daylight Saving Time in the America/Los_Angeles timezone starts at 2021-03-14 02:00:00:

    long epoch1 = 1615708800000L; // 2021-03-14 00:00:00
    long epoch2 = 1615712400000L; // 2021-03-14 01:00:00
    long epoch3 = 1615716000000L; // 2021-03-14 03:00:00, skip one hour (2021-03-14 02:00:00)
    long epoch4 = 1615719600000L; // 2021-03-14 04:00:00

The tumble window [2021-03-14 00:00:00, 2021-03-14 04:00:00] collects 3 hours’ worth of data in the America/Los_Angeles timezone, but it collects 4 hours’ worth of data in other non-DST timezones. You only need to define the time attribute on a TIMESTAMP_LTZ column.

All windows in Flink SQL, like Hop window, Session window, and Cumulative window, follow this pattern, and all operations in Flink SQL support TIMESTAMP_LTZ, so Flink SQL provides complete support for Daylight Saving Time.

#### Related content

- Datetime Functions
- Time attributes
- Flink SQL Queries
- DDL Statements

Note: This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.
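The Daylight Saving Time arithmetic in the tumble-window example above can be checked outside Flink with Python's standard `zoneinfo` module. This is a verification sketch, not Flink code: `local_wall_clock` and `window_hours` are hypothetical helpers, and Asia/Shanghai is chosen here only as an example of a timezone without DST.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

LOS_ANGELES = ZoneInfo("America/Los_Angeles")
SHANGHAI = ZoneInfo("Asia/Shanghai")  # example of a zone without DST

# Epoch milliseconds taken from the example above.
EPOCHS_MS = [1615708800000, 1615712400000, 1615716000000, 1615719600000]


def local_wall_clock(epoch_ms: int, zone: ZoneInfo) -> str:
    """Render an absolute instant as wall-clock time in the given zone."""
    instant = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    return instant.astimezone(zone).strftime("%Y-%m-%d %H:%M:%S")


def window_hours(zone: ZoneInfo) -> float:
    """Real elapsed hours between 00:00 and 04:00 wall-clock time on 2021-03-14."""
    start = datetime(2021, 3, 14, 0, 0, tzinfo=zone)
    end = datetime(2021, 3, 14, 4, 0, tzinfo=zone)
    # Compare epoch seconds; subtracting the datetimes directly would ignore DST.
    return (end.timestamp() - start.timestamp()) / 3600


for epoch_ms in EPOCHS_MS:
    print(epoch_ms, local_wall_clock(epoch_ms, LOS_ANGELES))

print(window_hours(LOS_ANGELES))  # 3.0
print(window_hours(SHANGHAI))     # 4.0
```

The four epochs are one hour apart in absolute time, yet the Los Angeles wall clock jumps from 01:00 to 03:00, which is why the window that spans four wall-clock hours there contains only three hours of data.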
#### Code Examples

```sql
TIMESTAMP(p)
```

```sql
TIMESTAMP(p) WITHOUT TIME ZONE
```

```sql
SELECT TIMESTAMP '1970-01-01 00:00:04.001';
```

```sql
EXPR$0
1970-01-01 00:00:04.001
```

```sql
TIMESTAMP_LTZ(p)
```

```sql
TIMESTAMP(p) WITH LOCAL TIME ZONE
```

```sql
TIMESTAMP_LTZ
```

```sql
1970-01-01T00:00:00Z
```

```sql
TIMESTAMP_LTZ
```

```sql
TIMESTAMP_LTZ
```

```sql
TIMESTAMP_LTZ
```

```sql
SET 'sql.local-time-zone' = 'UTC';
```

```sql
+---------------------+-------+
| Key                 | Value |
+---------------------+-------+
| sql.local-time-zone | UTC   |
+---------------------+-------+
```

```sql
TIMESTAMP_LTZ
```

```sql
SELECT TO_TIMESTAMP_LTZ(4001, 3);
```

```sql
EXPR$0
1970-01-01 00:00:04.001
```

```sql
SET 'sql.local-time-zone' = 'Asia/Shanghai';
```

```sql
+---------------------+---------------+
| Key                 | Value         |
+---------------------+---------------+
| sql.local-time-zone | Asia/Shanghai |
+---------------------+---------------+
```

```sql
SELECT TO_TIMESTAMP_LTZ(4001, 3);
```

```sql
EXPR$0
1970-01-01 08:00:04.001
```

```sql
-- set to UTC timezone
SET 'sql.local-time-zone' = 'UTC';

-- set to Shanghai timezone
SET 'sql.local-time-zone' = 'Asia/Shanghai';

-- set to Los_Angeles timezone
SET 'sql.local-time-zone' = 'America/Los_Angeles';
```

```sql
CREATE TABLE timeview AS SELECT
  LOCALTIME,
  LOCALTIMESTAMP,
  CURRENT_DATE,
  CURRENT_TIME,
  CURRENT_TIMESTAMP,
  CURRENT_ROW_TIMESTAMP() as current_row_ts,
  NOW() as now;

DESC timeview;
```

```sql
+-------------------+------------------+----------+--------+
| Column Name       | Data Type        | Nullable | Extras |
+-------------------+------------------+----------+--------+
| LOCALTIME         | TIME(0)          | NOT NULL |        |
| LOCALTIMESTAMP    | TIMESTAMP(3)     | NOT NULL |        |
| CURRENT_DATE      | DATE             | NOT NULL |        |
| CURRENT_TIME      | TIME(0)          | NOT NULL |        |
| CURRENT_TIMESTAMP | TIMESTAMP_LTZ(3) | NOT NULL |        |
| current_row_ts    | TIMESTAMP_LTZ(3) | NOT NULL |        |
| now               | TIMESTAMP_LTZ(3) | NOT NULL |        |
+-------------------+------------------+----------+--------+
```

```sql
SET 'sql.local-time-zone' = 'UTC';
SELECT * FROM timeview;
```

```sql
LOCALTIME  LOCALTIMESTAMP           CURRENT_DATE  CURRENT_TIME  CURRENT_TIMESTAMP        current_row_ts           now
04:33:01   2024-09-26 04:33:01.822  2024-09-26    04:33:01      2024-09-25 20:33:01.822  2024-09-25 20:33:01.822  2024-09-25 20:33:01.822
```

```sql
SET 'sql.local-time-zone' = 'Asia/Shanghai';
SELECT * FROM timeview;
```

```sql
LOCALTIME  LOCALTIMESTAMP           CURRENT_DATE  CURRENT_TIME  CURRENT_TIMESTAMP        current_row_ts           now
04:33:01   2024-09-26 04:33:01.822  2024-09-26    04:33:01      2024-09-26 04:33:01.822  2024-09-26 04:33:01.822  2024-09-26 04:33:01.822
```

```sql
TIMESTAMP_LTZ
```

```sql
TIMESTAMP_LTZ
```

```sql
CREATE TABLE timeview2 AS SELECT
  TO_TIMESTAMP_LTZ(4001, 3) AS ltz,
  TIMESTAMP '1970-01-01 00:00:01.001' AS ntz;

DESC timeview2;
```

```sql
+-------------+------------------+----------+--------+
| Column Name | Data Type        | Nullable | Extras |
+-------------+------------------+----------+--------+
| ltz         | TIMESTAMP_LTZ(3) | NULL     |        |
| ntz         | TIMESTAMP(3)     | NOT NULL |        |
+-------------+------------------+----------+--------+
```

```sql
SET 'sql.local-time-zone' = 'UTC';
SELECT * FROM timeview2;
```

```sql
ltz                      ntz
1970-01-01 00:00:04.001  1970-01-01 00:00:01.001
```

```sql
SET 'sql.local-time-zone' = 'Asia/Shanghai';
SELECT * FROM timeview2;
```

```sql
ltz                      ntz
1970-01-01 08:00:04.001  1970-01-01 00:00:01.001
```

```sql
-- Column aliases match the names shown in the DESC output that follows.
CREATE TABLE timeview3 AS SELECT
  ltz,
  CAST(ltz AS TIMESTAMP(3)) AS ts3,
  CAST(ltz AS STRING) AS string_rep,
  ntz,
  CAST(ntz AS TIMESTAMP_LTZ(3)) AS ts_ltz3
FROM timeview2;

DESC timeview3;
```

```sql
+-------------+------------------+----------+--------+
| Column Name | Data Type        | Nullable | Extras |
+-------------+------------------+----------+--------+
| ltz         | TIMESTAMP_LTZ(3) | NULL     |        |
| ts3         | TIMESTAMP(3)     | NULL     |        |
| string_rep  | STRING           | NULL     |        |
| ntz         | TIMESTAMP(3)     | NOT NULL |        |
| ts_ltz3     | TIMESTAMP_LTZ(3) | NOT NULL |        |
+-------------+------------------+----------+--------+
```

```sql
SELECT * FROM timeview3;
```

```sql
ltz                      ts3                      string_rep               ntz                      ts_ltz3
1970-01-01 08:00:04.001  1970-01-01 08:00:04.001  1970-01-01 08:00:04.001  1970-01-01 00:00:01.001  1970-01-01 00:00:01.001
```

```sql
2020-04-15 20:13:40.564
```

```sql
1618989564564
```

```sql
TIMESTAMP_LTZ
```

```sql
TUMBLE_START
```

```sql
TIMESTAMP_LTZ
```

```sql
TUMBLE_ROWTIME
```

```sql
America/Los_Angeles
```

```sql
2021-03-14 02:00:00
```

```java
long epoch1 = 1615708800000L; // 2021-03-14 00:00:00
long epoch2 = 1615712400000L; // 2021-03-14 01:00:00
long epoch3 = 1615716000000L; // 2021-03-14 03:00:00, skip one hour (2021-03-14 02:00:00)
long epoch4 = 1615719600000L; // 2021-03-14 04:00:00
```

```sql
America/Los_Angeles
```

---

### Flink authentication and authorization event methods (Confluent Cloud audit logs) | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/monitoring/audit-logging/event-methods/flink-authn-authz.html

Confluent Cloud audit logs contain records of auditable events for authentication and authorization operations. When an auditable event occurs, a message is sent to the audit log and is stored as an audit log record.

#### Flink region authentication auditable event methods

Included here are operations authenticating to a Flink region that generate auditable event messages for the io.confluent.flink.server/authentication event type.

| Method name | Action triggering an auditable event message |
| --- | --- |
| flink.Authenticate | A request for authentication to a Flink region. |

##### Examples

###### flink.Authenticate

The flink.Authenticate event method is triggered by a request to authenticate to a Flink region.
SUCCESS { "type": "io.confluent.flink.server/authentication", "id": "f388a04b-0bbe-4e10-9b97-b2f565274196", "subject": "crn://confluent.cloud/organization=7c210ed4-6e1e-4355-abf9-b25e25a8b25a/environment=env-xmzdkk/flink-region=AWS.eu-central-1", "@timestamp": "2024-01-12T13:33:46.296Z", "datacontenttype": "application/json", "@version": "1", "kafka.partition": "106", "dataschema": "https://confluent.io/internal/events/AuditLog.v2", "specversion": "1.0", "source": "crn://confluent.cloud/", "kafka.offset": "2495047099", "time": "2024-01-12T13:33:46.296209728Z", "data": { "requestMetadata": { "clientAddress": [ { "ip": "134.238.54.136" } ], "requestId": [ "d31875a39d6e5eae08e0419176808af3" ] }, "internalServiceName": "crn://confluent.cloud/organization=7c210ed4-6e1e-4355-abf9-b25e25a8b25a/environment=env-xmzdkk/flink-region=AWS.eu-central-1", "cloudResources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resourceId": "7c210ed4-6e1e-4355-abf9-b25e25a8b25a" }, { "type": "ENVIRONMENT", "resourceId": "env-xmzdkk" } ] }, "resource": { "type": "FLINK_REGION", "resourceId": "AWS.eu-central-1" } } ], "result": { "status": "SUCCESS" }, "request": { "accessType": "READ_ONLY", "data": "{\"intendedLogicalClusterCrn\":\"crn://confluent.cloud/organization=7c210ed4-6e1e-4355-abf9-b25e25a8b25a/environment=env-xmzdkk/flink-region=AWS.eu-central-1\"}" }, "serviceName": "crn://confluent.cloud/organization=7c210ed4-6e1e-4355-abf9-b25e25a8b25a/environment=env-xmzdkk/flink-region=AWS.eu-central-1", "methodName": "flink.Authenticate", "authenticationInfo": { "result": "SUCCESS", "exposure": "CUSTOMER", "credentials": { "mechanism": "HTTP_BEARER", "idTokenCredentials": { "type": "JWT", "issuer": "Confluent", "subject": "1281943" } } } } } Flink Authorization auditable event methods¶ Included here are operations authorizing principals to access, modify, delete, or create a Flink resource that generate auditable event messages for the io.confluent.flink.server/authorization event 
type. Method name Action triggering an auditable event message flink.Authorize A request to authorize a principal to access, modify, delete, or create a Flink resource. Examples¶ flink.Authorize¶ The flink.Authorize event method is triggered by a request to authorize a principal to access, modify, delete, or create a Flink resource (STATEMENT OR WORKSPACE). SUCCESS { "cloudResources": [ { "scope": { "resources": [ { "resourceId": "49aea135-19f4-4e75-adb3-8ca5dd04e292", "type": "ORGANIZATION" }, { "resourceId": "env-3ny01o", "type": "ENVIRONMENT" }, { "resourceId": "azure.eastus2", "type": "FLINK_REGION" } ] }, "resource": { "resourceId": "workspace-2024-03-07-030236-92003e1d-1abf-4401-bbfb-57b6b9ead5de", "type": "STATEMENT" } } ], "authorizationInfo": { "resourceName": "workspace-2024-03-07-030236-92003e1d-1abf-4401-bbfb-57b6b9ead5de", "operation": "Describe", "resourceType": "STATEMENT", "rbacAuthorization": { "patternType": "LITERAL", "resourceType": "Statement", "actingPrincipal": { "group": { "resourceId": "group-Xmgn" } }, "role": "FlinkAdmin", "patternName": "*", "operation": "Describe", "cloudScope": { "resources": [ { "resourceId": "49aea135-19f4-4e75-adb3-8ca5dd04e292", "type": "ORGANIZATION" }, { "resourceId": "env-3px32m", "type": "ENVIRONMENT" } ] } }, "result": "ALLOW" }, "request": { "accessType": "READ_ONLY" }, "internalServiceName": "crn://confluent.cloud/organization=49afb126-18f4-4e76-adb3-8ca5dd04e393/environment=env-3px32m/flink-region=azure.eastus2", "authenticationInfo": { "exposure": "CUSTOMER", "identity": "crn://confluent.cloud/organization=49afb126-18f4-4e76-adb3-8ca5dd04e393/identity-provider=Confluent/identity=u-nqxk78", "principal": { "confluentUser": { "resourceId": "u-nqxk78" } }, "result": "SUCCESS" }, "serviceName": "crn://confluent.cloud/organization=49afb126-18f4-4e76-adb3-8ca5dd04e393/environment=env-3px32m/flink-region=azure.eastus2", "methodName": "flink.Authorize", "requestMetadata": { "requestId": [ 
"52107f4df7fce0356e278c20ce143418" ], "clientAddress": [ { "ip": "1.2.3.4.5" } ] }, "result": { "status": "SUCCESS" } } #### Code Examples ```sql io.confluent.flink.server/authentication ``` ```sql flink.Authenticate ``` ```sql { "type": "io.confluent.flink.server/authentication", "id": "f388a04b-0bbe-4e10-9b97-b2f565274196", "subject": "crn://confluent.cloud/organization=7c210ed4-6e1e-4355-abf9-b25e25a8b25a/environment=env-xmzdkk/flink-region=AWS.eu-central-1", "@timestamp": "2024-01-12T13:33:46.296Z", "datacontenttype": "application/json", "@version": "1", "kafka.partition": "106", "dataschema": "https://confluent.io/internal/events/AuditLog.v2", "specversion": "1.0", "source": "crn://confluent.cloud/", "kafka.offset": "2495047099", "time": "2024-01-12T13:33:46.296209728Z", "data": { "requestMetadata": { "clientAddress": [ { "ip": "134.238.54.136" } ], "requestId": [ "d31875a39d6e5eae08e0419176808af3" ] }, "internalServiceName": "crn://confluent.cloud/organization=7c210ed4-6e1e-4355-abf9-b25e25a8b25a/environment=env-xmzdkk/flink-region=AWS.eu-central-1", "cloudResources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resourceId": "7c210ed4-6e1e-4355-abf9-b25e25a8b25a" }, { "type": "ENVIRONMENT", "resourceId": "env-xmzdkk" } ] }, "resource": { "type": "FLINK_REGION", "resourceId": "AWS.eu-central-1" } } ], "result": { "status": "SUCCESS" }, "request": { "accessType": "READ_ONLY", "data": "{\"intendedLogicalClusterCrn\":\"crn://confluent.cloud/organization=7c210ed4-6e1e-4355-abf9-b25e25a8b25a/environment=env-xmzdkk/flink-region=AWS.eu-central-1\"}" }, "serviceName": "crn://confluent.cloud/organization=7c210ed4-6e1e-4355-abf9-b25e25a8b25a/environment=env-xmzdkk/flink-region=AWS.eu-central-1", "methodName": "flink.Authenticate", "authenticationInfo": { "result": "SUCCESS", "exposure": "CUSTOMER", "credentials": { "mechanism": "HTTP_BEARER", "idTokenCredentials": { "type": "JWT", "issuer": "Confluent", "subject": "1281943" } } } } } ``` ```sql 
io.confluent.flink.server/authorization ``` ```sql flink.Authorize ``` ```sql { "cloudResources": [ { "scope": { "resources": [ { "resourceId": "49aea135-19f4-4e75-adb3-8ca5dd04e292", "type": "ORGANIZATION" }, { "resourceId": "env-3ny01o", "type": "ENVIRONMENT" }, { "resourceId": "azure.eastus2", "type": "FLINK_REGION" } ] }, "resource": { "resourceId": "workspace-2024-03-07-030236-92003e1d-1abf-4401-bbfb-57b6b9ead5de", "type": "STATEMENT" } } ], "authorizationInfo": { "resourceName": "workspace-2024-03-07-030236-92003e1d-1abf-4401-bbfb-57b6b9ead5de", "operation": "Describe", "resourceType": "STATEMENT", "rbacAuthorization": { "patternType": "LITERAL", "resourceType": "Statement", "actingPrincipal": { "group": { "resourceId": "group-Xmgn" } }, "role": "FlinkAdmin", "patternName": "*", "operation": "Describe", "cloudScope": { "resources": [ { "resourceId": "49aea135-19f4-4e75-adb3-8ca5dd04e292", "type": "ORGANIZATION" }, { "resourceId": "env-3px32m", "type": "ENVIRONMENT" } ] } }, "result": "ALLOW" }, "request": { "accessType": "READ_ONLY" }, "internalServiceName": "crn://confluent.cloud/organization=49afb126-18f4-4e76-adb3-8ca5dd04e393/environment=env-3px32m/flink-region=azure.eastus2", "authenticationInfo": { "exposure": "CUSTOMER", "identity": "crn://confluent.cloud/organization=49afb126-18f4-4e76-adb3-8ca5dd04e393/identity-provider=Confluent/identity=u-nqxk78", "principal": { "confluentUser": { "resourceId": "u-nqxk78" } }, "result": "SUCCESS" }, "serviceName": "crn://confluent.cloud/organization=49afb126-18f4-4e76-adb3-8ca5dd04e393/environment=env-3px32m/flink-region=azure.eastus2", "methodName": "flink.Authorize", "requestMetadata": { "requestId": [ "52107f4df7fce0356e278c20ce143418" ], "clientAddress": [ { "ip": "1.2.3.4.5" } ] }, "result": { "status": "SUCCESS" } } ``` --- ### Auditable event methods for Apache Flink (Confluent Cloud) | Confluent Documentation Source: https://docs.confluent.io/cloud/current/monitoring/audit-logging/event-methods/flink.html 
Auditable event methods for Confluent Cloud for Apache Flink are triggered by operations on Apache Flink® in Confluent Cloud. Each method sends an event message about the operation to the audit log cluster, where it is stored as an event record in a Kafka topic.

The resource types for which auditable event methods are triggered include:

- Flink region (FLINK_REGION)
- Flink compute pool (COMPUTE_POOL)
- Flink workspace (FLINK_WORKSPACE)
- Flink statement (STATEMENT)

The following sections provide details about the auditable event methods for each of these resource types.

#### Flink region

Auditable event methods for the resource type FLINK_REGION are triggered by operations on a Flink region and generate event messages that are sent to the audit log cluster, where they are stored as event records in a Kafka topic.

| Method name | Action triggering an auditable event message |
| --- | --- |
| ListFlinkRegions | A request to list the Flink regions in the organization. |

##### ListFlinkRegions

The ListFlinkRegions event method is triggered by a request to get a list of the Flink regions in the organization and sends an event message that is saved in the audit log as an event record.
Examples¶ SUCCESS { "specversion": "1.0", "id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", "source": "crn://confluent.cloud/", "type": "io.confluent.cloud/request", "subject": "crn://confluent.cloud/organization=6c2e1a25-2292-483b-9c76-79982e3dc005", "datacontenttype": "application/json", "dataschema": "https://confluent.io/internal/events/AuditLog.v2", "data": { "service_name": "crn://confluent.cloud/service=cc-ksql-api-service", "internal_service_name": "crn://confluent.cloud/service=cc-ksql-api-service", "method_name": "ListFlinkRegions", "cloud_resources": [ { "resource": { "type": "ORGANIZATION", "resource_id": "6c2e1a25-2292-483b-9c76-79982e3dc005" } } ], "authentication_info": { "exposure": "CUSTOMER", "principal": { "confluent_user": { "resource_id": "user-1", "internal_id": "99" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=6c2e1a25-2292-483b-9c76-79982e3dc005/identity-provider=Confluent/identity=user-1" }, "request_metadata": { "request_id": [ "74726163656964303132333435363738" ], "client_address": [ { "ip": "1.2.3.4" } ] }, "request": { "access_type": "READ_ONLY", "data": { "BypassCache": false, "Cloud": 0, "PageSize": 10, "PageToken": "", "RegionName": "" } }, "result": { "status": "SUCCESS", "data": { "elements": [ { "fcpm_v_2_region": { "id": "aws.af-south-1", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-east-1", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-northeast-1", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-northeast-2", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-northeast-3", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-south-1", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-south-2", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-southeast-1", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-southeast-2", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-southeast-3", "metadata": null } } ] } } } } 
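A record like the one above can be inspected with a few lines of client code. The following is a minimal Python sketch, not Confluent tooling: the `pick` and `summarize` helpers are hypothetical, and `SAMPLE` is a trimmed copy of the example record with field names taken from it. Because other examples on this page use camelCase keys (for instance `methodName` in the workspace records), the sketch accepts either spelling.

```python
import json

# Trimmed copy of the ListFlinkRegions example record (illustrative only).
SAMPLE = """
{
  "type": "io.confluent.cloud/request",
  "data": {
    "method_name": "ListFlinkRegions",
    "authentication_info": {
      "principal": {"confluent_user": {"resource_id": "user-1"}},
      "result": "SUCCESS"
    },
    "request": {"access_type": "READ_ONLY"},
    "result": {
      "status": "SUCCESS",
      "data": {"elements": [
        {"fcpm_v_2_region": {"id": "aws.af-south-1", "metadata": null}},
        {"fcpm_v_2_region": {"id": "aws.ap-east-1", "metadata": null}}
      ]}
    }
  }
}
"""


def pick(mapping, *keys, default=None):
    """Return the value of the first key found in mapping (hypothetical helper)."""
    for key in keys:
        if key in mapping:
            return mapping[key]
    return default


def summarize(record):
    """Extract a few audit fields, accepting snake_case or camelCase keys."""
    data = record.get("data", {})
    auth = pick(data, "authentication_info", "authenticationInfo", default={})
    principal = auth.get("principal", {})
    user = pick(principal, "confluent_user", "confluentUser", default={})
    request = data.get("request", {})
    result = data.get("result", {})
    # Only ListFlinkRegions results carry an "elements" list of regions.
    elements = (result.get("data") or {}).get("elements", [])
    return {
        "method": pick(data, "method_name", "methodName"),
        "user": pick(user, "resource_id", "resourceId"),
        "access_type": pick(request, "access_type", "accessType"),
        "status": result.get("status"),
        "regions": [e["fcpm_v_2_region"]["id"] for e in elements if "fcpm_v_2_region" in e],
    }


summary = summarize(json.loads(SAMPLE))
print(summary)
```

Reading both key spellings in one place keeps downstream filters, such as alerting on every `MODIFICATION` access type, independent of which service emitted the record.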
#### Flink compute pool

Auditable event methods for the resource type COMPUTE_POOL are triggered by operations on a Flink compute pool and generate event messages that are sent to the audit log cluster, where they are stored as event records in a Kafka topic.

| Method name | Action triggering an auditable event message |
| --- | --- |
| CreateComputePool | A request to create a Flink compute pool. |
| DeleteComputePool | A request to delete a Flink compute pool. |
| GetComputePool | A request to get the details of a Flink compute pool. |
| ListComputePools | A request for a list of Flink compute pools. |
| UpdateComputePool | A request to update a Flink compute pool. |

##### CreateComputePool

The CreateComputePool event method is triggered by a request to create a Flink compute pool and sends an event message that is saved in the audit log as an event record.

###### Examples

SUCCESS { "specversion": "1.0", "id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", "source": "crn://confluent.cloud/", "type": "io.confluent.cloud/request", "subject": "crn://confluent.cloud/organization=6c2e1a25-2292-483b-9c76-79982e3dc005", "datacontenttype": "application/json", "dataschema": "https://confluent.io/internal/events/AuditLog.v2", "data": { "service_name": "crn://confluent.cloud/service=cc-ksql-api-service", "internal_service_name": "crn://confluent.cloud/service=cc-ksql-api-service", "method_name": "ListRegions", "cloud_resources": [ { "resource": { "type": "ORGANIZATION", "resource_id": "6c2e1a25-2292-483b-9c76-79982e3dc005" } } ], "authentication_info": { "exposure": "CUSTOMER", "principal": { "confluent_user": { "resource_id": "user-1", "internal_id": "99" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=6c2e1a25-2292-483b-9c76-79982e3dc005/identity-provider=Confluent/identity=user-1" }, "request_metadata": { "request_id": [ "74726163656964303132333435363738" ], "client_address": [ { "ip": "1.2.3.4" } ] }, "request": { "access_type": "READ_ONLY", "data": { "BypassCache": false, "Cloud": 0, "PageSize": 10, "PageToken":
"", "RegionName": "" } }, "result": { "status": "SUCCESS", "data": { "elements": [ { "fcpm_v_2_region": { "id": "aws.af-south-1", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-east-1", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-northeast-1", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-northeast-2", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-northeast-3", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-south-1", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-south-2", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-southeast-1", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-southeast-2", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-southeast-3", "metadata": null } } ] } } } } DeleteComputePool¶ The DeleteComputePool event method is triggered by a request to delete a Flink compute pool and sends an event message that is saved in the audit log as an event record. Examples¶ SUCCESS { "specversion": "1.0", "id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", "source": "crn://confluent.cloud/", "type": "io.confluent.cloud/request", "subject": "crn://confluent.cloud/organization=6c2e1a25-2292-483b-9c76-79982e3dc005/environment=env-j30y0iqp/flink-region=azure.uksouth/compute-pool=lfcp-1", "datacontenttype": "application/json", "dataschema": "https://confluent.io/internal/events/AuditLog.v2", "data": { "service_name": "crn://confluent.cloud/service=cc-ksql-api-service", "internal_service_name": "crn://confluent.cloud/service=cc-ksql-api-service", "method_name": "DeleteComputePool", "cloud_resources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resource_id": "6c2e1a25-2292-483b-9c76-79982e3dc005" }, { "type": "ENVIRONMENT", "resource_id": "env-j30y0iqp" }, { "type": "FLINK_REGION", "resource_id": "azure.uksouth" } ] }, "resource": { "type": "COMPUTE_POOL", "resource_id": "lfcp-1" } } ], "authentication_info": { "exposure": "CUSTOMER", "principal": { 
"confluent_user": { "resource_id": "user-1", "internal_id": "99" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=6c2e1a25-2292-483b-9c76-79982e3dc005/identity-provider=Confluent/identity=user-1" }, "request_metadata": { "request_id": [ "74726163656964303132333435363738" ], "client_address": [ { "ip": "1.2.3.4" } ] }, "request": { "access_type": "MODIFICATION", "data": { "environment_id": "env-j30y0iqp", "id": "lfcp-1" } }, "result": { "status": "SUCCESS" } } } GetComputePool¶ The GetComputePool event method is triggered by a request to get the details for a Flink compute pool and sends an event message that is saved in the audit log as an event record. Examples¶ SUCCESS { "specversion": "1.0", "id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", "source": "crn://confluent.cloud/", "type": "io.confluent.cloud/request", "subject": "crn://confluent.cloud/organization=6c2e1a25-2292-483b-9c76-79982e3dc005/environment=env-j30y0iqp/flink-region=azure.uksouth/compute-pool=lfcp-1", "datacontenttype": "application/json", "dataschema": "https://confluent.io/internal/events/AuditLog.v2", "data": { "service_name": "crn://confluent.cloud/service=cc-ksql-api-service", "internal_service_name": "crn://confluent.cloud/service=cc-ksql-api-service", "method_name": "DeleteComputePool", "cloud_resources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resource_id": "6c2e1a25-2292-483b-9c76-79982e3dc005" }, { "type": "ENVIRONMENT", "resource_id": "env-j30y0iqp" }, { "type": "FLINK_REGION", "resource_id": "azure.uksouth" } ] }, "resource": { "type": "COMPUTE_POOL", "resource_id": "lfcp-1" } } ], "authentication_info": { "exposure": "CUSTOMER", "principal": { "confluent_user": { "resource_id": "user-1", "internal_id": "99" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=6c2e1a25-2292-483b-9c76-79982e3dc005/identity-provider=Confluent/identity=user-1" }, "request_metadata": { "request_id": [ "74726163656964303132333435363738" ], 
"client_address": [ { "ip": "1.2.3.4" } ] }, "request": { "access_type": "MODIFICATION", "data": { "environment_id": "env-j30y0iqp", "id": "lfcp-1" } }, "result": { "status": "SUCCESS" } } } ListComputePools¶ The ListComputePools event method is triggered by a request for a list of Flink compute pools and sends an event message that is saved in the audit log as an event record. Examples¶ SUCCESS { "specversion": "1.0", "id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", "source": "crn://confluent.cloud/", "type": "io.confluent.cloud/request", "subject": "crn://confluent.cloud/organization=6c2e1a25-2292-483b-9c76-79982e3dc005/environment=env-j30y0iqp/flink-region=azure.uksouth/compute-pool=lfcp-1", "datacontenttype": "application/json", "dataschema": "https://confluent.io/internal/events/AuditLog.v2", "data": { "service_name": "crn://confluent.cloud/service=cc-ksql-api-service", "internal_service_name": "crn://confluent.cloud/service=cc-ksql-api-service", "method_name": "DeleteComputePool", "cloud_resources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resource_id": "6c2e1a25-2292-483b-9c76-79982e3dc005" }, { "type": "ENVIRONMENT", "resource_id": "env-j30y0iqp" }, { "type": "FLINK_REGION", "resource_id": "azure.uksouth" } ] }, "resource": { "type": "COMPUTE_POOL", "resource_id": "lfcp-1" } } ], "authentication_info": { "exposure": "CUSTOMER", "principal": { "confluent_user": { "resource_id": "user-1", "internal_id": "99" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=6c2e1a25-2292-483b-9c76-79982e3dc005/identity-provider=Confluent/identity=user-1" }, "request_metadata": { "request_id": [ "74726163656964303132333435363738" ], "client_address": [ { "ip": "1.2.3.4" } ] }, "request": { "access_type": "MODIFICATION", "data": { "environment_id": "env-j30y0iqp", "id": "lfcp-1" } }, "result": { "status": "SUCCESS" } } } UpdateComputePool¶ The UpdateComputePool event method is triggered by a request to update a Flink compute pool and sends 
an event message that is saved in the audit log as an event record.

###### Examples

SUCCESS { "specversion": "1.0", "id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", "source": "crn://confluent.cloud/", "type": "io.confluent.cloud/request", "subject": "crn://confluent.cloud/organization=6c2e1a25-2292-483b-9c76-79982e3dc005/environment=env-j30y0iqp/flink-region=azure.uksouth/compute-pool=lfcp-1", "datacontenttype": "application/json", "dataschema": "https://confluent.io/internal/events/AuditLog.v2", "data": { "service_name": "crn://confluent.cloud/service=cc-ksql-api-service", "internal_service_name": "crn://confluent.cloud/service=cc-ksql-api-service", "method_name": "DeleteComputePool", "cloud_resources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resource_id": "6c2e1a25-2292-483b-9c76-79982e3dc005" }, { "type": "ENVIRONMENT", "resource_id": "env-j30y0iqp" }, { "type": "FLINK_REGION", "resource_id": "azure.uksouth" } ] }, "resource": { "type": "COMPUTE_POOL", "resource_id": "lfcp-1" } } ], "authentication_info": { "exposure": "CUSTOMER", "principal": { "confluent_user": { "resource_id": "user-1", "internal_id": "99" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=6c2e1a25-2292-483b-9c76-79982e3dc005/identity-provider=Confluent/identity=user-1" }, "request_metadata": { "request_id": [ "74726163656964303132333435363738" ], "client_address": [ { "ip": "1.2.3.4" } ] }, "request": { "access_type": "MODIFICATION", "data": { "environment_id": "env-j30y0iqp", "id": "lfcp-1" } }, "result": { "status": "SUCCESS" } } }

#### Flink workspace

Auditable event methods for the resource type FLINK_WORKSPACE are triggered by operations on a Flink workspace and generate event messages that are sent to the audit log cluster, where they are stored as event records in a Kafka topic.

| Method name | Action triggering an auditable event message |
| --- | --- |
| CreateWorkspace | A request to create a Flink workspace. |
| DeleteWorkspace | A request to delete a Flink workspace. |
| GetWorkspace | A request to get the details of a Flink workspace. |
| ListWorkspaces | A request for a list of Flink workspaces. |
| UpdateWorkspace | A request to update a Flink workspace. |

##### CreateWorkspace

The CreateWorkspace event method is triggered by a request to create a Flink workspace and sends an event message that is saved in the audit log as an event record.

###### Examples

SUCCESS { "datacontenttype": "application/json", "data": { "serviceName": "crn://confluent.cloud/", "methodName": "CreateWorkspace", "cloudResources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resourceId": "a56cf537-ab71-480e-b272-43e71531798b" }, { "type": "ENVIRONMENT", "resourceId": "env-rzhxp2" }, { "type": "FLINK_REGION", "resourceId": "aws.us-east-1" } ] }, "resource": { "type": "FLINK_WORKSPACE", "resourceId": "workspace-2023-09-22-162414" } } ], "authenticationInfo": { "principal": { "confluentUser": { "resourceId": "u-123456" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/identity-provider=Confluent/identity=u-123456" }, "requestMetadata": { "requestId": [ "8b4f7ec5693a01fb4a1ae0a24240f944" ], "clientAddress": [ { "ip": "1.2.3.4" } ] }, "request": { "accessType": "MODIFICATION", "data": { "workspace_name": "workspace-2023-09-22-162414", "environment_id": "env-rzhxp2", "org_resource_id": "a56cf537-ab71-480e-b272-43e71531798b", "spec": { "compute_pool": { "id": "lfcp-stgcc30xr80" }, "service_account": null } } }, "result": { "status": "SUCCESS", "data": { "environment_id": "env-rzhxp2", "name": "workspace-2023-09-22-162414", "org_id": "a56cf537-ab71-480e-b272-43e71531798b", "spec": { "compute_pool": { "id": "lfcp-stgcc30xr80" }, "service_account": null } } }, "resourceName": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/environment=env-rzhxp2/flink-region=aws.us-east-1/flink-workspace=workspace-2023-09-22-162414" }, "subject":
"crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/environment=env-rzhxp2/flink-region=aws.us-east-1/flink-workspace=workspace-2023-09-22-162414", "specversion": "1.0", "id": "b76bee22-7678-49ea-8902-67519b0d4133", "source": "crn://confluent.cloud/", "time": "2023-09-22T16:24:15.007233032Z", "type": "io.confluent.cloud/request" } DeleteWorkspace¶ The DeleteWorkspace event method is triggered by a request to delete a Flink workspace and sends an event message that is saved in the audit log as an event record. Examples¶ SUCCESS { "datacontenttype": "application/json", "data": { "serviceName": "crn://confluent.cloud/", "methodName": "DeleteWorkspace", "cloudResources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resourceId": "a56cf537-ab71-480e-b272-43e71531798b" }, { "type": "ENVIRONMENT", "resourceId": "env-rzhxp2" }, { "type": "FLINK_REGION", "resourceId": "aws.us-east-1" } ] }, "resource": { "type": "FLINK_WORKSPACE", "resourceId": "workspace-2023-09-22-162414" } } ], "authenticationInfo": { "principal": { "confluentUser": { "resourceId": "u-123456" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/identity-provider=Confluent/identity=u-123456" }, "requestMetadata": { "requestId": [ "6a4dd657fe6fc5241360983cbf8dc8ce" ], "clientAddress": [ { "ip": "1.2.3.4" } ] }, "request": { "accessType": "MODIFICATION", "data": { "workspace_name": "workspace-2023-09-22-162414", "environment_id": "env-rzhxp2", "org_resource_id": "a56cf537-ab71-480e-b272-43e71531798b" } }, "result": { "status": "SUCCESS" }, "resourceName": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/environment=env-rzhxp2/flink-region=aws.us-east-1/flink-workspace=workspace-2023-09-22-162414" }, "subject": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/environment=env-rzhxp2/flink-region=aws.us-east-1/flink-workspace=workspace-2023-09-22-162414", "specversion": 
"1.0", "id": "36791901-6bd6-4057-8820-9d6860d56d0c", "source": "crn://confluent.cloud/", "time": "2023-09-22T16:24:41.773914645Z", "type": "io.confluent.cloud/request" } GetWorkspace¶ The GetWorkspace event method is triggered by a request to get the details for a Flink workspace and sends an event message that is saved in the audit log as an event record. Examples¶ SUCCESS { "datacontenttype": "application/json", "data": { "serviceName": "crn://confluent.cloud/", "methodName": "GetWorkspace", "cloudResources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resourceId": "a56cf537-ab71-480e-b272-43e71531798b" }, { "type": "ENVIRONMENT", "resourceId": "env-rzhxp2" }, { "type": "FLINK_REGION", "resourceId": "aws.us-east-1" } ] }, "resource": { "type": "FLINK_WORKSPACE", "resourceId": "workspace-2023-09-22-162414" } } ], "authenticationInfo": { "principal": { "confluentUser": { "resourceId": "u-123456" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/identity-provider=Confluent/identity=u-123456" }, "requestMetadata": { "requestId": [ "ae0fe8164a496916ba2494a4f5cef447" ], "clientAddress": [ { "ip": "1.2.3.4" } ] }, "request": { "accessType": "READ_ONLY", "data": { "environment_id": "env-rzhxp2", "org_resource_id": "a56cf537-ab71-480e-b272-43e71531798b", "workspace_name": "workspace-2023-09-22-162414" } }, "result": { "status": "SUCCESS", "data": { "environment_id": "env-rzhxp2", "name": "workspace-2023-09-22-162414", "org_id": "a56cf537-ab71-480e-b272-43e71531798b", "spec": { "service_account": null, "compute_pool": { "id": "lfcp-stgcc30xr80" } } } }, "resourceName": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/environment=env-rzhxp2/flink-region=aws.us-east-1/flink-workspace=workspace-2023-09-22-162414" }, "subject": 
"crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/environment=env-rzhxp2/flink-region=aws.us-east-1/flink-workspace=workspace-2023-09-22-162414", "specversion": "1.0", "id": "ae935a4b-bcc6-4359-9149-3c31e728877a", "source": "crn://confluent.cloud/", "time": "2023-09-22T16:24:15.666686762Z", "type": "io.confluent.cloud/request" } ListWorkspaces¶ The ListWorkspaces event method is triggered by a request for a list of Flink workspaces and sends an event message that is saved in the audit log as an event record. Examples¶ SUCCESS { "datacontenttype": "application/json", "data": { "serviceName": "crn://confluent.cloud/", "methodName": "ListWorkspace", "cloudResources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resourceId": "a56cf537-ab71-480e-b272-43e71531798b" }, { "type": "ENVIRONMENT", "resourceId": "env-rzhxp2" }, { "type": "FLINK_REGION", "resourceId": "aws.us-east-1" } ] }, "resource": { "type": "FLINK_WORKSPACE", "resourceId": "workspace-2023-09-22-162414" } } ], "authenticationInfo": { "principal": { "confluentUser": { "resourceId": "u-123456" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/identity-provider=Confluent/identity=u-123456" }, "requestMetadata": { "requestId": [ "5e926b0c56f3131f8fb350f228ad9b11" ], "clientAddress": [ { "ip": "1.2.3.4" } ] }, "request": { "accessType": "READ_ONLY", "data": { "environment_id": "env-rzhxp2", "org_resource_id": "a56cf537-ab71-480e-b272-43e71531798b", "page_size": 100 } }, "result": { "status": "SUCCESS", "data": { "data": [ { "name": "workspace-2023-09-22-162414", "org_id": "a56cf537-ab71-480e-b272-43e71531798b", "spec": { "compute_pool": { "id": "lfcp-stgcc30xr80" }, "service_account": null }, "environment_id": "env-rzhxp2" } ] } }, "resourceName": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/environment=env-rzhxp2/flink-region=aws.us-east-1/flink-workspace=workspace-2023-09-22-162414" }, 
"subject": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/environment=env-rzhxp2/flink-region=aws.us-east-1/flink-workspace=workspace-2023-09-22-162414", "specversion": "1.0", "id": "f1f9c92e-f3b8-425e-971f-c0206b0eadc0", "source": "crn://confluent.cloud/", "time": "2023-09-22T16:24:29.707277883Z", "type": "io.confluent.cloud/request" } UpdateWorkspace¶ The UpdateWorkspace event method is triggered by a request to update a Flink workspace and sends an event message that is saved in the audit log as an event record. Examples¶ SUCCESS { "datacontenttype": "application/json", "data": { "serviceName": "crn://confluent.cloud/", "methodName": "UpdateWorkspace", "cloudResources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resourceId": "a56cf537-ab71-480e-b272-43e71531798b" }, { "type": "ENVIRONMENT", "resourceId": "env-rzhxp2" }, { "type": "FLINK_REGION", "resourceId": "aws.us-east-1" } ] }, "resource": { "type": "FLINK_WORKSPACE", "resourceId": "workspace-2023-09-22-162803" } } ], "authenticationInfo": { "principal": { "confluentUser": { "resourceId": "u-123456" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/identity-provider=Confluent/identity=u-123456" }, "requestMetadata": { "requestId": [ "8dd4507a31c9fa9f7ca08fdad18020c5" ], "clientAddress": [ { "ip": "1.2.3.4" } ] }, "request": { "accessType": "MODIFICATION", "data": { "spec": { "compute_pool": null, "service_account": null }, "workspace_name": "workspace-2023-09-22-162803", "environment_id": "env-rzhxp2", "org_resource_id": "a56cf537-ab71-480e-b272-43e71531798b" } }, "result": { "status": "SUCCESS", "data": { "environment_id": "env-rzhxp2", "name": "workspace-2023-09-22-162803", "org_id": "a56cf537-ab71-480e-b272-43e71531798b", "spec": { "compute_pool": null, "service_account": null } } }, "resourceName": 
"crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/environment=env-rzhxp2/flink-region=aws.us-east-1/flink-workspace=workspace-2023-09-22-162803" }, "subject": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/environment=env-rzhxp2/flink-region=aws.us-east-1/flink-workspace=workspace-2023-09-22-162803", "specversion": "1.0", "id": "b59d471f-3da3-41e2-847a-8363ab4f9077", "source": "crn://confluent.cloud/", "time": "2023-09-22T16:29:09.323947120Z", "type": "io.confluent.cloud/request" } Flink statement¶ Auditable event methods for the resource type STATEMENT are triggered by operations on a Flink statement and generate event messages that are sent to the audit log cluster, where they are stored as event records in a Kafka topic. Method name Action triggering an auditable event message CreateStatement A request to create a Flink statement. DeleteStatement A request to delete a Flink statement. GetStatement A request for a query of a Flink statement details. ListStatements A request for a list of Flink statements. UpdateStatement A request to update a Flink statement. PatchStatement A request to patch a Flink statement. CreateStatement¶ The CreateStatement event method is triggered by a request to create a Flink statement and sends an event message that is saved in the audit log as an event record. 
Examples¶ SUCCESS { "datacontenttype": "application/json", "data": { "serviceName": "crn://confluent.cloud/", "methodName": "CreateStatement", "cloudResources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resourceId": "e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5" }, { "type": "ENVIRONMENT", "resourceId": "env-xx5q1x" }, { "type": "FLINK_REGION", "resourceId": "aws.us-west-2" } ] }, "resource": { "type": "STATEMENT", "resourceId": "d730eb03-d3b5-412d" } } ], "authenticationInfo": { "principal": { "confluentUser": { "resourceId": "u-5q0mkq" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5/identity-provider=Confluent/identity=u-5q0mkq" }, "requestMetadata": { "requestId": [ "38cf3bb10d833c36d7b022c633522153" ], "clientAddress": [ { "ip": "1.2.3.4" } ] }, "request": { "accessType": "MODIFICATION", "data": { "environment_id": "env-xx5q1x", "org_resource_id": "e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5", "spec": { "compute_pool_id": "lfcp-devccxwdpvk", "name": "d730eb03-d3b5-412d", "principal": "u-5q0mkq" } } }, "result": { "status": "SUCCESS", "data": { "metadata": { "environment_id": "env-xx5q1x" }, "spec": { "compute_pool_id": "lfcp-devccxwdpvk", "name": "d730eb03-d3b5-412d", "principal": "u-5q0mkq" } } }, "resourceName": "crn://confluent.cloud/organization=e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5/environment=env-xx5q1x/flink-region=aws.us-west-2/statement=d730eb03-d3b5-412d" }, "subject": "crn://confluent.cloud/organization=e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5/environment=env-xx5q1x/flink-region=aws.us-west-2/statement=d730eb03-d3b5-412d", "specversion": "1.0", "id": "d1fbc567-e5bb-4728-bf54-de88a1aba84e", "source": "crn://confluent.cloud/", "time": "2023-09-22T16:45:13.689395512Z", "type": "io.confluent.cloud/request" } DeleteStatement¶ The DeleteStatement event method is triggered by a request to delete a Flink statement and sends an event message that is saved in the audit log as an event record. 
Examples¶ SUCCESS { "datacontenttype": "application/json", "data": { "serviceName": "crn://confluent.cloud/", "methodName": "DeleteStatement", "cloudResources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resourceId": "e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5" }, { "type": "ENVIRONMENT", "resourceId": "env-v6x7j0" }, { "type": "FLINK_REGION", "resourceId": "aws.us-west-2" } ] }, "resource": { "type": "STATEMENT", "resourceId": "workspace-2023-09-19-024944-b9c724de-c284-486e-a45f-e7dc1100e181" } } ], "authenticationInfo": { "principal": { "confluentUser": { "resourceId": "u-devccq71mwp" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5/identity-provider=Confluent/identity=u-devccq71mwp" }, "requestMetadata": { "requestId": [ "7e9362e01607ffacb08fa80dd2241db2" ], "clientAddress": [ { "ip": "1.2.3.4" } ] }, "request": { "accessType": "MODIFICATION", "data": { "StatementName": "workspace-2023-09-19-024944-b9c724de-c284-486e-a45f-e7dc1100e181", "OrgResourceId": "e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5", "EnvironmentId": "env-v6x7j0" } }, "result": { "status": "SUCCESS" }, "resourceName": "crn://confluent.cloud/organization=e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5/environment=env-v6x7j0/flink-region=aws.us-west-2/statement=workspace-2023-09-19-024944-b9c724de-c284-486e-a45f-e7dc1100e181" }, "subject": "crn://confluent.cloud/organization=e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5/environment=env-v6x7j0/flink-region=aws.us-west-2/statement=workspace-2023-09-19-024944-b9c724de-c284-486e-a45f-e7dc1100e181", "specversion": "1.0", "id": "de07cd1b-ec0f-4d0e-abce-050a993e7532", "source": "crn://confluent.cloud/", "time": "2023-09-22T16:48:05.106656163Z", "type": "io.confluent.cloud/request" } GetStatement¶ The GetStatement event method is triggered by a request to get the details for a Flink statement and sends an event message that is saved in the audit log as an event record. 
Examples¶ SUCCESS { "datacontenttype": "application/json", "data": { "serviceName": "crn://confluent.cloud/", "methodName": "GetStatement", "cloudResources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resourceId": "a56cf537-ab71-480e-b272-43e71531798b" }, { "type": "ENVIRONMENT", "resourceId": "env-9pjxk0" }, { "type": "FLINK_REGION", "resourceId": "aws.us-west-2" } ] }, "resource": { "type": "STATEMENT", "resourceId": "928c8647-582b-4d3b" } } ], "authenticationInfo": { "principal": { "confluentUser": { "resourceId": "u-21r8oo" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/identity-provider=Confluent/identity=u-21r8oo" }, "requestMetadata": { "requestId": [ "a688f8810ba426f39511c04f7b511a0a" ], "clientAddress": [ { "ip": "1.2.3.4" } ] }, "request": { "accessType": "READ_ONLY", "data": { "statement_name": "928c8647-582b-4d3b", "environment_id": "env-9pjxk0" } }, "result": { "status": "SUCCESS", "data": { "metadata": { "environment_id": "env-9pjxk0" }, "spec": { "compute_pool_id": "lfcp-stgccgjvgr1", "name": "928c8647-582b-4d3b", "principal": "u-21r8oo" } } }, "resourceName": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/environment=env-9pjxk0/flink-region=aws.us-west-2/statement=928c8647-582b-4d3b" }, "subject": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/environment=env-9pjxk0/flink-region=aws.us-west-2/statement=928c8647-582b-4d3b", "specversion": "1.0", "id": "f6f45075-3d85-4e41-8677-c06a80ef903e", "source": "crn://confluent.cloud/", "time": "2023-09-22T16:35:20.968310060Z", "type": "io.confluent.cloud/request" } ListStatements¶ The ListStatements event method is triggered by a request for a list of Flink statements and sends an event message that is saved in the audit log as an event record. 
Examples¶ SUCCESS { "datacontenttype": "application/json", "data": { "serviceName": "crn://confluent.cloud/", "methodName": "ListStatements", "cloudResources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resourceId": "e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5" }, { "type": "ENVIRONMENT", "resourceId": "env-xx3gwz" }, { "type": "FLINK_REGION", "resourceId": "aws.eu-west-1" } ] }, "resource": { "type": "STATEMENT", "resourceId": "3ab9a756-4bcf-475b" } }, { "scope": { "resources": [ { "type": "ORGANIZATION", "resourceId": "e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5" }, { "type": "ENVIRONMENT", "resourceId": "env-xx3gwz" }, { "type": "FLINK_REGION", "resourceId": "aws.eu-west-1" } ] }, "resource": { "type": "STATEMENT", "resourceId": "e264b999-269c-46d6" } } ], "authenticationInfo": { "principal": { "confluentUser": { "resourceId": "u-devccq71mwp" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5/identity-provider=Confluent/identity=u-devccq71mwp" }, "requestMetadata": { "requestId": [ "7cec6a84ab0b05ccb38ecf14981da31b" ], "clientAddress": [ { "ip": "1.2.3.4" } ] }, "request": { "accessType": "READ_ONLY", "data": { "compute_pool_id": "", "environment_id": "env-xx3gwz", "page_size": 100 } }, "result": { "status": "SUCCESS", "data": { "data": [ { "metadata": { "environment_id": "env-xx3gwz" }, "spec": { "compute_pool_id": "lfcp-devcc36z5jj", "name": "3ab9a756-4bcf-475b", "principal": "u-rk1gy7" } }, { "metadata": { "environment_id": "env-xx3gwz" }, "spec": { "principal": "u-rk1gy7", "compute_pool_id": "lfcp-devcc36z5jj", "name": "e264b999-269c-46d6" } } ] } }, "resourceName": "crn://confluent.cloud/organization=e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5/environment=env-xx3gwz/flink-region=aws.eu-west-1" }, "subject": "crn://confluent.cloud/organization=e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5/environment=env-xx3gwz/flink-region=aws.eu-west-1", "specversion": "1.0", "id": "5e6bc2d3-9881-442b-af0c-a0a6aa127867", 
"source": "crn://confluent.cloud/", "time": "2023-09-22T16:47:00.894461118Z", "type": "io.confluent.cloud/request" } UpdateStatement¶ The UpdateStatement event method is triggered by a request to update a Flink statement and sends an event message that is saved in the audit log as an event record. Examples¶ SUCCESS { "datacontenttype": "application/json", "data": { "service_name": "crn://confluent.cloud/", "method_name": "UpdateStatement", "cloud_resources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resource_id": "org-123" }, { "type": "ENVIRONMENT", "resource_id": "env-123" }, { "type": "FLINK_REGION", "resource_id": "aws.us-east-2" } ] }, "resource": { "type": "STATEMENT", "resource_id": "statement-123" } }, { "scope": { "resources": [ { "type": "ORGANIZATION", "resource_id": "org-123" }, { "type": "ENVIRONMENT", "resource_id": "env-123" }, { "type": "FLINK_REGION", "resource_id": "aws.us-east-2" } ] }, "resource": { "type": "STATEMENT", "resource_id": "statement-123" } } ], "authentication_info": { "exposure": "CUSTOMER", "principal": { "confluent_user": { "resource_id": "u-123" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=org-123/identity-provider=Confluent/identity=u-123" }, "request_metadata": { "request_id": [ "74726163656964303132333435363738" ], "client_address": [ { "ip": "127.0.0.1" } ] }, "request": { "access_type": "MODIFICATION", "data": { "environment_id": "env-123", "org_resource_id": "org-123", "spec": { "compute_pool_id": "lfcp-123", "name": "statement-123", "principal": "sa-123" } } }, "result": { "status": "SUCCESS", "data": { "metadata": { "environment_id": "env-123" }, "spec": { "compute_pool_id": "lfcp-123", "name": "statement-123", "principal": "sa-123" } } } } } PatchStatement¶ The PatchStatement event method is triggered by a request to patch a Flink statement and sends an event message that is saved in the audit log as an event record. 
Examples¶ SUCCESS { "datacontenttype": "application/json", "data": { "service_name": "crn://confluent.cloud/", "method_name": "PatchStatement", "cloud_resources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resource_id": "org-123" }, { "type": "ENVIRONMENT", "resource_id": "env-123" }, { "type": "FLINK_REGION", "resource_id": "aws.us-east-2" } ] }, "resource": { "type": "STATEMENT", "resource_id": "statement-123" } }, { "scope": { "resources": [ { "type": "ORGANIZATION", "resource_id": "org-123" }, { "type": "ENVIRONMENT", "resource_id": "env-123" }, { "type": "FLINK_REGION", "resource_id": "aws.us-east-2" } ] }, "resource": { "type": "STATEMENT", "resource_id": "statement-123" } } ], "authentication_info": { "exposure": "CUSTOMER", "principal": { "confluent_user": { "resource_id": "u-123" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=org-123/identity-provider=Confluent/identity=u-123" }, "request_metadata": { "request_id": [ "74726163656964303132333435363738" ], "client_address": [ { "ip": "127.0.0.1" } ] }, "request": { "access_type": "MODIFICATION", "data": { "environment_id": "env-123", "org_resource_id": "org-123", "statement_name": "statement-123" } }, "result": { "status": "SUCCESS" } } } #### Code Examples ```sql FLINK_REGION ``` ```sql ListFlinkRegions ``` ```sql { "specversion": "1.0", "id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", "source": "crn://confluent.cloud/", "type": "io.confluent.cloud/request", "subject": "crn://confluent.cloud/organization=6c2e1a25-2292-483b-9c76-79982e3dc005", "datacontenttype": "application/json", "dataschema": "https://confluent.io/internal/events/AuditLog.v2", "data": { "service_name": "crn://confluent.cloud/service=cc-ksql-api-service", "internal_service_name": "crn://confluent.cloud/service=cc-ksql-api-service", "method_name": "ListFlinkRegions", "cloud_resources": [ { "resource": { "type": "ORGANIZATION", "resource_id": "6c2e1a25-2292-483b-9c76-79982e3dc005" } } ], 
"authentication_info": { "exposure": "CUSTOMER", "principal": { "confluent_user": { "resource_id": "user-1", "internal_id": "99" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=6c2e1a25-2292-483b-9c76-79982e3dc005/identity-provider=Confluent/identity=user-1" }, "request_metadata": { "request_id": [ "74726163656964303132333435363738" ], "client_address": [ { "ip": "1.2.3.4" } ] }, "request": { "access_type": "READ_ONLY", "data": { "BypassCache": false, "Cloud": 0, "PageSize": 10, "PageToken": "", "RegionName": "" } }, "result": { "status": "SUCCESS", "data": { "elements": [ { "fcpm_v_2_region": { "id": "aws.af-south-1", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-east-1", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-northeast-1", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-northeast-2", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-northeast-3", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-south-1", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-south-2", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-southeast-1", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-southeast-2", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-southeast-3", "metadata": null } } ] } } } } ``` ```sql COMPUTE_POOL ``` ```sql CreateComputePool ``` ```sql { "specversion": "1.0", "id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", "source": "crn://confluent.cloud/", "type": "io.confluent.cloud/request", "subject": "crn://confluent.cloud/organization=6c2e1a25-2292-483b-9c76-79982e3dc005", "datacontenttype": "application/json", "dataschema": "https://confluent.io/internal/events/AuditLog.v2", "data": { "service_name": "crn://confluent.cloud/service=cc-ksql-api-service", "internal_service_name": "crn://confluent.cloud/service=cc-ksql-api-service", "method_name": "ListRegions", "cloud_resources": [ { "resource": { "type": "ORGANIZATION", "resource_id": 
"6c2e1a25-2292-483b-9c76-79982e3dc005" } } ], "authentication_info": { "exposure": "CUSTOMER", "principal": { "confluent_user": { "resource_id": "user-1", "internal_id": "99" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=6c2e1a25-2292-483b-9c76-79982e3dc005/identity-provider=Confluent/identity=user-1" }, "request_metadata": { "request_id": [ "74726163656964303132333435363738" ], "client_address": [ { "ip": "1.2.3.4" } ] }, "request": { "access_type": "READ_ONLY", "data": { "BypassCache": false, "Cloud": 0, "PageSize": 10, "PageToken": "", "RegionName": "" } }, "result": { "status": "SUCCESS", "data": { "elements": [ { "fcpm_v_2_region": { "id": "aws.af-south-1", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-east-1", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-northeast-1", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-northeast-2", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-northeast-3", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-south-1", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-south-2", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-southeast-1", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-southeast-2", "metadata": null } }, { "fcpm_v_2_region": { "id": "aws.ap-southeast-3", "metadata": null } } ] } } } } ``` ```sql DeleteComputePool ``` ```sql { "specversion": "1.0", "id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", "source": "crn://confluent.cloud/", "type": "io.confluent.cloud/request", "subject": "crn://confluent.cloud/organization=6c2e1a25-2292-483b-9c76-79982e3dc005/environment=env-j30y0iqp/flink-region=azure.uksouth/compute-pool=lfcp-1", "datacontenttype": "application/json", "dataschema": "https://confluent.io/internal/events/AuditLog.v2", "data": { "service_name": "crn://confluent.cloud/service=cc-ksql-api-service", "internal_service_name": "crn://confluent.cloud/service=cc-ksql-api-service", "method_name": 
"DeleteComputePool", "cloud_resources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resource_id": "6c2e1a25-2292-483b-9c76-79982e3dc005" }, { "type": "ENVIRONMENT", "resource_id": "env-j30y0iqp" }, { "type": "FLINK_REGION", "resource_id": "azure.uksouth" } ] }, "resource": { "type": "COMPUTE_POOL", "resource_id": "lfcp-1" } } ], "authentication_info": { "exposure": "CUSTOMER", "principal": { "confluent_user": { "resource_id": "user-1", "internal_id": "99" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=6c2e1a25-2292-483b-9c76-79982e3dc005/identity-provider=Confluent/identity=user-1" }, "request_metadata": { "request_id": [ "74726163656964303132333435363738" ], "client_address": [ { "ip": "1.2.3.4" } ] }, "request": { "access_type": "MODIFICATION", "data": { "environment_id": "env-j30y0iqp", "id": "lfcp-1" } }, "result": { "status": "SUCCESS" } } } ``` ```sql GetComputePool ``` ```sql { "specversion": "1.0", "id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", "source": "crn://confluent.cloud/", "type": "io.confluent.cloud/request", "subject": "crn://confluent.cloud/organization=6c2e1a25-2292-483b-9c76-79982e3dc005/environment=env-j30y0iqp/flink-region=azure.uksouth/compute-pool=lfcp-1", "datacontenttype": "application/json", "dataschema": "https://confluent.io/internal/events/AuditLog.v2", "data": { "service_name": "crn://confluent.cloud/service=cc-ksql-api-service", "internal_service_name": "crn://confluent.cloud/service=cc-ksql-api-service", "method_name": "DeleteComputePool", "cloud_resources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resource_id": "6c2e1a25-2292-483b-9c76-79982e3dc005" }, { "type": "ENVIRONMENT", "resource_id": "env-j30y0iqp" }, { "type": "FLINK_REGION", "resource_id": "azure.uksouth" } ] }, "resource": { "type": "COMPUTE_POOL", "resource_id": "lfcp-1" } } ], "authentication_info": { "exposure": "CUSTOMER", "principal": { "confluent_user": { "resource_id": "user-1", "internal_id": "99" } }, 
"result": "SUCCESS", "identity": "crn://confluent.cloud/organization=6c2e1a25-2292-483b-9c76-79982e3dc005/identity-provider=Confluent/identity=user-1" }, "request_metadata": { "request_id": [ "74726163656964303132333435363738" ], "client_address": [ { "ip": "1.2.3.4" } ] }, "request": { "access_type": "MODIFICATION", "data": { "environment_id": "env-j30y0iqp", "id": "lfcp-1" } }, "result": { "status": "SUCCESS" } } } ``` ```sql ListComputePools ``` ```sql { "specversion": "1.0", "id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", "source": "crn://confluent.cloud/", "type": "io.confluent.cloud/request", "subject": "crn://confluent.cloud/organization=6c2e1a25-2292-483b-9c76-79982e3dc005/environment=env-j30y0iqp/flink-region=azure.uksouth/compute-pool=lfcp-1", "datacontenttype": "application/json", "dataschema": "https://confluent.io/internal/events/AuditLog.v2", "data": { "service_name": "crn://confluent.cloud/service=cc-ksql-api-service", "internal_service_name": "crn://confluent.cloud/service=cc-ksql-api-service", "method_name": "DeleteComputePool", "cloud_resources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resource_id": "6c2e1a25-2292-483b-9c76-79982e3dc005" }, { "type": "ENVIRONMENT", "resource_id": "env-j30y0iqp" }, { "type": "FLINK_REGION", "resource_id": "azure.uksouth" } ] }, "resource": { "type": "COMPUTE_POOL", "resource_id": "lfcp-1" } } ], "authentication_info": { "exposure": "CUSTOMER", "principal": { "confluent_user": { "resource_id": "user-1", "internal_id": "99" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=6c2e1a25-2292-483b-9c76-79982e3dc005/identity-provider=Confluent/identity=user-1" }, "request_metadata": { "request_id": [ "74726163656964303132333435363738" ], "client_address": [ { "ip": "1.2.3.4" } ] }, "request": { "access_type": "MODIFICATION", "data": { "environment_id": "env-j30y0iqp", "id": "lfcp-1" } }, "result": { "status": "SUCCESS" } } } ``` ```sql UpdateComputePool ``` ```sql { "specversion": 
"1.0", "id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", "source": "crn://confluent.cloud/", "type": "io.confluent.cloud/request", "subject": "crn://confluent.cloud/organization=6c2e1a25-2292-483b-9c76-79982e3dc005/environment=env-j30y0iqp/flink-region=azure.uksouth/compute-pool=lfcp-1", "datacontenttype": "application/json", "dataschema": "https://confluent.io/internal/events/AuditLog.v2", "data": { "service_name": "crn://confluent.cloud/service=cc-ksql-api-service", "internal_service_name": "crn://confluent.cloud/service=cc-ksql-api-service", "method_name": "DeleteComputePool", "cloud_resources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resource_id": "6c2e1a25-2292-483b-9c76-79982e3dc005" }, { "type": "ENVIRONMENT", "resource_id": "env-j30y0iqp" }, { "type": "FLINK_REGION", "resource_id": "azure.uksouth" } ] }, "resource": { "type": "COMPUTE_POOL", "resource_id": "lfcp-1" } } ], "authentication_info": { "exposure": "CUSTOMER", "principal": { "confluent_user": { "resource_id": "user-1", "internal_id": "99" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=6c2e1a25-2292-483b-9c76-79982e3dc005/identity-provider=Confluent/identity=user-1" }, "request_metadata": { "request_id": [ "74726163656964303132333435363738" ], "client_address": [ { "ip": "1.2.3.4" } ] }, "request": { "access_type": "MODIFICATION", "data": { "environment_id": "env-j30y0iqp", "id": "lfcp-1" } }, "result": { "status": "SUCCESS" } } } ``` ```sql FLINK_WORKSPACE ``` ```sql CreateWorkspace ``` ```sql { "datacontenttype": "application/json", "data": { "serviceName": "crn://confluent.cloud/", "methodName": "CreateWorkspace", "cloudResources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resourceId": "a56cf537-ab71-480e-b272-43e71531798b" }, { "type": "ENVIRONMENT", "resourceId": "env-rzhxp2" }, { "type": "FLINK_REGION", "resourceId": "aws.us-east-1" } ] }, "resource": { "type": "FLINK_WORKSPACE", "resourceId": "workspace-2023-09-22-162414" } } ], 
"authenticationInfo": { "principal": { "confluentUser": { "resourceId": "u-123456" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/identity-provider=Confluent/identity=u-123456" }, "requestMetadata": { "requestId": [ "8b4f7ec5693a01fb4a1ae0a24240f944" ], "clientAddress": [ { "ip": "1.2.3.4" } ] }, "request": { "accessType": "MODIFICATION", "data": { "workspace_name": "workspace-2023-09-22-162414", "environment_id": "env-rzhxp2", "org_resource_id": "a56cf537-ab71-480e-b272-43e71531798b", "spec": { "compute_pool": { "id": "lfcp-stgcc30xr80" }, "service_account": null } } }, "result": { "status": "SUCCESS", "data": { "environment_id": "env-rzhxp2", "name": "workspace-2023-09-22-162414", "org_id": "a56cf537-ab71-480e-b272-43e71531798b", "spec": { "compute_pool": { "id": "lfcp-stgcc30xr80" }, "service_account": null } } }, "resourceName": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/environment=env-rzhxp2/flink-region=aws.us-east-1/flink-workspace=workspace-2023-09-22-162414" }, "subject": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/environment=env-rzhxp2/flink-region=aws.us-east-1/flink-workspace=workspace-2023-09-22-162414", "specversion": "1.0", "id": "b76bee22-7678-49ea-8902-67519b0d4133", "source": "crn://confluent.cloud/", "time": "2023-09-22T16:24:15.007233032Z", "type": "io.confluent.cloud/request" } ``` ```sql DeleteWorkspace ``` ```sql { "datacontenttype": "application/json", "data": { "serviceName": "crn://confluent.cloud/", "methodName": "DeleteWorkspace", "cloudResources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resourceId": "a56cf537-ab71-480e-b272-43e71531798b" }, { "type": "ENVIRONMENT", "resourceId": "env-rzhxp2" }, { "type": "FLINK_REGION", "resourceId": "aws.us-east-1" } ] }, "resource": { "type": "FLINK_WORKSPACE", "resourceId": "workspace-2023-09-22-162414" } } ], "authenticationInfo": { "principal": { 
"confluentUser": { "resourceId": "u-123456" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/identity-provider=Confluent/identity=u-123456" }, "requestMetadata": { "requestId": [ "6a4dd657fe6fc5241360983cbf8dc8ce" ], "clientAddress": [ { "ip": "1.2.3.4" } ] }, "request": { "accessType": "MODIFICATION", "data": { "workspace_name": "workspace-2023-09-22-162414", "environment_id": "env-rzhxp2", "org_resource_id": "a56cf537-ab71-480e-b272-43e71531798b" } }, "result": { "status": "SUCCESS" }, "resourceName": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/environment=env-rzhxp2/flink-region=aws.us-east-1/flink-workspace=workspace-2023-09-22-162414" }, "subject": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/environment=env-rzhxp2/flink-region=aws.us-east-1/flink-workspace=workspace-2023-09-22-162414", "specversion": "1.0", "id": "36791901-6bd6-4057-8820-9d6860d56d0c", "source": "crn://confluent.cloud/", "time": "2023-09-22T16:24:41.773914645Z", "type": "io.confluent.cloud/request" } ``` ```sql GetWorkspace ``` ```sql { "datacontenttype": "application/json", "data": { "serviceName": "crn://confluent.cloud/", "methodName": "GetWorkspace", "cloudResources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resourceId": "a56cf537-ab71-480e-b272-43e71531798b" }, { "type": "ENVIRONMENT", "resourceId": "env-rzhxp2" }, { "type": "FLINK_REGION", "resourceId": "aws.us-east-1" } ] }, "resource": { "type": "FLINK_WORKSPACE", "resourceId": "workspace-2023-09-22-162414" } } ], "authenticationInfo": { "principal": { "confluentUser": { "resourceId": "u-123456" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/identity-provider=Confluent/identity=u-123456" }, "requestMetadata": { "requestId": [ "ae0fe8164a496916ba2494a4f5cef447" ], "clientAddress": [ { "ip": "1.2.3.4" } ] }, "request": { "accessType": 
"READ_ONLY", "data": { "environment_id": "env-rzhxp2", "org_resource_id": "a56cf537-ab71-480e-b272-43e71531798b", "workspace_name": "workspace-2023-09-22-162414" } }, "result": { "status": "SUCCESS", "data": { "environment_id": "env-rzhxp2", "name": "workspace-2023-09-22-162414", "org_id": "a56cf537-ab71-480e-b272-43e71531798b", "spec": { "service_account": null, "compute_pool": { "id": "lfcp-stgcc30xr80" } } } }, "resourceName": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/environment=env-rzhxp2/flink-region=aws.us-east-1/flink-workspace=workspace-2023-09-22-162414" }, "subject": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/environment=env-rzhxp2/flink-region=aws.us-east-1/flink-workspace=workspace-2023-09-22-162414", "specversion": "1.0", "id": "ae935a4b-bcc6-4359-9149-3c31e728877a", "source": "crn://confluent.cloud/", "time": "2023-09-22T16:24:15.666686762Z", "type": "io.confluent.cloud/request" } ``` ```sql ListWorkspaces ``` ```sql { "datacontenttype": "application/json", "data": { "serviceName": "crn://confluent.cloud/", "methodName": "ListWorkspace", "cloudResources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resourceId": "a56cf537-ab71-480e-b272-43e71531798b" }, { "type": "ENVIRONMENT", "resourceId": "env-rzhxp2" }, { "type": "FLINK_REGION", "resourceId": "aws.us-east-1" } ] }, "resource": { "type": "FLINK_WORKSPACE", "resourceId": "workspace-2023-09-22-162414" } } ], "authenticationInfo": { "principal": { "confluentUser": { "resourceId": "u-123456" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/identity-provider=Confluent/identity=u-123456" }, "requestMetadata": { "requestId": [ "5e926b0c56f3131f8fb350f228ad9b11" ], "clientAddress": [ { "ip": "1.2.3.4" } ] }, "request": { "accessType": "READ_ONLY", "data": { "environment_id": "env-rzhxp2", "org_resource_id": "a56cf537-ab71-480e-b272-43e71531798b", "page_size": 100 } }, 
"result": { "status": "SUCCESS", "data": { "data": [ { "name": "workspace-2023-09-22-162414", "org_id": "a56cf537-ab71-480e-b272-43e71531798b", "spec": { "compute_pool": { "id": "lfcp-stgcc30xr80" }, "service_account": null }, "environment_id": "env-rzhxp2" } ] } }, "resourceName": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/environment=env-rzhxp2/flink-region=aws.us-east-1/flink-workspace=workspace-2023-09-22-162414" }, "subject": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/environment=env-rzhxp2/flink-region=aws.us-east-1/flink-workspace=workspace-2023-09-22-162414", "specversion": "1.0", "id": "f1f9c92e-f3b8-425e-971f-c0206b0eadc0", "source": "crn://confluent.cloud/", "time": "2023-09-22T16:24:29.707277883Z", "type": "io.confluent.cloud/request" } ``` ```sql UpdateWorkspace ``` ```sql { "datacontenttype": "application/json", "data": { "serviceName": "crn://confluent.cloud/", "methodName": "UpdateWorkspace", "cloudResources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resourceId": "a56cf537-ab71-480e-b272-43e71531798b" }, { "type": "ENVIRONMENT", "resourceId": "env-rzhxp2" }, { "type": "FLINK_REGION", "resourceId": "aws.us-east-1" } ] }, "resource": { "type": "FLINK_WORKSPACE", "resourceId": "workspace-2023-09-22-162803" } } ], "authenticationInfo": { "principal": { "confluentUser": { "resourceId": "u-123456" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/identity-provider=Confluent/identity=u-123456" }, "requestMetadata": { "requestId": [ "8dd4507a31c9fa9f7ca08fdad18020c5" ], "clientAddress": [ { "ip": "1.2.3.4" } ] }, "request": { "accessType": "MODIFICATION", "data": { "spec": { "compute_pool": null, "service_account": null }, "workspace_name": "workspace-2023-09-22-162803", "environment_id": "env-rzhxp2", "org_resource_id": "a56cf537-ab71-480e-b272-43e71531798b" } }, "result": { "status": "SUCCESS", "data": { 
"environment_id": "env-rzhxp2", "name": "workspace-2023-09-22-162803", "org_id": "a56cf537-ab71-480e-b272-43e71531798b", "spec": { "compute_pool": null, "service_account": null } } }, "resourceName": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/environment=env-rzhxp2/flink-region=aws.us-east-1/flink-workspace=workspace-2023-09-22-162803" }, "subject": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/environment=env-rzhxp2/flink-region=aws.us-east-1/flink-workspace=workspace-2023-09-22-162803", "specversion": "1.0", "id": "b59d471f-3da3-41e2-847a-8363ab4f9077", "source": "crn://confluent.cloud/", "time": "2023-09-22T16:29:09.323947120Z", "type": "io.confluent.cloud/request" } ``` ```sql CreateStatement ``` ```sql { "datacontenttype": "application/json", "data": { "serviceName": "crn://confluent.cloud/", "methodName": "CreateStatement", "cloudResources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resourceId": "e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5" }, { "type": "ENVIRONMENT", "resourceId": "env-xx5q1x" }, { "type": "FLINK_REGION", "resourceId": "aws.us-west-2" } ] }, "resource": { "type": "STATEMENT", "resourceId": "d730eb03-d3b5-412d" } } ], "authenticationInfo": { "principal": { "confluentUser": { "resourceId": "u-5q0mkq" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5/identity-provider=Confluent/identity=u-5q0mkq" }, "requestMetadata": { "requestId": [ "38cf3bb10d833c36d7b022c633522153" ], "clientAddress": [ { "ip": "1.2.3.4" } ] }, "request": { "accessType": "MODIFICATION", "data": { "environment_id": "env-xx5q1x", "org_resource_id": "e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5", "spec": { "compute_pool_id": "lfcp-devccxwdpvk", "name": "d730eb03-d3b5-412d", "principal": "u-5q0mkq" } } }, "result": { "status": "SUCCESS", "data": { "metadata": { "environment_id": "env-xx5q1x" }, "spec": { "compute_pool_id": "lfcp-devccxwdpvk", "name": 
"d730eb03-d3b5-412d", "principal": "u-5q0mkq" } } }, "resourceName": "crn://confluent.cloud/organization=e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5/environment=env-xx5q1x/flink-region=aws.us-west-2/statement=d730eb03-d3b5-412d" }, "subject": "crn://confluent.cloud/organization=e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5/environment=env-xx5q1x/flink-region=aws.us-west-2/statement=d730eb03-d3b5-412d", "specversion": "1.0", "id": "d1fbc567-e5bb-4728-bf54-de88a1aba84e", "source": "crn://confluent.cloud/", "time": "2023-09-22T16:45:13.689395512Z", "type": "io.confluent.cloud/request" } ``` ```sql DeleteStatement ``` ```sql { "datacontenttype": "application/json", "data": { "serviceName": "crn://confluent.cloud/", "methodName": "DeleteStatement", "cloudResources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resourceId": "e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5" }, { "type": "ENVIRONMENT", "resourceId": "env-v6x7j0" }, { "type": "FLINK_REGION", "resourceId": "aws.us-west-2" } ] }, "resource": { "type": "STATEMENT", "resourceId": "workspace-2023-09-19-024944-b9c724de-c284-486e-a45f-e7dc1100e181" } } ], "authenticationInfo": { "principal": { "confluentUser": { "resourceId": "u-devccq71mwp" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5/identity-provider=Confluent/identity=u-devccq71mwp" }, "requestMetadata": { "requestId": [ "7e9362e01607ffacb08fa80dd2241db2" ], "clientAddress": [ { "ip": "1.2.3.4" } ] }, "request": { "accessType": "MODIFICATION", "data": { "StatementName": "workspace-2023-09-19-024944-b9c724de-c284-486e-a45f-e7dc1100e181", "OrgResourceId": "e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5", "EnvironmentId": "env-v6x7j0" } }, "result": { "status": "SUCCESS" }, "resourceName": "crn://confluent.cloud/organization=e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5/environment=env-v6x7j0/flink-region=aws.us-west-2/statement=workspace-2023-09-19-024944-b9c724de-c284-486e-a45f-e7dc1100e181" }, "subject": 
"crn://confluent.cloud/organization=e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5/environment=env-v6x7j0/flink-region=aws.us-west-2/statement=workspace-2023-09-19-024944-b9c724de-c284-486e-a45f-e7dc1100e181", "specversion": "1.0", "id": "de07cd1b-ec0f-4d0e-abce-050a993e7532", "source": "crn://confluent.cloud/", "time": "2023-09-22T16:48:05.106656163Z", "type": "io.confluent.cloud/request" } ``` ```sql GetStatement ``` ```sql { "datacontenttype": "application/json", "data": { "serviceName": "crn://confluent.cloud/", "methodName": "GetStatement", "cloudResources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resourceId": "a56cf537-ab71-480e-b272-43e71531798b" }, { "type": "ENVIRONMENT", "resourceId": "env-9pjxk0" }, { "type": "FLINK_REGION", "resourceId": "aws.us-west-2" } ] }, "resource": { "type": "STATEMENT", "resourceId": "928c8647-582b-4d3b" } } ], "authenticationInfo": { "principal": { "confluentUser": { "resourceId": "u-21r8oo" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/identity-provider=Confluent/identity=u-21r8oo" }, "requestMetadata": { "requestId": [ "a688f8810ba426f39511c04f7b511a0a" ], "clientAddress": [ { "ip": "1.2.3.4" } ] }, "request": { "accessType": "READ_ONLY", "data": { "statement_name": "928c8647-582b-4d3b", "environment_id": "env-9pjxk0" } }, "result": { "status": "SUCCESS", "data": { "metadata": { "environment_id": "env-9pjxk0" }, "spec": { "compute_pool_id": "lfcp-stgccgjvgr1", "name": "928c8647-582b-4d3b", "principal": "u-21r8oo" } } }, "resourceName": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/environment=env-9pjxk0/flink-region=aws.us-west-2/statement=928c8647-582b-4d3b" }, "subject": "crn://confluent.cloud/organization=a56cf537-ab71-480e-b272-43e71531798b/environment=env-9pjxk0/flink-region=aws.us-west-2/statement=928c8647-582b-4d3b", "specversion": "1.0", "id": "f6f45075-3d85-4e41-8677-c06a80ef903e", "source": "crn://confluent.cloud/", 
"time": "2023-09-22T16:35:20.968310060Z", "type": "io.confluent.cloud/request" } ``` ```sql ListStatements ``` ```sql { "datacontenttype": "application/json", "data": { "serviceName": "crn://confluent.cloud/", "methodName": "ListStatements", "cloudResources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resourceId": "e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5" }, { "type": "ENVIRONMENT", "resourceId": "env-xx3gwz" }, { "type": "FLINK_REGION", "resourceId": "aws.eu-west-1" } ] }, "resource": { "type": "STATEMENT", "resourceId": "3ab9a756-4bcf-475b" } }, { "scope": { "resources": [ { "type": "ORGANIZATION", "resourceId": "e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5" }, { "type": "ENVIRONMENT", "resourceId": "env-xx3gwz" }, { "type": "FLINK_REGION", "resourceId": "aws.eu-west-1" } ] }, "resource": { "type": "STATEMENT", "resourceId": "e264b999-269c-46d6" } } ], "authenticationInfo": { "principal": { "confluentUser": { "resourceId": "u-devccq71mwp" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5/identity-provider=Confluent/identity=u-devccq71mwp" }, "requestMetadata": { "requestId": [ "7cec6a84ab0b05ccb38ecf14981da31b" ], "clientAddress": [ { "ip": "1.2.3.4" } ] }, "request": { "accessType": "READ_ONLY", "data": { "compute_pool_id": "", "environment_id": "env-xx3gwz", "page_size": 100 } }, "result": { "status": "SUCCESS", "data": { "data": [ { "metadata": { "environment_id": "env-xx3gwz" }, "spec": { "compute_pool_id": "lfcp-devcc36z5jj", "name": "3ab9a756-4bcf-475b", "principal": "u-rk1gy7" } }, { "metadata": { "environment_id": "env-xx3gwz" }, "spec": { "principal": "u-rk1gy7", "compute_pool_id": "lfcp-devcc36z5jj", "name": "e264b999-269c-46d6" } } ] } }, "resourceName": "crn://confluent.cloud/organization=e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5/environment=env-xx3gwz/flink-region=aws.eu-west-1" }, "subject": 
"crn://confluent.cloud/organization=e9eb4f2c-ef73-475c-ba7f-6b37a4ff00e5/environment=env-xx3gwz/flink-region=aws.eu-west-1", "specversion": "1.0", "id": "5e6bc2d3-9881-442b-af0c-a0a6aa127867", "source": "crn://confluent.cloud/", "time": "2023-09-22T16:47:00.894461118Z", "type": "io.confluent.cloud/request" } ``` ```sql UpdateStatement ``` ```sql { "datacontenttype": "application/json", "data": { "service_name": "crn://confluent.cloud/", "method_name": "UpdateStatement", "cloud_resources": [ { "scope": { "resources": [ { "type": "ORGANIZATION", "resource_id": "org-123" }, { "type": "ENVIRONMENT", "resource_id": "env-123" }, { "type": "FLINK_REGION", "resource_id": "aws.us-east-2" } ] }, "resource": { "type": "STATEMENT", "resource_id": "statement-123" } }, { "scope": { "resources": [ { "type": "ORGANIZATION", "resource_id": "org-123" }, { "type": "ENVIRONMENT", "resource_id": "env-123" }, { "type": "FLINK_REGION", "resource_id": "aws.us-east-2" } ] }, "resource": { "type": "STATEMENT", "resource_id": "statement-123" } } ], "authentication_info": { "exposure": "CUSTOMER", "principal": { "confluent_user": { "resource_id": "u-123" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=org-123/identity-provider=Confluent/identity=u-123" }, "request_metadata": { "request_id": [ "74726163656964303132333435363738" ], "client_address": [ { "ip": "127.0.0.1" } ] }, "request": { "access_type": "MODIFICATION", "data": { "environment_id": "env-123", "org_resource_id": "org-123", "spec": { "compute_pool_id": "lfcp-123", "name": "statement-123", "principal": "sa-123" } } }, "result": { "status": "SUCCESS", "data": { "metadata": { "environment_id": "env-123" }, "spec": { "compute_pool_id": "lfcp-123", "name": "statement-123", "principal": "sa-123" } } } } } ``` ```sql PatchStatement ``` ```sql { "datacontenttype": "application/json", "data": { "service_name": "crn://confluent.cloud/", "method_name": "PatchStatement", "cloud_resources": [ { "scope": { 
"resources": [ { "type": "ORGANIZATION", "resource_id": "org-123" }, { "type": "ENVIRONMENT", "resource_id": "env-123" }, { "type": "FLINK_REGION", "resource_id": "aws.us-east-2" } ] }, "resource": { "type": "STATEMENT", "resource_id": "statement-123" } }, { "scope": { "resources": [ { "type": "ORGANIZATION", "resource_id": "org-123" }, { "type": "ENVIRONMENT", "resource_id": "env-123" }, { "type": "FLINK_REGION", "resource_id": "aws.us-east-2" } ] }, "resource": { "type": "STATEMENT", "resource_id": "statement-123" } } ], "authentication_info": { "exposure": "CUSTOMER", "principal": { "confluent_user": { "resource_id": "u-123" } }, "result": "SUCCESS", "identity": "crn://confluent.cloud/organization=org-123/identity-provider=Confluent/identity=u-123" }, "request_metadata": { "request_id": [ "74726163656964303132333435363738" ], "client_address": [ { "ip": "127.0.0.1" } ] }, "request": { "access_type": "MODIFICATION", "data": { "environment_id": "env-123", "org_resource_id": "org-123", "statement_name": "statement-123" } }, "result": { "status": "SUCCESS" } } } ``` --- ### Query Encrypted Data with Flink & Confluent Cloud | Confluent Documentation Source: https://docs.confluent.io/cloud/current/security/encrypt/csfle/flink-integration.html Secure Stream Processing: Query Encrypted Data with Flink on Confluent Cloud¶ Processing sensitive data like personally identifiable information (PII) or financial records in real-time data streams presents a significant challenge. How do you perform meaningful operations like filtering, joining, or aggregating data while it remains fully encrypted and secure? Traditionally, you couldn’t. But with Client-Side Field Level Encryption (CSFLE) and deterministic encryption, Confluent Cloud for Flink gives you the power to query and process encrypted data streams directly, unlocking critical use cases while ensuring your data’s privacy and compliance. 
This combination lets you use the full capabilities of stream processing while your sensitive data remains protected from start to finish.

#### What is deterministic encryption?

At the core of this capability is deterministic encryption, a method in which encrypting the same plaintext value with the same key always produces the same ciphertext. This property lets Flink process the ciphertext directly: equality comparisons, joins, and groupings on the ciphertext give the same results as they would on the original data, without Flink ever decrypting it.

#### Supported operations on encrypted data

Flink does not decrypt data; it operates on raw bytes. It can therefore process the ciphertext that CSFLE produces, and because the encryption is deterministic, you can:

- Process non-encrypted fields in your data stream without any limitations.
- Run SQL queries that operate directly on your encrypted fields.

The key operations possible on deterministically encrypted columns are:

- Filtering and equality: Use encrypted fields in WHERE clauses for exact matches.
- Grouping and aggregation: Perform GROUP BY operations on encrypted fields. The only aggregation functions that work correctly are those based on uniqueness comparison, such as COUNT and COUNT(DISTINCT).
- Joins: Join multiple streams together using an encrypted column (for example, joining a stream of user activity to a stream of user profiles on an encrypted user ID).
- Window functions: Use comparison-based window functions like LEAD and LAG.

#### Example: Query on an encrypted column

Suppose you want to count the unique active users, identified by the deterministically encrypted email field:

SELECT COUNT(DISTINCT email_encrypted) FROM users_stream WHERE status = 'ACTIVE';

This example shows a common use case, counting unique active users, in which the encrypted email_encrypted field is compared for uniqueness without being decrypted. The same property is what allows an encrypted field to be grouped or filtered on.
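The property that makes these operations work can be illustrated outside Flink. The following is a minimal sketch, not the CSFLE implementation: it uses a keyed HMAC-SHA256 token as a stand-in for deterministic encryption, because the Python standard library has no AES-SIV primitive. Unlike real deterministic encryption, an HMAC token cannot be decrypted. The key, the `deterministic_token` helper, and the sample rows are all hypothetical and exist only for illustration.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical demo key, for illustration only.
DEMO_KEY = b"demo-key-for-illustration-only"


def deterministic_token(plaintext: str, key: bytes = DEMO_KEY) -> str:
    """Return a keyed, deterministic token for the given plaintext.

    Stand-in for deterministic encryption: the same input and key always
    give the same output. HMAC is one-way, so this token cannot be
    decrypted, which is where it differs from AES-SIV.
    """
    return hmac.new(key, plaintext.encode("utf-8"), hashlib.sha256).hexdigest()


# Plaintext rows as a producer would see them before protecting the email.
plain_rows = [
    ("ann@example.com", "ACTIVE"),
    ("bob@example.com", "ACTIVE"),
    ("ann@example.com", "ACTIVE"),
    ("cat@example.com", "INACTIVE"),
]

# Rows as the processing engine would see them: the email is an opaque token.
users_stream = [(deterministic_token(email), status) for email, status in plain_rows]

# Analogue of: SELECT COUNT(DISTINCT email_encrypted) ... WHERE status = 'ACTIVE'
active_unique = len({token for token, status in users_stream if status == "ACTIVE"})

# Analogue of: SELECT email_encrypted, COUNT(*) ... GROUP BY email_encrypted
rows_per_token = Counter(token for token, _ in users_stream)

print(active_unique)                    # 2
print(sorted(rows_per_token.values()))  # [1, 1, 2]
```

The distinct count and the per-group counts match what the same operations would return on the plaintext emails, even though the code that computes them only ever compares opaque tokens.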
#### Important limitations and trade-offs

This capability comes with two important considerations:

- Limited aggregation functions: Because the data’s actual value is never revealed to Flink, mathematical operations do not produce correct results. Aggregation functions like SUM, AVG, MIN, and MAX execute, but they operate on ciphertext and yield erroneous values.
- The deterministic trade-off: Deterministic encryption inherently reveals when two encrypted values are identical. This trade-off is what enables querying, but it exposes information that can be analyzed, such as how often a value occurs. Consider this carefully when deciding which fields to encrypt deterministically.

#### How it works

When you use CSFLE with Flink on Confluent Cloud, decryption happens only when an authorized client application that holds the decryption keys reads the data from a sink, like a database or materialized view. Flink processes the data without ever having access to the plaintext, so sensitive data is not exposed in plaintext even if the processing environment is compromised.

#### Powered by Google Tink

The CSFLE implementation uses the open-source Google Tink cryptographic library to perform deterministic encryption with the AES256_SIV algorithm.
For more information on Google Tink, see the following:

- I want to encrypt data deterministically
- Deterministic Authenticated Encryption with Associated Data
- Use Tink to meet FIPS 140-2 requirements

#### Related content

- Protect Sensitive Data Using Client-Side Field Level Encryption on Confluent Cloud
- Secure Stream Processing: Query Encrypted Data with Flink on Confluent Cloud

#### Code Examples

```sql
SELECT COUNT(DISTINCT email_encrypted) FROM users_stream WHERE status = 'ACTIVE';
```

---

### Query Tableflow Tables with Confluent Cloud for Apache Flink | Confluent Documentation

Source: https://docs.confluent.io/cloud/current/topics/tableflow/how-to-guides/query-engines/query-with-flink.html

Query Tableflow Tables with Flink in Confluent Cloud for Apache Flink®

Confluent Cloud for Apache Flink® supports snapshot queries that read data from a Tableflow-enabled topic at a specific point in time. Querying a Tableflow-enabled topic is similar to querying a Flink table. If Tableflow is enabled on a topic with Confluent Managed Storage, the query reads from both Kafka and Parquet. If Tableflow is enabled on a topic with custom storage, the query reads from your S3 bucket. This guide shows how to run a snapshot query on a Tableflow-enabled topic.

Note: Snapshot query is an Early Access Program feature in Confluent Cloud for Apache Flink. An Early Access feature is a component of Confluent Cloud introduced to gain feedback. This feature should be used only for evaluation and non-production testing purposes or to provide feedback to Confluent, particularly as it becomes more widely available in follow-on preview editions. Early Access Program features are intended for evaluation use in development and testing environments only, and not for production use. Early Access Program features are provided: (a) without support; (b) “AS IS”; and (c) without indemnification, warranty, or condition of any kind.
No service level commitment will apply to Early Access Program features. Early Access Program features are considered to be a Proof of Concept as defined in the Confluent Cloud Terms of Service. Confluent may discontinue providing preview releases of the Early Access Program features at any time in Confluent’s sole discretion.

#### Prerequisites

- Access to Confluent Cloud.
- The OrganizationAdmin, EnvironmentAdmin, or FlinkAdmin role for creating compute pools, or the FlinkDeveloper role if you already have a compute pool. If you don’t have the appropriate role, contact your OrganizationAdmin or EnvironmentAdmin. For more information, see Grant Role-Based Access in Confluent Cloud for Apache Flink.
- A provisioned Flink compute pool.

#### Step 1: Enable Tableflow on your topic

If you want to try querying a table with mock data, complete the steps in Run a Snapshot Query, then proceed to the next step.

If you want to query a table with mock data or data from your Kafka topic, and you want to use Confluent Managed Storage, complete the following steps, then proceed to Step 2: Run a snapshot query with Tableflow.

1. In Confluent Cloud Console, navigate to your cluster.
2. In the navigation menu, click Topics.
3. In the topics list, find your topic and click it to open the details page.
4. Click Enable Tableflow.
5. In the Enable Tableflow dialog, select Iceberg and click Use Confluent storage.

The topic status updates to Tableflow Syncing.

If you want to query a table with data from your Kafka topic, and you want to use custom storage, complete steps 1-4 in Tableflow Quick Start Using Your Storage and AWS Glue and proceed to Step 2: Run a snapshot query with Tableflow.

#### Step 2: Run a snapshot query with Tableflow

Once Tableflow is enabled on your topic, you can run a snapshot query on the table by using the same statements that you use for Flink tables.
In a Flink workspace or the Flink SQL shell, prepend your query with the following SET statement:

SET 'sql.snapshot.mode' = 'now';

Also, in the Flink workspace, you can change the Mode dropdown setting to Snapshot before running your query. For more information, see Run a Snapshot Query.

#### Related content

- Query with AWS
- Query with Snowflake
- Query with Trino
- Stream Processing with Confluent Cloud for Apache Flink

Note: This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.

#### Code Examples

```sql
SET 'sql.snapshot.mode' = 'now';
```

---

### confluent flink application create | Confluent Documentation

Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/application/confluent_flink_application_create.html

confluent flink application create Description Create a Flink application. confluent flink application create [flags] Flags --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "json" or "yaml". (default "json") Global Flags -h, --help Show help for this command.
--unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink application - Manage Flink applications. #### Code Examples ```sql confluent flink application create [flags] ``` ```sql --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "json" or "yaml". (default "json") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink application delete | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/application/confluent_flink_application_delete.html confluent flink application delete Description Delete one or more Flink applications. confluent flink application delete <name-1> [name-2] ... [name-n] [flags] Flags --environment string REQUIRED: Name of the Flink environment.
--url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. --force Skip the deletion confirmation prompt. Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink application - Manage Flink applications. #### Code Examples ```sql confluent flink application delete <name-1> [name-2] ... [name-n] [flags] ``` ```sql --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection.
Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. --force Skip the deletion confirmation prompt. ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink application describe | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/application/confluent_flink_application_describe.html confluent flink application describe Description Describe a Flink application. confluent flink application describe [flags] Flags --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "json" or "yaml". (default "json") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). 
See Also confluent flink application - Manage Flink applications. #### Code Examples ```sql confluent flink application describe [flags] ``` ```sql --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "json" or "yaml". (default "json") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink application list | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/application/confluent_flink_application_list.html confluent flink application list Description List Flink applications. confluent flink application list [flags] Flags --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. 
Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink application - Manage Flink applications. #### Code Examples ```sql confluent flink application list [flags] ``` ```sql --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". 
(default "human") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink application update | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/application/confluent_flink_application_update.html confluent flink application update Description Update a Flink application. confluent flink application update [flags] Flags --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "json" or "yaml". (default "json") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink application - Manage Flink applications. 
#### Code Examples ```sql confluent flink application update [flags] ``` ```sql --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "json" or "yaml". (default "json") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink application web-ui-forward | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/application/confluent_flink_application_web-ui-forward.html confluent flink application web-ui-forward Description Forward the web UI of a Flink application. confluent flink application web-ui-forward [flags] Flags --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. 
Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. --port uint16 Port to forward the web UI to. If not provided, a random, OS-assigned port will be used. Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink application - Manage Flink applications. #### Code Examples ```sql confluent flink application web-ui-forward [flags] ``` ```sql --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. --port uint16 Port to forward the web UI to. 
If not provided, a random, OS-assigned port will be used. ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink application | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/application/index.html confluent flink application Aliases application, app Description Manage Flink applications. Subcommands Command Description confluent flink application create Create a Flink application. confluent flink application delete Delete one or more Flink applications. confluent flink application describe Describe a Flink application. confluent flink application list List Flink applications. confluent flink application update Update a Flink application. confluent flink application web-ui-forward Forward the web UI of a Flink application. #### Code Examples ```sql application, app ``` --- ### confluent flink artifact create | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/artifact/confluent_flink_artifact_create.html confluent flink artifact create Description Create a Flink UDF artifact. confluent flink artifact create [flags] Flags --artifact-file string REQUIRED: Flink artifact JAR file or ZIP file. --cloud string REQUIRED: Specify the cloud provider as "aws", "azure", or "gcp". --region string REQUIRED: Cloud region for Flink (use "confluent flink region list" to see all). --environment string Environment ID. --runtime-language string Specify the Flink artifact runtime language as "python" or "java". (default "java") --description string Specify the Flink artifact description. --documentation-link string Specify the Flink artifact documentation link. --context string CLI context name. 
-o, --output string Specify the output format as "human", "json", or "yaml". (default "human") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). Examples Create Flink artifact “my-flink-artifact”. confluent flink artifact create my-flink-artifact --artifact-file artifact.jar --cloud aws --region us-west-2 --environment env-123456 Create Flink artifact “flink-java-artifact”. confluent flink artifact create my-flink-artifact --artifact-file artifact.jar --cloud aws --region us-west-2 --environment env-123456 --description flinkJavaScalar See Also confluent flink artifact - Manage Flink UDF artifacts. #### Code Examples ```sql confluent flink artifact create [flags] ``` ```sql --artifact-file string REQUIRED: Flink artifact JAR file or ZIP file. --cloud string REQUIRED: Specify the cloud provider as "aws", "azure", or "gcp". --region string REQUIRED: Cloud region for Flink (use "confluent flink region list" to see all). --environment string Environment ID. --runtime-language string Specify the Flink artifact runtime language as "python" or "java". (default "java") --description string Specify the Flink artifact description. --documentation-link string Specify the Flink artifact documentation link. --context string CLI context name. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). 
``` ```sql confluent flink artifact create my-flink-artifact --artifact-file artifact.jar --cloud aws --region us-west-2 --environment env-123456 ``` ```sql confluent flink artifact create my-flink-artifact --artifact-file artifact.jar --cloud aws --region us-west-2 --environment env-123456 --description flinkJavaScalar ``` --- ### confluent flink artifact delete | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/artifact/confluent_flink_artifact_delete.html confluent flink artifact delete Description Delete one or more Flink UDF artifacts. confluent flink artifact delete [id-2] ... [id-n] [flags] Flags --cloud string REQUIRED: Specify the cloud provider as "aws", "azure", or "gcp". --region string REQUIRED: Cloud region for Flink (use "confluent flink region list" to see all). --environment string Environment ID. --force Skip the deletion confirmation prompt. --context string CLI context name. Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). Examples Delete Flink UDF artifact. confluent flink artifact delete --cloud aws --region us-west-2 --environment env-123456 cfa-123456 See Also confluent flink artifact - Manage Flink UDF artifacts. #### Code Examples ```sql confluent flink artifact delete [id-2] ... [id-n] [flags] ``` ```sql --cloud string REQUIRED: Specify the cloud provider as "aws", "azure", or "gcp". --region string REQUIRED: Cloud region for Flink (use "confluent flink region list" to see all). --environment string Environment ID. --force Skip the deletion confirmation prompt. --context string CLI context name. ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. 
-v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` ```sql confluent flink artifact delete --cloud aws --region us-west-2 --environment env-123456 cfa-123456 ``` --- ### confluent flink artifact describe | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/artifact/confluent_flink_artifact_describe.html confluent flink artifact describe Description Describe a Flink UDF artifact. confluent flink artifact describe [flags] Flags --cloud string REQUIRED: Specify the cloud provider as "aws", "azure", or "gcp". --region string REQUIRED: Cloud region for Flink (use "confluent flink region list" to see all). --environment string Environment ID. --context string CLI context name. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). Examples Describe Flink UDF artifact. confluent flink artifact describe --cloud aws --region us-west-2 --environment env-123456 cfa-123456 See Also confluent flink artifact - Manage Flink UDF artifacts. #### Code Examples ```sql confluent flink artifact describe [flags] ``` ```sql --cloud string REQUIRED: Specify the cloud provider as "aws", "azure", or "gcp". --region string REQUIRED: Cloud region for Flink (use "confluent flink region list" to see all). --environment string Environment ID. --context string CLI context name. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. 
-v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` ```sql confluent flink artifact describe --cloud aws --region us-west-2 --environment env-123456 cfa-123456 ``` --- ### confluent flink artifact list | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/artifact/confluent_flink_artifact_list.html confluent flink artifact list Description List Flink UDF artifacts. confluent flink artifact list [flags] Flags --cloud string REQUIRED: Specify the cloud provider as "aws", "azure", or "gcp". --region string REQUIRED: Cloud region for Flink (use "confluent flink region list" to see all). --environment string Environment ID. --context string CLI context name. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). Examples List Flink UDF artifacts. confluent flink artifact list --cloud aws --region us-west-2 --environment env-123456 See Also confluent flink artifact - Manage Flink UDF artifacts. #### Code Examples ```sql confluent flink artifact list [flags] ``` ```sql --cloud string REQUIRED: Specify the cloud provider as "aws", "azure", or "gcp". --region string REQUIRED: Cloud region for Flink (use "confluent flink region list" to see all). --environment string Environment ID. --context string CLI context name. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. 
-v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` ```sql confluent flink artifact list --cloud aws --region us-west-2 --environment env-123456 ``` --- ### confluent flink artifact | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/artifact/index.html confluent flink artifact Description Manage Flink UDF artifacts. Subcommands Command Description confluent flink artifact create Create a Flink UDF artifact. confluent flink artifact delete Delete one or more Flink UDF artifacts. confluent flink artifact describe Describe a Flink UDF artifact. confluent flink artifact list List Flink UDF artifacts. --- ### confluent flink catalog create | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/catalog/confluent_flink_catalog_create.html confluent flink catalog create Description Create a Flink catalog in Confluent Platform that provides metadata about tables and other database objects such as views and functions. confluent flink catalog create [flags] Flags --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". 
(default "human") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink catalog - Manage Flink catalogs in Confluent Platform. #### Code Examples ```sql confluent flink catalog create [flags] ``` ```sql --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink catalog delete | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/catalog/confluent_flink_catalog_delete.html confluent flink catalog delete Description Delete one or more Flink catalogs in Confluent Platform. confluent flink catalog delete [name-2] ... 
[name-n] [flags] Flags --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. --force Skip the deletion confirmation prompt. Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink catalog - Manage Flink catalogs in Confluent Platform. #### Code Examples ```sql confluent flink catalog delete [name-2] ... [name-n] [flags] ``` ```sql --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. 
Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. --force Skip the deletion confirmation prompt. ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink catalog describe | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/catalog/confluent_flink_catalog_describe.html confluent flink catalog describe Description Describe a Flink catalog in Confluent Platform. confluent flink catalog describe [flags] Flags --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink catalog - Manage Flink catalogs in Confluent Platform. 
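As a sketch of how the `--url` flag and the `CONFLUENT_CMF_URL` variable relate for the `confluent flink catalog` commands: the helper below, with the hypothetical name `build_catalog_list_cmd`, omits `--url` when the variable is set and otherwise passes a URL explicitly. The fallback URL is an assumption made for illustration and is not a default of the CLI. The command is only printed, not run.

```shell
#!/bin/sh
# Hypothetical fallback URL for illustration; this is not a default of the CLI.
CMF_URL_FALLBACK="http://localhost:8080"

# Print a `confluent flink catalog list` invocation, adding --url only when
# CONFLUENT_CMF_URL is not already set in the environment.
build_catalog_list_cmd() {
  if [ -n "${CONFLUENT_CMF_URL:-}" ]; then
    echo "confluent flink catalog list --output json"
  else
    echo "confluent flink catalog list --url ${CMF_URL_FALLBACK} --output json"
  fi
}

build_catalog_list_cmd
```

Requesting `json` output rather than the default `human` format is a choice made here because structured output is easier to pass to other tools.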
#### Code Examples ```sql confluent flink catalog describe [flags] ``` ```sql --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink catalog list | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/catalog/confluent_flink_catalog_list.html confluent flink catalog list Description List Flink catalogs in Confluent Platform. confluent flink catalog list [flags] Flags --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. 
Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink catalog - Manage Flink catalogs in Confluent Platform. #### Code Examples ```sql confluent flink catalog list [flags] ``` ```sql --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). 
``` --- ### confluent flink catalog | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/catalog/index.html confluent flink catalog Description Manage Flink catalogs in Confluent Platform. Subcommands Command Description confluent flink catalog create Create a Flink catalog. confluent flink catalog delete Delete one or more Flink catalogs in Confluent Platform. confluent flink catalog describe Describe a Flink catalog in Confluent Platform. confluent flink catalog list List Flink catalogs in Confluent Platform. --- ### confluent flink compute-pool create | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/compute-pool/confluent_flink_compute-pool_create.html confluent flink compute-pool create Description Cloud Create a Flink compute pool. confluent flink compute-pool create [flags] On-Premises Create a Flink compute pool in Confluent Platform. confluent flink compute-pool create [flags] Flags Cloud --cloud string REQUIRED: Specify the cloud provider as "aws", "azure", or "gcp". --region string REQUIRED: Cloud region for Flink (use "confluent flink region list" to see all). --max-cfu int32 Maximum number of Confluent Flink Units (CFU). (default 5) --environment string Environment ID. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") On-Premises --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. 
Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). Examples Cloud Create Flink compute pool “my-compute-pool” in AWS with 5 CFUs. confluent flink compute-pool create my-compute-pool --cloud aws --region us-west-2 --max-cfu 5 On-Premises No examples. See Also confluent flink compute-pool - Manage Flink compute pools. #### Code Examples ```sql confluent flink compute-pool create [flags] ``` ```sql confluent flink compute-pool create [flags] ``` ```sql --cloud string REQUIRED: Specify the cloud provider as "aws", "azure", or "gcp". --region string REQUIRED: Cloud region for Flink (use "confluent flink region list" to see all). --max-cfu int32 Maximum number of Confluent Flink Units (CFU). (default 5) --environment string Environment ID. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. 
Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` ```sql confluent flink compute-pool create my-compute-pool --cloud aws --region us-west-2 --max-cfu 5 ``` --- ### confluent flink compute-pool delete | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/compute-pool/confluent_flink_compute-pool_delete.html confluent flink compute-pool delete Description Cloud Delete one or more Flink compute pools. confluent flink compute-pool delete [id-2] ... [id-n] [flags] On-Premises Delete one or more Flink compute pools in Confluent Platform. A compute pool can only be deleted if there are no statements associated with it. confluent flink compute-pool delete [name-2] ... [name-n] [flags] Flags Cloud --force Skip the deletion confirmation prompt. --environment string Environment ID. --context string CLI context name. On-Premises --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink.
Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. --force Skip the deletion confirmation prompt. Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink compute-pool - Manage Flink compute pools. #### Code Examples ```sql confluent flink compute-pool delete [id-2] ... [id-n] [flags] ``` ```sql confluent flink compute-pool delete [name-2] ... [name-n] [flags] ``` ```sql --force Skip the deletion confirmation prompt. --environment string Environment ID. --context string CLI context name. ``` ```sql --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. --force Skip the deletion confirmation prompt. ``` ```sql -h, --help Show help for this command. 
--unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink compute-pool describe | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/compute-pool/confluent_flink_compute-pool_describe.html confluent flink compute-pool describe Description Cloud Describe a Flink compute pool. confluent flink compute-pool describe [id] [flags] On-Premises Describe a Flink compute pool in Confluent Platform. confluent flink compute-pool describe [flags] Flags Cloud --environment string Environment ID. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") On-Premises --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. 
-v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink compute-pool - Manage Flink compute pools. #### Code Examples ```sql confluent flink compute-pool describe [id] [flags] ``` ```sql confluent flink compute-pool describe [flags] ``` ```sql --environment string Environment ID. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink compute-pool list | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/compute-pool/confluent_flink_compute-pool_list.html confluent flink compute-pool list Description Cloud List Flink compute pools. 
confluent flink compute-pool list [flags] On-Premises List Flink compute pools in Confluent Platform. confluent flink compute-pool list [flags] Flags Cloud --region string Cloud region for Flink (use "confluent flink region list" to see all). --environment string Environment ID. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") On-Premises --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink compute-pool - Manage Flink compute pools. #### Code Examples ```sql confluent flink compute-pool list [flags] ``` ```sql confluent flink compute-pool list [flags] ``` ```sql --region string Cloud region for Flink (use "confluent flink region list" to see all). --environment string Environment ID. -o, --output string Specify the output format as "human", "json", or "yaml". 
(default "human") ``` ```sql --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink compute-pool unset | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/compute-pool/confluent_flink_compute-pool_unset.html confluent flink compute-pool unset Description Unset the current Flink compute pool that was set with the use command. confluent flink compute-pool unset [flags] Flags -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). 
Examples Unset default compute pool: confluent flink compute-pool unset See Also confluent flink compute-pool - Manage Flink compute pools. #### Code Examples ```sql confluent flink compute-pool unset [flags] ``` ```sql -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` ```sql confluent flink compute-pool unset ``` --- ### confluent flink compute-pool update | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/compute-pool/confluent_flink_compute-pool_update.html confluent flink compute-pool update Description Update a Flink compute pool. confluent flink compute-pool update [id] [flags] Flags --name string Name of the compute pool. --max-cfu int32 Maximum number of Confluent Flink Units (CFU). --environment string Environment ID. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). Examples Update name and CFU count of a Flink compute pool. confluent flink compute-pool update lfcp-123456 --name "new name" --max-cfu 5 See Also confluent flink compute-pool - Manage Flink compute pools. #### Code Examples ```sql confluent flink compute-pool update [id] [flags] ``` ```sql --name string Name of the compute pool. --max-cfu int32 Maximum number of Confluent Flink Units (CFU). --environment string Environment ID. -o, --output string Specify the output format as "human", "json", or "yaml". 
(default "human") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` ```sql confluent flink compute-pool update lfcp-123456 --name "new name" --max-cfu 5 ``` --- ### confluent flink compute-pool use | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/compute-pool/confluent_flink_compute-pool_use.html confluent flink compute-pool use Description Choose a Flink compute pool to be used in subsequent commands which support passing a compute pool with the --compute-pool flag. confluent flink compute-pool use [flags] Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink compute-pool - Manage Flink compute pools. #### Code Examples ```sql confluent flink compute-pool use [flags] ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink compute-pool | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/compute-pool/index.html confluent flink compute-pool Description Manage Flink compute pools. Subcommands Cloud Command Description confluent flink compute-pool create Create a Flink compute pool. confluent flink compute-pool delete Delete one or more Flink compute pools. confluent flink compute-pool describe Describe a Flink compute pool.
confluent flink compute-pool list List Flink compute pools. confluent flink compute-pool unset Unset the current Flink compute pool. confluent flink compute-pool update Update a Flink compute pool. confluent flink compute-pool use Use a Flink compute pool in subsequent commands. On-Premises Command Description confluent flink compute-pool create Create a Flink compute pool in Confluent Platform. confluent flink compute-pool delete Delete one or more Flink compute pools. confluent flink compute-pool describe Describe a Flink compute pool in Confluent Platform. confluent flink compute-pool list List Flink compute pools in Confluent Platform. --- ### confluent flink shell | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/confluent_flink_shell.html confluent flink shell Description Start Flink interactive SQL client. confluent flink shell [flags] Flags --compute-pool string Flink compute pool ID. --service-account string Service account ID. --database string The database which will be used as the default database. When using Kafka, this is the cluster ID. --environment string Environment ID. --context string CLI context name. Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). Examples For a Quick Start with examples in context, see https://docs.confluent.io/cloud/current/flink/get-started/quick-start-shell.html. See Also confluent flink - Manage Apache Flink. #### Code Examples ```sql confluent flink shell [flags] ``` ```sql --compute-pool string Flink compute pool ID. --service-account string Service account ID. --database string The database which will be used as the default database. When using Kafka, this is the cluster ID. --environment string Environment ID. 
--context string CLI context name. ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink connection create | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/connection/confluent_flink_connection_create.html confluent flink connection create Description Create a Flink connection. confluent flink connection create [flags] Flags --cloud string REQUIRED: Specify the cloud provider as "aws", "azure", or "gcp". --region string REQUIRED: Cloud region for Flink (use "confluent flink region list" to see all). --type string REQUIRED: Specify the connection type as "openai", "azureml", "azureopenai", "bedrock", "sagemaker", "googleai", "vertexai", "mongodb", "elastic", "pinecone", "couchbase", "confluent_jdbc", "rest", or "mcp_server". --endpoint string REQUIRED: Specify endpoint for the connection. --api-key string Specify API key for the type: "openai", "azureml", "azureopenai", "googleai", "elastic", "pinecone", or "mcp_server". --aws-access-key string Specify access key for the type: "bedrock" or "sagemaker". --aws-secret-key string Specify secret key for the type: "bedrock" or "sagemaker". --aws-session-token string Specify session token for the type: "bedrock" or "sagemaker". --service-key string Specify service key for the type: "vertexai". --username string Specify username for the type: "mongodb", "couchbase", "confluent_jdbc", or "rest". --password string Specify password for the type: "mongodb", "couchbase", "confluent_jdbc", or "rest". --token string Specify bearer token for the type: "rest" or "mcp_server". --token-endpoint string Specify OAuth2 token endpoint for the type: "rest" or "mcp_server". 
--client-id string Specify OAuth2 client ID for the type: "rest" or "mcp_server". --client-secret string Specify OAuth2 client secret for the type: "rest" or "mcp_server". --scope string Specify OAuth2 scope for the type: "rest" or "mcp_server". --sse-endpoint string Specify SSE endpoint for the type: "mcp_server". --transport-type string Specify transport type for the type: "mcp_server". Default: SSE. --environment string Environment ID. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). Examples Create Flink connection “my-connection” in AWS us-west-2 for OpenAI with endpoint and API key. confluent flink connection create my-connection --cloud aws --region us-west-2 --type openai --endpoint https://api.openai.com/v1/chat/completions --api-key 0000000000000000 See Also confluent flink connection - Manage Flink connections. #### Code Examples ```sql confluent flink connection create [flags] ``` ```sql --cloud string REQUIRED: Specify the cloud provider as "aws", "azure", or "gcp". --region string REQUIRED: Cloud region for Flink (use "confluent flink region list" to see all). --type string REQUIRED: Specify the connection type as "openai", "azureml", "azureopenai", "bedrock", "sagemaker", "googleai", "vertexai", "mongodb", "elastic", "pinecone", "couchbase", "confluent_jdbc", "rest", or "mcp_server". --endpoint string REQUIRED: Specify endpoint for the connection. --api-key string Specify API key for the type: "openai", "azureml", "azureopenai", "googleai", "elastic", "pinecone", or "mcp_server". --aws-access-key string Specify access key for the type: "bedrock" or "sagemaker".
--aws-secret-key string Specify secret key for the type: "bedrock" or "sagemaker". --aws-session-token string Specify session token for the type: "bedrock" or "sagemaker". --service-key string Specify service key for the type: "vertexai". --username string Specify username for the type: "mongodb", "couchbase", "confluent_jdbc", or "rest". --password string Specify password for the type: "mongodb", "couchbase", "confluent_jdbc", or "rest". --token string Specify bearer token for the type: "rest" or "mcp_server". --token-endpoint string Specify OAuth2 token endpoint for the type: "rest" or "mcp_server". --client-id string Specify OAuth2 client ID for the type: "rest" or "mcp_server". --client-secret string Specify OAuth2 client secret for the type: "rest" or "mcp_server". --scope string Specify OAuth2 scope for the type: "rest" or "mcp_server". --sse-endpoint string Specify SSE endpoint for the type: "mcp_server". --transport-type string Specify transport type for the type: "mcp_server". Default: SSE. --environment string Environment ID. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` ```sql confluent flink connection create my-connection --cloud aws --region us-west-2 --type openai --endpoint https://api.openai.com/v1/chat/completions --api-key 0000000000000000 ``` --- ### confluent flink connection delete | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/connection/confluent_flink_connection_delete.html confluent flink connection delete Description Delete one or more Flink connections. confluent flink connection delete [name-2] ... 
[name-n] [flags] Flags --cloud string REQUIRED: Specify the cloud provider as "aws", "azure", or "gcp". --region string REQUIRED: Cloud region for Flink (use "confluent flink region list" to see all). --force Skip the deletion confirmation prompt. --environment string Environment ID. --context string CLI context name. Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink connection - Manage Flink connections. #### Code Examples ```sql confluent flink connection delete [name-2] ... [name-n] [flags] ``` ```sql --cloud string REQUIRED: Specify the cloud provider as "aws", "azure", or "gcp". --region string REQUIRED: Cloud region for Flink (use "confluent flink region list" to see all). --force Skip the deletion confirmation prompt. --environment string Environment ID. --context string CLI context name. ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink connection describe | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/connection/confluent_flink_connection_describe.html confluent flink connection describe Description Describe a Flink connection. confluent flink connection describe [flags] Flags --cloud string REQUIRED: Specify the cloud provider as "aws", "azure", or "gcp". --region string REQUIRED: Cloud region for Flink (use "confluent flink region list" to see all). --environment string Environment ID. -o, --output string Specify the output format as "human", "json", or "yaml". 
(default "human") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink connection - Manage Flink connections. #### Code Examples ```sql confluent flink connection describe [flags] ``` ```sql --cloud string REQUIRED: Specify the cloud provider as "aws", "azure", or "gcp". --region string REQUIRED: Cloud region for Flink (use "confluent flink region list" to see all). --environment string Environment ID. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink connection list | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/connection/confluent_flink_connection_list.html confluent flink connection list Description List Flink connections. confluent flink connection list [flags] Flags --cloud string REQUIRED: Specify the cloud provider as "aws", "azure", or "gcp". --region string REQUIRED: Cloud region for Flink (use "confluent flink region list" to see all). --environment string Environment ID. --type string Specify the connection type as "openai", "azureml", "azureopenai", "bedrock", "sagemaker", "googleai", "vertexai", "mongodb", "elastic", "pinecone", "couchbase", "confluent_jdbc", "rest", or "mcp_server". -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") Global Flags -h, --help Show help for this command. 
--unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink connection - Manage Flink connections. #### Code Examples ```sql confluent flink connection list [flags] ``` ```sql --cloud string REQUIRED: Specify the cloud provider as "aws", "azure", or "gcp". --region string REQUIRED: Cloud region for Flink (use "confluent flink region list" to see all). --environment string Environment ID. --type string Specify the connection type as "openai", "azureml", "azureopenai", "bedrock", "sagemaker", "googleai", "vertexai", "mongodb", "elastic", "pinecone", "couchbase", "confluent_jdbc", "rest", or "mcp_server". -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink connection update | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/connection/confluent_flink_connection_update.html confluent flink connection update Description Update a Flink connection. Only the secret can be updated. confluent flink connection update [flags] Flags --cloud string REQUIRED: Specify the cloud provider as "aws", "azure", or "gcp". --region string REQUIRED: Cloud region for Flink (use "confluent flink region list" to see all). --api-key string Specify API key for the type: "openai", "azureml", "azureopenai", "googleai", "elastic", "pinecone", or "mcp_server". --aws-access-key string Specify access key for the type: "bedrock" or "sagemaker". --aws-secret-key string Specify secret key for the type: "bedrock" or "sagemaker".
--aws-session-token string Specify session token for the type: "bedrock" or "sagemaker". --service-key string Specify service key for the type: "vertexai". --username string Specify username for the type: "mongodb", "couchbase", "confluent_jdbc", or "rest". --password string Specify password for the type: "mongodb", "couchbase", "confluent_jdbc", or "rest". --token string Specify bearer token for the type: "rest" or "mcp_server". --token-endpoint string Specify OAuth2 token endpoint for the type: "rest" or "mcp_server". --client-id string Specify OAuth2 client ID for the type: "rest" or "mcp_server". --client-secret string Specify OAuth2 client secret for the type: "rest" or "mcp_server". --scope string Specify OAuth2 scope for the type: "rest" or "mcp_server". --sse-endpoint string Specify SSE endpoint for the type: "mcp_server". --transport-type string Specify transport type for the type: "mcp_server". Default: SSE. --environment string Environment ID. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). Examples Update API key of Flink connection “my-connection”. confluent flink connection update my-connection --cloud aws --region us-west-2 --api-key new-key See Also confluent flink connection - Manage Flink connections. #### Code Examples ```sql confluent flink connection update [flags] ``` ```sql --cloud string REQUIRED: Specify the cloud provider as "aws", "azure", or "gcp". --region string REQUIRED: Cloud region for Flink (use "confluent flink region list" to see all). --api-key string Specify API key for the type: "openai", "azureml", "azureopenai", "googleai", "elastic", "pinecone", or "mcp_server". 
--aws-access-key string Specify access key for the type: "bedrock" or "sagemaker". --aws-secret-key string Specify secret key for the type: "bedrock" or "sagemaker". --aws-session-token string Specify session token for the type: "bedrock" or "sagemaker". --service-key string Specify service key for the type: "vertexai". --username string Specify username for the type: "mongodb", "couchbase", "confluent_jdbc", or "rest". --password string Specify password for the type: "mongodb", "couchbase", "confluent_jdbc", or "rest". --token string Specify bearer token for the type: "rest" or "mcp_server". --token-endpoint string Specify OAuth2 token endpoint for the type: "rest" or "mcp_server". --client-id string Specify OAuth2 client ID for the type: "rest" or "mcp_server". --client-secret string Specify OAuth2 client secret for the type: "rest" or "mcp_server". --scope string Specify OAuth2 scope for the type: "rest" or "mcp_server". --sse-endpoint string Specify SSE endpoint for the type: "mcp_server". --transport-type string Specify transport type for the type: "mcp_server". Default: SSE. --environment string Environment ID. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` ```sql confluent flink connection update my-connection --cloud aws --region us-west-2 --api-key new-key ``` --- ### confluent flink connection | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/connection/index.html confluent flink connection Description Manage Flink connections. Subcommands Command Description confluent flink connection create Create a Flink connection. 
confluent flink connection delete Delete one or more Flink connections. confluent flink connection describe Describe a Flink connection. confluent flink connection list List Flink connections. confluent flink connection update Update a Flink connection. Only the secret can be updated. --- ### confluent flink connectivity-type use | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/connectivity-type/confluent_flink_connectivity-type_use.html confluent flink connectivity-type use Description Select a Flink connectivity type for the current environment as “public” or “private”. If unspecified, the CLI defaults to the public connectivity type. confluent flink connectivity-type use [flags] Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink connectivity-type - Manage Flink connectivity type. #### Code Examples ```sql confluent flink connectivity-type use [flags] ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink connectivity-type | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/connectivity-type/index.html confluent flink connectivity-type Description Manage Flink connectivity type. Subcommands Command Description confluent flink connectivity-type use Select a Flink connectivity type.
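The connectivity-type description above allows only “public” or “private” and falls back to public when nothing is specified. A minimal sketch of that rule as a dry-run wrapper follows. The function name `connectivity_cmd` is hypothetical and not part of the Confluent CLI, and the sketch assumes the type is passed to `use` as a positional argument, which the synopsis above does not spell out. It only prints the command so it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Confluent CLI): validate a connectivity
# type and print the command that would be run, without executing it.
# Assumption: the type is given to `use` as a positional argument.
connectivity_cmd() {
  conn_type="${1:-public}"   # mirror the documented default of "public"
  case "$conn_type" in
    public|private)
      printf 'confluent flink connectivity-type use %s\n' "$conn_type"
      ;;
    *)
      printf 'unsupported connectivity type: %s\n' "$conn_type" >&2
      return 1
      ;;
  esac
}

# Print the command for review; run it manually once the value looks right.
connectivity_cmd private
```

Printing rather than executing keeps the sketch safe to run on a machine without the CLI installed, and any value other than the two documented types is rejected with a non-zero status.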
--- ### confluent flink endpoint list | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/endpoint/confluent_flink_endpoint_list.html confluent flink endpoint list Description List Flink endpoint. confluent flink endpoint list [flags] Flags --context string CLI context name. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). Examples List the available Flink endpoints with current cloud provider and region. confluent flink endpoint list See Also confluent flink endpoint - Manage Flink endpoint. #### Code Examples ```sql confluent flink endpoint list [flags] ``` ```sql --context string CLI context name. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` ```sql confluent flink endpoint list ``` --- ### confluent flink endpoint unset | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/endpoint/confluent_flink_endpoint_unset.html confluent flink endpoint unset Description Unset the current Flink endpoint that was previously set with the use command. confluent flink endpoint unset [flags] Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. 
-v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). Examples Unset the current Flink endpoint “https://flink.us-east-1.aws.confluent.cloud”. confluent flink endpoint unset See Also confluent flink endpoint - Manage Flink endpoint. #### Code Examples ```sql confluent flink endpoint unset [flags] ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` ```sql confluent flink endpoint unset ``` --- ### confluent flink endpoint use | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/endpoint/confluent_flink_endpoint_use.html confluent flink endpoint use Description Use a Flink endpoint as the active endpoint for all subsequent Flink dataplane commands in the current environment, such as flink connection, flink statement, and flink shell. confluent flink endpoint use [flags] Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). Examples Use “https://flink.us-east-1.aws.confluent.cloud” for subsequent Flink dataplane commands. confluent flink endpoint use "https://flink.us-east-1.aws.confluent.cloud" See Also confluent flink endpoint - Manage Flink endpoint. #### Code Examples ```sql confluent flink endpoint use [flags] ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets.
-v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` ```sql confluent flink endpoint use "https://flink.us-east-1.aws.confluent.cloud" ``` --- ### confluent flink endpoint | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/endpoint/index.html confluent flink endpoint Description Manage Flink endpoint. Subcommands Command Description confluent flink endpoint list List Flink endpoint. confluent flink endpoint unset Unset the current Flink endpoint. confluent flink endpoint use Use a Flink endpoint. --- ### confluent flink environment create | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/environment/confluent_flink_environment_create.html confluent flink environment create Description Create a Flink environment. confluent flink environment create [flags] Flags --kubernetes-namespace string REQUIRED: Kubernetes namespace to deploy Flink applications to. --defaults string JSON string defining the environment's Flink application defaults, or path to a file to read defaults from (with .yml, .yaml or .json extension). --statement-defaults string JSON string defining the environment's Flink statement defaults, or path to a file to read defaults from (with .yml, .yaml or .json extension). --compute-pool-defaults string JSON string defining the environment's Flink compute pool defaults, or path to a file to read defaults from (with .yml, .yaml or .json extension). --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. 
Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink environment - Manage Flink environments. #### Code Examples ```sql confluent flink environment create [flags] ``` ```sql --kubernetes-namespace string REQUIRED: Kubernetes namespace to deploy Flink applications to. --defaults string JSON string defining the environment's Flink application defaults, or path to a file to read defaults from (with .yml, .yaml or .json extension). --statement-defaults string JSON string defining the environment's Flink statement defaults, or path to a file to read defaults from (with .yml, .yaml or .json extension). --compute-pool-defaults string JSON string defining the environment's Flink compute pool defaults, or path to a file to read defaults from (with .yml, .yaml or .json extension). --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. 
--certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink environment delete | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/environment/confluent_flink_environment_delete.html confluent flink environment delete Description Delete one or more Flink environments. confluent flink environment delete [name-2] ... [name-n] [flags] Flags --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. --force Skip the deletion confirmation prompt. Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. 
-v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink environment - Manage Flink environments. #### Code Examples ```sql confluent flink environment delete [name-2] ... [name-n] [flags] ``` ```sql --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. --force Skip the deletion confirmation prompt. ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink environment describe | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/environment/confluent_flink_environment_describe.html confluent flink environment describe Description Describe a Flink environment. confluent flink environment describe [flags] Flags --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. 
Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink environment - Manage Flink environments. #### Code Examples ```sql confluent flink environment describe [flags] ``` ```sql --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql -h, --help Show help for this command. 
--unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink environment list | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/environment/confluent_flink_environment_list.html confluent flink environment list Description List Flink environments. confluent flink environment list [flags] Flags --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink environment - Manage Flink environments. #### Code Examples ```sql confluent flink environment list [flags] ``` ```sql --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. 
--client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink environment update | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/environment/confluent_flink_environment_update.html confluent flink environment update Description Update a Flink environment. confluent flink environment update [flags] Flags --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. 
Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. --defaults string JSON string defining the environment's Flink application defaults, or path to a file to read defaults from (with .yml, .yaml or .json extension). --statement-defaults string JSON string defining the environment's Flink statement defaults, or path to a file to read defaults from (with .yml, .yaml or .json extension). --compute-pool-defaults string JSON string defining the environment's Flink compute pool defaults, or path to a file to read defaults from (with .yml, .yaml or .json extension). -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink environment - Manage Flink environments. #### Code Examples ```sql confluent flink environment update [flags] ``` ```sql --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. 
--defaults string JSON string defining the environment's Flink application defaults, or path to a file to read defaults from (with .yml, .yaml or .json extension). --statement-defaults string JSON string defining the environment's Flink statement defaults, or path to a file to read defaults from (with .yml, .yaml or .json extension). --compute-pool-defaults string JSON string defining the environment's Flink compute pool defaults, or path to a file to read defaults from (with .yml, .yaml or .json extension). -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink environment | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/environment/index.html confluent flink environment Aliases environment, env Description Manage Flink environments. Subcommands Command Description confluent flink environment create Create a Flink environment. confluent flink environment delete Delete one or more Flink environments. confluent flink environment describe Describe a Flink environment. confluent flink environment list List Flink environments. confluent flink environment update Update a Flink environment. --- ### confluent flink | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/index.html confluent flink Description Manage Apache Flink. Subcommands Cloud Command Description confluent flink artifact Manage Flink UDF artifacts. confluent flink compute-pool Manage Flink compute pools. confluent flink connection Manage Flink connections.
confluent flink connectivity-type Manage Flink connectivity type. confluent flink endpoint Manage Flink endpoint. confluent flink region Manage Flink regions. confluent flink shell Start Flink interactive SQL client. confluent flink statement Manage Flink SQL statements. On-Premises Command Description confluent flink application Manage Flink applications. confluent flink catalog Manage Flink catalogs in Confluent Platform. confluent flink compute-pool Manage Flink compute pools. confluent flink environment Manage Flink environments. confluent flink statement Manage Flink SQL statements. --- ### confluent flink region list | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/region/confluent_flink_region_list.html confluent flink region list Description List Flink regions. confluent flink region list [flags] Flags --cloud string Specify the cloud provider as "aws", "azure", or "gcp". --context string CLI context name. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). Examples List the available Flink AWS regions. confluent flink region list --cloud aws See Also confluent flink region - Manage Flink regions. #### Code Examples ```sql confluent flink region list [flags] ``` ```sql --cloud string Specify the cloud provider as "aws", "azure", or "gcp". --context string CLI context name. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. 
-v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` ```sql confluent flink region list --cloud aws ``` --- ### confluent flink region unset | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/region/confluent_flink_region_unset.html confluent flink region unset Description Unset the current Flink cloud and region that was set with the use command. confluent flink region unset [flags] Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). Examples Unset the current Flink region us-west-1 with cloud provider = AWS. confluent flink region unset See Also confluent flink region - Manage Flink regions. #### Code Examples ```sql confluent flink region unset [flags] ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` ```sql confluent flink region unset ``` --- ### confluent flink region use | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/region/confluent_flink_region_use.html confluent flink region use Description Choose a Flink region to be used in subsequent commands which support passing a region with the --region flag. confluent flink region use [flags] Flags --cloud string REQUIRED: Specify the cloud provider as "aws", "azure", or "gcp". --region string REQUIRED: Cloud region for Flink (use "confluent flink region list" to see all). Global Flags -h, --help Show help for this command. 
--unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). Examples Select region “N. Virginia (us-east-1)” for use in subsequent Flink commands. confluent flink region use --cloud aws --region us-east-1 See Also confluent flink region - Manage Flink regions. #### Code Examples ```sql confluent flink region use [flags] ``` ```sql --cloud string REQUIRED: Specify the cloud provider as "aws", "azure", or "gcp". --region string REQUIRED: Cloud region for Flink (use "confluent flink region list" to see all). ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` ```sql confluent flink region use --cloud aws --region us-east-1 ``` --- ### confluent flink region | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/region/index.html confluent flink region Description Manage Flink regions. Subcommands Command Description confluent flink region list List Flink regions. confluent flink region unset Unset the current Flink cloud and region. confluent flink region use Use a Flink region in subsequent commands. --- ### confluent flink statement create | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/statement/confluent_flink_statement_create.html confluent flink statement create Description Cloud Create a Flink SQL statement. confluent flink statement create [name] [flags] On-Premises Create a Flink SQL statement in Confluent Platform. confluent flink statement create [name] [flags] Flags Cloud --sql string REQUIRED: The Flink SQL statement. --compute-pool string Flink compute pool ID. 
--service-account string Service account ID. --database string The database which will be used as the default database. When using Kafka, this is the cluster ID. --wait Block until the statement is running or has failed. --property strings A mechanism to pass properties in the form key=value when creating a Flink statement. --environment string Environment ID. --context string CLI context name. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") On-Premises --sql string REQUIRED: The Flink SQL statement. --environment string REQUIRED: Name of the Flink environment. --compute-pool string REQUIRED: The compute pool name to execute the Flink SQL statement. --parallelism uint16 The parallelism of the statement. (default 1) --catalog string The name of the default catalog. --database string The name of the default database. --flink-configuration string The file path to hold the Flink configuration for the statement. --wait Boolean flag to block until the statement is running or has failed. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") Global Flags -h, --help Show help for this command.
--unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). Examples Cloud Create a Flink SQL statement in the current compute pool. confluent flink statement create --sql "SELECT * FROM table;" Create a Flink SQL statement named “my-statement” in compute pool “lfcp-123456” with service account “sa-123456”, using Kafka cluster “my-cluster” as the default database, and with additional properties. confluent flink statement create my-statement --sql "SELECT * FROM my-topic;" --compute-pool lfcp-123456 --service-account sa-123456 --database my-cluster --property property1=value1,property2=value2 On-Premises No examples. See Also confluent flink statement - Manage Flink SQL statements. #### Code Examples ```sql confluent flink statement create [name] [flags] ``` ```sql confluent flink statement create [name] [flags] ``` ```sql --sql string REQUIRED: The Flink SQL statement. --compute-pool string Flink compute pool ID. --service-account string Service account ID. --database string The database which will be used as the default database. When using Kafka, this is the cluster ID. --wait Block until the statement is running or has failed. --property strings A mechanism to pass properties in the form key=value when creating a Flink statement. --environment string Environment ID. --context string CLI context name. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql --sql string REQUIRED: The Flink SQL statement. --environment string REQUIRED: Name of the Flink environment. --compute-pool string REQUIRED: The compute pool name to execute the Flink SQL statement. --parallelism uint16 The parallelism of the statement. (default 1) --catalog string The name of the default catalog. --database string The name of the default database.
--flink-configuration string The file path to hold the Flink configuration for the statement. --wait Boolean flag to block until the statement is running or has failed. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` ```sql confluent flink statement create --sql "SELECT * FROM table;" ``` ```sql confluent flink statement create my-statement --sql "SELECT * FROM my-topic;" --compute-pool lfcp-123456 --service-account sa-123456 --database my-cluster --property property1=value1,property2=value2 ``` --- ### confluent flink statement delete | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/statement/confluent_flink_statement_delete.html confluent flink statement delete Description Cloud Delete one or more Flink SQL statements. confluent flink statement delete [name-2] ... 
[name-n] [flags] On-Premises Delete one or more Flink SQL statements in Confluent Platform. confluent flink statement delete [name-2] ... [name-n] [flags] Flags Cloud --cloud string Specify the cloud provider as "aws", "azure", or "gcp". --region string Cloud region for Flink (use "confluent flink region list" to see all). --force Skip the deletion confirmation prompt. --environment string Environment ID. --context string CLI context name. On-Premises --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. --force Skip the deletion confirmation prompt. Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink statement - Manage Flink SQL statements. #### Code Examples ```sql confluent flink statement delete [name-2] ... [name-n] [flags] ``` ```sql confluent flink statement delete [name-2] ... [name-n] [flags] ``` ```sql --cloud string Specify the cloud provider as "aws", "azure", or "gcp". 
--region string Cloud region for Flink (use "confluent flink region list" to see all). --force Skip the deletion confirmation prompt. --environment string Environment ID. --context string CLI context name. ``` ```sql --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. --force Skip the deletion confirmation prompt. ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink statement describe | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/statement/confluent_flink_statement_describe.html confluent flink statement describe Description Cloud Describe a Flink SQL statement. confluent flink statement describe [flags] On-Premises Describe a Flink SQL statement in Confluent Platform. confluent flink statement describe [name] [flags] Flags Cloud --cloud string Specify the cloud provider as "aws", "azure", or "gcp". --region string Cloud region for Flink (use "confluent flink region list" to see all). 
--environment string Environment ID. --context string CLI context name. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") On-Premises --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink statement - Manage Flink SQL statements. #### Code Examples ```sql confluent flink statement describe [flags] ``` ```sql confluent flink statement describe [name] [flags] ``` ```sql --cloud string Specify the cloud provider as "aws", "azure", or "gcp". --region string Cloud region for Flink (use "confluent flink region list" to see all). --environment string Environment ID. --context string CLI context name. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql --environment string REQUIRED: Name of the Flink environment. 
--url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink statement list | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/statement/confluent_flink_statement_list.html confluent flink statement list Description Cloud List Flink SQL statements. confluent flink statement list [flags] On-Premises List Flink SQL statements in Confluent Platform. confluent flink statement list [flags] Flags Cloud --cloud string Specify the cloud provider as "aws", "azure", or "gcp". --region string Cloud region for Flink (use "confluent flink region list" to see all). --compute-pool string Flink compute pool ID. --environment string Environment ID. --context string CLI context name. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") --status string Filter the results by statement status. 
On-Premises --environment string REQUIRED: Name of the Flink environment. --compute-pool string Optional flag to filter the Flink statements by compute pool ID. --status string Optional flag to filter the Flink statements by statement status. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). Examples Cloud List running statements. confluent flink statement list --status running On-Premises No examples. See Also confluent flink statement - Manage Flink SQL statements. #### Code Examples ```sql confluent flink statement list [flags] ``` ```sql confluent flink statement list [flags] ``` ```sql --cloud string Specify the cloud provider as "aws", "azure", or "gcp". --region string Cloud region for Flink (use "confluent flink region list" to see all). --compute-pool string Flink compute pool ID. --environment string Environment ID. --context string CLI context name. 
-o, --output string Specify the output format as "human", "json", or "yaml". (default "human") --status string Filter the results by statement status. ``` ```sql --environment string REQUIRED: Name of the Flink environment. --compute-pool string Optional flag to filter the Flink statements by compute pool ID. --status string Optional flag to filter the Flink statements by statement status. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` ```sql confluent flink statement list --status running ``` --- ### confluent flink statement rescale | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/statement/confluent_flink_statement_rescale.html confluent flink statement rescale Description Rescale a Flink SQL statement in Confluent Platform. confluent flink statement rescale [flags] Flags --environment string REQUIRED: Name of the Flink environment. 
--parallelism int32 REQUIRED: New parallelism of the Flink SQL statement. (default 1) --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink statement - Manage Flink SQL statements. #### Code Examples ```sql confluent flink statement rescale [flags] ``` ```sql --environment string REQUIRED: Name of the Flink environment. --parallelism int32 REQUIRED: New parallelism of the Flink SQL statement. (default 1) --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. 
--certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` --- ### confluent flink statement resume | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/statement/confluent_flink_statement_resume.html confluent flink statement resume Description Cloud Resume a Flink SQL statement. confluent flink statement resume [flags] On-Premises Resume a Flink SQL statement in Confluent Platform. confluent flink statement resume [flags] Flags Cloud --principal string A user or service account the statement runs as. --compute-pool string Flink compute pool ID. --cloud string Specify the cloud provider as "aws", "azure", or "gcp". --region string Cloud region for Flink (use "confluent flink region list" to see all). --environment string Environment ID. --context string CLI context name. On-Premises --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. 
--certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). Examples Cloud Request to resume the currently stopped statement “my-statement” using original principal id and under the original compute pool. confluent flink statement resume my-statement Request to resume the currently stopped statement “my-statement” using service account “sa-123456”. confluent flink statement resume my-statement --principal sa-123456 Request to resume the currently stopped statement “my-statement” using user account “u-987654”. confluent flink statement resume my-statement --principal u-987654 Request to resume the currently stopped statement “my-statement” and under a different compute pool “lfcp-123456”. confluent flink statement resume my-statement --compute-pool lfcp-123456 Request to resume the currently stopped statement “my-statement” using service account “sa-123456” and under a different compute pool “lfcp-123456”. confluent flink statement resume my-statement --principal sa-123456 --compute-pool lfcp-123456 On-Premises No examples. See Also confluent flink statement - Manage Flink SQL statements. #### Code Examples ```sql confluent flink statement resume [flags] ``` ```sql confluent flink statement resume [flags] ``` ```sql --principal string A user or service account the statement runs as. --compute-pool string Flink compute pool ID. --cloud string Specify the cloud provider as "aws", "azure", or "gcp". --region string Cloud region for Flink (use "confluent flink region list" to see all). --environment string Environment ID. 
--context string CLI context name. ``` ```sql --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` ```sql confluent flink statement resume my-statement ``` ```sql confluent flink statement resume my-statement --principal sa-123456 ``` ```sql confluent flink statement resume my-statement --principal u-987654 ``` ```sql confluent flink statement resume my-statement --compute-pool lfcp-123456 ``` ```sql confluent flink statement resume my-statement --principal sa-123456 --compute-pool lfcp-123456 ``` --- ### confluent flink statement stop | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/statement/confluent_flink_statement_stop.html confluent flink statement stop Description Cloud Stop a Flink SQL statement. confluent flink statement stop [flags] On-Premises Stop a Flink SQL statement in Confluent Platform. 
confluent flink statement stop [flags] Flags Cloud --cloud string Specify the cloud provider as "aws", "azure", or "gcp". --region string Cloud region for Flink (use "confluent flink region list" to see all). --environment string Environment ID. --context string CLI context name. On-Premises --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). Examples Cloud Request to stop the currently running statement “my-statement”. confluent flink statement stop my-statement On-Premises No examples. See Also confluent flink statement - Manage Flink SQL statements. #### Code Examples ```sql confluent flink statement stop [flags] ``` ```sql confluent flink statement stop [flags] ``` ```sql --cloud string Specify the cloud provider as "aws", "azure", or "gcp". --region string Cloud region for Flink (use "confluent flink region list" to see all). --environment string Environment ID. --context string CLI context name. 
``` ```sql --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` ```sql confluent flink statement stop my-statement ``` --- ### confluent flink statement update | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/statement/confluent_flink_statement_update.html confluent flink statement update Description Update a Flink SQL statement. confluent flink statement update [flags] Flags --principal string A user or service account the statement runs as. --compute-pool string Flink compute pool ID. --stopped Request to stop or resume the statement. --cloud string Specify the cloud provider as "aws", "azure", or "gcp". --region string Cloud region for Flink (use "confluent flink region list" to see all). --environment string Environment ID. --context string CLI context name. Global Flags -h, --help Show help for this command. 
--unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). Examples Request to resume the currently stopped statement “my-statement” using original principal id and under the original compute pool. confluent flink statement update my-statement --stopped=false Request to resume the currently stopped statement “my-statement” using service account “sa-123456”. confluent flink statement update my-statement --stopped=false --principal sa-123456 Request to resume the currently stopped statement “my-statement” using user account “u-987654”. confluent flink statement update my-statement --stopped=false --principal u-987654 Request to resume the currently stopped statement “my-statement” and under a different compute pool “lfcp-123456”. confluent flink statement update my-statement --stopped=false --compute-pool lfcp-123456 Request to resume the currently stopped statement “my-statement” using service account “sa-123456” and under a different compute pool “lfcp-123456”. confluent flink statement update my-statement --stopped=false --principal sa-123456 --compute-pool lfcp-123456 Request to stop the currently running statement “my-statement”. confluent flink statement update my-statement --stopped=true See Also confluent flink statement - Manage Flink SQL statements. #### Code Examples ```sql confluent flink statement update [flags] ``` ```sql --principal string A user or service account the statement runs as. --compute-pool string Flink compute pool ID. --stopped Request to stop or resume the statement. --cloud string Specify the cloud provider as "aws", "azure", or "gcp". --region string Cloud region for Flink (use "confluent flink region list" to see all). --environment string Environment ID. --context string CLI context name. ``` ```sql -h, --help Show help for this command. 
--unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). ``` ```sql confluent flink statement update my-statement --stopped=false ``` ```sql confluent flink statement update my-statement --stopped=false --principal sa-123456 ``` ```sql confluent flink statement update my-statement --stopped=false --principal u-987654 ``` ```sql confluent flink statement update my-statement --stopped=false --compute-pool lfcp-123456 ``` ```sql confluent flink statement update my-statement --stopped=false --principal sa-123456 --compute-pool lfcp-123456 ``` ```sql confluent flink statement update my-statement --stopped=true ``` --- ### confluent flink statement web-ui-forward | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/statement/confluent_flink_statement_web-ui-forward.html confluent flink statement web-ui-forward Description Forward the web UI of a Flink statement in Confluent Platform. confluent flink statement web-ui-forward [flags] Flags --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. 
--port uint16 Port to forward the web UI to. If not provided, a random, OS-assigned port will be used. Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink statement - Manage Flink SQL statements. #### Code Examples ```sql confluent flink statement web-ui-forward [flags] ``` ```sql --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. --port uint16 Port to forward the web UI to. If not provided, a random, OS-assigned port will be used. ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). 
``` --- ### confluent flink statement exception list | Confluent Documentation Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/statement/exception/confluent_flink_statement_exception_list.html confluent flink statement exception list Description Cloud List exceptions for a Flink SQL statement. confluent flink statement exception list [flags] On-Premises List exceptions for a Flink SQL statement in Confluent Platform. confluent flink statement exception list [flags] Flags Cloud --cloud string Specify the cloud provider as "aws", "azure", or "gcp". --region string Cloud region for Flink (use "confluent flink region list" to see all). --environment string Environment ID. --context string CLI context name. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") On-Premises --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") Global Flags -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. 
-v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). See Also confluent flink statement exception - Manage Flink SQL statement exceptions. #### Code Examples ```sql confluent flink statement exception list [flags] ``` ```sql confluent flink statement exception list [flags] ``` ```sql --cloud string Specify the cloud provider as "aws", "azure", or "gcp". --region string Cloud region for Flink (use "confluent flink region list" to see all). --environment string Environment ID. --context string CLI context name. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql --environment string REQUIRED: Name of the Flink environment. --url string Base URL of the Confluent Manager for Apache Flink (CMF). Environment variable "CONFLUENT_CMF_URL" may be set in place of this flag. --client-key-path string Path to client private key for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_KEY_PATH" may be set in place of this flag. --client-cert-path string Path to client cert to be verified by Confluent Manager for Apache Flink. Include for mTLS authentication. Environment variable "CONFLUENT_CMF_CLIENT_CERT_PATH" may be set in place of this flag. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent Manager for Apache Flink connection. Environment variable "CONFLUENT_CMF_CERTIFICATE_AUTHORITY_PATH" may be set in place of this flag. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ```sql -h, --help Show help for this command. --unsafe-trace Equivalent to -vvvv, but also log HTTP requests and responses which might contain plaintext secrets. -v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace). 
```

---

### confluent flink statement exception | Confluent Documentation

Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/statement/exception/index.html

#### Description

Manage Flink SQL statement exceptions.

#### Subcommands

| Command | Description |
| --- | --- |
| confluent flink statement exception list | List exceptions for a Flink SQL statement. |

---

### confluent flink statement | Confluent Documentation

Source: https://docs.confluent.io/confluent-cli/current/command-reference/flink/statement/index.html

#### Description

Manage Flink SQL statements.

#### Subcommands

Cloud

| Command | Description |
| --- | --- |
| confluent flink statement create | Create a Flink SQL statement. |
| confluent flink statement delete | Delete one or more Flink SQL statements. |
| confluent flink statement describe | Describe a Flink SQL statement. |
| confluent flink statement exception | Manage Flink SQL statement exceptions. |
| confluent flink statement list | List Flink SQL statements. |
| confluent flink statement resume | Resume a Flink SQL statement. |
| confluent flink statement stop | Stop a Flink SQL statement. |
| confluent flink statement update | Update a Flink SQL statement. |

On-Premises

| Command | Description |
| --- | --- |
| confluent flink statement create | Create a Flink SQL statement. |
| confluent flink statement delete | Delete one or more Flink SQL statements. |
| confluent flink statement describe | Describe a Flink SQL statement. |
| confluent flink statement exception | Manage Flink SQL statement exceptions. |
| confluent flink statement list | List Flink SQL statements in Confluent Platform. |
| confluent flink statement rescale | Rescale a Flink SQL statement. |
| confluent flink statement resume | Resume a Flink SQL statement. |
| confluent flink statement stop | Stop a Flink SQL statement. |
| confluent flink statement web-ui-forward | Forward the web UI of a Flink statement. |
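As a rough illustration of how the Cloud subcommands listed above fit together, the following sketch chains `list`, `stop`, and `resume` using only the invocations shown in the command examples. The `DRY_RUN` switch and the `run` and `statement_cycle` helpers are hypothetical names introduced here, not part of the Confluent CLI; by default the script only prints the commands it would run. The statement name `my-statement` and pool ID `lfcp-123456` are the placeholders used in the reference pages.

```shell
#!/bin/sh
# Sketch only: chains three documented subcommands.
# DRY_RUN, run, and statement_cycle are hypothetical helpers, not CLI features.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = print each command, 0 = actually execute it

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

statement_cycle() {
  # $1 = statement name, $2 = compute pool ID to resume under
  run confluent flink statement list --status running
  run confluent flink statement stop "$1"
  run confluent flink statement resume "$1" --compute-pool "$2"
}

statement_cycle my-statement lfcp-123456
```

Because the reference describes `stop` and `resume` as requests, a real script would probably need to check the statement with `confluent flink statement describe` between the two steps rather than assume the stop has completed.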

---

### Manage Confluent Platform for Apache Flink Applications Using Confluent for Kubernetes | Confluent Documentation

Source: https://docs.confluent.io/operator/current/co-manage-flink.html

Manage Flink Applications Using Confluent for Kubernetes

Apache Flink® is a powerful, scalable, and secure stream processing framework for running complex, stateful, low-latency streaming applications on large volumes of data. Offered with Confluent Platform, Confluent Manager for Apache Flink® (CMF) is a self-managed service for Flink that integrates seamlessly with Apache Kafka®. To learn more about CMF, see Overview of Confluent Platform for Apache Flink.

You can use Confluent for Kubernetes (CFK) to manage CMF and Flink applications within the familiar Kubernetes environment and custom resources.

The high-level workflow to manage Flink applications with CFK is:

1. Install the Confluent Platform for Apache Flink Kubernetes operator.
2. Install Confluent Manager for Apache Flink.
3. Install or upgrade CFK with Flink integration enabled:

       helm upgrade --install confluent-operator \
         confluentinc/confluent-for-kubernetes

   To configure CFK to listen to multiple namespaces or a single namespace, see Configure CFK to manage Confluent Platform components in different namespaces.

   For example, to configure CFK to manage Flink in two namespaces, confluent and default, and only in those namespaces, add the --set namespaceList and --set namespaced=true flags to the helm upgrade command as shown below:

       helm upgrade --install confluent-operator \
         confluentinc/confluent-for-kubernetes \
         --set namespaceList="{confluent,default}" \
         --set namespaced=true

   For more information about CFK installation, see Deploy Confluent for Kubernetes.
4. Create a CMF REST class.
5. Create a Flink environment.
6. Create a Flink application.
7. In the Flink Web UI, verify that the application job you created is running.

An example scenario of using CMF with CFK is available in the CFK Example Repository.
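
If the Helm invocation is generated by a deployment script rather than typed by hand, the namespace flags can be assembled from a list. The following is a minimal Python sketch under that assumption. The function name `helm_upgrade_args` is hypothetical and not part of any Confluent tooling; the release name, chart name, and `--set` flags are copied from the commands in the workflow.

```python
import shlex


def helm_upgrade_args(namespaces=None):
    """Build the argument list for the CFK `helm upgrade --install` command.

    namespaces: optional list of namespaces that CFK should manage. When it is
    given, the `--set namespaceList` and `--set namespaced=true` flags from the
    workflow are appended. (Hypothetical helper, for illustration only.)
    """
    args = [
        "helm", "upgrade", "--install", "confluent-operator",
        "confluentinc/confluent-for-kubernetes",
    ]
    if namespaces:
        # Helm expects a brace-delimited, comma-separated list for list values.
        args += ["--set", "namespaceList={" + ",".join(namespaces) + "}"]
        args += ["--set", "namespaced=true"]
    return args


# Print a shell-safe rendering of the command for the two-namespace example.
print(shlex.join(helm_upgrade_args(["confluent", "default"])))
```

Passing the list to `subprocess.run` directly, instead of printing it, avoids shell quoting issues with the braces.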
Requirements and considerations To manage Flink in CFK, you need the following versions: CMF version V1 CFK 2.10.0 and higher Confluent Platform 7.8.0 and higher Currently, CFK can authenticate to CMF without authentication or using mTLS. Create a CMF REST Class When managing CMF in CFK, the CMF custom resources, namely, FlinkEnvironment and FlinkApplication, communicate with CMF through the CMF REST Class (CMFRestClass). You need to first set up a CMF REST Class custom resource (CR). CMF REST Class is only used by CFK and is not part of CMF. If using mTLS or TLS to connect to the Flink host, create a secret. Certificates with appropriate Subject Alternate Names (SANs) are required for the mTLS setup. mTLS: You need to create a secret with certs and reference it in the CMFRestClass CR in the next step. TLS: The secret is only required if using a self-signed certificate. See Provide TLS keys and certificates in PEM format and Provide TLS keys and certificates in Java KeyStore format for the expected keys in the TLS secret. Create a a CMF REST Class (CMFRestClass CR) with the following spec and deploy the resource using the kubectl apply -f command. apiVersion: platform.confluent.io/v1beta1 kind: CMFRestClass metadata: name: --- [1] namespace: --- [2] spec: cmfRest: --- [3] authentication: type: --- [4] endpoint: --- [5] tls: --- [6] secretRef: --- [7] [1] The name of the REST Class. [2] The namespace of the CMF REST Class. [3] The CMF cluster. [4] To use mTLS authentication, set to mtls and specify the certificates in [7]. [5] The endpoint of the CMF host. [6] Required when you set the authentication type ([4]) is set to mtls. [7] The name of the secret that contains the TLS certificates. 
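
The rule in [6], that the tls section is required when the authentication type is mtls, can be checked before the resource is applied. The following is a minimal Python sketch, assuming the field nesting implied by the spec above (authentication, endpoint, and tls under spec.cmfRest). The function name `cmf_rest_class` is hypothetical, and the values in the usage line are taken from the example CR in this section.

```python
import json


def cmf_rest_class(name, namespace, endpoint, auth_type=None, secret_ref=None):
    """Return a CMFRestClass manifest as a plain dict.

    Hypothetical helper: the field layout is an assumption based on the
    annotated spec in this section, not on a published schema.
    """
    # [6]: the tls section is required when authentication.type is mtls.
    if auth_type == "mtls" and not secret_ref:
        raise ValueError("mtls authentication requires tls.secretRef")

    cmf_rest = {"endpoint": endpoint}
    if auth_type:
        cmf_rest["authentication"] = {"type": auth_type}
    if secret_ref:
        cmf_rest["tls"] = {"secretRef": secret_ref}

    return {
        "apiVersion": "platform.confluent.io/v1beta1",
        "kind": "CMFRestClass",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"cmfRest": cmf_rest},
    }


# Build a manifest with the values from the example CR and print it as JSON.
manifest = cmf_rest_class(
    "default", "operator", "https://cmf-service:80",
    auth_type="mtls", secret_ref="cmf-day2-tls",
)
print(json.dumps(manifest, indent=2))
```

The JSON output can be saved to a file and passed to kubectl apply -f, which accepts JSON as well as YAML manifests. The sketch leaves out the sslClientAuthentication field that appears in the example CR.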
An example CMFRestClass CR:

    apiVersion: platform.confluent.io/v1beta1
    kind: CMFRestClass
    metadata:
      name: default
      namespace: operator
    spec:
      cmfRest:
        endpoint: https://cmf-service:80
        authentication:
          type: mtls
          sslClientAuthentication: true
        tls:
          secretRef: cmf-day2-tls

Check the status:

    kubectl get CMFRestClass default -n <namespace> -oyaml

#### Create a Flink environment

A Flink environment is a set of configurations that Flink applications use.

Create a FlinkEnvironment CR using the following spec, and deploy it with the kubectl apply -f command.

    apiVersion: platform.confluent.io/v1beta1
    kind: FlinkEnvironment
    metadata:
      name:
      namespace:
    spec:
      kubernetesNamespace:         --- [1]
      flinkApplicationDefaults:    --- [2]
        metadata:                  --- [3]
        spec:                      --- [4]
          flinkConfiguration:
      cmfRestClassRef:             --- [5]
        name:
        namespace:

- [1] The namespace of the Flink cluster. Typically, you would install the FlinkEnvironment CR in the CFK namespace (metadata.namespace), but Flink would be in another namespace (spec.kubernetesNamespace), for example, default.
- [2] Configurations for the Flink cluster to specify the deployment-wide default application settings.
- [3] Kubernetes API metadata.
- [4] Spec of the FlinkApplicationSpec type.
- [5] The reference to the REST Class you created in Create a CMF REST Class. You can install the FlinkEnvironment CR and the CMF REST class in different namespaces. If omitted, the CMFRestClass named default in the same namespace is used.

An example FlinkEnvironment CR:

    apiVersion: platform.confluent.io/v1beta1
    kind: FlinkEnvironment
    metadata:
      name: my-env1
      namespace: operator
    spec:
      kubernetesNamespace: default
      flinkApplicationDefaults:
        metadata:
          labels:
            "acmecorp.com/owned-by": "analytics-team"
        spec:
          flinkConfiguration:
            taskmanager.numberOfTaskSlots: "2"
            rest.profiling.enabled: "true"
      cmfRestClassRef:
        name: default
        namespace: operator

Check the status:

    kubectl get flinkEnvironment -n <namespace> -oyaml

#### Create a Flink application

A Flink application is a user program that creates one or more Flink jobs to process data.

To create a Flink application resource in CFK, create a FlinkApplication CR using the following spec and deploy the resource using the kubectl apply -f command.

    apiVersion: platform.confluent.io/v1beta1
    kind: FlinkApplication
    metadata:
    spec:
      cmfRestClassRef:
        name:                --- [1]
        namespace:           --- [2]
      image:                 --- [3]
      flinkEnvironment:      --- [4]
      image:
      flinkVersion:
      flinkConfiguration:    --- [5]
      serviceAccount:        --- [6]
      jobManager:            --- [7]
      taskManager:           --- [8]
      job:                   --- [9]

- [1] The reference to the REST Class you created in Create a CMF REST Class. If omitted, the CMFRestClass named default in the same namespace is used.
- [2] The namespace of this FlinkApplication CR. The namespace of the Flink cluster is determined by FlinkEnvironment.spec.kubernetesNamespace.
- [3] The CMF image.
- [4] The reference to the FlinkEnvironment CR you created in Create a Flink environment.
- [5] Flink configurations.
- [6] The service account that runs Flink.
- [7] FlinkJobManager
- [8] FlinkTaskManager
- [9] FlinkJob

An example FlinkApplication CR:

    apiVersion: platform.confluent.io/v1beta1
    kind: FlinkApplication
    metadata:
      name: my-app1
      namespace: default
    spec:
      flinkEnvironment: my-env1
      image: confluentinc/cp-flink:1.19.1-cp1
      flinkVersion: v1_19
      flinkConfiguration:
        "taskmanager.numberOfTaskSlots": "2"
        "metrics.reporter.prom.factory.class": "org.apache.flink.metrics.prometheus.PrometheusReporterFactory"
        "metrics.reporter.prom.port": "9249-9250"
        "rest.profiling.enabled": "true"
      serviceAccount: flink
      jobManager:
        resource:
          memory: 1048m
          cpu: 1
      taskManager:
        resource:
          memory: 1048m
          cpu: 1
      job:
        jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
        state: running
        parallelism: 3
        upgradeMode: stateless
      cmfRestClassRef:
        name: default
        namespace: operator

Check the status:

    kubectl get flinkApplication -n <namespace> -oyaml

For status details, see Check the Flink application status.

#### Check the Flink application status

The following are the notable status fields in the CFK-managed FlinkApplication CR:

    status:
      cmfSync:                       --- [1]
        errorMessage:                --- [2]
        lastSyncTime:                --- [3]
        status:                      --- [4]
      error:                         --- [5]
      clusterInfo:                   --- [6]
      jobManagerDeploymentStatus:    --- [7]
      jobStatus:
        state:                       --- [8]

- [1] The status of the sync between CFK and CMF through the CMFRestClass CR.
- [2] Any error message related to the sync between CFK and CMF ([1]), for example, a connection, authentication, or validation error.
- [3] The time when the latest sync between CFK and CMF happened.
- [4] The sync status. The possible values: CREATED, DELETED, UNKNOWN, FAILED.
- [5] Indicates asynchronous errors from the Flink deployment. This is only set if status.cmfSync.errorMessage ([2]) is empty and status.cmfSync.status is CREATED.

For details about the status fields below, refer to the CMF documentation.

- [6] Information about the Flink cluster when deployed. This section is only set if status.error ([5]) is not set.
- [7] Status of the JobManager deployment in Kubernetes.
- [8] Status of the Flink job inside the FlinkApplication's Flink cluster.

Note that there is a hierarchy of status and error fields in FlinkApplication.status:

- Level 1. The status.cmfSync field needs to be error-free, as this indicates that CFK was able to submit the FlinkApplication to the CMF backend.
- Level 2. The CMF backend or the internal Kubernetes Operator might report an error in the status.error field.
- Level 3. Once the errors in the fields above are resolved, the rest of the status fields are populated.
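
The three-level hierarchy suggests a fixed order for inspecting a FlinkApplication status. The following is a minimal Python sketch of that order, assuming the status has already been loaded into a dict, for example from the JSON form of the resource. The function name `triage` is hypothetical, and the sample error message is abbreviated from the error status example in this section.

```python
import json


def triage(status):
    """Return (level, detail) for a FlinkApplication status dict.

    Hypothetical helper that follows the hierarchy described above:
    level 1 = cmfSync problem, level 2 = status.error is set,
    level 3 = no error reported, so the remaining fields are meaningful.
    """
    # Level 1: CFK must have synced the resource to CMF without error.
    sync = status.get("cmfSync") or {}
    sync_error = sync.get("errorMessage") or ""
    sync_state = str(sync.get("status") or "").upper()
    if sync_error or sync_state == "FAILED":
        return 1, sync_error or "cmfSync.status is FAILED"

    # Level 2: CMF or the Kubernetes operator reported an asynchronous error.
    raw_error = status.get("error")
    if raw_error:
        try:
            # In the example status, the error field is a JSON-encoded string.
            message = json.loads(raw_error).get("message", raw_error)
        except (TypeError, ValueError, AttributeError):
            message = raw_error
        return 2, message

    # Level 3: report the deployment and job state.
    deployment = status.get("jobManagerDeploymentStatus") or "unknown"
    job_state = (status.get("jobStatus") or {}).get("state") or "unknown"
    return 3, f"jobManagerDeploymentStatus={deployment}, jobStatus.state={job_state}"


# Abbreviated sample: sync succeeded, but the deployment reported an error.
sample = {
    "cmfSync": {"errorMessage": "", "status": "Created"},
    "error": '{"type": "ReconciliationException", '
             '"message": "JobManager memory configuration failed"}',
}
level, detail = triage(sample)
print(f"level {level}: {detail}")
```

The sketch compares the sync status case-insensitively because the field description lists CREATED while the example statuses show Created.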
The following is an example of error status: status: cfkInternalState: CREATED clusterInfo: {} cmfSync: errorMessage: "" lastSyncTime: "2024-11-05T19:15:09Z" status: Created error: '{"type":"org.apache.flink.kubernetes.operator.exception.ReconciliationException","message":"org.apache.flink.configuration.IllegalConfigurationException: JobManager memory configuration failed: Sum of configured JVM Metaspace (256.000mb (268435456 bytes)) and JVM Overhead (192.000mb (201326592 bytes)) exceed configured Total Process Memory (1 bytes).","additionalMetadata":{},"throwableList":[{"type":"org.apache.flink.configuration.IllegalConfigurationException","message":"JobManager memory configuration failed: Sum of configured JVM Metaspace (256.000mb (268435456 bytes)) and JVM Overhead (192.000mb (201326592 bytes)) exceed configured Total Process Memory (1 bytes).","additionalMetadata":{}},{"type":"org.apache.flink.configuration.IllegalConfigurationException","message":"Sum of configured JVM Metaspace (256.000mb (268435456 bytes)) and JVM Overhead (192.000mb (201326592 bytes)) exceed configured Total Process Memory (1 bytes).","additionalMetadata":{}}]}' jobManagerDeploymentStatus: MISSING jobStatus: checkpointInfo: lastPeriodicCheckpointTimestamp: 0 jobId: a6251e5a0f3f2e00f56874b56bc0780c jobName: "" savepointInfo: lastPeriodicSavepointTimestamp: 0 savepointHistory: [] state: "" lifecycleState: UPGRADING observedGeneration: 5 reconciliationStatus: lastReconciledSpec: '{"spec":{"job":{"jarURI":"local:///opt/flink/examples/streaming/StateMachineExample.jar","parallelism":1,"entryClass":null,"args":[],"state":"suspended","savepointTriggerNonce":null,"initialSavepointPath":null,"checkpointTriggerNonce":null,"upgradeMode":"stateless","allowNonRestoredState":null,"savepointRedeployNonce":null},"restartNonce":null,"flinkConfiguration":{"rest.profiling.enabled":"true","taskmanager.numberOfTaskSlots":"2"},"image":"confluentinc/cp-flink:1.19.1- 
cp1","imagePullPolicy":null,"serviceAccount":"flink","flinkVersion":"v1_19","ingress":null,"podTemplate":null,"jobManager":{"resource":{"cpu":1.0,"memory":"1","ephemeralStorage":null},"replicas":1,"podTemplate":{"metadata":{"labels":{"platform.confluent.io/origin":"flink"}}}},"taskManager":{"resource":{"cpu":1.0,"memory":"1","ephemeralStorage":null},"replicas":null,"podTemplate":{"metadata":{"labels":{"platform.confluent.io/ origin":"flink"}}}},"logConfiguration":null,"mode":null},"resource_metadata":{"apiVersion":"flink.apache.org/v1beta1","metadata":{"generation":6},"firstDeployment":true}}' reconciliationTimestamp: 1730834099726 state: UPGRADING taskManager: labelSelector: "" replicas: 0 The following is an example status of a successful FlinkApplication creation: status: cfkInternalState: CREATED clusterInfo: flink-revision: 89d0b8f @ 2024-06-22T13:19:31+02:00 flink-version: 1.19.1-cp1 total-cpu: "2.0" total-memory: "2516582400" cmfSync: errorMessage: "" lastSyncTime: "2024-11-05T19:19:10Z" status: Created jobManagerDeploymentStatus: READY jobStatus: checkpointInfo: lastPeriodicCheckpointTimestamp: 0 jobId: 522d7ff7f15b4e138ffb9ea4053abbd3 jobName: State machine job savepointInfo: lastPeriodicSavepointTimestamp: 0 savepointHistory: [] startTime: "1730834237948" state: RUNNING updateTime: "1730834248753" lifecycleState: STABLE observedGeneration: 6 reconciliationStatus: lastReconciledSpec: '{"spec":{"job":{"jarURI":"local:///opt/flink/examples/streaming/StateMachineExample.jar","parallelism":1,"entryClass":null,"args":[],"state":"running","savepointTriggerNonce":null,"initialSavepointPath":null,"checkpointTriggerNonce":null,"upgradeMode":"stateless","allowNonRestoredState":null,"savepointRedeployNonce":null},"restartNonce":null,"flinkConfiguration":{"rest.profiling.enabled":"true","taskmanager.numberOfTaskSlots":"2"},"image":"confluentinc/cp-flink:1.19.1- 
cp1","imagePullPolicy":null,"serviceAccount":"flink","flinkVersion":"v1_19","ingress":null,"podTemplate":null,"jobManager":{"resource":{"cpu":1.0,"memory":"1200m","ephemeralStorage":null},"replicas":1,"podTemplate":{"metadata":{"labels":{"platform.confluent.io/origin":"flink"}}}},"taskManager":{"resource":{"cpu":1.0,"memory":"1200m","ephemeralStorage":null},"replicas":null,"podTemplate":{"metadata":{"labels":{"platform.confluent.io/ origin":"flink"}}}},"logConfiguration":null,"mode":null},"resource_metadata":{"apiVersion":"flink.apache.org/v1beta1","metadata":{"generation":12},"firstDeployment":true}}' lastStableSpec: '{"spec":{"job":{"jarURI":"local:///opt/flink/examples/streaming/StateMachineExample.jar","parallelism":1,"entryClass":null,"args":[],"state":"running","savepointTriggerNonce":null,"initialSavepointPath":null,"checkpointTriggerNonce":null,"upgradeMode":"stateless","allowNonRestoredState":null,"savepointRedeployNonce":null},"restartNonce":null,"flinkConfiguration":{"rest.profiling.enabled":"true","taskmanager.numberOfTaskSlots":"2"},"image":"confluentinc/cp-flink:1.19.1- cp1","imagePullPolicy":null,"serviceAccount":"flink","flinkVersion":"v1_19","ingress":null,"podTemplate":null,"jobManager":{"resource":{"cpu":1.0,"memory":"1200m","ephemeralStorage":null},"replicas":1,"podTemplate":{"metadata":{"labels":{"platform.confluent.io/origin":"flink"}}}},"taskManager":{"resource":{"cpu":1.0,"memory":"1200m","ephemeralStorage":null},"replicas":null,"podTemplate":{"metadata":{"labels":{"platform.confluent.io/ origin":"flink"}}}},"logConfiguration":null,"mode":null},"resource_metadata":{"apiVersion":"flink.apache.org/v1beta1","metadata":{"generation":12},"firstDeployment":true}}' reconciliationTimestamp: 1730834229475 state: DEPLOYED taskManager: labelSelector: component=taskmanager,app=app111 replicas: 1 #### Code Examples ```sql helm upgrade --install confluent-operator \ confluentinc/confluent-for-kubernetes ``` ```sql --set namespaceList ``` ```sql --set 
namespaced=true ``` ```sql helm upgrade ``` ```sql helm upgrade --install confluent-operator \ confluentinc/confluent-for-kubernetes \ --set namespaceList="{confluent,default}" \ --set namespaced=true ``` ```sql kubectl apply -f ``` ```sql apiVersion: platform.confluent.io/v1beta1 kind: CMFRestClass metadata: name: --- [1] namespace: --- [2] spec: cmfRest: --- [3] authentication: type: --- [4] endpoint: --- [5] tls: --- [6] secretRef: --- [7] ``` ```sql apiVersion: platform.confluent.io/v1beta1 kind: CMFRestClass metadata: name: default namespace: operator spec: cmfRest: endpoint: https://cmf-service:80 authentication: type: mtls sslClientAuthentication: true tls: secretRef: cmf-day2-tls ``` ```sql kubectl get CMFRestClass default -n <namespace> -oyaml ``` ```sql kubectl apply -f ``` ```sql apiVersion: platform.confluent.io/v1beta1 kind: FlinkEnvironment metadata: name: namespace: spec: kubernetesNamespace: --- [1] flinkApplicationDefaults: --- [2] metadata: --- [3] spec: --- [4] flinkConfiguration: cmfRestClassRef: --- [5] name: namespace: ``` ```sql metadata.namespace ``` ```sql spec.kubernetesNamespace ``` ```sql apiVersion: platform.confluent.io/v1beta1 kind: FlinkEnvironment metadata: name: my-env1 namespace: operator spec: kubernetesNamespace: default flinkApplicationDefaults: metadata: labels: "acmecorp.com/owned-by": "analytics-team" spec: flinkConfiguration: taskmanager.numberOfTaskSlots: "2" rest.profiling.enabled: "true" cmfRestClassRef: name: default namespace: operator ``` ```sql kubectl get flinkEnvironment -n <namespace> -oyaml ``` ```sql kubectl apply -f ``` ```sql apiVersion: platform.confluent.io/v1beta1 kind: FlinkApplication metadata: spec: cmfRestClassRef: name: --- [1] namespace: --- [2] image: --- [3] flinkEnvironment: --- [4] image: flinkVersion: flinkConfiguration: --- [5] serviceAccount: --- [6] jobManager: --- [7] taskManager: --- [8] job: --- [9] ``` ```sql FlinkEnvironment.spec.kubernetesNamespace ``` ```sql apiVersion: platform.confluent.io/v1beta1 kind: 
FlinkApplication metadata: name: my-app1 namespace: default spec: flinkEnvironment: my-env1 image: confluentinc/cp-flink:1.19.1-cp1 flinkVersion: v1_19 flinkConfiguration: "taskmanager.numberOfTaskSlots": "2" "metrics.reporter.prom.factory.class": "org.apache.flink.metrics.prometheus.PrometheusReporterFactory" "metrics.reporter.prom.port": "9249-9250" "rest.profiling.enabled": "true" serviceAccount: flink jobManager: resource: memory: 1048m cpu: 1 taskManager: resource: memory: 1048m cpu: 1 job: jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar state: running parallelism: 3 upgradeMode: stateless cmfRestClassRef: name: default namespace: operator ``` ```sql kubectl get flinkApplication -n -oyaml ``` ```sql status: cmfSync: --- [1] errorMessage: --- [2] lastSyncTime: --- [3] status: --- [4] error: --- [5] clusterInfo: --- [6] jobManagerDeploymentStatus: --- [7] jobStatus: state: --- [8] ``` ```sql status.cmfSync.errorMessage ``` ```sql status.cmfSync.status: CREATED ``` ```sql status.error ``` ```sql FlinkApplication.status ``` ```sql status.cmfSync ``` ```sql status.error ``` ```sql status: cfkInternalState: CREATED clusterInfo: {} cmfSync: errorMessage: "" lastSyncTime: "2024-11-05T19:15:09Z" status: Created error: '{"type":"org.apache.flink.kubernetes.operator.exception.ReconciliationException","message":"org.apache.flink.configuration.IllegalConfigurationException: JobManager memory configuration failed: Sum of configured JVM Metaspace (256.000mb (268435456 bytes)) and JVM Overhead (192.000mb (201326592 bytes)) exceed configured Total Process Memory (1 bytes).","additionalMetadata":{},"throwableList":[{"type":"org.apache.flink.configuration.IllegalConfigurationException","message":"JobManager memory configuration failed: Sum of configured JVM Metaspace (256.000mb (268435456 bytes)) and JVM Overhead (192.000mb (201326592 bytes)) exceed configured Total Process Memory (1 
bytes).","additionalMetadata":{}},{"type":"org.apache.flink.configuration.IllegalConfigurationException","message":"Sum of configured JVM Metaspace (256.000mb (268435456 bytes)) and JVM Overhead (192.000mb (201326592 bytes)) exceed configured Total Process Memory (1 bytes).","additionalMetadata":{}}]}' jobManagerDeploymentStatus: MISSING jobStatus: checkpointInfo: lastPeriodicCheckpointTimestamp: 0 jobId: a6251e5a0f3f2e00f56874b56bc0780c jobName: "" savepointInfo: lastPeriodicSavepointTimestamp: 0 savepointHistory: [] state: "" lifecycleState: UPGRADING observedGeneration: 5 reconciliationStatus: lastReconciledSpec: '{"spec":{"job":{"jarURI":"local:///opt/flink/examples/streaming/StateMachineExample.jar","parallelism":1,"entryClass":null,"args":[],"state":"suspended","savepointTriggerNonce":null,"initialSavepointPath":null,"checkpointTriggerNonce":null,"upgradeMode":"stateless","allowNonRestoredState":null,"savepointRedeployNonce":null},"restartNonce":null,"flinkConfiguration":{"rest.profiling.enabled":"true","taskmanager.numberOfTaskSlots":"2"},"image":"confluentinc/cp-flink:1.19.1- cp1","imagePullPolicy":null,"serviceAccount":"flink","flinkVersion":"v1_19","ingress":null,"podTemplate":null,"jobManager":{"resource":{"cpu":1.0,"memory":"1","ephemeralStorage":null},"replicas":1,"podTemplate":{"metadata":{"labels":{"platform.confluent.io/origin":"flink"}}}},"taskManager":{"resource":{"cpu":1.0,"memory":"1","ephemeralStorage":null},"replicas":null,"podTemplate":{"metadata":{"labels":{"platform.confluent.io/ origin":"flink"}}}},"logConfiguration":null,"mode":null},"resource_metadata":{"apiVersion":"flink.apache.org/v1beta1","metadata":{"generation":6},"firstDeployment":true}}' reconciliationTimestamp: 1730834099726 state: UPGRADING taskManager: labelSelector: "" replicas: 0 ``` ```sql status: cfkInternalState: CREATED clusterInfo: flink-revision: 89d0b8f @ 2024-06-22T13:19:31+02:00 flink-version: 1.19.1-cp1 total-cpu: "2.0" total-memory: "2516582400" cmfSync: 
errorMessage: "" lastSyncTime: "2024-11-05T19:19:10Z" status: Created jobManagerDeploymentStatus: READY jobStatus: checkpointInfo: lastPeriodicCheckpointTimestamp: 0 jobId: 522d7ff7f15b4e138ffb9ea4053abbd3 jobName: State machine job savepointInfo: lastPeriodicSavepointTimestamp: 0 savepointHistory: [] startTime: "1730834237948" state: RUNNING updateTime: "1730834248753" lifecycleState: STABLE observedGeneration: 6 reconciliationStatus: lastReconciledSpec: '{"spec":{"job":{"jarURI":"local:///opt/flink/examples/streaming/StateMachineExample.jar","parallelism":1,"entryClass":null,"args":[],"state":"running","savepointTriggerNonce":null,"initialSavepointPath":null,"checkpointTriggerNonce":null,"upgradeMode":"stateless","allowNonRestoredState":null,"savepointRedeployNonce":null},"restartNonce":null,"flinkConfiguration":{"rest.profiling.enabled":"true","taskmanager.numberOfTaskSlots":"2"},"image":"confluentinc/cp-flink:1.19.1- cp1","imagePullPolicy":null,"serviceAccount":"flink","flinkVersion":"v1_19","ingress":null,"podTemplate":null,"jobManager":{"resource":{"cpu":1.0,"memory":"1200m","ephemeralStorage":null},"replicas":1,"podTemplate":{"metadata":{"labels":{"platform.confluent.io/origin":"flink"}}}},"taskManager":{"resource":{"cpu":1.0,"memory":"1200m","ephemeralStorage":null},"replicas":null,"podTemplate":{"metadata":{"labels":{"platform.confluent.io/ origin":"flink"}}}},"logConfiguration":null,"mode":null},"resource_metadata":{"apiVersion":"flink.apache.org/v1beta1","metadata":{"generation":12},"firstDeployment":true}}' lastStableSpec: 
'{"spec":{"job":{"jarURI":"local:///opt/flink/examples/streaming/StateMachineExample.jar","parallelism":1,"entryClass":null,"args":[],"state":"running","savepointTriggerNonce":null,"initialSavepointPath":null,"checkpointTriggerNonce":null,"upgradeMode":"stateless","allowNonRestoredState":null,"savepointRedeployNonce":null},"restartNonce":null,"flinkConfiguration":{"rest.profiling.enabled":"true","taskmanager.numberOfTaskSlots":"2"},"image":"confluentinc/cp-flink:1.19.1- cp1","imagePullPolicy":null,"serviceAccount":"flink","flinkVersion":"v1_19","ingress":null,"podTemplate":null,"jobManager":{"resource":{"cpu":1.0,"memory":"1200m","ephemeralStorage":null},"replicas":1,"podTemplate":{"metadata":{"labels":{"platform.confluent.io/origin":"flink"}}}},"taskManager":{"resource":{"cpu":1.0,"memory":"1200m","ephemeralStorage":null},"replicas":null,"podTemplate":{"metadata":{"labels":{"platform.confluent.io/ origin":"flink"}}}},"logConfiguration":null,"mode":null},"resource_metadata":{"apiVersion":"flink.apache.org/v1beta1","metadata":{"generation":12},"firstDeployment":true}}' reconciliationTimestamp: 1730834229475 state: DEPLOYED taskManager: labelSelector: component=taskmanager,app=app111 replicas: 1 ``` ---