Let's see what is Bigquery Slots, Reservation API and How we can use them to reduce the SLA time. Considering that you know Bigquery Slots, Cloud Function and Cloud Composer. If not then, I suggest you to have a look on them first.
- Bigquery Slots
- Flex Slots Pricing
- Bigquery Reservations
- Project Implementation
In this article, we will learn about Bigquery Reservation and Flex Slots. Google has recently introduced a new pricing model for Bigquery under the name of “Bigquery Flex Slots”. We will learn how we can purchase, assign them to the required project as per the need and remove them when not required.
First things first. I will start by introducing important concepts before deep-diving into the code implementation of BigQuery Flex Slots. These two concepts are “BigQuery slots”, “Flex slots pricing” and “Introduction to Reservation”.
A BigQuery slot is a unit of computational capacity required to execute SQL queries. You can see it as a mix of CPU, memory, and network. BigQuery automatically calculates how many slots are required by each query, depending on query size and complexity.
Flex Slots Pricing
This newly-introduced flex slots BigQuery pricing model is ideal for cyclical workloads or ad-hoc queries that require some extra power for a certain predictable amount of time. BigQuery flex slots give you the capability to commit to a certain amount of slots, through a reservation, for a duration as short as 60 seconds (billed at $0.04 per slot per hour, at the time of writing; which works out to be 20$ per hour for 500 slots).
For many companies, data analytics has evolved from an occasional task to something that’s mission-critical to their business. When you’re doing data analytics at scale, predictable spending is key.
Google has released, BigQuery Reservations, an easy and flexible self-service way to take advantage of BigQuery flat-rate pricing. Reservations make it even simpler to plan your spending and add flexibility and visibility to your data analytics use cases.
BigQuery Reservations enables you to do the following:
- Switch from the on-demand pricing model to flat-rate pricing, or combine the two models.
- Purchase and deploy dedicated BigQuery slots for analysis.
- Allocate BigQuery slots across your entire organization for workload management.
BigQuery Reservations introduces three key concepts:
- You can purchase dedicated BigQuery slots by procuring commitments.
- Optionally, you can split up and isolate your committed BigQuery slots to reservations for workload management purposes.
- Use assignments to assign Google Cloud projects, folders, or your entire organization to reservations.
As an example, you may need 1,000 BigQuery slots for your organization. Your BigQuery users include a data science team, a high-priority ELT workload, and BI dashboards.
With BigQuery Reservations, you can:
- Purchase a 1,000-slot commitment
- Create reservations “ds” with 500 slots, “elt” with 300 slots, and “bi” with 200 slots
- Assign the data science team’s Google Cloud projects to “ds” reservation
- Assign your ELT projects to “elt” reservation
- Assign the project that runs your BI dashboard to “bi” reservation
Now each of your workloads has a dedicated capacity. In addition, any single unused BigQuery slot is automatically and immediately available to other workloads in your organization.
Use case — Our Batch Load was getting delayed every day by 50–70 min. The root cause of it was, we workflows we using complete 2000 and letting the rest of them wait in a queue. We came up with an approach, that whenever we need slots we will increase them and whenever we don't we will decrease them.
We have implemented the flex slots programmatically, using Bigquery Reservations API. We have used cloud functions to increase and decrease the flex slots.
Airflow dag is used to trigger cloud functions, at appropriate times.
In the cloud function code, we need to call the required functions of the API, to purchase commitment, allocate the slots to reservations, and then assign reservations to the assignment(i.e project, org).
Cloud function to increase the slots
This cloud function is triggered through an airflow dag. We have used a simpleHttpOperator to trigger the cloud function. Once, it gets triggered, it purchases slots, allocates it to a reservation, assigns the reservation to the assignment. This all is done using the methods provided by Reservation API. After that, we are storing the details like commitment name, reservation name, and assignment name in a text file every time the CF is triggered. This helps us later to decrease the slots.
Cloud function to decrease the slots
This is triggered the same as the previous one. This cloud function fetches the required details of the slots to be deleted. And deletes them using the methods provided by Reservation API.
Let's have a quick look at Airflow DAGs
Below, we have 2 dags. One to trigger cloud function to start flex slots and the other to trigger cloud function to stop flex slots. In both the dags we have used simpleHttpOperator that hits the HTTP URL.
Thats, all I have. Let me know if get any questions.