Athena
What?
Athena is a serverless interactive SQL-like query service that works directly with data stored in S3. Serverless is a cloud computing execution model where the cloud provider dynamically manages the allocation and provisioning of servers. A serverless application runs in stateless compute containers that are event-triggered, ephemeral (may last for one invocation), and fully managed by the cloud provider. Pricing is based on the number of executions rather than pre-purchased compute capacity.
How?
Athena uses a distributed SQL engine (Presto) to perform read operations (SELECT, etc.) and Apache Hive to perform write operations (CREATE TABLE, etc.). The Presto documentation lists the function calls that are available. Athena lets you project your schema onto your data at the time you execute a query (schema-on-read).
Features:
- Queries with regular expressions
- Reading of Parquet, JSON, etc.
- A created table grows automatically when you add more data to the S3 bucket (“prefix”) it points to
- Supported functions
- By default, query results are stored as txt files in an S3 bucket of your choice (the default for emr-comscore was s3://emr-comscore/aws-athena-query-results-929035564788-us-west-2) and are billed at standard Amazon S3 rates
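As a rough sketch of how a query and its result location fit together, the snippet below builds the request parameters for the start_query_execution call of boto3's Athena client. The database, table, and bucket names are hypothetical placeholders, and the helper build_query_request is illustrative only, not part of any library.

```python
def build_query_request(sql, database, output_location):
    """Return keyword arguments for boto3's Athena start_query_execution call.

    build_query_request is a hypothetical helper written for this example.
    """
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an S3 URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes the result files under this S3 prefix.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical table and bucket; regexp_like is a Presto regular-expression function.
request = build_query_request(
    sql=(
        "SELECT page, COUNT(*) AS hits FROM example_logs "
        "WHERE regexp_like(page, '^/products/') GROUP BY page"
    ),
    database="example_db",
    output_location="s3://example-bucket/athena-results/",
)
```

With boto3 installed and AWS credentials configured, the dictionary could be passed as boto3.client("athena").start_query_execution(**request), which returns a response containing a QueryExecutionId.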
Recently supported:
- CREATE TABLE AS SELECT, which creates a table from the result of a SELECT query statement
- Multiple SQL statements in one query
- The LOCATION of a CREATE TABLE statement must be a directory; Hive will include all files in that directory
- Athena only allows you to create tables with the EXTERNAL keyword. Dropping a table created with the EXTERNAL keyword does not delete the underlying data.
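The two table rules above (EXTERNAL keyword, directory LOCATION) can be sketched as a small statement builder. This is a minimal illustration, not a complete DDL generator: the function name, table name, columns, and bucket are hypothetical, and the trailing-slash check is only a simple convention for "this is a directory, not a file".

```python
def create_external_table_ddl(table, columns, location, stored_as="PARQUET"):
    """Build a CREATE EXTERNAL TABLE statement for data that already sits in S3.

    columns is a list of (name, type) pairs. All names here are illustrative.
    """
    if not location.endswith("/"):
        # LOCATION must name a directory (prefix), not a single file.
        raise ValueError("location must be a directory ending in '/'")
    column_sql = ",\n  ".join(f"`{name}` {type_}" for name, type_ in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"  {column_sql}\n"
        f")\n"
        f"STORED AS {stored_as}\n"
        f"LOCATION '{location}'"
    )


ddl = create_external_table_ddl(
    "example_logs",
    [("page", "string"), ("visit_ts", "timestamp")],
    "s3://example-bucket/logs/",
)
print(ddl)
```

Because the table is external, dropping it later removes only the table definition; the files under the LOCATION prefix stay in S3.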
How much?
Athena charges per query, based on the amount of data scanned.
- $5 per TB of data scanned
- rounded up to the nearest megabyte
- 10MB minimum per query
- No charges for Data Definition Language (DDL) statements like CREATE/ALTER/DROP TABLE, statements for managing partitions, or failed queries
- Cancelled queries are charged based on the amount of data scanned before cancellation
- standard S3 rates apply for storage, requests, and data transfer
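The per-query arithmetic can be sketched as follows, using the figures listed above ($5 per TB, rounding up to the next megabyte, 10 MB minimum). The sketch assumes binary units (1 MB = 1024^2 bytes, 1 TB = 1024^2 MB), which the text does not state, and it covers only billable queries; DDL statements and failed queries are not charged, and S3 costs are separate.

```python
PRICE_PER_TB_USD = 5.0
MB = 1024 ** 2         # assumption: binary megabyte, in bytes
MB_PER_TB = 1024 ** 2  # assumption: 1 TB = 1024 * 1024 MB
MIN_BILLED_MB = 10


def query_cost_usd(bytes_scanned):
    """Estimate the Athena charge for one billable query."""
    # Round up to the next whole megabyte using integer arithmetic.
    scanned_mb = (bytes_scanned + MB - 1) // MB
    # Apply the 10 MB per-query minimum.
    billed_mb = max(scanned_mb, MIN_BILLED_MB)
    return billed_mb * PRICE_PER_TB_USD / MB_PER_TB


one_tb_cost = query_cost_usd(1024 ** 4)   # a full terabyte scanned
tiny_cost = query_cost_usd(1)             # billed at the 10 MB minimum
```

Under these assumptions a query that scans a single byte costs the same as one that scans exactly 10 MB, and a query that scans one byte more than 10 MB is billed for 11 MB.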
Cost/Performance Efficiency:
- Columnar data (i.e. Parquet) allows Athena to selectively read only required columns to process the data
- Partitioning your data also allows Athena to restrict the amount of data scanned
- see the Athena pricing example.
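To illustrate why columnar storage and partitioning reduce cost, here is a back-of-the-envelope sketch. It rests on strong simplifying assumptions that are not from the text: all columns are the same size, all partitions are the same size, and compression and file metadata are ignored. The dataset size, column count, and partition count are made-up numbers.

```python
PRICE_PER_TB_USD = 5.0


def estimated_scan_fraction(columns_read, total_columns,
                            partitions_read, total_partitions):
    """Rough fraction of a dataset scanned by a query.

    Assumes equally sized columns and partitions (a simplification).
    """
    column_fraction = columns_read / total_columns
    partition_fraction = partitions_read / total_partitions
    return column_fraction * partition_fraction


# Hypothetical 3 TB dataset with 4 columns and 12 monthly partitions;
# the query reads one column from one month.
dataset_tb = 3.0
fraction = estimated_scan_fraction(1, 4, 1, 12)
full_scan_cost = dataset_tb * PRICE_PER_TB_USD
pruned_cost = dataset_tb * fraction * PRICE_PER_TB_USD
```

Real savings depend on the actual column sizes, the compression ratio, and how evenly the data is spread across partitions, so treat the result as an order-of-magnitude estimate.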