71c9a5659259efd253c5f0e64581e61ebe498a84
FieldNotes/AthenaViaCloudformation.md
... | ... | @@ -0,0 +1,86 @@ |
1 | +--- |
|
2 | +title: Using CloudFormation to create Athena tables |
|
3 | +--- |
|
4 | + |
|
5 | +## Background |
|
6 | + |
|
7 | +If you use [Amazon Athena](https://aws.amazon.com/athena/) to query data stored in [Amazon S3](https://aws.amazon.com/s3/) (for example, log files, or large datasets used for analysis), you may find yourself wanting to version-control table definitions using [AWS CloudFormation](https://aws.amazon.com/cloudformation/). Since Athena uses the [AWS Glue](https://aws.amazon.com/glue/) data catalog underneath, you can create [Glue tables](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-glue-table.html) to populate your Athena databases. |
|
8 | + |
|
9 | +(With a bit of effort, you can likely adapt this information to other tools, such as [Terraform's Glue table resource](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/glue_catalog_table).) |
|
10 | + |
|
11 | +## Simple example |
|
12 | + |
|
13 | +Using the [CloudFront Logs example](https://docs.aws.amazon.com/athena/latest/ug/getting-started.html) table, here's a resource definition suitable for a CloudFormation YAML template: |
|
14 | + |
|
15 | +```yaml |
|
16 | +CloudfrontLogsTable: |
|
17 | + Type: "AWS::Glue::Table" |
|
18 | + Properties: |
|
19 | + CatalogId: !Ref "AWS::AccountId" |
|
20 | + DatabaseName: default |
|
21 | + TableInput: |
|
22 | + Name: cloudfront_logs |
|
23 | + TableType: EXTERNAL_TABLE |
|
24 | + StorageDescriptor: |
|
25 | + Location: "s3://athena-examples-us-east-1/cloudfront/plaintext/" |
|
26 | + StoredAsSubDirectories: true |
|
27 | + |
|
28 | + InputFormat: org.apache.hadoop.mapred.TextInputFormat |
|
29 | +        OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
|
30 | + |
|
31 | + SerdeInfo: |
|
32 | + SerializationLibrary: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|
33 | + Parameters: |
|
34 | + field.delim: "\t" |
|
35 | + serialization.format: "\t" |
|
36 | + |
|
37 | + Columns: |
|
38 | + - Name: Date |
|
39 | + Type: date |
|
40 | + - Name: Time |
|
41 | + Type: string |
|
42 | + - Name: Location |
|
43 | + Type: string |
|
44 | + - Name: Bytes |
|
45 | + Type: int |
|
46 | + - Name: RequestIP |
|
47 | + Type: string |
|
48 | + - Name: Method |
|
49 | + Type: string |
|
50 | + - Name: Host |
|
51 | + Type: string |
|
52 | + - Name: Uri |
|
53 | + Type: string |
|
54 | + - Name: Status |
|
55 | + Type: int |
|
56 | + - Name: Referrer |
|
57 | + Type: string |
|
58 | + - Name: ClientInfo |
|
59 | + Type: string |
|
60 | +``` |
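One thing that might be worth showing alongside this example: it assumes the `default` database already exists in the Glue catalog. If the database should be managed in the same template, a minimal sketch could look like the following. The logical ID `AnalyticsDatabase` and the name `analytics` are placeholders for illustration, not something taken from the note.

```yaml
# Hypothetical database resource; the logical ID and name are illustrative only.
AnalyticsDatabase:
  Type: "AWS::Glue::Database"
  Properties:
    CatalogId: !Ref "AWS::AccountId"
    DatabaseInput:
      Name: analytics  # placeholder database name
```

The table could then use `DatabaseName: !Ref AnalyticsDatabase`. `Ref` on an `AWS::Glue::Database` returns the database name, and the reference also makes CloudFormation create the database before the table.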
|
61 | + |
|
62 | +## Defining partition projection |
|
63 | + |
|
64 | +[Partition projection](https://docs.aws.amazon.com/athena/latest/ug/partition-projection.html) is configured through table parameters, with the projected columns declared as partition keys. For example, here are the relevant parts of a CloudTrail logs table: |
|
65 | + |
|
66 | +```yaml |
|
67 | +CloudtrailLogsTable: |
|
68 | + Properties: |
|
69 | + TableInput: |
|
70 | + Parameters: |
|
71 | + projection.enabled: "true" |
|
72 | + projection.timestamp.type: "date" |
|
73 | + projection.timestamp.format: "yyyy/MM/dd" |
|
74 | + projection.timestamp.interval: "1" |
|
75 | + projection.timestamp.interval.unit: "DAYS" |
|
76 | + projection.awsregion.type: "enum" |
|
77 | + projection.awsregion.values: "us-east-1,us-east-2" |
|
78 | + storage.location.template: !Sub "s3://my-cloudtrail-logs/AWSLogs/${AWS::AccountId}/CloudTrail/${!awsregion}/${!timestamp}" |
|
79 | + |
|
80 | + PartitionKeys: |
|
81 | + - Name: awsregion |
|
82 | + Type: string |
|
83 | + |
|
84 | + - Name: timestamp |
|
85 | + Type: string |
|
86 | +``` |
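A note for anyone adapting this fragment: as far as I know, Athena requires a `range` property for every `date`-typed projected column, written in the same pattern as `projection.<column>.format`. A sketch of the extra entry under `TableInput.Parameters`, using a made-up start date:

```yaml
Parameters:
  # Assumed start date; replace it with the first day that actually has logs.
  # "NOW" is Athena's keyword for the current time.
  projection.timestamp.range: "2020/01/01,NOW"
```

Keeping the range tight is probably worthwhile, since Athena computes candidate partition locations from it rather than listing what exists in S3.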
Home.md
... | ... | @@ -19,6 +19,7 @@ I have a particular interest in [Huntsville, Alabama](https://huntsvilleal.gov)' |
19 | 19 | |
20 | 20 | Field notes are short, to-the-point, and informal. I use them to document my experience with something or how to perform a particular task "in the field." |
21 | 21 | |
   | 22 | +* [Using CloudFormation to create Athena tables](FieldNotes/AthenaViaCloudformation) |
|
22 | 23 | * [How I use Taskwarrior](FieldNotes/Taskwarrior) |
23 | 24 | * [Using TPM-backed SSH keys on NixOS](FieldNotes/TPMKeys) |
24 | 25 | * [Changing your legal name in Madison County, Alabama](FieldNotes/LegalName) (originally published October 2020) |