FieldNotes/AthenaViaCloudformation.md
@@ -0,0 +1,86 @@
+---
+title: Using CloudFormation to create Athena tables
+---
+
+## Background
+
+If you use [Amazon Athena](https://aws.amazon.com/athena/) to query data stored in [Amazon S3](https://aws.amazon.com/s3/) (for example, log files, or large datasets used for analysis), you may find yourself wanting to version-control table definitions using [AWS CloudFormation](https://aws.amazon.com/cloudformation/). Since Athena uses the [AWS Glue](https://aws.amazon.com/glue/) data catalog underneath, you can create [Glue tables](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-glue-table.html) to populate your Athena databases.
+
+(With a bit of effort, you can likely adapt this information to other tools, such as [Terraform's Glue table resource](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/glue_catalog_table).)
+
+## Simple example
+
+Here's a resource definition for the [CloudFront Logs example](https://docs.aws.amazon.com/athena/latest/ug/getting-started.html) table, suitable for a CloudFormation YAML template:
+
+```yaml
+CloudfrontLogsTable:
+  Type: "AWS::Glue::Table"
+  Properties:
+    CatalogId: !Ref "AWS::AccountId"
+    DatabaseName: default
+    TableInput:
+      Name: cloudfront_logs
+      TableType: EXTERNAL_TABLE
+      StorageDescriptor:
+        Location: "s3://athena-examples-us-east-1/cloudfront/plaintext/"
+        StoredAsSubDirectories: true
+
+        # Hive input/output format classes, given as fully qualified names
+        InputFormat: org.apache.hadoop.mapred.TextInputFormat
+        OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+
+        # Tab-delimited text
+        SerdeInfo:
+          SerializationLibrary: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+          Parameters:
+            field.delim: "\t"
+            serialization.format: "\t"
+
+        Columns:
+          - Name: Date
+            Type: date
+          - Name: Time
+            Type: string
+          - Name: Location
+            Type: string
+          - Name: Bytes
+            Type: int
+          - Name: RequestIP
+            Type: string
+          - Name: Method
+            Type: string
+          - Name: Host
+            Type: string
+          - Name: Uri
+            Type: string
+          - Name: Status
+            Type: int
+          - Name: Referrer
+            Type: string
+          - Name: ClientInfo
+            Type: string
+```
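+
+The example above puts the table in Athena's `default` database. If you'd rather keep a dedicated database in the same template, a minimal sketch might look like the following. The logical ID `LogsDatabase` and the database name `logs` are placeholders made up for illustration, and the sketch assumes that `!Ref` on an `AWS::Glue::Database` resolves to the database name, so check that against the CloudFormation documentation before relying on it.
+
+```yaml
+LogsDatabase:
+  Type: "AWS::Glue::Database"
+  Properties:
+    CatalogId: !Ref "AWS::AccountId"
+    DatabaseInput:
+      Name: logs  # placeholder name; keep it lowercase
+
+CloudfrontLogsTable:
+  Type: "AWS::Glue::Table"
+  Properties:
+    CatalogId: !Ref "AWS::AccountId"
+    # Assumption: Ref returns the database name, which also makes
+    # CloudFormation create the database before the table.
+    DatabaseName: !Ref LogsDatabase
+    TableInput:
+      Name: cloudfront_logs
+      # ...remaining TableInput properties as in the example above
+```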
+
+## Defining partition projection
+
+[Partition projection](https://docs.aws.amazon.com/athena/latest/ug/partition-projection.html) can be configured the same way, through table parameters. For example, this fragment shows only the properties that a CloudTrail logs table needs for partition projection:
+
+```yaml
+CloudtrailLogsTable:
+  Properties:
+    TableInput:
+      Parameters:
+        projection.enabled: "true"
+        projection.timestamp.type: "date"
+        projection.timestamp.format: "yyyy/MM/dd"
+        projection.timestamp.interval: "1"
+        projection.timestamp.interval.unit: "DAYS"
+        projection.awsregion.type: "enum"
+        projection.awsregion.values: "us-east-1,us-east-2"
+        storage.location.template: !Sub "s3://my-cloudtrail-logs/AWSLogs/${AWS::AccountId}/CloudTrail/${!awsregion}/${!timestamp}"
+
+      PartitionKeys:
+        - Name: awsregion
+          Type: string
+
+        - Name: timestamp
+          Type: string
+```
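+
+For context, here is a rough sketch of how that fragment might sit inside a complete resource. Everything beyond the fragment is an assumption: the `default` database, the table name `cloudtrail_logs`, the short column list, the input format and SerDe classes (taken from memory of Athena's CloudTrail examples), and the `2020/01/01` start date. The `projection.timestamp.range` parameter is included because, as far as I can tell, Athena requires a range for date-type projected columns.
+
+```yaml
+CloudtrailLogsTable:
+  Type: "AWS::Glue::Table"
+  Properties:
+    CatalogId: !Ref "AWS::AccountId"
+    DatabaseName: default          # assumption: same database as the simple example
+    TableInput:
+      Name: cloudtrail_logs        # hypothetical table name
+      TableType: EXTERNAL_TABLE
+      Parameters:
+        projection.enabled: "true"
+        projection.timestamp.type: "date"
+        projection.timestamp.format: "yyyy/MM/dd"
+        # Assumption: placeholder start date; must match the format above
+        projection.timestamp.range: "2020/01/01,NOW"
+        projection.timestamp.interval: "1"
+        projection.timestamp.interval.unit: "DAYS"
+        projection.awsregion.type: "enum"
+        projection.awsregion.values: "us-east-1,us-east-2"
+        storage.location.template: !Sub "s3://my-cloudtrail-logs/AWSLogs/${AWS::AccountId}/CloudTrail/${!awsregion}/${!timestamp}"
+
+      PartitionKeys:
+        - Name: awsregion
+          Type: string
+        - Name: timestamp
+          Type: string
+
+      StorageDescriptor:
+        Location: !Sub "s3://my-cloudtrail-logs/AWSLogs/${AWS::AccountId}/CloudTrail/"
+        # Assumption: classes as used in Athena's CloudTrail examples
+        InputFormat: com.amazon.emr.cloudtrail.CloudTrailInputFormat
+        OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+        SerdeInfo:
+          SerializationLibrary: org.apache.hive.hcatalog.data.JsonSerDe
+        # Only a few CloudTrail fields, for brevity. A column named
+        # awsregion is left out because it would clash with the partition key.
+        Columns:
+          - Name: eventtime
+            Type: string
+          - Name: eventsource
+            Type: string
+          - Name: eventname
+            Type: string
+          - Name: sourceipaddress
+            Type: string
+```
+
+With projection enabled, Athena works out partition locations from the template at query time, so there should be no need to add partitions by hand as new days of logs arrive.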
Home.md
@@ -19,6 +19,7 @@ I have a particular interest in [Huntsville, Alabama](https://huntsvilleal.gov)'

Field notes are short, to-the-point, and informal. I use them to document my experience with something or how to perform a particular task "in the field."

+* [Using CloudFormation to maintain Athena tables](FieldNotes/AthenaViaCloudformation)
* [How I use Taskwarrior](FieldNotes/Taskwarrior)
* [Using TPM-backed SSH keys on NixOS](FieldNotes/TPMKeys)
* [Changing your legal name in Madison County, Alabama](FieldNotes/LegalName) (originally published October 2020)