Streaming Fastly logs to fluentd
We recently started revamping our CDN infrastructure, moving away from CloudFront, which is a convenient default in AWS, to Fastly, a much more robust and programmable CDN.
Part of the challenge of switching CDNs in flight is keeping good visibility into what happens during the switch. For us, that means solid logging infrastructure to catch any problems our users might be experiencing.
Luckily, Fastly provides logging outputs for a variety of platforms. The only problem is that none of them targets fluentd, which we run as our log aggregator, out of the box. Fastly does, however, provide a syslog output whose format is fully customizable, so it is really more of a “send a string to a TCP port” output than a syslog one.
Our first attempt used the fluentd syslog input, but it constantly had trouble parsing the Fastly output. That, coupled with the fact that we needed a special token to secure the stream, led us to switch to the tcp input plugin.
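Because the tcp input simply reads newline-delimited text from a socket (newline is, as far as we know, its default delimiter), one way to smoke-test a listener before pointing Fastly at it is to send it a line yourself. Below is a minimal Python sketch, assuming a plaintext listener at a host and port of your choosing; Fastly itself would connect over TLS, and `send_log_line` is a hypothetical helper, not part of fluentd or Fastly.

```python
import socket


def send_log_line(host: str, port: int, line: str, timeout: float = 5.0) -> None:
    """Send one newline-terminated log line over plain TCP.

    This mimics a "send a string to a TCP port" log output, minus TLS.
    """
    # The listener splits the stream on newlines, so make sure the line
    # ends with exactly one "\n".
    payload = line.rstrip("\n") + "\n"
    # The connection is closed as soon as the line has been written.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload.encode("utf-8"))
```

For example, `send_log_line("127.0.0.1", 5170, 'token="secret_goes_here" status="200"')` would target a local listener on 5170, which we believe is the tcp input's default port; adjust to whatever your config uses.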
We run an ECS-based container infrastructure using Convox (an open-source, Heroku-like API layer on top of AWS). That means our point of entry is a Dockerfile. We start simple: fluentd v0.12 and our custom conf file.
FROM fluent/fluentd:v0.12
USER root
COPY fluent.conf /fluentd/etc/fluent.conf
Securing communications
To secure traffic from Fastly, we did two things:
- Use TLS for communication so that the log stream is encrypted in transit
- Use a shared secret token that only Fastly and our log service know, so that malicious users cannot send garbage data to our log endpoint
The first step is easy: just tick the TLS checkbox on Fastly’s syslog output. The second step is a little trickier.
Fluentd supports tokens in neither its syslog nor its tcp listener, and while Fastly does have a dedicated token field, its value does not appear to be sent as part of the TCP message (it probably ends up as a header).
However, by embedding the token both in the message and in the fluentd parsing regex, we can check that the message has the expected “format”. Because the token is a literal part of that regex, a line that matches the format necessarily carries the correct token.
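The idea that “the line parses” and “the token is correct” are one and the same check can be illustrated with a small Python sketch. The token value is a placeholder, and `LINE_RE`/`accept` are hypothetical names used only for illustration.

```python
import re

# Placeholder secret; in practice it lives only in the Fastly format string
# and in the fluentd config.
TOKEN = "secret_goes_here"

# The token is a literal at the start of the pattern, so a line only matches
# if it carries exactly this token. re.escape guards against regex
# metacharacters in the secret.
LINE_RE = re.compile(r'token="' + re.escape(TOKEN) + r'" (?P<rest>.*)')


def accept(line: str) -> bool:
    """Return True only if the whole line matches, i.e. the token is right."""
    return LINE_RE.fullmatch(line) is not None
```

Python's `fullmatch` anchors the pattern at both ends, which is also why any syslog-style prefix in front of the token would make the line fail to match. If you port such a pattern into a fluentd regex, we would suggest adding `^` explicitly, since we believe fluentd's regexp parser does not anchor the match for you.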
We customized the Fastly log format to emit key=value pairs so that it would be easier to parse in fluentd. Note that the secret token is the leading part of the line:
token="secret_goes_here" service="reverb.com" remote_address="%h" timestamp="%t" request="%r" status="%>s" size="%b"
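To see why quoted key=value pairs are easy to parse, here is a small Python sketch that turns one rendered line into a dictionary. The example values in the test (IP address, timestamp, request) are made up, and the parser assumes no value ever contains a literal double quote.

```python
import re
from typing import Dict

# One key="value" pair. Keys are word characters (underscores allowed);
# values are assumed never to contain a double quote.
PAIR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_pairs(line: str) -> Dict[str, str]:
    """Turn 'a="1" b="2"' into {'a': '1', 'b': '2'}."""
    return dict(PAIR_RE.findall(line))
```

Because every value is quoted, fields that contain spaces, such as the request line or the timestamp, still split cleanly. This is the main advantage over a plain space-separated format.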
We also opened the Advanced options of the Fastly syslog output and selected “Blank” as the log line format, so that Fastly doesn’t decorate our log line with any syslog-specific prefixes.
On the fluentd side, our tcp listener expects the same token at the start of every line.
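Below is a sketch of the kind of regex such a listener could use, written in Python so it can be tested on its own. It is not the exact production config. Its named groups mirror the fields of the format string above, and the token is the same placeholder. In fluentd v0.12, a pattern like this would, as far as we know, go in the `format /.../` parameter of a `tcp` source, using Ruby's `(?<name>...)` group syntax instead of Python's `(?P<name>...)`. Treat that mapping as an assumption.

```python
import re
from typing import Dict, Optional

TOKEN = "secret_goes_here"  # placeholder, same as in the Fastly format string

# One named group per Fastly field. The token is matched literally but not
# captured, so it never ends up in the parsed record.
FASTLY_LINE_RE = re.compile(
    r'^token="' + re.escape(TOKEN) + r'"'
    r' service="(?P<service>[^"]*)"'
    r' remote_address="(?P<remote_address>[^"]*)"'
    r' timestamp="(?P<timestamp>[^"]*)"'
    r' request="(?P<request>[^"]*)"'
    r' status="(?P<status>\d{3})"'
    r' size="(?P<size>[^"]*)"$'
)


def parse_fastly_line(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields, or None if the line (and so the token) doesn't match."""
    m = FASTLY_LINE_RE.match(line.rstrip("\n"))
    return m.groupdict() if m else None
```

Leaving the token uncaptured is a deliberate choice. The pattern still depends on the secret, but the secret is never forwarded to wherever fluentd sends its output.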
At this point fluentd has parsed the Fastly logs, and we can route its output wherever we want. If you set this up in AWS, remember to use SSL/TLS on the load balancer so that the traffic stays encrypted.
If you want to work on modern infrastructure built on AWS/Terraform/ECS/Convox/Ruby/Go, we are hiring!