Misframe by Preetam Jinka

Time Series Databases Discussion Notes

A couple of weeks ago at Gophercon, a few of us got together at a table to discuss time series databases. Specifically, we were interested in talking about time series storage. The group included Jason Moiron (@jmoiron), Paul Dix (@pauldix), Ben Johnson (@benbjohnson), Julius Volz (@juliusvolz), and others representing companies and projects such as Datadog, InfluxDB, and Prometheus, just to name a few. I was representing VividCortex. What follows are observations about each group’s time series storage problem and some approaches that they’re taking (or have taken) in order to solve it.

Solving Time Series Storage with Brute Force

Many months ago, I first read “Searching 20 GB/sec: Systems Engineering Before Algorithms” [1], an excellent post by Steve at Scalyr. I re-read it six months ago when I was on winter break between semesters of college. I was traveling, working on Cistern [2], and thinking about time series storage. By “thinking,” I mean “annoyed.” I was using Bolt [3] (the B+tree-based, transactional storage engine implemented in Go) to store Cistern’s time series.

Current Projects

My interests generally seem to change every few months, but I tend to hover around the same topics. These days, I’ve been focusing on peer-based communication, consensus, failure detectors, and C++. I’m a fan of thinking big but starting small[1]. The “big” project I have right now is a failure detector. A failure detector simply checks for failures among nodes in a distributed system using some sort of ping[2]. I am starting to write this using C++.

Writing Quality

About a year ago, I told myself that I would lower my mental thresholds for content to force myself to write more. Not just blog posts, but code too. I’m not sure if it made a difference for blog posts or not, but I don’t agree with that idea anymore. When I look back at all of the posts I’ve written so far on Misframe, I see plenty of issues. I frequently find typos, issues with how the writing flows, and some things that just aren’t that good.

Cistern: The Vision of Reinvented Network Monitoring

Background: As a hosting provider, I’ve had my fair share of DDoS attacks. My company doesn’t do any peering with transit providers. We just have a single upstream provider at our Ashburn data center. My provider has an automated DDoS detection system, which is made from scratch, that detects anomalous flows and either automatically blocks traffic or sends email alerts. I sometimes get alerts forwarded to me in case it’s an outbound anomaly originating from one of my clients’ VMs.

Optimizing Concurrent Map Access in Go

One of the more contentious sections of code in Catena, my time series storage engine, is the function that fetches a metricSource given its name. Every insert operation has to call this function at least once, but realistically it will be called potentially hundreds or thousands of times. This also happens across multiple goroutines, so we’ll have to have some sort of synchronization. The purpose of this function is to retrieve a pointer to an object given its name.
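One common way to synchronize this kind of lookup is a map guarded by a read-write mutex, with a shared lock on the fast path and an exclusive lock only when a new entry has to be created. The sketch below shows that pattern. It is not Catena's actual code: the `sourceRegistry` type, the `getOrCreate` method, and the fields of `metricSource` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// metricSource is a hypothetical stand-in for Catena's type of the same name.
type metricSource struct {
	name string
}

// sourceRegistry (hypothetical) guards a name-to-pointer map with a
// read-write mutex so that concurrent lookups do not block each other.
type sourceRegistry struct {
	mu      sync.RWMutex
	sources map[string]*metricSource
}

func newSourceRegistry() *sourceRegistry {
	return &sourceRegistry{sources: make(map[string]*metricSource)}
}

// getOrCreate returns the metricSource for name, creating it if needed.
func (r *sourceRegistry) getOrCreate(name string) *metricSource {
	// Fast path: most calls find an existing entry under the shared lock.
	r.mu.RLock()
	src, ok := r.sources[name]
	r.mu.RUnlock()
	if ok {
		return src
	}

	// Slow path: take the exclusive lock and check again, because another
	// goroutine may have created the entry between the two locks.
	r.mu.Lock()
	defer r.mu.Unlock()
	if src, ok := r.sources[name]; ok {
		return src
	}
	src = &metricSource{name: name}
	r.sources[name] = src
	return src
}

func main() {
	reg := newSourceRegistry()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			reg.getOrCreate("cpu.user")
		}()
	}
	wg.Wait()
	fmt.Println(len(reg.sources)) // prints 1
}
```

The second check under the exclusive lock is what keeps two goroutines from creating separate objects for the same name. Whether this beats a plain mutex or a sharded map depends on the read/write mix, so it is worth benchmarking.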

State of the State Part III

First, I suggest reading Baron’s “Time-Series Database Requirements” blog post to get some more context for this post. I read that and, as I usually do, had my mind set on low-level thoughts. I wrote the following comment: (screenshot of the comment) I took this screenshot a few months ago, so it has actually been almost a year since I wrote that. Time flies! Cistern’s graphs: Cistern had graphs back in October 2014. I think I used my metricstore package.


Someone broke into my server. I was at beSwarm yesterday with my “social networking” setup. (Tweet by Preetam Jinka, @PreetamJinka, February 7, 2015: “Social networking!” pic.twitter.com/fdApIwlKyy) I was demoing Cistern in some form. Cistern doesn’t expose much to the user right now since most of my time was spent on very core features. So, what most people usually saw was the terminal log output. It’s still a little interesting because you can see it do some basic host discovery using SNMP, and it prints flow data as it arrives.

Personal projects, knowledge, and intuition

I had a short conversation with someone recently about having personal projects and applying to internships. The short version of what he said is, “I do my school work well and get good grades. Besides that, I also do well during internships. Why do I need personal projects?” (The following represent my own views. This is a personal blog after all, and this is just, like, my opinion, man.) Just for fun, I did a quick search on personal projects to see what kind of links would show up.

Observium Annoys Me

I first started using Observium in 2011 or 2012. I was a senior in high school. I wasn’t that good at programming. I mean, I could write code in a few languages and knew the basic data structures, Big-O, etc., but I was not familiar with many higher-level concepts like monitoring. I knew about SNMP, but I didn’t know anything at all about the implementation. As Bitcable’s infrastructure grew to include network switches and more hardware, I needed a monitoring tool.