On HFT: Assumptions, Agent-Based Modeling, and a Philosophy of Error

On High-Frequency Trading

In which I identify errors in recent HFT academic papers and discuss some difficulties of knowledge

Simplicity does not come of itself but must be created – Clifford Truesdell

The Blogs Must Be Crazy

When it comes to trading, especially of the high-frequency (HFT) variety, a few of my favorite finance and economics bloggers write some really strange things. In an old piece stuffed with bad ideas, Felix Salmon basically states that markets should exist so that people can unload depreciating assets at a decent price.[1] Now Noah Smith thinks high-frequency trading consumes liquidity — never mind that the most liquid markets in the world have the highest number of HFT players.[2] Fear not, my rhetorically prolific but millisecondly challenged friends: the markets are in good health.

I know because I am HFT. I mean, I made one of them — with a few other dudes.[3] I am a fish. One night, after a few too many, an account rep quipped: “high frequency boys are sharks bumping into each other.” (An instant classic, it has since stuck with us as a joke. Permit me to run with the metaphor.) However, I’m certainly not a shark. If we’re playing Odell Down Under, I’m probably a clown fish. I am a small part of a vast ocean. Even though many of the swimmers in this ocean have a similar goal of being able to swim very fast, not all of us are going for the same food. The longevity of the ecosystem depends on its robustness. The fishies know this.

Economists love the ocean. They love studying the ocean. They want to make sure the ocean has a long and prosperous and healthy life. In order to do that they must understand how it works. This is hard. Sometimes they make assumptions about the fish in order to make it easier to understand the ocean. But erroneous assumptions about the fish often lead to erroneous conclusions about the ocean. This post will provide some fish-perspective on recent academic research which tried to answer big, juicy, ocean-policy-relevant questions (or, in Academese, to “contribute to the growing body of research”).

As always, there is a significantly non-zero probability that anything I say is wrong to very wrong. Not being an economist, I apologize in advance. I reserve the right to correct and be corrected.

Very Liquidity. Such Market. Wow.

In the paper Noah cites, Johannes Breckenfelder attempts to study the effects competing HFTs had on “market quality” after high-frequency traders entered Sweden’s equities markets circa 2009. Breckenfelder concludes that HFT competition consumes liquidity and worsens market quality. Making such broad claims requires extreme care and nuance. The paper exhibits neither. I believe a more precise analysis of counterparties will lead Breckenfelder to the opposite conclusion: his data actually suggest that HFTs in competition improve market quality. He just needs to understand what he’s looking at.

Quality Control

First, it’s important for us to define “market quality.” Unfortunately, the paper does not do this very well. Generally, economists like Noah and Breckenfelder are concerned with longitudinal volatility and liquidity measures — things which reflect the long-term stability of markets and the ability of large, fundamental investors to get in and out of their big positions. This makes sense. We need to have some belief that we can eventually sell our investments in the future. But Breckenfelder’s findings are only significant with respect to intraday metrics. Intraday volatility is inconsequential to the “market quality” in which economists are interested if measures at longer horizons remain untouched.
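To make the horizon distinction concrete, here is a minimal sketch (Python, with invented variable names) of measuring the two separately; a spike in the five-minute series says nothing by itself about the daily series.

```python
import numpy as np
import pandas as pd

# Illustrative only: `px` is assumed to be a pandas Series of trade prices
# indexed by timestamp for a single stock; names are invented.
def realized_vol(px: pd.Series, bar: str) -> float:
    """Square root of the sum of squared log returns sampled at `bar`."""
    bars = px.resample(bar).last().dropna()
    rets = np.log(bars).diff().dropna()
    return float(np.sqrt((rets ** 2).sum()))

# vol_5m  = realized_vol(px, "5min")  # the horizon where the paper's effects show up
# vol_day = realized_vol(px, "1D")    # the horizon where it finds no effect
```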

In fact, Breckenfelder doesn’t find evidence of any effect on daily volatility or liquidity! I propose that the intraday volatility he finds might actually be beneficial. Intraday volatility presents new opportunities for large and small investors to get in and out of their positions at a more advantageous price. In practice this is the case. Increasingly, long-term funds do not execute their trades manually. They have their own high-speed execution algorithms that get them into the positions they want over time as opportunities arise. And with HFTs competing intraday, there is more opportunity for these algorithms to obtain their long-term positions.

In terms of metrics, Breckenfelder fails to acknowledge the multi-dimensionality of liquidity and the weaknesses of his own data set. The paper liberally modifies a heuristic liquidity calculation — initially designed for low-frequency analysis — and, by stating that it is “well known,” implicitly asserts that it is a sufficient measure of liquidity. This is wrong. When we examine his results with the relevant lens, the data show that the markets are indeed resilient at the daily level, a finding certainly more important than an increase in five-minute volatility. Without access to order and book-depth data, Breckenfelder is unable to make strong conclusions about effects on liquidity. He should be more careful.
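I won’t guess at the paper’s exact variant, but as an illustration of the problem, a standard low-frequency price-impact proxy in the spirit of Amihud (2002) looks roughly like the sketch below. Note that it never sees the order book, so depth and resiliency are invisible to it by construction.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: per-period absolute returns and traded dollar volume
# for one stock. An Amihud-style illiquidity proxy is just |return| per unit
# of dollar volume, averaged over the window: no quotes, no book depth.
def amihud_illiquidity(returns: pd.Series, dollar_volume: pd.Series) -> float:
    ratio = returns.abs() / dollar_volume.replace(0, np.nan)
    return float(ratio.mean())

# Two markets can produce the same |return|/volume ratio while one has ten
# times the resting quantity behind the touch; a proxy like this cannot tell
# them apart, which is why order and book-depth data matter for strong claims.
```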

Givers and Takers

To quantify the amount of liquidity HFTs provide versus take, Breckenfelder creates a novel metric based on assumptions he considers to be “intuitive.” Noah also agrees with the assumptions. This metric becomes a breaking point of the paper, as it is based on a fundamental misunderstanding of HFT strategies. This is enough to merit a reanalysis:

A high-frequency trade is assumed to be consuming liquidity if the stock midpoint price moved upwards (downwards) prior to a high-frequency buy (sell) trade; otherwise, it is assumed to be providing liquidity. We construct our measure for both the one-minute period and the five-minute period prior to each midpoint price change. The intuition behind this measure is that, given a trend, a liquidity provider will execute trades in the opposite direction to the trend. Trading with the trend amplifies it.

First, a liquidity provider provides liquidity. An order which adds to the total quantity in the order book is a liquidity-providing order. Second, all trades “consume” liquidity. Every trade removes quantity from the limit order book. When a trade occurs there are two sides: active and passive. The active side is the order which matched against an existing passive buy or sell order. In the context of the paper, the active order is the one which should be considered the “liquidity consuming” trade. Breckenfelder chooses to make up his own definition. Here-I-made-you-a-picture:

[Figure “liquidity2”: this is what a market looks like]

In liquid markets, when the midpoint moves up (down), that often means an order has come in which is larger than the quantity available at the best ask (bid). After the quantity at the best ask (bid) is matched against the order, the remaining quantity becomes the new best bid (ask). High-frequency liquidity providers will want to add passive bid (ask) orders at the new best price in order to obtain queue priority. Since trading frequently bounces back and forth between both sides before one is knocked out (the bid-ask bounce), the liquidity providers with the best queue position will get filled, even in the direction of the trend!
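Here is a minimal sketch of the distinction, assuming a trade record that carries an aggressor-side flag (field names here are invented): the paper’s rule labels trades by the preceding midpoint drift, while the mechanical label comes straight from which side crossed the spread.

```python
from dataclasses import dataclass

@dataclass
class Trade:
    side: str            # "buy" or "sell": direction of the trade
    aggressor: bool      # True if the HFT's order crossed the spread (active side)
    mid_drift_5m: float  # midpoint change over the prior five minutes

def consumed_liquidity_mechanical(t: Trade) -> bool:
    # Only the active order, the one that crossed the spread and hit a
    # resting order, actually removed quantity from the book.
    return t.aggressor

def consumed_liquidity_paper_style(t: Trade) -> bool:
    # My paraphrase of the paper's rule: a buy after an upward drift, or a
    # sell after a downward drift, is labeled "consuming."
    return (t.side == "buy" and t.mid_drift_5m > 0) or \
           (t.side == "sell" and t.mid_drift_5m < 0)

# A passive bid that joined the new best price after an uptick and was filled
# during bid-ask bounce is "providing" under the first rule but can be labeled
# "consuming" under the second: exactly the queue-priority case described above.
```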

I might be missing something with the design of this metric, but I don’t think it even passes the smell test. Typical of economists, Noah and Breckenfelder fail to consider cost.[4] Regardless of frequency, the goal of a trading firm is to make money. The naive trend-following strategy implied by the metric would only make sense if its predictive power (the expected value of the trade) were greater than the average bid-ask spread. Highly unlikely. This means that a strategy labeled as taking liquidity by this metric has a negative expected return. And that does not make sense. He should use the active/passive data he already has rather than invent his own metric — and he certainly shouldn’t base anything on “intuition.”
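A back-of-the-envelope version of the cost argument, with hypothetical but representative numbers: a strategy that crosses the spread needs its predictive edge to beat the spread plus fees just to break even.

```python
# All numbers hypothetical, in ticks per round trip.
predicted_move = 0.3   # edge the naive trend signal would have to deliver
spread_paid    = 1.0   # crossing the spread on the way in and out
fees           = 0.1   # exchange and clearing costs

expected_pnl = predicted_move - spread_paid - fees
print(expected_pnl)    # about -0.8 ticks per trade: a money-losing "strategy"
```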

Secret Agent-Based Man

Broadly: an agent-based model (ABM) is a type of simulation used to discover nifty stuff (emergent properties) that can arise when individually behaving entities (agents) interact with each other and their environment according to a set of rules. I believe there is great potential for using ABMs as a tool to investigate models, show us where they break, and explore the limitations of theories.
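For readers who haven’t seen one, the skeleton of an ABM is genuinely simple, which is part of why the design choices matter so much. A stripped-down sketch (all names are mine, not from any paper discussed here):

```python
import random

class Agent:
    """An agent is just some state plus a rule for acting on the world."""
    def act(self, world):
        raise NotImplementedError

class NoiseTrader(Agent):
    def act(self, world):
        world.orders.append(random.choice([+1, -1]))  # buy or sell one unit

class World:
    def __init__(self, agents):
        self.agents, self.orders, self.price = agents, [], 100.0

    def step(self):
        self.orders.clear()
        for a in self.agents:
            a.act(self)
        self.price += 0.01 * sum(self.orders)  # toy price-impact rule

world = World([NoiseTrader() for _ in range(100)])
for _ in range(1000):
    world.step()
# Whatever patterns `world.price` traces out are "emergent" only relative to
# the rules chosen above; swap the rules and you can get almost anything.
```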

In economic research, the Santa Fe Institute has been working on some very interesting ABM projects. One group, led by John Geanakoplos and Robert Axtell, is investigating a systemic risk model which includes the housing market.[5] Using an enormous set of individual loan data from Washington, D.C., they have constructed a model which not only resembles the empirical data, but allows them to manipulate individual variables to produce “what might have happened” scenarios. So far the results look promising, and they are trying to replicate the model with data from Boston and San Francisco. Note how careful they are about making (and not making) inferential claims from their measurements. This type of precise language is used by scientists who understand how hard it is to know something.

On the other end of the spectrum, ABMs can easily give us a false sense of understanding. Since there are so many free parameters in both the design of the agents and the interactions between them, you can get an ABM to show pretty much any results you want. That’s exactly what another group unknowingly did in a recent HFT paper posted to the arXiv. Sandrine Leal et al attempt to investigate whether high frequency trading generates “flash crashes.” They conclude that high frequency trading leads to flash crashes. I don’t want to discourage ambitious researchers from pursuing interesting models, but I think this paper is poorly designed and provides no relevant information.

It takes two to trade

In order to reach conclusions of real-world significance, a proper ABM must make some attempt to create agents which map to reality. This is obviously a difficult undertaking when it comes to trading markets. Before the high-frequency era, institutional investors often traded against a counterparty (sometimes called a “broker”) who offered a market — prices at which they were willing to buy and sell — either over the phone or electronically. Now most market making is done by algorithms which offer to buy and sell at many prices. As discussed earlier, this competition for queue priority enhances stability by increasing the total volume of shares available to buy or sell at any given time. Any model of trading must have these types of participants.

The first problem in Leal is that the market participants (agents) are inaccurately specified. The paper only has two types of agents: low-frequency (LF) and high-frequency (HF). In each “trading period,” the LF traders place buy and sell orders, then the HF traders incrementally place orders based on those. This awkward setup is perhaps due to technology limitations — forgivable for student research — but to not model the primary counterparty is just awful. Even with purely random HF strategies they are going to see flash crashes when there aren’t any liquidity providers.
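As a purely illustrative sketch of what’s missing (this is not from the paper, and real market-making logic is far richer), even the crudest two-sided quoting agent changes the dynamics, because there is now resting quantity for aggressive flow to trade against:

```python
class MarketMaker:
    """Toy liquidity provider: always quotes both sides around a fair value."""
    def __init__(self, half_spread=0.5, size=10):
        self.half_spread, self.size, self.inventory = half_spread, size, 0

    def quotes(self, fair_value):
        # Skew quotes against accumulated inventory so the agent sheds risk
        # instead of simply vanishing from the book.
        skew = -0.05 * self.inventory
        bid = fair_value - self.half_spread + skew
        ask = fair_value + self.half_spread + skew
        return (bid, self.size), (ask, self.size)

# e.g. bid_quote, ask_quote = MarketMaker().quotes(fair_value=100.0)
```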

Another contentious choice was to permit all agents to switch their strategies on a whim, or to not trade at all, depending on profitability. This doesn’t make any sense. On the low-frequency side, institutional investors and traders are heavily bound to the types of strategies they can use, with many funds being long-only, or long-short within a narrow range. For HF firms, not every algorithm is an intraday fast-follower (a point all of these economists should study very, very carefully). If every agent is given a stochastic strategy-choice variable weighted toward whichever strategy has the best local performance, of course flash crashes will occur. This is an extremely poor design choice on the authors’ part and does not reflect the persistent diversity of investors and strategies found in the market. Leal needs to create a more representative inventory of agents.

Is this real life?

Perhaps the most important flaw in the paper is its failure to reconcile results against any empirical data. Whereas the Geanakoplos paper used empirical data for both calibration and verification, Leal’s model is heuristic from end to end. Even the control case is a universe populated solely by the paper’s own low-frequency traders. Simply because they managed to get low-frequency traders not to generate flash crashes doesn’t make it a reasonable control. There could be hundreds of errors in the LF agents which still result in the same observed behavior. In other words: we need a benchmark.

I concede that they do attempt to use existing empirical research as the basis for some of the agents’ behaviors, but it’s pretty half-assed. Citing a CFTC preliminary report about the May 2010 Flash Crash (I assume that’s the one they mean, though it’s incorrectly referenced), Leal creates a behavior specific to the HF agents: unfilled orders are cancelled after a fixed period of time, to represent the higher rate of cancellations seen in HFTs. By manipulating the cancellation parameter, they conclude that higher cancellation rates make markets more prone to flash crashes, and they suggest a policy imposing a minimum period of time before a cancel can be made. This implementation doesn’t make sense, however. As I discussed earlier, orders are left out to obtain queue priority. A typical HF strategy will not pull an order until it becomes the best bid or ask (or close to it) and the strategy has decided that there’s a negative expectancy of being filled at that price. This means the cancellation parameter is inextricably tied to the strategy. While I find it interesting in the context of the paper, a fixed parameter does not provide compelling evidence worthy of policy consideration.
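To see why a single fixed timer misrepresents the behavior, compare the two cancellation policies side by side (a hypothetical sketch; the thresholds are invented):

```python
def cancel_leal_style(order_age, max_age=5):
    # The paper's rule, as I read it: cancel purely on elapsed time.
    return order_age > max_age

def cancel_hft_style(queue_position, expected_fill_value):
    # Closer to practice: an order rests until it is at or near the front of
    # the queue AND the strategy now expects a fill there to lose money.
    near_top = queue_position <= 1          # invented threshold
    return near_top and expected_fill_value < 0
```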

Finally, the paper leans on weak research to define what a flash crash is and how often one happens. The paper from which they derive their model conflates flash crashes with the numerous and frequent “fat fingers” which happen across all markets. The model is then fit to these poor examples of “crashes.” Curiously, given the long/short design of the agents’ potential inventory, Leal should have seen just as many “flash rallies,” in which the price shoots upward. The existence of those is neither noted nor considered. In the end, there is no data to suggest the Leal model shows us anything except that the authors were able to design a poorly mapped agent model in which their mistaken definition of a flash crash occurs. So what?

E.R.R.O.R.

The previous examples highlight the limitations of proxies — the numbers we create to measure something situationally meaningful. Breckenfelder was so focused on experimental design that he didn’t consider the possibility that his metrics were measuring inconsequential noise. Leal et al. wanted to show that certain agents turn a market fragile, but they didn’t first create a realistically stable one. For each proxy we use, we must at once possess a deep understanding of what it measures, how it measures it, and when it will fail to do so. In the absence of such an understanding, a researcher is likely to misspecify the experiment and reach an invalid or overreaching conclusion.

The late Nobel laureate physicist Richard Feynman championed the “great value of a satisfactory philosophy of ignorance.” In an old interview, he talks about “how hard it is to know something” while expressing doubt that many social scientists have taken the requisite care to be able to reach their conclusions. In my interpretation, a philosophy of ignorance asks us to be acutely aware of the fallibility of measurements and experimentation. Formally: when will our experiment E confirm our hypothesis H even though H is false?

This is one of the driving questions underlying the error-statistics philosophy of Professor D. G. Mayo, which also introduces the highly useful concept of “severity assessment.”[6] A test T of a hypothesis H is “sufficiently severe” if it has a very high probability of detecting the falsity of H when H is false, and of not doing so when H is true. This assessment permits us to construct satisfaction criteria for the tests we need to perform in order to probe our hypotheses. Only when we have met all of the criteria can we claim knowledge. Until then, we must remain satisfactorily ignorant.
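At the risk of compressing Mayo’s formulation too far (this is my paraphrase, not a quotation), the idea can be written roughly as:

```latex
% Let T be a test of hypothesis H and x_0 the observed outcome, with x_0 in
% accord with H. One rough rendering of the severity requirement:
\[
  \mathrm{SEV}(T, x_0, H) \;=\;
  \Pr\bigl(\text{$T$ yields a result according with $H$ less well than $x_0$ does}
  \;\big\vert\; H \text{ is false}\bigr)
\]
% H passes a severe test only when this probability is high: had H been false,
% the procedure would very probably have flagged it.
```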

This is my number, there are many like it…

While obviously not unique to the field, the proxy problem is fairly pervasive in economics. In the current issue of Foreign Affairs, economist and money manager Zachary Karabell laments the widespread use of the GDP statistic in ways for which it was not designed. Further, Karabell stresses the need for new, specific measurements tailored to our questions of interest, rather than over-reliance on existing numbers:

[Simon] Kuznets and his cohort, for their part, understood these limitations [of GDP] well. As Kuznets wrote in 1934, “The valuable capacity of the human mind to simplify a complex situation … becomes dangerous when not controlled in terms of definitely stated criteria.” He warned that numbers and statistics were particularly susceptible to the illusion of “precision and simplicity” and that officials and others could easily misuse them.

To be useful, a new generation of indicators would have to answer particular, well-defined questions. But they cannot look like new versions of the old numbers. They cannot be one-size-fits-all generalizations. Instead of a few big averages, officials and ordinary people need a multiplicity of numbers that seek to answer a multitude of questions. … In short, we do not need better leading indicators. We need bespoke indicators tailored to the specific needs of governments, businesses, communities, and individuals.

Historically, scientists interested in rigorous inquiry have produced similar sets of bespoke indicators. In Mayo’s 2010 book “Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science,” she discusses the development of the parameterized post-Newtonian (PPN) framework, which allowed physicists to systematically test the predictions of Einstein’s general theory of relativity (GTR). The framework was designed so that violations of GTR’s hypotheses could be described and measured, thus helping to avoid premature acceptance of the theory. But it was also important for the framework not to be biased toward GTR; at the time there were competing relativistic theories of gravity which needed to be compared. By formalizing and listing all of these parameter predictions, physicists were able to compare models prior to experimentation and determine the contribution of any one test toward the overall objective.

In Context

We should now be able to clearly identify why the previous HFT papers are of little value. In Breckenfelder, the entry of any type of participant that traded on intraday trends would give the false positive of worsening market quality. Further, he ignored more important indicators of market quality so that his results would favor the hypothesis. In Leal, they specifically designed agents in a way that would cause flash crashes. None of their tests could be said to be sufficiently severe. And most importantly, the authors neglect the untested predictions of their hypotheses. It might seem counterintuitive to enumerate such things, but declining to do so signals a lack of interest in rigorous inquiry.

Under The Sea

In grade school science classes, a popular “video day” involved watching a Discovery Science or NOVA episode about oceans in which the narrator would always, every time, excitedly declare something like “people who think outer space is the final frontier might be shocked to learn that we’ve only explored x% of the depths of our oceans!” I think of that whenever I see an economist talk about HFT policy recommendations. (I don’t really, but for the purposes of the metaphor…)

Perhaps this fact is not as obvious as it should be: The Market is not a single, homogeneous entity where everyone meets to exchange everything. Competition exists not only within markets, but also between them. People want to trade where they feel they have a satisfactory blend of advantage and risk. Markets set rules to attract customers and ensure long-term growth. This has led to a number of solutions to the potential problems with HFT. The currency exchange EBS has been slowly rolling out latency floors with great success. The futures exchanges run by the CME impose messaging limits. Moving to larger tick sizes increases the liquidity available at every level and prevents over-trading by high-speed algorithms. So when a researcher puts on his space suit and asks: should we mandate batch auctions or implement a transaction tax? The fish answer no. Because those ideas are silly.

I see little upside for the fish when people who can’t even swim want to meddle with the entire ocean. Before any claim of knowledge or recommendation of policy, the economist must get wet.[7]


Addendum (3/28)

Professor Paul Pfleiderer at Stanford posted a fantastic essay about how models and assumptions can easily lead us astray. See: Chameleons: The Misuse of Theoretical Models in Economics and Finance. The abstract:

In this essay I discuss how theoretical models in finance and economics are used in ways that make them “chameleons” and how chameleons devalue the intellectual currency and muddy policy debates. A model becomes a chameleon when it is built on assumptions with dubious connections to the real world but nevertheless has conclusions that are uncritically (or not critically enough) applied to understanding our economy. I discuss how chameleons are created and nurtured by the mistaken notion that one should not judge a model by its assumptions, by the unfounded argument that models should have equal standing until definitive empirical tests are conducted, and by misplaced appeals to “as-if” arguments, mathematical elegance, subtlety, references to assumptions that are “standard in the literature,” and the need for tractability.

Footnotes

  1. Among his numerous fumbles, Felix also writes that during the Flash Crash of 2010 it was the machines that turned themselves off. Hardly. I was at the Chicago Board of Trade that day when there were zero orders in the market. Even completely automated firms have people, called operations or “ops”, who monitor the overall health of the systems to make sure they are functioning as desired. When something out of the ordinary happens, they have to quickly figure out whether or not to turn off certain algorithms. During the flash crash, humans chose to turn the machines off and pull the orders. Many people, including Felix, also incorrectly assume that the famous fat finger in the e-Mini S&P futures contracts was the sole triggering event. Those people might not remember that CNBC had been showing live footage of Greece burning all day. Then a good 5 minutes before the fat finger, there were significant volatility spikes in major currency pairs. Humans were there. Felix shouldn’t pretend that they weren’t.
  2. Full disclosure: I’m a big Noah Smith fan. You should read his blog (link) and follow him on twitter. EDIT: I should also probably mention that Noah doesn’t necessarily think that. He’s really cautious about jumping to conclusions. But that doesn’t make for a good controversial opener, so I deliberately overstated his conviction in Breckenfelder’s paper.
  3. When I say made: you have to do something with all the packets of information the exchange sends you. They go through a crazy and beautiful complex of code until eventually you generate a packet to send back to the exchange. Of course you need strategies as well; so you need to build research tools to mock up your ideas. But you also need data, which has to be reliably stored somewhere for easy and continuous retrieval by your research tools as well as by your system. And because you’re going to be trading on highly granular data, you need to build a highly sophisticated simulation environment so that you can make sure your model expectations might be somewhere close to reality. You don’t want to reinvent the wheel every week, so you recreate a lot, a lot, of academic papers to get an idea if any can provide new information. But your system currently doesn’t handle the models you want to test, so you build out some more infrastructure. Eventually you need to find bottlenecks in your code and figure out why it’s using up so much memory. Now you’re multi-threading, which is somewhat like starting the first day of computer science all over again. You can’t waste time doing mundane things, so any routine task you do should be automated. You’re on a team; you need bug tracking software and an in-house wiki, and your code needs to be documented. Are you even live yet? Probably not. If you’ve done any test trades you might wonder why your orders don’t match up to backtesting. Go fix your simulator to model passive orders correctly. You also want to experiment with different types of execution so you can get into the positions your models want you to be in. Your historical testing has a lot of spikes from foreseeable events like economic numbers. So your algorithms need to handle those, and of course you need the historical economic data. Now of course you didn’t write the proper unit-tests for every piece of code throughout this whole time, so your errors propagate. Debug. Debug. Now you’ve got some confidence in a few things. Ready? Not quite. You need to look at some different frequencies to make a more robust alpha that is uncorrelated to your current market environment. Then, your potentially dangerous system needs to be certified by the exchange to make sure it has risk limits and can handle strange “edge case” scenarios. Eventually you get some test trades going, and obviously you do those during a week or two of draw downs. Maybe a few weeks (months) later you get some small size going live and finally start putting some points on the board. Then you have to figure out how to scale it up. And make it better. And faster… oh, and someone has to make the GUI. Mother. Fucking. GUIs.
  4. #snarky
  5. I’d like to thank Professor Axtell for fielding some questions I had about his work. I look forward to seeing more of it! And I also owe a big thanks to Professor Rajiv Sethi for promptly responding to numerous inquiries and pointing me in the right direction.
  6. Full disclosure: Professor Mayo sent me her book “Error and Inference” (link) after I won her monthly palindrome contest back in December. Some papers by Andrew Gelman discussing his brand of Bayesian philosophy originally sparked my interest in the philosophy of statistics/science. (link) While I am highly likely to over-simplify or screw up the specifics of Mayo’s work (for which I apologize in advance), I believe the broad motifs are robust to any contemporary philosophy of science: what do our measurements really mean; how do they contribute to evidence for or against our hypotheses.
  7. The blogosphere increasingly blurs the line between economist and person-who-talks-about-economics. Both, however, need quite a bit more humble pie in their diets. An obvious extreme example is Peter Schiff (not an economist despite what he believes) who is constantly derping about hyperinflation and returning to the gold standard. But there are many “respected academics” who produce egregiously idiotic ideas about things of which they understand very little — especially when it comes to financial products. Quantitative Finance professor Robert Frey recently expressed outrage that “the Fed has been the major purchaser at some auctions”, apparently unaware that the Fed is not a purchaser at Treasury auctions. Economics writer Evan Soltas once concluded that “the Fed should target Eurodollars” despite not knowing the first thing about interest rate spreads or even Eurodollars themselves. Economist Scott Sumner designed an NGDP futures product with such a frightful ignorance of trading mechanics that it’s almost impossible to believe his PhD research has anything to do with markets. Even well-intended, their ideas are of no use other than signaling faux-knowledge to people who don’t know better — many of whom are other academics. In Feynman’s words: “they haven’t done the work.”