Feb 01, 2012

One of the blogs I follow is Passive Guy (at his site, The Passive Voice), partly because it is a really good source for the latest news on the ebook front, with several excerpts / re-tweets a day. One of his posts today caught my eye given the whole Megaupload thing in the past week: the post was entitled “Piracy Does Depress Sales”. The post is an excerpt from another site by an attorney named Terry Hart, linking to a study by Stan Liebowitz at the University of Texas entitled “The Metric is the Message: How Much of the Decline in Sound Recording Sales is Due to File-Sharing?”. The claim from Liebowitz? That *all* of the decline in record sales could be attributed to file-sharing.

Now many of my readers know that I did an MA in public policy, which means I also did graduate-level stats and economics courses. So, when I see an academic making such bold claims, two things happen. First, my interest is piqued … maybe they have some ground-breaking analysis and research to support this argument (after all, it’s “published” and their careers depend on it), or, if not, maybe at least an innovative approach. Second, my BS detector goes haywire.

Without going into the long, down-and-dirty statistical parts in too much detail, Liebowitz’s argument rests on two premises:

  1. The level of Record Sales (RS) is the sum of three terms: RS = a + bFS + cZ, where a, b and c are coefficients, FS is the level of filesharing, and Z is the various other factors that influence Record Sales; and,
  2. Two of those terms can be ignored (a and cZ) if the other one (bFS) is large enough to explain on its own the level of change in Record Sales.

Unfortunately, both of these premises are completely false, and they are fatal flaws for the rest of the econometric argument, which is built on other people’s studies (note he did no research of his own, just summarized other literature that had its own flaws and failures, but I digress).

On the first premise, I’ll deal with the four sub-components in turn:

  • “RS”: this represents overall record sales. Note that the RIAA and others like to list this in terms of dollar sales / value, just as the movie industry does. However, $$ are impossible to compare in this scenario because file transfers have $0 value, so the only REAL indicator is # of units sold. Even with that, you would have to do a lot of regression to separate the “real” # of units sold from nominal increases, and sales of legacy songs that have been out for a long time (i.e. the long tail) from new releases, because the metrics for measuring sales have changed dramatically in recent years (just check out Billboard Magazine’s change in methodology to no longer count albums that were given away or sold too cheaply towards their charting results). Not to mention that people have gone from buying whole albums to buying individual tracks through sites like iTunes. Plus, you’d have to cross-correlate that with big producer sales vs. indie band sales…the market is no longer homogeneous, with far fewer barriers to entry/exit for entrepreneurs who can sell. The piracy world would not treat all the music the same either, i.e. big bands are always going to be more sought after than garage bands, so RS cannot all be grouped together…with market segmentation comes the need to segment your formulas too. This isn’t “completely fatal” to the approach, more like a statistical caveat that the RS figures are unstable and hence unreliable over time;
  • “a”: this is a non-zero stand-alone parameter, arguing that there is some natural, presumably static rate of sales (i.e. not a function of the economy, etc.). In other words, if the other two elements ended up being zero for some reason, there would still be some level of record sales. I don’t disagree that there could be legacy sales, etc., but it would never be static and is highly dependent on the economy itself…if the economy is doing well, and people have money, they spend it on non-essentials like music. It isn’t entirely 1:1 either, since the research shows that some people substitute away from other entertainment expenses like movies during economic downturns. So it is not some stable “alpha” coefficient; there is another factor that goes with it, or maybe it just rolls into the “cZ” term. I’m definitely not buying the argument that there is some “natural” rate of Record Sales. Again, this isn’t fatal to his end argument, as he ignores this element (dealt with later);
  • “b x FS”: the argument here is that filesharing has two components. First, there is the rate of displacement (b), i.e. the number of sales that are displaced by a single transfer…put in different terms, if the rate was 10%, then 10 transfers would equal one lost sale. Second, you multiply that rate by the total number of files shared (FS). Seems like straightforward math, but it isn’t. First of all, you have to decide what that coefficient (b) is. And there’s really no way to know, considering that the piracy market is segmented by the type of pirate.
    For example, there are the fetish pirates, who are basically hoarders. They pirate just to collect songs, most of them with more songs than they could ever listen to in their lifetime. It’s a collection compulsion. Not unlike your grandmother who might have hoarded old twist ties or butter containers, they save them / copy them / keep them. But like the grandmother, whose compulsion is to “save” and collect things, they wouldn’t go out and buy them. If they had to pay for the song, even a penny, they wouldn’t. They’re only taking them because they’re free. So how many lost sales are there here? NONE. They wouldn’t pay for the song, they just collected it for no reason other than they liked having the most songs, blah blah blah. Many of them collect songs and artists in genres they DON’T EVEN LIKE. Just because it is there. For them, the coefficient is 0 … if they want a new song, they probably buy it on the day of release to have it and then upload it, but their own receipt of songs displaces very few sales.
    After that are your casual pirates…these are people who don’t hoard everything, but hear a song on the radio and pick it up for free off the pirate sites. These people too wouldn’t buy many of their songs, but they might. Probably about a 10% rate, but not as a direct function of filesharing because they wouldn’t buy all the songs…for example, it might seem like a 1:10 ratio, but that might be true only for the first 10 songs, and then their pocketbook forces them to make choices so it drifts to 1:50, 1:100, etc. Definitely a variable rate diminishing over time with previous sales (i.e. their purchases would not be independent events).
    After that are your sampler pirates…these are people who actually go the opposite way. They might “sample” 100 songs to find 2 they actually like, and then they go buy the person’s whole album. So their coefficient is actually HELPING sales; all they’re really doing is saving themselves from going to old-fashioned record stores to listen to everything first.
    Finally, you have what I consider to be the true pirates. These are people who actively go out and download what they would otherwise buy…and THEIR coefficient is 100%. 1 download = 1 lost sale. The RIAA and others argue that this is EVERYONE who pirates, when in fact the %age is REALLY hard to calculate. In part because there are hybrid pirates: part sampler, part hoarder, part true pirate, etc. And lord knows what THEIR coefficient is.
    Regardless, you have to take the individual coefficients by group and multiply each by that group’s share of the total filesharing number. Why can’t you just average them, as Liebowitz’s methodological review does? Because the hoarders’ filesharing behaviour is way higher in volume than that of the casuals or samplers, and the true pirates are somewhere in between. At the very least you would need to weight the coefficients by each group’s volume.
    And now for the extra kicker: no one has a reliable measure of filesharing. Say what? That’s right…they know how much global bandwidth was used, and they have lots of ways of proxying the number of files transferred, etc. But guess what? The measure doesn’t distinguish in many cases between movies (large size), software/warez (medium size), MP3s (small size) and ebooks/texts (microsize). And it also doesn’t differentiate very well between three key areas: porn, other file transfers including streaming, and legitimate file transfers. It just does a multipart estimate, piling error on estimate on wild-ass guess as it goes. Why does this matter? Because if you can’t figure out the right coefficient to use and multiply it by the right file count and size to get true MP3/music filesharing, you can’t figure out the relative change in this parameter over time.
  • “c x Z”: this is the catch-all, suggesting that there is some coefficient (hopefully positive) that multiplies a bunch of grouped variables (Z) that contribute to sales…things like previous sales, new releases, promotion, touring, economic growth, etc…while ignoring things like streaming narrow-niche radio stations, which are substitutes for purchases too and a technology that didn’t exist 20 years ago. Overall, this element is not problematic until you try to negate it out below.
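
To make the weighting point from the “b x FS” bullet concrete, here is a minimal sketch of the difference between averaging the group coefficients and weighting them by download volume. Every number in it is something I made up for illustration (the group names follow my pirate types above, but the coefficients and download counts come from no study), and I’m writing displacement as lost sales per download, so a negative value means the downloads help sales.

```python
# Hypothetical numbers purely for illustration; none come from Liebowitz or any study.
# "displacement" = lost sales per download (negative means downloads ADD sales).
# "downloads" = number of files each group grabs in some period.
GROUPS = {
    "hoarder":     {"displacement": 0.00,  "downloads": 10_000},
    "casual":      {"displacement": 0.10,  "downloads": 200},
    "sampler":     {"displacement": -0.02, "downloads": 100},
    "true_pirate": {"displacement": 1.00,  "downloads": 500},
}


def simple_average(groups):
    """Average the group coefficients as if every group downloaded equally."""
    return sum(g["displacement"] for g in groups.values()) / len(groups)


def volume_weighted_average(groups):
    """Weight each group's coefficient by how much that group actually downloads."""
    total_downloads = sum(g["downloads"] for g in groups.values())
    lost_sales = sum(g["displacement"] * g["downloads"] for g in groups.values())
    return lost_sales / total_downloads


total = sum(g["downloads"] for g in GROUPS.values())
naive_b = simple_average(GROUPS)
weighted_b = volume_weighted_average(GROUPS)

print(f"simple average b:  {naive_b:.3f} -> {naive_b * total:.0f} lost sales")
print(f"volume-weighted b: {weighted_b:.3f} -> {weighted_b * total:.0f} lost sales")
```

With these made-up inputs the simple average overstates lost sales several times over, purely because the hoarders dominate the download count while displacing nothing. Different inputs would give a different gap; the point is only that the two methods are not interchangeable.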

So what does Liebowitz do with this argument? He basically comes to the hypothesis that perhaps the change in RS could be accounted for just through the second term alone. In other words, given that the volume of files being shared is large enough to impact sales, and assuming “b” to be negative (i.e. pirating has SOME displacement effect on sales), could the percentage change in the second term be enough on its own to account for changes in RS? Liebowitz takes a bunch of studies, drops the “a” and “cZ” terms, in effect averages out the displacement coefficient (i.e. “b”), multiplies it by the filesharing rate of change over time, and ends up with a figure equal to or greater than the change in sales. Ergo, he concludes that the second term CAN account for all the change and therefore DOES account for all the change.

But the coefficient is done wrong and the filesharing numbers are wrong. Plus, to correctly argue that you can ignore “a” and “cZ”, those terms have to be positive (i.e. you have to assume they wouldn’t have reduced sales themselves, they would have only helped sales). In fact, he goes further at the end to suggest that since those terms were LIKELY positive, piracy probably depressed sales by even more than 100% of the decline, i.e. compounded reductions from what natural growth would have been. However, those other terms are NOT necessarily positive. In fact, lots of literature has argued either way on those elements too…a stagnating industry, economic collapse, streaming substitutes, and the musical tastes and purchasing power of aging demographics who already bought their favorite bands and aren’t buying pop stuff. Not to mention it is all digital now, so people aren’t WEARING out their CDs or MP3s at the rate of cassettes, 8-tracks, or vinyl. Which means they’re not buying replacement copies. If you want to see this playing out in the ebook world, look at the Big 6 refusing to sell ebooks to libraries or limiting the number of loans because they want to ensure future replacement sales, as they get now with paper versions that wear out.
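
Here is a rough sketch of why the sign of those other terms matters so much. It assumes the formula in difference form (change in RS = b x change in FS + c x change in Z, with “a” treated as a constant that drops out, the way the formula itself treats it), and the numbers are entirely hypothetical: a decline of 100 units and three guesses at what the other factors did on their own.

```python
def implied_filesharing_share(observed_change, other_terms_change):
    """Fraction of the observed change in RS left over for the filesharing term.

    Assumes the post's formula in difference form: dRS = b*dFS + c*dZ,
    with the constant "a" dropping out of any change over time.
    """
    if observed_change == 0:
        raise ValueError("no change in sales to explain")
    filesharing_change = observed_change - other_terms_change
    return filesharing_change / observed_change


# Hypothetical: sales fell by 100 units. Everything below is made up.
OBSERVED = -100.0
scenarios = {
    "other terms ignored (zero)": 0.0,
    "other terms helped sales": 20.0,
    "other terms hurt sales": -60.0,
}

shares = {
    name: implied_filesharing_share(OBSERVED, dz) for name, dz in scenarios.items()
}
for name, share in shares.items():
    print(f"{name}: filesharing explains {share:.0%} of the decline")
```

The same observed decline gives filesharing 100%, more than 100%, or well under half of the blame depending only on what you assume about the terms that were dropped. The “all of the decline” result is what you get when you assume those terms were zero or positive.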

Overall, I’m REALLY disappointed with the article. I’m not an expert, just someone well-versed in economics, performance measurement and statistics, and I can see that the Liebowitz article basically plays fast and loose with techniques to argue something he decided from the beginning — if every pirated song represented a lost sale, then piracy accounts for all the losses. Not surprisingly, if you start with that as a hidden premise to your argument, you find that the results match.

Now waaaaay back in the title, I mentioned a link to ebooks. And here’s the rub. Piracy and lending libraries have the same effect on sales: they’re partly substitute goods and partly complementary. What does that mean in layman’s terms? It means some pirates / borrowers will be true pirates: if they can pirate, they will pirate books they would have otherwise bought. Not 1:1, but a decent-sized coefficient perhaps. It is highly dependent on the price of the book, actually (even filesharing of MP3s took a hit when iTunes started selling singles for $1), so the $25 new release might get pirated a lot faster than your $3 backlist title, varying by both pirate and author’s name. But as those participating in the lending library and Kindle Select are finding, those free loans / pirated copies can also ADD to other sales when the reader reads book 1 in a series and comes back later to buy book 2 or 3. They may have got their “taste” for free, but they still bought the other books. Note that Amazon pointed this out for their Prime lending experience … people BORROWED book 1 of the Hunger Games, but then came back and bought books 2 and 3.

The REAL difference at the moment though is that ebook pirating is not quite as easy as MP3 pirating. For example, I can’t go into Walmart and pick up a disk with 10 ebooks on it, rip it at home and upload it everywhere. Plus the software to copy books onto ereaders hasn’t yet reached the user-friendliness of iTunes or WinAmp or RealPlayer, which manage your library completely for you, regardless of purchase source, and move it to your device seamlessly. It is still far easier for ereader users to just get their books from the Kindle store, or Kobo, or Apple, etc. Which isn’t to say it’s “hard” to do the other, just that it isn’t seamless yet. And not everyone is using the same ubiquitous format (ePub and Mobi will win, but they’re not at “MP3” level acceptance yet). Which means the displacement coefficient is probably smaller for ebooks than it is for MP3s, simply because the technology is a barrier to the casual, sampler, true, and hybrid pirates.

Only time will tell what the long-term coefficients will be for the ebook industry. But I’m pretty sure the # will not be 100% of ebook sales, even if ebook sales will eventually represent 100% of the decline in paper copies.

  One Response to “Pricing — Online piracy, music and ebooks…”

  1. Thanks for succinctly summarizing Liebowitz’s criticisms and reworking of the various filesharing studies. I skimmed his argument, and it was way too technical for me to wade through without spending many hours and basically doing a (self-taught) refresher course in stats. However, some of the qualitative comments he made were red flags, so the whole thing appeared suspect. Your educated analysis is a big help!
