Friday, July 12, 2013

What gets stored is getting larger

My last entry tried to describe how historical data in the electronic age can get lost and degraded over time.  I intimated that the advent of purely digital storage systems would counter that scenario due to several properties of digital storage.
  • Digital storage is content format neutral.
  • Copying from one generation of storage systems to the next is a one-step process with the same user cost no matter how large the content.
I realize that there are some unspoken assumptions behind this claim as well.
  • For the same or lower operational cost, the next generation has larger capacity.
  • You (or someone) are willing to pay the ongoing operational cost.
I used to think that continuous and sometimes rapid technological improvement would have to come to an end.  I don't think that end will be in sight as long as we maintain a commercially viable economy.  (I might be able to explain that assumption in a future post.)  The real bugaboo is the assumption that someone is willing to pay the costs.  Even with the costs declining, it's still a cost and has to be measured against any other use or value that can be had from spending that money on something else.  Which is a convoluted way of saying that maybe I, or you, or someone will get bored or tired of keeping the content around.  Or find something else of higher value to store in that space.

I believe that content format will also play a role in that loss of interest in maintaining the content.  This is really a degradation of value, and it plays out over decades and in sheer quantity.  If you come to believe that the content has little to no value, say, for example, because the content was acquired in a way no longer considered viable, why would you expend any effort to maintain it?

I think that statement is true no matter what the content, whether it be digital medical records or digital photographs.  The variability in format is part of the picture, but in a subtle way.  Let's take digital photographs.  In the beginning of my storage I used jpeg and slowly added raw when it became available.  I still shoot jpeg for a variety of reasons.  First the summary:

Date            Camera                  Megapixels  JPEG size (MB)  Raw size (MB)
June 15, 2001   Casio QV3000            3           1               N/A
June 12, 2004   Panasonic FZ10          4           3-4             N/A
March 26, 2005  Canon Digital Rebel XT  8           3-4             10
Feb 6, 2008     Olympus E510            10          7               9
April 5, 2009   Nikon D300              12          5-9             11-12
April 22, 2009  Panasonic G1            12          3-5             14
March 26, 2012  Olympus E-M5            16          8-10            12-13

Here is a flower sample from each: a same-sized crop showing what a print would look like viewed from the same distance.  At the smaller size all the cameras are pretty good (minus the bad exposures in several of them).  At the larger size (I have never printed this large), to my eye things don't start looking good until 12 MP.  Too bad I didn't have the foresight to photograph exactly the same thing with each camera over the last 12 years :)

[Sample image table: one crop per camera at 11 x 15 print size and at 4 x 7 print size, for the Casio QV3000, Panasonic FZ10, Canon Digital Rebel XT, Olympus E510, Nikon D300 and Olympus EM5.]

Here is what I realize.  With flower pictures, or any scene that can easily be found again in the real world, the superiority of cameras two to three generations ahead is so great that I am unlikely to keep the earlier photos if I have to make a choice.  I am still copying them forward, simply because it has little to no cost: between 2001 and 2013 my storage has grown into the terabytes on the server side, and 8 to 32 GB memory cards in the camera allow me to shoot two-week vacations in both raw and jpeg.

One more thing: while one can argue that for my purposes of on-screen viewing and at most 8 x 10 prints, 12MP is as good as 16MP, I do not think the MP increase is over, and there are several reasons for that.  There are advantages in reducing noise and increasing apparent acuity, but perhaps the most important is that it allows a true digital zoom.  For example, compare the Canon XT image to the Olympus EM5 image: one could crop an 8 MP or so image out of the Olympus file and achieve acceptable digital zoom.  First-quality digital zoom won't happen until we have 24 to 36 MP to play with.  You can already see that advantage in the smallish Nokia Lumia phone cameras that use 41MP to reduce noise and allow digital zoom.
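To put rough numbers on the digital zoom argument, here is a minimal sketch.  It assumes the crop keeps the same aspect ratio as the full frame and treats 8 MP (the Canon XT's resolution) as the "good enough" target; the function name is made up for illustration.

```python
import math


def digital_zoom_factor(sensor_mp: float, target_mp: float) -> float:
    """Linear zoom gained by cropping a sensor_mp image down to target_mp."""
    if target_mp <= 0 or target_mp > sensor_mp:
        raise ValueError("target_mp must be positive and no larger than sensor_mp")
    # Megapixels measure area, so the linear (focal-length-like) gain
    # is the square root of the area ratio.
    return math.sqrt(sensor_mp / target_mp)


# Assumed target: an 8 MP crop, matching the Canon XT in the table above.
for sensor_mp in (16, 24, 36):
    factor = digital_zoom_factor(sensor_mp, 8)
    print(f"{sensor_mp} MP cropped to 8 MP: about {factor:.2f}x zoom")
```

Under those assumptions a 16 MP sensor buys only about 1.4x of zoom, while 36 MP gets past 2x, which is roughly why I expect first-quality digital zoom to need the larger sensors.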

Tuesday, July 9, 2013

Yet another new start.

Yeah, I have heard that before.  Blogging is something that happens inside my head more than on the keyboard.  I blog when I am driving or walking the dog.  When I sit down to type, my intentions change and drift off into irrelevance.

I retired from the University of Michigan Medical School last September and moved away from my hometown of some 40 years, Ann Arbor, Michigan.  We moved to be nearer Kathie's family, and that has been a blessing and a good move.  My love/hate relationship with technology continues into retirement.  Sometimes it seems as if I have not retired at all, since I still build, run and fix computers.  My perspective has changed.

I now take preserving my digital artifacts (mainly photos) with a high degree of paranoia.  I used to speculate and wonder how we, that is the IT departments of medical centers, were going to be able to keep over 100 years of medical records.  Think of it: computers are at most 65 years old, and most digitally originated medical records are less than a decade old.  We have no precedent.  The same thing is true for our personal digital lives: our photos, documents, movies and music.  In my relatively long career in computers, just a little over 40 years, and my nearly 45 years with home entertainment electronics, I have seen so many fundamental format changes take place that I lose track.

My conclusion has been that the only way forward is to continuously migrate digital records to new technology.  It's important when doing this migration not to lose data, and that includes resolution data as well.  Let me recount my history of audio formats as a parable of what most likely is happening in the entire digital era.

Well, I open of course in the analog era.  My first source of music was 33 1/3 rpm and 45 rpm plastic records.  These were spirals of a continuous groove dug into the plastic and reproduced with diamond or ruby needles tracing the modulations of an audio signal in the plastic.  This was in the sixties, by which time this technology, first deployed by Thomas Edison in 1877, was close to a hundred years old.  Think of it: a hundred years of technological development had reached its pinnacle of perfection.  In an early replay of the format wars which came to characterize recording and playback systems, the record disc, initially inferior to the record cylinder but cheaper to make and mass produce, had become standardized as the 78 rpm record by 1925.

There was no home recording technology until the reel to reel tape drive appeared, copyright and patent battles were settled, and Sony started mass producing them.  I owned several of them: a Sony, a Tandberg and a Teac.  At their best, they could reproduce the entire range of a 33 1/3 rpm vinyl record (LP's or Long Play).  In a compromise of data integrity (as we would call it now), mainly a degradation in signal to noise and frequency response, I used slower and slower recording speeds since that extended the capacity of my tapes.  (Hey, does this sound familiar to anyone?)  I started recording my LPs on first play to minimize the noise degradation that always seems to accompany the playback of LP's.  Due to time and costs, only a small portion of my LP collection was ever transcribed, and of those none are with me today.

Into the 70's, portable access to music was coveted.  We wanted to listen to our music in our cars and then everywhere we were.  The first format I used for that was the 8 track tape cartridge.  This was a continuous loop of tape ingeniously placed into a plastic box which was open on one end to expose the tape for reproduction purposes.  Of course, this was a highly compromised format, but very popular for a while.  It fit easily into a car, but was still too big for true portability.  Home recording was very limited and never really took off.  Then came the cassette tape.  This was a miniaturized reel to reel tape and very portable.  Technological advances attempted to overcome the compromises of the format, and home recording became easy and cheap.  Hence the appearance of that new social phenomenon, the mix tape.  Yet another social (or anti-social, as it were) development, the 'walkman', appeared: a tape player you could wear on your belt and listen to in private with headphones.  I eventually abandoned reel to reel and started recording my LP's onto cassette (it was good enough) and making mix tapes.  Only a small part of my small reel to reel collection was migrated onto cassette.  My main music collection was still LP's and I could re-record them onto cassette.

Indeed, I started a massive project to transcribe all my LP's onto cassette.  Then came the Compact Disc, the first mass produced digital recording format.  Within a decade the cassette was dead.  I, along with everyone else, stopped buying LP's and started buying CD's.  I still transcribed them onto cassette, mostly out of habit.  One day, I realized I was no longer using any of my cassettes; I had a portable CD player and a CD player in my car.  Cassette tape was dead.  CD's were audibly superior, but not very recordable.  Almost none of either my LP or my cassette collection made the transition to CD.  The music that did re-appear was there because I purchased a CD copy.

CD's are a weird combination: the analog waveform is sampled and stored as an uncompressed digital recording.  Reasonably lossless.  But the next portable format was all about file size, and to make it small enough to be mass marketed, it involved compromising with data integrity.  This was the MP3 format.  Today, my home recorded CD's have MP3's on them.  Since the basic CD format was digital in the first place, newer technology lets one replace the uncompressed audio tracks with compressed digital files.  There are lossless digital formats, and indeed I once again embarked on a migration project, this time transcoding all my CD's into lossless digital FLAC (Free Lossless Audio Codec) and storing them on a home media server.  It's been years underway, but I still have not had the time or inclination to migrate the entire CD collection.

So let's recap this long story:

LP's to reel to reel tape - less than 10% of the LP collection.
reel to reel to cassette - only a few titles made it because I had the 'original' LP's.
LP's to cassettes - maybe 33% were recorded and some were re-purchased.
cassette to CD - zero percent, a total replacement technology.
LP to CD - zero percent, no point to it.
CD (replaced LP's as the 'original' collection and now very much larger)
CD to MP3 - only a small amount of 'mix tapes' (ha, an old term continues).
CD to FLAC - so far less than 50% has made it.  Once again, time and effort come into play.

So, the history of this analog to digital transcoding is not very impressive as far as any statistic is concerned.  Data is lost (LP's and CD's not transcribed) and data integrity is lost (LP to analog tape and LP/CD to MP3).

That brings us to the server story and pretty much back to my current interest in digital storage technology.
My first server was basically just my home PC.  My next server was an old home PC turned into a Linux server.  Following that was a dedicated Linux server.  And after my most recent move, I purchased a small, low-powered, Linux-based media server from Iomega (then bought by EMC and now bought by Lenovo).

So, I have already had 3 server technology transitions.  But because each one of those was simply a way to record a digital format, there was no loss (of the source format) and the entire collections could be transferred.  The reason the entire collections were transferred is that it took one operation for the entire collection.  Basically, Copy *.* and go to bed...  Digital offers a great advantage over analog in that entire digital formats can be preserved easily.  As to whether they will be understandable in 100 years, that's another story.  Photography is perhaps a better way to tell that story, however.
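For anyone who wants a bit more assurance than "Copy *.* and go to bed", here is a minimal sketch of copying a whole collection and then checking that nothing changed on the way.  It is not what I actually ran; the function names are made up for illustration, it assumes Python 3.8 or later, and it uses SHA-256 checksums to compare each original with its copy.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_collection(src: Path, dst: Path) -> list:
    """Copy the whole tree under src to dst, then verify every file.

    Returns the relative paths whose copy is missing or differs from the
    original; an empty list means the migration was lossless.
    """
    # One operation for the entire collection, whatever its size.
    shutil.copytree(src, dst, dirs_exist_ok=True)

    mismatched = []
    for original in sorted(src.rglob("*")):
        if not original.is_file():
            continue
        relative = original.relative_to(src)
        copy = dst / relative
        if not copy.is_file() or file_digest(original) != file_digest(copy):
            mismatched.append(relative)
    return mismatched
```

Calling `migrate_collection(Path("/old/server/photos"), Path("/new/server/photos"))` (both paths are placeholders) and getting back an empty list would confirm that every file arrived bit for bit.  It says nothing about whether anyone can still read the formats later.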

What is my history with photography like?  Strangely enough, it's very similar to audio.  Analog formats ruled the day for about 100 years, and copying and preserving was an act of reproduction using the newer cameras and films.  Then came digital.  My digital collection is now close to 13 years old.  The only universal storage format is another compromise with data integrity, the jpeg format (very similar to the MP3 format in result).  The lossless formats exist, but each camera manufacturer uses their own.  Since we have owned five different brands, I now have five different lossless formats: CR2 (Canon), NEF (Nikon), ORF (Olympus), ARW (Sony) and RW2 (Panasonic).  Each one of them has made the server transition intact.  The question is whether knowledge of those formats will last for the next 100 years.

Yes, there is a universal, lossless digital image format from Adobe called DNG.  But I don't use that yet, and along with the ability of software to understand digital formats, that's another blog entry.