Category MWJ

RSS feed changes

Among the many, many, many upgrades we’ve suffered through in the past few weeks (including the production machine, production software, hard drives, machine repairs, even light fixtures and internet access hardware) have been server changes. While we’re shuffling some things around, we figured it was time to make the changes announced last year permanent.

So, effective as of now, the original https://secure.macjournals.com/ RSS feeds for MDJ and MWJ subscribers now get HTTP redirects to the newer feeds introduced last July, and those feeds are no longer “beta.” (Ironically, we may need to rewrite the software that makes those feeds, since it turns out the database behind them is really quite stupendously horribly designed [our fault], but the URLs will remain the same.)

Your newsreader should easily and permanently replace your old URLs with the new ones the next time you refresh. If it doesn’t, we heartily recommend the now-free NetNewsWire 3.1, which handles this with aplomb and is, as mentioned, now free. Hence the adjective.


We had no idea how much stuff needed maintenance until we started fixing a few of the more broken things. It was kind of like fixing a broken drawer and discovering that the entire cabinet was riddled with termites, and then that the carpet needed repair, and on and on. We had about a 10-day stretch in February where something new, but minor, broke every single day. They were all manageable and fixed in a few hours, but when they come 10-15 per week for 2-3 weeks in a row, it makes you want to do something self-destructive, like enter politics or speculate about iPhone carrier agreements.

Almost everything is upgraded and fixed, including a few things MDJ and MWJ readers have wanted for nearly two years, with testing to resume this week (cross your fingers). Almost every single thing between the editors and the internet that we use to produce MDJ and MWJ has been upgraded or replaced since the last issue, and that almost included the very Ethernet cables connecting the machines. It still may—LAN transfers are slower than they ought to be in some cases but not others—but it’s been quite the makeover. We’re just glad it’s done. Hauling G5/Mac Pro cases around and copying 300GB hard drives over and over is something we’re happy to leave for special occasions. We’ll have more upgrading to do in the second half of 2008, but we’re just about done for now.

An Expo-timed update

It was about a week ago that we noted that our production machine had stopped working.

At the time, we did not know that the local Apple Store’s “diagnosis in 24-48 hours” would take five days—and end with “we can’t find anything else wrong so it must be the logic board; please mortgage all your property to buy a new one.”

We’re investigating other options, including last week’s new machines. (Even if we’d purchased AppleCare on the system, it would have expired six months ago.) We started limping along on a secondary machine by Thursday morning, and as of Sunday night, we’re fully up and running on it, although as usual we have to re-enter a bunch of serial numbers, fix some aliases, and so forth.

Since we purchased our production Power Mac G5 in late 2004, we’ve had no need to replace it, so the temporary machine we’ll be using marks the first time that MDJ and MWJ shall be produced on Intel machines. It also marks the first production under Leopard, and the first production using Microsoft Word 2008. We haven’t seen much of Word yet, since our first and primary task was to convert the handful of Visual Basic macros we use in production to AppleScripts, and some very subtle changes made that more complicated than we’d thought. We’ll talk more about that, either here or in MDJ and MWJ, later in the week.

We’d prefer to tear through a ton of material and publish around 8:00 AM CST, but unfortunately, the publisher is bugging out because he has an important doctor’s appointment at noon today that was scheduled months ago (and has been drafted to run other important errands as long as he’s out). We did take the first week of the year off, instead of Christmas week, because that’s just how it worked out for us. We were busy at work a week ago, until the machine died. Now we’re back on a smaller machine, an Intel iMac. It seems to work fine, but with some Rosetta programs still in our production flow, we wish it had a bit more RAM.

We had been working on some philosophical articles about issues that have been raised online in the past few weeks as if each was some kind of crisis, but in fact, none of them were any kind of crisis. Ironically, one of those was about obtaining repairs, so we’re a bit more up-to-date on that one now.

Those are not time-critical, so we’ll likely put them on hold and go straight into all the news from the Expo, and provide our pre-keynote gloss on some of the more prevalent rumors. If there’s not much news on Monday (hey, it could happen, right after President Gore cuts all funding to the EPA), we’ll finish up the stuff in the pipeline.

Either way, the plan is to publish that Tuesday morning, and take that plus the previous “Think Secret” coverage and put it in MWJ for Tuesday morning, then head straight into whatever happens at Moscone Center on Tuesday afternoon (our time). If the doctors find something wrong or there’s nuclear war or something, the schedule may get pushed out again, in which case we’ll likely go straight to keynote coverage to get that out Wednesday morning.

We had also planned to spend time last week on Apple’s Q4 FY2006 financial results, which happened before the publisher got his strength back. We’ll now postpone that until the coming weekend or early next week, to try to get all the financial news contained in a couple of issues for those of you who prefer to skip it (and to keep it isolated for those of you who pore over it in great detail).

That’s the current set of plans, since a few of you asked how the machine is doing. The machine is not doing well at all, but we’re now managing with a substitute while eagerly awaiting something bigger to run all our tools at once. It’s really quite astonishing how much RAM and disk space all these things take. Why, we remember when code segments couldn’t be larger than 32KB, and the whole system and applications loaded in 512KB, and…

OK, we’ll stop now. We’re old.

Let’s wrap up this ZFS thing, shall we?

Drew Thaler has responded to the response about ZFS, and now that we’re having an honest-to-goodness discussion about trade-offs, Thaler and MWJ agree on a lot more than you might have imagined. We agree with Thaler on the five data points he lists as, well, points of agreement.

Yet although Thaler doesn’t like the point-by-point quote and response, it’s the clearest way to deal with misconceptions and misrepresentations, so we’ll stick by it for a little while longer. Let’s add even more clarity, shall we?

It would be an absolutely terrible idea to take people’s perfectly working HFS+ installations on existing computers and forcibly convert them to ZFS, chuckling evilly all the while. Not quite sure where that strawman came from.

It came from where this entire ZFS storyline came from—the reports in June that Apple was about to make ZFS the “default file system” in Leopard. That is, if you clicked “Erase and Install,” you’d wind up with a ZFS storage pool; that you’d be missing out on Leopard benefits unless you “switched” to ZFS, and so on.

This was not exactly a hidden story, then or now. The argument from MDJ and MWJ has never been that ZFS support is a bad idea—just that ZFS is a poor fit as a primary file system for today’s (10.4) and tomorrow’s (10.5) Mac OS X, for lots of reasons that were detailed in MWJ. Thaler obviously knew about it, because he called it a “weird and obviously fake rumor.” Lots of people didn’t find it so obviously fake. That’s why MWJ debunked it.

ZFS would be awfully nice for a small segment of the Mac OS X user base if it were ready today.

Absolutely, 100% true. Again, the argument is not that Mac OS X should not support ZFS, it’s that ZFS is a bad fit as Mac OS X’s default file system, which was, after all, the story. Every report since then, from AppleInsider’s leak report on a v1.1 read-write preview for developers to today’s analyst report, is in some way predicated on the concept that Mac OS X will not simply support ZFS, but rely upon it as primary storage in the very near future.

We think that unlikely for all the reasons stated.

ZFS — or something with all the features of ZFS — will be more than nice, it will be necessary for tomorrow’s Macintosh computers.

Uh…we’d rather state our point of agreement as “Tomorrow’s Macintosh computers will need improved file systems designed for much larger storage capacities and increased reliability.” ZFS is a good candidate for that, but it didn’t exist six years ago, and the best choice in six years may not exist today. If this is what Thaler means by “something with all the features of ZFS,” we might agree. We’re not sure it needs all the features of ZFS.

Then again, ZFS does not have all the features of HFS Plus (larger in-catalog file types, faster non-pathname access, and more efficient storage, for example).

Look, let’s be honest: most file systems are tailored for the OS architecture where they debuted. HFS and HFS Plus have lots of 32-bit quantities, even back when most processors were 16 bit, because the Mac processor architectures always had 32-bit registers. ZFS uses 64-bit and 128-bit quantities because it was designed for Solaris. While it’s extensible to “arbitrary” metadata, ZFS puts the data that Solaris needs directly in the catalog records or, at most, one level away so it can get to it very fast. The stuff that other operating systems might need gets shoved farther away.

Every engineering effort is about trade-offs. Many of the trade-offs in ZFS favor Solaris, just like many of the trade-offs in HFS Plus favor traditional Mac programming needs. It’s ridiculous to pretend otherwise, or to pretend that these trade-offs don’t make a difference for their respective operating systems.

Still think end-to-end data integrity isn’t worth it?

Wow, talk about strawmen—we never said end-to-end data integrity wasn’t “worth it,” whatever “it” may be. For some people, it very well may be. The error rate Thaler quotes is an error of one bit out of every 100,000,000,000,000 bits read. One out of every one hundred trillion bits, or as he calculates, one and a half errant bytes in a read of 150GB.

Is that “bad?” Sure—errors are always bad. How bad is it? That varies wildly. It might be a single bit error in an MP3 file, producing an audio blip that you’d never notice if you heard it a billion times. Or it might be a permission bit on the root of your filesystem, granting write permission it shouldn’t, which would be very bad indeed. If you’re recording live video in the field on a MacBook Pro using lossy compression, is the error rate of one byte every 100GB worth adding CPU time (and reducing battery life) to checksum every block?

We don’t know. We do, however, reject the notion that everyone would be willing to make that compromise with today’s hardware. We believe those who are should have the option. But, again, remember that much of the meta-discussion here is about ZFS becoming the default file system, as in “the file system on your MacBook’s internal hard drive by default.” That’s far less of a “choice” than using ZFS on an external volume.

Apple using ZFS rather than writing their own is a smart choice.

Re-inventing the wheel is usually a waste of time. Yet time and again throughout the decade of transition to Mac OS X, we’ve seen Apple discard technologies that work well for Mac users and programmers, and replace them with technologies designed for different programming conventions (e.g., filename extensions instead of true file types, tons of tiny files instead of resources, inflexible and fragile paths instead of flexible aliases, and so on).

The repeated, loud, insistent message from the open-source community is “stop doing what’s right for your platform and do it our way or we will scream and scream and scream until you do it our way.” It’s the entire command-line philosophy: make it easier on the programmers by making it harder on the users, and compensate by telling the users how stupid they are.

We decline to participate in this delusion.

The choices at this point are essentially twofold: (1) start completely from scratch, or (2) use ZFS. There’s really no point in starting over. ZFS has a usable license and has been under development for at least five years by now. By the time you started over and burned five years on catching up it would be too late.

See? Thaler demonstrates our previous point by setting up another strawman: he asserts there are only two choices, starting completely over or using ZFS as it is. HFS Plus can’t be improved (even though its very existence shows that HFS could be improved, something that not many people saw coming a decade ago), ZFS ideas can’t be integrated into HFS Plus, nor can Apple invent yet another file system despite having outstanding file system talent ranging from the HFS Plus creators (though Deric is not in engineering anymore) to Dominic Giampaolo, author of BeFS. No, you’re told, it’s either the exact open-source solution or something never before seen.

ZFS is apparently so marvelous that, now that it’s been created, no one has the choice to eclipse it. Wow. As Thaler’s next paragraphs go on to say, in essence, ZFS is cool and HFS is not, and that’s apparently the end of any serious discussion. And that’s why so many people believe so many false things about ZFS. The entire discussion of filesystems is skewed, from the start, towards the default position that Macintosh-related filesystem constructs are somehow bad.

For example?

Another likely source of fatzaps in ZFS on Mac OS X is the resource fork. But with Classic gone, new Macs ship with virtually no resource forks on disk. There are none in the BSD subsystem. There are a handful in /System and /Library, mostly fonts. The biggest culprits are large old applications like Quicken and Microsoft Office. A quick measurement on my heavily-used one-year-old laptop shows that I have exactly 1877 resource forks out of 722210 files — that’s 0.2%, not 20%.

(Fun fact: The space that would be consumed by fatzap headers for these resource files comes out to just 235 MiB, or roughly six and a half Keyboard Software Updates. Again: not nothing, but hardly a crisis to scream about.)

And if it were wise to make decisions for 25,000,000 users based on Thaler’s laptop, that would be great. Our Power Macintosh G5 production system has non-zero resource forks on 76,060 files, or about 4.1% of the total number of files. They occupy about 2.1GB on disk using 4KB allocation blocks. Using a fatzap allocates another 128KB for each of those forks just for ZFS overhead, or an additional 9.3GB to store no additional information.

This system’s internal hard drive has 42GB of free space. “Switching” it to ZFS would eliminate 22% of our remaining free space to store no additional information. Thaler describes this as “negligible,” and “not nothing, but hardly a significant problem.” Your mileage may vary.
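For anyone who wants to check the arithmetic, here is a minimal Python sketch of the figures quoted above. It assumes one 128KB fatzap header per file that needs one, as discussed below, and the file counts are simply the ones already cited, not new measurements.

```python
# Back-of-the-envelope fatzap overhead, using only the figures quoted above.
# Assumes one 128 KiB fatzap header per file that needs one (a resource fork
# or an oversized extended attribute); no new measurements here.

KIB = 1024
FATZAP_HEADER_BYTES = 128 * KIB  # per-file overhead, per the on-disk spec discussion

def fatzap_overhead_bytes(files_needing_fatzap):
    """Total extra space consumed by fatzap headers alone."""
    return files_needing_fatzap * FATZAP_HEADER_BYTES

# Thaler's laptop: 1,877 resource forks out of 722,210 files.
thaler = fatzap_overhead_bytes(1877)
print(f"Thaler's laptop: {thaler / KIB**2:.0f} MiB")      # ~235 MiB

# Our production Power Mac G5: 76,060 files with resource forks.
mwj = fatzap_overhead_bytes(76060)
print(f"MWJ production system: {mwj / KIB**3:.1f} GiB")   # ~9.3 GiB

# Measured against the 42GB of free space on that volume:
print(f"Share of free space: {mwj / (42 * KIB**3):.0%}")  # ~22%
```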

Seriously, folks—Solaris doesn’t use extended attributes much, so unless they’re tiny, each one requires 128KB of overhead. Thaler says:

Classic HFS attributes (FinderInfo, ExtendedFinderInfo, etc) are largely unnecessary and unused today because the Finder uses .DS_Store files instead. In the few cases where these attributes are set and used by legacy code, they should fit easily in a small number of microzaps.

Page 37 of the ZFS On-Disk Specification says that microzap objects can hold attributes only if all of the names are less than or equal to 50 characters (including NULL terminator byte) and if all of the values are 64-bit values. In essence, Thaler is saying that the Mac OS X implementation of ZFS would need to remap all accesses of FinderInfo and its siblings to be individual name-value extended attributes stored two or more levels of indirection away from the catalog entry, even though they need to be read on just about every directory access.

Possible? Sure. Inefficient? Hard to say—with aggressive caching and fast hard drives, the OS might be able to mitigate the change, which would plainly require the drive to read at least three sectors from disk for every file instead of the one sector most entries take now. Fragmentation might actually become an issue. We know that you can’t assume there won’t be performance problems. Would they be worth the trade-offs? Probably, to some users today, to more users tomorrow. But not to everyone, and certainly not to everyone today.

There’s good news in this, though: the 128KB fatzap object can point to all of the attributes for a given object. If a given file has a resource fork and 200 extended attributes, it only needs one fatzap object, not 201 fatzap objects. We found only 300 files with extended named attributes on our production system’s drive, so if any of those also had resource forks (and it looks like they didn’t), they’d each only need one fatzap object to point to the extended attributes and a resource fork, which would itself actually be an “extended attribute” in ZFS terminology. Alas, none of the files had an extended attribute whose value was 64 bits or less, so they’d all require a fatzap.
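To make the page-37 rule concrete, here is a minimal Python sketch of the eligibility test as the spec describes it. This illustrates the rule as quoted, not Apple’s or Sun’s actual code, and the sample attributes are only examples.

```python
# Sketch of the microzap eligibility rule described on page 37 of the
# ZFS On-Disk Specification: every attribute name must fit in 50 characters
# including the NUL terminator, and every value must be a single 64-bit
# integer. One failing attribute pushes the whole object into a fatzap.

MICROZAP_MAX_NAME = 50  # characters, including the trailing NUL

def fits_in_microzap(attributes):
    for name, value in attributes.items():
        if len(name) + 1 > MICROZAP_MAX_NAME:                     # name too long
            return False
        if not (isinstance(value, int) and 0 <= value < 2**64):   # not a 64-bit value
            return False
    return True

# A 32-byte FinderInfo blob is not a 64-bit integer, so even a short
# reverse-DNS name forces a fatzap, with its 128KB header:
print(fits_in_microzap({"com.apple.FinderInfo": b"\x00" * 32}))   # False
print(fits_in_microzap({"small-flag": 7}))                        # True
```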

ZFS snapshots don’t have to be wasteful.

No, of course they don’t. MWJ even pointed out that ZFS snapshots would be a far more efficient backup mechanism than Time Machine in Leopard, at least for large files.

That said, Thaler’s description of separating static and transient data is pretty much a pipe dream, unless you’re willing to change how you work to suit the computer (instead of the other way around).

Of course your “/Applications” folder is mostly static. So are most of the Unix folders like “/usr/bin”, your Fonts folders, and so on. ZFS snapshots affect an entire filesystem, so Thaler says the trick is to separate your storage into however many filesystems you need, all mounted at different places in the “/” hierarchy, so they can have different ZFS features:

Once the transient data is out of the picture, our snapshots will consist of 95% or more static data — which is not copied in any way — and a tiny percentage of dynamic data. And remember, the dynamic data is not even copied unless and until it changes. The net effect is very similar to doing an incremental backup of exactly and only the files you are working on. This is essentially a perfect local backup: no duplication except where it’s actually needed.

By transient, though, Thaler means what most Mac OS X users refer to as “temporary”—caches, temporary files, stuff that you can recreate or expect to be erased upon restart. If you shunt all that stuff off to a filesystem that has no snapshots, then creating a snapshot of what’s left would be more efficient. That’s true, and we grant that.

Snapshots are also an efficient way to store multiple backups. Think of a large E-mail database. During the course of the day, you’re likely going to receive lots of E-mail, and each new message changes the database. Time Machine would currently back up that entire database (let’s say it’s 700MB) every hour, because it changed every hour. A snapshot would only include the actual data blocks changed as you received E-mail. If you only received 250KB of E-mail during the day, the snapshot would only record 250KB worth of changes.
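To put rough numbers on that difference, here is a minimal Python sketch using the hypothetical 700MB database and 250KB of new mail from the example above; the eight-hour working day is our own assumption.

```python
# Rough comparison of per-file vs. per-block backup growth for the
# hypothetical mail database above: a 700MB file that changes a little
# every hour, 250KB of new mail over the day, eight hourly backup passes.

MB, KB = 10**6, 10**3

database_size = 700 * MB
new_mail_per_day = 250 * KB
hourly_passes = 8  # our assumption for a working day

# Whole-file backups (Time Machine today): the entire database is copied
# every hour in which it changed at all.
per_file_growth = hourly_passes * database_size

# Block-level snapshots: only the rewritten blocks are retained, which over
# the day is roughly the amount of new mail received.
per_block_growth = new_mail_per_day

print(f"whole-file backups: {per_file_growth / MB:,.0f} MB")   # 5,600 MB
print(f"block snapshots:    {per_block_growth / KB:,.0f} KB")  # 250 KB
```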

Even excluding transient files, though, you’re talking about a lot of data. Backing up full files, we currently back up about 6GB per day from our production system, counting just the files that changed since the previous night and excluding the “transient” folders. If ZFS eliminated 90% of that as unchanging, it would still be about 600MB per day (or one CD-ROM), and that’s for only one copy. Thaler speaks of 12 hourly snapshots, seven daily snapshots, and four weekly ones. That’s going to take a fair amount of disk space in your storage pool, because that’s how snapshots work. It’s not like you can plug in an external drive, take a snapshot “to” that drive, and unplug it.

Thaler is right that if data doesn’t change, it doesn’t take up space in a snapshot—but then again, it doesn’t take up space in Time Machine, either. Except for the initial backup, of course, and to be safe you still need to make one of those even with a ZFS storage pool, especially one that’s not mirrored. If a power surge or other non-software glitch kills your drive, it’s just as dead with or without checksums. Extra copies of the data on that same drive won’t really help you. (Note, of course, that Time Machine will also let you make this mistake, though it’s not quite as easy.)

Absent the full description of the copy-on-write mechanism used by snapshots in MWJ 2007.06.11, some people have misinterpreted our comments about snapshots not releasing disk space. As long as a data block is used by any snapshot, it cannot be reclaimed. So, for example, let’s assume that a new E-mail message changes one of the blocks in your E-mail database, on a filesystem that has active snapshots. The data block that just got replaced remains allocated, but it is now allocated to the snapshot and not to the database file itself.

That’s really a very cool way to do it. But, you must be aware that until you kill all of the snapshots that reference that block, it remains allocated on disk. If you delete your entire database file to “save disk space” and switch to a different E-mail program, you’re not saving any disk space, because the snapshots still have all of the blocks you deleted, and retain them until you destroy the snapshots.
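Here is a toy Python model of that accounting, purely for illustration; real ZFS tracks this per block with reference counts, and none of the names below come from the actual implementation.

```python
# Toy model of copy-on-write with one active snapshot. Blocks are just
# strings; the point is the accounting: rewriting or deleting live data
# never shrinks the pool while a snapshot still references the old blocks.

class ToyPool:
    def __init__(self, live_blocks):
        self.live = set(live_blocks)   # blocks referenced by the live filesystem
        self.snapshots = []            # each snapshot is a frozen set of block references

    def allocated(self):
        referenced = set(self.live)
        for snap in self.snapshots:
            referenced |= snap
        return len(referenced)

    def snapshot(self):
        self.snapshots.append(frozenset(self.live))  # costs nothing at creation time

    def rewrite(self, old_block, new_block):
        self.live.discard(old_block)   # the old block survives if a snapshot holds it
        self.live.add(new_block)

    def delete(self, block):
        self.live.discard(block)

pool = ToyPool({"db-1", "db-2", "db-3"})
pool.snapshot()
print(pool.allocated())           # 3 -- the snapshot itself takes no extra blocks
pool.rewrite("db-2", "db-2-new")  # new mail arrives; one block is rewritten
print(pool.allocated())           # 4 -- the replaced block now belongs to the snapshot
pool.delete("db-1")
print(pool.allocated())           # still 4 -- deleting frees nothing while the snapshot lives
pool.snapshots.clear()            # destroy the snapshot...
print(pool.allocated())           # 2 -- ...and only now can the space be reclaimed
```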

Since snapshots describe an entire filesystem, you get the choice of using them to back up everything or nothing. If you’d rather only back up “/Applications” once per week, but “~/Documents” once per hour, you have to have them in separate file systems, mounted at the appropriate places in the hierarchy. Thaler says he doesn’t expect users to set this up, but that Apple could. He’s right — but if you want to change it, welcome to (at present) the world of ZFS command-line arcana.

Also, note that since the way to access a snapshot is to mount it (as a clone), the way to find backups of files is to mount a copy of your filesystem that is the snapshot, so you can go into it and find the file and copy it over to the live filesystem (or, if you want and can keep the paths straight, to read it directly from the snapshot). A good human interface can simplify this, of course, but that’s where it stands today.

So, yeah, there’s a lot of great stuff about snapshots. They don’t solve the world’s ills, and a lot of the push for ZFS pretty much assumes they do. Look at these entries in Thaler’s comments section:

The MWJ rebuttal claims “without RAID-Z, ZFS can only tell you that the data is bad.” This is not true.

ZFS has significant self-healing capabilities even when used on a single disk. Specifically, the filesystem’s uberblock and all metadata blocks are replicated. ZFS also allows file data to be replicated via ditto blocks. While it is possible that every copy of an block could be corrupted, this is extremely unlikely.

Well, yeah. All file systems allow blocks to be replicated. This is called “backing up.” Note that the words “ditto” or “replicate” do not appear anywhere in the ZFS On-Disk Specification, either, so it’s not like these are features automatically provided by every ZFS implementation. They could be, but if they were, wouldn’t the disk spec include descriptions of those replicated blocks so all implementations would do it the same way? (We may be missing it, but we’ve read the spec a few times now, and searched for keywords like “copy,” and we’re not finding it.)

Plus, the commenter doesn’t seem to realize that HFS Plus also keeps duplicate copies of key filesystem data in extra blocks. Not as much as he says ZFS does, but some. And if you’re arguing that ZFS replicates every block, that’s therefore indistinguishable from “mirroring” in any significant respect, including taking twice the hard disk space.

It’s great to have such options, but you don’t get to argue that ZFS really won’t eat a lot of disk space and then argue that keeping multiple copies of every block (and using a lot of disk space) answers the other objections. It’s all trade-offs, but you don’t get that from ZFS advocates.

I was really struck that MacJournal’s article sounded a lot like what everyone was saying when Apple was switching to NeXT’s OS: a lot of scare-mongering about how it’s the wrong fit, isn’t needed, and how what we’re used to now will be good enough in the future. ZFS is amazing, and if Apple can put a decent GUI on it we’d be fools to not want it. I can’t wait for the linux community to get their act together and replace ext3/4 with it for my servers.

In other words, “HFS Plus is old, ZFS is cool. Let’s tear everything up and replace it with ZFS! You’d be stupid to not want this open-source marvel.” Where, exactly, were those strawmen again?

1) replacing drives in pools: zpool replace and sliver/unsliver should do most of what they want.

We didn’t say replacing drives would be difficult. We said removing drives would be difficult, because it is, and we’re saying it because the ZFS advocates won’t. Look, for example, at this bit of “analysis” from Blackfriars’ Carl Howe in his ZFS lovefest today:

Want to add more storage? Just add another disk to the pool, and ZFS knows what to do. Want to replace a disk? Tell ZFS to remove it from the pool, and it clears it off for you.

OK, so suppose you have a computer with a 500GB internal hard drive and a 300GB external hard drive, and you’re using 700GB of storage space in a single pool between the two drives. If you truly want to replace the 300GB with a new one, you can—just connect the new drive, add it to the storage pool, and tell ZFS to clear stuff off the old one.

If you want to remove the external drive, well, you can’t. It’s not mirrored, and not even ZFS is cool enough to synthesize 200GB of storage out of thin air to hold the data that would go missing when the external drive disappears, unless you provide at least 200GB of new space to replace it. Once you’ve added a drive to a ZFS storage pool, you’re stuck with it. If the remaining drives have enough free space to hold everything stored on the one you want to remove, you can ask ZFS to replace it, or use the commenter’s sliver/unsliver, to free it up. If they don’t, you’ve got to add more drives before you can remove existing ones, or your volume is damaged.
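A minimal sketch of that constraint, using the 500GB-plus-300GB example above; it is plain arithmetic, not a model of any actual zpool command.

```python
# Why the 300GB external drive can't simply be pulled from the example pool:
# without mirroring, the data stored on it has to fit in the free capacity
# of whatever drives remain.

def can_evacuate(drive_sizes_gb, used_gb, drive_to_remove_gb):
    """True if the departing drive's data could fit on the remaining drives."""
    remaining_capacity = sum(drive_sizes_gb) - drive_to_remove_gb
    return used_gb <= remaining_capacity

# 500GB internal + 300GB external, 700GB of data in the pool:
print(can_evacuate([500, 300], used_gb=700, drive_to_remove_gb=300))       # False
# Add at least 200GB more capacity (or delete 200GB of data) first:
print(can_evacuate([500, 300, 200], used_gb=700, drive_to_remove_gb=300))  # True
```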

You can try telling us that people not intimately versed in filesystems won’t read that as “add or remove drives any time you like and it all just magically works,” but we’re not buying it. Adding magically works. Removing does not. This also means, by the way, that if your external drive fails, your entire system is compromised, including all of the on-disk snapshots on that drive. Not quite what most people wanting “next-century” storage had in mind, we’d bet. And there’s nothing wrong with that. That’s an entirely reasonable design trade-off for ZFS — but you never hear about it from ZFS advocates, who want to pretend that their file system is magic and infallible, and everything else is old and icky. It just doesn’t work that way.

snapshot size issues are identical with Apple’s TimeMachine, but there they are on a per-file basis, not a per-block basis.

In which case they’re not identical. (Time Machine snapshots are larger.)

Performance concerns are irrelevant.

Well, that resolves that.

The other thing that is rarely mentioned is that snapshots via copy-on-write are MUCH more efficient than the file-based snapshots of Time Machine for large files.

Well, we mentioned it: in MWJ 2007.06.11, in the first response to Thaler, and again here. It’s one of those design trade-offs. It’s not like ZFS has no advantages, it’s just that it’s not the solution to every problem for which it’s proffered.

Every time you boot your Windows or Linux VM, the whole virtual disk (Vista’s minimum is over 10GB) will be changed. With Time Machine, my understanding is all of that data will be backed up, even though 99% of the drive file is static. Under ZFS, only those few changed blocks will be backed up. Much more efficient and lightweight.

Yes, as we said. Though in this example, if your virtual disk is accessible to the OS as a mountable volume, Time Machine would only back up the files on the virtual volume that changed, at least if properly configured. If not, then yes, it would want to copy the entire 10GB virtual volume each time.

Can someone clue me in on who MWJ is or why anything they say matters to anyone? I think I was sick that day.

We’ve been ill much of the past year, so we’re sympathetic.

See here for information on MWJ, which has been publishing ad-free Macintosh analysis and commentary, relied upon by professionals around the world, for more than ten years. MWJ’s analysis has also been reprinted in Macworld, TidBITS, and MacAddict, among other places. MDJ and MWJ staffers have presented at WWDC as far back as 1990. You can check out free sample issues going back a decade, including a complete description of HFS Plus and why it evolved the way it did. There’s lots of free stuff on the site; please read and enjoy it.

In fact, we’ve only taken this public in the hopes of getting clear information out there—not that ZFS is “bad” or “wrong,” but that people need to be honest about the trade-offs involved both in its design and in what that would mean for Mac OS X users and applications. If you like what you’ve read, sign up for the “free trial” of MDJ or MWJ and see more of it as we can get it done. But those commitments require that this be the end of the discussion for a while. Thanks for your understanding.

How to become a financial tech analyst in 4 easy steps

  1. Read rumor sites.
  2. Ignore dishonest marketing spin in rumors because neither you nor the rumormongers understand what it is.
  3. Repeat rumors as “insight.”
  4. Profit!

(A tip of the hat to The Mac Observer for catching it.)

Tsai-ing one on

Michael Tsai, of SpamSieve, DropDMG, and ATPM fame, gets it:

The problem is that, as MacJournals explains, ZFS (as it currently stands) can’t be a drop-in replacement for HFS+. It supports filenames only half as long. It’s case-sensitive rather than case-insensitive/case-preserving. It has inefficient storage for certain extended attributes.

…My guess is that Apple will eventually ship Macs that boot from some version of ZFS. Perhaps it will make changes to ZFS and try to get them into the official version. I don’t think this is an area where it makes sense to swim against the tide. Users may end up having to make a few concessions, but frankly I think using 128-character filenames instead of 256-character ones is a great tradeoff in return for ZFS’s data integrity features alone. (Use of super-long filenames is limited due to OS X’s PATH_MAX, anyway.)

The question is not whether or not Mac OS X should support ZFS. Of course it should, just as it should support NTFS (better than it does), FAT32, ISO-9660, sshfs, and any other file system that people want to use. It should be easier to add file system support than it is today.

There is similarly no question that ZFS offers advanced features that make sense for larger storage pools today and, based on what we currently know about computing, probably will make sense for at least some portable computers in 5-10 years.

We have a big problem, however, with ZFS fans dishonestly asserting all of its “magic” properties without honestly discussing its limitations (the current inability to remove a hard drive from a computer unless it’s been mirrored, the current inability to boot, the inefficient storage of attributes not used daily in Solaris, the extra requirements for CPU power for scrubbing and encryption that would drain notebook batteries). Those are independent of structural differences (shorter filenames, case-sensitivity) that can cause compatibility problems for existing programs in addition to a changed user experience.

HFS Plus is not perfect, but in almost all online ZFS discussions, HFS Plus’s perceived shortcomings are amplified, usually without any specifics other than the vague insinuations that “it’s old” and “it’s not ZFS.” Meanwhile, ZFS’s deficiencies go without mention, either through fanboy blindness or deliberate spin. See, for example, the insistence that ZFS can “fix errors” without adding that this requires mirroring. Without mirroring, ZFS has a better chance of fixing errors than traditional file systems do, and it can almost always detect errors, but that’s not enough for ZFS fans—they insist on saying ZFS will fix errors, silently and in the background. Without mirroring, that’s simply false.

There was a time in 2000 when what was so recently and aptly called the “Nerd Feedback Loop” insisted that Mac OS X’s support for UFS meant HFS Plus was dead, that within a year or two everyone would be using the “superior” and “faster” UFS because it was better designed for Unix. Yet amazingly, when real people got to see the real trade-offs involved in each file system, most people stuck with HFS Plus.

ZFS is a great choice for some people, and will become a better choice over the next several years. There’s nothing wrong with that discussion, as long as it’s an honest one about the actual features and limits of the relevant filesystems. What we usually get instead is a Web-wide extension of Sun’s marketing department.

You don’t have to hate ZFS to know it’s wrong for you

In a well-considered and thought-out blog post, file system engineer Drew Thaler takes issue with our previous post blasting AppleInsider for, without understanding file systems, touting ZFS as the cure for all storage problems, real and imagined—the same kind of uninformed tripe that makes people who know nothing about file systems crave ZFS and think they’re somehow “cheated” by being “stuck” with HFS Plus.

Thaler knows his stuff, but in mistaking our disdain for ZFS rumors as “ZFS hate,” he minimizes the real and significant problems that this advanced file system would bring to today’s Macintosh computers. Of course, part of the problem is that the post is an abbreviated argument, not our entire case.

At the moment, subscribing to the free trial of MWJ gets you a full copy of MWJ 2007.06.11, with more than sixteen detailed pages on why ZFS is no replacement now or in the foreseeable future for HFS Plus. We stand by this article in its entirety.

We don’t hate ZFS. It’s a remarkable and advanced file system, with a lot of concepts that make plenty of sense for 20TB server storage. These same concepts make absolutely zero sense in today’s Macs. Let’s explore the complaints one by one, and look at HFS Plus.

ZFS is great specifically because it takes two things that modern computers tend to have a surplus of — CPU time and hard disk space — and borrows a bit of it in the name of data integrity and ease of use.

MWJ 2007.06.11:

Processes like background storage pool scrubbing are probably fine for stand-alone servers, but additional compression, checksumming, scrubbing, mirroring, fatzap tracing, and copy-on-write time could be a killer to someone trying to encode real-time video effects on a MacBook Pro. A new “default” file system that takes more disk space _and_ more CPU time to store the same data is not a win for a portable computer–and in case you hadn’t noticed, Apple’s MacBook sales continue to outpace desktop system sales by a margin that only grows by the quarter. As one staffer noted to [MWJ], Mac users already complain about CPU and disk performance attributed to Spotlight. Imagine adding that to everything.

Drew Thaler’s post:

HFS Plus administration is simple, but it’s also inflexible. It locks you into what I call the “big floppy” model of storage. This only gets more and more painful as disks get bigger and bigger. ZFS administration lets you do things that HFS Plus can only dream of.

And later on:

How dumb do you need to be to willfully ignore the fact that the things that are bleeding-edge today will be commonplace tomorrow? Twenty gigabyte disks were massive server arrays ten years ago. Today I use a hard drive ten times bigger than that just to watch TV.

First, we didn’t say “20GB disks” were “massive server arrays.” We said “20TB”, or twenty terabytes. We’ll stand by that.

Second, we don’t think HFS Plus “dreams” of shorter battery life and requiring more disk space to store the same data. ZFS handles very large storage pools efficiently, in ways that standard block-based file systems can’t. At some point, that may become a great trade-off. Today, when people still stupidly “clean” their apps of alternate processor architecture code or foreign language support to save disk space, it would be a customer relations nightmare.

The fatzap storage structure requires an additional 128KB of disk space for any attribute that’s not part of the basic ZFS model, including most of those that Mac OS X software uses. MWJ 2007.06.11 says:

A standard Mac OS X installation can easily include 600,000 separate files. If just 20% of them have attributes that require a fatzap object, ZFS would require 120,000 fatzap objects at 128KB each, or more than 15GB of additional disk space above what an HFS Plus disk would require for the same data storage.

Apple will wind up having sold more than seven million Macs in fiscal year 2007. Do you want to answer the phones and E-mail when 6.7 million of those owners want an explanation for why they had to give up 15GB or more of their hard disk to support a file system that geeks with blogs say will be really cool several years after you’ve upgraded to a completely new computer?

Yeah, today’s hard disks are 500GB each for a reasonable price. Just six years ago, a state-of-the-art PowerBook G4 had a 20GB hard drive. How many people purchasing a 667MHz PowerBook G4 in 2001 would have been willing to give up 600MB or more of hard disk space to support a file system that might pay dividends six years later? We’d venture “not many.” And HFS Plus was already nearly four years old by that point.

As for the “big floppy” model, well, Thaler doesn’t explain what he means, but we’re guessing he means that since each disk can only be divided into so many allocation blocks, then as volumes get bigger, each allocation block gets bigger. This was the major flaw in HFS for its day—by the time disks started getting big, every allocation block was taking huge amounts of disk space, and on average, half of the last block of each file is wasted.
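For concreteness, here is a small Python sketch of that effect, assuming classic HFS’s 16-bit limit of 65,535 allocation blocks per volume; the rounding is simplified, but the trend is the point.

```python
# The "big floppy" problem under classic HFS: a fixed 16-bit count of
# allocation blocks means block size grows with the volume, and on average
# half of each file's final block is wasted. (Rounding to 512-byte multiples
# approximates what HFS actually did.)

MAX_HFS_BLOCKS = 65_535  # classic HFS used 16-bit allocation block numbers

def hfs_block_size(volume_bytes):
    raw = -(-volume_bytes // MAX_HFS_BLOCKS)     # ceiling division
    return max(512, ((raw + 511) // 512) * 512)  # round up to a 512-byte multiple

for size_gb in (0.25, 1, 4):
    block = hfs_block_size(int(size_gb * 10**9))
    print(f"{size_gb:>4} GB volume -> {block // 1024:>2} KB blocks, "
          f"~{block // 2048} KB wasted per file on average")
```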

If that’s the problem, we think it’s pretty ridiculous to be arguing that ZFS can use smaller allocation blocks on larger file systems while, simultaneously, ignoring that the fatzap system requires spending 128KB for every arbitrary attribute not used by Solaris. Again, from MWJ 2007.06.11:

First, “128-bit file system” does not mean 128-bit block numbers–it means you need 128 bits to express the largest possible data size. Yet if you’re going to go to the theoretical, an HFS Plus volume can have 2^32 allocation blocks, each of which could theoretically hold as many as 2^32 bytes (that’s 4GB per allocation block), for a total of 2^64 bytes, or 16 exbibytes. According to Wikipedia, that’s the maximum size of any file system in ZFS anyway! “ZFS” has nothing to do with zettabytes, and the “128-bit file system” holds files exactly as big as HFS Plus can hold. (In fact, ZFS developer Jeff Bonwick admits that he picked the name “ZFS” first and then retrofitted it to stand for something, but now they say it doesn’t stand for anything since it really has nothing to do with zettabytes.)

This is what happens when people read slogans and mistake them for features. There’s a lot of innovation in ZFS, and it’s true that ZFS storage pools can literally hold one quintillion times more data than an HFS Plus file system can hold. If you’re absolutely convinced that the file system decisions you make today will be unchangeable in the next two decades, you probably have to care about that. No one else does.

Thaler says:

Sure, RAID-Z helps a lot by storing an error-correcting code. But even without RAID-Z, simply recognizing that the data is bad gets you well down the road to recovering from an error — depending on the exact nature of the problem, a simple retry loop can in fact get you the right data the second or third time. And as soon as you know there is a problem you can mark the block as bad and aggressively copy it elsewhere to preserve it. I suppose the author would prefer that the filesystem silently returned bad data?

No—the authors would prefer that Sun’s marketing department and ZFS fans stop deliberately conflating checksums with error recovery. Without RAID-Z, ZFS can only tell you that the data is bad. That is useful and significant, but remember what AppleInsider’s report actually said, quoting Sun’s marketing material:

“A scrub traverses the entire storage pool to read every copy of every block, validate it against its 256-bit checksum, and repair it if necessary,” the description reads. “All this happens while the storage pool is live and in use.”

ZFS can try to repair non-RAID data when it detects a bad checksum, and sometimes repeated reads can make that work. But the description and article say that ZFS does repair bad data without pointing out that to guarantee this functionality, you must use RAID to keep extra copies of the data. The description above needs big letters that say “attempts” and “sometimes,” and they’re almost always missing from ZFS advocacy.

Thaler’s opinions are well-founded, but we disagree with him on several important points. To wit:

  • “Hard disks … are building blocks that you can just drop in to add storage to your system. Partitioning, formatting, migrating data from old small drive to new big drive — these all go away.”

    True—but with current ZFS, you can never remove these building blocks. If you add an external hard drive to your MacBook’s ZFS pool, you must keep the external hard drive attached to the pool from then onward or else the filesystem is more or less destroyed. The only way around that is—wait for it—using RAID-Z (or mirroring) so that the remaining devices still hold a redundant copy of the data.

    This is not rocket science, folks: if you split a file system across multiple physical devices, and then remove one of those devices, then a big chunk of your data is offline. No modern OS is designed around the idea that random blocks from a file may not be available. ZFS partisans tend to omit that you can’t remove devices from the storage pool unless all the data on said device is duplicated elsewhere in the storage pool. That means if you add a hard disk to your portable computer’s ZFS storage pool, you have to keep that hard disk attached to it from then onward, or destroy and recreate the entire filesystem to get rid of it, which may not be possible without at least temporary use of even more hard disks.

    This is a fine trade-off for what ZFS does, but in our opinion, that kind of pain is not worth it for the average Mac user. This seductive goal of just plugging in drives and using them “magically” looks a lot different when you realize what you have to go through to unplug them, something Thaler, like most ZFS partisans, never even mentions. (It’s not entirely clear that he’s thought about the implications of using ZFS on portable computers very much at all.)

  • Snapshots “can eliminate entire classes of problems,” serving as a system-wide trash can metaphor and preventing problems with losing data.

    Everyone loves the ZFS snapshots, but no one seems willing to point out that they eat disk space like crazy. Sun is fond of saying that snapshots take up no disk space, but that’s only true when they’re created, just like an empty file. Again, MWJ 2007.06.11:

    Since ZFS writes new copies of data and leaves the old data alone and “orphaned,” then if you think about it, the storage pool now has both the old and new versions of the same file. Eventually, ZFS will reallocate the older blocks and re-use them, but the ZFS developers realized it doesn’t have to do that. ZFS supports the concept of a snapshot—an exact copy of an entire file system at the moment in time when you create the snapshot (“take the picture,” if you will). A snapshot is just another ZFS object, and when you create it, it indeed takes up almost no storage whatsoever.

    How can that be? At the moment you create a snapshot, its contents are exactly the same as the live file system. There’s no difference, so it takes no storage space to record the differences. As you continue to modify data on the file system, ZFS uses copy-on-write to allocate new storage blocks for the modified data, as you just read. When the new data is confirmed written to disk, ZFS normally frees the blocks held by the old data. However, if any snapshots are active, ZFS simply reassigns the old storage blocks to the snapshot instead of marking it as free space. A snapshot, therefore, holds the difference between the live data and the state of the data at the time you created the snapshot. It’s extremely clever and a great feature.

    It’s not “free,” though. Keeping a snapshot of a file system means that deleting a file does not actually free any storage space on the disks in the pool—ZFS has to keep all those data blocks around as part of the snapshot once you “delete” the file. In fact, deleting a file can actually result in ZFS using more disk space than before you deleted it, because in some cases, it has to write a new copy of the file’s directory so the snapshot can keep the old copy.

    (You may recall that, in theory, deleting a file on an HFS or HFS Plus disk can also consume more disk space because the OS has to re-balance the directory tree, and that may require more storage space in some cases–but in HFS and HFS Plus, deleting a file always frees up at least one allocation block, provided the file contains at least one byte of data, so it’s all but impossible for it to result in a net loss of disk space. In ZFS, with snapshots, deleting files consumes more disk space as a matter of course.)

    Later in the same issue:

    The biggest factor against ZFS as a primary Mac OS X file system is that it eats disk space like Philadelphia eats cheesesteaks. If you have even one snapshot in your storage pool, the storage pool will never use less disk space than when you created the snapshot. It will keep all the data from that snapshot, as well as the current version of the filesystem. If you take snapshots every hour, as Time Machine is reported to do, you’ll fill up your drive fast.

    You’ll also still take the performance hit to copy the data to an external backup drive unless the OS somehow makes your backup and your primary drive part of the same storage pool—and if it does that, you won’t be able to run the computer without the backup drive. That’s likely not what MacBook Pro owners had in mind. Also remember that the self-healing capabilities of ZFS come from mirrors in the storage pool—unless you add enough drives to use mirrors, then you can still lose data, and you can’t remove non-mirrored drives from the storage pool at this time.

    [MWJ] should note that ZFS on an external drive would have one Time Machine advantage: through copy-on-write and snapshots, ZFS would make it easier for Time Machine to keep multiple backups of large files. If you changed 8KB in the middle of a 2GB database, an HFS Plus-based Time Machine would want to copy the entire 2GB database again. A version implemented on top of ZFS might use snapshots to record only the 8KB that changed. While this should not be minimized, [MWJ] also notes that you’d need your Time Machine backup volume to be a separate ZFS storage pool from your boot drive—you can’t merge them into a single storage pool for “free” snapshots or else it’s not a backup drive.

    When Thaler says snapshots are “so cheap you wouldn’t possibly notice,” he is either being naive or disingenuous. Installing Leopard, for example, would take the full 5GB for Leopard’s estimated storage (not including the extra 15GB it might need thanks to ZFS’s inefficient fatzap storage for single non-Solaris attributes), on top of all the disk space you currently use for Tiger. Even if you delete a file, the disk space is never reclaimed as long as a snapshot still references it. To deny that this would eat disk space like crazy is simply incomprehensible.

  • ZFS is “designed to support Unix” with all its “subtleties,” while HFS Plus “was not really designed for that purpose.”

    It was designed with those concepts in mind, though. The problem with this statement is that ZFS was designed to support Solaris, and has some difficulties with file system attributes that Solaris doesn’t use. Again, MWJ 2007.06.11:

    [Extended attributes are] not a huge problem for Solaris—the original home of ZFS—because there, extended attributes are rare. Few files have them, and even fewer see them read or written frequently. In fact, although HFS Plus stores ACLs in extended attributes because it’s relatively fast and painless to access them, ZFS stores ACLs one level higher than attributes—at the same level where you find the pointer to the ZAP object for the file’s extended attribute—because opening the extended attributes ZAP object, reading the names, and finding the values in the fatzap would be far too slow for something that has to happen every time someone accesses the file.

    This unwieldy fatzap structure is what Mac OS X would have to use for every piece of file metadata that didn’t fit into the Unix metadata model, meaning there wasn’t already space allocated for it in the file’s catalog entry. File type and creator type can fit in a microzap object, but a resource fork cannot. Neither can the Finder Info structure associated with each HFS Plus catalog entry, or some extended attributes added by Apple and third-party developers. That’s because Apple told developers to name their attributes with reverse domain name syntax, as in “com.apple.filesystem.attribute”. Any such attribute names longer than 50 characters blow away the microzap rules and require allocating a fatzap object for the file.

    Each and every fatzap object begins with a 128KB header.

  • ZFS is “actually used by someone besides Apple,” something that starts a “virtuous circle” of more use and more support.

    This is a thinly-disguised version of the same thing command-line partisans have been spewing for two decades: things Apple invents must be bad, but open-source solutions to the same problems must be good. People who use command-line-based operating systems have been trying to kill HFS and its descendants since the day they appeared because they use extended metadata and other attributes that are hard to access from the command line. We’re now stuck with incredibly stupid, unwieldy filename extensions that describe a file’s data type in its filename because to do otherwise would prevent Unix fans from typing things like "*.jpg" and instead make them conform to the world that non-Unix people live in.

    In the comments to his post, Thaler repeats this error when he opines that case-insensitivity is bad for a file system because “it slows down performance, drags huge text-encoding tables into the kernel, creates heinous and subtle encoding problems, and reinforces non-portable bad habits.”

    Well, boo-frickin-hoo. The point of the operating system is to make the computer easier for the customer to use, not for the programmer to maintain. Writing easy code and pushing the learning curve onto the user is how we got command-line systems in the first place. The very notion that people should be forced to learn case-sensitive rules on the Mac after more than two decades of case-insensitivity—really, that making life a little easier for three programmers is worth inconveniencing 25,000,000 users forever—is exactly why people don’t use those tools, no matter how loudly their implementors scream about what uneducated idiots everyone else is for not doing things the hard way.

    Thaler contends that case-insensitivity could be enforced in the human interface in the “Save” dialog boxes and in the Finder, “which are just about the only two places you actually need it.” And in the custom Adobe file dialog boxes. And in Path Finder. And File Buddy. And in Terminal, unless you’re going to allow people to create case-sensitive filenames that duplicate case-insensitive ones that they then couldn’t access in any other way. And in AppleScript, and in URLs, and everywhere you might actually type a pathname or filename. Thaler’s idea of “correct” operating system design is to take the code that makes sure “File.txt” doesn’t exist in a directory before you write “file.txt” to that directory and move it into dozens, if not hundreds, of applications, to prevent the operating system from having to do it once. (A sketch of that duplicated check follows this list.) This kind of push-the-problem-onto-others thinking is exactly what’s wrong with so much of modern OS design.

    Besides, HFS Plus is open-source as well as part of the Darwin project, and has been for five years. The “virtuous circle” could start just as easily by other people adopting it in their operating systems, but ZFS partisans don’t want to soil their machines with that. Maybe it uses too little disk space for them to take it seriously.
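As promised above, here is a hedged sketch of the duplicated check in question. It deliberately skips the Unicode case-folding and normalization tables that make the real job much harder, which is exactly why doing it once, in the filesystem, beats doing it in every application.

```python
# The collision check Thaler would move out of the filesystem and into every
# application's save path. This naive version skips the hard part, Unicode
# case folding and normalization, which every app would also have to
# duplicate and keep in sync, forever.

import os

def case_insensitive_collision(directory, proposed_name):
    """True if a file differing only in case already exists in `directory`."""
    existing = {name.casefold() for name in os.listdir(directory)}
    return proposed_name.casefold() in existing

# Every save dialog, file manager, and script would need this before writing
# "file.txt" into a folder that might already contain "File.txt".
if case_insensitive_collision(".", "README.txt"):
    print("A name differing only in case already exists here.")
```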

Thaler says, “But really, seriously, dude. The kid is cool. Don’t be like that. Don’t be a ZFS hater.” No one who reads our analysis in MWJ 2007.06.11 could possibly accuse us of hating ZFS. It’s a great file system for what it does. HFS Plus could really benefit from sparse files (although Thaler calling that a “modern filesystem concept” when it was present in Apple SOS/ProDOS in 1983 is a bit funny), copy-on-write, I/O sorting, and other stuff. Multiple prefetch streams would kind of require Mac OS X to get rid of the kernel funnel that prohibits more than one I/O transaction at a time, but that’s bound to happen sooner or later anyway.

Despite the fact that they only recently got any OS to boot from ZFS, it’s a fine file system for Solaris, and for big servers, and for big disks. It is not, in any significant way, suitable for today’s Macintosh systems, and punishing people today for how cheap hardware may be on their next computer is just stupid. As MWJ said:

Again, [MWJ] is not suggesting that these problems [making ZFS unsuitable for a sole Mac OS X filesystem] can’t be fixed—merely that it would be extremely expensive to deliver something that doesn’t offer current Mac OS X users much advantage, but creates lots of headaches. Would you like to be the Apple Genius who has to explain why a customer’s new US$129 operating system upgrade doesn’t let him recover any disk space when he deletes files? And that it’s supposed to work that way? ZFS would make a great addition for Xserve RAID and Mac OS X Server, and there’s little reason it shouldn’t be in Mac OS X itself for the discophiles who can’t wait to use it—but there’s even less reason why it should be the default file system for anything smaller than four drives and 2TB.

Don’t be a ZFS hater? Don’t be a Mac OS X hater, trying to punish people with tons of complications and worse performance on most Macs for advantages they won’t possibly notice for years. When the file system makes things harder yet provides most users with no benefits they can’t get at much lower cost today, it’s a bad idea. No matter how many geeks love it.

Correcting editing errors in AppleInsider’s ZFS article

Due to space limitations, AppleInsider’s latest ZFS article was published with several missing phrases and explanations. As a service to the community, we are happy to fill in the gaps in the story as published.

ZFS to play larger role in future versions of Mac OS X

Since Apple has stated publicly that Mac OS X 10.5 “Leopard” will include a read-only implementation of ZFS, and since the current version of Mac OS X has no support for ZFS, this headline is unquestionably true. The rest of the article is not so lucky.

Sun Microsystems’ relatively new ZFS filesystem will see rudimentary support under the soon-to-be released Mac OS X Leopard, but will eventually play a much larger role in future versions of the Apple operating system, AppleInsider has been told.

…by Apple’s engineering management? By file system experts?

People familiar with the matter reveal that Apple on Wednesday provided developers with “ZFS on Mac OS X Preview 1.1” and associated documentation, in which the company asserted that it alone was responsible for porting the filesystem to Mac OS X.

Ah. The first paragraph should have ended with the text “…by people who leak pre-release code and know nothing about ZFS.” AppleInsider apologizes for the misunderstanding.

The Cupertino-based firm also officially confirmed to developers receiving the pre-release software that Mac OS X 10.5 Leopard — due out later this month — will officially support ZFS, albeit restricted to a read-only implementation with which no ZFS pools or filesystems can be modified.

“…as Apple stated publicly in June, as reported by our own corporate sister site, and therefore did not need confirmation, which we’re not supplying anyway since this is all third-hand rumor.”

Developers receiving the latest ZFS preview, however, are granted access to full read and write capabilities under Leopard, including the ability to create and destruct ZFS pools and filesystems.

“…where by ‘destruct,’ we mean ‘destroy,’ or would if we knew anything about ZFS. Write capabilities are no big secret, since the entire file system is open source, but it sounds a lot better if we portray it as something mystical.”

The developer release, those people familiar with the matter say, is a telltale sign that Apple plans further adoption of ZFS under Mac OS X as the operating system matures. …

“…unlike most Apple technologies, which, as far as ‘people familiar with the matter’ apparently think, are never advanced after initial release.”

… It’s further believed that ZFS is a candidate to eventually succeed HFS+ as the default operating system for Mac OS X — an unfulfilled claim already made in regard to Leopard by Sun’s chief executive Jonathan Schwartz back in June.

“This claim, by the way, is widely believed by Jonathan Schwartz, by cultists who assume that Apple’s advanced file system must be bad but Sun’s must be good, and by people who know so little about HFS Plus and ZFS that they can’t do more than recite Sun’s marketing propaganda. For those keeping score at home, we at AppleInsider fall into the last group.”

Unlike Apple’s progression from HFS to HFS+, ZFS is not an incremental improvement to existing technology, but rather a fundamentally new approach to data management. …

“… or, at least, that’s what Sun’s marketing department tells us, and we’re far too ignorant to figure out that using significantly more disk space and processor time to store the same data is not ‘fundamentally new.’”

It aims to provide simple administration, transactional semantics, end-to-end data integrity, and immense scalability.

“We don’t find HFS Plus administration to be complex, and we can’t tell you what those other things mean, but they sound really cool, and therefore we want them. On the magic unlocked iPhone. For free.”

According to Sun’s description of ZFS, the filesystem offers a pooled storage model that completely eliminates the concept of volumes and the associated problems of partitions, provisioning, wasted bandwidth and stranded storage. Thousands of filesystems can draw from a common storage pool, each one consuming only as much space as it actually needs. Therefore, Sun says, the combined I/O bandwidth of all devices in the pool is available to all filesystems at all times.

“We like how this somehow implies that hard disks can suddenly read and write for multiple clients at once, but we apparently aren’t aware that this is not true, and you’re still limited to the speed of your bus (or RAID controller) and your devices. Plus, ZFS can be a lot slower because it imposes tons of overhead on the kind of tiny files that Mac OS X uses by the thousands, but maybe magic hard disks will fix that, as far as we know.”

In addition, ZFS provides a feature called “disk scrubbing,” which is similar to ECC memory scrubbing; it reads all data to detect latent errors in the file system while they’re still correctable.

“… but can only correct them if you’re using ZFS’s RAID-like capability to store duplicate copies of information, in which case a standard RAID system could fix the error too. Oops.”

“A scrub traverses the entire storage pool to read every copy of every block, validate it against its 256-bit checksum, and repair it if necessary,” the description reads. “All this happens while the storage pool is live and in use.”

“It can only correct the error if you’re using RAID and have a good copy of the bad data, but leaving that out makes you want ZFS a lot more. So does leaving out its battery-chomping, disk-eating storage hog nature that makes it fantastic for 20TB disk arrays and entirely, completely unsuitable for a Mac OS X startup disk, now or in the foreseeable future.”
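
If the mechanics are fuzzy, here’s the concept in a few lines of Python (ours, purely illustrative, and nothing like the real ZFS code): a scrub re-reads each block, compares it to a stored 256-bit checksum, and can repair it only when a redundant copy also matches.

    import hashlib

    def scrub(blocks, checksums, mirror=None):
        # `blocks` and `mirror` stand in for on-disk copies of the data;
        # `checksums` holds the SHA-256 digest stored for each block.
        for i, block in enumerate(blocks):
            if hashlib.sha256(block).hexdigest() == checksums[i]:
                continue  # block verifies against its checksum; nothing to do
            if mirror is not None and \
                    hashlib.sha256(mirror[i]).hexdigest() == checksums[i]:
                blocks[i] = mirror[i]  # a good copy exists, so repair in place
                print("block %d repaired from mirror" % i)
            else:
                print("block %d is corrupt, and there is no good copy" % i)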

A more comprehensive description of ZFS, along with several other features it offers, is available on Sun’s OpenSolaris website.

“It didn’t make us understand anything, but it repeats all these same phrases so it’ll make you think we know what we’re talking about, even though we don’t.”

MDJ on Ringtones: reprinted at Macworld.com

Coverage from MDJ 2007.09.15 on copyright and ringtones has just gone live at Macworld.com, if you’d like to refer your friends to it. All MDJ and MWJ paid subscribers already have access to this, though—it was in MDJ 2007.09.15, and was made available immediately in the MWJ RSS feeds for MWJ subscribers due to the current publication delays. If you’re a paid subscriber and haven’t read it, grab a copy from the RSS feed; the ringtone analysis is only about half of the full issue. Another one is on the drafting board, too.

New RSS feed testing!

If you’re slightly adventurous and would like to test the new and improved MDJ and MWJ RSS feeds, here they are:

We’ve implemented a bunch of changes to try to make issues more accessible to subscribers (and, in enlightened self-interest, reduce the amount of time we have to spend getting them to you if E-mail fails):

  • The feeds are regular unsecured http feeds, which should solve the problems with numerous non-NetNewsWire newsreaders either refusing to read the feeds or not updating them properly. There’s absolutely no reason the old way shouldn’t work, but the fact is that in a lot of readers, it doesn’t.

  • The issues themselves, however, are still for subscribers only. The links go to the https secure site and require your user ID and password, but every reader we’ve tested can handle that (handing off to your browser if necessary). You can even read PDF issues on the iPhone!

  • Each issue’s entry includes links to the issue in both PDF and setext format, and the PDF is available both uncompressed and as a ZIP archive. This lets you retrieve any version at any time, including the uncompressed PDF for reading on the iPhone. The enclosure element still refers to the ZIP file – each item can have only one enclosure, and for compatibility, we’ve kept that as the ZIP-compressed PDF version (there’s a rough sketch of what such an item might look like just after this list). If you set your newsreader not to download enclosures automatically, you can choose which format you want on an issue-by-issue basis.
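
Here’s a rough idea of what one item in such a feed might look like, generated with Python’s standard XML library; every URL, title, and byte count below is a placeholder, not a real feed address:

    import xml.etree.ElementTree as ET

    item = ET.Element("item")
    ET.SubElement(item, "title").text = "MWJ (sample issue)"  # placeholder title
    ET.SubElement(item, "description").text = (
        'PDF: <a href="https://secure.example.com/mwj.pdf">uncompressed</a> | '
        '<a href="https://secure.example.com/mwj.pdf.zip">ZIP</a> | '
        'setext: <a href="https://secure.example.com/mwj.txt">text</a>')
    # RSS 2.0 allows only one enclosure per item, so it stays pointed at
    # the ZIP-compressed PDF for compatibility with older newsreaders.
    ET.SubElement(item, "enclosure", {
        "url": "https://secure.example.com/mwj.pdf.zip",
        "type": "application/zip",
        "length": "250000",  # byte count, required by the RSS spec
    })
    print(ET.tostring(item, encoding="unicode"))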

This is the first time we’ve made uncompressed PDF and setext versions available other than by E-mail, and we know a lot of you have asked for it over the years, so we’re pleased to roll it out. If you have any problems with the new feeds, please let us know through the normal channels.

The previous RSS feeds are still there, and will be maintained at least until the PDF-in-E-mail changes noted earlier are implemented. Once everything is debugged, we’ll adjust the old feeds to permanently redirect to the new ones, so those who don’t keep up won’t have to do anything – your newsreader will automagically pick up the new feed at the new place from then on.

(Or should, at least. We’ve learned that what newsreaders do and what they’re supposed to do are often two different things. But we remain optimistic.)

MDJ on iPhone is really cool

Well, we think so, at least. We can’t really take sharp pictures of it, but try mailing yourself a PDF issue of MDJ or MWJ, making sure you don’t compress it with Zip or StuffIt or anything else. Then read the message on iPhone – at the bottom of the page, you’ll see an attachments button that lets you open and read the PDF right there on the phone, with the proper font rendering and everything.

Frankly, this surprises us, but we’re not complaining. OK, we’re complaining about two things:

  1. iPhone’s PDF reader does not recognize hyperlinks within a PDF document. You can see that something is a link from the blue text, but tapping it does nothing.

  2. Even though MDJ is presented in two columns, iPhone’s double-tap-zoom metaphor does nothing but zoom the full page to fit the iPhone’s screen. We even tried testing an older issue of MDJ that had PDF article threads defined for the text, allowing Acrobat and Acrobat Reader to follow stories across columns and pages automatically. No dice – iPhone’s PDF viewer knows nothing about them, so they don’t provide any advantages.

    (Ironically, we stopped including the “article” features in MDJ and MWJ in 2002 with the new design because Adobe InDesign has no way to generate them from columns and text frames on the page. InDesign has its own PDF export that does a good job in many areas, but this has been a glaring omission since version 1.0.)

We’ve always compressed MDJ and MWJ issues for delivery for a few reasons:

  • The #1 error we used to get in delivery was “mailbox full,” so naturally we want the issues to be as small as possible.

  • In the days of Mac OS 9, compression was necessary to preserve HFS metadata, like the file type 'PDF ' and the creator code 'CARO' that let you double-click the file to open it in Acrobat.

  • When we started this 11 years ago, most people didn’t have broadband services, and those outside the US were slower than those here. Downloading big files could take a long time.

We’ve long considered ditching the compression and sending the file as MIME type "application/pdf", because Mac OS X’s “Mail” application can display uncompressed PDFs inline, but that would have left people who want compressed files without options, on top of requiring us to rewrite our software to do the new thing.

But now, since mail is so spammily broken to begin with, we have ZIP-compressed RSS feeds for people who want compressed files, and Apple continues to improve the experience for people using its products if we mail uncompressed PDF files. The RSS feeds are irrelevant to the iPhone – it redirects display of any RSS URL to Apple’s “reader.mac.com” Web application, but reader.mac.com cannot access or display secure RSS feeds like ours, so at present, you can’t view MDJ or MWJ RSS on the iPhone.

Therefore, starting around 1 August (2007.08.01), we’re going to change our delivery system to mail PDF versions of MDJ and MWJ without compression, as MIME type "application/pdf", encoded with Base-64. We’ll also add a new “no E-mail” type of subscription for people who prefer compressed files in RSS – when a new issue is published, we won’t send you E-mail at all, just let you find it in the RSS feed. (After all, if you want compressed files, it makes no sense to mail you an uncompressed PDF or setext version that you don’t want. If you still want those, they’re still available, on top of the RSS feeds available to all subscribers.)
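
For the curious, the new delivery amounts to something like the following sketch, built on Python’s standard email package; it is not our actual delivery software, and the addresses and filename are placeholders. MIMEApplication applies the Base-64 transfer encoding by default, which is the behavior described above.

    from email.mime.application import MIMEApplication
    from email.mime.multipart import MIMEMultipart
    from email.mime.text import MIMEText

    msg = MIMEMultipart()
    msg["Subject"] = "Sample MDJ issue (PDF)"     # placeholder subject
    msg["From"] = "delivery@example.com"          # placeholder addresses
    msg["To"] = "subscriber@example.com"
    msg.attach(MIMEText("The attached PDF opens inline in Mail and on iPhone."))

    with open("sample-issue.pdf", "rb") as handle:        # placeholder filename
        pdf = MIMEApplication(handle.read(), _subtype="pdf")  # application/pdf,
                                                              # Base-64 encoded
    pdf.add_header("Content-Disposition", "attachment",
                   filename="sample-issue.pdf")
    msg.attach(pdf)

    # Handing the message to an SMTP server is left out of this sketch:
    # import smtplib; smtplib.SMTP("mail.example.com").send_message(msg)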

We’ll announce this in MDJ and MWJ also, but since the “StuffIt file in a BinHex wrapper” format of MDJ and MWJ PDF delivery hasn’t really changed in over a decade (except to move to the “newer” StuffIt 5 archives in the late 1990s), we thought we’d give some of you a heads-up in case you have mail filters, automatic processing, or anything similar. Most of you won’t notice any difference except that PDF issues won’t need to be decompressed before viewing. In most modern mail applications, you’ll see an enclosure icon that opens the issue with a single click – and in iPhone, you can tap the enclosure to read it. The coolness factor there isn’t going to wear off for a while around here.