Wayback Machine and Google
Archive Billions of Pages,
Including Deleted Ones
By DAVID KESMODEL
THE WALL STREET JOURNAL ONLINE
Earlier this year, executives at Dell Inc. tried to shut down DellComputersSuck.com, a Web site promoting an obscure brand of computers. Dell's lawyers dispatched a stern letter, and within a few days, the site's owner revamped it into an online discussion group about computers. The old version disappeared from view.
The PC giant still wanted to seize the address, a move permitted under rules governing the use of domain names. But Dell had to prove to an arbitration panel that the address had been used in "bad faith." So Dell's legal team turned to the Wayback Machine, a massive archive of Web pages dating back nine years. There, Dell found copies of the deleted site and was able to prove that its owner, Innervision Web Solutions, had used it to redirect consumers to another Web address selling PCs with names such as ZMachinez and Jetbook. In May, an arbitration panel ordered the domain name be transferred to Dell.
The Web, seemingly one of the most ephemeral of media, is instead starting to leave permanent records. Through the Wayback Machine, and similar services offered by companies such as Google Inc., it's now easy to retrieve all kinds of online material, from defunct Web pages to old versions of sites. While these databases have caught on among historians and scholars, they are proving particularly enticing for lawyers.
At some law firms, litigators now ask researchers, "can you do a Wayback on that?" The archives are most attractive to specialists in intellectual-property law -- in particular, areas such as domain-name battles -- and have been used by companies as diverse as EchoStar Communications Corp. and Playboy Enterprises Inc. In February, recovered pages prompted a mistrial in a prominent murder case in Canada.
The archive tools provide lawyers with a quick and inexpensive way to unearth evidence that otherwise might not be available. Lawyers have always been able to seek copies of old Web pages in a pretrial phase known as discovery. But some parties might not save every version of their Web sites and others might routinely get rid of stored pages. Meanwhile, in domain-name disputes handled by arbitrators, there's no discovery process.
Allison McDade, counsel for trademarks and copyrights at Dell, of Round Rock, Texas, says the company frequently uses the Wayback Machine and other computerized tools to protect its trademarks online, as it did in its dispute with Innervision.
"That's the only thing they had on me," Edward Ziejka, Innervision's owner, says of the Wayback Machine. He says he registered DellComputersSuck.com with the intention of creating a complaint forum for PC consumers but never got around to the task. He says he redirected it to his own site "as a joke" and forgot about it until Dell came calling.
The Wayback Machine (www.waybackmachine.org) is run by the Internet Archive, a nonprofit group started in 1996 to build a massive digital repository of cultural artifacts, including old TV shows, books and live music recordings. Named for the time-travel device in the "Rocky and Bullwinkle" cartoons, the free service searches for specific Web addresses and pulls up multiple versions, sometimes dating back years. The Wayback Machine has archived 40 billion Web pages using computer programs, known as "bots," that crawl the Internet and make electronic copies of information they come across.
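For readers curious how a lookup by address and date might work in practice, here is a minimal sketch in Python. It assumes the archive's commonly used address pattern of a 14-digit timestamp followed by the original Web address, served from web.archive.org; that host and pattern are not given in this article, and the function name `snapshot_url` is made up for illustration. The sketch only builds the address and does not contact the service.

```python
from datetime import datetime

# Assumption: archived copies are addressed as
#   https://web.archive.org/web/<YYYYMMDDhhmmss>/<original address>
# This prefix is not taken from the article.
WAYBACK_PREFIX = "https://web.archive.org/web"


def snapshot_url(page_url: str, when: datetime) -> str:
    """Build the address of an archived copy of page_url near the date `when`."""
    # The timestamp is year, month, day, hour, minute, second with no separators.
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{page_url}"


if __name__ == "__main__":
    # Hypothetical example address; any Web address could be used here.
    print(snapshot_url("http://example.com/", datetime(2005, 7, 1)))
```

If no copy exists for the exact moment requested, the service generally shows the nearest one it holds, which is why a researcher can ask for a date without knowing precisely when the bots visited.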
Google's system, known as Google Cache -- "cache" is a computer term for a place where information is stored -- works in a similar way, although its archive is less extensive. On Google's results page, users can click on a link to see how sites look whenever Google last indexed them, something it does often.
Hacking Paris Hilton
Earlier this year, savvy Web users were able to view contact details from a hacked cellphone belonging to heiress and party girl Paris Hilton, which featured telephone numbers for celebrities including musician Christina Aguilera and actor Vin Diesel. The information was available in Google Cache several days after the U.S. Secret Service forced sites to delete the information. A spokesman for the Secret Service, which has authority to investigate computer-related fraud, declines to comment. A Google spokesman also declines to comment.
Neither archive is exhaustive. Individual Web-site operators can ask the Wayback Machine and Google to remove pages. Both services say they'll comply if the person making the request demonstrates they have authority over the Web site in question. In the wake of the Sept. 11, 2001, terrorist attacks, for example, the Nuclear Regulatory Commission asked Google to take certain Web pages out of its cache.
Requests from third parties to remove information are generally denied. The Wayback Machine makes exceptions in certain circumstances, for example if the Web pages contain personal information provided in confidence, such as medical data.
In addition, Web-site operators can prevent material from remaining publicly available by using a robots.txt file, a small text file placed on a site that instructs bots belonging to the Wayback Machine and regular search engines not to copy pages.
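As a rough illustration of how such a file is read, the Python sketch below feeds a sample robots.txt to the standard library's `urllib.robotparser` and asks whether a given bot may copy a page. The sample rules, the example addresses and the helper name `is_allowed` are hypothetical; the bot name `ia_archiver` is assumed here to be the one the Internet Archive's crawler has answered to, which this article does not state.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one named bot entirely and keep
# every other bot out of a single directory.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/index.html"
    print(is_allowed(ROBOTS_TXT, "ia_archiver", page))   # blocked by "Disallow: /"
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", page))  # only /private/ is off limits
```

The file is a request, not a lock: it works only because well-behaved bots choose to check it before copying anything.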
One of the earliest cases to feature the Wayback Machine involved EchoStar's Dish Network. Telewizja Polska USA Inc., the exclusive provider in the U.S. of Polish-language channel TV Polonia, sued EchoStar in 2002 for breach of contract, among other claims, in the Northern District of Illinois. The company contended that Dish Network had pitched itself using the TV Polonia brand even though a marketing agreement between them had expired. EchoStar shot back in court motions with evidence culled from the Wayback Machine. It showed that Telewizja Polska's own Web site had continued to tout its connection to the Dish Network even after the agreement had expired.
Telewizja Polska argued in filings that the exhibit shouldn't be allowed because it hadn't been properly authenticated. A representative of the Internet Archive signed an affidavit attesting to the authenticity of the Web-page copies, and in a pretrial ruling last fall, the judge said the evidence could be allowed. The case hasn't yet gone to trial.
Attorneys at Playboy use the Wayback Machine "once every month or couple of months," says Anamaria E. Cashman, Playboy's senior intellectual-property counsel. Chicago-based Playboy keeps careful watch of the Web to ensure that sites don't illegally use the magazine's trademark bunny, or other images, even if they have since removed the information. "We use it to verify the various excuses that the infringers give us," Ms. Cashman says.
In 2003, Ms. Cashman says, the company cited the Wayback Machine during a court hearing to prove that a defendant used the term "sex court" on his Web site only after Playboy aired a TV show with the same name. In his defense, the site operator asserted he had been using the name months before. The case was settled midtrial.
One of the Wayback Machine's most popular uses is in adjudicating cases of cyber-squatting, a tactic in which people register Web names associated with famous brands, either to piggyback on their fame or make a quick buck by selling the domain name back to its owner.
British cellphone giant Vodafone Group PLC won a case in 2001, the first year the Wayback Machine could be used by the public, against the owner of the domain name Vodaphone.com. The cellphone company said the domain owner had made a thinly veiled request to be compensated for giving up her rights to the Web address. During the arbitration proceedings, the owner denied any intention of selling the name. Instead, she said she was offering a service to help confused Web surfers.
The panel, which used the Wayback Machine to see how Vodaphone.com had been used in the past, determined that she "intended to misleadingly attract consumers" and had no legitimate purpose in operating the site. As a result, the name was transferred to Vodafone.
Lawyers handling cyber-squatting cases now use the Wayback Machine as a matter of course. "It's becoming almost an automatic," says Rich Peirce, a lawyer who specializes in trademark and domain-name litigation at Ballard Spahr Andrews & Ingersoll LLP in Philadelphia. "A lot of times, [someone who registers a domain name] may not even know of the existence of this tool, so you may in fact be able to catch them in fabrication."
As the Wayback Machine catches on, it is stirring new kinds of legal disputes. A Philadelphia company, Healthcare Advocates Inc., says it used a robots.txt file to block access to older versions of its site. When a law firm used the Wayback Machine to nonetheless access old material from the site, Healthcare Advocates sued, alleging computer fraud and violation of federal copyright law. In its suit, the health-care firm contends the law firm "intentionally circumvented" the robots.txt file's blocking mechanism by making repeated search requests. Healthcare Advocates is embroiled in a trademark dispute with a client of the law firm, Harding Earley Follmer & Frailey.
Healthcare Advocates also sued the Internet Archive for breach of contract. It alleges the Internet Archive failed, contrary to an agreement, to block public access to the archived Web pages.
John Earley, an attorney at Harding Earley, calls the suit "meritless" and says the firm's searches were routine. "There was no hacking" of the site, he says. "They are trying to paint a picture that we somehow did something out of the ordinary, and that is not the case." Brewster Kahle, the director of the Internet Archive, declines to comment, says his assistant, Beatrice Murch.
Murder in Canada
Archive tools played a pivotal role in a February trial of three teenagers accused of murdering a 12-year-old Toronto boy. The prosecution's star witness was a teenage girl who had taped a phone call in which one of the accused bragged about the murder plan before it was committed. She testified that she found his obsessions with blood and gore immature. The defense argued that the boy was only trying to impress her.
After the case was handed to the jury, a reporter for Canada's National Post reported that the girl had posted comments on a Web site for vampire enthusiasts in which she said her "likes" included blood, pain, drugs and knives. The postings had been removed from the original site and the reporter found them using Google Cache and the Wayback Machine. The report prompted the judge to declare a mistrial on the grounds that the witness's credibility had been damaged.
Dennis Lenzin, an attorney for another of the defendants, says the defense "was completely taken aback" by the postings. He says he researched the girl online using only Yahoo Inc.'s search engine and didn't know of the archives' existence. The case "is a warning shot" for all lawyers, he says.
A spokesman for the Ministry of the Attorney General for Ontario declines to comment on the grounds that the case is going to be retried.
The issue of whether archived Web sites are admissible as evidence hasn't been widely tested in criminal cases, where the burden of proof is higher than in civil court, says Ken Strutin, director of legal information for the New York State Defenders Association, a group for criminal-defense lawyers. Lawyers seeking to submit evidence from a cache system or the Wayback Machine would face "a much more rigorous standard," he says.