Uncharted: Big Data and an Emerging Science of Human History

One of the greatest untapped resources of today isn’t offshore oil or natural gas: it’s data. Gigabytes, exabytes (that’s one quintillion bytes) of data are sitting on servers across the world. So how can we start to access this explosion of information, this “big data,” and what can it tell us?

Erez Aiden and Jean-Baptiste Michel are two young scientists at Harvard who started to ask those questions. They teamed up with Google to create the Ngram Viewer, a Web-based tool that can chart words throughout the massive Google Books archive, sifting through billions of words to find fascinating cultural trends. On the day that the Ngram Viewer debuted in 2010, more than one million queries were run through it.

On the front lines of Big Data, Aiden and Michel realized that this big dataset, the Google Books archive that contains remarkable information on the human experience, had huge implications for looking at our shared human history. The tool they developed to delve into the data has enabled researchers to track how our language has evolved over time, how art has been censored, how fame can grow and fade, how nations trend toward war. How we remember and how we forget. And ultimately, how Big Data is changing the game for the sciences, humanities, politics, business, and our culture.


30 reviews for Uncharted: Big Data and an Emerging Science of Human History

  1. 5 out of 5

    ☘Misericordia☘ ~ The Serendipity Aegis ~ ⚡ϟ⚡ϟ⚡⛈ ✺❂❤❣

    Not too groundbreaking or anything.

  2. 4 out of 5

    Jim

    I really should only buy books at the airport. I picked up Uncharted because it had a subtitle with "big data" in it. As my office has started looking at how to mine big data and how to visualize it, I thought a "how to" book would help me get moving on understanding how to develop a plan. Well, the book failed miserably at that, but then that was not its intent. This is one of the most educational books I have read in years. If I learned nothing else, I learned that if you have an idea you have to nurture it. Others will say "we don't do that", "you cannot do that" or "that does not exist". They did not have the idea and hence they are not motivated to make it happen. I also learned that when dealing with big data you follow the data to learn. You might learn early on that what you wanted to learn is really not what you needed to learn. Understanding the risk of false positives and other biases is more than important; it is absolutely essential. The book reinforces my thought that technology is speeding everything up and making everything new again. People post their lives on Twitter, Facebook, and Instagram and then say they want privacy. It is only going to get worse, and I honestly do not think we are going to put it back in a bottle and say oops, did not mean for that to happen. We have had digital lives for about 25 years now; we have not even seen the "terrible twos" yet. This was a tough book for me only because I kept expecting it to deliver a formula, which it never did.

    Instead it delivered thoughts, ideas, and perspective, and for these reasons this is clearly a book worth reading.

  3. 5 out of 5

    Philippe

    This was a fun and entertaining read. It starts with a unique set of data becoming available for the first time in human history. As Google started to scan millions of books into its digital library, an opportunity arose to explore new perspectives on the dynamics of cultural evolution over the last couple of centuries. It spawned a new branch of data science: culturomics. The first part of the book narrates how the authors, two young scientists with a multidisciplinary background, convinced Google to provide access to the data and how they prepared the dataset so as to avoid legal complications and maximise analytic accuracy. Then they discuss how they tackled some tantalising research questions: How does a language grow and develop? How do people become famous? How does censorship work? How does collective memory work? How do technological inventions enter our cultural fabric? This is quite instructive on different levels. One learns about the framing and operationalisation of tantalising data-science challenges. Then the insights into the dynamics of cultural evolution are quite fascinating and not at all intuitive. Finally, I was surprised to learn about the startling contributions of many people I had never heard of: George Kingsley Zipf, Kristian Andvord, Hermann Ebbinghaus, Charlotte Salomon, ... The book is told in a very sober, sympathetic voice. Very easy to follow but not at all condescending.

    Recommended for people curious about what data science and big data can mean for the way we humans understand ourselves.

  4. 5 out of 5

    Manu

    The book was published in 2013, relatively early days for what has come to be a fairly common buzzword. Therefore, it is probably unfair to expect this book to have the understanding or perspectives that the field has accumulated in the last few years. Having said that, I still think my expectation from the book was higher. It stemmed mostly from the title, and I thought there was tremendous scope there. We now consume, produce and share tons of data on a daily basis. What could it say about us at a societal level? Wouldn't that be a great way to study how our culture has evolved as a species and perhaps differently in various parts of the world? How do ideas spread, how many of them are universal, and do some have more velocity than others? But hold on. While this book does try to give some answers, it is solely based on the authors' experiments with datasets using Google Books Ngram Viewer. This is a formidable tool - 30 million books digitized by Google. But it is limited too. These are only books published, and a subset of them. Books are only a small representation of culture, and by virtue of publishing being gated (in the past) would carry inherent biases. To be fair, the authors are aware of this and bring it up towards the end. It also raises the concerns that have now grown louder - who owns the data, who has access, what is it being used for?

    So, if you go by the title, you might be a little disappointed, but it is an interesting story well told and made accessible. It does provide many, many interesting trends and findings across disparate things like technology, popularity, grammar. You would like it especially if you're interested in language - words, their usage, grammar etc.

  5. 4 out of 5

    Peter (Pete) Mcloughlin

    What happens when Google uploads practically every book and publication, and this data can be studied for words or phrases and their frequency? You get a lens into culture with graphs and plots. These plots will measure things going on in the culture. They will quantify fame and how long it lasts. They will detect censorship and repression. They will chart changes in English or any other language, finding when and where grammar changes took place and when new words came into use. With big data we can now plot the zeitgeist in nice, easy-to-see charts. A very fun book. See my status updates for more details. Big science crunches culture: a treat.

  6. 4 out of 5

    Ami Iida

    The book's theme is the Google Books Ngram Viewer (https://books.google.com/ngrams). It analyses a huge number of books (tens of millions). A human being can read only so many books, but the tool reads millions of books mathematically. It can bring out the relationships between all of those books. It is a revolution in books!

  7. 5 out of 5

    Dana

    Note: I received an advance reading copy from the publisher in exchange for an honest review.

    I find Google’s Ngram Viewer, a graphing tool that charts the frequency of words/phrases as they occur over time in the books currently digitized by Google, to be addictively fun and fascinating, so I was thrilled to find out that the creators wrote a book about it. “Uncharted” starts out with an overview of “the natural selection”/“survival of the fittest” of irregular verbs, which leads into the story behind the Ngram Viewer’s development. The rest of the book delves into how studying the rise and fall of word/phrase usage through the Ngram Viewer can reveal things about our history and culture, including insights into fame, censorship, collective memory, and language evolution and growth. Ngram charts are integrated throughout to illustrate specific words/phrases being discussed, and there’s also an appendix of some amusing “Great Battles of History” Ngram charts. The concluding chapter explores possible future impacts and uses of big data.

    Overall, an excellent, quick read. I appreciated that the authors present their findings as implications instead of Irrefutable Facts. Not only are they frank about the weaknesses and limitations of the Ngram Viewer, such as how statistical bias, false positives, false negatives, and other data curveballs can skew the charted results, but they also explain how they tried to avoid or diminish deceptive outcomes.

    The authors’ writing is clear and easy to understand, and has some humor sprinkled in to keep the detailed academic talk from getting too dry for a general audience. I think academic-minded readers would enjoy it too, though: it’s a thought-provoking look at history, culture and language through the big data “telescope lens” of the Ngram Viewer. If nothing else, the book will inspire readers to experiment with the Ngram Viewer themselves and encourage further reading on big data.

    Looking at some other reviews, I see complaints that this book focuses too much on the Ngram Viewer and not enough on big data in general. I can see why others thought the title was misleading (it’s metaphorical whereas many people interested in this stuff might be used to explicit journal article titles), and I can see how someone might be misled by the summary on my advance reading copy, which mentions the Ngram Viewer as well as the implications of big data in general. It looks like the summary on Amazon (and presumably the finished copy) is a bit clearer about the Ngram Viewer being the focus, so hopefully future readers won’t have a similar problem.

  8. 5 out of 5

    Cliff Chew

    I saw "Big Data" in its title, and I just had to grab it off the library shelf. Although it is a light read at slightly over 200 pages, certain parts of the book felt pretty boring to me. But maybe that is because I am not really into literature. What I loved about the book was what it drew out from the entire process, from handling copyright issues and the practicalities of releasing the data, to dealing with the messiness of the data and problems with confounding factors, and how these issues were addressed. Towards the end, there were notable highlights of some findings from this extremely large dataset of millions and millions of digitised books. I would classify this book as one where I needed to dig through some soil to get to the gold. To re-emphasize, it was at least a light read.

  9. 4 out of 5

    Andrew Marti

    Big Data isn't just analyzing what's happening right now. It's used to analyze how we have changed over time. The authors use Google's ngram project to get insight into trends in culture through books and words in those books over the last 300 years. One amazing chapter described the measurable impact on culture due to Nazi oppression in the 1940s. Other chapters show how quickly people gain...and then lose fame. Words, too, have a measurable life and death. Big Data isn't just for geeks. It's for anyone fascinated by history, culture and how we change as a society. Highly recommended.

  10. 5 out of 5

    Ninakix

    The problem with this book is that it gets boring, quickly. It discusses the creation of Google's N-gram viewer and how it has been used to study history: which would be great, if the insights being generated were unique. But they aren't: they are primarily reflections of what we already know. I wish the book felt more in depth and thoughtful.

  11. 4 out of 5

    Kylie Burkot

    2.5 stars. The last chapter was really strange and way out of scope. Overall, this book was just okay.

  12. 5 out of 5

    Laurel

    Uncharted can be thought of as a case study for a piece of software that demonstrates two emerging intellectual trends: big data and digital humanities. These are explored in the book through the creation of the Ngram Viewer interface for examining the scanned Google Books collection. Digital humanities is an interdisciplinary trend that brings computerized tracking and digital curating tools to fields such as History, Literature, Philosophy, Geography, and Language studies. When the data being examined is itself language, digital humanities overlaps quite nicely with a methodology that has been in place for the past five decades, corpus linguistics. But while corpus linguistics relies on different pieces of specialized concordancing software to gather, count, and track word combinations, Google Ngram Viewer, launched in December 2010, is a very accessible way to bring some of these tools to the fingertips of the general public. In this book, Ngram Viewer is deployed as a way to answer quick questions about cultural history.

    The larger field of DH is introduced in Chapter 7 (Utopia, Dystopia, and Dat(a)topia), which looks at the range of historical records that could be digitized, and also some of the pitfalls of ever-wider access to such records. They note, for example, the spotty coverage of newspaper digitization, e.g. “Most of Poe’s newspaper articles have not been digitized, and no one knows when they will be” (p. 172), and the even spottier digitization of the many unpublished formats of writing: manuscripts, letters, wills, etc. It’s worth noting that the problem is not only one of getting data into a digital form. Even some of the born-digital materials that humans now create will have a limited appearance in the historical record, since blog posts, email, web page ads, and caches of digitized recordings and transcripts are only as accessible as the servers that host them.

    In focusing on occurrences found in Google Books, the book provides an entry into diachronic changes in word use. The results they show are exciting, but a cautionary note should be sounded. That is, it’s not as simple as looking at an ngram chart to have the story. What words are used is now clearly knowable, but capturing why they are used and identifying the right contexts in which to interpret them are still the necessary next steps of scholarship. Yet the authors sometimes present these as finished tasks. On seeing the first graph of ngram data for the word “evolution”, they note: “drawing from an ocean of data, the curve had distilled a simple powerful story that anyone could understand” (p. 159). They do, however, acknowledge that as a data source, book publishing is too slow to trace certain faster moving ideas and information (148), i.e., many ideas are more typically discussed in media other than books, e.g. texting, email, TV news, face to face conversation. But this is often overlooked in the book, such as in the claim on p. 157 that it’s now possible “to quantify the spirit of the people, the Volksgeist, by empirically measuring aspects of collective consciousness and collective memory.” This enthusiasm leads the authors to coin the name “culturomics” for their approach: “the omics denotes big data” and the cultur- evokes the anthropological studies of Franz Boas in being “empirically knowable” (158-9).

    Such big picture excitement is indicative of their repeated, but unexamined, premise that the number of written occurrences of a word can be equated with the frequency of the thoughts or experiences it represents: “By seeing how often people talk [in print] about a year, we can get a sense of how present the events of that year are in their minds” (p. 144); “Ngrams tell us about the past. Alas, they do not predict the future. Yet.” (p. 157). However, on p. 189 they return to the topic of predictions, with the claim that “Ngrams that are going up [in a 20-year period] tend to keep going up. Ngrams that are going down tend to keep going down,” leading the authors to hint at the possibility of “a predictive science of history.”

    Some reader-friendly history of science is presented at several points throughout the book, including an amusing discussion of Ebbinghaus’s original experiments on long and short term memory, which make up some of the groundwork of the field of psychology (pp. 138-141), and a useful introduction to Zipf’s law, explaining normal and non-normal distribution (pp. 28-33). Several of the cultural incidents chosen as illustrations, however, verge on the melodramatic: “the impact on their lives and careers was immediate and devastating” of the Hollywood Ten (p. 124); “this heartbreaking chart” showing mentions of Tiananmen Square (p. 127); the despondent painter Charlotte Salomon who died in Auschwitz (p. 131); the 9-11 destruction of the World Trade Center (p. 142); the digital hounding that ended in the 2013 suicide of Rehtaeh Parsons (p. 181). At the same time, it’s through the discussion of stories of such wide-ranging historical breadth that the authors first mention a very intriguing way to use the diachronic tracking of Ngram Viewer to automate finding gaps in the historical record that could indicate suppressed information.

    The final chapter presents a much-needed call for the funding of humanities data collection to equal the level at which science projects are funded, suggesting that we need to “consider the potential impact of a multi-billion-dollar project aimed at recording, preserving, and sharing the most important and fragile tranches of our history to make them widely available for ourselves and our children” (p. 174). Ngram Viewer is put forward as an enticing way of showing what could be found by exploring such data collections. The fun of tracking ngrams is aptly described as “a new and extremely nerdy form of heroin” (p. 162). The book ends with 48 graphs that illustrate this addictiveness, with charts aptly presented in xkcd style drawings.

    More about these authors:

    • Jean-Baptiste Michel’s 2012 TED talk on this topic (called The Mathematics of History).
    • Erez Aiden will appear as a keynote speaker at the 1st Inaugural Texas Digital Humanities Conference on Networks in the Humanities on April 10-12, 2014.

    This review was written for LibraryThing Early Reviewers.

  13. 5 out of 5

    Lewis Menelaws

    I love Data. I have been involved in the Data Science and Big Data scene for a while, so I was really pleased to have picked up this book from my local library. The book talks about analyzing human culture using Google's largest collection of digitized books (over 30 million books). It goes into detail about the fame of some words, slang and how words change, using the data as a method of showing how it might have happened. It also goes deep into other topics, such as how the Nazis censored their country, by showing you the lack of words that were said in books at the time. The book then presents a bit of a moral dilemma by pointing the reader to the direction big data is taking and how it will affect our lives more later down the line due to Social Media. The points the authors went over were great and varied. I especially liked seeing the one about censorship during the USSR/Nazi timeframes. Perhaps I am being pessimistic, but I wish the authors would have gone deeper into some of the subjects they covered. It seemed like they only just brushed the surface at times. I would recommend this book to anyone, even if they aren't a techie or if "big data" intimidates them. The authors do a great job of making it something anyone can pick up and get into.

  14. 5 out of 5

    Mwalenski

    This is a fascinating, interesting, entertaining, and very well written story about language, culture, and big data from the creators of Google's ngram viewer. So why only three stars? The book starts with a discussion of whether a picture is worth a thousand words or a million, and sadly, for a book so taken with the visual representation of data, the pictures here aren't worth the price of admission. Multiple long thin lines on a graph may work well on a computer screen in primary colors (as in Google's ngram viewer), but rendered as only vaguely dissimilar shades of gray on the printed page those same thin lines are an exercise in frustration. The authors really should have worked harder to differentiate the lines on their many graphs - with some effort I could puzzle out which line was which, but they should have done the work to make their point more obvious for the reader. These graphs could have easily been a source of inspiration, but as printed were mostly just an annoyance. It's easy enough to recreate many of the graphs online using the tool Google provides if you're near a computer and interested (and you should be interested - the data are really cool). But printing them in the book as they are is just a waste of time and paper, and the authors really should have made the effort to do it better.

  15. 4 out of 5

    Syed Ashrafulla

    The book has an interesting premise: using counting as a way to track the evolution of language. The problem is that all the actual counting is boring, and all of the conclusions are "look at how cool this is" with few attempts to provide an explanation. I thought this book was going to be good when they were describing the phasing out of irregular verb conjugation, but then the other parts were simply charts to answer boring questions. What I would have liked is a more formal time-based or large-dimensional analysis. Yes, it's cool to see how word frequencies go up and down. But there are no conclusions after the fact. The book is basically summarized by the appendix that dumps charts of word frequencies over time. In fact, the book is basically summarized by playing with the n-gram viewer. So go play with that instead of reading the book.

  16. 4 out of 5

    Mattila

    Really cool concept. Started off really strong. Can we map human history and, rather than let our subjective tales provide the narrative, can we use data science and get real evidence? Then they do it a few times by showing relative frequencies of words or spellings (this, sadly, was the most effective use) over time. Then they pose cool questions and don't answer all of them until an appendix that is grey scale. I couldn't tell which line went with which object in the legends; that was mildly infuriating. Didn't really care about their copyright challenges with Google Scholar/Books or their pitching and developing of their report. Think they were padding to get a book out of what should've been an article. +1 for cool use of data science, history, and graphs. -1 for horrible formatting, personal asides, and length. Probably wouldn't recommend.

  17. 5 out of 5

    A Mig

    What started like a very promising book ended up being a collection of snapshots on specific term comparisons using Google Ngram. It was interesting but a bit shallow as a consequence. This is unfortunate. What I found fascinating was the part (in the first pages of the book) on the origin of irregular verbs. Anomalies in Zipf's law, they are the relics of the Proto-Indo-European language (6,000-12,000 years old; the Ablaut grammatical scheme, such as ring rang rung, sing sang sung). They survived into the Proto-Germanic language (500-250 BCE) but they are progressively being wiped out of our modern language. And one could make predictions as to the next irregular verbs to disappear! That's really the main example I will remember from that book. It was also fun, I must admit, to try to reproduce some of the book's plots on Google Ngram.

  18. 5 out of 5

    Amelie Parent

    Before you read this, know that it is outdated and don’t fault it for that, because when you get past phrases like “web start up called Snapchat”, it is truly worth the read. This book is fantastic from start to finish. I had no idea what it was about when I picked it up, and it is not a topic I have ever read about, but it was fascinating. The writing style is great and there were parts that made me laugh out loud. At a non-fiction book. About data. If this topic is at all interesting to you, please pick this book up because it’s delightfully nerdy and interesting and just really, really good.

  19. 5 out of 5

    Ian G

    This book is an extension of Aiden and Michel's article (Michel et al. 2011) that became an instant classic within the quantitative literary criticism world. It provides a fascinating backstory to how the digitization process began (thankfully Google's Larry Page enjoys books quite a bit) and the history of a few key players in QLC (some even before computers existed). It also provides some insight into where QLC might bring us, both the good and the bad. An excellent book for both the quantitative and humanities types.

  20. 4 out of 5

    Duane

    Take 40 million books, throw an analyzer at them, and what do you get? A way to understand society that uses brute force statistics over hearsay and anecdotes. Especially enjoyed the section on the fragility of fame.

  21. 5 out of 5

    Tamara Jill

    Interesting material but jumbled style/organization. The book has too many little stories and ignores established research, like hundreds of years of research by linguists. They come off a bit like know-it-alls who don’t know quite as much as they think they do.

  22. 4 out of 5

    Clivemichael

    Fairly entertaining, a few lol moments, well described and presented.

  23. 5 out of 5

    Donald Sherer

    A light and easy read. Some good material, but the authors' biases come through in the selection and presentation of the data.

  24. 4 out of 5

    Sarah

    So many dad jokes.

  25. 5 out of 5

    Douglas

    This book promotes an interesting program which Aiden and Michel helped to develop (the Google Ngram Viewer) and a term they invented (Culturomics - the use of huge amounts of digital information to track changes in language, culture and history), yet I feel they are only touching the surface with the technology they helped to create. The Ngram Viewer and the use of Culturomics can be useful (software engineer Jeremy Ginsberg observed, by studying Google search records for a region, that a flu epidemic can be quickly identified, providing an early warning system for that region), yet the examples they give from their own research provide a reaffirmation of something we already know (the words unemployment and inflation are used more during economic depressions) or of something we could care little to know other than as an interesting tidbit ('doughnut' was overtaken by 'donut' soon after the business Dunkin' Donuts began). They state that "digital historical records are making it possible to quantify our human collective as never before" and that their culturomics is a "microscope to measure human culture"... yet the book lacks the deep thinking to reach a worthwhile goal, offering only interesting "potato chips for intellectuals," as stated in William Grimes's NYT review ‘Uncharted,’ by Erez Aiden and Jean-Baptiste Michel. For now, it is mostly increasing awareness of trivial matters. But there is always the future to look forward to... and much more data.
Quote from page 10: 'As we experience all that contemporary life has to offer, as we live out more and more of our lives on the Internet, we've begun to leave an increasingly exhaustive trail of digital bread crumbs: a personal historical record of astonishing breadth and depth.' How much of a trail? Page 11: One bit (binary digit) is like one yes/no question where 1 is yes and 0 is no. '...the average person's data footprint... is a little less than one terabyte,' or 'about 8 trillion yes-or-no questions.' 'Humanity produces five zettabytes each year: 40,000,000,000,000,000,000,000 (forty sextillion) bits.' That's why it's called big data. The total data footprint is doubling each year! Data records make it possible to reliably transform and manipulate information, so it is clear that this will be a territory that will probably remain mostly uncharted for a long time, or maybe just become a massive wasteland that only a few will care to bother visiting. This book's concluding chapter focuses on similar future developments (life logging and mind-machine interfaces) that Smarter Than You Think by Clive Thompson covers in more depth and breadth. I would recommend simply checking out their TED talk What we learned from 5 million books... it's a brief account of what is covered in Uncharted, and that is really all the information that is needed. Read Clive Thompson's Smarter Than You Think to gain a deeper insight into the uncharted territory of technology. And read Dave Eggers's The Circle for a fun and insightful fictional look into the future of life logging.
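The bit counts quoted above can be checked with a few lines of arithmetic. This is only a sketch, and it assumes decimal (SI) prefixes, i.e. one terabyte is 10^12 bytes and one zettabyte is 10^21 bytes; the names below are made up for the illustration and do not come from the book.

```python
# Assumption: decimal (SI) prefixes, 8 bits per byte.
BITS_PER_BYTE = 8
TERABYTE = 10**12   # bytes
ZETTABYTE = 10**21  # bytes


def bytes_to_bits(n_bytes: int) -> int:
    """Convert a byte count to a bit count (one bit = one yes/no question)."""
    return n_bytes * BITS_PER_BYTE


# "A little less than one terabyte" per person, rounded up to one terabyte.
personal_bits = bytes_to_bits(1 * TERABYTE)
# Five zettabytes per year for humanity as a whole.
yearly_bits = bytes_to_bits(5 * ZETTABYTE)

print(f"{personal_bits:,}")  # 8,000,000,000,000 (8 trillion)
print(f"{yearly_bits:,}")    # 40,000,000,000,000,000,000,000 (forty sextillion)
```

Under those assumptions both figures in the quote come out as stated.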

  26. 5 out of 5

    Dwight Penny

    The authors worked with the folks at Google to create an interesting sort of index. They looked in the corpus of some 33 million books that Google had scanned for the Google Books project, and counted the occurrences of all words or short sequences of words, up to five words in length, published in a given year. The result came to be the Google Books Ngram Viewer, where you can type in a handful of words or phrases in a comma-separated list, and graph their frequency of usage over the years from 1800 to 2008. Why is this interesting? Well, it gives a certain mathematical exactitude to the popularity of words and phrases over time. For example, type in "telegraph, telephone, radio, television, Internet" and you can graphically see when each technology entered the scene, and the relative stir it created, and compare them to each other (spoiler: "Internet" is a pretty big deal right about now). Or, you can type in names and use it as a measure of fame (spoiler: the Beatles were never anywhere near as big as Jesus). The book deals with the details of their technique and some thoughts on the promise of using this kind of "Big Data" for what they call culturomics, an attempt to make the "science" of social sciences a bit more than wishful thinking. Reading the book can be a very interactive experience. I found myself getting up and playing with the Ngram Viewer online, coming up with variations on their examples, or going off on tangents of my own.
Since the book was written, Google has added some refinements. You can use wildcards, like "Queen of *", which charts the top 10 ways to finish that phrase. Or you can tag a word to isolate verb or noun forms, so "liason_VERB" can be used to find that particularly dark period of our language when the word was used that way. All in all, pretty fun. Even if you don't have time to read the book, pull up the Ngram Viewer in your browser, and fool around a bit.
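For anyone curious what "counting all sequences of up to five words per year" looks like in practice, here is a toy sketch. It is not the authors' or Google's actual pipeline: the function names, the whitespace tokenization, and the tiny three-book corpus are all assumptions made for the illustration, and the frequency is simply hits divided by the number of same-length n-grams in that year.

```python
from collections import Counter


def count_ngrams(text: str, max_n: int = 5) -> Counter:
    """Count every run of 1..max_n consecutive words in a text."""
    words = text.lower().split()  # naive tokenization (assumption)
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


def yearly_frequency(books, phrase: str, max_n: int = 5) -> dict:
    """Relative frequency of `phrase` per year.

    `books` is an iterable of (year, text) pairs. The frequency is the
    number of occurrences divided by the number of n-grams of the same
    length published that year.
    """
    n = len(phrase.split())
    hits, totals = Counter(), Counter()
    for year, text in books:
        n_words = len(text.split())
        hits[year] += count_ngrams(text, max_n)[phrase.lower()]
        totals[year] += max(n_words - n + 1, 0)
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Hypothetical three-book corpus, just to exercise the functions.
toy_corpus = [
    (1900, "the telegraph and the telephone"),
    (1950, "the radio and the television"),
    (1950, "the telephone rang"),
]
freq = yearly_frequency(toy_corpus, "the telephone")
print(freq)
```

Plotting the resulting year-to-frequency mapping for a few phrases gives the kind of chart the Viewer draws, just on a vastly smaller scale.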

  27. 5 out of 5

    Paul

    Everyone has heard of Big Data: huge amounts of information, usually involving computers or the Internet. Is there a cultural or historical equivalent of Big Data? Yes, and it comes from Google's intention to digitize all the world's books (or, at least, a significant portion of them). The authors created an algorithm that would search all those books for certain words. On a chart, it will show, for instance, how many times, per million words, the name "Abraham Lincoln" was used, or "World War II." It can also be used to compare the historical use of pairs of words, like Satan/Santa, evolution/DNA, men/women, war/peace, tea/coffee or old school/new school. It can be found at books.google.com/ngrams ("Possibly the greatest time-waster in the history of the Internet." - Mother Jones magazine). Google needed convincing that this was a good idea, that it would not open them up to millions of copyright infringement lawsuits. Using this algorithm, it is possible to look at things like historical attempts at censorship. It can range from Nazi attempts to remove Jewish artists like Marc Chagall from the German cultural landscape, to the 1950s Hollywood Blacklist. A person can also look at how long a certain word or phrase stays in the cultural memory. For instance, "Korean war" has a huge jump in usage in the 1960s, then an equally huge drop in usage soon after, down to its present level of almost nothing. The book also looks at the evolution of the English language.
If we have pairs of words like drive/drove, what happened to thrive/throve? Also, what happened to words like burnt, learnt and dwelt? It all has to do with irregular verbs, which change over time. This is a fascinating book, but it will take some effort on the part of the reader. It's very well done, and it gives the reader the chance to do their own historical research.

  28. 5 out of 5

    Evan

    This book summarizes the PhD theses of the two co-authors, and builds on a research article published in Science in 2011. The project itself is fantastic, as is the Google Books project (at least in terms of the scope of data scanned and generated); however, this book falls a little short in digging into it. First, there are no endnotes or footnotes or sidenotes (a la Tufte); there is a chapter of "Notes" at the end, but they aren't easily referenced in the text. Second, every plot is essentially a histogram of word frequency vs time, binned by year, and the y-axis changes between "mentions per billion words" and "mentions per million words" without much note. Third, perhaps over the authors' objections, the book is published in greyscale: with a dataset of nearly six million books, using color as an additional visualization information channel would be quite helpful. This also limits the comparisons (in greyscale) to about two to six topics per plot. Further, no "small multiples" are used, which limits the plots to two per page. This could be addressed with more expensive printing (a la Stephen Few's or Edward Tufte's books and layouts), use of Tableau or other visualization software, or simply more time spent on the VISUALIZATION of the enormous data sets analyzed, any of which would bring their analysis to life. Many peaks and dips in their plots also go unexplained, except for the obvious, e.g. 9/11 brought about a spike in mentions of Pearl Harbor.
Overall, the technical effort was monumental and the launch of Google N-grams was undoubtedly a great public service. But this book fell a little short of those standards.

  29. 4 out of 5

    Russell Atkinson

    This nonfiction account of the creation of Google's Ngram Viewer is fascinating. An Ngram is a word or phrase (N words long) and the Viewer measures how often that Ngram appears in books in recorded history up to 2008, at least in those scanned by Google. The authors devised the program's basic features to view history and social change through a factual scientific lens, to see how our word usage changes over time and what that tells us. It begins with the example of illustrating when the United States changed from a plural to a singular noun. Popular accounts attributing that to the Civil War fully uniting the states into one entity once and for all turn out to be false. The trend toward the singular began before that and didn't really take off until after 1880. If you don't want to read the rather dry prose and the authors' own speculations on social trends, you can go directly to the appendix to see some of the charts that tell us how Santa compares to Satan, when data became more important (in books anyway) than God, and so on. They do touch on other forms of big data, but I wish they had spent more time and space on things other than Ngrams. What are the possible benefits and harms of all those photos being massively uploaded onto the Web? What about medical data: can it be used to identify causes or cures of diseases by examining massive trends?
Google is now already quicker and better at predicting flu outbreaks than the NIH based on web searches for terms like "flu," "influenza," "fever," etc.

  30. 5 out of 5

    Andrew

    "Uncharted" by Erez Aiden and Jean-Baptiste Michel is an interesting look at the future of cultural studies through the use of "big-data." These two Harvard PHD's use Google's vast collection of digitized books to try and determine trends throughout history, termed by the authors "culturomics." This is an interesting look at the future of social studies. As we continue to aggregate and digitize our collective knowledge, it will become easier to look back and see how things changed within the Eng "Uncharted" by Erez Aiden and Jean-Baptiste Michel is an interesting look at the future of cultural studies through the use of "big-data." These two Harvard PHD's use Google's vast collection of digitized books to try and determine trends throughout history, termed by the authors "culturomics." This is an interesting look at the future of social studies. As we continue to aggregate and digitize our collective knowledge, it will become easier to look back and see how things changed within the English language and culture. The authors analyze these trends through the written word, looking at N-gram data to compare word usages. I found this approach fascinating and engaging. The book itself is entertaining, with references to everything from Nazi's to Google to the invention of flight and really shows how ambitious the two authors are. The excitement the authors give off over their experiments is contagious. My only real criticism with the book is its sourcing style. This work covers many important historical dates, but does a poor job of leading the reader to the proper source. In-text citations would have made this book much more enjoyable and lasting. Conclusion: If you are looking for a good read on how cultural studies is being influenced by science, technology and corporations, then this is the book for you. It is an entertaining read on how the digitization of humanities collective knowledge can make a major impact on many different fields of study.
