Improved water coverage displayed in Indiemapper

Indiemapper

Indiemapper is a convenient tool for map visualization. I found that one of its strengths is that it can load data from its own data library. I opened the data library and selected “Health”; Indiemapper reports that the “Health” data comes from the WHO. I chose improved drinking water coverage in the total population:

[Image 11]

I changed the size of the circles and moved the number to the mean value:

[Image 12]

Then we can see more clearly that most of the improved drinking water coverage is in Africa.

I used five classes and the Quantile method:

[Image 13]

We can see that although the amount is large in Africa, the high-quality water coverage is in Europe.
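The five-class Quantile method can be sketched in a few lines of Python. This is only an illustration of the general idea (equal numbers of observations per class), not Indiemapper's own implementation, and the coverage values below are hypothetical, not the WHO figures shown in the map.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_breaks(values, n_classes=5):
    """Return the n_classes - 1 cut points that split values into
    classes holding roughly the same number of observations."""
    return quantiles(values, n=n_classes, method="inclusive")


def classify(value, breaks):
    """Return the 0-based class index of value for sorted break points."""
    return bisect_right(breaks, value)


# Hypothetical coverage percentages, for illustration only.
coverage = [42, 55, 61, 68, 73, 79, 84, 88, 91, 95, 97, 99, 100, 100, 100]

breaks = quantile_breaks(coverage)
classes = [classify(v, breaks) for v in coverage]

for value, cls in zip(coverage, classes):
    print(f"{value:>3}% -> class {cls + 1} of 5")
```

Because the breaks follow the ranks of the data rather than fixed intervals, each class ends up with about the same number of countries, which is why a quantile map can look very different from an equal-interval map of the same data.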

Exploratory Data Analysis – Titanic Case

Exploratory data analysis examines the data first in order to form hypotheses and choose an analysis method, and it serves as a supplement to traditional statistical hypothesis testing. The approach was promoted by the famous American statistician John Tukey.

Titanic was a British passenger liner that sank in the North Atlantic Ocean on 15 April 1912 after colliding with an iceberg during her maiden voyage from Southampton, UK to New York, US. The sinking caused the deaths of 1,502 people.

I present the Titanic data using Mondrian. By analyzing the data in Mondrian, we can see the survivors in different categories: Class, Age and Sex.

Open a txt file of the Titanic data and select all the categories listed in the window:

[Images 1, 2, 3, 4, 5]

Display as a parallel barplot:

[Image 6]

We can click on first class and see how it looks:

[Image 7]

Then we see crew:

[Image 8]

And the third class:

[Image 9]

Present as a parallel coordinate plot:

[Image 10]

Then we can see that first-class passengers had the highest share of survivors. Most casualties were crew members and third-class passengers, and most of them were adult males.
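The comparison that Mondrian shows graphically can also be checked numerically by computing the share of survivors in each group. The sketch below assumes the txt file has the columns Class, Age, Sex and Survived; the rows are a small hypothetical sample, not the real passenger list, and the function name is my own.

```python
from collections import defaultdict


def survival_rates(records, key):
    """Return {group: share of survivors} for the grouping column `key`."""
    totals = defaultdict(int)
    survived = defaultdict(int)
    for row in records:
        totals[row[key]] += 1
        if row["Survived"] == "Yes":
            survived[row[key]] += 1
    return {group: survived[group] / totals[group] for group in totals}


# Hypothetical rows for illustration; the column names are assumed.
records = [
    {"Class": "First", "Age": "Adult", "Sex": "Female", "Survived": "Yes"},
    {"Class": "First", "Age": "Adult", "Sex": "Male", "Survived": "No"},
    {"Class": "Third", "Age": "Adult", "Sex": "Male", "Survived": "No"},
    {"Class": "Third", "Age": "Child", "Sex": "Female", "Survived": "Yes"},
    {"Class": "Crew", "Age": "Adult", "Sex": "Male", "Survived": "No"},
    {"Class": "Crew", "Age": "Adult", "Sex": "Male", "Survived": "No"},
]

by_class = survival_rates(records, "Class")
by_sex = survival_rates(records, "Sex")

for group, rate in by_class.items():
    print(f"{group}: {rate:.0%} survived")
```

Looking at rates rather than raw counts matters here, because the groups are very different in size: a large group can contain many survivors and still have a low survival rate.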

Data and Scientific Knowledge

Information technology is transferring things in the real world into cyberspace in the form of data. These data are a representation of nature and life, and they also record human behavior, including work, daily life, and social development. Today, data is produced and stored in cyberspace quickly and in large quantities, a phenomenon called the data explosion. Exploring the laws and phenomena of data is also an important means of exploring the laws of the universe, of life, of human behavior, and of social development: for example, data is studied to understand life (bioinformatics) and human behavior (behavior informatics). Data should also be reproducible and transparent. In the article Reproducible Research and Biostatistics, the author, Peng, pointed out that “reproducible research requires that data sets and computer code be made available to others for verifying published results and conducting alternative analyses”. This idea is mainly applied in biostatistics.

How can we make data show scientific knowledge? Nowadays R is one of the best choices. R comes with a large number of datasets, and more data can be downloaded through R libraries. In Scott Chamberlain’s slides, Web Data Acquisition with R, he gives three reasons to use R: first, getting data from the web by hand takes too long; second, workflow integration; and finally, your work is reproducible and transparent if it is done from R. I also read an O’Reilly book called Exploring Everyday Things with R and Ruby. The author first introduces R and Ruby in two chapters. In the third chapter, he tackles the problem of how many toilets an office needs, using Ruby to simulate the number of people using the toilets and then using R to plot every possible situation. In the fourth chapter he builds a simple dynamic system, including producers, consumers, prices and the market, and simulates these factors. One of the interesting things in the book is that he uses a mail library in Ruby to process the e-mail data from the Enron scandal, and then uses R to describe the distribution over time and to do text mining.

Data can serve as a symbolic representation, or carrier, of information and knowledge, but data itself is not information or knowledge. The research object of data science is data, rather than information or knowledge. By studying data, we gain an understanding of nature, life and behavior, and thus obtain information and knowledge. In its object of study, its research purposes and its research methods, data science is essentially different from existing computer science, information science, and knowledge science.

Natural science studies the phenomena and laws of nature; its object of knowledge is the whole of nature, that is, the types, states, attributes and forms of motion of matter. Behavioral science is the scientific study of human behavior in natural and social environments, as well as the behavior of lower animals; its recognized disciplines include psychology, sociology, social anthropology, and other similar disciplines. Data science supports research in both natural science and behavioral science. As data science progresses, more and more scientific research will work directly on data, which will let people understand nature and behavior through understanding data.

In the process of exploring reality, and through computer processing of human society, nature and people, data has been produced in large amounts. We have created a more complex nature of data. What we believe now can be overturned by analyzing data. Is data a more reasonable way to tell the truth? Ioannidis used modeling to argue that most published research findings are false. Data offers a more scientific approach because it can be checked with a variety of methods.

We have a lot of tools and software, like R, but they still cannot be used widely, because R requires solid knowledge of statistics as well as English skills. These requirements can make data analysis more difficult. For now, we can look forward to new data analysis tools.

John Snow and the cholera outbreak

John Snow was a well-known doctor in London. His medical skills were so excellent that Queen Victoria hired him as her private doctor. Cholera was a deadly disease at the time; people did not know its cause, nor how to treat it. There were two views on the cause of cholera. The first was that cholera multiplied in the air, like a cloud of dangerous gas that floated around until it found its victims. The second was that people took the disease into their bodies when they ate. From the stomach the disease quickly attacked the whole body, and the patient soon died.

John Snow suspected that the second view was correct, but he needed evidence. So in 1854, when cholera broke out again in London, he was ready to begin his investigation. As the disease spread rapidly through the slums, he began to collect data in two specific streets. The epidemic was very serious: more than 500 people died within 10 days. He was determined to identify the reason.

First, he marked on a map the exact places where all the dead had lived. This gave him valuable clues about the cause of cholera. Many of the dead had lived near the water pump in Broad Street (especially at numbers 16, 37, 38 and 40). John Snow also noted that some houses (such as 20 and 21 Broad Street and 8 and 9 Cambridge Street) had no deaths. He had not expected this, so he investigated further. He found that these people worked in the pub at 7 Cambridge Street, where they were given free beer, so they did not have to drink the water from the pump. The cholera epidemic seemed to be caused by the drinking water.

Second, John Snow investigated the water source of these two streets. He found that the water came from the river, which was polluted by the dirty water discharged from London. John Snow immediately told the panicked people of Broad Street to remove the pump handle, so that the pump could not be used. Soon the epidemic eased. With its street names, breweries, workhouses, and water pumps, the map revealed an overwhelming connection between the Broad Street pump and cholera transmission. However, what we cannot see in Snow’s map is time. After the handle was removed, people were still dying, because for them it was already too late. If we used points of different sizes to represent different months, the deaths could be visualized better.
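The connection that the map reveals can be expressed as a simple count: assign each death to its nearest pump and tally the results. The sketch below is only an illustration of that idea, not Snow's own method. It assumes straight-line distance on a flat map, and the coordinates and the names of the other pumps are hypothetical.

```python
from collections import Counter
from math import hypot


def nearest_pump(point, pumps):
    """Return the name of the pump closest to point (straight-line distance)."""
    x, y = point
    return min(pumps, key=lambda name: hypot(x - pumps[name][0], y - pumps[name][1]))


def deaths_per_pump(deaths, pumps):
    """Count how many death locations fall nearest to each pump."""
    return Counter(nearest_pump(d, pumps) for d in deaths)


# Hypothetical map coordinates, for illustration only.
pumps = {
    "Broad Street": (0.0, 0.0),
    "Other pump A": (5.0, 1.0),
    "Other pump B": (-4.0, 6.0),
}
deaths = [(0.5, 0.2), (-0.3, 1.0), (1.1, -0.4), (4.6, 1.2), (0.2, -0.9)]

counts = deaths_per_pump(deaths, pumps)
for pump, n in counts.most_common():
    print(f"{pump}: {n} deaths")
```

Adding a month to each death record and grouping by it in the same way would give the time dimension that the original map lacks.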

Before this map, John Snow had created a map during his South London study that featured hand-inked dots, which were hard to read, and cloudy colors that tried, but failed, to show the connection between cholera deaths and water sources. Snow also published a table to tell the same story, but it wasn’t quite right: it lacked key pieces of the information he had gleaned. After these failures, Snow realized that to tell the truth in the data, he needed to visualize certain variables and their connections.

The cholera outbreak in London tells us that visualization is sometimes better than calculation. Data visualization means presenting facts through maps or other tools. A map visualizes information on ground coordinates as a graphic, which makes it easier for people to explore relationships and then discover hidden truths. Let’s zoom out. Any thing or fact is a type of information: tables, graphics, maps, and even text, whether static or dynamic, give us a means of understanding the world. Visualization multiplies their power several times over.