data – Information Science

Data Insecurity

In the modern age, hundreds of companies have access to their users’ personal information. Many companies require some sort of sign-up process for first-time users. This often involves the user giving the company their name, address, birthday, and in some cases even more sensitive information such as a credit card number. Most people do not expect this data to be leaked around the internet, but it happens quite often. When a data breach occurs, the stolen information is likely to be put up for sale on dark web sites for anyone to purchase. This type of breach can happen to any company, even one that does not seem to hold much sensitive information about its customers. For example, Panera suffered a massive data breach last year in which the data of 37 million customers was exposed. Here is a website that talks about some of the other biggest data breaches of last year, and dives more into the specifics of each breach. The cyber criminals responsible for these breaches continue to find new ways to get at this data. So, since any company seems to be susceptible, what can be done to improve data security and keep everyone’s data safe?

Big companies use a variety of techniques to keep their customers’ sensitive data safe. At a purely physical level, they have many policies to reduce the chance of a data breach. For example, many companies use encrypted hard drives to store information, encrypted USB drives to protect data being moved, and encrypted phones to protect data shared over calls. Many companies have policies that require these devices to be used, along with extra policies about employees’ own devices. Employees are often required to use a laptop or other device that has no USB ports and cannot upload or export data to the cloud. This is to prevent employees from moving data to outside sources. A statistic that I got from this website (also a very interesting article) says that ignorance and negligence by employees cause 54% of all data breaches.

Many people also falsely believe that big companies simply have all of their data encrypted, so it would not be readable anyway, but that is often not the case. Most companies store large amounts of data in a relational database, as it is one of the most common and convenient ways to store structured data. However, encrypting data inside a relational database adds complexity, so whoever has access to the database can often just read the data inside. Encrypting a database can also be very expensive when you are purchasing that database from another company. All companies should be required to encrypt their customers’ sensitive data. Leaving it unencrypted is a major risk to data confidentiality.

People tend not to think very often about how much information about their personal lives is out there for companies to sell. We have no real idea how well protected the data that we casually enter when registering for a website truly is. Most likely it is going into a database with no real protection at all! Data security still has a long way to go, and more companies certainly need to start implementing better encryption of their customers’ data. We all need to be more careful with our sensitive information, and pause to think about where exactly the credit card number we are entering is really going.

Nick Bagley

Graph Overload

There are hundreds of different types of graphs that a person can use to represent data. This often makes it difficult to figure out which type of graph is the optimal choice to display the information most clearly. Different graphs are good for different purposes, and in this post I will discuss a few of the key graph types that can be used in common situations.

If the data being displayed is not overly complex, then the simplest graphs are often the best to use. The basic bar graph is good for comparing numerical values against each other. For example, if data is gathered on several groups’ opinions on a topic, a bar graph is an easy way to represent the number of people from each group that favor one opinion or the other. This kind of graph is also very useful from a financial standpoint, allowing dollar amounts to be compared across time periods or companies. Another very simple yet powerful graph is the line graph. This graph is mainly used to represent trends, clearly showing whether a data set is increasing or decreasing over time or another ordered variable. One of the most recognizable uses of this graph is in representations of the stock market. It shows the trends of different stock prices, and allows the reader to very quickly identify which stocks have an upward trajectory and which do not. Both bar graphs and line graphs can be understood without much analysis, making them very useful for quick and easy representations of data.
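
To make the two ideas concrete, here is a minimal sketch in Python. It draws a bar chart as plain text and labels a series as trending up or down, which is the same judgment a reader makes at a glance from a line graph. The survey counts, ticker names, prices, and both function names are made up for illustration, and comparing only the first and last points is a deliberately crude way to judge a trend.

```python
def text_bar_chart(counts, width=40):
    """Return one text bar per label, scaled so the largest value fills `width`."""
    largest = max(counts.values()) or 1  # avoid dividing by zero if all counts are 0
    lines = []
    for label, value in counts.items():
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:<10} {bar} {value}")
    return lines


def trend(values):
    """Classify a series as 'up', 'down', or 'flat' from its first and last points."""
    change = values[-1] - values[0]
    if change > 0:
        return "up"
    if change < 0:
        return "down"
    return "flat"


# Hypothetical survey: how many people in each group favor a proposal.
favor_counts = {"Group A": 42, "Group B": 27, "Group C": 35}
chart = text_bar_chart(favor_counts)
print("\n".join(chart))

# Hypothetical closing prices for two made-up stocks.
prices = {"AAA": [10.0, 11.5, 12.1, 13.4], "BBB": [20.0, 19.2, 18.7, 17.9]}
trends = {ticker: trend(series) for ticker, series in prices.items()}
print(trends)
```

A real charting library would handle axes and labels, but even this text version shows why bar graphs work: the longest bar is obvious without reading a single number.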

While bar graphs and line graphs are very useful in fields such as politics and business, they are not always the best fit in more scientific fields. The kind of data gathered through scientific research does not always make sense when put into these graphs. This is where graphs such as scatter plots are useful. The scatter plot places two variables against each other, and when the points are analyzed, a relationship between the two variables can be found. This helps scientists find patterns in their data and make new discoveries based on connections that could not be seen otherwise. Spider charts (also called radar charts) are also very useful in the scientific world, allowing more than two variables to be considered at once. A single entry is measured against multiple variables arranged around the circle of the graph, and additional entries can be compared in the same graph by using a color key. This allows scientists to identify which entries score best on a specific variable, and which fall far below the rest.
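
The relationship that a scatter plot shows visually can also be summarized as a number. One common choice is the Pearson correlation coefficient, sketched below in Python. This is only one way to measure a relationship (it captures straight-line trends, not curved ones), and the study-hours data and the function name `pearson_r` are invented for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable never changes")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: hours studied versus test score for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]
r = pearson_r(hours, scores)
print(f"r = {r:.3f}")
```

A value near 1 means the points climb together along a line, a value near -1 means one falls as the other rises, and a value near 0 means no straight-line pattern.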

While the purpose of many graphs is to display data in the most efficient manner, there are also times when the goal of a graph is to be understood as easily as possible by a large group of people. These are graphs suited to presentations in a meeting with many people, so that the main idea can be communicated clearly. A very strong example of a graph in this category is the pictograph. This is a graph where the data is translated into pictures so that it can be easily visualized by the audience. Venn diagrams are another graph that audiences easily understand. The Venn diagram clearly shows two groups, and shows the audience where the two overlap and where they differ. While these types of graphs are not necessarily the best for representing complex data, or even simple data, they can have a strong effect on an audience because of how easy they are to understand, and because they do not force the audience to analyze raw data too intensely.
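
The regions of a two-circle Venn diagram map directly onto set operations, which is a quick way to work out what belongs in each part before drawing anything. Here is a small Python sketch. The names and the two groups are hypothetical, and `venn_regions` is just a helper name chosen for this example.

```python
def venn_regions(a, b):
    """Split two sets into the three regions of a two-circle Venn diagram."""
    return {
        "only_a": a - b,  # left circle, outside the overlap
        "both": a & b,    # the overlapping middle
        "only_b": b - a,  # right circle, outside the overlap
    }


# Hypothetical survey: who owns which kind of device.
owns_laptop = {"Ana", "Ben", "Cara", "Dev"}
owns_tablet = {"Cara", "Dev", "Eli"}

regions = venn_regions(owns_laptop, owns_tablet)
for name, members in regions.items():
    print(name, sorted(members))
```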

There are many more kinds of graphs that can be used in a number of different situations. While some are hard to read, and some have very specific usage, there is a graph for every data set. This website shows a large number of different charts and graphs, many of which I have never seen before. New graph types will be created constantly with all of the new types and representations of data being introduced in the modern world, and it is important to keep all of these graphs in your arsenal when dealing with the unavoidable mountain of information in today’s age.

Nick Bagley

Data Encroachment

Most people would say that privacy is a crucial right; however, big companies and organizations, such as Google and the federal government, have been pushing the boundaries of privacy. The average person generates approximately a gigabyte of data every day, and more of that data than ever before can now be collected and used in different ways. However, many people do not want this data to be collected from them. A recent example of data collection going too far is the case against Mark Zuckerberg and Facebook last year. Here is a link to the article on the case from the New York Times, including a video of Zuckerberg’s testimony. Many people are outraged that their data can be harvested and seen by huge companies, and often sold to other companies, so that their private information spreads to many different places.

This invasion of people’s data has caused people to alter the way they use the internet. Most people will at least have some sort of security software, such as a firewall, protecting their device. Many people are intent on keeping their privacy, which has given rise to products such as sliding covers that block a device’s camera so that nobody can watch through it. While many people do not think that so much personal data should be collected and seen by others, there is a strong counterargument. More data collection makes major improvements possible across many different fields.

The health industry is an example of where gathering more data could help a multitude of people. With modern technology, scientists are able to create a wearable device that tracks a person’s heart rate, blood pressure, and other vitals at all times. This device can warn a person of an issue such as an oncoming heart attack, so that they have enough time to seek medical help before it actually occurs. This device would be constantly gathering data from a person’s body, which is exactly why many people would not want something like this to reach mainstream use. The human body generates about two terabytes of data every day, and this technology is able to collect all of that data. This means that there could be thousands of different people who all have information on things like a person’s heartbeat at all times. Many people see this as an invasion of privacy, which is why limiting the misuse of data is an issue that the government pays close attention to.

Data collection is used by companies to make the lives of their customers better. Sites such as Google collect an incredible amount of data from users, which allows them to provide relevant advertisements and accurately predict what the user is going to search for based on their past searches. However, when does this begin to invade a person’s privacy? Imagine that Google could gather data from a hospital showing that a patient just had a child. It could then use this information to advertise products for babies and new mothers to the patient. This sharing of information between different companies and across industries is where people believe their privacy is intruded upon. Companies continue to try to collect more data and push the boundaries of privacy, and this is where we will see a true data encroachment.

Nick Bagley

Swimming in Data

The first true form of data storage for machine use was the punch card. Punch cards were invented in 1725 and continued to be used commonly for centuries. They were cards with holes in them, and these holes represented instructions for the machine to follow. The most common uses for punch cards early on were textile looms and self-playing pianos. These punch cards were easily understood by both humans and machines. By reading the documentation for a punch card, people could easily understand what the holes were supposed to do. To share this data, people would simply produce more copies of the same punch cards. People used replicas of the same data to make machines carry out the same tasks. There was no need to go to computer science school to understand how these punch cards operated. Data storage was at a very elementary level, and no special languages were required to extract the data and use it somewhere else.

It was not until 1948 that the first form of RAM was introduced. Frederick Williams was able to store 1024 bits of information digitally, which is only 128 bytes. Around 1970, Intel began releasing semiconductor memory chips, and its 1103 chip stored 1024 bits. Floppy disks and hard disk drives also came into common use. However, even though more data could be stored than ever before, it still was not easily transferable. To access this data, the physical disk had to be present, and the machine had to have a drive that could read it. There was no cloud where all of the information went. Data storage continued to improve. SSDs and flash drives became common in the 2000s, making more storage on smaller physical chips possible.

In 2006 the term “cloud” finally entered common use. More data was being produced than ever before. Here is an interesting website that has a lot of statistics about the amazing increase in data. By 2020, an estimated 1.7 megabytes of information will be generated every second for every person on earth. This is the point at which data becomes difficult to access and to transfer. With the large number of programming languages in use, it becomes harder to write universal programs that manipulate the data. This is where XML files are useful, since XML is a plain-text format that nearly every language can read. Using XPath to select data from XML files is a key part of sharing this data.
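
As a rough illustration of what selecting data with XPath looks like, here is a short Python sketch using the standard library’s `xml.etree.ElementTree` module, which supports a limited subset of XPath. The customer records and tag names are made up for the example. Other languages have their own XPath libraries, but the path expressions would look much the same.

```python
import xml.etree.ElementTree as ET

# A small, made-up XML document describing three customers.
XML_TEXT = """
<customers>
  <customer id="1">
    <name>Ana</name>
    <city>Boston</city>
  </customer>
  <customer id="2">
    <name>Ben</name>
    <city>Denver</city>
  </customer>
  <customer id="3">
    <name>Cara</name>
    <city>Boston</city>
  </customer>
</customers>
"""

root = ET.fromstring(XML_TEXT)

# Path expression: every <name> that sits directly inside a <customer>.
names = [element.text for element in root.findall("./customer/name")]

# Predicate: only the <customer> elements whose <city> child is "Boston".
boston_ids = [c.get("id") for c in root.findall("./customer[city='Boston']")]

print(names)
print(boston_ids)
```

The program never loops over the whole file by hand. The path expression says which pieces are wanted, and the library finds them.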

Before information was accessible to everyone through a cloud, there was not as strong a need to digitally transfer data between databases. However, now that it is possible for multiple people to access the same data, the question of how to manage it efficiently becomes more and more prominent. Data can be lost in a data lake, making it very difficult for anybody to find when it is needed. That is why it is important to have some way of transferring data into XML. Once the data is organized in an XML format, the widely supported XPath language can be used to access specific pieces of the information. Before data is generated, the software that generates it should have a way of writing that data into an organized store. This way the data can be used across different languages and machines. More data is about to be introduced into the world than ever before, and we cannot let it become lost.
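
Here is one way the idea of writing data into an organized store could look in practice. This is a minimal Python sketch, not a full solution. It turns a list of records into XML, saves it as text, and then reads it back with an XPath-style query. The sensor readings, the tag names, and the helper `records_to_xml` are all invented for illustration, and the sketch assumes every field name is a valid XML tag name.

```python
import xml.etree.ElementTree as ET


def records_to_xml(records, root_tag="readings", item_tag="reading"):
    """Build an XML tree with one element per record and one child per field.

    Assumes each field name is a valid XML tag name (no spaces, no leading digit).
    """
    root = ET.Element(root_tag)
    for record in records:
        item = ET.SubElement(root, item_tag)
        for field, value in record.items():
            child = ET.SubElement(item, field)
            child.text = str(value)  # XML stores everything as text
    return root


# Hypothetical readings produced by some piece of software.
records = [
    {"sensor": "heart_rate", "value": 72},
    {"sensor": "blood_pressure", "value": 118},
    {"sensor": "heart_rate", "value": 75},
]

# Serialize to a string, as if writing it to a file for another program.
xml_text = ET.tostring(records_to_xml(records), encoding="unicode")

# Another program (possibly in another language) can parse and query it.
reloaded = ET.fromstring(xml_text)
heart_rates = [
    int(reading.findtext("value"))
    for reading in reloaded.findall("./reading[sensor='heart_rate']")
]
print(heart_rates)
```

Because the XML is plain text with a predictable structure, the program that reads it does not need to know anything about the program that wrote it.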

Nick Bagley