Reading time: 8 minutes
Joachim gave us an introduction to data science and the programming language Python in May 2018, during a 6-week course at the CODE University of Applied Sciences Berlin. A few days ago I met him for an interview.
Hello Joachim, how would you explain Data Science to an eight-year-old child?
Joachim Krois: Data science is a fuzzy term. I do not know whether there is a clear definition of it at all, or whether the term is better filed under buzzwords. Data science is a new field that has developed over the last 10 years. If I had to explain to a child what data science is, I would most likely try to describe part of the work. Only part, not the whole spectrum, because it is a very broad field that can be seen as a cross-disciplinary task. People speak of three areas: a grounding in science and mathematics, coding experience (the use of script-based programming languages for analysis) and domain knowledge, i.e. specialist knowledge of the area you are currently dealing with.
This is how I would explain the term data science to an 8-year-old: Imagine I want to know something about you, but I don't want to ask you about it. Imagine also that I go to your room, take some of your toys and clothes and put those things in a box. As a data scientist, I now look into the box and try to make statements about you based on these things: for example, whether you are a boy or a girl, how old you are, what your favorite comic is, and which things you enjoy and which you do not.
"Data acquisition and predictive work are Data Scientists' two major areas of responsibility."
This is one part of a data scientist's job: describing data and extracting added value from the data obtained. Besides this descriptive, exploratory side of the work, there is another large field, namely the predictive or forecasting side. The idea is that I take selected toys and clothes not from just one child, but from many children. So I go to several children's rooms, always grab the same kinds of things and ask each child about their age, favorite food, favorite color, favorite comic and so on. Once I have visited enough children's rooms, I can walk into any child's room without knowing the child. I look at the child's things and can then say with some certainty how old the child is, what sex the child is, what the child's favorite comic is, and so on.
These are the two major areas of responsibility of data scientists. You have these boxes that you fill, which is data acquisition, and then you try to work either in an exploratory or in a predictive way based on this data. For predictive work you usually need several of these acquisition campaigns.
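The box analogy can be sketched in a few lines of Python. This is only an illustrative sketch, not a method described in the interview: the toy lists, the ages and the function names `similarity` and `predict_age` are invented for the example, and the technique used (averaging the answers of the most similar boxes, a simple nearest-neighbour approach) is just one of many ways to work predictively.

```python
# Hypothetical survey data: (items found in the box, age the child reported).
visited_rooms = [
    ({"building blocks", "picture book", "teddy bear"}, 4),
    ({"teddy bear", "picture book", "crayons"}, 5),
    ({"comic book", "football", "board game"}, 9),
    ({"comic book", "board game", "skateboard"}, 10),
]


def similarity(box_a, box_b):
    """Jaccard similarity: number of shared items divided by all distinct items."""
    return len(box_a & box_b) / len(box_a | box_b)


def predict_age(new_box, rooms, k=2):
    """Estimate an age by averaging the ages of the k most similar boxes."""
    ranked = sorted(rooms, key=lambda room: similarity(new_box, room[0]), reverse=True)
    nearest = ranked[:k]
    return sum(age for _, age in nearest) / len(nearest)


# A box from a child we have never asked anything.
unknown_box = {"comic book", "football", "skateboard"}
estimated_age = predict_age(unknown_box, visited_rooms)
print(estimated_age)
```

The more rooms have been visited, the more boxes there are to compare against, which is why predictive work usually needs several acquisition campaigns.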
Our readers are particularly interested in artificial intelligence (AI). How are data science and AI connected?
Joachim Krois: Artificial intelligence is again a buzzword. Many people understand it differently. I looked it up again, because I too understand something different by it depending on how I feel that day. One of the definitions I found is that AI means teaching a machine to solve intellectual tasks the way a human being would solve them.
Not so long ago, a machine might have been called AI if it was capable of defeating a chess grandmaster. That has now been achieved. Would the machine be called AI today? No, far from it. A few years ago a machine mastered Go, a much more complex game. If you look at the algorithms, it is a pretty impressive achievement. Many people did not believe that this would be possible. It happened, but did the machine thereby become an AI? Not that either!
There are achievements specific to a segment or domain that are not necessarily what you know from comics or movies and would call AI or general AI. It is a huge field and sometimes just as hard to grasp as data science. What the two areas have in common, and where one of their points of intersection lies, is machine learning. It enables us to write programs and build frameworks in such a way that machines no longer have to be explicitly programmed to perform actions. If you dictate the rules and the machine then acts on them, you speak of symbolic AI. In machine learning, data is available and the machine learns the rules.
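The contrast between dictated rules and learned rules can be made concrete with a deliberately tiny Python sketch. Everything in it is an assumption made for illustration: the data (number of books in a child's box and whether the child goes to school), the hand-written cut-off of 5 and the function names are invented, and real machine learning systems learn far richer rules than a single threshold.

```python
# Symbolic approach: a person dictates the rule, the machine only applies it.
def goes_to_school_by_rule(books):
    return books >= 5  # cut-off chosen by hand (hypothetical)


# Machine learning approach: the rule (here one threshold) is derived from data.
def learn_threshold(examples):
    """Return the threshold that misclassifies the fewest labelled examples."""
    candidates = sorted({value for value, _ in examples})

    def errors(threshold):
        return sum((value >= threshold) != label for value, label in examples)

    return min(candidates, key=errors)


# Hypothetical labelled data: (number of books in the box, child goes to school).
examples = [(1, False), (2, False), (3, False), (6, True), (8, True), (9, True)]
learned_threshold = learn_threshold(examples)


def goes_to_school_learned(books):
    return books >= learned_threshold


print(learned_threshold, goes_to_school_learned(7), goes_to_school_learned(2))
```

In the first function the rule is fixed in the source code. In the second, changing the data changes the rule without anyone rewriting the program.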
"Machine learning is one of the techniques used in data science to work predictively."
For AI, this aspect is relevant: that programs or systems are able to respond to data or input from the world, learn from it and then carry out knowledge-guided, clever and comprehensible actions. Then you could speak of AI. Machine learning is one of the techniques used in data science, especially for working predictively, i.e. for carrying out predictive analyses. I have a data set and, based on it, try to predict the outcome for new instances. And here too, we use many of the algorithms that machine learning brings with it.
Both data science and AI use algorithms that enable machine learning for their respective goals. Data science has even more facets. Questions such as how do I get data, how do I upload it and how do I transform it also play a big role.
Where do we encounter Data Science in our everyday lives? Do you know any exciting examples?
Joachim Krois: The question is perhaps rather where you do not encounter data science. As I said, data science is a buzzword, and it means quite different things to different people. The question can be boiled down to: where do we encounter things that are not based on statistics or mathematics? All the systems that surround us are based on mathematics and statistics. Cars would not be possible without mathematical and statistical foundations, and neither would road junctions and traffic light systems. The question is rather what we want to call it. These systems and phenomena are part of our cultural and technological development, i.e. we see them everywhere. They are particularly prominent now because digital services such as automatic translation or product recommendations have grown out of these techniques.
"It's hard for me to see aspects where statistics and data science don't matter."
Anyone who knows libraries knows that you need guidance to find your way around one. In the past, there were card boxes in which keywords could be looked up. These boxes were very large, and compiling them was in itself a mammoth task. Today there are technical systems that simplify this. We enter a keyword into the search engine and are immediately shown either the book or related books.
From a technological point of view, I find it difficult to see aspects where statistics and data science do not play a role. All sensory things like the sound of the sea or a sunset are of course completely free of it. But as soon as you enter the technical sphere, you find mathematical and statistical approaches everywhere; they just go by a different name now.
What did you study and how did you discover data science for yourself?
Joachim Krois: I studied geological sciences at the Freie Universität Berlin. You might not believe it, but geoscientists are also constantly confronted with data. This is data from and about the Earth, about the environment and about interactions between human and natural systems. There is no getting around statistics and data analysis when you pursue this course of study.
"A data scientist is a statistician who knows more about programming than a classic statistician and at the same time someone who knows less about programming than a programmer."
To this day I would probably not call myself a data scientist, because I do not know exactly what the term actually means. If you pinned me down, I would rather see myself as a data analyst. In my current position I work as a geostatistician. I would like to quote Anthony Goldbloom, CEO of Kaggle, who recently spoke on the Data Framed podcast on the topic "Kaggle and the future of data science". In this conversation he explained what makes a data scientist, and I recognized myself in his definition. In his opinion, a data scientist is a statistician who understands more about programming than the classic statistician and at the same time someone who knows less about programming than a programmer. I feel very comfortable in this middle ground: better at coding than a statistician and less good than a full-stack developer.
What advice would you give to lay people who are interested in data science and would like to find an introduction to the topic?
Joachim Krois: Since nonfiction books and textbooks on the subject are only slowly coming onto the market, in my view the Internet is the best way to get closer to this field. I also recommend looking for blogs on the topic. They have the advantage of being more accessible and easier to understand than scientific papers, which usually provide the basis or methodology for the blogs. If you come across exciting buzzwords when reading such blogs, you can research them further on the Internet and deepen your knowledge. There are also fantastic online resources such as The Academy that help overcome hurdles like "I can't program".
Image by Johnstocker, EyeEm
Since September 2017, Joachim Krois has been working as a geostatistician in the Department of Conservative Dentistry and Preventive Dentistry at the Charité Universitätsmedizin Berlin. His professional career builds on his doctoral studies in geology at Freie Universität Berlin. During his studies, the native Austrian worked intensively on hydrological phenomena and data analysis. He regularly publishes his research results in scientific papers.