In recent years, news on migration has been part of our everyday lives. Our analysis studies the images of articles published online between 27 September 2014 and 11 June 2016. The analysis of images is highly significant, since the visual representation of refugees strongly shapes the receivers’ thoughts about this group. It also strongly influences public opinion about the migration crisis, as the majority of the population does not meet refugees in person.
The role of images inserted into online media texts is to illustrate the content objectively: an image can support the statements of an article or highlight certain points of its content. These images are also likely to appear as profile pictures of public media posts. Images have great importance during superficial reading, which precedes or even replaces in-depth reading. A thorough manual analysis of more than ten thousand images would be extremely time-consuming and costly, so we applied the toolkit of automatic content analysis. Our project aims to identify the most significant topics of the images. The most typical feature of the studied images is that they depict people. Some persons, especially politicians, are named in the texts. Others, mainly the immigrants and the members of organisations that come into contact with them, are referred to only by collective nouns (e.g. immigrants, police, civilians and officials), but they are depicted in the images. They are also included in the analysis, which is based on face recognition.
After a short comment on our data and on the interpretation of the results, the topics of the images are introduced. The presentation of the topic models is followed by face recognition. First the identified demographic features are compared with the statistics on refugees, then the emotions on the faces are analysed topic by topic. Finally, the study summarizes which image metaphors were detected in articles on migration, within the framework of traditional qualitative research. Although the metaphors could not be identified exhaustively, a comprehensive classification was developed.
The corpus was compiled from the articles of the most significant Hungarian online news websites. The full list is available on the dashboard by clicking on the domain filter. Using the search engines of the websites, we searched for the key words of the migration crisis (e.g. bevándorló ‘immigrant’ and migráns ‘migrant’). Then, following the search results, the articles were harvested. The raw data were grouped on the basis of similarity measures. The groups were then examined by our annotator team, who filtered out the majority of the non-relevant content and duplicates. The data gathering resulted in 42,845 articles, from which 42,311 images were extracted. A significant number of the images were not relevant. Most of these were filtered out by a simple heuristic: images below a certain size are usually logos of the web page they are displayed on or of other companies. The images were then processed in the same way as the texts: the pictures were grouped on the basis of similarity, supported by the simhash algorithm. These groups were also checked by the annotator team to remove duplicates and non-relevant images. Finally, many images that were irrelevant to the research were filtered out with the help of topic models, which are introduced in a later section of the article. As a result, we obtained 10,330 unique images, which were detected 55,003 times in the articles.
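The size heuristic and the similarity grouping described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the size threshold `MIN_SIDE_PX`, the choice of MD5 as the per-feature hash, the greedy grouping and the `max_distance` value are all hypothetical, and the article does not say which features were fed to simhash.

```python
import hashlib

MIN_SIDE_PX = 100  # hypothetical threshold; the article only says "below a certain size"


def is_probable_logo(width, height, min_side=MIN_SIDE_PX):
    """Flag images whose shorter side falls under the (assumed) size threshold."""
    return min(width, height) < min_side


def simhash(features, bits=64):
    """Return a SimHash fingerprint for an iterable of string features.

    Each feature votes +1 or -1 on every bit position; the sign of the
    summed votes decides the corresponding bit of the fingerprint.
    """
    votes = [0] * bits
    for feature in features:
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        value = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            votes[i] += 1 if (value >> i) & 1 else -1
    fingerprint = 0
    for i, vote in enumerate(votes):
        if vote > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def group_near_duplicates(fingerprints, max_distance=3):
    """Greedily group items whose fingerprint is close to a group's first member."""
    groups = []  # list of (representative fingerprint, member ids)
    for item_id, fingerprint in fingerprints.items():
        for representative, members in groups:
            if hamming_distance(representative, fingerprint) <= max_distance:
                members.append(item_id)
                break
        else:
            groups.append((fingerprint, [item_id]))
    return [members for _, members in groups]
```

Groups produced this way would still need the manual check by annotators that the text describes, since a small Hamming distance only suggests that two items are similar.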
Our study was inspired by the work of Gábor Bernáth and Vera Messing and by the article of CRCB. However, in contrast with the qualitative methods of traditional content analysis, our goal was to study as many images as possible. More than ten thousand images were gathered in the way described above and were processed automatically by algorithms. The applied algorithms do not perform at 100%. It is important to note that both the demographic features and the emotion identification depend heavily on the face recognition algorithm, since they assume that the picture they process shows a human face. Consequently, they can also return results for shapes that were mistakenly identified as faces. Automatic content analysis cannot replace human work; it can only support it by providing professionals with well pre-processed data. In our project, human work was minimized, so the results that follow are for information purposes only.
Because the article embeds interactive visualizations, we recommend opening it in Chrome. A link to the separate version of each visualization is provided. If the embedded version does not load, it is worth following the link.
To build a topic model from the images, the information contained in the images had to be converted into text. To accomplish this, the labelling service of Clarifai was applied. It assigns labels to each image on the basis of its content and provides the level of relevance of each label, expressed as a certainty value. Only labels whose value was above 0.75 were considered. Labels that belonged to fewer than 25 images were excluded. Some of the labels did not seem relevant at first sight. For example, the categories festival, concert and beach were extremely common. After studying the images, we learnt that Clarifai applies these labels systematically and cannot take the specificity of our corpus into account. The labelling algorithm was not trained on a dataset of refugees, hence a crowd standing in front of a cordon, or a group of tents, was commonly labelled as festival. The problematic labels were renamed. For instance, the festival label was replaced with the label mass scene.
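The label clean-up described above could look roughly like the sketch below. The thresholds 0.75 and 25 and the festival to mass scene renaming come from the text; the input format (a mapping from image id to `(label, certainty)` pairs), the function name `clean_labels` and the order of the steps (renaming after the frequency filter) are assumptions made for illustration, and no Clarifai API call is shown.

```python
from collections import Counter

# Renaming taken from the article; other renamed labels are not listed there.
RENAMES = {"festival": "mass scene"}


def clean_labels(predictions, min_confidence=0.75, min_images=25, renames=None):
    """Filter and rename image labels.

    predictions: dict mapping an image id to a list of (label, certainty) pairs
    (a hypothetical format, not the raw Clarifai response).
    """
    renames = RENAMES if renames is None else renames

    # Step 1: keep only labels whose certainty is above the threshold.
    confident = {
        image_id: {label for label, certainty in labels if certainty > min_confidence}
        for image_id, labels in predictions.items()
    }

    # Step 2: count in how many images each remaining label occurs.
    image_counts = Counter(label for labels in confident.values() for label in labels)

    # Step 3: drop rare labels, then rename the problematic ones.
    return {
        image_id: sorted(
            {renames.get(label, label) for label in labels if image_counts[label] >= min_images}
        )
        for image_id, labels in confident.items()
    }
```

The resulting label lists are what a topic model would then treat as the "words" of each image.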
A topic model is a statistical method for exploring which abstract topics are present in a set of documents. The main assumptions of the model are that each text is a mixture of n different topics and that each topic is characterized by a well-defined set of words. In other words, given a topic, some words are more likely to appear in the text, while others are less likely. In our case, the labels of each image were treated as a document, and the topics (more precisely, the word list typical of each topic and the proportions of the topics that make up each document) were extracted with Latent Dirichlet Allocation (LDA). As in every machine learning task, the most challenging part was to find the right settings for the parameters of the algorithm (number of topics, alpha, beta, etc.). In the optimisation process, eight topics were identified at first, but we soon recognized that one of them was a junk topic, containing images that are not related to the subject of the study. These images were excluded and LDA was run again on the corpus. This time it was instructed to classify the data into seven topics.
Based on the results of the topic model, the images were organized into seven groups, according to the topic that fitted them best. After viewing most of the images of a given topic and taking their labels into account, we gave the topics names, in order to avoid referring to them only by numbers and by the words associated with them. This is illustrated by the table below.
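To make the role of the number of topics, alpha and beta more concrete, here is a deliberately small LDA sketch based on collapsed Gibbs sampling in pure Python. It is not the implementation used in the study (the article does not name one), and the default values of `alpha`, `beta` and `n_iter` are arbitrary assumptions. Each document is a list of labels, as in the label lists of the images.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Toy LDA via collapsed Gibbs sampling.

    docs: list of documents, each a list of word (label) strings.
    Returns (theta, topic_word, vocab): per-document topic proportions,
    per-topic word counts and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens of word v in topic k
    topic_total = [0] * n_topics                        # all tokens in topic k
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(n_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[word]] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                v = word_id[word]
                k = assignments[d][i]
                # Remove the token from the counts before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    return theta, topic_word, vocab
```

Each row of `theta` gives the topic mixture of one image, and the largest counts in a row of `topic_word` indicate the labels most typical of that topic. Rerunning with a different `n_topics` after removing junk documents mirrors the eight-then-seven procedure described above.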
| Topic number | Topic | Labels | Number of images | Number of faces |
|---|---|---|---|---|
| 0 | Bigger groups: in camps, by the Keleti railway station, demonstrations, in front lines | military battle, police, crime, war, gather, crowd, mass scene, election | 3113 | 1082 |
| 1 | Smaller groups: individual portraits: refugees, figures of the media, religious leaders | music, mass scene, portrait, achievement, film, facial expression, religion, leader, actor, health | 857 | 537 |
| 2 | On the way: train, bus, ship, on foot | accident, vehicle, road, transport, street, city, car driver, action, industry | 1103 | 116 |
| 3 | Smaller groups: families, individuals, refugees, politicians, members of armed forces | portrait, child, boy, man, girl, family, facial expression, happiness, education, being together | 1133 | 527 |
| 4 | Individual portraits, smaller groups: politicians, public figures who shape public opinion, well-known figures | administration, leading politician, election, chair, portrait, meeting, finance, house, geography | 1814 | 1572 |
| 5 | Data: tables, maps, diagrams, screenshots, drawings, illustrations | visualization, sign, wallpaper, vector, textual, paper, design, symbol, data, horizontal, line, travelling, landscape, nature, water | 1089 | 0 |
| 6 | Crossing: fence, border, sea | nature, heaven, transport, summer, train, sunshine, vehicle | 1221 | 89 |
The second visualization shows how strongly the topics are related. In other words, it gives an idea of how many times the topics were among the top 3 topics of each image.
The relation between the topics is based on the number of times they ranked among the three most typical topics of a given image. The central role of the Data topic is evident: the images often display signs, e.g. signposts, tags on T-shirts or a name card in front of a shop assistant.
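One way to read this measure is as a co-occurrence count: for each image, take its three strongest topics and count every pair among them. The sketch below follows that reading, which is an assumption, since the article does not spell out the exact computation. The input `theta` is a hypothetical list of per-image topic weights.

```python
from collections import Counter
from itertools import combinations


def top3_cooccurrence(theta):
    """Count how often two topics appear together among an image's top 3 topics.

    theta: list of per-image topic weight lists (one weight per topic).
    Returns a Counter keyed by (topic_a, topic_b) with topic_a < topic_b.
    """
    pair_counts = Counter()
    for weights in theta:
        ranked = sorted(range(len(weights)), key=lambda t: weights[t], reverse=True)
        top3 = sorted(ranked[:3])
        for a, b in combinations(top3, 2):
            pair_counts[(a, b)] += 1
    return pair_counts
```

A topic that appears in many high-count pairs would occupy a central position in a network drawn from these counts, which is how the role of the Data topic can be understood.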
We tested many freely available face detection tools, and finally decided to use the pre-trained Haar Cascade model of the OpenCV library. Our decision was based on the fact that most of our images capture spontaneous moments: people do not look into the camera, and their heads may be covered by a hat, hood or headscarf. They might have been eating or covering their mouth with their hand when the picture was taken. According to the literature, the Haar Cascade can reach 90% accuracy, although in our case it does not even come close to this figure. Compared with other algorithms, it gave far fewer false positives, which convinced us that it was the best choice.
After studying many freely available tools, we decided to create our own sex and age classifier. Our training data was the IMDB-WIKI-500k dataset, which contains the images and data (sex, age) of the actors of the IMDB data set. Using the Keras deep learning framework, a convolutional neural network was trained for sex and age identification. Our age identifier performs with 81% accuracy. It typically makes errors for persons below 14, over 60 and between 30 and 40. The accuracy of the sex identification is 77%. Youngsters below 14 are almost always identified as men, while persons over 60 tend to be labelled as women by our algorithm. If the data on sex and age are contrasted, it turns out that for women the algorithm errs by +/- 5 years, while for men this value is +/- 3 years on average.
The results of age and sex identification are presented topic by topic in the following age graphs. The graphs share some outstanding features, one of which is that many more men than women can be seen in the images. If we study the image montages presented above more carefully, this result seems to be justified by our intuitions. However, we found it surprising that the topics differ slightly in the percentage of men and women shown in their images. Men are represented in the highest percentage in topic 6 (Crossing: fence, border, sea: 74%), followed by topic 2 (On the way: train, bus, ship, on foot: 72%) and by topic 4 (Individual portraits, small groups: politicians, persons who shape public opinion, well-known persons: 71%). The dominance of men in the latter group is not surprising, as men are over-represented in political life; however, noteworthy conclusions can be drawn from the presence of the first two groups. When images of crossing, border crossing and travelling appeared in the media, we mostly saw the faces of men. Women and the youngest age group (those below 15) rather appear in group images as members of the crowd. As for age, persons between 30 and 34 are present in the greatest proportion in every topic. This observation resonates with the findings of UNHCR and with the data of Eurostat.
If you cannot open the diagrams, click on this link, where you can access the figures one by one.
Data on migration between 2008 and 2017 can also be found on the website of Eurostat. The website shows the number of asylum seekers per country per month and allows the researcher to explore further aspects of the phenomenon (e.g. sex, age group, nationality). The following figure focuses on the Eurostat data regarding Hungary: we summed the number of asylum seekers between 2015 and 2016 by sex and age group, and calculated the share of each category in the total number of asylum seekers. Following the same pattern, we applied the Eurostat age groups to the six topics presented above that contain faces (no faces were detected in the Data topic) and calculated the share of each sex and age group category in the total. It is noticeable that the 18-34 age group is not only outstanding in the images, but is also a dominant group of asylum seekers according to the Eurostat data. In this group there is a slight deviation regarding the proportion of women: they are over-represented in the images compared to their share among real asylum seekers. It is also worth noting that the younger groups are under-represented in the images, yet they make up a large share of the asylum seekers. The difference between the two is partly due to a shortcoming of our age group classifier, which tends to err for persons younger than 14.
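The share calculation described above amounts to summing counts per sex and age group category and dividing by the grand total. A minimal sketch follows; the record format `(sex, age_group, count)` and the function name are assumptions, and no real Eurostat figures are included here.

```python
from collections import defaultdict


def category_shares(records):
    """Return the share of each (sex, age_group) category in the total.

    records: iterable of (sex, age_group, count) tuples, e.g. one tuple
    per month for Eurostat data or one per detected face for the images.
    """
    totals = defaultdict(int)
    for sex, age_group, count in records:
        totals[(sex, age_group)] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: value / grand_total for category, value in totals.items()}
```

Applying the same function to the asylum statistics and to the faces detected in the images yields two sets of shares over identical categories, which can then be compared directly.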
The roots of automatic emotion recognition go back to Charles Darwin, who published his theory in The Expression of the Emotions in Man and Animals in 1872. Darwin claims in his book that certain bodily expressions that accompany certain emotional states (e.g. blushing and wrinkling of the forehead) are genetically determined and can be universal features of the species. He argues that emotions link emotional states with bodily expressions. He considers these bodily reactions functional and holds that they emerged from effective movements of animals in the course of selection (e.g. when one is alarmed, the eyes and nostrils widen, which improves perception). As these bodily reactions are universal, in other words shared by all humans, he took them as evidence for the hypothetical common ancestor of mankind.
One hundred years later, the universal nature of emotion expression became the subject of attention. The idea broadly shared by anthropologists, according to which expressing our emotions and decoding the emotions of others are purely social skills, i.e. culture-specific human abilities, was challenged in the 1970s. In 1971 Paul Ekman and Wallace Friesen published an article entitled Constants across cultures in the face and emotion, in which they reported the results of large-scale research carried out worldwide. In their inquiry, photos of persons expressing various emotions were presented to subjects with different cultural backgrounds, both healthy persons and persons undergoing psychiatric treatment. Among the subjects there were also members of an isolated tribe in Papua New Guinea. The study aimed to identify the emotions that all participants recognize in the same way. The hypothesis was that the facial expressions of these emotions are genetically encoded and universal.
Ekman and Friesen at first found six universal emotions, which were identified by all participants in the same way. These are Anger, Fear, Disgust, Happiness, Sadness and Surprise. Contempt is considered a seventh emotion, somewhat weaker than the others. The first six are called the six basic emotions, referring to the fact that they are expressed in the same way all around the world. Later, a catalogue of facial movements was compiled, containing almost 100 micro movements that make up the facial expressions accompanying the emotions. This catalogue is called the Facial Action Coding System (FACS). It significantly simplified automatic emotion recognition and the production of animations expressing authentic emotions.
After getting familiar with the freely available emotion recognition tools, we decided to train our own algorithm, for which we used the fer2013 dataset. We trained a convolutional neural network for this task with the Keras deep learning framework. It performs at 70% accuracy, measured here as precision: 70% of the images it assigns to a certain emotion are evaluated correctly. Its recall is 58%, which means that 58% of the images belonging to a certain emotion are recognized. These are far from the best results, but considering that it is challenging even for a human to identify an emotion solely on the basis of an image of a face, we can be satisfied with this performance. Our algorithm tends to play it safe: if it makes an error, it usually evaluates the given facial expression as neutral. Its other typical errors are identifying surprise as happiness and confusing the negative emotions with one another. If the results are combined with the data on sex and age, we find that the probability of error is higher for women and for persons below 20. The algorithm performs best for middle-aged men, whose emotions it guesses most reliably from their facial expressions.
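The two figures quoted above correspond to per-emotion precision and recall. The following minimal sketch shows how they can be computed from true and predicted labels; the function name and the label lists are illustrative assumptions, not the evaluation code or data of the study.

```python
def precision_recall(true_labels, predicted_labels, target):
    """Compute precision and recall for one emotion class.

    precision: share of images predicted as `target` that really are `target`.
    recall:    share of images that are `target` which were predicted as such.
    """
    pairs = list(zip(true_labels, predicted_labels))
    true_positives = sum(1 for t, p in pairs if t == target and p == target)
    predicted_as_target = sum(1 for _, p in pairs if p == target)
    actually_target = sum(1 for t, _ in pairs if t == target)
    precision = true_positives / predicted_as_target if predicted_as_target else 0.0
    recall = true_positives / actually_target if actually_target else 0.0
    return precision, recall
```

A classifier that falls back to neutral when unsure, as described above, would be expected to show lower recall than precision on the non-neutral emotions.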
In certain topics, like "Crossing" and "On the way", the number of recognized faces is only around 10% of the number of images or less, and in the "Bigger groups" category it is about one third. In contrast, in "Smaller groups: refugees, figures of the media, religious leaders", "Smaller groups: families, individuals, refugees, politicians, members of armed forces" and "Individual portraits and small groups", our algorithm identified many faces relative to the number of images. The topics "Crossing" and "On the way" mostly contain refugees, while the pictures of the topic "Individual portraits and small groups" mainly show national and European politicians. The topics that comprise smaller groups are mixed in this respect. If the images that show the refugees in motion are considered, it seems that the media favours representing them as a faceless crowd. Face recognition is useful not only because it makes it possible to identify the sex and age of persons (within the limits of the algorithms), but also because our emotions are involuntarily expressed on our faces. As described in the Image section above, our emotion detection algorithm is still far from perfect; nevertheless, it is surprising that in every topic anger is the most commonly identified emotion, followed by sadness, which lags far behind.
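The proportions mentioned above can be recomputed from the image and face counts in the topic table. The sketch below does only that arithmetic; the shortened topic names are ours, and the ratio is faces per image, which is not the same as the share of images that contain at least one face.

```python
# (topic number, short name, number of images, number of faces), from the topic table
TOPIC_COUNTS = [
    (0, "Bigger groups", 3113, 1082),
    (1, "Smaller groups (media, religious leaders)", 857, 537),
    (2, "On the way", 1103, 116),
    (3, "Smaller groups (families, individuals)", 1133, 527),
    (4, "Individual portraits, smaller groups", 1814, 1572),
    (5, "Data", 1089, 0),
    (6, "Crossing", 1221, 89),
]


def faces_per_image(topic_counts):
    """Return {topic number: detected faces divided by number of images}."""
    return {
        number: faces / images
        for number, _name, images, faces in topic_counts
        if images > 0
    }


if __name__ == "__main__":
    ratios = faces_per_image(TOPIC_COUNTS)
    for number, name, _images, _faces in TOPIC_COUNTS:
        print(f"{number} {name}: {ratios[number]:.1%}")
```

Because one image can contain several faces, a high ratio in the portrait topics does not by itself mean that most of those images show a face.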
How do we imagine an average refugee? How do we imagine a refugee’s face and clothing? What does her face tell? In which situation do we see her? What is her posture like? Do we have a mental picture of a young or an old person? Is a refugee a man or a woman? Which ethnic group does she belong to? Has she got relatives? If yes, where are they? What do we think her profession is? What might have happened to her in the past? What might her present ambitions be?
In recent years, the Hungarian online media has been flooded with the terms refugee (menekült in Hungarian), asylum seeker (menedékkérő), migrant (migráns) and immigrant (bevándorló). These terms generally refer to citizens of Syria, Afghanistan, Iraq, Somalia and Pakistan, arriving in great masses from the war zones of the Near East and Africa. We have a rough idea of them and see them in our mind’s eye, with their appearance and presumed goals. The mental construction of a refugee can be shaped by several factors, among which the most prominent is the media. The posture, the sex, the age and the clothing of the person, as well as her environment in the image, are all interpreted as signs, which are decoded on the basis of our previous knowledge. We form certain impressions about the person, since she looks like someone we have already met.
It is essential to keep in mind that we perceive the refugee’s figure not directly but indirectly. Our impression of her appearance is influenced not only by her real appearance, but also by the way she is presented in the media. In other words, our perception is shaped by the settings of the photos published in online media. These images determine our mental image of a member of an unknown group. This works the other way around as well: our presuppositions determine what photographs are taken and published about the group. It is the photographer who arranges the setting of the photograph and it is the editor who selects certain photographs from the many available ones. They are likely to opt for a setting that is clearly interpretable and well-known, and that can therefore strengthen the preconceptions of the social majority about minority groups, or that is expected to generate certain (predictable) emotional reactions in the receivers.
The social science literature distinguishes typical compositions of media images that reflect the typical strategies of visualizing a minority group. These compositions trigger various emotional associations, or they support and validate the assumptions about the features of the minority group that are also stated in the text. We looked for examples of them in the corpus.
A group of images illustrates the refugees’ bumpy road to Europe, full of difficulties. The images show dinghies tossing on the Mediterranean Sea and small groups marching across fields and woods or along railways.
A man on his way and wandering are well-known motifs of Judeo-Christian culture. They have been the central topics of countless stories and works of art for centuries. Wandering (for example in mythological and Biblical stories and in stories that go back to these sources) is also associated with expulsion and expiation, which are preceded by a sinful action. Later examples (e.g. works of literature from the 20th century) equate physical wandering with the improvement of the soul and with learning a lesson from difficulties.
The words stream, wave and flood are often used to refer to masses of immigrants. The idea is supported by the visual representation of the phenomenon: the photographs show masses of refugees marching in long and wide rows. The faces of those standing in the crowd are hardly visible or not visible at all, and their personalities and personal stories become negligible. Due to their great number and their appearance, the large groups of refugees seem dangerous and threatening, which is reinforced by the linguistic metaphors of natural disaster (e.g. flood) elaborated in the texts. It is assumed that this triggers fear and the feeling of being helpless and unprotected. It may lead to a hostile attitude towards strangers.
As identified in texts on migration, the main reasons for the fear connected to refugees and migrants are the threat of the spread of epidemics and the risk to public health. This idea can also be detected in the images that show officers in gloves and white clothing dealing with refugees. Some photos show refugees wearing clothes and accessories that refer to a health risk. These ways of visual representation may increase and justify society’s fear of the refugees and may confirm the necessity of keeping a distance from strangers. They may suggest that refugees are not only different, but also threaten the life of the social majority.
Refugees are often the subjects of photographs taken during criminal investigations. In these photos their faces are blurred, or they are shown from behind, or they are being searched or handcuffed. The setting of these photos may convey the implicit or explicit message that the refugees are criminals, without this being established by a court decision.
The Hungarian press tends to publish news and pictures from police.hu, which presents the refugees exclusively as criminals. This content is usually transformed when it is published, namely the press oversimplifies it. In this way, it may heavily increase prejudice towards refugees, since danger, crime and cruelty are presented as the most easily identifiable features of the group.
A person is considered to be displayed in an alienating way when her face cannot be seen (as it is blurred out) or cannot be made out in the picture (as it is blurry or seen from a distance). Consequently, the person is just a blurred figure in a group, which deprives her of her personality. It is a common strategy to portray minority groups in this way. The media tend to emphasize the existing or suspected ethnic and cultural differences of minority groups and to identify the group with a social problem or threat. The representation lacks individuals: their complex personalities, life stories and goals are marginalized or even disappear. Only one of their characteristic features is foregrounded, namely that they belong to a certain minority group. As a consequence, they become one-dimensional characters who personify a potential threat and a general problem. The receiver may develop an objectified relationship with such a group and with the persons who bear its features. Settings that show the refugees in great masses or in a passive situation (e.g. lying on the ground), in thermal images or as data in diagrams are considered alienating and dehumanizing.
Depicting the refugees as a great faceless mass is one extreme. The other end of the scale is the visual representation of the most vulnerable and unprotected groups: mothers and their children. These images may generate sympathy and make the audience feel sorry for the refugees. While groups of young men tend to make members of the social majority feel threatened, women and children do not evoke this feeling; they are rather considered victims of tough and challenging circumstances. The visual display of women with children is often commonplace. We are all familiar with the image of a mother with a child, hence it is easy to imagine the situation of a mother nurturing a child. The receivers may easily sympathize with them.
A photo report is a detailed, complex and sensitive way of visual representation. Photos in this category focus closely on a person, on her family and on her everyday life. The person’s face can be seen from very close. She is engaged in everyday situations (e.g. having a meal or doing sports), which are well-known activities for the members of the social majority. These photos can either support a lengthy report or be accompanied by short texts that interpret what is shown in the pictures. In these cases, the refugee is the main character, whose name and face are known. Parts of the refugee’s life are revealed, e.g. what happened to her and to her family, how she feels and what future she envisages.
A separate group of images consists of diagrams, maps, pie charts and other figures. If these figures visualize information about the refugees, they can have an alienating effect, since only numbers and trends are shown instead of human faces and stories (e.g. how many refugees arrived in Hungary in the last two years and how much is spent on providing for them). Moreover, these figures tend to illustrate articles that interpret the problem of migration as an economic and political issue. Although this is one part of the complex issue of the migration crisis, the crisis is not exclusively about it. This interpretation may contribute to the objectified description of the refugees.
In television reports on migration, the members of the minority groups are not likely to be interviewed, just filmed. Those who are interviewed about the issue of migration are members of the social majority, i.e. well-known figures, public figures, members of the government and other politicians. They are acknowledged as persons having relevant comments on the topic of migration, and are interviewed frequently. A great proportion of the corpus we investigated contains pictures of government politicians and other national and international figures of the political scene.
Our article aimed to study the images of articles published in online media from September 2014 to June 2016. Roughly 10,000 unique images, which were detected around 55,000 times, were extracted from the articles and then processed automatically.
The images were classified into seven topics: 1) Bigger groups: in camps, by the Keleti railway station, demonstrations, in front lines, 2) Smaller groups: individual portraits: refugees, figures of the media, religious leaders, 3) On the way: train, bus, ship, on foot, 4) Smaller groups: families, individuals, refugees, politicians, members of armed forces, 5) Individual portraits, smaller groups: politicians, public figures who shape public opinion, well-known figures, 6) Data: tables, maps, diagrams, screenshots, drawings, illustrations and 7) Crossing: fence, border, sea. The sixth one (Data) played the most central role among the top topics of the images, which is the consequence of the fact that images often display signs, e.g. signposts, tags on T-shirts or name cards.
After testing several freely available face detection tools, we used the pre-trained Haar Cascade model of the OpenCV library for face detection. Furthermore, we created our own sex and age classifier. As a result, we managed to train an algorithm which errs by +/- 5 years for women, while for men this value is +/- 3 years on average. For emotion recognition, we came up with our own solution as well: we trained a convolutional neural network for this task with the Keras deep learning framework. Our algorithm performed best for middle-aged men, whose emotions it guessed most reliably from their facial expressions.
To sum up the results of face, sex, age and emotion detection, it was found that men are represented in the highest percentage in topic 6 (Crossing: fence, border, sea: 74%), followed by topic 2 (On the way: train, bus, ship, on foot: 72%) and by topic 4 (Individual portraits, small groups: politicians, persons who shape public opinion, well-known persons: 71%). Women and the youngest age group (those below 15) rather appear in group images as members of the crowd. As for age, persons between 30 and 34 are present in the greatest proportion in every topic. In every topic, anger is the most commonly identified emotion, followed by sadness, which lags far behind. Finally, as for image metaphors, the most significant ones identified in the dataset were Threat of epidemics, On the way, Flood, Criminalization, Being face-less and personality-less, The refugee mother, Photo report, Statistics, and Comments of other relevant figures.