Analyzing Image Reach on Wikipedia Sites Using BaGLAMa 2

Introduction

Wikimedia Commons, part of the non-profit, multilingual, free-content Wikimedia family of projects, is a media file repository that makes public domain and freely licensed educational media available under the terms set by the author. In the last few years, many cultural heritage institutions have begun to upload images and data to Wikimedia Commons and other Wikipedia sites to increase their online presence. However, it is not currently possible to determine precisely how intended audiences engage with this content.

BaGLAMa 2 is an online tool created by Magnus Manske that tracks pageviews for Wikipedia article pages that contain images drawn from Wikimedia Commons categories and other Wikipedia sites. This tool allows users to freely analyze the reach of GLAM (galleries, libraries, archives, and museums) collection images on Wikipedia.

BaGLAMa 2 main page

On the main page, BaGLAMa 2 lists each tracked Wikimedia Commons category along with the number of months it has been tracked, the month in which the tool last collected data, the all-time total pageviews of images in that category, and a link to more detailed statistics for that category.

BaGLAMa 2 category details for Images from Metropolitan Museum of Art

Each Commons category page displays a line graph of pageviews over the entire period for which BaGLAMa 2 has collected data, a monthly breakdown of pages and views by Wikipedia language site, and a list of image files ranked by pageviews on Wikipedia article pages.

Project

I chose to compare five institutions: Los Angeles County Museum of Art (Los Angeles, CA), Cleveland Museum of Art (Cleveland, OH), National Gallery of Art (Washington, D.C.), The Metropolitan Museum of Art (New York, NY), and The Walters Art Museum (Baltimore, MD). All five are non-profit art museums, have data collected by BaGLAMa 2 between July 1, 2020 and December 31, 2020, had a minimum of 15,000 media files on Wikimedia Commons, and were similar enough to allow for comparative analysis.

Analysis

BaGLAMa 2 is an unusual tool. Unlike other digital analytical tools, BaGLAMa 2 requires a creative methodology to gather, sort, and analyze data in order to examine image reach. It was necessary to collect data for each individual institution before I could compare my findings across institutions. First, I manually added up the category pageviews for the months within my time scope and noted the total category pageviews for all time tracked. I decided to limit my analysis to English Wikipedia since all of my institutions are located within the United States. Next, I recorded the number of Wikipedia article pages and pageviews for each month within the time scope. Thankfully, BaGLAMa 2 has options to download information per institution in CSV format. To measure image data, I downloaded the CSV tables and recorded the top 15 Wikipedia article pages, their pageviews, and their image files for each month within the time scope. Then, I sorted that data to determine the top Wikipedia article pages, pageviews, and image(s) on a page for each institution. Lastly, I recorded the number of Wikimedia Commons files per institution. From this process, I was able to create data visualizations to inform my analysis.
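The record-and-sort step described above was done by hand, but it could be sketched in Python roughly as follows. This is only an illustration under assumptions: the column names (month, page, views, file) and the sample rows are hypothetical placeholders, not the real layout or contents of a BaGLAMa 2 CSV export, and top_pages is a made-up helper name.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV layout with placeholder rows; the real BaGLAMa 2 export
# may use different column names and will contain different values.
SAMPLE_CSV = """month,page,views,file
2020-07,Page A,1200,Image_1.jpg
2020-07,Page B,800,Image_2.jpg
2020-08,Page A,1500,Image_1.jpg
2020-08,Page C,300,Image_3.jpg
"""


def top_pages(csv_text, start, end, limit=15):
    """Sum views per article page for months in [start, end]; return the top `limit`."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Months are "YYYY-MM" strings, so plain string comparison orders them correctly.
        if start <= row["month"] <= end:
            totals[row["page"]] += int(row["views"])
    # Sort by views (highest first), then by page title for a stable order.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


if __name__ == "__main__":
    for page, views in top_pages(SAMPLE_CSV, "2020-07", "2020-12"):
        print(page, views)
```

With a real export, the text of the downloaded file would replace SAMPLE_CSV, and the column names would need to be adjusted to match whatever the file actually contains.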

Total category pageviews of all time tracked and in scope for all five institutions

I found that having the highest total category pageviews for all time tracked does not necessarily mean an institution had the most category pageviews within my time scope. I also found that a longer tracking period does not necessarily mean an institution had more total pageviews or more files on Wikimedia Commons.

Months tracked of all time and number of Wikimedia Commons files for all five institutions

During analysis and data visualization, I discovered that the data sets varied greatly between institutions, which complicated comparisons. For this reason, I decided to look at monthly pageviews and Wikipedia pages for each institution individually.

Monthly pageviews and Wikipedia pages for four of five institutions

Generally, most monthly pageviews increased and decreased within similar ranges and Wikipedia pages increased minimally. However, I found an outlier in data from The Metropolitan Museum of Art.

Monthly pageviews and Wikipedia pages for The Metropolitan Museum of Art

Two things are immediately noticeable. First, there is no data for August 2020; it is unclear why BaGLAMa 2 did not collect it. Second, there is an enormous spike in monthly pageviews in November 2020. This jump occurred because a work owned by The Metropolitan Museum of Art, Louis XIII style Ovolo frame (for Ingres’s Portrait of the Princesse de Broglie), was featured on Wikipedia’s main page. While this skewed my data set, it is undeniably the most successful case of image reach among all the institutions.
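A missing month and a spike like these can also be flagged programmatically rather than spotted on a chart. The sketch below is one possible approach, not the method used in this project: the function names are hypothetical, the "three times the median" threshold is an arbitrary assumption, and the view counts are invented placeholders rather than actual figures for The Metropolitan Museum of Art.

```python
from statistics import median

# Invented placeholder values, NOT real BaGLAMa 2 figures.
EXPECTED_MONTHS = ["2020-07", "2020-08", "2020-09", "2020-10", "2020-11", "2020-12"]
SAMPLE_VIEWS = {
    "2020-07": 100,
    "2020-09": 110,
    "2020-10": 90,
    "2020-11": 900,
    "2020-12": 120,
}


def missing_months(monthly_views, expected_months):
    """Return the expected months that have no recorded pageviews."""
    return [m for m in expected_months if m not in monthly_views]


def spikes(monthly_views, factor=3.0):
    """Return months whose views exceed `factor` times the median month."""
    if not monthly_views:
        return []
    typical = median(monthly_views.values())
    return [m for m, v in sorted(monthly_views.items()) if v > factor * typical]


if __name__ == "__main__":
    print("Missing:", missing_months(SAMPLE_VIEWS, EXPECTED_MONTHS))
    print("Spikes:", spikes(SAMPLE_VIEWS))
```

The median is used instead of the mean so that a single very large month does not raise the baseline it is being compared against.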

Finally, I investigated which images BaGLAMa 2 reported had the highest reach per Wikipedia article page by pageviews for each institution.

Data visualization of image reach comparison for two of five institutions

Limitations

While using BaGLAMa 2 to measure image reach, I discovered several limitations. All of them can lead to inaccurate or missing data.

  • It is not supported by Wikipedia, is not backed by a big tech organization the way Google Analytics is, and does not allow for data collection add-ons like Supermetrics.
  • It primarily records pageviews, which means the user viewed the page but possibly not the image.
  • Comparisons between institutions are difficult because category data sets are highly varied.
  • Institutions and projects are represented by Wikimedia Commons categories, which means an image is counted according to how it is categorized rather than according to the single institution it belongs to. This can contribute to confusion between data sets. For example, BaGLAMa 2 tracks Brooklyn Museum of Art data in three different categories: Images at the Brooklyn Museum, Collections of the Brooklyn Museum, and Media contributed by the Brooklyn Museum.

Findings

Overall, I found that image reach is difficult to measure because it is impossible to determine whether the user viewed or engaged with the image. Other findings relate to tool limitations, the representation of images, and associated data.

  • It is hard to interpret why an image or page increases or decreases in views.
  • Although the institutions analyzed are art museums, most of their images appear on pages unrelated to art; image reach is highest on general and broad-topic article pages.
  • Institution “highlight” works do not have a high image reach.
  • Images vary in quality.
  • Images often lack metadata, which makes it harder for users to search for and view media.
  • There is a surprisingly low Wikimedia Commons presence for prominent art institutions.
  • Large uploads are typically singular projects.

Recommendations

Based on this analysis and findings, institutions and communities interested in extending their image reach on Wikipedia should consider the following:

  • Foremost, developing a long-term and sustainable linked open data digital strategy.
  • Increasing image uploads, which may lead to an increase in pages and pageviews.
  • Ensuring accurate information on pages and for images.
  • Providing open access to collection data.
  • Adding structured metadata that is human and machine-readable in order to make collections searchable and findable.
  • Hosting Wikipedia editing events and other collaborations with Wikipedia contributors and communities.
  • Hiring a Wikimedian-In-Residence to facilitate the strategy and implementation of media and data on Wikipedia sites.
  • Developing better analytical tools.

Conclusion

I chose to use BaGLAMa 2 for this project because I was interested in exploring and learning a new tool. I became particularly interested in the limitations of gathering and analyzing data, especially in comparison to more powerful digital analytics tools like Google Analytics. Although my selection was narrow in scope, I was able to see the significance of GLAM institutions supporting the availability of data on an equitable level, and I hope I can contribute to that mission in the future.

Written by Marisa Kurtz, Pratt Institute M.S. Museums and Digital Culture, Advanced Certificate in Conservation and Digital Curation

Please find me at marisa.a.kurtz@gmail.com
