Anyone who knows me knows that I use social media heavily. It’s not an addiction, I promise. Recently my phone stopped working (it was not my fault) and I lived without one for a week, more than happy not to doom-scroll TikTok while my code was running. Yes, I use TikTok, and that should tell you about my vast and expansive use of social media.
But what even those who know me don’t know is that I never had social media while growing up. Sure, I used Facebook for playing Farmville and asking whether we had school the next day, but it’s not really a social media app anymore. I didn’t have Instagram until my 3rd year of undergrad. I didn’t have Snapchat until after I finished undergrad. I never used TikTok while it was legally allowed in India. BeReal didn’t exist till I was in grad school. As for Twitter, I started using it properly only after grad school.
And each time, there was a singular reason to join, which was to stay connected with the peers I’d otherwise lose with every passing phase of life, and a singular point of hesitancy, which was that I did not relate to the medium of expression that these apps ran on. As someone who LOVES to express and share what they feel, preferably anonymously, I found that these applications depended on expressing yourself through pictures and videos, and the aesthetic to match those pictures and videos changed every week. However, friendships were important and I soon ignored this hesitancy. As I grew, and accumulated these social media accounts like the Infinity Stones on Thanos’s gauntlet, I grew confident in my usage and my ability to adapt to each platform.
I wanted to find a sweet balance that retained my authenticity and my affinity for words over images, and that led me to create an alternate, public Instagram account where I’d post pictures and describe the trips that led to them. A small disclaimer: I am not a photographer, so my USP lay in my witty (according to me) and long (according to my sister) captions. So when people liked these posts, I’d always assumed it was because they liked all the components. Unfortunately for me and you (because you have probably been forced to read this by me), that was not the case. An initial human, non-IRB-approved evaluation on my friends showed me that they were more likely to see the image and like it than to read the long but relevant text that accompanied it. I was plating a spread and they relished just a single bite.
In another instance, I posted a story of an Instagram post of a tweet in which the original tweeter exclaimed that kids in India would soon wonder why a monument was named after the tea brand Taj. This was in reference to NCERT’s move to expunge chapters on Mughal history from the history textbook, an explanation of which was detailed in the caption. While I will not comment on the motive, as that may breach my own limits on getting political on a public forum, it was surprising to see how many people replied to tell me how much they miss their chai in India (I do too!), completely ignoring the point of the post, the consequences of which could be catastrophic.
I don’t blame them though, because I observed that on Stories you’d barely see a line of the caption, which reduced the incentive to read through the description. On Instagram Reels, shorter captions found a greater audience than longer ones. BeReal and Snapchat barely allowed for textual content either way. You’d say these are primarily photo-sharing platforms, and I agree. I also needed stronger metrics, so I turned to Twitter.
I chose Twitter as a platform since compared to all my other social media options, this seemed the one that was the most ‘text heavy’ – all 280 characters of it. Additionally, Twitter’s mission statement reads:
We serve the public conversation, that’s why it matters to us that people have a free and safe space to talk.
Following traditional definitions of conversation and talk, I assumed I’d find more textual content here than anywhere else. I pulled 100 recent, popular tweets for a single query, ‘writing’. On a deep dive (which really was reusing code that I run every day) I saw that out of these 100 tweets, 44 were accompanied by an image, video or GIF. Since some do consider emojis to be images, I counted those too: surprisingly, only 12 of the 100 were accompanied by an emoji. The average length (including hashtags) was merely 16 words per tweet. What this suggests is that a large share of the tweets gaining traction are accompanied by more than one form of media or mode, and the emphasis on language as a mode is reduced.
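For the curious, the counting described above could be sketched roughly like this. This is a minimal Python illustration, not the script I actually ran: the tweet dictionaries, the `media` field and the `summarize` helper are hypothetical names chosen for the example, the sample tweets are made up, and the emoji pattern only approximates the main emoji blocks.

```python
import re

# Rough emoji matcher covering the main emoji blocks.
# This is an approximation, not the full Unicode emoji definition.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def summarize(tweets):
    """Return (tweets with media, tweets with an emoji, mean words per tweet).

    Each tweet is assumed to be a dict with a "text" string and a "media"
    list (empty when the tweet has no image, video or GIF attached).
    """
    with_media = sum(1 for t in tweets if t["media"])
    with_emoji = sum(1 for t in tweets if EMOJI_RE.search(t["text"]))
    # Whitespace-separated tokens, so hashtags count as words.
    total_words = sum(len(t["text"].split()) for t in tweets)
    mean_words = total_words / len(tweets) if tweets else 0.0
    return with_media, with_emoji, mean_words


# Made-up sample; the original 100 tweets are not reproduced here.
sample = [
    {"text": "Writing every day is hard #amwriting", "media": ["photo"]},
    {"text": "New draft done 🎉", "media": []},
    {"text": "Thoughts on writing and editing", "media": []},
]

print(summarize(sample))
```

Splitting on whitespace is the simplest possible definition of a word, and it means a standalone emoji is counted as one too, so a stricter tokenizer would give slightly different averages.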
Both these results, the quantifiable metrics and my unquantifiable but equally trustworthy human evaluation (I can pinky swear), agree with the idea that as our involvement with social media has grown, our attention spans have grown shorter, and we are likely to focus on one mode and get distracted by another, or by another task entirely.
Around the same time, a couple of weeks ago, I attended a talk by Dr Ajay Divakaran, who spoke about how our models and systems are expanding towards multimodal capabilities. He shared his paper, which evaluated current LLMs to measure how well they understood relevant concepts and how consistent they were in that understanding. While the results of this analysis showed that LLMs were probably not as consistent as we’d like them to be, it is important to note that current research in AI is progressively evolving to create systems that can understand answers, develop a chain of thought, accept feedback, respond with a logical mindset in conversations and keep this logic active and consistent for a given period of time. And this logic, thought, mindset and conversation are being extended across textual, audio and visual data.
And his results were consistent with the systems that we interact with every day. Snapchat’s AI feature was able to understand that I was eating noodles on a Tuesday night and that, as a college student, I should probably eat something healthier, and it suggested cheap and healthy meals. ChatGPT could understand my friend’s text filled with emojis and tell me that even though she called me disoriented, the use of 5 laughing emojis indicated that it was all in good humour. These AI systems were able to understand my world better than I did.
Just a few weeks ago, a bunch of tech gurus from all over the world came together to demand a 6-month moratorium on AI development. Their argument rested on the claim that AI was catching up to human capabilities, and they predicted an AI apocalypse, without acknowledging that as humans our capabilities, too, were diminishing. Instead of slow AI, we need safe, fair and responsible AI. Safe to ensure it doesn’t threaten the principles our society is based upon. Fair to ensure it doesn’t cultivate and exaggerate the already existing biases that we humans seem to share. Responsible to ensure that it doesn’t interfere with and shrink our natural psyche.
We’ve been taught that our senses and sensibilities are linked, and that to understand the full picture, we must look at all the pieces of the puzzle. Somehow we’re creating and enabling systems that can comprehend multiple modalities, and yet as humans, who are capable of logic and emotion naturally and in harmony, we turn ignorant.