Wednesday, 8 January 2025

How to download any attachment from GMail

Okay, on the back of my earlier post today, I couldn't help myself and continued tinkering with the Google Apps Script platform, taking the automation further. This is the power of AI tools - imagine how much productivity and efficiency can be unlocked for anyone, especially small business owners or your own personal workflow automations: automating manual back-office processes without relying on an IT person to support you!

Sure, I spent about four hours playing around, and the process wasn't perfect. Making changes can be tricky: there's a tendency to keep adding new features, sometimes breaking everything and having to start over. ChatGPT Canvas is still a work in progress, but it's no stretch of the imagination to expect future enhancements like a fully integrated developer environment, deployment with version control, rollbacks, etc. Anyone with a little technical curiosity can start to unlock personal productivity improvements by investing some time in learning and experimentation.

Here's a sample of the tool:


Here's the write up from ChatGPT using Canvas:

How to Build a Google Apps Script to Download Gmail Attachments

Downloading Gmail attachments automatically can save time and improve productivity. In this guide, we'll walk through building a Google Apps Script to dynamically fetch email senders, process their emails, and save attachments directly to Google Drive.

This guide assumes basic knowledge of Google Apps Script and JavaScript. Let’s dive in!


Step 1: Set Up Your Google Apps Script Project

  1. Go to Google Apps Script.
  2. Click New Project to create a blank script project.
  3. Rename the project to something meaningful, like Gmail Attachment Downloader.

Step 2: Write the Backend Script (Code.gs)

The backend script handles:

  • Fetching Gmail threads and senders.
  • Processing selected emails.
  • Saving attachments to Google Drive.

Here’s the Code.gs file:

function doGet() {
  return HtmlService.createHtmlOutputFromFile('SenderSelection').setTitle('Select Sender');
}

function getSendersChunk(startIndex, chunkSize) {
  var threads = GmailApp.getInboxThreads(startIndex, chunkSize);
  Logger.log(`Fetched ${threads.length} threads starting from index ${startIndex}.`);

  var senders = new Set();
  threads.forEach(function (thread) {
    var messages = thread.getMessages();
    messages.forEach(function (message) {
      senders.add(message.getFrom());
    });
  });

  return Array.from(senders).sort(); // Return sorted unique senders
}

function getTotalThreads() {
  return GmailApp.getInboxThreads().length;
}

function processSelectedSender(senderEmail) {
  var threads = GmailApp.search('from:' + senderEmail);
  var totalThreads = threads.length; // Gmail search returns threads (conversations), not individual emails

  var folderName = senderEmail.replace(/[<>:"/\\|?*]/g, '_'); // Sanitize folder name
  var dateSuffix = Utilities.formatDate(new Date(), Session.getScriptTimeZone(), 'yyyy-MM-dd');
  var folder = DriveApp.createFolder(folderName + "_" + dateSuffix);
  var logMessages = [`Created folder: ${folder.getName()}`, `Found ${totalThreads} email threads from sender: ${senderEmail}`];

  threads.forEach(function (thread, threadIndex) {
    var messages = thread.getMessages();
    messages.forEach(function (message, messageIndex) {
      var subject = message.getSubject();
      logMessages.push(`Processing thread ${threadIndex + 1}/${totalThreads}: "${subject}"`);

      var attachments = message.getAttachments();
      attachments.forEach(function (attachment) {
        var fileName = attachment.getName();
        var extensionIndex = fileName.lastIndexOf(".");
        // Guard against filenames without an extension
        var baseName = extensionIndex > 0 ? fileName.substring(0, extensionIndex) : fileName;
        var extension = extensionIndex > 0 ? fileName.substring(extensionIndex) : "";
        var timestampedFileName = baseName + "_" + new Date().getTime() + extension;

        try {
          folder.createFile(attachment).setName(timestampedFileName);
          Logger.log(`Saved attachment: ${timestampedFileName}`);
        } catch (e) {
          Logger.log(`Error saving attachment: ${e.message}`);
        }
      });
    });
  });

  logMessages.push(`Processing complete for sender: ${senderEmail}`);
  return logMessages; // Return all log messages to the frontend
}
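
Before wiring up the frontend, you can sanity-check the backend straight from the Apps Script editor. Here's a minimal, optional test function - the sender address is a placeholder to replace with a real sender from your inbox, and note that running it will actually create a Drive folder and save any attachments it finds:

function testProcessSelectedSender() {
  // Placeholder address - replace with a real sender before running
  var logs = processSelectedSender('billing@example.com');
  logs.forEach(function (line) {
    Logger.log(line);
  });
}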

Step 3: Create the Frontend (SenderSelection.html)

The frontend displays a dropdown list of senders and allows the user to process selected emails. Here's the SenderSelection.html file:

<!DOCTYPE html>
<html>
  <head>
    <base target="_top">
    <style>
      body {
        font-family: Arial, sans-serif;
        margin: 20px;
      }
      label, select, button {
        font-size: 14px;
      }
      select {
        width: 100%;
        margin-bottom: 10px;
      }
      #log {
        margin-top: 20px;
        padding: 10px;
        border: 1px solid #ccc;
        background: #f9f9f9;
        height: 300px;
        overflow-y: auto;
        font-size: 12px;
        white-space: pre-wrap;
      }
      .progress {
        font-weight: bold;
        margin-top: 10px;
      }
    </style>
  </head>
  <body>
    <h3>Select a Sender</h3>
    <form id="senderForm">
      <label for="sender">Senders:</label>
      <select id="sender" name="sender">
        <!-- Options will be populated dynamically -->
      </select>
      <button type="button" onclick="submitSender()">Process</button>
    </form>

    <div id="log">Logs will appear here...</div>
    <div id="progress" class="progress"></div>

    <script>
      let startIndex = 0;
      const chunkSize = 50;
      let totalThreads = 0;
      const seenSenders = new Set(); // Track senders already added to avoid duplicates across chunks

      function initialize() {
        google.script.run.withSuccessHandler(setTotalThreads).getTotalThreads();
      }

      function setTotalThreads(total) {
        totalThreads = total;
        addLog(`Total threads in Gmail: ${totalThreads}`);
        loadNextChunk();
      }

      function loadNextChunk() {
        if (startIndex >= totalThreads) {
          addLog("All senders loaded.");
          return;
        }

        addLog(`Loading senders from threads ${startIndex + 1} to ${Math.min(startIndex + chunkSize, totalThreads)}...`);
        google.script.run.withSuccessHandler(populateSenders).getSendersChunk(startIndex, chunkSize);
        startIndex += chunkSize;
      }

      function populateSenders(senders) {
        const senderDropdown = document.getElementById('sender');
        senders.forEach(function (sender) {
          if (seenSenders.has(sender)) return; // Skip senders already in the dropdown
          seenSenders.add(sender);
          const option = document.createElement('option');
          option.value = sender;
          option.textContent = sender;
          senderDropdown.appendChild(option);
        });

        loadNextChunk();
      }

      function submitSender() {
        const selectedSender = document.getElementById('sender').value;
        if (selectedSender) {
          clearLogs();
          setProgress(`Processing emails for: ${selectedSender}`);
          google.script.run.withSuccessHandler(displayLogs).processSelectedSender(selectedSender);
        } else {
          alert('Please select a sender.');
        }
      }

      function displayLogs(logMessages) {
        logMessages.forEach(addLog);
        setProgress("Processing complete.");
      }

      function addLog(message) {
        const logDiv = document.getElementById('log');
        logDiv.textContent += message + "\n";
        logDiv.scrollTop = logDiv.scrollHeight;
      }

      function clearLogs() {
        document.getElementById('log').textContent = "";
        setProgress("");
      }

      function setProgress(message) {
        document.getElementById('progress').textContent = message;
      }

      window.onload = initialize;
    </script>
  </body>
</html>

Step 4: Deploy the Web App

  1. Go to Deploy > New Deployment.
  2. Choose Web App.
  3. Configure:
    • Execute as: Me.
    • Who has access: Only Me.
  4. Deploy and copy the deployment URL.

Step 5: Test the App

  1. Open the deployment URL in your browser.
  2. Wait for senders to load dynamically.
  3. Select a sender from the dropdown.
  4. Click Process.
  5. Check the logs in the "Logs" section for real-time updates on email processing and attachment saving.
  6. Validate in Google Drive (see the optional helper sketch after this list):
    • Go to your Google Drive.
    • Check for a newly created folder named after the sender's email address with a date suffix.
    • Ensure that all attachments from the processed emails appear in this folder.
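
If you'd rather confirm the results programmatically, here's a small optional helper you could paste into the same project - the folder name is a placeholder and should be replaced with the folder name reported in the logs (sanitized sender plus date suffix):

function verifyDownloadFolder() {
  // Placeholder - replace with the folder name shown in the processing logs
  var folderName = 'billing@example.com_2025-01-08';
  var folders = DriveApp.getFoldersByName(folderName);
  if (!folders.hasNext()) {
    Logger.log('Folder not found: ' + folderName);
    return;
  }
  var folder = folders.next();
  var files = folder.getFiles();
  var count = 0;
  while (files.hasNext()) {
    files.next();
    count++;
  }
  Logger.log('Folder "' + folder.getName() + '" contains ' + count + ' file(s).');
}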

Key Features

  • Dynamic Sender Loading: Senders are loaded in chunks for responsiveness.
  • Real-Time Logs: Displays progress for email and attachment processing. Note: Currently, the logs only appear once all emails are processed. This is an opportunity to enhance the script for real-time updates.
  • Automatic Folder Creation: Creates a unique folder in Google Drive for each sender's attachments.
  • Filename Uniqueness: Attachments are saved with a timestamp appended to the original filename, ensuring no files are overwritten during the saving process.

Potential Feature Enhancements

  1. Real-Time Logging: Update the script to send logs incrementally to the frontend as each email is processed, instead of batching logs after completion.
  2. Progress Bar Integration: Add a progress bar in the frontend to visually indicate the percentage of emails processed.
  3. Advanced Filters: Allow users to apply filters such as date ranges, subject keywords, or labels to narrow down email selection.
  4. Attachment Size Limit: Introduce a feature to skip downloading large attachments beyond a specified size limit (a sketch combining this with item 6 follows the list).
  5. Retry Mechanism: Implement error handling to retry saving attachments in case of temporary failures (e.g., Drive quota issues).
  6. Specify File Extensions: Add an option to download only specific file types (e.g., PDF, DOCX) by specifying file extensions (see the sketch after this list).
  7. Search by Google Contact Emails: Allow users to search emails from their Google contacts directly for better targeting.
  8. Google Sheet Integration: Create a Google Sheet to log the email subject, sender, and a link to the saved file in Google Drive for better tracking and reporting.
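
To illustrate items 4 and 6, here is a minimal sketch of an attachment filter that could be dropped into the attachment loop of processSelectedSender. The size limit and allowed extensions below are illustrative assumptions, not part of the original script:

var MAX_ATTACHMENT_BYTES = 10 * 1024 * 1024; // Example limit: skip attachments larger than ~10 MB
var ALLOWED_EXTENSIONS = ['.pdf', '.docx']; // Example whitelist of file types to keep

function shouldSaveAttachment(attachment) {
  var name = attachment.getName().toLowerCase();
  var matchesExtension = ALLOWED_EXTENSIONS.some(function (ext) {
    return name.endsWith(ext);
  });
  var withinSizeLimit = attachment.getSize() <= MAX_ATTACHMENT_BYTES;
  return matchesExtension && withinSizeLimit;
}

// Inside attachments.forEach, before saving the file:
// if (!shouldSaveAttachment(attachment)) {
//   logMessages.push(`Skipped attachment: ${attachment.getName()}`);
//   return; // Move on to the next attachment
// }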

With this guide, you can easily automate the task of downloading Gmail attachments to Google Drive. Customize it further based on your needs, and streamline your email management!

How AI helped me automate downloading attachments in GMail

So today I wanted to download my entire history of municipality statements and use some AI tools to build analytical dashboards for insights. The first thing to automate was downloading attachments from GMail. Instead of downloading them manually one by one, or setting up a filter rule in GMail, I wanted to test ChatGPT and Gemini to see if they could provide a Google Workspace Apps Script to meet my needs.

So Gemini was OKAY - I'm using Gemini Advanced 1.5 Pro with advanced research. Interestingly, it took the prompt "Help me with a google script or query in gmail that I can use for finding all emails from a specified email sender, downloading the attachments from each email, to a specified folder", went off and did its research, and then provided the code. Unfortunately, the code didn't work, even after a few tries - maybe it was a Google Drive sync issue - but Gemini kept insisting on the same explanation for the exception ("either an auth issue" or "folder does not exist").

So I resorted to ChatGPT 4-Turbo, created a project, and used the same prompt along with the original code from Gemini and these instructions: "Act as an expert software engineer to build a google appscript for downloading attachments for all emails from a specific sender. The script needs to work flawlessly from within the Google Apps Script console. I wrote the following code - but the script does not work. I'm getting an Exception: Unexpected error while getting the method or property getFolderById on object DriveApp"

Guess what? ChatGPT nailed it on the first try, and even added enough logging for tracing! Here's the code. Read the report from Gemini, as this is an example of how content can be generated by AI - and it's only going to get better.

function downloadAttachmentsFromSender() {
  // Replace with the email address of the sender.
  var senderEmail = '[enter email here]';

  // Replace with the ID or intended name of the folder in Google Drive.
  var folderName = '[name of folder]'; // Base name for the folder

  // Variable to store the target folder
  var folder;

  try {
    // Verify Drive API is working
    DriveApp.getFolders();
  } catch (e) {
    Logger.log("Please authorize the script to access your Drive.");
    return; // Exit if Drive access is not authorized
  }

  // Check if the folder exists
  var folders = DriveApp.getFoldersByName(folderName);
  if (folders.hasNext()) {
    folder = folders.next();
    Logger.log("Using existing folder: " + folder.getName());
  } else {
    // Create a new folder with the current date appended
    var dateSuffix = Utilities.formatDate(new Date(), Session.getScriptTimeZone(), 'yyyy-MM-dd');
    var newFolderName = folderName + "_" + dateSuffix;
    folder = DriveApp.createFolder(newFolderName);
    Logger.log("Created new folder: " + newFolderName);
  }

  // Search for emails from the specified sender.
  var threads;
  try {
    threads = GmailApp.search('from:' + senderEmail);
    Logger.log("Found " + threads.length + " threads from sender: " + senderEmail);
  } catch (e) {
    Logger.log("Error searching for emails: " + e.message);
    return; // Exit if Gmail search fails
  }

  // Iterate through each email thread.
  threads.forEach(function (thread) {
    var messages = thread.getMessages();

    messages.forEach(function (message) {
      var attachments = message.getAttachments();

      attachments.forEach(function (attachment) {
        // Extract the original file extension
        var fileName = attachment.getName();
        var fileExtension = fileName.substring(fileName.lastIndexOf("."));

        // Create the new filename with timestamp and original extension
        var newFileName = fileName.substring(0, fileName.lastIndexOf(".")) +
          "_" + new Date().getTime() + fileExtension;

        try {
          folder.createFile(attachment).setName(newFileName);
          Logger.log("Saved attachment: " + newFileName);
        } catch (e) {
          Logger.log("Error saving attachment: " + e.message);
        }
      });
    });
  });

  Logger.log("Process complete.");
}
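
If you want this to run on a schedule rather than manually, Apps Script's time-driven triggers can do it. Here's a minimal sketch - the daily 6 a.m. schedule is just an example, and it assumes downloadAttachmentsFromSender has its sender and folder values filled in:

function createDailyTrigger() {
  // Run the downloader roughly once a day around 6 a.m. (example schedule)
  ScriptApp.newTrigger('downloadAttachmentsFromSender')
    .timeBased()
    .everyDays(1)
    .atHour(6)
    .create();
}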

Anyway, I now have a repeatable way to extract attachments from emails in GMail, and now you can too.

Monday, 6 January 2025

Where did my time go in 2024? Sharing my 2024 Personametry results.

It's that time of year again to share my time tracking analysis for 2024. The visuals should be self-explanatory and nothing surprising. 2024 was the year my professional work stream dominated, but not completely. I also indexed on spending more time with family (at the expense of 1:1 time as a husband), and spent a lot of my own "me time" primarily experimenting with AI tools. I maintained previous levels of spiritual hours, back to 2022 norms, since 2023 was an outlier (the pilgrimage to Makkah was the highlight of 2023). My health and fitness took a severe knock and has been on a downward trend since 2021.

2021 was the year I made big changes impacting my personal and professional streams: I relocated to a new city (Cape Town), leaving friends and family behind in Johannesburg, and started a new job in a new company in a new domain. I'm now four years into this transition, and I'm not pleased with the downward trends in my personal streams. 2025 will need to be a year of intentional change in managing my work time, setting firmer boundaries, and finding a solution to working late nights and weekends, and limiting the streaks of long work hours.

2024 was the peak of my work hours since I started time tracking in 2016 - that's almost 10 years of time tracking! In the last four years, since starting my new job in 2021, my work hours have trended upwards (in 2022 I made adjustments to reduce work hours with some success), only to rebound higher in 2023, and much higher in 2024. In a future post, I will share the challenges of working for a global high-tech giant like Amazon AWS from South Africa and the impact on personal time - it has been, and continues to be, the biggest challenge in maintaining my professional "happiness", especially when it comes to maintaining regular "start day" and "end day" routines and rituals.

Overview since 2018
Here's the summary dashboard of my time spent in my personas since 2018:


What's my workflow for tracking all this data?
Nothing much has changed since I started my experiment back in 2016. I use Harvest for tracking my personal and professional streams. I use a free account, with one major project called "RAGE Time tracking", and tasks related to my personas. I track my time from the minute I wake up till I go to sleep. Throughout the day, I switch between tasks, making sure I don't track my personal time (including lunch breaks) as work time. The hours logged track closely to 24 hours/day for the whole year, with a negligible margin of error that's within reasonable tolerance (I'm still human!). The idea is to regularly inspect where my time is being spent and make adjustments according to my prioritized streams. 2024 unfortunately wasn't my best year in reaching a harmony-state, something I'm not pleased about. The data helps me make sense of how my life is transforming, triggering moments of reflection for adapting or, at times, just general acceptance of my current reality (it is what it is) -- forcing me to revisit my aspirations and perhaps re-calibrate in areas where I have full agency to control and influence.

How has my workflow for insights generation changed over time - are genAI tools of any help?
In the early years, I used Excel as my main source of data insights. Later I moved over to Amazon QuickSight. I still use QuickSight to generate my analysis and dashboards, as I have a few transformations with calculated fields. I then export the QuickSight data tables as Excel files to serve as the source dataset for AI tools. QuickSight, too, is a little quirky with ingesting new data and maintaining the integrity of calculated fields, and sometimes data gets corrupted when updating its SPICE models. I spent a good few hours debugging failures in data alignment and missing rows due to errors in calculated fields (that should have "just worked"). I use a free account, so I'm unable to use any of the "Amazon Q" AI features - ideally, I'd like an interactive analysis/dashboard AI assistant that I can have a conversation with. Alas, there is no way to share or embed QuickSight dashboards openly; embedding QuickSight dashboards in blog posts would be cool.

Anyway, with the advent of ChatGPT, I started integrating AI data analysis over the last two years. I spent the last few days with ChatGPT - suffice it to say, I'm not that impressed by ChatGPT's Data Analyst capability. Nothing much has improved since last year. At least it can understand large datasets in Excel or CSV format, which is cool - something that Claude, Gemini or NotebookLM don't currently support.

ChatGPT's "Data Analyst" does a decent job performing exploratory data analysis, was able to infer what my dataset was largely about without me providing clues or guiding info. The biggest problem I encountered was loss of context, loss of memory, and complete loss of the data's runtime environment. I had to repeatedly reload the dataset because ChatGPT "lost the data", then re-coax it to understand the data fields. It also hallucinated and made simple mistakes calculating aggregated results. It got simple math wrong. At times, I had to specify in detail how to use the data fields to generate the insights I asked for. This was surprising as I had a much better experience on the model from a year ago. I'm using the latest 4o model, being a ChatGPT Plus subscriber. I feel that OpenAI team should just kill the Data Analyst as a separate feature and promote the Canvas Projects capability going forward.

After ditching the built-in "Data Analyst", I switched to the latest Projects capability in ChatGPT. I shared my dataset as project files, included all my blog posts on Personametry and RAGE as context data, and provided fine-tuned instructions for the project to act as a data scientist responsible for producing business intelligence reporting. I had a much, much better experience using ChatGPT Projects than Data Analyst - however, I still ran into issues with hallucination and overconfidence in the AI's interpretation of data and calculated results. AI is still far from being a trusted source of business intelligence as far as I'm concerned. You need to take time to manually build your own reports and analysis, the way I do with QuickSight -- and then test that the AI generates the same results. I've spent hours on this over the last few days - yes, it was a little frustrating - but nevertheless insightful. The more time I spend playing around with these tools, the more I am amazed at the potential power and future disruption in this space - and at the same time, the more comfort I take in knowing there is still a long journey ahead. For my simple dataset of hours logged by task, the AI doesn't perform 100% on the first shot -- so for the big businesses out there, don't go firing your business analysts and BI engineers anytime soon, thinking GenAI will replace these roles... IMHO, we are at least another two years away from maturity, if not more. A key input is validation - you would need to generate reference test cases to verify that all the metrics compute correctly before you can trust AI-generated insights.

In 2023 I had some fun with Noteable's ChatGPT plug-in. I was really looking forward to Noteable's progress in the area of AI-assisted data intelligence, but alas, the company and product are no more. I'm now experimenting with something similar from Deepnote, but it's early days - and it's also quite expensive for a general hobbyist like myself. I believe this is the future, though.

Where do I see this experiment going?
Personal AI assistants are all the rage these days, and some say 2025 is the year of AI agents. I'd like to build a truly personalized agent - not just rudimentary calendar integration and answering texts and emails, but an assistant that truly understands me as a person across my personal and professional streams: integrating my time tracking data, my health, fitness and well-being data from smart trackers, my todo lists, and so on. Coming back to Personametry, I would like daily and weekly insights from my personal assistant. Imagine an assistant that says: "Mo, I noticed you haven't exercised in a while, and you're finishing work quite late. You've not had quality time with your wife and kids - and your recent trip to the ER is not boding well. I've booked dinner for you and the family at your favorite restaurant, and for the upcoming long weekend, I've made a booking at a mountain lodge you will enjoy. I scanned your calendar and there are no major work deliverables or travel planned - so take some time to rest and recharge, or else you will burn out soon!"

So, what are the 2024 insights then?
Until I can get my personal agent implemented, manual Google Slides will have to do. Check it out below!