The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

  • My badges

  • Twitter Updates

  • My Flickr Stream

  • Pages

  • All categories

  • Enter your email address to subscribe to this blog and receive notifications of new posts by email.

    Join 2,465 other followers

Archive for May 24th, 2011

mijnalbum.nl URLs and downloading pictures

Posted by jpluimers on 2011/05/24

MijnAlbum.nl is a popular site for storing and printing photos.

The UX of the site is far below what I’d like so I searched around for some scripts and wrote a bit of code for my own to make it easier.

First a few URL tricks for mijnalbum.nl (with the free demo album with Album ID TYEGMIMD):

  1. Album URLs follow this pattern:
    http://www.mijnalbum.nl/Album=TYEGMIMD
    where the letters after Album= contain the Album ID.
  2. When you click on a photo a new window opens with a URL like
    http://www.mijnalbum.nl/GroteFoto-3ANGJPWW.jpg
    where the letters after GroteFoto- contain the Photo ID and the .jpg extension.
    You don’t need to be logged in to download these photos.
  3. The album page includes a frame with thumbnails that has a URL like
    http://www.mijnalbum.nl/index.php?m=albumview&a=2&key=TYEGMIMD&selectedfoto=3ANGJPWW
    but a bit of experimentation reveals it can be condensed into
    http://www.mijnalbum.nl/?m=albumview&a=2&key=TYEGMIMD
    where the letters after key= contain the Album ID.
  4. The thumbnail frame contains photo’s that link to URLs like
    http://www.mijnalbum.nl/index.php?m=albumview&a=1&key=3ANGJPWW&album=TYEGMIMD
    that can be condensed into
    http://www.mijnalbum.nl/?m=albumview&a=1&key=3ANGJPWW&album=TYEGMIMD
    where you both need the Photo ID as key value, and Album ID as album value.
    The thumbnails are tiny, so not very convenient to browse.
  5. Download URLs look like this
    http://www.mijnalbum.nl/index.php?m=download&a=10&key=TYEGMIMD
    which can be condensed into
    http://www.mijnalbum.nl/?m=download&a=10&key=TYEGMIMD
    but you need to be logged in to download other albums than the demo album.

The thumbnails frame contains two lists of the photos in the album. Given the demo album, it first contains a this section inside a bunch of JavaScript code:

thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-3ANGJPWW.jpg"));
thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-6OKRE67M.jpg"));
thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-4OSAXTJM.jpg"));
thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-7MFJVO4F.jpg"));
thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-YQ4BZ4AD.jpg"));
thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-3LKQD6AG.jpg"));
thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-O8RUL378.jpg"));
thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-7TTFUUCG.jpg"));
thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-L6SLLZHO.jpg"));
thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-YLSWWFJI.jpg"));
thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-CFWOY8KK.jpg"));

and further in the page source a HTML table with HTML fragements like these:

        <a onclick="resetScrollNextImg();" onfocus="this.blur();" href="index.php?m=albumview&a=1&key=3ANGJPWW&album=TYEGMIMD" target="fotoframe">
 <img id="thumb-3ANGJPWW" style="margin-bottom: -3px;" title="Eendjes in de wei" src="http://www.mijnalbum.nl/MiniFoto-3ANGJPWW.jpg" alt="Eendjes in de wei" width="90" height="90" border="0" />
 </a>

The HTML fragment has more context (Picture ID in both the a and img tags, Picture description in the alt attribute of the img tag).

You need to parse either of the two tables, depending on what information you are interested in.

I was interested in the thumbnail URL, and the Photo IDs and GroteFoto download URLs because it is way easier to select photos that are bigger. Something like this list:

Thumbnail URL for Album TYEGMIMD
http://www.mijnalbum.nl/index.php?m=albumview&a=2&key=TYEGMIMD
Photo URLs for Album TYEGMIMD











Since the thumbnail URL page is not xhtml, you have two options:

  • Force the page to become XHTML by some library
  • Use Regular Expressions

Though I am not a fan of using Regular Expressions for parsing general HTML, the thumbnail frame is generated in a very consistent way, so in this case I don’t mind using RegEx.

A complete console app is part of the bo.codeplex.com library.
This is the C# code I wrote as a base:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace MijnAlbum.NL.Download.ConsoleApplication
{
    public class MijnAlbumNl
    {
        protected const string ThumbnailPrefix = "http://www.mijnalbum.nl/index.php?m=albumview&a=2&key=";
        protected const string BigJpegUrlMask = "http://www.mijnalbum.nl/GroteFoto-{0}.jpg";

        private static string DownloadString(string url)
        {
            var html = string.Empty;
            using (var webClient = new System.Net.WebClient())
            {
                html = webClient.DownloadString(url);
            }
            return html;
        }

        protected static string DownloadThumbnailsHtml(string AlbumID)
        {
            string url = GetThumbnailsUrl(AlbumID);
            string html = DownloadString(url);
            return html;
        }

        protected static string GetThumbnailsUrl(string AlbumID)
        {
            string url = ThumbnailPrefix + AlbumID;
            return url;
        }

        protected static List getPhotoIds(bool dumpGroups, string html)
        {
            List photoIds = new List();
            /* find strings like these:
            thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-3ANGJPWW.jpg"));
            thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-6OKRE67M.jpg"));
                         * RegEx string:
            thumbs\.push\(new\ Array\("http://www.mijnalbum.nl/MiniFoto-(?.*?)\.jpg"\)\);
                         */
            // With @, you only have to escape the double quotes with another double quote: http://weblogs.asp.net/lorenh/archive/2004/09/30/236134.aspx
            const string RegExPattern = @"thumbs\.push\(new\ Array\(""http://www.mijnalbum.nl/MiniFoto-(?.*?)\.jpg\""\)\);";
            const string PhotoIdRegExCaptureGroupName = "PhotoId";
            //const string RegExReplacement = @"${PhotoId}";
            MatchCollection matchCollection = Regex.Matches(html, RegExPattern);
            // Matches uses lazy evaluation, so each match will be evaluated when it is accessed in the loop
            foreach (Match match in matchCollection)
            {
                Group positionIdGroup = match.Groups[PhotoIdRegExCaptureGroupName];
                if (dumpGroups)
                    Console.WriteLine(positionIdGroup.Value);
                photoIds.Add(positionIdGroup.Value);
            }
            return photoIds;
        }

        protected static string GetBigJpegUrl(string photoId)
        {
            string bigJpegUrl = string.Format(BigJpegUrlMask, photoId);
            return bigJpegUrl;
        }

    }

}

Note that others wrote some scripts too, so for instance Kees Hink  wrote on Foto’s downloaden van mijnalbum.nl and there is the MijnAlbumDownloader app.

Note 2: The free RegEx Expresso tool is very nice for building and testing Regular Expressions.

–jeroen

Posted in .NET, C#, Development, Software Development | 3 Comments »

 
%d bloggers like this: