The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests


Archive for May, 2011

mijnalbum.nl URLs and downloading pictures

Posted by jpluimers on 2011/05/24

MijnAlbum.nl is a popular site for storing and printing photos.

The UX of the site is far below what I’d like, so I searched around for some scripts and wrote a bit of code of my own to make it easier to use.

First a few URL tricks for mijnalbum.nl (with the free demo album with Album ID TYEGMIMD):

  1. Album URLs follow this pattern:
    http://www.mijnalbum.nl/Album=TYEGMIMD
    where the letters after Album= contain the Album ID.
  2. When you click on a photo a new window opens with a URL like
    http://www.mijnalbum.nl/GroteFoto-3ANGJPWW.jpg
    where the letters after GroteFoto- contain the Photo ID and the .jpg extension.
    You don’t need to be logged in to download these photos.
  3. The album page includes a frame with thumbnails that has a URL like
    http://www.mijnalbum.nl/index.php?m=albumview&a=2&key=TYEGMIMD&selectedfoto=3ANGJPWW
    but a bit of experimentation reveals it can be condensed into
    http://www.mijnalbum.nl/?m=albumview&a=2&key=TYEGMIMD
    where the letters after key= contain the Album ID.
  4. The thumbnail frame contains photos that link to URLs like
    http://www.mijnalbum.nl/index.php?m=albumview&a=1&key=3ANGJPWW&album=TYEGMIMD
    that can be condensed into
    http://www.mijnalbum.nl/?m=albumview&a=1&key=3ANGJPWW&album=TYEGMIMD
    where you need both the Photo ID as the key value and the Album ID as the album value.
    The thumbnails are tiny, so not very convenient to browse.
  5. Download URLs look like this
    http://www.mijnalbum.nl/index.php?m=download&a=10&key=TYEGMIMD
    which can be condensed into
    http://www.mijnalbum.nl/?m=download&a=10&key=TYEGMIMD
    but you need to be logged in to download albums other than the demo album.
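
The five URL patterns above can be captured in a few small helper methods. This is only a minimal sketch: the class name MijnAlbumUrls and its method names are hypothetical (they are not part of the site or of any library), and it assumes the site still uses the URL scheme described in the list.

```csharp
using System;

// Hypothetical helper: class and method names are illustrative only.
public static class MijnAlbumUrls
{
    private const string BaseUrl = "http://www.mijnalbum.nl/";

    // Pattern 1: the album page.
    public static string Album(string albumId)
    {
        return BaseUrl + "Album=" + Uri.EscapeDataString(albumId);
    }

    // Pattern 2: the full-size photo (no login needed).
    public static string BigPhoto(string photoId)
    {
        return BaseUrl + "GroteFoto-" + Uri.EscapeDataString(photoId) + ".jpg";
    }

    // Pattern 3: the thumbnail frame, in its condensed form.
    public static string ThumbnailFrame(string albumId)
    {
        return BaseUrl + "?m=albumview&a=2&key=" + Uri.EscapeDataString(albumId);
    }

    // Pattern 4: the page for one photo; needs both the Photo ID and the Album ID.
    public static string PhotoPage(string photoId, string albumId)
    {
        return BaseUrl + "?m=albumview&a=1&key=" + Uri.EscapeDataString(photoId)
            + "&album=" + Uri.EscapeDataString(albumId);
    }

    // Pattern 5: the album download (login needed except for the demo album).
    public static string Download(string albumId)
    {
        return BaseUrl + "?m=download&a=10&key=" + Uri.EscapeDataString(albumId);
    }
}
```

For the demo album, MijnAlbumUrls.ThumbnailFrame("TYEGMIMD") gives the condensed thumbnail frame URL from item 3.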

The thumbnail frame contains two lists of the photos in the album. For the demo album, it first contains this section inside a bunch of JavaScript code:

thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-3ANGJPWW.jpg"));
thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-6OKRE67M.jpg"));
thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-4OSAXTJM.jpg"));
thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-7MFJVO4F.jpg"));
thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-YQ4BZ4AD.jpg"));
thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-3LKQD6AG.jpg"));
thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-O8RUL378.jpg"));
thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-7TTFUUCG.jpg"));
thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-L6SLLZHO.jpg"));
thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-YLSWWFJI.jpg"));
thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-CFWOY8KK.jpg"));

and further down in the page source an HTML table with HTML fragments like these:

        <a onclick="resetScrollNextImg();" onfocus="this.blur();" href="index.php?m=albumview&a=1&key=3ANGJPWW&album=TYEGMIMD" target="fotoframe">
 <img id="thumb-3ANGJPWW" style="margin-bottom: -3px;" title="Eendjes in de wei" src="http://www.mijnalbum.nl/MiniFoto-3ANGJPWW.jpg" alt="Eendjes in de wei" width="90" height="90" border="0" />
 </a>

The HTML fragment has more context: the Photo ID appears in both the a and img tags, and the photo description appears in the title and alt attributes of the img tag.

You need to parse either of the two lists, depending on what information you are interested in.

I was interested in the thumbnail frame URL, the Photo IDs and the GroteFoto download URLs, because it is way easier to select photos when they are bigger. Something like this list:
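
If you also want the descriptions, the HTML table is the one to parse. Here is a minimal sketch that pulls the Photo ID and the description out of each img tag. It assumes the id attribute always comes before the title attribute, as in the fragment above; the class name ThumbnailTableParser is hypothetical and not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts (Photo ID, description) pairs from the thumbnail table.
public static class ThumbnailTableParser
{
    // Matches <img id="thumb-PHOTOID" ... title="DESCRIPTION"
    // Assumption: id comes before title, as in the fragment shown above.
    private static readonly Regex ImgTag = new Regex(
        @"<img\s+id=""thumb-(?<PhotoId>[^""]+)""[^>]*?\stitle=""(?<Title>[^""]*)""",
        RegexOptions.IgnoreCase);

    public static List<KeyValuePair<string, string>> Parse(string html)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match match in ImgTag.Matches(html))
        {
            string photoId = match.Groups["PhotoId"].Value;
            // Descriptions may contain HTML entities such as &amp;
            string title = WebUtility.HtmlDecode(match.Groups["Title"].Value);
            result.Add(new KeyValuePair<string, string>(photoId, title));
        }
        return result;
    }
}
```

The key of each pair is the Photo ID and the value is the description, so the pairs can be fed straight into whatever builds the download URLs.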

Thumbnail URL for Album TYEGMIMD
http://www.mijnalbum.nl/index.php?m=albumview&a=2&key=TYEGMIMD
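Photo URLs for Album TYEGMIMD
http://www.mijnalbum.nl/GroteFoto-3ANGJPWW.jpg
http://www.mijnalbum.nl/GroteFoto-6OKRE67M.jpg
http://www.mijnalbum.nl/GroteFoto-4OSAXTJM.jpg
http://www.mijnalbum.nl/GroteFoto-7MFJVO4F.jpg
http://www.mijnalbum.nl/GroteFoto-YQ4BZ4AD.jpg
http://www.mijnalbum.nl/GroteFoto-3LKQD6AG.jpg
http://www.mijnalbum.nl/GroteFoto-O8RUL378.jpg
http://www.mijnalbum.nl/GroteFoto-7TTFUUCG.jpg
http://www.mijnalbum.nl/GroteFoto-L6SLLZHO.jpg
http://www.mijnalbum.nl/GroteFoto-YLSWWFJI.jpg
http://www.mijnalbum.nl/GroteFoto-CFWOY8KK.jpg
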
Since the thumbnail page is not XHTML, you have two options:

  • Force the page to become XHTML using some library
  • Use Regular Expressions

Though I am not a fan of using Regular Expressions for parsing general HTML, the thumbnail frame is generated in a very consistent way, so in this case I don’t mind using RegEx.

A complete console app is part of the bo.codeplex.com library.
This is the C# code I wrote as a base:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace MijnAlbum.NL.Download.ConsoleApplication
{
    public class MijnAlbumNl
    {
        protected const string ThumbnailPrefix = "http://www.mijnalbum.nl/index.php?m=albumview&a=2&key=";
        protected const string BigJpegUrlMask = "http://www.mijnalbum.nl/GroteFoto-{0}.jpg";

        private static string DownloadString(string url)
        {
            var html = string.Empty;
            using (var webClient = new System.Net.WebClient())
            {
                html = webClient.DownloadString(url);
            }
            return html;
        }

        protected static string DownloadThumbnailsHtml(string AlbumID)
        {
            string url = GetThumbnailsUrl(AlbumID);
            string html = DownloadString(url);
            return html;
        }

        protected static string GetThumbnailsUrl(string AlbumID)
        {
            string url = ThumbnailPrefix + AlbumID;
            return url;
        }

        protected static List<string> getPhotoIds(bool dumpGroups, string html)
        {
            List<string> photoIds = new List<string>();
            /* find strings like these:
            thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-3ANGJPWW.jpg"));
            thumbs.push(new Array("http://www.mijnalbum.nl/MiniFoto-6OKRE67M.jpg"));
             * RegEx string (the named group PhotoId captures the Photo ID):
            thumbs\.push\(new\ Array\("http://www\.mijnalbum\.nl/MiniFoto-(?<PhotoId>.*?)\.jpg"\)\);
             */
            // With @, you only have to escape the double quotes with another double quote: http://weblogs.asp.net/lorenh/archive/2004/09/30/236134.aspx
            const string RegExPattern = @"thumbs\.push\(new\ Array\(""http://www\.mijnalbum\.nl/MiniFoto-(?<PhotoId>.*?)\.jpg""\)\);";
            const string PhotoIdRegExCaptureGroupName = "PhotoId";
            //const string RegExReplacement = @"${PhotoId}";
            MatchCollection matchCollection = Regex.Matches(html, RegExPattern);
            // Matches uses lazy evaluation, so each match will be evaluated when it is accessed in the loop
            foreach (Match match in matchCollection)
            {
                Group positionIdGroup = match.Groups[PhotoIdRegExCaptureGroupName];
                if (dumpGroups)
                    Console.WriteLine(positionIdGroup.Value);
                photoIds.Add(positionIdGroup.Value);
            }
            return photoIds;
        }

        protected static string GetBigJpegUrl(string photoId)
        {
            string bigJpegUrl = string.Format(BigJpegUrlMask, photoId);
            return bigJpegUrl;
        }

    }

}
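
To show how the pieces fit together, here is a minimal standalone sketch that extracts the Photo IDs from the thumbnail frame and saves each GroteFoto image to disk. It is not the console app from the library: the class name MijnAlbumDownloadSketch, the method names and the file naming scheme (Photo ID plus .jpg) are my own assumptions for illustration, and it has no error handling or retry logic.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical sketch: names and file layout are illustrative only.
public static class MijnAlbumDownloadSketch
{
    private const string ThumbnailPrefix = "http://www.mijnalbum.nl/index.php?m=albumview&a=2&key=";
    private const string BigJpegUrlMask = "http://www.mijnalbum.nl/GroteFoto-{0}.jpg";

    // Same idea as the RegEx in the class above: capture the Photo ID
    // from each thumbs.push(...) line in the JavaScript.
    private static readonly Regex ThumbPush = new Regex(
        @"thumbs\.push\(new Array\(""http://www\.mijnalbum\.nl/MiniFoto-(?<PhotoId>.*?)\.jpg""\)\);");

    public static List<string> ExtractPhotoIds(string html)
    {
        var photoIds = new List<string>();
        foreach (Match match in ThumbPush.Matches(html))
        {
            photoIds.Add(match.Groups["PhotoId"].Value);
        }
        return photoIds;
    }

    // Downloads every full-size photo of the album into targetDirectory.
    public static void DownloadAlbum(string albumId, string targetDirectory)
    {
        Directory.CreateDirectory(targetDirectory);
        using (var webClient = new WebClient())
        {
            string html = webClient.DownloadString(ThumbnailPrefix + albumId);
            foreach (string photoId in ExtractPhotoIds(html))
            {
                string url = string.Format(BigJpegUrlMask, photoId);
                string path = Path.Combine(targetDirectory, photoId + ".jpg");
                webClient.DownloadFile(url, path);
            }
        }
    }
}
```

Calling MijnAlbumDownloadSketch.DownloadAlbum("TYEGMIMD", "demo-album") would then try to fetch the demo album into a demo-album directory, assuming the site still serves these URLs.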

Note that others wrote some scripts too: for instance, Kees Hink wrote Foto’s downloaden van mijnalbum.nl (“Downloading photos from mijnalbum.nl”), and there is the MijnAlbumDownloader app.

Note 2: The free RegEx Expresso tool is very nice for building and testing Regular Expressions.

–jeroen

Posted in .NET, C#, Development, Software Development | 3 Comments »

hard drive – When to stop using a HDD? What rules/software apply?

Posted by jpluimers on 2011/05/23

Recently, I had a hard drive fail: there was no SMART error and the vendor tools didn’t report a problem, but HD Tune did.

I couldn’t boot from the drive any more, and the partitions were only half visible, so I had a clear indication it should be replaced.

Since the vendor tools couldn’t indicate a problem, I returned it using a generic return code, hoping for a replacement under warranty based on the HD Tune results. I was glad to receive a free replacement.

Obvious indicators of course are audible noises like clicking, stiction, repeated seeks, etc.
But those noises are getting harder to hear as drive heads get smaller and the overall noise level of drives drops.

So I collected a few links that help you determine when you should not use an HDD any more:

–jeroen

Posted in Power User | 3 Comments »

About to head to #Taptoe #Mierlo to play with #adestmusica; #bolesian colleagues, come and have a look :-)

Posted by jpluimers on 2011/05/21

At 19:00 tonight I am playing as part of Adest Musica at the Taptoe Mierlo.

So (former) Bolesians and other acquaintances from that area: if you have the time and feel like it, come and see what a beautiful show we put on there.

–jeroen

Posted in About, Adest Musica, Personal | Leave a Comment »

Post 500

Posted by jpluimers on 2011/05/21

This is post 500 after 2 years and 1 month of blogging here.

When starting this blog, I never expected the volume would be like this.

So far, the blog has been a great way of meeting new people, something I didn’t anticipate when starting it. Thanks everyone for commenting and teaching me. I’m never too old to learn something new :-)

One thing I did hope for was that the blog would make it easier to find things again. That really works: somehow my blog scores quite well on various search engines. That is only possible when other sites link to me: they apparently do, and I sure hope it is because of the content here. Thanks for linking!

I’m curious how the blog will evolve over the next year: will post #750 be there? Will the topics have changed or not?

I’m open to suggestions, so please let me know what you like about this blog, and in what direction you would like it to evolve.

–jeroen

Posted in About, Personal | 1 Comment »

MIX2011 Fiddler talk is now live – Fiddler Web Debugger – Site Home – MSDN Blogs

Posted by jpluimers on 2011/05/20

I just found out that the talk Eric Lawrence (EricLaw) gave on Fiddler during MIX2011 is online: he blogged “MIX2011 Fiddler talk is now live”, where you can find the video.

During that talk he:

  • launched the new version of Fiddler2
  • announced that IE9 allows localhost traffic to be intercepted by Fiddler (so no more ipv4.fiddler hacks)
  • indicated that Firefox can now use the INET layer that Fiddler2 intercepts, so there is no more need for FiddlerHook

–jeroen

Posted in Development, Fiddler, Power User, Software Development, Web Development | 1 Comment »

 