
Thursday, March 24, 2011

Getting MetroLyrics Lyrics Saved As A Text File

MetroLyrics - How to save Lyrics as Text


I recently installed Winamp 5.0+ (whatever version it is), and as part of this I wanted to look up info about artists, including lyrics for songs. I was directed to MetroLyrics.com for the lyrics. Here is an example page for a song:

http://www.metrolyrics.com/my-name-is-lyrics-eminem.html

As you can see, the page shows the lyrics, but you can't cut and paste them, nor can you view the source and read the lyrics text that way. So how to get around this? After a bit of looking I found the following. There is a div tag with the following HTML:

<div id="lyrics">&#83;&#116;&#101;&#112;&#32; ... etc ... 

and above this there is a second div with the following id:

<div id="lyricsBox">

I'm not 100% sure how they stop you from saving the text with cut and paste. My theory is that JavaScript is used to stop you right-clicking and perhaps to make the lyrics div non-selectable via a Ctrl+A keystroke, but I'm not sure. What I have noticed is that, apart from whatever CSS / JS is used, they also encode the lyrics: each human-readable character is replaced by an HTML numeric character reference holding its Unicode code point. Hence the huge set of '&#xxx;' values in the lyrics div: each one is the encoded form of a single character in the lyrics (for example, '&#83;' is 'S').
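To make the encoding concrete, here is a minimal Python sketch that decodes the short sample shown above. It uses only the standard library's html.unescape; the function name decode_entities is my own and purely illustrative.

```python
import html


def decode_entities(encoded: str) -> str:
    """Turn HTML character references such as '&#83;' back into characters."""
    return html.unescape(encoded)


# The first few references from the 'lyrics' div shown above.
sample = "&#83;&#116;&#101;&#112;&#32;"
print(repr(decode_entities(sample)))  # 'Step '
```

This is the same decoding a browser performs when it renders the div.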

My first thought was to view the source, cut the encoded text out of the 'lyrics' div and then run it through a web-based Unicode-to-ASCII converter:

http://www.1pagedesign.com/unicode_ascii_converter/ 
http://www.mikezilla.com/exp0012.html

However, the size limits of those tools and the plain HTML break tags (ie: <br />) embedded among the encoded characters caused some issues. So instead I decided to simply let Firefox's HTML parser display the div, but without whatever CSS and JavaScript is being used to stop the cut and paste. I view the source, search for the div ids 'lyricsBox' and 'lyrics', copy the source from the start of the 'lyricsBox' div through to the end of the 'lyrics' div containing the encoded text, and save this in a new HTML file. Since I'm not copying any of the associated CSS or JavaScript, the HTML I copy is just a couple of plain divs with no formatting or attached JavaScript behavior, which as I mentioned earlier is what I believe blocks cut and paste. Viewing this file in a browser then displays the lyrics correctly: the browser decodes the '&#xxx;' values into the appropriate characters automatically, with nothing further required from you and no fancy formatting (ie: plain text display in the browser).
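If you would rather skip the browser step, the same idea can be scripted. The following is only a rough sketch, not what I actually did: it assumes the page source contains a literal <div id="lyrics"> with no nested divs inside it, and the function name extract_lyrics is made up for illustration. It turns the <br /> tags into newlines (the thing that tripped up the online converters) and then decodes the character references.

```python
import html
import re


def extract_lyrics(page_source: str) -> str:
    """Pull the text out of <div id="lyrics"> and decode it to plain text.

    Assumes the div contains no nested <div> elements, so the first
    closing </div> after the opening tag ends the lyrics.
    """
    match = re.search(r'<div id="lyrics">(.*?)</div>', page_source, re.DOTALL)
    if match is None:
        raise ValueError('no <div id="lyrics"> found in the page source')
    body = match.group(1)
    # Break tags sit between the encoded characters; make them newlines.
    body = re.sub(r"<br\s*/?>", "\n", body, flags=re.IGNORECASE)
    # Drop any other stray tags, then decode the '&#xxx;' references.
    body = re.sub(r"<[^>]+>", "", body)
    return html.unescape(body).strip()
```

You would paste the saved page source into a string (or read it from a file) and pass it to extract_lyrics to get the lyrics back as ordinary text.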

Further, by copying the 'lyricsBox' div I not only get the lyrics stored in the 'lyrics' div but also the artist name, song title and songwriter name, with a bit of bold formatting to make them stand out. Great!!!

So this is a quick and simple way to get around the blocked cut & paste and still get your hands on an HTML or text version of the lyrics. Once the browser has parsed the HTML it appears as plain text, so you can easily cut and paste it if you want to store the lyrics as text rather than as HTML with encoded characters. I tend to just copy a number of lyrics fragments (each being the 2 divs of info) one after another into a single HTML file and then view the whole lot of them at once.
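Gluing several copied fragments into one viewable file can also be done with a few lines of Python. This is just a sketch under my own assumptions: the name combine_fragments and the <hr /> separator between songs are my additions, and each fragment is taken to be the copied 'lyricsBox' div as a string.

```python
def combine_fragments(fragments):
    """Join copied 'lyricsBox' fragments into one minimal HTML page.

    No CSS or JavaScript is included, so the browser just decodes
    and displays the text.
    """
    # The <hr /> rule between songs is an illustrative choice.
    body = "\n<hr />\n".join(fragment.strip() for fragment in fragments)
    return "<html>\n<body>\n" + body + "\n</body>\n</html>\n"
```

Write the returned string to a file with an .html extension and open it in the browser to see all the lyrics on one page.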

Enjoy :-)

B.T.W: I thought of this because I first noticed that while the page was still loading it was possible to highlight the lyrics text and use Ctrl+A to grab it, and that viewing the source before the page had fully loaded still showed the text as plain readable characters rather than '&#xxx;' values. This implied that the removal of the cut & paste ability and the encoding of the text was something that happened after the page loaded (ie: triggered by CSS or, more likely, JavaScript) ... which made me think that removing the JavaScript (in this case ALL JavaScript on the page) would halt the process at a point where I could still get at the text. Originally I thought of using Firebug to break into the div element and manually remove any associated JavaScript or CSS, but the final process described above was much simpler and easier to use.
