Friday, February 27, 2015

How to Scrape the Twitter Notifications Timeline

A while back, I promised I would share the fugly hack I'm using to scrape Twitter profile pics (as live links) to use at the bottom of each blog post. (Scroll to the bottom to see what I mean.) The code ain't pretty, but it works. And it's fast.

You don't have to know JavaScript to use the code. In Firefox, open a Scratchpad window (Shift+F4); in Chrome, press Control-Shift-J to open the console. Make sure your active browser tab is showing the Twitter Notifications timeline. Paste the code (below) into the Scratchpad or console. In Firefox, press Control-L to run the code and display the result, which shows up in the Scratchpad itself. In Chrome, hit Enter (Return). You should get a dump of a lot of raw HTML, followed by a count of the links found. Copy and paste that into your web page. (Good luck getting WordPress to display it the way you want! But at least you now have the raw HTML.)

The code was formatted using http://hilite.me/.


/*
You want all elements that carry the classes "stream-item-content clearfix stream-item-activity stream-item-retweet stream-item-activity-me"
You want all <a> nodes within those.
*/

function getRTers() {

  // getElementsByClassName matches elements that carry ALL of these classes
  var tClass = 'stream-item-content clearfix stream-item-activity stream-item-retweet stream-item-activity-me';
  var cl = document.getElementsByClassName(tClass);
  var r = [];   // for caching the hits
  var lut = {}; // for unduping the hits

  for (var i = 0; i < cl.length; i++) {

    var item = cl[i];
    // getElementsByTagName always returns a collection (possibly empty), never null
    var a = item.getElementsByTagName('a');

    // dig all links out
    for (var j = 0; j < a.length; j++) {

      // no avatar(s)? just move on
      var imgs = a[j].getElementsByTagName('img');
      if (imgs.length === 0)
        continue;

      // need to ensure expanded URL, not relative URL:
      // reading a[j].href returns the resolved, absolute profile URL
      var profileUrl = a[j].href.toString();

      if (profileUrl in lut) // undupe
        continue;
      lut[profileUrl] = 1;   // mark as visited

      // write the absolute URL back into the markup we are about to copy
      a[j].setAttribute('title', profileUrl);
      a[j].setAttribute('href', profileUrl);

      r.push(a[j].outerHTML);  // save the markup
    }
  }
  return r; // return the hits
}


// Now use the function:

var r = getRTers();
r.join('\n') + '\n' + r.length; // displays in console
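The "expanded URL" step relies on a browser detail: reading an anchor's href property returns the URL already resolved against the page address, even when the markup holds a relative path such as /someone. If you ever process saved markup outside the browser, here is a minimal sketch of the same resolution using the standard URL constructor (available in modern browsers and Node.js). The helper name toAbsolute and the sample addresses are hypothetical.

```javascript
// Sketch: resolve a link against the page it came from, which is what
// reading a[j].href does for you inside the browser.
// toAbsolute and the sample URLs are hypothetical illustrations.
function toAbsolute(href, pageUrl) {
  // new URL(input, base) resolves input relative to base;
  // an input that is already absolute is returned as-is (normalized).
  return new URL(href, pageUrl).href;
}

var page = 'https://twitter.com/i/notifications';
console.log(toAbsolute('/someone', page));                   // https://twitter.com/someone
console.log(toAbsolute('https://twitter.com/other', page));  // https://twitter.com/other
console.log(toAbsolute('someone', page));                    // https://twitter.com/i/someone
```

Note the last case: a path without a leading slash resolves relative to the page's directory, which is exactly the kind of surprise you avoid by letting the browser (or the URL constructor) do the work.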

The code relies on the fact that each retweet notification sits inside an element tagged with a big, long list of CSS classes. How did I figure out the class names? I used Firefox's Inspect Element (right-click any part of any web page and choose Inspect Element from the pop-up menu).
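One detail worth knowing: when you hand getElementsByClassName a space-separated string, it does not look for that exact string. It matches any element that carries every listed class, in any order, possibly alongside other classes. The sketch below mimics that rule on plain strings so you can see it without a browser; hasAllClasses is a hypothetical helper for illustration, not a DOM API.

```javascript
// Sketch of the matching rule getElementsByClassName applies to a
// space-separated class list: every required class must be present,
// order doesn't matter, and extra classes are allowed.
// hasAllClasses is hypothetical; it is not part of the DOM.
function hasAllClasses(elementClassName, requiredClasses) {
  var have = elementClassName.split(/\s+/).filter(Boolean);
  var need = requiredClasses.split(/\s+/).filter(Boolean);
  return need.every(function (c) { return have.indexOf(c) !== -1; });
}

var tClass = 'stream-item-content clearfix stream-item-activity stream-item-retweet stream-item-activity-me';

// Same classes in a different order, plus an extra one: still a match.
console.log(hasAllClasses(
  'clearfix stream-item-retweet stream-item-content stream-item-activity-me stream-item-activity extra',
  tClass)); // true

// Missing "stream-item-retweet": no match.
console.log(hasAllClasses(
  'stream-item-content clearfix stream-item-activity stream-item-activity-me',
  tClass)); // false
```

In the browser, the equivalent CSS selector for querySelectorAll chains the classes with dots: '.stream-item-content.clearfix.stream-item-activity.stream-item-retweet.stream-item-activity-me'.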

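Not much else to explain here, really. I do go to the trouble of de-duplicating the links, since the same person can show up in more than one notification. For that, I use a lookup table (although I only check whether a key is present; I never read back the value stored under it):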

var lut = {};  // new object (aka lookup table)

In JavaScript, an object is essentially a hash table, which you can think of as an array that uses text to index into it instead of a number. (Under the covers, the engine hashes those strings, but that's not our concern.) You can do

lut[ "whatever text you want" ] = 1;

and the number one gets associated with the index string "whatever text you want". There's no magic to the number one in the above code; I just have to store something to mark the index as taken. It could just as well have been true or zero or Math.PI, or whatever, because the code tests for a key with the in operator, which checks whether the key exists, not whether its value is truthy.
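To make the point about the marker value concrete, here is a small sketch of the same lookup-table de-duplication pulled out of the scraper, run once with 1 and once with 0 as the marker. Because the test is "key in lut", both give the same answer; a truthiness test such as if (lut[key]) would break with 0. For comparison it also shows newer JavaScript's Set, which stores keys without a dummy value. The function names and sample URLs are hypothetical.

```javascript
// Sketch: de-duplicating with a plain object as a lookup table,
// the same idea getRTers() uses. undupe/undupeWithSet are hypothetical names.
function undupe(urls, markValue) {
  var lut = {};   // keys are the strings we've already seen
  var out = [];
  for (var i = 0; i < urls.length; i++) {
    var key = urls[i];
    if (key in lut) continue;  // "in" asks "does this key exist?", not "is it truthy?"
    lut[key] = markValue;      // any value works as the marker: 1, true, 0, Math.PI...
    out.push(key);
  }
  return out;
}

// The same job with a Set, which keeps insertion order and needs no marker value.
function undupeWithSet(urls) {
  return Array.from(new Set(urls));
}

var urls = ['https://twitter.com/a', 'https://twitter.com/b', 'https://twitter.com/a'];
console.log(undupe(urls, 1));      // a, then b, each once
console.log(undupe(urls, 0));      // same result: the stored value is never read
console.log(undupeWithSet(urls));  // same result again
```

One caveat with plain objects: in also sees inherited properties, so a key like "toString" would look already taken. That can't happen with full profile URLs, but Object.create(null) or a Set sidesteps it entirely.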

When you're done, in any case, you get the HTML markup that produces this lovely mosaic:



And those are the wonderful people who retweeted me yesterday. I want to thank each and every person shown above. Please follow these great people. They retweet!

Have you joined the mailing list? What are you waiting for? 
