Wednesday, January 29, 2014

Why is [programming language] so...?


I saw an infographic recently called Why is [state] so... and thought it would be interesting to do the same for programming languages. Unfortunately, there's no convenient visualization (like a map) for programming languages, so screen shots will have to do.  Here are a few results for the languages that Google actually recognizes as programming languages.

ASP.NET


C++


C#


Cobol


Erlang


Fortran


Haskell


Java


JavaScript


Lisp


Objective-C


Perl


PHP


Python


Ruby


Smalltalk


VB.NET



As a long-time Java programmer, I'm just pleasantly surprised that "popular" was the top result instead of "slow."

Thursday, December 26, 2013

Creating a Twitter 'Bot on Google App Engine in Python

I've been running a Twitter 'bot from my laptop using Windows Task Scheduler for the past several months, and finally decided that it's time to upload it to a server to run from the cloud. @BountyBot tweets new and interesting bounty questions from Stack Overflow several times per day. By following the instructions below, you can set up your own Twitter 'bot that runs on Google App Engine.

What you'll need to get started:
  1. Python 2.7
  2. A Google App Engine Account
  3. The Google App Engine SDK
  4. A Twitter account (with authentication credentials)
  5. A Python library for accessing the Twitter API
  6. A source of information to tweet about

What is Google App Engine?

Google App Engine is Google's cloud computing platform. It allows you to create web applications and run them on Google's existing infrastructure. GAE supports web applications written in Python, Java, PHP, and the Go programming language.  You can find a lot more information on the Google App Engine page, or keep reading for a quick guide to getting set up and posting a Python app on Google App Engine.

Setting up a Google App Engine Account

You can set up a Google App Engine Account for free.  You're only charged for the resources that your application uses (that is, if you even surpass the free service quotas), so it can be a very good low-cost alternative to traditional web hosting providers that charge a flat monthly rate, particularly for a low-resource application like a Twitter 'bot.

Go to https://accounts.google.com to sign in to the GAE dashboard with your Google account credentials. From there you can create a new application. You'll need to provide an application identifier that will be used in your application's configuration later.

Installing Python 2.7

If you don't already have Python installed, you can just download version 2.7 from the official Python download page. If you already have Python on your machine, the Google App Engine SDK installer (see next section) will check for the correct version for you. If it reports an error, you may need to download the correct version of Python, or re-install Python 2.7 in the default location for your operating system.

Installing the Google App Engine SDK

Google App Engine has its own software development kit (SDK) available for free download that allows you to  quickly get started developing your app. Choose Python from the Downloads page, then download and run the installer for your operating system. This should place a shortcut on your desktop to a program called the Google App Engine Launcher.

Deploying and Testing

Before we get to the Twitter 'bot, let's create a quick test page and deploy it to Google App Engine to make sure everything we've done so far is working. The Google App Engine Hello, World! documentation already shows how to create, test, and deploy an application from the command line. I'm going to show how to do the same simple application from the Google App Engine Launcher desktop program that we downloaded and installed in the last section.

Launch the Google App Engine Launcher program and choose Create New Applicaton... from the File menu.


You can provide a new name, or the same name you provided as an Application Identifier when you created your Google App Engine account earlier. Also provide a parent directory on your system for the project files to be stored, then click the Create button. If you go to the project directory, you'll see that several files were created for you.

  • app.yaml - The configuration file that maps URLs to handler scripts. This file also contains your unique application identifier and a version number that allows you to roll your app back to specific versions from the GAE admin console.
  • favicon.ico - The icon that will be displayed in browser tabs when your app is viewed. The Google App Engine icon is the default.
  • index.yaml - A configuration file that specifies which indexes your app uses in the App Engine datastore. Not used in this application.
  • main.py - A Python script that handles requests using the webapp2 framework.

Select the newly created application in GAE Launcher and click the Run button. If you open the log console you'll be able to see what commands are run.  When the application is running, you'll be able to go to http://localhost:8080/ in your browser and see the program's output. (If instead of a running app you get an error message at this point, you may need to go to Edit > Preferences and set the correct Python path.)

Next, click the Deploy button in GAE Launcher. You'll be prompted for your email address and password, then all of the application files will be uploaded to Google App Engine.  Once this is done, you can visit your application's public URL to view the program's output again. Congratulations, the app is now online!

Getting your Twitter Account Authentication Credentials

Before you can post tweets from your Google App Engine project, you'll need to set up some authentication credentials with your Twitter account.  Sign in to the Twitter Developers page, choose My applications from the menu at the top right, then create a new application.  You'll need to change the application type to Read and Write on the settings tab in order to give the new application access to post tweets to your Twitter account.

You'll need the Consumer key and Consumer secret from the OAuth settings section, and you'll need to create an Access token and Access token secret. Be careful! You want to keep these values secret so that other people can't use them to post status updates to your Twitter account.  I keep them in a separate properties file so that they stay out of my source code, and don't accidentally get published where people can access them. You'll see how these values are used in a later section, when we look at the BountyBot code.

Tweeting in Python

You'll need a Twitter API wrapper in order to post tweets in Python. I used tweepy when creating BountyBot because it makes posting a tweet as simple as possible. Once a status message is composed, posting it on Twitter can be done in four lines of code using tweepy, and three of those are for authentication. It doesn't get much simpler. Conveniently, tweepy is also compatible with Python 2.7.

You can download tweepy by following the instructions on the GitHub project linked above. Since it needs to be uploaded to Google App Engine in order to be used by the web app, I just copied the entire tweepy directory into the project directory for my GAE application.

What does a Twitter 'bot tweet about?

Even if it's just for your own personal amusement, you're going to want to give your 'bot something interesting to tweet about. Fortunately, there are a lot of 'bots already on Twitter to look to for inspiration.  There are 'bots that tweet weather updates, breaking news headlines, stock quotes, the price of Bitcoin, and other seemingly random facts.You're really only limited by your imagination and Twitter's 140-character post limit. Check out sites like ProgrammableWeb, Data.gov, and World Bank for thousands of data sets and APIs to use.

Stack Overflow and Bounties

The Twitter 'bot I'm going to use for this demonstration gets its information from Stack Overflow using the Stack Exchange API. Stack Overflow is a question and answer site for programmers. Professional programmers and students post questions about code that they're writing for other programmers in the community to answer. The best answers get voted up, earning reputation points for the person who posted the answer. If a question doesn't get a good answer for some time, a "bounty" of bonus reputation can be placed on the question by anyone who wants to get an answer (provided they have the extra reputation to spend on a bounty).

Bounties last for seven days, unless the person who placed the bounty awards it early. You can view all of the questions that have open bounties on the featured questions tab.  Since Stack Overflow is a very active site (thousands of questions are posted every day), it sees about 60 bounties posted per day on average. This is why it's convenient to have a Twitter 'bot that posts links to only the most interesting bounty questions (as determined by the amount of the bounty and the number of upvotes the question receives).

All of the questions and answers posted on Stack Overflow are accessible through the Stack Exchange API, including a method for returning information about questions with active bounties. The Python code we'll look at in the next section will call this API method to get all of the bounties posted in the past 8 hours.

Putting it all together

Now that all the pieces are in place, we can see how they all fit together. You can take a look at the full code for BountyBot on GitHub, and I'll explain several key points here.

The tweet_bounty.py file contains all of the updated code for BountyBot to run on Google App Engine. It follows the same basic structure as the "Hello, World!" example that we looked at earlier. The script contains a class named TweetBounty that extends the webapp2.RequestHandler class. The get method of this class is configured to handle requests.

The get method queries the Stack Exchange API for the most recently posted bounties, finds the most interesting bounty in that list, formats it into a 140-character (maximum) message, then posts that message as a status update to Twitter.

  • request_bounties - Requests a list of bounty questions from the Stack Exchange API. The most recent bounties are those that will expire in one week, so the time stamps passed to this method form an eight hour window that ends one week from the current time and date.
  • find_max - Loops through the list of bountied questions and returns the one with the highest bounty amount. Upvotes on the questions are used to break ties.
  • format_status_msg - Takes the maximum bounty question and formats it into a 140-character message for posting to Twitter. (Question title, short link to the question, bounty amount, and most relevant tags that will fit in the 140-character limit.)
  • tweet - Takes the formatted status message and posts it to the Twitter account whose authentication credentials are supplied in the settings.cfg file.
The tweet function is where the magic happens, so it's worth taking a closer look at it here.

# Update the Twitter account authorized
# in settings.cfg with a status message.
def tweet(status):
    config = ConfigParser.RawConfigParser()
    config.read('settings.cfg')
    
    # http://dev.twitter.com/apps/myappid
    CONSUMER_KEY = config.get('Twitter OAuth', 'CONSUMER_KEY')
    CONSUMER_SECRET = config.get('Twitter OAuth', 'CONSUMER_SECRET')
    # http://dev.twitter.com/apps/myappid/my_token
    ACCESS_TOKEN_KEY = config.get('Twitter OAuth', 'ACCESS_TOKEN_KEY')
    ACCESS_TOKEN_SECRET = config.get('Twitter OAuth', 'ACCESS_TOKEN_SECRET')

    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN_KEY, ACCESS_TOKEN_SECRET)
    api = tweepy.API(auth)
    result = api.update_status(status)

The tweet function takes in status as an argument and posts it to Twitter. The first two lines of the function read a configuration file that contain the Twitter authentication credentials we set up earlier, and the next four lines read those values from the file. If we were going to reuse these values it would be worth it to pull this part of the code out into a separate function, but since this script only accesses Twitter once, we can do it all in one place.

The last four lines use the authentication credentials loaded from the file to prove to Twitter that the script has permission to post status messages on the associated Twitter account, then updates the status of that account with the status message passed in as an argument. That's all there is to it.

Scheduling tweets with cron

Now that we've got a web app up and running that can post to Twitter, all that's left is to set up a schedule for when those tweets should be posted using cron. You do this on Google App Engine by creating a cron.yaml file that specifies when you want tasks to be executed. Tasks for BountyBot in cron.yaml have the following format:

- description: daily 1PM tweet
  url: /tweet_bounty
  schedule: every day 13:00
  timezone: America/New_York

The first line is just a description and doesn't change the way cron behaves. The second line is the URL of the script that needs to be run on a schedule. It's a relative URL from the base of the application on GAE. The next line tells cron when to run the script, and the last one specifies what timezone that schedule is based in. If you leave out the timezone, GAE will assume UTC. Since I want BountyBot to post tweets three times per day, I have three entries in my cron.yaml file, one each for 5AM, 1PM, and 9PM in my timezone.

Important: Since scheduled tasks usually do things that you only want to be done on a schedule and not by users visiting the URL of the script (like posting status updates to your Twitter account, for one example), it's important to secure those scripts so that they can only be run by a site administrator (you) and the task scheduler (cron). You can secure a script by adding login: admin to its entry in your app.yaml file.

- url: /tweet_bounty
  script: tweet_bounty.app
  login: admin

You can read more about formatting tasks in cron.yaml in the GAE article Scheduled Tasks With Cron for Python.

Finally...

 If you made it this far, congratulations, you're a Google App Engine expert! Just kidding. This article really only scratches the surface when it comes to Google App Engine, but it does serve as a quick start guide that you can use to get something up and running in a weekend. Be sure to visit the Google App Engine developer's guide to find much more in-depth tutorials, sample code, and videos that explain all the features of the platform. If you run into problems, remember that you can always search or ask the experts on Stack Overflow for some help. Good luck!

Tuesday, April 30, 2013

SICP 2.66: Sets and information retrieval

From SICP section 2.3.3 Sets and information retrieval

The final part of section 2.3.3 asks us to consider a database that contains records, each of which has a key and some data. If the database is represented as an unordered list, a record can be looked up by its key using the following procedure.

(define (lookup given-key set-of-records)
  (cond ((null? set-of-records) false)
        ((equal? given-key (key (car set-of-records)))
         (car set-of-records))
        (else (lookup given-key (cdr set-of-records)))))

We can define simple procedures for making a record out of a key and its data, and for extracting the key and data from an existing record in order to test the procedure above.

(define (key record) (car record))
(define (data record) (cdr record))
(define (make-record key data) (cons key data))

(define database
  (list (make-record 1 'Bill)
        (make-record 2 'Joe)
        (make-record 3 'Frank)
        (make-record 4 'John)))
  
> (lookup 3 database)
'(3 . Frank)
> (data (lookup 1 database))
'Bill

Exercise 2.66 asks us to implement the lookup procedure for the case where the set of records is structured as a binary tree, ordered by the numerical values of the keys.

We can start by including the list->tree and partial-tree procedures given for exercise 2.64, along with a few required supporting procedures.

(define (entry tree) (car tree))
(define (left-branch tree) (cadr tree))
(define (right-branch tree) (caddr tree))
(define (make-tree entry left right)
  (list entry left right))
  
(define (list->tree elements)
  (car (partial-tree elements (length elements))))

(define (partial-tree elts n)
  (if (= n 0)
      (cons '() elts)
      (let ((left-size (quotient (- n 1) 2)))
        (let ((left-result (partial-tree elts left-size)))
          (let ((left-tree (car left-result))
                (non-left-elts (cdr left-result))
                (right-size (- n (+ left-size 1))))
            (let ((this-entry (car non-left-elts))
                  (right-result (partial-tree (cdr non-left-elts)
                                              right-size)))
              (let ((right-tree (car right-result))
                    (remaining-elts (cdr right-result)))
                (cons (make-tree this-entry left-tree right-tree)
                      remaining-elts))))))))

This makes it easier to convert the existing database to one structured as a binary tree.

> (define tree-db (list->tree database))
> tree-db
'((2 . Joe) ((1 . Bill) () ()) ((3 . Frank) () ((4 . John) () ())))

Finally, we can write the new implementation of lookup using element-of-set? as a guide.

(define (lookup given-key set-of-records)
  (cond ((null? set-of-records) #f)
        ((= given-key (key (car set-of-records)))
         (car set-of-records))
        ((< given-key (key (car set-of-records)))
         (lookup given-key (left-branch set-of-records)))
        ((> given-key (key (car set-of-records)))
         (lookup given-key (right-branch set-of-records)))))

> (lookup 3 tree-db)
'(3 . Frank)
> (lookup 1 tree-db)
'(1 . Bill)
> (lookup 5 tree-db)
#f
> (data (lookup 2 tree-db))
'Joe

For links to all of the SICP lecture notes and exercises that I've done so far, see The SICP Challenge.

Sunday, March 24, 2013

SICP 2.63 - 2.65: Sets as binary trees

From SICP section 2.3.3 Example: Representing Sets

So far in this section we've looked at two ways of representing sets. First as unordered lists, then as ordered lists. Now we'll look at how we can represent sets as binary trees, and we'll see what advantages this representation has over ordered lists.

Each node of a tree holds one element of the set, called the "entry" at that node, and a link to each of two other (possibly empty) nodes. The "left" link points to elements smaller than the one at the node, and the "right" link to elements greater than the one at the node. The same set may be represented by a number of different trees. The only requirements for a valid representation is that all elements in the left subtree must be smaller than the node entry and all elements in the right subtree be must be larger than the node entry. Figure 2.16 in the text shows several valid tree representations of the same set of values.



Recall that for an ordered set of n elements, we had to search through (on average) n/2 elements to locate a particular value. We do this by searching through the elements in order. The advantage of a tree representation is that we can cut this effort down to log n if the tree is balanced.


Exercise 2.63 asks us if the following two procedures produce the same results for every tree, and if not how they differ.

(define (tree->list-1 tree)
  (if (null? tree)
      '()
      (append (tree->list-1 (left-branch tree))
              (cons (entry tree)
                    (tree->list-1 (right-branch tree))))))
     
(define (tree->list-2 tree)
  (define (copy-to-list tree result-list)
    (if (null? tree)
        result-list
        (copy-to-list (left-branch tree)
                      (cons (entry tree)
                            (copy-to-list (right-branch tree)
                                          result-list)))))
  (copy-to-list tree '()))

The tree->list-1 procedure checks to see if the tree passed in is null, and if so returns an empty list. If the tree is not null, it creates a list by appending the left branch of the tree, the element at the root node, and the right branch of the tree. Elements of the left and right branches are flattened into lists using recursive calls to tree->list-1. The tree->list-2 procedure defines a helper function copy-to-list that takes the tree and a result-list as arguments. When the tree is null, it returns the result-list that was passed in. The copy-to-list helper function also uses recursive calls to the left and right branches of the tree while building the final result list. These two procedures will produce the same results for every tree.

We're asked to test the two procedures on the trees in figure 2.16.

(define tree1 '(7 (3 (1 () ()) (5 () ())) (9 () (11 () ()))))
(define tree2 '(3 (1 () ()) (7 (5 () ()) (9 () (11 () ())))))
(define tree3 '(5 (3 (1 () ()) ()) (9 (7 () ()) (11 () ()))))

> (tree->list-1 tree1)
'(1 3 5 7 9 11)
> (tree->list-2 tree1)
'(1 3 5 7 9 11)
> (tree->list-1 tree2)
'(1 3 5 7 9 11)
> (tree->list-2 tree2)
'(1 3 5 7 9 11)
> (tree->list-1 tree3)
'(1 3 5 7 9 11)
> (tree->list-2 tree3)
'(1 3 5 7 9 11)

We can see from these results that both procedures return an in-order traversal for every tree.

We're also asked if the two procedures have the same order of growth for a balanced tree, and if not, which one grows more slowly?

We can see from the results above and from inspecting the two procedures that each node of the tree is visited one time by each algorithm. What happens at each of those n steps is subtly different though. The second procedure simply calls cons at each step, which we'll assume is a constant-time operation, so the tree->list-2 procedure has a time complexity of $O(n)$. The first procedure calls append at each step, which we saw in section 2.2.1 has the following definition:

(define (append list1 list2)
  (if (null? list1)
      list2
      (cons (car list1) (append (cdr list1) list2))))

From this definition we can see that the order of growth of append is proportional to the first list argument that's passed in. In the case of tree->list-1, the first list argument is the left branch of the tree, which is about half of a node's elements for a balanced tree. This means that for each recursive call, approximately half of the number of nodes will be in the first list argument as in the previous call. Since the number of elements is cut in half on each of the n calls to append, the tree->list-1 procedure has a complexity of $O(n log n)$ for a balanced tree.


Exercise 2.64 introduces the list->tree procedure, which converts an ordered list to a balanced binary tree using the helper procedure partial-tree that takes as arguments an integer n and list of at least n elements and constructs a balanced tree containing the first n elements of the list.

(define (list->tree elements)
  (car (partial-tree elements (length elements))))

(define (partial-tree elts n)
  (if (= n 0)
      (cons '() elts)
      (let ((left-size (quotient (- n 1) 2)))
        (let ((left-result (partial-tree elts left-size)))
          (let ((left-tree (car left-result))
                (non-left-elts (cdr left-result))
                (right-size (- n (+ left-size 1))))
            (let ((this-entry (car non-left-elts))
                  (right-result (partial-tree (cdr non-left-elts)
                                              right-size)))
              (let ((right-tree (car right-result))
                    (remaining-elts (cdr right-result)))
                (cons (make-tree this-entry left-tree right-tree)
                      remaining-elts))))))))

First we're asked to explain how partial-tree works, then draw the tree produced by list->tree for the list (1 3 5 7 9 11).

The partial-tree procedure works by dividing the list into three parts, a center element (the root node of the tree), everything before the center element, and everything after the center element. All the elements before the center element are then passed to a recursive call to partial-tree to create the left branch of the tree, and all the elements after the center element are passed recursively to partial-tree to create the right branch. These recursive call continue until no elements are remaining, and the balanced binary tree is assembled.

The tree produced by list->tree for the list (1 3 5 7 9 11) is:



To verify this, we can simply call the procedure.

> (list->tree '(1 3 5 7 9 11))
'(5 (1 () (3 () ())) (9 (7 () ()) (11 () ())))

Next we're asked what is the order of growth in the number of steps required by list->tree to convert a list of n elements? The procedure only needs to visit each element of the list once, and it only performs a cons for each element it visits, so the number of steps is proportional to the size of the list, or $O(n)$.


Exercise 2.65 asks us to use the results of the previous two exercises to give $O(n)$ implementations of union-set and intersection-set for sets implemented as (balanced) binary trees.

We implemented union-set for the unordered list representation of sets back in exercise 2.59. This implementation had to check all elements of one set for each element of the other, so it's complexity was $O(n^2)$, quite poor. We improved on this in exercise 2.62 when we wrote an implementation of union-set for the ordered list representation of sets, which was $O(n)$. The text supplied a similar implementation of intersection-set that was also $O(n)$. We could use these ordered set implementations as a guide to writing efficient implementations of union-set and intersection-set for balanced binary trees, but that wouldn't require the results of the previous two exercises. Instead, we can use the $O(n)$ implementations of all of the procedures we've built so far to perform the following steps:

  • Convert the balanced binary trees to ordered lists.
  • Perform the desired operation (union-set or intersection-set).
  • Convert the resulting ordered set back to a balanced binary tree.

(define (union-set tree1 tree2)
  (define (union-list set1 set2)
    (cond ((null? set1) set2)
          ((null? set2) set1)
          ((= (car set1) (car set2))
           (cons (car set1) (union-list (cdr set1) (cdr set2))))
          ((< (car set1) (car set2))
           (cons (car set1) (union-list (cdr set1) set2)))
          (else (cons (car set2) (union-list set1 (cdr set2))))))
  (list->tree (union-list (tree->list-2 tree1)
                          (tree->list-2 tree2))))

(define (intersection-set tree1 tree2)
  (define (intersection-list set1 set2)
    (if (or (null? set1) (null? set2))
        '()    
        (let ((x1 (car set1)) (x2 (car set2)))
          (cond ((= x1 x2)
                 (cons x1
                       (intersection-list (cdr set1)
                                          (cdr set2))))
                ((< x1 x2)
                 (intersection-list (cdr set1) set2))
                ((< x2 x1)
                 (intersection-list set1 (cdr set2)))))))
  (list->tree (intersection-list (tree->list-2 tree1)
                                 (tree->list-2 tree2))))

In the implementations above, I've just defined the earlier ordered set implementations of union-set and intersection-set as helper functions named union-list and intersection-list. With these helper functions, all union-set and intersection-set need to do is convert from tree to list and back from list to tree. We can define a few balanced trees to test that these new implementations work as expected.

> (define evens (list->tree '(0 2 4 6 8 10)))
> (define odds (list->tree '(1 3 5 7 9)))
> (define primes (list->tree '(2 3 5 7 11 13 17 19)))
>
> evens
'(4 (0 () (2 () ())) (8 (6 () ()) (10 () ())))
> odds
'(5 (1 () (3 () ())) (7 () (9 () ())))
> primes
'(7 (3 (2 () ()) (5 () ())) (13 (11 () ()) (17 () (19 () ()))))
>
> (union-set odds evens)
'(5
  (2 (0 () (1 () ())) (3 () (4 () ())))
  (8 (6 () (7 () ())) (9 () (10 () ()))))
> (union-set odds odds)
'(5 (1 () (3 () ())) (7 () (9 () ())))
> (intersection-set evens primes)
'(2 () ())
> (intersection-set odds primes)
'(5 (3 () ()) (7 () ()))
> (intersection-set odds evens)
'()


For links to all of the SICP lecture notes and exercises that I've done so far, see The SICP Challenge.

Saturday, September 1, 2012

12 (Really) Controversial Programming Opinions

A few days ago, Yannis Rizos posted 20 controversial programming opinions on the Programmers Community Blog. Judging by the comments on the blog, and on reddit and Hacker News, none of these opinions are considered all that controversial by the programming community at large. The problem stems from the fact that the opinions posted were selected from among the top-voted answers to Jon Skeet’s question What’s your most controversial programming opinion?, originally asked on Stack Overflow on January 2, 2009. People seem to have voted for answers they strongly agreed with, making those top answers some of the least controversial opinions you could gather.

I decided to take a different approach. What follows are some of the opinions that I found near the middle or at the end of the list. I tried to pick only answers where the author made an attempt at supporting their opinion, but as you can see some of these opinions were downvoted more heavily than they were upvoted by the Stack Overflow community. (I'll add that my own "controversial" opinion is that Jon's question is perhaps the best argument we have that these types of opinion polls simply do not work on Stack Overflow.)

1. Two lines of code is too many. (+20/-32) by Jay Bazuzi

If a method has a second line of code, it is a code smell. Refactor.

2. If it's not native, it's not really programming (+5/-15) by Mason Wheeler

By definition, a program is an entity that is run by the computer. It talks directly to the CPU and the OS. Code that does not talk directly to the CPU and the OS, but is instead run by some other program that does talk directly to the CPU and the OS, is not a program; it's a script. Read more

3. The "While" construct should be removed from all programming languages. (+6/-14) by seanyboy

You can easily replicate While using "Repeat" and a boolean flag, and I just don't believe that it's useful to have the two structures. In fact, I think that having both "Repeat...Until" and "While..EndWhile" in a language confuses new programmers. Read more

4. Copy/Pasting is not an antipattern, it fact it helps with not making more bugs (+4/-5) by serg

My rule of thumb - typing only something that cannot be copy/pasted. If creating similar method, class, or file - copy existing one and change what's needed. (I am not talking about duplicating a code that should have been put into a single method). Read more

5. Developing on .NET is not programming. Its just stitching together other people's code. (+7/-5) by Gerard

Having come from a coding background where you were required to know the hardware, and where this is still a vital requirements in my industry, I view high level languages as simply assembling someone else's work. Nothing essentially wrong with this, but is it 'programming'? Read more

6. The use of try/catch exception handling is worse than the use of simple return codes and associated common messaging structures to ferry useful error messages. (+11/-3) by Einstein

Littering code with try/catch blocks is not a solution.

Just passing exceptions up the stack hoping whats above you will do the right thing or generate an informative error is not a solution. Read more

7. Test Constantly (+15/-7) by PJ Davis

You have to write tests, and you have to write them FIRST. Writing tests changes the way you write your code. It makes you think about what you want it to actually do before you just jump in and write something that does everything except what you want it to do. Read more

8. Object Oriented Programming is absolutely the worst thing that's ever happened to the field of software engineering. (+34/-14) by Breton

The primary problem with OOP is the total lack of a rigorous definition that everyone can agree on. This easily leads to implementations that have logical holes in them, or language like Java that adhere to this bizarre religious dogma about what OOP means, while forcing the programmer into doing all these contortions and "design patterns" just to work around the limitations of a particular OOP system. Read more

9. C (or C++) should be the first programming language (+24/-5) by hansen j

The first language should NOT be the easy one, it should be one that sets up the student's mind and prepare it for serious computer science.
C is perfect for that, it forces students to think about memory and all the low level stuff, and at the same time they can learn how to structure their code (it has functions!)

C++ has the added advantage that it really sucks :) thus the students will understand why people had to come up with Java and C#.

10. Classes should fit on the screen. (+22/-7) by Jay Bazuzi

If you have to use the scroll bar to see all of your class, your class is too big.

Code folding and miniature fonts are cheating.

11. Making invisible characters syntactically significant in python was a bad idea (+43/-5) by Paul Wicks

It's distracting, causes lots of subtle bugs for novices and, in my opinion, wasn't really needed. About the only code I've ever seen that didn't voluntarily follow some sort of decent formatting guide was from first-year CS students. And even if code doesn't follow "nice" standards, there are plenty of tools out there to coerce it into a more pleasing shape.

12. Singletons are not evil (+42/-7) by Steve

There is a place for singletons in the real world, and methods to get around them (i.e. monostate pattern) are simply singletons in disguise. For instance, a Logger is a perfect candidate for a singleton. Additionally, so is a message pump. My current app uses distributed computing, and different objects need to be able to send appropriate messages. There should only be one message pump, and everyone should be able to access it. The alternative is passing an object to my message pump everywhere it might be needed and hoping that a new developer doesn't new one up without thinking and wonder why his messages are going nowhere. The uniqueness of the singleton is the most important part, not its availability. The singleton has its place in the world.

Sunday, July 1, 2012

SICP 2.61 - 2.62: Sets as ordered lists

From SICP section 2.3.3 Example: Representing Sets

In exercises 2.59 and 2.60 we looked at how to represent sets as unordered lists. In the next two we'll look at representing sets as ordered lists, and at what benefits we get from the new implementation.

First, if elements in a set are guaranteed to be in order, we can improve the performance of element-of-set by returning false as soon as we encounter an element of the set that's less than the value that we're looking for. All of the remaining elements are guaranteed to be smaller as well. This saves us on average half the number of comparisons from the previous implementation.
(define (element-of-set? x set)
  (cond ((null? set) false)
        ((= x (car set)) true)
        ((< x (car set)) false)
        (else (element-of-set? x (cdr set)))))
We get a more impressive improvement from intersection-set. In the old implementation using unordered lists we had to compare every element of each set to every element in the other set. This had a complexity of $O(n^2)$. With the elements of both sets in order we can reduce that all the way down to $O(n)$. This is possible due to the fact that if the car of one ordered set is smaller than the car of another ordered set, it has to be smaller than all the other elements and we can discard it right away.
(define (intersection-set set1 set2)
  (if (or (null? set1) (null? set2))
      '()    
      (let ((x1 (car set1)) (x2 (car set2)))
        (cond ((= x1 x2)
               (cons x1
                     (intersection-set (cdr set1)
                                       (cdr set2))))
              ((< x1 x2)
               (intersection-set (cdr set1) set2))
              ((< x2 x1)
               (intersection-set set1 (cdr set2)))))))
Exercise 2.61 asks us to give an implementation of adjoin-set using the ordered representation. Like element-of-set?, our implementation of adjoin-set should require only half as many steps on average as the unordered representation.

The original implementation of adjoin-set made use of element-of-set? to check and see if the new element was already a member of the set. We no longer need to do this since we need to find the exact position in the set to insert the new element. As we scan through the set looking for the right position, we can simply return the set if we encounter the element we're trying to place.
(define (adjoin-set x set)
  (cond ((null? set) (cons x '()))
        ((= x (car set)) set)
        ((< x (car set)) (cons x set))
        ((> x (car set)) (cons (car set)
                               (adjoin-set x (cdr set))))))
There are several different conditions, and we need to cover them all. First, since we're no longer using element-of-set, we need to check to see if the set itself is null or empty. Second, if the we encounter the element we're trying to add, we can just return the set. Next, if the element we're adding is smaller than the first element of the set, we can simply cons the new element to the beginning of the set. Last, if the new element is greater than the first element of the set, we join the first element followed by the adjoin-set of the new element and the rest of the set.
> (adjoin-set 3 null)
(3)
> (adjoin-set 3 '())
(3)
> (adjoin-set 3 '(3 4 5))
(3 4 5)
> (adjoin-set 3 '(4 5 6))
(3 4 5 6)
> (adjoin-set 3 '(1 2 4 5))
(1 2 3 4 5)
> (adjoin-set 3 '(0 1 2))
(0 1 2 3)
Like element-of-set?, this implementation should only scan through half the set on average before finding the correct insertion point for the new element.

Exercise 2.62 asks us to give a $O(n)$ implementation of union-set for sets represented as ordered lists. Since the two sets are in order, we can simply step through each set comparing the first elements of each. At each step we decide which of the first two elements to place in the resulting set.
(define (union-set set1 set2)
  (cond ((null? set1) set2)
        ((null? set2) set1)
        ((= (car set1) (car set2))
         (cons (car set1) (union-set (cdr set1) (cdr set2))))
        ((< (car set1) (car set2))
         (cons (car set1) (union-set (cdr set1) set2)))
        (else (cons (car set2) (union-set set1 (cdr set2))))))
Anyone familiar with the mergesort algorithm will quickly recognize that this implementation of union-set is almost exactly the same procedure as the merge subroutine. It's purpose is to quickly merge two sorted lists into one. The only difference is that the union-set implementation above only keeps one copy of duplicate elements.
> (define odds '(1 3 5 7))
> (define evens '(2 4 6 8))
> (define primes '(2 3 5 7))
> (union-set odds evens)
(1 2 3 4 5 6 7 8)
> (union-set odds primes)
(1 2 3 5 7)
> (union-set evens primes)
(2 3 4 5 6 7 8)

For links to all of the SICP lecture notes and exercises that I've done so far, see The SICP Challenge.

Monday, May 28, 2012

SICP 2.59 - 2.60: Sets as unordered lists

From SICP section 2.3.3  Example: Representing Sets

Section 2.3.3 shows us several ways to represent sets in Scheme, starting with unordered sets. To start, we define what a 'set' is by specifying the operations that are to be used on sets. These are:

  • element-of-set? - a predicate function that determines whether a given element is a member of a set.
  • adjoin-set - takes an object and a set as arguments and returns the set that contains the elements of the original set and the adjoined object.
  • intersection-set - computes the intersection of two sets, which is the set containing only elements that appear in both arguments.
  • union-set - computes the union of two sets, which is the set containing each element that appears in either argument.

We represent a set as an unordered list by making sure no element appears more than once. The text provides the following implementations for representing sets as unordered lists.

(define (element-of-set? x set)
  (cond ((null? set) false)
        ((equal? x (car set)) true)
        (else (element-of-set? x (cdr set)))))

(define (adjoin-set x set)
  (if (element-of-set? x set)
      set
      (cons x set)))

(define (intersection-set set1 set2)
  (cond ((or (null? set1) (null? set2)) '())
        ((element-of-set? (car set1) set2)        
         (cons (car set1)
               (intersection-set (cdr set1) set2)))
        (else (intersection-set (cdr set1) set2))))

We can define a few lists to test and see how these procedures work together.

> (define odds '(1 3 5 7))
> (define evens '(2 4 6 8))
> (define primes '(2 3 5 7))
> (element-of-set? 5 odds)
#t
> (element-of-set? 5 evens)
#f
> odds
(1 3 5 7)
> (adjoin-set 9 odds)
(9 1 3 5 7)
> (intersection-set odds evens)
()
> (intersection-set odds primes)
(3 5 7)
> (intersection-set evens primes)
(2)

Exercise 2.59 asks us to implement the union-set operation for the unordered-list representation of sets.

(define (union-set set1 set2)
  (cond ((null? set1) set2)
        ((null? set2) set1)
        ((element-of-set? (car set1) set2) (union-set (cdr set1) set2))
        (else (cons (car set1) (union-set (cdr set1) set2)))))

This implementation starts by checking to see if either set is null. If so, the other set is returned immediately. The procedure then checks to see if the first element of set1 is an element of set2. If so, that element is discarded from the first set, and the union of the remaining elements of set1 and set2 are returned. If the first element of set1 is not an element of set2, it is included in the list returned by the procedure.

> (union-set odds evens)
(1 3 5 7 2 4 6 8)
> (union-set odds primes)
(1 2 3 5 7)
> (union-set evens primes)
(4 6 8 2 3 5 7)


Exercise 2.60 asks us what would need to change in the implementations above if we redefine 'set' to allow duplicate elements. For example, the set {1, 2, 3} could be represented as the list (2 3 2 1 3 2 2). We need to compare the efficiency of each of the new procedures with their original (non-duplicate) implementations. The exercise also asks if there are applications where the new representation would be preferred over the original.

The implementation of element-of-set? doesn't need to change. It returns #t when it finds an element that matches the input element, otherwise it returns #f. The implementation of adjoin-set is simplified by the new definition. Since we no longer need to check to see if the input element already exists in the set, we can just cons the element to the existing set.

(define (adjoin-set x set)
   (cons x set))

Like adjoin-set, union-set is also simplified by allowing duplicate elements.

(define (union-set set1 set2) 
   (append set1 set2))

We have an interesting choice when it comes to intersection-set. The intersection of two sets under the original (non-duplicate) definition doesn't seem like it requires any change in implementation since duplicates are now allowed, not necessarily required. However, look what happens when you execute intersection-set with duplicate elements in the first set vs. the second set.

> (define primes '(2 3 5 7))
> (define twos '(2 2 2))
> (intersection-set primes twos)
(2)
> (intersection-set twos primes)
(2 2 2)

The result is different depending on which set has duplicate elements. This is because the original implementation checks each element of the first set independently to see if they're present in the second set. When the first set contains duplicates, so will the result. Since the instructions in the exercise are ambiguous (and being the typical lazy programmer that I am), I'm going to say that this implementation does not need to change.

Since element-of-set? and intersection-set haven't changed, neither will their performance. Since adjoin-set and union-set no longer need to check for duplicate elements, the performance of both of these procedures is improved. The number of steps required by adjoin-set has gone from $O(n)$ to $O(1)$. The number of steps required by union-set has gone from $O(n^2)$ to $O(n)$.

The penalty that we pay for these performance improvements is that sets now require more memory to accommodate duplicate elements. This representation would be preferred over the original in cases where we don't care about that added memory overhead, and where most of our operations are either adjoin-set or union-set.


 Related:

For links to all of the SICP lecture notes and exercises that I've done so far, see The SICP Challenge.