Sunday, January 9, 2011

String formatting in Python

You have written a piece of code in your favorite language, Python. The code does some fancy (or simple) calculations, and now you want to output the results in a format that is easier and more pleasant for humans to read. Keep reading to find out about some of the options available in Python (version 2.6 and above) for basic formatting of output produced with print statements. Specifically, below I discuss:

  1. how to set the width of the column in which the output is printed;
  2. how to align the output:
    • left, center or right alignment;
    • aligning digits so that positive and negative numbers start in the same column;
  3. how to set the precision of floating-point numbers;
  4. some other examples, such as printing numbers with thousands separators or as percentages.

Not all possibilities are discussed below. For more information, see http://www.python.org/dev/peps/pep-3101/

Suppose you have a list of tuples called lang_info, where each tuple holds the name of a language, the year it was developed (taken from this Wikipedia article) and its TIOBE rating. You want to print this information as a simple text table.



lang_info =  [('Fortran', 1954, 0.435), ('Cobol', 1959, 0.391),
              ('C', 1972, 16.076), ('C++', 1980, 9.014), 
              ('Python', 1991, 6.482), 
              ('Java', 1995, 17.99), ('C#', 2001, 6.687)]


If you use the code:

print "Language Year Developed TIOBE rating"
print "--------------------------------------"
for element in lang_info:
    print element[0], element[1], element[2]

you will get the output (call this output_1):

Language Year Developed TIOBE rating
--------------------------------------
Fortran 1954 0.435
Cobol 1959 0.391
C 1972 16.076
C++ 1980 9.014
Python 1991 6.482
Java 1995 17.99
C# 2001 6.687

This is not very easy or pleasant to read. Let us format it so that we get the output:

Language      Year Developed      TIOBE rating
----------------------------------------------
Fortran            1954                   0.43
Cobol              1959                   0.39
C                  1972                  16.08
C++                1980                   9.01
Python             1991                   6.48
Java               1995                  17.99
C#                 2001                   6.69

The first step is to replace the literal strings (such as "Language" and "Year Developed") and the variables (element[0], element[1], element[2]) that we want to format with what are known as replacement fields.

So instead of:

print "Language Year Developed TIOBE rating"
print "--------------------------------------"
for element in lang_info:
    print element[0], element[1], element[2]

we have,

print "{0} {1} {2}".format("Language", "Year Developed",
                            "TIOBE rating")
print "-"*46
for element in lang_info:
    print "{0} {1} {2}".format(element[0], element[1], element[2])

Each pair of curly braces, together with its contents, is a replacement field. The number inside is called the field name and gives the position (counting from zero) of the argument to the .format method that replaces the field. For example, {1} is replaced by the second argument, the one in position 1. Running the above code gives the same output as output_1 (apart from the longer line of dashes), because we have not applied any formatting yet. Next we add what are known as 'format specifiers' to the replacement fields. These are separated from the field name by a colon (:). The general form of the format specifier is

[[fill]align][sign][#][0][minimumwidth][.precision][type]

We will discuss only some of these flags (specifically align, sign, minimumwidth, .precision and type), and only some of the possible values for each.
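To make the two parts of a replacement field concrete, here is a small sketch (the strings and variable names are made up for illustration). The part before the colon selects the argument by position, counting from zero; the part after the colon is the format specifier. The snippet uses single-argument print(...) calls, which should run unchanged under both Python 2.6+ and Python 3.

```python
# Field names select arguments by position, counting from zero.
swapped = "{1} was developed before {0}".format("Python", "C")

# The same argument can be referenced more than once.
repeated = "{0}, {0}, {1}".format("spam", "eggs")

# Everything after the colon is the format specifier;
# here it is just a minimum width of 8 characters.
padded = "[{0:8}]".format("C++")

print(swapped)   # C was developed before Python
print(repeated)  # spam, spam, eggs
print(padded)    # [C++     ]
```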

First, note that all flags are optional. There are two caveats:

  • if you specify two or more flags, they must appear in the same order as in the general form of the format specifier;
  • if you want to use fill, you must also give a value for the align flag.
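The two caveats can be checked with a short experiment. This is only a sketch with arbitrarily chosen values; the variable names are hypothetical.

```python
# fill ('*') is only recognised when it is followed by an align flag ('^').
filled = "{0:*^12}".format("C")

# Several flags in the documented order:
# align ('>'), sign ('+'), minimumwidth (10), precision (.2), type (f).
ordered = "{0:>+10.2f}".format(16.076)

# A fill character without an align flag is rejected with a ValueError.
try:
    "{0:*12}".format("C")
    fill_without_align_ok = True
except ValueError:
    fill_without_align_ok = False

print(filled)                 # *****C******
print(ordered)                #     +16.08
print(fill_without_align_ok)  # False
```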

Let us first set the column width. The minimumwidth flag allows us to do this. It is an integer that specifies the minimum width of the column: shorter values are padded with spaces, while longer values are printed in full. The following code:

print "{0:12} {1:16} {2:16}".format("Language", "Year Developed", 
                                     "TIOBE rating")
print "-"*46
for element in lang_info:
    print "{0:12} {1:16} {2:16}".format(element[0], element[1], 
                                       element[2])

gives the following output:

Language     Year Developed   TIOBE rating    
----------------------------------------------
Fortran                  1954            0.435
Cobol                    1959            0.391
C                        1972           16.076
C++                      1980            9.014
Python                   1991            6.482
Java                     1995            17.99
C#                       2001            6.687
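Because the width is only a minimum, it is worth seeing what happens to values that are shorter or longer than it. The following sketch wraps each field in square brackets (an illustrative choice, not part of the table code) so that the padding is visible.

```python
# A value shorter than the width is padded with spaces.
short_value = "[{0:8}]".format("C")

# A value longer than the width is printed in full, never truncated.
long_value = "[{0:4}]".format("Fortran")

# Without an align flag, strings are padded on the right
# and numbers are padded on the left.
text_cell = "[{0:6}]".format("C#")
number_cell = "[{0:6}]".format(2001)

print(short_value)  # [C       ]
print(long_value)   # [Fortran]
print(text_cell)    # [C#    ]
print(number_cell)  # [  2001]
```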

This is already looking better, but there is still room for improvement. By default, strings are left aligned and numbers are right aligned, so the titles of the second and third columns do not line up with the numbers beneath them. We will use the align flag to control this. Some possible values for the align flag are:

'^' -  for center alignment
'<' -  for left alignment 
'>' -  for right alignment
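As a quick sketch of the three values in isolation, the following formats the same string in a field of width 10 (the width and the square brackets are arbitrary choices made here to show the padding).

```python
left = "[{0:<10}]".format("Python")
center = "[{0:^10}]".format("Python")
right = "[{0:>10}]".format("Python")

# An align flag overrides the default right alignment of numbers.
year_left = "[{0:<10}]".format(1991)

print(left)       # [Python    ]
print(center)     # [  Python  ]
print(right)      # [    Python]
print(year_left)  # [1991      ]
```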

Applying the left alignment flag (<) to the year column:

print "{0:12} {1:16} {2:16}".format("Language", "Year Developed", 
                                    "TIOBE rating")
print "-"*46
for element in lang_info:
    print "{0:12} {1:<16} {2:16}".format(element[0], element[1], 
                                         element[2])

gives the following output:

Language     Year Developed   TIOBE rating    
----------------------------------------------
Fortran      1954                        0.435
Cobol        1959                        0.391
C            1972                       16.076
C++          1980                        9.014
Python       1991                        6.482
Java         1995                        17.99
C#           2001                        6.687

Still not great. Let us left align the header of the first column, center align the header of the second column, right align the header of the third column, and center align the contents of the second column (the year column):

print "{0:<12} {1:^16} {2:>16}".format("Language", 
                                       "Year Developed", 
                                       "TIOBE rating")
print "-"*46
for element in lang_info:
    print "{0:12} {1:^16} {2:16}".format(element[0], element[1], 
                                         element[2])

This piece of code gives:

Language      Year Developed      TIOBE rating
----------------------------------------------
Fortran            1954                  0.435
Cobol              1959                  0.391
C                  1972                 16.076
C++                1980                  9.014
Python             1991                  6.482
Java               1995                  17.99
C#                 2001                  6.687

The last column still does not look great, because the numbers have different numbers of digits after the decimal point. We will use the .precision flag and the type flag to fix this:


precision is a whole number that specifies how many digits to display after the decimal point.

type takes many different values (see http://www.python.org/dev/peps/pep-3101/ for the full list). Here, we use f, which displays the number in fixed-point notation.
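Before applying these two flags to the table, here is a minimal sketch of what they do on their own. The value 17.9 is made up to show zero padding; the other numbers come from lang_info.

```python
# .2f rounds to two digits after the decimal point.
rounded = "{0:.2f}".format(16.076)

# Numbers with fewer decimals are padded with zeros.
zero_padded = "{0:.2f}".format(17.9)

# minimumwidth and .precision can be combined: width 8, two decimals.
with_width = "[{0:8.2f}]".format(9.014)

print(rounded)      # 16.08
print(zero_padded)  # 17.90
print(with_width)   # [    9.01]
```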

Hence, using

print "{0:<12} {1:^16} {2:>16}".format("Language", 
                                        "Year Developed", 
                                        "TIOBE rating")
print "-"*46
for element in lang_info:
    print "{0:12} {1:^16} {2:16.2f}".format(element[0], 
                                            element[1], 
                                            element[2])

we have,

Language      Year Developed      TIOBE rating
----------------------------------------------
Fortran            1954                   0.43
Cobol              1959                   0.39
C                  1972                  16.08
C++                1980                   9.01
Python             1991                   6.48
Java               1995                  17.99
C#                 2001                   6.69

That looks much nicer and is much easier to understand.
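The table above never needs the sign flag, because all the ratings are positive. As a sketch with made-up numbers (the changes list is hypothetical), the three values of the sign flag behave as follows. The space value is the one that makes the digits of positive and negative numbers start in the same column.

```python
changes = [1.5, -0.25, 12.0]  # hypothetical values, not from lang_info

# ' ' : a leading space for positive numbers, a minus sign for negative ones
space_sign = ["{0: .2f}".format(x) for x in changes]

# '+' : always show a sign
plus_sign = ["{0:+.2f}".format(x) for x in changes]

# '-' : show a sign only for negative numbers (the default)
minus_sign = ["{0:-.2f}".format(x) for x in changes]

for line in space_sign:
    print(line)
```

With the space value, the first digit of every number lands in the second column, whether or not a minus sign precedes it.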


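Here is another example, this time using the value n for the type flag. It formats a number with the separators appropriate for the current locale, so the locale named below must be available on your system.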

import locale

# Setting the locale to US English
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')

print "Number without formatting applied: 65739838"
print "Number with formatting applied: {0:n}".format(65739838)

The output of this code is:

Number without formatting applied: 65739838
Number with formatting applied: 65,739,838

I think the second is far easier to read than the first.
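The n type depends on the locale settings of the machine. As an alternative sketch, the ',' option and the % type do not. Note the assumption here: the ',' option was added in Python 2.7, so it goes beyond the 2.6 baseline used in the rest of this post. The number 1234567.891 and the fraction 0.25 are made up for illustration.

```python
# ',' inserts a comma as the thousands separator (Python 2.7 and later).
with_commas = "{0:,}".format(65739838)

# ',' sits between minimumwidth and .precision in the specifier.
commas_and_precision = "{0:,.2f}".format(1234567.891)

# '%' multiplies by 100, formats as fixed point and appends a percent sign.
as_percent = "{0:.1%}".format(0.25)

print(with_commas)           # 65,739,838
print(commas_and_precision)  # 1,234,567.89
print(as_percent)            # 25.0%
```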

8 comments:

  1. Thanks, nice summary of the possibilities. I didn't know about the justifiers (<,^,>) and for printing a large integer in a pretty way I always used a custom function :)

  2. Hero of the day!

    Thanks a lot for the exact words I needed to make things click!

  3. I want you to write more python articles, pweety pwease! You be an excellent teacher!

  4. Excellent outline - suits my needs better than any packages that are specifically meant for this.
