As more and more code is written in Python, both PyRosetta protocols and other accessory scripts, it will be good to have consistency and quality in our code. This page presents a list of guidelines and conventions for writing Python code in the Rosetta community.

This page is modeled after the C++ coding conventions; for the list of those conventions, see: Coding Conventions.

If not otherwise stated, the Python style guidelines presented in PEP 8 should be followed.

Conventions

File Layout

All Python code should have the following general file layout:

  • Header
  • Imports
  • Constant Definitions
  • Class & Method Definitions
  • Main Body

Header

Shebang

If the Python code is a script, include the following line, including the path to Python:

#!/usr/bin/env python

Note that, if present, this line must be the very first line in the file, before the copyright header, docstring, or additional comments.

Copyright Notice

The Rosetta Commons copyright header is required for every source code file in Rosetta. Do not make modifications. The header you should use for all .py files is:

# (c) Copyright Rosetta Commons Member Institutions. 
# (c) This file is part of the Rosetta software suite and is made available under license. 
# (c) The Rosetta software is developed by the contributing members of the Rosetta Commons. 
# (c) For more information, see http://www.rosettacommons.org. Questions about this can be 
# (c) addressed to University of Washington CoMotion, email: license@uw.edu.

Main Docstring

The "docstring" for the .py file should go immediately below the copyright notice block comments. (See more on documentation below, including how Doxygen reads Python comments.) This text should be opened and closed by a triplet of double quotes (""").

Include headers such as "Brief:", "Params:", "Output:", "Example:", "Remarks:", "Author:", etc.

Example:

"""Brief:   This PyRosetta script does blah.

Params:  ./blah.py .pdb 

Example: ./blah.py foo.pdb 1000

Remarks: Blah blah blah, blah, blah.

Author:  Jason W. Labonte

"""

Imports

import statements come after the header and before any constants.

  • Import only one module per line. Write:
      import rosetta
      import rosetta.protocols.rigid
    not:
      import rosetta, rosetta.protocols.rigid

    • It is OK to import multiple classes and/or methods from the same module on the same line:
        from rosetta import Pose, ScoreFunction
  • For really long Rosetta namespaces, use namespace aliases, e.g.:
      import rosetta.protocols.loops.loop_closure.kinematic_closure as KIC_protocols

  • Avoid importing * from any module. With large libraries, such as Rosetta, this is a waste of time and memory.

  • Group import statements in the following order with comment headings (e.g., # Python standard library) and a blank line between each group:

    1. Python standard library modules
    2. Other, non-Rosetta, 3rd party modules
    3. Rosetta modules (by namespace)
    4. Your own custom Python modules
  • rosetta.init() belongs in the main body of your script, not in the imports section. (See Main Body below.)

Constants & Module-Wide Variables

Module-wide constants and variables should be defined next, after the imports.

  • Constants should be named in ALL_CAPS_WITH_UNDERSCORES. (See Naming Conventions below.)

  • Avoid using the global statement anywhere in your code. All constants should be treated as read-only.

  • Do not define your own mathematical constants. Use the ones found in the Python math module.
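The constant guidelines above can be sketched as follows. This is only an illustration: DEFAULT_CYCLES, HALF_TURN_DEGREES, degrees_to_radians(), and run_cycles() are hypothetical names that do not come from Rosetta.

```python
import math

# Module-wide constants: ALL_CAPS_WITH_UNDERSCORES, treated as read-only.
DEFAULT_CYCLES = 1000  # hypothetical default, for illustration only
HALF_TURN_DEGREES = 180.0


def degrees_to_radians(angle):
    """Return angle (given in degrees) converted to radians."""
    # Use math.pi rather than defining your own value of pi.
    return angle * math.pi / HALF_TURN_DEGREES


def run_cycles(n_cycles=DEFAULT_CYCLES):
    """Return the number of cycles that would be run."""
    # The constant is only read, as a default value; no global is needed.
    return n_cycles
```

Passing the constant in as a default argument keeps the function usable with other values while leaving the module-wide constant untouched.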

Class & Method Definitions

Classes and exposed methods should come next in the code.

  • Add two blank lines between each class and exposed method.

  • Add one blank line between each class method.

  • Docstrings for classes and methods are indented below the class/method declaration as part of the definition.

  • Group non-public methods together, followed by public (shared) methods. Non-public methods should also be prefixed with (at least) a single underscore. (See Naming Conventions below.)

  • For methods, if there are too many arguments in the declaration to fit on a single line, align the wrapped arguments with the first character of the first argument:
      def take_pose_and_apply_foo_to_its_bar_residue_n_times(pose, foo,
                                                             bar, n):
          """Apply foo to the bar residue of pose n times."""
          pass

Main Body

  • If your Python code is intended to be used as a script as well as a module, put
      if __name__ == "__main__":
          rosetta.init()
          my_main_subroutine(my_arg1, my_arg2)
    at the end of the file.

  • If your Python code is intended to be a module only, do not include rosetta.init(). It should be in the main body of the calling script.

  • If your Python code is intended to be a script only, do not include the if __name__ == "__main__": check.

    • (Although, it is probably smarter to write your code in such a way that other scripts can call its methods. You never know when you might want to re-use an awesome function.)
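Put together, the file layout described above might look like the following. This is only a sketch: my_main_subroutine() and N_QUARTER_TURNS are hypothetical names, the copyright header is abbreviated to a placeholder comment, and the Rosetta import and rosetta.init() call are commented out so that the block runs without PyRosetta.

```python
#!/usr/bin/env python
# (The Rosetta Commons copyright header would go here.)
"""Brief:   This sketch shows the recommended file layout.

Remarks: All names in this file are hypothetical.

"""

# Python standard library
import math

# Rosetta modules (commented out so that this sketch runs without PyRosetta)
# import rosetta

# Constants
N_QUARTER_TURNS = 4


def my_main_subroutine(n_quarter_turns):
    """Return the total rotation in radians after n_quarter_turns."""
    return n_quarter_turns * math.pi / 2


if __name__ == "__main__":
    # rosetta.init() would be called here in a real PyRosetta script.
    print(my_main_subroutine(N_QUARTER_TURNS))
```

Because the work is done in a function and only launched from the guarded block, another script can import this file and call my_main_subroutine() directly.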

Naming Conventions

As in Rosetta 3 C++ code, use the following naming conventions:

  • Use CamelCase for class names (and therefore, exception names).
    • Derived classes should indicate in their name the type of class they are:
      • Exceptions should end in Exception, Error, or Warning, as appropriate.
      • Movers should end in Mover or Protocol.
      • Energy methods should end in EnergyMethod.
  • Use box_car with underscores separating words for variable and method names.

    • Separate getter/accessor and setter functions should be prefixed with get_ (or is_ for functions returning a boolean) and set_, respectively.
    • Overloaded functions that perform both gets and sets do not need prefixes. (However, in Python, if you are simply accessing a public property of a class, you do not need to write a getter or setter; you may access — and set — the property directly, e.g., PyJobDistributor.native_pose = pose.)
  • Use box_car with underscores for namespaces & directories, i.e., modules & packages.

    • For Python files, use box_car with underscores even for modules containing only a single class. (This differs from the C++ convention (e.g., Pose.cc) and is because the filenames themselves become the namespace in Python.)
  • It is OK to use capital letters within variable and method names for words that are acronyms or abbreviations, e.g., HTML_parser().

  • Likewise, it is OK to use an underscore to separate an acronym or abbreviation from other words in class names, e.g., PyMOL_Mover.

In addition, the following conventions are unique to Python code:

  • Use ALL_CAPS with underscores for constants.

  • Use _box_car with a leading underscore for non-public methods and _CamelCase with a leading underscore for non-public classes.

    • These will not be imported if one invokes from module import *.
    • Note that this is the opposite of the naming convention for C++.
    • Use a leading double underscore in a parent class's attribute to avoid name clashes with any sub-classes.
  • Use box_car_ with a trailing underscore only to avoid conflicts with Python keywords, e.g., class_.

  • Never use __box_car__ with leading and trailing double underscores; that form is reserved for special Python names, e.g., __init__.

  • Avoid one-letter variable names. A descriptive name is almost always better.

    • One-letter variables are fine for mathematical variables or indices, e.g., x, y, z, i, j, k.
    • Never use the characters l, O, or I as one-letter variable names, as they are easily confused with 1 and 0.
  • Use self as the name of the first argument for instance methods...

    • ...unless it is a class method (one decorated with @classmethod), in which case use cls. (Static methods take neither.)

Programming Guidelines

Methods

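  • Python passes arguments into methods as references to objects, not as copies. If a method modifies a mutable object in place, the caller sees the change. Thus, there usually is not a need to return an object that was passed and then modified by the method.

    • Immutable types (numbers, strings, tuples) cannot be modified in place, so they behave as if they were passed by value. Rebinding an argument name inside a method never affects the caller.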
  • As in C++, conditional checks should happen inside the called method rather than in the calling method when possible. This helps keep things a bit more modular and also ensures that your method has no bad side effects if someone calls it but forgets to check for the essential condition. For example, instead of
      if condition_exists:
          my_method()
    use
      my_method()
    where my_method() begins with
      if not condition_exists:
          return
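A small sketch of both points, using plain Python lists and integers rather than Rosetta objects. The function names append_residue() and try_to_increment() are hypothetical.

```python
def append_residue(residues, new_residue):
    """Append new_residue to residues in place."""
    # Guard clause: the check lives inside the called method.
    if new_residue is None:
        return
    residues.append(new_residue)  # mutates the caller's list


def try_to_increment(count):
    """Return count plus one without changing the caller's variable."""
    count += 1  # ints are immutable, so this only rebinds the local name
    return count


sequence = ["ALA", "GLY"]
append_residue(sequence, "SER")
append_residue(sequence, None)  # silently ignored by the guard clause

n = 5
try_to_increment(n)  # the return value is discarded; n is unchanged
```

After this runs, sequence holds three residue names because the list was modified in place, while n is still 5.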

Classes & Objects

  • Avoid multiple inheritance.

  • All custom exceptions should inherit from Exception. Do not use string exceptions. (They were removed in Python 2.6. See Exception Handling below.)

    • As with any class, include a docstring. (See Documentation below.)
  • Check the type of arguments passed to a Python class method if it is possible that that method could be called from both Python and C++.

    • When Rosetta 3 code calls a Python method (such as a custom mover being called by a Rosetta 3 mover container), the arguments are passed as access pointers (AP), which must be converted to raw pointers (with the get() method) for Python to use them.
    • When Python calls a Python method, the arguments are passed by reference.
    • To avoid this issue, check the type of any object arguments passed; if they are APs, call get(). For example:
        class MyMover(rosetta.protocols.moves.Mover):
            def apply(self, pose):
                if isinstance(pose, rosetta.core.pose.PoseAP):
                    pose = pose.get()
                pass

    • (For convenience, a "dummy" get() method has been added to Pose that returns the instance of the Pose, so that one can use pose = pose.get() without checking its type first. However, it would be impractical to add a get() method to every class in Rosetta that one might wish to use in PyRosetta, so check the type!)

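  • Be careful not to say pose = native_pose when you really mean pose.assign(native_pose). The former makes no copy at all; it only binds the name pose to the same object as native_pose, so changes made through one name appear through the other. The latter copies the data of native_pose into pose.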
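The difference between binding a name and copying can be demonstrated without PyRosetta. In this sketch, FakePose is a hypothetical stand-in that mimics only the idea of an assign() method; it is not the real Pose class, which copies far more state.

```python
class FakePose(object):
    """This hypothetical stand-in mimics only the idea of Pose.assign()."""

    def __init__(self, residues=None):
        self.residues = list(residues or [])

    def assign(self, other):
        """Copy the data of other into this object."""
        self.residues = list(other.residues)


native_pose = FakePose(["ALA", "GLY"])

alias = native_pose              # no copy: two names for one object
duplicate = FakePose()
duplicate.assign(native_pose)    # independent copy of the data

alias.residues.append("SER")     # also changes native_pose
```

After the append, native_pose.residues contains "SER" because alias and native_pose are the same object, while duplicate.residues is unchanged.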

Comparisons

  • For comparisons to singletons (True, False, None, a static class), use is, not ==.

    • … Except you should not need to ever write if is_option_on is True:; write if is_option_on: instead.
    • Likewise, use if not is_option_on:.
    • Be careful! Only use if variable: for booleans; if variable is not None: is safer for anything else. (Empty containers, e.g., are false in a boolean sense.)
      • …So use if not my_list: in place of if len(my_list) == 0:.
  • Don't write if x < 10 and x > 5:; simplify this to if 5 < x < 10:.

  • Use != instead of <> for consistency. (<> was removed in Python 3.)

  • Use if my_string.startswith("foo"): and if my_string.endswith("bar"): instead of if my_string[:3] == "foo": and if my_string[-3:] == "bar":. Besides the fact that the former is way easier to read, it's safer, because the slice length cannot get out of sync with the string being compared.

  • Use if isinstance(object, rosetta.core.pose.Pose):; do not use if type(object) is rosetta.core.pose.Pose:.
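The comparison rules above can be checked with a few lines of plain Python. BasePose and DerivedPose are hypothetical stand-in classes, and describe() is a hypothetical helper; none of them come from Rosetta.

```python
class BasePose(object):
    """This hypothetical stand-in class is for illustration only."""


class DerivedPose(BasePose):
    """This hypothetical subclass shows why isinstance() is preferred."""


def describe(value):
    """Return a short description of the truth value of value."""
    if value is None:
        return "missing"
    if not value:
        return "empty or false"
    return "present"


x = 7
in_range = 5 < x < 10  # same as x > 5 and x < 10

my_string = "foobar"
# The slice length (3) does not match the 4-character prefix, so the slice
# comparison is silently always False; startswith() has no such trap.
sliced_match = my_string[:3] == "foob"
prefix_match = my_string.startswith("foob")

pose = DerivedPose()
is_pose = isinstance(pose, BasePose)    # subclasses are accepted
is_exact_type = type(pose) is BasePose  # the subclass is rejected
```

Note that describe(None) and describe([]) give different answers, which is why if variable is not None: and if variable: are not interchangeable.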

Command-Line Options

  • Use the module argparse, not optparse. optparse was deprecated and replaced in Python 2.7, and argparse does all the same things.
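A minimal argparse sketch for a script like the blah.py example in the header section. The argument names (pdb_filename, --cycles) and the default of 1000 are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Return an ArgumentParser for a hypothetical blah.py script."""
    parser = argparse.ArgumentParser(
        description="Apply blah to a PDB file.")
    parser.add_argument("pdb_filename", help="path to the input .pdb file")
    parser.add_argument("-n", "--cycles", type=int, default=1000,
                        help="number of cycles to run (default: 1000)")
    return parser


# An explicit argument list is parsed here so that the sketch runs anywhere;
# a real script would call parse_args() with no arguments to read sys.argv.
args = build_parser().parse_args(["foo.pdb", "--cycles", "50"])
```

argparse generates the -h/--help message from the description and help strings, so the usage text does not need to be maintained separately.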

Exception Handling

  • Use if and try/except blocks to test for errors that could happen in normal program operation. Errors should generate useful reports that include the values of relevant variables.

    • Your errors should be classes that inherit from Exception. (See Classes above.)
  • Use raise HugeF_ingError("Oh, crap!") instead of raise HugeF_ingError, "Oh, crap!". (The former makes it easier to wrap long error messages. Plus, the latter form was removed in Python 3.)

    • Never write raise "Oh, crap!"; that already went away in Python 2.6.
  • Keep the number of lines tested in a try block to the bare minimum. This makes it easier to isolate the actual problem.

    • (Remember, you can add an else or a finally afterwards.)
  • Avoid naked except clauses. They make it more difficult to isolate the actual problem.

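    • If you have a good reason, at least use except Exception:, which is better than except: because it will only catch exceptions from actual program errors; naked excepts will also catch KeyboardInterrupt and SystemExit.
    • You can also catch multiple exceptions with the same except clause by listing them in a tuple, e.g., except (HugeF_ingError, SneakyError):. (Without the parentheses, Python 2 treats the second name as the variable to bind the caught exception to, and Python 3 rejects the syntax.)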
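The exception-handling guidelines above might be applied as in the following sketch. ResidueNotFoundError, get_residue(), and safe_residue_name() are hypothetical names, and the chain is a plain Python list rather than a Rosetta object.

```python
class ResidueNotFoundError(Exception):
    """This exception is raised when a residue number is not in a chain."""


def get_residue(chain, number):
    """Return the residue name at position number (1-indexed) in chain."""
    if not 1 <= number <= len(chain):
        # Report the values of the relevant variables.
        raise ResidueNotFoundError(
            "Residue %d requested, but the chain has only %d residues." %
            (number, len(chain)))
    return chain[number - 1]


def safe_residue_name(chain, number):
    """Return the residue name in capitals, or "UNK" if the lookup fails."""
    try:
        name = get_residue(chain, number)  # only the risky call is tested
    except (ResidueNotFoundError, TypeError) as error:
        print("Warning: %s" % error)
        return "UNK"
    else:
        return name.upper()
```

Keeping name.upper() in the else clause means an unexpected error there is not mistaken for a failed lookup.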

PyRosetta-Unique Methods

  • If you need to instantiate a particular Rosetta vector1_ container for use in a Rosetta function, if possible, use PyRosetta's Vector1() constructor function, which takes a list as input and determines which vector1_ is needed. For example, instead of:
      list_of_filenames = utility.vector1_string()
      list_of_filenames.extend(["file1.fasta", "file2.fasta", "file3.fasta"])
    use:
      list_of_filenames = Vector1(["file1.fasta", "file2.fasta", "file3.fasta"])

  • Use pose_from_sequence() instead of make_pose_from_sequence(). (The former has ω angles set to 180° automatically.)

Miscellaneous

  • Write a += n, not a = a + n.

    • Likewise, use a -= n, a *= n, and a /= n.
  • Use list comprehensions. They are beautiful.

    • For example, instead of:
        cubes = []
        for x in range(10):
            cubes.append(x**3)
      or:
        cubes = map(lambda x: x**3, range(10))
      use:
        cubes = [x**3 for x in range(10)]
      (Lambda functions are super cool, but the second example is far less readable than the third; if you have a lambda inside a map, you should be using a list comprehension instead.)

    • You can also use if to limit the elements in your list:
        even_cubes = [x**3 for x in range(10) if x**3 % 2 == 0]

  • Use with when opening files. Context management is safer, because the file is automatically closed, even if an error occurs. For example, instead of:
      file = open("NOEs.cst")
      constraints = file.readlines()
      file.close()
    use:
      with open("NOEs.cst") as file:
          constraints = file.readlines()
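The list comprehension and with guidelines can be tried out in a self-contained way. In this sketch, a temporary file with made-up contents stands in for a real constraints file, so nothing here reflects the actual Rosetta constraint format.

```python
import os
import tempfile

cubes = [x**3 for x in range(10)]
even_cubes = [x**3 for x in range(10) if x**3 % 2 == 0]

# Write a small placeholder file so that the sketch is runnable anywhere.
handle, path = tempfile.mkstemp(suffix=".cst")
os.close(handle)
with open(path, "w") as constraint_file:
    constraint_file.write("first line\nsecond line\n")

with open(path) as constraint_file:
    constraints = constraint_file.readlines()
# The file is closed here, even if readlines() had raised an exception.

os.remove(path)
```

Once the with block ends, constraint_file.closed is True without any explicit call to close().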


Documentation

Docstrings

Doxygen can autodocument Python code in addition to C++ code, but in the case of Python, it simply reads the __doc__ attribute of every module, class, and method. This is why it is crucial to put all class and method docstrings indented below the declaration line. Otherwise, they will not be stored in the proper __doc__ variable. In Python,...

class MyClass():
    """This is my class.
    It is great.
    
    """

...is equivalent to...

class MyClass():
    __doc__ = "This is my class.\nIt is great.\n\n"

  • Always use double quotes for docstrings. While Python recognizes both """ and ''', some text editors only recognize the former.

  • All classes and methods must contain a docstring with at least a brief statement of what the class or method is or should do, respectively.

    • The first sentence of the docstring contains this "brief".
      • It should fit on one line.
      • It should be on the same line as the opening """.
      • If it is the only line in your docstring, put the closing """ on the same line.
      • For classes, it should be a descriptive/indicative sentence, e.g., """This class contains terpene-specific ligand data."""
      • For methods, it should be an imperative sentence, e.g., """Return True if this ligand is a sesquiterpene."""
    • Separated by a blank line, further details of the class or method's implementation should follow, particularly describing key methods (for classes) and key arguments and what is returned (for methods).
    • While not required, it is encouraged that you include an example.
      • If you want really great examples, make them compatible with Python's doctest module.
    • It is also helpful to include a "See also" list.
    • End the docstring with a blank line and then the closing """.
  • Docstrings for scripts should be written to serve also as the script's help/usage message.

    • …unless you are using the argparse module, in which case, use it to generate help/usage messages.
  • Docstrings for modules should list all public classes and methods that are exported by the module.

  • Document a class's constructor in the __init__() method's docstring, not in the class's docstring.
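A docstring that follows these rules, including a doctest-compatible example, might look like the sketch below. The function is_sesquiterpene() and its n_carbons argument are hypothetical and are not part of Rosetta.

```python
def is_sesquiterpene(n_carbons):
    """Return True if a terpene with n_carbons carbons is a sesquiterpene.

    Sesquiterpenes are built from three isoprene units and so contain
    15 carbon atoms.  The argument n_carbons is the number of carbon
    atoms in the terpene skeleton.

    Example:
    >>> is_sesquiterpene(15)
    True
    >>> is_sesquiterpene(10)
    False

    """
    return n_carbons == 15
```

Running python -m doctest on a file containing this function executes the two example calls and reports any whose output differs from what the docstring shows.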

Comments

  • Comments should usually be complete sentences.

    • Use proper grammar, capitalization, and punctuation.
    • Use two spaces between sentences.
    • Never change the case of a variable mentioned in a comment, even if the first word of a sentence.
  • Use block comments to talk about the code immediately following the comments.

    • Don't use docstrings to substitute for block comments.
  • Use inline comments (sparingly) to clarify confusing lines.


Coding Style

As mentioned at the top of this page, when in doubt, follow PEP 8.

Indentation

  • Use 4 spaces to indent, not tabs.

    • (C++ code should use tabs.)
  • Indent 4 spaces per nested level.

  • Indent 4 additional spaces when method arguments must be wrapped beyond the first line.

    • ...Or use spaces to align the arguments.
    • Example 1:
      def take_pose_and_apply_foo_to_its_bar_residue_n_times(
              pose,
              foo,
              bar,
              n):
          """Apply foo to the bar residue of pose n times."""
          pass

      Example 2:
      def take_pose_and_apply_foo_to_its_bar_residue_n_times(pose, foo,
                                                             bar, n):
          """Apply foo to the bar residue of pose n times."""
          pass
      

Spaces in Expressions & Statements

While there are numerous different styles for using spaces in argument lists and expressions in C++, PEP 8 recommends the following (and provides many examples):

  • Do not put spaces around parentheses, brackets, or curly braces.

  • Do not put spaces before commas, colons, or semicolons.

  • Put one space around operators.

    • Exception: do not put spaces around = if used as part of a keyword argument or default value assignment in a method call or declaration, respectively.
    • Do not use more than one space, such as to line up values when variable names have different lengths.
    • In mathematical formulae, it is OK to not use spaces with operators of higher priority, e.g., y = a*x**2 + b*x + c.
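The spacing rules above are illustrated in this short sketch. The function quadratic() and its arguments are hypothetical.

```python
def quadratic(x, a=1.0, b=0.0, c=0.0):
    """Return the value of a*x**2 + b*x + c."""
    # No spaces around = in default values; one space around = and +;
    # no spaces around the higher-priority * and ** operators.
    y = a*x**2 + b*x + c
    return y


# No spaces just inside the brackets or before the commas; no spaces
# around = in keyword arguments.
values = [quadratic(x, a=2.0, b=-3.0, c=1.0) for x in (0, 1, 2)]
```

Here values holds the polynomial 2x² − 3x + 1 evaluated at 0, 1, and 2.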

Miscellaneous

  • Limit lines to a maximum of 79 characters.

    • Python knows to continue lines that have not closed their parentheses, brackets, or curly brackets yet. Use this to your advantage.
    • If you must use a backslash (\) to wrap lines, do so after any operators.
  • Do not put semicolons at the end of lines.

  • Do not use semicolons to combine multiple lines.

    • Similarly, do not put the body of a block statement on the same line as the statement itself; indent.
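Long lines can usually be wrapped without backslashes by relying on open parentheses or brackets, as in this sketch. The function total_score() and its argument names are used here only as illustrative labels.

```python
def total_score(fa_atr, fa_rep, fa_sol, hbond_sr_bb, hbond_lr_bb):
    """Return the sum of the given score terms."""
    # The open parenthesis lets the expression continue on the next line;
    # the line is broken after an operator, as recommended above.
    return (fa_atr + fa_rep + fa_sol +
            hbond_sr_bb + hbond_lr_bb)


residue_names = [
    "ALA", "ARG", "ASN", "ASP", "CYS",
    "GLN", "GLU", "GLY", "HIS", "ILE",
]

message = ("This long message is built from adjacent string literals, "
           "which Python joins inside the parentheses.")
```

Every line stays under 79 characters, and no statement shares a line with another.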

Example Code

[to be added later… ~ labonte 20:56, 31 Aug 2012 (PDT)]


See Also