how to add a column to a dataframe in python

4 Methods for Adding Columns to Pandas Dataframes

Explained with examples

Pandas is a data analysis and manipulation library for Python. It provides numerous functions and methods for managing tabular data. The core data structure of Pandas is the data frame, which stores data in tabular form with labelled rows and columns.

From a data perspective, rows represent observations or data points, and columns represent features or attributes of those observations. Consider a data frame of house prices: each row is a house, and each column is a feature of the house such as age, number of rooms, or price.

Adding or dropping columns is a common operation in data analysis. In this article, we will go over 4 different ways of adding a new column to a data frame.

Let's first create a simple data frame to use in the examples.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"A": [1, 2, 3, 4],
                       "B": [5, 6, 7, 8]})
    df

Method 1

This might be the most commonly used method for creating a new column.

    df["C"] = [10, 20, 30, 40]
    df

We specify the column name as if we were selecting an existing column in the data frame. Then, the values are assigned to this column. The new column is added as the last column (i.e. the column with the highest index).
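As a minimal sketch of what bracket assignment accepts, the example below rebuilds the sample data frame and tries a scalar and a list that is too short. The column name "flag" is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})

# A scalar is broadcast to every row ("flag" is a hypothetical column name).
df["flag"] = 0

# A list needs one value per row; three values for four rows is rejected.
try:
    df["C"] = [10, 20, 30]
    length_error = None
except ValueError as exc:
    length_error = exc

print(df)
print(type(length_error).__name__)
```

So a single value is fine, but a list or array has to match the number of rows.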

We can also add multiple columns at once. The column names are passed in a list, and the values must be two-dimensional, with a shape that matches the number of rows and the number of new columns. For instance, the following code adds three columns filled with random integers from 0 to 9 (the upper bound of 10 is exclusive).

    df[["1of3", "2of3", "3of3"]] = np.random.randint(10, size=(4, 3))
    df

Let's drop these three columns before going to the next method.

    df.drop(["1of3", "2of3", "3of3"], axis=1, inplace=True)
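If repeatable output is needed, the multi-column assignment and the drop can be sketched with a seeded NumPy generator. The seed 42 and the names rng, new_cols, values and trimmed are arbitrary choices for this illustration. The example assumes a reasonably recent Pandas version in which a list of new column names can be assigned a two-dimensional array.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})

# Seeded generator so the "random" values are the same on every run.
rng = np.random.default_rng(42)
new_cols = ["1of3", "2of3", "3of3"]

# Shape is (number of rows, number of new columns); values fall in 0..9.
values = rng.integers(0, 10, size=(len(df), len(new_cols)))
df[new_cols] = values

# Without inplace=True, drop returns a new frame and leaves df untouched.
trimmed = df.drop(columns=new_cols)

print(df.shape)
print(trimmed.columns.tolist())
```

Deriving the array shape from len(df) and len(new_cols) keeps the values in step with the data frame if either one changes.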

Method 2

In the first method, the new column is added at the end. Pandas also allows adding a new column at a specific position. The insert function can be used to choose the location of the new column. Let's add one next to column A.

    df.insert(1, "D", 5)
    df

The insert function takes three main parameters: the position (loc), the name of the column (column), and the values (value). Column positions start from 0, so we set the position to 1 to place the new column right after column A. We can pass a constant value, which is filled in all rows.
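A short sketch of two behaviours of insert that are easy to trip over: it returns None rather than a data frame, and by default it refuses a column name that already exists. The variable names result and duplicate_error are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})

# insert changes df in place; the return value is None.
result = df.insert(1, "D", [50, 60, 70, 80])

# Inserting a name that is already present raises ValueError by default.
try:
    df.insert(0, "A", 0)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc

print(df.columns.tolist())
print(result, type(duplicate_error).__name__)
```

Because the return value is None, writing df = df.insert(...) would overwrite the data frame with None.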

Method 3

The loc indexer selects rows and columns using their labels. It can also be used to create a new column.

    df.loc[:, "E"] = list("abcd")
    df

In order to select rows and columns, we pass the desired labels. The colon indicates that we want to select all the rows. In the column part, we specify the labels of the columns to be selected. Since the data frame does not have column E, Pandas creates a new column.
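Since loc takes a row selector as well, it can create a column and fill only some of its rows. The sketch below uses a boolean condition instead of the colon. The column name "label" and the value "large" are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})

# Only rows where A > 2 receive a value. The new column is created on the
# fly and the remaining rows are left as missing values.
df.loc[df["A"] > 2, "label"] = "large"

print(df)
```

Rows that do not match the condition hold missing values in the new column, which can be filled afterwards if needed.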

Method 4

The last method is the assign function.

    df = df.assign(F=df.C * 10)
    df

We specify both the column name and values inside the assign function. You may notice that we derive the values using another column in the data frame. The previous methods also allow for such derivations.
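To illustrate that the earlier methods can derive values in the same way, here is a minimal sketch on a fresh copy of the sample data frame. The column names sum_ab, double_a and ratio are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})

# Method 1: bracket assignment with a derived column.
df["sum_ab"] = df["A"] + df["B"]

# Method 2: insert a derived column at position 0.
df.insert(0, "double_a", df["A"] * 2)

# Method 3: loc with a new column label.
df.loc[:, "ratio"] = df["B"] / df["A"]

print(df)
```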

There is an important difference between the insert and assign functions. The insert function works in place, which means the change (adding the new column) is saved directly in the original data frame.

The situation is a little different with the assign function. It returns the modified data frame but does not change the original one. In order to use the modified version (with the new column), we need to explicitly assign it.
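The difference can be checked with a small sketch. It also shows that assign accepts callables, so one column can be built from another created in the same call. The names with_f, chained and G are assumptions for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "C": [10, 20, 30, 40]})

# assign returns a new frame; df itself has no column F afterwards.
with_f = df.assign(F=df["C"] * 10)

# A callable receives the frame being built, so G can use the F column
# created earlier in the same assign call.
chained = df.assign(F=lambda d: d["C"] * 10, G=lambda d: d["F"] + d["A"])

print("F" in df.columns)
print(chained)
```

Because assign leaves the original untouched, it suits method chains where each step returns a new data frame.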

Conclusion

We have covered 4 different methods for adding new columns to a Pandas data frame. It is a common operation in data analysis and manipulation.

One of the things I like about Pandas is that it usually provides multiple ways to perform a given task. I think it is a result of the flexibility and versatility of Pandas.

Thank you for reading. Please let me know if you have any feedback.