AWS Machine Learning Blog

Announcing new Jupyter contributions by AWS to democratize generative AI and scale ML workloads

Project Jupyter is a multi-stakeholder, open-source project that builds applications, open standards, and tools for data science, machine learning (ML), and computational science. The Jupyter Notebook, first released in 2011, has become a de facto standard tool used by millions of users worldwide across every possible academic, research, and industry sector. Jupyter enables users to work with code and data interactively, and to build and share computational narratives that provide a full and reproducible record of their work.

Given the importance of Jupyter to data scientists and ML developers, AWS is an active sponsor and contributor to Project Jupyter. Our goal is to work in the open-source community to help Jupyter to be the best possible notebook platform for data science and ML. AWS is a platinum sponsor of Project Jupyter through the NumFOCUS Foundation, and I am proud and honored to lead a dedicated team of AWS engineers who contribute to Jupyter’s software and participate in Jupyter’s community and governance. Our open-source contributions to Jupyter include JupyterLab, Jupyter Server, and the Jupyter Notebook subprojects. We are also members of the Jupyter working groups for Security, and Diversity, Equity, and Inclusion (DEI). In parallel to these open-source contributions, we have AWS product teams who are working to integrate Jupyter with products such as Amazon SageMaker.

Today at JupyterCon, we are excited to announce several new tools for Jupyter users to improve their experience and boost development productivity. All of these tools are open-source and can be used anywhere you are running Jupyter.

Introducing two generative AI extensions for Jupyter

Generative AI can significantly boost the productivity of data scientists and developers as they write code. Today, we are announcing two Jupyter extensions that bring generative AI to Jupyter users through a chat UI, IPython magic commands, and autocompletion. These extensions enable you to perform a wide range of development tasks using generative AI models in JupyterLab and Jupyter notebooks.

Jupyter AI, an open-source project to bring generative AI to Jupyter notebooks

Jupyter AI is an open-source project that uses large language models like ChatGPT, AI21’s Jurassic-2, and (coming soon) Amazon Titan to bring generative AI features to Jupyter notebooks. For example, using a large language model, Jupyter AI can help a programmer generate, debug, and explain their source code. Jupyter AI can also answer questions about local files and generate entire notebooks from a simple natural language prompt. Jupyter AI offers both magic commands that work in any notebook or IPython shell, and a friendly chat UI in JupyterLab. Both of these experiences work with dozens of models from a wide range of model providers. JupyterLab users can select any text or notebook cells, enter a natural language prompt to perform a task with the selection, and then insert the AI-generated response wherever they choose. Jupyter AI is integrated with Jupyter’s MIME type system, which lets you work with inputs and outputs of any type that Jupyter supports (text, images, etc.). Jupyter AI also provides integration points that allow third parties to configure their own models. Jupyter AI is an official open-source project of Project Jupyter.

Amazon CodeWhisperer Jupyter extension

Autocompletion is foundational for developers, and generative AI can significantly enhance the code suggestion experience. That is why we announced the general availability of Amazon CodeWhisperer earlier in 2023. CodeWhisperer is an AI coding companion that uses foundation models under the hood to radically improve developer productivity. It generates code suggestions in real time based on developers’ natural language comments and the prior code in their integrated development environment (IDE).

Today, we are excited to announce that JupyterLab users can install and use the CodeWhisperer extension for free to generate real-time single-line or full-function code suggestions for Python notebooks in JupyterLab and Amazon SageMaker Studio. With CodeWhisperer, you can write a natural language comment in English that outlines a specific task, such as “Create a pandas dataframe using a CSV file.” Based on this information, CodeWhisperer recommends one or more code snippets directly in the notebook that can accomplish the task. You can quickly and easily accept the top suggestion, view more suggestions, or continue writing your own code.
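To make the comment-driven workflow concrete, here is a minimal sketch of what a notebook cell might look like after a suggestion is accepted for the comment quoted above. This is an illustration written by hand, not actual CodeWhisperer output; the in-memory CSV data and the variable names are assumptions chosen so the cell runs anywhere pandas is installed.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk (hypothetical data, so the cell is self-contained).
csv_file = io.StringIO("name,score\nada,91\ngrace,88\n")

# Create a pandas dataframe using a CSV file
df = pd.read_csv(csv_file)

# Inspect the result: number of rows and columns, then the column names.
print(df.shape)
print(list(df.columns))
```

In a real notebook you would pass a file path to `pd.read_csv` instead of the in-memory buffer.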

During its preview, CodeWhisperer proved it is excellent at generating code to accelerate coding tasks, helping developers complete tasks an average of 57% faster. Additionally, developers who used CodeWhisperer were 27% more likely to complete a coding task successfully than those who did not. This is a giant leap forward in developer productivity. CodeWhisperer also includes a built-in reference tracker that detects whether a code suggestion might resemble open-source training data and can flag such suggestions.

Introducing new Jupyter extensions to build, train, and deploy ML at scale

Our mission at AWS is to democratize access to ML across industries. To achieve this goal, in 2017 we launched the Amazon SageMaker notebook instance, a fully managed compute instance running Jupyter that includes all the popular data science and ML packages. In 2019, we made a significant leap forward with the launch of SageMaker Studio, an IDE for ML built on top of JupyterLab that enables you to build, train, tune, debug, deploy, and monitor models from a single application. Tens of thousands of customers are using Studio to empower data science teams of all sizes. In 2021, we further extended the benefits of SageMaker to the community of millions of Jupyter users by launching Amazon SageMaker Studio Lab, a free notebook service also based on JupyterLab that includes free compute and persistent storage.

Today, we are excited to announce three new capabilities to help you scale ML development faster.

Notebooks scheduling

In 2022, we released a new capability to enable our customers to run notebooks as scheduled jobs in SageMaker Studio and Studio Lab. Thanks to this capability, many of our customers have saved time by not having to manually set up complex cloud infrastructure to scale their ML workflows.

We are excited to announce that the notebooks scheduling tool is now an open-source Jupyter extension that allows JupyterLab users to run and schedule notebooks on SageMaker anywhere JupyterLab runs. Users can select a notebook and automate it as a job that runs in a production environment via a simple yet powerful user interface. After a notebook is selected, the tool takes a snapshot of the entire notebook, packages its dependencies in a container, builds the infrastructure, runs the notebook as an automated job on a schedule set by the user, and deprovisions the infrastructure upon job completion. This reduces the time it takes to move a notebook to production from weeks to hours.

SageMaker open-source distribution

Data scientists and developers want to begin developing ML applications quickly, and it can be complex to install the mutually compatible versions of all the necessary packages. To remove the manual work and improve productivity, we are excited to announce a new open-source distribution that includes the most popular packages for ML, data science, and data visualization. This distribution includes deep learning frameworks like PyTorch, TensorFlow, and Keras; popular Python packages like NumPy, scikit-learn, and pandas; and IDEs like JupyterLab and the Jupyter Notebook. The distribution is versioned using SemVer and will be released on a regular basis moving forward. The container is available via Amazon ECR Public Gallery, and its source code is available on GitHub. This provides enterprises transparency into the packages and build process, thereby making it easier for them to reproduce, customize, or re-certify the distribution. The base image comes with pip and Conda/Mamba, so that data scientists can quickly install additional packages to meet their specific needs.
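Because the distribution follows SemVer, a version number tells you how risky an upgrade is: releases that share a major version are expected to stay backward compatible. The sketch below shows one way to pick the newest release within a pinned major version. The tag list is hypothetical, the helper names are our own, and the parser handles only plain MAJOR.MINOR.PATCH strings (no pre-release or build metadata).

```python
import re
from typing import List, NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse(tag: str) -> Version:
    """Parse a plain MAJOR.MINOR.PATCH string into a comparable Version."""
    match = _SEMVER.match(tag)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {tag!r}")
    return Version(*(int(part) for part in match.groups()))


def latest_compatible(tags: List[str], pinned_major: int) -> str:
    """Return the highest version that shares the pinned major version."""
    candidates = [v for v in map(parse, tags) if v.major == pinned_major]
    if not candidates:
        raise LookupError(f"no release with major version {pinned_major}")
    # Tuples compare numerically field by field, so 1.10.1 ranks above 1.2.0.
    best = max(candidates)
    return f"{best.major}.{best.minor}.{best.patch}"


# Hypothetical release tags, for illustration only.
tags = ["1.0.0", "1.2.0", "1.10.1", "2.0.0"]
print(latest_compatible(tags, pinned_major=1))
```

Comparing parsed integers rather than raw strings matters here: a plain string sort would rank "1.2.0" above "1.10.1".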

Amazon CodeGuru Jupyter extension

Amazon CodeGuru Security now supports security and code quality scans in JupyterLab and SageMaker Studio. This new capability assists notebook users in detecting security vulnerabilities such as injection flaws, data leaks, weak cryptography, or missing encryption within the notebook cells. You can also detect many common issues that affect the readability, reproducibility, and correctness of computational notebooks, such as misuse of ML library APIs, invalid run order, and nondeterminism. When vulnerabilities or quality issues are identified in the notebook, CodeGuru generates recommendations that enable you to remediate those issues based on AWS security best practices.
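As an illustration of the nondeterminism category mentioned above, the following sketch shows a train/test split that is reproducible only when a seed is supplied. It is a hand-written example of the kind of issue that hurts notebook reproducibility, not a description of how CodeGuru detects it; the function name and parameters are our own.

```python
import random


def sample_split(rows, test_fraction, seed=None):
    """Shuffle rows and split them into (train, test) lists."""
    # With seed=None the generator is seeded from the OS, so every run differs.
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))

# Seeded: re-running the cell reproduces exactly the same split.
first = sample_split(rows, test_fraction=0.2, seed=42)
second = sample_split(rows, test_fraction=0.2, seed=42)
print(first == second)

# Unseeded: the split can change on every run, which makes results hard to reproduce.
unseeded_train, unseeded_test = sample_split(rows, test_fraction=0.2)
print(len(unseeded_train), len(unseeded_test))
```

Passing an explicit seed, or creating one seeded generator at the top of the notebook, is a common way to remediate this kind of finding.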

Conclusion

We are excited to see how the Jupyter community will use these tools to scale development, increase productivity, and take advantage of generative AI to transform their industries. Check out the following resources to learn more about Jupyter on AWS and how to install and get started with these new tools:


About the Author

Brian Granger is a leader of the IPython project, co-founder of Project Jupyter, and an active contributor to a number of other open-source projects focused on data science in Python. In 2016, he co-created the Altair package for statistical visualization in Python. He is an advisory board member of the NumFOCUS Foundation, a faculty fellow of the Cal Poly Center for Innovation and Entrepreneurship, and a Sr. Principal Technologist at AWS.