What’s new with NumFOCUS projects: March 2024

NumFOCUS
6 min read · Mar 21, 2024

A lot has happened since our last set of project updates. See all the recent updates from our projects below!

Sponsored Project Updates

TARDIS

  • We are excited to announce that TARDIS has been accepted as a mentoring organization in Google Summer of Code 2024! Check out our ideas page at https://tardis-sn.github.io/summer_of_code/ideas/ and connect with us on Gitter at https://gitter.im/tardis-sn/gsoc if you are interested!
  • We have been modularising the components further to make it easier to customize TARDIS, unlocking the potential for more science cases.

Spyder

  • Spyder 5.5.2 was released. It future-proofs and improves the updater, resolves console crashes, and fixes hangs while plotting.

MDAnalysis

Events

Additional Announcements

Dask

Dask Updates

  • Query planning for Dask DataFrame. Dask DataFrame is now more performant and reliable because optimizations such as predicate pushdown and column projection (reading only the columns a query needs) are applied automatically. Query planning is enabled by default in Dask 2024.3.0 and later. See the GitHub issue.
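To give a feel for what predicate pushdown and reading only the needed columns mean, here is a toy sketch in plain Python. It does not use Dask and says nothing about how Dask's query planner is implemented; `read_rows`, `naive_plan`, and `optimized_plan` are hypothetical names made up for this illustration.

```python
import csv
import io

# Tiny in-memory "file" standing in for a large dataset.
CSV_TEXT = """id,city,amount,note
1,Austin,10,a
2,Boston,25,b
3,Austin,40,c
4,Denver,5,d
"""


def read_rows(text, columns=None, predicate=None):
    """Hypothetical reader: can filter rows and drop columns while reading."""
    for row in csv.DictReader(io.StringIO(text)):
        if predicate is not None and not predicate(row):
            continue  # row is skipped at the source (predicate pushdown)
        if columns is not None:
            yield {name: row[name] for name in columns}  # keep needed columns only
        else:
            yield dict(row)


def naive_plan(text):
    """Load every row and column, then filter and select afterwards."""
    rows = list(read_rows(text))
    filtered = [r for r in rows if r["city"] == "Austin"]
    return [{"amount": r["amount"]} for r in filtered], len(rows)


def optimized_plan(text):
    """Same query, but the filter and column selection happen in the reader."""
    rows = list(
        read_rows(
            text,
            columns=["amount"],
            predicate=lambda r: r["city"] == "Austin",
        )
    )
    return rows, len(rows)


naive_result, naive_loaded = naive_plan(CSV_TEXT)
optimized_result, optimized_loaded = optimized_plan(CSV_TEXT)
```

Both plans return the same answer, but the second one materializes fewer rows and fewer columns. A query planner's job is to make that kind of rewrite for you, so the code you write can stay in the simpler first style.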

Coiled Updates

  • Try out ARM on Google Cloud for free. Tau T2A VMs are available in select regions for a free trial until March 31, 2024. See the release notes.

New Blog Posts

  • One trillion row challenge. The one billion row challenge was fun, and we thought it’d be interesting to compare performance across different sets of big data tools at a much larger scale. Read the blog post.

Events

  • The next Dask Demo Day is on March 21st. Watch last month’s recording for live demos on the 1TRC, Dask on Databricks, scaling LlamaIndex with Dask, deploying Prefect on the cloud, and changes to AWS costs for public IPv4 addresses. Have something you’d like to share next week? Let us know!
  • Schedule Python Jobs with Prefect and Coiled. Thanks to those who joined for the webinar last week! Watch the recording.

Just signed up for Coiled?

Watch our short demo to see how to get started. It’s easy to connect Coiled to your AWS, GCP, or Azure account.

New to Dask?

Check out our pre-recorded tutorials on using Dask DataFrames, parallelizing your Python code, plus more advanced use cases.

Affiliated Project Updates

Optuna

The next minor release (v3.6) is coming soon! We plan to add several new features in Optuna v3.6, including:

  • A Wilcoxon pruner, based on the Wilcoxon signed-rank test
  • A native Gaussian process (GP) implementation that is faster than BoTorchSampler and supports mixed search spaces (continuous, discrete, and categorical)
  • A PED-ANOVA importance evaluator that is much quicker than the original f-ANOVA
  • Migration of the entire `optuna.integration` module to the `optuna-integration` package, retaining backward compatibility via `pip install optuna-integration`
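For readers unfamiliar with the Wilcoxon signed-rank test mentioned in the first item, here is a minimal sketch of how its rank sums are computed for paired scores. This is plain Python for illustration only, not Optuna's pruner; the function name `signed_rank_sums` and the sample data are assumptions.

```python
def signed_rank_sums(a, b):
    """Return (W_plus, W_minus) for paired samples a and b.

    Zero differences are dropped, and tied absolute differences
    share the average of the ranks they span.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)

    i = 0
    while i < len(order):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical per-instance scores for two trials evaluated on the same instances.
trial_a = [10, 20, 30, 40, 50]
trial_b = [9, 22, 27, 40, 46]
w_plus, w_minus = signed_rank_sums(trial_a, trial_b)
```

A large imbalance between the two sums suggests that one trial is consistently better than the other across the paired evaluations. A pruner could use that kind of evidence to stop an unpromising trial early.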

Furthermore, we are implementing a subset of Optuna features in Rust to provide a faster Optuna binding for other languages such as C++ and TypeScript.

Last but not least, we are recruiting GSoC 2024 participants! Please check out the program description here.

Open2C

  • Our NumFOCUS-affiliated project, Bioframe, has been published as an application note in Bioinformatics: “Bioframe: operations on genomic intervals in Pandas dataframes.”
  • We are hiring! The Abdennur Lab at UMass Chan Medical School, a core contributor to Open2C, is looking for a postdoc in computational biology and genomics. Projects focus on investigating 3D and functional genomics and on cutting-edge genomic data science and data visualization. Successful candidates will contribute to the NumFOCUS-affiliated projects in Open2C. https://www.ummsjobs.com/job/10102/

signac

  • More caching in the signac framework makes common code paths up to 200 times faster on slow file systems, making daily work with large projects more fluid.

Releases with these improvements:

GeomScale

GeomScale is a research and development project that delivers open-source code for state-of-the-art algorithms at the intersection of data science, optimization, geometric, and statistical computing.

Our news:

  • The GeomScale organization has been accepted as a mentoring organization for Google Summer of Code 2024. We are looking for contributors with strong programming skills and a background in computer science and/or applied mathematics. For more details: https://www.linkedin.com/feed/update/urn:li:activity:7167873127702511616/
  • Version 1.2.0 of the R interface to the volesti library has been released and is hosted in a new repository: https://github.com/GeomScale/Rvolesti
  • Many new functions for statistical tests are available, as well as new samplers for log-concave distributions. More details: https://github.com/GeomScale/Rvolesti/blob/develop/NEWS.md#volesti-120
  • A new preprint, “Randomized Control in Performance Analysis and Empirical Asset Pricing,” is available on arXiv: https://arxiv.org/pdf/2403.00009.pdf. The article explores the application of randomized control techniques in empirical asset pricing and performance evaluation.

Mesa

Mesa (Agent-Based Models with Python)

Mesa is proud to announce it was selected for Google Summer of Code (GSoC)!

This is Mesa’s first time in GSoC, and we have four projects to help Mesa become even more capable.

MesaFrames

Description: Contributors will develop a DataFrame-based approach to vectorized operations in agent-based modeling, with the aim of significantly speeding up Mesa processes. Initial attempts have shown that Polars speeds up Mesa models significantly and could dramatically improve agent-based models (ABMs) in Python.

Mesa RL

Description: One of the first reinforcement learning models was the El Farol Bar model, itself an ABM. Multi-agent RL (MARL) is a fundamental way to develop agent behaviors to uncover the dynamics of complex systems. Mesa would like an extension that lets users easily integrate Python reinforcement learning libraries (e.g., KerasRL, OpenAI Baselines, OpenAI Gym, TF-Agents) into agent evolution.

Cacheable Mesa

Description: As ABMs are simulations and often have phase transitions (periods of rapid change to new stable states), being able to go back in time and replay key results would be a great addition to Mesa. Critically, no computation would be needed as the results are stored.
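The record-and-replay idea can be sketched in a few lines of plain Python. This is not Mesa's API or the planned GSoC design; `ToyModel` and `CachedRun` are hypothetical names, and the wealth-exchange rule is just a stand-in for a real model.

```python
import random


class ToyModel:
    """Stand-in for an agent-based model: agents hand one unit of wealth around."""

    def __init__(self, n_agents=5, seed=0):
        self.rng = random.Random(seed)
        self.wealth = [1] * n_agents
        self.steps_run = 0

    def step(self):
        giver, receiver = self.rng.sample(range(len(self.wealth)), 2)
        if self.wealth[giver] > 0:
            self.wealth[giver] -= 1
            self.wealth[receiver] += 1
        self.steps_run += 1


class CachedRun:
    """Stores a snapshot of the model state after every step for later replay."""

    def __init__(self, model):
        self.model = model
        self.history = [list(model.wealth)]  # snapshot of the initial state

    def run(self, n_steps):
        for _ in range(n_steps):
            self.model.step()
            self.history.append(list(self.model.wealth))

    def replay(self, step):
        # Pure lookup: the model is not stepped again.
        return list(self.history[step])


model = ToyModel(seed=42)
cached = CachedRun(model)
cached.run(10)
state_at_step_7 = cached.replay(7)
```

Looking up `cached.replay(7)` costs nothing beyond a list copy, which is the appeal when each step of a real model is expensive. The trade-off is memory, so a real implementation would likely write snapshots to disk rather than keep them all in a list.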

NetworkScheduler

Description: Create an activation regime based on a network or hypergraph approach, with potential inputs from set, group, or topology, that allows for multiple levels of agent processing. The idea is that dynamic agent groups are created, destroyed, activated, or go dormant based on interactions with the environment or other agents. A substantive start is Multi-Level Mesa, which needs to be refactored; a detailed description of the idea is available on arXiv (Multi-Level Mesa).

GSoC proposals open March 18th, and based on the discussion so far, we are looking forward to some exciting proposals.

Join us on Matrix

Join us on GitHub

Trixi.jl

We released Trixi.jl v0.7.0, which brings many improvements and updates since the last minor release, v0.6.0, in December. The biggest change is that support for advanced simulations of shallow water-type equations has moved into a new downstream package, TrixiShallowWater.jl, where we will bundle all our efforts in this area.

NumFOCUS

Our open source scientific software projects are changing the world. Learn more on our website: https://numfocus.org