Expanding Picking Actions for Time-Efficient Online 3D Bin Packing

STEP (Space-Time Efficient Packing) explicitly accounts for the time required to reorient and handle packages, jointly reasoning over space utilization and operational time. STEP learns to balance speed and packing efficiency through optimized item and face selection. This demo shows packing behaviors under different space and time weights, ranging from fast execution to dense packing.

Overview

Figure: different faces of a box, showing various grasp orientations. Face selection matters: grasping from a different direction can enable better packing through reorientation, but it also affects how quickly and safely a robot can pick and place items.

Robotic bin packing in warehouses demands a careful balance between two often conflicting goals: maximizing space utilization and minimizing operational time. STEP (Space-Time Efficient Packing) addresses this trade-off by learning to select not just which item to pack, but also which face to grasp it from. Each choice influences both the quality of placement and the time required to execute it. Our method uses a preference-conditioned multi-objective policy that dynamically weighs packing efficiency against time cost, adapting its strategy based on user-specified preferences. While STEP is trained and evaluated in simulation, its design is grounded in real-world robotic constraints, including grasp direction, object surface type, and the impact of transport failures. These factors can be abstracted directly into the time model used during training, so that the policy reflects practical deployment conditions.

Real-World Insights

Different grasp directions directly affect how quickly and reliably the robot can pick an item:

Top grasps are often faster.
Side grasps can be slower.
A robot may also reorient an item by first picking from the top face, placing it in a new orientation, and then picking it again.

Importantly, reorientation times are not universal: they depend on the specific robotic setup, hardware, and motion planning constraints. STEP abstracts all of these strategies into a unified time cost, which is explicitly taken into account when planning for bin packing. Each graspable face is treated as a separate candidate with its own time cost.

Transport speed also affects reliability:

If a box with an uneven or difficult surface is moved too quickly, the suction cups may lose contact and the package can drop.
Slowing down avoids failure, but increases operation time.

The interaction between the suction cups and the box surface directly influences the safety and reliability of transport, and these effects depend on the specific setup and item. Since STEP explicitly optimizes for operational time, such behaviors are abstracted into face-dependent time penalties during bin packing. In training, we defined three surface categories (smooth, plastic-wrapped, and package-labeled) and randomly assigned each box face to one of these categories in simulation, with each category given a fixed time penalty.
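As a rough illustration of how such a face-dependent time model could look, the sketch below adds a fixed surface penalty to a base time per grasp strategy. Every name and number in it is hypothetical: the second values, the mapping from face to grasp strategy, and the helpers `assign_surfaces` and `operational_time` are assumptions made for illustration and are not taken from STEP.

```python
import random
from dataclasses import dataclass

# Illustrative placeholder values in seconds; NOT taken from STEP.
GRASP_TIME = {"top": 4.0, "side": 6.0, "reorient": 10.0}
SURFACE_PENALTY = {"smooth": 0.0, "plastic_wrapped": 1.5, "package_labeled": 3.0}

# Assumed mapping from each box face to the strategy needed to grasp it.
FACE_STRATEGY = {
    "top": "top",
    "front": "side",
    "back": "side",
    "left": "side",
    "right": "side",
    "bottom": "reorient",  # assumed to need a pick, re-place, and second pick
}


@dataclass(frozen=True)
class Face:
    name: str
    surface: str


def assign_surfaces(rng):
    """Randomly assign one surface category to each of the six faces."""
    categories = sorted(SURFACE_PENALTY)
    return [Face(name, rng.choice(categories)) for name in FACE_STRATEGY]


def operational_time(face):
    """Unified time cost: base grasp time plus a fixed surface penalty."""
    return GRASP_TIME[FACE_STRATEGY[face.name]] + SURFACE_PENALTY[face.surface]


if __name__ == "__main__":
    rng = random.Random(0)
    for face in assign_surfaces(rng):
        print(face.name, face.surface, operational_time(face))
```

In a real setup the base times and penalties would be measured on the specific robot, gripper, and motion planner rather than fixed by hand.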

System Architecture

STEP frames bin packing as a multi-candidate, multi-objective selection problem that aims to balance space utilization and operational time. Each graspable face of each item in the buffer is treated as a distinct candidate in the selection process.

The policy receives as input:

Current bin state (as empty maximal spaces)
Item-face pairs with their selected placement position and rotation
Estimated operational time for each item-face pair
Preference vector specifying space-time trade-off

The Transformer-Select module processes the bin and item-face features to produce embeddings.

These embeddings are then combined with the preference vector in the actor and critic heads to compute scores over all item-face candidates.

The highest-scoring candidate is selected; the bin state is updated, and a new item enters the buffer.
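The selection step can be sketched in a few lines. This is not the learned policy: the Transformer-Select module and the actor head are replaced by a hand-written linear score, used only to show how a preference vector changes which item-face candidate wins. The `Candidate` fields, the `score` and `select` functions, and all values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    item_id: int
    face: str
    space_gain: float  # assumed estimate of packing benefit, in [0, 1]
    time_cost: float   # assumed normalized operational time, in [0, 1]


def score(c, w_space, w_time):
    """Stand-in for the actor head: linear preference-weighted score."""
    return w_space * c.space_gain - w_time * c.time_cost


def select(candidates, w_space, w_time):
    """Pick the highest-scoring item-face candidate."""
    return max(candidates, key=lambda c: score(c, w_space, w_time))


# Two items in the buffer; item 0 is graspable from two faces.
buffer_candidates = [
    Candidate(item_id=0, face="top", space_gain=0.50, time_cost=0.20),
    Candidate(item_id=0, face="side", space_gain=0.80, time_cost=0.60),
    Candidate(item_id=1, face="top", space_gain=0.65, time_cost=0.25),
]

if __name__ == "__main__":
    for w_space, w_time in [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]:
        best = select(buffer_candidates, w_space, w_time)
        print((w_space, w_time), "->", best.item_id, best.face)
```

With these made-up values, a space-only preference picks the slow side grasp of item 0, a time-only preference picks its fast top grasp, and an even weighting picks item 1.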

From Space to Time: A Pareto Perspective

Space utilization and operational time are inherently conflicting in robotic bin packing:

Achieving denser packing often requires extra reorientations and longer actions.
Minimizing execution time, in contrast, can lead to wasted bin space.

STEP resolves this conflict through a preference vector, which defines how much weight to place on each objective. By tuning this vector, the policy can prioritize space efficiency, time efficiency, or a balance of both within a single framework.

The Pareto front below illustrates the achievable trade-offs. STEP-n denotes a policy with n items in the buffer in the semi-online setting. The figure shows results for buffers of size 1, 3, and 5, each evaluated across preference weights ranging from 0 to 1 for both objectives. Larger buffers provide more candidate choices and improve space utilization, while the preference vector governs how space is traded off against time.
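For readers unfamiliar with the term, a Pareto front is the set of results that no other result beats on both objectives at once. The sketch below extracts it from a list of (space utilization, operational time) pairs, where space is maximized and time is minimized. The function name `pareto_front` and the sample points are hypothetical and are not results from STEP.

```python
def pareto_front(points):
    """Return the non-dominated (space_utilization, operational_time) pairs.

    Space utilization is maximized and operational time is minimized.
    A point is dominated if another point is at least as good on both
    objectives and strictly better on at least one.
    """
    front = []
    for i, (s, t) in enumerate(points):
        dominated = any(
            s2 >= s and t2 <= t and (s2 > s or t2 < t)
            for j, (s2, t2) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((s, t))
    return sorted(front)


# Made-up evaluation results, one per preference setting.
results = [
    (0.60, 10.0),
    (0.70, 12.0),
    (0.65, 13.0),  # dominated by (0.70, 12.0)
    (0.75, 15.0),
    (0.60, 11.0),  # dominated by (0.60, 10.0)
]

if __name__ == "__main__":
    print(pareto_front(results))
```

Sweeping the preference vector and plotting only the non-dominated results would give one such curve per buffer size.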


Results

We compare STEP-1 against three strong baselines, each with a different strategy for handling grasping and reorientation:

TopFaceSpace - picks items only from the top face, optimizing for space.
ReorientSpace - allows reorientation to improve space utilization, ignoring time cost.
ReorientTime - reorients items while penalizing time, but without balancing both objectives.

Video

Abstract

Robotic bin packing is widely deployed in warehouse automation, with current systems achieving robust performance through heuristic and learning-based strategies. These systems must balance compact placement with rapid execution, where actions such as selecting alternative items or reorienting them can improve space utilization but introduce additional time. We propose a selection-based formulation that explicitly reasons over this trade-off: at each step, the robot evaluates multiple candidate actions, weighing expected packing benefit against estimated operational time. This enables time-aware strategies that selectively accept increased operational time when it yields meaningful spatial improvements. Our method, STEP (Space-Time Efficient Packing), uses a Transformer-based policy conditioned on dynamic preferences, and allows generalization across candidate set sizes and integration with standard placement modules. It achieves higher packing density without compromising operational time.