kungchuking committed · Commit 2c76547 · 1 Parent(s): 70bca94

Copied from GitHub repository.

README.md ADDED
@@ -0,0 +1,139 @@
1
+ # [CVPR 2023] DynamicStereo: Consistent Dynamic Depth from Stereo Videos.
2
+
3
+ **[Meta AI Research, FAIR](https://ai.facebook.com/research/)**; **[University of Oxford, VGG](https://www.robots.ox.ac.uk/~vgg/)**
4
+
5
+ [Nikita Karaev](https://nikitakaraevv.github.io/), [Ignacio Rocco](https://www.irocco.info/), [Benjamin Graham](https://ai.facebook.com/people/benjamin-graham/), [Natalia Neverova](https://nneverova.github.io/), [Andrea Vedaldi](https://www.robots.ox.ac.uk/~vedaldi/), [Christian Rupprecht](https://chrirupp.github.io/)
6
+
7
+ [[`Paper`](https://research.facebook.com/publications/dynamicstereo-consistent-dynamic-depth-from-stereo-videos/)] [[`Project`](https://dynamic-stereo.github.io/)] [[`BibTeX`](#citing-dynamicstereo)]
8
+
9
+ ![nikita-reading](https://user-images.githubusercontent.com/37815420/236242052-e72d5605-1ab2-426c-ae8d-5c8a86d5252c.gif)
10
+
11
+ **DynamicStereo** is a transformer-based architecture for temporally consistent depth estimation from stereo videos. It has been trained on a combination of two datasets: [SceneFlow](https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html) and **Dynamic Replica** that we present below.
12
+
13
+ ## Dataset
14
+
15
+ https://user-images.githubusercontent.com/37815420/236239579-7877623c-716b-4074-a14e-944d095f1419.mp4
16
+
17
+ The dataset consists of 145,200 *stereo* frames (524 videos) with humans and animals in motion.
18
+
19
+ We provide annotations for both *left and right* views; see [this notebook](https://github.com/facebookresearch/dynamic_stereo/blob/main/notebooks/Dynamic_Replica_demo.ipynb):
20
+ - camera intrinsics and extrinsics
21
+ - image depth (can be converted to disparity using the camera intrinsics; see the sketch below)
22
+ - instance segmentation masks
23
+ - binary foreground / background segmentation masks
24
+ - optical flow (released!)
25
+ - long-range pixel trajectories (released!)
26
+
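+ As a minimal illustration (not part of the repository code), depth can be converted to disparity for a rectified stereo pair given the focal length in pixels and the stereo baseline; the helper name and arguments below are illustrative assumptions:
+
+ ```
+ import numpy as np
+
+ def depth_to_disparity(depth, focal_px, baseline, eps=1e-5):
+     # disparity = f * B / Z for a rectified pair; clamp depth to avoid division by zero
+     depth = np.maximum(depth, eps)
+     return focal_px * baseline / depth
+ ```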
27
+
28
+ ### Download the Dynamic Replica dataset
29
+ Because the original dataset is very large, we provide `links_lite.json` so that you can quickly test the pipeline by downloading only a small portion of the data.
30
+
31
+ ```
32
+ python ./scripts/download_dynamic_replica.py --link_list_file links_lite.json --download_folder ./dynamic_replica_data --download_splits test train valid real
33
+ ```
34
+
35
+ To download the full dataset, please visit [the original site](https://github.com/facebookresearch/dynamic_stereo) created by Meta.
36
+
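+ Once downloaded, sequences can be loaded with the dataset class added in this commit. A minimal sketch, assuming the default `./dynamic_replica_data` layout and that the installation steps below have been completed:
+
+ ```
+ from datasets.dynamic_stereo_datasets import DynamicReplicaDataset
+
+ # Load short validation clips; only_first_n_samples keeps one clip per sequence.
+ ds = DynamicReplicaDataset(split="valid", sample_len=20, only_first_n_samples=1)
+ sample = ds[0]
+ print(sample["img"].shape)  # (T, 2, 3, H, W): time x {left, right} x channels x height x width
+ ```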
37
+ ## Installation
38
+
39
+ The following steps install DynamicStereo with the latest PyTorch3D, PyTorch 1.12.1, and CUDA 11.3.
40
+
41
+ ### Set up the root for all source files:
42
+ ```
43
+ git clone https://github.com/facebookresearch/dynamic_stereo
44
+ cd dynamic_stereo
45
+ export PYTHONPATH=`(cd ../ && pwd)`:`pwd`:$PYTHONPATH
46
+ ```
47
+ ### Create a conda env:
48
+ ```
49
+ conda create -n dynamicstereo python=3.8
50
+ conda activate dynamicstereo
51
+ ```
52
+ ### Install requirements
53
+ ```
54
+ conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
55
+ # It will require some time to install PyTorch3D. In the meantime, you may want to take a break and enjoy a cup of coffee.
56
+ pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"
57
+ pip install -r requirements.txt
58
+ ```
59
+
60
+ ### (Optional) Install RAFT-Stereo
61
+ ```
62
+ mkdir third_party
63
+ cd third_party
64
+ git clone https://github.com/princeton-vl/RAFT-Stereo
65
+ cd RAFT-Stereo
66
+ bash download_models.sh
67
+ cd ../..
68
+ ```
69
+
70
+
71
+
72
+ ## Evaluation
73
+ To download the checkpoints, run:
74
+ ```
75
+ mkdir checkpoints
76
+ cd checkpoints
77
+ wget https://dl.fbaipublicfiles.com/dynamic_replica_v1/dynamic_stereo_sf.pth
78
+ wget https://dl.fbaipublicfiles.com/dynamic_replica_v1/dynamic_stereo_dr_sf.pth
79
+ cd ..
80
+ ```
81
+ Alternatively, you can download the checkpoints manually via the links below and copy them to `./dynamic_stereo/checkpoints`.
82
+
83
+ - [DynamicStereo](https://dl.fbaipublicfiles.com/dynamic_replica_v1/dynamic_stereo_sf.pth) trained on SceneFlow
84
+ - [DynamicStereo](https://dl.fbaipublicfiles.com/dynamic_replica_v1/dynamic_stereo_dr_sf.pth) trained on SceneFlow and *Dynamic Replica*
85
+
86
+ To evaluate DynamicStereo:
87
+ ```
88
+ python ./evaluation/evaluate.py --config-name eval_dynamic_replica_40_frames \
89
+ MODEL.model_name=DynamicStereoModel exp_dir=./outputs/test_dynamic_replica_ds \
90
+ MODEL.DynamicStereoModel.model_weights=./checkpoints/dynamic_stereo_sf.pth
91
+ ```
92
+ Due to the high image resolution, evaluation on *Dynamic Replica* requires a 32GB GPU. If you don't have enough GPU memory, you can decrease `kernel_size` from 20 to 10 by adding `MODEL.DynamicStereoModel.kernel_size=10` to the above command (see the example below). Another option is to decrease the dataset resolution.
93
+
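+ For example, a reduced-memory variant of the evaluation command above might look like this (the only change is the appended kernel-size override):
+ ```
+ python ./evaluation/evaluate.py --config-name eval_dynamic_replica_40_frames \
+ MODEL.model_name=DynamicStereoModel exp_dir=./outputs/test_dynamic_replica_ds \
+ MODEL.DynamicStereoModel.model_weights=./checkpoints/dynamic_stereo_sf.pth \
+ MODEL.DynamicStereoModel.kernel_size=10
+ ```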
94
+ With the default `kernel_size=20`, you should reproduce the numbers from *Table 5* in the [paper](https://arxiv.org/pdf/2305.02296.pdf).
95
+
96
+ Reconstructions of all the *Dynamic Replica* splits (including *real*) will be visualized and saved to `exp_dir`.
97
+
98
+ If you installed [RAFT-Stereo](https://github.com/princeton-vl/RAFT-Stereo), you can run:
99
+ ```
100
+ python ./evaluation/evaluate.py --config-name eval_dynamic_replica_40_frames \
101
+ MODEL.model_name=RAFTStereoModel exp_dir=./outputs/test_dynamic_replica_raft
102
+ ```
103
+
104
+ Other public datasets we use:
105
+ - [SceneFlow](https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html)
106
+ - [Sintel](http://sintel.is.tue.mpg.de/stereo)
107
+ - [Middlebury](https://vision.middlebury.edu/stereo/data/)
108
+ - [ETH3D](https://www.eth3d.net/datasets#low-res-two-view-training-data)
109
+ - [KITTI 2015](http://www.cvlibs.net/datasets/kitti/eval_stereo.php)
110
+
111
+ ## Training
112
+ Training requires a 32GB GPU. You can decrease `image_size` and / or `sample_len` if you don't have enough GPU memory.
113
+ You need to download SceneFlow before training. Alternatively, you can train on *Dynamic Replica* only.
114
+ ```
115
+ python train.py --batch_size 1 \
116
+ --spatial_scale -0.2 0.4 --image_size 384 512 --saturation_range 0 1.4 --num_steps 200000 \
117
+ --ckpt_path dynamicstereo_sf_dr \
118
+ --sample_len 5 --lr 0.0003 --train_iters 10 --valid_iters 20 \
119
+ --num_workers 28 --save_freq 100 --update_block_3d --different_update_blocks \
120
+ --attention_type self_stereo_temporal_update_time_update_space --train_datasets dynamic_replica things monkaa driving
121
+ ```
122
+ If you want to train on SceneFlow only, remove `dynamic_replica` from `--train_datasets` (i.e. pass `--train_datasets things monkaa driving`).
123
+
124
+
125
+
126
+ ## License
127
+ The majority of dynamic_stereo is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: [RAFT-Stereo](https://github.com/princeton-vl/RAFT-Stereo) is licensed under the MIT license, while [LoFTR](https://github.com/zju3dv/LoFTR) and [CREStereo](https://github.com/megvii-research/CREStereo) are licensed under the Apache 2.0 license.
128
+
129
+
130
+ ## Citing DynamicStereo
131
+ If you use DynamicStereo or Dynamic Replica in your research, please use the following BibTeX entry.
132
+ ```
133
+ @article{karaev2023dynamicstereo,
134
+ title={DynamicStereo: Consistent Dynamic Depth from Stereo Videos},
135
+ author={Nikita Karaev and Ignacio Rocco and Benjamin Graham and Natalia Neverova and Andrea Vedaldi and Christian Rupprecht},
136
+ journal={CVPR},
137
+ year={2023}
138
+ }
139
+ ```
datasets/augmentor.py ADDED
@@ -0,0 +1,200 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+
4
+ # This source code is licensed under the license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ import numpy as np
8
+ import random
9
+ from PIL import Image
10
+
11
+ import cv2
12
+
13
+ cv2.setNumThreads(0)
14
+ cv2.ocl.setUseOpenCL(False)
15
+
16
+ from torchvision.transforms import ColorJitter, functional, Compose
17
+
18
+
19
+ class AdjustGamma(object):
20
+ def __init__(self, gamma_min, gamma_max, gain_min=1.0, gain_max=1.0):
21
+ self.gamma_min, self.gamma_max, self.gain_min, self.gain_max = (
22
+ gamma_min,
23
+ gamma_max,
24
+ gain_min,
25
+ gain_max,
26
+ )
27
+
28
+ def __call__(self, sample):
29
+ gain = random.uniform(self.gain_min, self.gain_max)
30
+ gamma = random.uniform(self.gamma_min, self.gamma_max)
31
+ return functional.adjust_gamma(sample, gamma, gain)
32
+
33
+ def __repr__(self):
34
+ return f"Adjust Gamma {self.gamma_min}, ({self.gamma_max}) and Gain ({self.gain_min}, {self.gain_max})"
35
+
36
+
37
+ class SequenceDispFlowAugmentor:
38
+ def __init__(
39
+ self,
40
+ crop_size,
41
+ min_scale=-0.2,
42
+ max_scale=0.5,
43
+ do_flip=True,
44
+ yjitter=False,
45
+ saturation_range=[0.6, 1.4],
46
+ gamma=[1, 1, 1, 1],
47
+ ):
48
+ # spatial augmentation params
49
+ self.crop_size = crop_size
50
+ self.min_scale = min_scale
51
+ self.max_scale = max_scale
52
+ self.spatial_aug_prob = 1.0
53
+ self.stretch_prob = 0.8
54
+ self.max_stretch = 0.2
55
+
56
+ # flip augmentation params
57
+ self.yjitter = yjitter
58
+ self.do_flip = do_flip
59
+ self.h_flip_prob = 0.5
60
+ self.v_flip_prob = 0.1
61
+
62
+ # photometric augmentation params
63
+ self.photo_aug = Compose(
64
+ [
65
+ ColorJitter(
66
+ brightness=0.4,
67
+ contrast=0.4,
68
+ saturation=saturation_range,
69
+ hue=0.5 / 3.14,
70
+ ),
71
+ AdjustGamma(*gamma),
72
+ ]
73
+ )
74
+ self.asymmetric_color_aug_prob = 0.2
75
+ self.eraser_aug_prob = 0.5
76
+
77
+ def color_transform(self, seq):
78
+ """Photometric augmentation"""
79
+
80
+ # asymmetric
81
+ if np.random.rand() < self.asymmetric_color_aug_prob:
82
+ for i in range(len(seq)):
83
+ for cam in (0, 1):
84
+ seq[i][cam] = np.array(
85
+ self.photo_aug(Image.fromarray(seq[i][cam])), dtype=np.uint8
86
+ )
87
+ # symmetric
88
+ else:
89
+ image_stack = np.concatenate(
90
+ [seq[i][cam] for i in range(len(seq)) for cam in (0, 1)], axis=0
91
+ )
92
+ image_stack = np.array(
93
+ self.photo_aug(Image.fromarray(image_stack)), dtype=np.uint8
94
+ )
95
+ split = np.split(image_stack, len(seq) * 2, axis=0)
96
+ for i in range(len(seq)):
97
+ seq[i][0] = split[2 * i]
98
+ seq[i][1] = split[2 * i + 1]
99
+ return seq
100
+
101
+ def eraser_transform(self, seq, bounds=[50, 100]):
102
+ """Occlusion augmentation"""
103
+ ht, wd = seq[0][0].shape[:2]
104
+ for i in range(len(seq)):
105
+ for cam in (0, 1):
106
+ if np.random.rand() < self.eraser_aug_prob:
107
+ mean_color = np.mean(seq[0][0].reshape(-1, 3), axis=0)
108
+ for _ in range(np.random.randint(1, 3)):
109
+ x0 = np.random.randint(0, wd)
110
+ y0 = np.random.randint(0, ht)
111
+ dx = np.random.randint(bounds[0], bounds[1])
112
+ dy = np.random.randint(bounds[0], bounds[1])
113
+ seq[i][cam][y0 : y0 + dy, x0 : x0 + dx, :] = mean_color
114
+
115
+ return seq
116
+
117
+ def spatial_transform(self, img, disp):
118
+ # randomly sample scale
119
+ ht, wd = img[0][0].shape[:2]
120
+ min_scale = np.maximum(
121
+ (self.crop_size[0] + 8) / float(ht), (self.crop_size[1] + 8) / float(wd)
122
+ )
123
+
124
+ scale = 2 ** np.random.uniform(self.min_scale, self.max_scale)
125
+ scale_x = scale
126
+ scale_y = scale
127
+ if np.random.rand() < self.stretch_prob:
128
+ scale_x *= 2 ** np.random.uniform(-self.max_stretch, self.max_stretch)
129
+ scale_y *= 2 ** np.random.uniform(-self.max_stretch, self.max_stretch)
130
+
131
+ scale_x = np.clip(scale_x, min_scale, None)
132
+ scale_y = np.clip(scale_y, min_scale, None)
133
+
134
+ if np.random.rand() < self.spatial_aug_prob:
135
+ # rescale the images
136
+ for i in range(len(img)):
137
+ for cam in (0, 1):
138
+ img[i][cam] = cv2.resize(
139
+ img[i][cam],
140
+ None,
141
+ fx=scale_x,
142
+ fy=scale_y,
143
+ interpolation=cv2.INTER_LINEAR,
144
+ )
145
+ if len(disp[i]) > 0:
146
+ disp[i][cam] = cv2.resize(
147
+ disp[i][cam],
148
+ None,
149
+ fx=scale_x,
150
+ fy=scale_y,
151
+ interpolation=cv2.INTER_LINEAR,
152
+ )
153
+ disp[i][cam] = disp[i][cam] * [scale_x, scale_y]
154
+
155
+ if self.yjitter:
156
+ y0 = np.random.randint(2, img[0][0].shape[0] - self.crop_size[0] - 2)
157
+ x0 = np.random.randint(2, img[0][0].shape[1] - self.crop_size[1] - 2)
158
+
159
+ for i in range(len(img)):
160
+ y1 = y0 + np.random.randint(-2, 2 + 1)
161
+ img[i][0] = img[i][0][
162
+ y0 : y0 + self.crop_size[0], x0 : x0 + self.crop_size[1]
163
+ ]
164
+ img[i][1] = img[i][1][
165
+ y1 : y1 + self.crop_size[0], x0 : x0 + self.crop_size[1]
166
+ ]
167
+ if len(disp[i]) > 0:
168
+ disp[i][0] = disp[i][0][
169
+ y0 : y0 + self.crop_size[0], x0 : x0 + self.crop_size[1]
170
+ ]
171
+ disp[i][1] = disp[i][1][
172
+ y1 : y1 + self.crop_size[0], x0 : x0 + self.crop_size[1]
173
+ ]
174
+ else:
175
+ y0 = np.random.randint(0, img[0][0].shape[0] - self.crop_size[0])
176
+ x0 = np.random.randint(0, img[0][0].shape[1] - self.crop_size[1])
177
+ for i in range(len(img)):
178
+ for cam in (0, 1):
179
+ img[i][cam] = img[i][cam][
180
+ y0 : y0 + self.crop_size[0], x0 : x0 + self.crop_size[1]
181
+ ]
182
+ if len(disp[i]) > 0:
183
+ disp[i][cam] = disp[i][cam][
184
+ y0 : y0 + self.crop_size[0], x0 : x0 + self.crop_size[1]
185
+ ]
186
+
187
+ return img, disp
188
+
189
+ def __call__(self, img, disp):
190
+ img = self.color_transform(img)
191
+ img = self.eraser_transform(img)
192
+ img, disp = self.spatial_transform(img, disp)
193
+
194
+ for i in range(len(img)):
195
+ for cam in (0, 1):
196
+ img[i][cam] = np.ascontiguousarray(img[i][cam])
197
+ if len(disp[i]) > 0:
198
+ disp[i][cam] = np.ascontiguousarray(disp[i][cam])
199
+
200
+ return img, disp
datasets/dynamic_stereo_datasets.py ADDED
@@ -0,0 +1,743 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+
4
+ # This source code is licensed under the license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+ # Data loading based on https://github.com/NVIDIA/flownet2-pytorch
7
+
8
+ # -- Added by Chu King on 16th November 2025 for debugging purposes.
9
+ import torch.distributed as dist
10
+ import signal
11
+
12
+ import os
13
+ import copy
14
+ import gzip
15
+ import logging
16
+ import torch
17
+ import numpy as np
18
+ import torch.utils.data as data
19
+ import torch.nn.functional as F
20
+ import os.path as osp
21
+ from glob import glob
22
+
23
+ from collections import defaultdict
24
+ from PIL import Image
25
+ from dataclasses import dataclass
26
+ from typing import List, Optional
27
+ from pytorch3d.renderer.cameras import PerspectiveCameras
28
+ from pytorch3d.implicitron.dataset.types import (
29
+ FrameAnnotation as ImplicitronFrameAnnotation,
30
+ load_dataclass,
31
+ )
32
+
33
+ from datasets import frame_utils
34
+ from evaluation.utils.eval_utils import depth2disparity_scale
35
+ from datasets.augmentor import SequenceDispFlowAugmentor
36
+
37
+
38
+ @dataclass
39
+ class DynamicReplicaFrameAnnotation(ImplicitronFrameAnnotation):
40
+ """A dataclass used to load annotations from json."""
41
+
42
+ camera_name: Optional[str] = None
43
+
44
+
45
+ class StereoSequenceDataset(data.Dataset):
46
+ def __init__(self, aug_params=None, sparse=False, reader=None):
47
+ self.augmentor = None
48
+ self.sparse = sparse
49
+ self.img_pad = (
50
+ aug_params.pop("img_pad", None) if aug_params is not None else None
51
+ )
52
+ if aug_params is not None and "crop_size" in aug_params:
53
+ if sparse:
54
+ raise ValueError("Sparse augmentor is not implemented")
55
+ else:
56
+ self.augmentor = SequenceDispFlowAugmentor(**aug_params)
57
+
58
+ if reader is None:
59
+ self.disparity_reader = frame_utils.read_gen
60
+ else:
61
+ self.disparity_reader = reader
62
+ self.depth_reader = self._load_16big_png_depth
63
+ self.is_test = False
64
+ self.sample_list = []
65
+ self.extra_info = []
66
+ self.depth_eps = 1e-5
67
+
68
+ def _load_16big_png_depth(self, depth_png):
69
+ with Image.open(depth_png) as depth_pil:
70
+ # the image is stored with 16-bit depth but PIL reads it as I (32 bit).
71
+ # we cast it to uint16, then reinterpret as float16, then cast to float32
72
+ depth = (
73
+ np.frombuffer(np.array(depth_pil, dtype=np.uint16), dtype=np.float16)
74
+ .astype(np.float32)
75
+ .reshape((depth_pil.size[1], depth_pil.size[0]))
76
+ )
77
+ return depth
78
+
79
+ def _get_pytorch3d_camera(
80
+ self, entry_viewpoint, image_size, scale: float
81
+ ) -> PerspectiveCameras:
82
+ assert entry_viewpoint is not None
83
+ # principal point and focal length
84
+ principal_point = torch.tensor(
85
+ entry_viewpoint.principal_point, dtype=torch.float
86
+ )
87
+ focal_length = torch.tensor(entry_viewpoint.focal_length, dtype=torch.float)
88
+
89
+ half_image_size_wh_orig = (
90
+ torch.tensor(list(reversed(image_size)), dtype=torch.float) / 2.0
91
+ )
92
+
93
+ # first, we convert from the dataset's NDC convention to pixels
94
+ format = entry_viewpoint.intrinsics_format
95
+ if format.lower() == "ndc_norm_image_bounds":
96
+ # this is e.g. currently used in CO3D for storing intrinsics
97
+ rescale = half_image_size_wh_orig
98
+ elif format.lower() == "ndc_isotropic":
99
+ rescale = half_image_size_wh_orig.min()
100
+ else:
101
+ raise ValueError(f"Unknown intrinsics format: {format}")
102
+
103
+ # principal point and focal length in pixels
104
+ principal_point_px = half_image_size_wh_orig - principal_point * rescale
105
+ focal_length_px = focal_length * rescale
106
+
107
+ # now, convert from pixels to PyTorch3D v0.5+ NDC convention
108
+ # if self.image_height is None or self.image_width is None:
109
+ out_size = list(reversed(image_size))
110
+
111
+ half_image_size_output = torch.tensor(out_size, dtype=torch.float) / 2.0
112
+ half_min_image_size_output = half_image_size_output.min()
113
+
114
+ # rescaled principal point and focal length in ndc
115
+ principal_point = (
116
+ half_image_size_output - principal_point_px * scale
117
+ ) / half_min_image_size_output
118
+ focal_length = focal_length_px * scale / half_min_image_size_output
119
+
120
+ return PerspectiveCameras(
121
+ focal_length=focal_length[None],
122
+ principal_point=principal_point[None],
123
+ R=torch.tensor(entry_viewpoint.R, dtype=torch.float)[None],
124
+ T=torch.tensor(entry_viewpoint.T, dtype=torch.float)[None],
125
+ )
126
+
127
+ def _get_output_tensor(self, sample):
128
+ output_tensor = defaultdict(list)
129
+ sample_size = len(sample["image"]["left"])
130
+ output_tensor_keys = ["img", "disp", "valid_disp", "mask"]
131
+ add_keys = ["viewpoint", "metadata"]
132
+ for add_key in add_keys:
133
+ if add_key in sample:
134
+ output_tensor_keys.append(add_key)
135
+
136
+ for key in output_tensor_keys:
137
+ output_tensor[key] = [[] for _ in range(sample_size)]
138
+
139
+ if "viewpoint" in sample:
140
+ viewpoint_left = self._get_pytorch3d_camera(
141
+ sample["viewpoint"]["left"][0],
142
+ sample["metadata"]["left"][0][1],
143
+ scale=1.0,
144
+ )
145
+ viewpoint_right = self._get_pytorch3d_camera(
146
+ sample["viewpoint"]["right"][0],
147
+ sample["metadata"]["right"][0][1],
148
+ scale=1.0,
149
+ )
150
+ depth2disp_scale = depth2disparity_scale(
151
+ viewpoint_left,
152
+ viewpoint_right,
153
+ torch.Tensor(sample["metadata"]["left"][0][1])[None],
154
+ )
155
+
156
+ for i in range(sample_size):
157
+ for cam in ["left", "right"]:
158
+ if "mask" in sample and cam in sample["mask"]:
159
+ mask = frame_utils.read_gen(sample["mask"][cam][i])
160
+ mask = np.array(mask) / 255.0
161
+ output_tensor["mask"][i].append(mask)
162
+
163
+ if "viewpoint" in sample and cam in sample["viewpoint"]:
164
+ viewpoint = self._get_pytorch3d_camera(
165
+ sample["viewpoint"][cam][i],
166
+ sample["metadata"][cam][i][1],
167
+ scale=1.0,
168
+ )
169
+ output_tensor["viewpoint"][i].append(viewpoint)
170
+
171
+ if "metadata" in sample and cam in sample["metadata"]:
172
+ metadata = sample["metadata"][cam][i]
173
+ output_tensor["metadata"][i].append(metadata)
174
+
175
+ if cam in sample["image"]:
176
+
177
+ img = frame_utils.read_gen(sample["image"][cam][i])
178
+ img = np.array(img).astype(np.uint8)
179
+
180
+ # grayscale images
181
+ if len(img.shape) == 2:
182
+ img = np.tile(img[..., None], (1, 1, 3))
183
+ else:
184
+ img = img[..., :3]
185
+ output_tensor["img"][i].append(img)
186
+
187
+ if cam in sample["disparity"]:
188
+ disp = self.disparity_reader(sample["disparity"][cam][i])
189
+ if isinstance(disp, tuple):
190
+ disp, valid_disp = disp
191
+ else:
192
+ valid_disp = disp < 512
193
+ disp = np.array(disp).astype(np.float32)
194
+
195
+ disp = np.stack([-disp, np.zeros_like(disp)], axis=-1)
196
+
197
+ output_tensor["disp"][i].append(disp)
198
+ output_tensor["valid_disp"][i].append(valid_disp)
199
+
200
+ elif "depth" in sample and cam in sample["depth"]:
201
+ depth = self.depth_reader(sample["depth"][cam][i])
202
+
203
+ depth_mask = depth < self.depth_eps
204
+ depth[depth_mask] = self.depth_eps
205
+
206
+ disp = depth2disp_scale / depth
207
+ disp[depth_mask] = 0
208
+ valid_disp = (disp < 512) * (1 - depth_mask)
209
+
210
+ disp = np.array(disp).astype(np.float32)
211
+ disp = np.stack([-disp, np.zeros_like(disp)], axis=-1)
212
+ output_tensor["disp"][i].append(disp)
213
+ output_tensor["valid_disp"][i].append(valid_disp)
214
+
215
+ return output_tensor
216
+
217
+ def __getitem__(self, index):
218
+ im_tensor = {"img": None}
219
+ sample = self.sample_list[index]
220
+ if self.is_test:
221
+ sample_size = len(sample["image"]["left"])
222
+ im_tensor["img"] = [[] for _ in range(sample_size)]
223
+ for i in range(sample_size):
224
+ for cam in ["left", "right"]:
225
+ img = frame_utils.read_gen(sample["image"][cam][i])
226
+ img = np.array(img).astype(np.uint8)[..., :3]
227
+ img = torch.from_numpy(img).permute(2, 0, 1).float()
228
+ im_tensor["img"][i].append(img)
229
+ im_tensor["img"] = torch.stack(im_tensor["img"])
230
+ return im_tensor, self.extra_info[index]
231
+
232
+ index = index % len(self.sample_list)
233
+
234
+ try:
235
+ output_tensor = self._get_output_tensor(sample)
236
+ except Exception:
237
+ logging.warning(f"Exception in loading sample {index}!")
238
+ index = np.random.randint(len(self.sample_list))
239
+ logging.info(f"New index is {index}")
240
+ sample = self.sample_list[index]
241
+ output_tensor = self._get_output_tensor(sample)
242
+ sample_size = len(sample["image"]["left"])
243
+
244
+ if self.augmentor is not None:
245
+ output_tensor["img"], output_tensor["disp"] = self.augmentor(
246
+ output_tensor["img"], output_tensor["disp"]
247
+ )
248
+ for i in range(sample_size):
249
+ for cam in (0, 1):
250
+ if cam < len(output_tensor["img"][i]):
251
+ img = (
252
+ torch.from_numpy(output_tensor["img"][i][cam])
253
+ .permute(2, 0, 1)
254
+ .float()
255
+ )
256
+ if self.img_pad is not None:
257
+ padH, padW = self.img_pad
258
+ img = F.pad(img, [padW] * 2 + [padH] * 2)
259
+ output_tensor["img"][i][cam] = img
260
+
261
+ if cam < len(output_tensor["disp"][i]):
262
+ disp = (
263
+ torch.from_numpy(output_tensor["disp"][i][cam])
264
+ .permute(2, 0, 1)
265
+ .float()
266
+ )
267
+
268
+ if self.sparse:
269
+ valid_disp = torch.from_numpy(
270
+ output_tensor["valid_disp"][i][cam]
271
+ )
272
+ else:
273
+ valid_disp = (
274
+ (disp[0].abs() < 512)
275
+ & (disp[1].abs() < 512)
276
+ & (disp[0].abs() != 0)
277
+ )
278
+ disp = disp[:1]
279
+
280
+ output_tensor["disp"][i][cam] = disp
281
+ output_tensor["valid_disp"][i][cam] = valid_disp.float()
282
+
283
+ if "mask" in output_tensor and cam < len(output_tensor["mask"][i]):
284
+ mask = torch.from_numpy(output_tensor["mask"][i][cam]).float()
285
+ output_tensor["mask"][i][cam] = mask
286
+
287
+ if "viewpoint" in output_tensor and cam < len(
288
+ output_tensor["viewpoint"][i]
289
+ ):
290
+ viewpoint = output_tensor["viewpoint"][i][cam]
291
+ output_tensor["viewpoint"][i][cam] = viewpoint
292
+
293
+ res = {}
294
+ if "viewpoint" in output_tensor and self.split != "train":
295
+ res["viewpoint"] = output_tensor["viewpoint"]
296
+ if "metadata" in output_tensor and self.split != "train":
297
+ res["metadata"] = output_tensor["metadata"]
298
+
299
+ for k, v in output_tensor.items():
300
+ if k != "viewpoint" and k != "metadata":
301
+ for i in range(len(v)):
302
+ if len(v[i]) > 0:
303
+ v[i] = torch.stack(v[i])
304
+ if len(v) > 0 and (len(v[0]) > 0):
305
+ res[k] = torch.stack(v)
306
+ return res
307
+
308
+ def __mul__(self, v):
309
+ copy_of_self = copy.deepcopy(self)
310
+ copy_of_self.sample_list = v * copy_of_self.sample_list
311
+ copy_of_self.extra_info = v * copy_of_self.extra_info
312
+ return copy_of_self
313
+
314
+ def __len__(self):
315
+ return len(self.sample_list)
316
+
317
+
318
+ class DynamicReplicaDataset(StereoSequenceDataset):
319
+ def __init__(
320
+ self,
321
+ aug_params=None,
322
+ root="./dynamic_replica_data",
323
+ split="train",
324
+ sample_len=-1,
325
+ only_first_n_samples=-1,
326
+ t_step_validation=1, # -- Added by Chu King on 24th November 2025 to control the separation between consecutive samples in validation
327
+ VERBOSE=False # -- Added by Chu King on 16th November 2025 for debugging purposes
328
+ ):
329
+ super(DynamicReplicaDataset, self).__init__(aug_params)
330
+ self.root = root
331
+ self.sample_len = sample_len
332
+ self.split = split
333
+
334
+ frame_annotations_file = f"frame_annotations_{split}.jgz"
335
+
336
+ with gzip.open(
337
+ osp.join(root, split, frame_annotations_file), "rt", encoding="utf8"
338
+ ) as zipfile:
339
+ frame_annots_list = load_dataclass(
340
+ zipfile, List[DynamicReplicaFrameAnnotation]
341
+ )
342
+ seq_annot = defaultdict(lambda: defaultdict(list))
343
+ for frame_annot in frame_annots_list:
344
+ seq_annot[frame_annot.sequence_name][frame_annot.camera_name].append(
345
+ frame_annot
346
+ )
347
+
348
+ # -- Added by Chu King on 16th November 2025 for debugging purposes
349
+ if VERBOSE:
350
+ rank = dist.get_rank() if dist.is_initialized() else 0
351
+ with open(f"debug_rank_{rank}.txt", "a") as f:
352
+ f.write("[INFO] seq_annot: {}\n".format(seq_annot))
353
+ # -- os.kill(os.getpid(), signal.SIGABRT)
354
+
355
+ for seq_name in seq_annot.keys():
356
+
357
+ # -- Added by Chu King on 16th November 2025 for debugging purposes
358
+ if VERBOSE:
359
+ rank = dist.get_rank() if dist.is_initialized() else 0
360
+ with open(f"debug_rank_{rank}.txt", "a") as f:
361
+ f.write("---- ----\n")
362
+ f.write("[INFO] seq_name: {}\n".format(seq_name))
363
+
364
+ try:
365
+ filenames = defaultdict(lambda: defaultdict(list))
366
+ for cam in ["left", "right"]:
367
+ for framedata in seq_annot[seq_name][cam]:
368
+ im_path = osp.join(root, split, framedata.image.path)
369
+ depth_path = osp.join(root, split, framedata.depth.path)
370
+ mask_path = osp.join(root, split, framedata.mask.path)
371
+
372
+ # -- Added by Chu King on 16th November 2025 for debugging purposes
373
+ if VERBOSE:
374
+ rank = dist.get_rank() if dist.is_initialized() else 0
375
+ with open(f"debug_rank_{rank}.txt", "a") as f:
376
+ f.write("[INFO] cam: {}\n".format(cam))
377
+ f.write("[INFO] framedata: {}\n".format(framedata))
378
+ f.write("[INFO] framedata.viewpoint: {}\n".format(framedata.viewpoint))
379
+ f.write("[INFO] im_path: {}\n".format(im_path))
380
+ f.write("[INFO] depth_path: {}\n".format(depth_path))
381
+ f.write("[INFO] mask_path: {}\n".format(mask_path))
382
+
383
+ # -- Modified by Chu King on 16th November 2025 to clarify the nature of assertion errors.
384
+ assert os.path.isfile(im_path), "[ERROR] Rectified image path {} doesn't exist.".format(im_path)
385
+
386
+ tokens = root.split("/")
387
+ # -- if split != "test" and "real" not in tokens:
388
+ # -- assert os.path.isfile(depth_path), "[ERROR] Depth path {} doesn't exist. ".format(depth_path)
389
+ if not os.path.isfile(depth_path):
390
+ if split != "test" or "real" not in tokens:
391
+ print ("[WARNING] Depth path {} doesn't exist.".format(depth_path))
392
+
393
+ assert os.path.isfile(mask_path), "[ERROR] Mask path {} doesn't exist.".format(mask_path)
394
+
395
+ filenames["image"][cam].append(im_path)
396
+ filenames["mask"][cam].append(mask_path)
397
+ filenames["depth"][cam].append(depth_path)
398
+ filenames["viewpoint"][cam].append(framedata.viewpoint)
399
+ filenames["metadata"][cam].append(
400
+ [framedata.sequence_name, framedata.image.size]
401
+ )
402
+
403
+ for k in filenames.keys():
404
+ assert (
405
+ len(filenames[k][cam])
406
+ == len(filenames["image"][cam])
407
+ > 0
408
+ ), framedata.sequence_name
409
+
410
+ if not os.path.isfile(depth_path):
411
+ del filenames["depth"]
412
+
413
+ seq_len = len(filenames["image"][cam])
414
+
415
+ print("seq_len", seq_name, seq_len)
416
+ if split == "train":
417
+ for ref_idx in range(0, seq_len, 3):
418
+ # -- step = 1 if self.sample_len == 1 else np.random.randint(1, 6)
419
+ # -- Modified by Chu King on 24th November 2025 to handle high-speed motion.
420
+ step = 1 if self.sample_len == 1 else np.random.randint(1, 12)
421
+ if ref_idx + step * self.sample_len < seq_len:
422
+ sample = defaultdict(lambda: defaultdict(list))
423
+ for cam in ["left", "right"]:
424
+ for idx in range(
425
+ ref_idx, ref_idx + step * self.sample_len, step
426
+ ):
427
+ for k in filenames.keys():
428
+ if "mask" not in k:
429
+ sample[k][cam].append(
430
+ filenames[k][cam][idx]
431
+ )
432
+
433
+ self.sample_list.append(sample)
434
+ else:
435
+ step = self.sample_len if self.sample_len > 0 else seq_len
436
+ counter = 0
437
+
438
+ for ref_idx in range(0, seq_len, step):
439
+ sample = defaultdict(lambda: defaultdict(list))
440
+ for cam in ["left", "right"]:
441
+ # -- Modified by Chu King on 24th November 2025 to control the separation between samples during validation.
442
+ # -- for idx in range(ref_idx, ref_idx + step):
443
+ for idx in range(ref_idx, ref_idx + step * t_step_validation, t_step_validation):
444
+ for k in filenames.keys():
445
+ sample[k][cam].append(filenames[k][cam][idx])
446
+
447
+ self.sample_list.append(sample)
448
+ counter += 1
449
+ if only_first_n_samples > 0 and counter >= only_first_n_samples:
450
+ break
451
+ except Exception as e:
452
+ print(e)
453
+ print("Skipping sequence", seq_name)
454
+
455
+ assert len(self.sample_list) > 0, "No samples found"
456
+ print(f"Added {len(self.sample_list)} from Dynamic Replica {split}")
457
+ logging.info(f"Added {len(self.sample_list)} from Dynamic Replica {split}")
458
+
459
+
460
+ class SequenceSceneFlowDataset(StereoSequenceDataset):
461
+ def __init__(
462
+ self,
463
+ aug_params=None,
464
+ root="./datasets",
465
+ dstype="frames_cleanpass",
466
+ sample_len=1,
467
+ things_test=False,
468
+ add_things=True,
469
+ add_monkaa=True,
470
+ add_driving=True,
471
+ ):
472
+ super(SequenceSceneFlowDataset, self).__init__(aug_params)
473
+ self.root = root
474
+ self.dstype = dstype
475
+ self.sample_len = sample_len
476
+ if things_test:
477
+ self._add_things("TEST")
478
+ else:
479
+ if add_things:
480
+ self._add_things("TRAIN")
481
+ if add_monkaa:
482
+ self._add_monkaa()
483
+ if add_driving:
484
+ self._add_driving()
485
+
486
+ def _add_things(self, split="TRAIN"):
487
+ """Add FlyingThings3D data"""
488
+
489
+ original_length = len(self.sample_list)
490
+ root = osp.join(self.root, "FlyingThings3D")
491
+ image_paths = defaultdict(list)
492
+ disparity_paths = defaultdict(list)
493
+
494
+ for cam in ["left", "right"]:
495
+ image_paths[cam] = sorted(
496
+ glob(osp.join(root, self.dstype, split, f"*/*/{cam}/"))
497
+ )
498
+ disparity_paths[cam] = [
499
+ path.replace(self.dstype, "disparity") for path in image_paths[cam]
500
+ ]
501
+
502
+ # Choose a random subset of 400 images for validation
503
+ state = np.random.get_state()
504
+ np.random.seed(1000)
505
+ val_idxs = set(np.random.permutation(len(image_paths["left"]))[:40])
506
+ np.random.set_state(state)
507
+ np.random.seed(0)
508
+ num_seq = len(image_paths["left"])
509
+
510
+ for seq_idx in range(num_seq):
511
+ if (split == "TEST" and seq_idx in val_idxs) or (
512
+ split == "TRAIN" and not seq_idx in val_idxs
513
+ ):
514
+ images, disparities = defaultdict(list), defaultdict(list)
515
+ for cam in ["left", "right"]:
516
+ images[cam] = sorted(
517
+ glob(osp.join(image_paths[cam][seq_idx], "*.png"))
518
+ )
519
+ disparities[cam] = sorted(
520
+ glob(osp.join(disparity_paths[cam][seq_idx], "*.pfm"))
521
+ )
522
+
523
+ self._append_sample(images, disparities)
524
+
525
+ assert len(self.sample_list) > 0, "No samples found"
526
+ print(
527
+ f"Added {len(self.sample_list) - original_length} from FlyingThings {self.dstype}"
528
+ )
529
+ logging.info(
530
+ f"Added {len(self.sample_list) - original_length} from FlyingThings {self.dstype}"
531
+ )
532
+
533
+ def _add_monkaa(self):
534
+ """Add Monkaa data"""
535
+
536
+ original_length = len(self.sample_list)
537
+ root = osp.join(self.root, "Monkaa")
538
+ image_paths = defaultdict(list)
539
+ disparity_paths = defaultdict(list)
540
+
541
+ for cam in ["left", "right"]:
542
+ image_paths[cam] = sorted(glob(osp.join(root, self.dstype, f"*/{cam}/")))
543
+ disparity_paths[cam] = [
544
+ path.replace(self.dstype, "disparity") for path in image_paths[cam]
545
+ ]
546
+
547
+ num_seq = len(image_paths["left"])
548
+
549
+ for seq_idx in range(num_seq):
550
+ images, disparities = defaultdict(list), defaultdict(list)
551
+ for cam in ["left", "right"]:
552
+ images[cam] = sorted(glob(osp.join(image_paths[cam][seq_idx], "*.png")))
553
+ disparities[cam] = sorted(
554
+ glob(osp.join(disparity_paths[cam][seq_idx], "*.pfm"))
555
+ )
556
+
557
+ self._append_sample(images, disparities)
558
+
559
+ assert len(self.sample_list) > 0, "No samples found"
560
+ print(
561
+ f"Added {len(self.sample_list) - original_length} from Monkaa {self.dstype}"
562
+ )
563
+ logging.info(
564
+ f"Added {len(self.sample_list) - original_length} from Monkaa {self.dstype}"
565
+ )
566
+
567
+ def _add_driving(self):
568
+ """Add Driving data"""
569
+
570
+ original_length = len(self.sample_list)
571
+ root = osp.join(self.root, "Driving")
572
+ image_paths = defaultdict(list)
573
+ disparity_paths = defaultdict(list)
574
+
575
+ for cam in ["left", "right"]:
576
+ image_paths[cam] = sorted(
577
+ glob(osp.join(root, self.dstype, f"*/*/*/{cam}/"))
578
+ )
579
+ disparity_paths[cam] = [
580
+ path.replace(self.dstype, "disparity") for path in image_paths[cam]
581
+ ]
582
+
583
+ num_seq = len(image_paths["left"])
584
+ for seq_idx in range(num_seq):
585
+ images, disparities = defaultdict(list), defaultdict(list)
586
+ for cam in ["left", "right"]:
587
+ images[cam] = sorted(glob(osp.join(image_paths[cam][seq_idx], "*.png")))
588
+ disparities[cam] = sorted(
589
+ glob(osp.join(disparity_paths[cam][seq_idx], "*.pfm"))
590
+ )
591
+
592
+ self._append_sample(images, disparities)
593
+
594
+ assert len(self.sample_list) > 0, "No samples found"
595
+ print(
596
+ f"Added {len(self.sample_list) - original_length} from Driving {self.dstype}"
597
+ )
598
+ logging.info(
599
+ f"Added {len(self.sample_list) - original_length} from Driving {self.dstype}"
600
+ )
601
+
602
+ def _append_sample(self, images, disparities):
603
+ seq_len = len(images["left"])
604
+ for ref_idx in range(0, seq_len - self.sample_len):
605
+ sample = defaultdict(lambda: defaultdict(list))
606
+ for cam in ["left", "right"]:
607
+ for idx in range(ref_idx, ref_idx + self.sample_len):
608
+ sample["image"][cam].append(images[cam][idx])
609
+ sample["disparity"][cam].append(disparities[cam][idx])
610
+ self.sample_list.append(sample)
611
+
612
+ sample = defaultdict(lambda: defaultdict(list))
613
+ for cam in ["left", "right"]:
614
+ for idx in range(ref_idx, ref_idx + self.sample_len):
615
+ sample["image"][cam].append(images[cam][seq_len - idx - 1])
616
+ sample["disparity"][cam].append(disparities[cam][seq_len - idx - 1])
617
+ self.sample_list.append(sample)
618
+
619
+
620
+ class SequenceSintelStereo(StereoSequenceDataset):
621
+ def __init__(
622
+ self,
623
+ dstype="clean",
624
+ aug_params=None,
625
+ root="./datasets",
626
+ ):
627
+ super().__init__(
628
+ aug_params, sparse=True, reader=frame_utils.readDispSintelStereo
629
+ )
630
+ self.dstype = dstype
631
+ original_length = len(self.sample_list)
632
+ image_root = osp.join(root, "sintel_stereo", "training")
633
+
634
+ image_paths = defaultdict(list)
635
+ disparity_paths = defaultdict(list)
636
+
637
+ for cam in ["left", "right"]:
638
+ image_paths[cam] = sorted(
639
+ glob(osp.join(image_root, f"{self.dstype}_{cam}/*"))
640
+ )
641
+
642
+ cam = "left"
643
+ disparity_paths[cam] = [
644
+ path.replace(f"{self.dstype}_{cam}", "disparities")
645
+ for path in image_paths[cam]
646
+ ]
647
+
648
+ num_seq = len(image_paths["left"])
649
+ # for each sequence
650
+ for seq_idx in range(num_seq):
651
+ sample = defaultdict(lambda: defaultdict(list))
652
+ for cam in ["left", "right"]:
653
+ sample["image"][cam] = sorted(
654
+ glob(osp.join(image_paths[cam][seq_idx], "*.png"))
655
+ )
656
+ cam = "left"
657
+ sample["disparity"][cam] = sorted(
658
+ glob(osp.join(disparity_paths[cam][seq_idx], "*.png"))
659
+ )
660
+ for im1, disp in zip(sample["image"][cam], sample["disparity"][cam]):
661
+ assert (
662
+ im1.split("/")[-1].split(".")[0]
663
+ == disp.split("/")[-1].split(".")[0]
664
+ ), (im1.split("/")[-1].split(".")[0], disp.split("/")[-1].split(".")[0])
665
+ self.sample_list.append(sample)
666
+
667
+ logging.info(
668
+ f"Added {len(self.sample_list) - original_length} from SintelStereo {self.dstype}"
669
+ )
670
+
671
+
672
+ def fetch_dataloader(args):
673
+ """Create the data loader for the corresponding training set"""
674
+
675
+ aug_params = {
676
+ "crop_size": args.image_size,
677
+ "min_scale": args.spatial_scale[0],
678
+ "max_scale": args.spatial_scale[1],
679
+ "do_flip": False,
680
+ "yjitter": not args.noyjitter,
681
+ }
682
+ if hasattr(args, "saturation_range") and args.saturation_range is not None:
683
+ aug_params["saturation_range"] = args.saturation_range
684
+ if hasattr(args, "img_gamma") and args.img_gamma is not None:
685
+ aug_params["gamma"] = args.img_gamma
686
+ if hasattr(args, "do_flip") and args.do_flip is not None:
687
+ aug_params["do_flip"] = args.do_flip
688
+
689
+ train_dataset = None
690
+
691
+ add_monkaa = "monkaa" in args.train_datasets
692
+ add_driving = "driving" in args.train_datasets
693
+ add_things = "things" in args.train_datasets
694
+ add_dynamic_replica = "dynamic_replica" in args.train_datasets
695
+
696
+ new_dataset = None
697
+
698
+ if add_monkaa or add_driving or add_things:
699
+ clean_dataset = SequenceSceneFlowDataset(
700
+ aug_params,
701
+ dstype="frames_cleanpass",
702
+ sample_len=args.sample_len,
703
+ add_monkaa=add_monkaa,
704
+ add_driving=add_driving,
705
+ add_things=add_things,
706
+ )
707
+
708
+ final_dataset = SequenceSceneFlowDataset(
709
+ aug_params,
710
+ dstype="frames_finalpass",
711
+ sample_len=args.sample_len,
712
+ add_monkaa=add_monkaa,
713
+ add_driving=add_driving,
714
+ add_things=add_things,
715
+ )
716
+
717
+ new_dataset = clean_dataset + final_dataset
718
+
719
+ if add_dynamic_replica:
720
+ dr_dataset = DynamicReplicaDataset(
721
+ aug_params, split="train", sample_len=args.sample_len
722
+ )
723
+ if new_dataset is None:
724
+ new_dataset = dr_dataset
725
+ else:
726
+ new_dataset = new_dataset + dr_dataset
727
+
728
+ logging.info(f"Adding {len(new_dataset)} samples from SceneFlow")
729
+ train_dataset = (
730
+ new_dataset if train_dataset is None else train_dataset + new_dataset
731
+ )
732
+
733
+ train_loader = data.DataLoader(
734
+ train_dataset,
735
+ batch_size=args.batch_size,
736
+ pin_memory=True,
737
+ shuffle=True,
738
+ num_workers=args.num_workers,
739
+ drop_last=True,
740
+ )
741
+
742
+ logging.info("Training with %d image pairs" % len(train_dataset))
743
+ return train_loader
datasets/frame_utils.py ADDED
@@ -0,0 +1,118 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+
4
+ # This source code is licensed under the license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ import numpy as np
8
+ from PIL import Image
9
+ from os.path import *
10
+ import re
11
+ import imageio
12
+ import cv2
13
+
14
+ cv2.setNumThreads(0)
15
+ cv2.ocl.setUseOpenCL(False)
16
+
17
+ TAG_CHAR = np.array([202021.25], np.float32)
18
+
19
+
20
+ def readFlow(fn):
21
+ """Read .flo file in Middlebury format"""
22
+ # Code adapted from:
23
+ # http://stackoverflow.com/questions/28013200/reading-middlebury-flow-files-with-python-bytes-array-numpy
24
+
25
+ # WARNING: this will work on little-endian architectures (eg Intel x86) only!
26
+ # print 'fn = %s'%(fn)
27
+ with open(fn, "rb") as f:
28
+ magic = np.fromfile(f, np.float32, count=1)
29
+ if 202021.25 != magic:
30
+ print("Magic number incorrect. Invalid .flo file")
31
+ return None
32
+ else:
33
+ w = np.fromfile(f, np.int32, count=1)
34
+ h = np.fromfile(f, np.int32, count=1)
35
+ # print 'Reading %d x %d flo file\n' % (w, h)
36
+ data = np.fromfile(f, np.float32, count=2 * int(w) * int(h))
37
+ # Reshape data into 3D array (columns, rows, bands)
38
+ # The reshape here is for visualization, the original code is (w,h,2)
39
+ return np.resize(data, (int(h), int(w), 2))
40
+
41
+
42
+ def readPFM(file):
43
+ file = open(file, "rb")
44
+
45
+ color = None
46
+ width = None
47
+ height = None
48
+ scale = None
49
+ endian = None
50
+
51
+ header = file.readline().rstrip()
52
+ if header == b"PF":
53
+ color = True
54
+ elif header == b"Pf":
55
+ color = False
56
+ else:
57
+ raise Exception("Not a PFM file.")
58
+
59
+ dim_match = re.match(rb"^(\d+)\s(\d+)\s$", file.readline())
60
+ if dim_match:
61
+ width, height = map(int, dim_match.groups())
62
+ else:
63
+ raise Exception("Malformed PFM header.")
64
+
65
+ scale = float(file.readline().rstrip())
66
+ if scale < 0: # little-endian
67
+ endian = "<"
68
+ scale = -scale
69
+ else:
70
+ endian = ">" # big-endian
71
+
72
+ data = np.fromfile(file, endian + "f")
73
+ shape = (height, width, 3) if color else (height, width)
74
+
75
+ data = np.reshape(data, shape)
76
+ data = np.flipud(data)
77
+ return data
78
+
79
+
80
+ def readDispSintelStereo(file_name):
81
+ """Return disparity read from filename."""
82
+ f_in = np.array(Image.open(file_name))
83
+ d_r = f_in[:, :, 0].astype("float64")
84
+ d_g = f_in[:, :, 1].astype("float64")
85
+ d_b = f_in[:, :, 2].astype("float64")
86
+
87
+ disp = d_r * 4 + d_g / (2 ** 6) + d_b / (2 ** 14)
88
+ mask = np.array(Image.open(file_name.replace("disparities", "occlusions")))
89
+ valid = (mask == 0) & (disp > 0)
90
+ return disp, valid
91
+
92
+
93
+ def readDispMiddlebury(file_name):
94
+ assert basename(file_name) == "disp0GT.pfm"
95
+ disp = readPFM(file_name).astype(np.float32)
96
+ assert len(disp.shape) == 2
97
+ nocc_pix = file_name.replace("disp0GT.pfm", "mask0nocc.png")
98
+ assert exists(nocc_pix)
99
+ nocc_pix = imageio.imread(nocc_pix) == 255
100
+ assert np.any(nocc_pix)
101
+ return disp, nocc_pix
102
+
103
+
104
+ def read_gen(file_name, pil=False):
105
+ ext = splitext(file_name)[-1]
106
+ if ext == ".png" or ext == ".jpeg" or ext == ".ppm" or ext == ".jpg":
107
+ return Image.open(file_name)
108
+ elif ext == ".bin" or ext == ".raw":
109
+ return np.load(file_name)
110
+ elif ext == ".flo":
111
+ return readFlow(file_name).astype(np.float32)
112
+ elif ext == ".pfm":
113
+ flow = readPFM(file_name).astype(np.float32)
114
+ if len(flow.shape) == 2:
115
+ return flow
116
+ else:
117
+ return flow[:, :, :-1]
118
+ return []
evaluation/configs/eval_dynamic_replica_150_frames.yaml ADDED
@@ -0,0 +1,8 @@
1
+ defaults:
2
+ - default_config_eval
3
+ visualize_interval: 0
4
+ exp_dir: ./outputs/dynamic_stereo_DR
5
+ sample_len: 150
6
+ MODEL:
7
+ model_name: DynamicStereoModel
8
+
evaluation/configs/eval_dynamic_replica_40_frames.yaml ADDED
@@ -0,0 +1,8 @@
1
+ defaults:
2
+ - default_config_eval
3
+ visualize_interval: 0
4
+ exp_dir: ./outputs/dynamic_stereo_DR
5
+ sample_len: 40
6
+ MODEL:
7
+ model_name: DynamicStereoModel
8
+
evaluation/configs/eval_real_data.yaml ADDED
@@ -0,0 +1,9 @@
1
+ defaults:
2
+ - default_config_eval
3
+ visualize_interval: 1
4
+ exp_dir: ./outputs/dynamic_stereo_real
5
+ dataset_name: real
6
+ sample_len: 40
7
+ MODEL:
8
+ model_name: DynamicStereoModel
9
+
evaluation/configs/eval_sintel_clean.yaml ADDED
@@ -0,0 +1,9 @@
1
+ defaults:
2
+ - default_config_eval
3
+ visualize_interval: -1
4
+ exp_dir: ./outputs/dynamic_stereo_sintel_clean
5
+ sample_len: 30
6
+ dataset_name: sintel
7
+ dstype: clean
8
+ MODEL:
9
+ model_name: DynamicStereoModel
evaluation/configs/eval_sintel_final.yaml ADDED
@@ -0,0 +1,9 @@
1
+ defaults:
2
+ - default_config_eval
3
+ visualize_interval: -1
4
+ exp_dir: ./outputs/dynamic_stereo_sintel_final
5
+ sample_len: 30
6
+ dataset_name: sintel
7
+ dstype: final
8
+ MODEL:
9
+ model_name: DynamicStereoModel
evaluation/core/evaluator.py ADDED
@@ -0,0 +1,152 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+
4
+ # This source code is licensed under the license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ import os
8
+ from collections import defaultdict
9
+ import torch.nn.functional as F
10
+ import torch
11
+ from tqdm import tqdm
12
+ from omegaconf import DictConfig
13
+ from pytorch3d.implicitron.tools.config import Configurable
14
+
15
+ from evaluation.utils.eval_utils import depth2disparity_scale, eval_batch
16
+ from evaluation.utils.utils import (
17
+ PerceptionPrediction,
18
+ pretty_print_perception_metrics,
19
+ visualize_batch,
20
+ )
21
+
22
+
23
+ class Evaluator(Configurable):
24
+ """
25
+ A class defining the DynamicStereo evaluator.
26
+
27
+ Args:
28
+ eps: Threshold for converting disparity to depth.
29
+ """
30
+
31
+ eps = 1e-5
32
+
33
+ def setup_visualization(self, cfg: DictConfig) -> None:
34
+ # Visualization
35
+ self.visualize_interval = cfg.visualize_interval
36
+ self.exp_dir = cfg.exp_dir
37
+ if self.visualize_interval > 0:
38
+ self.visualize_dir = os.path.join(cfg.exp_dir, "visualisations")
39
+
40
+ @torch.no_grad()
41
+ def evaluate_sequence(
42
+ self,
43
+ sci_enc_L,
44
+ sci_enc_R,
45
+ model,
46
+ test_dataloader: torch.utils.data.DataLoader,
47
+ is_real_data: bool = False,
48
+ step=None,
49
+ writer=None,
50
+ train_mode=False,
51
+ interp_shape=None,
52
+ resolution=[480, 640]
53
+ ):
54
+ # -- Modified by Chu King on 20th November 2025 for SCI Stereo.
55
+ # -- model.eval()
56
+
57
+ per_batch_eval_results = []
58
+
59
+ if self.visualize_interval > 0:
60
+ os.makedirs(self.visualize_dir, exist_ok=True)
61
+
62
+ for batch_idx, sequence in enumerate(tqdm(test_dataloader)):
63
+ batch_dict = defaultdict(list)
64
+ batch_dict["stereo_video"] = sequence["img"]
65
+ if not is_real_data:
66
+ batch_dict["disparity"] = sequence["disp"][:, 0].abs()
67
+ batch_dict["disparity_mask"] = sequence["valid_disp"][:, :1] # ~ (T, 1, 720, 1280)
68
+
69
+ if "mask" in sequence:
70
+ batch_dict["fg_mask"] = sequence["mask"][:, :1]
71
+ else:
72
+ batch_dict["fg_mask"] = torch.ones_like(
73
+ batch_dict["disparity_mask"]
74
+ )
75
+ elif interp_shape is not None:
76
+ left_video = batch_dict["stereo_video"][:, 0]
77
+ left_video = F.interpolate(
78
+ left_video, tuple(interp_shape), mode="bilinear"
79
+ )
80
+ right_video = batch_dict["stereo_video"][:, 1]
81
+ right_video = F.interpolate(
82
+ right_video, tuple(interp_shape), mode="bilinear"
83
+ )
84
+ batch_dict["stereo_video"] = torch.stack([left_video, right_video], 1)
85
+
86
+ # -- This method is always invoked with train_mode=True.
87
+ if train_mode:
88
+ # -- Modified by Chu King on 20th November 2025.
89
+ # -- predictions = model.forward_batch_test(batch_dict)
90
+ predictions = model.forward_batch_test(batch_dict, sci_enc_L, sci_enc_R)
91
+ else:
92
+ predictions = model(batch_dict)
93
+
94
+ assert "disparity" in predictions
95
+ predictions["disparity"] = predictions["disparity"][:, :1].clone().cpu()
96
+
97
+ # -- print ("[INFO] predictions[\"disparity\"].shape", predictions["disparity"].shape)
98
+ # -- print ("[INFO] batch_dict[\"disparity_mask\"][..., :resolution[0], :resolution[1]].shape", batch_dict["disparity_mask"][..., :resolution[0], :resolution[1]].shape)
99
+ # -- print ("[INFO] batch_dict[\"disparity_mask\"][..., :resolution[0], :resolution[1]].round().shape", batch_dict["disparity_mask"][..., :resolution[0], :resolution[1]].round().shape)
100
+
101
+ if not is_real_data:
102
+ predictions["disparity"] = predictions["disparity"] * (
103
+ # -- Modified by Chu King on 22nd November 2025
104
+ # -- batch_dict["disparity_mask"].round()
105
+ batch_dict["disparity_mask"][..., :resolution[0], :resolution[1]].round()
106
+ )
107
+
108
+ batch_eval_result, seq_length = eval_batch(batch_dict, predictions)
109
+
110
+ per_batch_eval_results.append((batch_eval_result, seq_length))
111
+ pretty_print_perception_metrics(batch_eval_result)
112
+
113
+ if (self.visualize_interval > 0) and (
114
+ batch_idx % self.visualize_interval == 0
115
+ ):
116
+ perception_prediction = PerceptionPrediction()
117
+
118
+ pred_disp = predictions["disparity"]
119
+ pred_disp[pred_disp < self.eps] = self.eps
120
+
121
+ scale = depth2disparity_scale(
122
+ sequence["viewpoint"][0][0],
123
+ sequence["viewpoint"][0][1],
124
+ torch.tensor([pred_disp.shape[2], pred_disp.shape[3]])[None],
125
+ )
126
+
127
+ perception_prediction.depth_map = (scale / pred_disp).cuda()
128
+ perspective_cameras = []
129
+ for cam in sequence["viewpoint"]:
130
+ perspective_cameras.append(cam[0])
131
+
132
+ perception_prediction.perspective_cameras = perspective_cameras
133
+
134
+ # -- Modified by Chu King on 22nd November 2025 to fix image resolution during training.
135
+ if "stereo_original_video" in batch_dict:
136
+ batch_dict["stereo_video"] = batch_dict["stereo_original_video"][..., :resolution[0], :resolution[1]].clone()
137
+
138
+ for k, v in batch_dict.items():
139
+ if isinstance(v, torch.Tensor):
140
+ batch_dict[k] = v.cuda()
141
+
142
+ visualize_batch(
143
+ batch_dict,
144
+ perception_prediction,
145
+ output_dir=self.visualize_dir,
146
+ sequence_name=sequence["metadata"][0][0][0],
147
+ step=step,
148
+ writer=writer,
149
+ # -- Added by Chu King on 22nd November 2025 to fix image resolution during evaluation.
150
+ resolution=resolution
151
+ )
152
+ return per_batch_eval_results
evaluation/evaluate.py ADDED
@@ -0,0 +1,143 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+
4
+ # This source code is licensed under the license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ import json
8
+ import os
9
+ from dataclasses import dataclass, field
10
+ from typing import Any, Dict, Optional
11
+
12
+ import hydra
13
+ import numpy as np
14
+
15
+ import torch
16
+ from omegaconf import OmegaConf
17
+
18
+ from dynamic_stereo.evaluation.utils.utils import aggregate_and_print_results
19
+
20
+ import dynamic_stereo.datasets.dynamic_stereo_datasets as datasets
21
+
22
+ from dynamic_stereo.models.core.model_zoo import (
23
+ get_all_model_default_configs,
24
+ model_zoo,
25
+ )
26
+ from pytorch3d.implicitron.tools.config import get_default_args_field
27
+ from dynamic_stereo.evaluation.core.evaluator import Evaluator
28
+
29
+
30
+ @dataclass(eq=False)
31
+ class DefaultConfig:
32
+ exp_dir: str = "./outputs"
33
+
34
+ # one of [sintel, dynamicreplica, things]
35
+ dataset_name: str = "dynamicreplica"
36
+
37
+ sample_len: int = -1
38
+ dstype: Optional[str] = None
39
+ # clean, final
40
+ MODEL: Dict[str, Any] = field(
41
+ default_factory=lambda: get_all_model_default_configs()
42
+ )
43
+ EVALUATOR: Dict[str, Any] = get_default_args_field(Evaluator)
44
+
45
+ seed: int = 42
46
+ gpu_idx: int = 0
47
+
48
+ visualize_interval: int = 0 # Use 0 for no visualization
49
+
50
+ # Override hydra's working directory to current working dir,
51
+ # also disable storing the .hydra logs:
52
+ hydra: dict = field(
53
+ default_factory=lambda: {
54
+ "run": {"dir": "."},
55
+ "output_subdir": None,
56
+ }
57
+ )
58
+
59
+
60
+ def run_eval(cfg: DefaultConfig):
61
+ """
62
+ Evaluates disparity estimation metrics of a specified model
63
+ on a benchmark dataset.
64
+ """
65
+ # make the experiment directory
66
+ os.makedirs(cfg.exp_dir, exist_ok=True)
67
+
68
+ # dump the experiment config to the exp_dir
69
+ cfg_file = os.path.join(cfg.exp_dir, "expconfig.yaml")
70
+ with open(cfg_file, "w") as f:
71
+ OmegaConf.save(config=cfg, f=f)
72
+
73
+ torch.manual_seed(cfg.seed)
74
+ np.random.seed(cfg.seed)
75
+ evaluator = Evaluator(**cfg.EVALUATOR)
76
+
77
+ model = model_zoo(**cfg.MODEL)
78
+ model.cuda(0)
79
+ evaluator.setup_visualization(cfg)
80
+
81
+ if cfg.dataset_name == "dynamicreplica":
82
+ test_dataloader = datasets.DynamicReplicaDataset(
83
+ split="valid", sample_len=cfg.sample_len, only_first_n_samples=1
84
+ )
85
+ elif cfg.dataset_name == "sintel":
86
+ test_dataloader = datasets.SequenceSintelStereo(dstype=cfg.dstype)
87
+ elif cfg.dataset_name == "things":
88
+ test_dataloader = datasets.SequenceSceneFlowDataset(
89
+ {},
90
+ dstype=cfg.dstype,
91
+ sample_len=cfg.sample_len,
92
+ add_monkaa=False,
93
+ add_driving=False,
94
+ things_test=True,
95
+ )
96
+ elif cfg.dataset_name == "real":
97
+ for real_sequence_name in ["teddy_static", "ignacio_waving", "nikita_reading"]:
98
+ ds_path = f"./dynamic_replica_data/real/{real_sequence_name}"
99
+ # seq_len_real = 20
100
+ real_dataset = datasets.DynamicReplicaDataset(
101
+ split="test",
102
+ sample_len=cfg.sample_len,
103
+ root=ds_path,
104
+ only_first_n_samples=1,
105
+ )
106
+
107
+ evaluator.evaluate_sequence(
108
+ model=model,
109
+ test_dataloader=real_dataset,
110
+ is_real_data=True,
111
+ train_mode=False,
112
+ )
113
+ return
114
+
115
+ print()
116
+
117
+ evaluate_result = evaluator.evaluate_sequence(
118
+ model,
119
+ test_dataloader,
120
+ )
121
+
122
+ aggregate_result = aggregate_and_print_results(evaluate_result)
123
+
124
+ result_file = os.path.join(cfg.exp_dir, "result_eval.json")
125
+
126
+ print(f"Dumping eval results to {result_file}.")
127
+ with open(result_file, "w") as f:
128
+ json.dump(aggregate_result, f)
129
+
130
+
131
+ cs = hydra.core.config_store.ConfigStore.instance()
132
+ cs.store(name="default_config_eval", node=DefaultConfig)
133
+
134
+
135
+ @hydra.main(config_path="./configs/", config_name="default_config_eval")
136
+ def evaluate(cfg: DefaultConfig) -> None:
137
+ os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
138
+ os.environ["CUDA_VISIBLE_DEVICES"] = str(cfg.gpu_idx)
139
+ run_eval(cfg)
140
+
141
+
142
+ if __name__ == "__main__":
143
+ evaluate()
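For programmatic runs, the same entry point can be driven through hydra's compose API instead of the `@hydra.main` decorator. A hedged sketch: the override values are assumptions, and it presumes `./configs/default_config_eval.yaml` fills in the remaining model settings.

```python
from hydra import compose, initialize

# Build the same config the CLI would compose, then call run_eval directly.
with initialize(config_path="./configs/"):
    cfg = compose(
        config_name="default_config_eval",
        overrides=["dataset_name=sintel", "dstype=clean", "exp_dir=./outputs/sintel_eval"],
    )
run_eval(cfg)
```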
evaluation/utils/eval_utils.py ADDED
@@ -0,0 +1,213 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+
4
+ # This source code is licensed under the license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ from dataclasses import dataclass
8
+ from typing import Dict, Optional, Union
9
+
10
+ import torch
11
+ from pytorch3d.utils import opencv_from_cameras_projection
12
+
13
+
14
+ @dataclass(eq=True, frozen=True)
15
+ class PerceptionMetric:
16
+ metric: str
17
+ depth_scaling_norm: Optional[str] = None
18
+ suffix: str = ""
19
+ index: str = ""
20
+
21
+ def __str__(self):
22
+ return (
23
+ self.metric
24
+ + self.index
25
+ + (
26
+ ("_norm_" + self.depth_scaling_norm)
27
+ if self.depth_scaling_norm is not None
28
+ else ""
29
+ )
30
+ + self.suffix
31
+ )
32
+
33
+
34
+ def eval_endpoint_error_sequence(
35
+ x: torch.Tensor,
36
+ y: torch.Tensor,
37
+ mask: torch.Tensor,
38
+ crop: int = 0,
39
+ mask_thr: float = 0.5,
40
+ clamp_thr: float = 1e-5,
41
+ ) -> Dict[str, torch.Tensor]:
42
+
43
+ assert len(x.shape) == len(y.shape) == len(mask.shape) == 4, (
44
+ x.shape,
45
+ y.shape,
46
+ mask.shape,
47
+ )
48
+ assert x.shape[0] == y.shape[0] == mask.shape[0], (x.shape, y.shape, mask.shape)
49
+
50
+ # chuck out the border
51
+ if crop > 0:
52
+ if crop > min(y.shape[2:]) - crop:
53
+ raise ValueError("Incorrect crop size.")
54
+ y = y[:, :, crop:-crop, crop:-crop]
55
+ x = x[:, :, crop:-crop, crop:-crop]
56
+ mask = mask[:, :, crop:-crop, crop:-crop]
57
+
58
+ y = y * (mask > mask_thr).float()
59
+ x = x * (mask > mask_thr).float()
60
+ y[torch.isnan(y)] = 0
61
+
62
+ results = {}
63
+ for epe_name in ("epe", "temp_epe"):
64
+ if epe_name == "epe":
65
+ endpoint_error = (mask * (x - y) ** 2).sum(dim=1).sqrt()
66
+ elif epe_name == "temp_epe":
67
+ delta_mask = mask[:-1] * mask[1:]
68
+ endpoint_error = (
69
+ (delta_mask * ((x[:-1] - x[1:]) - (y[:-1] - y[1:])) ** 2)
70
+ .sum(dim=1)
71
+ .sqrt()
72
+ )
73
+
74
+ # epe_nonzero = endpoint_error != 0
75
+ nonzero = torch.count_nonzero(endpoint_error)
76
+
77
+ epe_mean = endpoint_error.sum() / torch.clamp(
78
+ nonzero, clamp_thr
79
+ ) # average error for all the sequence pixels
80
+ epe_inv_accuracy_05px = (endpoint_error > 0.5).sum() / torch.clamp(
81
+ nonzero, clamp_thr
82
+ )
83
+ epe_inv_accuracy_1px = (endpoint_error > 1).sum() / torch.clamp(
84
+ nonzero, clamp_thr
85
+ )
86
+ epe_inv_accuracy_2px = (endpoint_error > 2).sum() / torch.clamp(
87
+ nonzero, clamp_thr
88
+ )
89
+ epe_inv_accuracy_3px = (endpoint_error > 3).sum() / torch.clamp(
90
+ nonzero, clamp_thr
91
+ )
92
+
93
+ results[f"{epe_name}_mean"] = epe_mean[None]
94
+ results[f"{epe_name}_bad_0.5px"] = epe_inv_accuracy_05px[None] * 100
95
+ results[f"{epe_name}_bad_1px"] = epe_inv_accuracy_1px[None] * 100
96
+ results[f"{epe_name}_bad_2px"] = epe_inv_accuracy_2px[None] * 100
97
+ results[f"{epe_name}_bad_3px"] = epe_inv_accuracy_3px[None] * 100
98
+ return results
99
+
100
+
101
+ def depth2disparity_scale(left_camera, right_camera, image_size_tensor):
102
+ # opencv camera matrices
103
+ (_, T1, K1), (_, T2, _) = [
104
+ opencv_from_cameras_projection(
105
+ f,
106
+ image_size_tensor,
107
+ )
108
+ for f in (left_camera, right_camera)
109
+ ]
110
+ fix_baseline = T1[0][0] - T2[0][0]
111
+ focal_length_px = K1[0][0][0]
112
+ # following this https://github.com/princeton-vl/RAFT-Stereo#converting-disparity-to-depth
113
+ return focal_length_px * fix_baseline
114
+
115
+
116
+ def depth_to_pcd(
117
+ depth_map,
118
+ img,
119
+ focal_length,
120
+ cx,
121
+ cy,
122
+ step: Optional[int] = None,
123
+ inv_extrinsic=None,
124
+ mask=None,
125
+ filter=False,
126
+ ):
127
+ __, w, __ = img.shape
128
+ if step is None:
129
+ step = int(w / 100)
130
+ Z = depth_map[::step, ::step]
131
+ colors = img[::step, ::step, :]
132
+
133
+ Pixels_Y = torch.arange(Z.shape[0]).to(Z.device) * step
134
+ Pixels_X = torch.arange(Z.shape[1]).to(Z.device) * step
135
+
136
+ X = (Pixels_X[None, :] - cx) * Z / focal_length
137
+ Y = (Pixels_Y[:, None] - cy) * Z / focal_length
138
+
139
+ inds = Z > 0
140
+
141
+ if mask is not None:
142
+ inds = inds * (mask[::step, ::step] > 0)
143
+
144
+ X = X[inds].reshape(-1)
145
+ Y = Y[inds].reshape(-1)
146
+ Z = Z[inds].reshape(-1)
147
+ colors = colors[inds]
148
+ pcd = torch.stack([X, Y, Z]).T
149
+
150
+ if inv_extrinsic is not None:
151
+ pcd_ext = torch.vstack([pcd.T, torch.ones((1, pcd.shape[0])).to(Z.device)])
152
+ pcd = (inv_extrinsic @ pcd_ext)[:3, :].T
153
+
154
+ if filter:
155
+ pcd, filt_inds = filter_outliers(pcd)
156
+ colors = colors[filt_inds]
157
+ return pcd, colors
158
+
159
+
160
+ def filter_outliers(pcd, sigma=3):
161
+ mean = pcd.mean(0)
162
+ std = pcd.std(0)
163
+ inds = ((pcd - mean).abs() < sigma * std)[:, 2]
164
+ pcd = pcd[inds]
165
+ return pcd, inds
166
+
167
+ # -- Modified by Chu King on 22nd November 2025 to fix the resolution during evaluation.
168
+ def eval_batch(batch_dict, predictions, resolution=[480, 640]) -> Dict[str, Union[float, torch.Tensor]]:
169
+ """
170
+ Produce performance metrics for a single batch of perception
171
+ predictions.
172
+ Args:
173
+ batch_dict: A dictionary holding the ground-truth disparity, masks and stereo video.
174
+ predictions: A dictionary holding the predicted disparity sequence.
175
+ resolution: [H, W] crop applied to the ground truth before comparison.
176
+ Returns:
177
+ results: A dictionary holding evaluation metrics.
178
+ """
179
+ results = {}
180
+
181
+ assert "disparity" in predictions
182
+ mask_now = torch.ones_like(batch_dict["fg_mask"][..., :resolution[0], :resolution[1]])
183
+
184
+ mask_now = mask_now * batch_dict["disparity_mask"][..., :resolution[0], :resolution[1]]
185
+
186
+ eval_flow_traj_output = eval_endpoint_error_sequence(
187
+ predictions["disparity"], batch_dict["disparity"][..., :resolution[0], :resolution[1]], mask_now
188
+ )
189
+ for epe_name in ("epe", "temp_epe"):
190
+ results[PerceptionMetric(f"disp_{epe_name}_mean")] = eval_flow_traj_output[
191
+ f"{epe_name}_mean"
192
+ ]
193
+
194
+ results[PerceptionMetric(f"disp_{epe_name}_bad_3px")] = eval_flow_traj_output[
195
+ f"{epe_name}_bad_3px"
196
+ ]
197
+
198
+ results[PerceptionMetric(f"disp_{epe_name}_bad_2px")] = eval_flow_traj_output[
199
+ f"{epe_name}_bad_2px"
200
+ ]
201
+
202
+ results[PerceptionMetric(f"disp_{epe_name}_bad_1px")] = eval_flow_traj_output[
203
+ f"{epe_name}_bad_1px"
204
+ ]
205
+
206
+ results[PerceptionMetric(f"disp_{epe_name}_bad_0.5px")] = eval_flow_traj_output[
207
+ f"{epe_name}_bad_0.5px"
208
+ ]
209
+ if "endpoint_error_per_pixel" in eval_flow_traj_output:
210
+ results["disp_endpoint_error_per_pixel"] = eval_flow_traj_output[
211
+ "endpoint_error_per_pixel"
212
+ ]
213
+ return (results, len(predictions["disparity"]))
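A quick way to sanity-check `eval_endpoint_error_sequence` is to feed it random tensors in the `[T, C, H, W]` layout it asserts on. This is only a smoke-test sketch; all shapes are assumptions.

```python
import torch

pred = torch.rand(5, 1, 64, 64)   # predicted disparity sequence [T, C, H, W]
gt = torch.rand(5, 1, 64, 64)     # ground-truth disparity sequence
mask = torch.ones(5, 1, 64, 64)   # valid-pixel mask

metrics = eval_endpoint_error_sequence(pred, gt, mask)
print({k: v.item() for k, v in metrics.items()})  # epe_mean, epe_bad_1px, temp_epe_mean, ...
```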
evaluation/utils/utils.py ADDED
@@ -0,0 +1,351 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+
4
+ # This source code is licensed under the license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ from collections import defaultdict
8
+ import configparser
9
+ import os
10
+ import math
11
+ from typing import Optional, List
12
+ import torch
13
+ import cv2
14
+ import numpy as np
15
+ from dataclasses import dataclass
16
+ from tabulate import tabulate
17
+
18
+
19
+ from pytorch3d.structures import Pointclouds
20
+ from pytorch3d.transforms import RotateAxisAngle
21
+ from pytorch3d.utils import (
22
+ opencv_from_cameras_projection,
23
+ )
24
+ from pytorch3d.renderer import (
25
+ AlphaCompositor,
26
+ PointsRasterizationSettings,
27
+ PointsRasterizer,
28
+ PointsRenderer,
29
+ )
30
+ from evaluation.utils.eval_utils import depth_to_pcd
31
+
32
+
33
+ @dataclass
34
+ class PerceptionPrediction:
35
+ """
36
+ Holds the tensors that describe a result of any perception module.
37
+ """
38
+
39
+ depth_map: Optional[torch.Tensor] = None
40
+ disparity: Optional[torch.Tensor] = None
41
+ image_rgb: Optional[torch.Tensor] = None
42
+ fg_probability: Optional[torch.Tensor] = None
43
+
44
+
45
+ def aggregate_eval_results(per_batch_eval_results, reduction="mean"):
46
+
47
+ total_length = 0
48
+ aggregate_results = defaultdict(list)
49
+ for result in per_batch_eval_results:
50
+ if isinstance(result, tuple):
51
+ reduction = "sum"
52
+ length = result[1]
53
+ total_length += length
54
+ result = result[0]
55
+ for metric, val in result.items():
56
+ if reduction == "sum":
57
+ val = val * length
+ aggregate_results[metric].append(val)
58
+
59
+ if reduction == "mean":
60
+ return {k: torch.cat(v).mean().item() for k, v in aggregate_results.items()}
61
+ elif reduction == "sum":
62
+ return {
63
+ k: torch.cat(v).sum().item() / float(total_length)
64
+ for k, v in aggregate_results.items()
65
+ }
66
+
67
+
68
+ def aggregate_and_print_results(
69
+ per_batch_eval_results: List[dict],
70
+ ):
71
+ print("")
72
+ result = aggregate_eval_results(
73
+ per_batch_eval_results,
74
+ )
75
+ pretty_print_perception_metrics(result)
76
+ result = {str(k): v for k, v in result.items()}
77
+
78
+ print("")
79
+ return result
80
+
81
+
82
+ def pretty_print_perception_metrics(results):
83
+
84
+ metrics = sorted(list(results.keys()), key=lambda x: x.metric)
85
+
86
+ print("===== Perception results =====")
87
+ print(
88
+ tabulate(
89
+ [[metric, results[metric]] for metric in metrics],
90
+ )
91
+ )
92
+
93
+
94
+ def read_calibration(calibration_file, resolution_string):
95
+ # ported from https://github.com/stereolabs/zed-open-capture/
96
+ # blob/dfa0aee51ccd2297782230a05ca59e697df496b2/examples/include/calibration.hpp#L4172
97
+
98
+ zed_resolutions = {
99
+ "2K": (1242, 2208),
100
+ "FHD": (1080, 1920),
101
+ "HD": (720, 1280),
102
+ # "qHD": (540, 960),
103
+ "VGA": (376, 672),
104
+ }
105
+ assert resolution_string in zed_resolutions.keys()
106
+ image_height, image_width = zed_resolutions[resolution_string]
107
+
108
+ # Open camera configuration file
109
+ assert os.path.isfile(calibration_file)
110
+ calib = configparser.ConfigParser()
111
+ calib.read(calibration_file)
112
+
113
+ # Get translations
114
+ T = np.zeros((3, 1))
115
+ T[0] = float(calib["STEREO"]["baseline"])
116
+ T[1] = float(calib["STEREO"]["ty"])
117
+ T[2] = float(calib["STEREO"]["tz"])
118
+
119
+ baseline = T[0]
120
+
121
+ # Get left parameters
122
+ left_cam_cx = float(calib[f"LEFT_CAM_{resolution_string}"]["cx"])
123
+ left_cam_cy = float(calib[f"LEFT_CAM_{resolution_string}"]["cy"])
124
+ left_cam_fx = float(calib[f"LEFT_CAM_{resolution_string}"]["fx"])
125
+ left_cam_fy = float(calib[f"LEFT_CAM_{resolution_string}"]["fy"])
126
+ left_cam_k1 = float(calib[f"LEFT_CAM_{resolution_string}"]["k1"])
127
+ left_cam_k2 = float(calib[f"LEFT_CAM_{resolution_string}"]["k2"])
128
+ left_cam_p1 = float(calib[f"LEFT_CAM_{resolution_string}"]["p1"])
129
+ left_cam_p2 = float(calib[f"LEFT_CAM_{resolution_string}"]["p2"])
130
+ left_cam_k3 = float(calib[f"LEFT_CAM_{resolution_string}"]["k3"])
131
+
132
+ # Get right parameters
133
+ right_cam_cx = float(calib[f"RIGHT_CAM_{resolution_string}"]["cx"])
134
+ right_cam_cy = float(calib[f"RIGHT_CAM_{resolution_string}"]["cy"])
135
+ right_cam_fx = float(calib[f"RIGHT_CAM_{resolution_string}"]["fx"])
136
+ right_cam_fy = float(calib[f"RIGHT_CAM_{resolution_string}"]["fy"])
137
+ right_cam_k1 = float(calib[f"RIGHT_CAM_{resolution_string}"]["k1"])
138
+ right_cam_k2 = float(calib[f"RIGHT_CAM_{resolution_string}"]["k2"])
139
+ right_cam_p1 = float(calib[f"RIGHT_CAM_{resolution_string}"]["p1"])
140
+ right_cam_p2 = float(calib[f"RIGHT_CAM_{resolution_string}"]["p2"])
141
+ right_cam_k3 = float(calib[f"RIGHT_CAM_{resolution_string}"]["k3"])
142
+
143
+ # Get rotations
144
+ R_zed = np.zeros(3)
145
+ R_zed[0] = float(calib["STEREO"][f"rx_{resolution_string.lower()}"])
146
+ R_zed[1] = float(calib["STEREO"][f"cv_{resolution_string.lower()}"])
147
+ R_zed[2] = float(calib["STEREO"][f"rz_{resolution_string.lower()}"])
148
+
149
+ R = cv2.Rodrigues(R_zed)[0]
150
+
151
+ # Left
152
+ cameraMatrix_left = np.array(
153
+ [[left_cam_fx, 0, left_cam_cx], [0, left_cam_fy, left_cam_cy], [0, 0, 1]]
154
+ )
155
+ distCoeffs_left = np.array(
156
+ [left_cam_k1, left_cam_k2, left_cam_p1, left_cam_p2, left_cam_k3]
157
+ )
158
+
159
+ # Right
160
+ cameraMatrix_right = np.array(
161
+ [
162
+ [right_cam_fx, 0, right_cam_cx],
163
+ [0, right_cam_fy, right_cam_cy],
164
+ [0, 0, 1],
165
+ ]
166
+ )
167
+ distCoeffs_right = np.array(
168
+ [right_cam_k1, right_cam_k2, right_cam_p1, right_cam_p2, right_cam_k3]
169
+ )
170
+
171
+ # Stereo
172
+ R1, R2, P1, P2, Q = cv2.stereoRectify(
173
+ cameraMatrix1=cameraMatrix_left,
174
+ distCoeffs1=distCoeffs_left,
175
+ cameraMatrix2=cameraMatrix_right,
176
+ distCoeffs2=distCoeffs_right,
177
+ imageSize=(image_width, image_height),
178
+ R=R,
179
+ T=T,
180
+ flags=cv2.CALIB_ZERO_DISPARITY,
181
+ newImageSize=(image_width, image_height),
182
+ alpha=0,
183
+ )[:5]
184
+
185
+ # Precompute maps for cv::remap()
186
+ map_left_x, map_left_y = cv2.initUndistortRectifyMap(
187
+ cameraMatrix_left,
188
+ distCoeffs_left,
189
+ R1,
190
+ P1,
191
+ (image_width, image_height),
192
+ cv2.CV_32FC1,
193
+ )
194
+ map_right_x, map_right_y = cv2.initUndistortRectifyMap(
195
+ cameraMatrix_right,
196
+ distCoeffs_right,
197
+ R2,
198
+ P2,
199
+ (image_width, image_height),
200
+ cv2.CV_32FC1,
201
+ )
202
+
203
+ zed_calib = {
204
+ "map_left_x": map_left_x,
205
+ "map_left_y": map_left_y,
206
+ "map_right_x": map_right_x,
207
+ "map_right_y": map_right_y,
208
+ "pose_left": P1,
209
+ "pose_right": P2,
210
+ "baseline": baseline,
211
+ "image_width": image_width,
212
+ "image_height": image_height,
213
+ }
214
+
215
+ return zed_calib
216
+
217
+
218
+ def visualize_batch(
219
+ batch_dict: dict,
220
+ preds: PerceptionPrediction,
221
+ output_dir: str,
222
+ ref_frame: int = 0,
223
+ only_foreground=False,
224
+ step=0,
225
+ sequence_name=None,
226
+ writer=None,
227
+ # -- Added by Chu King on 22nd November 2025 to fix image resolution during evaluation.
228
+ resolution=[480, 640]
229
+ ):
230
+ os.makedirs(output_dir, exist_ok=True)
231
+
232
+ outputs = {}
233
+
234
+ if preds.depth_map is not None:
235
+ device = preds.depth_map.device
236
+
237
+ pcd_global_seq = []
238
+ # -- H, W = batch_dict["stereo_video"].shape[3:]
239
+ H, W = resolution
240
+
241
+ for i in range(len(batch_dict["stereo_video"])):
242
+ R, T, K = opencv_from_cameras_projection(
243
+ preds.perspective_cameras[i],
244
+ torch.tensor([H, W])[None].to(device),
245
+ )
246
+
247
+ extrinsic_3x4_0 = torch.cat([R[0], T[0, :, None]], dim=1)
248
+
249
+ extr_matrix = torch.cat(
250
+ [
251
+ extrinsic_3x4_0,
252
+ torch.Tensor([[0, 0, 0, 1]]).to(extrinsic_3x4_0.device),
253
+ ],
254
+ dim=0,
255
+ )
256
+
257
+ inv_extr_matrix = extr_matrix.inverse().to(device)
258
+ pcd, colors = depth_to_pcd(
259
+ preds.depth_map[i, 0],
260
+ batch_dict["stereo_video"][..., :resolution[0], : resolution[1]][i][0].permute(1, 2, 0),
261
+ K[0][0][0],
262
+ K[0][0][2],
263
+ K[0][1][2],
264
+ step=1,
265
+ inv_extrinsic=inv_extr_matrix,
266
+ mask=batch_dict["fg_mask"][..., :resolution[0], : resolution[1]][i, 0] if only_foreground else None,
267
+ filter=False,
268
+ )
269
+
270
+ R, T = inv_extr_matrix[None, :3, :3], inv_extr_matrix[None, :3, 3]
271
+ pcd_global_seq.append((pcd, colors, (R, T, preds.perspective_cameras[i])))
272
+
273
+ raster_settings = PointsRasterizationSettings(
274
+ image_size=[H, W], radius=0.003, points_per_pixel=10
275
+ )
276
+ R, T, cam_ = pcd_global_seq[ref_frame][2]
277
+
278
+ median_depth = preds.depth_map.median()
279
+ cam_.cuda()
280
+
281
+ for mode in ["angle_15", "angle_-15", "changing_angle"]:
282
+ res = []
283
+
284
+ for t, (pcd, color, __) in enumerate(pcd_global_seq):
285
+
286
+ if mode == "changing_angle":
287
+ angle = math.cos((math.pi) * (t / 15)) * 15
288
+ elif mode == "angle_15":
289
+ angle = 15
290
+ elif mode == "angle_-15":
291
+ angle = -15
292
+
293
+ delta_x = median_depth * math.sin(math.radians(angle))
294
+ delta_z = median_depth * (1 - math.cos(math.radians(angle)))
295
+
296
+ cam = cam_.clone()
297
+ cam.R = torch.bmm(
298
+ cam.R,
299
+ RotateAxisAngle(angle=angle, axis="Y", device=device).get_matrix()[
300
+ :, :3, :3
301
+ ],
302
+ )
303
+ cam.T[0, 0] = cam.T[0, 0] - delta_x
304
+ cam.T[0, 2] = cam.T[0, 2] - delta_z + median_depth / 2.0
305
+
306
+ rasterizer = PointsRasterizer(
307
+ cameras=cam, raster_settings=raster_settings
308
+ )
309
+ renderer = PointsRenderer(
310
+ rasterizer=rasterizer,
311
+ compositor=AlphaCompositor(background_color=(1, 1, 1)),
312
+ )
313
+ pcd_copy = pcd.clone()
314
+
315
+ point_cloud = Pointclouds(points=[pcd_copy], features=[color / 255.0])
316
+ images = renderer(point_cloud)
317
+ res.append(images[0, ..., :3].cpu())
318
+ res = torch.stack(res)
319
+
320
+ video = (res * 255).numpy().astype(np.uint8)
321
+ save_name = f"{sequence_name}_reconstruction_{step}_mode_{mode}_"
322
+ if writer is None:
323
+ outputs[mode] = video
324
+ if only_foreground:
325
+ save_name += "fg_only"
326
+ else:
327
+ save_name += "full_scene"
328
+ video_out = cv2.VideoWriter(
329
+ os.path.join(
330
+ output_dir,
331
+ f"{save_name}.mp4",
332
+ ),
333
+ cv2.VideoWriter_fourcc(*"mp4v"),
334
+ fps=10,
335
+ frameSize=(res.shape[2], res.shape[1]),
336
+ isColor=True,
337
+ )
338
+
339
+ for i in range(len(video)):
340
+ video_out.write(cv2.cvtColor(video[i], cv2.COLOR_BGR2RGB))
341
+ video_out.release()
342
+
343
+ if writer is not None:
344
+ writer.add_video(
345
+ f"{sequence_name}_reconstruction_mode_{mode}",
346
+ (res * 255).permute(0, 3, 1, 2).to(torch.uint8)[None],
347
+ global_step=step,
348
+ fps=8,
349
+ )
350
+
351
+ return outputs
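`read_calibration` expects a ZED factory calibration file; the sketch below is only illustrative and the file path is a placeholder, not something shipped with this repository.

```python
# Hypothetical call; ./SN12345.conf must be a real ZED calibration file on disk.
calib = read_calibration("./SN12345.conf", "HD")
print(calib["image_width"], calib["image_height"])  # 1280 720
print(calib["baseline"])                            # stereo baseline from the [STEREO] section
```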
links_lite.json ADDED
@@ -0,0 +1,15 @@
1
+ {
2
+ "real": [
3
+ "https://dl.fbaipublicfiles.com/dynamic_replica_v2/real/real_000.zip"
4
+ ],
5
+ "test": [
6
+ "https://dl.fbaipublicfiles.com/dynamic_replica_v2/test/test_000.zip"
7
+ ],
8
+ "valid": [
9
+ "https://dl.fbaipublicfiles.com/dynamic_replica_v2/valid/valid_000.zip",
10
+ "https://dl.fbaipublicfiles.com/dynamic_replica_v2/valid/valid_001.zip"
11
+ ],
12
+ "train": [
13
+ "https://dl.fbaipublicfiles.com/dynamic_replica_v2/train/train_000.zip"
14
+ ]
15
+ }
models/core/attention.py ADDED
@@ -0,0 +1,240 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+
4
+ # This source code is licensed under the license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ import math
8
+ import copy
9
+ import torch
10
+ import torch.nn as nn
11
+ from torch.nn import Module, Dropout
12
+
13
+ """
14
+ Linear Transformer proposed in "Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention"
15
+ Modified from: https://github.com/idiap/fast-transformers/blob/master/fast_transformers/attention/linear_attention.py
16
+ """
17
+
18
+
19
+ def elu_feature_map(x):
20
+ return torch.nn.functional.elu(x) + 1
21
+
22
+
23
+ class PositionEncodingSine(nn.Module):
24
+ """
25
+ This is a sinusoidal position encoding that generalizes to 2-dimensional images
26
+ """
27
+
28
+ def __init__(self, d_model, max_shape=(256, 256), temp_bug_fix=True):
29
+ """
30
+ Args:
31
+ max_shape (tuple): for 1/8 featmap, the max length of 256 corresponds to 2048 pixels
32
+ temp_bug_fix (bool): As noted in this [issue](https://github.com/zju3dv/LoFTR/issues/41),
33
+ the original implementation of LoFTR includes a bug in the pos-enc impl, which has little impact
34
+ on the final performance. For now, we keep both impls for backward compatability.
35
+ We will remove the buggy impl after re-training all variants of our released models.
36
+ """
37
+ super().__init__()
38
+
39
+ # -- d_model: embedding dimension
40
+ pe = torch.zeros((d_model, *max_shape))
41
+ y_position = torch.ones(max_shape).cumsum(0).float().unsqueeze(0)
42
+ x_position = torch.ones(max_shape).cumsum(1).float().unsqueeze(0)
43
+ if temp_bug_fix:
44
+ div_term = torch.exp(
45
+ torch.arange(0, d_model // 2, 2).float()
46
+ * (-math.log(10000.0) / (d_model // 2))
47
+ )
48
+ else: # a buggy implementation (for backward compatibility only)
49
+ div_term = torch.exp(
50
+ torch.arange(0, d_model // 2, 2).float()
51
+ * (-math.log(10000.0) / d_model // 2)
52
+ )
53
+ div_term = div_term[:, None, None] # [C//4, 1, 1]
54
+ pe[0::4, :, :] = torch.sin(x_position * div_term)
55
+ pe[1::4, :, :] = torch.cos(x_position * div_term)
56
+ pe[2::4, :, :] = torch.sin(y_position * div_term)
57
+ pe[3::4, :, :] = torch.cos(y_position * div_term)
58
+
59
+ self.register_buffer("pe", pe.unsqueeze(0), persistent=False) # [1, C, H, W]
60
+
61
+ def forward(self, x):
62
+ """
63
+ Args:
64
+ x: [N, C, H, W]
65
+ """
66
+ return x + self.pe[:, :, : x.size(2), : x.size(3)].to(x.device)
67
+
68
+
69
+ class LinearAttention(Module):
70
+ def __init__(self, eps=1e-6):
71
+ super().__init__()
72
+ self.feature_map = elu_feature_map
73
+ self.eps = eps
74
+
75
+ def forward(self, queries, keys, values, q_mask=None, kv_mask=None):
76
+ """Multi-Head linear attention proposed in "Transformers are RNNs"
77
+ Args:
78
+ queries: [N, L, H, D]
79
+ keys: [N, S, H, D]
80
+ values: [N, S, H, D]
81
+ q_mask: [N, L]
82
+ kv_mask: [N, S]
83
+ Returns:
84
+ queried_values: (N, L, H, D)
85
+ """
86
+ Q = self.feature_map(queries)
87
+ K = self.feature_map(keys)
88
+
89
+ # set padded position to zero
90
+ if q_mask is not None:
91
+ Q = Q * q_mask[:, :, None, None]
92
+ if kv_mask is not None:
93
+ K = K * kv_mask[:, :, None, None]
94
+ values = values * kv_mask[:, :, None, None]
95
+
96
+ v_length = values.size(1)
97
+ values = values / v_length # prevent fp16 overflow
98
+ KV = torch.einsum("nshd,nshv->nhdv", K, values) # (S,D)' @ S,V
99
+ Z = 1 / (torch.einsum("nlhd,nhd->nlh", Q, K.sum(dim=1)) + self.eps)
100
+ queried_values = torch.einsum("nlhd,nhdv,nlh->nlhv", Q, KV, Z) * v_length
101
+
102
+ return queried_values.contiguous()
103
+
104
+
105
+ class FullAttention(Module):
106
+ def __init__(self, use_dropout=False, attention_dropout=0.1):
107
+ super().__init__()
108
+ self.use_dropout = use_dropout
109
+ self.dropout = Dropout(attention_dropout)
110
+
111
+ def forward(self, queries, keys, values, q_mask=None, kv_mask=None):
112
+ """Multi-head scaled dot-product attention, a.k.a full attention.
113
+ Args:
114
+ queries: [N, L, H, D]
115
+ keys: [N, S, H, D]
116
+ values: [N, S, H, D]
117
+ q_mask: [N, L]
118
+ kv_mask: [N, S]
119
+ Returns:
120
+ queried_values: (N, L, H, D)
121
+ """
122
+
123
+ # Compute the unnormalized attention and apply the masks
124
+ QK = torch.einsum("nlhd,nshd->nlsh", queries, keys)
125
+ if kv_mask is not None:
126
+ QK.masked_fill_(
127
+ ~(q_mask[:, :, None, None] * kv_mask[:, None, :, None]), float("-inf")
128
+ )
129
+
130
+ # Compute the attention and the weighted average
131
+ softmax_temp = 1.0 / queries.size(3) ** 0.5 # sqrt(D)
132
+ A = torch.softmax(softmax_temp * QK, dim=2)
133
+ if self.use_dropout:
134
+ A = self.dropout(A)
135
+
136
+ queried_values = torch.einsum("nlsh,nshd->nlhd", A, values)
137
+
138
+ return queried_values.contiguous()
139
+
140
+
141
+ # Ref: https://github.com/zju3dv/LoFTR/blob/master/src/loftr/loftr_module/transformer.py
142
+ class LoFTREncoderLayer(nn.Module):
143
+ def __init__(self, d_model, nhead, attention="linear"):
144
+ super(LoFTREncoderLayer, self).__init__()
145
+
146
+ self.dim = d_model // nhead
147
+ self.nhead = nhead
148
+
149
+ # multi-head attention
150
+ self.q_proj = nn.Linear(d_model, d_model, bias=False)
151
+ self.k_proj = nn.Linear(d_model, d_model, bias=False)
152
+ self.v_proj = nn.Linear(d_model, d_model, bias=False)
153
+
154
+ # -- LoFTR optionally uses linear attention (faster, avoids quadratic cost), otherwise normal softmax attention.
155
+ self.attention = LinearAttention() if attention == "linear" else FullAttention()
156
+ self.merge = nn.Linear(d_model, d_model, bias=False)
157
+
158
+ # feed-forward network
159
+ self.mlp = nn.Sequential(
160
+ nn.Linear(d_model * 2, d_model * 2, bias=False),
161
+ nn.ReLU(),
162
+ nn.Linear(d_model * 2, d_model, bias=False),
163
+ )
164
+
165
+ # norm and dropout
166
+ self.norm1 = nn.LayerNorm(d_model)
167
+ self.norm2 = nn.LayerNorm(d_model)
168
+
169
+ def forward(self, x, source, x_mask=None, source_mask=None):
170
+ """
171
+ Args:
172
+ x (torch.Tensor): [N, L, C]
173
+ source (torch.Tensor): [N, S, C]
174
+ x_mask (torch.Tensor): [N, L] (optional)
175
+ source_mask (torch.Tensor): [N, S] (optional)
176
+ """
177
+ bs = x.size(0)
178
+ query, key, value = x, source, source
179
+
180
+ # multi-head attention
181
+ query = self.q_proj(query).view(bs, -1, self.nhead, self.dim) # [N, L, (H, D)]
182
+ key = self.k_proj(key).view(bs, -1, self.nhead, self.dim) # [N, S, (H, D)]
183
+ value = self.v_proj(value).view(bs, -1, self.nhead, self.dim)
184
+ message = self.attention(
185
+ query, key, value, q_mask=x_mask, kv_mask=source_mask
186
+ ) # [N, L, (H, D)]
187
+ message = self.merge(message.view(bs, -1, self.nhead * self.dim)) # [N, L, C]
188
+ message = self.norm1(message)
189
+
190
+ # feed-forward network
191
+ message = self.mlp(torch.cat([x, message], dim=2))
192
+ message = self.norm2(message)
193
+
194
+ return x + message
195
+
196
+
197
+ class LocalFeatureTransformer(nn.Module):
198
+ """A Local Feature Transformer (LoFTR) module."""
199
+
200
+ def __init__(self, d_model, nhead, layer_names, attention):
201
+ super(LocalFeatureTransformer, self).__init__()
202
+
203
+ self.d_model = d_model
204
+ self.nhead = nhead
205
+ self.layer_names = layer_names
206
+ encoder_layer = LoFTREncoderLayer(d_model, nhead, attention)
207
+ self.layers = nn.ModuleList(
208
+ [copy.deepcopy(encoder_layer) for _ in range(len(self.layer_names))]
209
+ )
210
+ self._reset_parameters()
211
+
212
+ def _reset_parameters(self):
213
+ for p in self.parameters():
214
+ if p.dim() > 1:
215
+ nn.init.xavier_uniform_(p)
216
+
217
+ def forward(self, feat0, feat1, mask0=None, mask1=None):
218
+ """
219
+ Args:
220
+ feat0 (torch.Tensor): [N, L, C]
221
+ feat1 (torch.Tensor): [N, S, C]
222
+ mask0 (torch.Tensor): [N, L] (optional)
223
+ mask1 (torch.Tensor): [N, S] (optional)
224
+ """
225
+ assert self.d_model == feat0.size(
226
+ 2
227
+ ), "the feature number of src and transformer must be equal"
228
+
229
+ for layer, name in zip(self.layers, self.layer_names):
230
+
231
+ if name == "self":
232
+ feat0 = layer(feat0, feat0, mask0, mask0)
233
+ feat1 = layer(feat1, feat1, mask1, mask1)
234
+ elif name == "cross":
235
+ feat0 = layer(feat0, feat1, mask0, mask1)
236
+ feat1 = layer(feat1, feat0, mask1, mask0)
237
+ else:
238
+ raise KeyError
239
+
240
+ return feat0, feat1
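A minimal sketch of driving `LocalFeatureTransformer` on dummy features; all shapes are assumptions (the model feeds it 1/16-resolution feature maps flattened to `[N, HW, C]`).

```python
import torch

# One self-attention pass followed by one cross-attention pass over flattened feature maps.
transformer = LocalFeatureTransformer(
    d_model=256, nhead=8, layer_names=["self", "cross"], attention="linear"
)
feat0 = torch.rand(2, 30 * 40, 256)  # left features  [N, L, C]
feat1 = torch.rand(2, 30 * 40, 256)  # right features [N, S, C]
feat0, feat1 = transformer(feat0, feat1)
print(feat0.shape, feat1.shape)      # torch.Size([2, 1200, 256]) each
```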
models/core/corr.py ADDED
@@ -0,0 +1,88 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+
4
+ # This source code is licensed under the license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ import torch
8
+ import torch.nn.functional as F
9
+
10
+
11
+ def bilinear_sampler(img, coords, mode="bilinear", mask=False, stereo=True):
12
+ """Wrapper for grid_sample, uses pixel coordinates"""
13
+ H, W = img.shape[-2:]
14
+ xgrid, ygrid = coords.split([1, 1], dim=-1)
15
+ xgrid = 2 * xgrid / (W - 1) - 1
16
+ if not stereo:
17
+ ygrid = 2 * ygrid / (H - 1) - 1
18
+ else:
19
+ assert torch.unique(ygrid).numel() == 1 and H == 1 # This is a stereo problem
20
+ img = img.contiguous()
21
+ grid = torch.cat([xgrid, ygrid], dim=-1).contiguous()
22
+ img = F.grid_sample(img, grid, align_corners=True)
23
+
24
+ if mask:
25
+ mask = (xgrid > -1) & (ygrid > -1) & (xgrid < 1) & (ygrid < 1)
26
+ return img, mask.float()
27
+
28
+ return img
29
+
30
+
31
+ def coords_grid(batch, ht, wd, device):
32
+ coords = torch.meshgrid(
33
+ torch.arange(ht, device=device), torch.arange(wd, device=device), indexing="ij"
34
+ )
35
+ coords = torch.stack(coords[::-1], dim=0).float()
36
+ return coords[None].repeat(batch, 1, 1, 1)
37
+
38
+
39
+ class CorrBlock1D:
40
+ def __init__(self, fmap1, fmap2, num_levels=4, radius=4):
41
+ self.num_levels = num_levels
42
+ self.radius = radius
43
+ self.corr_pyramid = []
44
+ self.coords = coords_grid(
45
+ fmap1.shape[0], fmap1.shape[2], fmap1.shape[3], fmap1.device
46
+ )
47
+ # all pairs correlation
48
+ corr = CorrBlock1D.corr(fmap1, fmap2)
49
+
50
+ batch, h1, w1, dim, w2 = corr.shape
51
+ corr = corr.reshape(batch * h1 * w1, dim, 1, w2)
52
+
53
+ self.corr_pyramid.append(corr)
54
+ for i in range(self.num_levels):
55
+ corr = F.avg_pool2d(corr, [1, 2], stride=[1, 2])
56
+ self.corr_pyramid.append(corr)
57
+
58
+ def __call__(self, flow):
59
+ r = self.radius
60
+ coords = self.coords + flow
61
+ coords = coords[:, :1].permute(0, 2, 3, 1)
62
+ batch, h1, w1, _ = coords.shape
63
+
64
+ out_pyramid = []
65
+ for i in range(self.num_levels):
66
+ corr = self.corr_pyramid[i]
67
+ dx = torch.linspace(-r, r, 2 * r + 1)
68
+ dx = dx.view(1, 1, 2 * r + 1, 1).to(coords.device)
69
+ x0 = dx + coords.reshape(batch * h1 * w1, 1, 1, 1) / 2 ** i
70
+ y0 = torch.zeros_like(x0)
71
+
72
+ coords_lvl = torch.cat([x0, y0], dim=-1)
73
+ corr = bilinear_sampler(corr, coords_lvl)
74
+ corr = corr.view(batch, h1, w1, -1)
75
+ out_pyramid.append(corr)
76
+
77
+ out = torch.cat(out_pyramid, dim=-1)
78
+ return out.permute(0, 3, 1, 2).contiguous().float()
79
+
80
+ @staticmethod
81
+ def corr(fmap1, fmap2):
82
+ B, D, H, W1 = fmap1.shape
83
+ _, _, _, W2 = fmap2.shape
84
+ fmap1 = fmap1.view(B, D, H, W1)
85
+ fmap2 = fmap2.view(B, D, H, W2)
86
+ corr = torch.einsum("aijk,aijh->ajkh", fmap1, fmap2)
87
+ corr = corr.reshape(B, H, W1, 1, W2).contiguous()
88
+ return corr / torch.sqrt(torch.tensor(D).float())
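A hedged sketch of how `CorrBlock1D` is used: the pyramid is built once from a left/right feature pair and then indexed with the current disparity estimate on every refinement iteration (shapes below are assumptions).

```python
import torch

fmap1 = torch.rand(1, 256, 60, 80)  # left features  [B, D, H, W]
fmap2 = torch.rand(1, 256, 60, 80)  # right features [B, D, H, W]
corr_fn = CorrBlock1D(fmap1, fmap2, num_levels=4, radius=4)

flow = torch.zeros(1, 2, 60, 80)    # zero-initialised flow/disparity
cost = corr_fn(flow)                # [1, num_levels * (2 * radius + 1), 60, 80] -> [1, 36, 60, 80]
```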
models/core/dynamic_stereo.py ADDED
@@ -0,0 +1,506 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+
4
+ # This source code is licensed under the license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ # -- Added by Chu King on 16th November 2025 for debugging purposes.
8
+ import os, signal
9
+ import logging
10
+ import torch.distributed as dist
11
+
12
+ from typing import Dict, List
13
+ from einops import rearrange
14
+ import torch
15
+ import torch.nn as nn
16
+ import torch.nn.functional as F
17
+ from collections import defaultdict
18
+
19
+
20
+ from models.core.update import (
21
+ BasicUpdateBlock,
22
+ SequenceUpdateBlock3D,
23
+ TimeAttnBlock,
24
+ )
25
+
26
+ # -- Added by Chu King on 21st November 2025
27
+ from models.core.sci_codec import sci_decoder
28
+ from models.core.extractor import BasicEncoder
29
+ from models.core.corr import CorrBlock1D
30
+
31
+ from models.core.attention import (
32
+ PositionEncodingSine,
33
+ LocalFeatureTransformer,
34
+ )
35
+ from models.core.utils.utils import InputPadder, interp
36
+
37
+ autocast = torch.cuda.amp.autocast
38
+
39
+
40
+ class DynamicStereo(nn.Module):
41
+ def __init__(
42
+ self,
43
+ max_disp: int = 192,
44
+ mixed_precision: bool = False,
45
+ num_frames: int = 5,
46
+ attention_type: str = None,
47
+ use_3d_update_block: bool = False,
48
+ different_update_blocks: bool = False,
49
+ ):
50
+ super(DynamicStereo, self).__init__()
51
+
52
+ self.max_flow = max_disp
53
+ self.mixed_precision = mixed_precision
54
+
55
+ self.hidden_dim = 128
56
+ self.context_dim = 128
57
+ dim = 256
58
+ self.dim = dim
59
+ self.dropout = 0 # -- dropout probability
60
+
61
+ # -- decide whether to use 3D update blocks (like RAFT3D) or simpler 2D blocks.
62
+ self.use_3d_update_block = use_3d_update_block
63
+
64
+ # -- Modified by Chu King on 21st November 2025
65
+ # -- CNN encoder that extracts features from images.
66
+ # * output_dim: output channels
67
+ # * norm_fn="instance": applies instance normalization.
68
+ # -- self.fnet = BasicEncoder(
69
+ # -- output_dim=dim, norm_fn="instance", dropout=self.dropout
70
+ # -- )
71
+ self.fnet = sci_decoder(
72
+ n_frame=num_frames,
73
+ n_taps=2,
74
+ output_dim=dim,
75
+ norm_fn="instance",
76
+ dropout=self.dropout
77
+ )
78
+
79
+ # -- Boolean flag to decide whether different update blocks are used for different resolutions.
80
+ self.different_update_blocks = different_update_blocks
81
+
82
+ # -- Cost volume planes (matching costs for disparity computation).
83
+ cor_planes = 4 * 9
84
+ self.depth = 4
85
+ self.attention_type = attention_type
86
+ # attention_type is a combination of the following attention types:
87
+ # self_stereo, temporal, update_time, update_space
88
+ # for example, self_stereo_temporal_update_time_update_space
89
+
90
+ if self.use_3d_update_block:
91
+ # -- Uses 3D convolutions for spatiotemporal processing.
92
+ if self.different_update_blocks:
93
+ # -- self.update_block08, self.update_block16, self.update_block04
94
+ # are update blocks for different resolution levels (i.e. 1/8, 1/16, 1/4)
95
+ self.update_block08 = SequenceUpdateBlock3D(
96
+ hidden_dim=self.hidden_dim, cor_planes=cor_planes, mask_size=4
97
+ )
98
+ self.update_block16 = SequenceUpdateBlock3D(
99
+ hidden_dim=self.hidden_dim,
100
+ cor_planes=cor_planes,
101
+ mask_size=4,
102
+ attention_type=attention_type,
103
+ )
104
+ self.update_block04 = SequenceUpdateBlock3D(
105
+ hidden_dim=self.hidden_dim, cor_planes=cor_planes, mask_size=4
106
+ )
107
+ else:
108
+ self.update_block = SequenceUpdateBlock3D(
109
+ hidden_dim=self.hidden_dim, cor_planes=cor_planes, mask_size=4
110
+ )
111
+ else:
112
+ # -- Uses standard 2D update blocks.
113
+ if self.different_update_blocks:
114
+ self.update_block08 = BasicUpdateBlock(
115
+ hidden_dim=self.hidden_dim, cor_planes=cor_planes, mask_size=4
116
+ )
117
+ self.update_block16 = BasicUpdateBlock(
118
+ hidden_dim=self.hidden_dim,
119
+ cor_planes=cor_planes,
120
+ mask_size=4,
121
+ attention_type=attention_type,
122
+ )
123
+ self.update_block04 = BasicUpdateBlock(
124
+ hidden_dim=self.hidden_dim, cor_planes=cor_planes, mask_size=4
125
+ )
126
+ else:
127
+ self.update_block = BasicUpdateBlock(
128
+ hidden_dim=self.hidden_dim, cor_planes=cor_planes, mask_size=4
129
+ )
130
+
131
+ if attention_type is not None:
132
+ # -- The model incorporates several attention types.
133
+ if ("update_time" in attention_type) or ("temporal" in attention_type):
134
+ # -- This variable learns positional embeddings for different time steps in the sequence.
135
+ self.time_embed = nn.Parameter(torch.zeros(1, num_frames, dim))
136
+
137
+ # -- Temporal attention: processes information across different time frames.
138
+ if "temporal" in attention_type:
139
+ self.time_attn_blocks = nn.ModuleList(
140
+ [TimeAttnBlock(dim=dim, num_heads=8) for _ in range(self.depth)]
141
+ )
142
+
143
+ # -- Stereo attention: includes self-attention and cross attention blocks for processing
144
+ # left-right stereo image pairs.
145
+ if "self_stereo" in attention_type:
146
+ self.self_attn_blocks = nn.ModuleList(
147
+ [
148
+ LocalFeatureTransformer(
149
+ d_model=dim,
150
+ nhead=8,
151
+ layer_names=["self"] * 1,
152
+ attention="linear",
153
+ )
154
+ for _ in range(self.depth)
155
+ ]
156
+ )
157
+
158
+ self.cross_attn_blocks = nn.ModuleList(
159
+ [
160
+ LocalFeatureTransformer(
161
+ d_model=dim,
162
+ nhead=8,
163
+ layer_names=["cross"] * 1,
164
+ attention="linear",
165
+ )
166
+ for _ in range(self.depth)
167
+ ]
168
+ )
169
+
170
+ self.num_frames = num_frames
171
+
172
+ @torch.jit.ignore
173
+ def no_weight_decay(self):
174
+ return {"time_embed"}
175
+
176
+ def freeze_bn(self):
177
+ for m in self.modules():
178
+ if isinstance(m, nn.BatchNorm2d):
179
+ m.eval()
180
+
181
+ def convex_upsample(self, flow: torch.Tensor, mask: torch.Tensor, rate: int = 4):
182
+ """Upsample flow field [H/8, W/8, 2] -> [H, W, 2] using convex combination"""
183
+ N, _, H, W = flow.shape
184
+ mask = mask.view(N, 1, 9, rate, rate, H, W)
185
+ mask = torch.softmax(mask, dim=2)
186
+
187
+ up_flow = F.unfold(rate * flow, [3, 3], padding=1)
188
+ up_flow = up_flow.view(N, 2, 9, 1, 1, H, W)
189
+
190
+ up_flow = torch.sum(mask * up_flow, dim=2)
191
+ up_flow = up_flow.permute(0, 1, 4, 2, 5, 3)
192
+ return up_flow.reshape(N, 2, rate * H, rate * W)
193
+
194
+ def zero_init(self, fmap: torch.Tensor):
195
+ N, _, H, W = fmap.shape
196
+ _x = torch.zeros([N, 1, H, W], dtype=torch.float32)
197
+ _y = torch.zeros([N, 1, H, W], dtype=torch.float32)
198
+ zero_flow = torch.cat((_x, _y), dim=1).to(fmap.device)
199
+ return zero_flow
200
+
201
+ def forward_batch_test(
202
+ self, batch_dict: Dict, sci_enc_L, sci_enc_R, kernel_size: int = 14, iters: int = 20
203
+ ):
204
+ stride = kernel_size // 2
205
+ predictions = defaultdict(list)
206
+
207
+ disp_preds = []
208
+ video = batch_dict["stereo_video"]
209
+ num_ims = len(video)
210
+ print("video", video.shape)
211
+
212
+ # -- Divide a single long sequence to multiple long sequences.
213
+ # -- For SCI stereo, we only test the first sequence.
214
+ # -- for i in range(0, num_ims, stride):
215
+ for i in range(1):
216
+ left_ims = video[i : min(i + kernel_size, num_ims), 0]
217
+ # -- padder = InputPadder(left_ims.shape, divis_by=32)
218
+
219
+ right_ims = video[i : min(i + kernel_size, num_ims), 1]
220
+ # -- left_ims, right_ims = padder.pad(left_ims, right_ims)
221
+
222
+ # -- Modified by Chu King on 20th November 2025
223
+ # 0) Convert to Gray
224
+ def rgb_to_gray(x):
225
+ weights = torch.tensor([0.2989, 0.5870, 0.1140], dtype=x.dtype, device=x.device)
226
+ gray = (x * weights[None, None, :, None, None]).sum(dim=2)
227
+ return gray # -- shape: [B, T, H, W]
228
+
229
+ video_L = rgb_to_gray(left_ims.to(next(sci_enc_L.parameters()).device)) # ~ (b, t, h, w)
230
+ video_R = rgb_to_gray(right_ims.to(next(sci_enc_R.parameters()).device)) # ~ (b, t, h, w)
231
+
232
+ # 1) Extract and normalize input videos.
233
+ # -- min_max_norm = lambda x : 2. * (x / 255.) - 1.
234
+ min_max_norm = lambda x: x / 255.
235
+ video_L = min_max_norm(video_L)
236
+ video_R = min_max_norm(video_R)
237
+
238
+ # 2) Make both tensors contiguous; a later .view() on a non-contiguous tensor would raise an error.
239
+ video_L = video_L.contiguous()
240
+ video_R = video_R.contiguous()
241
+
242
+ # 3) Coded exposure modeling.
243
+ snapshot_L = sci_enc_L(video_L)
244
+ snapshot_R = sci_enc_R(video_R)
245
+
246
+ with autocast(enabled=self.mixed_precision):
247
+ disparities_forw = self.forward(
248
+ # -- Modified by Chu King on 20th November 2025
249
+ # -- left_ims[None].cuda(),
250
+ # -- right_ims[None].cuda(),
251
+ snapshot_L,
252
+ snapshot_R,
253
+ iters=iters,
254
+ test_mode=True,
255
+ )
256
+
257
+ # -- Padding disabled by Chu King on 20th November 2025
258
+ # -- disparities_forw = padder.unpad(disparities_forw[:, 0])[:, None].cpu()
259
+ disparities_forw = disparities_forw[:, 0][:, None].cpu()
260
+
261
+ # -- We are not doing overlapping chunks in SCI stereo.
262
+ disp_preds.append(disparities_forw)
263
+ # -- if len(disp_preds) > 0 and len(disparities_forw) >= stride:
264
+ # -- if len(disparities_forw) < kernel_size:
265
+ # -- disp_preds.append(disparities_forw[stride // 2 :])
266
+ # -- else:
267
+ # -- disp_preds.append(disparities_forw[stride // 2 : -stride // 2])
268
+ # -- elif len(disp_preds) == 0:
269
+ # -- disp_preds.append(disparities_forw[: -stride // 2])
270
+
271
+ predictions["disparity"] = (torch.cat(disp_preds).squeeze(1).abs())[:, :1]
272
+
273
+ return predictions
274
+
275
+ def forward_sst_block(
276
+ self, fmap1_dw16: torch.Tensor, fmap2_dw16: torch.Tensor, T: int
277
+ ):
278
+ # -- fmap1_dw16 ~ (B*T, C, H, W) -- left-view features
279
+ # -- fmap2_dw16 ~ (B*T, C, H, W) -- right-view features
280
+ *_, h, w = fmap1_dw16.shape
281
+
282
+ # positional encoding and self-attention
283
+ pos_encoding_fn_small = PositionEncodingSine(d_model=self.dim, max_shape=(h, w))
284
+ fmap1_dw16 = pos_encoding_fn_small(fmap1_dw16)
285
+ fmap2_dw16 = pos_encoding_fn_small(fmap2_dw16)
286
+
287
+ if self.attention_type is not None:
288
+ # add time embeddings
289
+ if (
290
+ "temporal" in self.attention_type
291
+ or "update_time" in self.attention_type
292
+ ):
293
+ fmap1_dw16 = rearrange(
294
+ fmap1_dw16, "(b t) m h w -> (b h w) t m", t=T, h=h, w=w
295
+ )
296
+ fmap2_dw16 = rearrange(
297
+ fmap2_dw16, "(b t) m h w -> (b h w) t m", t=T, h=h, w=w
298
+ )
299
+
300
+ # interpolate if video length doesn't match
301
+ if T != self.num_frames:
302
+ time_embed = self.time_embed.transpose(1, 2)
303
+ new_time_embed = F.interpolate(time_embed, size=(T), mode="nearest")
304
+ new_time_embed = new_time_embed.transpose(1, 2).contiguous()
305
+ else:
306
+ new_time_embed = self.time_embed
307
+
308
+ fmap1_dw16 = fmap1_dw16 + new_time_embed
309
+ fmap2_dw16 = fmap2_dw16 + new_time_embed
310
+
311
+ fmap1_dw16 = rearrange(
312
+ fmap1_dw16, "(b h w) t m -> (b t) m h w", t=T, h=h, w=w
313
+ )
314
+ fmap2_dw16 = rearrange(
315
+ fmap2_dw16, "(b h w) t m -> (b t) m h w", t=T, h=h, w=w
316
+ )
317
+
318
+ if ("self_stereo" in self.attention_type) or (
319
+ "temporal" in self.attention_type
320
+ ):
321
+ for att_ind in range(self.depth):
322
+ if "self_stereo" in self.attention_type:
323
+ fmap1_dw16 = rearrange(
324
+ fmap1_dw16, "(b t) m h w -> (b t) (h w) m", t=T, h=h, w=w
325
+ )
326
+ fmap2_dw16 = rearrange(
327
+ fmap2_dw16, "(b t) m h w -> (b t) (h w) m", t=T, h=h, w=w
328
+ )
329
+
330
+ fmap1_dw16, fmap2_dw16 = self.self_attn_blocks[att_ind](
331
+ fmap1_dw16, fmap2_dw16
332
+ )
333
+ fmap1_dw16, fmap2_dw16 = self.cross_attn_blocks[att_ind](
334
+ fmap1_dw16, fmap2_dw16
335
+ )
336
+
337
+ fmap1_dw16 = rearrange(
338
+ fmap1_dw16, "(b t) (h w) m -> (b t) m h w ", t=T, h=h, w=w
339
+ )
340
+ fmap2_dw16 = rearrange(
341
+ fmap2_dw16, "(b t) (h w) m -> (b t) m h w ", t=T, h=h, w=w
342
+ )
343
+
344
+ if "temporal" in self.attention_type:
345
+ fmap1_dw16 = self.time_attn_blocks[att_ind](fmap1_dw16, T=T)
346
+ fmap2_dw16 = self.time_attn_blocks[att_ind](fmap2_dw16, T=T)
347
+ return fmap1_dw16, fmap2_dw16
348
+
349
+ def forward_update_block(
350
+ self,
351
+ update_block: nn.Module,
352
+ corr_fn: CorrBlock1D,
353
+ flow: torch.Tensor,
354
+ net: torch.Tensor,
355
+ inp: torch.Tensor,
356
+ predictions: List,
357
+ iters: int,
358
+ interp_scale: float,
359
+ t: int,
360
+ ):
361
+ for _ in range(iters):
362
+ flow = flow.detach()
363
+ out_corrs = corr_fn(flow)
364
+ with autocast(enabled=self.mixed_precision):
365
+ net, up_mask, delta_flow = update_block(net, inp, out_corrs, flow, t=t)
366
+
367
+ flow = flow + delta_flow
368
+ flow_up = flow_out = self.convex_upsample(flow, up_mask, rate=4)
369
+ if interp_scale > 1:
370
+ flow_up = interp_scale * interp(
371
+ flow_out,
372
+ (
373
+ interp_scale * flow_out.shape[2],
374
+ interp_scale * flow_out.shape[3],
375
+ ),
376
+ )
377
+ flow_up = flow_up[:, :1]
378
+ predictions.append(flow_up)
379
+ return flow_out, net
380
+
381
+ def forward(self, image1, image2, flow_init=None, iters=10, test_mode=False):
382
+ """Estimate optical flow between pair of frames"""
383
+ b, *_ = image1.shape
384
+
385
+ hdim = self.hidden_dim
386
+
387
+ with autocast(enabled=self.mixed_precision):
388
+ fmap1, fmap2 = self.fnet([image1, image2])
389
+
390
+ net, inp = torch.split(fmap1, [hdim, hdim], dim=1)
391
+ net = torch.tanh(net)
392
+ inp = F.relu(inp)
393
+ *_, h, w = fmap1.shape
394
+ # 1/4 -> 1/16
395
+ # feature
396
+ fmap1_dw16 = F.avg_pool2d(fmap1, 4, stride=4)
397
+ fmap2_dw16 = F.avg_pool2d(fmap2, 4, stride=4)
398
+
399
+ fmap1_dw16, fmap2_dw16 = self.forward_sst_block(fmap1_dw16, fmap2_dw16, T=self.num_frames)
400
+
401
+ net_dw16, inp_dw16 = torch.split(fmap1_dw16, [hdim, hdim], dim=1)
402
+ net_dw16 = torch.tanh(net_dw16)
403
+ inp_dw16 = F.relu(inp_dw16)
404
+
405
+ fmap1_dw8 = (
406
+ F.avg_pool2d(fmap1, 2, stride=2) + interp(fmap1_dw16, (h // 2, w // 2))
407
+ ) / 2.0
408
+ fmap2_dw8 = (
409
+ F.avg_pool2d(fmap2, 2, stride=2) + interp(fmap2_dw16, (h // 2, w // 2))
410
+ ) / 2.0
411
+
412
+ net_dw8, inp_dw8 = torch.split(fmap1_dw8, [hdim, hdim], dim=1)
413
+ net_dw8 = torch.tanh(net_dw8)
414
+ inp_dw8 = F.relu(inp_dw8)
415
+ # Cascaded refinement (1/16 + 1/8 + 1/4)
416
+ predictions = []
417
+ flow = None
418
+ flow_up = None
419
+ if flow_init is not None:
420
+ scale = h / flow_init.shape[2]
421
+ flow = -scale * interp(flow_init, (h, w))
422
+ else:
423
+ # zero initialization
424
+ flow_dw16 = self.zero_init(fmap1_dw16) # -- (N, 2, H, W)
425
+
426
+ # Recurrent Update Module
427
+ # Update 1/16
428
+ update_block = (
429
+ self.update_block16
430
+ if self.different_update_blocks
431
+ else self.update_block
432
+ )
433
+
434
+ corr_fn_att_dw16 = CorrBlock1D(fmap1_dw16, fmap2_dw16)
435
+ flow, net_dw16 = self.forward_update_block(
436
+ update_block=update_block,
437
+ corr_fn=corr_fn_att_dw16,
438
+ flow=flow_dw16,
439
+ net=net_dw16,
440
+ inp=inp_dw16,
441
+ predictions=predictions,
442
+ iters=iters // 2,
443
+ interp_scale=4,
444
+ t=self.num_frames,
445
+ )
446
+
447
+ scale = fmap1_dw8.shape[2] / flow.shape[2]
448
+ flow_dw8 = -scale * interp(flow, (fmap1_dw8.shape[2], fmap1_dw8.shape[3]))
449
+
450
+ net_dw8 = (
451
+ net_dw8
452
+ + interp(net_dw16, (2 * net_dw16.shape[2], 2 * net_dw16.shape[3]))
453
+ ) / 2.0
454
+ # Update 1/8
455
+
456
+ update_block = (
457
+ self.update_block08
458
+ if self.different_update_blocks
459
+ else self.update_block
460
+ )
461
+
462
+ corr_fn_dw8 = CorrBlock1D(fmap1_dw8, fmap2_dw8)
463
+ flow, net_dw8 = self.forward_update_block(
464
+ update_block=update_block,
465
+ corr_fn=corr_fn_dw8,
466
+ flow=flow_dw8,
467
+ net=net_dw8,
468
+ inp=inp_dw8,
469
+ predictions=predictions,
470
+ iters=iters // 2,
471
+ interp_scale=2,
472
+ t=self.num_frames,
473
+ )
474
+
475
+ scale = h / flow.shape[2]
476
+ flow = -scale * interp(flow, (h, w))
477
+
478
+ net = (
479
+ net + interp(net_dw8, (2 * net_dw8.shape[2], 2 * net_dw8.shape[3]))
480
+ ) / 2.0
481
+ # Update 1/4
482
+ update_block = (
483
+ self.update_block04 if self.different_update_blocks else self.update_block
484
+ )
485
+ corr_fn = CorrBlock1D(fmap1, fmap2)
486
+ flow, __ = self.forward_update_block(
487
+ update_block=update_block,
488
+ corr_fn=corr_fn,
489
+ flow=flow,
490
+ net=net,
491
+ inp=inp,
492
+ predictions=predictions,
493
+ iters=iters,
494
+ interp_scale=1,
495
+ t=self.num_frames,
496
+ )
497
+
498
+ predictions = torch.stack(predictions)
499
+
500
+ predictions = rearrange(predictions, "d (b t) c h w -> d t b c h w", b=b, t=self.num_frames)
501
+ flow_up = predictions[-1]
502
+
503
+ if test_mode:
504
+ return flow_up
505
+
506
+ return predictions
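The convex upsampling step follows RAFT: the update block predicts per-pixel weights that take a convex combination of a 3x3 neighbourhood of the coarse field for every fine-resolution pixel. A hedged sketch; the stand-alone model construction and all shapes are assumptions.

```python
import torch

model = DynamicStereo(num_frames=5)          # assumes the default constructor works in isolation
coarse = torch.rand(1, 2, 30, 40)            # coarse flow at 1/4 resolution
weights = torch.rand(1, 9 * 4 * 4, 30, 40)   # 9 neighbours x (rate x rate) sub-pixel positions
fine = model.convex_upsample(coarse, weights, rate=4)
print(fine.shape)                            # torch.Size([1, 2, 120, 160])
```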
models/core/extractor.py ADDED
@@ -0,0 +1,139 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+
4
+ # This source code is licensed under the license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ import torch
8
+ import torch.nn as nn
9
+
10
+ # -- Added by Chu King on 16th November 2025 for debugging purposes.
11
+ import os, signal
12
+ import logging
13
+ import torch.distributed as dist
14
+
15
+ class ResidualBlock(nn.Module):
16
+ def __init__(self, in_planes, planes, norm_fn="group", stride=1):
17
+ super(ResidualBlock, self).__init__()
18
+
19
+ self.conv1 = nn.Conv2d(
20
+ in_planes, planes, kernel_size=3, padding=1, stride=stride
21
+ )
22
+ self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, padding=1)
23
+ self.relu = nn.ReLU(inplace=True)
24
+
25
+ num_groups = planes // 8
26
+
27
+ if norm_fn == "group":
28
+ self.norm1 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)
29
+ self.norm2 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)
30
+ self.norm3 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)
31
+
32
+ elif norm_fn == "batch":
33
+ self.norm1 = nn.BatchNorm2d(planes)
34
+ self.norm2 = nn.BatchNorm2d(planes)
35
+ self.norm3 = nn.BatchNorm2d(planes)
36
+
37
+ elif norm_fn == "instance":
38
+ self.norm1 = nn.InstanceNorm2d(planes, affine=False)
39
+ self.norm2 = nn.InstanceNorm2d(planes, affine=False)
40
+ self.norm3 = nn.InstanceNorm2d(planes, affine=False)
41
+
42
+ elif norm_fn == "none":
43
+ self.norm1 = nn.Sequential()
44
+ self.norm2 = nn.Sequential()
45
+ self.norm3 = nn.Sequential()
46
+
47
+ self.downsample = nn.Sequential(
48
+ nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride), self.norm3
49
+ )
50
+
51
+ def forward(self, x):
52
+ y = x
53
+ y = self.relu(self.norm1(self.conv1(y)))
54
+ y = self.relu(self.norm2(self.conv2(y)))
55
+
56
+ # -- ensures that x is transformed to the correct shape so it can be added to y.
57
+ x = self.downsample(x)
58
+
59
+ return self.relu(x + y)
60
+
61
+
62
+ class BasicEncoder(nn.Module):
63
+ def __init__(self, output_dim=128, norm_fn="batch", dropout=0.0):
64
+ super(BasicEncoder, self).__init__()
65
+ self.norm_fn = norm_fn
66
+
67
+ if self.norm_fn == "group":
68
+ self.norm1 = nn.GroupNorm(num_groups=8, num_channels=64)
69
+
70
+ elif self.norm_fn == "batch":
71
+ self.norm1 = nn.BatchNorm2d(64)
72
+
73
+ elif self.norm_fn == "instance":
74
+ self.norm1 = nn.InstanceNorm2d(64, affine=False)
75
+
76
+ elif self.norm_fn == "none":
77
+ self.norm1 = nn.Sequential()
78
+
79
+ self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
80
+ self.relu1 = nn.ReLU(inplace=True)
81
+
82
+ self.in_planes = 64
83
+ self.layer1 = self._make_layer(64, stride=1)
84
+ self.layer2 = self._make_layer(96, stride=2)
85
+ self.layer3 = self._make_layer(128, stride=1)
86
+
87
+ # output convolution
88
+ self.conv2 = nn.Conv2d(128, output_dim, kernel_size=1)
89
+
90
+ self.dropout = None
91
+ if dropout > 0:
92
+ self.dropout = nn.Dropout2d(p=dropout)
93
+
94
+ # -- self.modules() is a PyTorch utility function that returns all submodules of this nn.Module recursively.
95
+ # -- This means it will loop through every layer: conv1, layer1, layer2, layer3, conv2 and so on.
96
+ for m in self.modules():
97
+ if isinstance(m, nn.Conv2d):
98
+ nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
99
+ elif isinstance(m, (nn.BatchNorm2d, nn.InstanceNorm2d, nn.GroupNorm)):
100
+ if m.weight is not None:
101
+ nn.init.constant_(m.weight, 1)
102
+ if m.bias is not None:
103
+ nn.init.constant_(m.bias, 0)
104
+
105
+ def _make_layer(self, dim, stride=1):
106
+ layer1 = ResidualBlock(self.in_planes, dim, self.norm_fn, stride=stride)
107
+ layer2 = ResidualBlock(dim, dim, self.norm_fn, stride=1)
108
+ layers = (layer1, layer2)
109
+
110
+ self.in_planes = dim
111
+ return nn.Sequential(*layers)
112
+
113
+ def forward(self, x):
114
+ # -- x = [L, R]
115
+ # -- L, R ~ (b*t, c, h, w)
116
+
117
+ # if input is list, combine batch dimension
118
+ is_list = isinstance(x, tuple) or isinstance(x, list)
119
+ if is_list:
120
+ batch_dim = x[0].shape[0]
121
+ x = torch.cat(x, dim=0)
122
+
123
+ x = self.conv1(x)
124
+ x = self.norm1(x)
125
+ x = self.relu1(x)
126
+
127
+ x = self.layer1(x)
128
+ x = self.layer2(x)
129
+ x = self.layer3(x)
130
+
131
+ x = self.conv2(x)
132
+
133
+ if self.dropout is not None:
134
+ x = self.dropout(x)
135
+
136
+ if is_list:
137
+ x = torch.split(x, x.shape[0] // 2, dim=0)
138
+
139
+ return x
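`BasicEncoder` is the stock RAFT-style feature extractor that the `sci_decoder` swap above replaces; a minimal sketch of what it computes (shapes are assumptions).

```python
import torch

encoder = BasicEncoder(output_dim=256, norm_fn="instance", dropout=0.0)
left = torch.rand(2, 3, 256, 320)   # stacked frames [B*T, 3, H, W]
right = torch.rand(2, 3, 256, 320)
fmap_left, fmap_right = encoder([left, right])
print(fmap_left.shape)              # torch.Size([2, 256, 64, 80]), i.e. 1/4 resolution
```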
models/core/model_zoo.py ADDED
@@ -0,0 +1,48 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+
4
+ # This source code is licensed under the license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ import copy
8
+ from dynamic_stereo.models.dynamic_stereo_model import DynamicStereoModel
9
+
10
+ from pytorch3d.implicitron.tools.config import get_default_args
11
+
12
+ try:
13
+ from dynamic_stereo.models.raft_stereo_model import RAFTStereoModel
14
+
15
+ MODELS = [RAFTStereoModel, DynamicStereoModel]
16
+ except:
17
+ MODELS = [DynamicStereoModel]
18
+
19
+ _MODEL_NAME_TO_MODEL = {model_cls.__name__: model_cls for model_cls in MODELS}
20
+ _MODEL_CONFIG_NAME_TO_DEFAULT_CONFIG = {}
21
+ for model_cls in MODELS:
22
+ _MODEL_CONFIG_NAME_TO_DEFAULT_CONFIG[
23
+ model_cls.MODEL_CONFIG_NAME
24
+ ] = get_default_args(model_cls)
25
+ MODEL_NAME_NONE = "NONE"
26
+
27
+
28
+ def model_zoo(model_name: str, **kwargs):
29
+ if model_name.upper() == MODEL_NAME_NONE:
30
+ return None
31
+
32
+ model_cls = _MODEL_NAME_TO_MODEL.get(model_name)
33
+
34
+ if model_cls is None:
35
+ raise ValueError(f"No such model name: {model_name}")
36
+
37
+ model_cls_params = {}
38
+ if "model_zoo" in getattr(model_cls, "__dataclass_fields__", []):
39
+ model_cls_params["model_zoo"] = model_zoo
40
+ print(
41
+ f"{model_cls.MODEL_CONFIG_NAME} model configs:",
42
+ kwargs.get(model_cls.MODEL_CONFIG_NAME),
43
+ )
44
+ return model_cls(**model_cls_params, **kwargs.get(model_cls.MODEL_CONFIG_NAME, {}))
45
+
46
+
47
+ def get_all_model_default_configs():
48
+ return copy.deepcopy(_MODEL_CONFIG_NAME_TO_DEFAULT_CONFIG)
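`model_zoo()` resolves a model class by name and then pulls that class's own config block out of `kwargs`, keyed by its `MODEL_CONFIG_NAME`. A self-contained sketch of the same dispatch pattern, with made-up toy classes standing in for the real models:

```
# Toy illustration of the dispatch in model_zoo(); ToyModelA / ToyModelB are invented here.
class ToyModelA:
    MODEL_CONFIG_NAME = "ToyModelA"
    def __init__(self, iters=8):
        self.iters = iters

class ToyModelB:
    MODEL_CONFIG_NAME = "ToyModelB"
    def __init__(self, hidden_dim=128):
        self.hidden_dim = hidden_dim

_NAME_TO_MODEL = {cls.__name__: cls for cls in (ToyModelA, ToyModelB)}

def toy_model_zoo(model_name, **kwargs):
    if model_name.upper() == "NONE":
        return None
    cls = _NAME_TO_MODEL[model_name]
    # Each model only receives the config block registered under its own name.
    return cls(**kwargs.get(cls.MODEL_CONFIG_NAME, {}))

model = toy_model_zoo("ToyModelA", ToyModelA={"iters": 16})
assert model.iters == 16
```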
models/core/sci_codec.py ADDED
@@ -0,0 +1,180 @@
1
+ import torch
2
+ import torch.nn as nn
3
+ import torch.nn.functional as F
4
+ import numpy as np
5
+
6
+ from models.core.extractor import ResidualBlock
7
+
8
+ autocast = torch.cuda.amp.autocast
9
+
10
+ class ste_fn(torch.autograd.Function):
11
+ @staticmethod
12
+ def forward(ctx, x):
13
+ return (x > 0).float()
14
+ @staticmethod
15
+ def backward(ctx, grad):
16
+ return F.hardtanh(grad)
17
+
18
+ class STE(nn.Module):
19
+ def __init__(self):
20
+ super(STE, self).__init__()
21
+ def forward(self, x):
22
+ return ste_fn.apply(x)
23
+
24
+ class sci_encoder(nn.Module):
25
+ def __init__(
26
+ self,
27
+ sigma_range=[0, 1e-9],
28
+ n_frame=8,
29
+ in_channels=1,
30
+ n_taps=2,
31
+ resolution=[480, 640]):
32
+
33
+ super(sci_encoder, self).__init__()
34
+
35
+ assert n_taps in [1, 2], "[ERROR] n_taps should be either 1 or 2."
36
+
37
+ self.sigma_range = sigma_range
38
+ self.n_frame = n_frame
39
+ self.in_channels = in_channels
40
+ self.n_taps = n_taps
41
+ self.resolution = resolution
42
+
43
+ # -- Shutter code; Learnable parameters
44
+ self.ce_weight = nn.Parameter(torch.Tensor(n_frame, in_channels, *resolution))
45
+
46
+ # -- initialize
47
+ nn.init.uniform_(self.ce_weight, a=-1, b=1)
48
+
49
+ self.ste = STE()
50
+
51
+ def forward(self, frames):
52
+
53
+ # -- print ("[INFO] self.ce_weight.device: ", self.ce_weight.device)
54
+ ce_code = self.ste(self.ce_weight)
55
+ # -- print ("[INFO] ce_code.device: ", ce_code.device)
56
+
57
+ frames = frames[..., :self.resolution[0], :self.resolution[1]]
58
+ frames = frames.contiguous()
59
+ frames = torch.unsqueeze(frames, 2)
60
+
61
+ # -- print ("[INFO] ce_code.shape: ", ce_code.shape)
62
+ # -- print ("[INFO] frames.shape: ", frames.shape)
63
+
64
+ # -- repeat by the batch size
65
+ ce_code = ce_code.repeat(frames.shape[0], 1, 1, 1, 1)
66
+ # -- print ("[INFO] ce_code.shape: ", ce_code.shape)
67
+ # -- print ("[INFO] ce_code.squeeze(2).shape: ", ce_code.squeeze(2).shape)
68
+
69
+ ce_blur_img = torch.zeros(frames.shape[0], self.in_channels * self.n_taps, *self.resolution).to(frames.device) # -- (b, c, h, w)
70
+
71
+ # -- print ("[INFO] ce_blur_img.shape: ", ce_blur_img.shape)
72
+ ce_blur_img[:, 0, ...] = torch.sum( ce_code * frames, axis=1) / self.n_frame
73
+ ce_blur_img[:, 1, ...] = torch.sum((1. - ce_code) * frames, axis=1) / self.n_frame
74
+
75
+ # -- add noise
76
+ noise_level = np.random.uniform(*self.sigma_range)
77
+ ce_blur_img_noisy = ce_blur_img + torch.tensor(noise_level).to(frames.device) * torch.randn(ce_blur_img.shape).to(frames.device)
78
+
79
+ # -- concat snapshots and mask patterns
80
+ out = torch.zeros(frames.shape[0], self.n_taps + self.n_frame, *self.resolution).to(frames.device)
81
+
82
+ # -- print ("[INFO] out.shape: ", out.shape)
83
+ out[:, :self.n_taps, :, :] = ce_blur_img_noisy
84
+ out[:, self.n_taps:, :, :] = ce_code.squeeze(2)
85
+
86
+ return out
87
+
88
+ class sci_decoder(nn.Module):
89
+ def __init__(self,
90
+ n_frame=8,
91
+ n_taps=2,
92
+ output_dim=128,
93
+ norm_fn="batch",
94
+ dropout=.0):
95
+
96
+ super(sci_decoder, self).__init__()
97
+
98
+ self.n_frame = n_frame  # kept so forward() can restore the temporal dimension
+ self.norm_fn = norm_fn
99
+ if norm_fn == "group":
100
+ self.norm1 = nn.GroupNorm(num_groups=4, num_channels=4*n_frame)
101
+ elif norm_fn == "batch":
102
+ self.norm1 = nn.BatchNorm2d(4*n_frame)
103
+ elif norm_fn == "instance":
104
+ self.norm1 = nn.InstanceNorm2d(4*n_frame, affine=True)
105
+ elif norm_fn == "none":
106
+ self.norm1 = nn.Sequential()
107
+
108
+ # -- Input Convolution
109
+ # -- Assuming n_frame=8; n_ich=10; n_och=32
110
+ self.conv1 = nn.Conv2d(n_taps+n_frame, 4*n_frame, kernel_size=7, stride=2, padding=3)
111
+ self.relu1 = nn.ReLU(inplace=True)
112
+
113
+ # -- Residual Blocks
114
+ self.layer1 = self._make_layer( 4*n_frame, 4*n_frame, stride=1)
115
+ self.layer2 = self._make_layer( 4*n_frame, 16*n_frame, stride=2)
116
+ self.layer3 = self._make_layer(16*n_frame, 64*n_frame, stride=1)
117
+
118
+ # -- Output Convolution
119
+ self.conv2 = nn.Conv2d(64*n_frame, output_dim*n_frame, kernel_size=1)
120
+
121
+ if dropout > 0.:
122
+ self.dropout = nn.Dropout2d(p=dropout)
123
+ else:
124
+ self.dropout = None
125
+
126
+ # -- self.modules() is a PyTorch utility function that returns all submodules of this nn.Module recursively.
127
+ # -- This means it will loop through every layer: conv1, layer1, layer2, layer3, conv2 and so on.
128
+ for m in self.modules():
129
+ if isinstance(m, nn.Conv2d):
130
+ nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
131
+ elif isinstance(m, (nn.BatchNorm2d, nn.InstanceNorm2d, nn.GroupNorm)):
132
+ if m.weight is not None:
133
+ nn.init.constant_(m.weight, 1)
134
+ if m.bias is not None:
135
+ nn.init.constant_(m.bias, 0)
136
+
137
+ # -- Private function to make residual blocks
138
+ def _make_layer(self, n_ich, n_och, stride=1):
139
+ layer1 = ResidualBlock(n_ich, n_och, self.norm_fn, stride=stride)
140
+ layer2 = ResidualBlock(n_och, n_och, self.norm_fn, stride=1)
141
+ layers = (layer1, layer2)
142
+
143
+ return nn.Sequential(*layers)
144
+
145
+ def forward(self, x):
146
+ # -- x = [L, R]
147
+ # -- L, R ~ (b, c, h, w); c=n_taps+n_frame
148
+
149
+ # -- if input is list, combine batch dimension
150
+ is_list = isinstance(x, tuple) or isinstance(x, list)
151
+ if is_list:
152
+ batch_dim = x[0].shape[0]
153
+ x = torch.cat(x, dim=0)
154
+
155
+ # -- print ("[INFO] x.shape: ", x.shape)
156
+
157
+ x = self.conv1(x)
158
+ x = self.norm1(x)
159
+ x = self.relu1(x)
160
+
161
+ x = self.layer1(x)
162
+ x = self.layer2(x)
163
+ x = self.layer3(x)
164
+
165
+ x = self.conv2(x)
166
+
167
+ # -- expand the temporal dimension
168
+ # -- (b, c, h, w) -> (b*t, c//t, h, w)
169
+ x = x.contiguous()
170
+ x = x.view(x.shape[0]*self.n_frame, x.shape[1]//self.n_frame, x.shape[-2], x.shape[-1])
171
+
172
+ if self.dropout is not None:
173
+ x = self.dropout(x)
174
+
175
+ # -- if input is list, split the first dimension
176
+ if is_list:
177
+ x = torch.split(x, x.shape[0] // 2, dim=0)
178
+
179
+ return x
180
+
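The learnable shutter code in `sci_encoder` is binarized with a straight-through estimator: the forward pass thresholds the weights to {0, 1}, while the backward pass lets (hardtanh-clipped) gradients flow as if the threshold were the identity, so `ce_weight` stays trainable. A minimal sketch of that estimator, re-declared here so it runs on its own:

```
import torch
import torch.nn.functional as F

class _STEFn(torch.autograd.Function):
    # Mirrors ste_fn above: hard 0/1 threshold forward, clipped identity gradient backward.
    @staticmethod
    def forward(ctx, x):
        return (x > 0).float()

    @staticmethod
    def backward(ctx, grad):
        return F.hardtanh(grad)

w = torch.randn(5, requires_grad=True)
code = _STEFn.apply(w)              # binary exposure code, e.g. tensor([1., 0., 1., 0., 1.])
loss = (code - 1.0).pow(2).sum()
loss.backward()
print(code, w.grad)                 # gradients reach w even though the forward is a step function
```

The encoder then forms a two-tap snapshot from this code: one channel accumulates `code * frames` and the other `(1 - code) * frames`, each averaged over the `n_frame` exposures.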
models/core/update.py ADDED
@@ -0,0 +1,370 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+
4
+ # This source code is licensed under the license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ from einops import rearrange
8
+ import torch
9
+ import torch.nn as nn
10
+ import torch.nn.functional as F
11
+
12
+ from models.core.attention import LoFTREncoderLayer
13
+
14
+ # -- Added by Chu King on 16th November 2025 for debugging purposes.
15
+ import os, signal
16
+ import logging
17
+ import torch.distributed as dist
18
+
19
+ # Ref: https://github.com/princeton-vl/RAFT/blob/master/core/update.py
20
+ class FlowHead(nn.Module):
21
+ def __init__(self, input_dim=128, hidden_dim=256):
22
+ super(FlowHead, self).__init__()
23
+ self.conv1 = nn.Conv2d(input_dim, hidden_dim, 3, padding=1)
24
+ self.conv2 = nn.Conv2d(hidden_dim, 2, 3, padding=1)
25
+ self.relu = nn.ReLU(inplace=True)
26
+
27
+ def forward(self, x):
28
+ return self.conv2(self.relu(self.conv1(x)))
29
+
30
+
31
+ class SepConvGRU(nn.Module):
32
+ def __init__(self, hidden_dim=128, input_dim=192 + 128):
33
+ super(SepConvGRU, self).__init__()
34
+ self.convz1 = nn.Conv2d(
35
+ hidden_dim + input_dim, hidden_dim, (1, 5), padding=(0, 2)
36
+ )
37
+ self.convr1 = nn.Conv2d(
38
+ hidden_dim + input_dim, hidden_dim, (1, 5), padding=(0, 2)
39
+ )
40
+ self.convq1 = nn.Conv2d(
41
+ hidden_dim + input_dim, hidden_dim, (1, 5), padding=(0, 2)
42
+ )
43
+
44
+ self.convz2 = nn.Conv2d(
45
+ hidden_dim + input_dim, hidden_dim, (5, 1), padding=(2, 0)
46
+ )
47
+ self.convr2 = nn.Conv2d(
48
+ hidden_dim + input_dim, hidden_dim, (5, 1), padding=(2, 0)
49
+ )
50
+ self.convq2 = nn.Conv2d(
51
+ hidden_dim + input_dim, hidden_dim, (5, 1), padding=(2, 0)
52
+ )
53
+
54
+ def forward(self, h, x):
55
+ # horizontal
56
+ hx = torch.cat([h, x], dim=1)
57
+ z = torch.sigmoid(self.convz1(hx))
58
+ r = torch.sigmoid(self.convr1(hx))
59
+ q = torch.tanh(self.convq1(torch.cat([r * h, x], dim=1)))
60
+ h = (1 - z) * h + z * q
61
+
62
+ # vertical
63
+ hx = torch.cat([h, x], dim=1)
64
+ z = torch.sigmoid(self.convz2(hx))
65
+ r = torch.sigmoid(self.convr2(hx))
66
+ q = torch.tanh(self.convq2(torch.cat([r * h, x], dim=1)))
67
+ h = (1 - z) * h + z * q
68
+
69
+ return h
70
+
71
+
72
+ class ConvGRU(nn.Module):
73
+ def __init__(self, hidden_dim, input_dim, kernel_size=3):
74
+ super(ConvGRU, self).__init__()
75
+ self.convz = nn.Conv2d(
76
+ hidden_dim + input_dim, hidden_dim, kernel_size, padding=kernel_size // 2
77
+ )
78
+ self.convr = nn.Conv2d(
79
+ hidden_dim + input_dim, hidden_dim, kernel_size, padding=kernel_size // 2
80
+ )
81
+ self.convq = nn.Conv2d(
82
+ hidden_dim + input_dim, hidden_dim, kernel_size, padding=kernel_size // 2
83
+ )
84
+
85
+ def forward(self, h, x):
86
+ hx = torch.cat([h, x], dim=1)
87
+
88
+ z = torch.sigmoid(self.convz(hx))
89
+ r = torch.sigmoid(self.convr(hx))
90
+ q = torch.tanh(self.convq(torch.cat([r * h, x], dim=1)))
91
+
92
+ h = (1 - z) * h + z * q
93
+ return h
94
+
95
+
96
+ class SepConvGRU3D(nn.Module):
97
+ def __init__(self, hidden_dim=128, input_dim=192 + 128):
98
+ super(SepConvGRU3D, self).__init__()
99
+ self.convz1 = nn.Conv3d(
100
+ hidden_dim + input_dim, hidden_dim, (1, 1, 5), padding=(0, 0, 2)
101
+ )
102
+ self.convr1 = nn.Conv3d(
103
+ hidden_dim + input_dim, hidden_dim, (1, 1, 5), padding=(0, 0, 2)
104
+ )
105
+ self.convq1 = nn.Conv3d(
106
+ hidden_dim + input_dim, hidden_dim, (1, 1, 5), padding=(0, 0, 2)
107
+ )
108
+
109
+ self.convz2 = nn.Conv3d(
110
+ hidden_dim + input_dim, hidden_dim, (1, 5, 1), padding=(0, 2, 0)
111
+ )
112
+ self.convr2 = nn.Conv3d(
113
+ hidden_dim + input_dim, hidden_dim, (1, 5, 1), padding=(0, 2, 0)
114
+ )
115
+ self.convq2 = nn.Conv3d(
116
+ hidden_dim + input_dim, hidden_dim, (1, 5, 1), padding=(0, 2, 0)
117
+ )
118
+
119
+ self.convz3 = nn.Conv3d(
120
+ hidden_dim + input_dim, hidden_dim, (5, 1, 1), padding=(2, 0, 0)
121
+ )
122
+ self.convr3 = nn.Conv3d(
123
+ hidden_dim + input_dim, hidden_dim, (5, 1, 1), padding=(2, 0, 0)
124
+ )
125
+ self.convq3 = nn.Conv3d(
126
+ hidden_dim + input_dim, hidden_dim, (5, 1, 1), padding=(2, 0, 0)
127
+ )
128
+
129
+ def forward(self, h, x):
130
+ hx = torch.cat([h, x], dim=1)
131
+ z = torch.sigmoid(self.convz1(hx))
132
+ r = torch.sigmoid(self.convr1(hx))
133
+ q = torch.tanh(self.convq1(torch.cat([r * h, x], dim=1)))
134
+ h = (1 - z) * h + z * q
135
+
136
+ # vertical
137
+ hx = torch.cat([h, x], dim=1)
138
+ z = torch.sigmoid(self.convz2(hx))
139
+ r = torch.sigmoid(self.convr2(hx))
140
+ q = torch.tanh(self.convq2(torch.cat([r * h, x], dim=1)))
141
+ h = (1 - z) * h + z * q
142
+
143
+ # time
144
+ hx = torch.cat([h, x], dim=1)
145
+ z = torch.sigmoid(self.convz3(hx))
146
+ r = torch.sigmoid(self.convr3(hx))
147
+ q = torch.tanh(self.convq3(torch.cat([r * h, x], dim=1)))
148
+ h = (1 - z) * h + z * q
149
+
150
+ return h
151
+
152
+
153
+ class BasicMotionEncoder(nn.Module):
154
+ def __init__(self, cor_planes):
155
+ super(BasicMotionEncoder, self).__init__()
156
+
157
+ self.convc1 = nn.Conv2d(cor_planes, 256, 1, padding=0)
158
+ self.convc2 = nn.Conv2d(256, 192, 3, padding=1)
159
+ self.convf1 = nn.Conv2d(2, 128, 7, padding=3)
160
+ self.convf2 = nn.Conv2d(128, 64, 3, padding=1)
161
+ self.conv = nn.Conv2d(64 + 192, 128 - 2, 3, padding=1)
162
+
163
+ def forward(self, flow, corr):
164
+ cor = F.relu(self.convc1(corr))
165
+ cor = F.relu(self.convc2(cor))
166
+ flo = F.relu(self.convf1(flow))
167
+ flo = F.relu(self.convf2(flo))
168
+
169
+ cor_flo = torch.cat([cor, flo], dim=1)
170
+ out = F.relu(self.conv(cor_flo))
171
+ return torch.cat([out, flow], dim=1)
172
+
173
+
174
+ class Attention(nn.Module):
175
+ def __init__(self, dim, num_heads=8, qkv_bias=False, qk_scale=None):
176
+ super().__init__()
177
+ self.num_heads = num_heads
178
+ head_dim = dim // num_heads
179
+ self.scale = qk_scale or head_dim ** -0.5
180
+ self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
181
+ self.proj = nn.Linear(dim, dim)
182
+
183
+ def forward(self, x):
184
+ B, N, C = x.shape
185
+ # -- Bug fixed by Chu King on 22nd November 2025
186
+ qkv = self.qkv(x)
187
+ # -- qkv = x.reshape(B, N, self.num_heads, C // self.num_heads).permute(0, 2, 1, 3)
188
+ qkv = qkv.view(B, N, 3, self.num_heads, C // self.num_heads)
189
+ qkv = qkv.permute(0, 3, 1, 2, 4) # -- (B, H, N, 3, -1)
190
+ # -- q, k, v = qkv, qkv, qkv
191
+ q, k, v = qkv.unbind(dim=3)
192
+
193
+ attn = (q @ k.transpose(-2, -1)) * self.scale
194
+
195
+ attn = attn.softmax(dim=-1)
196
+
197
+ x = (attn @ v).transpose(1, 2).reshape(B, N, C).contiguous()
198
+ x = self.proj(x)
199
+ return x
200
+
201
+
202
+ class Mlp(nn.Module):
203
+ def __init__(
204
+ self,
205
+ in_features,
206
+ hidden_features=None,
207
+ out_features=None,
208
+ act_layer=nn.GELU,
209
+ drop=0.0,
210
+ ):
211
+ super().__init__()
212
+ out_features = out_features or in_features
213
+ hidden_features = hidden_features or in_features
214
+ self.fc1 = nn.Linear(in_features, hidden_features)
215
+ self.act = act_layer()
216
+ self.fc2 = nn.Linear(hidden_features, out_features)
217
+ self.drop = nn.Dropout(drop)
218
+
219
+ def forward(self, x):
220
+ x = self.fc1(x)
221
+ x = self.act(x)
222
+ x = self.drop(x)
223
+ x = self.fc2(x)
224
+ x = self.drop(x)
225
+ return x
226
+
227
+
228
+ class TimeAttnBlock(nn.Module):
229
+ def __init__(self, dim=256, num_heads=8):
230
+ super(TimeAttnBlock, self).__init__()
231
+ self.temporal_attn = Attention(dim, num_heads=8, qkv_bias=False, qk_scale=None)
232
+ self.temporal_fc = nn.Linear(dim, dim)
233
+ self.temporal_norm1 = nn.LayerNorm(dim)
234
+
235
+ nn.init.constant_(self.temporal_fc.weight, 0)
236
+ nn.init.constant_(self.temporal_fc.bias, 0)
237
+
238
+ def forward(self, x, T=1):
239
+ _, _, h, w = x.shape
240
+
241
+ x = rearrange(x, "(b t) m h w -> (b h w) t m", h=h, w=w, t=T)
242
+ res_temporal1 = self.temporal_attn(self.temporal_norm1(x))
243
+ res_temporal1 = rearrange(
244
+ res_temporal1, "(b h w) t m -> b (h w t) m", h=h, w=w, t=T
245
+ )
246
+ res_temporal1 = self.temporal_fc(res_temporal1)
247
+ res_temporal1 = rearrange(
248
+ res_temporal1, " b (h w t) m -> b t m h w", h=h, w=w, t=T
249
+ )
250
+ x = rearrange(x, "(b h w) t m -> b t m h w", h=h, w=w, t=T)
251
+ x = x + res_temporal1
252
+ x = rearrange(x, "b t m h w -> (b t) m h w", h=h, w=w, t=T)
253
+ return x
254
+
255
+
256
+ class SpaceAttnBlock(nn.Module):
257
+ def __init__(self, dim=256, num_heads=8):
258
+ super(SpaceAttnBlock, self).__init__()
259
+ self.encoder_layer = LoFTREncoderLayer(dim, nhead=num_heads, attention="linear")
260
+
261
+ def forward(self, x, T=1):
262
+ _, _, h, w = x.shape
263
+ x = rearrange(x, "(b t) m h w -> (b t) (h w) m", h=h, w=w, t=T)
264
+ x = self.encoder_layer(x, x)
265
+ x = rearrange(x, "(b t) (h w) m -> (b t) m h w", h=h, w=w, t=T)
266
+ return x
267
+
268
+
269
+ class BasicUpdateBlock(nn.Module):
270
+ def __init__(self, hidden_dim, cor_planes, mask_size=8, attention_type=None):
271
+ super(BasicUpdateBlock, self).__init__()
272
+ self.attention_type = attention_type
273
+ if attention_type is not None:
274
+ if "update_time" in attention_type:
275
+ self.time_attn = TimeAttnBlock(dim=256, num_heads=8)
276
+
277
+ if "update_space" in attention_type:
278
+ self.space_attn = SpaceAttnBlock(dim=256, num_heads=8)
279
+
280
+ self.encoder = BasicMotionEncoder(cor_planes)
281
+ self.gru = SepConvGRU(hidden_dim=hidden_dim, input_dim=128 + hidden_dim)
282
+ self.flow_head = FlowHead(hidden_dim, hidden_dim=256)
283
+
284
+ self.mask = nn.Sequential(
285
+ nn.Conv2d(128, 256, 3, padding=1),
286
+ nn.ReLU(inplace=True),
287
+ nn.Conv2d(256, mask_size ** 2 * 9, 1, padding=0),
288
+ )
289
+
290
+ def forward(self, net, inp, corr, flow, upsample=True, t=1):
291
+ motion_features = self.encoder(flow, corr)
292
+ inp = torch.cat((inp, motion_features), dim=1)
293
+
294
+ if self.attention_type is not None:
295
+ if "update_time" in self.attention_type:
296
+ inp = self.time_attn(inp, T=t)
297
+
298
+ if "update_space" in self.attention_type:
299
+ inp = self.space_attn(inp, T=t)
300
+
301
+ net = self.gru(net, inp)
302
+ delta_flow = self.flow_head(net)
303
+
304
+ # scale mask to balance gradients
305
+ mask = 0.25 * self.mask(net)
306
+ return net, mask, delta_flow
307
+
308
+
309
+ class FlowHead3D(nn.Module):
310
+ def __init__(self, input_dim=128, hidden_dim=256):
311
+ super(FlowHead3D, self).__init__()
312
+ self.conv1 = nn.Conv3d(input_dim, hidden_dim, 3, padding=1)
313
+ self.conv2 = nn.Conv3d(hidden_dim, 2, 3, padding=1)
314
+ self.relu = nn.ReLU(inplace=True)
315
+
316
+ def forward(self, x):
317
+ return self.conv2(self.relu(self.conv1(x)))
318
+
319
+
320
+ class SequenceUpdateBlock3D(nn.Module):
321
+ def __init__(self, hidden_dim, cor_planes, mask_size=8, attention_type=None):
322
+ super(SequenceUpdateBlock3D, self).__init__()
323
+
324
+ # -- Extracts motion-related features from:
325
+ # * current flow estimate
326
+ # * correlation volume
327
+ self.encoder = BasicMotionEncoder(cor_planes)
328
+
329
+ # -- 3D separable convolution GRU enables temporal reasoning with 3D convolutions.
330
+ self.gru = SepConvGRU3D(hidden_dim=hidden_dim, input_dim=128 + hidden_dim)
331
+
332
+ self.flow_head = FlowHead3D(hidden_dim, hidden_dim=256)
333
+ self.mask = nn.Sequential(
334
+ nn.Conv2d(hidden_dim, hidden_dim + 128, 3, padding=1),
335
+ nn.ReLU(inplace=True),
336
+ nn.Conv2d(hidden_dim + 128, (mask_size ** 2) * 9, 1, padding=0),
337
+ )
338
+ self.attention_type = attention_type
339
+ if attention_type is not None:
340
+ if "update_time" in attention_type:
341
+ self.time_attn = TimeAttnBlock(dim=256, num_heads=8)
342
+ if "update_space" in attention_type:
343
+ self.space_attn = SpaceAttnBlock(dim=256, num_heads=8)
344
+
345
+ def forward(self, net, inp, corrs, flows, t, upsample=True):
346
+ inp_tensor = []
347
+
348
+ motion_features = self.encoder(flows, corrs)
349
+ inp_tensor = torch.cat([inp, motion_features], dim=1)
350
+
351
+ if self.attention_type is not None:
352
+ if "update_time" in self.attention_type:
353
+ inp_tensor = self.time_attn(inp_tensor, T=t)
354
+
355
+ if "update_space" in self.attention_type:
356
+ inp_tensor = self.space_attn(inp_tensor, T=t)
357
+
358
+ net = rearrange(net, "(b t) c h w -> b c t h w", t=t)
359
+ inp_tensor = rearrange(inp_tensor, "(b t) c h w -> b c t h w", t=t)
360
+
361
+ net = self.gru(net, inp_tensor)
362
+
363
+ delta_flow = self.flow_head(net)
364
+
365
+ # scale mask to balance gradients
366
+ net = rearrange(net, " b c t h w -> (b t) c h w")
367
+ mask = 0.25 * self.mask(net)
368
+
369
+ delta_flow = rearrange(delta_flow, " b c t h w -> (b t) c h w")
370
+ return net, mask, delta_flow
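All three recurrent units above (ConvGRU, SepConvGRU, SepConvGRU3D) apply the same gated update; they differ only in how the convolutions are factorized over width, height and time. A self-contained sketch of one gate pass with illustrative dimensions:

```
import torch
import torch.nn as nn

hidden_dim, input_dim = 128, 256
convz = nn.Conv2d(hidden_dim + input_dim, hidden_dim, 3, padding=1)
convr = nn.Conv2d(hidden_dim + input_dim, hidden_dim, 3, padding=1)
convq = nn.Conv2d(hidden_dim + input_dim, hidden_dim, 3, padding=1)

h = torch.zeros(2, hidden_dim, 32, 48)   # hidden state
x = torch.randn(2, input_dim, 32, 48)    # context + motion features

hx = torch.cat([h, x], dim=1)
z = torch.sigmoid(convz(hx))                          # update gate
r = torch.sigmoid(convr(hx))                          # reset gate
q = torch.tanh(convq(torch.cat([r * h, x], dim=1)))   # candidate state
h_new = (1 - z) * h + z * q

assert h_new.shape == h.shape
```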
models/core/utils/config.py ADDED
@@ -0,0 +1,961 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+
4
+ # This source code is licensed under the license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ import dataclasses
8
+ import inspect
9
+ import itertools
10
+ import sys
11
+ import warnings
12
+ from collections import Counter, defaultdict
13
+ from enum import Enum
14
+ from typing import Any, Callable, Dict, List, Optional, Tuple, Type, TypeVar, Union
15
+
16
+ from omegaconf import DictConfig, OmegaConf, open_dict
17
+ from pytorch3d.common.datatypes import get_args, get_origin
18
+
19
+
20
+ """
21
+ This functionality allows a configurable system to be determined in a dataclass-type
22
+ way. It is a generalization of omegaconf's "structured", in the dataclass case.
23
+ Core functionality:
24
+
25
+ - Configurable -- A base class used to label a class as being one which uses this
26
+ system. Uses class members and __post_init__ like a dataclass.
27
+
28
+ - expand_args_fields -- Expands a class like `dataclasses.dataclass`. Runs automatically.
29
+
30
+ - get_default_args -- gets an omegaconf.DictConfig for initializing a given class.
31
+
32
+ - run_auto_creation -- Initialises nested members. To be called in __post_init__.
33
+
34
+
35
+ In addition, a Configurable may contain members whose type is decided at runtime.
36
+
37
+ - ReplaceableBase -- As a base instead of Configurable, labels a class to say that
38
+ any child class can be used instead.
39
+
40
+ - registry -- A global store of named child classes of ReplaceableBase classes.
41
+ Used as `@registry.register` decorator on class definition.
42
+
43
+
44
+ Additional utility functions:
45
+
46
+ - remove_unused_components -- used for simplifying a DictConfig instance.
47
+ - get_default_args_field -- default for DictConfig member of another configurable.
48
+ - enable_get_default_args -- Allows get_default_args on a function or plain class.
49
+
50
+
51
+ 1. The simplest usage of this functionality is as follows. First a schema is defined
52
+ in dataclass style.
53
+
54
+ class A(Configurable):
55
+ n: int = 9
56
+
57
+ class B(Configurable):
58
+ a: A
59
+
60
+ def __post_init__(self):
61
+ run_auto_creation(self)
62
+
63
+ Then it can be used like
64
+
65
+ b_args = get_default_args(B)
66
+ b = B(**b_args)
67
+
68
+ In this case, get_default_args(B) returns an omegaconf.DictConfig with the right
69
+ members {"a_args": {"n": 9}}. It also modifies the definitions of the classes to
70
+ something like the following. (The modification itself is done by the function
71
+ `expand_args_fields`, which is called inside `get_default_args`.)
72
+
73
+ @dataclasses.dataclass
74
+ class A:
75
+ n: int = 9
76
+
77
+ @dataclasses.dataclass
78
+ class B:
79
+ a_args: DictConfig = dataclasses.field(default_factory=lambda: DictConfig({"n": 9}))
80
+
81
+ def __post_init__(self):
82
+ self.a = A(**self.a_args)
83
+
84
+ 2. Pluggability. Instead of a dataclass-style member being given a concrete class,
85
+ it can be given a base class and the implementation will be looked up by name in the
86
+ global `registry` in this module. E.g.
87
+
88
+ class A(ReplaceableBase):
89
+ k: int = 1
90
+
91
+ @registry.register
92
+ class A1(A):
93
+ m: int = 3
94
+
95
+ @registry.register
96
+ class A2(A):
97
+ n: str = "2"
98
+
99
+ class B(Configurable):
100
+ a: A
101
+ a_class_type: str = "A2"
102
+ b: Optional[A]
103
+ b_class_type: Optional[str] = "A2"
104
+
105
+ def __post_init__(self):
106
+ run_auto_creation(self)
107
+
108
+ will expand to
109
+
110
+ @dataclasses.dataclass
111
+ class A:
112
+ k: int = 1
113
+
114
+ @dataclasses.dataclass
115
+ class A1(A):
116
+ m: int = 3
117
+
118
+ @dataclasses.dataclass
119
+ class A2(A):
120
+ n: str = "2"
121
+
122
+ @dataclasses.dataclass
123
+ class B:
124
+ a_class_type: str = "A2"
125
+ a_A1_args: DictConfig = dataclasses.field(
126
+ default_factory=lambda: DictConfig({"k": 1, "m": 3})
127
+ )
128
+ a_A2_args: DictConfig = dataclasses.field(
129
+ default_factory=lambda: DictConfig({"k": 1, "n": 2})
130
+ )
131
+ b_class_type: Optional[str] = "A2"
132
+ b_A1_args: DictConfig = dataclasses.field(
133
+ default_factory=lambda: DictConfig({"k": 1, "m": 3})
134
+ )
135
+ b_A2_args: DictConfig = dataclasses.field(
136
+ default_factory=lambda: DictConfig({"k": 1, "n": 2})
137
+ )
138
+
139
+ def __post_init__(self):
140
+ if self.a_class_type == "A1":
141
+ self.a = A1(**self.a_A1_args)
142
+ elif self.a_class_type == "A2":
143
+ self.a = A2(**self.a_A2_args)
144
+ else:
145
+ raise ValueError(...)
146
+
147
+ if self.b_class_type is None:
148
+ self.b = None
149
+ elif self.b_class_type == "A1":
150
+ self.b = A1(**self.b_A1_args)
151
+ elif self.b_class_type == "A2":
152
+ self.b = A2(**self.b_A2_args)
153
+ else:
154
+ raise ValueError(...)
155
+
156
+ 3. Aside from these classes, the members of these classes should be things
157
+ which DictConfig is happy with: e.g. (bool, int, str, None, float) and what
158
+ can be built from them with `DictConfig`s and lists of them.
159
+
160
+ In addition, you can call `get_default_args` on a function or class to get
161
+ the `DictConfig` of its defaulted arguments, assuming those are all things
162
+ which `DictConfig` is happy with, so long as you add a call to
163
+ `enable_get_default_args` after its definition. If you want to use such a
164
+ thing as the default for a member of another configured class,
165
+ `get_default_args_field` is a helper.
166
+ """
167
+
168
+
169
+ _unprocessed_warning: str = (
170
+ " must be processed before it can be used."
171
+ + " This is done by calling expand_args_fields "
172
+ + "or get_default_args on it."
173
+ )
174
+
175
+ TYPE_SUFFIX: str = "_class_type"
176
+ ARGS_SUFFIX: str = "_args"
177
+ ENABLED_SUFFIX: str = "_enabled"
178
+
179
+
180
+ class ReplaceableBase:
181
+ """
182
+ Base class for dataclass-style classes which
183
+ can be stored in the registry.
184
+ """
185
+
186
+ def __new__(cls, *args, **kwargs):
187
+ """
188
+ This function only exists to raise a
189
+ warning if class construction is attempted
190
+ without processing.
191
+ """
192
+ obj = super().__new__(cls)
193
+ if cls is not ReplaceableBase and not _is_actually_dataclass(cls):
194
+ warnings.warn(cls.__name__ + _unprocessed_warning)
195
+ return obj
196
+
197
+
198
+ class Configurable:
199
+ """
200
+ This indicates a class which is not ReplaceableBase
201
+ but still needs to be
202
+ expanded into a dataclass with expand_args_fields.
203
+ This expansion is delayed.
204
+ """
205
+
206
+ def __new__(cls, *args, **kwargs):
207
+ """
208
+ This function only exists to raise a
209
+ warning if class construction is attempted
210
+ without processing.
211
+ """
212
+ obj = super().__new__(cls)
213
+ if cls is not Configurable and not _is_actually_dataclass(cls):
214
+ warnings.warn(cls.__name__ + _unprocessed_warning)
215
+ return obj
216
+
217
+
218
+ _X = TypeVar("X", bound=ReplaceableBase)
219
+
220
+
221
+ class _Registry:
222
+ """
223
+ Register from names to classes. In particular, we say that direct subclasses of
224
+ ReplaceableBase are "base classes" and we register subclasses of each base class
225
+ in a separate namespace.
226
+ """
227
+
228
+ def __init__(self) -> None:
229
+ self._mapping: Dict[
230
+ Type[ReplaceableBase], Dict[str, Type[ReplaceableBase]]
231
+ ] = defaultdict(dict)
232
+
233
+ def register(self, some_class: Type[_X]) -> Type[_X]:
234
+ """
235
+ A class decorator, to register a class in self.
236
+ """
237
+ name = some_class.__name__
238
+ self._register(some_class, name=name)
239
+ return some_class
240
+
241
+ def _register(
242
+ self,
243
+ some_class: Type[ReplaceableBase],
244
+ *,
245
+ base_class: Optional[Type[ReplaceableBase]] = None,
246
+ name: str,
247
+ ) -> None:
248
+ """
249
+ Register a new member.
250
+
251
+ Args:
252
+ cls: the new member
253
+ base_class: (optional) what the new member is a type for
254
+ name: name for the new member
255
+ """
256
+ if base_class is None:
257
+ base_class = self._base_class_from_class(some_class)
258
+ if base_class is None:
259
+ raise ValueError(
260
+ f"Cannot register {some_class}. Cannot tell what it is."
261
+ )
262
+ if some_class is base_class:
263
+ raise ValueError(f"Attempted to register the base class {some_class}")
264
+ self._mapping[base_class][name] = some_class
265
+
266
+ def get(
267
+ self, base_class_wanted: Type[ReplaceableBase], name: str
268
+ ) -> Type[ReplaceableBase]:
269
+ """
270
+ Retrieve a class from the registry by name
271
+
272
+ Args:
273
+ base_class_wanted: parent type of type we are looking for.
274
+ It determines the namespace.
275
+ This will typically be a direct subclass of ReplaceableBase.
276
+ name: what to look for
277
+
278
+ Returns:
279
+ class type
280
+ """
281
+ if self._is_base_class(base_class_wanted):
282
+ base_class = base_class_wanted
283
+ else:
284
+ base_class = self._base_class_from_class(base_class_wanted)
285
+ if base_class is None:
286
+ raise ValueError(
287
+ f"Cannot look up {base_class_wanted}. Cannot tell what it is."
288
+ )
289
+ result = self._mapping[base_class].get(name)
290
+ if result is None:
291
+ raise ValueError(f"{name} has not been registered.")
292
+ if not issubclass(result, base_class_wanted):
293
+ raise ValueError(
294
+ f"{name} resolves to {result} which does not subclass {base_class_wanted}"
295
+ )
296
+ return result
297
+
298
+ def get_all(
299
+ self, base_class_wanted: Type[ReplaceableBase]
300
+ ) -> List[Type[ReplaceableBase]]:
301
+ """
302
+ Retrieve all registered implementations from the registry
303
+
304
+ Args:
305
+ base_class_wanted: parent type of type we are looking for.
306
+ It determines the namespace.
307
+ This will typically be a direct subclass of ReplaceableBase.
308
+ Returns:
309
+ list of class types
310
+ """
311
+ if self._is_base_class(base_class_wanted):
312
+ return list(self._mapping[base_class_wanted].values())
313
+
314
+ base_class = self._base_class_from_class(base_class_wanted)
315
+ if base_class is None:
316
+ raise ValueError(
317
+ f"Cannot look up {base_class_wanted}. Cannot tell what it is."
318
+ )
319
+ return [
320
+ class_
321
+ for class_ in self._mapping[base_class].values()
322
+ if issubclass(class_, base_class_wanted) and class_ is not base_class_wanted
323
+ ]
324
+
325
+ @staticmethod
326
+ def _is_base_class(some_class: Type[ReplaceableBase]) -> bool:
327
+ """
328
+ Return whether the given type is a direct subclass of ReplaceableBase
329
+ and so gets used as a namespace.
330
+ """
331
+ return ReplaceableBase in some_class.__bases__
332
+
333
+ @staticmethod
334
+ def _base_class_from_class(
335
+ some_class: Type[ReplaceableBase],
336
+ ) -> Optional[Type[ReplaceableBase]]:
337
+ """
338
+ Find the parent class of some_class which inherits ReplaceableBase, or None
339
+ """
340
+ for base in some_class.mro()[-3::-1]:
341
+ if base is not ReplaceableBase and issubclass(base, ReplaceableBase):
342
+ return base
343
+ return None
344
+
345
+
346
+ # Global instance of the registry
347
+ registry = _Registry()
348
+
349
+
350
+ class _ProcessType(Enum):
351
+ """
352
+ Type of member which gets rewritten by expand_args_fields.
353
+ """
354
+
355
+ CONFIGURABLE = 1
356
+ REPLACEABLE = 2
357
+ OPTIONAL_CONFIGURABLE = 3
358
+ OPTIONAL_REPLACEABLE = 4
359
+
360
+
361
+ def _default_create(
362
+ name: str, type_: Type, process_type: _ProcessType
363
+ ) -> Callable[[Any], None]:
364
+ """
365
+ Return the default creation function for a member. This is a function which
366
+ could be called in __post_init__ to initialise the member, and will be called
367
+ from run_auto_creation.
368
+
369
+ Args:
370
+ name: name of the member
371
+ type_: type of the member (with any Optional removed)
372
+ process_type: Shows whether member's declared type inherits ReplaceableBase,
373
+ in which case the actual type to be created is decided at
374
+ runtime.
375
+
376
+ Returns:
377
+ Function taking one argument, the object whose member should be
378
+ initialized.
379
+ """
380
+
381
+ def inner(self):
382
+ expand_args_fields(type_)
383
+ args = getattr(self, name + ARGS_SUFFIX)
384
+ setattr(self, name, type_(**args))
385
+
386
+ def inner_optional(self):
387
+ expand_args_fields(type_)
388
+ enabled = getattr(self, name + ENABLED_SUFFIX)
389
+ if enabled:
390
+ args = getattr(self, name + ARGS_SUFFIX)
391
+ setattr(self, name, type_(**args))
392
+ else:
393
+ setattr(self, name, None)
394
+
395
+ def inner_pluggable(self):
396
+ type_name = getattr(self, name + TYPE_SUFFIX)
397
+ if type_name is None:
398
+ setattr(self, name, None)
399
+ return
400
+
401
+ chosen_class = registry.get(type_, type_name)
402
+ if self._known_implementations.get(type_name, chosen_class) is not chosen_class:
403
+ # If this warning is raised, it means that a new definition of
404
+ # the chosen class has been registered since our class was processed
405
+ # (i.e. expanded). A DictConfig which comes from our get_default_args
406
+ # (which might have triggered the processing) will contain the old default
407
+ # values for the members of the chosen class. Changes to those defaults which
408
+ # were made in the redefinition will not be reflected here.
409
+ warnings.warn(f"New implementation of {type_name} is being chosen.")
410
+ expand_args_fields(chosen_class)
411
+ args = getattr(self, f"{name}_{type_name}{ARGS_SUFFIX}")
412
+ setattr(self, name, chosen_class(**args))
413
+
414
+ if process_type == _ProcessType.OPTIONAL_CONFIGURABLE:
415
+ return inner_optional
416
+ return inner if process_type == _ProcessType.CONFIGURABLE else inner_pluggable
417
+
418
+
419
+ def run_auto_creation(self: Any) -> None:
420
+ """
421
+ Run all the functions named in self._creation_functions.
422
+ """
423
+ for create_function in self._creation_functions:
424
+ getattr(self, create_function)()
425
+
426
+
427
+ def _is_configurable_class(C) -> bool:
428
+ return isinstance(C, type) and issubclass(C, (Configurable, ReplaceableBase))
429
+
430
+
431
+ def get_default_args(C, *, _do_not_process: Tuple[type, ...] = ()) -> DictConfig:
432
+ """
433
+ Get the DictConfig corresponding to the defaults in a dataclass or
434
+ configurable. Normal use is to provide a dataclass as C.
435
+ If enable_get_default_args has been called on a function or plain class,
436
+ then that function or class can be provided as C.
437
+
438
+ If C is a subclass of Configurable or ReplaceableBase, we make sure
439
+ it has been processed with expand_args_fields.
440
+
441
+ Args:
442
+ C: the class or function to be processed
443
+ _do_not_process: (internal use) When this function is called from
444
+ expand_args_fields, we specify any class currently being
445
+ processed, to make sure we don't try to process a class
446
+ while it is already being processed.
447
+
448
+ Returns:
449
+ new DictConfig object, which is typed.
450
+ """
451
+ if C is None:
452
+ return DictConfig({})
453
+
454
+ if _is_configurable_class(C):
455
+ if C in _do_not_process:
456
+ raise ValueError(
457
+ f"Internal recursion error. Need processed {C},"
458
+ f" but cannot get it. _do_not_process={_do_not_process}"
459
+ )
460
+ # This is safe to run multiple times. It will return
461
+ # straight away if C has already been processed.
462
+ expand_args_fields(C, _do_not_process=_do_not_process)
463
+
464
+ if dataclasses.is_dataclass(C):
465
+ # Note that if get_default_args_field is used somewhere in C,
466
+ # this call is recursive. No special care is needed,
467
+ # because in practice get_default_args_field is used for
468
+ # separate types than the outer type.
469
+
470
+ out: DictConfig = OmegaConf.structured(C)
471
+ exclude = getattr(C, "_processed_members", ())
472
+ with open_dict(out):
473
+ for field in exclude:
474
+ out.pop(field, None)
475
+ return out
476
+
477
+ if _is_configurable_class(C):
478
+ raise ValueError(f"Failed to process {C}")
479
+
480
+ if not inspect.isfunction(C) and not inspect.isclass(C):
481
+ raise ValueError(f"Unexpected {C}")
482
+
483
+ dataclass_name = _dataclass_name_for_function(C)
484
+ dataclass = getattr(sys.modules[C.__module__], dataclass_name, None)
485
+ if dataclass is None:
486
+ raise ValueError(
487
+ f"Cannot get args for {C}. Was enable_get_default_args forgotten?"
488
+ )
489
+
490
+ return OmegaConf.structured(dataclass)
491
+
492
+
493
+ def _dataclass_name_for_function(C: Any) -> str:
494
+ """
495
+ Returns the name of the dataclass which enable_get_default_args(C)
496
+ creates.
497
+ """
498
+ name = f"_{C.__name__}_default_args_"
499
+ return name
500
+
501
+
502
+ def enable_get_default_args(C: Any, *, overwrite: bool = True) -> None:
503
+ """
504
+ If C is a function or a plain class with an __init__ function,
505
+ and you want get_default_args(C) to work, then add
506
+ `enable_get_default_args(C)` straight after the definition of C.
507
+ This makes a dataclass corresponding to the default arguments of C
508
+ and stores it in the same module as C.
509
+
510
+ Args:
511
+ C: a function, or a class with an __init__ function. Must
512
+ have types for all its defaulted args.
513
+ overwrite: whether to allow calling this a second time on
514
+ the same function.
515
+ """
516
+ if not inspect.isfunction(C) and not inspect.isclass(C):
517
+ raise ValueError(f"Unexpected {C}")
518
+
519
+ field_annotations = []
520
+ for pname, defval in _params_iter(C):
521
+ default = defval.default
522
+ if default == inspect.Parameter.empty:
523
+ # we do not have a default value for the parameter
524
+ continue
525
+
526
+ if defval.annotation == inspect._empty:
527
+ raise ValueError(
528
+ "All arguments of the input callable have to be typed."
529
+ + f" Argument '{pname}' does not have a type annotation."
530
+ )
531
+
532
+ _, annotation = _resolve_optional(defval.annotation)
533
+
534
+ if isinstance(default, set): # force OmegaConf to convert it to ListConfig
535
+ default = tuple(default)
536
+
537
+ if isinstance(default, (list, dict)):
538
+ # OmegaConf will convert to [Dict|List]Config, so it is safe to reuse the value
539
+ field_ = dataclasses.field(default_factory=lambda default=default: default)
540
+ elif not _is_immutable_type(annotation, default):
541
+ continue
542
+ else:
543
+ # we can use a simple default argument for dataclass.field
544
+ field_ = dataclasses.field(default=default)
545
+ field_annotations.append((pname, defval.annotation, field_))
546
+
547
+ name = _dataclass_name_for_function(C)
548
+ module = sys.modules[C.__module__]
549
+ if hasattr(module, name):
550
+ if overwrite:
551
+ warnings.warn(f"Overwriting {name} in {C.__module__}.")
552
+ else:
553
+ raise ValueError(f"Cannot overwrite {name} in {C.__module__}.")
554
+ dc = dataclasses.make_dataclass(name, field_annotations)
555
+ dc.__module__ = C.__module__
556
+ setattr(module, name, dc)
557
+
558
+
559
+ def _params_iter(C):
560
+ """Returns dict of keyword args of a class or function C."""
561
+ if inspect.isclass(C):
562
+ return itertools.islice( # exclude `self`
563
+ inspect.signature(C.__init__).parameters.items(), 1, None
564
+ )
565
+
566
+ return inspect.signature(C).parameters.items()
567
+
568
+
569
+ def _is_immutable_type(type_: Type, val: Any) -> bool:
570
+ PRIMITIVE_TYPES = (int, float, bool, str, bytes, tuple)
571
+ # sometimes type can be too relaxed (e.g. Any), so we also check values
572
+ if isinstance(val, PRIMITIVE_TYPES):
573
+ return True
574
+
575
+ return type_ in PRIMITIVE_TYPES or (
576
+ inspect.isclass(type_) and issubclass(type_, Enum)
577
+ )
578
+
579
+
580
+ # copied from OmegaConf
581
+ def _resolve_optional(type_: Any) -> Tuple[bool, Any]:
582
+ """Check whether `type_` is equivalent to `typing.Optional[T]` for some T."""
583
+ if get_origin(type_) is Union:
584
+ args = get_args(type_)
585
+ if len(args) == 2 and args[1] == type(None): # noqa E721
586
+ return True, args[0]
587
+ if type_ is Any:
588
+ return True, Any
589
+
590
+ return False, type_
591
+
592
+
593
+ def _is_actually_dataclass(some_class) -> bool:
594
+ # Return whether the class some_class has been processed with
595
+ # the dataclass annotation. This is more specific than
596
+ # dataclasses.is_dataclass which returns True on anything
597
+ # deriving from a dataclass.
598
+
599
+ # Checking for __init__ would also work for our purpose.
600
+ return "__dataclass_fields__" in some_class.__dict__
601
+
602
+
603
+ def expand_args_fields(
604
+ some_class: Type[_X], *, _do_not_process: Tuple[type, ...] = ()
605
+ ) -> Type[_X]:
606
+ """
607
+ This expands a class which inherits Configurable or ReplaceableBase classes,
608
+ including dataclass processing. some_class is modified in place by this function.
609
+ For classes of type ReplaceableBase, you can add some_class to the registry before
610
+ or after calling this function. But potential inner classes need to be registered
611
+ before this function is run on the outer class.
612
+
613
+ The transformations this function makes, before the concluding
614
+ dataclasses.dataclass, are as follows. if X is a base class with registered
615
+ subclasses Y and Z, replace a class member
616
+
617
+ x: X
618
+
619
+ and optionally
620
+
621
+ x_class_type: str = "Y"
622
+ def create_x(self):...
623
+
624
+ with
625
+
626
+ x_Y_args : DictConfig = dataclasses.field(default_factory=lambda: get_default_args(Y))
627
+ x_Z_args : DictConfig = dataclasses.field(default_factory=lambda: get_default_args(Z))
628
+ def create_x(self):
629
+ self.x = registry.get(X, self.x_class_type)(
630
+ **getattr(self, f"x_{self.x_class_type}_args")
631
+ )
632
+ x_class_type: str = "UNDEFAULTED"
633
+
634
+ without adding the optional attributes if they are already there.
635
+
636
+ Similarly, replace
637
+
638
+ x: Optional[X]
639
+
640
+ and optionally
641
+
642
+ x_class_type: Optional[str] = "Y"
643
+ def create_x(self):...
644
+
645
+ with
646
+
647
+ x_Y_args : DictConfig = dataclasses.field(default_factory=lambda: get_default_args(Y))
648
+ x_Z_args : DictConfig = dataclasses.field(default_factory=lambda: get_default_args(Z))
649
+ def create_x(self):
650
+ if self.x_class_type is None:
651
+ self.x = None
652
+ return
653
+
654
+ self.x = registry.get(X, self.x_class_type)(
655
+ **getattr(self, f"x_{self.x_class_type}_args")
656
+ )
657
+ x_class_type: Optional[str] = "UNDEFAULTED"
658
+
659
+ without adding the optional attributes if they are already there.
660
+
661
+ Similarly, if X is a subclass of Configurable,
662
+
663
+ x: X
664
+
665
+ and optionally
666
+
667
+ def create_x(self):...
668
+
669
+ will be replaced with
670
+
671
+ x_args : DictConfig = dataclasses.field(default_factory=lambda: get_default_args(X))
672
+ def create_x(self):
673
+ self.x = X(**self.x_args)
674
+
675
+ Similarly, replace,
676
+
677
+ x: Optional[X]
678
+
679
+ and optionally
680
+
681
+ def create_x(self):...
682
+ x_enabled: bool = ...
683
+
684
+ with
685
+
686
+ x_args : DictConfig = dataclasses.field(default_factory=lambda: get_default_args(X))
687
+ x_enabled: bool = False
688
+ def create_x(self):
689
+ if self.x_enabled:
690
+ self.x = X(**self.x_args)
691
+ else:
692
+ self.x = None
693
+
694
+
695
+ Also adds the following class members, unannotated so that dataclass
696
+ ignores them.
697
+ - _creation_functions: Tuple[str] of all the create_ functions,
698
+ including those from base classes.
699
+ - _known_implementations: Dict[str, Type] containing the classes which
700
+ have been found from the registry.
701
+ (used only to raise a warning if one has been overwritten)
702
+ - _processed_members: a Dict[str, Any] of all the members which have been
703
+ transformed, with values giving the types they were declared to have.
704
+ (E.g. {"x": X} or {"x": Optional[X]} in the cases above.)
705
+
706
+ Args:
707
+ some_class: the class to be processed
708
+ _do_not_process: Internal use for get_default_args: Because get_default_args calls
709
+ and is called by this function, we let it specify any class currently
710
+ being processed, to make sure we don't try to process a class while
711
+ it is already being processed.
712
+
713
+
714
+ Returns:
715
+ some_class itself, which has been modified in place. This
716
+ allows this function to be used as a class decorator.
717
+ """
718
+ if _is_actually_dataclass(some_class):
719
+ return some_class
720
+
721
+ # The functions this class's run_auto_creation will run.
722
+ creation_functions: List[str] = []
723
+ # The classes which this type knows about from the registry
724
+ # We could use a weakref.WeakValueDictionary here which would mean
725
+ # that we don't warn if the class we should have expected is elsewhere
726
+ # unused.
727
+ known_implementations: Dict[str, Type] = {}
728
+ # Names of members which have been processed.
729
+ processed_members: Dict[str, Any] = {}
730
+
731
+ # For all bases except ReplaceableBase and Configurable and object,
732
+ # we need to process them before our own processing. This is
733
+ # because dataclasses expect to inherit dataclasses and not unprocessed
734
+ # dataclasses.
735
+ for base in some_class.mro()[-3:0:-1]:
736
+ if base is ReplaceableBase:
737
+ continue
738
+ if base is Configurable:
739
+ continue
740
+ if not issubclass(base, (Configurable, ReplaceableBase)):
741
+ continue
742
+ expand_args_fields(base, _do_not_process=_do_not_process)
743
+ if "_creation_functions" in base.__dict__:
744
+ creation_functions.extend(base._creation_functions)
745
+ if "_known_implementations" in base.__dict__:
746
+ known_implementations.update(base._known_implementations)
747
+ if "_processed_members" in base.__dict__:
748
+ processed_members.update(base._processed_members)
749
+
750
+ to_process: List[Tuple[str, Type, _ProcessType]] = []
751
+ if "__annotations__" in some_class.__dict__:
752
+ for name, type_ in some_class.__annotations__.items():
753
+ underlying_and_process_type = _get_type_to_process(type_)
754
+ if underlying_and_process_type is None:
755
+ continue
756
+ underlying_type, process_type = underlying_and_process_type
757
+ to_process.append((name, underlying_type, process_type))
758
+
759
+ for name, underlying_type, process_type in to_process:
760
+ processed_members[name] = some_class.__annotations__[name]
761
+ _process_member(
762
+ name=name,
763
+ type_=underlying_type,
764
+ process_type=process_type,
765
+ some_class=some_class,
766
+ creation_functions=creation_functions,
767
+ _do_not_process=_do_not_process,
768
+ known_implementations=known_implementations,
769
+ )
770
+
771
+ for key, count in Counter(creation_functions).items():
772
+ if count > 1:
773
+ warnings.warn(f"Clash with {key} in a base class.")
774
+ some_class._creation_functions = tuple(creation_functions)
775
+ some_class._processed_members = processed_members
776
+ some_class._known_implementations = known_implementations
777
+
778
+ dataclasses.dataclass(eq=False)(some_class)
779
+ return some_class
780
+
781
+
782
+ def get_default_args_field(C, *, _do_not_process: Tuple[type, ...] = ()):
783
+ """
784
+ Get a dataclass field which defaults to get_default_args(...)
785
+
786
+ Args:
787
+ As for get_default_args.
788
+
789
+ Returns:
790
+ function to return new DictConfig object
791
+ """
792
+
793
+ def create():
794
+ return get_default_args(C, _do_not_process=_do_not_process)
795
+
796
+ return dataclasses.field(default_factory=create)
797
+
798
+
799
+ def _get_type_to_process(type_) -> Optional[Tuple[Type, _ProcessType]]:
800
+ """
801
+ If a member is annotated as `type_`, and that should be expanded in
802
+ expand_args_fields, return how it should be expanded.
803
+ """
804
+ if get_origin(type_) == Union:
805
+ # We look for Optional[X] which is a Union of X with None.
806
+ args = get_args(type_)
807
+ if len(args) != 2 or all(a is not type(None) for a in args): # noqa: E721
808
+ return
809
+ underlying = args[0] if args[1] is type(None) else args[1] # noqa: E721
810
+ if (
811
+ isinstance(underlying, type)
812
+ and issubclass(underlying, ReplaceableBase)
813
+ and ReplaceableBase in underlying.__bases__
814
+ ):
815
+ return underlying, _ProcessType.OPTIONAL_REPLACEABLE
816
+
817
+ if isinstance(underlying, type) and issubclass(underlying, Configurable):
818
+ return underlying, _ProcessType.OPTIONAL_CONFIGURABLE
819
+
820
+ if not isinstance(type_, type):
821
+ # e.g. any other Union or Tuple
822
+ return
823
+
824
+ if issubclass(type_, ReplaceableBase) and ReplaceableBase in type_.__bases__:
825
+ return type_, _ProcessType.REPLACEABLE
826
+
827
+ if issubclass(type_, Configurable):
828
+ return type_, _ProcessType.CONFIGURABLE
829
+
830
+
831
+ def _process_member(
832
+ *,
833
+ name: str,
834
+ type_: Type,
835
+ process_type: _ProcessType,
836
+ some_class: Type,
837
+ creation_functions: List[str],
838
+ _do_not_process: Tuple[type, ...],
839
+ known_implementations: Dict[str, Type],
840
+ ) -> None:
841
+ """
842
+ Make the modification (of expand_args_fields) to some_class for a single member.
843
+
844
+ Args:
845
+ name: member name
846
+ type_: member type (with Optional removed if needed)
847
+ process_type: whether member has dynamic type
848
+ some_class: (MODIFIED IN PLACE) the class being processed
849
+ creation_functions: (MODIFIED IN PLACE) the names of the create functions
850
+ _do_not_process: as for expand_args_fields.
851
+ known_implementations: (MODIFIED IN PLACE) known types from the registry
852
+ """
853
+ # Because we are adding defaultable members, make
854
+ # sure they go at the end of __annotations__ in case
855
+ # there are non-defaulted standard class members.
856
+ del some_class.__annotations__[name]
857
+
858
+ if process_type in (_ProcessType.REPLACEABLE, _ProcessType.OPTIONAL_REPLACEABLE):
859
+ type_name = name + TYPE_SUFFIX
860
+ if type_name not in some_class.__annotations__:
861
+ if process_type == _ProcessType.OPTIONAL_REPLACEABLE:
862
+ some_class.__annotations__[type_name] = Optional[str]
863
+ else:
864
+ some_class.__annotations__[type_name] = str
865
+ setattr(some_class, type_name, "UNDEFAULTED")
866
+
867
+ for derived_type in registry.get_all(type_):
868
+ if derived_type in _do_not_process:
869
+ continue
870
+ if issubclass(derived_type, some_class):
871
+ # When derived_type is some_class we have a simple
872
+ # recursion to avoid. When it's a strict subclass the
873
+ # situation is even worse.
874
+ continue
875
+ known_implementations[derived_type.__name__] = derived_type
876
+ args_name = f"{name}_{derived_type.__name__}{ARGS_SUFFIX}"
877
+ if args_name in some_class.__annotations__:
878
+ raise ValueError(
879
+ f"Cannot generate {args_name} because it is already present."
880
+ )
881
+ some_class.__annotations__[args_name] = DictConfig
882
+ setattr(
883
+ some_class,
884
+ args_name,
885
+ get_default_args_field(
886
+ derived_type, _do_not_process=_do_not_process + (some_class,)
887
+ ),
888
+ )
889
+ else:
890
+ args_name = name + ARGS_SUFFIX
891
+ if args_name in some_class.__annotations__:
892
+ raise ValueError(
893
+ f"Cannot generate {args_name} because it is already present."
894
+ )
895
+ if issubclass(type_, some_class) or type_ in _do_not_process:
896
+ raise ValueError(f"Cannot process {type_} inside {some_class}")
897
+
898
+ some_class.__annotations__[args_name] = DictConfig
899
+ setattr(
900
+ some_class,
901
+ args_name,
902
+ get_default_args_field(
903
+ type_,
904
+ _do_not_process=_do_not_process + (some_class,),
905
+ ),
906
+ )
907
+ if process_type == _ProcessType.OPTIONAL_CONFIGURABLE:
908
+ enabled_name = name + ENABLED_SUFFIX
909
+ if enabled_name not in some_class.__annotations__:
910
+ some_class.__annotations__[enabled_name] = bool
911
+ setattr(some_class, enabled_name, False)
912
+
913
+ creation_function_name = f"create_{name}"
914
+ if not hasattr(some_class, creation_function_name):
915
+ setattr(
916
+ some_class,
917
+ creation_function_name,
918
+ _default_create(name, type_, process_type),
919
+ )
920
+ creation_functions.append(creation_function_name)
921
+
922
+
923
+ def remove_unused_components(dict_: DictConfig) -> None:
924
+ """
925
+ Assuming dict_ represents the state of a configurable,
926
+ modify it to remove all the portions corresponding to
927
+ pluggable parts which are not in use.
928
+ For example, if renderer_class_type is SignedDistanceFunctionRenderer,
929
+ the renderer_MultiPassEmissionAbsorptionRenderer_args will be
930
+ removed. Also, if chocolate_enabled is False, then chocolate_args will
931
+ be removed.
932
+
933
+ Args:
934
+ dict_: (MODIFIED IN PLACE) a DictConfig instance
935
+ """
936
+ keys = [key for key in dict_ if isinstance(key, str)]
937
+ suffix_length = len(TYPE_SUFFIX)
938
+ replaceables = [key[:-suffix_length] for key in keys if key.endswith(TYPE_SUFFIX)]
939
+ args_keys = [key for key in keys if key.endswith(ARGS_SUFFIX)]
940
+ for replaceable in replaceables:
941
+ selected_type = dict_[replaceable + TYPE_SUFFIX]
942
+ if selected_type is None:
943
+ expect = ""
944
+ else:
945
+ expect = replaceable + "_" + selected_type + ARGS_SUFFIX
946
+ with open_dict(dict_):
947
+ for key in args_keys:
948
+ if key.startswith(replaceable + "_") and key != expect:
949
+ del dict_[key]
950
+
951
+ suffix_length = len(ENABLED_SUFFIX)
952
+ enableables = [key[:-suffix_length] for key in keys if key.endswith(ENABLED_SUFFIX)]
953
+ for enableable in enableables:
954
+ enabled = dict_[enableable + ENABLED_SUFFIX]
955
+ if not enabled:
956
+ with open_dict(dict_):
957
+ dict_.pop(enableable + ARGS_SUFFIX, None)
958
+
959
+ for key in dict_:
960
+ if isinstance(dict_.get(key), DictConfig):
961
+ remove_unused_components(dict_[key])
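A minimal end-to-end sketch of the system above, assuming `omegaconf` and `pytorch3d` are installed and this file is importable as `models.core.utils.config` (the `Backbone`/`SmallBackbone`/`Pipeline` names are invented for the example):

```
from models.core.utils.config import (
    Configurable,
    ReplaceableBase,
    get_default_args,
    registry,
    run_auto_creation,
)

class Backbone(ReplaceableBase):
    out_dim: int = 128

@registry.register
class SmallBackbone(Backbone):
    depth: int = 2

class Pipeline(Configurable):
    backbone: Backbone
    backbone_class_type: str = "SmallBackbone"
    iters: int = 12

    def __post_init__(self):
        run_auto_creation(self)

# get_default_args expands Pipeline into a dataclass and returns its typed DictConfig,
# including a backbone_SmallBackbone_args block for the registered implementation.
cfg = get_default_args(Pipeline)
cfg.backbone_SmallBackbone_args.depth = 4
pipeline = Pipeline(**cfg)
print(type(pipeline.backbone).__name__, pipeline.backbone.depth)  # SmallBackbone 4
```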
models/core/utils/utils.py ADDED
@@ -0,0 +1,44 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+
4
+ # This source code is licensed under the license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ import torch.nn.functional as F
8
+
9
+
10
+ def interp(tensor, size):
11
+ return F.interpolate(
12
+ tensor,
13
+ size=size,
14
+ mode="bilinear",
15
+ align_corners=True,
16
+ )
17
+
18
+
19
+ class InputPadder:
20
+ """Pads images such that dimensions are divisible by 8"""
21
+
22
+ def __init__(self, dims, mode="sintel", divis_by=8):
23
+ self.ht, self.wd = dims[-2:]
24
+ pad_ht = (((self.ht // divis_by) + 1) * divis_by - self.ht) % divis_by
25
+ pad_wd = (((self.wd // divis_by) + 1) * divis_by - self.wd) % divis_by
26
+ if mode == "sintel":
27
+ self._pad = [
28
+ pad_wd // 2,
29
+ pad_wd - pad_wd // 2,
30
+ pad_ht // 2,
31
+ pad_ht - pad_ht // 2,
32
+ ]
33
+ else:
34
+ self._pad = [pad_wd // 2, pad_wd - pad_wd // 2, 0, pad_ht]
35
+
36
+ def pad(self, *inputs):
37
+ assert all((x.ndim == 4) for x in inputs)
38
+ return [F.pad(x, self._pad, mode="replicate") for x in inputs]
39
+
40
+ def unpad(self, x):
41
+ assert x.ndim == 4
42
+ ht, wd = x.shape[-2:]
43
+ c = [self._pad[2], ht - self._pad[3], self._pad[0], wd - self._pad[1]]
44
+ return x[..., c[0] : c[1], c[2] : c[3]]
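
A usage sketch of `InputPadder`, assuming the repository root is on `PYTHONPATH` so the module imports as `models.core.utils.utils`; the image size is an arbitrary example.

```
import torch
from models.core.utils.utils import InputPadder

left = torch.rand(1, 3, 375, 1242)   # (B, C, H, W), not divisible by 8
right = torch.rand(1, 3, 375, 1242)

padder = InputPadder(left.shape)            # symmetric "sintel"-style padding
left_p, right_p = padder.pad(left, right)   # both now have H and W divisible by 8
restored = padder.unpad(left_p)             # cropped back to 375 x 1242
assert restored.shape == left.shape
```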
models/dynamic_stereo_model.py ADDED
@@ -0,0 +1,50 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+
4
+ # This source code is licensed under the license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ from typing import ClassVar
8
+
9
+ import torch
10
+ from pytorch3d.implicitron.tools.config import Configurable
11
+
12
+ from dynamic_stereo.models.core.dynamic_stereo import DynamicStereo
13
+
14
+
15
+ class DynamicStereoModel(Configurable, torch.nn.Module):
16
+
17
+ MODEL_CONFIG_NAME: ClassVar[str] = "DynamicStereoModel"
18
+
19
+ # model_weights: str = "./checkpoints/dynamic_stereo_sf.pth"
20
+ model_weights: str = "./checkpoints/dynamic_stereo_dr_sf.pth"
21
+ kernel_size: int = 20
22
+
23
+ def __post_init__(self):
24
+ super().__init__()
25
+
26
+ self.mixed_precision = False
27
+ model = DynamicStereo(
28
+ mixed_precision=self.mixed_precision,
29
+ num_frames=5,
30
+ attention_type="self_stereo_temporal_update_time_update_space",
31
+ use_3d_update_block=True,
32
+ different_update_blocks=True,
33
+ )
34
+
35
+ state_dict = torch.load(self.model_weights, map_location="cpu")
36
+ if "model" in state_dict:
37
+ state_dict = state_dict["model"]
38
+ if "state_dict" in state_dict:
39
+ state_dict = state_dict["state_dict"]
40
+ state_dict = {"module." + k: v for k, v in state_dict.items()}
41
+ model.load_state_dict(state_dict, strict=False)
42
+
43
+ self.model = model
44
+ self.model.to("cuda")
45
+ self.model.eval()
46
+
47
+ def forward(self, batch_dict, iters=20):
48
+ return self.model.forward_batch_test(
49
+ batch_dict, kernel_size=self.kernel_size, iters=iters
50
+ )
models/raft_stereo_model.py ADDED
@@ -0,0 +1,84 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+
4
+ # This source code is licensed under the license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ from collections import defaultdict
8
+ from types import SimpleNamespace
9
+ from typing import ClassVar
10
+
11
+ import torch
12
+ from pytorch3d.implicitron.tools.config import Configurable
13
+
14
+ import importlib
15
+ import sys
16
+
17
+ sys.path.append("third_party/RAFT-Stereo")
18
+ raft_stereo = importlib.import_module(
19
+ "dynamic_stereo.third_party.RAFT-Stereo.core.raft_stereo"
20
+ )
21
+ raft_stereo_utils = importlib.import_module(
22
+ "dynamic_stereo.third_party.RAFT-Stereo.core.utils.utils"
23
+ )
24
+ autocast = torch.cuda.amp.autocast
25
+
26
+
27
+ class RAFTStereoModel(Configurable, torch.nn.Module):
28
+ MODEL_CONFIG_NAME: ClassVar[str] = "RAFTStereoModel"
29
+ model_weights: str = "./third_party/RAFT-Stereo/models/raftstereo-middlebury.pth"
30
+
31
+ def __post_init__(self):
32
+ super().__init__()
33
+
34
+ model_args = SimpleNamespace(
35
+ hidden_dims=[128] * 3,
36
+ corr_implementation="reg",
37
+ shared_backbone=False,
38
+ corr_levels=4,
39
+ corr_radius=4,
40
+ n_downsample=2,
41
+ slow_fast_gru=False,
42
+ n_gru_layers=3,
43
+ mixed_precision=False,
44
+ context_norm="batch",
45
+ )
46
+ self.args = model_args
47
+ model = torch.nn.DataParallel(
48
+ raft_stereo.RAFTStereo(model_args), device_ids=[0]
49
+ )
50
+
51
+ state_dict = torch.load(self.model_weights, map_location="cpu")
52
+ if "state_dict" in state_dict:
53
+ state_dict = state_dict["state_dict"]
54
+ state_dict = {"module." + k: v for k, v in state_dict.items()}
55
+ model.load_state_dict(state_dict)
56
+
57
+ self.model = model.module
58
+ self.model.to("cuda")
59
+ self.model.eval()
60
+
61
+ def forward(self, batch_dict, iters=32):
62
+ predictions = defaultdict(list)
63
+ for stereo_pair in batch_dict["stereo_video"]:
64
+ left_image_rgb = stereo_pair[None, 0].cuda()
65
+ right_image_rgb = stereo_pair[None, 1].cuda()
66
+
67
+ padder = raft_stereo_utils.InputPadder(left_image_rgb.shape, divis_by=32)
68
+ left_image_rgb, right_image_rgb = padder.pad(
69
+ left_image_rgb, right_image_rgb
70
+ )
71
+
72
+ with autocast(enabled=self.args.mixed_precision):
73
+ _, flow_up = self.model.forward(
74
+ left_image_rgb,
75
+ right_image_rgb,
76
+ iters=iters,
77
+ test_mode=True,
78
+ )
79
+ flow_up = padder.unpad(flow_up)
80
+ predictions["disparity"].append(flow_up)
81
+ predictions["disparity"] = (
82
+ torch.stack(predictions["disparity"]).squeeze(1).abs()
83
+ )
84
+ return predictions
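
For reference, a hedged illustration of the input format both model wrappers above consume: `batch_dict["stereo_video"]` stacks T stereo pairs with the left view at index 0 and the right view at index 1. The tensor below is random data; running an actual model additionally requires a GPU and the downloaded checkpoints.

```
import torch

T, H, W = 5, 480, 640
batch_dict = {"stereo_video": torch.rand(T, 2, 3, H, W)}  # (frames, left/right, RGB, H, W)
# predictions = model.forward(batch_dict, iters=32)  # with an instantiated RAFTStereoModel on GPU
```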
notebooks/Dynamic_Replica_demo.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
notebooks/evaluate.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
requirements.txt ADDED
@@ -0,0 +1,20 @@
1
+ hydra-core==1.1
2
+ einops==0.4.1
3
+ flow_vis==0.1
4
+ imageio==2.21.1
5
+ matplotlib==3.5.3
6
+ munch==2.5.0
7
+ numpy==1.23.5
8
+ omegaconf==2.1.0
9
+ opencv_python==4.6.0.66
10
+ opt_einsum==3.3.0
11
+ Pillow==9.5.0
12
+ pytorch_lightning==1.6.0
13
+ requests
14
+ scikit_image==0.19.2
15
+ scipy==1.10.0
16
+ setuptools==65.6.3
17
+ tabulate==0.8.10
18
+ tqdm==4.64.1
19
+ moviepy
20
+ jupyter
scripts/checksum_check.py ADDED
@@ -0,0 +1,154 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+
4
+ # This source code is licensed under the license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ import os
8
+ import glob
9
+ import argparse
10
+ import hashlib
11
+ import json
12
+
13
+ from typing import Optional
14
+ from multiprocessing import Pool
15
+ from tqdm import tqdm
16
+
17
+
18
+ DEFAULT_SHA256S_FILE = os.path.join(__file__.rsplit(os.sep, 2)[0], "dr_sha256.json")
19
+ BLOCKSIZE = 65536
20
+
21
+
22
+ def main(
23
+ download_folder: str,
24
+ sha256s_file: str,
25
+ dump: bool = False,
26
+ n_sha256_workers: int = 4
27
+ ):
28
+ if not os.path.isfile(sha256s_file):
29
+ raise ValueError(f"The SHA256 file does not exist ({sha256s_file}).")
30
+
31
+ expected_sha256s = get_expected_sha256s(
32
+ sha256s_file=sha256s_file
33
+ )
34
+
35
+ zipfiles = sorted(glob.glob(os.path.join(download_folder, "*.zip")))
36
+ print(f"Extracting SHA256 hashes for {len(zipfiles)} files in {download_folder}.")
37
+ extracted_sha256s_list = []
38
+ with Pool(processes=n_sha256_workers) as sha_pool:
39
+ for extracted_hash in tqdm(
40
+ sha_pool.imap(_sha256_file_and_print, zipfiles),
41
+ total=len(zipfiles),
42
+ ):
43
+ extracted_sha256s_list.append(extracted_hash)
44
+ pass
45
+
46
+ extracted_sha256s = dict(
47
+ zip([os.path.split(z)[-1] for z in zipfiles], extracted_sha256s_list)
48
+ )
49
+
50
+ if dump:
51
+ print(extracted_sha256s)
52
+ with open(sha256s_file, "w") as f:
53
+ json.dump(extracted_sha256s, f, indent=2)
54
+
55
+
56
+ missing_keys, invalid_keys = [], []
57
+ for k in expected_sha256s.keys():
58
+ if k not in extracted_sha256s:
59
+ print(f"{k} missing!")
60
+ missing_keys.append(k)
61
+ elif expected_sha256s[k] != extracted_sha256s[k]:
62
+ print(
63
+ f"'{k}' does not match!"
64
+ + f" ({expected_sha256s[k]} != {extracted_sha256s[k]})"
65
+ )
66
+ invalid_keys.append(k)
67
+ if len(invalid_keys) + len(missing_keys) > 0:
68
+ raise ValueError(
69
+ f"Checksum checker failed!"
70
+ + f" Non-matching checksums: {str(invalid_keys)};"
71
+ + f" missing files: {str(missing_keys)}."
72
+ )
73
+
74
+
75
+ def get_expected_sha256s(
76
+ sha256s_file: str
77
+ ):
78
+ with open(sha256s_file, "r") as f:
79
+ expected_sha256s = json.load(f)
80
+ return expected_sha256s
81
+
82
+
83
+ def check_dr_sha256(
84
+ path: str,
85
+ sha256s_file: str,
86
+ expected_sha256s: Optional[dict] = None,
87
+ do_assertion: bool = True,
88
+ ):
89
+ zipname = os.path.split(path)[-1]
90
+ if expected_sha256s is None:
91
+ expected_sha256s = get_expected_sha256s(
92
+ sha256s_file=sha256s_file,
93
+ )
94
+ extracted_hash = sha256_file(path)
95
+ if do_assertion:
96
+ assert (
97
+ extracted_hash == expected_sha256s[zipname]
98
+ ), f"{zipname}: ({extracted_hash} != {expected_sha256s[zipname]})"
99
+ else:
100
+ return extracted_hash == expected_sha256s[zipname]
101
+
102
+
103
+ def sha256_file(path: str):
104
+ sha256_hash = hashlib.sha256()
105
+ with open(path, "rb") as f:
106
+ file_buffer = f.read(BLOCKSIZE)
107
+ while len(file_buffer) > 0:
108
+ sha256_hash.update(file_buffer)
109
+ file_buffer = f.read(BLOCKSIZE)
110
+ digest_ = sha256_hash.hexdigest()
111
+ return digest_
112
+
113
+
114
+ def _sha256_file_and_print(path: str):
115
+ digest_ = sha256_file(path)
116
+ print(f"{path}: {digest_}")
117
+ return digest_
118
+
119
+
120
+
121
+ if __name__ == "__main__":
122
+ parser = argparse.ArgumentParser(
123
+ description="Check SHA256 hashes of the Dynamic Replica dataset."
124
+ )
125
+ parser.add_argument(
126
+ "--download_folder",
127
+ type=str,
128
+ help="A local target folder for downloading the the dataset files.",
129
+ )
130
+ parser.add_argument(
131
+ "--sha256s_file",
132
+ type=str,
133
+ help="A local target folder for downloading the the dataset files.",
134
+ default=DEFAULT_SHA256S_FILE,
135
+ )
136
+ parser.add_argument(
137
+ "--num_workers",
138
+ type=int,
139
+ default=4,
140
+ help="The number of sha256 extraction workers.",
141
+ )
142
+ parser.add_argument(
143
+ "--dump_sha256s",
144
+ action="store_true",
145
+ help="Store sha256s hashes.",
146
+ )
147
+
148
+ args = parser.parse_args()
149
+ main(
150
+ str(args.download_folder),
151
+ dump=bool(args.dump_sha256s),
152
+ n_sha256_workers=int(args.num_workers),
153
+ sha256s_file=str(args.sha256s_file),
154
+ )
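
A minimal sketch of verifying a single downloaded archive with `check_dr_sha256`, assuming `scripts/` is on `PYTHONPATH` (the download scripts append it themselves); the archive path is a hypothetical example.

```
from checksum_check import check_dr_sha256

check_dr_sha256(
    "./dynamic_replica_data/valid_000.zip",      # hypothetical local archive
    sha256s_file="./scripts/dr_sha256.json",
)  # raises AssertionError if the hash does not match dr_sha256.json
```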
scripts/download_dynamic_replica.py ADDED
@@ -0,0 +1,35 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+
4
+ # This source code is licensed under the license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ import os
8
+ import sys
9
+
10
+ sys.path.append("./scripts/")
11
+ from download_utils import build_arg_parser, download_dataset
12
+
13
+
14
+ DEFAULT_LINK_LIST_FILE = os.path.join(os.path.dirname(__file__), "links.json")
15
+ DEFAULT_SHA256S_FILE = os.path.join(os.path.dirname(__file__), "dr_sha256.json")
16
+
17
+
18
+ if __name__ == "__main__":
19
+ parser = build_arg_parser(
20
+ "dynamic_replica", DEFAULT_LINK_LIST_FILE, DEFAULT_SHA256S_FILE
21
+ )
22
+
23
+ args = parser.parse_args()
24
+ os.makedirs(args.download_folder, exist_ok=True)
25
+ download_dataset(
26
+ str(args.link_list_file),
27
+ str(args.download_folder),
28
+ n_download_workers=int(args.n_download_workers),
29
+ n_extract_workers=int(args.n_extract_workers),
30
+ download_splits=args.download_splits,
31
+ checksum_check=bool(args.checksum_check),
32
+ clear_archives_after_unpacking=bool(args.clear_archives_after_unpacking),
33
+ sha256s_file=str(args.sha256_file),
34
+ skip_downloaded_archives=not bool(args.redownload_existing_archives),
35
+ )
scripts/download_utils.py ADDED
@@ -0,0 +1,280 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+
4
+ # This source code is licensed under the license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ import os
8
+ import shutil
9
+ import requests
10
+ import functools
11
+ import json
12
+ import warnings
13
+
14
+ from argparse import ArgumentParser
15
+ from typing import List, Optional
16
+ from multiprocessing import Pool
17
+ from tqdm import tqdm
18
+
19
+ import sys
20
+
21
+ sys.path.append("./scripts/")
22
+
23
+ from checksum_check import check_dr_sha256
24
+
25
+
26
+ def download_dataset(
27
+ link_list_file: str,
28
+ download_folder: str,
29
+ n_download_workers: int = 4,
30
+ n_extract_workers: int = 4,
31
+ download_splits: List[str] = ['real', 'valid', 'test', 'train'],
32
+ checksum_check: bool = False,
33
+ clear_archives_after_unpacking: bool = False,
34
+ skip_downloaded_archives: bool = True,
35
+ sha256s_file: Optional[str] = None,
36
+ ):
37
+ """
38
+ Downloads and unpacks the dataset in CO3D format.
39
+ Note: The script will make a folder `<download_folder>/_in_progress`, which
40
+ stores files whose download is in progress. The folder can be safely deleted
41
+ once the download is finished.
42
+ Args:
43
+ link_list_file: A text file with the list of zip file download links.
44
+ download_folder: A local target folder for downloading the
45
+ the dataset files.
46
+ n_download_workers: The number of parallel workers
47
+ for downloading the dataset files.
48
+ n_extract_workers: The number of parallel workers
49
+ for extracting the dataset files.
50
+ download_splits: A list of data splits to download.
51
+ Must be in ['real', 'valid', 'test', 'train'].
52
+ checksum_check: Enable validation of the downloaded file's checksum before
53
+ extraction.
54
+ clear_archives_after_unpacking: Delete the unnecessary downloaded archive files
55
+ after unpacking.
56
+ skip_downloaded_archives: Skip re-downloading already downloaded archives.
57
+ """
58
+
59
+ if checksum_check and not sha256s_file:
60
+ raise ValueError(
61
+ "checksum_check is requested but ground-truth SHA256 file not provided!"
62
+ )
63
+
64
+ if not os.path.isfile(link_list_file):
65
+ raise ValueError(
66
+ "Please specify `link_list_file` with a valid path to a json"
67
+ " with zip file download links."
68
+ # " The file is stored in the DynamicStereo github:"
69
+ # " https://github.com/facebookresearch/dynamic_stereo/blob/main/dynamic_stereo/links.json"
70
+ )
71
+
72
+ if not os.path.isdir(download_folder):
73
+ raise ValueError(
74
+ "Please specify `download_folder` with a valid path to a target folder"
75
+ + " for downloading the dataset."
76
+ + f" {download_folder} does not exist."
77
+ )
78
+
79
+ # read the link file
80
+ with open(link_list_file, "r") as f:
81
+ links = json.load(f)
82
+
83
+ for split in download_splits:
84
+ if split not in ['real', 'valid', 'test', 'train']:
85
+ raise ValueError(
86
+ f"Download split {str(split)} is not valid"
87
+ )
88
+
89
+ data_links = []
90
+ for split_name, urls in links.items():
91
+ if split_name in download_splits:
92
+ for url in urls:
93
+ link_name = os.path.split(url)[-1]
94
+ data_links.append((split_name, link_name, url))
95
+
96
+
97
+ with Pool(processes=n_download_workers) as download_pool:
98
+ download_ok = {}
99
+ for link_name, ok in tqdm(
100
+ download_pool.imap(
101
+ functools.partial(
102
+ _download_split_file,
103
+ download_folder,
104
+ checksum_check,
105
+ sha256s_file,
106
+ skip_downloaded_archives,
107
+ ),
108
+ data_links,
109
+ ),
110
+ total=len(data_links),
111
+ ):
112
+ download_ok[link_name] = ok
113
+
114
+ with Pool(processes=n_extract_workers) as extract_pool:
115
+ for _ in tqdm(
116
+ extract_pool.imap(
117
+ functools.partial(
118
+ _unpack_split_file,
119
+ download_folder,
120
+ clear_archives_after_unpacking,
121
+ ),
122
+ data_links,
123
+ ),
124
+ total=len(data_links),
125
+ ):
126
+ pass
127
+ print("Done")
128
+
129
+
130
+
131
+ def build_arg_parser(
132
+ dataset_name: str,
133
+ default_link_list_file: str,
134
+ default_sha256_file: str,
135
+ ) -> ArgumentParser:
136
+ parser = ArgumentParser(description=f"Download the {dataset_name} dataset.")
137
+ parser.add_argument(
138
+ "--download_folder",
139
+ type=str,
140
+ required=True,
141
+ help="A local target folder for downloading the the dataset files.",
142
+ )
143
+ parser.add_argument(
144
+ "--n_download_workers",
145
+ type=int,
146
+ default=4,
147
+ help="The number of parallel workers for downloading the dataset files.",
148
+ )
149
+ parser.add_argument(
150
+ "--n_extract_workers",
151
+ type=int,
152
+ default=4,
153
+ help="The number of parallel workers for extracting the dataset files.",
154
+ )
155
+ parser.add_argument(
156
+ "--download_splits",
157
+ default=['real', 'valid', 'test', 'train'],
158
+ nargs='+',
159
+ help=f"A comma-separated list of {dataset_name} splits to download.",
160
+ )
161
+ parser.add_argument(
162
+ "--link_list_file",
163
+ type=str,
164
+ default=default_link_list_file,
165
+ help=(
166
+ f"The file with html links to the {dataset_name} dataset files."
167
+ + " In most cases the default local file `links.json` should be used."
168
+ ),
169
+ )
170
+ parser.add_argument(
171
+ "--sha256_file",
172
+ type=str,
173
+ default=default_sha256_file,
174
+ help=(
175
+ f"The file with SHA256 hashes of {dataset_name} dataset files."
176
+ + " In most cases the default local file `dr_sha256.json` should be used."
177
+ ),
178
+ )
179
+ parser.add_argument(
180
+ "--checksum_check",
181
+ action="store_true",
182
+ default=True,
183
+ help="Check the SHA256 checksum of each downloaded file before extraction.",
184
+ )
185
+ parser.add_argument(
186
+ "--no_checksum_check",
187
+ action="store_false",
188
+ dest="checksum_check",
189
+ default=False,
190
+ help="Does not check the SHA256 checksum of each downloaded file before extraction.",
191
+ )
192
+ parser.set_defaults(checksum_check=True)
193
+ parser.add_argument(
194
+ "--clear_archives_after_unpacking",
195
+ action="store_true",
196
+ default=False,
197
+ help="Delete the unnecessary downloaded archive files after unpacking.",
198
+ )
199
+ parser.add_argument(
200
+ "--redownload_existing_archives",
201
+ action="store_true",
202
+ default=False,
203
+ help="Redownload the already-downloaded archives.",
204
+ )
205
+
206
+ return parser
207
+
208
+ def _unpack_split_file(
209
+ download_folder: str,
210
+ clear_archive: bool,
211
+ link: str,
212
+ ):
213
+ split, link_name, url = link
214
+ local_fl = os.path.join(download_folder, link_name)
215
+ print(f"Unpacking dataset file {local_fl} ({link_name}) to {download_folder}.")
216
+
217
+ download_folder_split = os.path.join(download_folder, split)
218
+ # os.makedirs(download_folder_split, exist_ok=True)
219
+ shutil.unpack_archive(local_fl, download_folder_split)
220
+ if clear_archive:
221
+ os.remove(local_fl)
222
+
223
+ def _download_split_file(
224
+ download_folder: str,
225
+ checksum_check: bool,
226
+ sha256s_file: Optional[str],
227
+ skip_downloaded_files: bool,
228
+ link: str,
229
+ ):
230
+ __, link_name, url = link
231
+ local_fl_final = os.path.join(download_folder, link_name)
232
+
233
+ if skip_downloaded_files and os.path.isfile(local_fl_final):
234
+ print(f"Skipping {local_fl_final}, already downloaded!")
235
+ return link_name, True
236
+
237
+ in_progress_folder = os.path.join(download_folder, "_in_progress")
238
+ os.makedirs(in_progress_folder, exist_ok=True)
239
+ local_fl = os.path.join(in_progress_folder, link_name)
240
+
241
+ print(f"Downloading dataset file {link_name} ({url}) to {local_fl}.")
242
+ _download_with_progress_bar(url, local_fl, link_name)
243
+ if checksum_check:
244
+ print(f"Checking SHA256 for {local_fl}.")
245
+ try:
246
+ check_dr_sha256(
247
+ local_fl,
248
+ sha256s_file=sha256s_file,
249
+ )
250
+ except AssertionError:
251
+ warnings.warn(
252
+ f"Checksums for {local_fl} did not match!"
253
+ + " This is likely due to a network failure,"
254
+ + " please restart the download script."
255
+ )
256
+ return link_name, False
257
+
258
+ os.rename(local_fl, local_fl_final)
259
+ return link_name, True
260
+
261
+
262
+ def _download_with_progress_bar(url: str, fname: str, filename: str):
263
+
264
+ # taken from https://stackoverflow.com/a/62113293/986477
265
+ resp = requests.get(url, stream=True)
266
+ print(url)
267
+ total = int(resp.headers.get("content-length", 0))
268
+ with open(fname, "wb") as file, tqdm(
269
+ desc=fname,
270
+ total=total,
271
+ unit="iB",
272
+ unit_scale=True,
273
+ unit_divisor=1024,
274
+ ) as bar:
275
+ for datai, data in enumerate(resp.iter_content(chunk_size=1024)):
276
+ size = file.write(data)
277
+ bar.update(size)
278
+ if datai % max((max(total // 1024, 1) // 20), 1) == 0:
279
+ print(f"{filename}: Downloaded {100.0*(float(bar.n)/max(total, 1)):3.1f}%.")
280
+ print(bar)
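
A sketch of driving `download_dataset` directly, mirroring what `scripts/download_dynamic_replica.py` does; the link-list path and target folder below are assumptions.

```
import os
import sys

sys.path.append("./scripts/")
from download_utils import download_dataset

os.makedirs("./dynamic_replica_data", exist_ok=True)
download_dataset(
    link_list_file="./scripts/links.json",       # or a reduced link list
    download_folder="./dynamic_replica_data",
    n_download_workers=4,
    n_extract_workers=4,
    download_splits=["valid"],
    checksum_check=True,
    sha256s_file="./scripts/dr_sha256.json",
)
```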
scripts/dr_sha256.json ADDED
@@ -0,0 +1,106 @@
1
+ {
2
+ "real_000.zip": "e5c2aac04146d783c64f76d0ef7a9e8d49d80ffac99d2a795563517f15943a6f",
3
+ "valid_000.zip": "0f35bee47030ae1a30289beb92ba69c5336491e0f07aab0a05cb5505173d1faf",
4
+ "valid_001.zip": "cb37d3b1f643118ae22840b4212b00c00a8fe137099d3730a07796a5fefab24a",
5
+ "valid_002.zip": "5535f2a98e06c68cf97e3259e962e3d44465a1820369e4425c4ef2a719b01ad0",
6
+ "valid_003.zip": "e19db94514d22829743aa363698f407ecfd98d8f08eab037289a420939ef5143",
7
+ "valid_004.zip": "953328f24ba0c3e8709df3829cce238305a8998bf7ae938c80069fab6f513862",
8
+ "valid_005.zip": "27ce4c7424292dcf3e8e0b370fbbc848bd6d73ae28ea5832fddfa8e9c17d6011",
9
+ "test_000.zip": "a56fa676a7a3dc52b33f1571d41fb0221e289735acccb7b9ad42dfb13fdac68c",
10
+ "test_001.zip": "43580e89331826182f41d2ce9f06f62da46617fea9e612a16b2610de8ffdc10b",
11
+ "test_002.zip": "33551fb68979d3d2f20e1976d9169a84ad58658c459aba4d7a2671c8d66904b9",
12
+ "test_003.zip": "45ad28d7555e3579225d26dfcb8244b65de0d1ee749560cc6dd84f121b4b40de",
13
+ "test_004.zip": "d736b56fe15410525deda1c16c0b8da4497383480a4328da92bc0ddb64a62d52",
14
+ "test_005.zip": "3ae331047019a39c6306a17407c72e40dc15b5113f6f9ef72aba2da0b859ea7d",
15
+ "test_006.zip": "94341c8ac8ed1d7f11816ad121e6c5821a751fdc3d3122a653c86f7b5845ca80",
16
+ "test_007.zip": "4e18facbd507e16fc41d90d5c2ce1b44c511d3e2986e1ccdf5d264748d5d7e15",
17
+ "test_008.zip": "e4d5aa0c25eb01863bbced477e17fddd9d8217d23d238bb06b0b688a0f6ed8e3",
18
+ "test_009.zip": "5a413411cfc376078ed0357708084f71949159c13119aabb5c9ae1ffde33b6b7",
19
+ "test_010.zip": "82ea42c7544385aa2d41271e63399534a398dbbef8a06cb990c8bb34296928c8",
20
+ "train_000.zip": "e9fd9af579b0d08d538551c0ab6f7231a1fd162139667803e672cc0dc8b98b03",
21
+ "train_001.zip": "65cb438c7a48567f85db8e54109db6c25d2a621fcbd3267c542a8a640e1dad56",
22
+ "train_002.zip": "c3d9a76a955dd9feb0275837a17133a1d7ee76c963f1c6fa7630deb0ca8209b2",
23
+ "train_003.zip": "13e108f78c7da1f1c1469dd87fab55a6e4ec79f1fcb6d7d16cc9006a933979f4",
24
+ "train_004.zip": "171b92a62b46a68f1d89c2326ba67b6433faf087bc1eecc7a947c19d0f90d3e6",
25
+ "train_005.zip": "75461ffe13cfbd87b4f0f9ffc83002b8381f5a0a212ece18b8961012f865a46e",
26
+ "train_006.zip": "7546f94817814031a738082e6b30858d0057710af052a88fa505a961b6699886",
27
+ "train_007.zip": "371dd100b215bcd41129def1c8fd07f974af11a9b3d3b9966ce5d9700b9929ad",
28
+ "train_008.zip": "313f5c2089c6afc1691edf054e8b2af9eb8b2d91f791153763758c8d91abee48",
29
+ "train_009.zip": "9cbb9f44bb6b7dcc74f00a51db4d2a8797c95a0d880d63ef1612d3883b16b995",
30
+ "train_010.zip": "eb158fccc23a4b41358ec94be203f49a677f86626af7a88f0e649454c409c706",
31
+ "train_011.zip": "f8b3f8c738cdcdbbdf346a4dd78b99883b5d4ab74c11b64ec7b4f8ccd3b68ffc",
32
+ "train_012.zip": "b364ba9d35d7e55019d3554cf65b295d2358859c222b3b847b0f2cced948cfce",
33
+ "train_013.zip": "c8a50efbd93e6e422eabf1846dac2d75e81dfcfcd4d785fe18b01526af9695f6",
34
+ "train_014.zip": "52a768ce76310861cf1fc990ebb8d16f0c27fceff02c12b11638d36ca1c3a927",
35
+ "train_015.zip": "67bf0ba775948997f5ab3cc810b6d0e8149758334210ace6f5cdfc529fe7d26e",
36
+ "train_016.zip": "d5b9a26736421d8f330fd5e531d26071531501a88609d29d580b9d56b6bc17a3",
37
+ "train_017.zip": "5f2d2c93e7944baf1e6d3dee671b12abb7476a75cbd6f572af86fe5c22472fa6",
38
+ "train_018.zip": "77aa801b6b0359b970466329e4a05b937df94b650228cf4797a2a029606b8e5b",
39
+ "train_019.zip": "30934c91cc0ae69acef6a89e4a5180686bd04080e2384a8bde5877cbaaadc575",
40
+ "train_020.zip": "901d5c08705a70053a3e865354a4e7149c35f026b6ed166fee029d829d88c124",
41
+ "train_021.zip": "f27019ff58e54a004ed2cf2106ed459a31b010ed82d32028b0e196dd365b8b0e",
42
+ "train_022.zip": "0600346a2ce162f7e9824e90c553b69a656d4731c86d903e300d932ec8ba7600",
43
+ "train_023.zip": "660d768e4b1bfe742a42ae6ee84f5e91c930789488a7c7f118e5d0edd1f1a010",
44
+ "train_024.zip": "1f8792002baceaba8f93f93be1bee7c83a48c677e4b2d025b6f0047a796e94cd",
45
+ "train_025.zip": "0b92b3f41c18fded8fcb7aba44e7d8738750b8155c907924200fdf4dc1718794",
46
+ "train_026.zip": "4dc401639317527231abfef07221b8d7db2d0950008828104cd1f72092325d05",
47
+ "train_027.zip": "e8313eaa21163f9dd2ff4558d16b1c9cf4962c2e4c0403d6a315955660a98b14",
48
+ "train_028.zip": "d73edf1c500b4311795aaae0a03b3bc04a2c266e2a20b27ba9b6e72fb27fd277",
49
+ "train_029.zip": "c5e4d302c62e693626445aba19638711108049235b0075558e7949b189050c56",
50
+ "train_030.zip": "506b9ba7a740b0bf84159546f797437a48a24e468cb949f2189e51cf404c6170",
51
+ "train_031.zip": "f36bb4b77fdb255dae2050884cf59cd3f8e46e77ea2984b4b219b799c4aac089",
52
+ "train_032.zip": "fddca4efc40ed8d05adf9d519e4fb5b486ac77e8fa08c98d5c4be15867fda8a0",
53
+ "train_033.zip": "c24d2b5c04f3e90b265fd0762e7ae19fb01a7c1948a4c09451383a9eec9f640f",
54
+ "train_034.zip": "5828fbd615c4476f6107fe844cbf81632eff2f9c75194cb84d749630d9359e14",
55
+ "train_035.zip": "7b60fe125fd1a9ba7991e2accd0f2b212968983b4631d43eccff9836a0c35ba8",
56
+ "train_036.zip": "0f4eaf464a2afc62447a802159b3844487b80e9d1c9b0a7d324b0d5914514d60",
57
+ "train_037.zip": "ba85a6692d86e48c4c787b334d4384c08b914e4cee7f3d2692dcae1bbac55878",
58
+ "train_038.zip": "c67b0f5305560d8089bdc2f6212c05256c044e50a715d59b864fbef705bc6b5c",
59
+ "train_039.zip": "f4b66c9e1360a8d6d8337c94eefb1132d865c2735c6b78ba726a590073174aad",
60
+ "train_040.zip": "2c64b76d028fcc153f267925b79a24cf3bb0e42cc7716773df2139f5cec5e319",
61
+ "train_041.zip": "22b1c0ab99a7f8bd0d36c2d2511d3d469cc390776c38132d1e8f1ad7aae5d4ff",
62
+ "train_042.zip": "8f2afaecb9f90947c9071111fde9c015acfceb432ae0bf94deff3ecd581b26c8",
63
+ "train_043.zip": "adf7ea7c356339b10b797c49163252704b4e6b0cebcc741d3374f8c9467f6b43",
64
+ "train_044.zip": "3d0fe4a85fd22ff9c8ed468ca8173d93406a72fadf800d9e6bbf209348cf8965",
65
+ "train_045.zip": "70874eca6bce66cb7681092755d066968e9c8fc32a266d7c0d2f29c01b2b2669",
66
+ "train_046.zip": "01adcdbba0a25383e2281ce02a946f6bc824e1b8e16cf88e85a4ad275203884c",
67
+ "train_047.zip": "50ed632ae330acf60c1b2e22b28fbfab5ccf0e8f23320b2911dcc2d43db048b6",
68
+ "train_048.zip": "f302984f486df60d7a281e2b0a9b6d32456fc6042eb596cb5ef54ee919ccd7bb",
69
+ "train_049.zip": "8e8e0a426796f76dfb2d29cb855894fd01cc954b017aa1d06ae1a121fb310088",
70
+ "train_050.zip": "051f0dd8e612e7073dd20585c42681daeff853a6ee0de6f2e8ff4581cdf4f83b",
71
+ "train_051.zip": "3f39b3732c32b960aef4bf3f152b1a72195dc4ab4bbc10116a05875ca8d40417",
72
+ "train_052.zip": "361b9bcd3364c63c8f2814dfacf91489b79c9cedf03ffcb03b3dacfb77cee3a1",
73
+ "train_053.zip": "f6afe23b3005b1889f76ea9c10ac42f7c4f07cefbe737781229640b834f8ede2",
74
+ "train_054.zip": "ef993bd657104770df8e07a9d7c8ac1d1c3ac57b91f66796bea97f03e5a01df2",
75
+ "train_055.zip": "ec0dea8199e1db7bd8e19f85b0d1a9ab9e8fc2be2c5da5b3455f96e074ad7f22",
76
+ "train_056.zip": "44259829f6832c3dc14b893d5f5b7b6f784a09570f26e9cc9749807a1b05b21e",
77
+ "train_057.zip": "263b712fe2ded353cb248324305f831d8b14aa0858f005067bb27e88decd7f32",
78
+ "train_058.zip": "c44fb44365bc4cd8c4c9bb13d70fa9bb290708b7d3fe44fd79c6eed42702ed70",
79
+ "train_059.zip": "43dd65609afb3992273f914b4d0108187f85eaf1f252f85556f10e40816d5e6c",
80
+ "train_060.zip": "97b2abe90259f4629d7c1c1cec2427f155252403f5dcfea563e2d1338ae63150",
81
+ "train_061.zip": "9d8c790d1806659617ddd6dd99ae56388b5eb9f311c47a079ac8fa5df8f44f57",
82
+ "train_062.zip": "5b4398d6a8709ddf1b050b03b19dfe8aacf3378a4879402f457f12bd97ab99df",
83
+ "train_063.zip": "05024f1b0671cb3026db0b9e801c9aab000b828784839f970a8ad0bc23125435",
84
+ "train_064.zip": "b9bba3999971745ea2cdce69c00c49b109ba02c9f3169614d1d229e468bebc68",
85
+ "train_065.zip": "ff4084dd7c017478b872fd7c9152df5271a7088489d3b86cc21968db272356ef",
86
+ "train_066.zip": "9d8158fd6691065c1cb76ac36c3be90b065e8848856a66b10475b11e1261dd4d",
87
+ "train_067.zip": "3e4b9ebef2bdecab5774a72037d9f1f7c40359e6a2d00851c0c40bdd686373c5",
88
+ "train_068.zip": "a89d53ce7c79af32a659a2a59138568ada1395c56c6063f4f49c1d4e052cf9cd",
89
+ "train_069.zip": "3f66206486af3f0bfa04ce8f664b6af6aa7fd2ad8ebadd5c75039de8c5ffea91",
90
+ "train_070.zip": "e8a95aad5f81e7185a7dacb9031a5c27010ec17302e2e35f7f1de3dc88e02a7b",
91
+ "train_071.zip": "677bf42f8d576c79189cd5af2abf420990368d9c7d768a21a10fc0939dde121f",
92
+ "train_072.zip": "f8d5ea223dc13663bbaae6c5bbd732db15f1c249e7fe2da44b5a6ba5b7dbf505",
93
+ "train_073.zip": "3057bda88ebd5bffb0da030d1126e1fb4fed4b5fbfc547dc0be669ece39979c1",
94
+ "train_074.zip": "f3a01d19e6fedd44679d76ee93051b91b616a55b6b22861db126b8d2bfdba7ce",
95
+ "train_075.zip": "0faa29f3f712f744e003da29b249896cc770fb9b357e8a4c447eeb6ad2798ce2",
96
+ "train_076.zip": "d9943f9b72be89dd8f1273bd02133ab24b81e3c3f794e13362a96b0826518696",
97
+ "train_077.zip": "cfab28d27c1532a91980b65baa4d40c8e13144788b9ae7a4c36ce8b909e51e55",
98
+ "train_078.zip": "b06277baadbe60b2019d0f7b6ed637b23957b6320797bf4b6b9099dc4df0cc7e",
99
+ "train_079.zip": "2163ef05752f7a8813fa9cd5661547bc280239fd3bd903b94a8aef37182e9645",
100
+ "train_080.zip": "13ae6b86afe4aa00ce19f4f7a8df24d11742340c5775fca02f6e1f70cd9a3be7",
101
+ "train_081.zip": "a2512084c16220e0acd207f5e330dd319a30c3445b5034f2c14f9a65111628a3",
102
+ "train_082.zip": "d9615ac989465bc85cf990167ce176af55b8affeebb58d5021c215c1f7235c8a",
103
+ "train_083.zip": "539710fcc33b043dd24499d3987852a35c8a1c5fb75f7530a9caebf57fd5f324",
104
+ "train_084.zip": "33232eb1d68e493a25126f22e31326b7c1195ea511c332a1413e83a0245bdae6",
105
+ "train_085.zip": "13e575f24a77278b7de25e3d186f6201692b3e45ed4701b071d5a770c0e1d590"
106
+ }
setup.csh ADDED
@@ -0,0 +1,9 @@
1
+ #!/bin/csh
2
+
3
+ python -m virtualenv venv
4
+
5
+ # -- PyTorch wheels built for CUDA 12.1 (cu121)
6
+ pip install torch==2.1.0+cu121 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
7
+ pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"
8
+ python -m pip install pip==24.0
9
+ pip install -r requirements.txt
train.csh ADDED
@@ -0,0 +1,34 @@
1
+ #!/bin/csh
2
+
3
+ set arg_count = $#argv
4
+ if ( $arg_count >= 1 ) then
5
+ if ( "$argv[1]" == "-clean" || "$argv[1]" == "-clean_only" ) then
6
+ echo "[INFO] Killing alll other GPU processes to free up resources."
7
+
8
+ sh -c 'ps | grep python | sed "s/ pts.\+$//g" > .tmp.csh'
9
+ chmod +x .tmp.csh
10
+ sed -i "s/^/kill -9 /g" .tmp.csh
11
+ source .tmp.csh
12
+ rm -rf .tmp.csh
13
+ rm -rf debug_rank_*
14
+ rm -rf dynamicstereo_sf_dr
15
+ endif
16
+
17
+ if ( "$argv[1]" == "-clean_only" ) then
18
+ exit 0
19
+ endif
20
+ endif
21
+
22
+ setenv PYTORCH_CUDA_ALLOC_CONF "max_split_size_mb:32,garbage_collection_threshold:0.5,expandable_segments:False"
23
+ setenv CUDA_LAUNCH_BLOCKING 1
24
+ setenv PYTORCH_NO_CUDA_MEMORY_CACHING 1
25
+ setenv CUBLAS_WORKSPACE_CONFIG ":16:8"
26
+ setenv CUDA_VISIBLE_DEVICES 3
27
+
28
+ # -- GPU OOM Error when trained with sample_len=8 on kilby.
29
+ python train.py --batch_size 1 \
30
+ --image_size 480 640 --saturation_range 0 1.4 --num_steps 200000 \
31
+ --ckpt_path dynamicstereo_sf_dr \
32
+ --sample_len 8 --lr 0.0003 --train_iters 8 --valid_iters 8 \
33
+ --num_workers 28 --save_freq 100 --update_block_3d --different_update_blocks \
34
+ --attention_type self_stereo_temporal_update_time_update_space --train_datasets dynamic_replica
train.py ADDED
@@ -0,0 +1,565 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+
4
+ # This source code is licensed under the license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ import argparse
8
+ import logging
9
+ from pathlib import Path
10
+ from tqdm import tqdm
11
+ import os
12
+ import torch
13
+ import torch.nn as nn
14
+ import torch.optim as optim
15
+
16
+ from munch import DefaultMunch
17
+ import json
18
+ from pytorch_lightning.lite import LightningLite
19
+ from torch.cuda.amp import GradScaler
20
+
21
+ from train_utils.utils import (
22
+ run_test_eval,
23
+ save_ims_to_tb,
24
+ count_parameters,
25
+ )
26
+ from train_utils.logger import Logger
27
+ from models.core.dynamic_stereo import DynamicStereo
28
+ from models.core.sci_codec import sci_encoder
29
+ from evaluation.core.evaluator import Evaluator
30
+ from train_utils.losses import sequence_loss
31
+ import datasets.dynamic_stereo_datasets as datasets
32
+
33
+ class wrapper(nn.Module):
34
+ def __init__(
35
+ self,
36
+ sigma_range=[0, 1e-9],
37
+ num_frames=8,
38
+ in_channels=1,
39
+ n_taps=2,
40
+ resolution=[480, 640],
41
+ mixed_precision=True,
42
+ attention_type="self_stereo_temporal_update_time_update_space",
43
+ update_block_3d=True,
44
+ different_update_blocks=True,
45
+ train_iters=16):
46
+
47
+ super(wrapper, self).__init__()
48
+
49
+ self.train_iters = train_iters
50
+
51
+ self.sci_enc_L = sci_encoder(sigma_range=sigma_range,
52
+ n_frame=num_frames,
53
+ in_channels=in_channels,
54
+ n_taps=n_taps,
55
+ resolution=resolution)
56
+ self.sci_enc_R = sci_encoder(sigma_range=sigma_range,
57
+ n_frame=num_frames,
58
+ in_channels=in_channels,
59
+ n_taps=n_taps,
60
+ resolution=resolution)
61
+
62
+ self.stereo = DynamicStereo(max_disp=256,
63
+ mixed_precision=mixed_precision,
64
+ num_frames=num_frames,
65
+ attention_type=attention_type,
66
+ use_3d_update_block=update_block_3d,
67
+ different_update_blocks=different_update_blocks)
68
+
69
+ def forward(self, batch):
70
+ # ---- ---- FORWARD PASS ---- ----
71
+ # -- Modified by Chu King on 20th November 2025
72
+
73
+ # -- print ("[INFO] batch[\"img\"].device: ", batch["img"].device)
74
+
75
+ # 0) Convert to Gray
76
+ def rgb_to_gray(x):
77
+ weights = torch.tensor([0.2989, 0.5870, 0.1140], dtype=x.dtype, device=x.device)
78
+ gray = (x * weights[None, None, :, None, None]).sum(dim=2)
79
+ return gray # -- shape: [B, T, H, W]
80
+
81
+ video_L = rgb_to_gray(batch["img"][:, :, 0]).cuda() # ~ (b, t, h, w)
82
+ video_R = rgb_to_gray(batch["img"][:, :, 1]).cuda() # ~ (b, t, h, w)
83
+
84
+ # -- print ("[INFO] video_L.device: ", video_L.device)
85
+
86
+ # 1) Extract and normalize input videos.
87
+ # -- min_max_norm = lambda x : 2. * (x / 255.) - 1.
88
+ min_max_norm = lambda x: x / 255.
89
+ video_L = min_max_norm(video_L) # ~ (b, t, h, w)
90
+ video_R = min_max_norm(video_R) # ~ (b, t, h, w)
91
+ # -- print ("[INFO] video_L.device: ", video_L.device)
92
+
93
+ # 2) Make the tensors contiguous; a later .view() on a non-contiguous tensor would raise an error.
94
+ video_L = video_L.contiguous()
95
+ video_R = video_R.contiguous()
96
+
97
+ # -- print ("[INFO] video_L.device: ", video_L.device)
98
+
99
+ # 3) Coded exposure modeling.
100
+ snapshot_L = self.sci_enc_L(video_L) # ~ (b, c, h, w) -- c=2 for 2 taps
101
+ snapshot_R = self.sci_enc_R(video_R) # ~ (b, c, h, w) -- c=2 for 2 taps
102
+
103
+ # -- print ("[INFO] self.sci_enc_L.device: ", next(self.sci_enc_R.parameters()).device)
104
+ # -- print ("[INFO] snapshot_L.device: ", snapshot_L.device)
105
+
106
+ # 4) Dynamic Stereo
107
+ output = {}
108
+
109
+ disparities = self.stereo(
110
+ snapshot_L,
111
+ snapshot_R,
112
+ iters=self.train_iters,
113
+ test_mode=False
114
+ )
115
+
116
+ n_views = len(batch["disp"][0]) # -- sample_len
117
+ for i in range(n_views):
118
+ seq_loss, metrics = sequence_loss(
119
+ disparities[:, i], batch["disp"][:, i, 0], batch["valid_disp"][:, i, 0]
120
+ )
121
+
122
+ output[f"disp_{i}"] = {"loss": seq_loss / n_views, "metrics": metrics}
123
+ output["disparity"] = {
124
+ "predictions": torch.cat(
125
+ [disparities[-1, i, 0] for i in range(n_views)], dim=1
126
+ ).detach(),
127
+ }
128
+ return output
129
+
130
+ def fetch_optimizer(args, model):
131
+ """Create the optimizer and learning rate scheduler"""
132
+ optimizer = optim.AdamW(
133
+ model.parameters(), lr=args.lr, weight_decay=args.wdecay, eps=1e-8
134
+ )
135
+ scheduler = optim.lr_scheduler.OneCycleLR(
136
+ optimizer,
137
+ args.lr,
138
+ args.num_steps + 100,
139
+ pct_start=0.01,
140
+ cycle_momentum=False,
141
+ anneal_strategy="linear",
142
+ )
143
+ return optimizer, scheduler
144
+
145
+
146
+ # -- Modified by Chu King on 20th November 2025
147
+ # -- Take snapshots instead of videos as input.
148
+ # -- def forward_batch(batch, model, args):
149
+ def forward_batch(snapshot_L, snapshot_R, model, args):
150
+ output = {}
151
+
152
+ disparities = model(
153
+ # -- batch["img"][:, :, 0],
154
+ # -- batch["img"][:, :, 1],
155
+ snapshot_L,
156
+ snapshot_R,
157
+ iters=args.train_iters,
158
+ test_mode=False,
159
+ )
160
+ num_traj = len(batch["disp"][0])
161
+ for i in range(num_traj):
162
+ seq_loss, metrics = sequence_loss(
163
+ disparities[:, i], batch["disp"][:, i, 0], batch["valid_disp"][:, i, 0]
164
+ )
165
+
166
+ output[f"disp_{i}"] = {"loss": seq_loss / num_traj, "metrics": metrics}
167
+ output["disparity"] = {
168
+ "predictions": torch.cat(
169
+ [disparities[-1, i, 0] for i in range(num_traj)], dim=1
170
+ ).detach(),
171
+ }
172
+ return output
173
+
174
+
175
+ class Lite(LightningLite):
176
+ def run(self, args):
177
+ self.seed_everything(0)
178
+
179
+ # ----------------------------------------- Loading Dataset -----------------------------------------------
180
+ # -- Modified by Chu King on 15th November 2025 to allow quick testing with only 1 training video on the workstation.
181
+ # -- The number of subframes should be fixed for SCI stereo.
182
+ eval_dataloader_dr = datasets.DynamicReplicaDataset(
183
+ # -- split="valid", sample_len=40, only_first_n_samples=1, VERBOSE=False
184
+ split="valid", sample_len=args.sample_len, only_first_n_samples=1, VERBOSE=False
185
+ )
186
+
187
+ eval_dataloader_sintel_clean = datasets.SequenceSintelStereo(dstype="clean")
188
+ eval_dataloader_sintel_final = datasets.SequenceSintelStereo(dstype="final")
189
+
190
+ eval_dataloaders = [
191
+ ("sintel_clean", eval_dataloader_sintel_clean),
192
+ ("sintel_final", eval_dataloader_sintel_final),
193
+ ("dynamic_replica", eval_dataloader_dr),
194
+ ]
195
+
196
+ evaluator = Evaluator()
197
+
198
+ eval_vis_cfg = {
199
+ "visualize_interval": 1, # Use 0 for no visualization
200
+ "exp_dir": args.ckpt_path,
201
+ }
202
+ eval_vis_cfg = DefaultMunch.fromDict(eval_vis_cfg, object())
203
+ evaluator.setup_visualization(eval_vis_cfg)
204
+
205
+ # ----------------------------------------- Model Instantiation -----------------------------------------------
206
+ # -- Added by Chu King on 20th November 2025
207
+ # -- Instantiate the model
208
+ model = wrapper(sigma_range=[0, 1e-9],
209
+ num_frames=args.sample_len,
210
+ in_channels=1,
211
+ n_taps=2,
212
+ resolution=args.image_size,
213
+ mixed_precision=args.mixed_precision,
214
+ attention_type=args.attention_type,
215
+ update_block_3d=args.update_block_3d,
216
+ different_update_blocks=args.different_update_blocks,
217
+ train_iters=args.train_iters)
218
+
219
+ with open(args.ckpt_path + "/meta.json", "w") as file:
220
+ json.dump(vars(args), file, sort_keys=True, indent=4)
221
+
222
+ model.cuda()
223
+
224
+ logging.info("count_parameters(model): {}".format(count_parameters(model)))
225
+
226
+ train_loader = datasets.fetch_dataloader(args)
227
+ train_loader = self.setup_dataloaders(train_loader, move_to_device=False)
228
+
229
+ logging.info(f"Train loader size: {len(train_loader)}")
230
+
231
+ optimizer, scheduler = fetch_optimizer(args, model)
232
+
233
+ total_steps = 0
234
+ logger = Logger(model, scheduler, args.ckpt_path)
235
+
236
+ # ----------------------------------------- Loading Checkpoint -----------------------------------------------
237
+ folder_ckpts = [
238
+ f
239
+ for f in os.listdir(args.ckpt_path)
240
+ if not os.path.isdir(f) and f.endswith(".pth") and not "final" in f
241
+ ]
242
+ if len(folder_ckpts) > 0:
243
+ ckpt_path = sorted(folder_ckpts)[-1]
244
+ ckpt = self.load(os.path.join(args.ckpt_path, ckpt_path))
245
+ logging.info(f"Loading checkpoint {ckpt_path}")
246
+ if "model" in ckpt:
247
+ model.load_state_dict(ckpt["model"])
248
+ else:
249
+ model.load_state_dict(ckpt)
250
+ if "optimizer" in ckpt:
251
+ logging.info("Load optimizer")
252
+ optimizer.load_state_dict(ckpt["optimizer"])
253
+ if "scheduler" in ckpt:
254
+ logging.info("Load scheduler")
255
+ scheduler.load_state_dict(ckpt["scheduler"])
256
+ if "total_steps" in ckpt:
257
+ total_steps = ckpt["total_steps"]
258
+ logging.info(f"Load total_steps {total_steps}")
259
+
260
+ elif args.restore_ckpt is not None:
261
+ assert args.restore_ckpt.endswith(".pth") or args.restore_ckpt.endswith(
262
+ ".pt"
263
+ )
264
+ logging.info("Loading checkpoint...")
265
+ strict = True
266
+
267
+ state_dict = self.load(args.restore_ckpt)
268
+ if "model" in state_dict:
269
+ state_dict = state_dict["model"]
270
+ # -- Since we wrapped the model in torch.nn.DataParallel or torch.nn.parallel.DistributedDataParallel,
271
+ # PyTorch automatically prefixes all parameter names with "module.":
272
+ # state_dict = {
273
+ # 'module.conv1.weight': tensor(...),
274
+ # 'module.conv1.bias': tensor(...),
275
+ # 'module.fc.weight': tensor(...),
276
+ # 'module.fc.bias': tensor(...),
277
+ # }
278
+ # -- So we need to strip the "module." prefix:
279
+ if list(state_dict.keys())[0].startswith("module."):
280
+ state_dict = {
281
+ k.replace("module.", ""): v for k, v in state_dict.items()
282
+ }
283
+ model.load_state_dict(state_dict, strict=strict)
284
+
285
+ logging.info(f"Done loading checkpoint")
286
+ # ----------------------------------------- Optimizer, Scheduler -----------------------------------------------
287
+
288
+ model, optimizer = self.setup(model, optimizer, move_to_device=False)
289
+ model.cuda()
290
+ model.train()
291
+ model.module.module.stereo.freeze_bn() # -- We keep BatchNorm frozen
292
+
293
+ save_freq = args.save_freq
294
+ scaler = GradScaler(enabled=args.mixed_precision)
295
+
296
+ # ----------------------------------------- Training Loop -----------------------------------------------
297
+ should_keep_training = True
298
+ global_batch_num = 0
299
+ epoch = -1
300
+ while should_keep_training:
301
+ epoch += 1
302
+
303
+ for i_batch, batch in enumerate(tqdm(train_loader)):
304
+ optimizer.zero_grad()
305
+ if batch is None:
306
+ print("batch is None")
307
+ continue
308
+
309
+ for k, v in batch.items():
310
+ batch[k] = v.cuda()
311
+
312
+ assert model.training
313
+
314
+ # ---- ---- FORWARD PASS ---- ----
315
+ # -- Modified by Chu King on 20th November 2025
316
+ output = model(batch)
317
+
318
+ loss = 0
319
+ logger.update()
320
+ for k, v in output.items():
321
+ if "loss" in v:
322
+ loss += v["loss"]
323
+ logger.writer.add_scalar(
324
+ f"live_{k}_loss", v["loss"].item(), total_steps
325
+ )
326
+ if "metrics" in v:
327
+ logger.push(v["metrics"], k)
328
+
329
+ if self.global_rank == 0:
330
+ if total_steps % save_freq == save_freq - 1:
331
+ save_ims_to_tb(logger.writer, batch, output, total_steps)
332
+ if len(output) > 1:
333
+ logger.writer.add_scalar(
334
+ f"live_total_loss", loss.item(), total_steps
335
+ )
336
+ logger.writer.add_scalar(
337
+ f"learning_rate", optimizer.param_groups[0]["lr"], total_steps
338
+ )
339
+ global_batch_num += 1
340
+ self.barrier()
341
+
342
+ # ---- ---- BACKWARD PASS ---- ----
343
+ self.backward(scaler.scale(loss))
344
+ scaler.unscale_(optimizer)
345
+
346
+ # -- Prevent exploding gradients in RNNs or very deep networks
347
+ torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
348
+
349
+ scaler.step(optimizer)
350
+ scheduler.step()
351
+ scaler.update()
352
+ total_steps += 1
353
+
354
+ if self.global_rank == 0:
355
+
356
+ if (i_batch >= len(train_loader) - 1) or (
357
+ total_steps == 1 and args.validate_at_start
358
+ ):
359
+ ckpt_iter = "0" * (6 - len(str(total_steps))) + str(total_steps)
360
+ save_path = Path(
361
+ f"{args.ckpt_path}/model_{args.name}_{ckpt_iter}.pth"
362
+ )
363
+
364
+ save_dict = {
365
+ "model": model.module.module.state_dict(),
366
+ "optimizer": optimizer.state_dict(),
367
+ "scheduler": scheduler.state_dict(),
368
+ "total_steps": total_steps,
369
+ }
370
+
371
+ logging.info(f"Saving file {save_path}")
372
+ self.save(save_dict, save_path)
373
+
374
+ # ---- ---- EVALUATION ---- ----
375
+ if epoch % args.evaluate_every_n_epoch == 0:
376
+ # -- Added by Chu King on 21st November 2025
377
+ model.eval()
378
+
379
+ logging.info(f"Evaluation at epoch {epoch}")
380
+ run_test_eval(
381
+ args.ckpt_path,
382
+ "valid",
383
+ evaluator,
384
+ model.module.module.sci_enc_L,
385
+ model.module.module.sci_enc_R,
386
+ model.module.module.stereo,
387
+ eval_dataloaders,
388
+ logger.writer,
389
+ total_steps,
390
+ resolution=args.image_size
391
+ )
392
+
393
+ # -- Added by Chu King on 20th November 2025 for SCI stereo
394
+ model.train()
395
+
396
+ model.module.module.stereo.freeze_bn()
397
+
398
+ self.barrier()
399
+ if total_steps > args.num_steps:
400
+ should_keep_training = False
401
+ break
402
+
403
+ logger.close()
404
+ # ----------------------------------------- Save models after training -----------------------------------------------
405
+ # -- Modified by Chu King on 20th November 2025 to save SCI encoders' models.
406
+ # -- PATH = f"{args.ckpt_path}/{args.name}_final.pth"
407
+ PATH = f"{args.ckpt_path}/{args.name}_model_final.pth"
408
+ torch.save(model.module.module.state_dict(), PATH)
409
+
410
+ # ----------------------------------------- Testing -----------------------------------------------
411
+ # -- Modified by Chu King on 20th November 2025
412
+ test_dataloader_dr = datasets.DynamicReplicaDataset(
413
+ # -- The number of subframes should be fixed for SCI stereo
414
+ # -- split="test", sample_len=150, only_first_n_samples=1
415
+ split="test", sample_len=args.sample_len, only_first_n_samples=1
416
+ )
417
+ test_dataloaders = [
418
+ ("sintel_clean", eval_dataloader_sintel_clean),
419
+ ("sintel_final", eval_dataloader_sintel_final),
420
+ ("dynamic_replica", test_dataloader_dr),
421
+ ]
422
+
423
+ # -- Modified by Chu King on 21st November 2025
424
+ model.eval()
425
+ run_test_eval(
426
+ args.ckpt_path,
427
+ "test",
428
+ evaluator,
429
+ model.module.module.sci_enc_L,
430
+ model.module.module.sci_enc_R,
431
+ model.module.module.stereo,
432
+ test_dataloaders,
433
+ logger.writer,
434
+ total_steps,
435
+ resolution=args.image_size
436
+ )
437
+
438
+
439
+ if __name__ == "__main__":
440
+ parser = argparse.ArgumentParser()
441
+ parser.add_argument("--name", default="dynamic-stereo", help="name your experiment")
442
+ parser.add_argument("--restore_ckpt", help="restore checkpoint")
443
+ parser.add_argument("--ckpt_path", help="path to save checkpoints")
444
+ parser.add_argument(
445
+ "--mixed_precision", action="store_true", help="use mixed precision"
446
+ )
447
+
448
+ # Training parameters
449
+ parser.add_argument(
450
+ "--batch_size", type=int, default=6, help="batch size used during training."
451
+ )
452
+ parser.add_argument(
453
+ "--train_datasets",
454
+ nargs="+",
455
+ default=["things", "monkaa", "driving"],
456
+ help="training datasets.",
457
+ )
458
+ parser.add_argument("--lr", type=float, default=0.0002, help="max learning rate.")
459
+
460
+ parser.add_argument(
461
+ "--num_steps", type=int, default=100000, help="length of training schedule."
462
+ )
463
+ parser.add_argument(
464
+ "--image_size",
465
+ type=int,
466
+ nargs="+",
467
+ default=[320, 720],
468
+ help="size of the random image crops used during training.",
469
+ )
470
+ parser.add_argument(
471
+ "--train_iters",
472
+ type=int,
473
+ default=16,
474
+ help="number of updates to the disparity field in each forward pass.",
475
+ )
476
+ parser.add_argument(
477
+ "--wdecay", type=float, default=0.00001, help="Weight decay in optimizer."
478
+ )
479
+
480
+ parser.add_argument(
481
+ "--sample_len", type=int, default=2, help="length of training video samples"
482
+ )
483
+ parser.add_argument(
484
+ "--validate_at_start", action="store_true", help="validate the model at start"
485
+ )
486
+ parser.add_argument("--save_freq", type=int, default=100, help="save frequency")
487
+
488
+ parser.add_argument(
489
+ "--evaluate_every_n_epoch",
490
+ type=int,
491
+ default=1,
492
+ help="evaluate every n epoch",
493
+ )
494
+
495
+ parser.add_argument(
496
+ "--num_workers", type=int, default=6, help="number of dataloader workers."
497
+ )
498
+ # Validation parameters
499
+ parser.add_argument(
500
+ "--valid_iters",
501
+ type=int,
502
+ default=32,
503
+ help="number of updates to the disparity field in each forward pass during validation.",
504
+ )
505
+ # Architecture choices
506
+ parser.add_argument(
507
+ "--different_update_blocks",
508
+ action="store_true",
509
+ help="use different update blocks for each resolution",
510
+ )
511
+ parser.add_argument(
512
+ "--attention_type",
513
+ type=str,
514
+ help="attention type of the SST and update blocks. \
515
+ Any combination of 'self_stereo', 'temporal', 'update_time', 'update_space' connected by an underscore.",
516
+ )
517
+ parser.add_argument(
518
+ "--update_block_3d", action="store_true", help="use Conv3D update block"
519
+ )
520
+ # Data augmentation
521
+ parser.add_argument(
522
+ "--img_gamma", type=float, nargs="+", default=None, help="gamma range"
523
+ )
524
+ parser.add_argument(
525
+ "--saturation_range",
526
+ type=float,
527
+ nargs="+",
528
+ default=None,
529
+ help="color saturation",
530
+ )
531
+ parser.add_argument(
532
+ "--do_flip",
533
+ default=False,
534
+ choices=["h", "v"],
535
+ help="flip the images horizontally or vertically",
536
+ )
537
+ parser.add_argument(
538
+ "--spatial_scale",
539
+ type=float,
540
+ nargs="+",
541
+ default=[0, 0],
542
+ help="re-scale the images randomly",
543
+ )
544
+ parser.add_argument(
545
+ "--noyjitter",
546
+ action="store_true",
547
+ help="don't simulate imperfect rectification",
548
+ )
549
+ args = parser.parse_args()
550
+
551
+ logging.basicConfig(
552
+ level=logging.INFO,
553
+ format="%(asctime)s %(levelname)-8s [%(filename)s:%(lineno)d] %(message)s",
554
+ )
555
+
556
+ Path(args.ckpt_path).mkdir(exist_ok=True, parents=True)
557
+ from pytorch_lightning.strategies import DDPStrategy
558
+
559
+ Lite(
560
+ # -- strategy=DDPStrategy(find_unused_parameters=True),
561
+ strategy=DDPStrategy(find_unused_parameters=False),
562
+ devices="auto",
563
+ accelerator="gpu",
564
+ precision=32,
565
+ ).run(args)
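
A standalone sketch of the grayscale conversion used in `wrapper.forward`, assuming frames arrive as `(B, T, C, H, W)` RGB in the 0-255 range; the clip below is random data.

```
import torch

def rgb_to_gray(x):
    # BT.601 luma weights, applied over the channel dimension (dim=2)
    weights = torch.tensor([0.2989, 0.5870, 0.1140], dtype=x.dtype, device=x.device)
    return (x * weights[None, None, :, None, None]).sum(dim=2)  # -> (B, T, H, W)

frames = torch.rand(1, 8, 3, 480, 640) * 255.0   # one 8-frame left-view clip
gray = rgb_to_gray(frames) / 255.0               # normalized as in the training loop
print(gray.shape)                                # torch.Size([1, 8, 480, 640])
```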
train_utils/logger.py ADDED
@@ -0,0 +1,67 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+
4
+ # This source code is licensed under the license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ import logging
8
+ import os
9
+
10
+ from torch.utils.tensorboard import SummaryWriter
11
+
12
+
13
+ class Logger:
14
+
15
+ SUM_FREQ = 100
16
+
17
+ def __init__(self, model, scheduler, ckpt_path):
18
+ self.model = model
19
+ self.scheduler = scheduler
20
+ self.total_steps = 0
21
+ self.running_loss = {}
22
+ self.ckpt_path = ckpt_path
23
+ self.writer = SummaryWriter(log_dir=os.path.join(self.ckpt_path, "runs"))
24
+
25
+ def _print_training_status(self):
26
+ metrics_data = [
27
+ self.running_loss[k] / Logger.SUM_FREQ
28
+ for k in sorted(self.running_loss.keys())
29
+ ]
30
+ training_str = "[{:6d}] ".format(self.total_steps + 1)
31
+ metrics_str = ("{:10.4f}, " * len(metrics_data)).format(*metrics_data)
32
+
33
+ # print the training status
34
+ logging.info(
35
+ f"Training Metrics ({self.total_steps}): {training_str + metrics_str}"
36
+ )
37
+
38
+ if self.writer is None:
39
+ self.writer = SummaryWriter(log_dir=os.path.join(self.ckpt_path, "runs"))
40
+ for k in self.running_loss:
41
+ self.writer.add_scalar(
42
+ k, self.running_loss[k] / Logger.SUM_FREQ, self.total_steps
43
+ )
44
+ self.running_loss[k] = 0.0
45
+
46
+ def push(self, metrics, task):
47
+ for key in metrics:
48
+ task_key = str(key) + "_" + task
49
+ if task_key not in self.running_loss:
50
+ self.running_loss[task_key] = 0.0
51
+ self.running_loss[task_key] += metrics[key]
52
+
53
+ def update(self):
54
+ self.total_steps += 1
55
+ if self.total_steps % Logger.SUM_FREQ == Logger.SUM_FREQ - 1:
56
+ self._print_training_status()
57
+ self.running_loss = {}
58
+
59
+ def write_dict(self, results):
60
+ if self.writer is None:
61
+ self.writer = SummaryWriter(log_dir=os.path.join(self.ckpt_path, "runs"))
62
+
63
+ for key in results:
64
+ self.writer.add_scalar(key, results[key], self.total_steps)
65
+
66
+ def close(self):
67
+ self.writer.close()
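
A dry-run sketch of `Logger`, assuming it is imported from `train_utils/logger.py`; `model` and `scheduler` are only stored by the constructor, so `None` placeholders suffice here.

```
from train_utils.logger import Logger

logger = Logger(model=None, scheduler=None, ckpt_path="./debug_logger")
for step in range(200):
    logger.push({"epe": 1.0 / (step + 1)}, task="disp_0")
    logger.update()   # averages and flushes to TensorBoard every SUM_FREQ steps
logger.close()
```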
train_utils/losses.py ADDED
@@ -0,0 +1,158 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+
4
+ # This source code is licensed under the license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ import torch
8
+
9
+ # -- Added by Chu King on 23rd November 2025 to check for NaNs
10
+ import math
11
+
12
+ import matplotlib.pyplot as plt
13
+ import numpy as np
14
+
15
+ def flow_to_rgb(flow):
16
+ # flow: [2, H, W]
17
+ u = flow[0]
18
+ v = flow[1]
19
+ rad = np.sqrt(u ** 2 + v ** 2)
20
+ ang = np.arctan2(v, u)
21
+
22
+ hsv = np.zeros((flow.shape[1], flow.shape[2], 3), dtype=np.float32)
23
+ hsv[..., 0] = (ang + np.pi) / (2 * np.pi)
24
+ hsv[..., 1] = 1.0
25
+ hsv[..., 2] = np.clip(rad / np.percentile(rad, 99), 0, 1)
26
+
27
+ rgb = plt.cm.hsv(hsv)
28
+ return rgb[..., :3]
29
+
30
+ def visualize_flow_debug(flow_pred, flow_gt, epe, step=0, save_path="debug"):
31
+ flow_pred_np = flow_pred.detach().cpu().numpy()
32
+ flow_gt_np = flow_gt.detach().cpu().numpy()
33
+ epe_np = epe
34
+
35
+ flow_pred0 = flow_pred_np[0, 0, :, :]
36
+ flow_gt0 = flow_gt_np[0, 0, :, :]
37
+ epe0 = epe_np
38
+
39
+ fig, axs = plt.subplots(1, 2, figsize=(15, 5))
40
+
41
+ axs[0].imshow(flow_to_rgb(flow_pred0))
42
+ axs[0].set_title("Predicted Flow")
43
+ axs[0].axis("off")
44
+
45
+ axs[1].imshow(flow_to_rgb(flow_gt0))
46
+ axs[1].set_title("Ground Truth Flow")
47
+ axs[1].axis("off")
48
+
49
+ # -- axs[2].imshow(epe0, cmap="inferno")
50
+ # -- axs[2].set_title("EPE heatmap")
51
+ # -- axs[2].axis("off")
52
+
53
+ fig.suptitle(f"STEP = {step}")
54
+
55
+ plt.tight_layout()
56
+ os.makedirs(save_path, exist_ok=True)  # make sure the output folder exists
+ plt.savefig(f"{save_path}/flow_debug_{step}.png")
57
+ plt.close()
58
+
59
+ def sequence_loss(flow_preds, flow_gt, valid, loss_gamma=0.9, max_flow=700):
60
+ """Loss function defined over sequence of flow predictions"""
61
+ n_predictions = len(flow_preds)
62
+ assert n_predictions >= 1
63
+ flow_loss = 0.0
64
+ # exclude invalid pixels and extremely large displacements
65
+ mag = torch.sum(flow_gt ** 2, dim=1).sqrt().unsqueeze(1)
66
+
67
+ if len(valid.shape) != len(flow_gt.shape):
68
+ valid = valid.unsqueeze(1)
69
+
70
+ valid = (valid >= 0.5) & (mag < max_flow)
71
+
72
+ if valid.shape != flow_gt.shape:
73
+ valid = torch.cat([valid, valid], dim=1)
74
+ assert valid.shape == flow_gt.shape, [valid.shape, flow_gt.shape]
75
+ assert not torch.isinf(flow_gt[valid.bool()]).any()
76
+
77
+ for i in range(n_predictions):
78
+ assert (
79
+ not torch.isnan(flow_preds[i]).any()
80
+ and not torch.isinf(flow_preds[i]).any()
81
+ )
82
+
83
+ if n_predictions == 1:
84
+ i_weight = 1
85
+ else:
86
+ # We adjust the loss_gamma so it is consistent for any number of iterations
87
+ adjusted_loss_gamma = loss_gamma ** (15 / (n_predictions - 1))
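+ # With this adjustment the earliest prediction always receives weight
+ # loss_gamma ** 15 (adjusted_loss_gamma ** (n_predictions - 1) == loss_gamma ** 15),
+ # regardless of how many refinement iterations were actually run.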
88
+ i_weight = adjusted_loss_gamma ** (n_predictions - i - 1)
89
+
90
+ flow_pred = flow_preds[i].clone()
91
+ if valid.shape[1] == 1 and flow_preds[i].shape[1] == 2:
92
+ flow_pred = flow_pred[:, :1]
93
+
94
+ i_loss = (flow_pred - flow_gt).abs()
95
+
96
+ assert i_loss.shape == valid.shape, [
97
+ i_loss.shape,
98
+ valid.shape,
99
+ flow_gt.shape,
100
+ flow_pred.shape,
101
+ ]
102
+ flow_loss += i_weight * i_loss[valid.bool()].mean()
103
+
104
+ epe = torch.sum((flow_preds[-1] - flow_gt) ** 2, dim=1).sqrt()
105
+
106
+ valid = valid[:, 0]
107
+ epe = epe.view(-1)
108
+ epe = epe[valid.reshape(epe.shape)]
109
+
110
+ # -- Added by Chu King to deal with the case when there is no valid disparity.
111
+ if valid.sum().item() == 0:
112
+ metrics = {"epe": 0.0, "1px": 0.0, "3px": 0.0, "5px": 0.0}
113
+ else:
114
+ metrics = {
115
+ "epe": epe.mean().item(),
116
+ "1px": (epe < 1).float().mean().item(),
117
+ "3px": (epe < 3).float().mean().item(),
118
+ "5px": (epe < 5).float().mean().item(),
119
+ }
120
+
121
+ for k, v in metrics.items():
122
+ if math.isnan(v):
123
+ print ("[ERROR] Nan detected for k: ", k)
124
+ if torch.isnan(flow_preds[-1]).any(): print("[WARNING] NaN in flow_preds")
125
+ if torch.isinf(flow_preds[-1]).any(): print("[WARNING] Inf in flow_preds")
126
+ if torch.isnan(flow_gt).any(): print("[WARNING] NaN in flow_gt")
127
+ if torch.isinf(flow_gt).any(): print("[WARNING] Inf in flow_gt")
128
+
129
+ raw_diff = flow_preds[-1] - flow_gt
130
+ if torch.isnan(raw_diff).any(): print("[WARNING] NaN in flow_diff")
131
+
132
+ sq = (raw_diff ** 2)
133
+ if torch.isnan(sq).any(): print("[WARNING] NaN in square")
134
+
135
+ sum_sq = torch.sum(sq, dim=1)
136
+ if torch.isnan(sum_sq).any(): print("[WARNING] NaN in sum")
137
+
138
+ epe = sum_sq.sqrt()
139
+ if torch.isnan(epe).any(): print("[WARNING] NaN in sqrt")
140
+ if torch.isinf(epe).any(): print("[WARNING] Inf in sqrt")
141
+
142
+ num_valid = valid.sum().item()
143
+ print("[INFO] Valid pixels:", num_valid)
144
+ if num_valid == 0:
145
+ print("[WARNING]: No valid pixels metrics will be NaN.")
146
+
147
+ if (epe > 1e6).any():
148
+ print("[INFP] Large EPE values detected:", epe.max().item())
149
+
150
+ print ("[INFO] Flow pred sample:", flow_preds[-1].view(-1)[:10])
151
+ print ("[INFO] Flow gt sample:", flow_gt.view(-1)[:10])
152
+ print ("[INFO] EPE sample:", epe.view(-1)[:10])
153
+ print ("[INFO] Valid sample:", valid.view(-1)[:10])
154
+
155
+ visualize_flow_debug(flow_preds[-1], flow_gt, v, step=0, save_path="debug")
156
+ raise SystemExit("Nan detected.")
157
+
158
+ return flow_loss, metrics
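+
+ # Minimal usage sketch (assumed shapes: each prediction and flow_gt are
+ # [B, 2, H, W] tensors and valid is a [B, H, W] mask; all names below are
+ # placeholders, not part of this repository):
+ #   preds = model(left_images, right_images)  # list of per-iteration predictions
+ #   loss, metrics = sequence_loss(preds, flow_gt, valid)
+ #   loss.backward()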
train_utils/utils.py ADDED
@@ -0,0 +1,180 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+
4
+ # This source code is licensed under the license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ import numpy as np
8
+ import os
9
+ import torch
10
+
11
+ import json
12
+ import flow_vis
13
+ import matplotlib.pyplot as plt
14
+
15
+ import datasets.dynamic_stereo_datasets as datasets
16
+ from evaluation.utils.utils import aggregate_and_print_results
17
+
18
+
19
+ def count_parameters(model):
20
+ return sum(p.numel() for p in model.parameters() if p.requires_grad)
21
+
22
+
23
+ def run_test_eval(ckpt_path, eval_type, evaluator, sci_enc_L, sci_enc_R, model, dataloaders, writer, step, resolution=[480, 640]):
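+ # Evaluate the model on every provided dataloader, log the selected metrics
+ # to TensorBoard and dump the aggregated results to a JSON file in ckpt_path.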
24
+
25
+ # -- Evalution of real scenes disabled by Chu King on 16th November 2025 as depth data
26
+ # are not available.
27
+ # -- for real_sequence_name in ["teddy_static", "ignacio_waving", "nikita_reading"]:
28
+ # -- seq_len_real = 50
29
+ # -- ds_path = f"./dynamic_replica_data/real/{real_sequence_name}"
30
+ # -- real_dataset = datasets.DynamicReplicaDataset(
31
+ # -- split="test", root=ds_path, sample_len=seq_len_real, only_first_n_samples=1,
32
+ # -- VERBOSE=False # -- Added by Chu King on 16th November 2025 for debugging purposes
33
+ # -- )
34
+
35
+ # -- evaluator.evaluate_sequence(
36
+ # -- model=model.module.module,
37
+ # -- test_dataloader=real_dataset,
38
+ # -- writer=writer,
39
+ # -- step=step,
40
+ # -- train_mode=True,
41
+ # -- )
42
+
43
+ for ds_name, dataloader in dataloaders:
44
+ evaluator.visualize_interval = 1 if "sintel" not in ds_name else 0
45
+
46
+ evaluate_result = evaluator.evaluate_sequence(
47
+ sci_enc_L=sci_enc_L,
48
+ sci_enc_R=sci_enc_R,
49
+ model=model,
50
+ test_dataloader=dataloader,
51
+ writer=writer if "sintel" not in ds_name else None,
52
+ step=step,
53
+ train_mode=True,
54
+ resolution=resolution
55
+ )
56
+
57
+ aggregate_result = aggregate_and_print_results(
58
+ evaluate_result,
59
+ )
60
+
61
+ save_metrics = [
62
+ "flow_mean_accuracy_5px",
63
+ "flow_mean_accuracy_3px",
64
+ "flow_mean_accuracy_1px",
65
+ "flow_epe_traj_mean",
66
+ ]
67
+ for epe_name in ("epe", "temp_epe", "temp_epe_r"):
68
+ for m in [
69
+ f"disp_{epe_name}_bad_0.5px",
70
+ f"disp_{epe_name}_bad_1px",
71
+ f"disp_{epe_name}_bad_2px",
72
+ f"disp_{epe_name}_bad_3px",
73
+ f"disp_{epe_name}_mean",
74
+ ]:
75
+ save_metrics.append(m)
76
+
77
+ for k, v in aggregate_result.items():
78
+ if k in save_metrics:
79
+ writer.add_scalars(
80
+ f"{ds_name}_{k.rsplit('_', 1)[0]}",
81
+ {f"{ds_name}_{k}": v},
82
+ step,
83
+ )
84
+
85
+ result_file = os.path.join(
86
+ ckpt_path,
87
+ f"result_{ds_name}_{eval_type}_{step}_mimo.json",
88
+ )
89
+ print(f"Dumping {eval_type} results to {result_file}.")
90
+ with open(result_file, "w") as f:
91
+ json.dump(aggregate_result, f)
92
+
93
+
94
+ def fig2data(fig):
95
+ """
96
+ fig = plt.figure()
97
+ image = fig2data(fig)
98
+ @brief Convert a Matplotlib figure to a 4D numpy array with RGBA channels and return it
99
+ @param fig a matplotlib figure
100
+ @return a numpy 3D array of RGBA values
101
+ """
102
+ import PIL.Image as Image
103
+
104
+ # draw the renderer
105
+ fig.canvas.draw()
106
+
107
+ # Get the RGBA buffer from the figure
108
+ w, h = fig.canvas.get_width_height()
109
+ buf = np.frombuffer(fig.canvas.tostring_rgb(), dtype=np.uint8)
110
+ buf.shape = (w, h, 3)
111
+
112
+ image = Image.frombytes("RGB", (w, h), buf.tobytes())
113
+ image = np.asarray(image)
114
+ return image
115
+
116
+
117
+ def save_ims_to_tb(writer, batch, output, total_steps):
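+ # Log the stereo input pair, the validity-masked ground-truth disparity and
+ # the per-task predictions / ground truth to TensorBoard as images.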
118
+ writer.add_image(
119
+ "train_im",
120
+ torch.cat([torch.cat([im[0], im[1]], dim=-1) for im in batch["img"][0]], dim=-2)
121
+ / 255.0,
122
+ total_steps,
123
+ dataformats="CHW",
124
+ )
125
+ if "disp" in batch and len(batch["disp"]) > 0:
126
+ disp_im = [
127
+ (torch.cat([im[0], im[1]], dim=-1) * torch.cat([val[0], val[1]], dim=-1))
128
+ for im, val in zip(batch["disp"][0], batch["valid_disp"][0])
129
+ ]
130
+
131
+ disp_im = torch.cat(disp_im, dim=1)
132
+
133
+ figure = plt.figure()
134
+ plt.imshow(disp_im.cpu()[0])
135
+ disp_im = fig2data(figure).copy()
136
+
137
+ writer.add_image(
138
+ "train_disp",
139
+ disp_im,
140
+ total_steps,
141
+ dataformats="HWC",
142
+ )
143
+
144
+ for k, v in output.items():
145
+ if "predictions" in v:
146
+ pred = v["predictions"]
147
+ if k == "disparity":
148
+ figure = plt.figure()
149
+ plt.imshow(pred.cpu()[0])
150
+ pred = fig2data(figure).copy()
151
+ dataformat = "HWC"
152
+ else:
153
+ pred = torch.tensor(
154
+ flow_vis.flow_to_color(
155
+ pred.permute(1, 2, 0).cpu().numpy(), convert_to_bgr=False
156
+ )
157
+ / 255.0
158
+ )
159
+ dataformat = "HWC"
160
+ writer.add_image(
161
+ f"pred_{k}",
162
+ pred,
163
+ total_steps,
164
+ dataformats=dataformat,
165
+ )
166
+ if "gt" in v:
167
+ gt = v["gt"]
168
+ gt = torch.tensor(
169
+ flow_vis.flow_to_color(
170
+ gt.permute(1, 2, 0).cpu().numpy(), convert_to_bgr=False
171
+ )
172
+ / 255.0
173
+ )
174
+ dataformat = "HWC"
175
+ writer.add_image(
176
+ f"gt_{k}",
177
+ gt,
178
+ total_steps,
179
+ dataformats=dataformat,
180
+ )