ZImageTransformer2DModel
A Transformer model for image-like data from Z-Image.
ZImageTransformer2DModel
class diffusers.ZImageTransformer2DModel
( all_patch_size = (2,), all_f_patch_size = (1,), in_channels = 16, dim = 3840, n_layers = 30, n_refiner_layers = 2, n_heads = 30, n_kv_heads = 30, norm_eps = 1e-05, qk_norm = True, cap_feat_dim = 2560, siglip_feat_dim = None, rope_theta = 256.0, t_scale = 1000.0, axes_dims = [32, 48, 48], axes_lens = [1024, 512, 512] )
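A minimal construction sketch. The scaled-down configuration below is a hypothetical choice for a quick smoke test, not a released checkpoint; it assumes the RoPE axis dimensions must sum to the per-head width, mirroring the defaults (3840 / 30 heads = 128 = 32 + 48 + 48).

```python
from diffusers import ZImageTransformer2DModel

# Hypothetical tiny config (an assumption, not a shipped configuration).
# We keep the apparent invariant from the defaults: sum(axes_dims) equals
# dim / n_heads (here 256 / 4 = 64 = 16 + 24 + 24).
model = ZImageTransformer2DModel(
    in_channels=16,
    dim=256,                 # small width for illustration
    n_layers=2,
    n_refiner_layers=1,
    n_heads=4,
    n_kv_heads=4,
    cap_feat_dim=64,         # hypothetical caption feature width
    axes_dims=[16, 24, 24],  # sums to the per-head dim of 64
    axes_lens=[1024, 512, 512],
)
print(sum(p.numel() for p in model.parameters()))
```

Calling the constructor with no arguments builds the full-size default (30 layers at width 3840), which is impractical for a quick local check.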
forward
( x: typing.Union[typing.List[torch.Tensor], typing.List[typing.List[torch.Tensor]]], t, cap_feats: typing.Union[typing.List[torch.Tensor], typing.List[typing.List[torch.Tensor]]], return_dict: bool = True, controlnet_block_samples: typing.Optional[typing.Dict[int, torch.Tensor]] = None, siglip_feats: typing.Optional[typing.List[typing.List[torch.Tensor]]] = None, image_noise_mask: typing.Optional[typing.List[typing.List[int]]] = None, patch_size: int = 2, f_patch_size: int = 1 )
Forward flow: patchify -> t_embed -> x_embed -> x_refine -> cap_embed -> cap_refine -> [siglip_embed -> siglip_refine, only when SigLIP features are provided] -> build_unified -> main_layers -> final_layer -> unpatchify
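A sketch of a basic-mode forward pass under stated assumptions about the input layout, which is inferred from the signature and parameter names rather than documented here: each latent is a `(in_channels, F, H, W)` tensor carried in a Python list with one entry per batch item, caption features are `(seq_len, cap_feat_dim)` tensors, and `t` is a per-item timestep tensor (scaled internally by `t_scale`).

```python
import torch
from diffusers import ZImageTransformer2DModel

# Same hypothetical tiny config as in the constructor sketch above.
model = ZImageTransformer2DModel(
    in_channels=16, dim=256, n_layers=2, n_refiner_layers=1,
    n_heads=4, n_kv_heads=4, cap_feat_dim=64, axes_dims=[16, 24, 24],
)

# Assumed shapes (inferred, not confirmed by the source):
#   x:         list of (in_channels, F, H, W) latents, one per batch item
#   cap_feats: list of (seq_len, cap_feat_dim) caption embeddings
#   t:         (batch,) timestep tensor
x = [torch.randn(16, 1, 64, 64) for _ in range(2)]   # F=1 for still images
cap_feats = [torch.randn(77, 64) for _ in range(2)]  # hypothetical seq length
t = torch.rand(2)

with torch.no_grad():
    out = model(x=x, t=t, cap_feats=cap_feats, return_dict=True)
# Per the flow above, the prediction is unpatchified back to the input
# latent layout before being returned on the output object.
```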
patchify_and_embed
( all_image: typing.List[torch.Tensor], all_cap_feats: typing.List[torch.Tensor], patch_size: int, f_patch_size: int )
Patchify for basic mode: a single image per batch item.
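As a rough illustration of the patchify step, the token count per image follows the usual ViT-style arithmetic; assuming each `(f_patch_size, patch_size, patch_size)` block of the latent becomes one transformer token (an assumption based on the parameter names):

```python
def num_patch_tokens(f, h, w, patch_size=2, f_patch_size=1):
    # Each (f_patch_size x patch_size x patch_size) block becomes one token;
    # the temporal and spatial extents are assumed to be divisible.
    assert f % f_patch_size == 0 and h % patch_size == 0 and w % patch_size == 0
    return (f // f_patch_size) * (h // patch_size) * (w // patch_size)

# A 1-frame, 64x64 latent with the default patch sizes:
print(num_patch_tokens(1, 64, 64))  # 1 * 32 * 32 = 1024 tokens
```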
patchify_and_embed_omni
( all_x: typing.List[typing.List[torch.Tensor]], all_cap_feats: typing.List[typing.List[torch.Tensor]], all_siglip_feats: typing.List[typing.List[torch.Tensor]], patch_size: int, f_patch_size: int, images_noise_mask: typing.List[typing.List[int]] )
Patchify for omni mode: multiple images per batch item, with a per-image noise mask.
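A sketch of how the omni-mode nesting and the noise mask plausibly line up, assuming one inner list per batch item and a parallel 0/1 mask that marks noised denoising targets versus clean conditioning images; the exact mask semantics are an assumption from the parameter name.

```python
import torch

# Batch of 2 items: the first carries a clean reference image plus a noised
# target, the second a single noised image. The 0/1 semantics of the mask
# (1 = noised target, 0 = clean condition) are assumed, not documented.
all_x = [
    [torch.randn(16, 1, 64, 64), torch.randn(16, 1, 64, 64)],
    [torch.randn(16, 1, 64, 64)],
]
all_cap_feats = [[torch.randn(77, 64)], [torch.randn(77, 64)]]
image_noise_mask = [[0, 1], [1]]

# forward would then receive the nested lists, e.g.:
# out = model(x=all_x, t=torch.rand(2), cap_feats=all_cap_feats,
#             image_noise_mask=image_noise_mask)
```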