Update README.md
README.md
@@ -51,7 +51,7 @@ ProFS (Projection Filter for Subspaces) is a tuning-free alignment method that r
 - Theoretically grounded: shown to be a denoised, single-step approximation of Direct Preference Optimization (DPO)—bridging editing-based and tuning-based alignment.
 
 <div align="center">
-<img src="
+<img src="ProFS Method.png" width="950">
 <i><b>Figure.</b> Schematic of ProFS (previously called DeTox). Toxic directions (in red) are projected out of the model’s MLP-value matrices, leaving other representational directions intact. </i>
 </div>
 
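The figure caption describes projecting toxic directions out of the MLP-value matrices while leaving other directions intact. Below is a minimal pure-Python sketch of only that projection step, computing (I - U Uᵀ)v for each value vector v, where U is an orthonormal basis for the toxic directions. It assumes the toxic directions are already given and live in the same space as each value vector (stored here as one vector per row). How ProFS actually estimates those directions from preference data, and which axis of the real weight matrix it projects, is not shown. The names `orthonormalize`, `project_out`, and `edit_value_matrix` are hypothetical and do not come from the ProFS codebase.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(directions: List[Vector]) -> List[Vector]:
    """Gram-Schmidt: turn raw (hypothetical) toxic directions into an orthonormal basis U."""
    basis: List[Vector] = []
    for v in directions:
        w = list(v)
        for u in basis:
            c = dot(w, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = dot(w, w) ** 0.5
        if norm > 1e-12:  # skip directions already spanned by the basis
            basis.append([wi / norm for wi in w])
    return basis


def project_out(vec: Vector, basis: List[Vector]) -> Vector:
    """Return (I - U U^T) vec, i.e. vec with its component in span(U) removed."""
    out = list(vec)
    for u in basis:
        c = dot(vec, u)  # basis is orthonormal, so coefficients come from the original vec
        out = [oi - c * ui for oi, ui in zip(out, u)]
    return out


def edit_value_matrix(value_vectors: List[Vector], toxic_directions: List[Vector]) -> List[Vector]:
    """Assumed layout: one value vector per row; every row is projected independently."""
    basis = orthonormalize(toxic_directions)
    return [project_out(v, basis) for v in value_vectors]


values = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
edited = edit_value_matrix(values, [[1.0, 0.0, 0.0]])
print(edited)  # [[0.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
```

In this sketch, only the component along the assumed toxic direction (the first axis) is removed. Every orthogonal component passes through unchanged, which matches the caption's "leaving other representational directions intact". Because the operator is a projection, applying it a second time has no further effect.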