QQ linear #2931
Conversation
This reverts commit 867e0fc.
python/mlx/nn/layers/quantized.py (Outdated)

    mode=self.mode,
)

def train(self):
Can you make that API consistent with Module::train?
I think if you do that we can get rid of the eval override above and just use the base class eval.
Of course, this is a very good point. Now the weights will be quantized on the first qq_linear.__call__() after calling Module::eval(), and likewise dequantized on the first qq_linear.__call__() after calling Module::train(). This seems to be the only way to keep it consistent with the current API without changing Module::train().
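For illustration, here is a minimal sketch of that lazy switch, not the actual QQLinear code from this PR: the class structure and the _is_quantized bookkeeping flag are made up for this example; only mx.quantize, mx.dequantize, and mx.quantized_matmul are existing mlx ops.

```python
import mlx.core as mx

class LazyQQLinearSketch:
    """Toy illustration of the lazy switch described above (not QQLinear):
    the weight changes representation on the first call after the mode flips."""

    def __init__(self, in_dims, out_dims, group_size=64, bits=4):
        self.training = True          # mirrors Module.training
        self._is_quantized = False    # made-up bookkeeping flag
        self.group_size, self.bits = group_size, bits
        self.weight = mx.random.normal((out_dims, in_dims))

    def __call__(self, x):
        if not self.training and not self._is_quantized:
            # first call after eval(): switch to the quantized representation
            self.weight, self.scales, self.biases = mx.quantize(
                self.weight, self.group_size, self.bits
            )
            self._is_quantized = True
        elif self.training and self._is_quantized:
            # first call after train(): switch back to full precision
            self.weight = mx.dequantize(
                self.weight, self.scales, self.biases, self.group_size, self.bits
            )
            self._is_quantized = False
        if self._is_quantized:
            return mx.quantized_matmul(
                x, self.weight, self.scales, self.biases,
                transpose=True, group_size=self.group_size, bits=self.bits
            )
        return x @ self.weight.T
```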
I see why you had to do it this way but I'm not crazy about how it works... I'm wondering if there is a better way to do it.
Basically the behavior that would be good to have is if we do:
qq_module.eval()
qq_module.parameters() # should give me the quantized params
qq_module.load_weights(quantized_weights) # should be able to load the quantized params
I think that should work and right now it won't.
I think we have some other options:
- Break away from the train/eval API and have something like QQLinear.quantize / QQLinear.dequantize, which either quantizes/dequantizes the module in-place (or maybe returns a copy that is quantized/dequantized).
- Change the base class Module to make it easier to override train (e.g. call the submodules' train as well as setting the local module's _training).
Wdyt?
Fully agree with the desired behavior you described. Between the options, I prefer (2). I would expect model.train() / model.eval() to recursively propagate mode changes through the tree, and adding a separate quantize() / dequantize() API would create a parallel mode system that would likely need to be kept aligned with train / eval as well.
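As a rough, purely hypothetical sketch of what option (2) could look like (the on_mode_change hook and the simplified traversal below are not existing mlx API): the base train() flips _training on every submodule and gives each one a chance to react, so a QQLinear could quantize/dequantize eagerly when the mode changes.

```python
# Purely hypothetical sketch of option (2); none of these names are existing
# mlx API. The idea: Module.train() recursively flips the mode and calls a
# hook, so QQLinear can switch representations eagerly instead of lazily.
class ModuleSketch:
    def __init__(self):
        self._training = True
        self._children = []

    def modules(self):
        # stand-in for the real recursive traversal
        out = [self]
        for c in self._children:
            out.extend(c.modules())
        return out

    def train(self, mode: bool = True):
        for m in self.modules():
            m._training = mode
            m.on_mode_change(mode)   # hypothetical per-module hook
        return self

    def eval(self):
        return self.train(False)

    def on_mode_change(self, mode: bool):
        pass  # default: nothing to do


class QQLinearSketch(ModuleSketch):
    def on_mode_change(self, mode: bool):
        # quantize when entering eval, dequantize when going back to training,
        # so parameters() / load_weights() immediately see the right arrays
        if mode:
            self._dequantize_in_place()
        else:
            self._quantize_in_place()

    def _quantize_in_place(self):
        ...  # placeholder

    def _dequantize_in_place(self):
        ...  # placeholder
```

With something along these lines, the eval() → parameters() → load_weights() flow from the earlier comment would see the quantized arrays right away, without waiting for the first forward call.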
awni left a comment:
Looks very nice! Just a few more cosmetic comments, then we should get it merged!
awni left a comment:
Awesome, thanks!!
QQLinear layer