Commit 45c1841
Use kernel device-specific descriptor to determine max-wg-size for this kernel
This resolves the following error:
```
RuntimeError: Exceeded the number of registers available on the hardware.
The number registers per work-group cannot exceed 65536 for this kernel on this device.
The kernel uses 108 registers per work-item for a total of 1024 work-items per work-group.
-54 (PI_ERROR_INVALID_WORK_GROUP_SIZE)
```
when running this example:
```python
import dpctl.tensor as dpt
m1 = dpt.ones((1000, 1000), dtype="i4", device="cuda")
m2 = dpt.ones((1000, 1003), dtype="i4", device="cuda")
r = dpt.matmul(m1[:, :900], m2[:900, :])
```

1 parent 6efb2c9 commit 45c1841
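
The arithmetic behind the error message can be checked with a short sketch. The three constants are taken from the error text above; the helper names and the rounding granularity of 32 work-items are assumptions made for illustration. The commit itself obtains the limit from the kernel's device-specific descriptor rather than computing it this way.

```python
# Values quoted in the error message above.
REGISTER_LIMIT_PER_WORK_GROUP = 65536
REGISTERS_PER_WORK_ITEM = 108
REQUESTED_WORK_GROUP_SIZE = 1024


def registers_needed(work_group_size, regs_per_item=REGISTERS_PER_WORK_ITEM):
    """Total registers a work-group of the given size would consume."""
    return work_group_size * regs_per_item


def largest_fitting_work_group_size(regs_per_item, reg_limit, granularity=32):
    """Hypothetical helper: largest work-group size that stays within the
    register limit, rounded down to a multiple of `granularity`.

    The granularity of 32 is an assumption, not a value from the commit.
    """
    max_items = reg_limit // regs_per_item
    return (max_items // granularity) * granularity


needed = registers_needed(REQUESTED_WORK_GROUP_SIZE)
fits = needed <= REGISTER_LIMIT_PER_WORK_GROUP
clamped = largest_fitting_work_group_size(
    REGISTERS_PER_WORK_ITEM, REGISTER_LIMIT_PER_WORK_GROUP
)

print(f"requested size {REQUESTED_WORK_GROUP_SIZE}: {needed} registers, fits={fits}")
print(f"clamped size {clamped}: {registers_needed(clamped)} registers")
```

With 108 registers per work-item, 1024 work-items exceed the 65536-register budget, so a limit that depends on the specific kernel (and not only on the device) is needed to choose a launchable work-group size.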
1 file changed: 5 additions and 2 deletions, under dpctl/tensor/libtensor/include/kernels/linalg_functions (original lines 1365-1374, new lines 1365-1377).