@medyagh
Member

@medyagh medyagh commented Dec 14, 2025

No description provided.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 14, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: medyagh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from nirs and prezha December 14, 2025 21:29
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Dec 14, 2025
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Dec 14, 2025
./out/minikube start \
--no-kubernetes \
--memory 4gb \
--cpus 2 \

Contributor

1 CPU is known to work; 2 CPUs never worked for me with the Lima VM.

--cpus 2 \
--memory 2gb \
--driver ${{ matrix.driver }} \
--wait-timeout=15m \

Contributor

Why? If we cannot boot within the default timeout (6m), something is broken and we will never boot.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 14, 2025
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 15, 2025
@medyagh
Member Author

medyagh commented Dec 15, 2025

With 1200 MB of memory I got these serial logs from QEMU:

I1215 00:17:53.673152   57107 main.go:143] libmachine: Attempt 129
I1215 00:17:53.673202   57107 main.go:143] libmachine: Searching for 7e:96:47:e6:81:cf in /var/db/dhcpd_leases ...
I1215 00:17:55.674463   57107 main.go:143] libmachine: Attempt 130
I1215 00:17:55.674510   57107 main.go:143] libmachine: Searching for 7e:96:47:e6:81:cf in /var/db/dhcpd_leases ...
I1215 00:17:55.674634   57107 main.go:143] libmachine: qemu status: ip lookup still failing (attempt 130)
I1215 00:17:55.674743   57107 main.go:143] libmachine: qemu status: pid=57371 running=true
I1215 00:17:55.676360   57107 main.go:143] libmachine: qemu status: QMP state=running
I1215 00:17:55.676546   57107 main.go:143] libmachine: qemu status: serial.log tail (/Users/runner/.minikube/machines/minikube/serial.log, last 2041 bytes):
[   17.654784] Kernel panic - not syncing: System is deadlocked on memory
[   17.654784] CPU: 0 PID: 25 Comm: kworker/u2:2 Not tainted 6.6.95 #1
[   17.654784] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[   17.654784] Workqueue: events_unbound async_run_entry_fn
[   17.654784] Call Trace:
[   17.654784]  <TASK>
[   17.654784]  dump_stack_lvl+0x36/0x50
[   17.654784]  panic+0x179/0x330
[   17.654784]  out_of_memory+0x58e/0x590
[   17.654784]  __alloc_pages_slowpath.constprop.0+0xa91/0xd40
[   17.654784]  __alloc_pages+0x30b/0x330
[   17.654784]  folio_alloc+0x15/0x30
[   17.654784]  __filemap_get_folio+0x17b/0x2c0
[   17.654784]  simple_write_begin+0x2e/0x180
[   17.654784]  generic_perform_write+0xd9/0x240
[   17.654784]  generic_file_write_iter+0x65/0xd0
[   17.654784]  __kernel_write_iter+0xda/0x250
[   17.654784]  kernel_write+0xf9/0x1c0
[   17.654784]  xwrite.constprop.0+0x35/0xb0
[   17.654784]  ? __pfx_flush_buffer+0x10/0x10
[   17.654784]  do_copy+0x52/0x1a0
[   17.654784]  flush_buffer+0x3e/0xb0
[   17.654784]  ? __pfx_nofill+0x10/0x10
[   17.654784]  gunzip+0x28b/0x370
[   17.654784]  unpack_to_rootfs+0x175/0x390
[   17.654784]  ? __pfx_error+0x10/0x10
[   17.654784]  ? _printk+0x64/0x80
[   17.654784]  ? do_populate_rootfs+0x7d/0x140
[   17.654784]  do_populate_rootfs+0x7d/0x140
[   17.654784]  async_run_entry_fn+0x2a/0xb0
[   17.654784]  process_one_work+0x14b/0x340
[   17.654784]  worker_thread+0x2f5/0x410
[   17.654784]  ? __pfx_worker_thread+0x10/0x10
[   17.654784]  kthread+0xe8/0x120
[   17.654784]  ? __pfx_kthread+0x10/0x10
[   17.654784]  ret_from_fork+0x34/0x50
[   17.654784]  ? __pfx_kthread+0x10/0x10
[   17.654784]  ret_from_fork_asm+0x1b/0x30
[   17.654784]  </TASK>
[   17.654784] Kernel Offset: 0x2c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[   17.654784] ---[ end Kernel panic - not syncing: System is deadlocked on memory ]--

if runtime.GOARCH == "amd64" {
cmdlineParts = append(cmdlineParts, "console=ttyS0")
}
cmdline := strings.Join(cmdlineParts, " ")

Contributor

This does not work, as you probably discovered. The current code is correct and does not need to change.

Member Author

This is just a debugging PR; everything here will be removed once it works.

run: ./out/minikube stop ${{ env.LOG_ARGS }}
- name: Start minikube again (2nd boot)
run: ./out/minikube start ${{ env.LOG_ARGS }}
run: ./out/minikube start --force ${{ env.LOG_ARGS }}

Contributor

We don't need --force since we changed the memory to 2g.

// Surface qemu state when DHCP discovery is slow or stuck.
if i > 0 && i%10 == 0 {
d.logQEMUStatus(fmt.Sprintf("ip lookup still failing (attempt %d)", i))
}

Contributor

Instead of logging, this should check if qemu is running and abort the wait quickly if qemu terminated.

env:
GOPROXY: https://proxy.golang.org
LOG_ARGS: --v=8 --alsologtostderr
LOG_ARGS: --v=10 --alsologtostderr

Contributor

I don't think this is needed. The logs we are missing are from waiting for the VM to start; the problem is not the verbosity level. This will make the k8s client logging much more verbose for no benefit.

run: |
sudo killall socketfilterfw bootpd
sudo mdutil -a -i off
sudo pkill -f "mds_stores|mds|mdworker_shared|mdworker|spotlightknowledged|Spotlight|photoanalysisd|cloudphotod|mediaanalysisd|analyticsd|mediaanalysisd|mdwrite|corespotlightd|geod|weatherd"

Contributor

Please don't. I don't do any of these in vmnet-helper and it has no issues.

Member Author

The vmnet-helper process doesn't need as much as minikube.

} else {
running := checkPid(pid) == nil
log.Infof("qemu status: pid=%d running=%t", pid, running)
}

Contributor

This likely duplicates d.GetState().

log.Debugf("qemu status: monitor query failed: %v", err)
} else {
log.Infof("qemu status: QMP state=%s", state)
}

Contributor

What kind of info do we get here? Why do we need it every 10 seconds?

Member Author

This is just temporary debugging to see whether the VM, or QEMU itself, is running at all.
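For reference on what the QMP query returns: QEMU's `query-status` command reports only the VM run state, a `status` string (such as `running`, `paused` or `guest-panicked`) plus a `running` boolean. In the log earlier in this thread it reported `running` while the guest kernel had already panicked, so it says nothing about guest health. A minimal Go sketch of decoding one reply, assuming the greeting and `qmp_capabilities` handshake were already done and that the line is a command reply rather than an asynchronous event; the type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// qmpStatus mirrors the "return" object of the QMP query-status command.
type qmpStatus struct {
	Running bool   `json:"running"`
	Status  string `json:"status"`
}

// qmpResponse is a QMP command reply: either "return" or "error" is set.
type qmpResponse struct {
	Return *qmpStatus `json:"return"`
	Error  *struct {
		Class string `json:"class"`
		Desc  string `json:"desc"`
	} `json:"error"`
}

// parseQueryStatus decodes a single QMP reply line to query-status.
func parseQueryStatus(line []byte) (qmpStatus, error) {
	var resp qmpResponse
	if err := json.Unmarshal(line, &resp); err != nil {
		return qmpStatus{}, fmt.Errorf("decode QMP reply: %w", err)
	}
	if resp.Error != nil {
		return qmpStatus{}, fmt.Errorf("QMP error %s: %s", resp.Error.Class, resp.Error.Desc)
	}
	if resp.Return == nil {
		return qmpStatus{}, errors.New("QMP reply has no return value")
	}
	return *resp.Return, nil
}

func main() {
	reply := []byte(`{"return": {"status": "running", "singlestep": false, "running": true}}`)
	st, err := parseQueryStatus(reply)
	fmt.Println(st.Status, st.Running, err)
}
```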

} else {
log.Infof("qemu status: QMP state=%s", state)
}
d.logSerialTail()

Contributor

This will log the huge serial log every 10 seconds, which does not make sense.

}

// logSerialTail emits the tail of the qemu serial log for debugging boot stalls.
func (d *Driver) logSerialTail() {

Contributor

We should not log the QEMU serial log in minikube; the minikube log is too noisy as it is. If we have issues in QEMU boot, the serial log will reveal them.

When tests fail, we need to upload minikube and qemu logs as build artifacts for inspection. Keeping everything in one log is messy.

./out/minikube start \
--no-kubernetes \
--cpus 1 \
--memory 1200mb \

Contributor

This will cause trouble: we try to use 6g of RAM when we need only 2g.

@nirs
Contributor

nirs commented Dec 15, 2025

> on 1200 mb memory I got this serial logs from qemu
>
> I1215 00:17:53.673152   57107 main.go:143] libmachine: Attempt 129
> I1215 00:17:53.673202   57107 main.go:143] libmachine: Searching for 7e:96:47:e6:81:cf in /var/db/dhcpd_leases ...
> I1215 00:17:55.674463   57107 main.go:143] libmachine: Attempt 130
> I1215 00:17:55.674510   57107 main.go:143] libmachine: Searching for 7e:96:47:e6:81:cf in /var/db/dhcpd_leases ...
> I1215 00:17:55.674634   57107 main.go:143] libmachine: qemu status: ip lookup still failing (attempt 130)
> I1215 00:17:55.674743   57107 main.go:143] libmachine: qemu status: pid=57371 running=true
> I1215 00:17:55.676360   57107 main.go:143] libmachine: qemu status: QMP state=running
> I1215 00:17:55.676546   57107 main.go:143] libmachine: qemu status: serial.log tail (/Users/runner/.minikube/machines/minikube/serial.log, last 2041 bytes):
> [   17.654784] Kernel panic - not syncing: System is deadlocked on memory
>
> ...
>
> [ 17.654784] ---[ end Kernel panic - not syncing: System is deadlocked on memory ]--

Good reason to keep 2g.

@medyagh
Member Author

medyagh commented Dec 15, 2025

> Good reason to keep 2g.

That is coming from inside the minikube VM. I think 1200 MB is not enough to run Docker and the other things we need (ssh).

I am thinking maybe we should try --no-kubernetes with containerd.

@medyagh medyagh changed the title ci: add cleanup step for socketfilterfw and bootpd on macOS wip: add cleanup step for socketfilterfw and bootpd on macOS Dec 15, 2025
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 15, 2025
@k8s-ci-robot
Contributor

@medyagh: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| integration-vfkit-docker-macos-arm64 | 3cb1929 | link | false | /test integration-vfkit-docker-macos-arm64 |
| integration-docker-crio-linux-x86-64 | 3cb1929 | link | true | /test integration-docker-crio-linux-x86-64 |
| integration-kvm-crio-linux-x86-64 | 3cb1929 | link | true | /test integration-kvm-crio-linux-x86-64 |
| integration-kvm-docker-linux-x86 | 3cb1929 | link | true | /test integration-kvm-docker-linux-x86 |
| integration-kvm-crio-linux-x86 | 3cb1929 | link | true | /test integration-kvm-crio-linux-x86 |
| integration-kvm-containerd-linux-x86 | 3cb1929 | link | true | /test integration-kvm-containerd-linux-x86 |
| integration-docker-docker-linux-arm | 3cb1929 | link | true | /test integration-docker-docker-linux-arm |
| integration-docker-docker-linux-x86 | 3cb1929 | link | true | /test integration-docker-docker-linux-x86 |
| integration-docker-containerd-linux-x86 | 3cb1929 | link | true | /test integration-docker-containerd-linux-x86 |
| integration-none-docker-linux-x86 | 3cb1929 | link | true | /test integration-none-docker-linux-x86 |
| integration-docker-crio-linux-x86 | 3cb1929 | link | true | /test integration-docker-crio-linux-x86 |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
