Conversation
1afee7c to
49d83c2
Compare
5ac6e21 to
ccc8336
Compare
fa7115c to
911f23b
Compare
911f23b to
c4b7087
Compare
065f914 to
d21b806
Compare
f3262aa to
1f391b0
Compare
e1f53d6 to
ad0810d
Compare
b93ade4 to
4a91e82
Compare
Fetch the configured hostname from the device and populate it in the Device status. The hostname is a configuration item (not state), so it is fetched via GetConfig rather than GetState. This implies one additional "unpacked" gNMI call to the switch. As per [1], it is not possible to issue a `GetRequest` message with different types in the set of requested paths. Note that it is possible to leave the `type` empty and thereby get all data (`CONFIG`, `STATE`, and `OPERATIONAL`). We could implement a `client.GetAny()` method for this case and retrieve all data at once since the models referenced `DeviceInfo` do not have excessive data. However, as this is the only case for such an optimization for now, and the `Device` is infrequently reconciled, this does not seem necessary for now. [1] https://github.com/openconfig/reference/blob/master/rpc/gnmi/gnmi-specification.md#331-the-getrequest-message
The `dhcprelay_controller` was not aligned with the design of the other controllers. When the device is locked it should requeue using jitter and also with priority `LockWaitPriorityDefault`. Adds a test to verify that reconciliation is triggered when an interface gets configured (after it reconciles once a pending `vrf` resource is created).
Add a `neighbors` field to the Interface status containing LLDP neighbor information derived from TLVs: chassis ID, port ID, system name, and expiration time based on TTL. As per 52eae24, users can annotate or label an interface resource with expected neighbor information. This can be cross-validated against the LLDP data that is now stored in the status. We change the annotation format to accept either the chassis ID field or the system name. According to the standard [1], the chassis ID itself can represent different types, like the MAC address, the interface name or the chassis component among others (see Sec. "8.5.2.2 Chassis ID subtype" for details). As information like MAC is not always immediately available, we opt for also allowing the user to use the sysName instead. The rationale is that the hostname is typically configured by the operator, the user, or a known provisioning process. As a result, controller will check the value in the annotation for both cases. The exact mechanism is detailed in the next commit. The status includes a validation field summarizing whether the neighbor could be validated and its result. While TTL is a mandatory field in LLDPDUs, we must exclude them from NeighborInfo to avoid continuous reconciliations. This is because its value changes every time we request it, thus changing the status and triggering the next reconciliation. Instead, we propose to compute an ExpirationTime that avoids this issue as it stays constant (more details in the controller implementation). This design slightly deviates from OpenConfig [2]. In there all LLDP information is contained in the `lldp` subtree. We have decided that configuration is provided by the `lldp` resource. However, the adjacency data is put into the `interface` status and retrieved by its controller. This simplifies the design by removing a dependency towards the `lldp` resource. Notice that if a user enables LLDP by means other than the operator, the data will appear in the interface even if the `lldp` resource does not exist. [1] https://ieeexplore.ieee.org/document/7433915 [2] https://openconfig.net/projects/models/schemadocs/yangdoc/openconfig-lldp.html#lldp-interfaces-interface-neighbors-neighbor-id
In commit 7c6c53e we extended the `interface` type to include neighbor information (LLDP adjacencies). Here we instruct the interface controller to retrieve the LLDP adjacency information via the provider and populate the status accordingly. The ExpirationTime is computed only once from TTL and is preserved across reconciliations. We do so to avoid continuous status updates observed in the lab. In that environment there are clock drifts between the device and operator, meaning that the value of the ExpirationTime changed and was triggering unnecessary reconciliations in the process. For readability reasons we truncate the ExpireTime with second granularity. Note that expiring entries are automatically removed from the device. Hence, in the reconciliation we only need to fetch the entries. As per the neighbor validation, we implement it as a non-blocking operation. This means that errors are logged but they don't prevent the reconciliation of the interface. The validation checks first if the resource has a label. The label is used to perform a validation against an interface resource. If this fails, then the controller falls back to check the annotation. The annotation is used to validate neighbors that are not a kubernetes resource (see 52eae24). In this commit we also add a missing watch for the LLDP controller. Now, if the LLDP resource references an interface resource that has not been created, the LLDP resource will be reconciled once the missing dependency is created. This was problematic during bootstraps, as the lldp feature was not installed if a single interface was missing. We also add tests for LLDP operational status degradation, verifying that the controller correctly sets OperationalCondition to False when the device reports LLDP is down, and recovers when it comes back up.
Retrieve LLDP neighbor information while fetching interface status. As of now, this is only performed on physical interfaces with the use case of cabling validation in mind. The GetStatus response now includes a slice of adjacencies with a subset of the Cisco model for NXOS [1]. [1] https://pubhub.devnetcloud.com/media/dme-docs-10-4-3/docs/Discovery%20Protocols/lldp%3AAdjEp/
NXOS accepts multiple names for the same object, e.g., Ethernet1/1 is equivalent to eth1/1. Until now, we accepted these variations. This has become a limitation when retrieving neighboring interfaces via LLDP, where NXOS exports the canonical name (Ethernet1/1). Validation can fail if the user configured the remote interface using the short name (eth1/1): the neighbor validation fetches the K8s resource and compares the short version with the canonical name exported by LLDP. This commit addresses this by enforcing canonical names at the provider. The undesired side-effect is that no TerminalError is raised to the controller, leading to repeated reconciliations with back-off instead.
Update all interface spec.name fields to use the name convention (e.g., Ethernet1/1 instead of eth1/1, loopback0 instead of lo0, port-channel1 instead of po1).
In 2c9f7e2 we enforce a specific convention for the interface names. However, NX-OS rejects the canonical `Loopback<ID>` notation with ERR_FORMAT for the `sourceInterface` and the `anycastSourceInterface` NVE configuration endpoint. Convert to the short form (e.g. lo1) via ShortNameLoopback. Error handling not considered as names have been already validated by the controller.
4a91e82 to
6864a81
Compare
Merging this branch will increase overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. Changed unit test files
|
| PortDescription string `json:"portDescription,omitempty"` | ||
| } | ||
|
|
||
| type NeighborValidation string |
There was a problem hiding this comment.
I think this should have a description and a kubebuilder Enum statement.
| operSt = phys.OperSt | ||
| operMsg = phys.OperStQual | ||
|
|
||
| for _, adj := range lldpAdj.AdjItems.AdjEpList { |
There was a problem hiding this comment.
nit: a preallocation might bring a small benefit here:
lldpAdjacencies = make([]provider.LLDPAdjacency, 0, len(lldpAdj.AdjItems.AdjEpList))| PortID string | ||
| PortIDType uint8 | ||
| PortDescription string | ||
| TTL uint32 |
There was a problem hiding this comment.
Can we make this typed as time.Duration? Otherwise it's not clear from this type, that is represents seconds. But instead of calling it TTLSeconds it would rather just directly go with the time.Duration type here.
| SystemName string `json:"systemName,omitempty"` | ||
| SystemDescription string `json:"systemDescription,omitempty"` | ||
| ChassisID string `json:"chassisId,omitempty"` | ||
| ChassisIDType uint8 `json:"chassisIdType,omitempty"` |
There was a problem hiding this comment.
By kubernetes conventions, we shouldn't use unsigned integers in the api types. Also, I think just expose raw integers without a semantic meaning might be rather confusing. Could we at least add a comment explaining these?
I am opening this PR even though there is still one action item missing: making the provider reject some of the naming schemes for the physical interfaces. I fear this would make this PR even longer than it is. So I will add this in a follow up PR. As for now, validation via labels and annotations work in the lab in both cases.
This is what an interface resource looks like with neighbor validation using an annotation (i skip the label as it is similar):
I have added context in the commit message, but please let me know if something is not clear.