Skip to content

Comments

Add log type include filter#714

Open
jorbaum wants to merge 27 commits intocloudfoundry:mainfrom
jorbaum:add-log-type-filter
Open

Add log type include filter#714
jorbaum wants to merge 27 commits intocloudfoundry:mainfrom
jorbaum:add-log-type-filter

Conversation

@jorbaum
Copy link
Contributor

@jorbaum jorbaum commented Jan 5, 2026

Description

Addresses #711 :

  • Support both ?include-source-types= and ?exclude-source-types (calling it source type instead of log type as log type is too generic and is already used in this code base)
  • Throw error in case both are configured. Do not throw an error on misconfigured values for the new config option (consistent with previous HTTP query param usage in binding config), but log a warning (new).
  • Only exclude filter shall pass unknown/nil source type, include filter will discard

I still need to verify this, but I think a feature flag is not necessary as I tried to make string parsing more efficient by:

  • running only when source type tag and log type filter is present
  • strings.IndexByte might take some time, but it is used only with a single char / and strings.IndexByte() should be more efficient than strings.Index()
  • Looking up content in a map logTypePrefixes rather than in each log line, which should be more efficient

Related PRs:

Disclaimer: Claude helped me in crafting this code.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Testing performed?

  • Unit tests
  • Integration tests
  • Acceptance tests

Checklist:

  • This PR is being made against the main branch, or relevant version branch
  • I have made corresponding changes to the documentation
  • I have added testing for my changes

Copy link
Contributor

@chombium chombium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorbaum Thanks for the changes.

Generally it looks good, but I have some smaller objections which should be addressed.

DrainData DrainData `json:"type,omitempty"`
OmitMetadata bool
InternalTls bool
LogFilter *LogTypeSet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this has to be a pointer.

Copy link
Contributor Author

@jorbaum jorbaum Jan 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that at first, but in manager.go line 45 Binding is used as a map key:

	sourceDrainMap    map[string]map[syslog.Binding]drainHolder

My understanding is that maps in Go are not comparable, so adding LogTypeSet (which is a map[LogType]struct{}) to the Binding struct makes it non-comparable.

AFAICS the alternative to using a pointer would be to implement a custom comparable key for the map, but using a pointer seemed simpler to me.

default:
// Unknown log type, skip it
// TODO add unit test for unknown log type
d.log.Printf("Unknown log type '%s' in log type filter, ignoring", logType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a log message here and an app owner wrongly configures the drain, than for every app instance and every app log, a log ,message will be written.

The filter settings should be validated upfront and we should not end up in this situation. At this point we should only have valid drain configuration.
I guess we have to bring in #633 first and than this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that this code does not run for every log message, but only parsing the syslog drain config. I intentionally left logging out of filtering_drain_writer.go to avoid creating a log for each app log.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing with @chombium we concluded that yhis log gets created for each wrongly configured value in the log types of each syslog drain. This is emitted every few seconds on each VM within cloud foundry. While this is better than for each log line, we would still keep log load at a minimum.

To address this I now made sure to log at most one log line in this method.

set := make(syslog.LogTypeSet, len(logTypes))

for _, logType := range logTypes {
logType = strings.ToLower(strings.TrimSpace(logType))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The query string in the URL is case sensitive. We should only accept lowercase and not convert eVerY POssibLE string to lower case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did so.

Still, I wonder if we should be strict here about this. Not caring about casing might bring us some benefits as this might be something that users accidentally misconfigure. It should not hurt much to support both I think.

@chombium
Copy link
Contributor

chombium commented Jan 9, 2026

@ZPascal Wdyt about this PR? It addresses #711

@jorbaum
Copy link
Contributor Author

jorbaum commented Jan 12, 2026

@chombium I left a bunch of replies for points that are not yet clear to me. All the rest look good.

Thanks for the thorough review! You found a bunch of nice things.

Copy link
Contributor Author

@jorbaum jorbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worked in some comments.

set := make(syslog.LogTypeSet, len(logTypes))

for _, logType := range logTypes {
logType = strings.ToLower(strings.TrimSpace(logType))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did so.

Still, I wonder if we should be strict here about this. Not caring about casing might bring us some benefits as this might be something that users accidentally misconfigure. It should not hurt much to support both I think.

Copy link
Contributor Author

@jorbaum jorbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worked in most comments. Still 2 left to clarify.

default:
// Unknown log type, skip it
// TODO add unit test for unknown log type
d.log.Printf("Unknown log type '%s' in log type filter, ignoring", logType)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing with @chombium we concluded that yhis log gets created for each wrongly configured value in the log types of each syslog drain. This is emitted every few seconds on each VM within cloud foundry. While this is better than for each log line, we would still keep log load at a minimum.

To address this I now made sure to log at most one log line in this method.

@jorbaum
Copy link
Contributor Author

jorbaum commented Jan 19, 2026

Feel free to take another look at it.

@jorbaum jorbaum force-pushed the add-log-type-filter branch 3 times, most recently from 108740d to ef16bed Compare February 11, 2026 14:33
@jorbaum jorbaum marked this pull request as ready for review February 16, 2026 10:18
@jorbaum jorbaum requested a review from a team as a code owner February 16, 2026 10:18
Copy link
Contributor

@chombium chombium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jorbaum. Thanks for the improvements. Unfortunately, I've found few more problematic things.

return true
}

if sourceTypeTag == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The source_type is always set. Why is this check needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expect(err).To(HaveOccurred())
})

Context("when source_type tag is missing", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a source type be missing for log envelopes?
Please read the Loggregator API spec for Log Envelopes.

Copy link
Contributor Author

@jorbaum jorbaum Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIU envelope v1 defines it as optional, see https://github.com/cloudfoundry/dropsonde-protocol/tree/master/events#logmessage . Also in v2 envelopes source_type is merely a tag and therefore optional (also in protobuf file). I therefore think it is a good idea to handle this case even though I understand in practice this (currently) does not happen.

for _, sourceType := range sourceTypes {
sourceType = strings.TrimSpace(sourceType)
t, _ := syslog.ParseSourceType(sourceType)
set.Add(t)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if a source type is invalid? Shouldn't that throw an error and mark the drain as invalid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This is implemented in filtered_binding_fetcher's getUnknownSourceTypes(). It might make sense to write common function that is called in both filtered_binding_fetcher and binding_config .

}

func (d *DrainParamParser) getLogFilter(u *url.URL) *syslog.LogFilter {
includeSourceTypes := u.Query().Get("include-source-types")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it would be better if we add log in the names of the parameters, so that they have clear descriptive names. Maybe something like include-log-source-types and exclude-log-source-types will be better...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did so. No strong opinion from my side. However, I personally do not like this makes the query params quite long.

Are you sure we should make it that long?

},
{
name: "include-source-types=app",
url: "https://test.org/drain?include-source-types=app",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be types=APP. The urls are case sensitive. As the values of the source types are defined uppercase, I'm not sure if lower case should be accepted for the values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorbaum
Copy link
Contributor Author

jorbaum commented Feb 18, 2026

Thanks for the review. I replied to a bunch of comments for some clarifying remarks. Also a quick discussion about uppercase/lowercase config values would help me.

Copy link
Contributor Author

@jorbaum jorbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed hopefully all comments.

I got a bunch of things to clarify. Biggest question that is left is likely if we should recommend users uppercase or lowercase values.


// ParseSourceType parses a string into a SourceType value
func ParseSourceType(s string) (SourceType, bool) {
lt := SourceType(strings.ToUpper(s))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed with @chombium that we would like to be lenient here and normalize internally.

I am not so sure if we would like to suggest users and normalize internally to uppercase or lowercase here.

  • Reason for lowercase as default: all other values for drain query params are lowercase, so we would follow this convention
  • Reason for uppercase as default: source type is defined as uppercase in cloud foundry, so we would follow this convention

}

func (d *DrainParamParser) getLogFilter(u *url.URL) *syslog.LogFilter {
includeSourceTypes := u.Query().Get("include-source-types")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did so. No strong opinion from my side. However, I personally do not like this makes the query params quite long.

Are you sure we should make it that long?

},
{
name: "include-source-types=app",
url: "https://test.org/drain?include-source-types=app",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorbaum jorbaum force-pushed the add-log-type-filter branch from ef16bed to ecab32a Compare February 20, 2026 13:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

2 participants